[[PageOutline]] = Workunit and result state transitions = The processing of workunits and results can be described in terms of transitions of their state variables. Workunit '''parameters''' are described [JobIn on a separate page]. == Workunit state variables == #wu === `canonical_resultid` === #wu_canonical_resultid The ID of the canonical result for this workunit, or zero. * Initially zero. * Set by the validator (by `check_set()`). === `transition_time` === #wu_transition_time The next time to check for state transitions for this WU. * Initially now. * Set to now by scheduler when get a result for this WU. * Set to min(current value, now + delay_bound) by scheduler when send a result for this WU. * Set to min(x.sent_time + wu.delay_bound) over IN_PROGRESS results x by transitioner when done handling this WU. * Set to now by validator if it finds canonical result, or if there is already a canonical result and some other results have validate_state = INIT, or if there is no consensus and the number of successful results is > wu.max_success_results. === `file_delete_state` === #wu_file_delete_state Indicates whether input files should be deleted. * Initially INIT (0). * Set to READY (1) by transitioner when all results have [#result_server_state server_state] = OVER and workunit [#wu_assimilate_state assimilate_state] = DONE. '''Note:''' db_purge purges a WU and all its results when file_delete_state = DONE; therefore it is critical that it only be set to DONE if all results have [#result_server_state server_state] = OVER. * Set to DONE (2) by file_deleter when it has attempted to delete files. === `assimilate_state` === #wu_assimilate_state Indicates whether the workunit should be assimilated. * Initially INIT (0). * Set to READY (1) by transitioner if wu.assimilate_state = INIT and WU has error condition. * Set to READY (1) by validator when find canonical result and wu.assimilate_state = INIT. * Set to DONE (2) by assimilator when done. === `need_validate` === #wu_need_validate Indicates that the workunit has a result that needs validation. * Initially FALSE. * Set to TRUE by transitioner if the number of success results is at least wu.min_quorum and there is a success result not validated yet. * Set to FALSE by validator. === `error_mask` === #wu_error_mask A bit mask for error conditions. * Initially zero. * Transitioner sets COULDNT_SEND_RESULT (1) if some result couldn't be sent. * Transitioner sets TOO_MANY_RESULTS (2) if too many error results. * Validator sets TOO_MANY_SUCCESS_RESULTS (4) if no consensus and too many success results. * Transitioner sets TOO_MANY_TOTAL_RESULTS (8) if too many total results. === Workunit invariants === #wu_invariants * Eventually either canonical_resultid or error_mask is set. * Eventually transition_time = infinity. * Each WU is assimilated exactly once. === Notes on deletion of input files === #wu_deletion_notes * Input files are eventually deleted, but only when all results have state = OVER (so that clients don't get download failures) and the WU has been assimilated (in case the project wants to examine input files in error cases). == Result state variables == #result === `report_deadline` === #result_report_deadline Give up on result (and possibly delete input files) if don't get reply by this time. * Set by scheduler to now + wu.delay_bound when send result. === `server_state` === #result_server_state Values: UNSENT, IN_PROGRESS, OVER * Initially UNSENT (2). * Set by scheduler to IN_PROGRESS (4) when send result. * Set by scheduler to OVER (5) when result is reported in request message from client. * Set by scheduler to OVER (5) when it thinks host has detached project. * Set by transitioner to OVER (5) if now > result.report_deadline. * Set by transitioner to OVER (5) if WU has error condition and result.server_state = UNSENT. * Set by validator to OVER (5) if WU has canonical result and result.server_state = UNSENT. === `outcome` === #result_outcome Values: SUCCESS, COULDNT_SEND, CLIENT_ERROR, NO_REPLY, DIDNT_NEED, VALIDATE_ERROR, CLIENT_DETACHED. Defined iff result.server_state = OVER * Set by scheduler to SUCCESS (1) if get reply and no client error. * Set by scheduler to CLIENT_ERROR (3) if get reply and client error. * Set by scheduler to NO_REPLY (4) if it thinks host has detached project. * Set by transitioner to NO_REPLY (4) if server_state = IN_PROGRESS and now < report_deadline. * Set by transitioner to DIDNT_NEED (5) if WU has error condition and result.server_state = UNSENT. * Set by validator to DIDNT_NEED (5) if WU has canonical result and result.server_state = UNSENT. * Set by validator to VALIDATE_ERROR (6) if outcome was initially SUCCESS, but the validator had a permanent error reading a result file, or a file had a syntax error. Prevents the validator from trying again. * Set by scheduler to CLIENT_DETACHED (7) if it gets a request indicating that the client detached, then reattached. === `client_state` === #result_client_state Records the client state (DOWNLOADING, DOWNLOADED, COMPUTE_ERROR, UPLOADING, UPLOADED, ABORTED) where an error occurred. Defined if outcome is CLIENT_ERROR. === `file_delete_state` === #result_file_delete_state * Initially INIT (0). * Set by transitioner to READY (1) if this is the canonical result, and file_delete_state = INIT, and [#wu_assimilate_state assimilate_state] = DONE, and all the results have [#result_server_state server_state] = OVER, and all all the results with [#result_outcome outcome] = SUCCESS and have [#result_validate_state validate_state] <> INIT. * Set by transitioner to READY (1) if [#wu_assimilate_state assimilate_state] = DONE and #result_outcome outcome] = CLIENT_ERROR or [#result_validate_state validate_state] != INIT. === `validate_state` === #result_validate_state Defined iff result.outcome = SUCCESS * Initially INIT (0). * Set by validator to VALID (1) if outcome = SUCCESS and matches canonical result. * Set by validator to INVALID (2) if outcome = SUCCESS and doesn't match canonical result. * Set by transitioner to NO_CHECK (3) if the WU had an error; this avoids showing claimed credit as 'pending'. * Set by validator to ERROR if outcome = SUCCESS and had a permanent error trying to read an output file, or an output file had a syntax error. * Set by validator to INCONCLUSIVE (4) if check_set() didn't find a consensus in a set of results containing this one. * Set by scheduler to TOO_LATE (5) if the result was reported after the canonical result's files were deleted. === Result invariants === #result_invariants * Eventually [#result_server_state server_state] = OVER. * Output files are eventually deleted. === Notes on deletion of output files === #result_deletion_notes * Non-canonical results can be deleted as soon as the WU is assimilated. * Canonical results can be deleted only when all results have [#result_server_state server_state] = OVER and all success results are validated. * If a result reply arrives after its timeout, the output files can be immediately deleted. How do we delete output files that arrive REALLY late? (e.g. uploaded after all results have timed out, and never reported)? Possible answer: let X = create time of oldest unassimilated WU. Any output files created before X can be deleted.