Version 7 (modified by davea, 13 years ago)

Client data model

This document describes proposed changes to the client to support distributed storage.

Current

FILE_INFO elements:

  • status (present, not present, error)
  • urls
  • bool generated_locally
  • bool upload_when_present
  • bool uploaded
  • bool sticky
  • bool optional (applies to output files)

Problems

Many. For example, suppose the server asks the client to upload a file that the client doesn't have. Since generated_locally is false and the file is not present, the client will try to download it (from the upload URL, since the single urls field does not distinguish upload URLs from download URLs).

Proposed

FILE_INFO elements:

  • status
  • upload_urls
  • download_urls
  • bool uploaded
  • bool sticky
  • bool optional_output
  • bool optional_input

Policy:

  • If a file has a download URL and is not present, download it.
  • If a file has an upload URL, is present, and uploaded is false, upload it.
  • Start a job if each of its input files is either present or marked optional_input.
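
The three policy rules can be sketched as predicates over the proposed fields. This is a minimal illustration, not the client's actual code: the FileInfo class, the Status enum, and the function names are hypothetical. The status values are taken from the "Current" list, and treating only "not present" (not "error") as eligible for download is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Values borrowed from the "Current" FILE_INFO list.
    NOT_PRESENT = 0
    PRESENT = 1
    ERROR = 2


@dataclass
class FileInfo:
    # Hypothetical in-memory form of the proposed FILE_INFO fields.
    name: str
    status: Status = Status.NOT_PRESENT
    upload_urls: list = field(default_factory=list)
    download_urls: list = field(default_factory=list)
    uploaded: bool = False
    sticky: bool = False
    optional_output: bool = False
    optional_input: bool = False


def should_download(f):
    # Rule 1: has a download URL and is not present.
    # Assumption: a file in the ERROR state is not retried here.
    return bool(f.download_urls) and f.status == Status.NOT_PRESENT


def should_upload(f):
    # Rule 2: has an upload URL, is present, and has not been uploaded.
    return bool(f.upload_urls) and f.status == Status.PRESENT and not f.uploaded


def can_start_job(input_files):
    # Rule 3: every input file is either present or optional.
    return all(
        f.status == Status.PRESENT or f.optional_input for f in input_files
    )
```

Under these rules, the file from the Problems example (upload URL only, not present) triggers neither a download nor an upload, because the download decision no longer looks at upload URLs.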

Handling <file_info> elements in scheduler replies:

  • If referenced from an app version or workunit, store the URLs in download_urls.
  • If referenced from a result, store the URLs in upload_urls.
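
As a rough sketch of this rule, the URL list a reply's URLs go into depends only on what kind of element references the file. FileRecord, store_urls, and the referenced_from strings are hypothetical names chosen for illustration; skipping duplicate URLs is an assumption, not something this page specifies.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # Hypothetical record holding only the two proposed URL lists.
    name: str
    upload_urls: list = field(default_factory=list)
    download_urls: list = field(default_factory=list)


def store_urls(rec, urls, referenced_from):
    """File the URLs from a <file_info> element under the right list."""
    if referenced_from in ("app_version", "workunit"):
        target = rec.download_urls
    elif referenced_from == "result":
        target = rec.upload_urls
    else:
        raise ValueError("unknown referencing element: " + referenced_from)
    for url in urls:
        # Assumption: ignore URLs that are already recorded.
        if url not in target:
            target.append(url)
```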

Deprecated fields in scheduler replies:

  • <generated_locally>
  • <upload_when_present>

Handling upload requests:

  • Clear the "uploaded" flag.
  • If the file isn't present, mark the result as an error and put an explanatory message in stderr_out.
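
The two steps might look like the following sketch. FileRecord, Result, handle_upload_request, and the message text are hypothetical; in particular, this assumes the result to be marked is the one that references the requested file.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    present: bool
    uploaded: bool = False


@dataclass
class Result:
    name: str
    error: bool = False
    stderr_out: str = ""


def handle_upload_request(f, result):
    # Step 1: clear the flag so the upload policy picks the file up again.
    f.uploaded = False
    # Step 2: a missing file cannot be uploaded, so report it on the result.
    if not f.present:
        result.error = True
        result.stderr_out += (
            "upload requested for " + f.name + ", but the file is not present\n"
        )
```

Note that the missing-file case never leads to a download attempt, which is the behavior the Problems section objects to.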

Compatibility: we'll change <file_info> to <file> in client_state.xml. We'll still parse <file_info> elements (for upward compatibility), so an upgraded client can read an older state file. We won't provide backward compatibility: if you upgrade to 6.13.x and then downgrade to 6.12, all tasks and app versions will disappear.
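
The upward-compatible read can be illustrated by accepting either tag name when scanning the state file. This sketch uses Python's standard XML parser rather than the client's own parser, and read_file_names and the <name> child element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def read_file_names(state_xml):
    """Return file names from both new <file> and old <file_info> elements."""
    root = ET.fromstring(state_xml)
    names = []
    for elem in root.iter():
        # Accept the new tag and the legacy tag alike.
        if elem.tag in ("file", "file_info"):
            names.append(elem.findtext("name"))
    return names
```

An older client that only recognizes <file_info> would skip the <file> elements entirely, which is consistent with tasks and app versions disappearing after a downgrade.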

Locally-generated input files

One (hypothetical) class of files is input files that the app generates computationally if they are not present. Such files should be listed in the scheduler reply as sticky optional input files with no download URL, and also as optional output files (which causes them to be marked as present).

The app must use file locking to ensure that two jobs don't try to generate the file at the same time.
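
One way an app could do this is to take an exclusive lock, re-check for the file, and generate it only if it is still missing. The sketch below uses fcntl.flock, which is available on POSIX systems only; ensure_generated, the ".lock" and ".tmp" suffixes, and the generate callback are hypothetical.

```python
import fcntl
import os


def ensure_generated(path, generate):
    """Create `path` from generate() unless it already exists."""
    lock_path = path + ".lock"
    with open(lock_path, "w") as lock:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock, fcntl.LOCK_EX)
        try:
            # Re-check under the lock: another job may have finished first.
            if not os.path.exists(path):
                tmp = path + ".tmp"
                with open(tmp, "wb") as out:
                    out.write(generate())
                # Rename so readers never see a half-written file.
                os.replace(tmp, path)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

A job that loses the race waits on the lock, finds the file already in place, and skips generation.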