wiki:DataFlow


Jobs and data

BOINC is designed for high throughput: millions of volunteer hosts, millions of jobs per day. To maximize your project's performance, it's important to understand the life-cycle of a job:

  • The job and its associated input files are generated, typically by a work generator program (see the sketch after this list).
  • One or more instances of the job are created.
  • The instances are dispatched to different hosts.
  • Each host downloads the input files.
  • After a queueing delay due to other jobs in progress, the host executes the job, then uploads its output files.
  • It reports the completed job, possibly after an additional delay (whose purpose is to reduce the rate of scheduler requests).
  • A validator program checks the output files, perhaps comparing replicas.
  • When a valid instance is found, an assimilator program handles the results (e.g., by inserting them in a separate database).
  • When all instances have been completed, a file deleter deletes the input and output files.
  • A DB purge program deletes the database entries for the job and job instances.
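
The first step is usually a small C++ daemon built against the BOINC server's backend library. The following is a minimal sketch, modeled on BOINC's sample work generator, not a drop-in program: the application name ("myapp"), template paths, and input file name are placeholders, and the exact create_work() signature and helper names may differ between server versions.

    // Work-generator sketch (placeholders: "myapp", template paths, infile).
    #include <cstdio>
    #include "boinc_db.h"
    #include "backend_lib.h"
    #include "sched_config.h"
    #include "util.h"
    #include "str_replace.h"

    int make_job(DB_APP& app, SCHED_CONFIG& config, const char* infile) {
        DB_WORKUNIT wu;
        char name[256];
        char* wu_template;

        // Read the input (workunit) template, which lists the job's
        // input files and their attributes (sticky, etc.).
        if (read_file_malloc(
            config.project_path("templates/myapp_in"), wu_template
        )) {
            return -1;
        }

        wu.clear();                    // zero all fields
        wu.appid = app.id;
        snprintf(name, sizeof(name), "myapp_%s", infile);
        safe_strcpy(wu.name, name);
        wu.rsc_fpops_est = 1e12;       // estimated FLOPs per job
        wu.rsc_fpops_bound = 1e14;     // abort if this bound is exceeded
        wu.delay_bound = 86400;        // report deadline, in seconds
        wu.min_quorum = 2;             // replicas needed for validation
        wu.target_nresults = 2;        // instances created initially

        // The input file must already be staged in the download hierarchy.
        const char* infiles[] = { infile };

        // Creates the workunit record; the transitioner then creates the
        // job instances, and the scheduler dispatches them to hosts.
        return create_work(
            wu,
            wu_template,
            "templates/myapp_out",                        // result template (relative)
            config.project_path("templates/myapp_out"),   // result template (absolute)
            infiles, 1,
            config
        );
    }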

Input and output files

Each job can have arbitrarily many input and output files. Each file has various attributes, e.g.:

  • Sticky: the file should remain on the client even when no job is using it.
  • Upload when present: upload the file when it's complete (the default is to not upload it).

Jobs don't need to have any input or output files; the input can come from command-line arguments (which are stored in the database record for the job) and the output can be written to stderr (which is returned to the server and stored in the DB).
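
On the client side, the application sees its files under the logical names given in the job's file references, and the BOINC API maps them to physical names. Here is a rough sketch assuming logical names "in" and "out" (both placeholders that must match the open names in your templates):

    // Client-application sketch using the BOINC API.
    #include <cstdio>
    #include <cstdlib>
    #include <string>
    #include "boinc_api.h"
    #include "filesys.h"

    int main(int argc, char** argv) {
        boinc_init();

        // Command-line arguments come from the job's database record,
        // so small inputs need no file at all.
        int n = (argc > 1) ? atoi(argv[1]) : 0;

        // Map logical file names to physical names in the client's
        // file system, then open them.
        std::string in_path, out_path;
        boinc_resolve_filename_s("in", in_path);
        boinc_resolve_filename_s("out", out_path);

        FILE* in = boinc_fopen(in_path.c_str(), "r");
        FILE* out = boinc_fopen(out_path.c_str(), "w");
        if (in && out) {
            // ... the actual computation goes here ...
            fprintf(out, "processed %d\n", n);
        }
        if (in) fclose(in);
        if (out) fclose(out);

        // Anything written to stderr is returned to the server with
        // the job instance and stored in the database.
        fprintf(stderr, "job finished, n=%d\n", n);

        boinc_finish(0);   // does not return
    }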

Suppose you have many jobs that use the same input file. In that case mark it as sticky; clients will then download it only once.

Suppose your application generates a file which can potentially be used by subsequent jobs. In that case make it a sticky output file, without upload-when-present.

Locality scheduling

Suppose you have an application that has large input files, and many jobs use the same input file. In that case, once a file has been downloaded to a client, you'd like to send that client jobs that use the file, if possible. To support this, BOINC offers a mechanism called locality scheduling.
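
The details are described on the locality scheduling page. As an assumption not stated above: classic locality scheduling expects each job to have one large input file and a job name derived from that file's name, so the scheduler can match jobs to hosts that already hold the file. A hypothetical naming sketch:

    #include <cstdio>

    // Hypothetical helper: build a job name of the form
    // <filename>__<suffix>, the convention classic locality scheduling
    // uses to associate jobs with their large input file.  Verify the
    // convention against the locality scheduling docs for your version.
    static void make_locality_wu_name(
        char* wu_name, size_t len, const char* input_file_name, int job_index
    ) {
        snprintf(wu_name, len, "%s__%06d", input_file_name, job_index);
        // ...then copy into wu.name and call create_work() as in the
        // work-generator sketch above.
    }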
