Version 7 (modified by davea, 10 years ago)

Input and output templates

Various properties of jobs, such as the number and naming of their input and output files, are described by a pair of XML documents called input and output templates. Typically the same templates are used for many jobs.

Input templates

An input template file describes the job's input files, resource requirements, and scheduling parameters. It has the form

<input_template>
    <file_info>
        <number>0</number>
        [ <gzip/> ]
        [ <sticky/> ]
        [ <no_delete/> ]
        [ <report_on_rpc/> ]
        [ <url>...</url> ]
        [ <url>...</url> ]
        [ <md5_cksum>...</md5_cksum> ]
        [ <nbytes>...</nbytes> ]
        [ <gzipped_nbytes>...</gzipped_nbytes> ]
    </file_info>
    [ ... other files  ]
    <workunit>
        <file_ref>
            <file_number>0</file_number>
            <open_name>NAME</open_name>
            [ <copy_file/> ]
        </file_ref>
        [ ... other files ]
        [ <command_line>-flags xyz</command_line> ]
        [ <rsc_fpops_est>x</rsc_fpops_est> ]
        [ <rsc_fpops_bound>x</rsc_fpops_bound> ]
        [ <rsc_memory_bound>x</rsc_memory_bound> ]
        [ <rsc_disk_bound>x</rsc_disk_bound> ]
        [ <delay_bound>x</delay_bound> ]
        [ <min_quorum>x</min_quorum> ]
        [ <target_nresults>x</target_nresults> ]
        [ <max_error_results>x</max_error_results> ]
        [ <max_total_results>x</max_total_results> ]
        [ <max_success_results>x</max_success_results> ]
        [ <size_class>N</size_class> ]
    </workunit>
</input_template>

Elements and tags must be on separate lines as shown.

Each <file_info> describes an input file:

<number>
the file's index within the template: 0, 1, and so on. A <file_ref> refers to the file by this number.
<gzip/>
transfer the file in gzipped (compressed) format to reduce network usage. You must stage the file with the --gzip option. Only 7.0+ clients can handle compressed transfers; older clients will download the file in uncompressed form.
<sticky/>
if present, the file remains on the client after the job is finished.
<no_delete/>
if present, the file is not deleted from the server after the job is completed. Use this if the file is used as input to more than one job.
<report_on_rpc/>
if present, the client reports the file in each scheduler request (used with sticky files). Include this for compatibility with old (pre-7.0) clients; 7.0+ clients report all sticky files.
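As an illustration, a large input file shared by many jobs (for instance, a reference data set) can be kept on both the client and the server. The fragment below is a hypothetical sketch that combines the options above; the file number is arbitrary, and <report_on_rpc/> is only needed if you still support pre-7.0 clients.

```xml
<!-- Hypothetical shared input file; the file number is chosen for illustration -->
<file_info>
    <number>1</number>
    <sticky/>
    <no_delete/>
    <report_on_rpc/>
</file_info>
```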

The following are used for files that are staged to a server (or servers) other than your BOINC server:

<url>
the URL of a directory (it should end with /); the file name is appended to it to form the file's URL. If the file is replicated on several servers, supply one <url> per server.
<md5_cksum>
the file's MD5 checksum
<nbytes>
the file size in bytes.
<gzipped_nbytes>
if <gzip/> is specified, the size of the gzipped file in bytes.
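For example, a file replicated on two data servers other than your BOINC server might be described as follows. This is a sketch: the host names and directory are hypothetical, and the ... placeholders stand for your staged file's actual MD5 checksum and size.

```xml
<!-- Hypothetical data servers; each <url> is a directory ending in / -->
<file_info>
    <number>0</number>
    <url>https://data1.example.org/boinc_input/</url>
    <url>https://data2.example.org/boinc_input/</url>
    <md5_cksum>...</md5_cksum>
    <nbytes>...</nbytes>
</file_info>
```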

Each <file_ref> describes how the application refers to an input file:

<file_number>
the <number> of the corresponding <file_info>: 0, 1, etc.
<open_name>
the logical name of the file
<copy_file>
if present, the file is copied into the job's slot directory

The job parameters include:

<command_line>
The command-line arguments to be passed to the main program. Note: if you're using the BOINC wrapper, use <append_cmdline_args/> in your job.xml file to pass command-line arguments from the wrapper to the wrapped application.

<rsc_fpops_est> etc.
Job attributes such as the estimated number of floating-point operations (<rsc_fpops_est>) and the disk space the job will use (<rsc_disk_bound>). BOINC supplies reasonable defaults for these, but you should supply the correct values; otherwise, for example, BOINC might try to run the job on a host with insufficient disk space.
<size_class>
Specify the job's size class.
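Putting these pieces together, a complete input template for a hypothetical job with two input files might look like the sketch below. The open names, the command-line flag, and all numeric values are illustrative assumptions, not recommendations: <rsc_fpops_est> and <rsc_fpops_bound> are counts of floating-point operations, the memory and disk bounds are in bytes, and <delay_bound> is in seconds (604800 is one week).

```xml
<!-- Hypothetical two-input job; names, flag, and numbers are illustrative only -->
<input_template>
    <file_info>
        <number>0</number>
    </file_info>
    <file_info>
        <number>1</number>
        <sticky/>
        <no_delete/>
    </file_info>
    <workunit>
        <file_ref>
            <file_number>0</file_number>
            <open_name>in.dat</open_name>
        </file_ref>
        <file_ref>
            <file_number>1</file_number>
            <open_name>reference.db</open_name>
        </file_ref>
        <command_line>--iterations 1000</command_line>
        <rsc_fpops_est>1e13</rsc_fpops_est>
        <rsc_fpops_bound>1e14</rsc_fpops_bound>
        <rsc_memory_bound>500000000</rsc_memory_bound>
        <rsc_disk_bound>1000000000</rsc_disk_bound>
        <delay_bound>604800</delay_bound>
        <min_quorum>2</min_quorum>
        <target_nresults>2</target_nresults>
    </workunit>
</input_template>
```

Here file 0 changes from job to job, while file 1 is a shared file kept on both client and server; each element sits on its own line, as the template format requires.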

The input template, with file names and URLs substituted, is stored in a database field with a 64KB limit. This is enough for about 200 input files, fewer if you use long file names or multiple download URLs. If this isn't enough, you can use BOINC file compression to zip several files into a single file reference for download and expand them on the client machine before the job runs.

Output templates

An output template file describes a job's output files. It has the form

<output_template>
    <file_info>
        <name><OUTFILE_0/></name>
        <generated_locally/>
        <upload_when_present/>
        <max_nbytes>32768</max_nbytes>
        <url><UPLOAD_URL/></url>
        [ <no_delete/> ]
    </file_info>
    <result>
        <file_ref>
            <file_name><OUTFILE_0/></file_name>
            <open_name>result.sah</open_name>
            [ <copy_file>0|1</copy_file> ]
            [ <optional>0|1</optional> ]
            [ <no_validate>0|1</no_validate> ]
        </file_ref>
        [ <report_immediately/> ]
    </result>
</output_template>

Elements and tags must be on separate lines as shown. The elements include:

<file_info>
describes an output file.
<name>
the physical file name. Typically use <OUTFILE_0/>, <OUTFILE_1/>, etc.; BOINC replaces these with names generated from the job name.
<upload_when_present/>
deprecated, but you need to include this to work with pre-7.0 clients.
<file_ref>
describes how an output file will be referenced by the application.
<open_name>
the logical name by which the application will reference the file.
<copy_file>
if set (<copy_file/> or <copy_file>1</copy_file>), the file is generated in the slot directory and copied to the project directory after the job has finished. Use this for legacy applications.
<generated_locally/>
always include this for output files.
<max_nbytes>
maximum file size. If the actual size exceeds this, the file will not be uploaded, and the job will be marked as an error.
<url>
the URL of the file upload handler. You can give the URL explicitly, or use <UPLOAD_URL/>, which is replaced with the upload URL from your project's config.xml file.
<optional>
if 0 or absent, your application must create the file; if it doesn't, the job is marked as an error. If 1, the application may skip creating it.
<no_validate>
if 1, the file is excluded from result validation (this matters only if you use the sample bitwise validator).
<no_delete/>
if present, the file will not be deleted on the server even after the job is finished.
<report_immediately/>
if present, clients report this job immediately after its output files are uploaded; otherwise they may wait up to a day. Supported only by 6.12.27+ clients.
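For instance, an application that writes a required result file and an optional log file might use an output template like this hypothetical sketch. The open names and <max_nbytes> limits are assumptions; the log is marked <optional> and <no_validate> so that a missing log does not make the job an error and the log's contents are ignored by the sample bitwise validator.

```xml
<!-- Hypothetical: required result file plus optional, unvalidated log -->
<output_template>
    <file_info>
        <name><OUTFILE_0/></name>
        <generated_locally/>
        <upload_when_present/>
        <max_nbytes>1000000</max_nbytes>
        <url><UPLOAD_URL/></url>
    </file_info>
    <file_info>
        <name><OUTFILE_1/></name>
        <generated_locally/>
        <upload_when_present/>
        <max_nbytes>100000</max_nbytes>
        <url><UPLOAD_URL/></url>
    </file_info>
    <result>
        <file_ref>
            <file_name><OUTFILE_0/></file_name>
            <open_name>result.dat</open_name>
        </file_ref>
        <file_ref>
            <file_name><OUTFILE_1/></file_name>
            <open_name>log.txt</open_name>
            <optional>1</optional>
            <no_validate>1</no_validate>
        </file_ref>
        <report_immediately/>
    </result>
</output_template>
```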

Note: when a job is created, the name of its output template file is stored in the database. The file is read when instances of the job are created, which may happen days or weeks later. Thus, editing an output template file can affect existing jobs. If this is not desired, you must create a new output template file.

You can safely remove an input template file after creating your last job with it. However, an output template file must exist until every job that refers to it is completed (i.e. no more replicas will be created).

The output template, with file names and URLs substituted, is stored in a database field with a 64KB limit. This imposes a limit of about 50 output files; the exact number depends on the length of your file names and URLs. If you need more files, you can use BOINC file compression to zip several files into a single file reference for upload before each task completes on the client machine. Once you have run some jobs through your project, you can compare the size of the expanded XML with the 65,535-byte limit by running the following MySQL statement:

select max(length(xml_doc_in)), max(length(xml_doc_out)) from result;