Changes between Version 23 and Version 24 of JobSubmission


Timestamp:
Feb 15, 2013, 2:28:30 PM (11 years ago)
Author:
davea
Comment:

--

  • JobSubmission

    v23 v24  
    1 1 [[PageOutline]]
    2 2 = Submitting jobs =
    3 
    4 To submit a job you must
    5 
    6  1. Write XML 'template files' that describe the job's input and output files (typically the same template files can be used for many jobs).
    7  1. "Stage" the job's input file(s) (see below).
    8  1. Invoke a BOINC function or script that submits the job.
    9 
    10 Once this is done, BOINC takes over: it creates one or more instances of the job,
    11 distributes them to client hosts, and collects the output files.
    12 It [ValidationIntro validates] and
    13 [AssimilateIntro processes] the results,
    14 and deletes the input and output files.
    15 
    16 Typically, steps 2) and 3) are done by a [WorkGeneration work generator] program
    17 that creates lots of jobs.
    18 
    19 == Input template files ==
    20 
    21 An input template file has the form
    22 {{{
    23 
    24 <input_template>
    25     <file_info>
    26         <number>0</number>
    27         [ <gzip/> ]
    28         [ <sticky/> ]
    29         [ <no_delete/> ]
    30         [ <report_on_rpc/> ]
    31         [ <url>...</url> ]
    32         [ <url>...</url> ]
    33         [ <md5_cksum>...</md5_cksum> ]
    34         [ <nbytes>...</nbytes> ]
    35     </file_info>
    36     [ ... other files  ]
    37     <workunit>
    38         <file_ref>
    39             <file_number>0</file_number>
    40             <open_name>NAME</open_name>
    41             [ <copy_file/> ]
    42         </file_ref>
    43         [ ... other files ]
    44         [ <command_line>-flags xyz</command_line> ]
    45         [ <rsc_fpops_est>x</rsc_fpops_est> ]
    46         [ <rsc_fpops_bound>x</rsc_fpops_bound> ]
    47         [ <rsc_memory_bound>x</rsc_memory_bound> ]
    48         [ <rsc_disk_bound>x</rsc_disk_bound> ]
    49         [ <delay_bound>x</delay_bound> ]
    50         [ <min_quorum>x</min_quorum> ]
    51         [ <target_nresults>x</target_nresults> ]
    52         [ <max_error_results>x</max_error_results> ]
    53         [ <max_total_results>x</max_total_results> ]
    54         [ <max_success_results>x</max_success_results> ]
    55     </workunit>
    56 </input_template>
    57 }}}
    58 Elements and tags must be on separate lines as shown.
    59 The components are:
    60 
    61  '''<file_info>''':: describes an [BoincFiles#Fileproperties input file].
    62   '''<number>''':: use 0, 1, ...
    63   '''<gzip/>''':: transfer the file in gzipped (compressed) format to reduce network usage.  '''You must stage the file with the --gzip option (see below)'''.  Only 7.0+ clients can handle compressed transfers; older clients will get the file in uncompressed form.
    64   '''<sticky/>''':: if present, the file remains on the client after the job is finished.
    65   '''<no_delete/>''':: if present, the file is not deleted from the server after the job is completed.  Use this if the file is used as input to more than one job.
    66   '''<report_on_rpc/>''':: if present, the client reports the file in each scheduler request (for sticky files).
    67   '''<url>, <md5_cksum>, <nbytes>''':: used only for "non-local" input files (see below)
    68 
    69  '''<file_ref>''':: describes [BoincFiles#Filereferences the way the file is referenced].
    70   '''<file_number>''':: 0, 1, etc.
    71   '''<open_name>''':: the logical name of the file
    72   '''<copy_file/>''':: if present, the file is copied into the job's slot directory
    73 
    74  '''<command_line>''':: The command-line arguments to be passed to the main program.
    75   Note: if you're using the [WrapperApp BOINC wrapper],
    76   use <append_cmdline_args/> in your job.xml file to pass command-line arguments from the wrapper
    77   to the wrapped application.
    78 
    79  '''<rsc_fpops_est>''' etc.::
    80    [JobIn Job attributes] such as how much disk space will be used.
    81    BOINC will supply reasonable defaults for these,
    82    but you should supply the correct values;
    83    otherwise, for example, BOINC might try to run the job
    84    on a host with insufficient disk space.
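
As a sketch of how the pieces above fit together, here is an input template for a job with two local input files. The logical names (in.dat, params.cfg) and all numeric values are illustrative placeholders chosen for this example, not recommended settings; substitute values measured for your own application.
{{{
<input_template>
    <file_info>
        <number>0</number>
    </file_info>
    <file_info>
        <number>1</number>
        <sticky/>
        <no_delete/>
    </file_info>
    <workunit>
        <file_ref>
            <file_number>0</file_number>
            <open_name>in.dat</open_name>
        </file_ref>
        <file_ref>
            <file_number>1</file_number>
            <open_name>params.cfg</open_name>
            <copy_file/>
        </file_ref>
        <command_line>-flags xyz</command_line>
        <rsc_fpops_est>1e12</rsc_fpops_est>
        <rsc_fpops_bound>1e13</rsc_fpops_bound>
        <rsc_memory_bound>5e8</rsc_memory_bound>
        <rsc_disk_bound>1e9</rsc_disk_bound>
        <delay_bound>86400</delay_bound>
    </workunit>
</input_template>
}}}
File 1 is marked sticky and no_delete on the assumption that it is shared by many jobs; file 0 is an ordinary per-job input.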
    85 
    86 Notes:
    87  * BOINC copies the input template into a BLOB column of the workunit table, substituting filenames
    88   and inserting download URLs, signatures, and other elements into the template you provided.
    89   The total expanded BLOB cannot exceed 65,535 bytes.
    90   This is enough for about 200 input files,
    91   fewer if you use long file names or multiple download URLs.
    92   If this isn't enough, you can use [FileCompression BOINC file compression] to zip several files into a single file reference for download
    93   and expand them before the job runs on the client machine.
    94 
    95 == Output template files ==
    96 
    97 An output template file has the form
    98 {{{
    99 <output_template>
    100     <file_info>
    101         <name><OUTFILE_0/></name>
    102         <generated_locally/>
    103         <upload_when_present/>
    104         <max_nbytes>32768</max_nbytes>
    105         <url><UPLOAD_URL/></url>
    106     </file_info>
    107     <result>
    108         <file_ref>
    109             <file_name><OUTFILE_0/></file_name>
    110             <open_name>result.sah</open_name>
    111             [ <copy_file>0|1</copy_file> ]
    112             [ <optional>0|1</optional> ]
    113             [ <no_validate>0|1</no_validate> ]
    114         </file_ref>
    115         [ <report_immediately/> ]
    116     </result>
    117 </output_template>
    118 }}}
    119 
    120 Elements and tags must be on separate lines as shown.
    121 The elements include:
    122 
    123  '''<file_info>''':: describes an output file.
    124  '''<name>''':: the physical file name.
    125   Typically use <OUTFILE_0/>, <OUTFILE_1/> etc.;
    126   BOINC will replace this with a generated name based on the job name.
    127 
    128  '''<file_ref>''':: describes how an output file will be referenced by the application.
    129  '''<open_name>''':: the "logical name" by which the application will reference the file.
    130  '''<copy_file/>''':: if present, the file will be generated in the slot directory,
    131    and copied to the project directory after the job has finished.
    132    Use this for [WrapperApp legacy applications].
    133  '''<generated_locally/>''':: always include this for output files.
    134  '''<max_nbytes>''':: maximum file size.
    135   If the actual size exceeds this, the file will not be uploaded,
    136   and the job will be marked as an error.
    137  '''<url>''':: the URL of the file upload handler.
    138   You may include this explicitly, or use '''<UPLOAD_URL/>'''
    139   to use the URL in your project's config.xml file.
    140  '''<optional>''':: if 0 or absent, your application must create the file,
    141   otherwise the job will be marked as an error.
    142  '''<no_validate>''':: if 1, don't include this file in the result validation process
    143   (relevant only if you are using the sample bitwise validator).
    144  '''<no_delete/>''':: if present, the file will not be deleted on the server
    145   even after the job is finished.
    146 
    147  '''<report_immediately/>''':: if present, clients will report this job
    148   immediately after the output files are uploaded.
    149   Otherwise they may wait up to a day.
    150   (Implemented in 6.12.27+ clients only).
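
As a sketch, an output template for a job that produces two files might look like the following. The logical names (out.dat, log.txt) and the size limits are assumptions made for this example; the second file is marked optional and excluded from validation to illustrate those flags.
{{{
<output_template>
    <file_info>
        <name><OUTFILE_0/></name>
        <generated_locally/>
        <upload_when_present/>
        <max_nbytes>32768</max_nbytes>
        <url><UPLOAD_URL/></url>
    </file_info>
    <file_info>
        <name><OUTFILE_1/></name>
        <generated_locally/>
        <upload_when_present/>
        <max_nbytes>65536</max_nbytes>
        <url><UPLOAD_URL/></url>
    </file_info>
    <result>
        <file_ref>
            <file_name><OUTFILE_0/></file_name>
            <open_name>out.dat</open_name>
        </file_ref>
        <file_ref>
            <file_name><OUTFILE_1/></file_name>
            <open_name>log.txt</open_name>
            <optional>1</optional>
            <no_validate>1</no_validate>
        </file_ref>
    </result>
</output_template>
}}}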
    151 
    152 Note: when a job is created, the name of its output template file is stored in the database.
    153 The file is read when instances of the job are created, which may happen days or weeks later.
    154 Thus, editing an output template file can affect existing jobs.
    155 If this is not desired, you must create a new output template file.
    156 
    157 You can safely remove an input template file after creating your last job with it.
    158 However, an output template file must exist until every job that refers to it is completed
    159 (i.e. no more replicas will be created).
    160 
    161 In general, you should not attempt to use more than 50 files in your output template, because BOINC expands the template with upload URLs and file names and adds signatures to it. The total size of the result must remain under 65,535 bytes to fit in the xml_doc_in and xml_doc_out BLOBs in the result table. Your actual limit on the number of files depends on the length of your job names and upload URLs. If your expanded output template is approaching the size limit, you can use [FileCompression BOINC file compression] to zip several files into a single file reference for upload before each task completes on the client machine. Once you have run some jobs through your project, you can compare the size of the expanded XML with the 65,535-byte limit by running the following MySQL statement:[[BR]]
    162 {{{select max(length(xml_doc_in)), max(length(xml_doc_out)) from result;}}}[[BR]]
    163 Note that here "_in" and "_out" both refer to the output template: "_in" is the template as it was when the task was created, and "_out" is how it looked, with actual file sizes and checksums, when the task was returned. Don't confuse these column names with the "input" and "output" templates.
    164 
    165 == Staging input files ==
    166 
    167 Input files may be "local" (resident on the project server) or "non-local".
    168 For local files, BOINC fills in the download URL, the file size,
    169 and the MD5.
    170 For non-local files, you must supply these yourself in the input template.
    171 You can supply multiple URLs if the file is on multiple data servers.
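
As a sketch of the non-local case, the file_info below supplies two download URLs, the MD5 checksum, and the file size directly in the input template. The host names and file name are hypothetical, and MD5_HEX and NBYTES are placeholders for the file's real checksum and size in bytes.
{{{
<input_template>
    <file_info>
        <number>0</number>
        <url>http://data1.example.org/download/in.dat</url>
        <url>http://data2.example.org/download/in.dat</url>
        <md5_cksum>MD5_HEX</md5_cksum>
        <nbytes>NBYTES</nbytes>
    </file_info>
    <workunit>
        <file_ref>
            <file_number>0</file_number>
            <open_name>in.dat</open_name>
        </file_ref>
    </workunit>
</input_template>
}}}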
    172 
    173 === Staging local input files ===
    174 Before submitting a job, you must '''stage''' its local input files using
    175 {{{
    176 bin/stage_file [--gzip] [--copy] file
    177 }}}
    178  --gzip:: send the file in compressed form to 7.0+ clients.  Note: you must also include the '''<gzip/>''' element for this file in the job's input template (see above).
    179  --copy:: copy the file from its current location to the BOINC download directory.  The default is to move it.
    180 
    181 Note: '''stage_file''' was added to the BOINC trunk on 16 Oct 2012.
    182 If your server code is older than that, use
    183 {{{
    184 cp test_files/12ja04aa `bin/dir_hier_path 12ja04aa`
    185 }}}
    186 
    187 == Submitting a job on the command line == #creatework-tool
      3 == On the command line == #creatework-tool
    188 4
    189 5 '''create_work''' is a command-line tool for submitting jobs.
     
    223 39 --additional_xml 'x':: This can be used to supply, for example, <credit>12.4</credit>.
    224 40
    225 == Submitting jobs from a C++ program == #cpp-workgen
     41 == From a C++ program == #cpp-workgen
    226 42
    227 43 BOINC's library provides a function for submitting jobs: