Version 6 (modified by davea, 17 years ago)

Generating work

To submit a job:

  • Write XML 'template files' that describe the job's inputs and outputs (typically the same template files can be used for many jobs).
  • Create the job's input file(s), and put them in the right places in the download directory hierarchy.
  • Invoke a BOINC function or script that submits the job.

Once this is done, BOINC takes over: it creates one or more instances of the job, distributes them to client hosts, collects the output files, finds a canonical instance, assimilates the canonical instance, and deletes files.

During the testing phase of a project, you can use the make_work daemon to replicate a given workunit as needed to maintain a constant supply of work. This is useful while testing and debugging the application.

Input and output template files

An input template file has the form

<file_info>
    <number>0</number>
    [ <sticky/>, other attributes]
</file_info>
[ ... ]
<workunit>
    <file_ref>
        <file_number>0</file_number>
        <open_name>NAME</open_name>
    </file_ref>
    [ ... ]
    [ <command_line>-flags xyz</command_line> ]
    [ <rsc_fpops_est>x</rsc_fpops_est> ]
    [ <rsc_fpops_bound>x</rsc_fpops_bound> ]
    [ <rsc_memory_bound>x</rsc_memory_bound> ]
    [ <rsc_disk_bound>x</rsc_disk_bound> ]
    [ <delay_bound>x</delay_bound> ]
    [ <min_quorum>x</min_quorum> ]
    [ <target_nresults>x</target_nresults> ]
    [ <max_error_results>x</max_error_results> ]
    [ <max_total_results>x</max_total_results> ]
    [ <max_success_results>x</max_success_results> ]
    [ <credit>X</credit> ]
</workunit>
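Because one template usually serves many jobs, some projects generate it programmatically. The sketch below shows one way to do that in C++. `make_input_template()` is a hypothetical helper, not a BOINC function, and it emits only a subset of the optional elements listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of BOINC): builds an input template with one
// <file_info>/<file_ref> pair per input file, numbered from 0.
std::string make_input_template(const std::vector<std::string>& open_names,
                                double rsc_fpops_est,
                                double rsc_fpops_bound) {
    assert(!open_names.empty());  // this sketch expects at least one input file
    std::ostringstream out;
    for (std::size_t i = 0; i < open_names.size(); i++) {
        out << "<file_info>\n"
            << "    <number>" << i << "</number>\n"
            << "</file_info>\n";
    }
    out << "<workunit>\n";
    for (std::size_t i = 0; i < open_names.size(); i++) {
        // Links file number i to the name the application will open.
        out << "    <file_ref>\n"
            << "        <file_number>" << i << "</file_number>\n"
            << "        <open_name>" << open_names[i] << "</open_name>\n"
            << "    </file_ref>\n";
    }
    out << "    <rsc_fpops_est>" << rsc_fpops_est << "</rsc_fpops_est>\n"
        << "    <rsc_fpops_bound>" << rsc_fpops_bound << "</rsc_fpops_bound>\n"
        << "</workunit>\n";
    return out.str();
}
```

The returned string is template contents, which is what the C++ create_work() function expects for its wu_template argument. If you use the create_work program instead, write the string to a file under templates/.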

The components are:

<file_info>, <file_ref>

Each pair describes an input file and the way it's referenced.

<command_line>

The command-line arguments to be passed to the main program.

<credit>

The amount of credit to be granted for successful completion of this workunit. Use this only if you know in advance how many FLOPs it will take. Your validator must use get_credit_from_wu() as its compute_granted_credit() function.

Other elements

See Work unit attributes.

An output template file has the form

<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>32768</max_nbytes>
    <url><UPLOAD_URL/></url>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <open_name>result.sah</open_name>
    </file_ref>
</result>
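An output template for several output files can be generated the same way by repeating the pattern above. This is a sketch with a hypothetical helper. It assumes that the placeholder pattern continues as <OUTFILE_1/>, <OUTFILE_2/>, and so on for additional files, so check that against your BOINC version.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of BOINC): one <file_info> and one <file_ref>
// per output file. The <OUTFILE_n/> and <UPLOAD_URL/> placeholders are emitted
// literally, on the assumption that BOINC substitutes them as in the example above.
std::string make_output_template(const std::vector<std::string>& open_names,
                                 long max_nbytes) {
    assert(max_nbytes > 0);
    std::ostringstream out;
    for (std::size_t i = 0; i < open_names.size(); i++) {
        out << "<file_info>\n"
            << "    <name><OUTFILE_" << i << "/></name>\n"
            << "    <generated_locally/>\n"
            << "    <upload_when_present/>\n"
            << "    <max_nbytes>" << max_nbytes << "</max_nbytes>\n"
            << "    <url><UPLOAD_URL/></url>\n"
            << "</file_info>\n";
    }
    out << "<result>\n";
    for (std::size_t i = 0; i < open_names.size(); i++) {
        out << "    <file_ref>\n"
            << "        <file_name><OUTFILE_" << i << "/></file_name>\n"
            << "        <open_name>" << open_names[i] << "</open_name>\n"
            << "    </file_ref>\n";
    }
    out << "</result>\n";
    return out.str();
}
```

Called with a single open name, result.sah, and max_nbytes 32768, it reproduces the template shown above.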

Submitting a job manually

To move an input file to the download directory, use

dir_hier_path filename

This prints the full pathname and creates the directory if needed. Run this in the project's root directory. For example:

cp test_files/12ja04aa `bin/dir_hier_path 12ja04aa`

copies an input file from the test_files directory to the download directory hierarchy.

To submit a job, run the program

create_work [ arguments] infile_1 ... infile_n

Mandatory arguments are:

-appname name
application name
-wu_name name
workunit name
-wu_template filename
workunit template filename, relative to project root; usually in templates/
-result_template filename
result template filename, relative to project root; usually in templates/
Optional arguments are:
-batch n
-priority n
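If a script assembles create_work command lines, a quick check for the four mandatory flags can catch mistakes before the program runs. The helper below is hypothetical. It only checks that each flag is present and followed by a value; it does not validate the values themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pre-check: returns the mandatory create_work flags that are
// missing from an argument list, or that appear without a following value.
std::vector<std::string> missing_mandatory_args(const std::vector<std::string>& args) {
    static const std::vector<std::string> required = {
        "-appname", "-wu_name", "-wu_template", "-result_template"};
    std::vector<std::string> missing;
    for (const std::string& flag : required) {
        auto it = std::find(args.begin(), args.end(), flag);
        // Absent, or the last element (so no value follows it).
        if (it == args.end() || it + 1 == args.end()) {
            missing.push_back(flag);
        }
    }
    return missing;
}
```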

The following may be passed in the WU template, as command-line arguments to create_work, or not at all (in which case defaults are used):

-command_line "-flags foo"
-rsc_fpops_est x
-rsc_fpops_bound x
-rsc_memory_bound x
-rsc_disk_bound x
-delay_bound x
-min_quorum x
-target_nresults x
-max_error_results x
-max_total_results x
-max_success_results x
-additional_xml 'x'

The create_work program must be run in the project root directory. The workunit parameters are documented in Work unit attributes. The -additional_xml argument can be used to supply other elements, for example <credit>12.4</credit>.
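Here is a sketch of assembling the full argument list, assuming a driver program that launches create_work itself (for example with execvp(), which takes an argv array). OptionalWorkParams and build_create_work_args() are hypothetical names, and they cover only a few of the optional flags listed above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical holder for a few of the optional overrides listed above.
// An unset field is not passed, so the template value or the default applies.
struct OptionalWorkParams {
    std::optional<int> min_quorum;
    std::optional<int> target_nresults;
    std::optional<int> delay_bound;            // seconds
    std::optional<std::string> additional_xml;
};

// Builds an argv-style list: program name, mandatory flags, any optional
// overrides, then the input file names (which come last on the command line).
std::vector<std::string> build_create_work_args(
        const std::string& appname, const std::string& wu_name,
        const std::string& wu_template, const std::string& result_template,
        const OptionalWorkParams& opt, const std::vector<std::string>& infiles) {
    assert(!appname.empty() && !wu_name.empty());
    std::vector<std::string> args = {
        "create_work",
        "-appname", appname,
        "-wu_name", wu_name,
        "-wu_template", wu_template,
        "-result_template", result_template};
    if (opt.min_quorum) {
        args.push_back("-min_quorum");
        args.push_back(std::to_string(*opt.min_quorum));
    }
    if (opt.target_nresults) {
        args.push_back("-target_nresults");
        args.push_back(std::to_string(*opt.target_nresults));
    }
    if (opt.delay_bound) {
        args.push_back("-delay_bound");
        args.push_back(std::to_string(*opt.delay_bound));
    }
    if (opt.additional_xml) {
        // One argv element, so no shell quoting is involved.
        args.push_back("-additional_xml");
        args.push_back(*opt.additional_xml);
    }
    args.insert(args.end(), infiles.begin(), infiles.end());
    return args;
}
```

Because each value is a separate argv element, an -additional_xml value such as <credit>12.4</credit> needs no shell quoting here. If you build a shell command string instead, quote it as in the list above.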

You must put each input file in the appropriate directory; the directory is determined by the file's name. To find this directory, call the C++ function

int dir_hier_path(
    const char* filename,
    const char* root,       // root of download directory
    int fanout,             // from config.xml
    char* result,           // path of file in hierarchy
    bool create_dir=false   // create dir if it's not there
);
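The text above says only that the directory is a function of the file name (and of the fanout from config.xml). The sketch below illustrates that idea with a stand-in hash, 64-bit FNV-1a. That hash is an assumption chosen for illustration; BOINC's dir_hier_path() computes the bucket its own way, so the bucket numbers here will not match a real hierarchy. For real files, always use dir_hier_path() or bin/dir_hier_path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in hash (64-bit FNV-1a). NOT BOINC's hash; it only shows that the
// bucket depends on nothing but the file name.
static uint64_t fnv1a(const std::string& s) {
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical analogue of dir_hier_path(): root/<bucket in hex>/filename,
// where bucket = hash(filename) mod fanout.
std::string hier_path_sketch(const std::string& filename,
                             const std::string& root, int fanout) {
    assert(fanout > 0);
    unsigned bucket = static_cast<unsigned>(fnv1a(filename) % static_cast<uint64_t>(fanout));
    char dir[16];
    std::snprintf(dir, sizeof dir, "%x", bucket);
    return root + "/" + dir + "/" + filename;
}
```

Because the path is a pure function of the name, any program can recompute where a file lives without a lookup table; the fanout bounds the number of subdirectories.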

Submitting jobs from a C++ program

BOINC's library (backend_lib.C and backend_lib.h) provides the function:

int create_work(
    DB_WORKUNIT&,
    const char* wu_template,                  // contents, not path
    const char* result_template_filename,     // relative to project root
    const char* result_template_filepath,     // absolute or relative to current dir
    const char** infiles,                     // array of input file names
    int ninfiles,
    SCHED_CONFIG&,
    const char* command_line = NULL,
    const char* additional_xml = NULL
);
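Note the asymmetry in this declaration: wu_template is the template's contents, while the result template is passed by name and path. The example program below uses BOINC's read_file_malloc() to load the contents; a plain standard-library equivalent might look like this, where read_template() is a hypothetical helper.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: reads a whole template file into a string, so its
// contents (not its path) can be handed to create_work() as wu_template.
std::string read_template(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buf;
    buf << in.rdbuf();  // copy the entire file
    return buf.str();
}
```

Store the result in a std::string and pass its c_str() as wu_template, keeping the string alive until create_work() returns.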

create_work() submits a job. The name and appid fields of the DB_WORKUNIT structure must always be initialized. Other job parameters may be passed either in the DB_WORKUNIT structure or in the input template file (the latter has priority).
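The precedence rule (template values win over DB_WORKUNIT values) can be pictured as a field-by-field merge. This is only a model of the stated behavior, not BOINC's code: JobParams and TemplateParams are hypothetical stand-ins for DB_WORKUNIT and the parsed input template.

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of the job parameters held in DB_WORKUNIT.
struct JobParams {
    int min_quorum = 0;
    int target_nresults = 0;
    double delay_bound = 0;  // seconds
};

// Values found in the input template; an unset field was absent from it.
struct TemplateParams {
    std::optional<int> min_quorum;
    std::optional<int> target_nresults;
    std::optional<double> delay_bound;
};

// A value present in the template overrides the one set in the structure.
JobParams effective_params(JobParams wu, const TemplateParams& tmpl) {
    if (tmpl.min_quorum) wu.min_quorum = *tmpl.min_quorum;
    if (tmpl.target_nresults) wu.target_nresults = *tmpl.target_nresults;
    if (tmpl.delay_bound) wu.delay_bound = *tmpl.delay_bound;
    return wu;
}
```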

Making one workunit

Here's a program that submits one job (error-checking is omitted for clarity):

#include <cstdio>     // fopen, fputs, fclose
#include <cstring>    // strcpy

#include "boinc_db.h"
#include "backend_lib.h"

int main() {
    DB_APP app;
    DB_WORKUNIT wu;
    char* wu_template;
    const char* infiles[] = {"infile"};   // create_work() takes const char**
    char path[1024];

    SCHED_CONFIG config;
    config.parse_file();

    boinc_db.open(config.db_name, config.db_host, config.db_user, config.db_passwd);
    app.lookup("where name='myappname'");

    // write input file in the download directory
    //
    config.download_path("infile", path);
    FILE* f = fopen(path, "w");
    fputs("random stuff", f);   // write a C string to the file
    fclose(f);

    read_file_malloc("templates/input_template.xml", wu_template);
    wu.clear();     // zeroes all fields
    strcpy(wu.name, "test_name");
    wu.appid = app.id;
    wu.min_quorum = 2;
    wu.target_nresults = 2;
    wu.max_error_results = 5;
    wu.max_total_results = 5;
    wu.max_success_results = 5;
    wu.rsc_fpops_est = 1e10;
    wu.rsc_fpops_bound = 1e11;
    wu.rsc_memory_bound = 1e8;
    wu.rsc_disk_bound = 1e8;
    wu.delay_bound = 7*86400;
    create_work(
        wu,
        wu_template,
        "templates/output_template.xml",
        "templates/output_template.xml",
        infiles,
        1,
        config
    );
}
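Before submitting, it can help to sanity-check the limits set on the workunit. The checks below are illustrative suggestions, not checks that BOINC itself is documented to perform, and WuLimits is a hypothetical mirror of a few DB_WORKUNIT fields. Note that delay_bound is in seconds, so 7*86400 in the program above is one week (604800 seconds).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of some DB_WORKUNIT fields set in the program above.
struct WuLimits {
    int min_quorum;
    int target_nresults;
    int max_total_results;
    double rsc_fpops_est;
    double rsc_fpops_bound;
    double delay_bound;  // seconds
};

// Returns a description of each suspicious setting (empty if none found).
std::vector<std::string> check_limits(const WuLimits& w) {
    std::vector<std::string> problems;
    if (w.target_nresults < w.min_quorum)
        problems.push_back("target_nresults is smaller than min_quorum");
    if (w.rsc_fpops_bound < w.rsc_fpops_est)
        problems.push_back("rsc_fpops_bound is smaller than rsc_fpops_est");
    if (w.max_total_results < w.target_nresults)
        problems.push_back("max_total_results is smaller than target_nresults");
    if (w.delay_bound <= 0)
        problems.push_back("delay_bound must be positive");
    return problems;
}
```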

The program must be run in the project directory.

Making lots of workunits

If you're making lots of workunits (e.g. to do the various parts of a parallel computation), you'll want them to differ in their input files, their command-line arguments, or both.

For example, let's say you want to run a program on ten input files 'file0', 'file1', ..., 'file9'. You might modify the above program with the following code:

    char filename[256];
    const char* infiles[1];   // create_work() takes const char**
    infiles[0] = filename;
    ...
    for (int i=0; i<10; i++) {
        sprintf(filename, "file%d", i);
        sprintf(wu.name, "test_name_%d", i);   // each workunit needs a unique name
        create_work(
            wu,
            wu_template,
            "templates/output_template.xml",
            "templates/output_template.xml",
            infiles,
            1,
            config
        );
    }

Note that you only need one workunit template file and one result template file.
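The loop gives every workunit its own name because BOINC requires workunit names to be unique. Planning the jobs up front, separately from the database calls, makes that easy to test. In this sketch, JobSpec and jobs_for_files() are hypothetical names; each entry would supply wu.name and infiles[0] for one create_work() call.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical plan for one create_work() call.
struct JobSpec {
    std::string wu_name;  // must be unique across all workunits
    std::string infile;   // input file already placed in the download hierarchy
};

// One job per input file file0 .. file{n-1}; the index in the name keeps
// every workunit name distinct.
std::vector<JobSpec> jobs_for_files(const std::string& base, int n) {
    assert(n >= 0);
    std::vector<JobSpec> jobs;
    for (int i = 0; i < n; i++) {
        jobs.push_back({base + "_" + std::to_string(i), "file" + std::to_string(i)});
    }
    return jobs;
}
```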

Now suppose you want to run a program against a single input file, but with ten command lines, '-flag 0', '-flag 1', ..., '-flag 9'. You might modify the above program with the following code:

    char command_line[256];
    ...
    for (int i=0; i<10; i++) {
        sprintf(command_line, "-flag %d", i);
        sprintf(wu.name, "test_name_%d", i);   // each workunit needs a unique name
        create_work(
            wu,
            wu_template,
            "templates/output_template.xml",
            "templates/output_template.xml",
            infiles,
            1,
            config,
            command_line
        );
    }

Again, you only need one input template file and one output template file.
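The two loops vary either the input file or the command line, while the start of this section also mentions varying both. The following hypothetical sketch enumerates every file/flag combination, giving each a unique workunit name; Variant and file_flag_grid() are illustrative names, and each entry would feed wu.name, infiles[0], and the command_line argument of one create_work() call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one workunit in a file x flag sweep.
struct Variant {
    std::string wu_name;       // unique per workunit
    std::string infile;        // one input file
    std::string command_line;  // e.g. "-flag 2"
};

// Every combination of input file and flag value, files in the outer loop.
std::vector<Variant> file_flag_grid(const std::string& base,
                                    const std::vector<std::string>& files,
                                    int nflags) {
    assert(nflags >= 0);
    std::vector<Variant> out;
    for (std::size_t f = 0; f < files.size(); f++) {
        for (int k = 0; k < nflags; k++) {
            out.push_back({base + "_" + std::to_string(f) + "_" + std::to_string(k),
                           files[f],
                           "-flag " + std::to_string(k)});
        }
    }
    return out;
}
```

With m files and n flag values this yields m*n workunits, still from one input template and one output template.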