Last modified on 10/25/14 05:12:15

Developing a custom validator

To create a validator, you must supply three functions:

extern int init_result(RESULT& result, void*& data);

This takes a result, reads its output file(s), parses them into a memory structure, and returns (via the 'data' argument) a pointer to this structure. The return value is:

  • Zero on success,
  • ERR_OPENDIR if there was a transient error, e.g. the output file is on a network volume that is not available. The validator will try this result again later.
  • Any other return value indicates a permanent error. Example: an output file is missing, or has a syntax error. The result will be marked as invalid.

To locate and open the result's output files, use utility functions such as get_output_file_path() and try_fopen() (see example below).
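
The return-value convention above can be sketched as follows. This is a self-contained illustration rather than BOINC code: Outcome, classify_init_retval(), and the constant STANDIN_ERR_OPENDIR (with an arbitrary value) are hypothetical stand-ins. A real validator would use ERR_OPENDIR from error_numbers.h and would not need to do this classification itself.

```cpp
// Hypothetical stand-in for BOINC's ERR_OPENDIR; the value here is arbitrary.
// A real validator would include "error_numbers.h" instead.
const int STANDIN_ERR_OPENDIR = -1;

// What happens to a result after init_result() returns (names are assumptions).
enum Outcome { PARSED_OK, RETRY_LATER, MARK_INVALID };

// Map an init_result() return value onto the three cases listed above.
Outcome classify_init_retval(int retval) {
    if (retval == 0) return PARSED_OK;                      // success
    if (retval == STANDIN_ERR_OPENDIR) return RETRY_LATER;  // transient error
    return MARK_INVALID;                                    // permanent error
}
```

The practical consequence is that init_result() should return the transient code only when retrying could plausibly succeed. A missing or malformed output file should return some other nonzero value.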

extern int compare_results(
    RESULT& r1, void* data1, RESULT& r2, void* data2, bool& match
);

This takes two results and their associated memory structures. It returns (via the 'match' argument) true if the two results are equivalent (within the tolerances of the application).
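
What counts as "within the tolerances of the application" is up to you. Below is a minimal sketch of one possible comparison helper, assuming an absolute tolerance. The name within_tolerance() is hypothetical and is not a BOINC function. It is written so that NaN never compares as equivalent: a test of the form fabs(a - b) > tol is false when either value is NaN, so a check that only rejects on "greater than" would let NaN outputs match.

```cpp
#include <cmath>

// Hypothetical helper: true if a and b differ by at most tol.
// Any ordered comparison involving NaN is false, so accepting on "<= tol"
// (rather than rejecting on "> tol") makes a NaN value fail to match.
bool within_tolerance(double a, double b, double tol) {
    return std::fabs(a - b) <= tol;
}
```

Inside compare_results(), such a helper would be applied to each floating-point field, with match set to false as soon as one field fails.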

extern int cleanup_result(RESULT& r, void* data);

This frees the structure pointed to by data, if it's non-NULL.

You must link these functions with the files validator.cpp, validate_util.cpp, and validate_util2.cpp. The result is your custom validator.

If you need to access the WORKUNIT from init_result() or the other functions, look it up by ID. Since the result is passed by reference, use the '.' operator:

DB_WORKUNIT wu;
wu.lookup_id(result.workunitid);

Runtime outliers

BOINC's mechanisms for estimating job runtimes assume that a job's computation is roughly proportional to its FLOPS estimate (workunit.rsc_fpops_est). If some jobs violate this assumption (e.g. jobs that exit immediately because of unusual input data), these mechanisms will work better if you label the results of such jobs as runtime outliers. To do this, set

result.runtime_outlier = true;

in init_result() for these results.
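
A minimal sketch of the idea, using a stand-in type so that it compiles on its own. RESULT_STANDIN, init_result_sketch(), and exited_early are hypothetical. In a real validator the argument is BOINC's RESULT, and how you decide that a job is an outlier depends on your application.

```cpp
// Hypothetical stand-in for BOINC's RESULT, reduced to the one field used here.
struct RESULT_STANDIN {
    bool runtime_outlier = false;
};

// Sketch of the relevant part of init_result(): once the output has been
// parsed, flag results whose runtime is not representative of the FLOPS
// estimate. How exited_early is determined is application-specific
// (an assumption here, e.g. a marker the application writes to its output).
int init_result_sketch(RESULT_STANDIN& result, bool exited_early) {
    if (exited_early) {
        result.runtime_outlier = true;
    }
    return 0;  // zero: the result was parsed successfully
}
```

Note that the result must be taken by non-const reference for the assignment to compile.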

Example

Here's an example for an application whose output file contains an integer and a double. Two results are considered equivalent if the integers are equal and the doubles differ by no more than 0.01.

#include <string>
#include <vector>
#include <stdio.h>
#include <math.h>
#include "error_numbers.h"
#include "boinc_db.h"
#include "sched_util.h"
#include "validate_util.h"
using std::string;
using std::vector;

struct DATA {
    int i;
    double x;
};

int init_result(RESULT& result, void*& data) {
    FILE* f;
    OUTPUT_FILE_INFO fi;
    int i, n, retval;
    double x;

    // Locate and open the result's output file.
    retval = get_output_file_path(result, fi.path);
    if (retval) return retval;
    retval = try_fopen(fi.path.c_str(), f, "r");
    if (retval) return retval;

    // Parse one integer and one double; x is a double, so scan it with %lf.
    n = fscanf(f, "%d %lf", &i, &x);
    fclose(f);
    if (n != 2) return ERR_XML_PARSE;

    // Hand the parsed values back to the validator via 'data'.
    DATA* dp = new DATA;
    dp->i = i;
    dp->x = x;
    data = (void*) dp;
    return 0;
}

int compare_results(
    RESULT& r1, void* _data1, RESULT& r2, void* _data2, bool& match
) {
    DATA* data1 = (DATA*)_data1;
    DATA* data2 = (DATA*)_data2;
    match = true;
    if (data1->i != data2->i) match = false;
    if (fabs(data1->x - data2->x) > 0.01) match = false;
    return 0;
}

int cleanup_result(RESULT& r, void* data) {
    if (data) delete (DATA*) data;
    return 0;
}

Using scripting languages

The validator script_validator allows you to write your validation logic in your language of choice (Python, PHP, Perl, Java, bash). script_validator takes two additional command-line arguments:

--init_script "filename arg1 ... argn"
script to check the validity of a result. Exit zero if valid.
--compare_script "filename arg1 ... argn"
script to compare two results. Exit zero if outputs are equivalent.

arg1 ... argn are the command-line arguments to be passed to the scripts. The options for init_script are:

files
list of paths of output files of the result
result_id
result ID
runtime
task runtime in seconds

Additional options for compare_script, for the second result:

files2
list of paths of output files
result_id2
result ID
runtime2
task runtime in seconds

arg1 ... argn can be omitted, in which case only the output file paths are passed to the scripts.

The scripts must be put in your project's bin/ directory.

For applications that don't use replication, the compare script need not be given. For applications that don't need output file syntax checking, the init script need not be given.

As an example, the following PHP script, used as a compare script, would require that results match exactly:

#!/usr/bin/env php
<?php

$f1 = $argv[1];
$f2 = $argv[2];
if (md5_file($f1) != md5_file($f2)) {
    fwrite(STDERR, "$f1 and $f2 don't match\n");
    exit(1);
}

?>

The corresponding entry in config.xml would look like

<daemon>
    <cmd>script_validator --app uppercase -d 3 --compare_script compare.php</cmd>
</daemon>

Testing your validator

While you're developing a validator, it's convenient to run it in "standalone mode", i.e. run it manually against particular output files. BOINC provides a test harness that lets you do this:

  • In boinc/sched/, copy makefile_validator_test to your own file, say makefile_vt.
  • Edit this makefile, changing VALIDATOR_SRC to refer to the .cpp file containing your init_result(), compare_results(), and cleanup_result() functions.
  • Do make -f makefile_vt.

This creates a program validator_test. Do

validator_test file1 file2

to test your code against the given output files. It will show the result of each function call.

Notes:

  • Currently this assumes that results have a single output file. If you need this generalized, let us know.