Version 10 (modified by davea, 16 years ago)

--

Simple validator framework

To create a validator using the simple framework, you must supply four functions:

extern int init_result(RESULT& result, void*& data);

This takes a result, reads its output file(s), parses them into a memory structure, and returns (via the 'data' argument) a pointer to this structure. It returns:

  • Zero on success.
  • ERR_OPENDIR if there was a transient error, e.g. the output file is on a network volume that is not available. The validator will try this result again later.
  • Any other value if there was a permanent error, e.g. an output file is missing or has a syntax error. The result will be marked as invalid.

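The three return cases above can be summarized in a small helper. This is only a sketch of the rule as stated in the list, not code from validator.C: the enum, the function name, and the numeric value given to ERR_OPENDIR are placeholders chosen for illustration (the real constant comes from error_numbers.h).

```cpp
// Placeholder so the sketch compiles on its own; in a real validator
// ERR_OPENDIR is defined in error_numbers.h.
const int ERR_OPENDIR = -111;

// Hypothetical names for the three outcomes listed above.
enum InitOutcome { PARSED_OK, RETRY_LATER, MARK_INVALID };

// Map an init_result() return value onto those three outcomes.
InitOutcome classify_init_retval(int retval) {
    if (retval == 0) return PARSED_OK;              // data was parsed
    if (retval == ERR_OPENDIR) return RETRY_LATER;  // transient error
    return MARK_INVALID;                            // permanent error
}
```

The practical consequence for init_result() is that ERR_OPENDIR should be returned only when a retry might succeed. Any other nonzero value ends validation for that result.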
extern int compare_results(
    RESULT& r1, void* data1, RESULT& r2, void* data2, bool& match
);

This takes two results and their associated memory structures. It returns (via the 'match' argument) true if the two results are equivalent (within the tolerances of the application).

extern int cleanup_result(RESULT& r, void* data);

This frees the structure pointed to by data, if it's non-NULL.

extern double compute_granted_credit(WORKUNIT&, vector<RESULT>& results);

Given a set of results (at least one of which is valid), compute the credit to be granted to all of them. Normally this function simply returns median_mean_credit(wu, results). If credit is specified in the workunit, return get_credit_from_wu(wu, results) instead.

You must link these functions with the files validator.C, validate_util.C, and validate_util2.C. The result is your custom validator.

If you need to access the WORKUNIT from init_result() or any of the other functions, the global variable g_wup points to it.

Example

Here's an example in which the output file contains an integer and a double. Two results are considered equivalent if the integers are equal and the doubles differ by no more than 0.01.

This example uses utility functions get_output_file_path() and try_fopen().

#include <stdio.h>
#include <string>
#include <vector>
#include <math.h>
#include "error_numbers.h"
#include "boinc_db.h"
#include "sched_util.h"
#include "validate_util.h"
using std::string;
using std::vector;

struct DATA {
    int i;
    double x;
};

int init_result(RESULT& result, void*& data) {
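    FILE* f;
    FILE_INFO fi;
    int i, n, retval;
    double x;

    // Locate and open the result's output file.
    retval = get_output_file_path(result, fi);
    if (retval) return retval;
    retval = try_fopen(fi.path.c_str(), f, "r");
    if (retval) return retval;

    // Parse one integer and one double; a double needs %lf, not %f.
    n = fscanf(f, "%d %lf", &i, &x);
    fclose(f);
    if (n != 2) return ERR_XML_PARSE;   // malformed output: permanent error

    // Hand the parsed values back to the validator via 'data'.
    DATA* dp = new DATA;
    dp->i = i;
    dp->x = x;
    data = (void*) dp;
    return 0;
}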

int compare_results(
    RESULT& r1, void* _data1, RESULT& r2, void* _data2, bool& match
) {
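    // Recover the structures created by init_result().
    DATA* data1 = (DATA*)_data1;
    DATA* data2 = (DATA*)_data2;
    // Equivalent only if the integers agree and the doubles are within 0.01.
    match = true;
    if (data1->i != data2->i) match = false;
    if (fabs(data1->x - data2->x) > 0.01) match = false;
    return 0;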
}

int cleanup_result(RESULT& r, void* data) {
    if (data) delete (DATA*) data;
    return 0;
}

double compute_granted_credit(WORKUNIT& wu, vector<RESULT>& results) {
    return median_mean_credit(wu, results);
}
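
To try the parse, compare, and cleanup logic without the BOINC headers, the sketch below repeats it with a stand-in RESULT that holds its output as a string, and adds a hypothetical driver showing one plausible call order. The driver (count_matches_with_first) and the stand-in struct are assumptions made for illustration; the real driver in validator.C is not reproduced here and does more than this.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Stand-in for BOINC's RESULT: the output text is stored directly.
struct RESULT { std::string output; };

struct DATA { int i; double x; };

int init_result(RESULT& result, void*& data) {
    int i;
    double x;
    // Same format as the example above: one integer, one double.
    if (std::sscanf(result.output.c_str(), "%d %lf", &i, &x) != 2) {
        return -1;  // any nonzero value other than ERR_OPENDIR: permanent error
    }
    data = new DATA{i, x};
    return 0;
}

int compare_results(RESULT&, void* d1, RESULT&, void* d2, bool& match) {
    DATA* a = static_cast<DATA*>(d1);
    DATA* b = static_cast<DATA*>(d2);
    match = (a->i == b->i) && std::fabs(a->x - b->x) <= 0.01;
    return 0;
}

int cleanup_result(RESULT&, void* data) {
    delete static_cast<DATA*>(data);  // deleting a null pointer is a no-op
    return 0;
}

// Hypothetical driver: parse every result, count how many of the others
// match the first one, then free every parsed structure.
int count_matches_with_first(std::vector<RESULT>& results) {
    std::vector<void*> data(results.size(), nullptr);
    for (std::size_t k = 0; k < results.size(); k++) {
        if (init_result(results[k], data[k]) != 0) data[k] = nullptr;
    }
    int matches = 0;
    for (std::size_t k = 1; k < results.size(); k++) {
        if (!data[0] || !data[k]) continue;  // skip results that failed to parse
        bool match = false;
        compare_results(results[0], data[0], results[k], data[k], match);
        if (match) matches++;
    }
    for (std::size_t k = 0; k < results.size(); k++) {
        cleanup_result(results[k], data[k]);
    }
    return matches;
}
```

The point of the sketch is the ordering: every result is parsed once, comparisons use only the parsed structures, and cleanup runs for every result whether or not it parsed.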

Other credit-granting formulas

stddev_credit()

Useful for 3 or more valid results.

Computes basic statistics for the claimed credits. If they are all close together, it averages them. If some are close together but there are a few outliers, it averages the claims within the cluster and ignores the outliers.
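
As a rough illustration of the idea described above, the following self-contained sketch averages the claims that lie near the median when the spread is large. It is not the stddev_credit() implementation: the function name, the use of the median as the cluster centre, and both thresholds (rel_tol and k) are assumptions made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper. If the claims are tightly grouped (standard
// deviation at most rel_tol times the mean), return their mean.
// Otherwise average only the claims within k standard deviations
// of the median, which drops the outliers.
double clustered_mean_credit(std::vector<double> claims,
                             double rel_tol = 0.05, double k = 1.0) {
    if (claims.empty()) return 0.0;
    const std::size_t n = claims.size();

    double sum = 0.0;
    for (double c : claims) sum += c;
    const double mean = sum / n;

    double var = 0.0;
    for (double c : claims) var += (c - mean) * (c - mean);
    const double sd = std::sqrt(var / n);

    // All claims close together: plain average.
    if (sd <= rel_tol * std::fabs(mean)) return mean;

    // Median of the sorted claims (midpoint of the two middle values if n is even).
    std::sort(claims.begin(), claims.end());
    const double median = (n % 2 == 1)
        ? claims[n / 2]
        : (claims[n / 2 - 1] + claims[n / 2]) / 2.0;

    double kept_sum = 0.0;
    int kept = 0;
    for (double c : claims) {
        if (std::fabs(c - median) <= k * sd) {
            kept_sum += c;
            kept++;
        }
    }
    return kept > 0 ? kept_sum / kept : mean;
}
```

With claims of 10, 11, 9, 50 and 10, the claim of 50 falls outside the cluster and the remaining four average to 10. With claims of 10, 10.2 and 9.8, all three are averaged.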

two_credit()

Useful for 2 valid results.

If the two claimed credits are close together, it averages them. If they aren't, it compares each claimed credit against that computer's historical granted credit per CPU second to see which claim is closer to its host's historical average. Both computers are then granted the claimed credit that is closer to the historical value. This gives the most appropriate credit when one computer always claims too low or always claims too high.
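
The rule above can be sketched as follows. This is an illustrative reading of the description, not the two_credit() source: the Claim struct, the function name, the rel_tol threshold for "close together", and the use of absolute (rather than relative) deviation from the historical rate are all assumptions.

```cpp
#include <algorithm>
#include <cmath>

// Stand-in for the per-result values the description mentions; not a BOINC type.
struct Claim {
    double claimed_credit;       // credit claimed for this result
    double cpu_time;             // CPU seconds used for this result
    double hist_credit_per_sec;  // host's historical granted credit per CPU second
};

// Hypothetical helper returning the credit to grant to both hosts.
double two_result_credit(const Claim& a, const Claim& b, double rel_tol = 0.1) {
    const double hi = std::max(a.claimed_credit, b.claimed_credit);
    const double diff = std::fabs(a.claimed_credit - b.claimed_credit);

    // Claims close together: average them.
    if (hi <= 0.0 || diff <= rel_tol * hi) {
        return (a.claimed_credit + b.claimed_credit) / 2.0;
    }

    // Otherwise estimate what each host would earn at its historical rate
    // and pick the claim that deviates least from that estimate.
    const double expect_a = a.hist_credit_per_sec * a.cpu_time;
    const double expect_b = b.hist_credit_per_sec * b.cpu_time;
    const double dev_a = std::fabs(a.claimed_credit - expect_a);
    const double dev_b = std::fabs(b.claimed_credit - expect_b);
    return (dev_a <= dev_b) ? a.claimed_credit : b.claimed_credit;
}
```

For example, if both hosts used 100 CPU seconds at a historical rate of 0.1 credits per second, and one claims 10 while the other claims 30, the sketch grants 10 to both.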