Version 19 (modified by davea, 9 years ago)

Applications that use coprocessors

BOINC supports applications that use coprocessors. The supported coprocessor types (as of [18892]) are NVIDIA and ATI GPUs.

The BOINC client probes for coprocessors and reports them in scheduler requests. The client keeps track of coprocessor allocation, i.e. how many instances of each are free. It only runs an app if enough instances are available.
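The allocation bookkeeping described above can be pictured with a minimal sketch. The CoprocTracker type and its members are hypothetical names chosen for illustration; they are not the client's actual data structures.

```cpp
#include <map>
#include <string>

// Hypothetical bookkeeping for illustration only; not the BOINC client's code.
struct CoprocTracker {
    std::map<std::string, int> total;   // instances detected, per coprocessor type
    std::map<std::string, int> in_use;  // instances currently allocated to jobs

    // Number of instances of the given type that are not allocated.
    int free_instances(const std::string& type) const {
        auto t = total.find(type);
        if (t == total.end()) return 0;  // host has no coprocessor of this type
        auto u = in_use.find(type);
        return t->second - (u == in_use.end() ? 0 : u->second);
    }

    // Reserve n instances if that many are free; otherwise change nothing.
    bool try_reserve(const std::string& type, int n) {
        if (free_instances(type) < n) return false;
        in_use[type] += n;
        return true;
    }

    // Return n instances when a job finishes or is preempted.
    void release(const std::string& type, int n) {
        in_use[type] -= n;
    }
};
```

In this sketch a job is started only when try_reserve() succeeds, which mirrors the rule that an app runs only if enough instances are available.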

You can develop your application using any programming system, e.g. CUDA (for NVIDIA), Brook+ (for ATI) or OpenCL.

Command-line arguments

Some hosts have multiple GPUs. When your application is run by BOINC, it will be passed a command-line argument

--device N

where N is the device number of the GPU that is to be used. If your application uses multiple GPUs, it will be passed multiple --device arguments, e.g.

--device 0 --device 3
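
One way an application might read these arguments is sketched below. The function name parse_device_args is an assumption made for this example, not part of the BOINC API, and the sketch does no validation beyond checking that a value follows each --device.

```cpp
#include <cstdlib>
#include <cstring>
#include <vector>

// Collect every GPU device number passed as "--device N".
// Other arguments are ignored; a trailing "--device" with no value is skipped.
// parse_device_args is a hypothetical helper, not a BOINC function.
std::vector<int> parse_device_args(int argc, const char* const* argv) {
    std::vector<int> devices;
    for (int i = 1; i < argc; i++) {
        if (!std::strcmp(argv[i], "--device") && i + 1 < argc) {
            devices.push_back(std::atoi(argv[++i]));  // consume the value
        }
    }
    return devices;
}
```

Calling parse_device_args(argc, argv) from main() with the arguments --device 0 --device 3 yields the device numbers 0 and 3, in that order.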

Deploying a coprocessor app

When you deploy a coprocessor app you must specify:

  • its hardware and software requirements
  • an estimate of what fraction of a CPU it will use
  • an estimate of its performance on individual hosts

This information is specified in an application planning function that you link into your scheduler. Specifically, you must:

  • Choose a "plan class" name for your program, say "cuda" (see below).
  • Create an app version, specifying its plan class as "cuda".
  • Edit the function app_plan() in sched/sched_customize.cpp so that it contains a clause for your plan class.

The default app_plan() contains a clause for plan class cuda. We will explain its logic; you may need to modify it for your CUDA app.

First, we check if the host has an NVIDIA GPU.

int app_plan(SCHEDULER_REQUEST& sreq, char* plan_class, HOST_USAGE& hu) {
    ...
    if (!strcmp(plan_class, "cuda")) {
        COPROC_CUDA* cp = (COPROC_CUDA*)sreq.coprocs.lookup("CUDA");
        if (!cp) {
            if (config.debug_version_select) {
                log_messages.printf(MSG_NORMAL,
                    "[version] Host lacks CUDA coprocessor for plan class cuda\n"
                );
            }
            return PLAN_REJECT_CUDA_NO_DEVICE;
        }

Check the compute capability (1.0 or better):

        int v = (cp->prop.major)*100 + cp->prop.minor;
        if (v < 100) {
            if (config.debug_version_select) {
                log_messages.printf(MSG_NORMAL,
                    "[version] CUDA version %d < 1.0\n", v
                );
            }
            return PLAN_REJECT_CUDA_VERSION;
        } 
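
The excerpt packs the compute capability into one integer, major*100 + minor, so that 1.3 becomes 103 and versions compare as plain integers. A standalone sketch of that encoding follows; the helper names are illustrative and do not appear in the BOINC source.

```cpp
// Encode a compute capability as major*100 + minor, as the excerpt does.
// Both helper names are hypothetical.
int encode_compute_capability(int major, int minor) {
    return major * 100 + minor;
}

// True if a device with capability major.minor meets req_major.req_minor.
bool meets_compute_capability(int major, int minor, int req_major, int req_minor) {
    return encode_compute_capability(major, minor)
        >= encode_compute_capability(req_major, req_minor);
}
```

An app that needs, say, capability 1.3 could compare against 103 instead of 100 in its own plan class clause.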

Check the CUDA runtime version. As of client version 6.10, all clients report the CUDA runtime version (cp->cuda_version); use that if it's present. In 6.8 and earlier, the CUDA runtime version isn't reported. Windows clients report the driver version, from which the CUDA version can be inferred; Linux clients don't return the driver version, so we don't know what the CUDA version is.

        // for CUDA 2.3, we need to check the CUDA RT version.
        // Old BOINC clients report display driver version;
        // newer ones report CUDA RT version
        //
        if (!strcmp(plan_class, "cuda23")) {
            if (cp->cuda_version) {
                if (cp->cuda_version < 2030) {
                    return PLAN_REJECT_CUDA_VERSION;
                }
            } else if (cp->display_driver_version) {
                if (cp->display_driver_version < PLAN_CUDA23_MIN_DRIVER_VERSION) {
                    return PLAN_REJECT_CUDA_VERSION;
                }
            } else {
                return PLAN_REJECT_CUDA_VERSION;
            }
        }
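
The fallback order used here (runtime version first, then driver version, otherwise reject) can be isolated in a small function. This is a sketch with hypothetical names; a value of 0 is assumed to mean "not reported", matching the way the excerpt tests cp->cuda_version and cp->display_driver_version.

```cpp
// Decide whether a host satisfies a minimum CUDA runtime requirement.
//   cuda_version:   runtime version reported by newer clients (e.g. 2030
//                   for CUDA 2.3), or 0 if not reported
//   driver_version: display driver version reported by older Windows
//                   clients, or 0 if not reported
// cuda_runtime_ok is a hypothetical helper, not part of the scheduler.
bool cuda_runtime_ok(
    int cuda_version, int driver_version,
    int min_cuda_version, int min_driver_version
) {
    if (cuda_version) return cuda_version >= min_cuda_version;
    if (driver_version) return driver_version >= min_driver_version;
    return false;  // neither value reported: version unknown, so reject
}
```

Rejecting when neither value is present is the conservative choice: old Linux clients, which report no driver version, are never sent a version that needs a specific runtime.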

Check for the amount of video RAM:

        if (cp->prop.dtotalGlobalMem < PLAN_CUDA_MIN_RAM) {
            if (config.debug_version_select) {
                log_messages.printf(MSG_NORMAL,
                    "[version] CUDA mem %.0f < %.0f\n",
                    (double)cp->prop.dtotalGlobalMem, (double)PLAN_CUDA_MIN_RAM
                );
            }
            return PLAN_REJECT_CUDA_MEM;
        }

Estimate the FLOPS:

        hu.flops = cp->flops_estimate();

Estimate its CPU usage:

        // assume we'll need 0.5% as many CPU FLOPS as GPU FLOPS
        // to keep the GPU fed.
        //
        double x = (hu.flops*0.005)/sreq.host.p_fpops;
        hu.avg_ncpus = x;
        hu.max_ncpus = x;
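
To see what this estimate produces, the same arithmetic is written below as a standalone function. The function and parameter names are made up for this example, and the host figures in the note that follows are hypothetical.

```cpp
// Fraction of one CPU assumed necessary to keep the GPU fed.
//   gpu_flops:        estimated GPU speed (hu.flops in the excerpt)
//   cpu_flops:        measured speed of one CPU (sreq.host.p_fpops)
//   cpu_to_gpu_ratio: CPU FLOPS needed per GPU FLOPS (0.005 in the excerpt)
// estimate_ncpus is a hypothetical name used only for this sketch.
double estimate_ncpus(
    double gpu_flops, double cpu_flops, double cpu_to_gpu_ratio = 0.005
) {
    return (gpu_flops * cpu_to_gpu_ratio) / cpu_flops;
}
```

For a hypothetical host with a 100 GFLOPS GPU and a 2 GFLOPS CPU, estimate_ncpus(100e9, 2e9) is about 0.25, so the job would be scheduled as using a quarter of one CPU. A faster GPU or a slower CPU raises the estimate proportionally.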

Indicate the number of GPUs used. Typically this will be 1. If your application uses only a fraction X<1 of the GPU's processors and a fraction Y<1 of its video RAM, report the number of GPUs as min(X, Y). In this case BOINC will attempt to run multiple jobs per GPU if possible.

        hu.ncudas = 1;

Return 0 to indicate that the application can be run on the host:

        return 0;