Version 8 (modified by davea, 16 years ago)

Applications that use coprocessors

This document describes BOINC's support for applications that use coprocessors such as

  • GPUs
  • Cell SPEs

We'll assume that these resources are allocated rather than scheduled: i.e., an application using a coprocessor has it locked while the app is in memory, even if the app is suspended by BOINC or descheduled by the OS.

Proposed design

The BOINC client probes for coprocessors and reports them in scheduler requests. The XML looks like:

<coprocs>
   <coproc_cuda>
      <count>1</count>
      <name>GeForce 8800 GT (1)</name>
      <totalGlobalMem>...</totalGlobalMem>
      ...
   </coproc_cuda>
</coprocs>

An app_version record (in the server DB) has a character-string field named class.

The scheduler is linked with a project-supplied function

// app_class is the app_version's class string ("class" itself is a reserved word in C++)
bool analyze_app(HOST&, char* app_class, HOST_USAGE&);

struct HOST_USAGE {
   COPROCS coprocs;   // coprocessors used by the app (name and count)
   double ncpus;      // #CPUs used by app (may be fractional)
   double flops;      // estimated FLOPS
   char opaque[256];  // passed to the app in init_data.xml
};

The HOST argument describes the host's CPU(s), and includes a field 'coprocs' listing its coprocessors.

The function returns true if the host's resources are sufficient for the app version. When it returns true, it also populates the HOST_USAGE structure.
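A project-supplied function along these lines might look as follows. This is only a sketch: the document does not define HOST, COPROCS, or the class strings, so the struct layouts, the field names p_ncpus and p_fpops, the helper count_of, the class name "cuda", and the numeric values are all assumptions made for illustration. The class argument is named app_class and taken as const char* so the sketch compiles as C++.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-ins for the scheduler's types; the real
// definitions are not given in this document.
struct COPROC {
    std::string type;   // e.g. "cuda"
    int count;          // number of instances
};

struct COPROCS {
    std::vector<COPROC> coprocs;
    // Number of instances of the given type, or 0 if there are none.
    int count_of(const std::string& type) const {
        for (const COPROC& c : coprocs) {
            if (c.type == type) return c.count;
        }
        return 0;
    }
};

struct HOST {
    int p_ncpus;        // number of CPUs (assumed field name)
    double p_fpops;     // per-CPU FLOPS (assumed field name)
    COPROCS coprocs;    // the host's coprocessors
};

struct HOST_USAGE {
    COPROCS coprocs;    // coprocessors used by the app (name and count)
    double ncpus;       // #CPUs used by app (may be fractional)
    double flops;       // estimated FLOPS
    char opaque[256];   // passed to the app in init_data.xml
};

// Returns true and fills in hu if the host can run this app version.
bool analyze_app(HOST& host, const char* app_class, HOST_USAGE& hu) {
    assert(app_class != nullptr);
    hu.coprocs.coprocs.clear();
    hu.opaque[0] = '\0';
    if (std::strcmp(app_class, "cuda") == 0) {
        // GPU version: needs one CUDA device plus a fraction of a CPU.
        if (host.coprocs.count_of("cuda") < 1) return false;
        hu.coprocs.coprocs.push_back(COPROC{"cuda", 1});
        hu.ncpus = 0.25;   // placeholder value
        hu.flops = 1e11;   // placeholder estimate, not a measured figure
        return true;
    }
    if (app_class[0] == '\0') {
        // Plain CPU version: one CPU, running at the host's CPU speed.
        hu.ncpus = 1;
        hu.flops = host.p_fpops;
        return true;
    }
    return false;          // unknown class: do not send this version
}
```

A real project function would presumably derive flops from the reported device properties rather than use a constant.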

When deciding whether to send a job to a host, the scheduler examines all latest-version app_versions for the platform, calls analyze_app for each, and, among those for which it returns true, selects the one whose HOST_USAGE flops value is greatest.
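The selection rule can be sketched as a loop over the candidate versions. The APP_VERSION and HOST layouts, the reduced HOST_USAGE, and the names choose_app_version and AnalyzeFn are hypothetical; the project function is passed in as a callback so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Reduced, hypothetical versions of the scheduler's types.
struct HOST_USAGE { double ncpus; double flops; };
struct HOST { int ncpus; bool has_cuda; };
struct APP_VERSION { int id; std::string app_class; };

// Stand-in for the project-supplied analyze_app function.
using AnalyzeFn =
    std::function<bool(const HOST&, const std::string&, HOST_USAGE&)>;

// Returns the index of the usable version with the greatest estimated
// FLOPS, or -1 if the host can run none of them. On success,
// best_usage holds the HOST_USAGE of the chosen version.
int choose_app_version(const HOST& host,
                       const std::vector<APP_VERSION>& versions,
                       const AnalyzeFn& analyze,
                       HOST_USAGE& best_usage) {
    int best = -1;
    for (std::size_t i = 0; i < versions.size(); i++) {
        HOST_USAGE hu{};
        if (!analyze(host, versions[i].app_class, hu)) {
            continue;   // host lacks the resources for this version
        }
        if (best < 0 || hu.flops > best_usage.flops) {
            best = static_cast<int>(i);
            best_usage = hu;
        }
    }
    return best;
}
```

With this rule, a host with a suitable GPU gets the GPU version whenever its estimated FLOPS exceeds that of the CPU version, and falls back to the CPU version otherwise.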

The scheduler reply includes, for each app version, an XML encoding of HOST_USAGE.

The client keeps track of coprocessor allocation, i.e. how many instances of each are free. It only runs an app if enough instances are available.
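The bookkeeping described here could be as simple as a per-type counter of free instances. The class name CoprocAllocator and its methods are invented for this sketch and are not part of the BOINC client. Following the allocation assumption stated earlier, release would be called when the app leaves memory, not when it is merely suspended.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical client-side tracker of free coprocessor instances.
class CoprocAllocator {
public:
    using Request = std::map<std::string, int>;  // type -> instances

    // Record instances found when probing the host.
    void add_resource(const std::string& type, int count) {
        assert(count >= 0);
        free_[type] += count;
    }

    int free_count(const std::string& type) const {
        auto it = free_.find(type);
        return it == free_.end() ? 0 : it->second;
    }

    // True if every requested type has enough free instances.
    bool can_run(const Request& needed) const {
        for (const auto& kv : needed) {
            if (free_count(kv.first) < kv.second) return false;
        }
        return true;
    }

    // Reserve all requested instances, or none if any is unavailable.
    bool reserve(const Request& needed) {
        if (!can_run(needed)) return false;
        for (const auto& kv : needed) free_[kv.first] -= kv.second;
        return true;
    }

    // Return instances to the pool once the app has left memory.
    void release(const Request& held) {
        for (const auto& kv : held) free_[kv.first] += kv.second;
    }

private:
    std::map<std::string, int> free_;
};
```

Checking availability for the whole request before reserving anything avoids leaving a partial allocation behind when an app needs more than one type of coprocessor.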

The client uses app_version.usage.flops to estimate job completion times.
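One plausible form of this estimate divides the work remaining by the app version's estimated speed. The document does not say where the job's size comes from, so the fpops_est parameter (an estimate of the job's total floating-point operations), the fraction_done parameter, and the function name are assumptions.

```cpp
#include <cassert>

// Estimated seconds until a job completes.
//   fpops_est     : estimated total floating-point operations (assumed input)
//   fraction_done : portion already completed, in [0, 1]
//   flops         : app_version.usage.flops, the estimated FLOPS
double estimate_remaining_seconds(double fpops_est,
                                  double fraction_done,
                                  double flops) {
    assert(flops > 0);
    assert(fraction_done >= 0 && fraction_done <= 1);
    return fpops_est * (1.0 - fraction_done) / flops;
}
```

For example, a job estimated at 3.6e12 operations on an app version rated at 1e9 FLOPS would be expected to take about an hour from the start.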

Questions

  • How does BOINC know if non-BOINC applications are using resources?

Possible future additions

  • Allow app_versions to specify min and max requirements (and have a corresponding allocation scheme in the client).
  • Let projects define their own resources, unknown to BOINC, and have "probe" programs (using the assigned-job mechanism) that survey the resources on each host.
  • Store the resource descriptions in the DB (or maybe flat files), so that you can study your host population.