Version 1 (modified by davea, 10 years ago)


Managing allocated resources

BOINC has a hardwired model of the following resources:

  • CPUs
  • physical memory and swap space
  • disk space

What about other types of resources, such as the following?

  • GPU(s)
  • SPEs in a Cell processor

We'll assume that these resources are "allocated" rather than "scheduled": an application using a resource has it "locked" while the app is in memory, even if the app is suspended by BOINC or descheduled by the OS.
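To make the "allocated" model concrete, here is a minimal Python sketch. It assumes a hypothetical ResourcePool class with load, suspend and unload methods; none of these names exist in BOINC, and the real client would track this differently. The point it illustrates is that instances are taken when an app is loaded into memory and given back only when it leaves memory, so suspension frees nothing.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourcePool:
    """Hypothetical tracker of free instances per resource type."""
    free: Dict[str, int]
    # app name -> the resources that app currently holds
    held: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def load(self, app: str, needs: Dict[str, int]) -> bool:
        """Allocate when the app is loaded into memory; refuse if any resource is short."""
        if app in self.held:
            return True  # already in memory, so it already holds its resources
        if any(self.free.get(res, 0) < n for res, n in needs.items()):
            return False
        for res, n in needs.items():
            self.free[res] -= n
        self.held[app] = dict(needs)
        return True

    def suspend(self, app: str) -> None:
        """A suspended app stays in memory, so nothing is released."""

    def unload(self, app: str) -> None:
        """Release the app's resources only when it leaves memory."""
        for res, n in self.held.pop(app, {}).items():
            self.free[res] += n
```

With a single GPU, a second GPU app cannot be loaded while the first is merely suspended; it can be loaded once the first app has been unloaded.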

Proposed design

  1. We define an XML notation for resources. This might look like:
        <type>Cell SPE</type>
        <type>NVIDIA 8800 GPU with 1.7 driver</type>
  2. The BOINC client will discover resources and pass their descriptions in scheduler request messages.
  3. An app_version record (in the server DB) will have a new field resource_requirements, listing for each required resource a name pattern REGEXP and an instance count n.
    A host is "compatible" with an app_version if, for each required resource, the host has at least n instances of a resource whose name matches REGEXP.
  4. In addition, app_version will have an acceleration field, representing the (approximate) speedup relative to CPU-only execution.
  5. The scheduler will be modified so that, when sending a job to a host, it finds the compatible app_version for which acceleration is greatest.
  6. The scheduler reply will include app_version.resource_requirements and app_version.acceleration.
  7. The client will be modified so that it keeps track of resource allocation, i.e. how many instances of each resource are free. It only runs an app if enough instances are available, and it decrements the counts accordingly.
  8. The client will be modified to use app_version.acceleration in estimating job completion times.
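The scheduler-side steps (3 to 5) and the completion-time estimate (step 8) could be sketched roughly as follows, in Python for brevity. Several things here are assumptions rather than part of the design: the AppVersion class and the function names are hypothetical, the host's resources are represented as a name-to-count mapping, "matches" is taken to mean a regular-expression search, "at least n instances" is read as a single matching resource type having n or more instances, and the estimate simply divides a CPU-only runtime by acceleration. The acceleration values in the example are made up.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AppVersion:
    name: str
    # Each requirement is (REGEXP, n): at least n instances of a matching resource.
    resource_requirements: List[Tuple[str, int]]
    acceleration: float  # approximate speedup relative to CPU-only execution


def is_compatible(host_resources: Dict[str, int], av: AppVersion) -> bool:
    """True if every requirement is met by some host resource type.

    Assumption: one matching resource type must itself have >= n instances.
    """
    return all(
        any(re.search(pattern, name) and count >= n
            for name, count in host_resources.items())
        for pattern, n in av.resource_requirements
    )


def best_app_version(host_resources: Dict[str, int],
                     versions: List[AppVersion]) -> Optional[AppVersion]:
    """Pick the compatible app_version with the greatest acceleration, if any."""
    compatible = [av for av in versions if is_compatible(host_resources, av)]
    return max(compatible, key=lambda av: av.acceleration, default=None)


def estimated_runtime(cpu_only_seconds: float, av: AppVersion) -> float:
    """Assumption: runtime is the CPU-only estimate divided by acceleration."""
    return cpu_only_seconds / av.acceleration


# Example host and app versions (hypothetical values).
host = {"NVIDIA 8800 GPU with 1.7 driver": 1}
versions = [
    AppVersion("cpu", [], 1.0),                         # no special resources needed
    AppVersion("gpu", [(r"NVIDIA .* GPU", 1)], 10.0),
    AppVersion("cell", [(r"Cell SPE", 6)], 8.0),
]
chosen = best_app_version(host, versions)
```

For this host the GPU version is chosen, since the Cell version is incompatible and the CPU-only version has lower acceleration. An app_version with no requirements is compatible with every host, which gives a natural CPU-only fallback.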

Possible future additions

  • Allow app_versions to specify min and max requirements (and have a corresponding allocation scheme in the client).
  • Let projects define their own resources, unknown to BOINC, and have "probe" programs (using the assigned-job mechanism) that survey the resources on each host.
  • Store the resource descriptions in the DB (or maybe flat files), so that you can study your host population.