Version 1 (modified by davea, 13 years ago)

Homogeneous app version

BOINC's homogeneous redundancy (HR) mechanism lets you specify that multiple instances of a job must be run on hosts whose CPU and OS type are similar, to ensure that correct results are identical or sufficiently similar to compare.

The HR mechanism doesn't handle GPU app versions; e.g. it can't prevent situations where one instance is run with a GPU app version and another instance is run with a CPU app version.

We considered adding GPU info to the HR mechanism. This turned out to be infeasible.

Instead, Kevin Reed and I propose adding a new mechanism called homogeneous app version (HAV), which ensures that all instances of a given job are run using the same app version (e.g., Win32/CUDA). This can be specified on a per-application basis.

Notes:

  • You can use this together with HR.
  • Use this only when you're sure that all app versions are correct, since it eliminates cross-checking between versions.

Implementation notes

New DB fields

  • APP::homogeneous_app_version (bool)
  • WORKUNIT::app_version_id (int)

The latter is maintained like wu.hr_class: it's set when we first dispatch an instance of the job, and it's cleared if all instances error out.
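
The lifecycle of wu.app_version_id described above can be sketched roughly as follows. This is an illustration in Python, not the scheduler's actual code: the App and Workunit classes, the instance_outcomes list, the two function names, and the use of 0 to mean "not yet committed" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-ins for the APP and WORKUNIT database rows.
@dataclass
class App:
    homogeneous_app_version: bool = False


@dataclass
class Workunit:
    app_version_id: int = 0  # assumption: 0 means "not yet committed"
    # One entry per instance: "in_progress", "success" or "error".
    instance_outcomes: List[str] = field(default_factory=list)


def on_dispatch(app: App, wu: Workunit, chosen_app_version_id: int) -> None:
    """Record a new instance; commit the job on its first dispatch."""
    if app.homogeneous_app_version and wu.app_version_id == 0:
        wu.app_version_id = chosen_app_version_id
    wu.instance_outcomes.append("in_progress")


def on_instance_finished(app: App, wu: Workunit, index: int, outcome: str) -> None:
    """Record an outcome; release the commitment if every instance errored."""
    wu.instance_outcomes[index] = outcome
    if app.homogeneous_app_version and all(
        o == "error" for o in wu.instance_outcomes
    ):
        wu.app_version_id = 0
```

Once a job is committed, later dispatches leave app_version_id alone; it is reset only when no instance is still in progress or successful.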

Change to best_app_version():

if app.homogeneous_app_version and wu.app_version_id
   check if this host supports the app version's platform
   if app version has plan class, check if host can handle it
   check if we need work for the resource type
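
As a rough sketch of the three checks above (in Python for brevity; the real scheduler code will differ): AppVersion, Host, host_can_use and the pick_unconstrained callback are hypothetical names, and the platform and plan class strings are only illustrative. The pseudocode doesn't say what happens when a check fails; the sketch assumes the job is simply not sent to that host.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass(frozen=True)
class AppVersion:
    id: int
    platform: str
    plan_class: str = ""   # empty string means "no plan class"
    resource: str = "cpu"  # resource type this version runs on


@dataclass
class Host:
    platforms: Set[str]               # platforms the host supports
    plan_classes: Set[str]            # plan classes the host can handle
    resources_needing_work: Set[str]  # resource types we need work for


def host_can_use(av: AppVersion, host: Host) -> bool:
    """Apply the three checks from the pseudocode, in order."""
    if av.platform not in host.platforms:
        return False
    if av.plan_class and av.plan_class not in host.plan_classes:
        return False
    return av.resource in host.resources_needing_work


def best_app_version(
    hav_enabled: bool,
    wu_app_version_id: int,
    host: Host,
    app_versions: Dict[int, AppVersion],
    pick_unconstrained: Callable[[Host], Optional[AppVersion]],
) -> Optional[AppVersion]:
    if hav_enabled and wu_app_version_id:
        av = app_versions.get(wu_app_version_id)
        if av is not None and host_can_use(av, host):
            return av
        return None  # assumption: skip this job for this host
    # Not committed (or HAV is off): fall back to the usual selection.
    return pick_unconstrained(host)
```

For example, a job committed to a CUDA version is returned for a host that handles the "cuda" plan class and needs GPU work, and is skipped for a CPU-only host.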

In some cases this may result in using a non-optimal app version; e.g. we might use a CUDA 2.0 version for a host capable of running CUDA 2.3. So be it.

It's possible that the shared-memory job cache could get clogged up with jobs already committed to rare app versions. I don't have a plan for dealing with this.