Changes between Initial Version and Version 1 of HomogeneousAppVersion


Timestamp:
Jun 3, 2011, 3:18:44 PM (13 years ago)
Author:
davea

  • HomogeneousAppVersion

    v1 v1  
     1= Homogeneous app version =
     2
     3BOINC's [HomogeneousRedundancy homogeneous redundancy] (HR) mechanism lets you
     4specify that multiple instances of a job must be run on hosts
     5whose CPU and OS type are similar,
     6to ensure that correct results are identical or
     7sufficiently similar to compare.
     8
     9The HR mechanism doesn't handle GPU app versions;
     10e.g. it can't prevent situations where one instance is
     11run with a GPU app version and another instance is run with a CPU app version.
     12
     13We considered adding GPU info to the HR mechanism.
     14This turned out to be infeasible.
     15
     16Instead, Kevin Reed and I propose adding a new mechanism called '''homogeneous app version''' (HAV),
     17which ensures that instances of a given job are run using the same app version
     18(e.g., Win32/CUDA).
     19This can be specified on a per-application basis.
     20
     21Notes:
     22
     23 * You can use this together with HR.
     24 * Use this only when you're sure that all app versions are correct,
     25   since it eliminates cross-checking between versions.
     26
     27== Implementation notes ==
     28
     29New DB fields
     30
     31 * APP::homogeneous_app_version (bool)
     32 * WORKUNIT::app_version_id  (int)
     33
     34The latter is maintained like wu.hr_class:
     35it's set when we first dispatch an instance of the job,
     36and it's cleared if all instances error out.
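
The set-and-clear rule above can be sketched as follows. This is only an illustration, not scheduler code: the `Workunit` and `Instance` classes, the helpers `on_dispatch` and `on_instance_error`, and the use of 0 to mean "not yet committed" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    errored: bool = False


@dataclass
class Workunit:
    # 0 means "not committed to any app version" (assumed convention).
    app_version_id: int = 0
    instances: List[Instance] = field(default_factory=list)


def on_dispatch(wu: Workunit, chosen_app_version_id: int) -> Instance:
    """Record a newly dispatched instance; the first dispatch commits the job."""
    if wu.app_version_id == 0:
        wu.app_version_id = chosen_app_version_id
    elif wu.app_version_id != chosen_app_version_id:
        # A committed job must not be sent with a different app version.
        raise ValueError("job is already committed to another app version")
    inst = Instance()
    wu.instances.append(inst)
    return inst


def on_instance_error(wu: Workunit, inst: Instance) -> None:
    """Mark an instance as failed; clear the commitment if every instance failed."""
    inst.errored = True
    if all(i.errored for i in wu.instances):
        wu.app_version_id = 0
```

With this sketch, a job stays committed as long as at least one dispatched instance has not errored out, and becomes free to pick a different app version only once all of them have.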
     37
     38Change to best_app_version():
     39{{{
     40if app.homogeneous_app_version and wu.app_version_id
     41   check if this host supports the app version's platform
     42   if app version has plan class, check if host can handle it
     43   check if we need work for the resource type
     44}}}
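
A minimal sketch of these three checks, under some assumptions: the `AppVersion` and `Host` classes and both function names are hypothetical, plan-class support is modeled as simple set membership (the real check is presumably more involved), and a failed check is taken to mean that the job is not sent to this host.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class AppVersion:
    id: int
    platform: str
    plan_class: str  # "" means the app version has no plan class
    resource: str    # resource type the version uses, e.g. "cpu" or "gpu"


@dataclass
class Host:
    platforms: Set[str]     # platforms the host supports
    plan_classes: Set[str]  # plan classes the host can handle (simplified)
    needs_work: Set[str]    # resource types the host is requesting work for


def hav_applies(homogeneous_app_version: bool, wu_app_version_id: int) -> bool:
    """HAV constrains the choice only once the job is committed to a version."""
    return homogeneous_app_version and wu_app_version_id != 0


def host_can_use_committed_version(av: AppVersion, host: Host) -> bool:
    """Apply the three checks from the pseudocode above, in order."""
    if av.platform not in host.platforms:
        return False
    if av.plan_class and av.plan_class not in host.plan_classes:
        return False
    if av.resource not in host.needs_work:
        return False
    return True
```

In this sketch, when `hav_applies(...)` is false the caller would fall through to the normal app version selection; when it is true, the committed version is either usable on this host or the job is skipped for it.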
     45
     46In some cases this may result in using a non-optimal app version;
     47e.g. we might use a CUDA 2.0 version for a host capable of running CUDA 2.3.
     48So be it.
     49
     50It's possible that the shared-memory job cache could get clogged up
     51with jobs already committed to rare app versions.
     52I don't have a plan for dealing with this.