Changes between Version 3 and Version 4 of CreditNew


Timestamp: Nov 3, 2009, 9:25:51 AM
Author: davea

  • CreditNew

    v3 v4  
    77For GPUs, it's given by a manufacturer-supplied formula.
    88
    9 Applications access memory,
     9However, other factors affect application performance.
     10For example, applications access memory,
    1011and the speed of a host's memory system is not reflected
    1112in its Whetstone score.
     
    1516is the ratio of actual FLOPS to peak FLOPS.
    1617
    17 GPUs typically have a much higher (50-100X) peak speed than GPUs.
     18GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
    1819However, application efficiency is typically lower
    1920(very roughly, 10% for GPUs, 50% for CPUs).
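To make these numbers concrete, here is a small arithmetic sketch; the peak-FLOPS figures are illustrative assumptions, not values from the text:
{{{
// Illustrative sketch: efficiency = actual FLOPS / peak FLOPS.
// The peak figures below are assumed for the example.
double gpu_peak = 1000e9;            // assumed 1 TFLOPS GPU peak
double cpu_peak = 20e9;              // assumed 20 GFLOPS CPU peak
double gpu_actual = 0.10 * gpu_peak; // ~10% GPU efficiency (from text): 100 GFLOPS
double cpu_actual = 0.50 * cpu_peak; // ~50% CPU efficiency (from text): 10 GFLOPS
// Despite its lower efficiency, the GPU still does ~10x the actual FLOPS.
}}}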
     
    2930   about the same amount of credit per day for a given host.
    3031
    31 It's easy to show that both goals can't be satisfied simultaneously
    32 when there is more than one type of processing resource.
     32It's easy to show that both goals can't be satisfied simultaneously.
    3333
    3434== The first credit system ==
     
    4040}}}
    4141There were then various schemes for taking the
    42 average or min of the claimed credit of the replicas of a job,
     42average or min claimed credit of the replicas of a job,
    4343and using that as the "granted credit".
    4444
     
    6565We call this approach "Actual-FLOPs-based".
    6666
    67 SETI@home had an application that allowed counting of FLOPs,
    68 and they adopted this system.
    69 They added a scaling factor so that the average credit per job
     67SETI@home's application allowed counting of FLOPs,
     68and they adopted this system,
     69adding a scaling factor so that average credit per job
    7070was the same as under the first credit system.
    7171
     
    8484== Goals of the new (third) credit system ==
    8585
    86  * Completely automate credit - projects don't have to
     86 * Completely automated - projects don't have to
    8787   change code, settings, etc.
    8888
     
    9090
    9191 * Limited project neutrality: different projects should grant
    92    about the same amount of credit per CPU hour,
    93    averaged over hosts.
     92   about the same amount of credit per CPU hour, averaged over hosts.
    9493   Projects with GPU apps should grant credit in proportion
    9594   to the efficiency of the apps.
     
    9998== Peak FLOP Count (PFC) ==
    10099
    101 This system goes back to the Peak-FLOPS-based approach,
     100This system uses the Peak-FLOPS-based approach,
    102101but addresses its problems in a new way.
    103102
     
    126125   For now, though, we'll just use the scheduler's estimate.
    127126
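A minimal sketch of the idea, assuming (as the surrounding text suggests) that PFC(J) is the job's elapsed time multiplied by the peak FLOPS of the devices it used; the function name is ours, not BOINC's:
{{{
// Sketch only: assumes PFC(J) = elapsed time * peak FLOPS of the
// devices the job ran on; names are illustrative.
double peak_flop_count(double elapsed_time_secs, double peak_flops) {
    return elapsed_time_secs * peak_flops;
}
}}}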
    128 The idea of the system is that granted credit for a job J is proportional to PFC(J),
     127The granted credit for a job J is proportional to PFC(J),
    129128but is normalized in the following ways:
    130129
    131130== Cross-version normalization ==
    132131
    133 
    134132If a given application has multiple versions (e.g., CPU and GPU versions)
    135 the average granted credit is the same for each version.
     133the granted credit per job is adjusted
     134so that the average is the same for each version.
    136135The adjustment is always downwards:
    137 we maintain the average PFC*(V) of PFC() for each app version,
    138 find the minimum X,
    139 then scale each app version's jobs by (X/PFC*(V)).
    140 The result is called "Version-Normalized Peak FLOP Count", or VNPFC(J).
    141 
    142 Notes:
    143  * This mechanism provides device neutrality.
     136we maintain the average PFC*(V) of PFC() for each app version V,
     137and let X be the minimum of these averages.
     138An app version V's jobs are then scaled by the factor
     139{{{
     140S(V) = (X/PFC*(V))
     141}}}
     142
     143The result for a given job J
     144is called "Version-Normalized Peak FLOP Count", or VNPFC(J):
     145{{{
     146VNPFC(J) = PFC(J) * (X/PFC*(V))
     147}}}
     148
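A sketch of this computation; the container types and function names are ours, and it assumes the per-version averages PFC*(V) are already maintained:
{{{
// Sketch only: S(V) = X / PFC*(V), where X is the minimum of the
// per-version averages PFC*(V). Assumes pfc_avg is non-empty.
#include <map>
#include <string>

std::map<std::string, double> version_scale_factors(
    const std::map<std::string, double>& pfc_avg)    // V -> PFC*(V)
{
    double x = pfc_avg.begin()->second;
    for (const auto& p : pfc_avg) {
        if (p.second < x) x = p.second;              // X = min over versions
    }
    std::map<std::string, double> s;
    for (const auto& p : pfc_avg) {
        s[p.first] = x / p.second;                   // S(V) <= 1: adjustment is downwards
    }
    return s;
}

// VNPFC(J) = PFC(J) * S(V), where V is the version that ran job J.
double vnpfc(double pfc_j, double s_v) { return pfc_j * s_v; }
}}}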
     149Notes:
    144150 * This addresses the common situation
    145151   where an app's GPU version is much less efficient than the CPU version
     
    150156   It's not exactly "Actual FLOPs", since the most efficient
    151157   version may not be 100% efficient.
    152  * Averages are computed as a moving average,
    153    so that the system will respond quickly as job sizes change
    154    or new app versions are deployed.
    155158
    156159== Cross-project normalization ==
     
    158161If an application has both CPU and GPU versions,
    159162then the version normalization mechanism uses the CPU
    160 version as a "sanity check" to limit the credit granted for GPU jobs.
     163version as a "sanity check" to limit the credit granted to GPU jobs.
    161164
    162165Suppose a project has an app with only a GPU version,
    163166so there's no CPU version to act as a sanity check.
    164167If we grant credit based only on GPU peak speed,
    165 the project will grant much more credit per GPU hour than
    166 other projects, violating limited project neutrality.
    167 
    168 The solution to this is: if an app has only GPU versions,
    169 then we scale its granted credit by the average scaling factor
    170 for that GPU type among projects that
    171 do have both CPU and GPU versions.
     168the project will grant much more credit per GPU hour than other projects,
     169violating limited project neutrality.
     170
     171A solution to this: if an app has only GPU versions,
     172then for each version V we let
     173S(V) be the average scaling factor
     174for that GPU type among projects that do have both CPU and GPU versions.
    172175This factor is obtained from a central BOINC server.
     176V's jobs are then scaled by S(V) as above.
    173177
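A sketch of how a project might apply this; the table and its values are hypothetical, standing in for the data obtained from the central BOINC server:
{{{
// Sketch only: for a GPU-only app, S(V) comes from the average scale
// factor for the version's plan class, as published by a central
// BOINC server. The map contents here are hypothetical.
#include <map>
#include <string>

std::map<std::string, double> cross_project_scale = {
    {"cuda23", 0.10},    // hypothetical per-plan-class averages
    {"cuda21", 0.12},
};

double gpu_only_scale(const std::string& plan_class) {
    auto it = cross_project_scale.find(plan_class);
    return (it == cross_project_scale.end()) ? 1.0 : it->second;
}
}}}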
    174178Notes:
    175179
    176180 * Projects will run a periodic script to update the scaling factors.
    177  * Rather than GPU type, we'll actually use plan class,
     181 * Rather than GPU type, we'll probably use plan class,
    178182   since e.g. the average efficiency of CUDA 2.3 apps may be different
    179    from that of CUDA 2.1 apps.
     183   than that of CUDA 2.1 apps.
    180184 * Initially we'll obtain scaling factors from large projects
    181185   that have both GPU and CPU apps (e.g., SETI@home).
    182    Eventually we'll use an average (weighted by work done) over multiple projects.
     186   Eventually we'll use an average (weighted by work done) over multiple projects
     187   (see below).
    183188
    184189== Host normalization ==
    185190
    186 For a given application, all hosts should get the same average granted credit per job.
     191For a given application,
     192all hosts should get the same average granted credit per job.
    187193To ensure this, for each application A we maintain the average VNPFC*(A),
    188194and for each host H we maintain VNPFC*(H, A).
     
    201207   some (presumably larger) jobs to GPUs with more processors.
    202208   To deal with this, we can weight jobs by workunit.rsc_flops_est.
     209
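One plausible reading of this mechanism, sketched here as an assumption rather than the page's verbatim formula: a job's credit is scaled by the ratio of the app-wide average to the host's average.
{{{
// Sketch only, under the assumption above: scale a job's VNPFC by
// VNPFC*(A) / VNPFC*(H, A), pulling each host toward the app-wide average.
double host_normalized_credit(
    double vnpfc_j,         // VNPFC(J) for this job
    double vnpfc_avg_app,   // VNPFC*(A): average over all hosts
    double vnpfc_avg_host)  // VNPFC*(H, A): average for this host
{
    return vnpfc_j * (vnpfc_avg_app / vnpfc_avg_host);
}
}}}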
     210== Computing averages ==
     211 * Averages are computed as a moving average,
     212   so that the system will respond quickly as job sizes change
     213   or new app versions are deployed.
     214
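The text doesn't say which kind of moving average; an exponential moving average is one common choice, sketched here with an assumed smoothing factor:
{{{
// Sketch only: exponential moving average. ALPHA is an assumed
// smoothing factor; larger values react faster to changing job sizes
// or newly deployed app versions.
const double ALPHA = 0.01;

double update_avg(double avg, double sample) {
    return (1.0 - ALPHA) * avg + ALPHA * sample;
}
}}}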
     215== Jobs versus app units ==
     216
     217== Cross-project scaling factors ==
    203218
    204219== Replication and cheating ==
     
    262277double min_avg_vnpfc;           // min value of app_version.avg_vnpfc
    263278}}}
     279