Changes between Version 6 and Version 7 of CreditNew

Timestamp: Nov 3, 2009, 2:37:20 PM
Author: davea
Comment: --

CreditNew

-                      v6
+                      v7
 is the ratio of actual FLOPS to peak FLOPS.
 GPUs typically have a much higher (50-100X) peak FLOPS than GPUs.
+GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
 However, application efficiency is typically lower
 (very roughly, 10% for GPUs, 50% for CPUs).
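As a worked example of the arithmetic implied by these rough figures: since efficiency is actual FLOPS divided by peak FLOPS, the ratio of actual FLOPS is the peak ratio scaled by the ratio of efficiencies. The function name and default values below are illustrative only and are taken from the approximate numbers in the text, not from measurements.

```python
def effective_speedup(peak_ratio, gpu_efficiency=0.10, cpu_efficiency=0.50):
    """Ratio of actual GPU FLOPS to actual CPU FLOPS.

    peak_ratio is GPU peak FLOPS divided by CPU peak FLOPS.
    The efficiency defaults are the very rough figures quoted in the text.
    """
    return peak_ratio * gpu_efficiency / cpu_efficiency


# The text's range of 50-100X peak FLOPS, reduced by the efficiency gap.
low = effective_speedup(50)
high = effective_speedup(100)
print(f"actual speedup is roughly {low:.0f}X to {high:.0f}X")
```

Under these assumptions a 50-100X advantage in peak FLOPS shrinks to a much smaller advantage in actual FLOPS.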
 …
    It's not exactly "Actual FLOPs", since the most efficient
    version may not be 100% efficient.
+ * There are two sources of variance in PFC(V):
+   the variation in host efficiency,
+   and possibly the variation in job size.
+   If we have an ''a priori'' estimate of job size
+   (e.g., workunit.rsc_fpops_est)
+   we can normalize by this to reduce the variance,
+   and make PFC*(V) converge more quickly.
+ * ''a posteriori'' estimates of job size may exist also
+   (e.g., an iteration count reported by the app)
+   but using this for anything introduces a new cheating risk,
+   so it's probably better not to.
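The effect of normalizing by an a priori size estimate can be sketched as follows. This is a minimal illustration, not the server's code: the sample values are made up, and the coefficient of variation is used here only as one convenient measure of spread.

```python
import statistics


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.fmean(samples)


# Hypothetical (PFC, workunit.rsc_fpops_est) pairs for one app version.
# Job sizes vary by 8X, host efficiency varies by only a few percent.
jobs = [
    (1.1e12, 1.0e12),
    (3.9e12, 4.0e12),
    (2.1e12, 2.0e12),
    (7.6e12, 8.0e12),
]

raw = [pfc for pfc, _ in jobs]
normalized = [pfc / est for pfc, est in jobs]

cv_raw = coefficient_of_variation(raw)
cv_normalized = coefficient_of_variation(normalized)
print(f"raw CV: {cv_raw:.3f}, normalized CV: {cv_normalized:.3f}")
```

With job size divided out, the remaining spread reflects only host efficiency, so an average over the normalized samples should settle with fewer samples.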
 == Cross-project normalization ==
 …
 Assuming that hosts are sent jobs for a given app uniformly,
 then for a given app
+then, for that app,
 hosts should get the same average granted credit per job.
 To ensure this, for each application A we maintain the average VNPFC*(A),
 …
 There are some cases where hosts are not sent jobs uniformly:
  * job-size matching
+ * job-size matching (smaller jobs sent to slower hosts)
  * GPUGrid.net's scheme for sending some (presumably larger)
    jobs to GPUs with more processors.
+In these cases we must scale
+In these cases average credit per job must differ between hosts,
+according to the types of jobs that are sent to them.
+This can be done by dividing
+each sample in the computation of VNPFC* by WU.rsc_fpops_est
+(in fact, there's no reason not to always do this).
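A minimal sketch of dividing each sample by WU.rsc_fpops_est before it enters the average. The class name and the use of a plain running mean are assumptions made for illustration; the text does not specify how the average of VNPFC* is maintained.

```python
class NormalizedAverage:
    """Running mean of samples, each divided by the job's size estimate."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def add_sample(self, vnpfc, rsc_fpops_est):
        if rsc_fpops_est <= 0:
            raise ValueError("rsc_fpops_est must be positive")
        x = vnpfc / rsc_fpops_est           # per-estimated-FLOP sample
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean update
        return self.mean


avg = NormalizedAverage()
avg.add_sample(2.0e12, 1.0e12)  # small job, e.g. sent to a slower host
avg.add_sample(8.0e12, 4.0e12)  # job 4X larger, e.g. sent to a faster host
print(avg.mean)
```

Both hypothetical samples contribute the same normalized value even though the jobs differ in size by 4X, which is why non-uniform job assignment no longer skews the average.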
 Notes:
  * This mechanism reduces the claimed credit of hosts
+ * The host normalization mechanism reduces the claimed credit of hosts
    that are less efficient than average,
    and increases the claimed credit of hosts that are more efficient
 …
 }}}
-== Jobs versus app units ==
-   To deal with this, we can weight jobs by workunit.rsc_flops_est.
-If a project changes between jobs to app units,
-it must reset
 == Cross-project scaling factors ==
 …
 granted credit = claimed credit.
 For jobs that are replicated, granted credit is be
+For jobs that are replicated, granted credit should be
 set to the min of the valid results
 (min is used instead of average to remove the incentive
 …
 == Job runtime estimates ==
+Unrelated to the credit proposal, but in a similar spirit.
+The server will maintain ET*(H, V), the statistics of
+job runtimes (normalized by wu.rsc_fpops_est) per
+host and application version.
+The server's estimate of a job's runtime is then
+{{{
+R(J, H) = wu.rsc_fpops_est * ET*(H, V)
+}}}
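The estimate above can be sketched as follows. The class, its method names, the in-memory dictionaries, and the choice of a plain running mean for ET*(H, V) are all hypothetical; the text only says that the server keeps statistics of normalized runtimes per host and application version.

```python
from collections import defaultdict


class RuntimeStats:
    """Per (host, app version) mean of runtime / wu.rsc_fpops_est."""

    def __init__(self):
        self._count = defaultdict(int)
        self._mean = defaultdict(float)

    def record(self, host_id, app_version_id, runtime_sec, rsc_fpops_est):
        key = (host_id, app_version_id)
        x = runtime_sec / rsc_fpops_est  # normalized runtime sample
        self._count[key] += 1
        self._mean[key] += (x - self._mean[key]) / self._count[key]

    def estimate(self, host_id, app_version_id, rsc_fpops_est):
        """R(J, H) = wu.rsc_fpops_est * ET*(H, V), or None without data."""
        key = (host_id, app_version_id)
        if self._count.get(key, 0) == 0:
            return None
        return rsc_fpops_est * self._mean[key]


stats = RuntimeStats()
stats.record(host_id=1, app_version_id=7, runtime_sec=1000.0, rsc_fpops_est=1.0e12)
stats.record(host_id=1, app_version_id=7, runtime_sec=2200.0, rsc_fpops_est=2.0e12)
estimate = stats.estimate(host_id=1, app_version_id=7, rsc_fpops_est=4.0e12)
unknown = stats.estimate(host_id=2, app_version_id=7, rsc_fpops_est=4.0e12)
```

Returning None for a host and version with no history leaves the fallback policy to the caller, since the text does not say what the server should do in that case.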
 == Implementation ==