Changes between Version 24 and Version 25 of CreditNew


Timestamp:
Mar 10, 2010, 9:02:47 AM
Author:
davea
Comment:


  • CreditNew

    v24 v25  
    108108
    109109When a job J is issued to a host,
    110 the scheduler specifies flops_est(J),
    111 a FLOPS estimate based on the resources used by the job
    112 and their peak speeds.
     110the scheduler computes peak_flops(J)
     111based on the resources used by the job and their peak speeds.
    113112
    114113If the job is finished in elapsed time T,
    115114we define peak_flop_count(J), or PFC(J) as
    116115{{{
    117 PFC(J) = T * (sum over devices D (usage(J, D) * peak_flop_rate(D))
     116PFC(J) = T * peak_flops(J)
    118117}}}
    119118
    120119Notes:
    121120
    122  * PFC(J) is
    123121 * We use elapsed time instead of actual device time (e.g., CPU time).
    124122   If a job uses a resource inefficiently
     
    127125   The key thing is that BOINC reserved the device for the job,
    128126   whether or not the job used it efficiently.
    129  * usage(J,D) may not be accurate; e.g., a GPU job may take
     127 * peak_flops(J) may not be accurate; e.g., a GPU job may take
    130128   more or less CPU than the scheduler thinks it will.
    131129   Eventually we may switch to a scheme where the client
     
    134132 * For projects (CPDN) that grant partial credit via
    135133   trickle-up messages, substitute "partial job" for "job".
    136    These projects must include elapsed time,
    137    app version ID, and FLOPS estimate in the trickle message.
    138 
    139 The granted credit for a job J is proportional to PFC(J),
     134   These projects must include elapsed time and result ID
     135   in the trickle message.
     136
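For illustration, here is a minimal sketch (Python; the job fields devices and elapsed_time are hypothetical, not actual BOINC structures) of the PFC computation above:

{{{
# Sketch only: PFC(J) = T * peak_flops(J), where peak_flops(J) is the
# peak speed of the devices allocated to the job, weighted by usage.
def peak_flops(job):
    # job.devices: list of (usage, peak_flop_rate) pairs -- illustrative
    return sum(usage * rate for usage, rate in job.devices)

def peak_flop_count(job):
    # PFC(J) = T * peak_flops(J)
    return job.elapsed_time * peak_flops(job)
}}}
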
     137The credit for a job J is proportional to PFC(J),
    140138but is normalized in the following ways:
    141139
     140== ''A priori'' job size estimates ==
     141
     142If we have an ''a priori'' estimate of job size,
     143we can normalize by this to reduce the variance
     144of various distributions (see below).
     145This makes estimates of the means converge more quickly.
     146
      147We'll use workunit.rsc_fpops_est as this ''a priori'' estimate,
     148and we'll denote it E(J).
     149
     150''A posteriori'' estimates of job size may exist also
     151(e.g., an iteration count reported by the app)
     152but using this for anything introduces a new cheating risk,
     153so it's probably better not to.
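
As a rough illustration (hypothetical numbers), dividing by E(J) makes samples from jobs of very different sizes comparable:

{{{
# Hypothetical numbers: two jobs with the same efficiency but very
# different sizes contribute equal samples once normalized by E(J).
pfc_small, e_small = 2e13, 1e13    # PFC/E = 2.0
pfc_large, e_large = 2e16, 1e16    # PFC/E = 2.0
samples = [pfc_small / e_small, pfc_large / e_large]
# Averaging raw PFC would be dominated by the large job; averaging
# PFC/E keeps samples of similar magnitude, so the mean converges faster.
}}}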
     154
    142155== Cross-version normalization ==
    143156
    144157If a given application has multiple versions (e.g., CPU and GPU versions)
    145 the granted credit per job is adjusted
     158the credit per job is adjusted
    146159so that the average is the same for each version.
    147160
    148 We maintain the average PFC^mean^(V) of PFC() for each app version V.
     161We maintain the average PFC^mean^(V) of PFC(J)/E(J) for each app version V.
    149162We periodically compute PFC^mean^(CPU) and PFC^mean^(GPU),
    150163and let X be the min of these.
     
    153166 S(V) = (X/PFC^mean^(V))
    154167
    155 The result for a given job J
    156 is called "Version-Normalized Peak FLOP Count", or VNPFC(J):
     168The "Version-Normalized Peak FLOP Count", or VNPFC(J) is
    157169
    158170 VNPFC(J) = S(V) * PFC(J)
    159171
    160172Notes:
    161  * This addresses the common situation
     173 * Version normalization addresses the common situation
    162174   where an app's GPU version is much less efficient than the CPU version
    163175   (i.e. the ratio of actual FLOPs to peak FLOPs is much less).
     
    167179   It's not exactly "Actual FLOPs", since the most efficient
    168180   version may not be 100% efficient.
    169  * There are two sources of variance in PFC(V):
    170    the variation in host efficiency,
    171    and possibly the variation in job size.
    172    If we have an ''a priori'' estimate of job size
    173    (e.g., workunit.rsc_fpops_est)
    174    we can normalize by this to reduce the variance,
    175    and make PFC^mean^(V) converge more quickly.
    176  * ''a posteriori'' estimates of job size may exist also
    177    (e.g., an iteration count reported by the app)
    178    but using this for anything introduces a new cheating risk,
    179    so it's probably better not to.
    180181
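A minimal sketch of the scaling computation (Python; the dictionary-based bookkeeping is illustrative, not the actual server code):

{{{
# pfc_mean: app version -> PFC^mean(V), the average of PFC(J)/E(J).
def version_scale_factors(pfc_mean):
    x = min(pfc_mean.values())                      # X = min over versions
    return {v: x / m for v, m in pfc_mean.items()}  # S(V) = X / PFC^mean(V)

def vnpfc(pfc_j, version, scale):
    # VNPFC(J) = S(V) * PFC(J)
    return scale[version] * pfc_j
}}}
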
    181182== Cross-project normalization ==
    182183
    183184If an application has both CPU and GPU versions,
    184 then the version normalization mechanism uses the CPU
    185 version as a "sanity check" to limit the credit granted to GPU jobs.
     185the version normalization mechanism uses the CPU
     186version as a "sanity check" to limit the credit granted to GPU jobs
     187(or vice versa).
    186188
    187189Suppose a project has an app with only a GPU version,
     
    194196then for each version V we let
    195197S(V) be the average scaling factor
    196 for that plan class among projects that do have both CPU and GPU versions.
     198for that resource type among projects that have both CPU and GPU versions.
    197199This factor is obtained from a central BOINC server.
    198200V's jobs are then scaled by S(V) as above.
     
    200202Notes:
    201203
    202  * we use plan class,
    203    since e.g. the average efficiency of CUDA 2.3 apps may be different
    204    than that of CUDA 2.1 apps.
    205  * Initially we'll obtain scaling factors from large projects
    206    that have both GPU and CPU apps (e.g., SETI@home).
    207    Eventually we'll use an average (weighted by work done)
    208    over multiple projects (see below).
     204 * The "average scaling factor" is weighted by work done.
    209205
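As a sketch of how the central server might combine the reported factors (the shape of the exported data is an assumption):

{{{
# Work-weighted average of a resource type's scaling factor across
# projects that have both CPU and GPU versions.
# entries: list of (scale_factor, work_done) pairs -- illustrative.
def average_scale_factor(entries):
    total_work = sum(work for _, work in entries)
    return sum(s * work for s, work in entries) / total_work
}}}
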
    210206== Host normalization ==
     
    214210Then the average credit per job should be the same for all hosts.
    215211To ensure this, for each app version A and host H
    216 we maintain PFC^mean^(H, A).
     212we maintain PFC^mean^(H, A),
     213the average of PFC(J)/E(J) for jobs completed by H using A.
    217214The '''claimed FLOPS''' for a given job J is then
    218215
     
    228225 * GPUGrid.net's scheme for sending some (presumably larger)
    229226   jobs to GPUs with more processors.
    230 In these cases average credit per job must differ between hosts,
    231 according to the types of jobs that are sent to them.
    232 
    233 This can be done by dividing
    234 each sample in the computation of PFC^mean^ by WU.rsc_fpops_est
    235 (in fact, there's no reason not to always do this).
     227
     228The normalization by E(J) handles this
      229(assuming that wu.rsc_fpops_est is set appropriately).
    236230
    237231Notes:
     
    240234   and increases the claimed credit of hosts that are more efficient
    241235   than average.
    242  * PFC^mean^ is averaged over jobs, not hosts.
    243236
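The per-host statistic is straightforward to maintain; a sketch (field names are illustrative, not the actual DB schema):

{{{
# Running average of PFC(J)/E(J) per (host, app version), updated as
# each job completes.  A plain mean is shown here; see "Computing
# averages" below for the drift- and outlier-aware estimator needed
# in practice.
class HostAppVersion:
    def __init__(self):
        self.pfc_sum = 0.0
        self.njobs = 0

    def update(self, pfc_j, e_j):
        self.pfc_sum += pfc_j / e_j
        self.njobs += 1

    def pfc_mean(self):
        return self.pfc_sum / self.njobs if self.njobs else 0.0
}}}
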
    244237== Computing averages ==
     
    247240
    248241 * The quantities being averaged may gradually change over time
    249    (e.g. average job size may change,
    250    app version efficiency may change as new versions are deployed)
     242   (e.g. average job size may change)
    251243   and we need to track this.
    252244 * A given sample may be wildly off,
    253245   and we can't let this mess up the average.
    254246
    255 In addition, we may as well maintain the variance of the quantities,
    256 although the current system doesn't use it.
    257 
    258 The code that does all this is
     247The code that does this is
    259248[http://boinc.berkeley.edu/trac/browser/trunk/boinc/lib/average.h here].
    260249
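As a hedged sketch of the two requirements above (track gradual drift, limit the damage of a single wild sample), an exponentially weighted mean with sample clamping might look like:

{{{
# Sketch only -- not the estimator in lib/average.h.
class RunningAverage:
    def __init__(self, alpha=0.01, clamp=10.0):
        self.alpha = alpha   # weight given to each new sample
        self.clamp = clamp   # max ratio of a sample to the current mean
        self.mean = None

    def add(self, x):
        if self.mean is None:
            self.mean = x
            return
        x = min(x, self.clamp * self.mean)         # limit wild samples
        self.mean += self.alpha * (x - self.mean)  # track gradual change
}}}
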
     
    281270and sets their scaling factor based on the above.
    282271
     272== Anonymous platform ==
     273
     274For anonymous platform apps, since we don't reliably
     275know anything about the devices involved,
     276we don't try to estimate PFC.
     277
     278For each app, we maintain claimed_credit^mean^(A),
     279the average of claimed_credit(J)/E(J).
     280
     281The claimed credit for anonymous platform jobs is
     282
     283 claimed_credit^mean^(A)*E(J)
     284
     285The server maintains host_app_version records for anonymous platform,
     286and it keeps track of elapsed time statistics there.
     287These have app_version_id = -1 for CPU, -2 for NVIDIA GPU, -3 for ATI.
     288
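A sketch of the anonymous-platform case (the constants are from the text above; the function is illustrative):

{{{
# Pseudo app_version_id values used for anonymous-platform host stats.
ANON_CPU, ANON_NVIDIA, ANON_ATI = -1, -2, -3

def anon_claimed_credit(claimed_credit_mean_app, e_j):
    # claimed_credit^mean(A) * E(J)
    return claimed_credit_mean_app * e_j
}}}
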
     289== Replication ==
     290
     291We take the set of hosts that
     292are not anon platform and not on scale probation (see below).
     293If this set is nonempty, we grant the average of their claimed credit.
     294Otherwise we grant
     295
     296 claimed_credit^mean^(A)*E(J)
     297
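A sketch of the granting rule (assuming the per-result claimed credits and host flags are at hand; names are illustrative):

{{{
# results: list of (claimed_credit, is_anon_platform, on_scale_probation).
def granted_credit(results, claimed_credit_mean_app, e_j):
    usable = [c for c, anon, probation in results
              if not anon and not probation]
    if usable:
        return sum(usable) / len(usable)
    # otherwise fall back to the app-wide average scaled by job size
    return claimed_credit_mean_app * e_j
}}}
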
    283298== Cheat prevention ==
    284299
     
    286301by claiming excessive credit
    287302(i.e., by falsifying benchmark scores or elapsed time).
    288 An exaggerated claim will increase VNPFC*(H,A),
    289 causing subsequent claimed credit to be scaled down proportionately.
     303An exaggerated claim will increase PFC^mean^(H,A),
     304causing subsequent credit to be scaled down proportionately.
    290305
    291306This means that no special cheat-prevention scheme
     
    293308in this case, granted credit = claimed credit.
    294309
    295 For jobs that are replicated,
    296 granted credit is set to:
    297  * if the larger host is on scale probation, the smaller
    298  * if larger > 2*smaller, granted = 1.5*smaller
    299  * else granted = (larger+smaller)/2
    300 
    301310However, two kinds of cheating still have to be dealt with:
    302311
     
    304313
    305314For example, claiming a PFC of 1e304.
    306 This can be minimized by
    307 capping VNPFC(J) at some multiple (say, 20) of VNPFC^mean^(A).
    308 If this is enforced, the host's error rate is set to the initial value,
     315
     316If PFC(J) exceeds some multiple (say, 20) of PFC^mean^(V),
     317the host's error rate is set to the initial value,
    309318so it won't do single replication for a while,
    310319and scale_probation (see below) is set to true.
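
A sketch of that sanity check (the threshold and the initial error-rate value are illustrative):

{{{
PFC_SANITY_FACTOR = 20     # "some multiple (say, 20)"
INITIAL_ERROR_RATE = 0.1   # illustrative initial value

def pfc_sanity_check(pfc_j, pfc_mean_version, host_app_version):
    if pfc_j > PFC_SANITY_FACTOR * pfc_mean_version:
        # treat the host as unproven: no single replication for a while,
        # and put it on scale probation
        host_app_version.error_rate = INITIAL_ERROR_RATE
        host_app_version.scale_probation = True
        return False
    return True
}}}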
     
    351360In addition, to limit the extent of cheating
    352361(in case the above mechanism is defeated somehow)
    353 the host scaling factor will be min'd with a
    354 project-wide config parameter (default, say, 3).
    355 
    356 == Trickle credit ==
    357 
    358 CPDN breaks jobs into segments,
    359 has the client send a trickle-up message on completion of each segment,
    360 and grants credit in the trickle-up handler.
    361 
    362 In this case, the trickle-up message should include
    363 the incremental elapsed time of the segment.
    364 The trickle-up handler should then call {{{compute_claimed_credit()}}}
    365 (see below) to determine the claimed credit.
    366 In this case segments play the role of jobs in the credit-related DB fields.
     362the host scaling factor will be min'd with a constant (say, 3).
    367363
    368364== Error rate, host punishment, and turnaround time estimation ==
     
    390386so we'll move "max_results_day" from the host table to host_app_version.
    391387
    392 == Anonymous platform ==
    393 
    394 For anonymous platform apps, since we don't necessarily
    395 know anything about the devices involved,
    396 we don't try to estimate PFC.
    397 Instead, we give the average credit for the app,
    398 scaled by the job size.
    399 
    400 The server maintains host_app_version record for anonymous platform,
    401 and it keeps track of elapsed time statistics there.
    402 These have app_version_id = -1 for CPU, -2 for NVIDIA GPU, -3 for ATI.
    403 
    404388== App plan functions ==
    405389
     
    479463
    480464
     465