Changes between Version 28 and Version 29 of CreditNew

Timestamp: Mar 25, 2010, 4:49:20 PM
Author: davea
Comment: --

CreditNew

-                      v28
+                      v29
 == Credit system goals ==
-Some possible goals in designing a credit system:
+Some goals in designing a credit system:
  * Device neutrality: similar jobs should get similar credit
 …
    about the same amount of credit per host, averaged over all hosts.
+ * Cheat-proof: there should be a bound (say, 1.1)
+   on the ratio of credit granted to credit deserved per user account,
+   regardless of what the user does.
 == The first credit system ==
 In the first iteration of BOINC's credit system,
 "claimed credit" was defined as
+{{{
 C1 = H.whetstone * J.cpu_time
+}}}
+ C1 = H.whetstone * J.cpu_time
 There were then various schemes for taking the
 average or min claimed credit of the replicas of a job,
 …
 but multiplied it by a scaling factor to match SETI@home's average.
+This system had several problems:
+ * It didn't address GPUs.
+ * Projects that couldn't count FLOPs still had device neutrality problems.
+ * It didn't prevent credit cheating when single replication was used.
+This system has several problems:
+ * It doesn't address GPUs properly; projects using GPUs
+   have to write custom code.
+ * Projects that can't count FLOPs still have device neutrality problems.
+ * It doesn't prevent credit cheating when single replication is used.
 …
    grant more credit than projects with inefficient apps.  That's OK).
+== ''A priori'' job size estimates ==
+If we have an ''a priori'' estimate of job size,
+we can normalize by this to reduce the variance
+of various distributions (see below).
+This makes estimates of the means converge more quickly.
+We'll use workunit.rsc_fpops_est as this a priori estimate,
+and denote it E(J).
+(''A posteriori'' estimates of job size may exist also,
+e.g., an iteration count reported by the app,
+but aren't cheat-proof; we don't use them.)
 == Peak FLOP Count (PFC) ==
 …
 If the job is finished in elapsed time T,
 we define peak_flop_count(J), or PFC(J) as
+{{{
+PFC(J) = T * peak_flops(J)
+}}}
+ PFC(J) = T * peak_flops(J)
 Notes:
 …
    in the trickle message.
-The credit for a job J is proportional to PFC(J),
-but is normalized in the following ways:
 == ''A priori'' job size estimates ==
 If we have an ''a priori'' estimate of job size,
+we can normalize by this to reduce the variance
+of various distributions (see below).
+This makes estimates of the means converge more quickly.
+We'll use workunit.rsc_fpops_est as this a priori estimate,
+and we'll denote it E(J).
+(''A posteriori'' estimates of job size may exist also,
+e.g., an iteration count reported by the app,
+but aren't cheat-proof; we don't use them.)
+By default, the credit for a job J is proportional to PFC(J),
+but is limited and normalized in the following ways:
+== Sanity check ==
+If PFC(J) is infinite or is > wu.rsc_fpops_bound,
+J is assigned a "default PFC" and other processing is skipped.
+Default PFC is determined as follows:
+ * If min_avg_pfc(A) is defined (see below) then
+ D = min_avg_pfc(A) * E(J)
+ * Otherwise
+ D = wu.rsc_fpops_est
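The sanity check above can be sketched as follows. This is a minimal illustration, not BOINC's server code: the function name and its parameters are hypothetical, and `min_avg_pfc` is passed as `None` when it is not yet defined for the app.

```python
import math


def sanity_checked_pfc(pfc, rsc_fpops_est, rsc_fpops_bound, min_avg_pfc=None):
    """Return (pfc_to_use, used_default) for one job.

    rsc_fpops_est plays the role of E(J); min_avg_pfc is min_avg_pfc(A),
    or None if it has not been defined yet (hypothetical convention).
    """
    if math.isinf(pfc) or pfc > rsc_fpops_bound:
        if min_avg_pfc is not None:
            default = min_avg_pfc * rsc_fpops_est  # D = min_avg_pfc(A) * E(J)
        else:
            default = rsc_fpops_est                # D = wu.rsc_fpops_est
        return default, True
    return pfc, False
```

A caller would skip the remaining normalization steps whenever the second element of the result is true.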
 == Cross-version normalization ==
 …
 We maintain the average PFC^mean^(V) of PFC(J)/E(J) for each app version V.
 We periodically compute PFC^mean^(CPU) and PFC^mean^(GPU),
-and let X be the min of these.
-An app version V's jobs are then scaled by the factor
+and compute X as follows:
+ * If there are only CPU or only GPU versions,
+   and at least 2 versions are above a sample threshold,
+   X is the average.
+ * If there are both, and at least 1 of each is above a sample
+   threshold, let X be the min of the averages.
+If X is defined, then we set
+ min_avg_pfc(A) = X
+This is an estimate of the app's average actual FLOPS.
+We also set
  Scale(V) = (X/PFC^mean^(V))
+An app version V's jobs are scaled by this factor.
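As a rough sketch of the rules for X and Scale(V), assuming that the per-class averages are unweighted means over the versions above the sample threshold (the text does not say how they are weighted). The dictionary keys, the function name, and the threshold value are all hypothetical.

```python
from statistics import mean

# Hypothetical threshold; the text does not give a number.
SAMPLE_THRESHOLD = 100


def cross_version_scales(versions, threshold=SAMPLE_THRESHOLD):
    """Return (X, {version_id: Scale(V)}), or (None, {}) if X is undefined.

    Each version is a dict with hypothetical keys:
      "id", "kind" ("cpu" or "gpu"), "pfc_mean", "n_samples".
    """
    def eligible(kind):
        # PFC means of versions of this kind that are above the threshold.
        return [v["pfc_mean"] for v in versions
                if v["kind"] == kind and v["n_samples"] > threshold]

    cpu, gpu = eligible("cpu"), eligible("gpu")
    kinds = {v["kind"] for v in versions}

    x = None
    if kinds == {"cpu", "gpu"}:
        # Both classes exist: need at least one eligible version of each.
        if cpu and gpu:
            x = min(mean(cpu), mean(gpu))
    elif len(cpu) + len(gpu) >= 2:
        # Only one class exists: need at least two eligible versions.
        x = mean(cpu + gpu)

    if x is None:
        return None, {}
    scales = {v["id"]: x / v["pfc_mean"]
              for v in versions if v["n_samples"] > threshold}
    return x, scales
```

In this sketch only versions above the threshold receive a scale factor; how other versions are treated is left open here.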
 Notes:
+ * Version normalization is only applied if at least two
+   versions are above sample threshold.
  * Version normalization addresses the common situation
    where an app's GPU version is much less efficient than the CPU version
 …
    then this mechanism doesn't work as intended.
    One solution is to create separate apps for separate types of jobs.
+== Cross-project normalization ==
+ * Cheating or erroneous hosts can influence PFC^mean^(V) to
+   some extent.
+   This is limited by the Sanity Check mechanism,
+   and by the fact that only validated jobs are used.
+   The effect on credit will be negated by host normalization
+   (see below).
+   There may be an effect on cross-version normalization.
+   This could be eliminated by computing PFC^mean^(V)
+   as the sample-median value of PFC^mean^(H, V) (see below).
+== Host normalization ==
+The second normalization is across hosts.
+Assume jobs for a given app are distributed uniformly among hosts.
+Then the average credit per job should be the same for all hosts.
+To ensure this, for each app version V and host H
+we maintain PFC^mean^(H, V),
+the average of PFC(J)/E(J) for jobs completed by H using V.
+This yields the host scaling factor
+ Scale(H) = (PFC^mean^(V)/PFC^mean^(H, V))
+There are some cases where hosts are not sent jobs uniformly:
+ * job-size matching (smaller jobs sent to slower hosts)
+ * GPUGrid.net's scheme for sending some (presumably larger)
+   jobs to GPUs with more processors.
+The normalization by E(J) handles this
+(assuming that wu.rsc_fpops_est is set appropriately).
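A small worked example of the host scaling factor, with a hypothetical function name. Both arguments are averages of PFC(J)/E(J): one over all hosts running the app version, one for the single host.

```python
def host_scale(pfc_mean_version, pfc_mean_host):
    """Scale(H): version-wide average divided by this host's average."""
    if pfc_mean_host <= 0:
        raise ValueError("host average must be positive")
    return pfc_mean_version / pfc_mean_host


# A host whose normalized PFC averages twice the version-wide figure
# has its claims halved; one that averages half has them doubled.
print(host_scale(2.0, 4.0))
print(host_scale(2.0, 1.0))
```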
+Notes:
+ * For some apps, the host normalization mechanism is prone to
+   a type of cheating called "cherry picking".
+   A mechanism for defeating this is described below.
+ * The host normalization mechanism reduces the claimed credit of hosts
+   that are less efficient than average,
+   and increases the claimed credit of hosts that are more efficient
+   than average.
+== Computing averages ==
+Computation of averages needs to take into account:
+ * The quantities being averaged may gradually change over time
+   (e.g. average job size may change)
+   and we need to track this.
+   This is done as follows: for the first N samples
+   (N = ~100 for app versions, ~10 for hosts)
+   we take the straight average.
+   After that we use an exponential average
+   (with an appropriate alpha for app versions and hosts).
+ * A given sample may be wildly off,
+   and we can't let this mess up the average.
+   Non-first samples are capped at 10 times the current average.
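The averaging scheme described above might look like the following. This is a sketch under stated assumptions, not the code in BOINC's `average.h`: the class name, the parameter names, and the choice to apply the cap before updating are all illustrative.

```python
class RunningAverage:
    """Straight average for the first n_straight samples, then exponential."""

    def __init__(self, n_straight, alpha, cap_factor=10.0):
        self.n_straight = n_straight  # e.g. ~100 for app versions, ~10 for hosts
        self.alpha = alpha            # weight of a new sample in the exponential phase
        self.cap_factor = cap_factor  # non-first samples are capped at 10x the average
        self.n = 0
        self.avg = 0.0

    def update(self, sample):
        if self.n > 0:
            # Limit the influence of a wildly-off sample.
            sample = min(sample, self.cap_factor * self.avg)
        self.n += 1
        if self.n <= self.n_straight:
            # Incremental form of the arithmetic mean.
            self.avg += (sample - self.avg) / self.n
        else:
            # Exponential average tracks gradual drift.
            self.avg += self.alpha * (sample - self.avg)
        return self.avg
```

The cap only bounds samples from above; a sample far below the average is taken as is in this sketch.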
+== Anonymous platform ==
+For anonymous platform apps,
+since we don't reliably know anything about the devices involved,
+we don't try to estimate PFC.
+For each app, we maintain min_avg_pfc(A),
+the average PFC for the most efficient version of A.
+The claimed credit for anonymous platform jobs is
+ claimed_credit^mean^(A)*E(J)
+The server maintains host_app_version records for anonymous platform,
+and it keeps track of elapsed time statistics there.
+These have app_version_id = -2 for CPU, -3 for NVIDIA GPU, -4 for ATI.
+== Claimed and granted credit ==
+The '''claimed FLOPS''' for a given job J is
+ F = PFC(J) * Scale(V) * Scale(H)
+and the claimed credit (in Cobblestones) is
+ C = F*100/86400e9
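The two formulas above translate directly into code. The function name is hypothetical. By the arithmetic of the second formula, a job whose claimed FLOPS equal one day at 1e9 FLOPS per second (86400e9) claims 100 Cobblestones.

```python
def claimed_credit(pfc, scale_version, scale_host):
    """Claimed credit in Cobblestones for one job."""
    claimed_flops = pfc * scale_version * scale_host  # F
    return claimed_flops * 100 / 86400e9              # C


# One day at 1e9 FLOPS per second, with both scale factors equal to 1.
print(claimed_credit(86400e9, 1.0, 1.0))
```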
+When replication is used,
+we take the set of hosts that
+are not anon platform and not on scale probation (see below).
+If this set is nonempty, we grant the average of their claimed credit.
+Otherwise we grant
+ claimed_credit^mean^(A)*E(J)
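A minimal sketch of the grant rule for replicated jobs. The record layout (a dict per replica with hypothetical keys `claimed_credit`, `anon_platform`, and `scale_probation`) and the function name are assumptions made for illustration.

```python
from statistics import mean


def granted_credit(replicas, claimed_credit_mean_app, e_j):
    """Credit granted to every replica of a job.

    claimed_credit_mean_app stands for claimed_credit^mean^(A),
    and e_j for E(J).
    """
    eligible = [r["claimed_credit"] for r in replicas
                if not r["anon_platform"] and not r["scale_probation"]]
    if eligible:
        # Average the claims of the trusted, non-anonymous hosts.
        return mean(eligible)
    # No usable claims: fall back to the app-wide average times job size.
    return claimed_credit_mean_app * e_j
```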
+== Cross-project version normalization ==
 If an application has both CPU and GPU versions,
 …
 Projects will export the following data:
+{{{
 for each app version
+ for each app version
    app name
    platform name
 …
    plan class
    scale factor
-}}}
 The BOINC server will collect these from several projects
 and will export the following:
+{{{
 for each plan class
+ for each plan class
    average scale factor (weighted by RAC)
+}}}
 We'll provide a script that identifies app versions
 for GPUs with no corresponding CPU app version,
 …
  * The "average scaling factor" is weighted by work done.
-== Host normalization ==
-The second normalization is across hosts.
-Assume jobs for a given app are distributed uniformly among hosts.
-Then the average credit per job should be the same for all hosts.
-To ensure this, for each app version V and host H
-we maintain PFC^mean^(H, A),
-the average of PFC(J)/E(J) for jobs completed by H using A.
-This yields the host scaling factor
- Scale(H) = (PFC^mean^(V)/PFC^mean^(H, A))
-There are some cases where hosts are not sent jobs uniformly:
- * job-size matching (smaller jobs sent to slower hosts)
- * GPUGrid.net's scheme for sending some (presumably larger)
-   jobs to GPUs with more processors.
-The normalization by E(J) handles this
-(assuming that wu.fpops_est is set appropriately).
-Notes:
- * The host normalization mechanism reduces the claimed credit of hosts
-   that are less efficient than average,
-   and increases the claimed credit of hosts that are more efficient
-   than average.
-== Claimed credit ==
-The '''claimed FLOPS''' for a given job J is then
- F = PFC(J) * S(V) * S(H)
-and the claimed credit (in Cobblestones) is
- C = F*100/86400e9
-== Computing averages ==
-We need to compute averages carefully because
- * The quantities being averaged may gradually change over time
-   (e.g. average job size may change)
-   and we need to track this.
- * A given sample may be wildly off,
-   and we can't let this mess up the average.
-The code that does this is
-[http://boinc.berkeley.edu/trac/browser/trunk/boinc/lib/average.h here].
-== Anonymous platform ==
-For anonymous platform apps,
-since we don't reliably know anything about the devices involved,
-we don't try to estimate PFC.
-For each app, we maintain min_avg_pfc(A),
-the average PFC for the most efficient version of A.
-The claimed credit for anonymous platform jobs is
- claimed_credit^mean^(A)*E(J)
-The server maintains host_app_version records for anonymous platform,
-and it keeps track of elapsed time statistics there.
-These have app_version_id = -2 for CPU, -3 for NVIDIA GPU, -4 for ATI.
-== Replication ==
-We take the set of hosts that
-are not anon platform and not on scale probation (see below).
-If this set is nonempty, we grant the average of their claimed credit.
-Otherwise we grant
- claimed_credit^mean^(A)*E(J)
 == Cheat prevention ==
 …
 doesn't deal effectively with cherry picking,
-We propose the following mechanism to deal with cherry picking:
+The following mechanism deals with cherry picking:
  * For each (host, app version) maintain "host_scale_time".
 …
 Because this mechanism is punitive to hosts
 that experience actual failures,
-we'll make it selectable on a per-application basis (default off).
+it's selectable on a per-application basis (default off).
 In addition, to limit the extent of cheating
 (in case the above mechanism is defeated somehow)
-the host scaling factor will be min'd with a constant (say, 3).
+the host scaling factor will be min'd with a constant (say, 10).
 == Error rate, host punishment, and turnaround time estimation ==