Changes between Version 14 and Version 15 of CreditNew

Timestamp: Nov 16, 2009, 10:57:11 AM (14 years ago)
Author: davea

= New credit system design =

== Peak FLOPS and efficiency ==

BOINC estimates the peak FLOPS of each processor.
For CPUs, this is the Whetstone benchmark score.
For GPUs, it's given by a manufacturer-supplied formula.

However, other factors affect application performance.
For example, applications access memory,
and the speed of a host's memory system is not reflected
in its Whetstone score.
So a given job might take the same amount of CPU time
on a 1 GFLOPS host as on a 10 GFLOPS host.
The "efficiency" of an application running on a given host
is the ratio of actual FLOPS to peak FLOPS.

GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
However, application efficiency is typically lower
(very roughly, 10% for GPUs, 50% for CPUs).

Notes:

 * The peak FLOPS of a device is single or double precision,
   whichever is higher.
   Differentiating between single and double precision would unnecessarily
   complicate things, and the distinction will disappear soon anyway.

== Credit system goals ==

Some possible goals in designing a credit system:

 * Device neutrality: similar jobs should get similar credit
   regardless of what processor or GPU they run on.

 * Project neutrality: different projects should grant
   about the same amount of credit per day for a given processor.

It's easy to show that both goals can't be satisfied simultaneously.

== The first credit system ==

In the first iteration of BOINC's credit system,
"claimed credit" was defined as
{{{
C1 = H.whetstone * J.cpu_time
}}}
There were then various schemes for taking the
average or min claimed credit of the replicas of a job,
and using that as the "granted credit".

We call this system "Peak-FLOPS-based" because
it's based on the CPU's peak performance.

The problem with this system is that, for a given app version,
efficiency can vary widely between hosts.
In the above example,
the 10 GFLOPS host would claim 10X as much credit,
and its owner would be upset when it was granted only a tenth of that.

Furthermore, the credits granted to a given host for a
series of identical jobs could vary widely,
depending on the host it was paired with by replication.
This seemed arbitrary and unfair to users.

== The second credit system ==

We then switched to the philosophy that
credit should be proportional to the number of FLOPs actually performed
by the application.
We added API calls to let applications report this.
We call this approach "Actual-FLOPs-based".

SETI@home's application allowed counting of FLOPs,
and they adopted this system,
adding a scaling factor so that average credit per job
was the same as in the first credit system.

Not all projects could count FLOPs, however.
So SETI@home published their average credit per CPU second,
and other projects continued to use benchmark-based credit,
but multiplied it by a scaling factor to match SETI@home's average.

This system had several problems:

 * It didn't address GPUs.
 * Projects that couldn't count FLOPs still had device neutrality problems.
 * It didn't prevent credit cheating when single replication was used.

== Goals of the new (third) credit system ==

 * Completely automated - projects don't have to
   change code, settings, etc.

 * Device neutrality

 * Limited project neutrality: different projects should grant
   about the same amount of credit per CPU hour, averaged over hosts.
   Projects with GPU apps should grant credit in proportion
   to the efficiency of the apps.
   (This means that projects with efficient GPU apps will
   grant more credit on average.  That's OK.)

== Peak FLOP Count (PFC) ==

This system uses the Peak-FLOPS-based approach,
but addresses its problems in a new way.

When a job is issued to a host, the scheduler specifies usage(J,D),
J's usage of processing resource D:
how many CPUs and how many GPUs (possibly fractional).

If the job is finished in elapsed time T,
we define peak_flop_count(J), or PFC(J), as
{{{
PFC(J) = T * (sum over devices D of (usage(J, D) * peak_flop_rate(D)))
}}}

Notes:

 * We use elapsed time instead of actual device time (e.g., CPU time).
   If a job uses a resource inefficiently
   (e.g., a CPU job that does lots of disk I/O),
   PFC() won't reflect this.  That's OK.
   The key thing is that BOINC reserved the device for the job,
   whether or not the job used it efficiently.
 * usage(J,D) may not be accurate; e.g., a GPU job may take
   more or less CPU than the scheduler thinks it will.
   Eventually we may switch to a scheme where the client
   dynamically determines the CPU usage.
   For now, though, we'll just use the scheduler's estimate.

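As a sketch of the definition above (the function and variable names here are illustrative, not BOINC's actual code), PFC(J) could be computed like this:

```python
# Sketch of peak_flop_count(J); names are illustrative, not BOINC's code.

def peak_flop_count(elapsed_time, usage, peak_flop_rate):
    """elapsed_time: seconds; usage: dict mapping device -> fraction of
    that device the scheduler assigned to the job; peak_flop_rate: dict
    mapping device -> peak FLOPS of that device."""
    return elapsed_time * sum(
        usage[d] * peak_flop_rate[d] for d in usage
    )

# A job that ran for 100 s using 0.5 of a 5 GFLOPS CPU
# and 1.0 of a 500 GFLOPS GPU:
pfc = peak_flop_count(
    100.0,
    {"cpu": 0.5, "gpu": 1.0},
    {"cpu": 5e9, "gpu": 500e9},
)
# 100 * (0.5 * 5e9 + 1.0 * 500e9) = 5.025e13
```

Note that the GPU term dominates: PFC deliberately reflects what was reserved, not what was used efficiently.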
The granted credit for a job J is proportional to PFC(J),
but is normalized in the following ways:

== Cross-version normalization ==

If a given application has multiple versions (e.g., CPU and GPU versions),
the granted credit per job is adjusted
so that the average is the same for each version.
The adjustment is always downwards:
we maintain the average PFC^mean^(V) of PFC() for each app version V,
and let X be the minimum of these averages.
An app version V's jobs are then scaled by the factor

 S(V) = (X/PFC^mean^(V))

The result for a given job J
is called "Version-Normalized Peak FLOP Count", or VNPFC(J):

 VNPFC(J) = PFC(J) * (X/PFC^mean^(V))

Notes:
 * This addresses the common situation
   where an app's GPU version is much less efficient than the CPU version
   (i.e., the ratio of actual FLOPs to peak FLOPs is much lower).
   To a certain extent, this mechanism shifts the system
   towards the "Actual FLOPs" philosophy,
   since credit is granted based on the most efficient app version.
   It's not exactly "Actual FLOPs", since the most efficient
   version may not be 100% efficient.
 * There are two sources of variance in PFC(V):
   the variation in host efficiency,
   and possibly the variation in job size.
   If we have an ''a priori'' estimate of job size
   (e.g., workunit.rsc_fpops_est)
   we can normalize by it to reduce the variance,
   and make PFC^mean^(V) converge more quickly.
 * ''A posteriori'' estimates of job size may exist also
   (e.g., an iteration count reported by the app),
   but using them for anything introduces a new cheating risk,
   so it's probably better not to.

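A minimal sketch of the scaling above (names are illustrative; in practice the per-version means would come from the database):

```python
# Sketch of cross-version normalization; names are illustrative.

def version_scaling_factors(pfc_mean):
    """pfc_mean: dict mapping app version -> PFC^mean(V).
    Returns S(V) = X / PFC^mean(V), where X is the minimum of the
    means; the adjustment is always downward (S(V) <= 1)."""
    x = min(pfc_mean.values())
    return {v: x / m for v, m in pfc_mean.items()}

# The CPU version averages 1e12 FLOPs/job; the GPU version, being
# less efficient, averages 5e12:
s = version_scaling_factors({"cpu": 1e12, "gpu": 5e12})
# s == {"cpu": 1.0, "gpu": 0.2}

# VNPFC(J) for a GPU job with PFC(J) = 6e12:
vnpfc_j = 6e12 * s["gpu"]
```

Because X is the minimum mean, the most efficient version's jobs are unscaled and every other version is scaled down, matching the "adjustment is always downwards" rule.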
== Cross-project normalization ==

If an application has both CPU and GPU versions,
then the version normalization mechanism uses the CPU
version as a "sanity check" to limit the credit granted to GPU jobs.

Suppose a project has an app with only a GPU version,
so there's no CPU version to act as a sanity check.
If we grant credit based only on GPU peak speed,
the project will grant much more credit per GPU hour than other projects,
violating limited project neutrality.

A solution to this: if an app has only GPU versions,
then for each version V we let
S(V) be the average scaling factor
for that GPU type among projects that do have both CPU and GPU versions.
This factor is obtained from a central BOINC server.
V's jobs are then scaled by S(V) as above.

Notes:

 * Projects will run a periodic script to update the scaling factors.
 * Rather than GPU type, we'll probably use plan class,
   since e.g. the average efficiency of CUDA 2.3 apps may be different
   from that of CUDA 2.1 apps.
 * Initially we'll obtain scaling factors from large projects
   that have both GPU and CPU apps (e.g., SETI@home).
   Eventually we'll use an average (weighted by work done) over multiple projects
   (see below).

== Host normalization ==

Assuming that hosts are sent jobs for a given app uniformly,
then, for that app,
hosts should get the same average granted credit per job.
To ensure this, for each application A we maintain the average VNPFC^mean^(A),
and for each host H we maintain VNPFC^mean^(H, A).
The '''claimed FLOPS''' for a given job J is then

 F = VNPFC(J) * (VNPFC^mean^(A)/VNPFC^mean^(H, A))

and the claimed credit (in Cobblestones) is

 C = F*100/86400e9

There are some cases where hosts are not sent jobs uniformly:
 * job-size matching (smaller jobs sent to slower hosts)
 * GPUGrid.net's scheme for sending some (presumably larger)
   jobs to GPUs with more processors.
In these cases average credit per job must differ between hosts,
according to the types of jobs that are sent to them.

This can be done by dividing
each sample in the computation of VNPFC^mean^ by WU.rsc_fpops_est
(in fact, there's no reason not to always do this).

Notes:
 * The host normalization mechanism reduces the claimed credit of hosts
   that are less efficient than average,
   and increases the claimed credit of hosts that are more efficient
   than average.
 * VNPFC^mean^ is averaged over jobs, not hosts.

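A sketch of the two formulas above (names are illustrative; the constant is taken from the C = F*100/86400e9 formula, i.e. 100 Cobblestones per day of 1 GFLOPS):

```python
# Sketch of host normalization and the Cobblestone conversion;
# names are illustrative.

COBBLESTONE_SCALE = 100 / 86400e9  # 100 credits per day of 1e9 FLOPS

def claimed_credit(vnpfc_j, vnpfc_mean_app, vnpfc_mean_host_app):
    """Scale the job's VNPFC by the app-wide mean over this host's
    mean (scaling down hosts that are less efficient than average),
    then convert FLOPs to Cobblestones."""
    f = vnpfc_j * (vnpfc_mean_app / vnpfc_mean_host_app)
    return f * COBBLESTONE_SCALE

# A host exactly at the app mean, for a job whose VNPFC equals one
# day of 1 GFLOPS peak work, claims 100 credits:
c = claimed_credit(86400e9, 1.0, 1.0)
```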
== Computing averages ==

We need to compute averages carefully because

 * The quantities being averaged may gradually change over time
   (e.g. average job size may change,
   app version efficiency may change as new versions are deployed)
   and we need to track this.
 * A given sample may be wildly off,
   and we can't let this mess up the average.
 * Averages should be weighted by job size.

In addition, we may as well maintain the variance of the quantities,
although the current system doesn't use it.

The code that does all this is
[http://boinc.berkeley.edu/trac/browser/trunk/boinc/lib/average.h here].

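The real logic lives in the average.h linked above; as a rough sketch only (the actual implementation differs), an exponentially-weighted average with per-sample weights meets the three requirements: it tracks drift, bounds the influence of a wildly-off sample, and supports job-size weighting:

```python
# Rough sketch of an exponentially-weighted average in the spirit of
# lib/average.h; the real implementation differs.

class ExpAvg:
    def __init__(self, alpha=0.1):
        self.alpha = alpha  # max fraction a sample can move the average
        self.n = 0
        self.avg = 0.0

    def update(self, sample, weight=1.0):
        """weight lets samples be weighted, e.g. by job size."""
        self.n += 1
        if self.n == 1:
            self.avg = sample
        else:
            a = min(self.alpha * weight, 1.0)
            self.avg += a * (sample - self.avg)

avg = ExpAvg(alpha=0.1)
for s in [10.0, 10.0, 1000.0, 10.0]:  # one wildly-off sample
    avg.update(s)
# each sample moves the average by at most alpha of the gap, and the
# average keeps tracking gradual change in the quantity
```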
== Cross-project scaling factors ==

We'll have a script that publishes a project's
accounting data (see Implementation).
The BOINC web site will collect these from a few big projects
and publish the averages.

== Replication and cheating ==

Host normalization mostly eliminates the incentive to cheat
by claiming excessive credit
(i.e., by falsifying benchmark scores or elapsed time).
An exaggerated claim will increase VNPFC^mean^(H, A),
causing subsequent claimed credit to be scaled down proportionately.
This means that no special cheat-prevention scheme
is needed for single replication;
granted credit = claimed credit.

For jobs that are replicated, granted credit should be
set to the min of the valid results
(min is used instead of average to remove the incentive
for cherry-picking; see below).

However, there are still some possible forms of cheating.

 * One-time cheats (like claiming 1e304) can be prevented by
   capping VNPFC(J) at some multiple (say, 10) of VNPFC^mean^(A).
 * Cherry-picking: suppose an application has two types of jobs,
   which run for 1 second and 1 hour respectively.
   Clients can figure out which is which, e.g. by running a job for 2 seconds
   and seeing whether it has exited.
   Suppose a client systematically refuses the 1-hour jobs
   (e.g., by reporting a crash or never reporting them).
   Its VNPFC^mean^(H, A) will quickly decrease,
   and soon it will be getting several thousand times more credit
   per unit of actual work than other hosts!
   Countermeasure:
   whenever a job errors out, times out, or fails to validate,
   set the host's error rate back to the initial default,
   and set its VNPFC^mean^(H, A) to VNPFC^mean^(A) for all apps A.
   This puts the host into a state where several dozen of its
   subsequent jobs will be replicated.

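The one-time-cheat cap in the first bullet could be sketched as follows (names are illustrative; the multiple of 10 is the example value from the text):

```python
# Sketch of the one-time-cheat cap; names are illustrative.

CAP_MULTIPLE = 10  # the "some multiple (say, 10)" from the text

def capped_vnpfc(vnpfc_j, vnpfc_mean_app):
    """Cap a job's claimed VNPFC at a multiple of the app's mean, so
    a claim like 1e304 can't poison the averages or grant huge
    credit."""
    return min(vnpfc_j, CAP_MULTIPLE * vnpfc_mean_app)

# An absurd claim is cut to 10x the app mean:
# capped_vnpfc(1e304, 1e12) == 1e13
# Normal claims pass through unchanged:
# capped_vnpfc(5e11, 1e12) == 5e11
```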
== Trickle credit ==


== Job runtime estimates ==

Unrelated to the credit proposal, but in a similar spirit.
The server will maintain ET^mean^(H, V), the statistics of
job runtimes (normalized by wu.rsc_fpops_est) per
host and application version.

The server's estimate of a job's runtime is then

 R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)

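A sketch of this estimate (names are illustrative; ET^mean^(H, V) is the mean of elapsed time divided by wu.rsc_fpops_est over this host's jobs for version V):

```python
# Sketch of the job runtime estimate R(J, H); names are illustrative.

def estimated_runtime(rsc_fpops_est, et_mean):
    """et_mean: mean of (elapsed time / wu.rsc_fpops_est) for this
    (host, app version); multiplying by the new job's fpops estimate
    yields a predicted elapsed time in seconds."""
    return rsc_fpops_est * et_mean

# If past jobs took about 1e-9 s per estimated FLOP on this host,
# a 2e12-FLOP job is predicted to take about 2000 s:
r = estimated_runtime(2e12, 1e-9)
```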
== Error rate, host punishment, and turnaround time estimation ==

Unrelated to the credit proposal, but in a similar spirit.

Due to hardware problems (e.g. a malfunctioning GPU),
a host may have a 100% error rate for one app version
and a 0% error rate for another.
The same holds for turnaround time.

The host punishment mechanism is designed to
deal with malfunctioning hosts.
For each host the server maintains '''max_results_day'''.
This is initialized to a project-specified value (e.g. 200)
and scaled by the number of CPUs and/or GPUs.
It's decremented if the client reports a crash
(but not if the job was aborted).
It's doubled when a successful (but not necessarily valid)
result is received.

So we'll move the "error_rate", "turnaround_time"
and "max_results_day" fields from the host table to host_app_version.

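A sketch of this mechanism, kept per (host, app version) as proposed (names are illustrative; the cap at the initial quota is an assumption, since the text specifies only the decrement and the doubling):

```python
# Sketch of the max_results_day mechanism; names are illustrative,
# and the cap at the initial quota is an assumption.

class HostAppVersionQuota:
    def __init__(self, initial_quota=200, ndevices=1):
        # initialized to a project-specified value, scaled by the
        # number of CPUs and/or GPUs
        self.initial = initial_quota * ndevices
        self.max_results_day = self.initial

    def on_crash(self):
        # decremented when the client reports a crash
        # (a user-requested abort would not call this)
        self.max_results_day = max(1, self.max_results_day - 1)

    def on_success(self):
        # doubled when a successful (not necessarily valid) result
        # arrives; capped at the initial quota (assumption)
        self.max_results_day = min(self.initial, self.max_results_day * 2)
```

A malfunctioning host thus ratchets down toward a tiny daily quota, while a single success lets a healthy host recover quickly.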
== Cherry picking ==

Suppose an application has a mix of long and short jobs.
If a client intentionally discards
(or aborts, or reports errors from) the long jobs,
but completes the short jobs,
its host scaling factor will become large,
and it will get excessive credit for the short jobs.
This is called "cherry picking".

Note: the host punishment mechanism
doesn't deal effectively with cherry picking.

We propose the following mechanism to deal with cherry picking:

 * For each (host, app version), maintain "host_scale_time".
   This is the earliest time at which host scaling will be applied.
   It is set initially, and each time a job times out
   or errors out, to now+X, where X is the app's delay bound.

The idea is to apply the host scaling factor
only if there's solid evidence that the host is NOT cherry picking.

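The proposed mechanism could be sketched as (names are illustrative):

```python
# Sketch of the proposed host_scale_time defense against cherry
# picking; names are illustrative.

from types import SimpleNamespace

def on_job_failure(hav, now, delay_bound):
    """On a timeout or error, push host_scale_time to now + X,
    where X is the app's delay bound."""
    hav.host_scale_time = now + delay_bound

def scaling_allowed(hav, now):
    # apply the host scaling factor only once the host has gone a
    # full delay bound without a job timing out or erroring out
    return now >= hav.host_scale_time

hav = SimpleNamespace(host_scale_time=0.0)
on_job_failure(hav, now=1000.0, delay_bound=86400.0)
# scaling is deferred until t = 87400
```

Since a cherry picker must keep failing long jobs, its scaling window keeps being pushed into the future and the favorable host scaling factor is never applied.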
== Implementation ==

=== Database changes ===

New table '''host_app''':
{{{
int    host_id;
int    app_id;
int    vnpfc_n;
double vnpfc_sum;
double vnpfc_exp_avg;
}}}

New table '''host_app_version''':
{{{
int    host_id;
int    app_version_id;
int    et_n;
double et_sum;
double et_exp_avg;
// some variable for recent error rate,
// replacing host.error_rate and host.max_results_day
// make sure that a host w/ 1 good and 1 bad GPU gets few GPU jobs
}}}

New fields in '''app_version''':
{{{
int    pfc_n;
double pfc_sum;
double pfc_exp_avg;
double pfc_scaling_factor;
}}}

New fields in '''app''':
{{{
int    vnpfc_n;
double vnpfc_sum;
double vnpfc_exp_avg;
}}}

=== New request message fields ===

=== New reply message fields ===

=== Scheduler changes ===

=== Client changes ===

=== Validator changes ===

=== Server APIs for computing and granting credit ===

== Compatibility ==