Changes between Version 30 and Version 31 of CreditNew


Timestamp:
Mar 26, 2010, 1:36:22 PM
Author:
davea
Comment:

--

  • CreditNew

    v30 v31  
    100100== ''A priori'' job size estimates and bounds ==
    101101
    102 Projects supply estimates of the FLOPs used by a job
    103 (wu.rsc_fpops_est)
    104 and a limit on FLOPS, after which the job will be aborted
    105 (wu.rsc_fpops_bound).
    106 
    107 Previously, inaccuracy of rsc_fpops_est caused problems.
    108 The new system still uses rsc_fpops_est,
    109 but its primary purpose is now to indicate the relative size of jobs.
    110 Averages of job sizes are normalized by rsc_fpops_est,
    111 and if rsc_fpops_est is correlated with actual size,
     102For each job, the project supplies
     103 * an estimate of the FLOPs used by the job (wu.fpops_est)
     104 * a limit on FLOPs, after which the job will be aborted
     105  (wu.fpops_bound).
     106
     107Previously, inaccuracy of fpops_est caused problems.
     108The new system still uses fpops_est,
     109but its primary purpose is now to indicate the relative sizes of jobs.
     110
     111Averages of FLOP count and elapsed time
     112are normalized by fpops_est (see below),
     113and if fpops_est is correlated with actual size,
    112114these averages will converge more quickly.
    113 
    114 We'll denote workunit.rsc_fpops_est as E(J).
    115115
    116116Notes:
     
    129129based on the resources used by the job and their peak speeds.
    130130
    131 If the job is finished in elapsed time T,
     131When the job is finished in elapsed time T,
    132132we define peak_flop_count(J), or PFC(J) as
    133133
     
    136136Notes:
    137137
    138  * PFC(J) is not cheat-proof; e.g. cheaters can falsify elapsed time.
     138 * PFC(J) is not cheat-proof;
     139   cheaters can falsify elapsed time or device attributes.
    139140 * We use elapsed time instead of actual device time (e.g., CPU time).
    140141   If a job uses a resource inefficiently
     
    156157but is limited and normalized in the following ways:
    157158
     159== Computing averages ==
     160
     161The policies described below involve computing averages
     162of various quantities.
     163This computation must take into account:
     164
     165 * The quantities being averaged may gradually change over time
     166   (e.g. average job size may change)
     167   and we need to track this.
     168   This is done as follows: for the first N samples
     169   (N = ~100 for app versions, ~10 for hosts)
     170   we take the straight average.
     171   After that we use an exponentially-weighted average
     172   (with an appropriate parameter for app version and host).
     173
     174 * A given sample may be wildly off,
     175   and we can't let this mess up the average.
     176   Samples after the first are capped at 10 times the current average.
     177
     178 * We keep track of the number of samples,
     179   and use an average only if its number of samples
     180   is above a '''sample threshold'''.
     181
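The averaging rules above can be sketched as follows. This is a minimal illustration, not the server's actual code: the class name `RunningAverage`, the field names, the `weight` parameter of the exponential phase, and the reading of "above a sample threshold" as a strict comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class RunningAverage:
    """Hypothetical running average: warm-up phase, outlier cap, sample count."""
    n_warmup: int          # N: ~100 for app versions, ~10 for hosts
    weight: float          # assumed weight of a new sample in the exponential phase
    sample_threshold: int  # assumed minimum sample count before the average is used
    n: int = 0
    avg: float = 0.0

    def update(self, sample: float) -> None:
        # Samples after the first are capped at 10 times the current average.
        if self.n > 0:
            sample = min(sample, 10.0 * self.avg)
        self.n += 1
        if self.n <= self.n_warmup:
            # Straight (arithmetic) average over the first N samples.
            self.avg += (sample - self.avg) / self.n
        else:
            # Exponentially-weighted average afterwards.
            self.avg += self.weight * (sample - self.avg)

    def usable(self) -> bool:
        # Assumption: "above" the threshold means strictly greater.
        return self.n > self.sample_threshold


avg = RunningAverage(n_warmup=3, weight=0.5, sample_threshold=2)
for s in (2.0, 4.0, 1000.0, 5.0):
    avg.update(s)
```

In this run the outlier 1000.0 is capped at 30.0 (10 times the average of 3.0 at that point), so it shifts the average far less than it otherwise would.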
     182== Data ==
     183
     184We maintain the following estimates:
     185
     186 app.min_avg_pfc:: an estimate of the average actual FLOPS for an app
     187   (normalized by wu.fpops_est)
     188 app_version.pfc_avg:: the average of PFC(J)/wu.fpops_est for an app version.
     189 host_app_version.pfc_avg:: for each app version V and host H,
     190   the average of PFC(J)/wu.fpops_est for jobs completed by H using V.
     191
    158192== Sanity check ==
    159193
    160 If PFC(J) is infinite or is > wu.rsc_fpops_bound,
     194If PFC(J) is infinite or is > wu.fpops_bound,
    161195J is assigned a "default PFC" and other processing is skipped.
    162196Default PFC is determined as follows:
    163197
    164  * If min_avg_pfc(A) is defined (see below) then
    165 
    166  D = min_avg_pfc(A) * E(J)
     198 * If app.min_avg_pfc is defined then
     199
     200 D = app.min_avg_pfc * wu.fpops_est
    167201
    168202 * Otherwise
    169203
    170  D = wu.rsc_fpops_est
     204 D = wu.fpops_est
    171205
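A minimal sketch of the sanity check and default PFC, under the assumption that an undefined app.min_avg_pfc is represented as `None`. The function names and the returned `used_default` flag are hypothetical; the flag stands in for "other processing is skipped".

```python
import math
from typing import Optional, Tuple


def default_pfc(fpops_est: float, min_avg_pfc: Optional[float]) -> float:
    """Default PFC D, following the two cases listed above."""
    if min_avg_pfc is not None:
        return min_avg_pfc * fpops_est
    return fpops_est


def sanity_check(pfc: float, fpops_est: float, fpops_bound: float,
                 min_avg_pfc: Optional[float]) -> Tuple[float, bool]:
    """Return (PFC to use, used_default)."""
    if math.isinf(pfc) or pfc > fpops_bound:
        # The caller would skip all further processing for this job.
        return default_pfc(fpops_est, min_avg_pfc), True
    return pfc, False


# A job whose PFC exceeds the bound falls back to the default.
value, used_default = sanity_check(2e13, 1e12, 1e13, 0.5)
```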
    172206== Cross-version normalization ==
     
    179213so that the average is the same for each version.
    180214
    181 We maintain the average PFC^mean^(V) of PFC(J)/E(J) for each app version V.
    182 We periodically compute PFC^mean^(CPU) and PFC^mean^(GPU),
    183 and compute X as follows:
     215For each app, we periodically compute cpu_pfc
     216(the weighted average of app_version.pfc_avg over CPU app versions)
     217and similarly gpu_pfc.
     218We then compute X as follows:
    184219
    185220 * If there are only CPU or only GPU versions,
    186    and at least 2 versions are above a sample threshold,
    187    X is the average.
    188 
    189  * If there are both, and at least 1 of each is above a sample
     221   and at least 2 versions are above sample threshold,
     222   X is their average (weighted by # samples).
     223
     224 * If there are both, and at least 1 of each is above sample
    190225   threshold, let X be the min of the averages.
    191226
    192 If X is defined, then for each version V we set
    193 
    194  Scale(V) = (X/PFC^mean^(V))
    195 
    196 An app version V's jobs are scaled by this factor.
    197 
    198 For each app, we maintain min_avg_pfc(A),
    199 the average PFC for the most efficient version of A.
    200 This is an estimate of the app's average actual FLOPS.
     227If X is defined, then for each app version
     228
     229 app_version.pfc_scale = (X/app_version.pfc_avg)
     230
     231The PFC of the app version's jobs is scaled by this factor.
    201232
    202233If X is defined, then we set
    203234
    204  min_avg_pfc(A) = X
    205 
    206 Otherwise, if a version V is above sample threshold, we set
    207 
    208  min_avg_pfc(A) = PFC^mean^(V)
    209 
    210 Notes:
     235 app.min_avg_pfc = X
     236
     237Otherwise, if an app version is above sample threshold, we set
     238
     239 app.min_avg_pfc = app_version.pfc_avg
     240
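The rules for X, app_version.pfc_scale and app.min_avg_pfc can be sketched as below. Several details are assumptions rather than statements from the text: only versions above the sample threshold enter the averages, "above" means strictly greater, versions with a zero average are left unscaled, and when X is undefined the first version above threshold supplies app.min_avg_pfc. The `AppVersion` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AppVersion:
    is_gpu: bool
    pfc_avg: float        # average of PFC(J)/wu.fpops_est
    n_samples: int
    pfc_scale: float = 1.0


def weighted_avg(versions: List[AppVersion]) -> float:
    """Average of pfc_avg weighted by number of samples."""
    total = sum(v.n_samples for v in versions)
    return sum(v.pfc_avg * v.n_samples for v in versions) / total


def normalize_versions(versions: List[AppVersion],
                       threshold: int) -> Optional[float]:
    """Set pfc_scale where X is defined; return app.min_avg_pfc or None."""
    cpu_ok = [v for v in versions if not v.is_gpu and v.n_samples > threshold]
    gpu_ok = [v for v in versions if v.is_gpu and v.n_samples > threshold]
    has_cpu = any(not v.is_gpu for v in versions)
    has_gpu = any(v.is_gpu for v in versions)

    x = None
    if has_cpu and has_gpu:
        # Both resource types: need at least one of each above threshold.
        if cpu_ok and gpu_ok:
            x = min(weighted_avg(cpu_ok), weighted_avg(gpu_ok))
    elif len(cpu_ok) + len(gpu_ok) >= 2:
        # Only one resource type: need at least two versions above threshold.
        x = weighted_avg(cpu_ok + gpu_ok)

    if x is not None:
        for v in versions:
            if v.pfc_avg > 0:  # assumption: skip versions with no data
                v.pfc_scale = x / v.pfc_avg
        return x
    above = cpu_ok + gpu_ok
    return above[0].pfc_avg if above else None


mixed = [AppVersion(False, 2.0, 200), AppVersion(True, 10.0, 300)]
min_avg_pfc = normalize_versions(mixed, threshold=100)
```

With one CPU version averaging 2.0 and one GPU version averaging 10.0, X is the minimum (2.0), so the GPU version's jobs are scaled down by a factor of 0.2 while the CPU version is left unchanged.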
     241Notes:
     242 * Doesn't host normalization (see below) subsume version normalization?
     243   Not if there are both CPU and GPU versions, because of the "min".
    211244 * Version normalization is only applied if at least two
    212245   versions are above sample threshold.
     
    237270Assume jobs for a given app are distributed uniformly among hosts.
    238271Then the average credit per job should be the same for all hosts.
    239 To ensure this, for each app version V and host H
    240 we maintain PFC^mean^(H, A),
    241 the average of PFC(J)/E(J) for jobs completed by H using A.
    242 
    243 This yields the host scaling factor
    244 
    245  Scale(H) = (PFC^mean^(V)/PFC^mean^(H, A))
     272
     273We scale PFC by the factor
     274
     275 app_version.pfc_avg / host_app_version.pfc_avg
    246276
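As a small worked example of the host scaling factor (the function name is hypothetical, and the sample-threshold check is omitted for brevity):

```python
def host_scaled_pfc(pfc: float, app_version_pfc_avg: float,
                    host_app_version_pfc_avg: float) -> float:
    """Scale a job's PFC by app_version.pfc_avg / host_app_version.pfc_avg."""
    return pfc * app_version_pfc_avg / host_app_version_pfc_avg


# A host whose normalized PFC averages twice the app version's average
# has the PFC of its jobs halved.
scaled = host_scaled_pfc(8e12, app_version_pfc_avg=2.0,
                         host_app_version_pfc_avg=4.0)
```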
    247277There are some cases where hosts are not sent jobs uniformly:
     
    251281   jobs to GPUs with more processors.
    252282
    253 The normalization by E(J) handles this
    254 (assuming that wu.fpops_est is set appropriately).
    255 
    256 Notes:
    257  * For some apps, the host normalization mechanism is prone to
     283The normalization by wu.fpops_est handles this.
     284
     285Notes:
     286 * For apps with large variance of job sizes,
     287   the host normalization mechanism is prone to
    258288   a type of cheating called "cherry picking".
    259289   A mechanism for defeating this is described below.
     
    262292   and increases the claimed credit of hosts that are more efficient
    263293   than average.
    264 
    265 == Computing averages ==
    266 
    267 Computation of averages needs to take into account:
    268 
    269  * The quantities being averaged may gradually change over time
    270    (e.g. average job size may change)
    271    and we need to track this.
    272    This done as follows: for the first N samples
    273    (N = ~100 for app versions, ~10 for hosts)
    274    we take the straight average.
    275    After that we use an exponential average
    276    (with appropriate alpha for app version and host)
    277 
    278  * A given sample may be wildly off,
    279    and we can't let this mess up the average.
    280    Non-first samples are capped at 10 times the current average.
    281294
    282295== Anonymous platform ==
     
    290303(-2 for CPU, -3 for NVIDIA GPU, -4 for ATI).
    291304
    292 If min_avg_pfc(A) is defined and
    293 PFC^mean^(H, V) is above a sample threshold,
     305If app.min_avg_pfc is defined and
     306host_app_version.pfc_avg is above sample threshold,
    294307we normalize PFC by the factor
    295308
    296  min_avg_pfc(A)/PFC^mean^(H, V)
     309 app.min_avg_pfc/host_app_version.pfc_avg
    297310
    298311Otherwise the claimed PFC is
    299312
    300  min_avg_pfc(A)*E(J)
    301 
    302 If min_avg_pfc(A) is not defined, the claimed PFC is
    303 
    304  wu.rsc_fpops_est
     313 app.min_avg_pfc*wu.fpops_est
     314
     315If app.min_avg_pfc is not defined, the claimed PFC is
     316
     317 wu.fpops_est
     318
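The three anonymous-platform cases above can be sketched as a single function. This is an illustration under assumptions: an undefined app.min_avg_pfc is passed as `None`, "above sample threshold" is a strict comparison on a sample count, and the returned flag mirrors the "approx" flag used in the summary. All names are hypothetical.

```python
from typing import Optional, Tuple


def anon_platform_claimed_pfc(pfc: float, fpops_est: float,
                              min_avg_pfc: Optional[float],
                              hav_pfc_avg: float, hav_n: int,
                              threshold: int) -> Tuple[float, bool]:
    """Return (claimed PFC, approx) for an anonymous-platform job."""
    if min_avg_pfc is None:
        # No app-level estimate: fall back to the project's size estimate.
        return fpops_est, True
    if hav_n > threshold:
        # Enough samples for this host and app version: normalize the PFC.
        return pfc * min_avg_pfc / hav_pfc_avg, False
    # App-level estimate exists, but the host average is not yet usable.
    return min_avg_pfc * fpops_est, True


claimed, approx = anon_platform_claimed_pfc(
    pfc=6e12, fpops_est=1e12, min_avg_pfc=0.5,
    hav_pfc_avg=3.0, hav_n=20, threshold=10)
```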
     319Notes:
     320
     321 * We don't assume that anonymous platform apps on
     322   different hosts but with the same platform and resource type
     323   are comparable.
    305324
    306325== Summary ==
     
    309328
    310329 * the "claimed PFC" F
    311  * a flag "approx" that is true if F
    312    is an approximation and may not be comparable
    313    with other instances of the job
     330 * a flag "approx" that is true if F is an approximation
     331   and may not be comparable with other instances of the job
    314332
    315333The algorithm:
     
    317335 pfc = peak FLOP count(J)
    318336 approx = true;
    319  if pfc > wu.rsc_fpops_bound
    320    if min_avg_pfc(A) is defined
    321      F = min_avg_pfc(A) * E(J)
     337 if pfc > wu.fpops_bound
     338   if app.min_avg_pfc is defined
     339     F = app.min_avg_pfc * wu.fpops_est
    322340   else
    323      F = wu.rsc_fpops_est
     341     F = wu.fpops_est
    324342 else
    325343   if job is anonymous platform
    326      hav = host_app_version record
    327          if min_avg_pfc(A) is defined
    328        if hav.pfc.n > threshold
     344         if app.min_avg_pfc is defined
     345       if host_app_version.pfc_avg is above sample threshold
    329346             approx = false
    330              F = min_avg_pfc(A) /hav.pfc.avg
     347             F = pfc * app.min_avg_pfc / host_app_version.pfc_avg
    331348           else
    332              F = min_avg_pfc(A) * E(J)
     349             F = app.min_avg_pfc * wu.fpops_est
    333350     else
    334            F = wu.rsc_fpops_est
     351           F = wu.fpops_est
    335352   else
    336353     F = pfc;
     
    344361The claimed credit of a job (in Cobblestones) is
    345362
    346  C = F* 200/86400e9
     363 C = F * 200/86400e9
    347364
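The conversion from claimed PFC to credit is a fixed rescaling: 86400e9 FLOPs (one day at 1e9 FLOPs per second) corresponds to 200 Cobblestones. A minimal sketch, with a hypothetical function name:

```python
def claimed_credit(f: float) -> float:
    """Convert claimed PFC F (in FLOPs) to credit in Cobblestones."""
    return f * 200 / 86400e9


# One day of computing at 1e9 FLOPs per second.
one_day = claimed_credit(86400e9)
```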
    348365If replication is not used, this is the granted credit.
     
    353370Otherwise:
    354371
    355  if min_avg_pfc(A) is defined
    356    C = min_avg_pfc(A)*E(J)
     372 if app.min_avg_pfc is defined
     373   C = app.min_avg_pfc*wu.fpops_est * 200/86400e9
    357374 else
    358    C = wu.rsc_fpops_est * 200/86400e9
     375   C = wu.fpops_est * 200/86400e9
    359376
    360377== Cross-project version normalization ==
     
    505522Unrelated to the credit proposal, but in a similar spirit.
    506523The server will maintain ET^mean^(H, V), the statistics of
    507 job runtimes (normalized by wu.rsc_fpops_est) per
     524job runtimes (normalized by wu.fpops_est) per
    508525host and application version.
    509526
    510527The server's estimate of a job's runtime is then
    511528
    512  R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)
     529 R(J, H) = wu.fpops_est * ET^mean^(H, V)
    513530
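The runtime estimate is a single multiplication. In this sketch the function name is hypothetical, and `et_avg` stands for ET^mean^(H, V), taken to be in seconds per estimated FLOP:

```python
def estimated_runtime(fpops_est: float, et_avg: float) -> float:
    """R(J, H) = wu.fpops_est * ET^mean^(H, V), in seconds."""
    return fpops_est * et_avg


# A job estimated at 1e12 FLOPs on a host averaging 2e-9 seconds
# per estimated FLOP is expected to run for about 2000 seconds.
runtime = estimated_runtime(1e12, 2e-9)
```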
    514531
     
    522539int    app_version_id;          // generalized for anon platform
    523540AVERAGE pfc;
    524 AVERAGE_VAR et;                         // elapsed time / wu.rsc_fpops_est
     541AVERAGE_VAR et;                         // elapsed time / wu.fpops_est
    525542double host_scale_time;
    526543bool scale_probation;