Changes between Version 6 and Version 7 of CreditNew


Timestamp:
Nov 3, 2009, 2:37:20 PM (14 years ago)
Author:
davea
Comment:

--

  • CreditNew

    v6 v7  
    1616is the ratio of actual FLOPS to peak FLOPS.
    1717
    18 GPUs typically have a much higher (50-100X) peak FLOPS than GPUs.
     18GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
    1919However, application efficiency is typically lower
    2020(very roughly, 10% for GPUs, 50% for CPUs).
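
As a rough worked example of the figures above: if efficiency is the ratio of actual FLOPS to peak FLOPS, then actual FLOPS is peak FLOPS times efficiency. The sketch below plugs in the approximate numbers from the text; the function name and the choice of a CPU peak of 1.0 as the unit are assumptions made for illustration only.

```python
def actual_flops(peak_flops, efficiency):
    """Actual FLOPS = peak FLOPS * efficiency (efficiency is a fraction in [0, 1])."""
    return peak_flops * efficiency

# Hypothetical unit: express everything relative to a CPU peak of 1.0.
cpu_peak = 1.0
cpu_actual = actual_flops(cpu_peak, 0.5)          # ~50% efficiency for CPUs

# GPU peak is roughly 50-100X the CPU peak, at ~10% efficiency.
gpu_actual_low = actual_flops(50 * cpu_peak, 0.1)
gpu_actual_high = actual_flops(100 * cpu_peak, 0.1)

# Ratio of GPU actual FLOPS to CPU actual FLOPS.
ratio_low = gpu_actual_low / cpu_actual
ratio_high = gpu_actual_high / cpu_actual
print(ratio_low, ratio_high)
```

Under these very rough figures, the GPU advantage in actual FLOPS (about 10X to 20X) is much smaller than its advantage in peak FLOPS.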
     
    156156   It's not exactly "Actual FLOPs", since the most efficient
    157157   version may not be 100% efficient.
     158 * There are two sources of variance in PFC(V):
     159   the variation in host efficiency,
     160   and possibly the variation in job size.
     161   If we have an ''a priori'' estimate of job size
     162   (e.g., workunit.rsc_fpops_est)
     163   we can normalize by this to reduce the variance,
     164   and make PFC*(V) converge more quickly.
     165 * ''a posteriori'' estimates of job size may exist also
     166   (e.g., an iteration count reported by the app)
     167   but using this for anything introduces a new cheating risk,
     168   so it's probably better not to.
     169
    158170
    159171== Cross-project normalization ==
     
    190202
    191203Assuming that hosts are sent jobs for a given app uniformly,
    192 then for a given app
     204then, for that app,
    193205hosts should get the same average granted credit per job.
    194206To ensure this, for each application A we maintain the average VNPFC*(A),
     
    200212
    201213There are some cases where hosts are not sent jobs uniformly:
    202  * job-size matching
     214 * job-size matching (smaller jobs sent to slower hosts)
    203215 * GPUGrid.net's scheme for sending some (presumably larger)
    204216   jobs to GPUs with more processors.
    205 In these cases we must scale
     217In these cases average credit per job must differ between hosts,
     218according to the types of jobs that are sent to them.
     219
     220This can be done by dividing
     221each sample in the computation of VNPFC* by WU.rsc_fpops_est
     222(in fact, there's no reason not to always do this).
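
A minimal sketch of the normalization described above, assuming each sample is a (PFC, rsc_fpops_est) pair. The function name and the sample values are hypothetical and only illustrate why dividing by the a priori size estimate makes hosts that receive different job sizes comparable.

```python
from statistics import mean

def normalized_samples(jobs):
    """Divide each PFC sample by the job's a priori size estimate.

    jobs: list of (pfc, rsc_fpops_est) tuples.
    Samples with a non-positive estimate are skipped (an assumption of this sketch).
    """
    return [pfc / est for pfc, est in jobs if est > 0]

# Hypothetical data: with job-size matching, the slow host gets small jobs
# and the fast host gets jobs that are 10 times larger.
slow_host_jobs = [(1.0e12, 1.0e12), (1.1e12, 1.0e12)]
fast_host_jobs = [(1.0e13, 1.0e13), (1.1e13, 1.0e13)]

raw_slow = mean(pfc for pfc, _ in slow_host_jobs)
raw_fast = mean(pfc for pfc, _ in fast_host_jobs)

norm_slow = mean(normalized_samples(slow_host_jobs))
norm_fast = mean(normalized_samples(fast_host_jobs))

print(raw_fast / raw_slow)    # raw averages differ by the job-size factor
print(norm_fast / norm_slow)  # normalized averages are comparable
```

The raw averages differ by about the job-size factor, while the normalized averages come out close to each other, so the same averaging can be used whether or not jobs are sent uniformly.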
    206223
    207224Notes:
    208  * This mechanism reduces the claimed credit of hosts
     225 * The host normalization mechanism reduces the claimed credit of hosts
    209226   that are less efficient than average,
    210227   and increases the claimed credit of hosts that are more efficient
     
    269286}}}
    270287
    271 == Jobs versus app units ==
    272    To deal with this, we can weight jobs by workunit.rsc_flops_est.
    273 
    274 If a project changes between jobs to app units,
    275 it must reset
    276 
    277288== Cross-project scaling factors ==
    278289
     
    288299granted credit = claimed credit.
    289300
    290 For jobs that are replicated, granted credit is be
     301For jobs that are replicated, granted credit should be
    291302set to the min of the valid results
    292303(min is used instead of average to remove the incentive
     
    315326== Job runtime estimates ==
    316327
     328Unrelated to the credit proposal, but in a similar spirit.
     329The server will maintain ET*(H, V), the statistics of
     330job runtimes (normalized by wu.rsc_fpops_est) per
     331host and application version.
     332
     333The server's estimate of a job's runtime is then
     334{{{
     335R(J, H) = wu.rsc_fpops_est * ET*(H, V)
     336}}}
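
The estimate above can be sketched as follows. The text only says the server keeps "statistics" of normalized runtimes; this sketch assumes a plain running mean stands in for ET*(H, V), and the class and function names are hypothetical.

```python
class RuntimeStats:
    """Running mean of (elapsed time / rsc_fpops_est) for one (host, app version) pair."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, elapsed_seconds, rsc_fpops_est):
        # Normalize the runtime by the a priori job size, then fold the
        # sample into the mean incrementally.
        sample = elapsed_seconds / rsc_fpops_est
        self.n += 1
        self.mean += (sample - self.mean) / self.n


def estimate_runtime(rsc_fpops_est, stats):
    """R(J, H) = wu.rsc_fpops_est * ET*(H, V), using the running mean as ET*."""
    return rsc_fpops_est * stats.mean


stats = RuntimeStats()
stats.update(elapsed_seconds=100.0, rsc_fpops_est=1.0e12)
stats.update(elapsed_seconds=300.0, rsc_fpops_est=2.0e12)

# Estimated runtime, in seconds, of a new job with rsc_fpops_est = 4e12.
estimate = estimate_runtime(4.0e12, stats)
print(estimate)
```

A real implementation would presumably also need a fallback for (host, app version) pairs with no completed jobs yet; that case is outside this sketch.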
     337
    317338== Implementation ==
    318339