Changes between Version 3 and Version 4 of CreditNew


Timestamp: Nov 3, 2009, 9:25:51 AM
Author: davea

  • CreditNew

    v3 v4  
    77For GPUs, it's given by a manufacturer-supplied formula.
    88
    9 Applications access memory,
     9However, other factors affect application performance.
     10For example, applications access memory,
    1011and the speed of a host's memory system is not reflected
    1112in its Whetstone score.
     
    1516is the ratio of actual FLOPS to peak FLOPS.
    1617
    17 GPUs typically have a much higher (50-100X) peak speed than GPUs.
     18GPUs typically have a much higher (50-100X) peak FLOPS than CPUs.
    1819However, application efficiency is typically lower
    1920(very roughly, 10% for GPUs, 50% for CPUs).
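To make these numbers concrete, here is a small arithmetic sketch; the peak-FLOPS figures are illustrative assumptions, not values from the text:
{{{
// Illustrative sketch: efficiency = actual FLOPS / peak FLOPS.
// The peak figures below are assumed for the example.
double gpu_peak = 1000e9;            // assumed 1 TFLOPS GPU peak
double cpu_peak = 20e9;              // assumed 20 GFLOPS CPU peak
double gpu_actual = 0.10 * gpu_peak; // ~10% GPU efficiency (from text): 100 GFLOPS
double cpu_actual = 0.50 * cpu_peak; // ~50% CPU efficiency (from text): 10 GFLOPS
// Despite its lower efficiency, the GPU still does ~10x the actual FLOPS.
}}}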
     
    2930   about the same amount of credit per day for a given host.
    3031
    31 It's easy to show that both goals can't be satisfied simultaneously
    32 when there is more than one type of processing resource.
     32It's easy to show that both goals can't be satisfied simultaneously.
    3333
    3434== The first credit system ==
     
    4040}}}
    4141There were then various schemes for taking the
    42 average or min of the claimed credit of the replicas of a job,
     42average or min claimed credit of the replicas of a job,
    4343and using that as the "granted credit".
    4444
     
    6565We call this approach "Actual-FLOPs-based".
    6666
    67 SETI@home had an application that allowed counting of FLOPs,
    68 and they adopted this system.
    69 They added a scaling factor so that the average credit per job
     67SETI@home's application allowed counting of FLOPs,
     68and they adopted this system,
     69adding a scaling factor so that average credit per job
    7070was the same as under the first credit system.
    7171
     
    8484== Goals of the new (third) credit system ==
    8585
    86  * Completely automate credit - projects don't have to
     86 * Completely automated - projects don't have to
    8787   change code, settings, etc.
    8888
     
    9090
    9191 * Limited project neutrality: different projects should grant
    92    about the same amount of credit per CPU hour,
    93    averaged over hosts.
     92   about the same amount of credit per CPU hour, averaged over hosts.
    9493   Projects with GPU apps should grant credit in proportion
    9594   to the efficiency of the apps.
     
    9998== Peak FLOP Count (PFC) ==
    10099
    101 This system goes back to the Peak-FLOPS-based approach,
     100This system uses the Peak-FLOPS-based approach,
    102101but addresses its problems in a new way.
    103102
     
    126125   For now, though, we'll just use the scheduler's estimate.
    127126
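A minimal sketch of the idea, assuming (as the surrounding text suggests) that PFC(J) is the job's elapsed time multiplied by the peak FLOPS of the devices it used; the function name is ours, not BOINC's:
{{{
// Sketch only: assumes PFC(J) = elapsed time * peak FLOPS of the
// devices the job ran on; names are illustrative.
double peak_flop_count(double elapsed_time_secs, double peak_flops) {
    return elapsed_time_secs * peak_flops;
}
}}}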
    128 The idea of the system is that granted credit for a job J is proportional to PFC(J),
     127The granted credit for a job J is proportional to PFC(J),
    129128but is normalized in the following ways:
    130129
    131130== Cross-version normalization ==
    132131
    133 
    134132If a given application has multiple versions (e.g., CPU and GPU versions)
    135 the average granted credit is the same for each version.
     133the granted credit per job is adjusted
     134so that the average is the same for each version.
    136135The adjustment is always downwards:
    137 we maintain the average PFC*(V) of PFC() for each app version,
    138 find the minimum X,
    139 then scale each app version's jobs by (X/PFC*(V)).
    140 The result is called "Version-Normalized Peak FLOP Count", or VNPFC(J).
    141 
    142 Notes:
    143  * This mechanism provides device neutrality.
     136we maintain the average PFC*(V) of PFC() for each app version V,
     137and let X be the minimum of these averages.
     138An app version V's jobs are then scaled by the factor
     139{{{
     140S(V) = (X/PFC*(V))
     141}}}
     142
     143The result for a given job J
     144is called "Version-Normalized Peak FLOP Count", or VNPFC(J):
     145{{{
     146VNPFC(J) = PFC(J) * (X/PFC*(V))
     147}}}
     148
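A sketch of this computation; the container types and function names are ours, and it assumes the per-version averages PFC*(V) are already maintained:
{{{
// Sketch only: S(V) = X / PFC*(V), where X is the minimum of the
// per-version averages PFC*(V). Assumes pfc_avg is non-empty.
#include <map>
#include <string>

std::map<std::string, double> version_scale_factors(
    const std::map<std::string, double>& pfc_avg)    // V -> PFC*(V)
{
    double x = pfc_avg.begin()->second;
    for (const auto& p : pfc_avg) {
        if (p.second < x) x = p.second;              // X = min over versions
    }
    std::map<std::string, double> s;
    for (const auto& p : pfc_avg) {
        s[p.first] = x / p.second;                   // S(V) <= 1: adjustment is downwards
    }
    return s;
}

// VNPFC(J) = PFC(J) * S(V), where V is the version that ran job J.
double vnpfc(double pfc_j, double s_v) { return pfc_j * s_v; }
}}}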
     149Notes:
    144150 * This addresses the common situation
    145151   where an app's GPU version is much less efficient than the CPU version
     
    150156   It's not exactly "Actual FLOPs", since the most efficient
    151157   version may not be 100% efficient.
    152  * Averages are computed as a moving average,
    153    so that the system will respond quickly as job sizes change
    154    or new app versions are deployed.
    155158
    156159== Cross-project normalization ==
     
    158161If an application has both CPU and GPU versions,
    159162then the version normalization mechanism uses the CPU
    160 version as a "sanity check" to limit the credit granted for GPU jobs.
     163version as a "sanity check" to limit the credit granted to GPU jobs.
    161164
    162165Suppose a project has an app with only a GPU version,
    163166so there's no CPU version to act as a sanity check.
    164167If we grant credit based only on GPU peak speed,
    165 the project will grant much more credit per GPU hour than
    166 other projects, violating limited project neutrality.
    167 
    168 The solution to this is: if an app has only GPU versions,
    169 then we scale its granted credit by the average scaling factor
    170 for that GPU type among projects that
    171 do have both CPU and GPU versions.
     168the project will grant much more credit per GPU hour than other projects,
     169violating limited project neutrality.
     170
     171A solution to this: if an app has only GPU versions,
     172then for each version V we let
     173S(V) be the average scaling factor
     174for that GPU type among projects that do have both CPU and GPU versions.
    172175This factor is obtained from a central BOINC server.
     176V's jobs are then scaled by S(V) as above.
    173177
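A sketch of how a project might apply this; the table and its values are hypothetical, standing in for the data obtained from the central BOINC server:
{{{
// Sketch only: for a GPU-only app, S(V) comes from the average scale
// factor for the version's plan class, as published by a central
// BOINC server. The map contents here are hypothetical.
#include <map>
#include <string>

std::map<std::string, double> cross_project_scale = {
    {"cuda23", 0.10},    // hypothetical per-plan-class averages
    {"cuda21", 0.12},
};

double gpu_only_scale(const std::string& plan_class) {
    auto it = cross_project_scale.find(plan_class);
    return (it == cross_project_scale.end()) ? 1.0 : it->second;
}
}}}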
    174178Notes:
    175179
    176180 * Projects will run a periodic script to update the scaling factors.
    177  * Rather than GPU type, we'll actually use plan class,
     181 * Rather than GPU type, we'll probably use plan class,
    178182   since e.g. the average efficiency of CUDA 2.3 apps may be different
    179    from that of CUDA 2.1 apps.
     183   than that of CUDA 2.1 apps.
    180184 * Initially we'll obtain scaling factors from large projects
    181185   that have both GPU and CPU apps (e.g., SETI@home).
    182    Eventually we'll use an average (weighted by work done) over multiple projects.
     186   Eventually we'll use an average (weighted by work done) over multiple projects
     187   (see below).
    183188
    184189== Host normalization ==
    185190
    186 For a given application, all hosts should get the same average granted credit per job.
     191For a given application,
     192all hosts should get the same average granted credit per job.
    187193To ensure this, for each application A we maintain the average VNPFC*(A),
    188194and for each host H we maintain VNPFC*(H, A).
     
    201207   some (presumably larger) jobs to GPUs with more processors.
    202208   To deal with this, we can weight jobs by workunit.rsc_flops_est.
     209
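One plausible reading of this mechanism, sketched here as an assumption rather than the page's verbatim formula: a job's credit is scaled by the ratio of the app-wide average to the host's average.
{{{
// Sketch only, under the assumption above: scale a job's VNPFC by
// VNPFC*(A) / VNPFC*(H, A), pulling each host toward the app-wide average.
double host_normalized_credit(
    double vnpfc_j,         // VNPFC(J) for this job
    double vnpfc_avg_app,   // VNPFC*(A): average over all hosts
    double vnpfc_avg_host)  // VNPFC*(H, A): average for this host
{
    return vnpfc_j * (vnpfc_avg_app / vnpfc_avg_host);
}
}}}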
     210== Computing averages ==
     211 * Averages are computed as a moving average,
     212   so that the system will respond quickly as job sizes change
     213   or new app versions are deployed.
     214
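The text doesn't say which kind of moving average; an exponential moving average is one common choice, sketched here with an assumed smoothing factor:
{{{
// Sketch only: exponential moving average. ALPHA is an assumed
// smoothing factor; larger values react faster to changing job sizes
// or newly deployed app versions.
const double ALPHA = 0.01;

double update_avg(double avg, double sample) {
    return (1.0 - ALPHA) * avg + ALPHA * sample;
}
}}}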
     215== Jobs versus app units ==
     216
     217== Cross-project scaling factors ==
    203218
    204219== Replication and cheating ==
     
    262277double min_avg_vnpfc;           // min value of app_version.avg_vnpfc
    263278}}}
     279