Changes between Initial Version and Version 1 of AutoFlops


Timestamp: Aug 28, 2009, 12:32:48 PM
Author: davea

= Automated estimation of job and app version characteristics =

== Goals ==

 * eliminate the need for projects to supply FLOPs estimates for jobs
 * eliminate the need for projects to supply FLOPS estimates for app versions (in app_plan())

== Outline ==

=== Server ===
For each app, maintain
 * flops_avg: the estimated average number of FLOPs used by the app's jobs
 * flops_stdev: the standard deviation of the above

Note: if a project has different types of jobs for a given app,
with widely differing durations,
it should create a separate app for each type.
That will reduce the variance of the estimate.
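
The following is a quick numeric illustration of why splitting helps. The FLOP counts are made up and are not measurements. Two job types differ by a factor of 100 in size. When they are pooled under one app, the spread (coefficient of variation, i.e. standard deviation divided by mean) is roughly equal to the mean. Each type on its own has a small spread.

```python
import statistics

# Hypothetical FLOP counts for two job types of one app (illustrative only).
short_jobs = [0.9e12, 1.0e12, 1.1e12]
long_jobs = [0.9e14, 1.0e14, 1.1e14]

def rel_spread(xs):
    """Coefficient of variation: population standard deviation divided by mean."""
    return statistics.pstdev(xs) / statistics.mean(xs)

pooled = short_jobs + long_jobs
print(f"pooled:     {rel_spread(pooled):.2f}")
print(f"short only: {rel_spread(short_jobs):.2f}")
print(f"long only:  {rel_spread(long_jobs):.2f}")
```

With these numbers, the pooled spread comes out near 1, while each job type alone is about 0.08.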

Initially flops_avg is set to a high value (e.g. 1 GFLOP-day).
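
For scale, the example initial value converts to a FLOP count as follows. A GFLOP-day is one day of computation at 10^9 FLOPS:

```latex
1\ \text{GFLOP-day} = 10^{9}\ \text{FLOPS} \times 86\,400\ \text{s} = 8.64 \times 10^{13}\ \text{FLOPs}
```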

Update: whenever a completed job is reported,
let x = duration * (host's flops_est for the app version used);
x is an estimate of the number of FLOPs the job used.
Update flops_avg in a way that favors decrease over increase.
That way, hosts that execute the app efficiently (close to peak hardware speed)
have a larger weight in the estimate.
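
Below is a minimal sketch of one way to favor decrease over increase. It assumes an exponentially weighted moving average that gives a larger weight to samples below the current average than to samples above it. Several parts are hypothetical choices and are not part of the design above:
 * the class name AppFlopsStats
 * the weights ALPHA_DOWN and ALPHA_UP
 * the exponentially weighted variance used to derive flops_stdev

```python
import math
from dataclasses import dataclass

# Hypothetical smoothing weights: a sample below the average pulls it down
# ten times harder than a sample above the average pushes it up.
ALPHA_DOWN = 0.1
ALPHA_UP = 0.01

# Example initial value from the text: 1 GFLOP-day, expressed in FLOPs.
INITIAL_FLOPS_AVG = 1e9 * 86400

@dataclass
class AppFlopsStats:
    """Per-app server state (hypothetical layout)."""
    flops_avg: float = INITIAL_FLOPS_AVG
    flops_var: float = 0.0  # exponentially weighted variance of x

    @property
    def flops_stdev(self) -> float:
        return math.sqrt(self.flops_var)

    def update(self, duration: float, host_flops_est: float) -> None:
        """Fold in one completed job, using x = duration * host's flops_est."""
        x = duration * host_flops_est  # estimated FLOPs used by this job
        alpha = ALPHA_DOWN if x < self.flops_avg else ALPHA_UP
        delta = x - self.flops_avg
        self.flops_avg += alpha * delta
        # Incremental form of an exponentially weighted variance.
        self.flops_var = (1 - alpha) * (self.flops_var + alpha * delta * delta)

# Usage: a job that ran 1000 s on a host whose flops_est is 1e9 FLOPS gives
# x = 1e12 FLOPs, below the initial average, so ALPHA_DOWN applies.
stats = AppFlopsStats()
stats.update(duration=1000.0, host_flops_est=1e9)
print(stats.flops_avg, stats.flops_stdev)
```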

Job completion time estimate:
app.flops_avg / (host's flops_est for this app version)
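
The estimate is a single division. Here is a sketch; the function name is hypothetical:

```python
def estimated_completion_time(app_flops_avg: float, host_flops_est: float) -> float:
    """Seconds to complete a job: app.flops_avg / (host's flops_est for this app version)."""
    if host_flops_est <= 0:
        raise ValueError("host_flops_est must be positive")
    return app_flops_avg / host_flops_est

# Example: a 1 GFLOP-day job on an app version that runs at 2 GFLOPS on this host.
print(estimated_completion_time(8.64e13, 2e9))
```

For these inputs the result is 43200 seconds, i.e. half a day.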


=== Client ===
For each app version, maintain

 * flops_est: dynamic estimate of the real FLOPS of the app version on this host.

Initially flops_est is based on the peak hardware speed, i.e. ngpus * (GPU peak FLOPS) + avg_ncpus * (Whetstone benchmark result per CPU).
Update: when a job finishes, let x = job.flops / duration.
Update flops_est accordingly (but cap it at the peak hardware speed).
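
Here is a minimal client-side sketch. It assumes flops_est is updated with an exponentially weighted moving average. The weight ALPHA, the class AppVersionFlops, and its field names are hypothetical. `whetstone` stands for the per-CPU Whetstone benchmark result in FLOPS, and `job_flops` for the job's FLOP count (job.flops above).

```python
from dataclasses import dataclass, field

ALPHA = 0.1  # hypothetical weight given to each new sample

@dataclass
class AppVersionFlops:
    ngpus: int
    gpu_peak_flops: float  # peak FLOPS of one GPU
    avg_ncpus: float       # average number of CPUs the app version uses
    whetstone: float       # per-CPU Whetstone benchmark result, in FLOPS
    flops_est: float = field(init=False)

    def __post_init__(self) -> None:
        # Initial estimate: peak hardware speed.
        self.flops_est = self.peak_flops()

    def peak_flops(self) -> float:
        return self.ngpus * self.gpu_peak_flops + self.avg_ncpus * self.whetstone

    def on_job_finished(self, job_flops: float, duration: float) -> None:
        x = job_flops / duration  # FLOPS actually achieved by this job
        est = (1 - ALPHA) * self.flops_est + ALPHA * x
        # Never exceed the peak hardware speed.
        self.flops_est = min(est, self.peak_flops())

# Usage: a CPU-only app version on a host with a 2 GFLOPS Whetstone score.
av = AppVersionFlops(ngpus=0, gpu_peak_flops=0.0, avg_ncpus=1.0, whetstone=2e9)
av.on_job_finished(job_flops=1e12, duration=1000.0)  # job achieved 1 GFLOPS
print(av.flops_est)
```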

Note: this replaces the "duration correction factor".

=== Protocol ===

Request message: add flops_est for each app version.
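
BOINC scheduler requests are XML documents. As an illustration only, the new field could be carried per app version as shown below. Apart from flops_est itself, the element names and nesting are assumptions, not the actual request format:

```python
import xml.etree.ElementTree as ET

def app_version_element(app_name: str, version_num: int, flops_est: float) -> ET.Element:
    """Build a hypothetical per-app-version entry that carries flops_est."""
    av = ET.Element("app_version")
    ET.SubElement(av, "app_name").text = app_name
    ET.SubElement(av, "version_num").text = str(version_num)
    ET.SubElement(av, "flops_est").text = repr(flops_est)
    return av

# Hypothetical request containing one app version.
req = ET.Element("scheduler_request")
req.append(app_version_element("example_app", 100, 1.9e9))
print(ET.tostring(req, encoding="unicode"))
```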

== Credit ==

Grant each validated job credit proportional to app.flops_avg.
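
Here is a sketch of the credit rule. The scale constant assumes the conventional BOINC cobblestone scale of 200 credits per day of computation at 1 GFLOPS; treat it as an assumption. The function name is hypothetical.

```python
# Assumed scale: 200 credits per GFLOP-day (cobblestone convention).
CREDIT_PER_FLOP = 200 / (86400 * 1e9)

def credit_for_validated_job(app_flops_avg: float) -> float:
    """Credit proportional to app.flops_avg; the same for every validated job of the app."""
    return app_flops_avg * CREDIT_PER_FLOP

print(credit_for_validated_job(8.64e13))  # one GFLOP-day of work
```

The credit depends only on the app, not on the host that ran the job. Every validated job of a given app therefore earns the same amount.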