Changes between Initial Version and Version 1 of CreditNew


Timestamp: Oct 30, 2009, 2:35:19 PM
Author: davea

= New credit system design =

== Introduction ==

We can estimate the peak FLOPS of a given processor.
For CPUs, this is the Whetstone benchmark score.
For GPUs, it's given by a manufacturer-supplied formula.

Applications access memory,
and the speed of a host's memory system is not reflected
in its Whetstone score.
So a given job might take the same amount of CPU time
on a 1 GFLOPS host as on a 10 GFLOPS host.
The "efficiency" of an application running on a given host
is the ratio of actual FLOPS to peak FLOPS.

GPUs typically have a much higher (50-100X) peak speed than CPUs.
However, application efficiency is typically lower
(very roughly, 10% for GPUs, 50% for CPUs).
== The first credit system ==

In the first iteration of the credit system, "claimed credit" was defined as
{{{
C1 = H.whetstone * J.cpu_time
}}}
There were then various schemes for taking the
average or min of the claimed credit of the
replicas of a job, and using that as the "granted credit".

We call this system "Peak-FLOPS-based" because
it's based on the CPU's peak performance.

The problem with this system is that, for a given app version,
efficiency can vary widely.
In the example above, the 10 GFLOPS host would claim 10X as much credit
as the 1 GFLOPS host,
and its owner would be upset when it was granted
only a tenth of that.

Furthermore, the credits granted to a given host for a
series of identical jobs could vary widely,
depending on the host it was paired with by replication.

So host neutrality was achieved,
but in a way that seemed arbitrary and unfair to users.

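To make the disparity concrete, here is a minimal sketch in Python,
using the hypothetical 1 GFLOPS and 10 GFLOPS hosts from the introduction
(function and variable names are illustrative, not actual BOINC code):
{{{
# First-system claimed credit: C1 = H.whetstone * J.cpu_time
# (illustrative units: whetstone in GFLOPS, cpu_time in seconds).
def claimed_credit_v1(whetstone_gflops, cpu_time_secs):
    return whetstone_gflops * cpu_time_secs

cpu_time = 3600.0  # a memory-bound job takes the same CPU time on both hosts

slow_claim = claimed_credit_v1(1.0, cpu_time)    # 1 GFLOPS host
fast_claim = claimed_credit_v1(10.0, cpu_time)   # 10 GFLOPS host

print(fast_claim / slow_claim)  # 10.0: the faster host claims 10X for the same work
}}}
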
== The second credit system ==

To address the problems with host neutrality,
we switched to the philosophy that
credit should be proportional to the number of FLOPs actually performed
by the application.
We added API calls to let applications report this.
We call this approach "Actual-FLOPs-based".

SETI@home had an application that allowed counting of FLOPs,
and they adopted this system.
They added a scaling factor so that the average credit
was about the same as in the first credit system.

Not all projects could count FLOPs, however.
So SETI@home published their average credit per CPU second,
and other projects continued to use benchmark-based credit,
but multiplied it by a scaling factor to match SETI@home's average.

This system had several problems:

 * It didn't address GPUs.
 * Projects that couldn't count FLOPs still had the host-neutrality problem.
 * It didn't address single replication.

== Goals of the new (third) credit system ==

 * Completely automate credit - projects don't have to
   change code, settings, etc.

 * Device neutrality: similar jobs should get similar credit
   regardless of what processor or GPU they run on.

 * Limited project neutrality: different projects should grant
   about the same amount of credit per CPU hour,
   averaged over hosts.
   Projects with GPU apps should grant credit in proportion
   to the efficiency of the apps.
   (This means that projects with efficient GPU apps will
   grant more credit on average.  That's OK.)

== Peak FLOP Count (PFC) ==

This system uses the Peak-FLOPS-based approach,
but addresses its problems in a new way.

When a job is issued to a host, the scheduler specifies usage(J,D),
J's usage of processing resource D:
how many CPUs, and how many GPUs (possibly fractional).

If the job is finished in elapsed time T,
we define peak_flop_count(J), or PFC(J), as
{{{
PFC(J) = T * (sum over devices D of usage(J, D) * peak_flop_rate(D))
}}}

Notes:

 * We use elapsed time instead of actual device time (e.g., CPU time).
   If a job uses a resource inefficiently
   (e.g., a CPU job that does lots of disk I/O)
   PFC() won't reflect this.  That's OK.
 * usage(J,D) may not be accurate; e.g., a GPU job may take
   more or less CPU than the scheduler thinks it will.
   Eventually we may switch to a scheme where the client
   dynamically determines the CPU usage.
   For now, though, we'll just use the scheduler's estimate.

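As a concrete illustration of the PFC definition above, here is a minimal
sketch in Python (the usage fractions and peak FLOPS figures are hypothetical,
and the function name is ours, not part of BOINC):
{{{
# PFC(J) = T * (sum over devices D of usage(J, D) * peak_flop_rate(D))
def peak_flop_count(elapsed_time, usage, peak_flop_rate):
    """usage and peak_flop_rate map a device name to its value for this job."""
    return elapsed_time * sum(
        usage[d] * peak_flop_rate[d] for d in usage
    )

# Example: a job that runs for 1000 seconds using 0.5 of a CPU and 1 GPU.
usage = {"cpu": 0.5, "gpu": 1.0}        # usage(J, D), from the scheduler
peak  = {"cpu": 5e9, "gpu": 500e9}      # peak FLOPS per device (hypothetical)

print(peak_flop_count(1000.0, usage, peak))   # 5.025e+14 peak FLOPs
}}}
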
The idea of the system is that granted credit for a job J
is proportional to PFC(J),
but is normalized in the following ways:

== Version normalization ==

If a given application has multiple versions (e.g., CPU and GPU versions),
the average granted credit should be the same for each version.
The adjustment is always downwards:
we maintain the average PFC*(V) of PFC() for each app version V,
find the minimum X across versions,
then scale each app version's jobs by (X/PFC*(V)).
The result is called NPFC(J).

Notes:
 * This mechanism provides device neutrality.
 * This addresses the common situation
   where an app's GPU version is much less efficient than the CPU version
   (i.e. the ratio of actual FLOPs to peak FLOPs is much less).
   To a certain extent, this mechanism shifts the system
   towards the "Actual FLOPs" philosophy,
   since credit is granted based on the most efficient app version.
   It's not exactly "Actual FLOPs", since the most efficient
   version may not be 100% efficient.
 * Averages are computed as a moving average,
   so that the system will respond quickly as job sizes change
   or new app versions are deployed.

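The following sketch shows version normalization as described above; the
exponential moving-average weight and all names are assumptions made for
illustration, not BOINC's actual averaging code:
{{{
# Maintain a recent average PFC*(V) per app version; scale each job's PFC
# by X / PFC*(V), where X is the minimum average across versions.
def update_avg(old_avg, sample, alpha=0.01):
    # Simple exponential moving average; the weight alpha is an assumption.
    return old_avg + alpha * (sample - old_avg)

def npfc(pfc_job, version, pfc_avg_by_version):
    x = min(pfc_avg_by_version.values())
    return pfc_job * (x / pfc_avg_by_version[version])

# Example: the CPU version has the lowest average PFC per job (it is the
# most efficient), so GPU jobs are scaled down toward it.
pfc_avg = {"cpu": 1.0e14, "gpu": 4.0e14}   # hypothetical recent averages
print(npfc(4.0e14, "gpu", pfc_avg))        # 1.0e14: scaled by 0.25
print(npfc(1.0e14, "cpu", pfc_avg))        # 1.0e14: unchanged

# After each completed GPU job, refresh that version's recent average.
pfc_avg["gpu"] = update_avg(pfc_avg["gpu"], 3.8e14)
}}}
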
== Project normalization ==

If an application has both CPU and GPU versions,
then the version normalization mechanism uses the CPU
version as a "sanity check" to limit the credit granted for GPU jobs.

Suppose a project has an app with only a GPU version,
so there's no CPU version to act as a sanity check.
If we grant credit based only on GPU peak speed,
the project will grant much more credit per GPU hour than
other projects, violating limited project neutrality.

The solution to this is: if an app has only GPU versions,
then we scale its granted credit by a factor,
obtained from a central BOINC server,
which is based on the average scaling factor
for that GPU type among projects that
do have both CPU and GPU versions.

Notes:

 * Projects will run a periodic script to update the scaling factors.
 * Rather than GPU type, we'll actually use plan class,
   since e.g. the average efficiency of CUDA 2.3 apps may be different
   from that of CUDA 2.1 apps.
 * Initially we'll obtain scaling factors from large projects
   that have both GPU and CPU apps (e.g., SETI@home).
   Eventually we'll use an average (weighted by work done) over multiple projects.

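The cross-project scaling could look roughly like the following sketch; the
per-project factors, weights, and names are assumptions for illustration,
not the actual BOINC implementation:
{{{
# For a plan class with no CPU version to compare against, borrow a scale
# factor from projects that have both CPU and GPU versions, weighted by
# work done (all numbers below are hypothetical).
def combined_scale(project_factors, work_done):
    total = sum(work_done[p] for p in project_factors)
    return sum(project_factors[p] * work_done[p] for p in project_factors) / total

factors = {"SETI@home": 0.22, "ProjectX": 0.30}      # hypothetical scale factors
work    = {"SETI@home": 8.0e18, "ProjectX": 2.0e18}  # FLOPs done, hypothetical

print(combined_scale(factors, work))   # 0.236, applied to GPU-only apps of this plan class
}}}
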
== Host normalization ==

For a given application, all hosts should get the same average granted credit per job.
To ensure this, for each application A we maintain the average NPFC*(A),
and for each host H we maintain NPFC*(H, A).
The "claimed credit" for a given job J is then
{{{
NPFC(J) * (NPFC*(A)/NPFC*(H, A))
}}}

Notes:
 * NPFC* is averaged over jobs, not hosts.
 * Both averages are recent averages, so that they respond to
   changes in job sizes and app version characteristics.
 * This assumes that all hosts are sent the same distribution of jobs.
   There are two situations where this is not the case:
   a) job-size matching, and b) GPUGrid.net's scheme for sending
   some (presumably larger) jobs to GPUs with more processors.
   To deal with this, we'll weight the average by workunit.rsc_flops_est.

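A minimal sketch of the claimed-credit computation above (names are
illustrative, not BOINC code):
{{{
# Claimed credit = NPFC(J) * (NPFC*(A) / NPFC*(H, A)):
# a host whose recent jobs produced inflated NPFC values is scaled back down.
def claimed_credit(npfc_job, npfc_avg_app, npfc_avg_host_app):
    return npfc_job * (npfc_avg_app / npfc_avg_host_app)

# Example: a host whose recent NPFC average is 2X the app-wide average
# has its claim halved, restoring the app-wide mean.
print(claimed_credit(2.0e14, 1.0e14, 2.0e14))   # 1.0e14
}}}
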
== Replication and cheating ==

Host normalization mostly eliminates the incentive to cheat
by claiming excessive credit
(i.e., by falsifying benchmark scores or elapsed time).
An exaggerated claim will increase NPFC*(H,A),
causing subsequent claimed credit to be scaled down proportionately.
This means that no special cheat-prevention scheme
is needed for single replication;
granted credit = claimed credit.

For jobs that are replicated, granted credit is
set to the min of the claimed credits of the valid results
(min is used instead of average to remove the incentive
for cherry-picking, see below).

However, there are still some possible forms of cheating.

 * One-time cheats (like claiming 1e304) can be prevented by
   capping NPFC(J) at some multiple (say, 10) of NPFC*(A).
 * Cherry-picking: suppose an application has two types of jobs,
   which run for 1 second and 1 hour respectively.
   Clients can figure out which is which, e.g. by running a job for 2 seconds
   and seeing if it has exited.
   Suppose a client systematically refuses the 1 hour jobs
   (e.g., by reporting a crash or never reporting them).
   Its NPFC*(H, A) will quickly decrease,
   and soon it will be getting several thousand times more credit
   per unit of actual work than other hosts!
   Countermeasure:
   whenever a job errors out, times out, or fails to validate,
   set the host's error rate back to the initial default,
   and set its NPFC*(H, A) to NPFC*(A) for all apps A
   (see the sketch after this list).
   This puts the host in a state where several dozen of its
   subsequent jobs will be replicated.

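Here is a rough sketch of the two countermeasures listed above; the cap
multiple of 10 comes from the text, while the data structures, names, and
initial error rate are assumptions for illustration:
{{{
# One-time cheats: cap a job's NPFC at a multiple of the app-wide average.
CAP_MULTIPLE = 10.0

def capped_npfc(npfc_job, npfc_avg_app):
    return min(npfc_job, CAP_MULTIPLE * npfc_avg_app)

# Cherry-picking: when one of a host's jobs errors out, times out, or fails
# to validate, reset its error rate and its per-app NPFC averages, so its
# next jobs are replicated and discarding long jobs gains it nothing.
INITIAL_ERROR_RATE = 0.1   # assumed default; forces replication of later jobs

def penalize_host(host, npfc_avg_by_app):
    host["error_rate"] = INITIAL_ERROR_RATE
    for app, app_avg in npfc_avg_by_app.items():
        host["npfc_avg"][app] = app_avg   # set NPFC*(H, A) back to NPFC*(A)
}}}
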
== Implementation ==
