Changes between Version 1 and Version 2 of GpuSched


Timestamp: Oct 13, 2008, 10:33:23 AM
Author: davea

= Client CPU/GPU scheduling =

Prior to version 6.3, the BOINC client assumed that each running application uses 1 CPU.
Starting with version 6.3, this is generalized:
 * Apps may use coprocessors (such as GPUs).
 * The number of CPUs used by an app may be more or less than one, and it need not be an integer.
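
Concretely, per-app resource usage might be represented by something like the following C++ struct (a sketch only; the type and field names are invented for illustration, not the client's actual data structures):
{{{
// Sketch only: invented names, not the BOINC client's actual data structures.
struct ResourceUsage {
    double avg_ncpus;   // average number of CPUs used; may be fractional, e.g. 0.5
    int ngpus;          // number of coprocessor (GPU) instances used; 0 for CPU-only apps
};
}}}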
     
== The way things used to work ==

The old scheduling policy is:

 * Order runnable jobs by "importance" (determined by whether the job is in danger of missing its deadline, and the long-term debt of its project).
 * Run jobs in order of decreasing importance.  Skip those that would exceed RAM limits.  Keep going until we're running NCPUS jobs.

There's a bit more to it than that - e.g., we avoid preempting jobs that haven't checkpointed recently -
but that's the basic idea.
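
In rough C++ terms, the old policy amounts to something like this (a sketch of the description above; Job, importance, and the other identifiers are invented for illustration, not the client's actual code):
{{{
// Illustrative sketch of the pre-6.3 policy (invented identifiers).
#include <algorithm>
#include <vector>

struct Job {
    double importance;   // higher = more important (deadline danger, long-term debt)
    double ram_needed;   // RAM the job needs, in bytes
};

// Choose jobs to run on a host with ncpus CPUs and ram_free bytes of free RAM.
std::vector<Job*> choose_jobs_old(std::vector<Job>& jobs, int ncpus, double ram_free) {
    // Order runnable jobs by decreasing importance.
    std::sort(jobs.begin(), jobs.end(),
        [](const Job& a, const Job& b) { return a.importance > b.importance; });

    std::vector<Job*> to_run;
    for (Job& j : jobs) {
        if ((int)to_run.size() >= ncpus) break;   // stop once we're running NCPUS jobs
        if (j.ram_needed > ram_free) continue;    // skip jobs that would exceed RAM limits
        ram_free -= j.ram_needed;
        to_run.push_back(&j);
    }
    // (The real client has more rules, e.g. it avoids preempting jobs
    // that haven't checkpointed recently.)
    return to_run;
}
}}}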
== How things work in 6.3 ==

The main design goal of the new scheduler is to use all resources.
In particular, we try to always use the GPU even if that means
overcommitting the CPU.
"Overcommitting" means running a set of apps whose demand for CPUs exceeds
the actual number of CPUs.

The new policy is:
 * Scan the set of runnable jobs in decreasing order of importance.
 * If a job uses a resource that's not already fully utilized, and fits in RAM, run it.

Example: suppose we're on a machine with 1 CPU and 1 GPU,
and that we have the following runnable jobs (in order of decreasing importance):
{{{
Job 1: uses 1 CPU
Job 2: uses 1 CPU
Job 3: uses the GPU and 0.5 CPUs
}}}
The GPU is much faster than the CPU,
and it seems like we should use it if at all possible.

The new policy will do the following:

 * Run job 1.
 * Skip job 2 because the CPU is already fully utilized.
 * Run job 3 because the GPU is not fully utilized.

So we end up running jobs whose CPU demand is 1.5.
That's OK - they just run slower than if running alone.
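
Putting this together, the 6.3 policy can be sketched like so (again illustrative C++ with invented identifiers, not the client's actual code):
{{{
// Illustrative sketch of the 6.3 policy (invented identifiers).
#include <algorithm>
#include <vector>

struct Job {
    double importance;   // higher = more important
    double ncpus;        // CPUs used; may be fractional (e.g. 0.5)
    int ngpus;           // GPU instances used; 0 for CPU-only jobs
    double ram_needed;   // RAM the job needs, in bytes
};

// Choose jobs to run on a host with ncpus CPUs, ngpus GPUs, and ram_free bytes free.
std::vector<Job*> choose_jobs_63(std::vector<Job>& jobs, double ncpus, int ngpus, double ram_free) {
    // Scan runnable jobs in decreasing order of importance.
    std::sort(jobs.begin(), jobs.end(),
        [](const Job& a, const Job& b) { return a.importance > b.importance; });

    double cpus_used = 0;
    int gpus_used = 0;
    std::vector<Job*> to_run;
    for (Job& j : jobs) {
        // Run the job if it uses some resource that isn't fully utilized yet...
        bool cpu_available = j.ncpus > 0 && cpus_used < ncpus;
        bool gpu_available = j.ngpus > 0 && gpus_used < ngpus;
        if (!cpu_available && !gpu_available) continue;
        // ...and it fits in the remaining RAM.
        if (j.ram_needed > ram_free) continue;
        cpus_used += j.ncpus;   // may overcommit the CPU (e.g. 1.5 on a 1-CPU host)
        gpus_used += j.ngpus;
        ram_free -= j.ram_needed;
        to_run.push_back(&j);
    }
    return to_run;
}
}}}
On the example above this selects jobs 1 and 3, giving a total CPU demand of 1.5 on the 1-CPU host.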
== Unresolved issues ==

A GPU application also has a CPU part, which keeps the GPU supplied with work.
If that CPU part doesn't get enough CPU time,
the GPU sits idle and the entire program runs slowly.

The CPU scheduler on Windows doesn't work well,
and when the CPU is overcommitted the CPU part of GPU applications
doesn't run as often as it needs to in order to keep the GPU "fed".
As a result the GPU is underutilized and the program runs slowly.
(This seems to happen even if the GPU app is run at high priority
while other apps run at low priority.)

If we can't resolve this we'll have to change the scheduling policy
to avoid overcommitting the CPU in the presence of GPU apps.