Changes between Initial Version and Version 1 of GpuSched


Timestamp: Oct 10, 2008, 11:11:17 AM
Author: davea

= Client CPU/GPU scheduling =

Prior to version 6.3, the BOINC client assumed that a running application
uses 1 CPU.
Starting with version 6.3, this is generalized:
 * Apps may use coprocessors (such as GPUs).
 * The number of CPUs used by an app may be more or less than one, and it need not be an integer.

For example, an app might use 2 CUDA GPUs and 0.5 CPUs.
This information is visible in the BOINC Manager.

The client's scheduler (i.e., the part that decides which apps to run)
has been modified to accommodate this diversity of apps.
== The way things used to work ==

The old scheduling policy:

 * Make a list of runnable jobs, ordered by "importance" (as determined by whether the job is in danger of missing its deadline, and by the long-term debt of its project).
 * Run jobs in order of decreasing importance.  Skip those that would exceed RAM limits.  Keep going until we're running NCPUS jobs.

There's a bit more to it than that - e.g., we avoid preempting jobs that haven't checkpointed -
but that's the basic idea.

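
The steps above can be sketched in code roughly as follows. This is a minimal sketch, not the client's actual implementation: the `Job` fields and `schedule_old` are illustrative names, and "importance" is reduced to a single number.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical job record; field names are illustrative, not the
// BOINC client's real data structures.
struct Job {
    double importance;   // higher = more important (deadline danger, debt)
    double ram_needed;   // RAM the job would use
    bool running = false;
};

// Old policy: sort runnable jobs by decreasing importance, then run
// them in order, skipping any that would exceed the RAM limit, until
// NCPUS jobs are running.
void schedule_old(std::vector<Job>& jobs, int ncpus, double ram_limit) {
    std::sort(jobs.begin(), jobs.end(),
              [](const Job& a, const Job& b) { return a.importance > b.importance; });
    int running = 0;
    double ram_used = 0;
    for (Job& j : jobs) {
        if (running == ncpus) break;
        if (ram_used + j.ram_needed > ram_limit) continue;  // skip, keep going
        j.running = true;
        ram_used += j.ram_needed;
        running++;
    }
}
```

Note that a job over the RAM limit is skipped rather than ending the scan, matching the "skip those that would exceed RAM limits, keep going" rule.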
== How things work in 6.3 ==

Suppose we're on a machine with 1 CPU and 1 GPU,
and that we have the following runnable jobs (in order of decreasing importance):
{{{
1) 1 CPU, 0 GPU
2) 1 CPU, 0 GPU
3) .5 CPU, 1 GPU
}}}

What should we run?
If we use the old policy we'll just run 1), and the GPU will be idle.
This is bad - the GPU typically is 50X faster than the CPU,
and it seems like we should use it if at all possible.
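
To see the problem concretely, here is a sketch of the old policy reduced to this example. The `Job` record, `run_old_policy`, and `gpus_busy` are illustrative names, not client code; the point is only that GPU usage never enters the decision, so the example's GPU job loses to the CPU jobs and the GPU sits idle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-job resource usage for the example above.
struct Job {
    double cpus;          // CPU fraction the job uses
    int gpus;             // number of GPUs the job uses
    bool running = false;
};

// The old policy, reduced to this example: walk the jobs in
// importance order and run each one that fits in the CPU budget.
// GPU usage plays no role in the decision.
void run_old_policy(std::vector<Job>& jobs, double ncpus) {
    double cpus_used = 0;
    for (Job& j : jobs) {
        if (cpus_used + j.cpus > ncpus) continue;
        j.running = true;
        cpus_used += j.cpus;
    }
}

// How many GPUs the chosen jobs actually keep busy.
int gpus_busy(const std::vector<Job>& jobs) {
    int n = 0;
    for (const Job& j : jobs)
        if (j.running) n += j.gpus;
    return n;
}
```

On the example's job list (1 CPU, 1 GPU machine), this selects only job 1), and `gpus_busy` comes back 0.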

This leads to the following policy:

== Unresolved issues ==

Apps that use GPUs use the CPU as well.
The CPU part typically is a polling loop:
it starts a "kernel" on the GPU,
waits for it to finish (checking once per .01 sec, say),
then starts another kernel.

If there's a delay between when a kernel finishes
and when the CPU starts the next one,
the GPU sits idle and the entire program runs slowly.
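
The polling loop described above can be sketched as follows. This is an illustration of the loop structure only: a detached worker thread simulates the GPU, and an atomic flag simulates querying kernel completion; a real app would use the vendor's API (e.g. CUDA) instead.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Simulated coprocessor: a worker thread stands in for the GPU, and
// kernel_done stands in for querying kernel completion.
std::atomic<bool> kernel_done{false};

void start_kernel() {
    kernel_done = false;
    std::thread([] {
        // "GPU work": a sleep in place of an actual kernel.
        std::this_thread::sleep_for(std::chrono::milliseconds(30));
        kernel_done = true;
    }).detach();
}

// The CPU side of a GPU app: start a kernel, poll for completion
// (here once per 10 ms, i.e. .01 sec), then start the next one.
// Any delay in this loop leaves the GPU idle between kernels.
int run_kernels(int n) {
    int completed = 0;
    for (int i = 0; i < n; i++) {
        start_kernel();
        while (!kernel_done)
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        completed++;
    }
    return completed;
}
```

The CPU cost of this loop is small (it mostly sleeps), which is why such an app can plausibly be charged only a fraction of a CPU - but if the polling thread is descheduled, the GPU stalls.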