Changes between Initial Version and Version 1 of ClientSchedOctTen


Timestamp: Oct 26, 2010, 12:13:24 PM
Author: davea

= Client scheduling changes =

Design document for changes to the client work fetch and job scheduling policies,
started Oct 2010.

This supersedes the following design docs:
 * GpuWorkFetch
 * GpuSched
 * ClientSched

== Problems with current system ==

The current policies, described [GpuWorkFetch here],
maintain a long-term debt (LTD) and a short-term debt (STD) for each
(project, resource type) pair.

Job scheduling for a given resource type is based on STD.
Projects with greater STD for the resource are given priority.

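A minimal sketch of STD-ordered job scheduling, assuming debts are stored in a dictionary keyed by (project, resource type). The `Debt` record, `schedule_order`, and the debt values are hypothetical illustrations, not the BOINC client's actual code or data structures.

```python
from dataclasses import dataclass


@dataclass
class Debt:
    """Hypothetical debt record for one (project, resource type) pair."""
    ltd: float  # long-term debt
    std: float  # short-term debt


# Debts keyed by (project, resource type), as the text describes.
Debts = dict[tuple[str, str], Debt]


def schedule_order(debts: Debts, resource: str) -> list[str]:
    """Projects that have a debt entry for `resource`, highest STD first."""
    entries = [(proj, d.std) for (proj, res), d in debts.items() if res == resource]
    # list.sort is stable even with reverse=True, so ties keep dict order.
    entries.sort(key=lambda e: e[1], reverse=True)
    return [proj for proj, _ in entries]


debts: Debts = {
    ("A", "cpu"): Debt(ltd=0.0, std=-50.0),
    ("A", "gpu"): Debt(ltd=0.0, std=10.0),
    ("B", "cpu"): Debt(ltd=0.0, std=50.0),
}
print(schedule_order(debts, "cpu"))  # ['B', 'A']
```

The text does not say how ties in STD are broken; this sketch simply keeps insertion order.
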
Work fetch is based on a weighted sum of LTDs.
Work is typically fetched from the project for which this sum is greatest,
and the request usually covers all resource types.

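The work-fetch rule can be sketched the same way. This assumes the per-resource-type weights are simply given as inputs, since the text does not say how they are computed; `fetch_priority`, `choose_project`, the weights, and the LTD values are all hypothetical.

```python
# Hypothetical inputs: LTD per (project, resource type) and a weight per
# resource type (how real weights are derived is not specified here).
def fetch_priority(ltd: dict[tuple[str, str], float],
                   weights: dict[str, float]) -> dict[str, float]:
    """Weighted sum of each project's LTDs across resource types."""
    totals: dict[str, float] = {}
    for (proj, res), debt in ltd.items():
        totals[proj] = totals.get(proj, 0.0) + weights[res] * debt
    return totals


def choose_project(ltd: dict[tuple[str, str], float],
                   weights: dict[str, float]) -> str:
    """Pick the project whose weighted LTD sum is greatest."""
    totals = fetch_priority(ltd, weights)
    return max(totals, key=totals.__getitem__)


ltd = {("A", "cpu"): -20.0, ("A", "gpu"): 5.0, ("B", "cpu"): 10.0}
weights = {"cpu": 0.2, "gpu": 0.8}
print(fetch_priority(ltd, weights))  # {'A': 0.0, 'B': 2.0}
print(choose_project(ltd, weights))  # B
```

Because the LTDs are collapsed into a single number per project, a large positive GPU debt can mask a negative CPU debt (and vice versa), which is one way the per-resource information gets blurred.
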
These policies fail to meet their goals in many cases.
Here are two scenarios that illustrate the underlying problems:

=== Example 1 ===

A host has a fast GPU and a slow CPU.
Project A has apps for both GPU and CPU.
Project B has apps only for CPU.
The two projects have equal resource shares.

In the current system, each project gets 50% of the CPU.
The target behavior, which matches resource shares better,
is that project B gets 100% of the CPU
and project A gets 100% of the GPU.

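Example 1 can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that the GPU delivers 10 units of throughput and the CPU 1 unit, and scores each allocation by its total deviation from an equal split of the host's throughput; `share_error` and the throughput figures are assumptions, not a metric defined in this document.

```python
def share_error(a: float, b: float) -> float:
    """Total absolute deviation of (A, B) throughput from a 50/50 split."""
    fair = (a + b) / 2
    return abs(a - fair) + abs(b - fair)


GPU, CPU = 10.0, 1.0  # hypothetical throughputs: fast GPU, slow CPU

# Current policy: A runs on the GPU plus half the CPU; B gets the other half.
current = (GPU + 0.5 * CPU, 0.5 * CPU)
# Target policy: A gets the whole GPU, B gets the whole CPU.
target = (GPU, CPU)

print(share_error(*current))  # 10.0
print(share_error(*target))   # 9.0
```

With these numbers the target allocation is closer to equal shares (9.0 versus 10.0): project B's throughput doubles, while A loses under 5% of its own.
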
=== Example 2 ===

Same host and projects as Example 1,
plus a project C that has only CPU apps.

In this case A's CPU LTD stays around zero,
while the CPU LTDs for B and C go unboundedly negative
and get clamped at the cutoff.
All information about the relative debt of B and C is lost.

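The loss of information in Example 2 can be reproduced with a toy simulation. The cutoff value and the per-interval LTD changes are hypothetical; the point is only that once B and C both hit the clamp, their debts become identical even though C fell behind faster.

```python
CUTOFF = -100.0  # hypothetical lower bound on LTD


def step(ltd: float, delta: float) -> float:
    """Apply one accounting interval's LTD change, clamped at CUTOFF."""
    return max(CUTOFF, ltd + delta)


def simulate(rates: dict[str, float], intervals: int) -> dict[str, float]:
    """Run `intervals` steps, starting from zero debt for every project."""
    ltd = {p: 0.0 for p in rates}
    for _ in range(intervals):
        ltd = {p: step(d, rates[p]) for p, d in ltd.items()}
    return ltd


# Assumed per-interval CPU LTD changes: A hovers at zero, while B and C
# drift negative at different rates.
rates = {"A": 0.0, "B": -3.0, "C": -7.0}
print(simulate(rates, 10))  # {'A': 0.0, 'B': -30.0, 'C': -70.0}
print(simulate(rates, 50))  # {'A': 0.0, 'B': -100.0, 'C': -100.0}
```

After 10 intervals the debts still show that C is further behind than B; after 50, both sit at the cutoff and that distinction is gone.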