
Version 1 (modified by davea, 14 years ago)


Client scheduling changes

Design document for changes to the client work fetch and job scheduling policies, started Oct 2010.

This supersedes the following design docs:

Problems with current system

The current policies, described here, maintain a long-term debt (LTD) and a short-term debt (STD) for each (project, resource type) pair.

Job scheduling for a given resource type is based on STD. Projects with greater STD for the resource are given priority.

Work fetch is based on a weighted sum of LTDs. Work is typically fetched from the project for which this sum is greatest, and the request usually covers all resource types.

These policies fail to meet their goals in many cases. Here are two scenarios that illustrate the underlying problems:

Example 1

A host has a fast GPU and a slow CPU. Project A has apps for both GPU and CPU. Project B has apps only for CPU. The two projects have equal resource shares.

In the current system each project will get 50% of the CPU. The target behavior, which matches resource shares better, is that project B gets 100% of the CPU and project A gets 100% of the GPU.
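The arithmetic behind Example 1 can be sketched as follows. The device speeds are hypothetical (the text says only "fast GPU" and "slow CPU"), and `throughput` and `imbalance` are illustrative helpers, not part of the client. With these assumed numbers, equal shares cannot be reached because B has no GPU app, but giving B the whole CPU brings the split closer to equal.

```python
# Hypothetical device speeds in arbitrary units; not taken from the source.
GPU_SPEED = 100.0
CPU_SPEED = 10.0


def throughput(cpu_frac, gpu_frac):
    """Total processing rate of a project given its fraction of each device."""
    return cpu_frac * CPU_SPEED + gpu_frac * GPU_SPEED


def imbalance(a, b):
    """Gap between the two projects' fractions of total throughput.

    With equal resource shares the ideal value is 0.
    """
    total = a + b
    return abs(a - b) / total


# Current system: the CPU is split evenly; A is the only GPU project.
current_a = throughput(cpu_frac=0.5, gpu_frac=1.0)
current_b = throughput(cpu_frac=0.5, gpu_frac=0.0)

# Target behavior: B gets the whole CPU, A gets the whole GPU.
target_a = throughput(cpu_frac=0.0, gpu_frac=1.0)
target_b = throughput(cpu_frac=1.0, gpu_frac=0.0)

current_gap = imbalance(current_a, current_b)
target_gap = imbalance(target_a, target_b)

print(f"current: A={current_a}, B={current_b}, gap={current_gap:.3f}")
print(f"target:  A={target_a}, B={target_b}, gap={target_gap:.3f}")
```

Under these assumptions B's throughput doubles in the target allocation, while A loses only the small CPU contribution.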

Example 2

Same host as in Example 1, with an additional project C that has only CPU apps.

In this case A's CPU LTD will stay around zero, while the CPU LTDs for B and C go unboundedly negative and get clamped at the cutoff. All information about the relative debt of B and C is lost.
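The loss of information from clamping can be illustrated with a minimal sketch. This is a simplified model, not the client's actual debt accounting: the cutoff value, the unclamped debts, and the `clamp_debt` helper are all hypothetical.

```python
# Hypothetical lower bound on debt; the source does not give the real cutoff.
DEBT_CUTOFF = -86400.0


def clamp_debt(debt, cutoff=DEBT_CUTOFF):
    """Limit a debt from below, as a cutoff on negative debt would."""
    return max(debt, cutoff)


# Hypothetical unclamped CPU LTDs: A stays near zero, while B and C
# have drifted to different negative values.
raw_ltd = {"A": 0.0, "B": -150000.0, "C": -100000.0}

clamped_ltd = {project: clamp_debt(debt) for project, debt in raw_ltd.items()}

# After clamping, B and C hold the same value, so their relative debt
# can no longer be recovered.
b_and_c_distinguishable = clamped_ltd["B"] != clamped_ltd["C"]

print(clamped_ltd)
print("B and C distinguishable:", b_and_c_distinguishable)
```

Any policy that ranks B against C using the clamped values sees a tie, regardless of how different their unclamped debts were.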