Version 4 (modified by davea, 15 years ago)

Work fetch and GPUs

Current policy

  • Run a weighted round-robin simulation to:
    • get the per-project and overall CPU shortfalls
    • see which jobs miss their deadlines
  • If there is an overall shortfall, get work from the project with the highest long-term debt (LTD).
  • The scheduler request includes just "work_req_seconds".
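
The fetch decision in the current policy can be sketched as follows. This is a minimal illustration, not the client's actual code: the `Project` class and `choose_project` function are hypothetical names, and using the overall shortfall directly as the value of "work_req_seconds" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    long_term_debt: float  # LTD, in seconds (hypothetical field name)


def choose_project(projects, overall_shortfall):
    """Return (project, work_req_seconds), or None if no work is needed.

    Assumption: the request size equals the overall CPU shortfall.
    """
    if overall_shortfall <= 0 or not projects:
        return None
    # Ask the project that is owed the most CPU time.
    best = max(projects, key=lambda p: p.long_term_debt)
    return best, overall_shortfall


if __name__ == "__main__":
    candidates = [Project("a", 10.0), Project("b", 50.0)]
    print(choose_project(candidates, 3600.0))
```

Nothing in this decision looks at GPUs, which is the root of the problems listed next.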

Problems:

There may be no CPU shortfall while the GPU is idle.

If GPU is idle, we should get work from a project that potentially has jobs for it.

If the project has both CPU and GPU jobs, we may need to tell it to send only GPU jobs.

LTD isn't meaningful with GPUs

New policy

Notion of "processor type": the CPU is one type, and each coprocessor (coproc) is another.

Keep track of which projects can use which processor type. Data structure, per (project, processor type):

  • LTD
  • last time the project sent a job that used this type
  • shortfall
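
One way to hold this per-(project, processor type) record is a table keyed by the pair. The sketch below is illustrative only: `ProjectProcState`, `WorkFetchState` and their fields are hypothetical names, and inferring "can use this type" from whether the project has ever sent such a job is an assumption, not something the policy states.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ProjectProcState:
    long_term_debt: float = 0.0             # LTD for this (project, type)
    last_job_time: Optional[float] = None   # None: no job of this type seen yet
    shortfall: float = 0.0                  # filled in by the simulator


class WorkFetchState:
    """Table of ProjectProcState keyed by (project name, processor type)."""

    def __init__(self) -> None:
        self._state: Dict[Tuple[str, str], ProjectProcState] = {}

    def get(self, project: str, proc_type: str) -> ProjectProcState:
        # Create the record on first use so callers never see a missing key.
        return self._state.setdefault((project, proc_type), ProjectProcState())

    def record_job(self, project: str, proc_type: str, now: float) -> None:
        self.get(project, proc_type).last_job_time = now

    def can_use(self, project: str, proc_type: str) -> bool:
        # Assumption: a project "can use" a type once it has sent a job for it.
        rec = self._state.get((project, proc_type))
        return rec is not None and rec.last_job_time is not None
```

For example, after `state.record_job("projA", "cuda", now=1000.0)`, `state.can_use("projA", "cuda")` is true while `state.can_use("projA", "cpu")` stays false.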

Round-robin simulator computes:

  • for each proc type:
    • overall shortfall
    • for each project
      • shortfall (determines work req)
      • max idle instances
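
The outputs listed above can be illustrated with a heavily simplified simulation for a single processor type. This sketch makes several assumptions that are not in the policy: each job occupies exactly one instance, instances are handed out in plain (unweighted) round-robin order, deadlines are ignored, and a project's shortfall is the instance time below its fair share. All function and parameter names are hypothetical.

```python
EPS = 1e-9  # jobs with less remaining time than this count as finished


def pick_running(queues, n_instances):
    """Choose up to n_instances jobs, taking one per project per pass."""
    running = []
    depth = 0
    while len(running) < n_instances:
        candidates = [p for p, q in queues.items() if depth < len(q)]
        if not candidates:
            break
        for p in candidates:
            if len(running) < n_instances:
                running.append((p, depth))
        depth += 1
    return running


def simulate(n_instances, window, jobs, shares):
    """Simulate one processor type over `window` seconds.

    jobs:   {project: [remaining_seconds, ...]}
    shares: {project: fraction of the instances it is entitled to}
    Returns (overall_shortfall, per_project_shortfall, max_idle_instances).
    """
    queues = {p: list(jobs.get(p, [])) for p in shares}
    per_project = {p: 0.0 for p in shares}
    overall, max_idle, t = 0.0, 0, 0.0
    while t < window:
        running = pick_running(queues, n_instances)
        # Advance to the next job completion or the end of the window.
        end = window
        if running:
            end = min(window, t + min(queues[p][i] for p, i in running))
        dt = end - t
        idle = n_instances - len(running)
        overall += idle * dt
        max_idle = max(max_idle, idle)
        for p in shares:
            used = sum(1 for q, _ in running if q == p)
            fair = shares[p] * n_instances
            if used < fair:
                per_project[p] += (fair - used) * dt
        for p, i in running:
            queues[p][i] -= dt
        for p in queues:
            queues[p] = [r for r in queues[p] if r > EPS]
        t = end
    return overall, per_project, max_idle
```

With two instances, a 100-second window, one 50-second job from project A and none from B (equal shares), one instance is idle for the first 50 seconds and both are idle afterwards, so the overall shortfall is 150 instance-seconds and the maximum number of idle instances is 2.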

Scheduler request includes:

  • for each proc type, the number of idle instances and the number of seconds of work to fill
  • still includes work_req_seconds (for backwards compatibility)
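
A request carrying both the per-type fields and the legacy field might be assembled as sketched below. Only "work_req_seconds" comes from the text above; the element names `scheduler_request`, `proc_type_req`, `idle_instances` and `req_seconds` are hypothetical, and filling the legacy field from the CPU request is an assumption.

```python
import xml.etree.ElementTree as ET


def build_request(per_type, legacy_type="cpu"):
    """Build a request element from {proc_type: {"idle_instances", "req_seconds"}}.

    Assumption: old schedulers only understand CPU work, so the legacy
    work_req_seconds field mirrors the request for `legacy_type`.
    """
    root = ET.Element("scheduler_request")
    legacy = per_type.get(legacy_type, {"req_seconds": 0.0})["req_seconds"]
    ET.SubElement(root, "work_req_seconds").text = f"{legacy:.1f}"
    for name, req in per_type.items():
        elem = ET.SubElement(root, "proc_type_req", name=name)
        ET.SubElement(elem, "idle_instances").text = str(req["idle_instances"])
        ET.SubElement(elem, "req_seconds").text = f"{req['req_seconds']:.1f}"
    return root


if __name__ == "__main__":
    request = build_request({
        "cpu": {"idle_instances": 0, "req_seconds": 0.0},
        "cuda": {"idle_instances": 1, "req_seconds": 3600.0},
    })
    print(ET.tostring(request, encoding="unicode"))
```

In this example the CPU needs nothing, so an old scheduler sees a zero-second request, while a new one can see that one GPU instance is idle and send only GPU jobs.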

Work fetch:

  • for each proc type (start with coprocs)
    • if shortfall

CPU sched policy