Scheduler job matchmaking

Currently the scheduler's work-sending algorithm is:

  • if host is reliable, scan entire job array, looking for retries
  • if using HR, scan entire job array, looking for jobs committed to this HR class
  • if still need work, scan entire job array with no restrictions

This is bad for several reasons:

  • inefficient: repeated array scans
  • inflexible policy: why should reliable be more important than HR?
  • hard to add new policies, like sending "hard" jobs to fasts hosts

To solve these problems we're going to change this part of the scheduler. Basic idea: given a job J and a host H, there is a function V(J, H) that represents the value of sending J to H. V(J, H) might reflect various factors:

  • the computational "hardness" of J
    • CPU/deadline ratio
    • RAM or disk requirements
  • H already has files required by J(or jobs already planned for sending have such files)
  • J is a retry and H is a fast/reliable host
  • J has already been committed to H's HR class

BOINC includes a default value function. Projects can tweak its weights, or define their own value function.



checks if a set of jobs is feasible (no DB access)

  • disk usage
  • deadline check (EDF sim or crude)
  • one result per user/host (no DB)
feasibility checks that can be done with no DB access
  • WU committed to different platform (no DB check)
  • app filtering
  • memory usage
feasibility checks that need DB access
  • one result per user or host per WU (look in DB)
  • WU committed to different platform (look in DB)


scan at least this many slots (if scan N slots and have enough jobs to send, stop)
scan at most this many slots (even if don't have enough jobs to send yet)
if scan this many locked slots, print warning msg (should increase shmem size)


acquire semaphore
i = random index in shmem
x = ordered list of jobs to send (empty)
slots_scanned = 0
slots_locked = 0
    i = i+1 % array_size
    if slots_scanned > M
    if shmem[i] is empty
    if shmem[i] is locked
    j = shmem[i]
    if !job_feasible_fast(j) continue;
    v = v(h, j)
    if v <= lowest score in x
    S = jobs in x with value >= v
    if !job_set_feasible(S+j)
    lock j
    release semaphore
    if !job_feasible_slow(j)
        acquire semaphore
        unlock j
    acquire semaphore
    add j to x
    while (x minus lowest-val element) satisfies work request
        remove lowest-val element of x
    while !job_set_feasible(x)
        remove lowest-value element of x
    if x satisfies work request and slots_scanned >= N
for each job j in x
    mark slot j as empty
release semaphore
if slots_locked > L
    print "need bigger array" message
Last modified 14 years ago Last modified on Feb 29, 2008, 2:44:43 PM