Changes between Version 1 and Version 2 of JobPrioritization

09/18/12 18:05:34
= Server job scheduling =

This document describes proposed changes to server scheduling policies:

 * Feeder enumeration order
 * Of the jobs in shared memory, which to send to a given host
 * Which app versions to use
 * What deadlines to assign

These policies work in conjunction with [PortalFeatures batch scheduling policies].

== Quality of service types ==

We seek to handle the following types of QoS:
 * Non-batch, throughput-oriented, with a fixed latency bound.
 * Batches with long deadlines (greater than the turnaround time of almost all hosts).
 * Batches to be completed "as fast as possible" (AFAP), with no a priori deadline.
 * Batches with a short deadline.

== Goals ==

The goals of the policies include:
 * Support the QoS features.
 * Avoid assigning "tight" deadlines unnecessarily, because:
   * doing so may make it impossible to assign tight deadlines to jobs that actually need them;
   * tight deadlines often force the client to preempt other jobs, which irritates some volunteers.
 * Avoid long delays between the completion of a job instance and its validation. These delays irritate volunteers and increase server disk usage.
 * Minimize server configuration.

== Host statistics ==

We need a way to identify hosts that can turn around jobs quickly and reliably. Note:

 * This is a property of (host, app version), not of the host alone.
 * It is not the same as processor speed. A host may have a high turnaround time for various reasons:
   * a large minimum work buffer size;
   * attachment to lots of other projects;
   * long periods of unavailability or network disconnection.

We propose the following. For each app A:

 * For each (host, app version), let X be the percentile of turnaround time.
 * For each (host, app version), let Y be the percentile of "consecutive valid results" (or +infinity if > 10), over all active hosts and all current app versions.
 * Let P(H, AV) = min(X, Y).

This will be computed periodically (say, every 24 hours) by a utility program.

When a new app version is deployed, the host_app_version records for the previous version should be copied, on the assumption that hosts reliable for one version will be reliable for the next.
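As an illustration, the percentile computation could look like the following sketch. This is not actual BOINC code; the `HostAppVersion` record and its field names are assumptions standing in for whatever per-(host, app version) statistics the utility program reads.

```python
from dataclasses import dataclass

@dataclass
class HostAppVersion:
    # Hypothetical per-(host, app version) statistics record.
    avg_turnaround: float   # average turnaround time, in seconds
    consecutive_valid: int  # current run of consecutively valid results

def percentile_rank(value, values, lower_is_better):
    # Percentage of the population that this value beats.
    if lower_is_better:
        beaten = sum(1 for v in values if v > value)
    else:
        beaten = sum(1 for v in values if v < value)
    return 100.0 * beaten / len(values)

def compute_p(havs):
    # Returns P(H, AV) = min(X, Y) for each record, in input order.
    turnarounds = [h.avg_turnaround for h in havs]
    valids = [h.consecutive_valid for h in havs]
    result = []
    for h in havs:
        # X: percentile of turnaround time (shorter is better).
        x = percentile_rank(h.avg_turnaround, turnarounds, True)
        # Y: percentile of consecutive valid results,
        # or +infinity if greater than 10.
        if h.consecutive_valid > 10:
            y = float("inf")
        else:
            y = percentile_rank(h.consecutive_valid, valids, False)
        result.append(min(x, y))
    return result
```

Note that a record with more than 10 consecutive valid results gets Y = +infinity, so its P is determined entirely by its turnaround percentile.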

== Batch completion estimation ==

The proposed policies require estimates C(B) of batch completion time. I'm not sure exactly how to compute these, but:

 * They should be based on completed and validated jobs rather than on a priori FLOPs estimates.
 * They should reflect (host, app version) information (e.g. turnaround and elapsed time statistics) both for the hosts that have completed jobs and for the host population as a whole.
 * They should be computed by a daemon process, triggered by the passage of time and by the validation of jobs in the batch.

Notes:

 * C(B) is different from the "logical end time" of the batch used in batch scheduling.
 * For long-deadline batches, C(B) should probably be at least the original delay bound plus the greatest dispatch time of the first instance of any job. That is, if it takes a long time to dispatch the first instances, adjust the deadline accordingly to avoid creating a deadline crunch.
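In code, that lower bound amounts to something like this sketch (the names are mine, not from the BOINC source; `first_dispatch_times` is assumed to hold, for each job, the time at which its first instance was dispatched, measured from batch creation):

```python
def cb_lower_bound(delay_bound, first_dispatch_times):
    # C(B) should be at least the original delay bound plus the
    # latest dispatch time of any job's first instance, so that a
    # slow start doesn't create a deadline crunch.
    return max(first_dispatch_times) + delay_bound
```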

== Proposed feeder policy ==

Proposed enumeration order:

(LET(J) asc, nretries desc)

where LET(J) is the logical end time of the job's batch.
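Expressed as a sort key, this is roughly the following (an illustrative sketch; in the real feeder this would be the ORDER BY clause of the job enumeration query, and the field names here are assumptions):

```python
def feeder_order(jobs):
    # Ascending logical end time of the job's batch; among jobs with
    # equal LET, descending retry count, so retries go out first.
    return sorted(jobs, key=lambda j: (j["let"], -j["nretries"]))
```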

== Proposed scheduler policy ==

For each processor type T (CPU and GPU) we have a "busy time" BT: the time already committed to high-priority jobs. For a given job J we can compute the estimated runtime R. The earliest we can expect to finish J is then BT + R, so that's the earliest deadline we can assign; call this MD(J, T).

For each app A and processor type T, compute the best app version BAV(A, T) at the start of handling each request.

The rough policy then is:

{{{
for each job J in the array, belonging to batch B
    for each usable app version AV of type T
        if B is AFAP and there's no estimate yet
            if P(H, AV) > 50%
                send J using AV, with deadline BT + R
        else
            x = MD(J, T)
            if x < C(B)
                send J using AV, with deadline C(B)
            else if P(H, AV) > 90%
                send J using AV, with deadline x
}}}

Make an initial pass through the array, sending only jobs that have a percentile requirement.
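The dispatch decision for a single job could be fleshed out roughly as follows. This is a sketch under assumed data structures, not actual scheduler code; the 50/90 thresholds and the field names (`est_runtime`, `afap`, etc.) are placeholders for logic the proposal leaves open.

```python
def choose_dispatch(batch, app_versions, busy_time, p, c_of_batch):
    # Decide whether to send a job of `batch`, and with what deadline.
    #   busy_time:  BT for the processor type T.
    #   p[av_id]:   percentile P(H, AV) of this host per app version.
    #   c_of_batch: estimated batch completion C(B), or None if no
    #               estimate yet (assumed only for AFAP batches here).
    # Returns (app_version, deadline), or None meaning "don't send".
    for av in app_versions:
        r = av["est_runtime"]   # estimated runtime R of the job under AV
        md = busy_time + r      # MD(J, T): earliest feasible deadline
        if batch["afap"] and c_of_batch is None:
            # AFAP batch with no estimate yet: use only top-50% hosts.
            if p[av["id"]] > 50:
                return av, busy_time + r
        else:
            if md < c_of_batch:
                # The normal deadline C(B) is achievable on this host.
                return av, c_of_batch
            elif p[av["id"]] > 90:
                # Tight deadline: trust only top-10% hosts.
                return av, md
    return None
```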

Notes:

 * The 50% and 90% thresholds can be parameterized.
 * Retries are not handled differently at this level, although we could add a restriction such as sending them only to top-50% hosts.
 * In the startup case (e.g. a new app), no hosts will be high-percentile. How do we avoid starvation?
 * I think that score-based scheduling is now deprecated. The feasibility and/or desirability of a job may depend on what other jobs we're sending, so it doesn't make sense to assign it a score in isolation. It's simpler to scan jobs and make a final decision for each one. There are a few properties we need to give priority to:
   * limited locality scheduling
   * beta jobs
   We can handle these in separate passes, as we're doing now.