Changes between Version 8 and Version 9 of PortalFeatures


Timestamp: Jul 24, 2011, 11:15:32 PM (13 years ago)
Author: davea

Legend:

 * Unmodified: no prefix
 * Added (v9 only): prefixed with +
 * Removed (v8 only): prefixed with -
 * Modified: shown as a removed line followed by the added line
  • PortalFeatures

= Scheduling for multi-user projects =

- This document proposes two related additions to BOINC to facilitate using
- a BOINC project as a computing resource for science portals:
-
-  * A mechanism for allocating computing resources among scientists
-  * A mechanism for handling batches of jobs
-
== Computing share ==

Demand for a portal's computing power may exceed supply,
- in which case we need a mechanism for dividing the supply among scientists.
- Assume that every job is associated with a particular BOINC user,
+ in which case we need a mechanism for dividing the supply among users.
+ Assume that every job is associated with a particular user,
representing either a scientist or some other organizational entity,
- and that each such user has a numeric '''computing share''' representing
+ and that each such user has a numeric '''quota''' representing
the fraction of the computing resource they should get.

- The way in which computing shares are determined is outside our scope,
- but some possibilities are:
-
-  * Each scientist attaches their own PCs to the BOINC project,
-    and their computing share is the recent average credit of these hosts.
-  * Volunteers attached to the project can assign shares of their resource
-    to scientists, and a scientist's overall share is the sum of these,
-    weighted by the volunteer's average credit.
-  * A scientist's share is increased if they contribute to the portal,
-    e.g. by participating in the message boards.
-
- The set of users may be large (1000s) and dynamic.
- Typically, only a few may have jobs at any given point.
-
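As an illustration of the second possibility above (volunteers assigning fractions of their resource to scientists), here is a minimal Python sketch of computing shares as a credit-weighted sum. The data layout and function name are invented for illustration; this is not BOINC code.

{{{#!python
# Illustrative sketch (not BOINC code): compute each scientist's share
# as the sum of the fractions assigned by volunteers, weighted by each
# volunteer's recent average credit (RAC), then normalized to sum to 1.

def computing_shares(assignments, volunteer_rac):
    """
    assignments: {volunteer_id: {scientist_id: fraction}}, with each
                 volunteer's fractions summing to at most 1
    volunteer_rac: {volunteer_id: recent average credit}
    returns: {scientist_id: normalized share}
    """
    raw = {}
    for vol, fracs in assignments.items():
        rac = volunteer_rac.get(vol, 0.0)
        for sci, frac in fracs.items():
            raw[sci] = raw.get(sci, 0.0) + frac * rac
    total = sum(raw.values())
    return {sci: x / total for sci, x in raw.items()} if total else {}

# Example: two volunteers splitting their resources between two scientists.
shares = computing_shares(
    {"v1": {"alice": 0.7, "bob": 0.3}, "v2": {"bob": 1.0}},
    {"v1": 2000.0, "v2": 500.0},
)
print(shares)  # alice: 1400/2500 = 0.56, bob: 1100/2500 = 0.44
}}}
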
- What are the precise semantics of computing share?
+ What are the precise semantics of quotas?
Ideally the mechanism should accommodate two classes of scientists:

…
   then submit a batch that takes a day to complete using the entire resource.

- To accommodate both extremes,
- we propose a system in which each user has an associated '''debt''',
- i.e. how much computation the system owes to that user.
+ === Scheduling batches ===

- A user's debt continually increases at a rate equal to their computing share.
- Debt is capped at a value corresponding to, say,
- use of the entire computing resource for 1 week.
+ Scheduling batches involves two decisions:

- When a job completes (successfully or not) its user's debt is decremented
- by the amount of computing done
- (more precisely, by the amount of computing that would have been done
- by the job's resources running at peak speed for the job's elapsed time).
+  * which jobs to send to which hosts
+  * what deadlines to assign to jobs

- In this scheme, latency-oriented users can build up a debt that allows their
- batches (if they are sufficiently infrequent) to get the full computing resource
- and finish as fast as possible.
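
The debt mechanism proposed in the removed text can be illustrated with a small Python sketch: debt accrues in proportion to the user's computing share, is capped at about a week of the full resource, and is decremented by peak-speed work when a job finishes. The class, field names, and constants are hypothetical, not part of BOINC.

{{{#!python
# Illustrative sketch of the (removed) debt mechanism: debt accrues at a
# rate proportional to the user's computing share, is capped, and is
# decremented by the peak-speed work of each completed job.
# Names and constants are hypothetical, not part of BOINC.

WEEK_SECONDS = 7 * 24 * 3600

class UserDebt:
    def __init__(self, share, total_flops_per_sec):
        self.share = share                       # fraction of the resource
        self.total = total_flops_per_sec         # whole project's peak FLOPS/s
        self.debt = 0.0                          # FLOPs owed to this user
        self.cap = total_flops_per_sec * WEEK_SECONDS  # ~1 week of full resource

    def accrue(self, dt_seconds):
        # Debt grows continually at a rate equal to the user's share.
        self.debt = min(self.cap, self.debt + self.share * self.total * dt_seconds)

    def job_finished(self, peak_flops_per_sec, elapsed_seconds):
        # Decrement by the computing the job's resources could have done
        # at peak speed for the elapsed time (whether or not it succeeded).
        self.debt -= peak_flops_per_sec * elapsed_seconds

# Example: a user with a 10% share on a 1 TFLOPS pool.
u = UserDebt(share=0.1, total_flops_per_sec=1e12)
u.accrue(3600)                 # one hour passes
u.job_finished(5e10, 1200)     # a 50 GFLOPS host ran a job for 20 minutes
print(u.debt)
}}}
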
+ Arnaud Legrand suggested the following general technique
+ for making these decisions at any point during
+ the batch's execution.

- == Batch scheduling ==
+  * for each host, estimate the delay until it can start processing jobs,
+    and the rate at which it can process jobs
+  * find the minimum time T at which, if all hosts begin processing
+    jobs as soon as possible, the batch will be completed
+  * Let H be the set of hosts which complete at least one job by time T.
+  * Send jobs only to hosts in H, and give them deadline T.

- The second requirement of science portals is support for '''batches''':
- groups of jobs that have value only when all of the jobs have been completed.
- The interval between batch submission and completion is called '''makespan'''.
+ This algorithm can be repeated as jobs complete or time out,
+ and as new hosts arrive.
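
A minimal Python sketch of this technique, assuming each host is summarized by a startup delay and a steady job-processing rate; the binary search over T and the function names are illustrative, not BOINC code.

{{{#!python
# Sketch of the batch-scheduling technique above (assumed host model:
# each host i has a startup delay d_i and a steady job rate r_i).
# Find the least time T by which the batch of n jobs can finish if every
# host starts as soon as it can, then keep only hosts that finish at
# least one job by T.  Illustrative only.

import math

def jobs_done_by(t, delay, rate):
    # Whole jobs a host finishes by time t if it starts at 'delay'.
    return max(0, math.floor((t - delay) * rate)) if t > delay else 0

def plan_batch(n_jobs, hosts, eps=1e-3):
    """hosts: list of (delay_seconds, jobs_per_second). Returns (T, H)."""
    lo, hi = 0.0, 1.0
    while sum(jobs_done_by(hi, d, r) for d, r in hosts) < n_jobs:
        hi *= 2.0                      # exponential search for an upper bound
    while hi - lo > eps:               # binary search for the minimum T
        mid = (lo + hi) / 2.0
        if sum(jobs_done_by(mid, d, r) for d, r in hosts) >= n_jobs:
            hi = mid
        else:
            lo = mid
    T = hi
    H = [i for i, (d, r) in enumerate(hosts) if jobs_done_by(T, d, r) >= 1]
    return T, H

# Example: 100 jobs, three hosts with different delays and speeds.
# The third host starts too late to finish a job by T, so it gets none.
T, H = plan_batch(100, [(600, 0.01), (0, 0.02), (7200, 0.05)])
print(T, H)   # send jobs only to hosts in H, with deadline T
}}}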

- === Minimizing makespan ===
-
- BOINC's scheduler should be modified to reduce makespan.
- The scheduler has control over two things:
-
-  * which hosts to send job instances to (and when to create new instances)
-  * what deadline to assign to each job
-
- A variety of techniques are possible:
-
-  * Use slower hosts for the first jobs of a batch,
-    with the expectation that the host will finish one job
-    in the duration of the batch
-    (and by extension, very slow hosts should possibly
-    not get any jobs for some batches).
-  * Use faster, more reliable hosts for the last jobs of a batch.
-  * Use increased replication for the last jobs of a batch
-    to reduce the expected minimum turnaround.
-
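The removed techniques above could be combined into a single dispatch rule along the following lines; this Python sketch is purely illustrative, and the thresholds, the notion of a host speed percentile, and the reliability figure are invented, not taken from the document.

{{{#!python
# Illustrative combination of the techniques listed above into one policy:
# early in a batch even slow hosts may get a job; near the end, only fast,
# reliable hosts are used and replication is increased.  Thresholds and
# field names are hypothetical, not from BOINC.

def dispatch_policy(batch_fraction_done, host_speed_percentile,
                    host_reliability):
    """Return (eligible, n_replicas) for one host and one candidate job."""
    if batch_fraction_done < 0.8:
        # First jobs of the batch: any host expected to finish one job
        # within the batch's duration is eligible; no extra replication.
        eligible = host_speed_percentile >= 0.05
        replicas = 1
    else:
        # Last jobs of the batch: restrict to fast, reliable hosts and
        # replicate to reduce the expected minimum turnaround.
        eligible = host_speed_percentile >= 0.5 and host_reliability >= 0.95
        replicas = 2
    return eligible, replicas

print(dispatch_policy(0.1, 0.10, 0.90))   # early batch, slow host: (True, 1)
print(dispatch_policy(0.9, 0.10, 0.90))   # late batch, slow host: (False, 2)
print(dispatch_policy(0.9, 0.80, 0.99))   # late batch, fast host: (True, 2)
}}}
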
- These techniques can be combined into policies.
- To study the relative performance of these policies,
- we can develop a simulator that takes as input a description
- of a real volunteer host population (e.g., SETI@home)
- and the parameters of a random batch arrival process,
- and computes the statistics of makespan,
- and perhaps the difference between predicted and actual makespan (see below).
-
- === Estimating makespan ===
-
- Given the description of a possible new batch,
- the system should be able to provide a reasonably accurate estimate
- of its makespan (perhaps with error bars).
- The computation of this estimate must take into account:
-
-  * The computing resource (i.e. the distributions of speed and availability
-    of the volunteer hosts, and their number).
-  * The existing workload (batches currently in progress, and their state)
-  * The parameters of the batch
-  * The computing share and debt of the requesting user
-  * The batch scheduling policy
-
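One simplified way to produce such an estimate with error bars is a Monte Carlo simulation over sampled hosts, as in the Python sketch below. It models only host speed and availability; the existing workload, the requesting user's share and debt, and the scheduling policy (all listed above) are omitted and would need to be added. All names are hypothetical.

{{{#!python
# Illustrative Monte Carlo estimate of batch makespan with error bars.
# Only host speed and availability are modeled; other factors listed
# above are ignored in this sketch.

import heapq
import random
import statistics

def simulate_once(n_jobs, job_flops, hosts):
    """hosts: list of (flops_per_sec, availability_fraction).
    Greedy: always give the next job to the host that frees up first."""
    heap = [(0.0, fps * avail) for fps, avail in hosts]  # (free_time, eff_speed)
    heapq.heapify(heap)
    finish = 0.0
    for _ in range(n_jobs):
        free_time, speed = heapq.heappop(heap)
        done = free_time + job_flops / speed
        finish = max(finish, done)
        heapq.heappush(heap, (done, speed))
    return finish

def estimate_makespan(n_jobs, job_flops, host_sampler, n_hosts, trials=100):
    samples = [
        simulate_once(n_jobs, job_flops, [host_sampler() for _ in range(n_hosts)])
        for _ in range(trials)
    ]
    return statistics.mean(samples), statistics.stdev(samples)

# Example: 500 jobs of 1e13 FLOPs on 200 hosts with random speed/availability.
def sample_host():
    return (random.uniform(1e9, 5e10), random.uniform(0.3, 0.9))

mean, err = estimate_makespan(500, 1e13, sample_host, 200)
print(f"estimated makespan: {mean:.0f} s +/- {err:.0f} s")
}}}
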
- In addition, the system should provide users with a web-based interface
- for viewing batches in progress,
- showing their fraction done,
- an updated estimate of their completion time,
- information about errors,
- and a button for aborting the batch.
-
- === Job DAGs ===
-
- Batches are a special case of Directed Acyclic Graphs (DAGs) of jobs.
- The nodes in such a graph represent jobs;
- there is an edge from job A to job B if A must be completed
- before B can be started.
-
- A batch can be viewed as a graph consisting of
- a "start node" fanning out to N job nodes,
- then fanning back in to a "finish node".
-
- Science portals may offer the ability to submit collections of work
- that can be described as DAGs but not as batches:
- for example, a collection of units, each of which consists of
- running program A and using its output as input to program B.
-
- (Note: BOINC's wrapper mechanism, which allows multiple applications
- to be run in sequence as part of a BOINC job,
- supports some simple cases, but is not general because it
- requires dependent jobs to run on a single host).
-
- As an extension of the work described above,
- we could support DAGs rather than just batches.
-
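A minimal Python sketch of the job-DAG representation described above: nodes are jobs, an edge from A to B means A must finish before B can start, and a query returns the jobs that are currently ready to dispatch. Illustrative only; not part of BOINC.

{{{#!python
# Minimal sketch of a job DAG: edge (a, b) means job a must complete
# before job b can start.  ready_jobs() returns jobs whose predecessors
# are all done, i.e. the jobs the scheduler could dispatch next.

from collections import defaultdict

class JobDag:
    def __init__(self):
        self.preds = defaultdict(set)   # job -> unfinished predecessors
        self.succs = defaultdict(set)   # job -> successors
        self.jobs = set()
        self.done = set()

    def add_job(self, job):
        self.jobs.add(job)

    def add_edge(self, a, b):
        self.add_job(a)
        self.add_job(b)
        self.preds[b].add(a)
        self.succs[a].add(b)

    def ready_jobs(self):
        return [j for j in self.jobs
                if j not in self.done and not self.preds[j]]

    def mark_done(self, job):
        self.done.add(job)
        for s in self.succs[job]:
            self.preds[s].discard(job)

# Example: each unit runs program A, then feeds its output to program B.
dag = JobDag()
for i in range(3):
    dag.add_edge(f"A{i}", f"B{i}")
print(dag.ready_jobs())           # the A jobs, in some order
dag.mark_done("A0")
print(sorted(dag.ready_jobs()))   # now includes 'B0'
}}}
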
- == Combining multiple scheduling types ==
-
- BOINC potentially supports the following types of work,
- each with its own scheduling mechanism:
-
-  * Job-stream: the input is a stream of jobs;
-    the goal is to maximize throughput and eventually finish all jobs.
-  * Batch: described above.
-  * Locality: a variant of job-stream in which
-    jobs are assigned according to the files resident on the client.
-  * Co-scheduling: like batch, except all jobs in each batch
-    must run simultaneously (e.g., MPI applications).
-
- A single project (such as a portal) may have work of some or all types.
-
- How should we handle multiple scheduling types -
- in particular, how can we maintain resource shares in their presence?
-
- One simple approach is to maintain, for each scheduling type T,
- the maximum debt D(T) of any user with an unsent job of type T.
- Then use this top-level policy:
- {{{
- for each scheduling type T in order of descending D(T)
-         send jobs of type T
- }}}
- How to maintain D(T):
-
-  * Job-stream: maintain in the feeder (consider only jobs in the cache)
-  * Locality: constraint: only 1 user can use locality scheduling
-  * Batch: maintain in batch module
-  * Co-scheduling: maintain in co-scheduling module
-
- == Maintaining debts ==
-
- A user's debt is adjusted each time a job is dispatched,
- based on the estimated FLOPS of the job.
- This quantity is recorded in the job record.
-
- When a job finishes or times out
- (at which point the actual computing done is known)
- the user's debt is adjusted accordingly.
-
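The removed debt-maintenance rule might look like the following Python sketch, assuming the dispatch-time adjustment is a provisional charge that is corrected once the actual computing is known. The names and the sign convention are assumptions, not BOINC code.

{{{#!python
# Sketch of the (removed) debt-maintenance rule: charge the user an
# estimated amount when a job is dispatched, record it with the job,
# and correct the charge when the job finishes or times out and the
# actual computing done is known.

class Job:
    def __init__(self, estimated_flops):
        self.estimated_flops = estimated_flops
        self.charged_flops = 0.0     # recorded in the job record at dispatch

def on_dispatch(user_debt, job):
    user_debt["debt"] -= job.estimated_flops   # provisional charge
    job.charged_flops = job.estimated_flops

def on_completion(user_debt, job, actual_flops):
    # Replace the provisional charge with the actual work done
    # (the same adjustment applies when the job times out).
    user_debt["debt"] += job.charged_flops - actual_flops

# Example: the estimate was 1e12 FLOPs, the job actually did 1.5e12.
user = {"debt": 5e12}
j = Job(estimated_flops=1e12)
on_dispatch(user, j)
on_completion(user, j, actual_flops=1.5e12)
print(user["debt"])   # 5e12 - 1.5e12 = 3.5e12
}}}
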
- == An idea for prioritizing batches ==
+ == Prioritizing batches ==

Goals:
…
job as a 1-element batch.

- The scheme doesn't use the idea of
- accumulated credit proposed above.
- This is replaced by LST(U).
+ == Combining multiple scheduling types ==
+
+ BOINC potentially supports the following types of work,
+ each with its own scheduling mechanism:
+
+  * Job-stream: the input is a stream of jobs;
+    the goal is to maximize throughput and eventually finish all jobs.
+  * Batch: described above.
+  * Locality: a variant of job-stream in which
+    jobs are assigned according to the files resident on the client.
+  * Co-scheduling: like batch, except all jobs in each batch
+    must run simultaneously (e.g., MPI applications).
+
+ A single project (such as a portal) may have work of some or all types.
+
+ How should we handle multiple scheduling types -
+ in particular, how can we maintain resource shares in their presence?
+
+ One simple approach is to maintain, for each scheduling type T,
+ the least LET(T) of any unsent job of type T.
+ Then use this top-level policy:
+ {{{
+ for each scheduling type T in order of increasing LET(T)
+         send jobs of type T
+ }}}
+ How to maintain LET(T):
+
+  * Job-stream: maintain in the feeder (consider only jobs in the cache)
+  * Locality: constraint: only 1 user can use locality scheduling
+  * Batch: maintain in batch module
+  * Co-scheduling: maintain in co-scheduling module
+
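A Python sketch of the top-level policy in the added section: each scheduling type reports the least LET of its unsent jobs (LET is defined in the prioritizing-batches section, which is largely elided in this diff), and types are served in increasing order of that value. The module interface here is invented for illustration and is not BOINC's.

{{{#!python
# Sketch of the top-level policy above: ask each scheduling type for the
# least LET among its unsent jobs, then send jobs type by type in
# increasing order of that value.  Illustrative interface only.

import math

class SchedulingType:
    def __init__(self, name, unsent_job_lets):
        self.name = name
        self.unsent_job_lets = unsent_job_lets   # LET of each unsent job

    def least_let(self):
        # LET(T): the least LET of any unsent job of this type.
        return min(self.unsent_job_lets, default=math.inf)

    def send_jobs(self, request):
        print(f"sending {self.name} jobs for request {request}")

def handle_scheduler_request(request, scheduling_types):
    # Top-level policy: serve types in order of increasing LET(T).
    for s in sorted(scheduling_types, key=lambda s: s.least_let()):
        s.send_jobs(request)

handle_scheduler_request("host-42", [
    SchedulingType("job-stream", [300.0, 120.0]),
    SchedulingType("batch", [60.0]),
    SchedulingType("co-scheduling", []),          # no unsent jobs
])
# Order: batch (60), job-stream (120), co-scheduling (no jobs, so last)
}}}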