= Client scheduling policies =

This document describes three related parts of the BOINC core client:

 '''CPU scheduling policy'''::
        Of the results that are runnable, which ones to execute? BOINC will generally execute NCPUS results at once, where NCPUS is the minimum of the physical number of CPUs (counting hyperthreading) and the user's 'max_cpus' general preference.
 '''CPU scheduling enforcement'''::
        When should the CPU scheduler's decisions actually be put into effect, i.e. when should currently running results be preempted in favor of the results it has scheduled?
 '''Work-fetch policy'''::
        When should the core client ask a project for more work, which project should it ask, and how much work should it ask for?

The goals of these policies are (in descending priority):

 1. Results should be completed and reported by their deadline (because results reported after their deadline may not have any value to the project and may not be granted credit).
 1. NCPUS processors should be kept busy.
 1. At any given point, the computer should have enough work so that its NCPUS processors will be busy for at least min_queue days (but not much more than that).
 1. Project resource shares should be honored over the long term.
 1. If a computer is attached to multiple projects, execution should rotate among projects on a frequent basis ('variety').

In previous versions of BOINC, the core client attempted to maintain at least one result for each attached project, and would do weighted round-robin CPU scheduling among all projects. In some scenarios (any combination of a slow computer, many projects, and tight deadlines) a computer could miss the deadlines of all its results. The new policies solve this problem as follows:

 * Work fetch is limited to ensure that deadlines can be met. A computer attached to 10 projects might have work for only a few (perhaps only one) at a given time.
 * If deadlines are threatened, the CPU scheduling policy optimizes the likelihood of meeting deadlines, at the expense of variety.

== Concepts and terms ==

=== Wall CPU time ===

'''Wall CPU time''' is the amount of wall-clock time a process has been runnable at the OS level. The actual CPU time may be less than this, e.g. if the process does a lot of paging, or if other (non-BOINC) jobs run at the same time.

BOINC uses wall CPU time as its measure of CPU resource usage. Wall CPU time is fairer than actual CPU time in the case of paging apps. In addition, measuring actual CPU time depends on apps reporting it correctly, which they may not do.

=== Normalized CPU time ===

The '''normalized CPU time''' of a result is an estimate of the wall time it will take to complete, taking into account

 * the fraction of time BOINC runs ('on-fraction'),
 * the fraction of that time during which BOINC is allowed to compute ('active-fraction'), and
 * CPU efficiency (the ratio of actual CPU time to wall CPU time),

but not taking into account the project's resource share.

=== Project-normalized CPU time ===

The '''project-normalized CPU time''' of a result is an estimate of the wall time it will take to complete, taking into account the above factors plus the project's resource share relative to other potentially runnable projects.

The 'work_req' element of a scheduler RPC request is in units of project-normalized CPU time. In deciding how much work to send, the scheduler must take into account the project's resource share fraction, and the host's on-fraction and active-fraction.

For example, suppose a host has 1 GFLOP/sec CPUs, the project's resource share fraction is 0.5, the host's on-fraction is 0.8 and the host's active-fraction is 0.9. Then the expected processing rate per CPU is

{{{
(1 GFLOP/sec)*0.5*0.8*0.9 = 0.36 GFLOP/sec
}}}

If the host requests 1000 project-normalized CPU seconds of work, the scheduler should send it at least 360 GFLOPs of work.

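For illustration, this computation can be written as a short helper; a minimal sketch, where the function and parameter names are illustrative rather than part of the BOINC code base:

{{{
# Minimal sketch of the computation above; names are illustrative and not
# part of the BOINC code base.

def work_to_send(work_req_seconds, cpu_gflops_per_sec,
                 resource_share_fraction, on_fraction, active_fraction):
    # Expected processing rate per CPU, in GFLOP/sec.
    rate = (cpu_gflops_per_sec * resource_share_fraction
            * on_fraction * active_fraction)
    # GFLOPs needed to cover the requested project-normalized CPU seconds.
    return work_req_seconds * rate

# The example from the text:
print(work_to_send(1000, 1.0, 0.5, 0.8, 0.9))   # -> 360.0
}}}
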
=== Result states ===

R is '''runnable''' if
 * Neither R nor R.project is suspended, and
 * R's input files have been downloaded, and
 * R hasn't finished computing

R is '''nearly runnable''' if
 * Neither R nor R.project is suspended, and
 * None of R's input files is in a 'download deferred' state, and
 * R hasn't finished computing

=== Project states ===

P is '''runnable''' if
 * P has at least one runnable result (this implies that P is not suspended).

P is '''downloading''' if
 * P is not suspended, and
 * P has at least one result whose files are being downloaded and none of the downloads is deferred.

P is '''fetchable''' (i.e. the work-fetch policy allows work to be fetched from it) if
 * P is not suspended, and
 * P is not deferred (i.e. its minimum RPC time is in the past), and
 * P is not overworked (defined below), and
 * a fetch of P's master file is not pending

P is '''latency-limited''' if
 * The client's last scheduler RPC to P returned a 'no work because of deadlines' flag, and
 * the RPC reply's delay request has not yet elapsed.

This means that P has work available, but didn't send any because the work's deadlines couldn't be met given the existing work queue.

P is '''potentially runnable''' if
 * P is either runnable, downloading, fetchable, overworked, or latency-limited.

This means that, to the best of the client's knowledge, it could do work for P if it wanted to.

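These state definitions translate directly into boolean predicates. Below is a minimal sketch; the field and function names are assumptions for exposition, not the client's actual data structures:

{{{
# Illustrative sketch of the state predicates above. The fields and
# function names are assumptions, not BOINC's actual code.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Result:
    suspended: bool = False
    files_downloaded: bool = False
    download_deferred: bool = False
    finished: bool = False

@dataclass
class Project:
    suspended: bool = False
    results: List[Result] = field(default_factory=list)
    min_rpc_time: float = 0.0        # earliest time another RPC is allowed
    master_fetch_pending: bool = False
    long_term_debt: float = 0.0

def result_runnable(r: Result, p: Project) -> bool:
    return (not r.suspended and not p.suspended
            and r.files_downloaded and not r.finished)

def result_nearly_runnable(r: Result, p: Project) -> bool:
    return (not r.suspended and not p.suspended
            and not r.download_deferred and not r.finished)

def project_runnable(p: Project) -> bool:
    return any(result_runnable(r, p) for r in p.results)

def project_fetchable(p: Project, now: float, sched_period: float) -> bool:
    overworked = p.long_term_debt < -sched_period  # see 'Work-fetch policy'
    return (not p.suspended and p.min_rpc_time <= now
            and not overworked and not p.master_fetch_pending)
}}}
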
=== Debt ===

Intuitively, a project's 'debt' is how much work is owed to it, relative to other projects. BOINC uses two types of debt; each is defined for a set S of projects. In each case, the debt is recalculated periodically as follows:
 * A = the wall CPU time used by projects in S during this period
 * R = sum of resource shares of projects in S
 * For each project P in S:
   * F = P.resource_share / R (P's fractional resource share)
   * W = A*F (the wall CPU time P should have gotten during this period)
   * P.debt += W - (the wall CPU time P actually got during this period)
 * P.debt is normalized so that the mean or minimum is zero.

'''Short-term debt''' is used by the CPU scheduler. It is adjusted over the set of runnable projects. It is normalized so that minimum short-term debt is zero, and maximum short-term debt is no greater than 86,400 (i.e. one day).

'''Long-term debt''' is used by the work-fetch policy. It is defined for all projects, and adjusted over the set of potentially runnable projects. It is normalized so that average long-term debt, over all projects, is zero.

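A minimal sketch of one recalculation period, assuming each project object carries resource_share and debt attributes (names are illustrative):

{{{
# Sketch of one debt-recalculation period for a set S of projects.
# wall_cpu_used maps each project to the wall CPU time it actually got.

def update_debts(S, wall_cpu_used):
    A = sum(wall_cpu_used[p] for p in S)      # total wall CPU time used
    R = sum(p.resource_share for p in S)      # total resource share
    for p in S:
        F = p.resource_share / R              # fractional resource share
        W = A * F                             # what P should have gotten
        p.debt += W - wall_cpu_used[p]        # owed = deserved - received
    # Normalization in the short-term style: minimum zero, at most one day.
    m = min(p.debt for p in S)
    for p in S:
        p.debt = min(p.debt - m, 86400)
}}}
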
== Round-robin simulation ==

The CPU scheduling and work fetch policies use the results of a simulation of weighted round-robin scheduling applied to the set of nearly runnable results. The simulation takes into account on-fraction and active-fraction. It produces the following outputs:

 * deadline_missed(R): whether result R misses its deadline.
 * deadlines_missed(P): the number of P's results R for which deadline_missed(R) holds.
 * shortfall(P): the additional work (in wall CPU seconds) needed to keep P's share of the CPUs busy until min_queue from now.
 * total_shortfall: the additional work needed to keep all CPUs busy until min_queue from now.

In the example below, projects A and B have resource shares 2 and 1 respectively. A has results A1 and A2, and B has result B1. The computer has two CPUs. From time 0 to 4 all three results run with equal weighting. At time 4 result A2 finishes. From time 4 to 8, project A gets only a 0.5 share because it has only one result. At time 8, result A1 finishes.

In this case, shortfall(A) is 4, shortfall(B) is 0, and total_shortfall is 2.

[[Image(http://boinc.berkeley.edu/rr_sim.png)]]

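The simulation can be sketched as a time-stepped loop. The following is a simplified illustration only: it ignores on-fraction and active-fraction, does not redistribute a capped project's unused share, and omits the deadline and per-project shortfall bookkeeping:

{{{
# Simplified sketch of the round-robin simulation (illustrative only).

def rr_simulate(projects, ncpus, horizon, dt=0.01):
    """projects: list of (resource_share, [remaining CPU time per result]).
    Returns total_shortfall over the given horizon."""
    shares = [s for s, _ in projects]
    remaining = [list(results) for _, results in projects]
    total_shortfall = 0.0
    t = 0.0
    while t < horizon:
        active = [i for i in range(len(projects)) if remaining[i]]
        if not active:                            # nothing left to run
            total_shortfall += (horizon - t) * ncpus
            break
        total_share = sum(shares[i] for i in active)
        used = 0.0
        for i in active:
            # Project i's weighted share of the machine, capped by the
            # number of results it can run in parallel.
            cpus = min(ncpus * shares[i] / total_share, len(remaining[i]))
            used += cpus
            work = cpus * dt / len(remaining[i])  # per-result progress
            remaining[i] = [r - work for r in remaining[i] if r - work > 0]
        total_shortfall += (ncpus - used) * dt    # idle CPU time
        t += dt
    return total_shortfall
}}}
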
== CPU scheduling policy ==

The CPU scheduler uses an earliest-deadline-first (EDF) policy for results that are in danger of missing their deadline, and weighted round-robin among other projects if additional CPUs exist. This allows the client to meet deadlines that would otherwise be missed, while honoring resource shares over the long term. The scheduling policy is:

 1. Set the 'anticipated debt' of each project to its short-term debt.
 1. Let P be the project with the earliest-deadline runnable result among projects with deadlines_missed(P)>0. Let R be P's earliest-deadline runnable result not scheduled yet. Tiebreaker: least index in result array.
 1. If such an R exists, schedule R, decrement P's anticipated debt, and decrement deadlines_missed(P).
 1. If there are more CPUs, and projects with deadlines_missed(P)>0, go to 2.
 1. If all CPUs are scheduled, stop.
 1. If there is a result R that is currently running, and has been running for less than the CPU scheduling period, schedule R and go to 5.
 1. Find the project P with the greatest anticipated debt, select one of P's runnable results (picking one that is already running, if possible, else the one received first from the project) and schedule that result.
 1. Decrement P's anticipated debt by the 'expected payoff' (the scheduling period divided by NCPUS).
 1. Go to 5.

The CPU scheduler runs when a result is completed, when the end of the user-specified scheduling period is reached, when new results become runnable, or when the user performs a UI interaction (e.g. suspending or resuming a project or result).

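The steps above can be sketched as follows. This is illustrative only; the helper methods on projects and results (unscheduled_runnable, earliest_deadline, and so on) are assumptions, not BOINC's actual interfaces:

{{{
# Illustrative sketch of the CPU scheduling steps above.

def schedule_cpus(projects, running_results, ncpus, sched_period):
    for p in projects:
        p.anticipated_debt = p.short_term_debt         # step 1
    scheduled = []
    # EDF pass over projects with results in deadline trouble (steps 2-4).
    while len(scheduled) < ncpus:
        edf = [p for p in projects
               if p.deadlines_missed > 0 and p.unscheduled_runnable()]
        if not edf:
            break
        p = min(edf, key=lambda q: q.earliest_deadline())
        scheduled.append(p.pop_earliest_deadline_result())  # step 3
        p.anticipated_debt -= sched_period / ncpus     # assumed payoff amount
        p.deadlines_missed -= 1
    # Weighted round-robin by anticipated debt (steps 5-9).
    while len(scheduled) < ncpus:
        r = next((r for r in running_results
                  if r not in scheduled and r.run_time < sched_period), None)
        if r:                                          # step 6
            scheduled.append(r)
            continue
        rr = [p for p in projects if p.unscheduled_runnable()]
        if not rr:
            break
        p = max(rr, key=lambda q: q.anticipated_debt)
        scheduled.append(p.pick_result())              # step 7
        p.anticipated_debt -= sched_period / ncpus     # step 8: expected payoff
    return scheduled
}}}
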
== CPU schedule enforcement ==

The CPU scheduler decides what results should run, but it doesn't enforce this decision. This enforcement is done by a separate '''scheduler enforcement function''', which is called by the CPU scheduler at its conclusion. Let X be the set of scheduled results that are not currently running, let Y be the set of running results that are not scheduled, and let T be the time the scheduler last ran. The enforcement policy is as follows:

 1. If deadline_missed(R) for some R in X, then preempt a result in Y, and run R (preempt the result with the least wall CPU time since its last checkpoint). Repeat as needed.
 1. If there is a result R in Y that checkpointed more recently than T, then preempt R and run a result in X.

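A sketch of these two rules (field and method names are assumptions):

{{{
# Illustrative sketch of the enforcement rules above.

def enforce_schedule(scheduled, running, T):
    X = [r for r in scheduled if r not in running]   # scheduled, not running
    Y = [r for r in running if r not in scheduled]   # running, not scheduled
    # Rule 1: deadline-missed results preempt immediately; evict the result
    # with the least wall CPU time since its last checkpoint.
    for r in list(X):
        if r.deadline_missed and Y:
            victim = min(Y, key=lambda v: v.wall_time_since_checkpoint)
            victim.preempt()
            Y.remove(victim)
            r.run()
            X.remove(r)
    # Rule 2: a result that has checkpointed since the last scheduler run
    # can be preempted without losing much work.
    for victim in list(Y):
        if victim.last_checkpoint_time > T and X:
            victim.preempt()
            Y.remove(victim)
            X.pop(0).run()
}}}
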
== Work-fetch policy ==

A project P is '''overworked''' if

 * P.long_term_debt < -sched_period

This condition occurs if P's results run in EDF mode (and, in extreme cases, when a project with a large negative LTD is detached). The work-fetch policy avoids getting work from overworked projects. This prevents a situation where a project with short deadlines gets more than its share of CPU time.

The work-fetch policy uses the function

{{{
frs(project P)
}}}

P's fractional resource share among fetchable projects.

The work-fetch policy function is called every few minutes (or as needed) by the scheduler RPC polling function. It sets the variable '''P.work_request_size''' for each project P, which is the number of seconds of work to request if we do a scheduler RPC to P. This is computed as follows:

{{{
     
        and are proportional to P.resource_share
}}}

For non-CPU-intensive projects, P.work_request_size is set to 1 if P has no nearly-runnable result, otherwise 0.

The scheduler RPC mechanism may select a project to contact because of a user request, an outstanding trickle-up message, or a result that is overdue for reporting. If it does so, it will also request work from that project. Otherwise, the RPC mechanism chooses the project P for which

{{{
P.work_request_size > 0 and
P.long_term_debt + shortfall(P) is greatest
}}}

and requests work from that project. Note: P.work_request_size is in units of normalized CPU time, so the actual work request (which is in units of project-normalized CPU time) is P.work_request_size divided by P's resource share fraction relative to potentially runnable projects.

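A sketch of this selection, including the unit conversion described in the note (names are assumptions):

{{{
# Illustrative sketch of choosing which project to ask for work, and of
# converting the request into project-normalized units.

def choose_project_for_work(projects):
    candidates = [p for p in projects if p.work_request_size > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.long_term_debt + p.shortfall)

def actual_work_request(p, potentially_runnable):
    # P's resource share fraction relative to potentially runnable projects.
    frac = p.resource_share / sum(q.resource_share
                                  for q in potentially_runnable)
    # Convert normalized CPU time to project-normalized CPU time.
    return p.work_request_size / frac
}}}
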
----

== Scheduler work-send policy ==

NOTE: the following has not been implemented, and is independent of the above policies.

The scheduler should avoid sending results whose deadlines are likely to be missed, or which are likely to cause existing results to miss their deadlines. This will be accomplished as follows:

 * Scheduler requests include the connection period, a list of queued results (with estimated time remaining and deadline), and project resource fractions.
 * The scheduler won't send results whose deadlines are earlier than now + min_queue.
 * The scheduler does an EDF simulation of the initial workload to determine by how much each result misses its deadline. For each result R being considered for sending, the scheduler repeats the EDF simulation with R added (see the sketch below). If R meets its deadline and no result misses its deadline by more than it did previously, R is sent.
 * If the scheduler has work but doesn't send any because of deadline misses, it returns a 'no work because of deadlines' flag. If the last RPC to a project returned this flag, it is marked as latency-limited and accumulates LTD.

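Here is a simplified sketch of the EDF send check; modeling the host as a single CPU with a fixed processing rate is an assumption made for brevity:

{{{
# Simplified sketch of the EDF send check described above.

def edf_misses(jobs, rate=1.0):
    """jobs: list of (name, remaining_seconds, deadline). Returns a dict
    mapping name -> seconds by which the job misses its deadline
    (<= 0 means it finishes on time)."""
    t = 0.0
    misses = {}
    for name, remaining, deadline in sorted(jobs, key=lambda j: j[2]):
        t += remaining / rate          # run jobs earliest-deadline-first
        misses[name] = t - deadline
    return misses

def ok_to_send(queued, candidate):
    before = edf_misses(queued)
    after = edf_misses(queued + [candidate])
    if after[candidate[0]] > 0:        # candidate would miss its deadline
        return False
    # No existing result may miss its deadline by more than it did before.
    return all(after[n] <= max(before[n], 0) for n, _, _ in queued)
}}}
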
----

== Describing scenarios ==

We encourage the use of the following notation for describing scheduling scenarios (times are given in hours):

P(C, D, R)

This describes a project with

  * C = CPU time per task
  * D = delay bound (deadline) of each task
  * R = fractional resource share

A scenario is described by a list of projects, plus the following optional parameters:
  * NCPUS: number of CPUs (default 1)
  * min_queue
     
  * cpu_scheduling_period

An example scenario description is:
{{{
P1(1000, 2000, .5)
}}}
     
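A scenario description like the one above maps naturally onto a small data structure; a minimal sketch, with assumed names (only the NCPUS default of 1 is stated in the text):

{{{
# Minimal sketch of representing a scenario; names are illustrative.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProjectSpec:
    cpu_time: float         # C: CPU time per task (hours)
    delay_bound: float      # D: delay bound per task (hours)
    resource_share: float   # R: fractional resource share

@dataclass
class Scenario:
    projects: List[ProjectSpec]
    ncpus: int = 1                             # default stated above
    min_queue: Optional[float] = None
    cpu_scheduling_period: Optional[float] = None

# The example above:
example = Scenario(projects=[ProjectSpec(1000, 2000, .5)])
}}}
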
=== Scenario 1 ===

{{{
     
cpu_scheduling_period = 1
}}}

Typically one CPU will process 6-minute tasks for P1, and the other CPU will alternate between P2 and P3. It's critical that the scheduler run each task of P2 and P3 for the full CPU scheduling period. If we went strictly by debt, we'd end up switching between them every 6 minutes, and both P2 and P3 would have to resume from a checkpoint each time. For some apps (e.g. Einstein@home) resuming from a checkpoint takes several minutes. So we'd end up wasting most of the time on one CPU.