Changes between Version 29 and Version 30 of GpuWorkFetch

Author:
davea (IP: 128.32.18.181)
Timestamp:
03/18/09 11:14:55 (8 months ago)
  • GpuWorkFetch

    v29 v30  
    11= Work fetch and GPUs = 
    22 
    3 This document describes changes to BOINC's work fetch mechanism, 
    4 in the 6.7 client and the scheduler as of [17024]. 
     3This document describes changes to BOINC's work fetch mechanism 
     4in the 6.6 client and the scheduler as of [17024]. 
    55 
    66== Problems with the old work fetch policy == 
    2626== Examples == 
    2727 
    28 In following, A and B are projects. 
     28In the following examples, the client is attached 
     29to projects A and B with equal resource share. 
    2930 
    3031=== Example 1 === 
    4950 
    5051Variation: a new project C is attached when A's job finishes. 
    51 It should immediately share the CPU with B. 
     52It should immediately share the CPU 50/50 with B. 
    5253 
    5354=== Example 3 === 
    5657After a year, B gets a GPU app. 
    5758 
    58 Goal: A and B immediately share the GPU. 
    59  
    60 == Resource types == 
    61  
    62 New abstraction: '''processing resource type''' just "resource type". 
     59Goal: A and B immediately share the GPU 50/50. 
     60 
     61== The new policy == 
     62 
     63=== Resource types === 
     64 
     65New abstraction: '''processing resource type''' or just "resource type". 
    6366Examples of resource types: 
    6467 * CPU 
    6568 * A coprocessor type (a kind of GPU, or the SPE processors in a Cell) 
    66  
    67 A job sent to a client is associated with an app version, 
    68 which uses some number (possibly fractional) of CPUs, 
    69 and some number of instances of a particular coprocessor type. 
    70  
    71 == Scheduler request and reply message == 
    72  
    73 New fields in the scheduler request message: 
    74  
    75  '''double cpu_req_secs''':: number of CPU seconds requested 
    76  '''double cpu_req_instances''':: send enough jobs to occupy this many CPUs 
    77  
    78 And for each coprocessor type: 
    79  
    80  '''double req_secs''':: number of instance-seconds requested 
    81  '''double req_instances''':: send enough jobs to occupy this many instances 
    82  
    83 The semantics: a scheduler should send jobs for a resource type 
    84 only if the request for that type is nonzero. 
    85  
    86 For compatibility with old servers, the message still has '''work_req_seconds''', 
    87 which is the max of the req_seconds. 
    88  
    89 == Per-resource-type backoff == 
    90  
    91 We need to handle the situation where e.g. there's a GPU shortfall 
    92 but no projects are supplying GPU work 
    93 (for either permanent or transient reasons). 
    94 We don't want an overall work-fetch backoff from those projects. 
    95  
    96 Instead, we maintain a separate backoff timer per (project, resource type). 
    97 The backoff interval is doubled up to a limit whenever we ask for work of that type and don't get any work; 
     69Currently there are two resource types: CPU and NVIDIA GPUs. 
     70 
     71Summary of the new policy: it's like the old policy, 
     72but with a separate copy for each resource type, 
     73and scheduler requests can now ask for work for particular resource types. 
     74 
     75=== Per-resource-type backoff === 
     76 
     77We need to keep track of whether projects have work for particular 
     78resource types, 
     79so that we don't keep asking them for types of work they don't have. 
     80 
     81To do this, we maintain a separate backoff timer per (project, resource type). 
     82The backoff interval is doubled up to a limit (1 day) 
     83whenever we ask for work of that type and don't get any work; 
    9884it's cleared whenever we get a job of that type. 
    99  
    100 There is still an overall backoff timer for each project. 
    101 This is triggered by: 
    102  * requests from the project 
    103  * RPC failures 
    104  * job errors 
    105 and so on. 
    106  
    10785Note: if we decide to ask a project for work for resource A, 
    10886we may ask it for resource B as well, even if it's backed off for B. 
    10987 
    110 == Long-term debt == 
     88This is independent of the overall backoff timer for each project, 
     89which is triggered by requests from the project, 
     90RPC failures, job errors and so on. 
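
The per-(project, resource type) backoff described above can be sketched roughly as follows. This is an illustration only, not the client's code: the class name `ResourceBackoff`, its method names, and the 60-second starting interval are assumptions. Only the doubling, the 1-day limit, and the clearing on success come from the text.

```python
MAX_BACKOFF = 24 * 3600.0   # 1-day limit, from the text
MIN_BACKOFF = 60.0          # assumed starting interval (not given in the text)


class ResourceBackoff:
    """Backoff state for one (project, resource type) pair (hypothetical name)."""

    def __init__(self):
        self.interval = 0.0   # current backoff interval, in seconds
        self.until = 0.0      # do not ask for this type of work before this time

    def request_failed(self, now):
        # We asked for work of this type and got none:
        # start at MIN_BACKOFF, then double up to the limit.
        if self.interval == 0.0:
            self.interval = MIN_BACKOFF
        else:
            self.interval = min(self.interval * 2, MAX_BACKOFF)
        self.until = now + self.interval

    def got_job(self):
        # We received a job of this type: clear the backoff.
        self.interval = 0.0
        self.until = 0.0

    def backed_off(self, now):
        return now < self.until
```

One such object would exist per (project, resource type) pair, separate from the overall per-project backoff timer.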
     91 
     92=== Long-term debt === 
    11193 
    11294We continue to use the idea of '''long-term debt''' (LTD), 
    145127 * An offset is added so that the maximum debt across all projects is zero (this ensures that when a new project is attached, it starts out debt-free). 
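
The offset step in the last item can be illustrated with a small sketch. The function name `normalize_debts` and the dictionary representation are assumptions made for the example; the client stores debts in its own data structures.

```python
def normalize_debts(debts):
    """Shift every debt by the same offset so that the maximum is zero.

    debts: dict mapping a project name to its long-term debt (hypothetical layout).
    Differences between projects are preserved.
    """
    if not debts:
        return {}
    offset = max(debts.values())
    return {name: debt - offset for name, debt in debts.items()}
```

With debts of 100 for A and -50 for B, the result is 0 for A and -150 for B, so a newly attached project starting at zero debt is level with the project that is owed the most.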
    146128 
    147  
    148 == Client data structures == 
    149  
    150 === RSC_WORK_FETCH === 
    151  
    152 Work-fetch state for a particular resource types. 
    153 There are instances for CPU ('''cpu_work_fetch''') and NVIDIA GPUs ('''cuda_work_fetch'''). 
    154 Data members: 
    155  
    156  '''ninstances''':: number of instances of this resource type 
    157  
    158 Used/set by rr_simulation()): 
    159  
    160  '''double shortfall''':: shortfall for this resource 
    161  '''double nidle''':: number of currently idle instances 
    162  
    163 Member functions: 
    164  
    165  '''rr_init()''':: called at the start of RR simulation.  Compute project shares for this PRSC, and clear overall and per-project shortfalls. 
    166  '''set_nidle()''':: called by RR sim after initial job assignment. 
    167 Set nidle to # of idle instances. 
    168  '''accumulate_shortfall()''':: called by RR sim for each time interval during work buf period. 
    169 {{{  
    170 shortfall += dt*(ninstances - instances in use) 
    171 for each project p not backed off for this PRSC 
    172     p->PRSC_PROJECT_DATA.accumulate_shortfall(dt) 
    173 }}} 
    174  
    175  '''select_project()''':: select the best project to request this type of work from. It's the project not backed off for this PRSC, and for which LTD + p->shortfall is largest, also taking into consideration overworked projects etc. 
    176  
    177  '''accumulate_debt(dt)''':: 
    178 for each project p: 
    179 {{{ 
    180 x = insts of this device used by P's running jobs 
    181 y = P's share of this device 
    182 update P's LTD 
    183 }}} 
    184  
    185 === RSC_PROJECT_WORK_FETCH === 
    186  
    187 State for a (resource type, project pair). 
    188 It has the following "persistent" members (i.e., saved in state file): 
    189  
    190  '''backoff_interval'''::  how long to wait until ask project for work specifically for this PRSC; 
    191 double this any time we ask for work for this rsc and get none (maximum 24 hours). Clear it when we ask for work for this PRSC and get some job. 
    192  '''backoff_time''':: back off until this time 
    193  '''debt''': long term debt 
    194  
    195 And the following transient members (used by rr_simulation()): 
    196  
    197  '''double runnable_share''':: # of instances this project should get based on resource share 
    198 relative to the set of projects not backed off for this PRSC. 
    199  '''instances_used''':: # of instances currently being used 
    200  
    201 === PROJECT_WORK_FETCH === 
    202  
    203 Per-project work fetch state. 
    204 Members: 
    205  '''overall_debt''':: weighted sum of per-resource debts 
    206  
    207 === WORK_FETCH === 
    208  
    209 Overall work-fetch state. 
    210  
    211  '''PROJECT* choose_project()''':: choose a project from which to fetch work. 
    212  
    213  * Do round-robin simulation 
    214  * if a GPU is idle, choose a project to ask for that type of work (using RSC_WORK_FETCH::choose_project()) 
    215  * if a CPU is idle, choose a project to ask for CPU work 
    216  * if GPU has a shortfall, choose a project to ask for GPU work 
    217  * if CPU has a shortfall, choose a project to ask for CPU work 
    218  In the case where a resource type was idle, ask for only that type of work. 
     129=== Summary of the new policy === 
     130 
     131Every 60 seconds, and when various events happen (e.g. jobs finish), 
     132the following is done. 
     133CI is the "connect interval" preference; 
     134AW is the "additional work" preference. 
     135 
     136Auxiliary functions: 
     137 
     138'''get_major_shortfall(resource)''' 
     139 
     140If the resource will have an idle instance before CI, 
     141return the greatest-overall-debt non-backed-off project P 
     142(P may be overworked).  Otherwise return NULL. 
     143 
     144'''get_minor_shortfall(resource)''' 
     145 
     146If the resource will have an idle instance between CI and CI+AW, 
     147return the greatest-overall-debt non-backed-off non-overworked project P. Otherwise return NULL. 
     148 
     149'''get_starved_project(resource)''' 
     150 
     151If any project is not overworked, not backed off, and has no runnable jobs 
     152for any resource, return the one with greatest overall debt. Otherwise return NULL. 
     153 
     154Main logic: 
     155 * Do a round-robin simulation of currently queued jobs. 
     156 * p = get_major_shortfall(NVIDIA GPU); if p <> NULL, ask it for work and return 
     157 * ... same for other coprocessor types (we assume that coprocessors are faster, hence more important, than CPU) 
     158 * ... same, for CPU 
     159 * p = get_minor_shortfall(NVIDIA GPU); if p <> NULL, ask it for work and return 
     160 * ... same for other coprocessor types, then CPU 
     161 * p = get_starved_project(NVIDIA GPU); if p <> NULL, ask it for work and return 
     162 * ... same for other coprocessor types, then CPU 
     163 
     164In the get_major_shortfall() case, ask only for work of that resource type. 
    219165Otherwise ask for all types of work for which there is a shortfall. 
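
The selection order above can be sketched as follows. This is a simplified illustration under stated assumptions, not the client implementation: the `Project` and `Resource` classes, their fields, and `choose_project()` are hypothetical, and the results of the round-robin simulation are represented by two boolean flags per resource.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    overall_debt: float
    overworked: bool = False
    has_runnable_jobs: bool = True                 # runnable jobs for any resource
    backed_off: set = field(default_factory=set)   # resource names backed off


@dataclass
class Resource:
    name: str
    idle_before_ci: bool = False   # RR sim: an instance goes idle before CI
    idle_in_aw: bool = False       # RR sim: an instance goes idle between CI and CI+AW


def _best(projects, ok):
    # Greatest overall debt among the projects accepted by ok(), else None.
    candidates = [p for p in projects if ok(p)]
    return max(candidates, key=lambda p: p.overall_debt, default=None)


def get_major_shortfall(rsc, projects):
    if not rsc.idle_before_ci:
        return None
    return _best(projects, lambda p: rsc.name not in p.backed_off)


def get_minor_shortfall(rsc, projects):
    if not rsc.idle_in_aw:
        return None
    return _best(projects,
                 lambda p: rsc.name not in p.backed_off and not p.overworked)


def get_starved_project(rsc, projects):
    return _best(projects,
                 lambda p: not p.overworked
                 and rsc.name not in p.backed_off
                 and not p.has_runnable_jobs)


def choose_project(resources, projects):
    """resources must be ordered coprocessors first, CPU last.

    Returns (project, resource name, case), or (None, None, None).
    """
    cases = (("major", get_major_shortfall),
             ("minor", get_minor_shortfall),
             ("starved", get_starved_project))
    for case, fn in cases:
        for rsc in resources:
            p = fn(rsc, projects)
            if p is not None:
                return p, rsc.name, case
    return None, None, None
```

The returned case tells the caller what to request: in the "major" case only work of the returned resource type, otherwise work for every type that has a shortfall.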
    220166 
    221 == Scheduler changes == 
     167== Implementation notes == 
     168 
     169A job sent to a client is associated with an app version, 
     170which uses some number (possibly fractional) of CPUs, 
     171and some number of instances of a particular coprocessor type. 
     172 
     173=== Scheduler request and reply message === 
     174 
     175New fields in the scheduler request message: 
     176 
     177 '''double cpu_req_secs''':: number of CPU seconds requested 
     178 '''double cpu_req_instances''':: send enough jobs to occupy this many CPUs 
     179 
     180And for each coprocessor type: 
     181 
     182 '''double req_secs''':: number of instance-seconds requested 
     183 '''double req_instances''':: send enough jobs to occupy this many instances 
     184 
     185The semantics: a scheduler should send jobs for a resource type 
     186only if the request for that type is nonzero. 
     187 
     188For compatibility with old servers, the message still has '''work_req_seconds''', 
     189which is the maximum of the requested seconds over all resource types (cpu_req_secs and each coprocessor's req_secs). 
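
As a rough illustration of these fields, the sketch below builds a request and applies the "nonzero request" rule. The dictionary layout and the helper names `build_work_request()` and `wants_work()` are assumptions made for the example (the real message has its own format), and treating a request as nonzero when either the seconds or the instances are positive is one possible reading of the text.

```python
def build_work_request(cpu_req_secs, cpu_req_instances, coproc_reqs):
    """coproc_reqs: dict mapping a coprocessor type name to
    (req_secs, req_instances). Hypothetical layout, for illustration only."""
    all_secs = [cpu_req_secs] + [secs for secs, _ in coproc_reqs.values()]
    return {
        "cpu_req_secs": cpu_req_secs,
        "cpu_req_instances": cpu_req_instances,
        "coprocs": {
            name: {"req_secs": secs, "req_instances": ninst}
            for name, (secs, ninst) in coproc_reqs.items()
        },
        # Kept for old servers: the max of the per-resource requested seconds.
        "work_req_seconds": max(all_secs),
    }


def wants_work(request, resource):
    """True if the request for this resource type is nonzero (assumed reading)."""
    if resource == "CPU":
        return request["cpu_req_secs"] > 0 or request["cpu_req_instances"] > 0
    c = request["coprocs"].get(resource)
    return c is not None and (c["req_secs"] > 0 or c["req_instances"] > 0)
```

For example, a request with zero CPU fields and 3600 instance-seconds for one coprocessor type yields work_req_seconds of 3600, and a scheduler following the rule above would send jobs only for that coprocessor type.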
     190 
     191=== Client data structures === 
     192 
     193 RSC_WORK_FETCH:: The work-fetch state for a particular resource type. There are instances for CPU ('''cpu_work_fetch''') and NVIDIA GPUs ('''cuda_work_fetch'''). 
     194 RSC_PROJECT_WORK_FETCH:: The work-fetch state for a (resource type, project) pair. 
     195 PROJECT_WORK_FETCH:: Per-project work fetch state. 
     196 WORK_FETCH:: Overall work-fetch state. 
     197 
     198=== Scheduler changes === 
    222199 
    223200 * WORK_REQ has fields for requests (secs, instances) of the various resource types 
    229206 * get_app_version(): skip app versions for resource for which we don't need more work. 
    230207 
    231  
    232208== Notes == 
    233209 
