Changes between Version 4 and Version 5 of LocalityScheduling

Aug 26, 2012, 11:21:22 PM



= Locality scheduling =
''Locality scheduling'' is intended for applications for which:
 * Jobs have large input files.
 * Each input file is used by many jobs.
The goal of locality scheduling is to minimize the amount of data transfer to hosts
by preferentially sending jobs to hosts that
already have some or all of the input files required by those jobs.
To use locality scheduling, you must declare the large input files as
[JobSubmission#Inputtemplatefiles sticky].
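For reference, a sticky declaration in a job input template might look like the following sketch. This is illustrative only; `<sticky/>` is the relevant flag, and the exact template structure should be checked against the JobSubmission page for your BOINC version.

```xml
<input_template>
    <file_info>
        <!-- keep this file on the client after the job finishes -->
        <sticky/>
        <!-- also prevent server-side deletion while jobs may still need it -->
        <no_delete/>
    </file_info>
    <workunit>
        <file_ref>
            <open_name>in</open_name>
        </file_ref>
    </workunit>
</input_template>
```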
There are two variants of locality scheduling:
 * '''Limited''': use when the total number of input files at any point is
   fairly small (of order 100).
 * '''Standard''': use in other cases.
   Note: this hasn't been implemented yet.
   A design document is [LocalityNew here].
== Limited locality scheduling ==
Limited locality scheduling (LLS) uses BOINC's standard shared-memory job cache scheduling mechanism.
It assumes that the ratio of the job cache size to the number of data files
is sufficiently large that, on average, there is at least one job
in the cache for a given data file.
It dispatches jobs that use files resident on the client
in preference to jobs that don't.
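The dispatch preference can be sketched as follows. This is an illustrative Python model, not the real scheduler (which is C++ code scanning the shared-memory job cache); the names here are made up.

```python
def pick_job(cache, files_on_client):
    """Return a job whose input file the client already has, if any;
    otherwise fall back to the first job encountered."""
    fallback = None
    for job in cache:
        if job["file"] in files_on_client:
            return job          # locality hit: no new file transfer needed
        if fallback is None:
            fallback = job      # remember a non-local job as a fallback
    return fallback

cache = [{"name": "j1", "file": "data_03"},
         {"name": "j2", "file": "data_07"}]
print(pick_job(cache, {"data_07"})["name"])  # j2: client already has data_07
```

If no cached job matches a resident file, the client still gets work, but at the cost of a new large download; hence the assumption above about the cache-to-file ratio.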
The size of the job cache is specified in
[ProjectOptions#Job-cachescheduling the project config file].
It can be set to 1000 or so on machines with large RAM.
Thus LLS works best with applications that have no more than a few hundred active input files.
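For example, assuming the job cache size is controlled by the `<shmem_work_items>` option (verify the option name against the ProjectOptions page for your BOINC version), the relevant fragment of config.xml might look like:

```xml
<boinc>
  <config>
    <!-- number of jobs held in the shared-memory job cache;
         assumed option name -- check ProjectOptions -->
    <shmem_work_items>1000</shmem_work_items>
  </config>
</boinc>
```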
To use LLS with a particular application,
set the '''locality_scheduling''' field of its database record to 1
(currently this must be done by editing the database).
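Assuming the application's record lives in the `app` table, as in a standard BOINC project database, that edit is a one-line SQL statement (the application name `myapp` is a placeholder):

```sql
-- enable limited locality scheduling for one application;
-- table and column names assume a standard BOINC project database
UPDATE app SET locality_scheduling = 1 WHERE name = 'myapp';
```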
When using LLS, you should generate jobs in an order that cycles among the input files.
If you generate a long sequence of jobs for a given file,
they will fill up the job cache
and the policy won't work well.
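One way to get that ordering is to interleave job creation across files rather than finishing one file before starting the next. A hypothetical sketch (your work generator would call BOINC's job-creation tool for each pair yielded here):

```python
from itertools import cycle, islice

def interleaved_jobs(files, jobs_per_file):
    """Yield (file, job_index) pairs cycling across files, so that no
    single file's jobs can monopolize the scheduler's job cache."""
    counters = {f: 0 for f in files}
    for f in islice(cycle(files), len(files) * jobs_per_file):
        yield (f, counters[f])
        counters[f] += 1

# Jobs come out a, b, c, a, b, c -- not a, a, b, b, c, c.
order = list(interleaved_jobs(["a", "b", "c"], 2))
```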
Currently there is no mechanism for deleting old, unused sticky files from clients.
We'll need to add one at some point.