Changes between Version 2 and Version 3 of LocalityNew


Timestamp: Aug 14, 2012, 2:18:04 PM
Author: davea
Comment: --

 * A given file may be used by many jobs.
 * The density of jobs in the file sequence may be variable.
 * Several batches may be in progress concurrently,
   for the same or different applications.

== Goals ==

 * To complete batches quickly.
 * To minimize the amount of data transfer to hosts.

     
The ideal policy would start each host at a different point in
the job space, with the starting points spaced according to the hosts' speeds.
Each file would then potentially be sent to only a single host.
However, this is impractical for several reasons:
replication, the unreliability of hosts, and the unpredictability of their speeds.
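The ideal policy can be illustrated with a small sketch. It assumes, unrealistically, that every host's speed is known and constant, and it gives each host a contiguous slice of the job sequence proportional to its speed, so that all hosts would finish at about the same time. The function name ideal_start_points is hypothetical and not part of any existing code.

{{{#!python
from itertools import accumulate


def ideal_start_points(speeds, n_jobs):
    """Return one (start, end) job range per host, sized by relative speed.

    Hypothetical helper: assumes each host's speed is known and constant.
    """
    total = sum(speeds)
    # Cumulative speed fractions give each host's boundary in the job space.
    bounds = [0] + [round(n_jobs * c / total) for c in accumulate(speeds)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(speeds))]


print(ideal_start_points([1.0, 2.0, 1.0], 100))  # [(0, 25), (25, 75), (75, 100)]
}}}

Replication, host failures, and changes in host speed all break the assumption behind these fixed boundaries, which is why teams are used instead.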

Instead, we use a policy in which the set of hosts is divided into '''teams''',
and each team works on a different area of the job space.
Teams should have these properties:

     
 * Subject to the above, teams should be as small as possible.
   A good size might be 10 or 20 hosts.
 * The hosts in a team should belong to different users (for validation purposes).

Because of host churn, team membership is dynamic;
     

A '''cursor''' consists of
 * a team of hosts
 * a range of jobs
 * status information (see below)
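As a rough illustration, a cursor might be held in memory as the structure below. This is a sketch under assumptions: the job range is represented by hypothetical first_job and end_job fields (end exclusive), the team is a set of host IDs, and only two pieces of status are modeled, expavg_credit and first_ungenerated_job_num, taken from the locality_cursor table described under Database.

{{{#!python
from dataclasses import dataclass, field


@dataclass
class Cursor:
    """Hypothetical in-memory cursor: a team of hosts, a job range, and status."""
    batch_id: int
    first_job: int                 # hypothetical: first job number in the range
    end_job: int                   # hypothetical: one past the last job number
    hosts: set = field(default_factory=set)    # the team, as host IDs
    expavg_credit: float = 0.0                 # status: sum over the team's hosts
    first_ungenerated_job_num: int = field(init=False)  # status: next job to generate

    def __post_init__(self):
        # No workunits have been generated yet, so start at the beginning of the range.
        self.first_ungenerated_job_num = self.first_job

    def add_host(self, host_id, host_credit):
        """Add a host to the team, counting its credit only once."""
        if host_id not in self.hosts:
            self.hosts.add(host_id)
            self.expavg_credit += host_credit
}}}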
     
allowing cursors to move from one job range to another,
and allowing job ranges to be subdivided.
I think this is needlessly complex.

=== Database ===
     

{{{
batch
        // this table already exists; we may need to add fields to it

batch_host              // batch/host association table
        host_id integer
        batch_id integer
     

locality_cursor
        batch_id integer
        expavg_credit double
                // sum of expavg_credit of hosts in the team
     
                // all jobs before this have been completed
        first_ungenerated_job_num integer
                // we've generated workunit records for all jobs before this
        index on (batch_id, expavg_credit)

workunit (new fields)
        cursor_id integer
        job_num integer

}}}
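To experiment with the schema, the tables can be created in an in-memory SQLite database using Python's standard sqlite3 module. This is only a stand-in for the project's real database: column types are simplified, batch is reduced to an id column, integer id primary keys are assumed, and columns not listed above are omitted. The final query shows the lookup that the (batch_id, expavg_credit) index is presumably meant to serve: finding a batch's cursor with the least expavg_credit.

{{{#!python
import sqlite3

# In-memory stand-in for the project database (assumption: SQLite, with the
# column lists trimmed to the fields named in the schema above).
SCHEMA = """
CREATE TABLE batch (id INTEGER PRIMARY KEY);
CREATE TABLE batch_host (            -- batch/host association table
    host_id  INTEGER,
    batch_id INTEGER
);
CREATE TABLE locality_cursor (
    id                        INTEGER PRIMARY KEY,
    batch_id                  INTEGER,
    expavg_credit             REAL,     -- sum over the team's hosts
    first_ungenerated_job_num INTEGER   -- workunits exist for all jobs before this
);
CREATE INDEX cursor_by_credit ON locality_cursor (batch_id, expavg_credit);
CREATE TABLE workunit (
    id        INTEGER PRIMARY KEY,
    cursor_id INTEGER,   -- new field
    job_num   INTEGER    -- new field
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO batch (id) VALUES (1)")
db.executemany(
    "INSERT INTO locality_cursor (batch_id, expavg_credit, first_ungenerated_job_num) "
    "VALUES (?, ?, ?)",
    [(1, 300.0, 0), (1, 120.0, 0)],
)

# The batch's cursor with the least credit, e.g. for assigning a new host.
row = db.execute(
    "SELECT id, expavg_credit FROM locality_cursor "
    "WHERE batch_id = ? ORDER BY expavg_credit LIMIT 1",
    (1,),
).fetchone()
print(row)  # (2, 120.0)
}}}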
     
   create locality_cursor records

=== Scheduler ===

==== Assign host to cursors ====
     

If this is a new host (i.e. there is no batch_host record for it and this batch), then
 * assign the host to the cursor for this batch with the least expavg_credit
 * create a batch_host record
 * add the host's expavg_credit to the cursor's expavg_credit
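The new-host steps might look like this in code. It is a sketch under stated assumptions: Cursor is a minimal hypothetical class, and a Python dict keyed by (host_id, batch_id) stands in for the batch_host table, storing the host's cursor ID (how the real table records a host's cursor is not shown here).

{{{#!python
from dataclasses import dataclass, field


@dataclass
class Cursor:
    """Minimal hypothetical cursor: just what host assignment needs."""
    cursor_id: int
    batch_id: int
    expavg_credit: float = 0.0   # sum of expavg_credit of hosts in the team
    hosts: set = field(default_factory=set)


def assign_new_host(host_id, host_credit, batch_id, cursors, batch_host):
    """Put a host with no batch_host record on this batch's lowest-credit cursor.

    batch_host: dict (host_id, batch_id) -> cursor_id, standing in for the table.
    Returns the ID of the host's cursor. Raises ValueError if the batch has
    no cursors (min() of an empty sequence).
    """
    key = (host_id, batch_id)
    if key in batch_host:          # not a new host: leave it where it is
        return batch_host[key]
    candidates = [c for c in cursors if c.batch_id == batch_id]
    target = min(candidates, key=lambda c: c.expavg_credit)
    target.hosts.add(host_id)
    batch_host[key] = target.cursor_id      # create the batch_host record
    target.expavg_credit += host_credit     # add the host's credit to the cursor
    return target.cursor_id
}}}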
     
Let C = the host's cursor.
If C.expavg_credit > 2 * the lowest expavg_credit among this batch's cursors,
then move this host to the cursor with that lowest expavg_credit.
(This policy may need to be refined.)

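A sketch of the rebalancing rule follows. Cursors are plain dicts with expavg_credit and hosts keys (an assumed representation), and the function name rebalance is hypothetical. When the host moves, its credit is also subtracted from C and added to the target, which keeps each cursor's expavg_credit equal to the sum over its team; the text implies this but does not state it.

{{{#!python
def rebalance(host_id, host_credit, current, cursors):
    """Apply the 2x rule to a host whose cursor is `current` (C in the text).

    cursors: the cursors of the host's batch, each a dict with
    "expavg_credit" (float) and "hosts" (set of host IDs).
    Returns the cursor the host is on afterwards.
    """
    lowest = min(cursors, key=lambda c: c["expavg_credit"])
    if current["expavg_credit"] > 2 * lowest["expavg_credit"]:
        # Move the host, and its share of the credit sum, to the weakest cursor.
        current["hosts"].discard(host_id)
        current["expavg_credit"] -= host_credit
        lowest["hosts"].add(host_id)
        lowest["expavg_credit"] += host_credit
        return lowest
    return current
}}}

The factor of 2 presumably provides hysteresis: a host is moved only when the imbalance is large, so hosts do not bounce between cursors whose credit is nearly equal.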
     
tell client to delete it.

Note: names of sticky files should encode the batch and file number.
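One possible naming scheme is sketched below. The format batchB_fileF is purely hypothetical; the text only requires that the batch and file number be recoverable from the name, presumably so the scheduler can tell which batch and which file each of a client's sticky files belongs to, for instance when deciding whether to tell the client to delete it.

{{{#!python
import re

# Hypothetical naming scheme: "batch<batch_id>_file<file_num>".
NAME_RE = re.compile(r"batch(\d+)_file(\d+)")


def sticky_file_name(batch_id: int, file_num: int) -> str:
    """Build a sticky file name that encodes the batch and file number."""
    return f"batch{batch_id}_file{file_num}"


def parse_sticky_file_name(name: str):
    """Return (batch_id, file_num), or None for names not in this scheme."""
    m = NAME_RE.fullmatch(name)
    return (int(m.group(1)), int(m.group(2))) if m else None
}}}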

=== Work generator ===

Loop over batches and cursors.
Try to maintain a cushion of N unsent jobs per cursor.
Start generating jobs at cursor.first_ungenerated_job_num.
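The work generator's loop could be sketched as follows. Assumptions: each cursor is a dict with id, first_ungenerated_job_num, and a hypothetical end_job (exclusive end of its job range); the count of unsent jobs per cursor is supplied by the caller; and make_workunit is a hypothetical callback that creates a workunit record with the given cursor_id and job_num. The cushion parameter plays the role of N.

{{{#!python
def generate_jobs(cursor, unsent_count, cushion, make_workunit):
    """Generate enough new jobs to bring one cursor up to `cushion` unsent jobs.

    Returns the job numbers generated in this pass.
    """
    needed = max(0, cushion - unsent_count)
    start = cursor["first_ungenerated_job_num"]
    # Never generate past the end of the cursor's job range (end_job is hypothetical).
    stop = max(start, min(start + needed, cursor["end_job"]))
    for job_num in range(start, stop):
        make_workunit(cursor["id"], job_num)   # workunit gets cursor_id and job_num
    # Workunit records now exist for every job before `stop`.
    cursor["first_ungenerated_job_num"] = stop
    return list(range(start, stop))


def work_generator_pass(batches, unsent_counts, cushion, make_workunit):
    """One pass over all batches and their cursors.

    batches: dict batch_id -> list of cursor dicts.
    unsent_counts: dict cursor id -> number of generated but unsent jobs.
    """
    for cursors in batches.values():
        for cursor in cursors:
            generate_jobs(cursor, unsent_counts.get(cursor["id"], 0), cushion, make_workunit)
}}}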