Changes between Initial Version and Version 1 of LocalityNew


Ignore:
Timestamp:
Aug 14, 2012, 1:54:01 PM (12 years ago)
Author:
davea
Comment:

--

Legend:

Unmodified
Added
Removed
Modified
  • LocalityNew

    v1 v1  
     1= Locality scheduling redesign =
     2
     3== Assumptions ==
     4
     5 * A '''batch''' consists of a sequence of files F1 ... Fn
     6        and a set of jobs J1 ... Jm operating on these files.
     7 * Each job uses a contiguous set of files.
     8 * A given file may be used by many jobs.
     9 * The density of jobs in the file sequence may be variable.
     10 * Several batches may be in progress concurrently.
     11
     12== Goals ==
     13
     14 * To complete the batch quickly.
     15 * To minimize the amount of data transfer to hosts.
     16
     17== Scheduling policy ==
     18
     19The policy in which we dispatch the jobs in order essentially sends
     20every file to every host, so it fails to achieve the 2nd goal.
     21
     22The ideal policy would start each host at a different point in
     23the job space, separated according to their speeds.
     24This would potentially send each file to a single host.
     25However, it's impractical for various reasons:
     26replication, unreliability of hosts, and unpredictability of their speed.
     27
     28Instead, we use a policy in which the set of hosts is divided into '''teams''',
     29and each team starts at a different point in the job space.
     30Teams should have these properties:
     31
     32 * The fastest host in a team should be no faster than the
     33   total of the other hosts.
     34   Otherwise, it could get unboundedly far ahead of the others.
     35 * Subject to the above, teams should as small as possible.
     36   A good size might be 10 or 20.
     37 * The hosts in a team should belong to different users.
     38
     39Because of host churn, team membership is dynamic;
     40e.g. a team may be empty for a period.
     41
     42== Design ==
     43
     44A '''cursor''' consists of
     45 * a dynamic team of hosts
     46 * a range of jobs
     47 * status information (see below)
     48
     49Note: we discussed having a separate notion of "job range",
     50allowing cursors to move from one job range to another,
     51and allowing job ranges to be subdivided.
     52I think this is needless complexity.
     53
     54=== Database ===
     55
     56New tables:
     57
     58{{{
     59batch_host
     60        host_id integer
     61        batch_id integer
     62        cursor_id integer
     63
     64locality_cursor
     65        expavg_credit double
     66                // sum of expavg_credit of hosts in the team
     67        first_job_num integer
     68        last_job_num integer
     69        first_unfinished_job_num integer
     70                // all jobs before this have been completed
     71        first_ungenerated_job_num integer
     72                // all jobs before this have workunit records
     73
     74workunit (new fields)
     75        cursor_id integer
     76
     77result (new fields)
     78        cursor_id integer
     79
     80}}}
     81
     82=== Initialization ===
     83
     84To initialize a batch:
     85
     86 * create batch record
     87 * based on # of hosts, # of jobs, and job density,
     88   create locality_job_range and locality_cursor records
     89
     90=== Feeder/scheduler ===
     91
     92==== Assign host to cursors ====
     93
     94For each batch:
     95
     96If this is a new host (i.e. no batch_host record) then
     97 * assign host to cursor with least expavg_credit
     98 * create batch_host record
     99 * add host's expavg_credit to cursor's expavg_credit
     100
     101Otherwise, consider moving this host to a different cursor.
     102Let C = host's cursor.
     103If C.expavg_credit > 2*lowest expavg among cursors,
     104then move this host to the lowest-expavg cursor.
     105(This policy may need to be refined a bit).
     106
     107==== Assigning jobs ====
     108
     109==== Deleting files ====
     110
     111=== Work generator ===
     112
     113=== Validator ===
     114
     115=== Transitioner ===
     116
     117=== locality_daemon ===