Changes between Version 82 and Version 83 of ProjectOptions


Timestamp:
Feb 19, 2009, 3:08:16 PM (15 years ago)
Author:
davea
Comment:

--

  • ProjectOptions

    v82 v83  
    112112The size of the feeder's enumeration query.  Default is 200.
    113113
    114 {{{
    115 <reliable_max_avg_turnaround>secs</reliable_max_avg_turnaround>
    116 <reliable_max_error_rate>X</reliable_max_error_rate>
    117 }}}
    118 Hosts whose average turnaround is at most reliable_max_avg_turnaround
    119 and whose error rate is at most reliable_max_error_rate
    120 are considered 'reliable'.
    121 {{{
    122 <reliable_reduced_delay_bound>X</reliable_reduced_delay_bound>
    123 }}}
    124 When a result is sent to a reliable host, multiply the delay bound by reliable_reduced_delay_bound (typically 0.5 or so).
    125 {{{
    126 <reliable_on_priority>X</reliable_on_priority>
    127 <reliable_priority_on_over>X</reliable_priority_on_over>
    128 <reliable_priority_on_over_except_error>X</reliable_priority_on_over_except_error>
    129 }}}
    130 Results with priority at least '''reliable_on_priority''' will be sent only to reliable hosts.
    131 Increase priority of duplicate results by '''reliable_priority_on_over''';
    132 increase priority of duplicates caused by timeout (not error) by '''reliable_priority_on_over_except_error'''.
    133 
    134114== Scheduling: matchmaker scheduling ==
    135115
     
    156136to maintain statistics on the distribution of host speeds.
    157137
     138== Scheduling: accelerating retries ==
     139
     140The goal of this mechanism (which works with job-cache and matchmaker scheduling,
     141but not locality scheduling) is to send timeout-generated retries to
     142hosts that are likely to finish them fast.
     143Here's how it works:
     144 * Hosts are deemed "reliable" (a slight misnomer) if they satisfy turnaround time and error rate criteria.
     145 * A job instance is deemed "need-reliable" if its priority is at or above a threshold ('''reliable_on_priority''').
     146 * The scheduler tries to send need-reliable jobs to reliable hosts.  When it does, it reduces the delay bound of the job.
     147 * When job replicas are created in response to errors or timeouts, their priority is raised relative to the job's base priority.
     148
     149The configurable parameters are:
     150{{{
     151<reliable_on_priority>X</reliable_on_priority>
     152}}}
     153Results with priority at least '''reliable_on_priority''' are treated as "need-reliable".
     154With matchmaker scheduling, they'll be sent preferentially to reliable hosts;
     155with job-cache scheduling, they'll be sent ONLY to reliable hosts.
     156
     157{{{
     158<reliable_max_avg_turnaround>secs</reliable_max_avg_turnaround>
     159<reliable_max_error_rate>X</reliable_max_error_rate>
     160}}}
     161Hosts whose average turnaround is at most reliable_max_avg_turnaround
     162and whose error rate is at most reliable_max_error_rate are considered 'reliable'.
     163Both are upper limits, so make sure you set them high enough that a significant fraction (e.g. 25%) of your hosts qualify.
     164{{{
     165<reliable_reduced_delay_bound>X</reliable_reduced_delay_bound>
     166}}}
     167When a need-reliable result is sent to a reliable host,
     168multiply the delay bound by '''reliable_reduced_delay_bound''' (typically 0.5 or so).
     169{{{
     170<reliable_priority_on_over>X</reliable_priority_on_over>
     171<reliable_priority_on_over_except_error>X</reliable_priority_on_over_except_error>
     172}}}
     173
     174If '''reliable_priority_on_over''' is nonzero,
     175increase the priority of duplicate jobs by that amount over the job's base priority.
     176Otherwise, if '''reliable_priority_on_over_except_error''' is nonzero,
     177increase the priority of duplicates caused by timeout (not error) by that amount.
     178(Typically only one of these is nonzero, and is equal to '''reliable_on_priority'''.)
     179
     180NOTE: this mechanism can be used to preferentially send ANY job,
     181not just retries, to fast/reliable hosts.
     182To do so, set the workunit's priority to '''reliable_on_priority''' or greater.
     183
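As a rough illustration of how the options above fit together, here is a sketch of one possible combination. All values are made-up assumptions chosen only to show the relationships described in the text; they are not recommended settings, and the thresholds should be tuned against your own host population.
{{{
<!-- Illustrative values only (assumptions, not recommendations) -->

<!-- Jobs with priority 5 or more are treated as "need-reliable" -->
<reliable_on_priority>5</reliable_on_priority>

<!-- A host counts as reliable if its average turnaround is at most
     one day (86400 seconds) and its error rate is at most 0.01 -->
<reliable_max_avg_turnaround>86400</reliable_max_avg_turnaround>
<reliable_max_error_rate>0.01</reliable_max_error_rate>

<!-- Halve the delay bound when a need-reliable job goes to a reliable host -->
<reliable_reduced_delay_bound>0.5</reliable_reduced_delay_bound>

<!-- Only one of the two is nonzero, and it equals reliable_on_priority:
     boost retries caused by timeout, but not those caused by error -->
<reliable_priority_on_over>0</reliable_priority_on_over>
<reliable_priority_on_over_except_error>5</reliable_priority_on_over_except_error>
}}}
Under these assumed values, a job with base priority 0 and a 7-day delay bound that times out would get a retry with priority 5. That retry is need-reliable, so if it is sent to a reliable host its delay bound becomes 0.5 × 7 = 3.5 days.
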
    158184== Scheduling: locality scheduling ==
    159185{{{
     
    198224<ignore_upload_certificates/>
    199225}}}
    200 If upload certificates are not generated, this option must be enabled to force file upload handler accept files being uploaded.
     226If upload certificates are not generated, this option must be enabled to force the file upload handler to accept files.
    201227
    202228== Default preferences ==