Changes between Version 8 and Version 9 of CreditNew
Timestamp: 11/04/09 12:24:38 (3 weeks ago)
CreditNew
v8   v9
134  134    so that the average is the same for each version.
135  135    The adjustment is always downwards:
136       - we maintain the average PFC*(V) of PFC() for each app version V,
     136  + we maintain the average PFC^mean^(V) of PFC() for each app version V,
137  137    find the minimum X.
138  138    An app version V's jobs are then scaled by the factor
139       - {{{
140       - S(V) = (X/PFC*(V))
141       - }}}
     139  +
     140  + S(V) = (X/PFC^mean^(V))
     141  +
142  142
143  143    The result for a given job J
144  144    is called "Version-Normalized Peak FLOP Count", or VNPFC(J):
145       - {{{
146       - VNPFC(J) = PFC(J) * (X/PFC*(V))
147       - }}}
     145  +
     146  + VNPFC(J) = PFC(J) * (X/PFC^mean^(V))
148  147
149  148    Notes:

162  161    (e.g., workunit.rsc_fpops_est)
163  162    we can normalize by this to reduce the variance,
164       - and make PFC*(V) converge more quickly.
     163  + and make PFC^mean^(V) converge more quickly.
165  164    * ''a posteriori'' estimates of job size may exist also
166  165    (e.g., an iteration count reported by the app)

204  203    then, for that app,
205  204    hosts should get the same average granted credit per job.
206       - To ensure this, for each application A we maintain the average VNPFC*(A),
207       - and for each host H we maintain VNPFC*(H, A).
     205  + To ensure this, for each application A we maintain the average VNPFC^mean^(A),
     206  + and for each host H we maintain VNPFC^mean^(H, A).
208  207    The '''claimed credit''' for a given job J is then
209       - {{{
210       - VNPFC(J) * (VNPFC*(A)/VNPFC*(H, A))
211       - }}}
     208  +
     209  + VNPFC(J) * (VNPFC^mean^(A)/VNPFC^mean^(H, A))
     210  +
212  211
213  212    There are some cases where hosts are not sent jobs uniformly:

219  218
220  219    This can be done by dividing
221       - each sample in the computation of VNPFC* by WU.rsc_fpops_est
     220  + each sample in the computation of VNPFC^mean^ by WU.rsc_fpops_est
222  221    (in fact, there's no reason not to always do this).
223  222

227  226    and increases the claimed credit of hosts that are more efficient
228  227    than average.
229       - * VNPFC* is averaged over jobs, not hosts.
     228  + * VNPFC^mean^ is averaged over jobs, not hosts.
230  229
231  230    == Computing averages ==

312  311
313  312    * One-time cheats (like claiming 1e304) can be prevented by
314       - capping VNPFC(J) at some multiple (say, 10) of VNPFC*(A).
     313  + capping VNPFC(J) at some multiple (say, 10) of VNPFC^mean^(A).
315  314    * Cherry-picking: suppose an application has two types of jobs,
316  315    which run for 1 second and 1 hour respectively.

319  318    Suppose a client systematically refuses the 1 hour jobs
320  319    (e.g., by reporting a crash or never reporting them).
321       - Its VNPFC*(H, A) will quickly decrease,
     320  + Its VNPFC^mean^(H, A) will quickly decrease,
322  321    and soon it will be getting several thousand times more credit
323  322    per actual work than other hosts!

325  324    whenever a job errors out, times out, or fails to validate,
326  325    set the host's error rate back to the initial default,
327       - and set its VNPFC*(H, A) to VNPFC*(A) for all apps A.
     326  + and set its VNPFC^mean^(H, A) to VNPFC^mean^(A) for all apps A.
328  327    This puts the host to a state where several dozen of its
329  328    subsequent jobs will be replicated.

335  334
336  335    Unrelated to the credit proposal, but in a similar spirit.
337       - The server will maintain ET*(H, V), the statistics of
     336  + The server will maintain ET^mean^(H, V), the statistics of
338  337    job runtimes (normalized by wu.rsc_fpops_est) per
339  338    host and application version.
340  339
341  340    The server's estimate of a job's runtime is then
342       - {{{
343       - R(J, H) = wu.rsc_fpops_est * ET*(H, V)
344       - }}}
     341  +
     342  + R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)
     343  +
345  344
346  345    == Implementation ==
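
For readers checking the renamed quantities, the formulas touched by this change (S(V), VNPFC(J), the claimed credit, and R(J, H)) can be sketched in Python as below. This is only an illustration under stated assumptions, not the BOINC server code: the function names, the dictionary keyed by app version, and the sample numbers are all hypothetical, and applying the "multiple of 10" cap inside the claimed-credit function is a choice made here for brevity.

```python
# Illustrative sketch only; names and data layout are assumptions,
# not taken from the BOINC source.

def version_scale_factors(pfc_mean_by_version):
    """S(V) = X / PFC^mean^(V), where X is the minimum per-version mean."""
    x = min(pfc_mean_by_version.values())
    return {v: x / m for v, m in pfc_mean_by_version.items()}


def vnpfc(pfc_job, version, scale):
    """VNPFC(J) = PFC(J) * S(V) for a job J run with app version V."""
    return pfc_job * scale[version]


def claimed_credit(vnpfc_job, vnpfc_mean_app, vnpfc_mean_host_app,
                   cap_multiple=10.0):
    """VNPFC(J) * (VNPFC^mean^(A) / VNPFC^mean^(H, A)).

    VNPFC(J) is first capped at cap_multiple * VNPFC^mean^(A), following
    the one-time-cheat note; doing the cap here is an assumption.
    """
    capped = min(vnpfc_job, cap_multiple * vnpfc_mean_app)
    return capped * (vnpfc_mean_app / vnpfc_mean_host_app)


def estimated_runtime(rsc_fpops_est, et_mean_host_version):
    """R(J, H) = wu.rsc_fpops_est * ET^mean^(H, V)."""
    return rsc_fpops_est * et_mean_host_version


if __name__ == "__main__":
    # Hypothetical per-version means of PFC() for two app versions.
    scale = version_scale_factors({"cpu": 2.0e12, "gpu": 8.0e12})
    job_vnpfc = vnpfc(4.0e12, "gpu", scale)
    print("S(V):", scale)
    print("VNPFC(J):", job_vnpfc)
    print("claimed credit:", claimed_credit(job_vnpfc, 1.0e12, 2.0e12))
    print("R(J, H):", estimated_runtime(1.0e12, 2.0e-9))
```

Because X is the minimum of the per-version means, every S(V) is at most 1, which matches the statement that the adjustment is always downwards.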
