Job size matching
The difference in throughput between a slow processor (e.g. an Android device that runs infrequently) and a fast processor (e.g. a GPU that's always on) can be a factor of 1,000 or more. Having a single job size can therefore present problems:
- If the size is small, hosts with GPUs get huge numbers of jobs. This causes performance problems on the client and a high DB load on the server.
- If the size is large, slow hosts can't get jobs, or they get jobs that take weeks to finish.
This document describes a set of mechanisms that address these issues.
Job size classes
We'll assume that jobs for a given application can be generated in several discrete size classes; the number of size classes is a parameter of the application.
BOINC will try to send jobs of size class i to devices whose effective speed is in the ith quantile, where 'effective speed' is the product of the device speed and the host's on-fraction.
This involves 3 new integer DB fields:
The size class of a job is specified in the call to create_work().
Apps with n_size_classes > 1 are called multi-size apps. A project can have both multi-size and non-multi-size apps.
- It's up to the project to decide how large jobs of different size classes should be. E.g. if there are 2 classes, the ratio of large to small could be 2:1 or 100:1. We should design tools to guide this decision.
- It's up to the project to decide how many jobs of each size class to create.
- app.n_size_classes can be changed but you need to be careful because of existing jobs.
Computing device quantiles
The order statistics of device effective speed will be computed by a new program size_census. For each multi-size app this does:
- enumerate host_app_versions for that app
- compute "effective speed" as hav.et.avg * host.available_frac
- sort the results, and find the quantile points
- write these to a flat file
The feeder reads these flat files (on startup, and when they change) and stores the quantile points in shared memory.
When the scheduler sends jobs of a given multi-size app to a given processor, it should preferentially send jobs whose size class matches the quantile of the processor. Failing that, it should preferentially send jobs of smaller size classes rather than larger.
To implement this policy, we'll use a new version of score-based scheduling. This will work as follows.
The scheduler will make separate pass for each resource types (GPUs first). For each resource:
- For each app, find the best app version and decide whether it's reliable
- For each of these app versions, find the effective speed. Find which quantile the device falls into.
- Scan N entries of the job array, starting at a random point. Make a list of jobs for which an app version is available.
- For each job, compute a "score" that includes various factors. (reliable, beta, previously infeasibly, locality scheduling lite).
- For multi-size apps, include a factor for job size; decrement the score of jobs that are too small, and decrement more for jobs that are too large.
- Sort the list by score.
- Scan the list; for each job
- Make sure it's still in the array
- Do quick checks
- Lock entry and do slow checks
- Send job
- Leave loop if resource request is satisfied or we're out of disk space
- My intent is to have this become the default scheduling mechanism, replacing the current array-scan and score-based mechanisms.
- The scan size N is configurable. E.g. WCG plans to have a job array of 3-4000, and N = ~500. We'd need to experiment with this to minimize contention between active requests and the resource load maintaining a job array of that size.
- All other factors being equal, the scheduler will send jobs of other apps rather than send a job of non-optimal size class. This could potentially lead to starvation issues; we'll have to see if this is a problem.
Regulating the flow of jobs into shared memory
How can we prevent shared memory from becoming "clogged" with jobs of one size?
One approach would be to allocate slots for each size. This would be too complex because we already have two allocation schemes (for HR and all_apps).
We could modify the work generator to so that it polls the number of unsent jobs of each size, and creates a few more jobs of a given size when this number falls below a threshold. Problem: this might not be able to handle a large spike in demand. We'd like to be able to have a large buffer of unsent jobs in the DB.
Instead, we'll do the following:
- when jobs are created for a multi-size app (in the transitioner), set their state to INACTIVE rather than UNSENT.
- have a new daemon (size_regulator) that polls for the number of unsent jobs of each type, and changes a few jobs from INACTIVE to UNSENT is N falls below a certain level.
The parameters of size_regulator are:
- The name of the app
- Low- and high-water marks for the number of UNSENT jobs of each size class.
- The polling frequency.
- Ordering options as for the feeder (--priority_order, --priority_order_create_time, --random_order). This determines the order in which jobs are marked as UNSENT.