wiki:LowLatency

Version 5 (modified by Kevin Reed, 17 years ago) (diff)

--

Low-latency computing

BOINC was originally designed for high-throughput computing, and one of its basic design goals was to minimize the number of scheduler RPCs (in order to reduce server load and increase scalability). In particular, when a client requests work from a server and there is none, the client uses exponential backoff, up to a maximum backoff off 1 day or so. This policy limits the number of scheduler requests to (roughly) one per job. However, this backoff policy is inappropriate for low-latency computing, by which we mean projects whose tasks must be completed in a few minutes or hours. Such projects require a minimum connection rate, rather than seeking to minimize the connection rate.

For example, if you need to get batches of 10,000 jobs completed with 5 minute latency, and each job takes 2 minutes of computing, you'll need to arrange to get 10,000 scheduler requests every 3 minutes (and you'll need a server capable to handling this request rate).

The minimum connection rate

Suppose that, at a given time, the project has N hosts online, and that each host has 1 CPU that computes at X FLOPS.

Suppose that the project's work consists of 'batches' of M jobs. Each batch is generated at a particular time, and all the jobs must be completed within time T. For simplicity, assume that a batch is not created until the previous batch has been completed, and that each has is given at most one job from each batch. Suppose that each job takes Y seconds to complete on the representative X-FLOPS CPU.

Clearly, for feasibility we must have Y = M. Let W = T - Y; a job must be dispatched within W seconds if it is to be completed within T.

Now suppose that each host requests work every Z seconds. Assume Z is small enough so that at least M requests arrive in any given period of length W. (TODO: figure out what this is, given a Poisson arrival process).

Then, within W seconds of the batch creation, all of the jobs have been sent to hosts, and within T seconds (assuming no errors or client failures) they have been completed and reported. Note: this is a simplistic analysis, and doesn't take into account multiprocessors, hosts of different CPU speed, the possibility of sending multiple jobs to one client, the ability for Z to vary between hosts, and probably many other factors. If someone wants to analyze this in more generality, please do!

How to do low-latency computing

The key component in the above is the ability to control Z, the time between requests for a given host. Starting with version 5.6 of the BOINC client, it is now possible to control this: each scheduler reply can include a tag

<next_rpc_delay>x</next_rpc_delay>

telling the client when to contact the scheduler again. By varying this value, a project can achieve a rate of connection requests necessary to achieve its latency bounds. Projects can currently add this tag to the project configuration file to ensure that all clients will perform a RPC no later then X seconds after their last RPC with the project. In the future it would be nice to expand the capability of this field and make it dynamic. The following scheduler changes would need to be done:

  • Keep track of how many active hosts you have (this will change over time).
  • Keep track of the performance statistics of these hosts (means and maybe variances of their FLOPS, for example).
  • Parameterize your workload: how many jobs per batch, latency bound, FLOPs per job, etc.
  • Do the math and write the code for figuring out that the RPC delay should be for a given host (this is dynamic - it will change with the number of active hosts).
  • Change the scheduler so that it uses this value so that it only sends the appropriate number of jobs.

If you're interested in helping add these features to BOINC, please contact us.