= API for multi-thread apps =

[[T(DesignDocument)]]

== Why write a multi-threaded app? ==

The average number of cores per PC will increase over the next few years, possibly at a faster rate than the average amount of available RAM.

Depending on your application and project, it may be desirable to develop a multi-threaded application. Possible reasons to do this:

 * Your application's memory footprint is large enough that, on some PCs, there is not enough RAM to run a separate copy of the app on each CPU.
 * You want to reduce the turnaround time of your jobs (either because of human factors, or to reduce server occupancy).

Writing and debugging a multi-threaded app is often hard. You may be able to use existing libraries of numerical "kernels" that are already multi-threaded.

== Assumptions ==

A 'multi-thread app' A uses some number of threads, Nthreads(A). The average number of processors it actually uses, Ncpus(A), may be less (because of I/O or synchronization).

Ideally, on a host with N CPUs, we want Ncpus(A), summed over the running apps, to be about N. If the sum is less, we are not using all the available CPU time. If it is more:

 * we increase latency without increasing throughput
 * we use more RAM than needed
 * we incur higher synchronization overhead

We assume that applications may be able to change Nthreads(A) dynamically in response to hints from BOINC. Nthreads(A) need not be equal to the hint.

Example: suppose
 * the host has 80 CPUs
 * app A can use 1, 2, 4, 8, 16, or 32 threads
 * app B can use 1, 2, 4, 8, 16, 32, or 64 threads
Then we want to have either (16, 64) or (32, 32) threads running most of the time.

== Proposal ==

API functions:
{{{
int boinc_target_nthreads();
void boinc_actual_nthreads(int);
}}}

An application calls {{{boinc_target_nthreads()}}} periodically, at points where it is able to change its number of threads. It calls {{{boinc_actual_nthreads()}}} to report the number of threads it is actually using.

A WU DB record can specify "max average ncpus", an estimate of Ncpus(A) on a host with arbitrarily many CPUs. This is used by the client and the scheduler to estimate completion time.

== Implementation ==

Shared-memory messages:
 * core->app (process control channel): {{{}}}
 * app->core (process control channel): {{{}}}

{{{
#!comment
*ahem* Unless, before we get to add multi-threading support, we do the sane thing and change the communication to use a compact binary protocol over a pipe instead of XML over shared memory.

"shared memory has to be right below 'printing and scanning with OCR' in terms of favorite means of IPC" -- rtyler on #boinc, irc.freenode.net

By the way, it would be nice to stabilize the API documented here, and implement it as a stub that always tells the app to use 1 thread. That way project developers don't have to wait for BOINC support before making multi-threaded applications. They could create apps supporting multiple threads right now, and whenever the BOINC client (or an unofficial fork, since everything will be so nicely documented, right?) adds support for this, they would Just Work(tm).
}}}

The client maintains an estimate of CPU efficiency per job, and uses it to scale {{{target_nthreads}}}.

Implementation ({{{enforce_schedule()}}}): as we schedule jobs, decrement the CPU count by the scaled {{{actual_nthreads}}}. {{{rr_simulation()}}} needs to be modified as well.
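
To make the accounting above concrete, here is a minimal sketch of the per-job decrement that {{{enforce_schedule()}}} could perform. The JOB struct and its field names are hypothetical stand-ins for whatever per-job state the client keeps; they are not the client's actual data structures.

{{{
#include <vector>

// Sketch only: hypothetical per-job state, not actual BOINC client structures.
struct JOB {
    int actual_nthreads;     // last value the app reported via boinc_actual_nthreads()
    double cpu_efficiency;   // measured Ncpus(A)/Nthreads(A) for this job, in (0, 1]
};

// As jobs are scheduled, decrement the host's CPU count by each job's
// thread count scaled by its measured efficiency; stop when the host is full.
double remaining_cpus(double ncpus, const std::vector<JOB>& scheduled) {
    for (const JOB& job : scheduled) {
        ncpus -= job.cpu_efficiency * job.actual_nthreads;
        if (ncpus <= 0) return 0;
    }
    return ncpus;
}
}}}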
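
For completeness, a sketch of the intended call pattern on the application side, using the two API functions proposed above. The worker function, the per-chunk polling point, and the power-of-two rounding are illustrative assumptions only, not part of the proposal.

{{{
#include <algorithm>

int boinc_target_nthreads();          // proposed API (declaration assumed)
void boinc_actual_nthreads(int);      // proposed API (declaration assumed)

const int MAX_APP_THREADS = 32;       // hypothetical: most threads this app's kernel supports

void do_one_chunk(int nthreads);      // hypothetical multi-threaded kernel

void main_loop(int nchunks) {
    for (int i = 0; i < nchunks; i++) {
        // Poll at a point where the thread count can safely change.
        int target = boinc_target_nthreads();

        // The app need not match the hint exactly; here it rounds the hint
        // down to a supported power of two, as in the example above.
        int actual = 1;
        while (actual * 2 <= std::min(target, MAX_APP_THREADS)) actual *= 2;

        boinc_actual_nthreads(actual); // report the number actually used
        do_one_chunk(actual);
    }
}
}}}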