
Version 9 (modified by davea, 15 years ago)

--

Application planning

Application planning is a mechanism that lets the scheduler decide, using project-supplied logic, whether an application is able to run on a particular host, and if so what resources it will use and how fast it will run. It works as follows.

An app_version record (in the server DB) has a character string field plan_class. It identifies the range of processing resources that the application requires and is able to use. Plan class names are defined by the project; for example, "cuda_1.1" could denote apps that require a CUDA-enabled GPU, and "mt32" a multithreaded app able to use 32 CPUs.

The scheduler is linked with a project-supplied function:

bool app_plan(SCHEDULER_REQUEST &sreq, char* plan_class, HOST_USAGE&);

The sreq argument describes the requesting host and user:

  • sreq.host describes the host's processors and memory
  • sreq.global_prefs is a parsed version of the user's global preferences
  • sreq.coprocs lists the host's coprocessors.

When called with a particular SCHEDULER_REQUEST and plan class, the function returns true if the host's resources are sufficient for apps of that class. When it returns true, it also populates the HOST_USAGE structure:

struct HOST_USAGE {
   COPROCS coprocs;   // coprocessors used by the app (name and count)
   double avg_ncpus;  // avg #CPUs used by app (may be fractional)
   double max_ncpus;  // max #CPUs used (relevant if user changes prefs later)
   double flops;      // estimated FLOPS
   char cmdline[256]; // passed to the app as a command-line argument;
                      // can be used, e.g., to control the number of threads
};
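As an illustration, a project's app_plan() might look roughly like the sketch below. It is not the actual BOINC code: the plan class "mt4", the simplified HOST, GLOBAL_PREFS and SCHEDULER_REQUEST stand-ins, their field names, the "--nthreads" option, and the assumption that speed scales linearly with thread count are all hypothetical, and the coprocs member of HOST_USAGE is left out to keep the example self-contained.

```cpp
#include <algorithm>
#include <cstdio>
#include <cstring>

// Simplified stand-ins for the scheduler's types (assumed for illustration;
// the real structures carry many more fields).
struct HOST { int p_ncpus; double p_fpops; };       // CPU count, FLOPS per CPU
struct GLOBAL_PREFS { double max_ncpus_pct; };      // % of CPUs the user allows
struct SCHEDULER_REQUEST { HOST host; GLOBAL_PREFS global_prefs; };

struct HOST_USAGE {          // coprocs omitted in this sketch
    double avg_ncpus;
    double max_ncpus;
    double flops;
    char cmdline[256];
};

// Hypothetical plan class "mt4": a multithreaded app that needs at least
// 2 CPUs and can use at most 4.
bool app_plan(SCHEDULER_REQUEST& sreq, char* plan_class, HOST_USAGE& hu) {
    if (strcmp(plan_class, "mt4") != 0) return false;   // unknown class

    // CPUs available under the user's current preferences
    double usable = sreq.host.p_ncpus * sreq.global_prefs.max_ncpus_pct / 100.0;
    int nthreads = std::min(4, static_cast<int>(usable));
    if (nthreads < 2) return false;                     // host is too small

    hu.avg_ncpus = nthreads;
    // Upper bound if the user later raises the CPU limit to 100%
    hu.max_ncpus = std::min(4, sreq.host.p_ncpus);
    // Assumes linear speedup with the number of threads
    hu.flops = nthreads * sreq.host.p_fpops;
    snprintf(hu.cmdline, sizeof(hu.cmdline), "--nthreads %d", nthreads);
    return true;
}
```

In this sketch the command line is how the chosen thread count reaches the application, matching the intended use of the cmdline field.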

When deciding whether to send a job to a host, the scheduler examines all latest-version app_versions for the platform and calls app_plan() for each. Among the versions for which app_plan() returns true, it selects the one for which flops is greatest. The client uses flops to estimate job completion times.
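The selection rule described above can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: APP_VERSION is reduced to two fields, app_plan() is a stand-in that takes a CPU count instead of a SCHEDULER_REQUEST, and the plan classes "single" and "mt4", the function name choose_version, and the FLOPS figures are hypothetical.

```cpp
#include <string>
#include <vector>

struct HOST_USAGE { double avg_ncpus; double flops; };   // reduced for the sketch
struct APP_VERSION { int id; std::string plan_class; };  // reduced for the sketch

// Stand-in planner (hypothetical): host_ncpus replaces the SCHEDULER_REQUEST.
bool app_plan(int host_ncpus, const std::string& plan_class, HOST_USAGE& hu) {
    const double flops_per_cpu = 1e9;   // made-up figure
    if (plan_class == "single") {
        hu = {1.0, flops_per_cpu};
        return true;
    }
    if (plan_class == "mt4") {
        if (host_ncpus < 4) return false;
        hu = {4.0, 4 * flops_per_cpu};
        return true;
    }
    return false;                       // class not supported on this host
}

// Returns the usable version with the greatest estimated FLOPS, or nullptr
// if no version can run on the host. best_usage is set only on success.
const APP_VERSION* choose_version(const std::vector<APP_VERSION>& versions,
                                  int host_ncpus, HOST_USAGE& best_usage) {
    const APP_VERSION* best = nullptr;
    for (const APP_VERSION& av : versions) {
        HOST_USAGE hu{};
        if (!app_plan(host_ncpus, av.plan_class, hu)) continue;
        if (!best || hu.flops > best_usage.flops) {
            best = &av;
            best_usage = hu;
        }
    }
    return best;
}
```

With versions for "single" and "mt4", an 8-CPU host would get the "mt4" version in this sketch, while a 2-CPU host would fall back to "single".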

The scheduler reply includes, for each job, an XML encoding of HOST_USAGE.

Notes

  • The server code that estimates completion times currently doesn't know about multiprocessors or coprocessors.