A Runtime System for Volunteer ComputingDavid P. AndersonCarl Christensen Bruce Allen Abstract
1. Introduction1.1 Volunteer computing and BOINCVolunteer computing is a form of distributed computing in which processing and storage resources are volunteered, primarily by members of general public. Early volunteer computing projects include the Great Internet Mersenne Prime Search [10], SETI@home [1], Distributed.net [6] and Folding@home [12]. BOINC (Berkeley Open Infrastructure for Network Computing) is a middleware system for volunteer computing [2]. BOINC is being used by a number of projects, including SETI@home, Climateprediction.net [5], LHC@home [13], Predictor@home [16], and Einstein@Home [7]. Volunteer computing is being used in high-energy physics, molecular biology, medicine, astrophysics, climate study, and other areas. Volunteers participate by running BOINC client software on their computers ('volunteer hosts', or 'hosts'). They can 'attach' each computer to any set of projects, and can control the resource fraction devoted to each project (see Figure 1).
Figure 1: BOINC volunteers can attach their hosts to any set of projects, and can assign a resource share to each project. BOINC software includes server-side components, such as scheduler and related daemon programs that manage task distribution and collection [REF], and web-based interfaces for volunteers and project administrators. In this paper we are concerned only with the part of the BOINC software that runs on volunteer hosts. 1.2 What volunteers wantVolunteer computing requires volunteers, and that requires attracting them and keeping them happy. How can middleware contribute to this? First, it is critical that BOINC's activity be 'invisible': that is, volunteers should not perceive processing or network slowness, exhaustion of disk space, and so forth, as a result of BOINC's activity. Volunteer hosts are mostly desktop PCs. BOINC runs scientific applications except while the host is in use (and in some cases even then). Second, volunteers should be able to limit when and how their computing resources are used - to specify, in effect, the tradeoff between invisibility and productivity. In BOINC, volunteers can specify general preferences, including:
In addition, volunteers can specify project-specific preferences for each project. These can specify, for example, to control the graphics generated by that project's applications. Third, should have the option of seeing graphics that show what the application is doing, either in a windows or as a screensaver. Finally, volunteers must be given credit for the work done by their hosts. BOINC has a system in which work done (a combination of computation, storage and network usage) is represented by a numeric measure, or 'credit'. This is used to display web-based leaderboards, graphs in the client GUI, and other visual forms. BOINC uses various mechanisms to ensure that credit is granted fairly. 1.3 Structure of BOINC client softwareThe BOINC client software consists of the following components (see Figure 2):
Figure 2: the BOINC client software structure
1.4 The BOINC runtime systemBOINC provides an Application Programming Interface (API) that provides functions such as process management, checkpoint/restart support, and file access. These functions are implemented by a runtime system consisting of a library that is linked with applications, together with code in the BOINC core client that interacts with this library. BOINC's application runtime system corresponds to analogous layers of systems for batch processing, distributed computing, and Grid computing. However, BOINC has several types of requirements that differ from those of other systems:
This paper 2. Process and IPC structure2.1 Shared-memory message-passingThe central design decision in the implementation of the BOINC API is: how should the core client interact with applications? Operating systems offer a variety of mechanisms for interprocess communication, process control, and synchronization. For example, POSIX-compliant systems such as Linux [REF] have signals, semaphores, and pipes. Windows has mutexes, messages, and various system calls for process and thread control. We chose to base the BOINC runtime system on shared-memory message-passing. For each application it executes, the core client creates a shared memory segment shared by itself and the application. The shared memory contains a data structure representing a number of unidirectional message channels. Each channel consists of a fixed-size buffer and a flag indicating whether a message is present. The sender checks if the flag is clear. If so, it copies the message to the buffer and sets the flag. If the sender finds the message buffer full, in some cases it is OK to discard the message. Otherwise, the sender can maintain a queue of unsent messages and sends them from a polling function. The BOINC API implementation uses eight message channels, four in each direction. For example, one channel carries process control messages (telling the application to suspend, resume or quit) while another one conveys graphics-related messages (telling the application to create or destroy a graphics window).
Figure 3: shared-memory message passing Compared with alternatives (such as using pipes, sockets, or signals) the use of shared-memory message passing has several advantages:
2.2. Application thread structure
The thread structure of the BOINC runtime system
is shown in Figure 3.
Figure 3: The thread structure of the BOINC runtime system The process structure is slightly different between Unix and Windows.
2.3. Simple and compound applicationsSimple applications are those that consist of a single program; scientific code, graphics code, and the BOINC runtime library reside and execute in a single address space. Compound applications consist of a master program that runs one or more worker programs. The master program, for example, might run pre-processing, main, and post-processing programs in sequence. It might launch one or more other processes (e.g. coupled climate models) that communicate via shared memory but are controlled by the master program. It might run a graphics program concurrently with a scientific program. The master program typically handles the suspend/quit/resume functionality (described below), as well as managing the reporting of CPU time and fraction done (see Section X).
Figure 4: Compound application 3. Process control and synchronization3.1 Process controlThe core client needs to suspend, resume, and quit applications for various reasons:
There are various project-configurable options for handling process control. In the simplest case, the BOINC client can handle all of the suspend/resume/quit functionality for a project's application. But it is also possible for an application to receive and handle process control messages from BOINC. In this way, for example, the various sub-processes of a compound application can be shut down more "cleanly", i.e. a common shared-memory segment can be detached and closed first before quitting. In the event of a compound application not responding to a quit request in a suitable time period, the BOINC client can force a kill of the application and sub-processes (via a KILL signal in Unix or TerminateProcess() in Windows). Override: send KILL signal 3.2 HeartbeatsSometimes the core client exits unexpectedly; for example, when it crashes. In these situations, a mechanism is needed that will cause applications to exit eventually. We do this using heartbeat messages, which are sent once per second from the core client to each application. If an application doesn't get a heartbeat message for 30 seconds, it exits. In some cases, processes need to make an action atomic with respect to exiting. For example, they might need to write a file or group of files. BOINC provides API functions boinc_begin_critical_section(); boinc_end_critical_section(); boinc_begin_critical_section()
sets a variable in the runtime library.
Any code that would otherwise exit
(heartbeat timeout, handling of 'quit' messages')
checks this variable and defers quitting if it's set.
4. SandboxingThe BOINC runtime system doesn't 'sandbox' applications. However, it guards against certain types of application misbehavior. Each task has specified limits on memory usage, disk usage, and computation (number of floating-point operations). The runtime system periodically measures these quantities and reports them to the core client. If the limits are exceeded, the core client kills the application. 5. Status and credit reporting5.1 CPU time and VM usageCPU time reporting 5.2 Fraction done
boinc_fraction_done(double fraction_done);The fraction_done argument is an estimate of the
workunit fraction complete (0 to 1).
This function is fast and can be called frequently.
5.3 Credit reportingboinc_fpops_per_cpu_sec() boinc_fpops_total() 6. Directory structure and file accessThe directory structure used by the BOINC runtime system is as follows (see Figure 5).
Figure 5: Directory structure To access files, applications call int boinc_resolve_filename(char *logical_name, char *physical_name);to convert logical file names to physical names. For example, instead of f = fopen(\"my_file\", \"r\");the application might use string resolved_name; retval = boinc_resolve_filename(\"my_file\", resolved_name); if (retval) fail(\"can't resolve filename\"); f = boinc_fopen(resolved_name.c_str(), \"r\"); boinc_resolve_filename() doesn't need to be used for temporary files.
Applications should replace fopen() calls with boinc_fopen(char* path, char* mode);This deals with platform-specific problems. On Windows, where security and indexing programs can briefly lock files, boinc_fopen() does several retries at 1-second intervals. On Unix, where signals can cause fopen() to fail with EINTR, boinc_fopen checks for this and does a few retries. 7. CheckpointingComputations that use a significant amount of time per work unit may want to periodically write the current state of the computation to disk. This is known as checkpointing. The state file must include everything required to start the computation again at the same place it was checkpointed. On startup, the application reads the state file to determine where to begin computation. If the BOINC client quits or exits, the computation can be restarted from the most recent checkpoint. Frequency of checkpointing is a user preference (e.g. laptop users might want to checkpoint infrequently). An application must call int boinc_time_to_checkpoint();whenever it reaches a point where it is able to checkpoint. If this returns nonzero, the application should call a 'checkpoint function' that writes the state file and flushes all output files, then should call void boinc_checkpoint_completed(); boinc_time_to_checkpoint() is fast,
so it can be called frequently (hundreds or thousands of times a second).
These functions automatically make checkpointing
into a critical section (see section X).
7.1 Output file integrityTypical BOINC applications write incrementally to output files. It an application is preempted by quitting at a point when has written to an output file since the last checkpoint, the output will be written a second time when the task runs again, producing an erroneous output file. There are several ways of dealing with this:
8. GraphicsBOINC applications can optionally provide graphics, which are displayed either in an application window or in a full-screen window (when acting as a screensaver). Requests to open and close graphics windows originate from the BOINC Manager and screensaver, and are relayed to applications by the core client. The recommended way for an application to provide graphics is for it to provide a set of 'graphics functions', which are called as needed by the runtime library. app_graphics_init(); // initialize graphics app_graphics_render(); // render a frame app_graphics_resize(); // called when window is resized app_mouse_move(); // called when mouse moved app_mouse_button(); // called when mouse button clicked app_key_press(); // called when key pressed app_key_release(); // called when key releasedTo use this approach, applications call an initialization function (see section X) that creates a 'graphics thread'. This thread executes code (in the BOINC runtime library) that monitors the graphics message queue, and that implements a window-system event handling loop. This code handles the creation and destruction of windows (using native calls on Windows, and GLUT on other platforms). It calls the application-supplied graphics functions as needed. Typically it calls the render function 10 times a second, allowing applications to generate moving, animated graphics. It uses a throttling mechanism that limits the fraction of CPU time used by the graphics thread. On hosts without graphics acceleration, the graphics will gracefully degrade to one frame per second or so. Applications typically implement graphics using OpenGL [REF], since it is available on most platforms and offers hardware acceleration where available. There is a complication, however, on Unix-based systems. Applications must link dynamically to the needed graphics libraries (OpenGL, X11); otherwise they won't get hardware acceleration. However, it an executable that dynamically links a library is run on a system that doesn't have the library, it will fail to start. Some Unix systems have no graphics libraries. BOINC's solution to this is to divide the application into a main program and a shared library. The shared library contains the application's graphics code. It dynamically links system graphics libraries, but the main program does not. BOINC's graphics-initialization function attempts to load the shared library; if system graphics libraries are missing, this will fail, in which the main program continues to execute normally, but without graphics. This structure is not needed on Windows, since OpenGL libraries are available on all versions of Windows. In the above approach, the worker and graphics thread run in the same address space. This makes it easy to show a detailed, up-to-the-second view of the scientific calculation. For some applications, this is not important; it may be easier to implement graphics as a separate program that uses information in output or checkpoint files. The BOINC runtime system supports this also. 9. Long-running applicationsExample: CPDN [REF]. 9.1 Trickle messagesTrickle messages let applications communicate with the server during the execution of a workunit. They are intended for applications that have long work units (multiple days). Trickle messages may go in either direction: 'trickle up' messages go from application to server, 'trickle down' messages go from server to application. Typical uses of this mechanism:
Trickle messages are asynchronous and reliable. Trickle messages are conveyed in scheduler RPC messages, so they may not be delivered immediately after being generated. The trickle-message API is as follows:
sends a trickle message of the given variety.
Returns zero if success.
receives a trickle message.
Returns true if there was a message.
Messages are delivered in order.
9.2 Intermediate file uploadLong-running applications can upload their output files before the result as a whole is finished. To initiate the upload of an output file, call extern int boinc_upload_file(char* name);where 'name' is the logical name of the file. The application cannot modify the file after making this call. To check on the status of a file being uploaded, call extern int boinc_upload_status(char* name);This will return zero if the upload of the file is finished successfully. 10. Non-CPU-intensive applicationsMost BOINC applications are CPU-intensive, i.e. when they run they use as much CPU time as they can get. However, some applications are not CPU intensive. These applications typically are designed so that they always run, but have short and infrequent periods of activity, sleeping the rest of the time. Examples include:
Some non-CPU-intensive require that, during their periods of activity, no other BOINC-related work should occur. In particular, no network transfers should take place, and all other BOINC applications should exit. For this purpose, BOINC supplies the following API: int boinc_suspend_other_activities(); int boinc_resume_other_activities();The application should call these at the start and end of the its period of activity. 11. Initialization, finalization and error reportingTo start the initial execution of a task, the core client does the following:
Applications call int boinc_init();before calling other BOINC functions or doing I/O. This does the following:
When the application has completed it calls int boinc_finish(int status); status is nonzero if an error was encountered.
This does the following:
The boinc_finished file solves the following problem: On some versions of Windows, when a program is killed externally by the user, this is indistinguishable (from the parent process's viewpoint) from a call to exit(0). If the core client detects that a program has exited unexpectedly but no boinc_finished file is found, it logs a warning message and starts the application again. 12. Related workCondor: migration; remote system call technology; traps library calls, sent over network to machine where job was submitted. No changes to source code, no recompile (just relink). ClassAds: form of preferences. UD, Entropia XtremWeb, Banyihan 13. Conclusionnon-goal: binary compatibility not handled: allocation of non-CPU resources e.g. could have an app that checks for a GPU, uses it (and not CPU) if found. On a host with GPU, would want to run this together with a CPU-intensive app. References |