[[PageOutline]]

= Condor-B: BOINC/Condor integration =

This document describes the design of Condor-B: extensions to BOINC and Condor that let a BOINC-based volunteer computing project provide computing resources to a Condor pool. A central design goal is transparency from the job submitter's viewpoint.

Condor-B must address some basic differences between Condor and BOINC:

 * Data model:
   * In BOINC, files have both logical and physical names. Physical names are unique within a project, and the file associated with a given physical name is immutable. Files may be used by many jobs. In Condor, a file is associated with a job, and has a single name.
   * BOINC is designed for apps for which the number and names of output files are fixed at the time of job submission. Condor doesn't have this restriction.
 * Application concept: In Condor, a job is associated with a single executable, and can run only on hosts of the appropriate platform (and possibly other attributes, as specified by the job's !ClassAd). In BOINC, there may be many app versions for a single application: e.g. versions for different platforms, GPU types, etc. A job is associated with an application, not an app version.

== Assumptions ==

For simplicity, we'll assume that the BOINC project has been configured to run a certain set of applications for which jobs are commonly submitted to Condor. For each of these applications, admins must:

 * Create a BOINC application record.
 * Create input and output templates. Note: in general, the set of input/output files, and their names, must be fixed ahead of time. If an application produces output files with indeterminate names, it must combine these into a zip file (the BOINC wrapper can do this).
 * Build the app for one or more platforms (ways of doing this are discussed below).
 * Create BOINC "app versions".

== Job submission mechanism ==

We'll use Condor's existing mechanism for sending jobs to non-Condor back ends.
This will involve two components:

 * A "BOINC GAHP" program: runs as a daemon process on the submit node. This handles RPCs (over pipes) from the Condor job router to submit and monitor jobs.
 * A new class in Condor's job_router for managing communication with the BOINC GAHP.

[[Image(condor.png)]]

=== GAHP protocol ===

The GAHP protocol will be based on the one used for HTCondor's interactions with Globus GRAM. That protocol's description can be found at http://research.cs.wisc.edu/htcondor/gahp/gahp_protocol.txt. From that protocol, we will take the basic syntax and command structure, and these commands:

 * ASYNC_MODE_ON
 * ASYNC_MODE_OFF
 * COMMANDS
 * QUIT
 * RESULTS
 * VERSION
 * RESPONSE_PREFIX

To that, we will add the BOINC-specific commands outlined below.

The GAHP protocol is text-based. Each request and reply consists of a single line. Each main command returns S (success) or E (error), depending on whether the request was syntactically valid. The commands take a argument, and a RESULTS command fetches the results of completed commands.

The commands are:

{{{
BOINC_SUBMIT <#jobs> <#args> ... <#input files> ... ALL|<#output files> ... ...
Result: NULL (success) or
}}}

Notes:

 * The batch name must be unique over all submissions.
 * The output file descriptions are optional; in any case, they must agree with the app's output template.
 * As of now, will always be the filename part of
 * We could add a argument to prepend to input paths.

{{{
BOINC_QUERY_BATCH
Result: NULL| ...
}}}

Notes:

 * status is either NOT_STARTED, IN_PROGRESS, DONE, or ERROR

{{{
BOINC_FETCH_OUTPUT ALL|<#files> ...
Result: NULL|error_msg
}}}

Retrieves a job's output files.

{{{
BOINC_ABORT_JOBS ...
Result: NULL|
}}}

Aborts the given jobs.

{{{
BOINC_RETIRE_BATCH
Result: NULL|
}}}

Retires the given batch; its files and database records can then be deleted.

{{{
BOINC_SET_LEASE
Result: NULL|
}}}

Sets the "lease time" for a batch. After this time, its files and database records can be deleted.
{{{
RESULTS
Result: # of completed commands result1 ...
}}}

If any commands have completed, return their results.

Note: the GAHP protocol defines an "async mode" where the GAHP can notify the grid manager that a command has completed by sending "R\n". This is probably not worth doing since polling is very cheap.

=== Project selection and authentication ===

For the time being we'll do it this way:

Each job submitter has a separate account on the BOINC project (these accounts can be assigned [MultiUser access rights and quotas]). The account has a private '''authenticator''' (a random string).

The job submitter will create a configuration file containing:

 * the URL of the BOINC project
 * the account authenticator

The BOINC GAHP will read this configuration file at startup, and will handle requests using that account on that project.

Note: we could generalize this a bit by including the project URL and authenticator as arguments to each GAHP request.

== Data model ==

The BOINC GAHP uses BOINC's [RemoteInputFiles#Content-basedfilemanagement content-based file management system] to manage input files. In this system, files are stored on the BOINC server with names based on their MD5. This provides automatic file immutability, and it minimizes server disk usage and network transfer in cases where a given file is used by many jobs or batches.

The BOINC database stores records associating files and batches; a file is deleted only when it is no longer associated with any batches.

== Implementation notes ==

The BOINC GAHP handles BOINC_SUBMIT as follows:

 * Do an RPC to create a "batch" record.
 * Make a list of all input files; eliminate duplicates.
 * Compute the MD5s of the files.
 * Do an RPC to see which files are already on the BOINC server, and create batch/file associations for these files (this avoids a race condition with the file cleanup daemon).
 * Do an RPC to copy the needed files to the BOINC server, and create batch/file associations for these files.
 * Do an RPC to create the jobs.

== Ways to deploy applications on BOINC ==

BOINC offers three "environments" in which applications can be deployed:

 * '''Native''': This requires making source-code modifications and building the app for different platforms, linking with the BOINC API library.
 * '''BOINC wrapper''': Requires apps to be built for different platforms, but no source code mods.
 * '''Virtual machine-based''': This would eliminate multi-platform issues but would require volunteer hosts to have !VirtualBox installed.
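
As an illustration of the input-file steps listed under "Implementation notes" (eliminate duplicates, compute MD5s, decide which files still need to be copied), here is a minimal Python sketch. It is not the GAHP's actual code. The function names `md5_of_file` and `plan_uploads` are hypothetical, the server's reply to the "which files are already present" RPC is modelled as a plain set of MD5 strings, and the bare hex digest stands in for whatever MD5-based name the server really uses.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_uploads(input_paths, names_on_server):
    """Split a batch's input files into those already present and those to copy.

    names_on_server is an assumption: it stands in for the reply to the
    "which files are already on the server" RPC, as a set of MD5 hex strings.
    Returns (already_present, to_upload), each mapping MD5 -> one local path.
    """
    by_md5 = {}
    # dict.fromkeys drops repeated paths while keeping their order.
    for path in dict.fromkeys(input_paths):
        # Files with identical content collapse to a single entry.
        by_md5.setdefault(md5_of_file(path), path)
    already_present = {m: p for m, p in by_md5.items() if m in names_on_server}
    to_upload = {m: p for m, p in by_md5.items() if m not in names_on_server}
    return already_present, to_upload
```

In this sketch, both returned groups would get batch/file associations; only the second group would actually be transferred. Keying everything by MD5 is what makes a file shared by many jobs cost one upload.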