Version 1 (modified by davea, 14 years ago)

Remote job submission

A group from Universitat Pompeu Fabra has developed a system for remote job submission and monitoring. This system allows scientists to submit jobs (or groups of jobs) from a convenient command-line interface.

The system (Perl-based) is in boinc/rboinc/.

Warning: this system has been used only by its developers. It will take some work to get it working on other projects.

Powerpoint slides describing the system are here.

Notes

The software should be fairly self-explanatory, but installation may be tricky. Here is a general overview:

  • boinc_retrieve_server and boinc_submit_server run as CGI programs. boinc_retrieve_server also handles all administrative requests (stop, purge).
  • boinc_retrieve and boinc_submit are the client components. As on the server side, boinc_retrieve also issues the administrative requests.
  • Files are exchanged between client and server through the WebDAV extensions to HTTP (a scratch area needs to be set up for this).
  • WU naming is important and is enforced as follows: NNN-UUU_GGG-XX-YY-RNDzzzz where
    • NNN is the name of the workunit (sub-group)
    • UUU is the submitter id
    • GGG is the group
    • XX is the current step in the chain
    • YY is the total number of steps
    • zzzz is a random number (not actually needed)
  • WUs are kept in a "workflow_directory", a subdir of the project dir, as per slide 22 of the Powerpoint.
  • Inside each dir a "process" bash script is created. The assimilator executes it with the name of the assimilated WU as its argument, and the script calls create_work to submit the next step for execution.
  • The main reason for using Perl is that I preferred to use the XML::Simple module for converting data structures to and from XML when sending them over the network. It was useful for adding features on the fly while keeping backwards compatibility.
  • I implemented basic functions for authentication, but this is not finished yet
  • File storage is optimized through hardlinking and pooling. Network transfers are not optimized (but they could be).
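
To make the WU naming convention above concrete, here is a minimal Python sketch that parses a name of the form NNN-UUU_GGG-XX-YY-RNDzzzz and builds the name of the next step in the chain. It is not part of rboinc (which is Perl-based). The function and class names are hypothetical, and several details are assumptions not stated in these notes: the name, submitter and group fields contain no "-" or "_", XX, YY and zzzz are plain digits, and steps are counted from 1 so that the chain ends when XX equals YY.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for NNN-UUU_GGG-XX-YY-RNDzzzz. The character classes are a
# guess: NNN, UUU and GGG are taken to be free of '-' and '_', and XX, YY and
# zzzz are taken to be decimal digits of any width.
WU_NAME_RE = re.compile(
    r"^(?P<name>[^-_]+)-(?P<user>[^-_]+)_(?P<group>[^-_]+)"
    r"-(?P<step>\d+)-(?P<total>\d+)-RND(?P<rnd>\d+)$"
)


class WuName(NamedTuple):
    name: str    # NNN: workunit (sub-group) name
    user: str    # UUU: submitter id
    group: str   # GGG: group
    step: int    # XX: current step in the chain
    total: int   # YY: total number of steps
    rnd: str     # zzzz: random suffix, kept as text to preserve leading zeros


def parse_wu_name(wu_name: str) -> Optional[WuName]:
    """Split a WU name into its fields, or return None if it does not match."""
    m = WU_NAME_RE.match(wu_name)
    if m is None:
        return None
    return WuName(m["name"], m["user"], m["group"],
                  int(m["step"]), int(m["total"]), m["rnd"])


def next_step_name(wu: WuName, rnd: str) -> Optional[str]:
    """Name of the following step, or None when the chain is finished.

    Assumes 1-based steps and writes the step number without zero padding.
    """
    if wu.step >= wu.total:
        return None
    return f"{wu.name}-{wu.user}_{wu.group}-{wu.step + 1}-{wu.total}-RND{rnd}"
```

For example, parse_wu_name("sim-alice_grpA-1-3-RND0042") yields the fields sim, alice, grpA, 1, 3 and 0042, and next_step_name on that result with a new suffix 0007 gives "sim-alice_grpA-2-3-RND0007". A "process" script could use this kind of check to decide whether another step needs to be submitted.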