Version 7 (modified by dweir, 5 years ago)


Running apps in virtual machines

This page documents efforts and methods to run apps within virtual machines, with a view to removing the dependence on the host operating system.

Previous attempts

See the LHC@Home Twiki for previous efforts within the context of LHC@Home.


  • Ben Segal's early implementation suggestion is attached to this page.
  • David Weir's proposal was:

I think, though, that the exact way in which we do this is strongly dependent on the "target audience" -- is it still the porting of applications? I'm aware that what follows is essentially a rewrite of the implementation you suggested to me a month or two back, but as I have become familiar with the capabilities of the VIX API I've been able to flesh out some of the details.

Assumptions: no "inner BOINC client", inner application does not link against BOINC libraries, inner application is small with very few dependencies not included in a basic virtual machine. No assumptions are made about whether the inner application would need access to the network, but the wrapper XML standard could be extended to include a field indicating whether this is necessary.
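As a rough illustration of the extension suggested above, the sketch below parses a wrapper job file that carries an extra network flag. The `<network_required>` element is hypothetical and is not part of the existing wrapper format. The surrounding `<job_desc>`/`<task>`/`<application>` layout is assumed to follow the stock BOINC wrapper's job file, and `inner_app` is a made-up program name.

```python
import xml.etree.ElementTree as ET

# Hypothetical job file: <network_required> is an assumed extension,
# not an element the existing BOINC wrapper understands.
JOB_XML = """
<job_desc>
    <task>
        <application>inner_app</application>
        <command_line>--input in.dat</command_line>
        <network_required>0</network_required>
    </task>
</job_desc>
"""


def parse_tasks(xml_text):
    """Return one dict per <task>, with the assumed network flag as a bool."""
    root = ET.fromstring(xml_text)
    tasks = []
    for task in root.findall("task"):
        flag = task.findtext("network_required", default="0").strip()
        tasks.append({
            "application": task.findtext("application", default="").strip(),
            "command_line": task.findtext("command_line", default="").strip(),
            # A missing element is treated as "no network needed".
            "network_required": flag == "1",
        })
    return tasks


tasks = parse_tasks(JOB_XML)
```

Treating a missing flag as "no network" keeps existing job files valid, which seems the safer default for a VM that may be run without networking.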

  1. We deploy a virtual machine image over BOINC, with VMware Tools installed. This need only be done once for our hypothetical project, with different programs being sent as part of each workunit.
  2. The VMware-aware wrapper powers up the virtual machine and uses VixVM_GetNamedSnapshot() to check for a snapshot relevant to the current workunit. If one exists, the wrapper restores it with VixVM_RevertToSnapshot() (this is our checkpoint recovery step) and we skip to step 4.
  3. For a given workunit, the wrapper XML standard is used to send a package containing a non-BOINCified program and its dependencies (say, an RPM for CernVM). The VMware-aware wrapper installs the package with a call to VixVM_RunProgramInGuest(), and then uses the same call to start the program.
  4. (Main loop.) The wrapper polls the process handle returned by VixVM_RunProgramInGuest() to see whether the workunit has finished. It also calls boinc_time_to_checkpoint() and, if a checkpoint is due, runs VixVM_CreateSnapshot() to create a snapshot as the checkpoint.
  5. Upon successful completion of the inner job, we must copy out the results, uninstall the workunit package in the VM and move the checkpointed snapshot. The results are then sent back to the BOINC server. We can get the CPU time easily inside the virtual machine by calling the executable through time(1), say, but it may turn out to be more reasonable to use the CPU time estimate made by the core client itself.
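
The control flow of the steps above can be sketched as a small simulation. This is not VIX code: `FakeVM` is a hypothetical in-memory stand-in whose methods only mimic the VIX calls named in the comments, the fixed `checkpoint_every` interval stands in for boinc_time_to_checkpoint(), and discarding the snapshot at the end is an assumption about what should happen to it in the final step.

```python
class FakeVM:
    """Hypothetical in-memory stand-in for a VM; not the real VIX API."""

    def __init__(self, job_length):
        self.snapshots = {}        # snapshot name -> saved progress
        self.progress = 0          # work done by the inner job so far
        self.job_length = job_length
        self.installed = False

    def get_named_snapshot(self, name):       # cf. VixVM_GetNamedSnapshot()
        return name if name in self.snapshots else None

    def revert_to_snapshot(self, name):       # cf. VixVM_RevertToSnapshot()
        self.progress = self.snapshots[name]
        self.installed = True      # package was installed before the snapshot

    def run_program_in_guest(self, command):  # cf. VixVM_RunProgramInGuest()
        if command == "install":
            self.installed = True

    def create_snapshot(self, name):          # cf. VixVM_CreateSnapshot()
        self.snapshots[name] = self.progress

    def job_finished(self):        # stands in for polling the process handle
        return self.progress >= self.job_length

    def tick(self):                # simulated passage of time in the guest
        self.progress += 1


def run_workunit(vm, wu_name, checkpoint_every=3, max_ticks=None):
    """Walk through the recovery, install, main-loop and cleanup steps.

    Returns True if the inner job completed, False if the wrapper was
    (artificially) interrupted after max_ticks iterations.
    """
    if vm.get_named_snapshot(wu_name) is not None:
        vm.revert_to_snapshot(wu_name)         # resume from the checkpoint
    else:
        vm.run_program_in_guest("install")     # install the workunit package
        vm.run_program_in_guest("start")       # start the inner program
    ticks = 0
    while not vm.job_finished():               # main loop
        if max_ticks is not None and ticks >= max_ticks:
            return False                       # simulated wrapper shutdown
        vm.tick()
        ticks += 1
        if ticks % checkpoint_every == 0:      # assumed checkpoint schedule
            vm.create_snapshot(wu_name)
    vm.snapshots.pop(wu_name, None)            # assumption: discard checkpoint
    vm.installed = False                       # uninstall the package
    return True


vm = FakeVM(job_length=10)
first_run = run_workunit(vm, "wu1", max_ticks=5)   # interrupted part-way
second_run = run_workunit(vm, "wu1")               # resumes from the snapshot
```

The second call resumes from the last snapshot rather than from the point of interruption, so any work done after the most recent checkpoint is repeated.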

Note: To issue partial credit on long processes, we could require the inner program to update a file inside the VM (say /tmp/creditreport) giving the number of floating point and integer operations done so far. The wrapper could then poll this file in step 4 (the main loop) and return the results to the core client. The same mechanism would be needed to report the fraction done. We could even re-implement boinc_fraction_done() and boinc_ops_cumulative(), writing a "fake" BOINC library to be linked against when compiling projects to run inside virtual machines.
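
A minimal sketch of what such a report file could look like follows. The text above does not define a format, so the three-line `key value` layout, the field names and the `CreditReport` type are all assumptions made for illustration. `format_report` plays the part of the "fake" library inside the VM, and `parse_report` is what the wrapper might run on the file contents during the main loop.

```python
from dataclasses import dataclass


@dataclass
class CreditReport:
    fraction_done: float   # 0.0 .. 1.0
    fpops: float           # floating point operations so far
    intops: float          # integer operations so far


def format_report(report):
    """Guest side: text the inner program would write to the report file."""
    return (f"fraction_done {report.fraction_done}\n"
            f"fpops {report.fpops}\n"
            f"intops {report.intops}\n")


def parse_report(text):
    """Wrapper side: parse the file, tolerating blank or half-written lines."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            values[parts[0]] = float(parts[1])
        except ValueError:
            continue               # e.g. a number truncated mid-write
    return CreditReport(
        fraction_done=values.get("fraction_done", 0.0),
        fpops=values.get("fpops", 0.0),
        intops=values.get("intops", 0.0),
    )


report = parse_report(format_report(CreditReport(0.25, 1.5e12, 3.0e11)))
```

Because the wrapper may read the file while the inner program is still writing it, the parser skips anything it cannot interpret instead of failing. Writing to a temporary file and renaming it would be another way to avoid partial reads.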

  • Daniel's proposal?


  • If we specialise to VMware Server 1, we do not gain access to all the API calls suggested in "David's proposal" above. The snapshot calls are not available until VMware Server 2, which unfortunately does not allow anonymous logins to the local machine.
  • Much as we would like to use the latest possible release of VMware Server, if it requires the user's login details we would not be able to BOINCify the VMware calls, because the user would have to supply the correct username and password for the VMware server. This would require at least interactive input to the wrapper (the user provides the login for the VMware server), and most probably changes to the BOINC infrastructure as well. The latest VIX reference page for VixHost_Connect() mentions logging in anonymously as the current user on the current host, so it is possible that this is simply an oversight in the current release of VMware Server.

Outstanding issues

  • Do we grant credit based on virtualised CPU time, or wrapper CPU time, or something else?