Feature suggestion: staggered start of VM jobs on slow storage

Message boards : BOINC client : Feature suggestion: staggered start of VM jobs on slow storage


Pavel Hanak

Joined: 17 Apr 13
Posts: 10
Czech Republic
Message 77095 - Posted: 1 Apr 2017, 22:10:24 UTC

Hi all, I've encountered a problem when many VM jobs try to start at once. Currently, it is most noticeable with the LHCb app from the LHC@home project, but I remember encountering it with ATLAS@home too, back when it was a standalone project. I tested this on a machine with 24 GB RAM, which can safely run up to 8 LHCb jobs without the risk of running out of memory.

The problem occurs when several VM jobs try to start on drives with slow write speeds. When a new job starts, the client copies the VM image (*.vdi file) to the appropriate "slot" directory. Some of the images are rather big, over 4 GB in the case of the LHCb app. When several such apps start at once, it takes quite some time to copy and start all the VMs. In such situations, the VMs are prone to hanging: they are listed as "running" in the manager, but the elapsed time always shows "---", even after an hour or so. I had to manually abort all such hung jobs.

I've actually tested this rather extensively. For about 6 years now, I've used a small (40 GB) separate SSD drive for BOINC. It was rather slow, too, with a write speed of about 35 MB/s. It worked fine for non-VM jobs and occasional ATLAS ones, but couldn't cope with the LHCb load: about 1 in 10 of the tasks hung as described above. I had a hunch it might be a storage problem when I noticed the HDD LED was continuously lit for 3 minutes or more whenever new LHCb jobs started. To test this further, I moved the BOINC directory to an (almost) new 4 TB mechanical drive (around 130 MB/s write speed). The problem got much worse: apart from a significantly higher probability of hang-ups, the drive developed extreme fragmentation after a few days. Some of the *.vdi files had 1000 fragments or more, the result of their simultaneous copying. In the end, I had to limit the number of LHCb tasks to 2 to avoid the hang-ups. Additionally, I've noticed that partially done jobs sometimes failed after boot or when suspending/resuming computation.

After seeing this, I bought a new, very fast SSD drive for BOINC (around 500 MB/s sustained write speed). I've had it for a few days now and I haven't seen any hung LHCb jobs since then, even though up to 8 of them run simultaneously. But of course, I guess not everyone is so devoted as to actually buy new hardware when BOINC needs it.

So my proposal is to add "staggered start" functionality for multiple VM jobs, similar to the "staggered spin-up" used in large disk drive arrays. The staggered VM start should also kick in after all reboots, when resuming computation, etc. The user has to be able to define the time delay between individual VMs: it can be seconds for fast SSDs or minutes for slow mechanical drives. I guess the ideal solution would be to start the next VM only when the previous one is already running, but IMO this would bring in many additional problems (like what to do when the previous VM fails to start for other reasons).
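
To make the proposal concrete, here is a rough Python sketch of the two variants mentioned above: a fixed user-defined delay between launches, and "wait until the previous VM is running" with a timeout so that one failed VM cannot block the rest. This is not BOINC or vboxwrapper code; `start_staggered`, `launch`, `is_running` and all parameters are hypothetical names used only for illustration.

```python
import time


def start_staggered(jobs, launch, delay_s, is_running=None, timeout_s=300.0,
                    poll_s=1.0, sleep=time.sleep, clock=time.monotonic):
    """Launch jobs one at a time instead of all at once.

    jobs       -- list of job handles (hypothetical; whatever launch() accepts)
    launch     -- callable that starts one job (hypothetical)
    delay_s    -- fixed pause between launches, used when is_running is None
    is_running -- optional callable; if given, wait until the previous job
                  reports running, but never longer than timeout_s
    """
    for index, job in enumerate(jobs):
        launch(job)
        if index == len(jobs) - 1:
            break  # nothing left to delay after the last job
        if is_running is None:
            # Variant 1: plain user-defined delay between VMs.
            sleep(delay_s)
            continue
        # Variant 2: wait for the previous VM, with a timeout as a fallback
        # in case it fails to start for unrelated reasons.
        deadline = clock() + timeout_s
        while not is_running(job) and clock() < deadline:
            sleep(poll_s)
```

With a fixed delay the behaviour is fully predictable; the second variant adapts to the drive speed, at the cost of needing some way to tell that a VM has finished starting.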

I guess I should also note that I tried to simulate this "staggered start" with a script (which I ran manually). I used the client command line interface and the app_config.xml file. The script did the following:
1. Edit the app_config file to increase the number of allowed LHCb jobs.
2. Make the client re-read config files with --read_cc_config.
3. Wait a predetermined amount of time.
4. Repeat until all 8 LHCb jobs are running.
It seemed to work fine even on the 4 TB mechanical drive (no hang-ups) and, most importantly, it prevented the terrible fragmentation. But I must admit I soon got tired of running it manually whenever I needed to suspend/resume computation...
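
For anyone who wants to try the same workaround, the four steps could look roughly like the Python sketch below. It is not the original script. It assumes an existing app_config.xml with an `<app>` entry containing `<name>` and `<max_concurrent>`, and that `boinccmd` is on the PATH and run from a place where it can authenticate to the client. The app name, file path and delay in the usage note are placeholders.

```python
import subprocess
import time
import xml.etree.ElementTree as ET


def set_max_concurrent(config_path, app_name, limit):
    """Step 1: set <max_concurrent> for one <app> entry in app_config.xml."""
    tree = ET.parse(config_path)
    for app in tree.getroot().iter("app"):
        if (app.findtext("name") or "").strip() == app_name:
            node = app.find("max_concurrent")
            if node is None:
                node = ET.SubElement(app, "max_concurrent")
            node.text = str(limit)
            tree.write(config_path)
            return
    raise ValueError(f"no <app> entry named {app_name!r} in {config_path}")


def reload_config():
    """Step 2: make the running client re-read its config files."""
    subprocess.run(["boinccmd", "--read_cc_config"], check=True)


def staggered_start(config_path, app_name, target, delay_s,
                    reload=reload_config, sleep=time.sleep):
    """Steps 3 and 4: raise the limit by one, wait, repeat up to target."""
    for limit in range(1, target + 1):
        set_max_concurrent(config_path, app_name, limit)
        reload()
        if limit < target:
            sleep(delay_s)  # give the previous VM image time to be copied
```

A call such as `staggered_start("app_config.xml", "LHCb", 8, 180)` would then raise the limit from 1 to 8 with three minutes between steps; the real app name has to match the one the project uses.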
ID: 77095
SekeRob
Volunteer tester
Help desk expert

Joined: 25 Aug 06
Posts: 1547
Message 77106 - Posted: 2 Apr 2017, 10:56:45 UTC - in response to Message 77095.  
Last modified: 2 Apr 2017, 10:57:38 UTC

Years ago, staggered starting was requested by kneed of the WCG project in relation to Clean Energy. A ticket for this still exists on GitHub.
Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 77106
ChristianB
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 305
Germany
Message 77108 - Posted: 2 Apr 2017, 15:38:00 UTC

This was already implemented in vboxwrapper. It might need to be adjusted for today's computers, where you want to run >4 VMs at the same time.
ID: 77108
Pavel Hanak

Joined: 17 Apr 13
Posts: 10
Czech Republic
Message 77111 - Posted: 2 Apr 2017, 20:29:41 UTC - in response to Message 77108.  

Interesting. I tried to google it, but I couldn't find any way to enable it. Where could I find some relevant information?
ID: 77111
ChristianB
Volunteer moderator
Volunteer developer
Volunteer tester

Joined: 4 Jul 12
Posts: 305
Germany
Message 77123 - Posted: 3 Apr 2017, 9:48:45 UTC

This has been enabled by default since vboxwrapper 26077 and is called random_checkpoint_factor. Apparently it does not apply at startup, although I thought that it did in the past. It should be easy to build in a sleep command with this random offset before the I/O-intensive part begins. This would be a Vbox-only change that VM projects could deploy immediately.
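
As a rough illustration of the idea only (vboxwrapper itself is C++ and this is not its code), a random startup offset could look like the Python sketch below. `random_start_delay`, `delayed_start`, `copy_image` and `max_delay_s` are hypothetical names; how the real wrapper would derive the offset is not shown here.

```python
import random
import time


def random_start_delay(max_delay_s, rng=random):
    """Pick a random offset in [0, max_delay_s] seconds."""
    return rng.uniform(0.0, max_delay_s)


def delayed_start(copy_image, max_delay_s, rng=random, sleep=time.sleep):
    """Sleep for a random offset, then run the I/O-intensive step.

    copy_image stands in for whatever does the heavy disk work
    (for example copying the *.vdi file into the slot directory).
    """
    delay = random_start_delay(max_delay_s, rng)
    sleep(delay)
    copy_image()
    return delay
```

Because each wrapper instance picks its own offset, several VMs started at the same moment would tend to spread their disk activity over the chosen window, although two of them can still land close together by chance.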

The issue SekeRob is referring to (https://github.com/BOINC/boinc/issues/1293) tries to fix the problem for all applications through a change in the client, which will only be available in a future client version if someone implements it.
ID: 77123


Copyright © 2017 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.