BOINC caused small catastrophe

Paul Schauble

Joined: 29 Aug 05
Posts: 68
Message 71877 - Posted: 25 Aug 2016, 11:32:05 UTC

I just had BOINC cause a small catastrophe. I'll describe what happened and hope that the behavior can be changed.

The main machine on which I run BOINC has two problems that have required workarounds. The machine has an AMD Phenom II CPU and runs 64-bit Windows 10.

1. The machine is the main machine for backups. Due to some hardware glitch, it is not possible to run GPU computation while doing a backup. If you do, the SCSI controller will crash, requiring a power-off reboot and destroying the data on whatever tape is in use.

The workaround, of course, is to set BOINC to suspend GPU use when the backup program is running. Problem solved.

2. I'm in Phoenix, where summer temperatures of 115F or so are not uncommon. The air conditioning cannot keep up. So in the summer I want to limit hours that GPU computations run to exclude the hottest part of the day. The workaround here is to suspend GPU in BOINC and activate a Windows scheduled task that at the right time of day runs the command
"C:\Program Files\BOINC\boinccmd.exe" --set_gpu_mode always 43200
to restart GPU computation for a time. This has also been working fine.

This evening my backup hung and, as I discovered, ruined the tape set being updated. A quick look at BOINC showed that it was running GPU tasks, even though the running backup program was set as a GPU-exclusive program.

I can only speculate about what happened, but here goes. Clearly the set_gpu_mode command overrides the menu setting to suspend GPU computations, exactly as documented. It appears that this command ALSO overrides the specification of a GPU-exclusive application. Otherwise, why was BOINC running GPU tasks while the backup program was running?

If I'm right, I think this should be fixed. If nothing else, the set_gpu_mode command is not documented to override the "suspend GPU when..." settings, and I think it should not.

Thanks for listening,
++PLS
ID: 71877
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 71879 - Posted: 25 Aug 2016, 12:00:42 UTC - in response to Message 71877.  

It would be helpful to post segments of the event logs which cover significant time events during that process - the switches between GPU use suspended and permitted, and the time when the backup program started to run. That might help to narrow down which event went wrong, and either caused GPU processing to restart, or didn't stop it when it should have.

Since you've probably rebooted the machine since then, you would have to recover the logs from the file 'stdoutdae.txt' in your BOINC data directory.
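As a rough sketch of how those segments could be pulled out of a large log, the snippet below keeps only the lines that mention given keywords. The path assumes the default Windows data directory, and the keywords ("gpu", "backup") are only guesses at what the relevant messages contain; the name of the backup program in particular is hypothetical, so substitute whatever appears in your own log.

```python
from pathlib import Path


def matching_log_lines(lines, keywords=("gpu",)):
    """Return the log lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        line.rstrip("\n")
        for line in lines
        if any(k in line.lower() for k in lowered)
    ]


if __name__ == "__main__":
    # Assumption: default BOINC data directory on Windows; adjust if yours differs.
    log_path = Path(r"C:\ProgramData\BOINC\stdoutdae.txt")
    if log_path.exists():
        with log_path.open(encoding="utf-8", errors="replace") as fh:
            # "backup" is a placeholder for the backup program's name.
            for line in matching_log_lines(fh, keywords=("gpu", "backup")):
                print(line)
```

The filtered lines keep their timestamps, so they can be lined up against the time the scheduled task fired and the time the backup started.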

I have to say that the combination of

* SCSI tape backup server
* Windows 10
* BOINC GPU computation

sounds ambitious to me, and is possibly pushing the boundaries a bit hard.
ID: 71879
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 71880 - Posted: 25 Aug 2016, 14:16:52 UTC - in response to Message 71877.  

> The workaround here is to suspend GPU in BOINC and activate a Windows scheduled task that at the right time of day runs the command
> "C:\Program Files\BOINC\boinccmd.exe" --set_gpu_mode always 43200
> to restart GPU computation for a time.

"Always" in this case is akin to "Run Always", not "Run based on preferences". So this setting ignores any other preferences you have set, including the exclusive_gpu_app option in cc_config.xml.

Set it to "auto" to follow preferences.
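For illustration, here is a small sketch of how the scheduled task's command line could be built with "auto" in place of "always". The executable path and the 43200-second duration are taken from the original post; the helper name gpu_mode_command is made up for this example. The script only prints the command so it can be checked before it is put into the scheduled task.

```python
import subprocess

# Path as given in the original post; adjust if BOINC is installed elsewhere.
BOINCCMD = r"C:\Program Files\BOINC\boinccmd.exe"

# Modes accepted by boinccmd --set_gpu_mode.
VALID_MODES = ("always", "auto", "never")


def gpu_mode_command(mode, duration_seconds=None, exe=BOINCCMD):
    """Build the argument list for 'boinccmd --set_gpu_mode <mode> [duration]'."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    args = [exe, "--set_gpu_mode", mode]
    if duration_seconds is not None:
        # The duration is given in seconds.
        args.append(str(int(duration_seconds)))
    return args


if __name__ == "__main__":
    # Print the Windows command line instead of running it.
    print(subprocess.list2cmdline(gpu_mode_command("auto", 43200)))
```

With "auto", the GPU runs according to preferences for the given duration, so the exclusive application setting should still be respected while the backup program is running.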
ID: 71880
Paul Schauble

Joined: 29 Aug 05
Posts: 68
Message 71902 - Posted: 26 Aug 2016, 5:30:54 UTC - in response to Message 71879.  

Right now I'm rerunning the backup set that had its data corrupted. When that is done, I can reproduce the situation. I've copied the log file you mention, so we'll see.

Ambitious? Perhaps, but this machine is idle when not running backups. And the problem with the SCSI controller and GPU is not unique to BOINC. Windows 10 uses the GPU enough to cause problems if the machine is being actively used during a backup.
ID: 71902
Paul Schauble

Joined: 29 Aug 05
Posts: 68
Message 71903 - Posted: 26 Aug 2016, 5:32:02 UTC - in response to Message 71880.  

> Set it to "auto"...

Thanks, I'll try that. May I suggest updating the documentation to make this difference clear?

Thanks,
++PLS
ID: 71903


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.