Irritating

Message boards : BOINC client : Irritating

adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 2751 - Posted: 25 Jan 2006, 15:44:41 UTC
Last modified: 25 Jan 2006, 15:52:46 UTC

I was looking at one of my machines and noticed there seemed to be quite a lot of wu's (LHC wu's) on there with a relatively short deadline (4 days). This was due to a stupid mistake I made. Normally, that machine has its "connect time" set to 0.5 days. I had been fiddling with it to get some Rosetta wu's, and when I set it back I put 5 days instead of 0.5, so the thing downloaded great wodges of work from everywhere before I could stop it. Most of that cleared sensibly.

Anyway, from experience I know that a good number of LHC wu's go to 100% quite quickly, while others take hours. I figured a good thing to do was to run each of these wu's for 10 minutes. If they were all still there afterwards, I could maybe bump the CPU quota for LHC for a while to get this cleared.

So I suspended the other projects and let one wu run for ~10 minutes, then I suspended it and let the next one run for ~10 minutes, and so on. At least 2 of the wu's did finish in that time, so the problem was starting to look less serious, but I figured I may as well do them all.

Upon suspending a wu, all the remaining unstarted units suddenly failed with a computation error. The problem was that CreateProcess() could not create a new process because the system's paging file was too small.

The thing is, this is not an error in the wu or the application; it is an indication of a busy machine that, at most, needs a reboot or a bigger page file. I would have thought this "computation error" should not crash out the wu's. Rather, BOINC should simply write the reason the wu failed to start into the message log, so that corrective action could be taken.

Even on a new install this is possibly correctable, but on a machine that has been running for ages, BOINC knows that the machine is basically good, just busy.
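The handling suggested here could look roughly like the following minimal Python sketch. It is not BOINC's actual client code (which is written in C++); `start_pending`, the `try_start` callback and the `TRANSIENT_START_ERRORS` set are hypothetical names chosen for illustration. The one grounded detail is that 0x5af is the Win32 error `ERROR_COMMITMENT_LIMIT` (decimal 1455), whose message text is the one in the log below. The idea: treat out-of-memory style start failures as a host problem, log the reason once, keep the wu's, and retry later instead of marking them as computation errors.

```python
from typing import Callable, List, Tuple

# Win32 error codes, as defined in winerror.h.
ERROR_NOT_ENOUGH_MEMORY = 8
ERROR_OUTOFMEMORY = 14
ERROR_COMMITMENT_LIMIT = 1455  # 0x5af: "The paging file is too small for this operation to complete."

# Hypothetical policy: start failures that point at host resources, not at the wu itself.
TRANSIENT_START_ERRORS = {ERROR_NOT_ENOUGH_MEMORY, ERROR_OUTOFMEMORY, ERROR_COMMITMENT_LIMIT}


def start_pending(
    results: List[str],
    try_start: Callable[[str], int],
    log: List[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Try to start each pending result in order (hypothetical helper).

    try_start(name) returns 0 on success or a Win32 error code on failure.
    Returns (started, deferred, failed).
    """
    started: List[str] = []
    deferred: List[str] = []
    failed: List[str] = []
    for i, name in enumerate(results):
        err = try_start(name)
        if err == 0:
            started.append(name)
        elif err in TRANSIENT_START_ERRORS:
            # The host is short of resources: log it once, keep this result and
            # every result not yet tried, and stop starting tasks for now.
            log.append(f"{name}: start deferred, host resource error 0x{err:x}; will retry later")
            deferred.extend(results[i:])
            break
        else:
            # Any other start failure is treated as a genuine error for this result.
            log.append(f"{name}: start failed with error 0x{err:x}")
            failed.append(name)
    return started, deferred, failed


# Usage: simulate the situation described in the post, where every start hits 0x5af.
log: List[str] = []
started, deferred, failed = start_pending(["wu_a", "wu_b", "wu_c"], lambda name: ERROR_COMMITMENT_LIMIT, log)
print(deferred)  # ['wu_a', 'wu_b', 'wu_c']
print(failed)    # []
```

Stopping at the first resource error, rather than trying every remaining wu, is a design choice in this sketch: once the paging file is exhausted, further CreateProcess() calls in the same pass are likely to fail the same way.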

BOINC core 5.2.6

25/01/06 16:27:47|LHC@home|Pausing result woct1_v6s4hvnom_mqx-oct1__16__64.209_59.219__10_12__6__80_1_sixvf_boinc42500_5 (left in memory)
25/01/06 16:27:47|LHC@home|CreateProcess() failed - The paging file is too small for this operation to complete. (0x5af)
25/01/06 16:27:48|LHC@home|CreateProcess() failed - The paging file is too small for this operation to complete. (0x5af)
25/01/06 16:27:48|LHC@home|CreateProcess() failed - The paging file is too small for this operation to complete. (0x5af)
25/01/06 16:27:48|LHC@home|CreateProcess() failed - The paging file is too small for this operation to complete. (0x5af)
25/01/06 16:27:49|LHC@home|CreateProcess() failed - The paging file is too small for this operation to complete. (0x5af)
25/01/06 16:27:49|LHC@home|Unrecoverable error for result woct1_v6s4hvnom_mqx-oct1__16__64.21_59.22__8_10__6__70_1_sixvf_boinc42549_2 (CreateProcess() failed - The paging file is too small for this operation to complete. (0x5af))
25/01/06 16:27:49||request_reschedule_cpus: start failed
25/01/06 16:27:49|LHC@home|Computation for result woct1_v6s4hvnom_mqx-oct1__16__64.21_59.22__8_10__6__70_1_sixvf_boinc42549_2 finished
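To see how many start attempts hit this condition, the message log can be scanned. The Python sketch below assumes the pipe-separated `timestamp|project|message` layout seen in the excerpt above; that layout is inferred from this excerpt, not from any documented BOINC format. `parse_line`, `error_code` and `count_commit_limit_failures` are hypothetical helpers. It also makes explicit that the code 0x5af is decimal 1455, the Win32 `ERROR_COMMITMENT_LIMIT`.

```python
import re
from typing import Iterable, NamedTuple, Optional

ERROR_COMMITMENT_LIMIT = 0x5AF  # 1455: "The paging file is too small for this operation to complete."


class LogLine(NamedTuple):
    timestamp: str
    project: str  # empty for client-wide messages such as request_reschedule_cpus
    message: str


def parse_line(line: str) -> LogLine:
    """Split a line of the assumed form 'timestamp|project|message'."""
    timestamp, project, message = line.split("|", 2)
    return LogLine(timestamp, project, message)


# Matches a parenthesised hex code such as "(0x5af)".
_HEX_CODE = re.compile(r"\(0x([0-9a-fA-F]+)\)")


def error_code(message: str) -> Optional[int]:
    """Return the first parenthesised hex error code in a message, if any."""
    match = _HEX_CODE.search(message)
    return int(match.group(1), 16) if match else None


def count_commit_limit_failures(lines: Iterable[str]) -> int:
    """Count lines that directly report CreateProcess() failing with 0x5af."""
    count = 0
    for line in lines:
        msg = parse_line(line).message
        if msg.startswith("CreateProcess() failed") and error_code(msg) == ERROR_COMMITMENT_LIMIT:
            count += 1
    return count


# Usage on one line from the excerpt.
sample = "25/01/06 16:27:47|LHC@home|CreateProcess() failed - The paging file is too small for this operation to complete. (0x5af)"
print(error_code(parse_line(sample).message))  # 1455
```

Counting only messages that start with "CreateProcess() failed" skips the "Unrecoverable error" line, which repeats the same failure inside its own text.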


Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Jim K
Joined: 8 Sep 05
Posts: 168
Message 2755 - Posted: 25 Jan 2006, 17:18:09 UTC

Not Boinc's error.....
BOINC Wiki
Paul D. Buck
Joined: 29 Aug 05
Posts: 225
Message 2762 - Posted: 25 Jan 2006, 20:19:30 UTC

adrianxw
Joined: 2 Oct 05
Posts: 400
Denmark
Message 2776 - Posted: 26 Jan 2006, 11:09:28 UTC
Last modified: 26 Jan 2006, 11:10:18 UTC

>>> Not Boinc's error.....

BOINC did not cause the run on the paging file; I did, and I explained that above. It did, however, trash my remaining wu's by failing them when process creation failed.

>>> The wiki is still your friend.

It tells me nothing I did not already know, and indeed had explained above. As I said, the problem was caused when I tried to create an excessive number of client processes. Since they use large amounts of virtual memory, together they exceeded the page file size.

My comment is that this situation should not be handled the way it is.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.