BOINC starving while waiting for memory
Author | Message |
---|---|
Joined: 10 Mar 06 Posts: 73 |
[screenshot] This is a curious situation. As you can see, the high-priority RNA WU is waiting for memory to run, while the WUs of all the other projects are suspended, presumably to make room for it. But since they are suspended in memory, this is probably a deadlock. Perhaps when the NCI WUProp WU ends, the conditions to run the RNA WU will be met. But if they aren't, what then? Still, it's rather bizarre to suspend almost all projects for a possibly deadlocked situation. Please advise. TIA |
Joined: 10 Mar 06 Posts: 73 |
Suspending the RNA WU lets the other WUs run. Resuming it again lets the other WUs continue to run. |
Joined: 29 Aug 05 Posts: 15599 |
What kind of system is this on? How much memory? How many CPUs can BOINC use? Any GPUs? Which Linux? Resource shares? Any multi-threaded applications? Anything I forgot to ask? ;-) |
Joined: 10 Mar 06 Posts: 73 |
|
Joined: 10 Mar 06 Posts: 73 |
When BOINC got back to the situation in the previous image, suspending the NCI WUProp WU let the RNA WU resume. Then resuming the WUProp WU left the RNA WU running, and the NFS WU resumed. |
Joined: 10 Mar 06 Posts: 73 |
"Anything I forgot to ask? ;-)" I'm a Pisces. :-) |
Joined: 29 Aug 05 Posts: 15599 |
Those RNA tasks, how much memory do they want? With only 1 GB in the system, you may have trouble. Does that task stay in memory while waiting for more memory? How much memory do any of the tasks want when they run? |
Joined: 29 Aug 05 Posts: 15599 |
"Anything I forgot to ask? ;-)" LOL, thanks, already got one of those. ;-) |
Joined: 29 Aug 05 Posts: 15599 |
I had a quick chat with one of the developers, and he thinks it's a bug, so I'm sending mail to all the developers. In the meantime, can you post a log with only <cpu_sched_debug> activated? Then one with only <rr_simulation>, and one with only <sched_op_debug>? Thanks. |
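For anyone following along: these log flags are switched on in the client's cc_config.xml, inside a `<log_flags>` section. Below is a minimal sketch, in Python, that builds the text of such a file with exactly one flag enabled, which is what was asked for here. The helper name `build_cc_config` is made up for this illustration, and the sketch only returns the XML text; where the file lives and how the client re-reads it depends on your installation.

```python
import xml.etree.ElementTree as ET

# The three log flags requested in this thread, one log per flag.
REQUESTED_FLAGS = ["cpu_sched_debug", "rr_simulation", "sched_op_debug"]


def build_cc_config(flag: str) -> str:
    """Return cc_config.xml text with exactly one log flag set to 1.

    Hypothetical helper for illustration only; it does not write any file.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    ET.SubElement(log_flags, flag).text = "1"
    # ET.indent exists from Python 3.9 on; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# One config text per requested flag, keyed by flag name.
CONFIGS = {flag: build_cc_config(flag) for flag in REQUESTED_FLAGS}
```

Printing `CONFIGS["cpu_sched_debug"]` shows the fragment to paste into cc_config.xml for the first log; swap in the other two flags for the later runs.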
Joined: 10 Mar 06 Posts: 73 |
"In the meantime, can you post a log with only <cpu_sched_debug> activated?" Here's the log, with <mem_usage_debug> enabled too, from when I performed the actions in http://bit.ly/YsPstm: http://pastebin.com/zUvwM5xv. HTH |
Joined: 29 Aug 05 Posts: 15599 |
http://boinc.berkeley.edu/trac/changeset/4323afee1fcde44055dc35d03aefbdfae84fd220/boinc-v2 ("client: task schedule tweak to avoid starvation case"). Do you still build your own BOINC versions? You may want to git (yes, pun) the latest version of BOINC, build it, and see if that fixes your situation. Of course, we thank you for bringing it to the front. :-) |
Joined: 10 Mar 06 Posts: 73 |
This patch seems to minimize the problem when memory is limited. However, what I see now is that the RNA WU is legitimately suspended while the WUProp WU is running alongside two WUs from other projects: [screenshot] Also, pausing the WUProp WU does nothing: the RNA WU remains suspended and the other WUs continue running, as expected. But when the RNA WU is suspended due to low memory, shouldn't a WU be fetched from some project, since there are 3 processors and only 2 CI WUs and 1 NCI WU running? How can I help? TIA |
Joined: 29 Aug 05 Posts: 15599 |
Try the simulator: http://boinc.berkeley.edu/dev/sim_web.php By the way, the NCI task doesn't use a CPU core. It will always run, even if there's enough work to fill all cores. So on a 4-core CPU, for example, you can have 4 CPU-intensive tasks and 1 non-CPU-intensive task running at the same time. |
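The counting rule described in the post above can be sketched as a toy model in Python. This is not BOINC's scheduler: it ignores memory, priorities, deadlines, and GPUs, and the `Task` class and `pick_running` function are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    non_cpu_intensive: bool = False  # True for NCI tasks such as WUProp


def pick_running(tasks, usable_cores):
    """Toy model: NCI tasks always run and take no core;
    CPU-intensive tasks run until the usable cores are filled."""
    running = []
    cores_used = 0
    for task in tasks:
        if task.non_cpu_intensive:
            running.append(task)      # does not count against the cores
        elif cores_used < usable_cores:
            running.append(task)
            cores_used += 1
    return running, cores_used


# The 4-core example from the post: five CI candidates plus one NCI task.
tasks = [Task(f"ci-{i}") for i in range(5)] + [Task("nci", non_cpu_intensive=True)]
running, cores_used = pick_running(tasks, usable_cores=4)
```

With these inputs, four CPU-intensive tasks plus the NCI task are selected, so five tasks run while only four cores are counted as used.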
Joined: 10 Mar 06 Posts: 73 |
|
Joined: 10 Mar 06 Posts: 73 |
"By the way, the NCI doesn't use a CPU core. It'll run always, even if there's enough work to fill all cores. So e.g. on a 4 core CPU, you can have 4 CPU intensive tasks and 1 non-CPU intensive task running at the same time." Precisely: only 2 of the 3 available processors are being used by CI WUs. Shouldn't BOINC try to put that free processor to good use, or is the RNA WU impeding it? TIA |
Joined: 2 Dec 06 Posts: 69 |
When I looked at your RNA WU, it said: estimated runtime on reference system 11w 2d 22h 12m 14s (6905534.7950536 s). That's 11 weeks of runtime, and RNA doesn't checkpoint!!! Read the following thread on the RNA message boards: "checkpoints for long WU". The recommended method of running these RNA WUs seems to be to run a VirtualBox virtual machine (20 GB virtual HDD, and 1 GB of memory plus 2 GB per CPU core on the virtual machine, so for a dual-core machine you're talking 5 GB of memory allocated to the VM). The virtual machine will need to run on a 64-bit OS. Then you snapshot the virtual machine regularly and restart it from the latest snapshot each time you restart the physical machine. David Ball |
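The two bits of arithmetic in the post above are easy to check. A small Python sketch follows; the function names are made up for this illustration, and the 1 GB base plus 2 GB per core are simply the figures quoted in the post, not an official RNA requirement.

```python
def vm_memory_gb(cores, base_gb=1, per_core_gb=2):
    """Memory to allocate to the VM using the rule quoted in the post:
    1 GB base plus 2 GB per virtual CPU core."""
    return base_gb + per_core_gb * cores


def breakdown(seconds):
    """Split a runtime in seconds into (weeks, days, hours, minutes, seconds),
    dropping the fractional part of a second."""
    total = int(seconds)
    weeks, rem = divmod(total, 7 * 24 * 3600)
    days, rem = divmod(rem, 24 * 3600)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return weeks, days, hours, minutes, secs


dual_core_gb = vm_memory_gb(2)          # 1 + 2 * 2 = 5
estimate = breakdown(6905534.7950536)   # (11, 2, 22, 12, 14)
```

Both results agree with the post: 5 GB for a dual-core VM, and 11w 2d 22h 12m 14s for the estimated runtime.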
Joined: 29 Aug 05 Posts: 15599 |
"Shouldn't BOINC try to put that free processor to good use, or is the RNA WU impeding it?" According to the developers, it should be used for another task. Flagging your thread for David again. |
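To make the expected behaviour concrete, here is a toy Python sketch of "skip the task that doesn't fit in memory and give the core to the next one". It is only an illustration of the idea, not the client's actual job scheduling logic; the function name, the task names "A" and "B", and all memory figures are invented for the example.

```python
def fill_cores(candidates, usable_cores, free_mem_mb):
    """Greedy toy model: walk the tasks in priority order and start each one
    that fits in the remaining memory, until the cores are full.
    Tasks that do not fit are left waiting instead of blocking the rest."""
    running, waiting = [], []
    for name, mem_mb in candidates:
        if len(running) < usable_cores and mem_mb <= free_mem_mb:
            running.append(name)
            free_mem_mb -= mem_mb
        else:
            waiting.append(name)
    return running, waiting


# Hypothetical numbers: the RNA task wants more memory than is free.
candidates = [("RNA", 900), ("NFS", 200), ("A", 150), ("B", 100)]
running, waiting = fill_cores(candidates, usable_cores=3, free_mem_mb=600)
```

With these made-up numbers, the RNA task waits for memory while the three cores are filled by the other tasks, which is the outcome the developers say should happen.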
Joined: 29 Aug 05 Posts: 15599 |
David Anderson wrote: "I'll fix this problem the next time I revise the job scheduling logic (should be in 2-3 months)." Not the answer you wanted, but it's being worked on nonetheless. |
Joined: 10 Mar 06 Posts: 73 |
It is the answer I wanted, just not soon enough. ;-) |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.