Scheduler: alternation between multi-threaded and single-threaded WUs

rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 29334 - Posted: 10 Dec 2009, 3:58:06 UTC

Platform: WIN XP64 SP2, quad core CPU, GPU with CUDA available.

Hello,

Since I installed client 6.10.18, I have noticed problems with the scheduler. When the scheduler handles only single-threaded WUs, there is almost no problem. When it has to handle multi-threaded WUs (e.g. AQUA), after a few alternations between single- and multi-threaded WUs, only ONE core is running for ALL projects.

To be more precise:
Client start: uses any WU available on ALL cores.
Rotation: multi-threaded WU's turn => one WU on all cores.
Rotation: uses any WU available on ALL cores.
Rotation: multi-threaded WU's turn => one WU on all cores.
.....
Rotation: uses any WU available on ONE core.
Rotation: multi-threaded WU's turn => one WU on all cores.
Rotation: uses any WU available on ONE core.

This gets worse when a WU arrives in high-priority mode. In that case, only the high-priority WU runs, even if other WUs are near or past their deadlines and other cores are available...

Currently Yoyo/Evolution@Home has sent a WU with a deadline of 4 January 2010, in high priority!!! All other projects are squeezed out... Unacceptable... Especially on a multi-core machine with a GPU... I mean: Evolution@Home may have miscalibrated their deadline, and that is bad. But the scheduler, having three other cores to play with, should use them.

If I restart the client (running as a service), the scheduler still picks the high-priority single-threaded WU first and doesn't start anything on the other cores. If I suspend that task, the multi-threaded AQUA WU starts, even if there are other WUs closer to their deadlines.

I have to suspend both the high-priority (Evolution) and the multi-threaded (AQUA) WUs, then restart the client, to get back to correct multi-core behavior. Then I resume the suspended tasks. After some hours, it falls back into the rotation bug: all cores / one core.

I read that some issues with high-priority mode are being debugged for the upcoming 6.10.2x versions. Does this case fall under those fixes?

Regards
ID: 29334
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 29337 - Posted: 10 Dec 2009, 5:25:18 UTC - in response to Message 29334.  

Update to 6.10.24.

Before you start using it, use Notepad to create a cc_config.xml file in your BOINC Data directory.

Add these lines to it:
<cc_config>
<options>
<zero_debts>1</zero_debts>
</options>
</cc_config>


Save the file, and make sure it keeps its .XML extension rather than getting a .TXT or other extension added.

Now start BOINC (net start boinc).
Exit BOINC again (net stop boinc).

Edit the cc_config.xml file so it looks like this:
<cc_config>
<options>
<zero_debts>0</zero_debts>
</options>
</cc_config>

Save this file.

Restart BOINC (net start boinc).

You've just made sure that it has reset all its long term and short term debts.
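
For anyone who already has a cc_config.xml with other options in it, the edit above can also be scripted. The following is only a minimal sketch in Python, not part of BOINC: the function name `set_zero_debts` is made up for illustration, and it assumes the file has the `<cc_config><options>` layout shown above. It sets or adds the `<zero_debts>` flag without touching other options. You still have to stop and start BOINC between the two edits, as described in the steps.

```python
import xml.etree.ElementTree as ET


def set_zero_debts(xml_text: str, value: int) -> str:
    """Return cc_config XML text with <zero_debts> set to `value` (0 or 1).

    Hypothetical helper for illustration only. Existing options in the
    file are preserved; missing elements are created.
    """
    # An empty file is treated as "no config yet".
    root = ET.fromstring(xml_text) if xml_text.strip() else ET.Element("cc_config")
    if root.tag != "cc_config":
        raise ValueError("root element must be <cc_config>")

    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    flag = options.find("zero_debts")
    if flag is None:
        flag = ET.SubElement(options, "zero_debts")
    flag.text = str(value)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example input: a config that already carries another option.
    existing = "<cc_config><options><ncpus>-1</ncpus></options></cc_config>"
    print(set_zero_debts(existing, 1))  # first start: reset the debts
    print(set_zero_debts(existing, 0))  # second start: back to normal
```

To use it on a real file, read cc_config.xml from your own BOINC Data directory into a string, pass it through the function, and write the result back; the location of that directory depends on your installation.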

As for the task from Yoyo: it may well have a very wrong value for <rsc_fpops_est> and <rsc_fpops_bound>, which will throw it into high priority. There's not much BOINC can do about that, other than to run lots of these tasks and learn. Or you could report this to Yoyo and ask him to check his numbers.
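
To see why a wrong <rsc_fpops_est> matters, here is a rough back-of-the-envelope sketch in Python. It is not BOINC's actual algorithm (the real client simulates all queued tasks together and applies correction factors); it only assumes that the runtime estimate is roughly the task's estimated floating-point operations divided by the host's measured speed. The function names and all numbers are made up for illustration.

```python
def estimated_runtime_seconds(rsc_fpops_est: float, host_flops: float) -> float:
    """Naive runtime estimate: estimated operations / host speed."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return rsc_fpops_est / host_flops


def looks_deadline_critical(rsc_fpops_est: float, host_flops: float,
                            seconds_to_deadline: float) -> bool:
    """True if the naive estimate alone already exceeds the time left."""
    return estimated_runtime_seconds(rsc_fpops_est, host_flops) > seconds_to_deadline


if __name__ == "__main__":
    host_flops = 2e9                 # assumed benchmark: 2 GFLOPS per core
    to_deadline = 25 * 24 * 3600     # assumed: 25 days left until the deadline

    # A sensible estimate: about 5 hours of work, far from the deadline.
    print(looks_deadline_critical(3.6e13, host_flops, to_deadline))

    # An inflated estimate: the same task now "needs" about 58 days,
    # so it looks like it will miss its deadline.
    print(looks_deadline_critical(1e16, host_flops, to_deadline))
```

With an inflated estimate, a task whose deadline is weeks away can still look urgent to the client, which fits the symptom described in the first post.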
ID: 29337
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 29340 - Posted: 10 Dec 2009, 9:48:13 UTC - in response to Message 29337.  

Hi,

I updated to 6.10.24 and added the <zero_debts> line to cc_config.xml, since the file already existed. [We did have a thread about this file and <ncpus>-1</ncpus>, which was supposedly written incorrectly by BoincView and interferes with the number of GPUs/CPUs available.]

First restart, with <zero_debts>1: same behavior, only the high-priority WU starts, on ONE core.
Second restart, with <zero_debts>0: seems OK, since all cores are used.

I will keep you informed after a day, to see what happens after a few rotations between single- and multi-threaded WUs.

Thanks a lot for your prompt answer!
Regards
ID: 29340
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 29469 - Posted: 15 Dec 2009, 12:04:10 UTC - in response to Message 29340.  

Hi,

I finally let more than one day pass to observe how 6.10.24 behaves.
I installed it on the other Windows boxes I have at home.
They have all been monitored through BoincView for years, but only my box has a cc_config.xml???

After four days of running, the multi-threaded AQUA WU has finished. I have no more multi-threaded WUs, nor any high-priority ones. The BOINC client uses only one CPU, but this is because it has only one task to compute!!! And, above all, the client does not ask for work anymore?!?!? No CPU jobs, no GPU jobs...

Even if I restart the client, it still doesn't ask for or receive jobs. Does this have something to do with the trick you pointed out with the zero_debts option? All my balance CT values are at more than 60000 secs!!!! How should I clear this? I presume it's not realistic at all. On another box, all balance CT counters are at -1800 secs???

The only box that asks for, receives and computes jobs is the Linux one with the 6.6.36 client. On that one, the balance CT counters are normally spread between -500 and 500 secs, and most projects are at zero.

I tried moving the cc_config.xml file aside and then restarting the client, in case it would behave in a more "standard" (default) way, but the client doesn't ask for jobs either.

Maybe ALL my boxes are "overworked", but this is the first time I have seen such behavior on all boxes at once???

Normally, among the twenty projects in which I participate, there is always at least one project that has something to compute, or one of my boxes that isn't overworked and can compute... I do not understand this...

I will try going back to version 6.10.18, or maybe 6.6.38, to see what happens.

Thanks for any clue.
Regards
ID: 29469
David Matsumoto

Joined: 5 Jan 10
Posts: 1
United States
Message 30436 - Posted: 5 Jan 2010, 20:12:32 UTC

I've noted something similar, but not identical, to what was described. If it is related, I'll try the recommended solution.

I also use Windows 7 x64 under 6.10.18. I have BOINC set up to use 85% of my 4 cores. Effectively this allows 3 full-time CPUs, plus the GPU with the partial CPU, to work when I'm offline.

The issue I've seen is similar and occurs with Aqua@Home as well. The system will run normally with three 1-CPU tasks simultaneously, but if 1 or 2 tasks finish early and it happens to want to run the 3-CPU MT Aqua@Home task next, the 1 or 2 freed cores will simply remain unused until the last task is complete or it hits my 1.5-hour swap time.

I believe the expected behavior would be to run other 1-CPU tasks until the last one finishes its cycle, rather than idling the CPUs. I understand this would probably be more complex, since these "filler" tasks would not follow the normal rules for swapping.
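
To make the difference concrete, here is a toy model in Python of the two behaviors. It is not BOINC's scheduler code; the `Task` class, the `pick_tasks` function and the task names are invented for illustration. It assumes the client walks its queue in order and either stops at the first task that does not fit the free cores (what I observe) or skips it and fills the cores with smaller tasks (what I would expect).

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ncpus: int  # number of CPUs the task needs


def pick_tasks(queue, free_cpus, backfill):
    """Choose tasks from `queue`, in order, to run on `free_cpus` cores.

    Without backfill, selection stops at the first task that does not
    fit, so the free cores stay idle. With backfill, that task is
    skipped and smaller tasks further down the queue are started.
    Returns (chosen tasks, cores still idle).
    """
    chosen = []
    for task in queue:
        if task.ncpus <= free_cpus:
            chosen.append(task)
            free_cpus -= task.ncpus
        elif not backfill:
            break
    return chosen, free_cpus


if __name__ == "__main__":
    # 3 usable CPUs; one 1-CPU task is still running, so 2 are free.
    queue = [Task("aqua_mt", 3), Task("single_a", 1), Task("single_b", 1)]

    chosen, idle = pick_tasks(queue, free_cpus=2, backfill=False)
    print([t.name for t in chosen], idle)  # nothing started, 2 cores idle

    chosen, idle = pick_tasks(queue, free_cpus=2, backfill=True)
    print([t.name for t in chosen], idle)  # two fillers started, 0 idle
```

The toy model ignores the complication mentioned above: once the fillers are running, something has to decide when to preempt them so the 3-CPU task is not delayed indefinitely.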

Thank you for any help you may provide
ID: 30436


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.