What's the trick ?? Boincmgr handling GPU *AND* CPU WU's

Message boards : BOINC client : What's the trick ?? Boincmgr handling GPU *AND* CPU WU's
seigell

Joined: 13 Nov 09
Posts: 3
United States
Message 28714 - Posted: 13 Nov 2009, 7:15:17 UTC

Great News - Boinc finally provides support for GPU as well as CPU WU's
Workflow - it's OK that Boinc requests GPU WU's from Projects that are CPU-only (rationale: to catch GPU work if/when that Project starts supporting it)

BUT... What's the trick to reach a balance which prompts for GPU occasionally but then reverts to CPU often enough to not STARVE for WU's of that Project ??

Surely the CPU-only Projects are frustrated by the "more than trivial" frequency of GPU requests. It causes incremental additional network traffic (at a minimum).

At a MAXIMUM, it buries those Projects when users like me find No Recourse but to manually manage the WU Queue by brute-force use of Update and even Reset Project, for the sole purpose of "getting some / any work" !!

I've just endured yet another bout where 2 separate Projects were asked repeatedly for GPU for hours on end without once asking for CPU.

What's the Trick ?? There's got to be one...
ID: 28714 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 28722 - Posted: 13 Nov 2009, 14:25:39 UTC - in response to Message 28714.  

The GPU work request will back off further and further until it's only done once every 24 hours. The back-off is reset when you press Update. As far as I know, it is not reset when you use Advanced->Do network communications while the countdown to the next project contact is still running.

Otherwise, just stop hammering the Update button. Attach to other projects that do have lots of work.
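
A minimal sketch of the GPU request back-off described above. Only the 24-hour cap and the reset on Update come from the post; the 60-second starting interval, the doubling factor and the class name are assumptions for illustration, and this is not the client's actual code.

```python
DAY = 24 * 60 * 60  # cap from the post: at most one GPU request per 24 hours


class GpuRequestBackoff:
    """Hypothetical model of a capped back-off for GPU work requests."""

    def __init__(self, initial=60, factor=2):
        # The initial interval and growth factor are assumed values.
        self.initial = initial
        self.factor = factor
        self.interval = initial

    def request_failed(self):
        """Project had no GPU work: wait longer next time, up to one day."""
        self.interval = min(self.interval * self.factor, DAY)
        return self.interval

    def update_pressed(self):
        """Pressing Update resets the back-off to its starting value."""
        self.interval = self.initial
        return self.interval


backoff = GpuRequestBackoff()
for _ in range(20):
    backoff.request_failed()
print(backoff.interval)          # 86400, the one-day cap
print(backoff.update_pressed())  # 60, back to the start
```

Under these assumptions a project with no GPU app reaches the one-day cap after about a dozen failed requests, which is why pressing Update repeatedly keeps the GPU requests frequent.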
ID: 28722 · Report as offensive
seigell

Joined: 13 Nov 09
Posts: 3
United States
Message 28865 - Posted: 19 Nov 2009, 15:34:38 UTC - in response to Message 28722.  

I've followed your advice, and remained "hands-off" for the past several days.

And, again, the CPU-only Projects have "starved"...

I run 8 Projects (all-time 3-core + GPU; addl work buffer=2):
GPU:
GPUGRID - Project imposed limit 2 WUs
SETI(cuda&CPU)
CPU:
Einstein
Rosetta
malaria
WCG - multi-projects
CPDN - long-term WUs always present
MilkyWay - on hiatus for server rebuild

Result:
GPUGRID constantly has its 2 (max) WUs - refilled soon after reporting
SETI goes to town - lots of WUs w/ 0.5-2.5hr est completion times (10-25+)
CPU Projects did start asking for CPU WUs - after 16-32hrs from startup - performing network requests only every 8-10hrs - usually getting 2-4hrs worth WUs each time
This means that while I'm making nominal / halting progress on the CPU projects, I'm rolling thru the GPU projects and racking up good credits. And CPDN is benefiting in that the WUs due for mid-2010 will be completed within the next 2 weeks.

But this isn't what I want !! I joined these projects to participate in all of them !! Prior versions of BOINC - 6.5.x - made this happen with little effort. Now with 6.6.x and 6.10.x, I'm feeling the need to babysit and fight BOINC in order to participate and contribute evenly.

There has to be a Better Trick...
ID: 28865 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 28867 - Posted: 19 Nov 2009, 16:17:04 UTC

This is a known problem, to do with Long Term Debt (or work fetch priority, as it is now known). If you look at the "properties" page for each of your CPU projects, you will probably find that each of them has a work fetch priority below -86,400 (seconds, = 1 day), and has the additional annotation "(overworked)". Under these circumstances, the projects are only contacted for a 'refill' when the work remaining in your cache drops below whatever setting you have for "Computer is connected to the Internet about every X days" - usually 0.1 days (2.4 hours), unless you've changed it.
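
As a rough illustration of the numbers above, the "overworked" test and the refill threshold can be written out like this. The function names and the per-project priority values are invented for the example; it only restates the arithmetic and does not reflect how the client stores or computes these values.

```python
SECONDS_PER_DAY = 86_400  # the -86,400 threshold is one day in seconds


def is_overworked(work_fetch_priority):
    """True if the priority is below minus one day, as described above."""
    return work_fetch_priority < -SECONDS_PER_DAY


def refill_threshold_hours(connect_interval_days=0.1):
    """Convert the 'connected about every X days' setting to hours."""
    return connect_interval_days * 24


# Hypothetical work fetch priorities in seconds, for illustration only.
priorities = {"Einstein": -120_000.0, "Rosetta": -90_500.0, "SETI": -3_200.0}
overworked = [name for name, p in priorities.items() if is_overworked(p)]
print(overworked)
print(round(refill_threshold_hours(), 2), "hours")
```

With a setting of 0.00 days the threshold is zero, so an overworked project would only be asked for CPU work once the cache is essentially empty.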

One cause of this problem has been identified, and the fix should be in the new version 6.10.19, which I started testing about 15 minutes ago. This is one of the things I was going to be checking anyway, so I'll report here what I find out - though in the nature of things, these several-day cache settings take - well, several days - to thoroughly test.

In the meantime, be assured that current BOINC clients will indeed keep all your CPU cores busy, though it sometimes seems as if they leave it until the last minute before refilling.
ID: 28867 · Report as offensive
seigell

Joined: 13 Nov 09
Posts: 3
United States
Message 28883 - Posted: 20 Nov 2009, 6:02:45 UTC - in response to Message 28867.  

Correct: "work fetch priority" is below (more negative than) -86400 for each CPU project (except MilkyWay - whose servers have not been providing any work units for the past week+).

"Connect to network every" is set to 0.00 days (always connected - highspeed cable)

I'll be quite interested in the results of the QA you are undertaking...
ID: 28883 · Report as offensive
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 29470 - Posted: 15 Dec 2009, 14:02:34 UTC - in response to Message 28867.  
Last modified: 15 Dec 2009, 14:04:54 UTC

Hi,

I started a thread about multi-threaded and single-threaded WUs, but it has now turned into a scheduler problem between CPU and GPU jobs. I've also read some other threads on the subject to get some clues.

I completely agree with Seigell when he/she (!) notices that since 6.10.x it has become hard work to control the Boinc client so that it behaves as the preferences are set. I also fully agree with the term "babysitting" Boinc.

In addition, it seems that we could (should?) edit the cc_config file by hand to get better control, but this may interfere with the (BAM) preferences...

"In the meantime, be assured that current BOINC clients will indeed keep all your CPU cores busy"
No. It doesn't... Every minute now, the 6.10.18/24 client logs "Sending scheduler request: to fetch work" / "Not reporting or requesting task". But in fact, it never asks for new jobs... Ageless has already explained the trick with the zero_debts option to clear things, but my client doesn't ask for jobs anymore, whatever I put in the cc_config file or the global prefs file.
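
For readers who have not seen the zero_debts trick, here is a small sketch that generates a cc_config file with that option set. The option name comes from this thread; placing it inside an `<options>` element is my understanding of the 6.x-era file layout and should be checked against the client configuration documentation for your version. The helper name is made up.

```python
import xml.etree.ElementTree as ET


def build_cc_config(zero_debts=True):
    """Return cc_config.xml text with the zero_debts option set.

    Assumed layout: <cc_config><options><zero_debts>...</zero_debts></options></cc_config>
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "zero_debts")
    flag.text = "1" if zero_debts else "0"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config())
# <cc_config><options><zero_debts>1</zero_debts></options></cc_config>
```

The resulting text would be saved as cc_config.xml in the BOINC data directory and read by the client on its next start; as reported above, it did not cure the problem on this host.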

I'm in no position to argue at a technical level, because I don't want to become a Boinc guru level #32768 to manage my hosts... For years now, I've played the grumpy old user in forums whenever I face a project which asks me (or forces me) to become alternately a computer guru or a project babysitter...

None of this is welcome. Boinc was and should remain a simple tool to use, even if what's underneath becomes heavily complex. As has been said before in the forum, many users don't have a clue about a computer's insides and don't want to play with them. They just want to participate in something brilliant for mankind or science. Remember that SETI started this as a "funny" screensaver!!! ;-)

My wish for a perfect setup is this: once I've been told to set my work preferences ONE time, when I first sign up to the Boinc network, I expect all projects to follow these preferences without my having to babysit ALL my hosts/clients. If I say that I agree to participate in both CPU and GPU projects, I shouldn't have to monitor whether I'm actually receiving GPU WUs or CPU ones. Nor should I have to monitor the client when it receives multi-threaded or high-priority WUs, and become "anxious" at the next rotation of WUs to see if all cores are used...

I'm not angry. I can be patient, and I appreciate the heavy and nice job done by Boinc's dev teams, while waiting for the next version. Until then, right now, what simple move could I make to have an "x86_64 quad-core CPU and CUDA-compliant GPU" client asking for, receiving and computing work 24h/24 7d/7 on ALL available cores WITHOUT rummaging in the config files, WITHOUT babysitting my debts and especially WITHOUT going into each project's preferences to choose CPU/GPU?!

Regards
ID: 29470 · Report as offensive
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 29486 - Posted: 16 Dec 2009, 1:13:15 UTC - in response to Message 29470.  

Following up on this message: after completely uninstalling Boinc and emptying the data and config files, the client is now functioning as it should:
"requesting new tasks for CPU and GPU"

When the client is not following the user's preferences (BAM or local), there is definitely something wrong somewhere in:
cc_config.xml
global_prefs.xml
client_state.xml
etc

Whatever you do with zero_debts, resetting prefs, going back to an older version etc, something somewhere in the config files takes precedence over the preferences. This is a bug.
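
Before wiping the data directory as described above, it may be worth keeping a copy of the files listed, so they can be compared with the fresh ones afterwards. A minimal sketch, assuming you pass in the location of your own BOINC data directory (it differs per platform and install); the function name is hypothetical and the demo runs on a throwaway directory.

```python
import shutil
import tempfile
from pathlib import Path

# File names taken from the list in the post above.
CONFIG_FILES = ("cc_config.xml", "global_prefs.xml", "client_state.xml")


def backup_config(data_dir, backup_dir):
    """Copy whichever of the config files exist in data_dir into backup_dir."""
    data_dir, backup_dir = Path(data_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in CONFIG_FILES:
        src = data_dir / name
        if src.is_file():
            shutil.copy2(src, backup_dir / name)
            copied.append(name)
    return copied


# Demo on a temporary directory; point data_dir at your real data folder instead.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "cc_config.xml").write_text("<cc_config/>")
    print(backup_config(data, Path(tmp) / "backup"))  # ['cc_config.xml']
```

Stop the client before copying, since client_state.xml is rewritten while it runs.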
ID: 29486 · Report as offensive
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 30263 - Posted: 19 Dec 2009, 14:21:42 UTC - in response to Message 29486.  
Last modified: 19 Dec 2009, 14:22:53 UTC

I don't know if the following example will lead anywhere, but:

  • I was able to retrieve and re-install a 5.10.45 x86_64 client. Once done, everything went fine because this client reset the short-term/long-term debt balances to acceptable values. Each project was now requesting an amount of seconds of work AND GETTING JOBS! This client doesn't handle GPU, but no matter.
  • Next, I installed the 6.6.38 x86_64 client over the 5.10.45. It started asking every project for GPU jobs, but no longer asked for CPU jobs. After 24h of running, no jobs AT ALL were being downloaded; only the tasks downloaded earlier kept computing in alternation.
  • Then I installed the latest 6.10.24 x86_64 client over the 6.6.38. It didn't change anything: no jobs requested, no jobs received.


Then I erased all client binaries and config files, but kept the project files.


  • I installed the 5.10.45 x86_64 again. Same pattern: each project was now requesting an amount of seconds of work AND GETTING JOBS! (Edges, Malaria, Ibercivis, WCG, etc, always have some jobs to compute)
  • I installed the 6.10.24 over the 5.10.45. After 24h, the client was still asking for and getting jobs.


Maybe I'm jumping to conclusions, but it seems that there is something really wrong in the 6.6.x trunk which corrupts the config files. The next client generation is not able to correct this and carries on with the corrupted config files.

This is corroborated by my other Windows hosts. Two of them were upgraded directly from 5.10.x to 6.10.24; these ask for and receive jobs. The other one went from 6.6.36 to 6.10.24; it only asks for GPU jobs.

To be precise: only one of my Windows hosts runs 24/24h. The three others described above are shut down many times a day. I don't know if this is relevant, but I'd guess that the scheduler could be more "in pain" running 24/24h than running a few hours and being reset by a shutdown.

The exception that proves the rule: the 6.6.36 Linux (Mandriva) client has no scheduler problem at all and receives jobs fine 24/24h 7/7d, but it doesn't have any CUDA/ATI graphics controller.

To be continued!

ID: 30263 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15490
Netherlands
Message 30265 - Posted: 19 Dec 2009, 14:35:38 UTC - in response to Message 30263.  

".. but it seems that there is something really wrong in the 6.6.x trunk which corrupts the config files."

The way that debt is calculated changed between BOINC 5 and BOINC 6.6, so you would have to reset all projects after you update to 6.6 to get a more correct view of things. The latest BOINC 6.10 versions are now changing the way that the short term debt (needed for fetching work) is calculated and used.
ID: 30265 · Report as offensive
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 30267 - Posted: 19 Dec 2009, 15:43:47 UTC - in response to Message 30265.  
Last modified: 19 Dec 2009, 15:44:38 UTC

Hello Ageless,

Thanks for the clarification.
"The way that debt is calculated changed between BOINC 5 and BOINC 6.6, so you would have to reset all projects after you update to 6.6 to get a more correct view on things."

I'm aware of that. There are numerous threads about (almost) the same problem, and I read many of them before posting myself. But it doesn't work... It has to be more complicated than that.

On the Windows host that I updated from 6.6.x to 6.10.x, I already performed the zero_debts trick you gave me earlier, AND reset some projects. Still, Malaria for example keeps reporting "Message from server: CPU app exists for malariacontrol.net but no CPU work requested".
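
To see which projects are affected without reading the whole message log by eye, the server message quoted above can be picked out with a small script. This is only a sketch: the pattern copies the wording from this post, the sample lines are made up, and real log lines carry timestamps and project prefixes that this ignores.

```python
import re
from collections import Counter

# Wording copied from the server message quoted in this post.
NO_CPU_REQUEST = re.compile(
    r"CPU app exists for (?P<app>.+) but no CPU work requested"
)


def stalled_cpu_apps(log_lines):
    """Count, per project, how often the server saw no CPU work request."""
    counts = Counter()
    for line in log_lines:
        match = NO_CPU_REQUEST.search(line)
        if match:
            counts[match.group("app")] += 1
    return dict(counts)


# Made-up sample lines for illustration.
sample = [
    "Message from server: CPU app exists for malariacontrol.net but no CPU work requested",
    "Sending scheduler request: to fetch work",
    "Message from server: CPU app exists for malariacontrol.net but no CPU work requested",
]
print(stalled_cpu_apps(sample))  # {'malariacontrol.net': 2}
```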

I'll let this host finish the heavy Aqua WU it has to do. Then I will consider re-installing it from scratch with 6.10.24, meaning erasing everything prior to the new install.
ID: 30267 · Report as offensive


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.