Thread 'massive work fetch bug in 7.0.25'

pschoefer
Joined: 5 Aug 06
Posts: 59
Germany
Message 43342 - Posted: 9 Apr 2012, 21:33:51 UTC

Just switched to the new recommended version... and it's definitely nothing I would recommend. ;)

My system: i7 980X + ATI HD 5850 + NVIDIA GTX 470
Active projects:
- PrimeGrid (only on NVIDIA)
- Collatz Conjecture (only on ATI, so ATI is currently idle because of downtime)
- WUProp@Home
- EDGeS@Home (CPU; Resource Share 0, because I used it as backup project during last PrimeGrid Challenge and did not change it back afterwards)

Work buffer settings:
<work_buf_min_days>0.0000000</work_buf_min_days>
<work_buf_additional_days>0.0100000</work_buf_additional_days>

While PG on NVIDIA and WUProp (nci) run nicely, the client keeps requesting EDGeS work for CPU and ATI... of course, it can't get work for ATI there, but it gets one more CPU WU on each request. 320 WUs and counting, about 12 hours of work according to BOINC's own estimation... that's 0.5 days, way more than 0.01 days.
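For anyone checking the arithmetic in that complaint, here is a minimal sketch. It takes the two preference values quoted above and the 12-hour figure from the post at face value, and it assumes (as the post does) that the estimate can be compared directly with the buffer size; how the client spreads work over cores is not modelled.

```python
SECONDS_PER_DAY = 86400

# Preference values quoted in the post.
work_buf_min_days = 0.0
work_buf_additional_days = 0.01

# Total buffer the client was asked to keep, in days and in minutes.
requested_days = work_buf_min_days + work_buf_additional_days
requested_minutes = requested_days * SECONDS_PER_DAY / 60  # roughly a quarter of an hour

# Work actually queued, according to the client's own estimate in the post.
queued_hours = 12.0
queued_days = queued_hours / 24

# How many times larger the queue is than the requested buffer.
overshoot = queued_days / requested_days

print(f"requested: {requested_days} days (~{requested_minutes:.1f} minutes)")
print(f"queued:    {queued_days} days")
print(f"overshoot: ~{overshoot:.0f}x")
```

On these numbers the queue is far larger than the buffer that was requested, which is the point the post is making.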

At least I did a backup before the update, so I can switch back to 6.12.34 for now.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 43356 - Posted: 9 Apr 2012, 22:29:20 UTC - in response to Message 43342.  
Last modified: 10 Apr 2012, 9:17:24 UTC

Running away from something like this isn't going to help anyone. So, if you would please be so kind as to:

Make a cc_config.xml file (see the Client_configuration page on the BOINC Trac wiki) and add into it:
<cc_config>
  <log_flags>
    <cpu_sched_debug>1</cpu_sched_debug>
    <work_fetch_debug>1</work_fetch_debug>
  </log_flags>
</cc_config>

Save it in your BOINC data directory.
Let BOINC know about it, by either exiting BOINC and restarting it, or by opening BOINC Manager->Advanced view->Advanced->Read config file.
The output will be in the Event Log. Let it run for at least 5 minutes, then copy the part of the log that contains these debug messages, and stop BOINC.

Then replace the contents of cc_config.xml with the following:
<cc_config>
  <log_flags>
    <rr_simulation>1</rr_simulation>
  </log_flags>
</cc_config>

Run this in the same manner as described above.
After 5 minutes, copy that part of the log, add it to the previous part (like in Notepad), and send this file to David Anderson with a description of what you think is wrong, how it should work and where in the logs it isn't. You can find his address here.
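If you would rather generate the two cc_config.xml variants than type them by hand, the following is a minimal sketch. The helper name `build_cc_config` is made up for this example and is not part of BOINC. It only builds the XML text; saving it as cc_config.xml in your BOINC data directory is left to you, since that location differs per installation.

```python
import xml.etree.ElementTree as ET


def build_cc_config(log_flags):
    """Return cc_config.xml text that switches on the given log flags.

    Hypothetical helper for illustration; the flag names are passed in
    by the caller and are not validated here.
    """
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name in log_flags:
        ET.SubElement(flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# First run: CPU scheduler and work fetch debugging.
first_run = build_cc_config(["cpu_sched_debug", "work_fetch_debug"])
# Second run: round-robin simulation output only.
second_run = build_cc_config(["rr_simulation"])

print(first_run)
print(second_run)
```

The output is not pretty-printed, but it carries the same elements as the hand-written files above.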

As for your question whether or not we tested this version, of course we did. The developers do not chuck untested applications in your lap. However, we can't test every possible combination of hardware + projects that exists, or no BOINC version would ever be released. Also of note, it's very possible that this is a project scheduler problem. It wouldn't be the first.

We're always on the lookout for alpha testers, so if you think you can do better than the rest of us, go to http://boinc.berkeley.edu/trac/wiki/AlphaInstructions and follow the instructions. It is that simple.

Trog Dog
Joined: 6 May 06
Posts: 287
Australia
Message 43371 - Posted: 10 Apr 2012, 11:15:45 UTC - in response to Message 43342.  

320 WUs and counting, about 12 hours of work according to BOINC's own estimation... that's 0.5 days, way more than 0.01 days.


I've been running the 7.0.x series on 4 boxes and I did notice that kind of behaviour on a couple of projects (predominantly newly added ones), but it sorted itself out. I'm not guaranteeing that this will happen in your case, but let it run its course before abandoning it. I can't remember which projects they were, but when it happened I was a bit concerned, particularly as BOINC was going into "high priority" mode and it looked as though the WUs from other projects were going to suffer. As it turned out, I think it's just the way the 7 series estimates runtime.
CIC1=CC=C(C2=N[C@@H](CC(OC(C)(C)C)=O)C3=NN=C(C)N3C4=C2C(C)=C(C)S4)C=C1

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 43374 - Posted: 10 Apr 2012, 12:20:30 UTC

Yes, in general, and for others reading here: do know that the scheduling code has been rewritten from the ground up. It will not fetch work, or schedule which projects to run, the way previous versions did.

So rather than just run it for an hour, run it for 2 weeks and try to learn what it does differently. See how the new work fetch methods are now based on a low water mark and a high water mark. Find how BOINC won't immediately after uploading & reporting work ask for new work, but will only do so once the buffer has dropped below the low water mark.

E.g. if you wanted at least a day's worth of work:
In 6.12 and before, you'd set the "connect to" interval to 0.01 and additional work to 1.0

In 7.0 you set minimum work buffer to 1.0 and max additional work buffer to 0.01
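To make the low/high water mark idea concrete, here is a rough sketch. It is a simplified model based only on the description in this post, not the client's actual code: it assumes the low water mark equals the minimum work buffer, the high water mark equals minimum plus maximum additional buffer, and it ignores resource share, per-device queues and idle devices. The function name `work_request_days` is made up for the example.

```python
def work_request_days(buffered_days, min_days, additional_days):
    """Days of work to ask for under a simple low/high water mark model.

    Assumption for illustration: nothing is requested while the buffer
    is above the low water mark; once it is at or below that mark, the
    request tops the buffer up to the high water mark.
    """
    low_water = min_days
    high_water = min_days + additional_days
    if buffered_days > low_water:
        return 0.0  # still above the low water mark: no request
    return high_water - buffered_days  # top up to the high water mark


# The 7.0 settings from the post: minimum 1.0, maximum additional 0.01.
above_mark = work_request_days(1.2, 1.0, 0.01)  # buffer still full enough
below_mark = work_request_days(0.9, 1.0, 0.01)  # buffer has dropped below 1 day

print(above_mark)
print(round(below_mark, 2))
```

In this model the client stays quiet while it holds more than a day of work and asks for a bit over a tenth of a day once it has slipped to 0.9 days, which matches the "only past the low water mark" behaviour described above.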

RMc-Canada
Joined: 11 Jul 09
Posts: 18
Canada
Message 43462 - Posted: 13 Apr 2012, 9:57:31 UTC

Well, I'm sorry, but I hate the new method.

I'm attached to ten projects, eight are active & two are not, & I'll miss WUs when they are available using this method.

So far, after running 7.0.25 for a few days, it only ever runs three of my eight active projects?! & when it does fetch work, it fetches a bunch of WUs from only three projects.

so if I set it to -

"In 7.0 you set minimum work buffer to 1.0 and max additional work buffer to 0.01"

-it will run like it should?, like it used to?.

Thanks.

Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 43463 - Posted: 13 Apr 2012, 11:00:55 UTC - in response to Message 43462.  

so if I set it to -

"In 7.0 you set minimum work buffer to 1.0 and max additional work buffer to 0.01"

-it will run like it should?, like it used to?

What keeps you from trying it? ;-)

Gruß,
Gundolf

RMc-Canada
Joined: 11 Jul 09
Posts: 18
Canada
Message 43468 - Posted: 13 Apr 2012, 14:23:57 UTC - in response to Message 43463.  

so if I set it to -

"In 7.0 you set minimum work buffer to 1.0 and max additional work buffer to 0.01"

-it will run like it should?, like it used to?

What keeps you from trying it? ;-)

Gruß,
Gundolf


Because I just run it? I don't make it, let alone like to fool with it. That's why I thought I'd ask first & provide some feedback while I was at it, as a longtime user. Is that OK?

RMc-Canada
Joined: 11 Jul 09
Posts: 18
Canada
Message 43469 - Posted: 13 Apr 2012, 15:13:23 UTC - in response to Message 43468.  
Last modified: 13 Apr 2012, 15:15:32 UTC

Well, I set it to those numbers & now it's even worse?!

Now it's running only two of my eight active projects with 13 WUs between them!?! LOL!

That's just great! Thanks a lot...


Going to re-install 6.12.34 I guess.

Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 43470 - Posted: 13 Apr 2012, 16:53:51 UTC - in response to Message 43468.  

Because I just run it? I don't make it, let alone like to fool with it. That's why I thought I'd ask first & provide some feedback while I was at it, as a longtime user. Is that OK?

Sorry, I didn't want to criticise you, I just think that playing with some preferences isn't like fooling with the application.

It's never wrong to ask, but sometimes you get a faster response by just trying it.

And giving feedback is okay in any case. :-)

Gruß,
Gundolf

Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 43471 - Posted: 13 Apr 2012, 16:55:47 UTC - in response to Message 43469.  

Going to re-install 6.12.34 I guess.

And what about trying some other numbers and giving feedback again? ;-)

Perhaps that way you help to discover and remove some bugs.

Gruß,
Gundolf

Deno
Joined: 15 Apr 12
Posts: 5
United States
Message 43527 - Posted: 15 Apr 2012, 21:29:11 UTC

Back in the early 6.x release the Min and Additional work buffer fields were set to ZERO so that projects would only get a single task - not multiple. Then you would have many tasks but only from individual projects. How can we do this in the new 7.0.25 version?

Thanks

Brian Priebe
Joined: 15 Mar 10
Posts: 10
Canada
Message 43534 - Posted: 16 Apr 2012, 5:59:37 UTC - in response to Message 43374.  

Find how BOINC won't immediately after uploading & reporting work ask for new work...
How do the scheduler changes affect the reporting of completed WU's? 7.0.25 seems to keep completed WU's around a lot longer than 6.12.34 did. In the particular case of Milkyway@Home, 6.x always uploaded the results almost immediately (~1min after completion). 7.x lets the results sit there by the multiple dozens for an indeterminate period of time.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 43535 - Posted: 16 Apr 2012, 8:28:49 UTC - in response to Message 43534.  

6.x always uploaded the results almost immediately (~1min after completion). 7.x lets the results sit there by the multiple dozens for an indeterminate period of time.

Yup, plus it will ignore the <report_results_immediately/> switch in cc_config.xml
The change is that it will try to do a work request at the same time as it's reporting work.

Only when a time-of-day change (CPU and Network) is happening in the next 30 minutes, will BOINC 7 report work immediately.


Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 43536 - Posted: 16 Apr 2012, 8:40:59 UTC - in response to Message 43527.  
Last modified: 16 Apr 2012, 8:41:29 UTC

Back in the early 6.x release the Min and Additional work buffer fields were set to ZERO so that projects would only get a single task - not multiple. Then you would have many tasks but only from individual projects. How can we do this in the new 7.0.25 version?

I'd almost say guess... ;-)
But uhm, how about setting both to ZERO again?

The minimum work buffer setting sets the minimum amount of work you're going to request.
The maximum additional work buffer sets the additional days worth of work you want to have.

So if you only want 1 task for each CPU + GPU core, you set 0 + 0, which will fetch 1 second per hardware core.

RMc-Canada
Joined: 11 Jul 09
Posts: 18
Canada
Message 43539 - Posted: 16 Apr 2012, 9:27:32 UTC

All I know is if I have X number of active projects then I want at least one WU running/waiting to run for each of those projects, just like 6.x did so well for years.


I install 7.x & it goes from that to only 4 of my 8 active projects having 4 WU each!?!...


Just because people use BOINC doesn't mean they are computer experts who understand all the jargon etc., or that they are comfortable fooling around with settings. If you know the damn answer, then please just give it.


Re-installed 6.12.34 after getting worthless babble for answers & nothing from ageless, from whom I'd hoped for a simple straight answer. It's working perfectly again now anyway, no thanks to this forum...

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 43542 - Posted: 16 Apr 2012, 9:57:01 UTC - in response to Message 43539.  
Last modified: 16 Apr 2012, 9:59:30 UTC

I have given you a straight and simple answer.

Want to have the difficult answer? As I posted here:
I have the same problem with version 7. Each task would complete, but BOINC Manager would not request new ones. I do not consider myself stupid, but the explanation of the parameters for minimum and maximum work buffers in the network preferences is VERY difficult to understand.

Is normal situation, for all.
The old BOINC would do work requests almost whenever it felt like it, but with BOINC 7.0 it'll wait until you reach the minimum buffer as set through the "Minimum work buffer" value. This does mean that when there's a couple of tasks in the list that have totally wrong estimates as how long they'll take, say some Seti Astropulse with an estimate of 145 hours, but they only run for 6... that BOINC can reach bottom with only as many tasks in cache as you have hardware (CPU + GPU) cores.
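A small sketch of why wrong estimates matter here. It uses the Astropulse figures from the paragraph above (145 hours estimated, 6 hours actual); the 4-core machine, the one-task-per-core queue and the 1-day minimum buffer are made-up values for illustration, and the per-core averaging is a simplification rather than the client's real accounting.

```python
def buffered_days_per_core(task_hours, cores):
    """Average days of queued work per core (simplified model)."""
    return sum(task_hours) / cores / 24


cores = 4                 # hypothetical machine
min_buffer_days = 1.0     # hypothetical "Minimum work buffer" setting

estimated = buffered_days_per_core([145.0] * cores, cores)  # what the client believes
actual = buffered_days_per_core([6.0] * cores, cores)       # what the tasks really take

# Going by the estimates, the buffer looks comfortably above the minimum,
# so no work is requested, although the real amount of work is well below it.
looks_full = estimated > min_buffer_days
really_full = actual > min_buffer_days

print(f"estimated buffer: {estimated:.2f} days, actual: {actual:.2f} days")
print(f"client sees enough work: {looks_full}, really enough: {really_full}")
```

With these numbers the client believes it holds about six days of work per core while the tasks will really be done in a quarter of a day, so the cache can run down to one task per core before a new request goes out.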

But anyway, the old BOINC did a "connect to" + additional work. The "connect to" was the interval with which BOINC would contact projects, while the additional work value was what extra cache you wanted. This didn't work too well, for a lot of reasons.

With BOINC 7.0 this has been changed to a Minimum work buffer + Maximum additional work buffer. Which do exactly what they say on the tin... at least when your BOINC does one project.

There's a bug in the software that sometimes rears its ugly head when more than 2 projects are attached and allowed to fetch work. As I understand it, it's a difficult bug to track down, and there's even the thing that when you leave BOINC well alone, eventually, after a week or so, it'll fix itself.

But you have to leave BOINC alone anyway, since this version of BOINC uses a totally new CPU and GPU scheduler plus work fetch module, rewritten from the ground up. These cannot be compared to BOINC 6's versions: they use their own values in the code and the program, which is one reason why you can't easily return to 6.xx once you have moved over to 7.0, and the main reason why this BOINC reacts differently from previous versions.

BOINC 7 will have to learn all over again about the project's applications, their different run times, how different the estimates are from reality etc. This will take a week or more. Depends on how much your machine is on and when BOINC is allowed to do work on CPU and if applicable GPU.


I didn't program BOINC, yet when I tried to put an explanation on how this new BOINC works in the User Manual Wiki, I was called back by the developers who found this information too strenuous for the poor souls of the simple BOINC user, it would scare them away. And as such, hey presto, there's no explanation anywhere why BOINC 7.0 works the way that it does, other than in quite technical language.

Still want to lay blame in all the wrong places?

Deno
Joined: 15 Apr 12
Posts: 5
United States
Message 43548 - Posted: 16 Apr 2012, 20:19:04 UTC - in response to Message 43536.  

Back in the early 6.x release the Min and Additional work buffer fields were set to ZERO so that projects would only get a single task - not multiple. Then you would have many tasks but only from individual projects. How can we do this in the new 7.0.25 version?

I'd almost say guess... ;-)
But uhm, how about setting both to ZERO again?

The minimum work buffer setting sets the minimum amount of work you're going to request.
The maximum additional work buffer sets the additional days worth of work you want to have.

So if you only want 1 task for each CPU + GPU core, you set 0 + 0, which will fetch 1 second per hardware core.



But having ZERO in both settings in 6.x allowed all my projects that had a task to send to get only a single task downloaded per project, so I ultimately had many tasks waiting but there was only one per project. Now the ZERO settings just get one task at a time, and if I up them I get several tasks for the same project. How can I get only one task per project in the 7.x release? This used to work…

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 43549 - Posted: 16 Apr 2012, 20:35:33 UTC - in response to Message 43548.  

By allowing BOINC to run for several days with the zero and zero days setting. BOINC 7 has a new work fetch module and new separated CPU and GPU schedulers. It will have to learn anew about how you run things. It can't do that without starting with one task per hardware core per project.

It won't do it immediately, there is no way to force it. But your BOINC 6.10 or 6.12 didn't do this immediately either, it had to learn this as well.

So if you're just willing to let go of things, not look in BOINC Manager every moment of the day what it is doing, you will eventually see that it will do things the old way. Just one task per project per core, depending on resource share and REC (the new debt).

Deno
Joined: 15 Apr 12
Posts: 5
United States
Message 43550 - Posted: 16 Apr 2012, 20:44:38 UTC - in response to Message 43549.  

By allowing BOINC to run for several days with the zero and zero days setting. BOINC 7 has a new work fetch module and new separated CPU and GPU schedulers. It will have to learn anew about how you run things. It can't do that without starting with one task per hardware core per project.

It won't do it immediately, there is no way to force it. But your BOINC 6.10 or 6.12 didn't do this immediately either, it had to learn this as well.

So if you're just willing to let go of things, not look in BOINC Manager every moment of the day what it is doing, you will eventually see that it will do things the old way. Just one task per project per core, depending on resource share and REC (the new debt).


Great!!! I guess as in most things: "Patience is the Key"
Thanks for the quick response – I will just leave it alone...
Keep up the good work

Brian Priebe
Joined: 15 Mar 10
Posts: 10
Canada
Message 43579 - Posted: 18 Apr 2012, 13:01:43 UTC - in response to Message 43535.  
Last modified: 18 Apr 2012, 13:02:14 UTC

The change is that it will try to do a work request at the same time as it's reporting work.
But when does it report results when it is NOT doing a work request? Does it wait until the available WUs amount to less than the "Minimum work buffer"? If so, setting "Max. Additional work buffer" to a number of days could presumably delay transmitting results for a very long time.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.