massive work fetch bug in 7.0.25

Ageless-Away
Volunteer moderator
Project administrator
Joined: 29 Aug 05
Posts: 13293
Netherlands
Message 43581 - Posted: 18 Apr 2012, 14:01:55 UTC - in response to Message 43579.  

No, the "24 hours after (one) task completion" rule still stands. So the minimum is 24 hours, or a work request, whichever comes first.

These are all the occasions on which the client reports:
1) 24 hours before deadline.
2) Connect Every X before deadline.
3) 24 hours after task completion.
4) Immediately if the upload completes later than either 1, 2, or 3 upon completion of the task.
5) On a trickle up message.
6) On a trickle down request.
7) On a server scheduled connection. Used, but I am not certain by which project.
8) On a request for new work.
9) When the user pushes the update button.
10) On a request from an account manager.
11) Report immediately every task, if "No new Task" is set.
12) Report immediately if CPU or network time-of-day override is within the next 30 minutes.
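
A few of the triggers in the list above can be sketched in code. This is only an illustration of rules 1, 2, 3, 8 and 9 as described in this post, not the client's actual logic. The `Task` class, the `should_report` function and all parameter names are made up for the example, and rule 2 is read here as "once the deadline is less than the connect interval away", which is an assumption.

```python
from dataclasses import dataclass

DAY = 24 * 3600  # seconds in a day


@dataclass
class Task:
    completed_at: float  # time the task finished, in seconds (hypothetical field)
    deadline: float      # report deadline, in seconds (hypothetical field)


def should_report(task, now, connect_interval,
                  requesting_work=False, user_update=False):
    """Return True if a completed task would be reported at time `now`."""
    if requesting_work or user_update:           # rules 8 and 9
        return True
    if now >= task.deadline - DAY:               # rule 1: 24 hours before deadline
        return True
    if now >= task.deadline - connect_interval:  # rule 2 (assumed reading)
        return True
    if now >= task.completed_at + DAY:           # rule 3: 24 hours after completion
        return True
    return False


task = Task(completed_at=0.0, deadline=10 * DAY)
print(should_report(task, now=3600, connect_interval=0))  # one hour after completion
print(should_report(task, now=DAY, connect_interval=0))   # 24 hours after completion
```

Rule 4 falls out of the same checks: if the upload only finishes after one of the time limits has already passed, the first call made afterwards returns True.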
Jord
Please do not private message me for tech support. Use the forums for that. Tech PMs will be ignored!
ID: 43581
Trog Dog
Joined: 6 May 06
Posts: 287
Australia
Message 43582 - Posted: 18 Apr 2012, 14:15:02 UTC - in response to Message 43579.  

The change is that it will try to do a work request at the same time as it's reporting work.
But when does it report results when it is NOT doing a work request? Does it wait until WU's available is less than "Minimum work buffer"? If so, setting "Max. Additional work buffer" to a number of days could presumably delay transmitting results for a very long time.


IME, it will still report tasks when it does its regular checkin with the project to meet expiring deadlines. eg

Sempron

1365	eon2	18/04/2012 9:34:36 PM	Sending scheduler request: Requested by project.	
1366	eon2	18/04/2012 9:34:36 PM	Reporting 8 completed tasks, not requesting new tasks	
1367	eon2	18/04/2012 9:34:39 PM	Scheduler request completed	


And (at least using boinctasks) it's possible to manually report tasks by updating a project or using the report all tasks menu item. I use this when I know I'm going to have downtime or when I see I have wu's that have errored out and I want to see the error codes and reports. - Reporting completed wu's will not cause boinc to request more work.
CIC1=CC=C(C2=N[C@@H](CC(OC(C)(C)C)=O)C3=NN=C(C)N3C4=C2C(C)=C(C)S4)C=C1
ID: 43582
squeak
Joined: 14 Jun 11
Posts: 15
Australia
Message 43594 - Posted: 19 Apr 2012, 2:25:23 UTC

I too have been frustrated by 7.0.25. Once I finally managed to get it installed, after having to locate the BOINC.msi file myself because the BOINC installer lost the reference to the folder (very sloppy), I found that things were different. It immediately decided that Rosetta had to run in high priority mode. For some time I have been running with the settings "Maintain enough tasks to keep busy for at least 3 days" and "... and up to an additional 4 days", mainly to be able to ride through the periods, which all projects seem to have, when there are no WUs available or the project is shut down for maintenance. At the time I put in 7.0.25, I had Rosetta WUs amounting to about 8 hours, with deadlines about a week away, and my resource shares meant that Rosetta had a target resource share of 36%, so 8 hours of work in those conditions didn't seem to justify BOINC going into high priority mode. Anyway, I watched with interest. BOINC then went through my other projects one at a time, clearing out all WUs and not asking for any more, until only CPDN was left. My CPDN WUs won't finish for weeks, and have deadlines well into 2013. I have tried updating the projects and resetting the projects, but to no avail.

Now I see the comment from "ageless" that BOINC 7 "will not go and fetch work, or schedule which projects to run, as previous versions did". I am staggered, as I thought that this was the whole raison d'etre for BOINC. I then see comments about "In 6.12 and before, you'd set connect to interval to x.xx and additional work to x.xx" but "In 7.0 you set minimum work buffer to 1.0 and max additional work buffer to 0.01". Now the "connect to..." and "additional work" settings are part of the user interface, and indeed I have been setting them as described above. However, there are no parts of the user interface relating to work buffers.
I see "ageless" also referring to the cc_config.xml file. I did a search across my hard drive and it eventually found such a file buried in part of my /Documents and Settings tree. The file hadn't been touched for 2 years, and had no entries related to work buffer settings.

I am not a novice, having been a developer/designer/architect for 40 years. Mind you I have always figured that developers put settings into a user interface because the intention was that these were the things that users should play with. Internal config files in obscure directories were put there precisely because they contained stuff that users should NOT be playing with. Or is the dominant logic here that in order to make BOINC work reliably you needed to know as much as the BOINC developers?

The release notes for 7.0.25 say ...
"The new scheduler observes the resource share setting better than the old scheduler.

Another change is the client will no longer attempt to get work right after completing a job. Instead it will wait until it drops below a threshold and then start asking around for work. You can change both the lower threshold and upper threshold by changing these preference settings:

'Maintain enough tasks to keep busy for at least' (lower threshold) and
'... and up to an additional' (upper threshold)"

Now this contradicts ageless, as it suggests that the user interface settings are the way to control things. I have those thresholds set but BOINC is not honouring them. Mind you the release notes do not clarify if the amount of work relates to an overall figure adding up all projects, or is on a per project basis. If it is an overall thing, then one project like CPDN will always have more than enough work, so that nothing else gets a look in. On the other hand, if it is supposed to apply separately to each project (as clearly most respondents expect), then it's certainly not actually working that way.

Also, the idea that BOINC won't necessarily report completed WUs until something else happens seems a little strange. Some projects like to know when WUs are done, so they can cross them off. If there are a bunch of BOINCs out there hiding their finished WUs until a convenient moment, then life in the projects can slow down considerably. Have the various projects signed off on this slowdown of results?

Lastly, a (possibly) unrelated issue. After adding the World Community Grid to my list of projects, I discovered that WCG is a pretty arrogant project. It retitled my BOINC, and despite doing a full uninstall and reinstall, BOINC still always comes up as "World Community Grid - BOINC". What I don't know is whether there are other little trojans left behind by WCG which may be stuffing up my BOINC environment. Anyone know how I go back to "vanilla" BOINC?

I'll climb off my soapbox now. :)

squeak
ID: 43594
squeak
Joined: 14 Jun 11
Posts: 15
Australia
Message 43617 - Posted: 20 Apr 2012, 2:54:08 UTC

I've heard the suggestion that I was attacking ageless personally in my last post. So, firstly, let me apologise for causing offence. It was not intended as a personal attack on ageless or anyone else, but I was certainly concerned by some of the implications of the advice being provided.

The intent of the post was to point out what I believe were significant philosophical issues about user interface design and about the level of advice given to users as against developers or testers. Having managed teams of developers in the past, I understand well the temptation to resolve questions by tweaking internal settings or adjusting things which are not typically exposed to general users. However while these approaches are OK for developers and testers trying to identify the reasons for a tool's behaviour, they should not be mechanisms for users to control the behaviour of the tool. Their toolbox should be restricted to the user interface provided, which perhaps may need extension, but should not be bypassed.

Anyway, apologies again. I would like to see some response to the technical issues I've raised.

I'd also like to see my BOINC 7.0.25 ask my various projects for some work other than CPDN, which it is steadfastly refusing to do.
squeak
ID: 43617
Ageless-Away
Volunteer moderator
Project administrator
Joined: 29 Aug 05
Posts: 13293
Netherlands
Message 43639 - Posted: 20 Apr 2012, 21:27:10 UTC
Last modified: 20 Apr 2012, 23:07:32 UTC

You don't ask why we do certain things the way we do them.

The developers have opted for hiding the majority of messages from view, since these were found to be threatening, worrisome, or perceived as all errors by people with no computer knowledge. However, if these people then come onto these forums, they --like you-- will say that there's a bug in BOINC.

We then ask to enable some of the debug flags that enable a lot more messages, and post a log about that. Through that more advanced log, those of us who can read that information overload, can see if there is a bug and forward that information plus (part of) that log to the developers.

If you keep on posting the way that you do, which I see as quite forcedly over-aggressive, I doubt there'll be anyone left who is interested in what you have to say or in what you are trying to get help on. I sure ain't. I don't like your style.
Jord
Please do not private message me for tech support. Use the forums for that. Tech PMs will be ignored!
ID: 43639
David Anderson
Volunteer moderator
Project administrator
Project developer
Joined: 10 Sep 05
Posts: 655
Message 43641 - Posted: 20 Apr 2012, 23:25:48 UTC - in response to Message 43536.  


The minimum work buffer setting sets the minimum amount of work you're going to request.
The maximum additional work buffer sets the additional days worth of work you want to have.

Not quite. The meaning of the prefs is:

- The client requests work for a given resource when the amount of buffered work falls below min.
- It requests (from the highest-priority project) enough work to bring the amount up to min + additional.
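
The two-threshold rule above can be sketched as follows. This is a minimal illustration of the rule as worded in this post, not the client's code: `work_request_seconds`, `buffered`, `min_buf` and `additional` are hypothetical names, everything is in seconds, and the per-resource and per-project details are ignored.

```python
DAY = 86400.0  # seconds in a day


def work_request_seconds(buffered, min_buf, additional):
    """Return how many seconds of work to ask for (0.0 means no request)."""
    if buffered >= min_buf:
        # Buffer has not dropped below the lower threshold: do not ask.
        return 0.0
    # Ask for enough to bring the buffer up to min + additional.
    return (min_buf + additional) - buffered


# Example with "at least 3 days" and "an additional 4 days".
print(work_request_seconds(2.0 * DAY, 3 * DAY, 4 * DAY) / DAY)  # buffer below min
print(work_request_seconds(3.5 * DAY, 3 * DAY, 4 * DAY) / DAY)  # buffer above min
```

With these settings a buffer of 2 days triggers a request for 5 days of work, while a buffer of 3.5 days triggers nothing until it drains below 3 days.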


ID: 43641
squeak
Joined: 14 Jun 11
Posts: 15
Australia
Message 43665 - Posted: 22 Apr 2012, 7:23:11 UTC - in response to Message 43639.  

Thank you, ageless, for the response. It goes some of the way to clarifying things for me. Nonetheless, I wasn't quibbling about the use of debug flags; I totally agree that they should be nowhere near a user interface. I was only concerned about the suggestions to put stuff in cc_config.xml which were relevant at the user interface. Now maybe (bearing in mind the nature of this particular message board) the people receiving this advice were developers and testers, and so were accustomed to dealing at that level. If so, mea culpa.

Anyway, since you mentioned debug flags, and since you had suggested to someone that they put the flags
<log_flags>
<cpu_sched_debug>1</cpu_sched_debug>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
into cc_config.xml, I thought I'd try it and gather up the output.
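
For anyone who prefers to generate the file rather than edit it by hand, here is a small sketch that builds the same `<log_flags>` fragment wrapped in a `<cc_config>` root. The function name `build_cc_config` is made up for this example, and the output is only printed; it would still have to be saved as cc_config.xml in the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a cc_config document that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text is 1 (enabled).
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


xml_text = build_cc_config(["cpu_sched_debug", "work_fetch_debug"])
print(xml_text)
```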

Well, after 4 or 5 days of BOINC steadfastly refusing to fetch any work, putting in those debug flags caused it immediately to request work from all my other projects EXCEPT the one with the highest resource share setting, which in my case is rosetta. It generated a lot of output, which I'm happy to send in if anyone is interested. I'm not sure whether an inline post is appropriate; what is the best way of sending a file?
squeak
ID: 43665
AmigaForever
Joined: 14 Jun 11
Posts: 46
Germany
Message 43675 - Posted: 22 Apr 2012, 18:10:06 UTC - in response to Message 43641.  


The minimum work buffer setting sets the minimum amount of work you're going to request.
The maximum additional work buffer sets the additional days worth of work you want to have.

Not quite. The meaning of the prefs is:

- The client requests work for a given resource when the amount of buffered work falls below min.
- It requests (from the highest-priority project) enough work to bring the amount up to min + additional.



Thank you David for clearing things up.

BTW, is there a resource which explains all of the BOINC options as well as you did with these two? That would be very helpful.

Thanks!
ID: 43675
Peter
Joined: 7 Sep 09
Posts: 167
Canada
Message 43835 - Posted: 28 Apr 2012, 15:22:42 UTC

How low does that 'low water mark' have to go? I'm attached to 16 projects and although I realise there could be no work available at any of them, BOINC hasn't even tried to poll in 48 hours with NO WU's currently stored at all.
ID: 43835
Ageless-Away
Volunteer moderator
Project administrator
Joined: 29 Aug 05
Posts: 13293
Netherlands
Message 43837 - Posted: 28 Apr 2012, 16:27:31 UTC - in response to Message 43835.  

How low does that 'low water mark' have to go? I'm attached to 16 projects and although I realise there could be no work available at any of them, BOINC hasn't even tried to poll in 48 hours with NO WU's currently stored at all.


When reporting purported bugs in the program
You can tell us a whole story of how you feel there's a bug in the program, that things aren't working and such, but for all we know it's something you did yourself. Without a debug log we cannot help you and your post will go ignored.
The debug messages will write into the Event Log.

The main debug flags that we're normally interested in are:
: problems involving the choice of applications to run.
: problems involving work fetch (which projects are asked for work, and how much).
: problems involving jobs being run in high-priority mode.
: problems involving scheduler operations and other low level information.

Use these flags from the cc_config.xml file.

When posting a log, after you added the flags to the configuration file, restart BOINC, let it run for up to 5 minutes, then post the WHOLE log. Not just 2 or 4 lines.

When you do post what you think is a bug, post it in a thread of your own in the Q&P forum. Please do not use an existing thread. Please do not piggy-back on to another person's thread with logs. That gets very confusing if we're trying to get the developers to chime in.

There's space enough for your own thread, these forums don't use a quota, so there's really no excuse for you to feel the need to add to an existing thread.

With thanks for your help and understanding.


Jord
Please do not private message me for tech support. Use the forums for that. Tech PMs will be ignored!
ID: 43837
Peter
Joined: 7 Sep 09
Posts: 167
Canada
Message 43840 - Posted: 28 Apr 2012, 17:07:59 UTC

I did post my own thread in that forum. I was asking a simple question... I'll rephrase it: is it normal for BOINC to be empty of all work for 2 days and not once poll for new work?

It worked normally until the recent update and I have altered nothing from the default settings.

This is all that's in that config file:

<cc_config>
<log_flags />
<options>
<ignore_cuda_dev>0</ignore_cuda_dev>
</options>
</cc_config>
ID: 43840
Peter
Joined: 7 Sep 09
Posts: 167
Canada
Message 43846 - Posted: 28 Apr 2012, 20:11:22 UTC - in response to Message 43840.  
Last modified: 28 Apr 2012, 20:12:47 UTC

Problem solved by installing 7.0.26.

See here.
ID: 43846
squeak
Joined: 14 Jun 11
Posts: 15
Australia
Message 43923 - Posted: 2 May 2012, 14:24:45 UTC

Hi. My 7.0.25 has been in for a while now. I previously managed to kick it into having work for all my projects, which worked fine for a while, but now all of the projects except CPDN have run down to zero WUs, and this has been the case for a couple of days. I've just been leaving it alone, on the theory that it'll sort itself out. It doesn't seem to be doing that.

Here is my cc_config.xml
------snip----------------
<cc_config>
<options>
<client_version_check_url>http://www.worldcommunitygrid.org/download.php?xml=1</client_version_check_url>
<client_download_url>http://www.worldcommunitygrid.org/download.php</client_download_url>
<network_test_url>http://www.ibm.com/</network_test_url>
<start_delay>120</start_delay>
</options>
<log_flags>
<cpu_sched_debug>1</cpu_sched_debug>
<work_fetch_debug>1</work_fetch_debug>
</log_flags>
</cc_config>
-------unsnip-----------
I have no idea why the worldcommunitygrid lines are in there (maybe someone can explain), but it might be the reason why my BOINC always has a WCG personality.

My projects settings are as follows:

project    avg_work_done  resource_share
rosetta    71.85          36%
seti@home  81.09          30%
CPDN       98.79          18%
WCG        49.26          16%

Here is a piece of my log.

------snip-------------
3/05/2012 00:21:32 | | [cpu_sched_debug] Request CPU reschedule: periodic CPU scheduling
3/05/2012 00:21:32 | | [cpu_sched_debug] schedule_cpus(): start
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] scheduling hadam3p_pnw_8l0f_2000_1_007823766_0 (CPU job, priority order) (prio -1.000000)
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] scheduling hadam3p_pnw_yzms_1963_1_006910844_1 (CPU job, priority order) (prio -1.019052)
3/05/2012 00:21:32 | | [cpu_sched_debug] enforce_schedule(): start
3/05/2012 00:21:32 | | [cpu_sched_debug] preliminary job list:
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] 0: hadam3p_pnw_8l0f_2000_1_007823766_0 (MD: no; UTS: no)
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] 1: hadam3p_pnw_yzms_1963_1_006910844_1 (MD: no; UTS: no)
3/05/2012 00:21:32 | | [cpu_sched_debug] final job list:
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] 0: hadam3p_pnw_8l0f_2000_1_007823766_0 (MD: no; UTS: no)
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] 1: hadam3p_pnw_yzms_1963_1_006910844_1 (MD: no; UTS: no)
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] scheduling hadam3p_pnw_8l0f_2000_1_007823766_0
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] scheduling hadam3p_pnw_yzms_1963_1_006910844_1
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] hadam3p_pnw_8l0f_2000_1_007823766_0 sched state 2 next 2 task state 1
3/05/2012 00:21:32 | climateprediction.net | [cpu_sched_debug] hadam3p_pnw_yzms_1963_1_006910844_1 sched state 2 next 2 task state 1
3/05/2012 00:21:32 | | [cpu_sched_debug] enforce_schedule: end
3/05/2012 00:21:35 | | [work_fetch] work fetch start
3/05/2012 00:21:35 | | [work_fetch] ------- start work fetch state -------
3/05/2012 00:21:35 | | [work_fetch] target work buffer: 259200.00 + 345600.00 sec
3/05/2012 00:21:35 | rosetta@home | [work_fetch] REC 70.998 priority -0.666025
3/05/2012 00:21:35 | climateprediction.net | [work_fetch] REC 89.165 priority -2.270503
3/05/2012 00:21:35 | SETI@home | [work_fetch] REC 67.264 priority -0.757205
3/05/2012 00:21:35 | World Community Grid | [work_fetch] REC 68.682 priority -1.449677
3/05/2012 00:21:35 | | [work_fetch] CPU: shortfall 236574.72 nidle 0.00 saturated 368225.28 busy 0.00
3/05/2012 00:21:35 | rosetta@home | [work_fetch] CPU: fetch share 0.360 rsc backoff (dt 0.00, inc 0.00)
3/05/2012 00:21:35 | climateprediction.net | [work_fetch] CPU: fetch share 0.180 rsc backoff (dt 0.00, inc 0.00)
3/05/2012 00:21:35 | SETI@home | [work_fetch] CPU: fetch share 0.300 rsc backoff (dt 0.00, inc 0.00)
3/05/2012 00:21:35 | World Community Grid | [work_fetch] CPU: fetch share 0.160 rsc backoff (dt 0.00, inc 0.00)
3/05/2012 00:21:35 | | [work_fetch] ------- end work fetch state -------
3/05/2012 00:21:35 | | [work_fetch] No project chosen for work fetch
-------unsnip---------

BOINC is showing no inclination to fetch any WUs, and has maintained this stance for some days now. Is this expected behaviour?
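
The numbers in the work-fetch lines of the log above can be converted to days and checked against the preferences. The sketch below hard-codes four values copied from that log. Reading `saturated` as "seconds of CPU work already buffered" and comparing it with the lower threshold is an assumption based on the rule David Anderson gave earlier in this thread, not a statement of how the client actually decides.

```python
DAY = 86400.0  # seconds in a day

# Values copied from the log above.
min_buf, additional = 259200.00, 345600.00   # "target work buffer" line
shortfall, saturated = 236574.72, 368225.28  # "CPU: shortfall ... saturated ..." line

print(f"lower threshold: {min_buf / DAY:.2f} days")
print(f"additional:      {additional / DAY:.2f} days")
print(f"saturated:       {saturated / DAY:.2f} days")
print(f"shortfall:       {shortfall / DAY:.2f} days")

# Assumed reading: a request is only triggered once buffered work < lower threshold.
below_min = saturated < min_buf
print("buffer below lower threshold:", below_min)
```

Under that reading, roughly 4.26 days of CPU work are buffered against a lower threshold of 3 days, so no request would be triggered, which would be consistent with the "No project chosen for work fetch" line. The shortfall and saturated figures also add up to the full 7-day target (3 + 4 days).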
squeak
ID: 43923
Peter
Joined: 7 Sep 09
Posts: 167
Canada
Message 43925 - Posted: 2 May 2012, 14:35:41 UTC

Ignore my previous post saying 7.0.26 solved things...it didn't. I've had to turn off GPU work in the project settings for now as having it incoming causes issues with my display unless I use the cc_config file and that stops the work fetch.

Not that fetch seems to be working anyway. I have a few Collatz WU's and Boinc isn't even looking for any other projects' work.
ID: 43925
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 43965 - Posted: 4 May 2012, 17:10:56 UTC - in response to Message 43641.  
Last modified: 4 May 2012, 17:13:22 UTC

The minimum work buffer setting sets the minimum amount of work you're going to request.
The maximum additional work buffer sets the additional days worth of work you want to have.

Not quite. The meaning of the prefs is:
- The client requests work for a given resource when the amount of buffered work falls below min.
- It requests (from the highest-priority project) enough work to bring the amount up to min + additional.

Nice, thx for this. Now I (guess I) understand correctly what it does:
it is not: min < work to do < max
but: min < work to do < min + max (additional)

So may I suggest that the international versions of the client be improved to better reflect this. In the French client, these settings have been translated as "minimum reserve work" and "maximum reserve work".
Following these wrong translations, I initially set 1 day for min and 5 days for max, but it should be 5 for min and 0,1 for additional. Is that right?
If yes, the correct labels should be: "minimum reserve work" and "additional reserve work".

Likewise, there may be a mistranslation in Bam prefs' settings, because I cannot find these new ones there. I guess that Bam prefs' labels haven't been updated to follow the new 7.x version. Anyway, should I put my 5 and 0,1 values into:
"Connect to network about every" 5 days
"Maintain enough work for an additional" 0,1 day
Correct?
Regards
ID: 43965
Jim1348
Joined: 8 Nov 10
Posts: 191
United States
Message 43969 - Posted: 4 May 2012, 21:14:51 UTC - in response to Message 43641.  


The minimum work buffer setting sets the minimum amount of work you're going to request.
The maximum additional work buffer sets the additional days worth of work you want to have.

Not quite. The meaning of the prefs is:

- The client requests work for a given resource when the amount of buffered work falls below min.
- It requests (from the highest-priority project) enough work to bring the amount up to min + additional.


Like most people here, until you explained it I would not have guessed the meaning. Why not just change it to "Min" and "Max" in the usual sense? I think that would forestall a lot of misunderstanding right at the outset, and save a lot of explaining later.
ID: 43969
Trog Dog
Joined: 6 May 06
Posts: 287
Australia
Message 43975 - Posted: 4 May 2012, 22:07:23 UTC - in response to Message 43965.  


Likewise, there may be a mistranslation in Bam prefs' settings, because I cannot find these new ones there. I guess that Bam prefs' labels haven't been updated to follow the new 7.x version. Anyway, should I put my 5 and 0,1 values into:
"Connect to network about every" 5 days
"Maintain enough work for an additional" 0,1 day
Correct?
Regards


Unfortunately issues with BAM need to be raised with Willy over at the BAM/Boincstats website - I do know that he has been working on a new version of the website for about 18 months and the new site is expected to go live around the end of this month.
CIC1=CC=C(C2=N[C@@H](CC(OC(C)(C)C)=O)C3=NN=C(C)N3C4=C2C(C)=C(C)S4)C=C1
ID: 43975
rvp_lan
Joined: 30 Dec 08
Posts: 24
France
Message 44075 - Posted: 10 May 2012, 2:31:08 UTC - in response to Message 43975.  

I do know that he has been working on a new version of the website for about 18 months and the new site is expected to go live around the end of this month
Thx for this info. So I'll wait for further updates of Bam.

ID: 44075

Copyright © 2019 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.