Resource "share" not working as expected when set to zero

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 91628 - Posted: 25 May 2019, 20:24:18 UTC
Last modified: 25 May 2019, 20:42:49 UTC

I was under the impression that if the resource share on project "A" was 100% and on project "B" was 0, then project B would only get work when A had none, and would be limited to a single task at a time.

Well, maybe that works if there are only two projects, but I got a boatload of Milkyway work units when its resource share was 0% and SETI, at 100%, was out of data. I had also set Einstein to 0% as I wanted it as a backup in addition to Milkyway. This system has only 2 threads and Milkyway requires a full CPU, unlike some of my other systems, which run easily on 8-9% CPU utilization per job. SETI came back with more jobs, but in the aftermath I have 125 Milkyway work units that cannot possibly finish, as I cannot run more than 2 of the 5 RX560s concurrently. I did notice the deadline is a couple of weeks away, but that will just extend the problem for another two weeks before they start executing at high priority while 3 of the GPUs remain idle.

I looked at the implementation plan here:
https://scienceunited.org/doc/implementation.pdf

I did not see anything that indicated ZERO is a special number that allows only 1 task.

Maybe the problem is with the project handing out a couple of hundred tasks when only 1 was asked for?

If this is a bug then I can submit it; otherwise I assume it is a feature request. I went to GitHub and poked around, but all I could find was that PDF on implementation.

I set share to ZERO by going to my account manager, BAM!, and changing from 0 to 1 and back to 0. I think that is a bug in BAM!, but in any event when I run my local manager I see 0% share for those two backup projects, so it got set to 0 on my PC. Maybe Milkyway does not use that??? This was a new system and I was unaware Milkyway needs 95% or so of a CPU for each GPU until it "happened".
ID: 91628
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 91633 - Posted: 26 May 2019, 8:56:45 UTC

I have an idea what happened there and if I'm right it's a bug in the client. You could verify it if you still have the logs.

An (undocumented?) feature of the BOINC client is that it will reset the resource shares to 100 if all projects are set to zero. Besides being useless at best - some thoughts on that later - it looks like there also is an error in the implementation. Sometimes the client detects all zeroes when it's not correct, sometimes it doesn't reset the shares when it says it does. In one case I observed that a resource share was incorrectly set from 0 to 100 at client start. The client then immediately fetched a sh*tload of unwanted work and in the process picked up the zero RS from the project again. I would never have known what happened if I hadn't been looking just that moment.

I guess that feature dates from the time when zero was not special - it is now, as you noticed. Back then zero meant zero and I'm not even sure it was a valid setting. With all RS being zero the client wouldn't have fetched any tasks, and it could have been reasonable to automatically reset the RS to a positive value. Now that zero is a valid and usually intentional setting, the client must not mess with it. If something needs adjustment, just tell me - preferably with a reason - and I may do it. I vote for the auto-adjustment feature to be removed: any changes it makes are against my explicitly stated wish and are volatile anyway, as seen above.
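
For readers following along, here is a minimal sketch of the rule as described in this post: if every attached project has a resource share of zero, all shares are reset to 100. This is not the BOINC client's code. The `Project` class, the `reset_if_all_zero` function and the `default_share` parameter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Project:
    # Hypothetical stand-in for a project entry; not a BOINC data structure.
    name: str
    resource_share: float


def reset_if_all_zero(projects: List[Project], default_share: float = 100.0) -> bool:
    """Reset every share to default_share when all shares are zero.

    Returns True if a reset happened, False otherwise.
    """
    if projects and all(p.resource_share == 0 for p in projects):
        for p in projects:
            p.resource_share = default_share
        return True
    return False


# One project with a non-zero share: the backup projects keep their zero.
mixed = [Project("SETI", 100), Project("Milkyway", 0), Project("Einstein", 0)]
print(reset_if_all_zero(mixed), [p.resource_share for p in mixed])

# All zero: every share is overwritten, including intentional backups.
all_zero = [Project("Milkyway", 0), Project("Einstein", 0)]
print(reset_if_all_zero(all_zero), [p.resource_share for p in all_zero])
```

If the "all zero" test ever fires when one share is really non-zero, a rule of this shape would turn backup projects into full-share projects, which would match the flood of unwanted work described above.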
ID: 91633
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5080
United Kingdom
Message 91635 - Posted: 26 May 2019, 9:50:36 UTC

A corollary to that: in theory, Resource Share can only be set on a project preferences website or via an Account Manager (BeemerBiker mentions BAM!, but other AMs are available). RS can't be set via BOINC Manager or boinccmd - the only local control would be via editing the raw data files.

Not every project updates their server code in sync with BOINC releases - many of them leave it untouched for months or years. For a long time, there were reports that specific projects wouldn't allow the special 0 value to be entered via their websites.

What happens if a user specifies 0 via an AM, for a project which can't handle it? BOINC global preferences - including RS - are supposed to be propagated from project to project during client contacts: would a non-updated project substitute unwanted values during that process? Also, Einstein@Home discovered that for many years a bug in the server code had been dropping elements from the final venue settings during propagation: that also disrupted things.

I'll have a look through the client code to see if I can find that "reset if all zero" bug that floyd described: we can lose that now that zero has been ascribed an active meaning. But for the time being (until I find the smoking gun in the code), I'm keeping an open mind whether it's in the client, an outdated server, or the AM route.
ID: 91635
floyd
Help desk expert

Joined: 23 Apr 12
Posts: 77
Message 91636 - Posted: 26 May 2019, 14:48:43 UTC - in response to Message 91635.  

For a long time, there were reports that specific projects wouldn't allow the special 0 value to be entered via their websites.
Rosetta changed 0 to 100 until a few months ago.

What happens if a user specifies 0 via an AM, for a project which can't handle it?
I don't know enough about AMs to be sure but I'd expect something like above to happen. It would be up to the AM or the BOINC client to cope with that. But in this case the suspect is Milkyway, their server code understands 0.

BOINC global preferences - including RS - are supposed to be propagated from project to project during client contacts: would a non-updated project substitute unwanted values during that process?
RS is not a global preference, it is project specific by its nature and as far as I know it's not transmitted to other projects. But forgive me a slightly related question, where do projects get the lists of other projects users participate in?

I'll have a look through the client code to see if I can find that "reset if all zero" bug that floyd described
I hope you find something. I sure looked after the incident I described and I looked again today, still don't see how it is possible but it happened. Unfortunately I can't remember details but they must be important.

we can lose that now that zero has been ascribed an active meaning
Yes when it is only about changing a meaningless value. But on a second thought, maybe parts of the code rely on the sum of shares being non-zero, for example when calculating fractions.
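
To illustrate the concern about fractions, here is a small sketch, assuming the client divides each share by the sum of all shares. That assumption is not confirmed by anything in this thread, and `share_fractions` is a hypothetical name, not a BOINC function.

```python
from typing import Dict


def share_fractions(shares: Dict[str, float]) -> Dict[str, float]:
    """Map each project name to its fraction of the total resource share.

    Raises ValueError when the shares sum to zero, because the fractions
    would then require a division by zero.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("sum of resource shares is zero; fractions are undefined")
    return {name: share / total for name, share in shares.items()}


# One non-zero share is enough for the fractions to be well defined.
print(share_fractions({"SETI": 100, "Milkyway": 0, "Einstein": 0}))

# With every share at zero the calculation has no sensible answer.
try:
    share_fractions({"Milkyway": 0, "Einstein": 0})
except ValueError as err:
    print("error:", err)
```

Code of this kind would need either a guard like the one above or some guarantee that at least one share is positive.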

But for the time being (until I find the smoking gun in the code), I'm keeping an open mind
No gun, sadly not even smoke. But I heard a bang and someone fell over. Hate to look like a fool.
ID: 91636
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 91637 - Posted: 26 May 2019, 15:26:02 UTC
Last modified: 26 May 2019, 16:07:17 UTC

I failed to mention that this system is Ubuntu and runs an older version 7.9.3. Maybe that is the problem?

Going to try to upgrade to 7.14 although I hate "fixing" a working system.

Put in 7.14.2 using that site mentioned earlier. Also had to add
After=multi-user.target to get all 5 boards recognized on power up

updated image from THIS post so refresh page to see changes
ID: 91637
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5080
United Kingdom
Message 91639 - Posted: 26 May 2019, 16:18:34 UTC - in response to Message 91636.  
Last modified: 26 May 2019, 16:30:26 UTC

RS is not a global preference, it is project specific by its nature and as far as I know it's not transmitted to other projects.
My mistake - ignore. Not enough coffee error.

But forgive me a slightly related question, where do projects get the lists of other projects users participate in?
From BOINC Combined Statistics - https://boinc.netsoft-online.com/. It's one of the generalised stats sites (though not one commonly mentioned by users). An 'include' call at https://github.com/BOINC/boinc/blob/master/html/inc/user.inc#L32 gets that block (complete) from netsoft, so the project you're viewing doesn't actually 'know' anything about your other projects: the page is assembled on the fly. That's why the figures ("last public export") are always a bit below the internal ones for the current project.

I'll have a look through the client code to see if I can find that "reset if all zero" bug that floyd described
I hope you find something. I sure looked after the incident I described and I looked again today, still don't see how it is possible but it happened. Unfortunately I can't remember details but they must be important.
You're right, it definitely happens. I found

26/05/2019 16:27:33 |  | All projects have zero resource share; setting to 100
in the event log after a test, which led me to https://github.com/BOINC/boinc/blob/master/client/cs_statefile.cpp#L544. In turn, that code was added on 28 January 2010, just in time for version 6.10.32, which in turn has "The long awaited for backup project mechanism has now been implemented. Projects with a resource share of 0 are considered backup projects." (change log)

I think the two must be both intentional and related, although the checkin note says simply "client: fix my last checkin".
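
For anyone who wants to check their own saved logs for this, here is a minimal sketch that scans log lines for the message quoted above. The message text is taken from that event log line. The pipe-separated layout (timestamp, project, message) is assumed from that single line, and `find_reset_events` is a hypothetical helper, not part of BOINC.

```python
from typing import Iterable, List

# Message text as quoted from the event log in this post.
RESET_MESSAGE = "All projects have zero resource share; setting to 100"


def find_reset_events(log_lines: Iterable[str]) -> List[str]:
    """Return the timestamp field of every line containing the reset message.

    Assumes each line starts with a timestamp followed by a '|' separator.
    """
    hits = []
    for line in log_lines:
        if RESET_MESSAGE in line:
            timestamp = line.split("|", 1)[0].strip()
            hits.append(timestamp)
    return hits


sample = [
    "26/05/2019 16:27:33 |  | All projects have zero resource share; setting to 100",
    "26/05/2019 16:27:34 |  | some unrelated line",
]
print(find_reset_events(sample))  # ['26/05/2019 16:27:33']
```

To use it on a saved log, read the file into a list of lines and pass that list in. The location and name of the log file depend on the installation, so none is assumed here.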

we can lose that now that zero has been ascribed an active meaning
Yes when it is only about changing a meaningless value. But on a second thought, maybe parts of the code rely on the sum of shares being non-zero, for example when calculating fractions.
A good point. Given the deliberate coding, I think we should probably leave it alone, although there is some untidiness. Even after the 'setting to 100' appeared in the event log, the project page in BOINC Manager still showed zero across the board: after I plugged the network cable back in and did an update on all projects, my normal values appeared for all projects, all venues. So it's non-destructive.

But for the time being (until I find the smoking gun in the code), I'm keeping an open mind
No gun, sadly not even smoke. But I heard a bang and someone fell over. Hate to look like a fool.
I think the best we can do is to add a note to the documentation to say "When setting backup projects, leave yourself at least one leg to stand on". ;-)

Edit - added to List of project preferences
ID: 91639
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 92356 - Posted: 3 Aug 2019, 18:07:01 UTC
Last modified: 3 Aug 2019, 18:08:04 UTC

Hate to bring this up but I am having to abort tasks that I know will not finish before the deadline.

I asked for help with this over at Einstein
https://einsteinathome.org/content/looking-way-limit-number-work-units

Basically, setting the resource share to 1 did not work as expected, but I will try again next time SETI goes down. Possibly the change to "1" did not propagate (sync?) in time to take effect.

I noticed that after setting resource to "0" over at BAM!, it had actually changed to "-1" when I refreshed their webpage. That caused the project to show up with 100%. I have since set it to 1 and verified all are synced, and hopefully I will not have to abort the usual 75 or so Einstein tasks that will never finish after SETI goes back online.

I read that a number of SETI users have a special (even secret!) BOINC client mod that makes their queue large enough that it will not "run dry". I have no interest in using a mod unless I can do the mod myself, and would rather fall back on another project.
ID: 92356
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 92360 - Posted: 3 Aug 2019, 22:08:36 UTC - in response to Message 92356.  
Last modified: 3 Aug 2019, 22:11:59 UTC


I noticed that after setting resource to "0" over at BAM!, it had actually changed to "-1" when I refreshed their webpage


OK, I disconnected and then reconnected to BAM! and now have resource share = 0 showing up on the BOINC client (my PC) for Einstein and 100% for SETI, so I will see what happens when SETI goes offline next Tuesday.

Not sure why I had to do this to get the 0 to show up. Before I disconnected, I was showing 100 on the PC after setting the resource share to 0 both at the project web site (Einstein) and at BAM!, even after doing an update and a sync. But a disconnect and a reconnect seemed to fix it.

Looks like it might work. Will find out next time SETI goes offline.
ID: 92360
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 92363 - Posted: 4 Aug 2019, 13:03:05 UTC - in response to Message 92360.  


I noticed that after setting resource to "0" over at BAM!, it had actually changed to "-1" when I refreshed their webpage


This was a bug that Willie just fixed!!!!

https://www.boincstats.com/forum/7/12283,1

I think the problem was also compounded when I changed the venue to "home" for testing purposes and then synced with the account manager before doing an update. The update was needed to get the venue into the client; otherwise the account manager would not know WTF was going on.

Willie's site has been upgraded and looks really nice. I have used BAM! since I joined BOINC, and possibly this bug has been there all this time???
ID: 92363
mmonnin

Joined: 1 Jul 16
Posts: 146
United States
Message 92371 - Posted: 5 Aug 2019, 15:19:44 UTC - in response to Message 91635.  

Not every project updates their server code in sync with BOINC releases - many of them leave it untouched for months or years. For a long time, there were reports that specific projects wouldn't allow the special 0 value to be entered via their websites.


YoYo for some reason doesn't allow 0, so it's never a backup project for me. Admins have confirmed 0 is not allowed. That's the only project I've come across that doesn't allow 0.
ID: 92371
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 92372 - Posted: 5 Aug 2019, 15:24:19 UTC - in response to Message 92371.  

Not every project updates their server code in sync with BOINC releases - many of them leave it untouched for months or years. For a long time, there were reports that specific projects wouldn't allow the special 0 value to be entered via their websites.


YoYo for some reason doesn't allow 0, so it's never a backup project for me. Admins have confirmed 0 is not allowed. That's the only project I've come across that doesn't allow 0.


That is a shame. Just another excuse for someone to mod the BOINC code to make up for a deficiency in project code.
ID: 92372

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.