Why does one of my hosts have some projects' priorities stuck at their past values

Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61625 - Posted: 16 Apr 2015, 5:21:30 UTC

I had assigned different priorities to different projects for quite some time. Then, recently, I said "oh, whatever", and set them all to 100 using BAM!. All my machines use BAM! as the account manager. I forced an update of all the projects on each host, and they all got 100s for the projects they churn. Great. Except for one host. That host got 100 for almost all of the projects, but a handful stayed at their old priorities. No amount of updating helps. Since those projects currently didn't have any tasks churning on that host, I also did a reset and then update, nada. Still the old values stuck around. I turned on the priority_debug for the Event Log, but to my eyes, uninitiated in the dark magic of BOINC's inner workings, I couldn't understand if it gave any clues. Clearly, projects themselves know what their new priority for me is, i.e. BAM! has told them successfully that whatever they had for me changed to 100.

What would a more knowledgeable person need in order to figure this out? Should I turn on another flag? How much of the tail of stdoutdae.txt would be useful, and after taking which actions?

Thanks
Tuna
ID: 61625
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 61627 - Posted: 16 Apr 2015, 10:04:11 UTC - in response to Message 61625.  

I think anything you set to 100 via BAM! would be called a "Resource Share" in BOINC terminology. Priorities (especially in the context of Event Log debug flags) are something different entirely.

It sounds as if something is failing in the communication between BAM! and your computer(s). I suggest you draw up a complete list of the projects which have not updated their Resource Share as you expect, and submit it to Willy de Zutter via the BAM! forum: he might be able to identify a common factor (age of server software?) between the projects affected.
ID: 61627
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61628 - Posted: 16 Apr 2015, 11:21:30 UTC - in response to Message 61627.  

> I think anything you set to 100 via BAM! would be called a "Resource Share" in BOINC terminology. Priorities (especially in the context of Event Log debug flags) are something different entirely.
>
> It sounds as if something is failing in the communication between BAM! and your computer(s). I suggest you draw up a complete list of the projects which have not updated their Resource Share as you expect, and submit it to Willy de Zutter via the BAM! forum: he might be able to identify a common factor (age of server software?) between the projects affected.


Hmmm... My understanding of how this works is: BAM! tells each project PRJ "Hey PRJ, your Resource Share for user Foo is now NNN", and its job is done. Then the BOINC Manager on each host HST maintained by user Foo eventually goes "Hey, I'm running an Update on PRJ; so PRJ, what is the Resource Share you've got for my master Foo?" Then PRJ responds with NNN, BOINC Manager says thank you, and uses that NNN as PRJ's Resource Share.

In my case, BAM! told 44 projects that their Resource Share for me is 100. They heard and accepted it; I can confirm it on the project sites under Project Preferences under Your Account. And, 7 of my 8 hosts got the 100 during their very next Update of PRJ. The 8th host heard the news for about 35 projects, and insists on turning a deaf ear to 8 or 9 (I'm away from my computers right now, thus inexact count).

Am I wrong? If not, then the issue is with my local BOINC software/data on this host that ignores what some of my projects return. Tell me how to find out what is broken, and how to fix it.

If I'm wrong, tell me how it really works.

Sorry for thinking Priority equals Resource Share. But that is what the grid column says in BOINC Manager... :-)

Cheers...
Tuna
ID: 61628
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 61629 - Posted: 16 Apr 2015, 12:01:14 UTC - in response to Message 61628.  

OK, I can go along with that explanation - I'm not a BAM! user (as you can probably tell), so I'm not clear on the finer points.

So, the end result with the problematic projects is that the project web sites display one value (100) for resource share, and BOINC Manager still displays a different value, even after you do an 'Update' so that the client contacts the project and gets any updated information?

Similar trouble-shooting process. Taking only the problematic projects: do you have any other 'Venues' (groups of preference settings - default, home, school, work - aka 'host locations') set up? If so, is the RS set at 100 for all of them? Can you match the number shown in BOINC Manager with anywhere else? If you call up the 'Properties' page for the project in BOINC Manager, does the 'Host location' show what you expect? (Note that you can change the host location project-by-project, but only via the project website or possibly BAM! - there's no local override)
ID: 61629
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 61637 - Posted: 16 Apr 2015, 15:07:45 UTC - in response to Message 61629.  

BOINCStats/BAM FAQ#50: I have updated my preferences in BAM, updated the client from BAM but the preferences are still the old. Why?

The preferences aren't updated directly from BAM. BAM updates the preferences to the projects and then the client updates the preferences from a project.
In order to update the preferences of the client after you have edited the preferences in BAM, press the "Save to projects" button. Then update a project (no matter which) and you will receive the updated preferences.

ID: 61637
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 61638 - Posted: 16 Apr 2015, 15:27:02 UTC - in response to Message 61637.  

Because Resource Share can vary from project to project (it's the only one which is defined by BOINC centrally which is set 'per project': the others are all either part of the global set which BOINC itself keeps synchronised, or project-specific definitions), I'd guess you'd have to update every project before the RS transfer is complete.
ID: 61638
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61640 - Posted: 16 Apr 2015, 17:42:24 UTC

> I'd guess you'd have to update every project before the RS transfer is complete


Already done, as stated before. I updated every project on all of my machines; everything worked on all my machines, except for this one where only a subset of the projects are not getting the new Resource Share value.

> The preferences aren't updated directly from BAM. BAM updates the preferences to the projects and then the client updates the preferences from a project. In order to update the preferences of the client after you have edited the preferences in BAM, press the "Save to projects" button. Then update a project (no matter which) and you will receive the updated preferences.


That already worked, as stated before. Every project knows it is at 100. I verified that further by spending the time to visit every project website. Plus, since my other 7 machines got the right "100" during their Update process of each project, obviously they are correct at the project websites, thus the values did travel from BAM! to project websites properly. But, doublechecked.

> So, the end result with the problematic projects is that the project web sites display one value (100) for resource share, and BOINC Manager still displays a different value, even after you do an 'Update' so that the client contacts the project and gets any updated information?


Yes. I actually should have trimmed my story to the bare essentials, not overshare :-), and say: "All my projects used to have different Resource Sharing values but now they all have a setting of 100, correctly displayed at project websites. All of my hosts got this 100 for each project during their project updates, except for one host that is stuck with the old values for a subset of projects even after repeated project updates."

> Taking only the problematic projects: do you have any other 'Venues' (groups of preference settings - default, home, school, work - aka 'host locations') set up? If so, is the RS set at 100 for all of them? Can you match the number shown in BOINC Manager with anywhere else? If you call up the 'Properties' page for the project in BOINC Manager, does the 'Host location' show what you expect? (Note that you can change the host location project-by-project, but only via the project website or possibly BAM! - there's no local override)


No other Venues. Can't match the old values to anything else on the project website under my account. All of my Host Locations are shown as "default" in the BOINC Manager and "---" on the project websites. And when I check the Machines page on each project website under my account, Location column is blank. So, blank = default = '---'.

So, somehow this machine is not respecting the values it is getting for these projects from the project sites during the project update process. Or, the projects are sending the "old" value to this host only, but that "old" value is nowhere on their project websites. Question is why.

Also, if somebody knows the local XML file in which this is stored, maybe I can edit it manually?

The only other thing I can think of is to uninstall BOINC, remove all the data, and reinstall. But, I have so many really long PrimeGrid jobs that are finished 90%, yet will take weeks to finish the next 10%. And, PrimeGrid is one of those projects on this machine that is stuck... :-( I would hate to lose all that work...
ID: 61640
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 61641 - Posted: 16 Apr 2015, 17:56:40 UTC - in response to Message 61625.  
Last modified: 16 Apr 2015, 17:57:45 UTC

> Except for one host. That host got 100 for almost all of the projects, but a handful stayed at their old priorities.

Which projects are affected?
Are these projects up&about?
Which BOINC version is this with?
What message do you see in Event Log when you press Update on these projects?
ID: 61641
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 61642 - Posted: 16 Apr 2015, 18:04:49 UTC - in response to Message 61628.  

> Sorry for thinking Priority equals Resource Share. But that is what the grid column says in BOINC Manager... :-)

???


ID: 61642
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61645 - Posted: 16 Apr 2015, 18:40:27 UTC - in response to Message 61642.  
Last modified: 16 Apr 2015, 19:05:47 UTC

>> Sorry for thinking Priority equals Resource Share. But that is what the grid column says in BOINC Manager... :-)
>
> ???


Yeah, about that... It was 3am, I was barely awake, on my smartphone, typing away, away from the PC, and I had this strong image of "Pri..." as the truncated form of column title "Priority" in my head. No idea where that brain fart came from. I was soooo sure...
ID: 61645
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 61646 - Posted: 16 Apr 2015, 18:42:11 UTC - in response to Message 61640.  

> Also, if somebody knows the local XML file in which this is stored, maybe I can edit it manually?

I think the information flow should be:

1) Receive "sched_reply_[project url].xml". This file is hard to read in Windows - use WordPad instead of Notepad. You get a completely new one with every update - timestamp should be current. Not worth editing, but have a look to see that the contents 'look right' (compare with a project which has updated properly). <resource_share> should be 15-20 lines down, after <project_preferences>

2) Info copied to "account_[project url].xml". This one is only changed when it needs to be - look at the datestamp. This one would be worth editing if you can't get RS to transfer any other way - use Notepad. If you see something odd (and if it still looks odd when you compare it with a good one), post it here - but remove the line with <authenticator> for security purposes.

3) All info for all projects is combined into "client_state.xml" - but I think we should find what the problem is before getting into that one.
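
A quick way to do that comparison without opening every file by hand is sketched below in Python. It is only a sketch under assumptions: the `data_dir` argument is a placeholder for wherever your BOINC data directory lives, and the code assumes the value appears literally as `<resource_share>...</resource_share>` in each file. It uses a plain text search rather than an XML parser, in case a file is not strictly well-formed.

```python
import re
from pathlib import Path

# Matches e.g. <resource_share>100</resource_share> or <resource_share>100.0</resource_share>
RS_PATTERN = re.compile(
    r"<resource_share>\s*([0-9]+(?:\.[0-9]+)?)\s*</resource_share>"
)


def resource_shares_in(text):
    """Return every resource_share value found in the text, as floats."""
    return [float(value) for value in RS_PATTERN.findall(text)]


def scan(data_dir, patterns=("sched_reply_*.xml", "account_*.xml")):
    """Map each matching file name in data_dir to the values found in it."""
    results = {}
    for pattern in patterns:
        for path in sorted(Path(data_dir).glob(pattern)):
            text = path.read_text(encoding="utf-8", errors="replace")
            results[path.name] = resource_shares_in(text)
    return results
```

Calling `scan(...)` on the data directory returns a dictionary keyed by file name, so a project whose files hold something other than the expected value stands out. Only the resource share numbers are extracted, so nothing from the `<authenticator>` line ends up in the output.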

> The only other thing I can think of is to uninstall BOINC, remove all the data, and reinstall. But, I have so many really long PrimeGrid jobs that are finished 90%, yet will take weeks to finish the next 10%. And, PrimeGrid is one of those projects on this machine that is stuck... :-( I would hate to lose all that work...

That's what I call the "sledgehammer and two short planks" approach to computer maintenance. Could we try and work out what's going wrong first, please? There might be a bug here we can get corrected - or at least learn something to add to the reply toolbox for the next person who encounters it.

After that, you're welcome to take your hard disk out into the yard and re-educate it with extreme prejudice, if you still want to.... ;)
ID: 61646
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61647 - Posted: 16 Apr 2015, 19:01:15 UTC - in response to Message 61641.  
Last modified: 16 Apr 2015, 19:02:04 UTC

>> Except for one host. That host got 100 for almost all of the projects, but a handful stayed at their old priorities.
>
> Which projects are affected?
> Are these projects up&about?
> Which BOINC version is this with?
> What message do you see in Event Log when you press Update on these projects?


Stuck at:

50:
Enigma
PrimeGrid
SZTAKI

150:
LHC@home

200:
Einstein
SETI

300:
MalariaControl
Poem
Rosetta

400:
World Community Grid
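
To put those stuck numbers in perspective, here is a rough Python sketch. It assumes the usual reading of Resource Share as a relative weight (a project's share divided by the sum of all shares), and for simplicity it looks only at the ten projects listed above, ignoring the other projects on this host. Real scheduling depends on more than this ratio, so treat the output as an illustration only.

```python
def share_fractions(shares):
    """Return each project's share as a fraction of the total."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total resource share must be positive")
    return {name: value / total for name, value in shares.items()}


# Stale values listed above for the ten affected projects.
stale = {
    "Enigma": 50, "PrimeGrid": 50, "SZTAKI": 50,
    "LHC@home": 150,
    "Einstein": 200, "SETI": 200,
    "MalariaControl": 300, "Poem": 300, "Rosetta": 300,
    "World Community Grid": 400,
}
# Intended values: every project set to 100.
intended = {name: 100 for name in stale}

stale_frac = share_fractions(stale)
intended_frac = share_fractions(intended)

for name in stale:
    print(f"{name:22s} stale {stale_frac[name]:.3f}  intended {intended_frac[name]:.3f}")
```

Under these assumptions, each of the ten projects should get an equal tenth of the share among them, while the stale values skew it between 2.5% (the projects stuck at 50) and 20% (World Community Grid).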

They are all up'n'about right now. Yet, an Update on each didn't do anything; see the log below. Mind you, this has been happening multiple times a day for the last week or so. So, the problem is not specific to today.

BOINC latest: 7.4.42

Relevant log is below. Note that the fact that there are no tasks being requested shouldn't have any bearing on whether the Resource Sharing info is getting transmitted. Due to the large number of projects I run, all my machines are always in the state of not requesting tasks for more than 3-4 of my projects (seems part of the logic of how BOINC Manager rotates over the projects to satisfy the RS values and work buffers etc.), yet all their Resource Sharing values got updated the first time I issued a bulk Update on all the projects, where a good number of them didn't request tasks, yet got the "100".

I guess I could also try to REMOVE these projects (except for PrimeGrid) via BAM! since they don't have any tasks in the queue, set PrimeGrid to NoNewTasks, update with BAM!, go find everything under the data folder with these projects' names, delete them, then add them back via BAM!, hoping such a refresh will do the right thing. Then, I would repeat the same with PrimeGrid once it is drained of the tasks on this machine.

Still, I am curious if we can figure out why this is happening. Might be an interesting bug somewhere in the software worth finding...

4/16/2015 11:48:31 AM | Enigma@Home | update requested by user
4/16/2015 11:48:31 AM | PrimeGrid | update requested by user
4/16/2015 11:48:31 AM | SZTAKI Desktop Grid | update requested by user
4/16/2015 11:48:31 AM | LHC@home 1.0 | update requested by user
4/16/2015 11:48:31 AM | Einstein@Home | update requested by user
4/16/2015 11:48:31 AM | SETI@home | update requested by user
4/16/2015 11:48:31 AM | malariacontrol.net | update requested by user
4/16/2015 11:48:31 AM | Poem@Home | update requested by user
4/16/2015 11:48:31 AM | rosetta@home | update requested by user
4/16/2015 11:48:31 AM | World Community Grid | update requested by user
4/16/2015 11:48:33 AM | SZTAKI Desktop Grid | Sending scheduler request: Requested by user.
4/16/2015 11:48:33 AM | SZTAKI Desktop Grid | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:48:36 AM | SZTAKI Desktop Grid | Scheduler request completed
4/16/2015 11:48:36 AM | | General prefs: from http://bam.boincstats.com/ (last modified 02-Apr-2015 22:18:55)
4/16/2015 11:48:36 AM | | Host location: none
4/16/2015 11:48:36 AM | | General prefs: using your defaults
4/16/2015 11:48:36 AM | | Reading preferences override file
4/16/2015 11:48:36 AM | | Preferences:
4/16/2015 11:48:36 AM | | max memory usage when active: 2455.82MB
4/16/2015 11:48:36 AM | | max memory usage when idle: 11051.17MB
4/16/2015 11:48:36 AM | | max disk usage: 30.00GB
4/16/2015 11:48:36 AM | | don't compute while active
4/16/2015 11:48:36 AM | | don't use GPU while active
4/16/2015 11:48:36 AM | | suspend work if non-BOINC CPU load exceeds 25%
4/16/2015 11:48:36 AM | | (to change preferences, visit a project web site or select Preferences in the Manager)
4/16/2015 11:48:41 AM | LHC@home 1.0 | Sending scheduler request: Requested by user.
4/16/2015 11:48:41 AM | LHC@home 1.0 | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:48:43 AM | LHC@home 1.0 | Scheduler request completed
4/16/2015 11:48:48 AM | malariacontrol.net | Sending scheduler request: Requested by user.
4/16/2015 11:48:48 AM | malariacontrol.net | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:48:50 AM | malariacontrol.net | Scheduler request completed
4/16/2015 11:48:55 AM | World Community Grid | Sending scheduler request: Requested by user.
4/16/2015 11:48:55 AM | World Community Grid | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:48:56 AM | World Community Grid | Scheduler request completed
4/16/2015 11:49:01 AM | rosetta@home | Sending scheduler request: Requested by user.
4/16/2015 11:49:01 AM | rosetta@home | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:49:02 AM | rosetta@home | Scheduler request completed
4/16/2015 11:49:08 AM | Enigma@Home | Sending scheduler request: Requested by user.
4/16/2015 11:49:08 AM | Enigma@Home | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:49:10 AM | Enigma@Home | Scheduler request completed
4/16/2015 11:49:15 AM | SETI@home | Sending scheduler request: Requested by user.
4/16/2015 11:49:15 AM | SETI@home | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:49:16 AM | SETI@home | Scheduler request completed
4/16/2015 11:49:21 AM | Einstein@Home | Sending scheduler request: Requested by user.
4/16/2015 11:49:21 AM | Einstein@Home | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:49:23 AM | Einstein@Home | Scheduler request completed
4/16/2015 11:49:44 AM | Poem@Home | Sending scheduler request: Requested by user.
4/16/2015 11:49:44 AM | Poem@Home | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:49:46 AM | Poem@Home | Scheduler request completed
4/16/2015 11:49:51 AM | PrimeGrid | Sending scheduler request: Requested by user.
4/16/2015 11:49:51 AM | PrimeGrid | Not requesting tasks: don't need (CPU: not highest priority project; NVIDIA GPU: job cache full)
4/16/2015 11:49:54 AM | PrimeGrid | Scheduler request completed
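
For anyone who wants to check a longer log the same way, the Python sketch below reads lines in the `timestamp | project | message` layout shown above and reports which projects had an update requested but no "Scheduler request completed" line. The helper names are made up for illustration, and the sample is a shortened excerpt of the log in this post.

```python
def parse_line(line):
    """Split 'timestamp | project | message' into a 3-tuple, or return None."""
    parts = [part.strip() for part in line.split("|", 2)]
    if len(parts) != 3:
        return None
    return tuple(parts)


def update_status(lines):
    """Return (projects with an update requested, projects whose request completed)."""
    requested, completed = set(), set()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, project, message = parsed
        if not project:
            continue  # general client messages carry no project name
        if message.startswith("update requested by user"):
            requested.add(project)
        elif message.startswith("Scheduler request completed"):
            completed.add(project)
    return requested, completed


sample = """\
4/16/2015 11:48:31 AM | SZTAKI Desktop Grid | update requested by user
4/16/2015 11:48:31 AM | LHC@home 1.0 | update requested by user
4/16/2015 11:48:36 AM | SZTAKI Desktop Grid | Scheduler request completed
4/16/2015 11:48:36 AM | | Host location: none
4/16/2015 11:48:43 AM | LHC@home 1.0 | Scheduler request completed
"""

requested, completed = update_status(sample.splitlines())
missing = requested - completed
print("requested but not completed:", sorted(missing))
```

An empty result only says that every requested update got a scheduler reply; it says nothing about whether the Resource Share in that reply was actually applied.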
ID: 61647
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61648 - Posted: 16 Apr 2015, 19:05:14 UTC - in response to Message 61646.  


> That's what I call the "sledgehammer and two short planks" approach to computer maintenance. Could we try and work out what's going wrong first, please? There might be a bug here we can get corrected - or at least learn something to add to the reply toolbox for the next person who encounters it.


That's exactly what I like doing, anyways. So, I'll do the comparisons and post back. That'll probably take a little time though. So, later!

Thanks for all the help so far, guys. I am curious. I'll be back.

Tuna
ID: 61648
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61651 - Posted: 16 Apr 2015, 23:15:21 UTC - in response to Message 61648.  


I thought I already sent this message, but I don't see it. Weird...

Anyways, to get ready to investigate, I copied all relevant files to http://1drv.ms/1PUnbWv. I renamed files for projects that are misbehaving with a leading +, so they'll pop out better. I also included the result of the DIR of these files on my computer to get a snapshot of the timestamps.

I'll investigate, yet I thought maybe someone else might want to see them instead of waiting for me to be done with my day.

Tuna
ID: 61651
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61653 - Posted: 17 Apr 2015, 3:27:55 UTC - in response to Message 61651.  

Never mind. Removed the files. I'll look at them. Seems I didn't fully read Richard's security warning in my haste to start my day.
ID: 61653
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61654 - Posted: 17 Apr 2015, 6:07:11 UTC - in response to Message 61646.  
Last modified: 17 Apr 2015, 6:11:00 UTC

>> Also, if somebody knows the local XML file in which this is stored, maybe I can edit it manually?
>
> I think the information flow should be:
>
> 1) Receive "sched_reply_[project url].xml". This file is hard to read in Windows - use WordPad instead of Notepad. You get a completely new one with every update - timestamp should be current. Not worth editing, but have a look to see that the contents 'look right' (compare with a project which has updated properly). <resource_share> should be 15-20 lines down, after <project_preferences>
>
> 2) Info copied to "account_[project url].xml". This one is only changed when it needs to be - look at the datestamp. This one would be worth editing if you can't get RS to transfer any other way - use Notepad. If you see something odd (and if it still looks odd when you compare it with a good one), post it here - but remove the line with <authenticator> for security purposes.
>
> 3) All info for all projects is combined into "client_state.xml" - but I think we should find what the problem is before getting into that one.


Soooooo.... The bottom line is: in all the sched_* and account_* XML files, the Resource Share is 100 for all projects. So, the projects do report the right values. But both client_state and client_state_prev have the wrong values of 50...400. And the timestamp of client_state is later than the timestamps of sched_* and account_*. Also note that the timestamps of the sched_* files show that they are from the Update I had run earlier and reported in this thread. And the account_* files already had the right values from sched_* in the past, given their timestamps from the day before. But client_state does not seem to pick them up correctly from account_*. Thus, something seems to be going wrong during the processing of account_* into client_state. And no, client_state is not locked or hidden or read-only or in use by some rogue process; I checked by renaming it to foobar.xml and then back to client_state.xml, which would have failed if it were locked.

04/16/2015  11:49 AM             2,990 sched_reply_boinc.bakerlab.org_rosetta.xml
04/16/2015  11:49 AM               992 sched_reply_boinc.fzk.de_poem.xml
04/16/2015  11:49 AM            17,033 sched_reply_einstein.phys.uwm.edu.xml
04/16/2015  11:48 AM             1,103 sched_reply_lhcathomeclassic.cern.ch_sixtrack.xml
04/16/2015  11:49 AM             5,585 sched_reply_setiathome.berkeley.edu.xml
04/16/2015  11:48 AM             2,128 sched_reply_szdg.lpds.sztaki.hu_szdg.xml
04/16/2015  11:49 AM             4,335 sched_reply_www.enigmaathome.net.xml
04/16/2015  11:48 AM             1,113 sched_reply_www.malariacontrol.net.xml
04/16/2015  11:49 AM             5,134 sched_reply_www.primegrid.com.xml
04/16/2015  11:48 AM            18,544 sched_reply_www.worldcommunitygrid.org.xml

04/15/2015  09:35 PM             2,403 account_boinc.bakerlab.org_rosetta.xml
04/15/2015  09:33 PM               321 account_boinc.fzk.de_poem.xml
04/15/2015  09:36 PM             3,713 account_einstein.phys.uwm.edu.xml
04/15/2015  09:21 PM               340 account_lhcathomeclassic.cern.ch_sixtrack.xml
04/15/2015  09:36 PM             2,978 account_setiathome.berkeley.edu.xml
04/15/2015  09:33 PM               472 account_szdg.lpds.sztaki.hu_szdg.xml
04/15/2015  09:36 PM             1,014 account_www.enigmaathome.net.xml
04/15/2015  09:35 PM               585 account_www.malariacontrol.net.xml
04/15/2015  09:36 PM             2,933 account_www.primegrid.com.xml
04/15/2015  09:35 PM             2,440 account_www.worldcommunitygrid.org.xml

04/16/2015  02:53 PM           504,565 client_state.xml
04/16/2015  02:53 PM           509,753 client_state_prev.xml

sched_reply_boinc.bakerlab.org_rosetta.xml:<resource_share>100</resource_share>
sched_reply_boinc.fzk.de_poem.xml:<resource_share>100</resource_share>
sched_reply_einstein.phys.uwm.edu.xml:<resource_share>100</resource_share>
sched_reply_lhcathomeclassic.cern.ch_sixtrack.xml:<resource_share>100</resource_share>
sched_reply_setiathome.berkeley.edu.xml:<resource_share>100</resource_share>
sched_reply_szdg.lpds.sztaki.hu_szdg.xml:<resource_share>100</resource_share>
sched_reply_www.enigmaathome.net.xml:<resource_share>100</resource_share>
sched_reply_www.malariacontrol.net.xml:<resource_share>100</resource_share>
sched_reply_www.primegrid.com.xml:<resource_share>100</resource_share>
sched_reply_www.worldcommunitygrid.org.xml:    <resource_share>100.0</resource_share>

account_boinc.bakerlab.org_rosetta.xml:<resource_share>100</resource_share>
account_boinc.fzk.de_poem.xml:<resource_share>100</resource_share>
account_einstein.phys.uwm.edu.xml:<resource_share>100</resource_share>
account_lhcathomeclassic.cern.ch_sixtrack.xml:<resource_share>100</resource_share>
account_setiathome.berkeley.edu.xml:<resource_share>100</resource_share>
account_szdg.lpds.sztaki.hu_szdg.xml:<resource_share>100</resource_share>
account_www.enigmaathome.net.xml:<resource_share>100</resource_share>
account_www.malariacontrol.net.xml:<resource_share>100</resource_share>
account_www.primegrid.com.xml:<resource_share>100</resource_share>
account_www.worldcommunitygrid.org.xml:    <resource_share>100.0</resource_share>

client_state.xml:    <resource_share>50.000000</resource_share>
client_state.xml:    <resource_share>150.000000</resource_share>
client_state.xml:    <resource_share>300.000000</resource_share>
client_state.xml:    <resource_share>400.000000</resource_share>
client_state.xml:    <resource_share>300.000000</resource_share>
client_state.xml:    <resource_share>50.000000</resource_share>
client_state.xml:    <resource_share>200.000000</resource_share>
client_state.xml:    <resource_share>200.000000</resource_share>
client_state.xml:    <resource_share>300.000000</resource_share>
client_state.xml:    <resource_share>50.000000</resource_share>
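
The client_state.xml lines above do not show which project each value belongs to. The Python sketch below is one way to pair them up and flag disagreements with the account_* values. It assumes each project in client_state.xml sits in a `<project>...</project>` block containing `<master_url>` and `<resource_share>` tags; that layout is an assumption, not something confirmed in this thread, and the sample text and the example.org URL are invented for the demonstration.

```python
import re

PROJECT_BLOCK = re.compile(r"<project>(.*?)</project>", re.DOTALL)


def tag_value(block, tag):
    """Return the text inside the first <tag>...</tag> in block, or None."""
    match = re.search(rf"<{tag}>\s*(.*?)\s*</{tag}>", block, re.DOTALL)
    return match.group(1) if match else None


def client_state_shares(client_state_text):
    """Map each project's master URL to its resource share (assumed layout)."""
    shares = {}
    for block in PROJECT_BLOCK.findall(client_state_text):
        url = tag_value(block, "master_url")
        share = tag_value(block, "resource_share")
        if url is not None and share is not None:
            shares[url] = float(share)
    return shares


def find_mismatches(account_shares, client_shares, tolerance=1e-6):
    """Return {url: (account value, client_state value)} where the two differ."""
    mismatches = {}
    for url, expected in account_shares.items():
        actual = client_shares.get(url)
        if actual is not None and abs(actual - expected) > tolerance:
            mismatches[url] = (expected, actual)
    return mismatches


# Invented sample in the assumed layout.
sample_state = """<client_state>
<project>
    <master_url>http://www.primegrid.com/</master_url>
    <resource_share>50.000000</resource_share>
</project>
<project>
    <master_url>http://example.org/ok/</master_url>
    <resource_share>100.000000</resource_share>
</project>
</client_state>"""

sample_accounts = {"http://www.primegrid.com/": 100.0, "http://example.org/ok/": 100.0}
print(find_mismatches(sample_accounts, client_state_shares(sample_state)))
```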


Then I got curious as to what other file might have ">400.000000<", focusing on World Community Grid. Boom! I got a hit in acct_mgr_request.xml, which has this in it, along with many more things, including the other wrong RS values. The file is from about 25 minutes before I typed this post; my hosts are set up to contact BAM! every hour.

04/16/2015  10:43 PM            35,699 acct_mgr_request.xml

   <project>
      <url>http://www.worldcommunitygrid.org/</url>
      <project_name>World Community Grid</project_name>
      <suspended_via_gui>0</suspended_via_gui>
      <account_key>blah blah blah</account_key>
      <hostid>1363081</hostid>
      <not_started_dur>0.000000</not_started_dur>
      <in_progress_dur>0.000000</in_progress_dur>
      <attached_via_acct_mgr>1</attached_via_acct_mgr>
      <dont_request_more_work>0</dont_request_more_work>
      <detach_when_done>0</detach_when_done>
      <ended>0</ended>
      <resource_share>400.000000</resource_share>
   </project>
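
To list every project and Resource Share in a file shaped like the excerpt above, something along these lines in Python may help. It assumes the file is well-formed XML and that each `<project>` element carries `<project_name>` and `<resource_share>` children as in the excerpt; the `<acct_mgr_request>` wrapper in the sample is an assumption about the root element, and the sample is trimmed to the relevant tags.

```python
import xml.etree.ElementTree as ET


def project_shares(xml_text):
    """Map project_name to resource_share for every <project> element."""
    root = ET.fromstring(xml_text)
    shares = {}
    for project in root.iter("project"):
        name = project.findtext("project_name")
        share = project.findtext("resource_share")
        if name is not None and share is not None:
            shares[name.strip()] = float(share)
    return shares


# Trimmed sample; the root element name is an assumption.
sample = """<acct_mgr_request>
   <project>
      <url>http://www.worldcommunitygrid.org/</url>
      <project_name>World Community Grid</project_name>
      <resource_share>400.000000</resource_share>
   </project>
</acct_mgr_request>"""

print(project_shares(sample))
```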


So, what is this file? Is this something BOINC Manager sends to the account manager? If so, it wouldn't be the culprit since it is simply telling the account manager what BOINC knows, without knowing what it knows is wrong, due to all the above stuff. Thus, a red herring. Or, is it? I don't really know what this file does...

So, what else?

Tuna
ID: 61654
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 61655 - Posted: 17 Apr 2015, 7:29:27 UTC - in response to Message 61654.  

So, the problem seems to be in processing/applying an RS change received from some projects. I doubt it matters whether the project change was initiated by BAM!, or directly by the user at the website.

I'm going to be offline until tomorrow night, but I run three of the projects on your list - Einstein (which uses very old server code), and SETI, LHC (both of which use recent code) - so I can look into it when I get back.
ID: 61655
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61659 - Posted: 17 Apr 2015, 16:29:29 UTC - in response to Message 61655.  

But, server code's age shouldn't matter, right? They have done their job and sent the right value to the host. At that point, it should be the host's job to interpret and roll up that value properly. All my other machines did that, but this host didn't and doesn't and won't. So, it must be a bug in BOINC Manager that happens only sometimes, and when it happens, it ignores the RS value coming from the server. Right? If I'm right, is there a BOINC developer listening in who would want me to turn on a few logging flags and do a few certain actions to generate a useful log before I try removing and re-adding these projects on this problematic host?
ID: 61659
Tuna Ertemalp
Joined: 23 Dec 13
Posts: 45
United States
Message 61695 - Posted: 19 Apr 2015, 12:32:15 UTC

Well, the host that for over a week ignored the most current Resource Sharing values it received from the project servers, Update after Update, now has the correct RS values, all of a sudden. And, I didn't do anything on the host itself. I wasn't anywhere near it! So, mysteriously fixed...

Still, I have the sched* and account* and client_state* files from before backed up if anyone needs information contained in them.

Tuna
ID: 61695


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.