Client doesn't honor work queue setting

Jim1348

Joined: 8 Nov 10
Posts: 310
United States
Message 100586 - Posted: 4 Sep 2020, 15:07:31 UTC - in response to Message 100584.  
Last modified: 4 Sep 2020, 15:17:35 UTC

I believe this contributes to a shortfall calculation and therefore additional work is requested. Am I looking at this correctly?

I wish I knew.
I noticed the problem (mainly on WCG, but sometimes other projects) when I upgraded to BOINC 7.15.x almost a year ago, and posted on it then. I know that they changed the BOINC scheduler in that version, and assumed that the server did not get the message. Eventually, the problem corrected itself.

I have not received any particularly useful responses, except one person said that if you have an app_config.xml limiting the number of tasks, that could trigger the problem. But that bug should have been fixed. Let us know what you find.

PS - I had the problem again a few days ago when installing BOINC 7.16.10 on a new OS install of Ubuntu 20.04.1. I got a week's worth of OPN, even though my buffer is set for 0.1 + 0.5 days (I added an app_config later to limit the number of MIP, but don't recall that I had it when the problem occurred). It seems to be possible for any new install.
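
For scale, here is a rough sketch of what a 0.1 + 0.5 day buffer amounts to in seconds, compared with a week of work. It assumes the first number acts as the minimum buffer and the sum as the upper target, per device instance; the helper name `buffer_seconds` is made up for illustration and is not part of BOINC.

```python
SECONDS_PER_DAY = 86_400


def buffer_seconds(min_days: float, extra_days: float) -> tuple[float, float]:
    """Return (lower, upper) buffer targets in seconds.

    Assumption: the client tops up when queued work drops below the
    minimum, and asks for at most minimum + additional days.
    """
    lower = min_days * SECONDS_PER_DAY
    upper = (min_days + extra_days) * SECONDS_PER_DAY
    return lower, upper


low, high = buffer_seconds(0.1, 0.5)
week = 7 * SECONDS_PER_DAY

print(f"buffer target: {low:.0f} s to {high:.0f} s")
print(f"a week of work is {week / high:.1f}x the upper target")
```

Under those assumptions, a week's worth of tasks is an order of magnitude more than the setting asks for.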
ID: 100586
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 100591 - Posted: 4 Sep 2020, 17:57:08 UTC - in response to Message 100579.  

Client attaches to project server and downloads 183 hours of work

Did you report this to the project? Sounds like their scheduler is overanxious about sending you work.

The log isn't very useful. Would've been more useful to see the work request and response from the server.
Also remember: if this client is new to the project, BOINC doesn't yet know how long tasks run; the project only sends an estimate along. If that estimate is wildly wrong, BOINC can ask for more work than it can chew. But that's still something the project has to fix. So, did you report this at the project?
ID: 100591
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 100595 - Posted: 4 Sep 2020, 20:38:22 UTC - in response to Message 100594.  

Look, I don't mind you having a rant and all that. We all have that now and then. But then say you're having a rant and that you don't want questions because you're not in the mood to answer them.

I asked my questions for a reason and I've not seen you answer them, so for all I know your mind is made up that it's a bug in rr_simulation. In that case, just go to GitHub and report it there.
ID: 100595
Jim1348

Joined: 8 Nov 10
Posts: 310
United States
Message 100598 - Posted: 4 Sep 2020, 21:25:03 UTC

FWIW, here are my postings from a year ago. It was when I upgraded from 7.14.1 to 7.16.3 that the problem occurred.
I am quite sure the initial time estimates were not that far off to account for it.
https://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=41860

I included a link to an Einstein discussion about the change to the scheduler, but that is unfortunately no longer available.
Going back to BOINC 7.9.3 fixed it for me (or going to Rosetta worked too).

It is not very illuminating perhaps, but that is all I can troubleshoot.
ID: 100598
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 100602 - Posted: 5 Sep 2020, 1:09:20 UTC - in response to Message 100597.  
Last modified: 5 Sep 2020, 1:19:41 UTC

Hence, I clarified in my reply that the server sent that much work after multiple requests.
I would still like to see the messages from those requests, preferably with the sched_op_debug flag set (so they show the actual number of seconds of work requested each time), but you probably didn't enable that flag.
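
For anyone who wants to turn those flags on, a minimal sketch that builds a cc_config.xml body with sched_op_debug and work_fetch_debug enabled. The helper `build_cc_config` is a made-up name for illustration; the script only prints the XML and does not write it anywhere.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags: list[str]) -> str:
    """Build a cc_config.xml body that switches the given log flags on."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is its own element; a value of 1 enables it.
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


config_xml = build_cc_config(["sched_op_debug", "work_fetch_debug"])
print(config_xml)
```

The output would go into cc_config.xml in the BOINC data directory, whose location depends on the install, and the client has to re-read its config files or be restarted before the extra log lines appear.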

Going back to your previous posts: you came into this thread with absolutely no information about your system or projects, and no previous thread to point to, so everyone reading just has to guess what you're on about. So first things first:
The machine always has 32 threads busy 100% of the time.

In this post at WCG I think you state it's an 8-core Ryzen 7 1700, which does 16 threads. Are you still running 32 threads? How do you run 32 threads on it? Have you tried going back to 16 threads for the duration of X to see if that calms things down? What else have you done to remedy things? Or are you just adding debug flags left and right to see what they do?

if the client doesn't make a server request there won't be a server response to provide
But if the project severely underestimates the task length (with the rsc_fpops_est value per task), BOINC will look at the work it got in, decide that it wasn't enough and ask for more. That's not a bug in rr_sim or work_fetch_debug, but a project problem.

That other guy there says "As soon as I set OPN to unlimited, I downloaded about 9200 work units", and then he blames the client. But the client will only ask for so much work, and after some of that work has come in it decides from the estimated fpops value whether more is needed. Now, whether a task comes in with an estimated fpops of 27 trillion or of 2700 makes quite a difference (these are examples; I don't know the values that project sends for its tasks).

Work requests are made in seconds, based on the cache size you set. The project then says "Here's work for that amount of time", and BOINC checks that against the estimated number of floating point operations each task needs. If that value is woefully low, BOINC won't believe it has enough work to fill the cache, so it will ask for more until it decides it has enough. Then, once the tasks are running, they take way longer than the estimated fpops said they would, which is when the problems start. Still, in that case it's not a client problem. (Run a round with cpu_sched_debug on all those tasks and post that output.)
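
A toy model of that loop, to show the scale of the effect. It is a sketch under loud assumptions, not the client's actual work fetch code: one device instance, a made-up device speed of 3 GFLOPS, a made-up server cap of 4 tasks per request, and a project estimate that is 10x too low. The function name `fetch_until_full` is hypothetical.

```python
import math


def fetch_until_full(target_s, est_fpops, true_fpops, device_flops, max_per_request):
    """Keep requesting work until the *estimated* runtime fills the buffer.

    Returns (tasks queued, requests made, actual seconds of work queued).
    """
    est_runtime = est_fpops / device_flops    # what the client believes
    true_runtime = true_fpops / device_flops  # what the task really takes
    tasks = 0
    requests = 0
    while tasks * est_runtime < target_s:
        shortfall = target_s - tasks * est_runtime
        wanted = math.ceil(shortfall / est_runtime)
        tasks += min(wanted, max_per_request)  # server caps each reply
        requests += 1
    return tasks, requests, tasks * true_runtime


tasks, requests, actual_s = fetch_until_full(
    target_s=0.6 * 86_400,   # 0.1 + 0.5 day buffer
    est_fpops=2.7e13,        # project's estimate (hypothetical)
    true_fpops=2.7e14,       # real cost, 10x higher (hypothetical)
    device_flops=3e9,        # hypothetical device speed
    max_per_request=4,       # hypothetical per-request cap
)
print(f"{tasks} tasks after {requests} requests")
print(f"client thinks it holds 0.6 days, really holds {actual_s / 86_400:.2f} days")
```

In this model the queue ends up overfilled by the same factor the estimate is off by, which is why the size of the estimate error matters more than anything the client does.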

Einstein is one of the projects that always sends too much work on 'new' machines, or machines that haven't run it in a while. Richard is still looking into that and wants scheduler reports, communications from the client and responses from the server with sched_op_debug set. (But in a separate thread please)
ID: 100602

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.