Message boards : Questions and problems : Client doesn't honor work queue setting
Author | Message |
---|---|
Joined: 8 Nov 10 Posts: 310 |
"I believe this contributes to a shortfall calculation and therefore additional work is requested. Am I looking at this correctly?" I wish I knew. I noticed the problem (mainly on WCG, but sometimes on other projects) when I upgraded to BOINC 7.15.x almost a year ago, and posted about it then. I know that they changed the BOINC scheduler in that version, and assumed that the server did not get the message. Eventually, the problem corrected itself. I have not received any particularly useful responses, except that one person said an app_config.xml limiting the number of tasks could trigger the problem. But that bug should have been fixed. Let us know what you find. PS: I had the problem again a few days ago when installing BOINC 7.16.10 on a new OS install of Ubuntu 20.04.1. I got a week's worth of OPN, even though my buffer is set for 0.1 + 0.5 days (I added an app_config later to limit the number of MIP, but I don't recall having it when the problem occurred). It seems to be possible on any new install. |
Joined: 29 Aug 05 Posts: 15632 |
"Client attaches to project server and downloads 183 hours of work" Did you report this to the project? Sounds like their scheduler is overanxious about sending you work. The log isn't very useful. It would've been more useful to see the work request and the response from the server. Also remember, if this is a new client to the project, BOINC won't know yet how long tasks run for; the project only sends an estimate along with each task. If that estimate is wildly wrong, BOINC can ask for more work than it can chew. But that's still something the project has to fix. Therefore, did you report this at the project? |
Joined: 29 Aug 05 Posts: 15632 |
Look, I don't mind you having a rant and all that. We all have that now and then. But then say you're having a rant and that you don't want questions because you're not in the mood to answer them. I asked my questions for a reason, and I've not seen you answer them, so for all I know your mind is made up on it being a bug in rr_simulation. In that case just go to GitHub and report it there. |
Joined: 8 Nov 10 Posts: 310 |
FWIW, here are my postings from a year ago. It was when I upgraded from 7.14.1 to 7.16.3 that the problem occurred. I am quite sure the initial time estimates were not far enough off to account for it. https://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=41860 I included a link to an Einstein discussion about the change to the scheduler, but that is unfortunately no longer available. Going back to BOINC 7.9.3 fixed it for me (or going to Rosetta worked too). It is not very illuminating perhaps, but that is all I can troubleshoot. |
Joined: 29 Aug 05 Posts: 15632 |
"Hence, I clarified in my reply that the server sent that much work after multiple requests." Which I would still like to see the messages from, preferably with the sched_op_debug flag set (so it shows the actual number of seconds of work it was asking for each time), but you probably didn't enable that flag. I went back to your previous posts, because you came into this thread with absolutely no information whatsoever about your system or projects, and no previous thread to point to. Everyone reading just has to guess what you're on about. So first things first: "The machine always has 32 threads busy 100% of the time." In this post at WCG I think you state it's an 8 core Ryzen 7 1700, which does 16 threads. Are you still running 32 threads? How do you run 32 threads on it? Have you tried going back to 16 threads for the duration of X to see if that calms things down? What else have you done to remedy things? Or are you just adding debug flags left and right to see what they do? "if the client doesn't make a server request there won't be a server response to provide" But if the project severely underestimates the task length (with the rsc_fpops_est value per task), BOINC will look at the work it got in, decide that it wasn't enough and ask for more. That's not a bug in rr_sim or work_fetch_debug, but a project problem. That other guy there says "As soon as I set OPN to unlimited, I downloaded about 9200 work units" and then he blames the client. But the client will only ask for so much work, and decide after it got in some of that work whether more is needed per the estimated fpops value. Now, if a task comes in with an estimated fpops of 27 trillion, or it comes in at 2700... that's quite a difference (these are examples, I don't know the values that project sends for its tasks). Work requests are done based on the number of seconds you set the cache for. 
The project then says "Here's work for that amount of time", which BOINC then checks against the estimated number of floating point operations that each task takes. And if that value is woefully low, BOINC won't believe that it has enough work to fill that cache. So it will ask for more, until it decides it has enough. And then when it's running, these tasks run way, way longer than the estimated fpops said they would. Which is when those problems start. Still, in that case it's not a client problem. (Run a round with cpu_sched_debug on all those tasks and post that output.) Einstein is one of the projects that always sends too much work on 'new' machines, or machines that haven't run it in a while. Richard is still looking into that and wants scheduler reports, communications from the client and responses from the server with sched_op_debug set. (But in a separate thread please) |
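
To put rough numbers on the argument in the last post, here is a minimal sketch in Python of how an underestimated fpops value can overfill a cache. It is a toy model, not BOINC's actual work-fetch or rr_simulation code: the function names, the 4 GFLOPS host speed, and the tenfold underestimate are all assumptions made up for illustration. Only the 0.1 + 0.5 day buffer, the 16 threads, and the 27 trillion fpops example come from the thread.

```python
import math

SECONDS_PER_DAY = 86_400


def buffer_seconds(min_days: float, extra_days: float) -> float:
    """Convert the two buffer settings (in days) to seconds of work per CPU thread."""
    return (min_days + extra_days) * SECONDS_PER_DAY


def estimated_runtime(fpops_est: float, host_flops: float) -> float:
    """Estimated runtime in seconds: estimated operations divided by host speed."""
    return fpops_est / host_flops


def tasks_to_fill(buffer_s: float, ncpus: int, fpops_est: float, host_flops: float) -> int:
    """Number of tasks the toy client accepts before it believes the buffer is full."""
    target = buffer_s * ncpus  # seconds of work wanted across all threads
    return math.ceil(target / estimated_runtime(fpops_est, host_flops))


# Hypothetical host: 16 threads at 4 GFLOPS each (assumed figure).
NCPUS = 16
HOST_FLOPS = 4e9
TRUE_FPOPS = 27e12              # what a task really costs (example value from the thread)
CLAIMED_FPOPS = TRUE_FPOPS / 10  # assumed tenfold underestimate by the project

buf = buffer_seconds(0.1, 0.5)
honest = tasks_to_fill(buf, NCPUS, TRUE_FPOPS, HOST_FLOPS)
inflated = tasks_to_fill(buf, NCPUS, CLAIMED_FPOPS, HOST_FLOPS)

# Real days of work per thread once the underestimated tasks actually run.
real_days = inflated * estimated_runtime(TRUE_FPOPS, HOST_FLOPS) / NCPUS / SECONDS_PER_DAY

print(f"tasks with an honest estimate:  {honest}")
print(f"tasks with a 10x underestimate: {inflated}")
print(f"real days of work per thread:   {real_days:.1f}")
```

In this toy model the client behaves consistently in both cases: it stops asking once the estimated queue covers the buffer. The overshoot comes entirely from the estimate it was handed, which is the point made above about it being a project-side problem.
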
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.