Message boards : Questions and problems : High priority mode?
Author | Message |
---|---|
Send message Joined: 18 Feb 23 Posts: 36 |
I see sometimes Boinc does "high priority" on tasks it's running. But I cannot work out what this means. It doesn't do the shortest deadline first, or the shortest to run first, or anything I can see as a logical order. Anyone know what it actually does? |
Send message Joined: 5 Oct 06 Posts: 5121 |
The word "priority" is used in many different places, and with many different meanings, according to context - in both computing in general, and BOINC in particular. Please give us an example or two of the context for this particular usage. |
Send message Joined: 18 Feb 23 Posts: 36 |
It doesn't seem to show up in Boinc Manager, only in BoincTasks, though I can't remember for certain whether it appears in the Boinc Manager window. The status against a task says "Running High P" instead of "Running". It seems to do this when there is more work to do than will fit before the earliest deadline. It would be nice to know how it decides what to do, because it's never the way I would choose. For example, with 1 GPU: a Primegrid genefer extreme has 1 day to go and 17 days left on its deadline. When it gets just below 1 day, less than my 1 day buffer, the client tries to get the rare WCG GPU work. It manages to get a few. That's not enough, so it asks Primegrid and gets another extreme, which it (incorrectly) thinks takes 50 days (they always start like that, then go down to 4 days). So now we have: (1) an extreme with 22 hours to go on a 17 day deadline; (2) an extreme with "50" days to go on a 21 day deadline; (3) four WCG tasks with 15 minutes to go each on a 3 day deadline. It chose to do the short extreme first. How did it make this decision? Shortest deadline first I could understand, but then the WCG would be running. Shortest time to run first, to get as many things completed as possible, would make sense, but again the WCG would be running. Project weighting is far higher for WCG and it's done far less of it, so it can't be anything to do with that. I'd just like to know the calculation it does in deciding work in "panic mode". |
Send message Joined: 5 Oct 06 Posts: 5121 |
[quote]So now we have: Extreme 22 hours to go on 17 days deadline; Extreme "50" days to go on 21 days deadline *; Four of WCG 15 minutes to go each on 3 days deadline[/quote] * is the problem. It's the only one of the tasks you mention that - on the basis of the information that BOINC has been given by PrimeGrid - is at risk of missing its deadline. The most important policy directive that BOINC has in this situation is "avoid missing deadlines, at all costs". So the 50 day estimate task * gets first stab at the machine's resources, and only other tasks which can fit around it will be run. Before a task has started running, BOINC is only given two pieces of information about it: the number of floating point operations that will be needed to complete the task (estimated by the project staff in advance), and the speed of the device it will be running on (estimated by BOINC from the observed speed of previous tasks run by the same application on the same device). Nothing else. PrimeGrid could possibly make a better attempt to get those estimates right ... |
Send message Joined: 18 Feb 23 Posts: 36 |
[quote][quote]So now we have: Extreme 22 hours to go on 17 days deadline; Extreme "50" days to go on 21 days deadline *; Four of WCG 15 minutes to go each on 3 days deadline[/quote] * is the problem. It's the only one of the tasks you mention that - on the basis of the information that BOINC has been given by PrimeGrid - is at risk of missing its deadline. The most important policy directive that BOINC has in this situation is "avoid missing deadlines, at all costs".[/quote] It shouldn't be. Imagine a case where there really is a task which is late, maybe I turned the computer off for a few days. I see no point in running them in the order it does. The best it can do is get as many things done by the deadlines as it can. The obvious order in this case is WCG, then the extreme already started, then the new extreme (which since it's hardly started isn't much of a big deal if it's cancelled). I would still like to know why it chooses the order it does. I'm sure I'd read "earliest deadline first" in any such scenario, which would make sense, but it isn't doing that. [quote]So the 50 day estimate task * gets first stab at the machine's resources, and only other tasks which can fit around it will be run.[/quote] That would be daft, as it's the least likely to be any use when it's completed. It didn't do that anyway, it took the other extreme which was almost finished. [quote]Before a task has started running, BOINC is only given two pieces of information about it: the number of floating point operations that will be needed to complete the task (estimated by the project staff in advance), and the speed of the device it will be running on (estimated by BOINC from the observed speed of previous tasks run by the same application on the same device). Nothing else.[/quote] That can't be right. If I only run those tasks from Primegrid and nothing else, it gradually learns how long they take. But if I run a CPU task from them, it gets all messed up. Seems the client isn't able to record two different speeds, the CPU and the GPU. There should be a separate speed recorded for every app on every type of device it has. |
Send message Joined: 5 Oct 06 Posts: 5121 |
Someone above my pay grade will have to explain "why?": I just try to explain the "what?" |
Send message Joined: 18 Feb 23 Posts: 36 |
[quote]Someone above my pay grade will have to explain "why?": I just try to explain the "what?"[/quote] The "what" would be interesting. I want to know what choices it makes. I always thought it was earliest deadline first, but it seems there's something else going on. Earliest deadline first would have done the WCG tasks first, which would have been preferable, since they could get done really quick, needed finishing first, and were the highest weighted project. |
Send message Joined: 19 Apr 23 Posts: 16 |
In the world of the BOINC client scheduler, “earliest deadline” doesn’t mean “the deadline occurring first in the calendar”; it means “the task most likely to miss its deadline”. In your case, then (as Richard has already explained), “earliest deadline first” policy (which BoincTasks labels “high priority”) means the “50-day” tasks get to run, because they are projected to miss their deadline. Everything else is lower priority. The prioritisation algorithm is:
|
Send message Joined: 18 Feb 23 Posts: 36 |
[quote]In the world of the BOINC client scheduler, “earliest deadline” doesn’t mean “the deadline occurring first in the calendar”; it means “the task most likely to miss its deadline”.[/quote] I see, thanks. So similar to a workman doing the job for the customer who is jumping up and down the most. I guess it's just the why then, because doing something you're going to take over twice as long to complete instead of giving it to someone else is a bit daft. Especially when you then don't do things you could have completed in a timely fashion. It would be like a mechanic working on a 5 day fix for someone who needs it tomorrow, and neglecting 6 other customers he could change a tyre for in 10 minutes. Even better would be to fix the problem Richard pointed out in red. The client not keeping a note of different apps on different CPU/GPU types going at different speeds. I've been told this is to do with outdated server software, but I don't see why. The client knows how long those tasks take on that processor. Other apps on other processors should not affect it. |
Send message Joined: 19 Apr 23 Posts: 16 |
[quote]Especially when you then don't do things you could have[/quote] But it will do them. (Or at least it should…) Under the existing policy, and all else remaining equal: at some point during the next 3 days, the WCG tasks will become the ones most likely to miss their deadline (even with the others still in progress), and they will be given priority such that they complete in time. |
Send message Joined: 18 Feb 23 Posts: 36 |
[quote]Especially when you then don't do things you could have[/quote] Let's say the Primegrid one really is going to take 50 days. It will be the most behind until some point where WCG is also behind. So the WCG tasks are now sent back late too. It would be better to do stuff you know you can get done in time first. Think of each task as a customer. Why make them all angry instead of just one? |
Send message Joined: 19 Apr 23 Posts: 16 |
[quote]until some point where WCG is also behind[/quote] With a 1-day work buffer, that shouldn’t happen. The client will realise WCG needs prioritising before it gets behind, not once it gets behind – so (assuming the remaining-time estimates aren’t wildly wrong) the tasks will finish by their deadline. |
Send message Joined: 18 Feb 23 Posts: 36 |
[quote][quote]until some point where WCG is also behind[/quote] With a 1-day work buffer, that shouldn’t happen. The client will realise WCG needs prioritising before it gets behind, not once it gets behind – so (assuming the remaining-time estimates aren’t wildly wrong) the tasks will finish by their deadline.[/quote] You said earlier "the WCG tasks will become the ones most likely to miss their deadline" - but as the WCG deadline approaches, we now have Primegrid with 47 days to go on an 18 day deadline, and WCG with 15 minutes to go on a 2 minute deadline. So Primegrid is still the furthest behind and will continue to run. And I've seen it fail many times. Constantly finding tasks with a negative deadline. Anyway I'm sure WCG would appreciate their tasks getting done sooner. And since 15 minutes is a lot less than 50 days, do the shortest first? We had this sorted in the 80s in the NHS computer system back in the days of mainframes. |
Send message Joined: 19 Apr 23 Posts: 16 |
[quote]You said earlier "the WCG tasks will become the ones most likely to miss their deadline"[/quote] “Most likely” is perhaps inaccurate. At every scheduling point, BOINC predicts whether each task will miss its deadline. There is no relative grading of probability; it’s 0 or 100%. Right now, your PrimeGrid tasks get priority because under all conceivable scheduling choices, they will miss their deadline. In that situation, the client’s policy is to give them the chance to run – it allows for the remaining-time and achieved-performance estimates to be wrong (and perhaps to improve over time), and it is not permitted to abort the tasks simply because it believes it has no chance of finishing them. The WCG tasks do not get priority yet, because there is still plenty of time before the deadline to do the remaining work; the client does not need to favour them over the PrimeGrid tasks now, because it can reschedule later. [quote]WCG with 15 minutes to go on a 2 minute deadline[/quote] It should never get that close to the wire. With a work buffer of 1 day, the scheduler applies 1 day of padding to task deadlines (because it could run for that long before next contacting a server). So 1 day before the WCG tasks are due, the scheduler will predict that all tasks will miss their deadline, at which point they get prioritised in deadline order (the intuitive interpretation of “earliest deadline first”). [quote]I've seen it fail many times.[/quote] If you can get a concrete example of that, please capture the client state, feed it in to the Client Emulator, and raise a bug. [quote]I'm sure WCG would appreciate their tasks getting done sooner[/quote] If a project wants its tasks returned sooner, it needs to set earlier deadlines. BOINC is not a race; it does not care whether tasks finish 1 second or 1 week before their deadline. |
Send message Joined: 29 Aug 05 Posts: 15542 |
[quote]It would be better to do stuff you know you can get done in time first.[/quote] Doing that each time, BOINC will never do work for the other projects, when you always have a project with earlier deadlines than others. Just leave things well alone and you'll notice that the next time BOINC asks for work from Primegrid is going to be quite some time away, so as to give the other project(s) time to play catch up. And besides, BOINC projects use redundancy. Work is not just sent out to you, but to another computer as well. When one doesn't return the work in time, it's sent out to a 3rd computer. Until a canonical result comes back. You asked earlier as to why BOINC only has one scheduler for all projects' applications. First off, the scheduler comes from the time that computers had just a CPU and most projects had 1 application. Only later have new hardware options been added, like GPUs and multiple applications per project. Seeing how the development of BOINC was then and still is done by volunteers, and rewriting the scheduler from scratch to include all the new things people want is quite a job, it's been put on the back burner. Maybe one day. |
Send message Joined: 18 Feb 23 Posts: 36 |
[quote][quote]You said earlier "the WCG tasks will become the ones most likely to miss their deadline"[/quote] “Most likely” is perhaps inaccurate. At every scheduling point, BOINC predicts whether each task will miss its deadline. There is no relative grading of probability; it’s 0 or 100%. Right now, your PrimeGrid tasks get priority because under all conceivable scheduling choices, they will miss their deadline. In that situation, the client’s policy is to give them the chance to run – it allows for the remaining-time and achieved-performance estimates to be wrong (and perhaps to improve over time), and it is not permitted to abort the tasks simply because it believes it has no chance of finishing them. The WCG tasks do not get priority yet, because there is still plenty of time before the deadline to do the remaining work; the client does not need to favour them over the PrimeGrid tasks now, because it can reschedule later.[/quote] So, at each scheduling point, only Primegrid is to "miss deadline" until such time as WCG will also "miss deadline". Then they're both 100%. Best case scenario, WCG is a little bit late. It doesn't have to be. [quote][quote]WCG with 15 minutes to go on a 2 minute deadline[/quote] It should never get that close to the wire. With a work buffer of 1 day, the scheduler applies 1 day of padding to task deadlines (because it could run for that long before next contacting a server). So 1 day before the WCG tasks are due, the scheduler will predict that all tasks will miss their deadline, at which point they get prioritised in deadline order (the intuitive interpretation of “earliest deadline first”).[/quote] Ah, a further complication. So when I tell it to set a buffer, I'm unknowingly adjusting the deadlines. The plot thickens. [quote][quote]I've seen it fail many times.[/quote] If you can get a concrete example of that, please capture the client state, feed it in to the Client Emulator, and raise a bug.[/quote] Again? I'm tired of raising bugs. Bugs ought to be removed before the software goes to the customer. Did Microsoft write Boinc? [quote][quote]I'm sure WCG would appreciate their tasks getting done sooner[/quote] If a project wants its tasks returned sooner, it needs to set earlier deadlines. BOINC is not a race; it does not care whether tasks finish 1 second or 1 week before their deadline.[/quote] It's a game for projects to pick numbers which they think will cause Boinc to suit their purpose. Primegrid for example runs a system of secretly (without the client knowing) extending the deadlines if the task shows progress. |
Send message Joined: 18 Feb 23 Posts: 36 |
[quote][quote]It would be better to do stuff you know you can get done in time first.[/quote] Doing that each time, BOINC will never do work for the other projects, when you always have a project with earlier deadlines than others. Just leave things well alone and you'll notice that the next time BOINC asks for work from Primegrid is going to be quite some time away, so as to give the other project(s) time to play catch up.[/quote] I'm only talking about during panic mode. If there's too much to do, do the earliest things first. Except since WCG has sporadic work, as soon as WCG says no, Primegrid gets another "50" day task. [quote]And besides, BOINC projects use redundancy. Work is not just sent out to you, but to another computer as well. When one doesn't return the work in time, it's sent out to a 3rd computer. Until a canonical result comes back.[/quote] They would prefer not to have to wait. And a lot of projects don't even send it twice, that's only when checking is needed, a lot of apps self check. Some projects working on biology need a set of results back to create the next batch of work, they can't have stuff hanging around for a few deadlines' worth. [quote]You asked earlier as to why BOINC only has one scheduler for all projects' applications. First off, the scheduler comes from the time that computers had just a CPU and most projects had 1 application. Only later have new hardware options been added, like GPUs and multiple applications per project. Seeing how the development of BOINC was then and still is done by volunteers, and rewriting the scheduler from scratch to include all the new things people want is quite a job, it's been put on the back burner. Maybe one day.[/quote] It's a very important change, and one I was told has already been done, but for some reason requires the server end to also be updated?! And GPUs aren't that recent. |
Send message Joined: 28 Jun 10 Posts: 2636 |
The way I see it is that it is PrimeGrid's responsibility to either get better estimates or to increase the deadlines. Pretty sure there are other projects out there that mess things up as well, like CPDN, the main one I am involved in, which has deadlines of over a year for work that PhD students need back a lot quicker for their theses. Much tighter deadlines on some recent batches have worked well. The other thing I would do if it were such a big issue for me is use my account over at GitHub to put in a request for what you want. That would still need a developer to pick it up and run with it, but if they don't get the requests proving a demand, there isn't a chance of it being prioritised. |
Send message Joined: 12 May 23 Posts: 3 |
add |
Send message Joined: 28 Jun 10 Posts: 2636 |
Incidentally, I think Amicable numbers is guilty of overestimating the time tasks will take, certainly on their multi-CPU tasks, where it looks to me like the estimate is for the task running on just one processor rather than the six I am using. |
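
For anyone trying to follow the policy described in this thread, here is a minimal Python sketch of it. It is not BOINC's code. It assumes only what the posts above state: the initial run time estimate is the project's FLOP estimate divided by the device speed, each task gets a yes/no deadline-miss prediction against a deadline padded by the work buffer, and flagged tasks run earliest deadline first. The names `Task`, `estimated_runtime_seconds`, `predicted_to_miss` and `high_priority_order` are made up for illustration. Judging each task as if it had the device to itself is a simplifying assumption; the real client's prediction is presumably more involved. The "two days later" numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_days: float  # estimated run time still needed
    deadline_days: float   # days from "now" until the report deadline


def estimated_runtime_seconds(flops_estimate: float, device_flops_per_sec: float) -> float:
    """Initial estimate from the two inputs named in the thread."""
    return flops_estimate / device_flops_per_sec


def predicted_to_miss(task: Task, buffer_days: float) -> bool:
    """Binary prediction: True if the task cannot meet its padded deadline.

    Assumption (not from BOINC): the task is judged as if it had the
    device to itself, and the deadline is pulled in by the work buffer.
    """
    return task.remaining_days > task.deadline_days - buffer_days


def high_priority_order(tasks, buffer_days):
    """Tasks predicted to miss their deadline, earliest deadline first."""
    flagged = [t for t in tasks if predicted_to_miss(t, buffer_days)]
    return sorted(flagged, key=lambda t: t.deadline_days)


# The queue from the original question, with a 1 day work buffer.
now = [
    Task("PrimeGrid extreme (started)", 22 / 24, 17),
    Task('PrimeGrid extreme ("50" days)', 50, 21),
    Task("WCG GPU task", 15 / (24 * 60), 3),
]
now_flagged = [t.name for t in high_priority_order(now, buffer_days=1)]

# Hypothetical state two days later: the "50" day task has run and still
# shows 48 days, and the WCG task is now 1 day from its deadline.
later = [
    Task("PrimeGrid extreme (started)", 22 / 24, 15),
    Task('PrimeGrid extreme ("50" days)', 48, 19),
    Task("WCG GPU task", 15 / (24 * 60), 1),
]
later_flagged = [t.name for t in high_priority_order(later, buffer_days=1)]

print(now_flagged)
print(later_flagged)
```

With the thread's numbers, only the "50" day task is flagged at first. Two days later the WCG task is flagged as well and sorts ahead of it because its deadline is earlier, which matches the explanation given in the replies above.
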
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.