Thread 'High priority mode?'

Message boards : Questions and problems : High priority mode?

Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111709 - Posted: 3 May 2023, 11:27:28 UTC

I sometimes see BOINC put tasks it's running into "high priority", but I cannot work out what this means. It doesn't do the shortest deadline first, or the shortest to run first, or anything I can see as a logical order. Does anyone know what it actually does?
ID: 111709
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111710 - Posted: 3 May 2023, 11:54:10 UTC - in response to Message 111709.  

The word "priority" is used in many different places, and with many different meanings, according to context - in both computing in general, and BOINC in particular.

Please give us an example or two of the context for this particular usage.
ID: 111710
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111711 - Posted: 3 May 2023, 12:22:40 UTC - in response to Message 111710.  
Last modified: 3 May 2023, 12:23:03 UTC

It seems to show up only in BoincTasks; I can't remember whether it also appears in the BOINC Manager window.

Status in BoincTasks against a task says "Running High P" instead of "Running". It seems to do this when there is more work to do than it thinks it can finish by the earliest deadline.

Would be nice to know how it decides what to do, because it's never the way I would choose it.

For example, with 1 GPU:
Primegrid genefer extreme 1 day to go, 17 days left on deadline.
Gets just below 1 day, less than my 1 day buffer, so it tries to get the rare WCG GPU work.
It manages to get a few.
Not enough, so it asks Primegrid, gets another extreme, which it (incorrectly) thinks takes 50 days (they always start like that then go down to 4 days).
So now we have:
Extreme 22 hours to go on 17 days deadline
Extreme "50" days to go on 21 days deadline
Four of WCG 15 minutes to go each on 3 days deadline

It chose to do the short extreme first. How did it make this decision? Shortest deadline first I could understand but then the WCG would be running. Shortest time to run first to get as many things completed would make sense, but again the WCG would be running. Project weighting is far higher for WCG and it's done far less of it, so it can't be anything to do with that.

I'd just like to know the calculation it does in deciding work in "panic mode".
ID: 111711
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111712 - Posted: 3 May 2023, 13:26:08 UTC - in response to Message 111711.  

> So now we have:
> Extreme 22 hours to go on 17 days deadline
> Extreme "50" days to go on 21 days deadline (* will miss deadline)
> Four of WCG 15 minutes to go each on 3 days deadline

* is the problem.

It's the only one of the tasks you mention that - on the basis of the information that BOINC has been given by PrimeGrid - is at risk of missing its deadline. The most important policy directive that BOINC has in this situation is "avoid missing deadlines, at all costs".

So the 50 day estimate task * gets first stab at the machine's resources, and only other tasks which can fit around it will be run.

Before a task has started running, BOINC is only given two pieces of information about it: the number of floating point operations that will be needed to complete the task (estimated by the project staff in advance), and the speed of the device it will be running on (estimated by BOINC from the observed speed of previous tasks run by the same application on the same device). Nothing else.

PrimeGrid could possibly make a better attempt to get those estimates right ...
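
The estimate described above boils down to one division: operations needed, divided by operations per second. A minimal sketch, assuming a made-up device speed of 1 TFLOPS and made-up operation counts chosen only so that the results match the "50 days that later becomes 4 days" mentioned in this thread. The function and variable names are illustrative and are not BOINC's:

```python
SECONDS_PER_DAY = 86_400


def estimated_days(fpops_est: float, device_flops: float) -> float:
    """Estimated runtime in days: operations needed / operations per second."""
    return fpops_est / device_flops / SECONDS_PER_DAY


# Hypothetical figures, not taken from PrimeGrid or BOINC.
device_flops = 1e12          # assumed speed estimate for the GPU (1 TFLOPS)
inflated_fpops = 4.32e18     # an operation count that is 12.5x too high
realistic_fpops = 3.456e17   # what the task would actually need

initial_estimate = estimated_days(inflated_fpops, device_flops)   # about 50 days
settled_estimate = estimated_days(realistic_fpops, device_flops)  # about 4 days

deadline_days = 21
looks_late = initial_estimate > deadline_days    # flagged as a deadline miss
really_late = settled_estimate > deadline_days   # would in fact finish in time
```

With only those two inputs available, an operation count that is too high is enough on its own to make a 4-day task look like a certain deadline miss.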
ID: 111712
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111713 - Posted: 3 May 2023, 13:52:47 UTC - in response to Message 111712.  
Last modified: 3 May 2023, 13:54:27 UTC

> > So now we have:
> > Extreme 22 hours to go on 17 days deadline
> > Extreme "50" days to go on 21 days deadline (* will miss deadline)
> > Four of WCG 15 minutes to go each on 3 days deadline
>
> * is the problem.

It shouldn't be. Imagine a case where there really is a task which is late, maybe because I turned the computer off for a few days. I see no point in running them in the order it does. The best it can do is get as many things done by the deadlines as it can. The obvious order in this case is WCG, then the extreme already started, then the new extreme (which, since it's hardly started, isn't much of a big deal if it's cancelled).

> It's the only one of the tasks you mention that - on the basis of the information that BOINC has been given by PrimeGrid - is at risk of missing its deadline. The most important policy directive that BOINC has in this situation is "avoid missing deadlines, at all costs".

I would still like to know why it chooses the order it does. I'm sure I'd read that "earliest deadline first" applies in any such scenario, which would make sense, but it isn't doing that.

> So the 50 day estimate task * gets first stab at the machine's resources, and only other tasks which can fit around it will be run.

That would be daft, as it's the least likely to be any use when it's completed. It didn't do that anyway; it took the other extreme, which was almost finished.

> Before a task has started running, BOINC is only given two pieces of information about it: the number of floating point operations that will be needed to complete the task (estimated by the project staff in advance), and the speed of the device it will be running on (estimated by BOINC from the observed speed of previous tasks run by the same application on the same device). Nothing else.

That can't be right. If I only run those tasks from PrimeGrid and nothing else, it gradually learns how long they take. But if I run a CPU task from them, it gets all messed up. It seems the client isn't able to record two different speeds, one for the CPU and one for the GPU. There should be a separate speed recorded for every app on every type of device it has.
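
The bookkeeping being asked for here can be sketched as a table keyed by (app, device type). This only illustrates the suggestion: the class, method names and figures are hypothetical, and it says nothing about what the BOINC client or server actually stores.

```python
from collections import defaultdict


class SpeedTable:
    """Keeps a running mean of observed speed for each (app, device) pair."""

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, app: str, device: str, flops: float) -> None:
        """Add one observed speed for this app on this device type."""
        key = (app, device)
        self._total[key] += flops
        self._count[key] += 1

    def estimate(self, app: str, device: str):
        """Mean observed speed, or None if this pairing has never run."""
        key = (app, device)
        count = self._count.get(key, 0)
        if count == 0:
            return None
        return self._total[key] / count


# Hypothetical observations: the CPU figure never touches the GPU figure.
speeds = SpeedTable()
speeds.record("genefer", "gpu", 1e12)
speeds.record("genefer", "gpu", 3e12)
speeds.record("genefer", "cpu", 5e10)

gpu_speed = speeds.estimate("genefer", "gpu")
cpu_speed = speeds.estimate("genefer", "cpu")
unknown = speeds.estimate("other_app", "gpu")
```

Keeping the pairs separate is the whole point: a slow CPU run of the same app cannot drag down the GPU estimate.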
ID: 111713
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5121
United Kingdom
Message 111714 - Posted: 3 May 2023, 14:06:58 UTC - in response to Message 111713.  

Someone above my pay grade will have to explain "why?": I just try to explain the "what?"
ID: 111714
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111715 - Posted: 3 May 2023, 14:26:06 UTC - in response to Message 111714.  

> Someone above my pay grade will have to explain "why?": I just try to explain the "what?"

The "what" would be interesting. I want to know what choices it makes. I always thought it was earliest deadline first, but it seems there's something else going on. Earliest deadline first would have done the WCG tasks first, which would have been preferable, since they could get done really quick, needed finishing first, and were the highest weighted project.
ID: 111715
Brian Nixon

Joined: 19 Apr 23
Posts: 16
United Kingdom
Message 111716 - Posted: 3 May 2023, 15:16:53 UTC - in response to Message 111715.  
Last modified: 3 May 2023, 15:32:01 UTC

In the world of the BOINC client scheduler, “earliest deadline” doesn’t mean “the deadline occurring first in the calendar”; it means “the task most likely to miss its deadline”.

In your case, then (as Richard has already explained), the “earliest deadline first” policy (which BoincTasks labels “high priority”) means the “50-day” tasks get to run, because they are projected to miss their deadline. Everything else is lower priority.

The prioritisation algorithm is:

  1. favor jobs in danger of deadline miss
  2. favor coproc jobs, so that e.g. if we're RAM-limited we'll use the GPU instead of the CPU
  3. favor jobs in the middle of time slice, or that haven't checkpointed since start of time slice
  4. for CPU jobs, favor jobs that use more CPUs
  5. favor jobs selected first by schedule_cpus() (e.g., because their project has high sched priority)
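
One way to read the five rules above is as a lexicographic sort key, where a later rule only matters when all earlier ones tie. This is a sketch of that reading, not the client's C++ code; the `Job` fields and the example jobs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    deadline_miss: bool     # rule 1: projected to miss its deadline
    uses_coproc: bool       # rule 2: uses a GPU or other coprocessor
    unfinished_slice: bool  # rule 3: mid time slice, or no checkpoint since it began
    ncpus: float            # rule 4: CPUs used (only compared for CPU jobs)
    selection_order: int    # rule 5: position in which the job was first selected


def priority_key(job: Job):
    """Smaller tuples run first; tuples are compared rule by rule."""
    return (
        not job.deadline_miss,
        not job.uses_coproc,
        not job.unfinished_slice,
        0 if job.uses_coproc else -job.ncpus,
        job.selection_order,
    )


def run_order(jobs):
    """Return the jobs sorted from highest to lowest priority."""
    return sorted(jobs, key=priority_key)


# Hypothetical queue, listed in the order the jobs were first selected.
jobs = [
    Job("cpu_single", False, False, False, 1, 0),
    Job("cpu_multi", False, False, False, 4, 1),
    Job("gpu_on_time", False, True, False, 1, 2),
    Job("gpu_late", True, True, False, 1, 3),
]
order = [job.name for job in run_order(jobs)]
```

In this example the job projected to miss its deadline comes first even though it was selected last, and the 4-CPU job outranks the single-CPU one.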



If you speak C++, you can see the implementation here.

ID: 111716
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111717 - Posted: 3 May 2023, 15:30:54 UTC - in response to Message 111716.  

> In the world of the BOINC client scheduler, “earliest deadline” doesn’t mean “the deadline occurring first in the calendar”; it means “the task most likely to miss its deadline”.
>
> In your case, then (as Richard has already explained), “earliest deadline first” policy (which BoincTasks labels “high priority”) means the “50-day” tasks get to run, because they are projected to miss their deadline. Everything else is lower priority.
>
> If you speak C++, you can see the prioritisation algorithm here.

I see, thanks. So similar to a workman doing the job for the customer who is jumping up and down the most.

I guess it's just the why then, because doing something you're going to take over twice as long to complete instead of giving it to someone else is a bit daft. Especially when you then don't do things you could have completed in a timely fashion. It would be like a mechanic working on a 5 day fix for someone who needs it tomorrow, and neglecting 6 other customers he could change a tyre for in 10 minutes.

Even better would be to fix the problem Richard pointed out in red: the client not keeping a note of different apps on different CPU/GPU types running at different speeds. I've been told this is to do with outdated server software, but I don't see why. The client knows how long those tasks take on that processor. Other apps on other processors should not affect it.
ID: 111717
Brian Nixon

Joined: 19 Apr 23
Posts: 16
United Kingdom
Message 111719 - Posted: 3 May 2023, 16:19:02 UTC - in response to Message 111717.  

> Especially when you then don't do things you could have

But it will do them. (Or at least it should…) Under the existing policy, and all else remaining equal: at some point during the next 3 days, the WCG tasks will become the ones most likely to miss their deadline (even with the others still in progress), and they will be given priority such that they complete in time.
ID: 111719
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111720 - Posted: 3 May 2023, 16:30:48 UTC - in response to Message 111719.  

> > Especially when you then don't do things you could have
>
> But it will do them. (Or at least it should…) Under the existing policy, and all else remaining equal: at some point during the next 3 days, the WCG tasks will become the ones most likely to miss their deadline (even with the others still in progress), and they will be given priority such that they complete in time.

Let's say the PrimeGrid one really is going to take 50 days. It will be the most behind until some point where WCG is also behind. So the WCG tasks are now sent back late too. It would be better to do stuff you know you can get done in time first. Think of each task as a customer. Why make them all angry instead of just one?
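
The "keep as many customers happy as possible" idea has a textbook form: the Moore-Hodgson algorithm, which maximises the number of jobs finished by their deadlines on a single machine. The sketch below applies it to the figures from this thread. It is a different technique from the one BOINC uses, shown only to make the suggestion concrete; the task names and tuple layout are illustrative.

```python
import heapq


def max_on_time(tasks):
    """Moore-Hodgson on (name, run_days, deadline_days) tuples.

    Returns (on_time, dropped): the largest set of tasks that can all finish
    by their deadlines on one device, in the order to run them, and the rest.
    """
    kept = []       # max-heap on run time, stored as (-run_days, name, deadline)
    dropped = []
    elapsed = 0.0
    for name, run, deadline in sorted(tasks, key=lambda t: t[2]):
        heapq.heappush(kept, (-run, name, deadline))
        elapsed += run
        if elapsed > deadline:
            # Too much work to fit: give up on the longest task taken so far.
            neg_run, long_name, _ = heapq.heappop(kept)
            elapsed += neg_run
            dropped.append(long_name)
    on_time = [name for _, name, _ in sorted(kept, key=lambda k: (k[2], k[1]))]
    return on_time, dropped


# Figures from this thread, in days.
tasks = [
    ("extreme (started)", 22 / 24, 17),
    ("extreme (new)", 50, 21),
    ("WCG 1", 0.25 / 24, 3),
    ("WCG 2", 0.25 / 24, 3),
    ("WCG 3", 0.25 / 24, 3),
    ("WCG 4", 0.25 / 24, 3),
]
on_time, dropped = max_on_time(tasks)
```

On these numbers it runs the four WCG tasks, then the extreme already started, and sets the "50 day" extreme aside, which is the order proposed earlier in the thread. "Dropped" here only means "run last or give up on"; whether a client should ever do that is a separate question.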
ID: 111720
Brian Nixon

Joined: 19 Apr 23
Posts: 16
United Kingdom
Message 111721 - Posted: 3 May 2023, 17:05:39 UTC - in response to Message 111720.  

> until some point where WCG is also behind

With a 1-day work buffer, that shouldn’t happen. The client will realise WCG needs prioritising before it gets behind, not once it gets behind – so (assuming the remaining-time estimates aren’t wildly wrong) the tasks will finish by their deadline.
ID: 111721
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111722 - Posted: 3 May 2023, 17:39:05 UTC - in response to Message 111721.  
Last modified: 3 May 2023, 17:45:21 UTC

> > until some point where WCG is also behind
>
> With a 1-day work buffer, that shouldn’t happen. The client will realise WCG needs prioritising before it gets behind, not once it gets behind – so (assuming the remaining-time estimates aren’t wildly wrong) the tasks will finish by their deadline.

You said earlier "the WCG tasks will become the ones most likely to miss their deadline" - but as the WCG deadline approaches, we now have PrimeGrid with 47 days to go on an 18 day deadline, and WCG with 15 minutes to go on a 2 minute deadline. So PrimeGrid is still the furthest behind and will continue to run.

And I've seen it fail many times. Constantly finding tasks with a negative deadline.

Anyway I'm sure WCG would appreciate their tasks getting done sooner. And since 15 minutes is a lot less than 50 days, do the shortest first? We had this sorted in the 80s in the NHS computer system back in the days of mainframes.
ID: 111722
Brian Nixon

Joined: 19 Apr 23
Posts: 16
United Kingdom
Message 111723 - Posted: 3 May 2023, 19:00:09 UTC - in response to Message 111722.  

> You said earlier "the WCG tasks will become the ones most likely to miss their deadline"

“Most likely” is perhaps inaccurate. At every scheduling point, BOINC predicts whether each task will miss its deadline. There is no relative grading of probability; it’s 0 or 100%. Right now, your PrimeGrid tasks get priority because under all conceivable scheduling choices, they will miss their deadline. In that situation, the client’s policy is to give them the chance to run – it allows for the remaining-time and achieved-performance estimates to be wrong (and perhaps to improve over time), and it is not permitted to abort the tasks simply because it believes it has no chance of finishing them. The WCG tasks do not get priority yet, because there is still plenty of time before the deadline to do the remaining work; the client does not need to favour them over the PrimeGrid tasks now, because it can reschedule later.

> WCG with 15 minutes to go on a 2 minute deadline

It should never get that close to the wire. With a work buffer of 1 day, the scheduler applies 1 day of padding to task deadlines (because it could run for that long before next contacting a server). So 1 day before the WCG tasks are due, the scheduler will predict that all tasks will miss their deadline, at which point they get prioritised in deadline order (the intuitive interpretation of “earliest deadline first”).

> I've seen it fail many times.

If you can get a concrete example of that, please capture the client state, feed it into the Client Emulator, and raise a bug.

> I'm sure WCG would appreciate their tasks getting done sooner

If a project wants its tasks returned sooner, it needs to set earlier deadlines. BOINC is not a race; it does not care whether tasks finish 1 second or 1 week before their deadline.
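
The two-stage behaviour described earlier in this post (a yes/no deadline-miss prediction against a deadline padded by the work buffer, then deadline order among the flagged tasks) can be sketched as follows. This is a deliberate simplification that tests each task as if it had the device to itself; the real client's prediction is more involved. The names, and the assumption that the new extreme's estimate has fallen to 48 days after two days, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_days: float  # estimated run time still needed
    deadline_days: float   # time from now until the deadline


def projected_to_miss(task: Task, buffer_days: float) -> bool:
    """Yes/no: cannot finish before the padded deadline even if run alone."""
    return task.remaining_days > task.deadline_days - buffer_days


def high_priority_order(tasks, buffer_days):
    """Tasks flagged as deadline misses, earliest deadline first."""
    flagged = [t for t in tasks if projected_to_miss(t, buffer_days)]
    return sorted(flagged, key=lambda t: t.deadline_days)


BUFFER_DAYS = 1.0

today = [
    Task("PrimeGrid extreme (started)", 22 / 24, 17),
    Task("PrimeGrid extreme (new)", 50, 21),
    Task("WCG", 0.25 / 24, 3),
]
# Two days on, assuming the new extreme has been running in the meantime.
two_days_later = [
    Task("PrimeGrid extreme (started)", 22 / 24, 15),
    Task("PrimeGrid extreme (new)", 48, 19),
    Task("WCG", 0.25 / 24, 1),
]

now_names = [t.name for t in high_priority_order(today, BUFFER_DAYS)]
later_names = [t.name for t in high_priority_order(two_days_later, BUFFER_DAYS)]
```

Today only the "50 day" extreme is flagged. Once the WCG deadline is within the 1-day buffer, WCG is flagged too and sorts ahead of the extreme because its deadline is earlier.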
ID: 111723
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15542
Netherlands
Message 111724 - Posted: 3 May 2023, 19:07:34 UTC - in response to Message 111720.  

> It would be better to do stuff you know you can get done in time first.

Doing that each time, BOINC will never do work for the other projects when you always have a project with earlier deadlines than others. Just leave things well alone and you'll notice that the next time BOINC asks for work from PrimeGrid is going to be quite some time away, so as to give the other project(s) time to play catch up.

And besides, BOINC projects use redundancy. Work is not just sent out to you, but to another computer as well. When one doesn't return the work in time, it's sent out to a 3rd computer. Until a canonical result comes back.

You asked earlier why BOINC only has one scheduler for all projects' applications. First off, the scheduler comes from the time when computers had just a CPU and most projects had one application. Only later were new hardware options such as GPUs, and multiple applications per project, added. BOINC was then, and still is, developed by volunteers, and rewriting the scheduler from scratch to include all the new things people want is quite a job, so it's been put on the back burner. Maybe one day.
ID: 111724
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111725 - Posted: 3 May 2023, 19:28:24 UTC - in response to Message 111723.  
Last modified: 3 May 2023, 19:29:14 UTC

> > You said earlier "the WCG tasks will become the ones most likely to miss their deadline"
>
> “Most likely” is perhaps inaccurate. At every scheduling point, BOINC predicts whether each task will miss its deadline. There is no relative grading of probability; it’s 0 or 100%. Right now, your PrimeGrid tasks get priority because under all conceivable scheduling choices, they will miss their deadline. In that situation, the client’s policy is to give them the chance to run – it allows for the remaining-time and achieved-performance estimates to be wrong (and perhaps to improve over time), and it is not permitted to abort the tasks simply because it believes it has no chance of finishing them. The WCG tasks do not get priority yet, because there is still plenty of time before the deadline to do the remaining work; the client does not need to favour them over the PrimeGrid tasks now, because it can reschedule later.

So, at each scheduling point, only PrimeGrid is to "miss deadline" until such time as WCG will also "miss deadline". Then they're both 100%. Best case scenario, WCG is a little bit late. It doesn't have to be.

> > WCG with 15 minutes to go on a 2 minute deadline
>
> It should never get that close to the wire. With a work buffer of 1 day, the scheduler applies 1 day of padding to task deadlines (because it could run for that long before next contacting a server). So 1 day before the WCG tasks are due, the scheduler will predict that all tasks will miss their deadline, at which point they get prioritised in deadline order (the intuitive interpretation of “earliest deadline first”).

Ah, a further complication. So when I tell it to set a buffer, I'm unknowingly adjusting the deadlines. The plot thickens.

> > I've seen it fail many times.
>
> If you can get a concrete example of that, please capture the client state, feed it in to the Client Emulator, and raise a bug.

Again? I'm tired of raising bugs. Bugs ought to be removed before the software goes to the customer. Did Microsoft write BOINC?

> > I'm sure WCG would appreciate their tasks getting done sooner
>
> If a project wants its tasks returned sooner, it needs to set earlier deadlines. BOINC is not a race; it does not care whether tasks finish 1 second or 1 week before their deadline.

It's a game for projects to pick numbers which they think will cause BOINC to suit their purpose. PrimeGrid, for example, runs a system of secretly (without the client knowing) extending the deadlines if the task shows progress.
ID: 111725
Lucas Dobre

Joined: 18 Feb 23
Posts: 36
Message 111726 - Posted: 3 May 2023, 19:33:33 UTC - in response to Message 111724.  
Last modified: 3 May 2023, 19:34:04 UTC

> > It would be better to do stuff you know you can get done in time first.
>
> Doing that each time, BOINC will never do work for the other projects, when you always have a project with earlier deadlines than others.

I'm only talking about during panic mode. If there's too much to do, do the earliest things first.

> Just leave things well alone and you'll notice that the next time BOINC asks for work from Primegrid is going to be quite some time away, so as to give the other project(s) time to play catch up.

Except since WCG has sporadic work, as soon as WCG says no, PrimeGrid gets another "50" day task.

> And besides, BOINC projects use redundancy. Work is not just sent out to you, but to another computer as well. When one doesn't return the work in time, it's sent out to a 3rd computer. Until a canonical result comes back.

They would prefer not to have to wait. And a lot of projects don't even send it twice; that's only when checking is needed, and a lot of apps self-check. Some projects working on biology need a set of results back to create the next batch of work; they can't have stuff hanging around for a few deadlines' worth.

> You asked earlier as to why BOINC only has one scheduler for all project's applications. First off, the scheduler comes from the time that computers had just a CPU and most projects had 1 application. Only later have new hardware options been added, like GPUs and multiple applications per project. Seeing how the development of BOINC was then and still is done by volunteers and rewriting the scheduler from scratch to include all the new things people want is quite a job, it's been put on the back burner. Maybe one day.

It's a very important change, and one I was told has already been done, but for some reason requires the server end to also be updated?! And GPUs aren't that recent.
ID: 111726
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111729 - Posted: 4 May 2023, 18:53:15 UTC - in response to Message 111726.  

The way I see it, it is PrimeGrid's responsibility either to get better estimates or to increase the deadlines. I'm pretty sure there are other projects out there that mess things up as well. CPDN, the main one I am involved in, has had deadlines of over a year for work that PhD students need back a lot quicker for their theses. Much tighter deadlines on some recent batches have worked well.

The other thing I would do, if it were such a big issue for me, is use my GitHub account to put in a request for what you want. That would still need a developer to pick it up and run with it, but if they don't get requests proving a demand, there isn't a chance of it being prioritised.
ID: 111729
Convert

Joined: 12 May 23
Posts: 3
France
Message 111785 - Posted: 12 May 2023, 10:10:14 UTC - in response to Message 111710.  

add
ID: 111785
Dave
Help desk expert

Joined: 28 Jun 10
Posts: 2636
United Kingdom
Message 111790 - Posted: 12 May 2023, 16:42:36 UTC

Incidentally, I think Amicable Numbers is guilty of overestimating the time tasks will take, certainly on their multi-CPU tasks, where it looks to me like the estimate assumes the task is running on just one processor rather than the six I am using.
ID: 111790

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.