What's wrong with the way deadlines work using BOINC?

O&O
Joined: 15 Jun 18
Posts: 12
Saudi Arabia
Message 86895 - Posted: 5 Jul 2018, 15:53:12 UTC
Last modified: 5 Jul 2018, 15:59:11 UTC

Hi,

This is an extract from a message board post by a moderator, not by BOINC.
Can someone from BOINC validate this, please?

What's wrong with the way deadlines work

The biggest problem is that the BOINC client isn't really good at scheduling tasks, especially when the user is running tasks from multiple projects. It frequently starts tasks too late, and they end up finishing after the deadline. On XXXXXX, you still get credit for late tasks (within reason), but that's not true at all projects.

Another problem is that sometimes the amount of time needed to process a task can vary a lot. One example is when you have both CPU and GPU apps. The best example is the GFN-WR tasks, which are currently GPU-only. The deadline for that is based on the speed of a moderate GPU. If we were to allow CPUs, we would need a much, much longer deadline. The result would be that it would take much longer to detect abandoned tasks, and we want to avoid that.

Along the same lines, sometimes the nature of the task itself means that it will take much longer to run than was anticipated. SR5 is the best example. These are fairly short tasks, except if you happen to find a prime. When a prime is found, LLR has to run additional tests, and the processing can take up to 10 times longer than normal. To prevent the server from sending out replacement tasks to other computers while your computer is doing that extra processing, we've given SR5 a ridiculously long 15 day deadline. If we didn't do that, you might discover a prime, but that extra replacement task that gets sent out could be processed by a faster computer and returned before your original task completes -- and then the other guy would get credit for discovering the prime instead of you. So we make the deadline really long for that 1-in-10,000 chance that a number might be a prime.

Furthermore, it's just nice to get results quickly. If you're waiting for credit, it's one thing to be waiting for the other guy to finish, but waiting many days (or weeks) when nobody is crunching the other task just wastes everybody's time.


Thanks
ID: 86895
O&O
Joined: 15 Jun 18
Posts: 12
Saudi Arabia
Message 86898 - Posted: 5 Jul 2018, 17:46:31 UTC - in response to Message 86895.  
Last modified: 5 Jul 2018, 18:08:45 UTC

My questions are, ...

a) Is it possible for a project under BOINC to send a WU that needs far more time to compute on a given host (e.g. 80 days) than its deadline allows (e.g. 20 days), so that the host CAN ONLY finish the computation after the deadline?

b) Isn't one of the attributes of a WU (rsc_fpops_est) an estimate of the average number of floating-point operations required to complete the computation, which is also used to estimate how long the computation will take on a given host?

If so, ...

c) Then when a project under BOINC is found to do so, doesn't this constitute an abuse of participant hosts according to CREATING BOINC PROJECTS?

Intentional abuse of participant hosts by projects
BOINC does nothing to prevent this (e.g. there is no 'sandboxing' of applications). Participants must understand that when they join a BOINC project, they are entrusting the security of their systems to that project.

Accidental abuse of participant hosts by projects
BOINC prevents some problems: for example, it detects when applications use too much disk space, memory, or CPU time, and aborts them. But applications are not 'sandboxed', so many types of accidental abuse are possible. Projects can minimize the likelihood by pre-released application testing. Projects should test their applications thoroughly on all platforms and with all input data scenarios before promoting them to production status.


Thanks,
O&O
ID: 86898
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 86899 - Posted: 5 Jul 2018, 18:16:50 UTC - in response to Message 86898.  

It is certainly possible for the project administrators, scientists, or researchers - the people, not the abstract project - to make a mistake and create tasks with a longer runtime than the assigned deadline (we were talking to one of them just the other day).

If your client receives such a task, it will do its best to complete the work as quickly as possible, running that task in 'Earliest Deadline First' mode and postponing other jobs.
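
A minimal sketch of what 'Earliest Deadline First' ordering means, assuming a single CPU that runs tasks back to back. The Task class, its fields and the sample numbers are hypothetical, and this is not the BOINC client's actual scheduling code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    remaining_hours: float  # estimated run time still needed (hypothetical field)
    deadline_hours: float   # hours from now until the deadline (hypothetical field)


def edf_order(tasks):
    """Return the tasks sorted so the one with the earliest deadline runs first."""
    return sorted(tasks, key=lambda t: t.deadline_hours)


def late_tasks(tasks):
    """Run tasks one after another in EDF order; return names that miss their deadline."""
    clock = 0.0
    late = []
    for task in edf_order(tasks):
        clock += task.remaining_hours
        if clock > task.deadline_hours:
            late.append(task.name)
    return late


tasks = [Task("A", 10, 48), Task("B", 30, 24), Task("C", 5, 72)]
print([t.name for t in edf_order(tasks)])
print(late_tasks(tasks))
```

In this made-up example task B needs 30 hours but is due in 24, so it finishes late even though it runs first. Reordering cannot rescue a task whose run time exceeds its deadline.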

The client will automatically abort a cached task if work hasn't even started on it by the time the deadline is reached. If work has already started on a task but it is unfinished when the deadline is reached, a warning will be printed in the Event Log at startup (these scenarios typically apply if the user switches the machine off and takes a holiday).

I thought that project servers (schedulers) wouldn't send a task if it was 'infeasible' (the word used) for the client to finish it before the deadline. But different projects have schedulers from different revisions, and I don't know the age of the PrimeGrid scheduler.
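
As a rough illustration of such a feasibility check, here is a sketch that assumes the run time is estimated as rsc_fpops_est divided by the host's effective speed. The function names, the on_fraction parameter and the 1 GFLOPS host are hypothetical; a real scheduler may take more factors into account.

```python
DAY = 86400  # seconds in a day


def estimated_runtime_seconds(rsc_fpops_est, host_flops, on_fraction=1.0):
    """Estimate run time from the task's operation count and the host speed.

    on_fraction is a hypothetical stand-in for the share of time the host
    is actually available for computing (0 < on_fraction <= 1).
    """
    if host_flops <= 0 or not 0 < on_fraction <= 1:
        raise ValueError("host_flops must be positive and on_fraction in (0, 1]")
    return rsc_fpops_est / (host_flops * on_fraction)


def is_feasible(rsc_fpops_est, host_flops, deadline_seconds, on_fraction=1.0):
    """True if the estimated run time fits inside the deadline."""
    runtime = estimated_runtime_seconds(rsc_fpops_est, host_flops, on_fraction)
    return runtime <= deadline_seconds


# The 80-day task with a 20-day deadline from the question above,
# on an assumed 1 GFLOPS host that computes around the clock.
host_flops = 1e9
fpops = 80 * DAY * host_flops
print(is_feasible(fpops, host_flops, 20 * DAY))
print(is_feasible(fpops, host_flops, 90 * DAY))
```

Under these assumptions the first call reports the task as infeasible and the second as feasible, which is the kind of comparison a scheduler could use to decide whether to send the task at all.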
ID: 86899
O&O
Joined: 15 Jun 18
Posts: 12
Saudi Arabia
Message 86901 - Posted: 5 Jul 2018, 18:46:47 UTC - in response to Message 86899.  
Last modified: 5 Jul 2018, 19:01:33 UTC

It is certainly possible for the project administrators, scientists, or researchers - the people, not the abstract project - to make a mistake and create tasks with a longer runtime than the assigned deadline (we were talking to one of them just the other day).

Thank you.

Too many "mistakes", which I don't have the time or inclination to prove, ... resulting in frequent waste of CPU resources.

Last Question please, ....

What is really then the use of "Store At Least x days of work" and "Store up to an additional y days of work"?
Mine were x = 0.26 days and y = 0.5 days, ... never more than that.


Regards,
O&O
ID: 86901
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 86903 - Posted: 5 Jul 2018, 19:09:29 UTC - in response to Message 86901.  

In the early days of the internet, when CPUs were slow and GPUs didn't exist, 10 days wasn't much work to cache, and people working away from home or on the road might be away from communications for days at a time.

Even now, some trades - submarine crew come to mind - may be unable to make personal use of the internet for months.
ID: 86903
jglrogujgv

Joined: 6 Jul 18
Posts: 49
Barbados
Message 86905 - Posted: 6 Jul 2018, 7:37:34 UTC - in response to Message 86901.  

Too many "mistakes", which I don't have the time or inclination to prove, ... resulting in frequent waste of CPU resources.

#MeToo.BOINC


What is really then the use of "Store At Least x days of work" and "Store up to an additional y days of work"?

It's a fudge factor. Use it to your benefit or don't. Nobody cares if you do; nobody cares if you don't.


Mine were x = 0.26 days and y = 0.5 days, ... never more than that.

Nobody cares what yours was set to. If your settings aren't working out for you then change them. Or don't change them, whatever.
ID: 86905
MarkJ
Volunteer tester
Help desk expert

Joined: 5 Mar 08
Posts: 272
Australia
Message 86922 - Posted: 7 Jul 2018, 23:02:07 UTC

The “store at least” in current clients is the cache low-water mark. The “store up to” becomes the high-water mark. So in your example you have 0.5 low-water and 0.76 as high-water days of work the client tries to keep cached.

Current server code stores estimates by app version after a machine has done at least 11 tasks for each app version. The older method was to store a single DCF (duration correction factor) value on the client, but it gets confused by different app versions, so estimates vary wildly. Which one is used depends on the server software, and a lot of projects are still running older versions.

I run multiple projects, but only one active at a time. I have a small cache setting and I toggle which one by setting no new tasks on the project tab in the manager. Some projects play nice together and others not. The ones that play nice usually have similar run times.
MarkJ
ID: 86922
Gary Roberts

Joined: 7 Sep 05
Posts: 130
Australia
Message 86925 - Posted: 8 Jul 2018, 0:22:44 UTC - in response to Message 86922.  

... you have 0.5 low-water and 0.76 as high-water days of work the client tries to keep cached.

That's actually a 0.26 low-water. It's the 'extra days' that is 0.5.

The other thing to point out is that the cached work isn't always 0.76. My understanding is that when the low mark is reached, the client 'fills up' to the high and doesn't keep topping up but allows the cache to drain until the low is again reached. You will have a work supply that oscillates between 0.26 and 0.76 (in theory). If you were using the cache as a protection against temporary failures in work supply, you'd probably be better off putting the full amount of 'protection' into the first setting only.
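
To picture that oscillation, here is a small simulation of the behaviour as I understand it, not the client's real work-fetch code. It assumes the cache drains at a steady rate and is refilled to the high mark as soon as it drops to the low mark. The function name and the step size are made up, and amounts are in hundredths of a day so the arithmetic stays exact.

```python
def simulate_cache(low, extra, step, steps):
    """Simulate cached work draining and being refilled.

    low   -- 'store at least' setting (low-water mark)
    extra -- 'store up to an additional' setting
    step  -- work consumed per time step
    steps -- number of time steps to simulate
    All amounts are integers in hundredths of a day.
    """
    high = low + extra  # high-water mark
    cached = high       # start with a full cache
    history = []
    for _ in range(steps):
        cached = max(0, cached - step)  # the host works through its cache
        if cached <= low:
            cached = high               # low mark reached: fill up to the high mark
        history.append(cached)
    return history


# 0.26 days 'at least' plus 0.5 days 'additional' gives a 0.76 day high mark.
history = simulate_cache(low=26, extra=50, step=10, steps=7)
print([h / 100 for h in history])
```

With these settings the cached amount falls from 0.76 towards 0.26 and then jumps back to 0.76, rather than sitting at 0.76 all the time.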
Cheers,
Gary.
ID: 86925


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.