Unfinished Tasks Question.

Message boards : BOINC Manager : Unfinished Tasks Question.

JRLatham
Joined: 30 May 20
Posts: 3
Message 98890 - Posted: 30 May 2020, 14:00:48 UTC

My BOINC screen shows a lot of tasks for the same project, and they are unfinished.
How do I get BOINC to finish these projects?
It appears that BOINC will finish a project and then start a new one rather than finish an unfinished task.
This is very disconcerting.
I have been a BOINC user / supporter for several years now.
James R. Latham (JRLatham)

robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1283
United Kingdom
Message 98896 - Posted: 30 May 2020, 14:31:52 UTC

Which projects are you trying to run?

Are there any error messages?

What is your hardware?

The answers to these questions will help others work out the most probable answers to your question.

In general, BOINC schedules the tasks it runs so that all the tasks on a computer finish before their due dates. This can result in some tasks being only partially completed before they are suspended, while tasks due in the very near future run to completion.
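A toy illustration of the idea, not BOINC's actual client code: on a single core, running tasks in earliest-deadline-first order is the classic way to check whether a set of tasks can all finish before they are due. The `Task` fields, task names and hours are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    remaining_hours: float  # estimated CPU time still needed (illustrative)
    deadline_hours: float   # hours from now until the task is due (illustrative)

def all_meet_deadlines(tasks: list[Task]) -> bool:
    """Single-core feasibility check: run tasks in earliest-deadline order
    and confirm that each one finishes before it is due."""
    elapsed = 0.0
    for task in sorted(tasks, key=lambda t: t.deadline_hours):
        elapsed += task.remaining_hours
        if elapsed > task.deadline_hours:
            return False
    return True

queue = [
    Task("long_deadline", remaining_hours=6, deadline_hours=72),
    Task("due_soon", remaining_hours=3, deadline_hours=5),
]
print(all_meet_deadlines(queue))  # True: due_soon runs first and finishes at hour 3
```

If no ordering can meet every deadline on one core, the earliest-deadline order cannot either, which is why this single pass is enough for the toy check.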



A little bit of nomenclature to help you understand what is going on:
BOINC - The environment in which a number of projects can be run (it does a lot of background work, but let's keep this simple for now).
Project - A science project which distributes an application and tasks to a host.
Application - The program that actually does the calculations.
Task - A little bit of data that the project sends to you for the application to work on to provide a result back to the project.
Host - A computer that is working on one or more tasks from one or more projects.

ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 98971 - Posted: 31 May 2020, 22:12:51 UTC
Last modified: 31 May 2020, 22:16:26 UTC

Unlike FAH, most projects aren't time-based.
That means it doesn't matter whether a job is finished in 1 hour or in 10; you'll still get the same credit.

The BOINC client has two mechanisms that work against completing a task in one go.
The first one is prioritizing WUs with an earlier deadline:
a mechanism that lets those WUs be done first, so they don't time out.
It's built in, and you can't disable it from the Manager.

The second is a setting you can adjust yourself in the Manager.
Some WUs aren't very stable. When they run, they can hit an error where the WU stalls and hogs a CPU core, preventing other processes from using that processing power.
To keep the CPU from sitting idle while you're not around to deal with the issue manually, the BOINC client has a setting, adjustable in the Manager, that controls how often it rotates between WUs.

In the Advanced view, it's found under Options >> Computing preferences >> Other >> Switch between tasks every X minutes.
If a WU takes 4 hours to complete and the setting is 60 minutes, it'll only do the first 25% of the WU before checking for another WU to run, and then come back to finish the rest of the WU in 3 more sessions. Sometimes it picks the same WU again, and you won't notice. Sometimes, because of the priority you've set for a project, it may want to finish another project's work first.

If you set this value to 999, BOINC won't auto-rotate most WUs anymore; however, if one WU is 'frozen', it could stay frozen (CPU core unused) for the rest of those 999 minutes.
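The arithmetic in the 4-hour / 60-minute example can be sketched as below. This is a simplified model that assumes each turn is one full switch interval and ignores checkpointing and other overhead; the function names are hypothetical.

```python
import math

def sessions_needed(task_minutes: float, switch_minutes: float) -> int:
    """Number of turns a task needs if each turn lasts one full
    switch interval (toy model: no checkpoint or restart overhead)."""
    return math.ceil(task_minutes / switch_minutes)

def fraction_per_session(task_minutes: float, switch_minutes: float) -> float:
    """Fraction of the task completed during one full turn."""
    return min(1.0, switch_minutes / task_minutes)

print(sessions_needed(240, 60))       # 4
print(fraction_per_session(240, 60))  # 0.25
print(sessions_needed(240, 999))      # 1
```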

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 98972 - Posted: 31 May 2020, 22:41:41 UTC - in response to Message 98971.  

> The first one is prioritizing WUs with an earlier deadline:
> a mechanism that lets those WUs be done first, so they don't time out.
> It's built in, and you can't disable it from the Manager.
You're talking about the scheduler, which is FIFO-based (first in, first out) and will only go into EDF (earliest deadline first) mode when tasks are in danger of missing their deadline. Normally that's the ones with the longer deadline, which is why people are always complaining that tasks with an earlier deadline are waiting to be done while BOINC runs tasks with a later deadline almost exclusively.

BOINC's main goal, which it always tries to follow, is to get all the work in its cache done by the individual deadlines. And when that's impossible, because the user is micromanaging or because there's so much work in the cache that it can't all be done by deadline, BOINC will still try to schedule as much of it for running as it can.
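A rough sketch of the "FIFO normally, EDF when deadlines are in danger" behaviour described above, on one core. It is a simplification, not BOINC's code: the real client reportedly uses a more detailed round-robin simulation across all cores and promotes only the endangered tasks, whereas this toy switches the whole queue. The `Task` fields and example values are made up.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    arrival: int            # download order (lower = earlier), illustrative
    remaining_hours: float  # estimated CPU time still needed
    deadline_hours: float   # hours from now until the task is due

def finishes_in_time(order: list[Task]) -> bool:
    """Simulate running the tasks back to back on one core."""
    elapsed = 0.0
    for task in order:
        elapsed += task.remaining_hours
        if elapsed > task.deadline_hours:
            return False
    return True

def choose_order(tasks: list[Task]) -> tuple[str, list[Task]]:
    """Use FIFO normally; fall back to earliest-deadline-first only
    if the FIFO order would make some task miss its deadline."""
    fifo = sorted(tasks, key=lambda t: t.arrival)
    if finishes_in_time(fifo):
        return "FIFO", fifo
    edf = sorted(tasks, key=lambda t: t.deadline_hours)
    return "EDF", edf

tasks = [
    Task("old_long_deadline", arrival=0, remaining_hours=10, deadline_hours=200),
    Task("new_short_deadline", arrival=1, remaining_hours=2, deadline_hours=8),
]
mode, order = choose_order(tasks)
print(mode, [t.name for t in order])
# EDF ['new_short_deadline', 'old_long_deadline']
```

In FIFO order the newer task would finish at hour 12, past its 8-hour deadline, so the toy switches to EDF.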

> To keep the CPU from sitting idle while you're not around to deal with the issue manually, the BOINC client has a setting, adjustable in the Manager, that controls how often it rotates between WUs.

> ...

> If you set this value to 999, BOINC won't auto-rotate most WUs anymore; however, if one WU is 'frozen', it could stay frozen (CPU core unused) for the rest of those 999 minutes.
So you tell the guy to adjust this value, first claiming it will keep the CPU from sitting idle, and then say the CPU may sit idle for the whole 999 minutes when a task has hung? What kind of advice is that?

JRLatham
Joined: 30 May 20
Posts: 3
Message 99028 - Posted: 1 Jun 2020, 23:49:41 UTC - in response to Message 98896.  

Here are the answers to your questions. - JRLatham

Which projects are you trying to run?
Ans. My Tasks drop-down box shows the following:

Distributed.net Client
Gamma-ray pulsar search #5
Gravitational Wave search 02 Multi-Directional GPU
Milkyway @ home separation

Are there any error messages?
Ans. No, there are no error messages.

What is your hardware?
Ans. I am running an Apple iMac with macOS 10.15.5 Catalina installed.
~~~

Keith T
Joined: 26 Feb 07
Posts: 71
United Kingdom
Message 99103 - Posted: 5 Jun 2020, 10:14:34 UTC - in response to Message 99028.  

You might have a few tasks that are close to their deadline.
As some others have said already, your tasks might get preempted by others that have a shorter deadline.
It is usually best to keep a smaller cache of tasks, particularly if you are running multiple projects.
Less than 1 day of cache is probably best.

ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 99140 - Posted: 5 Jun 2020, 20:20:31 UTC - in response to Message 98972.  
Last modified: 5 Jun 2020, 20:21:27 UTC

> So you tell the guy to adjust this value, first claiming it will keep the CPU from sitting idle, and then say the CPU may sit idle for the whole 999 minutes when a task has hung? What kind of advice is that?


Not sure, because you cut my response and took words out of context.
What you're saying makes no sense!
What I was saying is that adjusting this value (to a lower value) will keep the CPU from running a stalled WU for too long.

And setting it to 999 will stop BOINC from switching WUs for up to 999 minutes. But if a WU stalls, it'll stay stalled for a long time.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99145 - Posted: 5 Jun 2020, 21:22:33 UTC - in response to Message 99140.  
Last modified: 5 Jun 2020, 21:24:20 UTC

> Not sure, because you cut my response and took words out of context.
I'm sorry, I didn't mean to take it out of context. I used the ellipsis (...) to signal that I cut the long quote; people can look up the rest of the text, and you know what you wrote. All the double, triple, quadruple, etc. full quoting of whole posts is what makes threads totally unreadable, especially on mobile devices. Therefore it's easier (for me) to cut the quote down to the parts I'm reacting to.

> What I was saying is that adjusting this value (to a lower value) will keep the CPU from running a stalled WU for too long.

> And setting it to 999 will stop BOINC from switching WUs for up to 999 minutes. But if a WU stalls, it'll stay stalled for a long time.

But if a task is stalled, it doesn't matter what value you set here, because a stalled task doesn't get switched out. It's stalled. Either it stays running in memory even after BOINC has been shut down, or if you're lucky it eventually errors out for exceeding its set limits, or if you're unlucky it keeps running forever while BOINC runs. I've seen that happen with the new Rosetta 4.20 app and its Junior_HalfRoid tasks. These don't checkpoint very often, and when stuck, they'll be stuck for days. I had some that were 6.3 days over the deadline by the time I found out.

ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 99168 - Posted: 7 Jun 2020, 16:47:11 UTC - in response to Message 99145.  

> But if a task is stalled, it doesn't matter what value you set here, because a stalled task doesn't get switched out. It's stalled. Either it stays running in memory even after BOINC has been shut down, or if you're lucky it eventually errors out for exceeding its set limits, or if you're unlucky it keeps running forever while BOINC runs. I've seen that happen with the new Rosetta 4.20 app and its Junior_HalfRoid tasks. These don't checkpoint very often, and when stuck, they'll be stuck for days. I had some that were 6.3 days over the deadline by the time I found out.


...Which is why you can set the value to, e.g., 60. That way the client should swap out tasks every 60 minutes.
And sometimes doing another WU and then redoing the WU that got stuck will sort it out.
I have been running BOINC since the beginning of this year.
For the past 3 months I have been running it on 3x Ryzen 9 3000-series CPUs (and a 6-core Intel CPU as well). That's 86 threads, and I have yet to see the first WU get stuck or hog my CPU.

I've had Cosmology WUs (24) that tried to push 24 WUs through 1 CPU thread, leaving the other 23 threads idle.
That was about the only issue I've had so far, but I just cancelled all of them and stopped Cosmology from sending me that type of WU, because they are not present on the forums to address the issue.
Other than that, I've never encountered a WU that stalled, and if one did, it would give an error or perhaps time out.
I check my systems regularly.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99170 - Posted: 7 Jun 2020, 18:18:38 UTC - in response to Message 99168.  

> ...Which is why you can set the value to, e.g., 60. That way the client should swap out tasks every 60 minutes.
You're missing the point. When a task is stuck, it doesn't get swapped out. It can even survive a BOINC exit and restart by staying in memory outside the new BOINC instance. So advising people to set the "switch applications every N minutes" value to something ludicrously high doesn't help with stuck tasks. Once stuck, they don't swap out of memory or out of BOINC. At the end of whatever value you set for N, all the non-stuck tasks get swapped out, but the stuck task stays stuck; it doesn't go anywhere. It no longer reacts to anything BOINC tells it, except maybe an abort command, and even that doesn't always work.

About the only safe way to get rid of them is a reboot, if the OS allows it. I mean, my Windows 10 stops rebooting and waits for me to click an option if I leave Winamp open... go figure.

JRLatham
Joined: 30 May 20
Posts: 3
Message 99226 - Posted: 10 Jun 2020, 9:07:58 UTC - in response to Message 99103.  

Keith, how can I see the deadline on each of the tasks?
I have several tasks for each of these four:
Distributed.net Client
Gamma-ray pulsar search #5
Gravitational Wave search 02 Multi-Directional GPU
Milkyway @ home separation
Some event on my computer is causing an older task to be suspended and a new task to be started.
Could it be a restart after a computer crash?
Thanks for your help
Jim Latham

Gary Roberts
Joined: 7 Sep 05
Posts: 130
Australia
Message 99259 - Posted: 12 Jun 2020, 1:19:03 UTC - in response to Message 99226.  

> Keith, how can I see the deadline on each of the tasks?
The BOINC client runs in the background. To see exactly what it is doing, you need to open a separate program called BOINC Manager. In addition to observation, the Manager allows you to make changes to, or exert some control over, how the client is behaving. Until you know what you are doing, it's best to just use the Manager to observe.

If you have the Manager open, make sure you set it to "Advanced view" if you want to see and understand what is really happening. If you see just pretty pictures and little info, go to the 'View' menu and select "Advanced".

In the advanced view, there are a number of different 'tabs' labeled "Notices", "Projects", "Tasks", etc. Select the Tasks tab and you will be able to see lots of columns of information for a whole page of current tasks for your different projects. One of the columns will be headed "Deadline". That is how you see what all the competing deadlines are. You can easily adjust the window size and drag the column separators (between column headings) to see all the data for a particular column if the column has an inappropriate width.

You really should look at the available information on all the tabs if you want to have an understanding of what the client is doing and how it is handling different tasks. To know what different projects you are running, look at the Projects tab. If you want to know what applications you are running (quite a different matter) look at the "Application" column on the Tasks tab. In addition to the different tabs, the very top line of the screen has a range of menu items which you should make yourself familiar with.

Exploring the Advanced view in this manner will often allow you to answer your own questions. If there are things you don't understand, you could always consult the User manual which goes into detail about this very basic stuff, complete with pretty pictures.

> Some event on my computer is causing an older task to be suspended and a new task to be started.
> Could it be a restart after a computer crash?
No.

Maybe you have too large a work cache size and the client has gone into high priority mode rather than the more normal FIFO (First In First Out) mode so as not to miss deadlines if possible. That could easily happen if your machine has been off for a while and the client thinks that this low on-time will be normal behaviour for the future as well. That is the assumption it will make if there has been a period of not running.
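To show why a period of the machine being off can push the client toward high-priority mode, here is a toy single-core check (an assumption-laden sketch, not the client's real formula): the client can only count on the fraction of wall-clock time it expects the computer to be on, so the same task can look safe or endangered depending on that fraction. `on_fraction` and the numbers are illustrative.

```python
def deadline_endangered(remaining_cpu_hours: float,
                        hours_until_deadline: float,
                        on_fraction: float) -> bool:
    """Toy check: only the expected 'on' share of the wall-clock time
    until the deadline counts as usable CPU time (single core)."""
    usable_hours = hours_until_deadline * on_fraction
    return remaining_cpu_hours > usable_hours

# Same task, due in 48 hours, needing 10 more CPU hours:
print(deadline_endangered(10, 48, on_fraction=1.0))  # False: 48 usable hours
print(deadline_endangered(10, 48, on_fraction=0.1))  # True: only about 4.8 usable hours
```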
Cheers,
Gary.

Toombra
Joined: 29 Dec 08
Posts: 14
Canada
Message 99420 - Posted: 24 Jun 2020, 8:31:51 UTC

The other problem I'm seeing is that multi-core tasks will "hog" all the resources, and instead of getting more WUs done in general, a couple of multi-core WUs (often with later deadlines, in my case) take the focus.

Here is a screenshot that I took tonight, since I will be at work for most of the next day, and possibly out for the rest of that evening. Several of these Rosetta WUs are in danger of not being completed, because the scheduler wants to leave them "until the last minute" while it prioritizes the others, and then panics and prioritizes them to try to get them done. You can see that the multi-core Milkyway@Home WUs are even preventing the non-multi-core WUs from finishing.

https://imgur.com/a/2c3rXwo

I think deadlines should be more of a priority in the scheduler. Either give more leeway so that panic mode kicks in sooner, or just change the logic so the scheduler thinks "I should do this task that's due in 3 days rather than this task that's due in 2 weeks". It seems to me there are probably a lot of incomplete WUs in the pipeline from people who aren't running 24/7 machines or building systems around BOINC.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 99422 - Posted: 24 Jun 2020, 9:16:28 UTC - in response to Message 99420.  

> I think deadlines should be more of a priority in the scheduler.
Deadlines are the only thing BOINC cares about. But within reason: you, as the user, have the responsibility to allow BOINC to do all the work before the deadline, by pairing projects with similar deadlines and by not filling the cache to the point where it's absolutely impossible for BOINC to finish all that work before the deadlines.

Want to see how BOINC thinks? Enable the rr_simulation flag in Event Log Options.

PS: posting a screen shot of what your system is doing without saying what system you have isn't very serious.

Toombra
Joined: 29 Dec 08
Posts: 14
Canada
Message 100163 - Posted: 31 Jul 2020, 5:07:28 UTC - in response to Message 99422.  

> Deadlines are the only thing BOINC cares about. But within reason: you, as the user, have the responsibility to allow BOINC to do all the work before the deadline, by pairing projects with similar deadlines and by not filling the cache to the point where it's absolutely impossible for BOINC to finish all that work before the deadlines.


I just feel that BOINC should be getting as many WUs done as possible, and if my preferences screen tells it to store at least 0.1 days of work and up to an additional 0.1 days of work... then it should be managing its download requests in a way that gets the optimum number of WUs done.
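For reference, here is a simplified model of how the two cache settings interact, assuming they behave as a low-water / high-water pair: the client asks for work when the queue drops below the "at least" level and then tops up to "at least + additional". BOINC's real work fetch is done per resource and per project and is more involved; the function and the `ncpus` parameter are hypothetical. Note that 0.1 days is only 2.4 hours per CPU.

```python
def work_request_hours(queued_hours: float, ncpus: int,
                       min_days: float, additional_days: float) -> float:
    """Simplified work-fetch rule (hypothetical): request work only when the
    queue drops below the minimum buffer, then top up to min + additional."""
    low_water = min_days * 24 * ncpus
    high_water = (min_days + additional_days) * 24 * ncpus
    if queued_hours >= low_water:
        return 0.0
    return high_water - queued_hours

# 0.1 + 0.1 days on a 4-thread machine: low water 9.6 h, high water 19.2 h
print(work_request_hours(queued_hours=5.0, ncpus=4,
                         min_days=0.1, additional_days=0.1))  # about 14.2 hours
print(work_request_hours(queued_hours=12.0, ncpus=4,
                         min_days=0.1, additional_days=0.1))  # 0.0
```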

Right now, I'm watching as an mt WU has hijacked the system, while several other projects' WUs have now missed their deadlines because they were not allocated any CPU to get the work done in the allotted time (*also a reduction in general uptime due to heatwaves). For five whole days I watched while they could have been completed, after which the resources could have gone to the single mt task; however, the logic dictated otherwise.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.