Thread 'Starting New Tasks with Many Tasks in "waiting to run" State'

Message boards : Questions and problems : Starting New Tasks with Many Tasks in "waiting to run" State

d_j_liu

Joined: 9 Jun 08
Posts: 4
United States
Message 34865 - Posted: 23 Sep 2010, 18:10:49 UTC

On computers running multiple projects, a project sometimes has several tasks in the "waiting to run" state, yet the scheduler starts new tasks of that project instead of resuming the old tasks that are waiting for their turn.

Is this a bug?
ID: 34865
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15573
Netherlands
Message 34869 - Posted: 23 Sep 2010, 19:44:56 UTC - in response to Message 34865.  

BOINC will always try to finish all work before its deadline, so it can happen that it starts newer tasks before continuing with ones that have already run for a bit.
ID: 34869
d_j_liu

Joined: 9 Jun 08
Posts: 4
United States
Message 34876 - Posted: 23 Sep 2010, 23:07:47 UTC - in response to Message 34869.  

I just noticed that the BOINC manager did not have the "leave applications in memory while suspended" option enabled. Let's see what happens if I enable it.
ID: 34876
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15573
Netherlands
Message 34878 - Posted: 23 Sep 2010, 23:15:04 UTC - in response to Message 34876.  

Then it will leave all the tasks that are stopping and going to "waiting to run" in memory. Have enough tasks and you could fill up your memory. It won't 'force' tasks to completion.
ID: 34878
d_j_liu

Joined: 9 Jun 08
Posts: 4
United States
Message 34892 - Posted: 24 Sep 2010, 15:58:18 UTC - in response to Message 34878.  

Yes, you are right -- the total number of running and waiting tasks decreased, but it still exceeds the number of CPUs in the system.

Thanks.
ID: 34892
manuel oliveira

Joined: 6 Feb 10
Posts: 18
Portugal
Message 35093 - Posted: 4 Oct 2010, 10:54:31 UTC - in response to Message 34892.  
Last modified: 4 Oct 2010, 11:12:52 UTC

In my opinion this is a bug. As I have already mentioned elsewhere in this forum, it is not normal to start work unit after work unit without finishing any of them, so that they run up against their deadlines. It's a complete mess...
After downgrading to 6.10.17, the same work is performed in order (FIFO). This is easy to see when crunching very small work units.
This happens to me while working for the EDGeS@home and Ibercivis projects using 6.10.5x versions on both Linux and Microsoft OSes.
Regards.
ID: 35093
manuel oliveira

Joined: 6 Feb 10
Posts: 18
Portugal
Message 35330 - Posted: 21 Oct 2010, 18:30:06 UTC - in response to Message 35093.  
Last modified: 21 Oct 2010, 18:30:59 UTC

I am now using version 6.10.58 and all is OK!
Regards.
ID: 35330
manuel oliveira

Joined: 6 Feb 10
Posts: 18
Portugal
Message 35562 - Posted: 31 Oct 2010, 10:39:59 UTC - in response to Message 35330.  

Unfortunately, after working well for some time, this fault returned on both Windows and Linux machines. So I am now using 6.10.17 / 18 without issues, 100% OK.
Regards.
ID: 35562
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15573
Netherlands
Message 35563 - Posted: 31 Oct 2010, 11:53:57 UTC - in response to Message 35562.  

The developers are working on a new tack: http://boinc.berkeley.edu/trac/wiki/ClientSchedOctTen
ID: 35563
manuel oliveira

Joined: 6 Feb 10
Posts: 18
Portugal
Message 35565 - Posted: 31 Oct 2010, 18:06:04 UTC - in response to Message 35563.  
Last modified: 31 Oct 2010, 18:14:06 UTC

Thank you for your reply.
I would like to add that this is happening even on computers running only WUs of one application of "just one" project, so the cause of this issue may be something other than what is described in http://boinc.berkeley.edu/trac/wiki/ClientSchedOctTen

Regards

-------
Part of http://boinc.berkeley.edu/trac/wiki/ClientSched
This looks like a problem somewhere in...

CPU scheduling policy

The CPU scheduler uses an earliest-deadline-first (EDF) policy for results that are in danger of missing their deadline, and weighted round-robin among other projects if additional CPUs exist. This allows the client to meet deadlines that would otherwise be missed, while honoring resource shares over the long term. The scheduling policy is:

1. Set the 'anticipated debt' of each project to its short-term debt
2. Let P be the project with the earliest-deadline runnable result among projects with deadlines_missed(P)>0. Let R be P's earliest-deadline runnable result not scheduled yet. Tiebreaker: least index in result array.
3. If such an R exists, schedule R, decrement P's anticipated debt, and decrement deadlines_missed(P).
4. If there are more CPUs, and projects with deadlines_missed(P)>0, go to 1.
5. If all CPUs are scheduled, stop.
6. If there is a result R that is currently running, and has been running for less than the CPU scheduling period, schedule R and go to 5.
7. Find the project P with the greatest anticipated debt, select one of P's runnable results (picking one that is already running, if possible, else the one received first from the project) and schedule that result.
8. Decrement P's anticipated debt by the 'expected payoff' (the scheduling period divided by NCPUS).
9. Go to 5.

The CPU scheduler runs when a result is completed, when the end of the user-specified scheduling period is reached, when new results become runnable, or when the user performs a UI interaction (e.g. suspending or resuming a project or result).
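
As a rough illustration only, the nine steps quoted above could be sketched in Python as follows. This is not the BOINC client code. The `Result` class, the `schedule` function and all field names are made up for the example. Two things are assumptions on my part: the size of the debt decrement in step 3 (the text does not say, so the expected payoff is used), and step 2 is simplified to "earliest-deadline unscheduled result among projects with deadlines_missed > 0".

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical record for one task; the names are not BOINC's.
    name: str
    project: str
    deadline: float         # seconds until the deadline
    received: float         # arrival time; smaller means received earlier
    running: bool = False
    run_time: float = 0.0   # seconds run so far in the current period


def schedule(results, short_term_debt, deadlines_missed, ncpus, period):
    """Return the names of the results chosen to run, in the order chosen."""
    debt = dict(short_term_debt)      # step 1: anticipated debt
    missed = dict(deadlines_missed)
    payoff = period / ncpus           # 'expected payoff' from step 8
    chosen = []                       # indices into results

    def free(i):
        return i not in chosen

    # Steps 2-4: earliest deadline first for projects in deadline trouble.
    while len(chosen) < ncpus:
        edf = [(r.deadline, i) for i, r in enumerate(results)
               if free(i) and missed.get(r.project, 0) > 0]
        if not edf:
            break
        _, i = min(edf)               # ties go to the least index
        chosen.append(i)
        debt[results[i].project] -= payoff   # assumed decrement size
        missed[results[i].project] -= 1

    # Step 6: keep running results that have run for less than one period.
    for i, r in enumerate(results):
        if len(chosen) < ncpus and free(i) and r.running and r.run_time < period:
            chosen.append(i)

    # Steps 7-9: fill the remaining CPUs by greatest anticipated debt.
    while len(chosen) < ncpus:
        left = [i for i in range(len(results)) if free(i)]
        if not left:
            break
        project = max((results[i].project for i in left), key=lambda p: debt[p])
        mine = [i for i in left if results[i].project == project]
        # Prefer an already-running result, else the one received first.
        i = min(mine, key=lambda i: (not results[i].running, results[i].received))
        chosen.append(i)
        debt[project] -= payoff

    return [results[i].name for i in chosen]
```

In this sketch, a freshly downloaded task with an earlier deadline is picked ahead of an older, partly finished task whenever its project has deadlines_missed > 0, because the EDF branch looks only at deadlines and not at progress.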

CPU schedule enforcement

The CPU scheduler decides what results should run, but it doesn't enforce this decision. This enforcement is done by a separate scheduler enforcement function, which is called by the CPU scheduler at its conclusion. Let X be the set of scheduled results that are not currently running, let Y be the set of running results that are not scheduled, and let T be the time the scheduler last ran. The enforcement policy is as follows:

1. If deadline_missed(R) for some R in X, then preempt a result in Y, and run R (preempt the result with the least CPU wall time since checkpoint). Repeat as needed.
2. If there is a result R in Y that checkpointed more recently than T, then preempt R and run a result in X.
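
To make the two enforcement rules above easier to reason about, here is a minimal Python sketch of them. It is only my reading of the quoted text, not the client's actual function. `Task`, `enforce` and the field names are hypothetical, task names are assumed to be unique, and the "least CPU wall time since checkpoint" is modelled as `now - last_checkpoint`.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical record; the names are invented for this sketch.
    name: str
    deadline_missed: bool = False
    last_checkpoint: float = 0.0   # wall-clock time of the last checkpoint


def enforce(scheduled, running, now, last_sched_time):
    """Return (preempted, started) as lists of task names."""
    running_names = {t.name for t in running}
    scheduled_names = {t.name for t in scheduled}
    x = [t for t in scheduled if t.name not in running_names]   # should run, is not running
    y = [t for t in running if t.name not in scheduled_names]   # is running, should not
    preempted, started = [], []

    # Rule 1: results in X that would miss their deadline are started first,
    # preempting the task in Y with the least wall time since its checkpoint.
    for r in [t for t in x if t.deadline_missed]:
        if not y:
            break
        victim = min(y, key=lambda t: now - t.last_checkpoint)
        y.remove(victim)
        x.remove(r)
        preempted.append(victim.name)
        started.append(r.name)

    # Rule 2: a task in Y that has checkpointed since the scheduler last ran
    # (time T) is preempted in favour of a result still waiting in X.
    for victim in [t for t in y if t.last_checkpoint > last_sched_time]:
        if not x:
            break
        r = x.pop(0)
        y.remove(victim)
        preempted.append(victim.name)
        started.append(r.name)

    return preempted, started
```

Under this reading, a task in Y that has not checkpointed since T is left running until its next checkpoint, even though it is no longer in the scheduled set.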
(...something wrong in the scheduler enforcement function?)
ID: 35565
Chris

Joined: 11 Nov 10
Posts: 1
United Kingdom
Message 35692 - Posted: 11 Nov 2010, 13:17:20 UTC - in response to Message 35565.  
Last modified: 11 Nov 2010, 13:19:46 UTC

I am getting really annoyed with this bug.

I have hundreds of WU's that are 50-99.9% complete but the scheduler ignores them and starts up another fresh one.

After a lot of analysis, my opinion is that the scheduler just goes and finds the next WU with the lowest amount of work done instead of the highest.

You can test this by suspending everything except a few WU's with varying % complete....

When the scheduler moves on to one of these "waiting to run" tasks, it will invariably pick the WU with the lowest % complete every time *grrr*

I have this problem with 6.10.58 x32 and x64...

In my opinion there is something wrong with the scheduler enforcement function...
ID: 35692

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.