Tasks don't switch as scheduled...

Tess

Joined: 8 Aug 07
Posts: 5
United States
Message 11975 - Posted: 8 Aug 2007, 22:21:44 UTC

I hope I'm posting to the right category...

I've been running Boinc (currently 5.10.13 manager) for about 2 years now. Recently, it has stayed stuck on one task even though I am attached to 5 projects. The only way I can get it to switch from running one task to another is if I suspend the running task. What is going on?

Tess
ID: 11975 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 11976 - Posted: 8 Aug 2007, 22:48:36 UTC


There are several things it could be; the most likely is something called 'long term debt', which occurs if Boinc starts to panic about finishing the work unit by the deadline. The most common reason for this to happen is that the resource share for that project is too low, but there can be other reasons (DCF, deadlines reset to 1901, etc).


What projects do you run, what resource share do you give to each, and what are the deadlines for the longer tasks?
ID: 11976 · Report as offensive
Tess

Joined: 8 Aug 07
Posts: 5
United States
Message 11977 - Posted: 8 Aug 2007, 22:59:35 UTC - in response to Message 11976.  

climatepredict - Apr 20, 2008
PredictorHome - Aug 28, 2007
Rosetta - Aug 16, 2007
Seti - Aug 19, 2007
lhcat - no work currently

They just use default settings, I think. They are supposed to switch every 60 minutes.

Tess
ID: 11977 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 11978 - Posted: 8 Aug 2007, 23:00:29 UTC
Last modified: 8 Aug 2007, 23:03:37 UTC

It's not 'long term debt' that is in the way, nor short term debt, for that matter. No debts.

(For Mike: Long term debt decides when a project can download new work, short term debt decides when a result needs to be crunched next).

Since you, Tess, don't say which project's result it is or which other projects you are attached to, I am assuming that BOINC thinks the result is running out of time to reach its deadline. So BOINC decides that this result needs to get all available time to try to reach the deadline, even if the deadline is still days away.

This was called EDF or Earliest Deadline First crunching, but no notification of it was done for a while. In newer 5.10 versions it'll show as Running, High priority in the Tasks tab.

Just let it go. Don't stop it, as it'll only make sure you won't make the deadline for that project.

Always check the deadline date and time for results like this. They will likely be within the next 24 hours. If not, then their estimated time to completion is way too far off from the deadline. BOINC then runs the result exclusively until the estimated time to completion looks somewhat more normal, and only then makes room for other results again.
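The behaviour described above can be illustrated with a rough sketch. This is not BOINC's actual scheduler code: the `Task` class, the `at_risk` test, and all the numbers are made up for illustration. It only shows the idea of running a result exclusively once its normal share of CPU time is no longer enough to meet the deadline.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    cpu_hours_left: float     # estimated CPU time still needed
    hours_to_deadline: float  # wall-clock hours until the deadline
    share: float              # fraction of CPU time the project normally gets


def at_risk(task: Task) -> bool:
    """True if the task would miss its deadline at its normal share."""
    wall_hours_needed = task.cpu_hours_left / task.share
    return wall_hours_needed > task.hours_to_deadline


def pick_next(tasks: list[Task]) -> Task:
    """Run at-risk tasks earliest deadline first; otherwise keep list order."""
    risky = [t for t in tasks if at_risk(t)]
    if risky:
        return min(risky, key=lambda t: t.hours_to_deadline)
    return tasks[0]


# Made-up numbers: a short task and a long climate model, each at a 20% share.
tasks = [
    Task("short_task", cpu_hours_left=5, hours_to_deadline=240, share=0.2),
    Task("climate_model", cpu_hours_left=2000, hours_to_deadline=6000, share=0.2),
]
chosen = pick_next(tasks)
print(chosen.name)
```

With these invented figures the long model is picked even though its deadline is months away, because at a 20% share it would need more wall-clock time than remains.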
ID: 11978 · Report as offensive
Tess

Joined: 8 Aug 07
Posts: 5
United States
Message 11979 - Posted: 8 Aug 2007, 23:25:50 UTC - in response to Message 11978.  

I think it started happening after I came back from vacation last month. Before leaving for my vacation, I made sure to not fetch any new tasks. All remaining tasks were finished except climatepredict, because it has a huge long task. I was gone for 6 days. After I came back and resumed things, Boinc wouldn't fetch any new tasks. I had to detach and reattach the projects. Then new tasks were fetched but wouldn't start unless I suspended climatepredict.

ID: 11979 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 11980 - Posted: 9 Aug 2007, 0:04:53 UTC
Last modified: 9 Aug 2007, 0:08:24 UTC

If you occasionally suspend ClimatePrediction in order to get new work and also contribute to other projects, it won't do your climate model any harm because cpdn ignores the deadlines and accepts results uploaded later. It's the only project to do this, but boinc doesn't realise.

As long as you don't complete it too late.....

When you eventually need a new climate model, there are now shorter ones also available. You can select in your cpdn project preferences. A shorter model might suit your computer usage and mix of boinc projects better. You can find out what's now available here.

We do need to finish our current models first though.


ID: 11980 · Report as offensive
Tess

Joined: 8 Aug 07
Posts: 5
United States
Message 11981 - Posted: 9 Aug 2007, 0:13:04 UTC - in response to Message 11980.  


mo.v,

That's good to know. Thanks!

Tess
ID: 11981 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 11982 - Posted: 9 Aug 2007, 1:02:09 UTC - in response to Message 11979.  
Last modified: 9 Aug 2007, 1:03:15 UTC

Who told you to detach and re-attach to get work again?
I ask as BOINC is a delicate piece of work, which will determine by itself if projects need work or not. Those other projects would have got work again if only you had allowed BOINC to run its course. It was probably just relieving itself of the amount of long term debt that it had accumulated as you and BOINC had put preferences on the other projects.

Anyway, don't detach and re-attach. Try a rest first.
Yet while CPDN doesn't care about its deadline but still sets one on its models (mo.v, Mike, why isn't that deadline set to 2018?), all other projects do like their results back by the deadline they set. Each minute that another result takes away from the CPDN model because that result is due, the CPDN model will eventually take back to make ITs deadline... even though CPDN doesn't have a deadline as such.

Just curious, which projects do you run?
ID: 11982 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 11983 - Posted: 9 Aug 2007, 1:27:36 UTC
Last modified: 9 Aug 2007, 1:38:39 UTC

Tess listed her projects above - you probably missed this as you were posting at the same moment.

'climatepredict - Apr 20, 2008
PredictorHome - Aug 28, 2007
Rosetta - Aug 16, 2007
Seti - Aug 19, 2007
lhcat - no work currently'


A very few BBC-cpdn members do seem to think their model deadline is 2018, judging by some reports in the thread about the race to produce the last completed model. And guess what? I believe MikeMars has a model with a marginal chance of winning. He's deliberately holding this model back as long as possible and at the last moment will make it race to the finish.

Tess, I'm not advising you to do the same.

Seriously though, the deadline for the 160-year models is a year, the aim being to induce in members a certain sense of urgency......

Jorden, are you generally advising a rest or a reset or both?

ID: 11983 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 11986 - Posted: 9 Aug 2007, 7:16:14 UTC
Last modified: 9 Aug 2007, 7:28:01 UTC


I keep suspending the BBC model whenever new CPDN-beta applications come out. So the poor thing sits in a corner for weeks at a time, unfed and unloved... the RSPCM will no doubt come banging at my door any day now.

Getting back on topic, if Boinc had been shut down during the vacation, then the '%time computer is running' figure will be low, and would confuse the scheduler (until the figures adjust themselves back to what they should be). Could you provide a link to the host on CPDN?

We've discussed asking for a '<ignore-deadline/>' flag to be implemented in the scheduler so that work units can request simple scheduling instead (sticking to resource share regardless of their own deadlines). This would also bypass the 1901 deadline issue if it occurs.
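To illustrate the point about the '%time computer is running' figure: the sketch below assumes the figure is kept as a decaying average, with a made-up half-life of 10 days. That averaging rule and every number here are assumptions for illustration, not BOINC's real bookkeeping. It shows why a few days of downtime can inflate the wall-clock estimate for a task, and why the estimate drifts back once the client runs normally again.

```python
HALF_LIFE_DAYS = 10.0  # hypothetical decay rate, not a BOINC constant


def update_on_fraction(avg: float, was_on: bool, days: float) -> float:
    """Move the running average toward 1.0 (on) or 0.0 (off) over `days`."""
    weight = 0.5 ** (days / HALF_LIFE_DAYS)
    target = 1.0 if was_on else 0.0
    return avg * weight + target * (1.0 - weight)


def wall_hours_estimate(cpu_hours_left: float, on_fraction: float) -> float:
    """Wall-clock hours needed if the client only runs `on_fraction` of the time."""
    return cpu_hours_left / on_fraction


before = 0.95  # assumed figure before the vacation
after_vacation = update_on_fraction(before, was_on=False, days=6)
recovered = update_on_fraction(after_vacation, was_on=True, days=6)

for label, frac in [("before", before),
                    ("after vacation", after_vacation),
                    ("six days later", recovered)]:
    hours = wall_hours_estimate(100, frac)
    print(f"{label}: on fraction {frac:.2f}, 100 CPU hours need about {hours:.0f} wall hours")
```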

ID: 11986 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 11990 - Posted: 9 Aug 2007, 10:29:46 UTC
Last modified: 9 Aug 2007, 10:38:53 UTC

Mike, would the idea be that

a) all cpdn workunits would automatically bypass the sophisticated/flexible boinc scheduler?

b) or would the scheduler only implement a fixed resource share for cpdn if the member selected this option?

c) in the case of both a) and b), would sophisticated/flexible scheduling be retained for workunits from other projects?

d) or do we want all crunchers to have a choice between the previous simple round-robin scheduler and the current flexible one?


Tess, if a very long workunit is preventing you from getting work from other projects, the usual procedure is to suspend the extralong WU using the Suspend button in the Tasks tab. In extreme circumstances you could reset the projects that refuse to fetch new work, but it should never be necessary to detach and reattach to projects in order to get new work.

However, if you just allow the boinc scheduler to do its own thing, it will try to share crunching time fairly between projects in the longer term, and it will usually succeed. But if the computer is a slow one, or doesn't crunch for much of the time, or the member has attached to a large number of projects apart from cpdn, this can make it difficult or even impossible for the current scheduler to achieve the time shares you want, even in the longer term.

If you can provide a link to your computer on cpdn as Mike suggests, he should be able to see whether the crunching you want it to do is realistic or overstretching things a bit.
ID: 11990 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 12008 - Posted: 9 Aug 2007, 18:36:47 UTC


a) from Carl's comments the other day. So CPDN work units would be going strictly by resource share, but the other work units on the same PC from different projects would still be looking at deadlines etc (assuming that the other projects didn't use the new option).

User wouldn't get a choice, but the end result is they'd have more control since they could set CPDN to whatever resource share they like and it'd stick to that in both the short term and long term (rather than just the long term as it is now).

However I haven't seen that idea on the dev mailing list or in /trac, so I don't know if it's going to progress.
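As a rough picture of what "going strictly by resource share" would mean, here is a small sketch. It assumes the five projects listed earlier in the thread all have the same resource share of 100; the function name and the 24-hour example are made up for illustration and say nothing about how such an option would really be implemented.

```python
def share_fractions(resource_shares: dict[str, float]) -> dict[str, float]:
    """Turn resource share values into fractions of total crunching time."""
    total = sum(resource_shares.values())
    return {name: share / total for name, share in resource_shares.items()}


# Assumed: every project left at the same resource share of 100.
shares = {name: 100.0 for name in
          ["climatepredict", "PredictorHome", "Rosetta", "Seti", "lhcat"]}

fractions = share_fractions(shares)
hours_per_day = {name: 24 * frac for name, frac in fractions.items()}

for name, hours in hours_per_day.items():
    print(f"{name}: {fractions[name]:.0%} of the time, {hours:.1f} hours per day")
```

Under strict sharing, each of five equally weighted projects would get a fifth of the crunching time regardless of how close any deadline is.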
ID: 12008 · Report as offensive
Tess

Joined: 8 Aug 07
Posts: 5
United States
Message 12010 - Posted: 9 Aug 2007, 18:57:23 UTC
Last modified: 9 Aug 2007, 18:58:25 UTC

I think boinc has finally run long enough since my vacation. It seems to bear out what most people say here - climatepredict is the culprit. When I suspended its task and left the others alone, they were eventually scheduled properly.

(Sorry, it's not possible to link to my computer. But I'm reconciled with the situation. I will periodically let climatepredict loose until it finishes its task. I made it not get any new task - I'll probably take a rest from it until I get a newer computer.)

Thank you all for your ideas and information!
ID: 12010 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 12013 - Posted: 9 Aug 2007, 20:36:13 UTC

The idea was just for you to give us the link to your cpdn account which would show the details of what work the computer's doing on boinc projects. You can open boinc manager and in the Tasks tab highlight Climate Prediction. If you then click on Your Account you see a page like this

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_user.php?userid=21936
If you can complete your current model it will be greatly appreciated at cpdn. We know what an effort each model entails.

We're hoping that a choice of cpdn model lengths will now be available for quite a while.
ID: 12013 · Report as offensive
Carl Christensen

Joined: 5 Jun 07
Posts: 9
Message 12014 - Posted: 9 Aug 2007, 21:18:52 UTC - in response to Message 12013.  

review things, rather than trying to change the already complicated & working fairly well scheduling on BOINC, we'll just have to bump up future boinc workunits to be years (I have just been using a default

<delay_bound> 30000000 </delay_bound>

which I put in the template ages ago, 30000000 being the number of seconds, which is about a year (347 days) --- it's just a number I came up with once long ago and has never been changed as there have always been 100 things more important to work on etc.) so I'll tell Tolu & Milo to bump this up by a factor of 5 (I don't have anything to do with "day to day" CPDN anymore).
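The arithmetic behind those figures can be checked with a few lines. This is only a unit conversion of the delay_bound value quoted above; the helper name is made up.

```python
SECONDS_PER_DAY = 86_400


def delay_bound_days(seconds: int) -> float:
    """Convert a delay_bound given in seconds to days."""
    return seconds / SECONDS_PER_DAY


current = 30_000_000     # the value quoted above
proposed = current * 5   # bumped up by a factor of 5

current_days = delay_bound_days(current)
proposed_days = delay_bound_days(proposed)
proposed_years = proposed_days / 365.25

print(round(current_days, 1))    # 347.2
print(round(proposed_days, 1))   # 1736.1
print(round(proposed_years, 1))  # 4.8
```

So the current value is about 347 days, and five times that is a little under five years.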
ID: 12014 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 12018 - Posted: 9 Aug 2007, 22:34:08 UTC - in response to Message 11983.  

Jorden, are you generally advising a rest or a reset or both?

One of these days I throw this keyboard out the window. I swar. :-)
ID: 12018 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 12020 - Posted: 10 Aug 2007, 9:49:31 UTC

Carl, is increasing the time to completion by a factor of as much as 5 a good idea? This would give crunchers the impression that if their current models aren't finished until 2012, ie when the London Olympics are held, the results would still be useful to the researchers, which I doubt. Wouldn't a factor of 2 keep the boinc flexible scheduler happier while letting crunchers know that getting their trickles and zip files to Oxford in reasonable time does matter?

I think a factor of 5 could cause some other problems too.

1) It could encourage cpdn members to commit to more projects than is realistic, bearing in mind that no current or planned cpdn workunits could be described as small.

2) If crunchers discover eg while the marathon is being run in London that they've been crunching cpdn part-time but uselessly for 2 years because the researchers no longer need their results, they could be quite irate. If Oxford terminates their models before the end because the results are no longer needed, they could also be irate.

3) I could make cpdn News announcements about this every week for years and copy them all over the forums, but a high proportion of crunchers remain unreachable.

4) If before the deadline eg in 2009 we sent mass emails to members or announced in the graphics window that particular workunit results are in fact needed soon, this would contradict the time to completion info in the boinc manager and make some members irate. If they had to override the boinc task scheduler in order to comply, some would also be irate.


Unless, of course, the model results really will still be useful to Oxford in 2012.


ID: 12020 · Report as offensive


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.