Message boards : BOINC client : Tasks don't switch as scheduled...
Author | Message |
---|---|
Send message Joined: 8 Aug 07 Posts: 5 |
I hope I'm posting to the right category... I've been running Boinc (currently 5.10.13 manager) for about 2 years now. Recently, it has stayed stuck on one task even though I am attached to 5 projects. The only way I can get it to switch from running one task to another is if I suspend the running task. What is going on? Tess |
Send message Joined: 16 Apr 06 Posts: 386 |
There are several things it could be. The most likely is something called 'long term debt', which occurs if Boinc starts to panic about finishing the work unit by the deadline. The most common reason for this to happen is that the resource share for that project is too low, but there can be other reasons (DCF, deadlines reset to 1901, etc). What projects do you run, what resource share do you give to each, and what are the deadlines for the longer tasks? |
Send message Joined: 8 Aug 07 Posts: 5 |
climatepredict - Apr 20, 2008; PredictorHome - Aug 28, 2007; Rosetta - Aug 16, 2007; Seti - Aug 19, 2007; lhcat - no work currently. They just use default settings, I think. They are supposed to switch every 60 minutes. Tess |
Send message Joined: 29 Aug 05 Posts: 15574 |
It's not 'long term debt' that is in the way, nor short term debt, for that matter. No debts. (For Mike: long term debt decides when a project can download new work, short term debt decides which result needs to be crunched next.) Since you, Tess, don't say which project's result it is or which other projects you are attached to, I am assuming that BOINC thinks the result is running out of time to reach its deadline. So BOINC decides that this result needs to get all available time to try to reach the deadline, even if the deadline is still days away. This is called EDF, or Earliest Deadline First, crunching, but for a while BOINC gave no notification of it. In newer 5.10 versions it'll show as Running, High priority in the Tasks tab. Just let it go. Don't stop it, as that will only make sure you won't make the deadline for that project. Always check the deadline date and time for results like this. They will likely be within the next 24 hours. If not, then their estimated time to completion is way too far off from the deadline time. BOINC then runs the result exclusively and waits until the time to completion looks somewhat more normal before releasing itself to other results. |
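As a rough illustration of the deadline check described above, here is a simplified Python sketch. It is not BOINC's actual scheduler code: the function name, its parameters, and the idea of dividing the remaining CPU time by the project's normal share of the CPU are all assumptions made for this example.

```python
def needs_high_priority(remaining_cpu_hours, hours_to_deadline, cpu_share):
    """Return True if a result would miss its deadline at its normal CPU share.

    remaining_cpu_hours: estimated CPU time the result still needs
    hours_to_deadline:   wall-clock hours left until the report deadline
    cpu_share:           fraction (0..1] of CPU time the project normally gets
    (All three names are hypothetical, chosen for this sketch.)
    """
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in the range (0, 1]")
    # Wall-clock hours needed if the result only receives its normal share.
    projected_hours = remaining_cpu_hours / cpu_share
    return projected_hours > hours_to_deadline
```

With these assumptions, a result that still needs 100 CPU hours, has 240 hours (10 days) until its deadline, and normally gets a fifth of the CPU would be flagged, because 100 / 0.2 = 500 hours is more than 240. That is how a result can be run exclusively even though its deadline is still days away.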
Send message Joined: 8 Aug 07 Posts: 5 |
I think it started happening after I came back from vacation last month. Before leaving for my vacation, I made sure to not fetch any new tasks. All remaining tasks were finished except climatepredict, because it has a huge long task. I was gone for 6 days. After I came back and resumed things, Boinc wouldn't fetch any new tasks. I had to detach and reattach the projects. Then new tasks were fetched but wouldn't start unless I suspended climatepredict. |
Send message Joined: 13 Aug 06 Posts: 778 |
If you occasionally suspend ClimatePrediction in order to get new work and also contribute to other projects, it won't do your climate model any harm because cpdn ignores the deadlines and accepts results uploaded later. It's the only project to do this, but boinc doesn't realise. As long as you don't complete it too late..... When you eventually need a new climate model, there are now shorter ones also available. You can select in your cpdn project preferences. A shorter model might suit your computer usage and mix of boinc projects better. You can find out what's now available here. We do need to finish our current models first though. |
Send message Joined: 8 Aug 07 Posts: 5 |
"If you occasionally suspend ClimatePrediction in order to get new work and also contribute to other projects, it won't do your climate model any harm because cpdn ignores the deadlines and accepts results uploaded later. It's the only project to do this, but boinc doesn't realise." mo.v, That's good to know. Thanks! Tess |
Send message Joined: 29 Aug 05 Posts: 15574 |
Who told you to detach and re-attach to get work again? I ask as BOINC is a delicate piece of work, which will determine by itself if projects need work or not. Had you just allowed BOINC to run its course, those other projects would have gotten work again. It was probably just relieving itself of the amount of long term debt that it had accumulated as you and BOINC had put preferences on the other projects. Anyway, don't detach and re-attach. Try a rest first. Yet while CPDN doesn't care about its deadline but still sets one on its models (mo.v, Mike, why isn't that deadline set to 2018?), all other projects do want their results back by the deadline they set. Every minute that another result takes away from the CPDN model because that result is due in, the CPDN model will eventually take back to make its own deadline... even though CPDN doesn't have a deadline as such. Just curious, which projects do you run? |
Send message Joined: 13 Aug 06 Posts: 778 |
Tess listed her projects above - you probably missed this as you were posting at the same moment. 'climatepredict - Apr 20, 2008 PredictorHome - Aug 28, 2007 Rosetta - Aug 16, 2007 Seti - Aug 19, 2007 lhcat - no work currently' A very few BBC-cpdn members do seem to think their model deadline is 2018, judging by some reports in the thread about the race to produce the last completed model. And guess what? I believe MikeMars has a model with a marginal chance of winning. He's deliberately holding this model back as long as possible and at the last moment will make it race to the finish. Tess, I'm not advising you to do the same. Seriously though, the deadline for the 160-year models is a year, the aim being to induce in members a certain sense of urgency...... Jorden, are you generally advising a rest or a reset or both? |
Send message Joined: 16 Apr 06 Posts: 386 |
I keep suspending the BBC model whenever new CPDN-beta applications come out. So the poor thing sits in a corner for weeks at a time, unfed and unloved... the RSPCM will no doubt come banging at my door any day now. Getting back on topic, if Boinc had been shut down during the vacation, then the '%time computer is running' figure will be low, which would confuse the scheduler (until the figures adjust themselves back to what they should be). Could you provide a link to the host on CPDN? We've discussed asking for a '<ignore-deadline/>' flag to be implemented in the scheduler so that work units can request simple scheduling instead (sticking to resource share regardless of their own deadlines). This would also bypass the 1901 deadline issue if it occurs. |
Send message Joined: 13 Aug 06 Posts: 778 |
Mike, would the idea be that a) all cpdn workunits would automatically bypass the sophisticated/flexible boinc scheduler? b) or would the scheduler only implement a fixed resource share for cpdn if the member selected this option? c) in the case of both a) and b), would sophisticated/flexible scheduling be retained for workunits from other projects? d) or do we want all crunchers to have a choice between the previous simple round-robin scheduler and the current flexible one? Tess, if a very long workunit is preventing you from getting work from other projects, the usual procedure is to suspend the extralong WU using the Suspend button in the Tasks tab. In extreme circumstances you could reset the projects that refuse to fetch new work, but it should never be necessary to detach and reattach to projects in order to get new work. However, if you just allow the boinc scheduler to do its own thing, it will try to share crunching time fairly between projects in the longer term, and it will usually succeed. But if the computer is a slow one, or doesn't crunch for much of the time, or the member has attached to a large number of projects apart from cpdn, this can make it difficult or even impossible for the current scheduler to achieve the time shares you want, even in the longer term. If you can provide a link to your computer on cpdn as Mike suggests, he should be able to see whether the crunching you want it to do is realistic or overstretching things a bit. |
Send message Joined: 16 Apr 06 Posts: 386 |
a) from Carl's comments the other day. So CPDN work units would be going strictly by resource share, but the other work units on the same PC from different projects would still be looking at deadlines etc (assuming that the other projects didn't use the new option). User wouldn't get a choice, but the end result is they'd have more control since they could set CPDN to whatever resource share they like and it'd stick to that in both the short term and long term (rather than just the long term as it is now). However I haven't seen that idea on the dev mailing list or in /trac, so I don't know if it's going to progress. |
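To make "going strictly by resource share" concrete, here is a minimal Python sketch. It only illustrates a proportional split and is not how the BOINC scheduler is implemented; the function name is hypothetical, and the equal shares of 100 per project are an assumption based on Tess saying her projects use default settings.

```python
def cpu_hours_by_share(resource_shares, hours_available):
    """Split the available CPU hours in proportion to each project's share.

    resource_shares: dict mapping project name -> resource share
    hours_available: total CPU hours to divide (for example 24 for one day)
    """
    total = sum(resource_shares.values())
    if total <= 0:
        raise ValueError("total resource share must be positive")
    return {
        project: hours_available * share / total
        for project, share in resource_shares.items()
    }


# Assumed example: the five projects from this thread, each at a share of 100.
shares = {
    "climatepredict": 100,
    "PredictorHome": 100,
    "Rosetta": 100,
    "Seti": 100,
    "lhcat": 100,
}
for project, hours in cpu_hours_by_share(shares, 24).items():
    print(f"{project}: {hours:.1f} h/day")
```

Under these assumptions each project would get 4.8 hours per day. With the proposed flag, CPDN would stick to that figure in the short term as well as the long term, whatever its deadline says.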
Send message Joined: 8 Aug 07 Posts: 5 |
I think boinc has finally run long enough since my vacation. It seems to bear out what most people say here - climatepredict is the culprit. When I suspended its task and left the others alone, they were eventually scheduled properly. (Sorry, it's not possible to link to my computer. But I'm reconciled with the situation. I will periodically let climatepredict loose until it finishes its task. I made it not get any new task - I'll probably take a rest from it until I get a newer computer.) Thank you all for your ideas and information! |
Send message Joined: 13 Aug 06 Posts: 778 |
The idea was just for you to give us the link to your cpdn account which would show the details of what work the computer's doing on boinc projects. You can open boinc manager and in the Tasks tab highlight Climate Prediction. If you then click on Your Account you see a page like this http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_user.php?userid=21936 If you can complete your current model it will be greatly appreciated at cpdn. We know what an effort each model entails. We're hoping that a choice of cpdn model lengths will now be available for quite a while. |
Send message Joined: 5 Jun 07 Posts: 9 |
review things, rather than trying to change the already complicated & working fairly well scheduling on BOINC, we'll just have to bump up the deadlines on future boinc workunits to be years. (I have just been using a default <delay_bound> 30000000 </delay_bound> which I put in the template ages ago, 30000000 being the number of seconds, which is about a year (347 days). It's just a number I came up with once long ago and it has never been changed, as there have always been 100 things more important to work on, etc.) So I'll tell Tolu & Milo to bump this up by a factor of 5 (I don't have anything to do with "day to day" CPDN anymore). |
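The arithmetic behind those numbers can be checked with a few lines of Python. This is only a unit conversion of the <delay_bound> value quoted above; the helper name and the use of 365.25 days per year are choices made for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def delay_bound_days(delay_bound_seconds):
    """Convert a <delay_bound> value in seconds to days."""
    return delay_bound_seconds / SECONDS_PER_DAY


current = 30_000_000   # value quoted in the post
bumped = 5 * current   # the proposed factor of 5

print(round(delay_bound_days(current)))            # 347 (days)
print(round(delay_bound_days(bumped)))             # 1736 (days)
print(round(delay_bound_days(bumped) / 365.25, 2)) # 4.75 (years)
```

So the current bound is about 347 days, and five times that is about 1736 days, or roughly four and three quarter years.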
Send message Joined: 29 Aug 05 Posts: 15574 |
"Jorden, are you generally advising a rest or a reset or both?" One of these days I throw this keyboard out the window. I swar. :-) |
Send message Joined: 13 Aug 06 Posts: 778 |
Carl, is increasing the time to completion by a factor of as much as 5 a good idea? This would give crunchers the impression that if their current models aren't finished until 2012, ie when the London Olympics are held, the results would still be useful to the researchers, which I doubt. Wouldn't a factor of 2 keep the boinc flexible scheduler happier while letting crunchers know that getting their trickles and zip files to Oxford in reasonable time does matter? I think a factor of 5 could cause some other problems too. 1) It could encourage cpdn members to commit to more projects than is realistic, bearing in mind that no current or planned cpdn workunits could be described as small. 2) If crunchers discover, eg while the marathon is being run in London, that they've been crunching cpdn part-time but uselessly for 2 years because the researchers no longer need their results, they could be quite irate. If Oxford terminates their models before the end because the results are no longer needed, they could also be irate. 3) I could make cpdn News announcements about this every week for years and copy them all over the forums, but a high proportion of crunchers remain unreachable. 4) If before the deadline, eg in 2009, we sent mass emails to members or announced in the graphics window that particular workunit results are in fact needed soon, this would contradict the time to completion info in the boinc manager and make some members irate. If they had to override the boinc task scheduler in order to comply, some would also be irate. Unless, of course, the model results really will still be useful to Oxford in 2012. |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.