Why doesn't Boinc schedule earlier deadlines first?
Author | Message |
---|---|
Send message Joined: 14 Aug 19 Posts: 55 |
What's irksome is when things like the following happen. Right now I'm attached to: 1) YoYo (share 100) 2) Latin Squares (share 1) 3) T.Brada Experiment (share 1) 4) QuChemPedIA@home (share 1)
Team USA forum Follow us on Twitter Help us #crunchforcures! |
Send message Joined: 25 May 09 Posts: 1284 |
As far as I'm aware there is no setting available to force completion of tasks once they start. To implement such an option would mean a considerable rework of the scheduler within the client, so it is probably a long way down the list of priorities. Checkpoints are under the control of the project: some projects have very long checkpoint intervals (CPDN comes to mind), others fairly short ones; for some the user can change them, and for others there are no checkpoints available. I'm not familiar with the projects you are running, but one thing that might be possible is to set the checkpoint interval to just slightly longer than the typical task run time. BUT there is a downside to doing this: if you suffer any sort of outage then you will lose the work done to that point, whereas if you have a short (1-5 minutes) checkpoint interval you should only lose those few minutes of work. The actual management of how and what is stored by a checkpoint is down to the individual project's applications, so there is nothing BOINC can do apart from making sure the checkpoint is triggered. |
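To make the trade-off above concrete, here is a minimal sketch of how much work is lost when a task is killed without warning. It assumes checkpoints are written at exact, fixed intervals, which real applications only approximate; `work_lost` is a hypothetical helper and not part of BOINC.

```python
def work_lost(elapsed_s: float, checkpoint_interval_s: float) -> float:
    """Seconds of work lost if a task dies after elapsed_s seconds.

    Assumption: a checkpoint is written exactly every
    checkpoint_interval_s seconds of run time, starting from zero.
    """
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint interval must be positive")
    # Everything since the most recent checkpoint has to be redone.
    return elapsed_s % checkpoint_interval_s


if __name__ == "__main__":
    outage_at = 3100  # seconds into the task (made-up number)
    # Short interval: only the tail since the last checkpoint is lost.
    print(work_lost(outage_at, 300))
    # Interval longer than the elapsed time: all work so far is lost.
    print(work_lost(outage_at, 4000))
```

With a 300-second interval the outage costs 100 seconds of work; with an interval longer than the run so far, it costs all 3100 seconds.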
Send message Joined: 5 Oct 06 Posts: 5082 |
You can also use the 'Switch between tasks every ...' control in the global computing preferences section of any project. Notes:
|
Send message Joined: 14 Aug 19 Posts: 55 |
Rob & Richard, I'm aware of everything you have said. I'm particularly aware of the limitations of Switch Between Tasks, although usually I make sure to set that to an absurdly high value and on this host I had forgotten. YoYo has at least one app (ECM) that does not checkpoint and I'm no longer running it, since it runs ten to twelve hours and no checkpoint for such an app is ridiculous. Years ago, after much exasperation with BOINC's scheduling, I settled on "Low cache, few projects". This generally works well for me, other than occasionally quirky behavior like I described. I don't understand why BOINC would assume it would take 60 days to clear with low cache settings. That's not intuitive and just seems bizarre. None of these projects (*edit: or at least these particular tasks) have deadlines of 60 days. Brada has the longest deadline of what's currently in the queue, seven days, which might be why BOINC is delaying those tasks, but it should realize it doesn't have 60 days to process tasks with a deadline of seven. In this case the work did clear and didn't cause a "real" problem. One of the more annoying problems I had with this sort of scenario years ago was BOINC stopping GPU work to free up cores to run high-priority CPU work. I solved that with multiple clients. The point is that BOINC does not honor FIFO as advertised, or at least as it would be expected to. If there happened to be a "spanner in the works", that could lead to work not getting done in time, entirely preventable by honoring FIFO. Now I have ten Brada tasks, received on the 11th and due on the 18th. I suppose this process will repeat in a few days. |
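The difference between first-in-first-out and earliest-deadline-first ordering can be sketched on a toy queue. This is not BOINC's scheduler, which also weighs resource share among other things; the `Task` fields, the single-core assumption, and the sample numbers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    received_day: float   # when the task arrived (hypothetical units: days)
    deadline_day: float   # when the result is due
    runtime_days: float   # estimated run time on one core


def fifo(tasks):
    """Run tasks in the order they were received."""
    return sorted(tasks, key=lambda t: t.received_day)


def edf(tasks):
    """Run the task with the earliest deadline first."""
    return sorted(tasks, key=lambda t: t.deadline_day)


def missed_deadlines(order, start_day=0.0):
    """Names of tasks that finish after their deadline when run
    back to back on a single core in the given order."""
    now = start_day
    missed = []
    for task in order:
        now += task.runtime_days
        if now > task.deadline_day:
            missed.append(task.name)
    return missed


if __name__ == "__main__":
    queue = [
        Task("A", received_day=0, deadline_day=10, runtime_days=3),
        Task("B", received_day=1, deadline_day=4, runtime_days=2),
        Task("C", received_day=2, deadline_day=7, runtime_days=2),
    ]
    print("FIFO misses:", missed_deadlines(fifo(queue)))
    print("EDF misses: ", missed_deadlines(edf(queue)))
```

On this made-up queue the two orderings disagree, which is the only point of the sketch: whether either one matches what a given host should do depends on when work arrives and how tight the deadlines are.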
Send message Joined: 25 May 09 Posts: 1284 |
Checkpoint every x seconds means exactly what it says - checkpoint every x seconds. There may be other checkpoints set in the application, such as when a particular part of a calculation has been completed; these will occur, and the x-second counter is reset. However, some applications do not have checkpointing programmed in, or have very long default periods between checkpoints. Realistically, 30 seconds is about as low as you want to go, as doing a checkpoint does take a finite amount of resource, and some may say that is far too frequent.... |
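The interaction between timed checkpoints and application-defined ("event") checkpoints described above can be sketched as follows. This is only a model of the behaviour as stated in the post, where an event checkpoint restarts the x-second counter; `checkpoint_times` and its parameters are hypothetical names, and real applications may behave differently.

```python
def checkpoint_times(interval_s, event_times, end_s):
    """Return the times (seconds of run time) at which checkpoints occur.

    Assumptions: a timed checkpoint fires interval_s seconds after the
    previous checkpoint of either kind, and an event checkpoint resets
    that counter.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    # Keep only events inside the run, de-duplicated and in order.
    events = sorted({e for e in event_times if 0 < e <= end_s})
    times = []
    last = 0
    i = 0
    while True:
        next_timed = last + interval_s
        if i < len(events) and events[i] <= next_timed:
            # An event checkpoint happens first and resets the counter.
            last = events[i]
            i += 1
        elif next_timed <= end_s:
            last = next_timed
        else:
            break
        times.append(last)
    return times


if __name__ == "__main__":
    # 300 s timer, one event checkpoint at 450 s, task runs 1200 s.
    print(checkpoint_times(300, [450], 1200))
```

In the example the event at 450 seconds pushes the following timed checkpoints to 750 and 1050 seconds instead of 600 and 900.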
Send message Joined: 25 May 09 Posts: 1284 |
Let's try again - the time interval for a timed checkpoint (as opposed to an event checkpoint) is the normal interval, with a SMALL tolerance (a second or so). Checkpoints for most projects are written to disk. Some, like CPDN, do a "trickle-up" in addition to the checkpoints; these are sent to the servers. If an application is paused, all the timers are stopped and nothing is happening. Depending on how the application was written, it may do a checkpoint write when paused so it can resume at that instant, or it may not, in which case it will go back to the previous checkpoint when it restarts. When a job swaps from "running" to "waiting to run" most applications do a checkpoint write, but some don't, just relying on the last checkpoint. Likewise for suspending an application.... |
Send message Joined: 28 Jun 10 Posts: 2540 |
"Some, like CPDN, do a 'trickle-up' in addition to the checkpoints" The trickle-up files are what the credit is based on for CPDN. These days they are concurrent with the monthly (or other interval) zips being produced, which have the information for the scientists. The system was introduced in the days when tasks could take six months or more, so that if a task, through no fault of the cruncher, produced an invalid climate (e.g. -ve pressure after five months, causing the task to crash), the cruncher would still get the credit for work done up to the last trickle-up. |
Send message Joined: 17 Nov 16 Posts: 869 |
"I'd like a setting in cc_config to force jobs to run to completion once they start." You can do that by changing the "switch between tasks every" parameter. I change it from the default of 60 minutes to 360 minutes so that a GPUGrid job runs to completion and never exits or suspends. The application can't handle restarting on a different device in a mixed-type GPU configuration, and the switch parameter is how I get around the issue. |
Send message Joined: 5 Oct 06 Posts: 5082 |
"That's a terrible system which means we're all wasting huge amounts of processing power if we run more than one project, as every time a processor is swapped from project A to project B, project A is likely to lose calculations. Data should ALWAYS be written to disk when an application is paused for whatever reason (computer shut down, phone unplugged from charger, exclusive application running, another project taking the processor)." I think the algorithm is 1) Wait until Task Switch Interval has expired. 2) Then start looking for a good time to switch. 3) Wait until task has just checkpointed. 4) SWITCH. You can set debug message log flags that will show you all that happening. |
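The four steps listed above can be sketched as a small function. This only mirrors the post's description ("I think the algorithm is..."), not the actual client code, which has further conditions; `first_switch_time` and its arguments are hypothetical names.

```python
def first_switch_time(checkpoint_times_s, switch_interval_s):
    """Return the run time (seconds) at which the task would be switched out.

    Steps 1-2: do nothing until the task switch interval has expired.
    Step 3:    then wait for the next checkpoint.
    Step 4:    switch immediately after that checkpoint.
    Returns None if no checkpoint occurs once the interval has expired.
    """
    for t in sorted(checkpoint_times_s):
        if t >= switch_interval_s:
            return t
    return None


if __name__ == "__main__":
    # Task checkpoints every 20 minutes starting at 10 minutes;
    # "switch between tasks every" is 60 minutes (3600 s).
    checkpoints = [600, 1800, 3000, 4200]
    print(first_switch_time(checkpoints, 3600))
```

Here the switch happens at the 4200-second checkpoint rather than at exactly 3600 seconds, so no work is thrown away. An application that never checkpoints gives `None`, which matches the point made elsewhere in the thread that such tasks can only fall back on their last checkpoint, if any.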
Send message Joined: 5 Oct 06 Posts: 5082 |
"That algorithm won't help for computer shut down, phone unplugged from charger, exclusive application running." No, the human factor can never be predicted, but it's pretty good for the events under BOINC's control. There's some good design in there, if you can be bothered to look. |
Send message Joined: 25 May 09 Posts: 1284 |
Richard was describing the process for JOB SWAPPING, which is obviously an operation where regular checkpoints are an advantage. For a controlled shutdown (including phones changing to battery and exclusive applications starting), "sensible" projects do a checkpoint write and "non-sensible" projects don't. Uncontrolled shutdowns are just that, sudden and uncontrolled (and hopefully rare) events: one has to rely on the last checkpoint to get things going again, and there is no escape from that. Keith's comment about applications trashing tasks if they start on the "wrong" processor is only partially correct. Some applications do the nice thing and start without faults, others fail even when the two GPUs are the same, and most fail when the two GPUs are different models, never mind from different families. |
Send message Joined: 25 Nov 05 Posts: 1654 |
"Plus in 6 months something bad could happen to that computer, or it might not be processing Boinc any more, or it might not meet the deadline, so the project at least gets some of the data and can send the remainder of the calculations to someone else." No, they can't be sent to someone else partway through. Resends start from the beginning with the new person/computer. And BOINC was written way back before people started using smartphones to run BOINC, and it has never kept up. It is probably impossible to do so. |
Send message Joined: 8 Nov 19 Posts: 718 |
"As far as I'm aware there is no setting available to force completion of tasks once they start. To implement such an option would mean a considerable rework of the scheduler within the client so is probably a long way down the list of priorities." You'd kind of have to look at BitTorrent clients and how they set priorities on (nearly downloaded) torrents. Their priority rating depends on personal priority settings (high, normal, low), as well as torrent availability, network speed of the torrent, and finished percentage. Once a torrent gets past a certain point (e.g. 75%), its priority status gets increased dramatically, to the point that torrents which are 98% finished get a 98% boost in priority. The high/normal/low settings mostly influence torrents of similar finished percentage. |
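A minimal sketch of the completion-based boost suggested above, applied to tasks rather than torrents. Everything here is hypothetical: the weights for high/normal/low, the 75% threshold default, and the `priority` function are invented to illustrate the idea, and neither BOINC nor any particular BitTorrent client is claimed to use this formula.

```python
# Hypothetical base weights for the user's high/normal/low setting.
BASE_WEIGHT = {"high": 1.2, "normal": 1.0, "low": 0.8}


def priority(user_setting, fraction_done, boost_threshold=0.75):
    """Priority score: the base weight, boosted once the task is
    past boost_threshold. The boost equals the fraction done, so a
    task that is 98% finished gets a 98% boost."""
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    base = BASE_WEIGHT[user_setting]
    if fraction_done < boost_threshold:
        return base
    return base * (1.0 + fraction_done)


if __name__ == "__main__":
    tasks = [("high", 0.50), ("normal", 0.80), ("low", 0.98)]
    # Highest score first.
    for setting, done in sorted(tasks, key=lambda t: priority(*t), reverse=True):
        print(setting, done, round(priority(setting, done), 3))
```

With these made-up weights a nearly finished low-priority task outranks a half-finished high-priority one, while the high/normal/low setting still decides between tasks at a similar stage of completion.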
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.