Suspend tasks after checkpoint

Mr Anderson

Joined: 18 Feb 21
Posts: 3
Message 104815 - Posted: 17 Jul 2021, 1:13:20 UTC

I don't like how shutting down or rebooting the PC often wastes work done on the running tasks because they might not have checkpointed for a while. I propose two enhancements:

1. The "CPU time since checkpoint", which can be seen by clicking on a task and selecting properties, should be an optional column in the tasks list. This way the user can quickly see whether any running tasks have recently checkpointed or not.

2. A function to suspend any running tasks once they have checkpointed (and not allow new tasks to run in their place). The user can select this some time before shutting down so that the tasks stop processing and the computer can then be shut down without wasting work. (Perhaps this could even be extended to perform the actual shutdown.) When the system and BOINC restart, the option should clear automatically.
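To make proposal 2 concrete, here is a minimal Python sketch of the suspend-after-checkpoint logic. It is only a model: the Task class and the functions start_drain, drain_step and drained are hypothetical names invented for illustration, not part of the BOINC client or Manager, and it assumes the client can see each task's CPU time at its last checkpoint.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for a task as the client sees it."""
    name: str
    running: bool
    checkpoint_cpu_time: float  # CPU seconds recorded at the last checkpoint


def start_drain(tasks):
    """Record each running task's checkpoint time when the option is switched on."""
    return {t.name: t.checkpoint_cpu_time for t in tasks if t.running}


def drain_step(tasks, baseline):
    """Suspend every running task that has checkpointed since the option was set.

    Returns the names suspended in this step. Tasks that were not running
    when the option was set are never started here, so nothing takes the
    place of a suspended task.
    """
    suspended = []
    for t in tasks:
        if t.running and t.name in baseline and t.checkpoint_cpu_time > baseline[t.name]:
            t.running = False
            suspended.append(t.name)
    return suspended


def drained(tasks):
    """True once no task is running, i.e. shutting down wastes no work."""
    return not any(t.running for t in tasks)
```

In this model the client would call drain_step periodically until drained() returns true; only then would the optional shutdown step be triggered.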
Keith Myers
Volunteer tester
Help desk expert

Joined: 17 Nov 16
Posts: 863
United States
Message 104817 - Posted: 17 Jul 2021, 7:15:09 UTC - in response to Message 104815.  

Task checkpointing is solely dependent on the science application running, and therefore the responsibility of the project that develops the app.

BOINC has no control over whether an application has the ability to checkpoint.

Take the issue up with the project you are having issues with.
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5078
United Kingdom
Message 104818 - Posted: 17 Jul 2021, 7:32:17 UTC

On the other hand, BOINC is aware when a science app has checkpointed - the app notifies BOINC - so the idea is feasible. And it would have a minor energy-saving effect.

Two things would need to be considered carefully:
1) Timings. Some projects write very large data files to disk as part of the checkpointing process. I don't think BOINC has any awareness of the physical write speed of the device in use, or of the effect of any 'lazy write' caching in the operating system. It would be counter-productive if BOINC acted on the checkpoint notification before all write operations had securely completed.
2) Some projects haven't implemented checkpointing at all. There would need to be a clear policy, communicated to users, for how BOINC would incorporate that scenario into the proposal.
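Both points can be folded into a small decision rule. The Python sketch below is an assumption-laden illustration, not BOINC behaviour: the function name decide, the 60-second grace period after a checkpoint notification and the one-hour cap for tasks that never checkpoint are all made-up values chosen for the example.

```python
from typing import Optional


def decide(now: float,
           drain_started: float,
           checkpoint_seen_at: Optional[float],
           grace: float = 60.0,      # assumed delay to let disk writes settle
           max_wait: float = 3600.0  # assumed cap for tasks that never checkpoint
           ) -> str:
    """Decide what to do with one running task while the option is active.

    All times are wall-clock seconds. checkpoint_seen_at is the time of the
    first checkpoint notification received after the option was set, or
    None if the task has not checkpointed since then.
    """
    if checkpoint_seen_at is not None:
        # Point 1: do not act on the notification until the grace period has passed.
        return "suspend" if now - checkpoint_seen_at >= grace else "wait"
    # Point 2: one possible policy for apps without checkpointing is to give up
    # after max_wait and suspend anyway, accepting the lost work.
    return "suspend_anyway" if now - drain_started >= max_wait else "wait"
```

Whether a non-checkpointing task should be suspended anyway, left running, or reported to the user is exactly the policy question raised in point 2; the sketch simply picks one option.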
PMH_UK

Joined: 24 Dec 10
Posts: 35
United Kingdom
Message 104819 - Posted: 17 Jul 2021, 9:25:23 UTC - in response to Message 104818.  

BOINCTasks from efmer can suspend at checkpoint; it waits about a minute after the checkpoint before suspending.
I suspend other tasks with long checkpoint intervals but allow those with short intervals to run.

Paul.
Harri Liljeroos

Joined: 25 Jul 18
Posts: 62
Finland
Message 104824 - Posted: 17 Jul 2021, 18:09:27 UTC

BOINCTasks also has a Checkpoint column that shows how many checkpoints an app has written and the time since the last checkpoint.
Mr Anderson

Joined: 18 Feb 21
Posts: 3
Message 104832 - Posted: 19 Jul 2021, 0:43:18 UTC

@Keith Myers I believe I have mentioned this before at Einstein@home and the response was that it was a BOINC issue. I tend to agree: this is not about whether a task has the ability to checkpoint (all the tasks I run do checkpoint), or how often it does so, but about a function to suspend tasks once checkpointing has been done. BOINC is the front end for the tasks and it manages which task executes or is suspended, so this is the place to do this.

@Richard Haselgrove I don't think timing is an issue. If a task reports to BOINC that it has checkpointed, then all write operations done by the task have been completed from the task's point of view, so it can be suspended immediately. If BOINC is also shutting down the system, it can initiate the shutdown once the last task has been suspended. However, BOINC is not actually turning the computer off; the operating system is. Whether the operating system is still writing the data, or some lazy write needs to finish, is not BOINC's concern, since it is up to the operating system to manage these things when the system is shut down. Any operating system that is worth its salt will take care of all that, and anything less is a major bug that would repeatedly result in data loss.

BOINCTasks sounds interesting but is somewhat overkill for my needs. I don't particularly want yet another application to manage BOINC. Why can't BOINC do this itself?
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15478
Netherlands
Message 104840 - Posted: 19 Jul 2021, 13:29:32 UTC - in response to Message 104832.  

Mr Anderson wrote: "Why can't Boinc do this itself?"

Mostly because the developers try to keep client and manager as simple as possible.

The option you ask for isn't something that'll help thousands of users, and when it can already be done easily via a 3rd party app, why not use the 3rd party app?
If you still want to have it added to client or manager, it's best to file a feature request on the BOINC GitHub issues page at https://github.com/BOINC/boinc/issues.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.