Request: Automatic Purge Function

dcdc · Joined: 29 Aug 06 · Posts: 82
Message 10034 - Posted: 7 May 2007, 10:30:15 UTC

Hi, I've mentioned this before on Rom's blog, and here too I think. Productivity could be increased by including an optional purge function in BOINC. With Rosetta, I currently have a remote cruncher working away on jobs that have passed their deadline. It'll waste a few days on these before it gets to some real work. This happens on any machine that isn't used for a few days or has an irregular usage pattern: by the time it's in use again, everything has expired. If BOINC had a purge function, these tasks could be removed from the queue once they are no longer needed. AFAICS it's quite a simple function, requiring the client to download some form of XML file from the project listing the jobs that aren't needed any more (presumably using wildcards/masks to keep the file small). Cheers, Danny
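
A minimal sketch of the purge list Danny proposes, assuming a project publishes an XML file of wildcard masks and the client drops any queued task whose name matches one. The <purge_list>/<mask> format, the task names and the helper functions are all hypothetical illustrations, not an existing BOINC file format or API.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical purge file; this element layout is invented for illustration.
PURGE_XML = """
<purge_list>
    <mask>rosetta_batch17_*</mask>
    <mask>rosetta_batch18_00??_*</mask>
</purge_list>
"""

def load_masks(xml_text):
    """Return the wildcard masks listed in the (hypothetical) purge file."""
    root = ET.fromstring(xml_text.strip())
    return [m.text.strip() for m in root.findall("mask") if m.text]

def tasks_to_purge(queued_tasks, masks):
    """Return the queued task names that match at least one purge mask."""
    return [t for t in queued_tasks
            if any(fnmatch.fnmatchcase(t, m) for m in masks)]

queue = ["rosetta_batch17_0042_1", "rosetta_batch18_0007_0", "rosetta_batch19_0001_2"]
print(tasks_to_purge(queue, load_masks(PURGE_XML)))
```

Masks keep the file small because one pattern can cover a whole batch; the trade-off is that a careless mask could match tasks the project still wants.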

Jord (Volunteer tester, Help desk expert) · Joined: 29 Aug 05 · Posts: 15480
Message 10036 - Posted: 7 May 2007, 11:44:11 UTC

Some projects, such as CPDN, allow their results/models to come in past the deadline and still receive credit. If BOINC purged these automatically, all hell would break loose, so it's not really an option.

mo.v · Joined: 13 Aug 06 · Posts: 778
Message 10041 - Posted: 7 May 2007, 12:15:23 UTC

A remote computer would only succeed with a CPDN climate model if the person(s) using it always exit BOINC before turning the computer off. Otherwise, sooner or later the model would probably crash. Because the models are such long workunits, this is an important consideration.

dcdc · Joined: 29 Aug 06 · Posts: 82
Message 10042 - Posted: 7 May 2007, 12:39:35 UTC - in response to Message 10036. Last modified: 7 May 2007, 12:43:27 UTC

Quote: "Some projects, such as CPDN, allow for their results/models to come in over the deadline and still catch credit. If BOINC would purge these automatically, hell would break loose. So it's not really an option."

Then CPDN doesn't have to use it. I'm not suggesting that everything that has expired be purged. If it's based on an XML list of redundant tasks (as designated by the project) downloaded from the server, then CPDN (or whoever) can list only the jobs that aren't required any more; in CPDN's case that would probably be just really old (or buggy) tasks. It would also let the project remove jobs that have already been downloaded but have since been shown to be problematic. I can't see any reason why it's not an option...

Aurora Borealis · Joined: 8 Jan 06 · Posts: 448
Message 10043 - Posted: 7 May 2007, 12:58:26 UTC

Purging past-due and defective WUs has been discussed a bit on the mailing list. It's probably on the low-priority side of the current to-do list.

Nicolas · Joined: 19 Jan 07 · Posts: 1179
Message 10099 - Posted: 10 May 2007, 2:18:19 UTC

As far as I know, it's already implemented, although I don't know since which BOINC client version, or whether any project is using it yet.

MikeMarsUK · Joined: 16 Apr 06 · Posts: 386
Message 10169 - Posted: 10 May 2007, 17:22:18 UTC

I hope it's off-by-default ...

dcdc · Joined: 29 Aug 06 · Posts: 82
Message 10173 - Posted: 10 May 2007, 22:18:46 UTC

Any idea where I can get some info on how it works? Cheers, Danny

Ingleside · Joined: 11 May 07 · Posts: 8
Message 10182 - Posted: 11 May 2007, 2:24:19 UTC

This is handled by 2 flags:
1. result_abort
2. result_abort_if_not_started

As the names imply, #1 aborts the result immediately. It is intended for when the WU has been cancelled by the project or has errored out, meaning the project won't use the result and the user won't get any credit for it, so it's a waste of time to continue.

#2 only aborts a result that has never started crunching; even if you've crunched it for only 1 second, the client won't abort it. It is intended for when the WU already has a "canonical result", meaning the project won't use the result and there's no point starting to crunch it. But since the user can still get credit for the result, a started result isn't cancelled, because there's no way to know whether only a few seconds or a long time is left to crunch...

Client-side, #1 has been included for nearly a year, since v5.5.1. #2 was added at the same time, but due to a small bug it didn't work before v5.8.17, meaning Windows users need v5.9.xx.

Server-side, nothing was done until WCG had a batch of bad WUs in March(?) 2007; they programmed this part and got it incorporated into the general BOINC code in April, with some bug fixes added later... For a project to use this, it needs to enable "send_result_abort" in its config file.
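
The difference between the two flags can be sketched as a small decision over the local queue. This is a minimal Python illustration, assuming the client already holds the two lists of result names and that "never started" can be modelled as zero CPU time; the Task type and apply_abort_lists function are hypothetical, not the BOINC client's actual C++ code.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cpu_seconds: float  # CPU time already spent crunching this result

def apply_abort_lists(tasks, result_abort, result_abort_if_not_started):
    """Return the names of local tasks the client should abort."""
    to_abort = []
    for task in tasks:
        if task.name in result_abort:
            # Flag #1: WU cancelled or errored out; no credit, so stop at once.
            to_abort.append(task.name)
        elif task.name in result_abort_if_not_started and task.cpu_seconds == 0:
            # Flag #2: a canonical result exists; only never-started tasks are
            # dropped, because a started one may still earn credit.
            to_abort.append(task.name)
    return to_abort

queue = [Task("wu_a_0", 3600.0), Task("wu_b_1", 1.0), Task("wu_c_2", 0.0)]
print(apply_abort_lists(queue, {"wu_a_0"}, {"wu_b_1", "wu_c_2"}))
```

Here wu_b_1 survives despite being on the "if not started" list, because one second of crunching counts as started.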

Aaron Finney · Joined: 2 Sep 05 · Posts: 45
Message 10219 - Posted: 13 May 2007, 8:22:39 UTC

A long time ago, I suggested a way for the client to "request" an extended deadline for workunits that have exceeded their deadline. It was shot down because (at the time) there were far, far larger things on the plate for poor David and Rom, who were probably very overworked. I'll go through my projects mailing list archive and see if I can find it, but the suggestion would have been many months ago. I'll try to summarize, but the original suggestion was a very well thought-out email.

Suggestion (only the main points; again, this is off the top of my head):
When a workunit is nearing its due date/time, there should be some method for BOINC to contact the project about the workunits in question and request an extended deadline. This is much more beneficial than having the workunit crunched an extra time or two, and possibly validated even later.

Addendum 1 - 'Workunit tag for threshold timer': There needs to be a tag in or attached to the workunits so that the BOINC client knows at what point to either A: ask for more time; B: abort the workunit; or C: do nothing.

Addendum 2 - 'Server setting for functionality': There needs to be some server-side setting or string to enable or disable this function.

Addendum 3 - 'Server-side setting for threshold': If enabled, there needs to be a server-side setting for the threshold time, in minutes before the deadline.

Addendum 4 - 'Allowance for computers that may not be on during the threshold period': If the computer in question is offline during the entire threshold period and comes online after the workunits are already overdue, there needs to be some method for the computer to check whether the workunits in question have already been given out again. See addendum 5.

Addendum 5 - 'Server response to extension request': The server should be able to reply intelligently to a request to extend a workunit's time. First, it needs to determine whether the workunit has already been given to an additional computer (see addendum 4). If so, the request should be denied. Second, if it has not been given to another computer, it should check whether the quorum has already been reached. If so, the request should be denied. The only situation where the request should be granted, IMHO, is if the workunit has not been reissued to a new host and a quorum has not been reached.

Addendum 6 - 'Client-side reaction to extension denial': Any workunit extension denial should result in the workunit being aborted.

Addendum 7 - 'Server response to extra workunit generation': Any extra workunits that were generated because workunits passed their deadline, but that have not yet been sent to a new host when an extension is requested for the late workunit in question, should be deleted and NOT sent to a new computer.

Explanation: Granted, I know very little about what actually goes on under the hood of BOINC, but I think I know enough to get my point and ideas across to those who do. This would be a nice feature to add to BOINC, and could possibly be slated for the new version 6 so that a graphics API isn't the only improvement between major version releases :-D I will post this again to the BOINC ALPHA list and see if it generates a positive response from Rom or David.
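
The server-side rules in addenda 5 to 7 reduce to a small decision function. This is a minimal Python sketch of the proposal as described, not of anything BOINC implements; the WorkunitState fields and handle_extension_request name are hypothetical stand-ins for whatever the server database would actually track.

```python
from dataclasses import dataclass

@dataclass
class WorkunitState:
    reissued_to_new_host: bool  # has a replacement copy already been sent out?
    quorum_reached: bool        # do enough results already exist to validate?
    unsent_replacements: int    # replacement copies generated but not yet sent

def handle_extension_request(wu: WorkunitState) -> str:
    """Decide a late-workunit extension request per addenda 5-7."""
    # Addendum 5: deny if the work went to another host or the quorum is met.
    if wu.reissued_to_new_host or wu.quorum_reached:
        return "deny"  # addendum 6: the client then aborts its copy
    # Addendum 7: replacements generated but never sent are deleted.
    wu.unsent_replacements = 0
    return "grant"

late = WorkunitState(reissued_to_new_host=False, quorum_reached=False, unsent_replacements=1)
print(handle_extension_request(late), late.unsent_replacements)
```

Checking reissue before quorum mirrors the order in addendum 5; either condition alone is enough to deny.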

Ingleside · Joined: 11 May 07 · Posts: 8
Message 10252 - Posted: 14 May 2007, 15:41:57 UTC - in response to Message 10221. Last modified: 14 May 2007, 15:43:13 UTC

Quote: "WCG implemented it and created some of the server side code. Recently it sent instructions to BOINC clients for a bad batch. It only works on >= 5.8 clients. Think D@H adopted their code. Theoretically servers could send an instruction to clients where work was deemed 'Too Late' or 'No Reply', but thing is, the client has to initiate a server contact. Not heard if the abort feature is being employed to do those. Helps efficiency and lessens frustration if inadvertently one gets completed and zero credit is awarded."

WCG had nothing to do with the client-side implementation. Server-side, there was some discussion of tying aborting into the re-issue functionality to minimize DB-server load, but AFAIK no one finished this, so WCG wrote their own server code and got it added to standard BOINC on 5 April 2007, so all projects can choose to enable it. If enabled, for results past their deadline or whose WU has been assimilated, the server sends result_abort_if_not_started, but unfortunately this only works in v5.8.17 or later. result_abort, on the other hand, works with v5.5.1 or later.
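
Which flag the server picks, and whether a given client can act on it, can be sketched from the rules and version numbers stated in this thread. A minimal Python illustration; the function names are hypothetical, the version thresholds are the ones quoted above, and the note that Windows users effectively needed v5.9.x is not modelled.

```python
from typing import Optional

# Minimum client versions for each abort flag, as reported in this thread.
MIN_VERSION = {
    "result_abort": (5, 5, 1),
    "result_abort_if_not_started": (5, 8, 17),
}

def parse_version(v: str) -> tuple:
    """Turn a version string like '5.8.17' into (5, 8, 17) for comparison."""
    return tuple(int(part) for part in v.split("."))

def abort_flag_for(wu_cancelled_or_errored: bool,
                   past_deadline_or_assimilated: bool) -> Optional[str]:
    """Pick the abort flag for a result, or None if it should keep running."""
    if wu_cancelled_or_errored:
        return "result_abort"  # result is useless and earns no credit
    if past_deadline_or_assimilated:
        return "result_abort_if_not_started"  # started work may still get credit
    return None

def client_honours(flag: str, client_version: str) -> bool:
    """True if a client of this version is expected to act on the flag."""
    return parse_version(client_version) >= MIN_VERSION[flag]

print(abort_flag_for(False, True), client_honours("result_abort_if_not_started", "5.8.16"))
```

Comparing versions as integer tuples avoids the string-comparison trap where "5.10.0" would sort before "5.8.17".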

dcdc · Joined: 29 Aug 06 · Posts: 82
Message 10263 - Posted: 15 May 2007, 16:08:15 UTC - in response to Message 10182.

Quote: "This is handled by 2 flags: 1; result_abort 2; result_abort_if_not_started ..."

Can you clarify this, please? Does the client pick up an XML list of WUs that are to be cancelled? How does the client know whether the particular task it's running is to be cancelled or not? Or can only tasks that have expired be cancelled?

Nicolas · Joined: 19 Jan 07 · Posts: 1179
Message 10264 - Posted: 15 May 2007, 17:36:16 UTC - in response to Message 10263.

Quote: "Can you clarify this pls? Does the client pick up an xml of WUs that are to be cancelled? How does the client know if the particular task it's running is to be cancelled or not? Or is it only tasks that have expired that can be cancelled?"

The server gives the client a list of workunits to abort; that logic isn't on the client.

Nicolas · Joined: 19 Jan 07 · Posts: 1179
Message 10265 - Posted: 15 May 2007, 17:38:10 UTC - in response to Message 10169.

Quote: "I hope it's off-by-default ..."

There's no client option to disable it at all. Why would you want to? If the project tells your client to abort a workunit, it's because it knows better than you do that it doesn't need it processed.

MikeMarsUK · Joined: 16 Apr 06 · Posts: 386
Message 10268 - Posted: 15 May 2007, 19:30:57 UTC - in response to Message 10265.

Quote: "I hope it's off-by-default ..."
Quote: "There isn't a client option at all to disable it. Why would you want to? If the project is telling your client to abort the workunit, it's because it knows better than you that it doesn't need it to be processed."

I don't have any concerns about a project-driven abort function (similar to the Killer Trickle, which has been in CPDN for years); what I was worried about is a function where the manager itself automatically purges workunits because of deadline issues and so forth. CPDN doesn't pay attention to deadlines, so that would be inappropriate. Similarly, CPDN doesn't have a canonical result, so aborting a model because of another model in the same batch wouldn't be appropriate either (each individual result within the WU has its own random seed).

Ingleside · Joined: 11 May 07 · Posts: 8
Message 10274 - Posted: 15 May 2007, 22:12:38 UTC - in response to Message 10263.

Quote: "Can you clarify this pls? Does the client pick up an xml of WUs that are to be cancelled? How does the client know if the particular task it's running is to be cancelled or not? Or is it only tasks that have expired that can be cancelled?"

The list of results (if any) to abort is generated by the scheduling server and included as part of any normal scheduler reply, as long as the project has enabled aborting of results.
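
Since the abort list simply rides along in the scheduler reply, the client-side step is just extracting two lists of result names. A minimal Python sketch, assuming each flag appears as an element named after it with a <name> child; that layout and the sample result names are assumptions for illustration, not a verified copy of BOINC's reply format.

```python
import xml.etree.ElementTree as ET

# Illustrative scheduler-reply fragment; element layout is an assumption.
REPLY = """<scheduler_reply>
    <result_abort><name>wu_123_0</name></result_abort>
    <result_abort_if_not_started><name>wu_456_1</name></result_abort_if_not_started>
    <result_abort_if_not_started><name>wu_789_0</name></result_abort_if_not_started>
</scheduler_reply>"""

def abort_lists(reply_xml: str):
    """Return (result_abort names, result_abort_if_not_started names)."""
    root = ET.fromstring(reply_xml)

    def names(tag):
        # findall matches direct children with exactly this tag name
        return [el.findtext("name", "").strip() for el in root.findall(tag)]

    return names("result_abort"), names("result_abort_if_not_started")

print(abort_lists(REPLY))
```

A reply from a project that hasn't enabled aborting simply contains neither element, so both lists come back empty and nothing is aborted.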

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.