Server cancel for an already running task?

Author	Message
Dave Help desk expert Joined: 28 Jun 10 Posts: 2533	Message 100416 - Posted: 23 Aug 2020, 8:08:44 UTC
Not sure all projects do the same with this. CPDN has cancelled tasks a few times in my memory because there are errors in some of the files and it is known they will all fail, but I don't think they ever cancel for other reasons. Pretty sure some other projects are rather less lax in their policies.
ID: 100416

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5080	Message 100418 - Posted: 23 Aug 2020, 8:31:25 UTC
If a task has not even started by the time the deadline is reached, the local client (not the server) will abort it. If the task has been started, the client will let it run to completion.
As Dave said, server instructions will vary according to the policy of the individual project concerned. Most projects will simply let nature take its course. The only common exception is when a project realises that an entire batch of tasks has been prepared from faulty data, and has no scientific value. Under these circumstances, server operators have the power to abort a task at whatever stage it has reached. Most projects are extremely reluctant to use this power, and will only use it as a last resort.
Project decisions like that are taken on the basis of scientific need alone, and pay no attention to credit. The purpose of aborting a batch of tasks is to free up the machines from wasting time on useless work: it enables your machine to start working on replacement (and hopefully corrected) tasks as quickly as possible.
ID: 100418

robsmith Volunteer tester Help desk expert Joined: 25 May 09 Posts: 1283	Message 100423 - Posted: 23 Aug 2020, 14:59:54 UTC
The BOINC client on your computer is doing exactly what you asked it to do. A resource share of zero for a project means "This project is now a reserve project; only do work for this project when there is no work available from non-zero resource share projects". Since Milkyway is a non-zero project and has work, its tasks will run, unless there are ANY existing tasks from zero-share projects which are in danger of not running: BOINC is designed to allow those tasks to complete.
The fact that you ran into a problem where both Milkyway and WCG tasks are in danger of timing out suggests to me that your cache settings are too generous, and that you had a very large number of WCG tasks sitting in the cache when you started to run MW and changed the resource shares for both projects.
As you are aware, there are two figures associated with cache size: "Store at least x days" and "Store up to an additional y days". Set x to a low integer, say 1 or 2; set y to a MAXIMUM of 10% of x (if you have a fairly permanent internet connection you should go even lower, to 1% of x). As a guide, I normally run a cache of 2 days, plus an additional 0.01 day.
Looking at the WCG tasks in hand, their deadlines appear to be about 7 days, and Milkyway's about two weeks, so I'm guessing your total cache was something in the region of 10-15 days, which is way too big.
ID: 100423
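
A minimal sketch of the rule of thumb above, for anyone who wants to sanity-check their own numbers. The function name `cache_warnings` and the test that the total cache must be shorter than the shortest deadline are assumptions made for illustration; neither is part of BOINC.

```python
def cache_warnings(store_at_least, store_additional, shortest_deadline):
    """Check a work-cache setting; all arguments are in days.

    Returns a list of human-readable warnings (empty if nothing looks wrong).
    """
    warnings = []
    # Rule of thumb from the post: "additional" should be at most 10% of "at least".
    if store_additional > 0.10 * store_at_least:
        warnings.append("additional buffer exceeds 10% of 'store at least'")
    # Assumption for this sketch (not a BOINC rule): a cache as long as the
    # shortest task deadline is too big.
    if store_at_least + store_additional >= shortest_deadline:
        warnings.append("total cache is not shorter than the shortest deadline")
    return warnings


print(cache_warnings(2, 0.01, 7))  # the settings recommended in the post
print(cache_warnings(10, 5, 7))    # an oversized cache against a 7-day deadline
```

The first call produces no warnings; the second trips both checks.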

robsmith Volunteer tester Help desk expert Joined: 25 May 09 Posts: 1283	Message 100427 - Posted: 23 Aug 2020, 21:42:25 UTC
OK, you've got the cache round the wrong way. NEVER use 0 for "store at least": this really is "store zero", and the "store additional" value then ends up ruling the cache. This causes some strange effects. BOINC uses the "store at least" value to determine how much data to download, and the "store additional" value to determine how often it will contact the servers, unless you run out of work before then. Set "store at least" to a low value and "store additional" to an even smaller one; something more realistic would be store at least 1 and store an extra 0.01. I'm sure you've been told this before, but you have ignored those that have gone through the pain.
Now to the other part of your problem: this is down to how Milkyway's application works. It is designed to use all available cores on a CPU, up to the maximum permitted by BOINC. In your case this is four; the Milkyway application is configured to run on ONLY four cores, and this would appear to be set at the time of download, not at the time of execution. On an 8-core processor I set BOINC to only use two cores, got a few tasks and set "no new tasks". These ran one task at a time, and continued to do so when I upped the allowed core count to five, but tasks downloaded after the change ran quite happily on five cores. I then allowed new tasks and got a few more; these would not run concurrently with a single "two-core" task that was already running, however two "two-core" tasks would run concurrently. To me this looks to be a similar thing to tasks being assigned to a GPU at the time of download not being readily moveable onto a CPU (and vice versa): the execution environment for a task is defined at the time of download, not at the time of execution.
Obsolete tasks - well, not really, as many projects actually require at least two people to execute the same task so that validation can occur; this is a project-level option. The BOINC client already has a safety margin built in. I can't remember if it is expressed in terms of a fixed time or as a fraction of the cache size (as defined by "store at least"), but at a guess it's a fixed number of hours, possibly 24.
Given the fact that the Milkyway tasks appear to run in a matter of an hour or two with a two WEEK deadline, and WCG tasks take about 2 hours with a deadline of about a week, you must have had far more WCG tasks around when you did the switch. With a full 3-hour cache of WCG tasks (0 + 3 hours), according to your cache setting you would have had only the four active tasks plus a maximum of four in the cache. That is a maximum of 16 hours of execution time to clear the cache of WCG work, and 16 hours is a lot less than the ~168 hours to their deadline. During this time a number of Milkyway tasks would be downloaded, and these, having been configured at the time of download to need 4 cores, would wait until four cores became available: a maximum of 16 hours until the first one could run, well within the 336 hours to their deadline. BUT if you actually had more WCG tasks around because of an over-inflated cache, then tasks could well run into their deadlines.
Just now, with a "store at least" of 3 days, a "store additional" of 0.01 and BOINC running on 5 cores, I have 87 WCG tasks running or waiting to run. This computer only runs about 8 hours a day, so that is almost exactly 3.5 days of work, nicely within the 7-day deadline for these tasks.
Now to your final misunderstanding: you have already been told that the projects DO NOT know what your machine is doing, and there is NO way of them doing so without both BOINC server & client being modified, and every project and user adopting the new version.
If you feel this is not how you want BOINC to behave, get over to GitHub and report this as a feature enhancement; describe EXACTLY what you want to achieve and why this is so important, and be prepared to argue your case.
ID: 100427
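
The deadline arithmetic in the post above can be reproduced with a few lines of Python. This is only a sketch: `days_to_clear` is a hypothetical helper, the 2-hour runtime is the rough estimate quoted in the post, and real runtimes vary from task to task.

```python
def days_to_clear(tasks, hours_per_task, cores, uptime_hours_per_day=24.0):
    """Estimate the calendar days needed to finish `tasks` queued tasks."""
    core_hours = tasks * hours_per_task   # total work if run one at a time
    wall_hours = core_hours / cores       # spread evenly over the usable cores
    return wall_hours / uptime_hours_per_day


# Four running plus four cached WCG tasks at about 2 hours each.
serial_hours = 8 * 2.0                    # the post's upper bound of 16 hours
days = days_to_clear(8, 2.0, cores=4)
print(f"{serial_hours:.0f} h serial, {days * 24:.0f} h on 4 cores, deadline 168 h")
```

Either way the queue clears far inside the 168-hour deadline. Passing `uptime_hours_per_day=8` models a machine that only crunches for part of the day.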

Dave Help desk expert Joined: 28 Jun 10 Posts: 2533	Message 100434 - Posted: 24 Aug 2020, 6:04:40 UTC
"By the way, the weighting sux. If I change the weighting, it doesn't take effect immediately, it takes some weird average over the last week and fluctuates everywhere. So if I have MW weight 2 and WCG 1 for a week, then change it to MW 1 WCG 2, it will only run WCG and no MW at all, because WCG is now way behind for my new setting. What it should do is delete any history of what it's done when I change the weighting, and start doing 2 WCG units to 1 MW unit."
Changing how the weighting works might help you and the projects you support, but would, I am almost certain, introduce problems for other projects and crunchers.
"I've argued it here, on the Boinc forums, which is why they're provided."
The way I see it, they are provided for discussion, and for those using BOINC, both crunchers and projects, to get help. They are not the forum for getting things changed, even if a post and subsequent discussion does sometimes get picked up by someone who goes over to GitHub and initiates a change.
ID: 100434

robsmith Volunteer tester Help desk expert Joined: 25 May 09 Posts: 1283	Message 100435 - Posted: 24 Aug 2020, 6:11:15 UTC
"Then it needs modified. If you hire a builder to work on your house and tell him to do it by Thursday, and he ends up running late and will take till Friday, he phones you up and asks you to wait a bit. You don't hire another builder, which is what projects are doing!"
"If you feel this is not how you want BOINC to behave get over to Github and report this as a feature enhancement, describe EXACTLY what you want to achieve and why this is so important, and be prepared to argue your case."
"I've argued it here, on the Boinc forums, which is why they're provided."
And have failed to gain much support, because you refuse to accept that there is little or no support for your desires, and you are asking in both the wrong manner and the wrong place.
I reiterate: The BOINC DEVELOPERS do not routinely monitor these forums for suggested upgrades. They DO monitor GitHub for both bugs and future development suggestions. These forums are for help and assistance.
END OF DEBATE.
ID: 100435

Bryn Mawr Help desk expert Joined: 31 Dec 18 Posts: 284	Message 100438 - Posted: 24 Aug 2020, 10:59:33 UTC - in response to Message 100428.
"By the way, the weighting sux. If I change the weighting, it doesn't take effect immediately, it takes some weird average over the last week and fluctuates everywhere. So if I have MW weight 2 and WCG 1 for a week, then change it to MW 1 WCG 2, it will only run WCG and no MW at all, because WCG is now way behind for my new setting. What it should do is delete any history of what it's done when I change the weighting, and start doing 2 WCG units to 1 MW unit."
If you want the weighting to take effect immediately, then edit the cc_config.xml file to set rec_half_life_days to zero (or very close; I’ve never tried zero) and you’ll get the behaviour you want.
ID: 100438
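
For reference, a sketch of what that edit could look like. `rec_half_life_days` goes inside the `<options>` section of cc_config.xml. The value 0.01 below is purely illustrative (the post itself has not tested zero), and the helper name `build_cc_config` is made up for this example. The script only prints the XML: merge the line into your existing file by hand rather than overwriting it, then restart the client (or ask it to re-read its config files) so the change is picked up.

```python
import xml.etree.ElementTree as ET


def build_cc_config(rec_half_life_days):
    """Return a minimal cc_config.xml document as a string."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Half-life, in days, of the recent-work average used for scheduling priority.
    ET.SubElement(options, "rec_half_life_days").text = str(rec_half_life_days)
    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# 0.01 days is an arbitrary "very close to zero" value chosen for illustration.
print(build_cc_config(0.01))
```

A very short half-life makes the client forget past work almost at once, so expect the mix of running projects to swing more sharply than with the default.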

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.