Backup projects

arkayn Send message Joined: 21 Mar 09 Posts: 33	Message 42151 - Posted: 20 Jan 2012, 4:59:43 UTC How are backup projects going to be implemented, since resource share is being ignored now? ID: 42151 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15483	Message 42153 - Posted: 20 Jan 2012, 6:17:17 UTC - in response to Message 42151. I don't understand, what is it you are asking? Mind elaborating so we don't have to guess at what you were thinking when you wrote the one sentence? ID: 42153 ·

arkayn Send message Joined: 21 Mar 09 Posts: 33	Message 42154 - Posted: 20 Jan 2012, 6:40:34 UTC With BOINC versions 6.10 and 6.12 you could set a resource share of "0", and work would only be fetched from that project when you were out of work on your primary projects. My primary projects are SETI and Milkyway with resource shares of 100, and I have Collatz, Einstein and PrimeGrid at resource shares of 0. I just installed 7.0.11, and BOINC starts requesting work from all projects regardless of the resource share.
1/19/2012 21:04:46 | Collatz Conjecture | Sending scheduler request: To fetch work.
1/19/2012 21:04:46 | Collatz Conjecture | Requesting new tasks for ATI
1/19/2012 21:04:48 | Collatz Conjecture | Scheduler request completed: got 1 new tasks
1/19/2012 21:04:51 | Collatz Conjecture | Started download of collatz_2373591143462694791528_824633720832
1/19/2012 21:04:54 | Collatz Conjecture | Finished download of collatz_2373591143462694791528_824633720832
1/19/2012 21:04:54 | Milkyway@Home | Sending scheduler request: To fetch work.
1/19/2012 21:04:54 | Milkyway@Home | Reporting 1 completed tasks, requesting new tasks for ATI
1/19/2012 21:04:56 | Milkyway@Home | Scheduler request completed: got 1 new tasks
1/19/2012 21:05:02 | PrimeGrid | Sending scheduler request: To fetch work.
1/19/2012 21:05:02 | PrimeGrid | Requesting new tasks for ATI
1/19/2012 21:05:04 | PrimeGrid | Scheduler request completed: got 1 new tasks
1/19/2012 21:05:15 | PrimeGrid | Sending scheduler request: To fetch work.
1/19/2012 21:05:15 | PrimeGrid | Requesting new tasks for ATI
1/19/2012 21:05:17 | PrimeGrid | Scheduler request completed: got 1 new tasks
1/19/2012 21:05:27 | PrimeGrid | Sending scheduler request: To fetch work.
1/19/2012 21:05:27 | PrimeGrid | Requesting new tasks for ATI
1/19/2012 21:05:30 | PrimeGrid | Scheduler request completed: got 1 new tasks
1/19/2012 21:05:41 | PrimeGrid | Sending scheduler request: To fetch work.
1/19/2012 21:05:41 | PrimeGrid | Requesting new tasks for ATI
1/19/2012 21:05:43 | PrimeGrid | Scheduler request completed: got 1 new tasks
1/19/2012 21:05:53 | Collatz Conjecture | Sending scheduler request: To fetch work.
1/19/2012 21:05:53 | Collatz Conjecture | Requesting new tasks for ATI
1/19/2012 21:05:55 | Collatz Conjecture | Scheduler request completed: got 1 new tasks
1/19/2012 21:05:58 | Collatz Conjecture | Started download of collatz_2373591165933963684200_824633720832
1/19/2012 21:05:59 | Collatz Conjecture | Finished download of collatz_2373591165933963684200_824633720832
I have since set NNT on those projects so they do not continually download work for now, but that does not help if I run out of work for one of my main projects. ID: 42154 ·
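
For readers unfamiliar with the "backup project" idea, the behaviour arkayn describes for 6.10 and 6.12 can be sketched roughly as follows. This is only an illustration of the rule as stated in the post, not BOINC's actual work-fetch code; the `Project` class, the `fetch_targets` function and the `queued_seconds` mapping are hypothetical names invented for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    resource_share: float
    no_new_tasks: bool = False  # "NNT" set by the user


def fetch_targets(projects, queued_seconds):
    """Return names of projects to ask for work under the zero-share rule.

    queued_seconds maps project name -> seconds of work already on the host
    (a hypothetical stand-in for whatever the client tracks internally).
    """
    eligible = [p for p in projects if not p.no_new_tasks]
    primary = [p for p in eligible if p.resource_share > 0]
    backup = [p for p in eligible if p.resource_share == 0]

    primary_work = sum(queued_seconds.get(p.name, 0.0) for p in primary)
    if primary_work > 0:
        # Primary projects still have work on hand: backups are not asked.
        return [p.name for p in primary]
    # Out of primary work: keep asking primaries and fall back to backups.
    return [p.name for p in primary + backup]


projects = [
    Project("SETI@home", 100),
    Project("Milkyway@Home", 100),
    Project("Collatz Conjecture", 0),
    Project("Einstein@Home", 0),
    Project("PrimeGrid", 0),
]
```

With any primary work queued, `fetch_targets(projects, {"Milkyway@Home": 3600.0})` names only the two primary projects; with an empty queue it names all five. The log above shows 7.0.11 asking Collatz and PrimeGrid even though Milkyway is still delivering tasks, which is what this rule would have prevented.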

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15483	Message 42157 - Posted: 20 Jan 2012, 9:34:56 UTC BOINC 7 uses a completely new method of calculating debt, with a from the ground up newly written scheduler. So the moment you installed 7.0.11, all previous debt values were forgotten, they're no longer used by this BOINC. So BOINC downloads one task from all the projects that are allowed to fetch work. You can easily set them back to ANT, as it won't do it again. ID: 42157 ·

arkayn Send message Joined: 21 Mar 09 Posts: 33	Message 42158 - Posted: 20 Jan 2012, 15:17:49 UTC - in response to Message 42157. BOINC 7 uses a completely new method of calculating debt, with a from the ground up newly written scheduler. So the moment you installed 7.0.11, all previous debt values were forgotten, they're no longer used by this BOINC. So BOINC downloads one task from all the projects that are allowed to fetch work. You can easily set them back to ANT, as it won't do it again.
I don't think that is going to work because Milkyway does not allow for a large cache, currently it is 35 WU per GPU and BOINC is trying to fill the cache to the specified settings of 10 days and keeps requesting work from the other projects to do it.
1/20/2012 08:08:53 | PrimeGrid | work fetch resumed by user
1/20/2012 08:09:27 | Milkyway@Home | Computation for task ps_separation_82_2s_mix0_3_1716631_0 finished
1/20/2012 08:09:27 | Milkyway@Home | Starting task ps_separation_82_2s_mix4_3_1715967_0 using milkyway version 82 (ati14ati)
1/20/2012 08:09:27 | Milkyway@Home | Sending scheduler request: To fetch work.
1/20/2012 08:09:27 | Milkyway@Home | Reporting 1 completed tasks, requesting new tasks for ATI
1/20/2012 08:09:29 | Milkyway@Home | Scheduler request completed: got 1 new tasks
1/20/2012 08:09:34 | PrimeGrid | Sending scheduler request: To fetch work.
1/20/2012 08:09:34 | PrimeGrid | Requesting new tasks for ATI
1/20/2012 08:09:37 | PrimeGrid | Scheduler request completed: got 1 new tasks
1/20/2012 08:09:44 | PrimeGrid | work fetch suspended by user
http://boinc.berkeley.edu/dev/sim_web.php?action=show_simulation&scen=50&sim=0 ID: 42158 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15483	Message 42159 - Posted: 20 Jan 2012, 15:26:51 UTC - in response to Message 42158. I don't think that is going to work because Milkyway does not allow for a large cache, currently it is 35 WU per GPU and BOINC is trying to fill the cache to the specified settings of 10 days and keeps requesting work from the other projects to do it. Yes, that's normal in this sense. Compare it to installing BOINC cleanly on a new system. Then all the projects will also try to fetch work for the full cache, including those with an RS of zero, since they haven't ever had to do work yet and BOINC doesn't know the RS for the projects yet. Your BOINC 7 knows nothing of the previous resource share settings. It has to learn again from the ground up. This will take time, so it's better to adjust the cache you have to something less. As long as you let BOINC learn by itself, and depending on the amount of projects you run, it can take a week for things to adjust and go back to 'normal'. ID: 42159 ·

zombie67 Send message Joined: 14 Feb 06 Posts: 139	Message 42165 - Posted: 21 Jan 2012, 3:12:51 UTC I am a bit confused. Are you saying that (after downloading a single task), BOINC 7.x will continue to request work from a project with a resource share of zero? At least until it somehow learns not to do so? Regardless of cache size, BOINC should not download work from a zero RS project until it is completely out of work, and there is no other way to prevent idle time. Right? Also, how does one get 7.x to actually d/l enough tasks to fill the cache to the set amount? I can't get any more tasks than just enough to keep the CPUs & GPUs running. Reno, NV Team: SETI.USA ID: 42165 ·

SekeRob2 Send message Joined: 6 Jul 10 Posts: 585	Message 42167 - Posted: 21 Jan 2012, 10:55:14 UTC - in response to Message 42165. The current test clients practically only listen to the "connect every xx days" for caching. The Additional Buffer does nothing [from the regular user perspective]. --//-- ID: 42167 ·

zombie67 Send message Joined: 14 Feb 06 Posts: 139	Message 42169 - Posted: 21 Jan 2012, 17:00:13 UTC - in response to Message 42167. The current test clients practically only listen to the "connect every xx days" for caching. The Additional Buffer does nothing [from the regular user perspective]. --//-- Ah, well that explains it. I have mine set to connect every zero days + 1 day. Reno, NV Team: SETI.USA ID: 42169 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15483	Message 42170 - Posted: 21 Jan 2012, 18:46:02 UTC - in response to Message 42165. I am a bit confused. Are you saying that (after downloading a single task), BOINC 7.x will continue to request work from a project with a resource share of zero? At least until it somehow learns not to do so? Regardless of cache size, BOINC should not download work from a zero RS project until it is completely out of work, and there is no other way to prevent idle time. Right?
Yes, but in this case the new BOINC does not know much of anything about all your projects. It is as if it contacts them for the very first time, and it will therefore ask for 1 second's worth of work from all projects eligible to fetch work. It will respect projects on NNT. And sorry if I said this earlier, or if people understood it this way, but when your work request is for 10 days it won't follow that initially. As I said, it'll do 1 second requests for work.
As for the cache being broken, I just increased my additional work request from 0.5 days to 3.5 days and left my "connect to" at 0.1 days. Normally, on a connect to of 0.1 and an additional day's worth, you will not get a work request like this:
21/01/2012 19:41:24 | SETI@home | [sched_op] ATI work request: 309910.78 seconds; 0.00 devices
(3.5 x 86400 = 302400 seconds, so the work request is correct)
21/01/2012 19:41:27 | SETI@home | Scheduler request completed: got 3 new tasks
21/01/2012 19:41:27 | SETI@home | [sched_op] estimated total ATI task duration: 85898 seconds
Whether or not I can get that work in is another thing. Load at Seti is maxed out for now. ;) ID: 42170 ·
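
The arithmetic in the post above is easy to reproduce. The snippet below is just a worked check of the numbers quoted from the log, assuming nothing beyond 86400 seconds per day; `days_to_seconds` is a hypothetical helper name, and the snippet does not attempt to explain how the client arrived at 309910.78.

```python
SECONDS_PER_DAY = 86400


def days_to_seconds(days):
    """Convert a cache setting expressed in days to seconds."""
    return days * SECONDS_PER_DAY


additional_buffer = days_to_seconds(3.5)  # the "additional work" setting
requested = 309910.78                     # from the [sched_op] log line

# Seconds covered by the 3.5 day setting, and how far the request exceeds it.
print(additional_buffer)
print(round(requested - additional_buffer, 2))
```

The first value matches the 302400 seconds in the post; the second is the part of the request not covered by the 3.5 day setting.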

zombie67 Send message Joined: 14 Feb 06 Posts: 139	Message 42173 - Posted: 21 Jan 2012, 21:07:03 UTC Hmm. With 0 + 1, boinc will not download any work beyond just enough to keep all the cores and GPUs busy. No additional cache at all. Maybe "0" breaks the new logic somehow. I just changed it to .5 + .5 and it is downloading more work now. A resource share of 0 is not working. It is continuing to download work, even when there is plenty of work from projects with a RS > 0. Reno, NV Team: SETI.USA ID: 42173 ·

Claggy Send message Joined: 23 Apr 07 Posts: 1112	Message 42174 - Posted: 21 Jan 2012, 21:11:15 UTC - in response to Message 42173. Last modified: 21 Jan 2012, 21:12:47 UTC Hmm. With 0 + 1, boinc will not download any work beyond just enough to keep all the cores and GPUs busy. No additional cache at all. Maybe "0" breaks the new logic somehow. I just changed it to .5 + .5 and it is downloading more work now. A resource share of 0 is not working. It is continuing to download work, even when there is plenty of work from projects with a RS > 0. Try to think of the two cache settings as a minimum and a maximum amount of cache that you'll have; that's how I think it works, from what David has said. Claggy ID: 42174 ·

SekeRob2 Send message Joined: 6 Jul 10 Posts: 585	Message 42175 - Posted: 22 Jan 2012, 2:16:05 UTC - in response to Message 42174. Hmm. With 0 + 1, boinc will not download any work beyond just enough to keep all the cores and GPUs busy. No additional cache at all. Maybe "0" breaks the new logic somehow. I just changed it to .5 + .5 and it is downloading more work now. A resource share of 0 is not working. It is continuing to download work, even when there is plenty of work from projects with a RS > 0. Try to think of the two cache settings as a minimum and a maximum amount of cache that you'll have; that's how I think it works, from what David has said. Claggy Yes, and what comes out of testing is that unless total WIP drops below the lower [non-zero, "connect every"] limit, increasing the cache through the additional buffer setting has no effect. If you know that a device will be offline for 7 days, for example, you have to increase both the "connect every" value, to above the present WIP in the queue, and the "additional buffer" value. To give it the KISS. --//-- ID: 42175 ·
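
The "minimum and maximum" reading of the two cache settings can be written down as a small hysteresis rule. This is a sketch of the behaviour as described in these posts, not the client's real algorithm; in particular, treating the maximum as "connect every" plus "additional buffer" is an assumption, and `work_request_seconds` is a hypothetical function name.

```python
SECONDS_PER_DAY = 86400


def work_request_seconds(queued_days, connect_every_days, additional_days):
    """Seconds of work to request under a low/high water mark rule.

    Assumption: the low mark is the "connect every" setting and the high
    mark is "connect every" plus "additional buffer".
    """
    low_mark = connect_every_days
    high_mark = connect_every_days + additional_days
    if queued_days >= low_mark:
        # Queue is at or above the low mark: no fetch, whatever the high mark is.
        return 0.0
    # Queue fell below the low mark: ask for enough to reach the high mark.
    return (high_mark - queued_days) * SECONDS_PER_DAY


# zombie67's 0 + 1 setting: any queued work already satisfies a low mark of 0.
print(work_request_seconds(0.2, 0.0, 1.0))
# After changing to .5 + .5, the same queue is below the low mark.
print(work_request_seconds(0.2, 0.5, 0.5))
```

Under this rule a "connect every" of zero never triggers a refill while any work is queued, which is consistent with the 0 + 1 report above, and raising only the additional buffer changes nothing until the queue drops below the low mark.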

SekeRob2 Send message Joined: 6 Jul 10 Posts: 585	Message 42176 - Posted: 22 Jan 2012, 8:24:24 UTC - in response to Message 42175. Last modified: 22 Jan 2012, 8:32:59 UTC Question for Ageless: have you tried your 0.1 days CE, 3.0 days AD setting on a multicore device, say a duo, quad or octo CPU, at a project that limits the work given out per call to, say, a maximum of 15 work units, which does not total the 309,910.78 seconds? I just tried on an octo, increasing the AD to 3.5 days and raising the CE to 2.75 days, and got 15 tasks, which lifted the cache from 2.25 to 3.03 days:
4863 World Community Grid 22-1-2012 8:57:00 Requesting new tasks for CPU
4864 World Community Grid 22-1-2012 8:57:00 [sched_op] CPU work request: 2235483.27 seconds; 0.00 devices
4865 World Community Grid 22-1-2012 8:57:05 Scheduler request completed: got 15 new tasks
4866 World Community Grid 22-1-2012 8:57:05 [sched_op] Server version 601
4867 World Community Grid 22-1-2012 8:57:05 Project requested delay of 11 seconds
4868 World Community Grid 22-1-2012 8:57:05 [sched_op] estimated total CPU task duration: 337401 seconds
4869 World Community Grid 22-1-2012 8:57:05 [sched_op] Deferring communication for 11 sec
after which things fell silent, and a manual update got:
4931 World Community Grid 22-1-2012 8:58:41 update requested by user
4932 World Community Grid 22-1-2012 8:58:42 [sched_op] Starting scheduler request
4933 World Community Grid 22-1-2012 8:58:42 Sending scheduler request: Requested by user.
4934 World Community Grid 22-1-2012 8:58:42 Not reporting or requesting tasks
4935 World Community Grid 22-1-2012 8:58:42 [sched_op] CPU work request: 0.00 seconds; 0.00 devices
No more work fetching, as the minimum CE was satisfied at 2.75 days [actually 3.03 days was the balance after the 15].
I truly have no idea if the GPU scheduling follows the same logic as in the SETI demonstration, but if 3 tasks in one call fill 302,400 seconds, then there are still 7510.78 seconds to fill [old clients would]. As I wrote on the alpha mailing list: given the hysteresis function the AD setting has taken on [per John McLeod's reply] and a 10 day deadline such as at WCG, you'll find that once you've raised the CE value to near 5.0 days, work fetching stops altogether and high-priority processing on all cores starts working on what was fetched... requesting zero seconds of CPU time. No chance to get 7 days of work, take the machine offline and come back a week later to upload the results. Not with 7.0.7, not with 7.0.11.
Of course there is the 'simple' workaround... I've put on a second client, a 6.10.58 install, so 7 day caching is possible without causing a panic state in the scheduler. The on_frac and active_frac values near 1.0 tell it things are OK to have 7 days of work. I know how to do this [I have 4 clients, in fact, on 1 machine, which can be launched at will], but the unwitting upgrading volunteer cruncher won't. --//-- ID: 42176 ·

Gary Roberts Send message Joined: 7 Sep 05 Posts: 130	Message 42177 - Posted: 22 Jan 2012, 9:40:30 UTC - in response to Message 42176. ... but the unwitting upgrading volunteer cruncher won't. This person shouldn't be messing with 7.0.X in the first place. However, there are experienced people (like myself) who are forced into doing so because of the needs of a particular project. I want to support the testing of the (under early development) OpenCL app for Einstein (through the Albert@Home test project) and their minimum requirement is 7.0.11. I'm browsing here because I noticed that (amongst other things) after installing 7.0.11 on a test machine, my cache of regular Einstein tasks was not refilling. Thanks to your earlier message, I now understand why. I'm very grateful for that information. I normally run with a CE setting of zero and the tooltip (local pref settings) for 7.0.11 still says to use zero for an 'always connected' host. Perhaps the tooltips for CE and CA need to be modified to reflect the new 'low water mark' and 'high water mark' status of the two cache settings. I'm still confused by the various 'answers' to arkayn's original question in this thread. Like him, I too use a resource share of zero to create a backup project that only gets to run if the primary project can't supply. Correct me if I'm wrong, but it seems to me that the backup project concept is (for the time being) collateral damage until something is done to re-engineer it. That's quite OK for the moment since 'production' people needing backup projects probably shouldn't be using 7.0.X for a while yet. I won't have backup projects on any test hosts I set up. Cheers, Gary. ID: 42177 ·

zombie67 Send message Joined: 14 Feb 06 Posts: 139	Message 42182 - Posted: 22 Jan 2012, 15:12:44 UTC - in response to Message 42177. I'm still confused by the various 'answers' to arkayn's original question in this thread. Like him, I too use a resource share of zero to create a backup project that only gets to run if the primary project can't supply. Correct me if I'm wrong, but it seems to me that the backup project concept is (for the time being) collateral damage until something is done to re-engineer it. Yes, I would like some clarity on this situation too. I can see that it is broken. But what is not clear to me is if that was intentional, or an unanticipated consequence of changing something else. Also, it is not clear to me if the developers are aware of the problem. Reno, NV Team: SETI.USA ID: 42182 ·

arkayn Send message Joined: 21 Mar 09 Posts: 33	Message 42283 - Posted: 27 Jan 2012, 16:46:51 UTC Last modified: 27 Jan 2012, 17:18:35 UTC Looks like there is a new version to check and it includes this item. - client: fix divide-by-zero bug in calculation of priority of projects with zero resource share [edit] Nope, still keeps requesting work from a zero resource share project while my cache is full of primary work. [/edit] ID: 42283 ·

zombie67 Send message Joined: 14 Feb 06 Posts: 139	Message 42367 - Posted: 31 Jan 2012, 4:10:38 UTC - in response to Message 42182. Yes, I would like some clarity on this situation too. I can see that it is broken. But what is not clear to me is if that was intentional, or an unanticipated consequence of changing something else. Also, it is not clear to me if the developers are aware of the problem. Ageless? Do you know? Or should we be taking this up with the development team directly? Reno, NV Team: SETI.USA ID: 42367 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15483	Message 42379 - Posted: 31 Jan 2012, 18:32:34 UTC - in response to Message 42367. Last modified: 31 Jan 2012, 18:33:26 UTC Yes, please take it up with development. David Anderson to be precise, he's the most knowledgeable at this time. ID: 42379 ·

arkayn Send message Joined: 21 Mar 09 Posts: 33	Message 42521 - Posted: 11 Feb 2012, 0:15:50 UTC - in response to Message 42379. Yes, please take it up with development. David Anderson to be precise, he's the most knowledgeable at this time. I just sent off an email to the Alpha list on this matter. ID: 42521 ·

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.