Message boards : BOINC client : Backup projects
Author | Message |
---|---|
Send message Joined: 21 Mar 09 Posts: 33 |
How are backup projects going to be implemented, since resource share is being ignored now? |
Send message Joined: 29 Aug 05 Posts: 15581 |
I don't understand, what is it you are asking? Mind elaborating so we don't have to guess at what you were thinking when you wrote the one sentence? |
Send message Joined: 21 Mar 09 Posts: 33 |
With BOINC versions 6.10 and 6.12 you could set a resource share of "0" and work would only be fetched when you were out of work on your primary projects. My primary projects are SETI and Milkyway with resource shares of 100, and I have Collatz, Einstein and Primegrid at resource shares of 0. I just installed 7.0.11 and BOINC starts requesting work from all projects regardless of the resource share: 1/19/2012 21:04:46 | Collatz Conjecture | Sending scheduler request: To fetch work. I have since set NNT (No New Tasks) on those projects so they do not continually download work for now, but that does not help if I run out of work for one of my main projects. |
Send message Joined: 29 Aug 05 Posts: 15581 |
BOINC 7 uses a completely new method of calculating debt, with a scheduler newly written from the ground up. So the moment you installed 7.0.11, all previous debt values were forgotten; they're no longer used by this BOINC. So BOINC downloads one task from each of the projects that are allowed to fetch work. You can safely set them back to ANT (Allow New Tasks), as it won't do it again. |
Send message Joined: 21 Mar 09 Posts: 33 |
[quote] BOINC 7 uses a completely new method of calculating debt, with a from the ground up newly written scheduler. [/quote] I don't think that is going to work, because Milkyway does not allow for a large cache (currently it is 35 WU per GPU), and BOINC is trying to fill the cache to the specified setting of 10 days and keeps requesting work from the other projects to do it. 1/20/2012 08:08:53 | PrimeGrid | work fetch resumed by user http://boinc.berkeley.edu/dev/sim_web.php?action=show_simulation&scen=50&sim=0 |
Send message Joined: 29 Aug 05 Posts: 15581 |
[quote] I don't think that is going to work because Milkyway does not allow for a large cache, currently it is 35 WU per GPU and BOINC is trying to fill the cache to the specified settings of 10 days and keeps requesting work from the other projects to do it. [/quote] Yes, that's normal in this sense. Compare it to installing BOINC cleanly on a new system. Then all the projects will also try to fetch work for the full cache, including those with an RS of zero, since they haven't had to do any work yet and BOINC doesn't know the RS for the projects yet. Your BOINC 7 knows nothing of the previous resource share settings. It has to learn again from the ground up. This will take time, so it's better to reduce your cache setting. As long as you let BOINC learn by itself, and depending on the number of projects you run, it can take a week for things to adjust and go back to 'normal'. |
Send message Joined: 14 Feb 06 Posts: 139 |
I am a bit confused. Are you saying that (after downloading a single task) BOINC 7.x will continue to request work from a project with a resource share of zero? At least until it somehow learns not to do so? Regardless of cache size, BOINC should not download work from a zero RS project until it is completely out of work and there is no other way to prevent idle time. Right? Also, how does one get 7.x to actually download enough tasks to fill the cache to the set amount? I can't get any more tasks than just enough to keep the CPUs & GPUs running. Reno, NV Team: SETI.USA |
Send message Joined: 6 Jul 10 Posts: 585 |
The current test clients practically only listen to the "connect every xx days" for caching. The Additional Buffer does nothing [from the regular user perspective]. --//-- |
Send message Joined: 14 Feb 06 Posts: 139 |
[quote] The current test clients practically only listen to the "connect every xx days" for caching. The Additional Buffer does nothing [from the regular user perspective]. [/quote] Ah, well that explains it. I have mine set to connect every zero days + 1 day. Reno, NV Team: SETI.USA |
Send message Joined: 29 Aug 05 Posts: 15581 |
[quote] I am a bit confused. Are you saying that (after downloading a single task), BOINC 7.x will continue to request work from a project with a resource share of zero? At least until it somehow learns not to do so? [/quote] Yes, but in this case the new BOINC does not know much of anything about your projects. It is as if it contacts them for the very first time, and it will therefore ask for 1 second's worth of work from all projects eligible to ask for work. It will respect projects on NNT. And sorry if I said this earlier, or if people understood it that way, but when your work request is for 10 days it won't follow that initially. As I said, it'll do 1-second requests for work. As for the cache being broken, I just increased my additional work request from 0.5 days to 3.5 days and left my connect-to at 0.1 days. Normally on a connect-to of 0.1 and an additional days worth you will not get a work request like this: 21/01/2012 19:41:24 | SETI@home | [sched_op] ATI work request: 309910.78 seconds; 0.00 devices (3.5 x 86400 = 302400 seconds, so the work request is correct) 21/01/2012 19:41:27 | SETI@home | Scheduler request completed: got 3 new tasks 21/01/2012 19:41:27 | SETI@home | [sched_op] estimated total ATI task duration: 85898 seconds Whether or not I can get that work in is another thing. Load at SETI is maxed out for now. ;) |
Send message Joined: 14 Feb 06 Posts: 139 |
Hmm. With 0 + 1, BOINC will not download any work beyond just enough to keep all the cores and GPUs busy. No additional cache at all. Maybe "0" breaks the new logic somehow. I just changed it to .5 + .5 and it is downloading more work now. A resource share of 0 is not working. It is continuing to download work, even when there is plenty of work from projects with an RS > 0. Reno, NV Team: SETI.USA |
Send message Joined: 23 Apr 07 Posts: 1112 |
[quote] Hmm. With 0 + 1, boinc will not download any work beyond just enough to keep all the cores and GPUs busy. No additional cache at all. Maybe "0" breaks the new logic somehow. I just changed it to .5 + .5 and it is downloading more work now. [/quote] Try to think of the two cache settings as the minimum and maximum amount of cache that you'll have; that's how I think it works, from what David has said. Claggy |
Send message Joined: 6 Jul 10 Posts: 585 |
[quote] Hmm. With 0 + 1, boinc will not download any work beyond just enough to keep all the cores and GPUs busy. No additional cache at all. Maybe "0" breaks the new logic somehow. I just changed it to .5 + .5 and it is downloading more work now. [/quote] Yes, and what comes out of testing is that unless total WIP drops below the lower [non-zero, 'connect every'] limit, increasing the cache through the additional buffer setting has *no* effect. If you know that a device will be off-line for 7 days, for example, you have to increase both the 'connect every' value, to above the WIP presently in the queue, and the 'additional buffer' value. To give it the KISS. --//-- |
Send message Joined: 6 Jul 10 Posts: 585 |
Question for Ageless: have you tried your 0.1 days CE, 3.0 days AD setting on a multicore device, say a duo, quad or octo CPU device, at a project that limits the work given out per call to, say, a maximum of 15 work units, which does not total the 309,910.78 seconds? I just tried on an octo, increasing the AD to 3.5 days and raising the CE to 2.75 days, and got 15, which lifted the cache from 2.25 to 3.03 days: 4863 World Community Grid 22-1-2012 8:57:00 Requesting new tasks for CPU 4864 World Community Grid 22-1-2012 8:57:00 [sched_op] CPU work request: 2235483.27 seconds; 0.00 devices 4865 World Community Grid 22-1-2012 8:57:05 Scheduler request completed: got 15 new tasks 4866 World Community Grid 22-1-2012 8:57:05 [sched_op] Server version 601 4867 World Community Grid 22-1-2012 8:57:05 Project requested delay of 11 seconds 4868 World Community Grid 22-1-2012 8:57:05 [sched_op] estimated total CPU task duration: 337401 seconds 4869 World Community Grid 22-1-2012 8:57:05 [sched_op] Deferring communication for 11 sec after which things fell silent, and a manual update got: 4931 World Community Grid 22-1-2012 8:58:41 update requested by user 4932 World Community Grid 22-1-2012 8:58:42 [sched_op] Starting scheduler request 4933 World Community Grid 22-1-2012 8:58:42 Sending scheduler request: Requested by user. 4934 World Community Grid 22-1-2012 8:58:42 Not reporting or requesting tasks 4935 World Community Grid 22-1-2012 8:58:42 [sched_op] CPU work request: 0.00 seconds; 0.00 devices No more work fetching, as the minimum CE of 2.75 days was satisfied [actually 3.03 days was the balance after the 15]. 
I truly have no idea whether GPU scheduling follows the same logic as in the SETI demonstration, but if 3 in one call fills 302,400 seconds, then there are still 7510.78 seconds to fill [old clients would]. As I wrote on the alpha mailing list: given the hysteresis function the AD setting has taken on [per John McLeod's reply] and a 10-day deadline such as at WCG, you'll find that once you've raised the CE value to near 5.0 days, work fetching stops altogether and high-priority processing of the fetched tasks starts on all cores... requesting zero seconds of CPU time. No chance to get 7 days of work, take the machine offline and come back a week later to upload the results. Not with 7.0.7, not with 7.0.11. Of course there is the 'simple' workaround... I've put on a second client, a 6.10.58 install, so 7-day caching is possible without causing a panic state in the scheduler. The on_frac and active_frac values near 1.0 indicate that it is OK to hold 7 days of work. I know how to do this [got 4 clients in fact on 1 machine which can be launched at will], but the unwitting upgrading volunteer cruncher won't. --//-- |
Send message Joined: 7 Sep 05 Posts: 130 |
[quote] ... but the unwitting upgrading volunteer cruncher won't. [/quote] This person shouldn't be messing with 7.0.X in the first place. However, there are experienced people (like myself) who are forced into doing so because of the needs of a particular project. I want to support the testing of the (under early development) OpenCL app for Einstein (through the Albert@Home test project) and their minimum requirement is 7.0.11. I'm browsing here because I noticed that (amongst other things) after installing 7.0.11 on a test machine, my cache of regular Einstein tasks was not refilling. Thanks to your earlier message, I now understand why. I'm very grateful for that information. I normally run with a CE setting of zero, and the tooltip (local pref settings) for 7.0.11 still says to use zero for an 'always connected' host. Perhaps the tooltips for CE and CA need to be modified to reflect the new 'low water mark' and 'high water mark' status of the two cache settings. I'm still confused by the various 'answers' to arkayn's original question in this thread. Like him, I too use a resource share of zero to create a backup project that only gets to run if the primary project can't supply. Correct me if I'm wrong, but it seems to me that the backup project concept is (for the time being) collateral damage until something is done to re-engineer it. That's quite OK for the moment, since 'production' people needing backup projects probably shouldn't be using 7.0.X for a while yet. I won't have backup projects on any test hosts I set up. Cheers, Gary. |
Send message Joined: 14 Feb 06 Posts: 139 |
[quote] I'm still confused by the various 'answers' to arkayn's original question in this thread. Like him, I too use a resource share of zero to create a backup project that only gets to run if the primary project can't supply. Correct me if I'm wrong, but it seems to me that the backup project concept is (for the time being) collateral damage until something is done to re-engineer it. [/quote] Yes, I would like some clarity on this situation too. I can see that it is broken. But what is not clear to me is whether that was intentional or an unanticipated consequence of changing something else. Also, it is not clear to me whether the developers are aware of the problem. Reno, NV Team: SETI.USA |
Send message Joined: 21 Mar 09 Posts: 33 |
Looks like there is a new version to check, and it includes this item: - client: fix divide-by-zero bug in calculation of priority of projects with zero resource share [edit] Nope, it still keeps requesting work from a zero resource share project while my cache is full of primary work. [/edit] |
Send message Joined: 14 Feb 06 Posts: 139 |
[quote] Yes, I would like some clarity on this situation too. I can see that it is broken. But what is not clear to me is if that was intentional, or an unanticipated consequence of changing something else. Also, it is not clear to me if the developers are aware of the problem. [/quote] Ageless? Do you know? Or should we be taking this up with the development team directly? Reno, NV Team: SETI.USA |
Send message Joined: 29 Aug 05 Posts: 15581 |
Yes, please take it up with development. David Anderson to be precise, he's the most knowledgeable at this time. |
Send message Joined: 21 Mar 09 Posts: 33 |
[quote] Yes, please take it up with development. David Anderson to be precise, he's the most knowledgeable at this time. [/quote] I just sent off an email to the Alpha list on this matter. |
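
For readers trying to follow the 'low water mark' / 'high water mark' reading of the two cache settings discussed in this thread, here is a minimal Python sketch of that hysteresis idea. It is not BOINC's actual work-fetch code. The function name `work_request_seconds`, its parameters, and the assumption that the high water mark equals 'connect every' plus 'additional buffer' are all hypothetical, chosen only to match what the posters report. Work fetched merely to keep idle cores or GPUs busy is not modelled.

```python
SECONDS_PER_DAY = 86400


def work_request_seconds(buffered_days, connect_every_days, additional_days, n_devices=1):
    """Seconds of work to request under a low/high water mark (hysteresis) model.

    Hypothetical illustration only, not taken from the BOINC source.
    """
    # Assumption: 'connect every' (CE) acts as the low water mark.
    low_water = connect_every_days
    # Assumption: CE plus 'additional buffer' (AD) acts as the high water mark.
    high_water = connect_every_days + additional_days
    if buffered_days >= low_water:
        # Buffer is still at or above the low water mark: request nothing.
        return 0.0
    # Below the low water mark: ask for enough to reach the high water mark.
    return (high_water - buffered_days) * SECONDS_PER_DAY * n_devices


if __name__ == "__main__":
    # CE 2.75 days, AD 3.5 days, 3.03 days already buffered: no request.
    print(work_request_seconds(3.03, 2.75, 3.5))
    # CE 0 days, AD 1 day, some work on hand: the buffer is never topped up.
    print(work_request_seconds(0.5, 0.0, 1.0))
    # CE 0.1 days, AD 3.5 days, empty buffer: roughly 3.6 days of work requested.
    print(work_request_seconds(0.0, 0.1, 3.5))
```

Under these assumptions, a CE of zero never triggers a top-up, which is consistent with the "0 + 1" observation above, and raising only the AD has no effect until the buffer falls below the CE value.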
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.