Message boards : Server programs : Won't finish in time ...
Author | Message |
---|---|
Joined: 27 Jun 06 Posts: 305 |
The calculation of the project resource share is (still) wrong: Message from server: (won't finish in time) Computer on 53.9% of time, BOINC on 100.0% of that, this project gets 2.7% of that. The 2.7% is simply wrong: the project in question currently has 50% on that machine, which equals 100% of one CPU. All other projects are inactive or set to "no new work". This bug hits only the few projects with a very short deadline, and only computers that are attached to lots of projects, so it is probably quite a rare problem. It has existed for years, though, so it would be nice if someone could take a look at it sometime. p.s.: I am aware that the server-side scheduler does not know about inactive projects on the client side, but producing an error message without knowing the facts doesn't sound right. |
Joined: 20 Dec 07 Posts: 1069 |
I agree with you, but as a "workaround", did you try suspending the inactive/NNT projects? I think I read somewhere that this might "convince" BOINC to stop counting them in the resource share. It might mess up the long term debt, though. Regards, Gundolf Computers aren't everything in life. (Just kidding) |
Joined: 27 Jun 06 Posts: 305 |
Suspending or setting to "no new work" doesn't make a difference; both types of inactive seem to be treated the same. I have now changed the 53.9% uptime fraction to 99% (the low value was a result of vacations) and detached from 2 projects that do not exist anymore anyway. Those two changes together were sufficient to allow new work. p.s.: I always use this box to test new projects, which is why it is attached to nearly all projects where I have an account, including test projects. One project (running on canis.csc.ncsu.edu) seems to have been just a test setup for Anansi; I never received any work from it. The other one was SciLINK, which was stopped due to the heavy traffic it had caused. p.p.s.: The concept of Long Term Debts is a mess anyway (especially in combination with CPDN), I have a workaround for it though, that resets all values to 0 on a client restart. |
Joined: 29 Aug 05 Posts: 304 |
"p.p.s.: The concept of Long Term Debts is a mess anyway (especially in combination with CPDN), I have a workaround for it though, that resets all values to 0 on a client restart." That won't work either. It ignores or abuses the participants who want to run 50 projects on a 500 MHz CPU. In that case there is no way to respect the resource shares in a short time. It may take years to balance, even with short tasks and long deadlines. The current system is not perfect; however, I have yet to see an alternative suggestion that is as good at handling all possible cases. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 27 Jun 06 Posts: 305 |
There is no need for lots of projects with a non-steady WU flow. I'll give a simple example where LTD is messy: On a dual-task machine, load one long-running CPDN model plus any other project for the second task. While the CPDN model runs, it piles up hundreds of thousands of LTD even though it crunches all the time, whereas the second project gets the same value as negative debt. After a month, add a third project. It will start with zero debt, but as the second project has piled up so much negative debt, the client will download only work from the third project for several weeks. The user will go to that project and blame the project developers because their project doesn't respect his share settings. The answer is always the same: Edit your client_state.xml and remove all lines with LTD tags. I have read lots of threads like that. IMO, the short term debts are good enough for the CPU time distribution; people understand that. It can easily happen that a project is supposed to crunch (positive STD) but isn't allowed to download work (negative LTD). As LTD affects the caching, this can even make the client ignore the cache settings completely and download a new WU only when the cache is totally empty. LTD works fine in only one situation: all projects get attached at the same time, no project gets a reset, and all projects deliver a constant WU flow. There are projects with long-running WUs out there, there are projects with a sparse flow of WUs, and there are projects that pop up and disappear after a few months, so BOINC needs to get along with those somehow. edit: I guess it would help to use the trickle_ups for the decay of LTDs on projects that use trickles. It would not solve the problem completely, but at least it would fix the major LTD problem that all CPDN crunchers have. |
Joined: 20 Jul 09 Posts: 6 |
Here is an idea to tweak the current debt setup: cap the long term debt at an amount based on estimated completion times, cache settings, system benchmarks, and the projects being run, and recalculate the cap whenever a project is added or deleted. This could possibly prevent the cache setting problems that occur when projects can't maintain a steady flow of work for whatever reason. It would also have the benefit of allowing people to add long-running workunit projects like AQUA or CPDN without messing up work for other projects.... |
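
For readers who want to check the numbers in the server message from the first post, here is a minimal sketch of the arithmetic. It assumes the three percentages are simply multiplied, as the wording "of that" suggests, and that the result is compared against the deadline. The task size and deadline are hypothetical values chosen for illustration, and the function names are not taken from the BOINC source.

```python
def effective_fraction(on_frac, active_frac, share_frac):
    """Fraction of wall-clock time the project is assumed to receive."""
    return on_frac * active_frac * share_frac


def wall_hours_needed(cpu_hours, on_frac, active_frac, share_frac):
    """Wall-clock hours needed to deliver cpu_hours of computing."""
    return cpu_hours / effective_fraction(on_frac, active_frac, share_frac)


# Figures from the server message: computer on 53.9%, BOINC on 100%.
ON_FRAC, ACTIVE_FRAC = 0.539, 1.0

# Hypothetical task: 6 CPU hours of work, due in 48 hours.
TASK_CPU_HOURS, DEADLINE_HOURS = 6.0, 48.0

cases = (
    ("share as reported by the server (2.7%)", 0.027),
    ("share as the poster sees it (50%)", 0.50),
)
for label, share in cases:
    need = wall_hours_needed(TASK_CPU_HOURS, ON_FRAC, ACTIVE_FRAC, share)
    verdict = "fits" if need <= DEADLINE_HOURS else "won't finish in time"
    print(f"{label}: {need:.1f} wall hours -> {verdict}")
```

Under these assumptions the 2.7% share leaves the project with under 1.5% of wall-clock time, so even a small task misses a short deadline, while the same task fits easily with a 50% share.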
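
The three-project scenario and the proposed cap can be compared with a toy simulation. This is only a sketch under stated assumptions, not the BOINC client's actual algorithm: debt grows by the CPU time a project is owed minus the time it used, debts are shifted to a zero mean, and the free CPU goes to whichever non-CPDN project has the higher debt. The starting debts, the equal resource shares, and the one-day cap are hypothetical; the thread does not say how a cap would be derived.

```python
def normalise(debts):
    """Shift all debts so that their mean is zero."""
    mean = sum(debts.values()) / len(debts)
    return {name: debt - mean for name, debt in debts.items()}


def clamp(debts, cap):
    """Limit every debt to the range [-cap, +cap]; cap=None means no limit."""
    if cap is None:
        return dict(debts)
    return {name: max(-cap, min(cap, debt)) for name, debt in debts.items()}


def hours_until_second_runs(initial, cap=None, n_cpus=2, step=3600.0,
                            max_hours=10_000):
    """Hours until the 'second' project is chosen for the free CPU."""
    debts = clamp(initial, cap)
    owed = n_cpus * step / len(debts)  # equal resource shares (assumption)
    for hour in range(max_hours):
        # CPDN always occupies one CPU; the other goes to the higher debt.
        runner = max(("second", "third"), key=lambda name: debts[name])
        if runner == "second":
            return hour
        used = {"cpdn": step, "second": 0.0, "third": 0.0}
        used[runner] += step
        debts = {name: debt + owed - used[name] for name, debt in debts.items()}
        debts = clamp(normalise(debts), cap)
    return None


# Hypothetical state after a month: large opposite debts, new project at zero.
start = {"cpdn": 300_000.0, "second": -300_000.0, "third": 0.0}
ONE_DAY = 86_400.0  # hypothetical cap, in CPU seconds

uncapped = hours_until_second_runs(start)
capped = hours_until_second_runs(start, cap=ONE_DAY)
print("hours until the second project runs, no cap:", uncapped)
print("hours until the second project runs, capped:", capped)
```

In this simplified model the cap shortens the period during which the second project gets no CPU time, which is the effect the suggestion in the last post is aiming for. How the real client would behave depends on details the sketch leaves out.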
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.