Thread 'Won't finish in time ...'

Message boards : Server programs : Won't finish in time ...
Ananas
Joined: 27 Jun 06
Posts: 305
Germany
Message 21298 - Posted: 17 Nov 2008, 8:26:50 UTC
Last modified: 17 Nov 2008, 8:29:06 UTC

The calculation of the project resource share is (still) wrong:

Message from server: (won't finish in time) Computer on 53.9% of time, BOINC on 100.0% of that, this project gets 2.7% of that


2.7% is simply wrong: the project in question currently has a 50% share on that machine, which equals 100% of one CPU.
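To make the arithmetic concrete, here is a minimal sketch that multiplies the three fractions exactly as the server message words them. It is not the scheduler's code; the function names are made up for illustration, and treating the product as "fraction of wall-clock time available to the project" is an assumption.

```python
def effective_fraction(on_frac, active_frac, share_frac):
    """Multiply the three fractions named in the server message."""
    return on_frac * active_frac * share_frac


def wall_clock_hours(cpu_hours, fraction):
    """Wall-clock time needed for a task if only `fraction` of time is available."""
    return cpu_hours / fraction


# Figures quoted in the server message: 53.9%, 100.0%, 2.7%
reported = effective_fraction(0.539, 1.0, 0.027)

# Same machine, but with the 50% share the project actually has
expected = effective_fraction(0.539, 1.0, 0.50)

print(f"reported fraction: {reported:.4f}")
print(f"expected fraction: {expected:.4f}")
print(f"1 CPU-hour task, reported share: {wall_clock_hours(1.0, reported):.1f} h")
print(f"1 CPU-hour task, expected share: {wall_clock_hours(1.0, expected):.1f} h")
```

Under these assumptions a one-hour task looks like it needs roughly 69 hours of wall-clock time with the 2.7% figure, against under 4 hours with a 50% share, which is why a short deadline trips the "won't finish in time" check.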

All other projects are inactive or set to "no new work".

This bug only hits the few projects with a very short deadline, and only on computers that are attached to lots of projects, so it is probably quite a rare problem.

It has existed for years, though, so it would be nice if someone could take a look at it sometime.


p.s.: I am aware that the server side scheduler does not know about inactive projects on the client side - but producing an error message without knowing the facts doesn't sound right.
Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 21300 - Posted: 17 Nov 2008, 9:43:52 UTC - in response to Message 21298.  

I agree with you, but as a "workaround", did you try suspending the inactive/NNT projects? I think I read somewhere that this might "convince" BOINC to stop counting them in the resource share. It might mess up the long-term debt, though.

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)
Ananas
Joined: 27 Jun 06
Posts: 305
Germany
Message 21313 - Posted: 18 Nov 2008, 8:24:16 UTC
Last modified: 18 Nov 2008, 8:46:32 UTC

Suspending a project or setting it to "no new work" makes no difference; both kinds of inactive project seem to be treated the same.

I have now changed the 53.9% uptime fraction to 99% (the low value was the result of vacations) and detached from 2 projects that no longer exist anyway.

Together, those two changes were sufficient to allow new work.


p.s.: I always use this box to test new projects, which is why it is attached to nearly all projects where I have an account, including test projects. One project (running on canis.csc.ncsu.edu) seems to have been just a test setup for Anansi; I never received any work from it. The other one was SciLINK, which was stopped because of the heavy traffic it caused.

p.p.s.: The concept of Long Term Debts is a mess anyway (especially in combination with CPDN). I have a workaround for it, though, that resets all values to 0 on a client restart.
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 21320 - Posted: 18 Nov 2008, 19:05:29 UTC - in response to Message 21316.  

p.p.s.: The concept of Long Term Debts is a mess anyway (especially in combination with CPDN), I have a workaround for it though, that resets all values to 0 on a client restart.


That fails to respect your resource shares. A better solution would be to add code to the scheduler (or some other module) that diagnoses the situation and prescribes corrective actions to the user. The corrective actions would be to either:

1) reduce the resource shares for projects which cannot supply enough work to meet the shares, or
2) detach from the project, or
3) reset the debts

Resetting the debts on EVERY client restart is not a good idea when users are attached to projects that DO supply enough work to meet their shares.

The concept of Long Term Debts is sound as long as you attach to projects that can provide enough work to meet your resource shares. The scheduler attempts to respect shares that simply cannot be met. In other words it attempts to do the impossible in certain combinations of projects and shares. It cannot do the impossible so why not just have the software tell the user like "this cannot work, you need to make some adjustments". THAT would be sane. Attempting to do the impossible over and over and over every friggin day is NOT sane.

Easy to say, probably not so easy to code it. And I don't know how CPDN affects it, just some thoughts, FWIW.


That won't work either. It ignores or abuses the participants who want to run 50 projects on a 500 MHz CPU. In that case there is no way to respect the resource shares in a short time. It may take years to balance, even with short tasks and long deadlines. The current system is not perfect; however, I have yet to see an alternative suggestion that is as good at handling all possible cases.
BOINC WIKI

BOINCing since 2002/12/8
Ananas
Joined: 27 Jun 06
Posts: 305
Germany
Message 21385 - Posted: 19 Nov 2008, 8:43:27 UTC
Last modified: 19 Nov 2008, 8:55:49 UTC

You don't need lots of projects with an unsteady WU flow for this. Here is a simple example where LTD gets messy:


On a dual-task machine, load one long-running CPDN model plus any other project for the second task.

While the CPDN model runs, it piles up hundreds of thousands of LTD even though it crunches all the time, whereas the second project accumulates the same amount as negative debt.

After a month, add a third project. It will start with zero debt, but because the second project has piled up so much negative debt, the client will download work only from the third project for several weeks.

The user will then go to that project and blame its developers because the project doesn't respect his share settings. The answer is always the same: edit your client_state.xml and remove all lines with LTD tags.

I have read lots of threads like that.
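For illustration, the "remove all lines with LTD tags" advice could be scripted roughly like this. This is only a sketch: the tag name `long_term_debt` is an assumption about how the client writes the value, the helper name is made up, and the sample XML is invented. Stopping the client and keeping a backup of client_state.xml before touching the real file would be prudent.

```python
def strip_long_term_debt(state_text, tag="long_term_debt"):
    """Return state_text without the lines that contain the <tag> element.

    The default tag name is an assumption; check your own client_state.xml.
    """
    kept = [
        line
        for line in state_text.splitlines(keepends=True)
        if f"<{tag}>" not in line
    ]
    return "".join(kept)


# Invented sample fragment, not copied from a real client_state.xml
sample = (
    "<project>\n"
    "    <master_url>http://example.invalid/</master_url>\n"
    "    <short_term_debt>12.5</short_term_debt>\n"
    "    <long_term_debt>-250000.0</long_term_debt>\n"
    "</project>\n"
)

cleaned = strip_long_term_debt(sample)
print(cleaned)
```

The short-term debt line is left alone on purpose, since only the LTD lines are mentioned in the advice.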


IMO, the short-term debts are good enough for distributing CPU time, and people understand them. It can easily happen that a project is supposed to crunch (positive STD) but isn't allowed to download work (negative LTD). Because LTD affects caching, this can even make the client ignore the cache settings completely and download a new WU only when the cache is totally empty.


LTD works fine in only one situation: all projects are attached at the same time, no project is reset, and all projects deliver a constant WU flow.


There are projects with long running WUs out there, there are projects with a sparse flow of WUs and there are projects that pop up and disappear after a few months, so BOINC needs to get along with those somehow.


Edit: I guess it would help to use the trickle_ups for the decay of LTDs on projects that use trickles. It would not solve the problem completely, but at least it would fix the major LTD problem that all CPDN crunchers have.
Matt Lowe

Joined: 20 Jul 09
Posts: 6
United States
Message 26267 - Posted: 26 Jul 2009, 17:40:38 UTC - in response to Message 21385.  

Here is an idea to tweak the current debt setup: cap the long-term debt at an amount based on estimated completion times, cache settings, system benchmarks and the projects being run, and recalculate the cap whenever a project is added or deleted. This could prevent the cache-setting problems that occur when projects can't maintain a steady flow of work for whatever reason. It would also let people add projects with long-running work units, like AQUA or CPDN, without messing up work for other projects.
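A rough sketch of what such a cap might look like, purely as an illustration of the idea and not of anything the client does. The cap formula here (cache size times number of CPUs, in seconds) is a placeholder assumption; the proposal would also take completion estimates and benchmarks into account. All names and debt values are hypothetical.

```python
SECONDS_PER_DAY = 86400


def placeholder_cap(cache_days, n_cpus):
    """Placeholder cap in CPU-seconds; the real inputs would be richer."""
    return cache_days * SECONDS_PER_DAY * n_cpus


def clamp_debts(debts, cap):
    """Limit every project's long-term debt to the range [-cap, +cap]."""
    return {name: max(-cap, min(cap, debt)) for name, debt in debts.items()}


# Hypothetical debts after a month of the CPDN scenario described earlier
debts = {"CPDN": 300000.0, "second": -300000.0, "third": 0.0}

# Recompute the cap whenever the project list changes, then clamp
cap = placeholder_cap(cache_days=1.0, n_cpus=2)
capped = clamp_debts(debts, cap)
print(capped)
```

With a bound like this, a newly attached project could only be favoured until the capped difference is worked off, rather than for however long the unbounded debt had been accumulating.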

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.