can't download new work units

Thund3rb1rd

Joined: 17 Apr 08
Posts: 22
United States
Message 16700 - Posted: 17 Apr 2008, 17:51:42 UTC

I'm not sure this is the right place for this question, so if it isn't, if someone can redirect me, I'd be much obliged.

I currently work on 6 projects. With the exception of lhc@home, up until about 48 hours ago as I write this, my client was happily downloading and managing work units.

Now all of my projects are drying up, and when the client contacts a project it asks for zero work units. Since this is happening for five of the six projects, I suspect the BOINC client.

I have double-checked the disk space parameters and as near as I can tell, there should be no holdup there.

I first noticed this when climateprediction@home completed a run but wouldn't download another model. Then, one by one, einstein@home, milkyway@home and finally, setiathome all followed suit. orbit@home and lhc@home never have any work anyway, but at least orbit@home asks for work units.

What should I be looking at?

cheers

bob graham
ID: 16700 · Report as offensive
SekeRob

Joined: 25 Aug 06
Posts: 1596
Message 16701 - Posted: 17 Apr 2008, 18:01:42 UTC - in response to Message 16700.  
Last modified: 17 Apr 2008, 18:02:03 UTC

Could be any of a myriad of things with that project mix. To start, try resetting the projects that have no work at the moment. That should at least trigger a download of the science applications before the client tries to download work.
Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 16701 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4763
United Kingdom
Message 16702 - Posted: 17 Apr 2008, 18:34:20 UTC - in response to Message 16700.  

What should I be looking at?

One thing would be the time statistics for the computer.

On any project website, towards the bottom of your computer details, you should see a block like:

% of time BOINC client is running 
While BOINC running, % of time work is allowed 
Average CPU efficiency 
Task duration correction factor

What figures do you see there?
ID: 16702 · Report as offensive
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16704 - Posted: 17 Apr 2008, 23:24:10 UTC
Last modified: 17 Apr 2008, 23:25:20 UTC

What are the <long_term_debt> values for each project?

What is the "connect every X days" value (<work_buf_min_days>)?

[edit]
How many CPUs does the computer have?

What is the remaining CPU time for each task? What project does each task belong to?
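
If you would rather pull the debts out by hand, here is a minimal Python sketch. It assumes the client keeps per-project debts in client_state.xml as <project> elements with <project_name> and <long_term_debt> children. The sample XML and the read_long_term_debts helper are made up for illustration, and a real file may be laid out differently depending on the client version.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt for illustration; a real client_state.xml holds many more fields.
SAMPLE = """
<client_state>
  <project>
    <master_url>http://example.org/project_a/</master_url>
    <project_name>project_a</project_name>
    <long_term_debt>1500000.0</long_term_debt>
  </project>
  <project>
    <master_url>http://example.org/project_b/</master_url>
    <project_name>project_b</project_name>
    <long_term_debt>-4200.5</long_term_debt>
  </project>
</client_state>
"""


def read_long_term_debts(xml_text):
    """Return {project name: long-term debt} parsed from client-state XML text."""
    root = ET.fromstring(xml_text)
    debts = {}
    for project in root.iter("project"):
        # Fall back to the URL if the project has no name recorded.
        name = project.findtext("project_name") or project.findtext("master_url")
        debt = project.findtext("long_term_debt")
        if name is not None and debt is not None:
            debts[name] = float(debt)
    return debts


debts = read_long_term_debts(SAMPLE)
for name, debt in sorted(debts.items()):
    print(f"{name}: {debt:.1f}")
```

To try it on a real machine, read the file's text (for example with open(path).read()) and pass it in place of SAMPLE.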

BOINC WIKI
ID: 16704 · Report as offensive
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16719 - Posted: 18 Apr 2008, 19:26:11 UTC - in response to Message 16700.  

I first noticed this when climateprediction@home completed a run but wouldn't download another model.

That is a feature.

Suppose you run CPDN along with project X. Project X has short workunits, and the deadlines are quite long enough. At some point, your computer gets in "deadline risk" with CPDN. If it continues getting work from X, it may delay the CPDN workunit too much and may not meet the deadline. So BOINC stops getting work from X.

However, once CPDN is done, BOINC stops getting work from it too. Why? Because if it got another model, it would get into the same trouble *again*, and you would end up crunching a lot more CPDN than X (not following your resource shares). So BOINC stops getting work from CPDN until it has computed enough of X to compensate.

This is remembered by keeping "debts" for each project (John McLeod would explain the exact mechanism much better than me, if you're interested).

So, maybe some of your current work is in deadline trouble (risk of not meeting deadline if it got more work), or maybe Orbit has a high debt (which makes the rest have a negative debt).
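
A rough Python sketch of the bookkeeping described above. This is a simplified model and not the client's actual code: the update_debts function, the one-hour interval and the equal resource shares are assumptions picked for illustration. Each project is owed its share of the elapsed time and is charged for the CPU time it actually got.

```python
def update_debts(debts, shares, cpu_used, interval):
    """Simplified debt update (an assumed model, not BOINC's real algorithm).

    debts    -- {project: current debt in seconds}
    shares   -- {project: resource share}
    cpu_used -- {project: CPU seconds actually received in this interval}
    interval -- length of the interval in seconds
    """
    total_share = sum(shares.values())
    updated = {}
    for name in debts:
        owed = interval * shares[name] / total_share
        updated[name] = debts[name] + owed - cpu_used.get(name, 0.0)
    # Centre the debts on zero, so one project's credit is another's deficit.
    mean = sum(updated.values()) / len(updated)
    return {name: value - mean for name, value in updated.items()}


debts = {"CPDN": 0.0, "X": 0.0}
shares = {"CPDN": 100, "X": 100}
# One hour in which CPDN, in deadline trouble, takes the whole CPU.
debts = update_debts(debts, shares, {"CPDN": 3600.0}, 3600.0)
print(debts)  # {'CPDN': -1800.0, 'X': 1800.0}
```

In this toy model CPDN ends the hour 1800 seconds in the red and X 1800 seconds in credit, which is why work fetch would favour X afterwards.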

ID: 16719 · Report as offensive
Douglas Hornick

Joined: 13 May 06
Posts: 2
United States
Message 16744 - Posted: 19 Apr 2008, 23:13:26 UTC - in response to Message 16719.  

I first noticed this when climateprediction@home completed a run but wouldn't download another model.

That is a feature.




I have the same problem while running seti, einstein and rosetta. Has nothing to do with the particular projects because it did it when I had only seti and einstein. Clicking reset doesn't do anything. I detach the project that's stopped downloading and then reattach. Downloading starts immediately and everything is fine again until the next time.
ID: 16744 · Report as offensive
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16745 - Posted: 19 Apr 2008, 23:44:02 UTC - in response to Message 16744.  

I first noticed this when climateprediction@home completed a run but wouldn't download another model.

That is a feature.




I have the same problem while running seti, einstein and rosetta. Has nothing to do with the particular projects because it did it when I had only seti and einstein. Clicking reset doesn't do anything. I detach the project that's stopped downloading and then reattach. Downloading starts immediately and everything is fine again until the next time.

Before you do the detach / attach, what are the values for the long_term_debts for all of the projects?

BOINC WIKI
ID: 16745 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 14954
Netherlands
Message 16746 - Posted: 19 Apr 2008, 23:52:52 UTC

To check long-term debt, use BOINCDV. Unzip it into your BOINC directory (or the BOINC data directory if you use BOINC 6), run it, then paste its output here.
ID: 16746 · Report as offensive
Thyme Lawn

Joined: 2 Sep 05
Posts: 103
United Kingdom
Message 16748 - Posted: 20 Apr 2008, 13:14:26 UTC
Last modified: 20 Apr 2008, 13:16:20 UTC

The problem may be due to the way that LTD is accumulated for projects which are starved of work on the client and have no work available on the server.

A project starts accumulating LTD as soon as its communication deferral timeout expires and continues to do so until you send a scheduler request asking for more work and the request times out or receives a no work scheduler reply. There are 2 conditions under which the effect is amplified:

  • the deferral timeout expires while networking is disabled. This causes the project to accumulate LTD until networking is re-enabled and the request/reply sequence is performed.

  • the work load on other projects means your scheduler request doesn't request more work. In this case LTD will continue to accumulate for all of the next deferral period.


This can result in work starved projects accumulating a massive positive LTD. I was up to 1.5 million on APS 5 months ago before I spotted this. All other projects (MCDN, CPDN, CPDN beta) had negative LTD and wouldn't request more work until their LTD had increased to -3600 (not a problem for me because CPDN and CPDN beta had loads of work that MCDN could work off its debt against).

I can easily envisage this leading to a situation where the client work queue for all projects with work available on the server has been run down and they all have LTD less than -3600. When this happens no work can be downloaded for those projects until the other project has worked off enough of its LTD.

If a project is work starved on the client, it should stop accumulating LTD at the expense of projects which do have work, until the server gives it some work.
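
To get a feel for how quickly this can bite, here is a toy Python simulation. It is only a sketch under stated assumptions: two projects, debts kept centred on zero, hourly updates, and the -3600 threshold mentioned above. The hours_until_blocked function is hypothetical and does not reproduce the client's real accounting.

```python
CUTOFF = -3600.0  # threshold quoted above, in seconds


def hours_until_blocked(share_starved, share_busy, step=3600.0, max_hours=1000):
    """Count simulated hours until the busy project's debt drops below CUTOFF.

    Assumed model: the starved project never runs, so on every step it is
    owed its share of the CPU time and the busy project is charged for it.
    """
    debt_busy = 0.0
    fraction_starved = share_starved / (share_starved + share_busy)
    for hour in range(1, max_hours + 1):
        debt_busy -= step * fraction_starved
        if debt_busy < CUTOFF:
            return hour
    return None  # never blocked within max_hours


print(hours_until_blocked(100, 100))  # 3
```

With equal resource shares the busy project falls below the threshold after only three simulated hours, so a project that has been out of work for months can plausibly push every other project far into negative LTD.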

Raised as Trac Ticket 622.


"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 16748 · Report as offensive
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16749 - Posted: 20 Apr 2008, 17:18:09 UTC - in response to Message 16748.  

The problem may be due to the way that LTD is accumulated for projects which are starved of work on the client and have no work available on the server.

A project starts accumulating LTD as soon as its communication deferral timeout expires and continues to do so until you send a scheduler request asking for more work and the request times out or receives a no work scheduler reply. There are 2 conditions under which the effect is amplified:

  • the deferral timeout expires while networking is disabled. This causes the project to accumulate LTD until networking is re-enabled and the request/reply sequence is performed.

  • the work load on other projects means your scheduler request doesn't request more work. In this case LTD will continue to accumulate for all of the next deferral period.


This can result in work starved projects accumulating a massive positive LTD. I was up to 1.5 million on APS 5 months ago before I spotted this. All other projects (MCDN, CPDN, CPDN beta) had negative LTD and wouldn't request more work until their LTD had increased to -3600 (not a problem for me because CPDN and CPDN beta had loads of work that MCDN could work off its debt against).

I can easily envisage this leading to a situation where the client work queue for all projects with work available on the server has been run down and they all have LTD less than -3600. When this happens no work can be downloaded for those projects until the other project has worked off enough of its LTD.

If a project is work starved on the client, it should stop accumulating LTD at the expense of projects which do have work, until the server gives it some work.

Raised as Trac Ticket 622.


The basic problem is it cannot be known if the project is work starved or not. If the queue is less than what the user has asked for, more work is downloaded, even if the project from which it is downloaded must be one of those with negative LTD. This is true for the most recent clients.

I would be really upset if the following happened:

S@H does not have work because of a Tuesday outage.
CPDN downloads a task and instantly goes into EDF for a year.
The LTD does not change because S@H was out of work during the last contact. This is NOT reflecting my resource allocation at all.

BOINC WIKI
ID: 16749 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 16750 - Posted: 20 Apr 2008, 18:27:48 UTC - in response to Message 16749.  

...
I would be really upset if the following happened:

S@H does not have work because of a Tuesday outage.
CPDN downloads a task and instantly goes into EDF for a year.
The LTD does not change because S@H was out of work during the last contact. This is NOT reflecting my resource allocation at all.


But if, for example, the Boinc client polled the other project weekly (in the same way WCG does), and each time it is out of work, then I think it would work quite well. There are a number of projects now which have been out of work for months or even over a year, and they're building up an incredible positive balance (which to my mind is just as bad as your example).

ID: 16750 · Report as offensive
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16752 - Posted: 20 Apr 2008, 18:42:01 UTC - in response to Message 16750.  

But if, for example, the Boinc client polled the other project weekly (in the same way WCG does), and each time it is out of work, then I think it would work quite well.

BOINC provides no way to "poll" a project like that. The only way for the client to know if a project has work, is by requesting work.

ID: 16752 · Report as offensive
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16753 - Posted: 20 Apr 2008, 19:25:17 UTC - in response to Message 16752.  
Last modified: 20 Apr 2008, 19:25:55 UTC

But if, for example, the Boinc client polled the other project weekly (in the same way WCG does), and each time it is out of work, then I think it would work quite well.

BOINC provides no way to "poll" a project like that. The only way for the client to know if a project has work, is by requesting work.

And if you get work, that is just added to the list of tasks that need to be done - whether there is time for it or not.

Another point: what if it is once a week, and you hit the S@H weekly outage every single time?

BOINC WIKI
ID: 16753 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 16754 - Posted: 20 Apr 2008, 20:50:52 UTC - in response to Message 16752.  

...
BOINC provides no way to "poll" a project like that. The only way for the client to know if a project has work, is by requesting work.


The WCG project polls back every week, I don't know what mechanism they use ... (there are no workunits from WCG on my computer, and it is set to 'no more work').

ID: 16754 · Report as offensive
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16755 - Posted: 20 Apr 2008, 21:32:06 UTC - in response to Message 16754.  

...
BOINC provides no way to "poll" a project like that. The only way for the client to know if a project has work, is by requesting work.


The WCG project polls back every week, I don't know what mechanism they use ... (there are no workunits from WCG on my computer, and it is set to 'no more work').

The poll is possible, but there is no method of discovering if there is currently work or has been work in the last week. This is also set up by the projects, and if the project does not indicate a need to check in on occasion, the BOINC client will not. Some projects have extremely overloaded databases, and do not want clients to phone in unless they are asking for more work or reporting completed work (preferably both at the same time).

BOINC WIKI
ID: 16755 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 14954
Netherlands
Message 16756 - Posted: 20 Apr 2008, 22:57:15 UTC - in response to Message 16755.  

The poll is possible, but there is no method of discovering if there is currently work or has been work in the last week.

Another idea then, how about a maximum for the LTD in positive and negative numbers? Is that doable?
ID: 16756 · Report as offensive
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16757 - Posted: 20 Apr 2008, 23:03:36 UTC - in response to Message 16756.  

The poll is possible, but there is no method of discovering if there is currently work or has been work in the last week.

Another idea then, how about a maximum for the LTD in positive and negative numbers? Is that doable?

Not unless they are too large to be a meaningful cap. Think of running CPDN for a couple of years in EDF.

BOINC WIKI
ID: 16757 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 14954
Netherlands
Message 16759 - Posted: 20 Apr 2008, 23:47:53 UTC - in response to Message 16757.  
Last modified: 20 Apr 2008, 23:48:12 UTC

OK, when the maximum is reached, reset the LTD to near zero (say 1,000 either way). You just need to define the maximum, which isn't going to be easy.

Perhaps going back to the polling part, when a project has no work, the message usually comes back as "no work from project". Isn't it possible to add these up and when it comes to a given amount, that the LTD counting stops as if the project is suspended or on NNT? Complications would be the amount of time you're deferred, but if those can be set at a standard 1 hour and you'd take a day as maximum, then after 24 times of "no work from project" the counting of LTD is frozen, until you get work again and the counter is reset.
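
The counter idea could look something like this in Python. It is only a sketch of the proposal, not anything the client does today: the NoWorkCounter class, its limit of 24 replies and the debt_frozen property are all hypothetical names.

```python
class NoWorkCounter:
    """Tracks consecutive 'no work from project' replies for one project.

    Hypothetical helper illustrating the proposal; not part of BOINC.
    """

    def __init__(self, limit=24):
        self.limit = limit  # e.g. 24 hourly replies, roughly one day
        self.consecutive_no_work = 0

    def record_reply(self, got_work):
        """Reset the counter when work arrives, otherwise count the miss."""
        if got_work:
            self.consecutive_no_work = 0
        else:
            self.consecutive_no_work += 1

    @property
    def debt_frozen(self):
        """True once the project has replied 'no work' `limit` times in a row."""
        return self.consecutive_no_work >= self.limit


counter = NoWorkCounter()
for _ in range(24):
    counter.record_reply(got_work=False)
print(counter.debt_frozen)  # True

counter.record_reply(got_work=True)
print(counter.debt_frozen)  # False
```

The debt update would then skip any project whose counter reports debt_frozen, much as it would for a suspended project.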

If a project is completely off line and doesn't even have a scheduler, what happens to the LTD then?
ID: 16759 · Report as offensive
John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16760 - Posted: 20 Apr 2008, 23:57:35 UTC - in response to Message 16759.  

OK, when the maximum is reached, reset the LTD to near zero (say 1,000 either way). You just need to define the maximum, which isn't going to be easy.

Perhaps going back to the polling part, when a project has no work, the message usually comes back as "no work from project". Isn't it possible to add these up and when it comes to a given amount, that the LTD counting stops as if the project is suspended or on NNT? Complications would be the amount of time you're deferred, but if those can be set at a standard 1 hour and you'd take a day as maximum, then after 24 times of "no work from project" the counting of LTD is frozen, until you get work again and the counter is reset.

If a project is completely off line and doesn't even have a scheduler, what happens to the LTD then?

You only get the "no work from project" message if you actually ask for work. In the situation we are talking about, asking for work is not what we want to do.

Resetting LTD violates the long term resource shares you set. There is another group of people that would really like to have the LTD calculated in all cases, even if the project is not contactable for long periods of time.

The most recent client will keep your queue full of work from projects below the cutoff if there is no work available from projects above the cutoff.
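
As a rough illustration of that fallback, here is a Python sketch. It is an assumed simplification rather than the client's real work-fetch code: the Project class, the backed_off flag and pick_project are hypothetical, and the -3600 cutoff is the figure quoted earlier in this thread.

```python
from dataclasses import dataclass

CUTOFF = -3600.0  # cutoff quoted earlier in the thread, in seconds


@dataclass
class Project:
    name: str
    long_term_debt: float
    backed_off: bool = False  # hypothetical flag: the last work request got nothing


def pick_project(projects, queued_seconds, wanted_seconds):
    """Pick the project to ask for work, or None if the queue is full enough."""
    if queued_seconds >= wanted_seconds:
        return None
    reachable = [p for p in projects if not p.backed_off]
    # Prefer projects above the cutoff; fall back to the rest to keep the queue full.
    preferred = [p for p in reachable if p.long_term_debt >= CUTOFF]
    pool = preferred or reachable
    if not pool:
        return None
    return max(pool, key=lambda p: p.long_term_debt).name


projects = [
    Project("starved", 1_500_000.0, backed_off=True),
    Project("A", -5000.0),
    Project("B", -9000.0),
]
print(pick_project(projects, queued_seconds=0.0, wanted_seconds=86400.0))  # A
```

Here the only project above the cutoff has no work to give, so the sketch falls back to the least indebted of the remaining projects instead of leaving the queue empty.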

BOINC WIKI
ID: 16760 · Report as offensive
Keck_Komputers
Joined: 29 Aug 05
Posts: 304
United States
Message 16763 - Posted: 21 Apr 2008, 9:55:50 UTC - in response to Message 16760.  

Resetting LTD violates the long term resource shares you set. There is another group of people that would really like to have the LTD calculated in all cases, even if the project is not contactable for long periods of time.

For example, how will LHC, BURP, or AIS ever come close to getting its correct share if the LTD is reset? All of those projects have intermittent work supplies and/or very restrictive limits on tasks in progress. I would love to add RALPH and SETI beta to that list. In my opinion any test project should only give work when they have something to test.
BOINC WIKI

BOINCing since 2002/12/8
ID: 16763 · Report as offensive

Copyright © 2022 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.