STD and LTD go both to zero

Message boards : BOINC client : STD and LTD go both to zero

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 15970 - Posted: 18 Mar 2008, 11:39:02 UTC

STD and LTD (short-term and long-term debt) both go to zero for all projects, so BOINC doesn't fetch new tasks for any project.

I've enabled the debt_debug log flag and got this output:

18-Mar-2008 10:39:30 [---] [debt_debug] adjust_debts(): project proteins@home: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:39:30 [---] [debt_debug] adjust_debts(): project rosetta@home: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:39:30 [---] [debt_debug] adjust_debts(): project Riesel Sieve Project: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:39:30 [---] [debt_debug] adjust_debts(): project Cels@Home: STD -1.#IND00, LTD -1.#IND00
18-Mar-2008 10:39:30 [---] [debt_debug] adjust_debts(): project Milkyway@home: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:39:30 [---] [debt_debug] adjust_debts(): project SETI@home: STD -1.#IND00, LTD -1.#IND00
18-Mar-2008 10:39:30 [---] [debt_debug] adjust_debts(): project World Community Grid: STD -1.#IND00, LTD -1.#IND00
18-Mar-2008 10:40:30 [---] [debt_debug] adjust_debts(): project proteins@home: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:40:30 [---] [debt_debug] adjust_debts(): project rosetta@home: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:40:30 [---] [debt_debug] adjust_debts(): project Riesel Sieve Project: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:40:30 [---] [debt_debug] adjust_debts(): project Cels@Home: STD -1.#IND00, LTD -1.#IND00
18-Mar-2008 10:40:30 [---] [debt_debug] adjust_debts(): project Milkyway@home: STD 0.000000, LTD -1.#IND00
18-Mar-2008 10:40:30 [---] [debt_debug] adjust_debts(): project SETI@home: STD -1.#IND00, LTD -1.#IND00
18-Mar-2008 10:40:30 [---] [debt_debug] adjust_debts(): project World Community Grid: STD -1.#IND00, LTD -1.#IND00
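For longer logs it can help to pick out the affected projects automatically. The following is a minimal Python sketch, assuming the log line layout shown above. The function name `find_bad_debts` is made up for illustration, and treating any value that contains `#` as non-finite is an assumption about how the Windows runtime prints such numbers.

```python
import re

# Matches the "[debt_debug] adjust_debts()" lines shown above.
LINE_RE = re.compile(
    r"adjust_debts\(\): project (?P<name>.+?): "
    r"STD (?P<std>\S+), LTD (?P<ltd>\S+)$"
)

def find_bad_debts(lines):
    """Return {project: (std, ltd)} for projects with a non-finite debt."""
    bad = {}
    for line in lines:
        m = LINE_RE.search(line.strip())
        if m is None:
            continue  # not a debt_debug line
        std, ltd = m.group("std"), m.group("ltd")
        # Assumption: NaN/infinity are printed with a '#', e.g. -1.#IND00.
        if "#" in std or "#" in ltd:
            bad[m.group("name")] = (std, ltd)
    return bad
```

Feeding it the lines of the client log would return only the projects whose STD or LTD is stuck.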

OS: Vista / BOINC: 5.10.45 running as a service.
Stopping and restarting the service fixes the problem for a while.

Any help?

Thanks in advance.
ID: 15970

Nicolas
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 15976 - Posted: 18 Mar 2008, 18:12:35 UTC - in response to Message 15970.  

-1.#IND00

That's not zero. That's negative infinity. o_o
ID: 15976

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 15977 - Posted: 18 Mar 2008, 19:19:11 UTC - in response to Message 15976.  

And is it a correct value or a bad one?

As far as I know, the sum of STD over all projects must be zero, and with some infinities in the sum... 8-O




Eppur si muove

ID: 15977

Nicolas
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 15979 - Posted: 18 Mar 2008, 20:32:44 UTC - in response to Message 15978.  

A practical solution is to exit BOINC, delete the client_state.xml and client_state_prev.xml and restart BOINC. That will eradicate anything that's causing this oddity.

That would make him lose all work.
ID: 15979

Nicolas
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 15981 - Posted: 18 Mar 2008, 20:35:22 UTC - in response to Message 15977.  

And is it a correct value or a bad one?

A very incorrect one; I have no idea how it managed to go to infinity. Probably some division by zero. I don't think it will ever go back to sanity on its own. BOINC doesn't fetch work from projects with negative LTD, but if they are negative infinity, they would take an infinite amount of time to get to zero again (i.e. never).
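
As a side note, the Microsoft C runtime of that era prints an indefinite NaN ("not a number", the result of operations such as 0/0) as `-1.#IND`, while infinity is printed as `1.#INF`. So the values in the log are most likely NaN rather than negative infinity. The practical conclusion is the same, because a NaN never recovers either. A small Python sketch of that behaviour (the variable names are illustrative, not BOINC's):

```python
import math

debt = math.nan  # stands in for a debt value that has become -1.#IND

# Any further arithmetic keeps the value NaN, so periodic
# adjustments cannot bring it back to a normal number.
for _ in range(1000):
    debt += 3600.0

still_nan = math.isnan(debt)

# Every ordering or equality comparison with NaN is False,
# so threshold checks on the debt stop behaving sensibly.
comparisons = (debt < 0, debt > 0, debt == 0)
```

This is why only a restart or an explicit reset clears the condition.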

ID: 15981

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 15983 - Posted: 18 Mar 2008, 22:23:19 UTC - in response to Message 15981.  


This is the second time the error has occurred. The first time, I stopped and restarted the service and everything went well for about a day.

The problem today was that I couldn't stop the service without losing all the work of a Cels@Home task (12 CPU hours without intermediate checkpointing).

Once the Cels@Home work unit had finished, I restarted the service. This time I used BOINC DV to clear the debts before starting. I hope that will be enough.

Everything seems to be running well for now.

Thank you for your opinions.

Eppur si muove

ID: 15983

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 15991 - Posted: 19 Mar 2008, 6:48:03 UTC

Where did you install BOINC to under Vista? The default c:\program files\BOINC directory, or outside the c:\program files directory (like c:\BOINC)?
ID: 15991

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 15994 - Posted: 19 Mar 2008, 11:17:40 UTC - in response to Message 15991.  


At the default directory: C:\Program Files\BOINC

Saludos, Rafa.
Eppur si muove

ID: 15994

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 16007 - Posted: 19 Mar 2008, 22:08:50 UTC - in response to Message 15994.  

That might be your problem. By default, Vista does not allow programs to write to the c:\program files\ directory. A workaround is to install BOINC in a separate directory, such as C:\BOINC.

To do so with your present install:
Exit BOINC.
Uninstall it through Add/Remove Programs.
Move the whole BOINC directory from C:\Program Files\BOINC to C:\BOINC, making sure you include all left-over files and sub-directories.
Install BOINC to the new directory. You can choose where to install BOINC to by changing the directory path of the Destination Folder in the 4th screen in the installer.

That should reset things as well.
ID: 16007

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 16011 - Posted: 20 Mar 2008, 15:06:40 UTC - in response to Message 16007.  


Well, BOINC has been moved to C:\BOINC. Now all I have to do is wait for no news. ;-)

Thank you all, for your support.

Saludos, Rafa.
Eppur si muove

ID: 16011

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 16013 - Posted: 20 Mar 2008, 15:49:39 UTC

If it doesn't fix the infinite LTD numbers, make sure to reset each project once it has no work left in your queue (everything uploaded and reported).

Another way to reset all the debts is with BOINC DV.
You need to make sure BOINC isn't running when you reset the debts.
ID: 16013

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 16039 - Posted: 22 Mar 2008, 16:56:00 UTC - in response to Message 16013.  
Last modified: 22 Mar 2008, 17:00:28 UTC


It happened again.



It was just after rebooting the machine. Besides the minus infinity, the rsrc (resource share) value was zero for each project (circled in red in the picture). This time, BOINC was installed at C:\BOINC.

I can think of at least two possible causes:
*Either a security permission problem between the process and the files/directory it must write to.
*Or a problem with the project's rsrc assignment at the very beginning of service startup, which would cause the division by zero.

Regarding the first, I've changed the account that runs the service from my user to Local System, and given full access to the directory and files under C:\BOINC to the SYSTEM and Local Service accounts.

Finally, I've verified that stopping and starting the service makes STD and LTD calculate correctly again, i.e. it isn't necessary to reset the counters. The first value BOINC shows after starting is now -1.
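
The division-by-zero hypothesis can be illustrated as follows. This is only a sketch, assuming the client computes each project's fraction as its resource share divided by the sum of all shares; `ieee_div` and `share_fractions` are hypothetical helpers, not BOINC functions. Python raises an exception on division by zero, so `ieee_div` imitates what C/C++ floating-point division does instead.

```python
import math

def ieee_div(a, b):
    """Divide like a C double: 0/0 yields NaN and x/0 yields +/-inf."""
    if b != 0:
        return a / b
    if a == 0 or math.isnan(a):
        return math.nan
    return math.copysign(math.inf, a) * math.copysign(1.0, b)

def share_fractions(shares):
    """Hypothetical: each project's fraction of the total resource share."""
    total = sum(shares.values())
    return {name: ieee_div(s, total) for name, s in shares.items()}

# Normal case: two projects with equal shares get half each.
ok = share_fractions({"SETI@home": 100.0, "rosetta@home": 100.0})

# All shares zero: every fraction becomes 0/0, i.e. NaN.
broken = share_fractions({"SETI@home": 0.0, "rosetta@home": 0.0})
```

If every share is zero at the moment the fractions are computed, each one is NaN, and anything derived from it (such as a debt) becomes NaN too.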

Saludos,
Eppur si muove

ID: 16039

John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16047 - Posted: 23 Mar 2008, 17:12:05 UTC - in response to Message 16039.  

What are the resource shares of the projects on the web sites? What are they in the client_state.xml file, and what are they in the reply files?

BOINC WIKI
ID: 16047

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 16049 - Posted: 23 Mar 2008, 21:12:20 UTC - in response to Message 16047.  
Last modified: 23 Mar 2008, 21:19:29 UTC

I'm sorry, I don't know what the reply files are.
But I've looked at the client_state files, at the BAM resources page, and at every project account as well. The results:


When I first started using BAM, I assigned resources per computer rather than setting generic values on the resources page, and so I put zero for all projects there. Could this be the cause of STD and LTD going to minus infinity? Why doesn't it happen on the other two computers? The three computers are very different from each other, and the error has always happened on the same machine.

*EDIT*
I have set all project resources to 1 on the BAM resources page, and I've checked that this value has been passed on to the projects' web sites. Will this solve the problem?

Saludos, Rafa.
Eppur si muove

ID: 16049

John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16051 - Posted: 24 Mar 2008, 3:06:02 UTC - in response to Message 16049.  

The reply files are named sched_reply_*.xml.

Could you also look in account_*.xml?

These files come one per project.
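
To collect the values in one go, something like the following Python sketch could be used. It assumes the files sit directly in the BOINC data directory and store the value in a `<resource_share>` element; both the helper name `find_resource_shares` and that element name are assumptions to check against your own files.

```python
import re
from pathlib import Path

# Assumed element name; verify it in your own files first.
SHARE_RE = re.compile(r"<resource_share>\s*([^<\s]+)\s*</resource_share>")

def find_resource_shares(data_dir):
    """Return {file name: [resource_share strings]} for state, account and reply files."""
    patterns = ("client_state.xml", "account_*.xml", "sched_reply_*.xml")
    found = {}
    for pattern in patterns:
        for path in sorted(Path(data_dir).glob(pattern)):
            text = path.read_text(errors="replace")
            found[path.name] = SHARE_RE.findall(text)
    return found
```

For example, `find_resource_shares(r"C:\BOINC")` would list the share strings per file, which makes stray zeros easy to spot.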

BOINC WIKI
ID: 16051

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 16056 - Posted: 24 Mar 2008, 11:35:41 UTC - in response to Message 16051.  

The new complete table:


Notes:
1) I've added another row with the resource values found in acc_mgr_reply.xml.
2) The zero values that appear under BAM! Resources and on the projects' web sites were updated to 1 yesterday.
3) The WCG files include a value for each venue. I was using the home venue. Evidently the web site did not receive the value from BAM when I updated it to 1 yesterday, so today I set it directly on the WCG web site.

This morning I saw that my laptop showed the same problem, so it is not machine-dependent.

After restarting the client:

I have to force an update from BAM for the client to pick up the resource values I want it to have.

I rebooted that machine yesterday, I think before I updated the BAM resources page to 1. I hadn't looked at it until now because I was completely focused on the other machine :(
Eppur si muove

ID: 16056

John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16062 - Posted: 24 Mar 2008, 18:20:51 UTC

-1.#IND00 is probably self-sustaining. If the values have not changed from this to something more reasonable, please change them to 0.0.

Please monitor for a couple of days to make certain that they do not change back to infinity. I believe the problem was the 0 values that were being seen in the reply and account files.
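
Changing the stuck values to 0.0 by hand means editing client_state.xml while BOINC is stopped. A hedged sketch of the same edit in Python, working on the file's text: the element names `short_term_debt` and `long_term_debt` and the helper `zero_bad_debts` are assumptions for illustration, so keep a backup copy before trying anything like this.

```python
import math
import re

# Assumed element names for the per-project debts.
DEBT_RE = re.compile(r"<(short_term_debt|long_term_debt)>([^<]*)</\1>")

def zero_bad_debts(xml_text):
    """Replace debt values that are not finite numbers with 0.0; leave others untouched."""
    def fix(match):
        tag, raw = match.group(1), match.group(2).strip()
        try:
            value = float(raw)
        except ValueError:
            value = math.nan  # e.g. "-1.#IND00" cannot be parsed as a number
        if math.isfinite(value):
            return match.group(0)
        return f"<{tag}>0.000000</{tag}>"
    return DEBT_RE.sub(fix, xml_text)
```

The function only returns the corrected text; writing it back to disk is left out on purpose.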

BOINC WIKI
ID: 16062

rpperezr
Joined: 18 Mar 08
Posts: 11
Spain
Message 16063 - Posted: 24 Mar 2008, 20:02:20 UTC - in response to Message 16062.  


I agree. I'll do the monitoring and report.

Thanks a lot. Muchas gracias.
Eppur si muove

ID: 16063

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 16073 - Posted: 25 Mar 2008, 8:47:12 UTC


Should the BAM resource page validate the figures to ensure they're > 0? (If it doesn't already; I don't use BAM, so I don't know.)


ID: 16073

John McLeod VII
Joined: 29 Aug 05
Posts: 147
Message 16075 - Posted: 25 Mar 2008, 10:50:40 UTC - in response to Message 16073.  


It would be good if they did. I believe that we may have to add some validation code to the client.
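
Such a check could look roughly like this. It is a sketch in Python rather than the client's own C++, and the fallback value of 100 as well as the name `sanitize_resource_share` are assumptions, not the fix that was eventually implemented.

```python
import math

DEFAULT_SHARE = 100.0  # assumed fallback value

def sanitize_resource_share(raw):
    """Return a usable share: finite and greater than zero, else the default."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return DEFAULT_SHARE  # missing or unparseable input
    if not math.isfinite(value) or value <= 0:
        return DEFAULT_SHARE  # NaN, infinity, zero or negative
    return value
```

Applied to every share read from an account or reply file, this would keep the sum of shares strictly positive, so the 0/0 case could not arise from that input.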

BOINC WIKI
ID: 16075

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.