1 of 2 cores running

Justin Norman

Joined: 5 Feb 08
Posts: 4
United States
Message 15236 - Posted: 5 Feb 2008, 21:47:12 UTC


I am using BOINC v5.13 to run Seti@home on a dual-core Intel processor. It has run flawlessly for the last year or so. Then, about two weeks ago, one task kept running just fine while the other became permanently stuck: the hours keep mounting up (260+ when I last checked), but it still says 9 hours to completion with 0% completed. As I said, the other core is running fine and continues to process units. I tried uninstalling and reinstalling, with identical results: one "tab" in the basic view is running and the other appears not to be. There is no indication of anything abnormal in the logs (although I am not an expert). I could understand it if the whole thing were not running, or if the servers were down and it couldn't fetch work, but this is weird. Thanks for the help! -jn
ID: 15236
Justin Norman

Joined: 5 Feb 08
Posts: 4
United States
Message 15237 - Posted: 5 Feb 2008, 21:47:50 UTC

Sorry, that's 5.10.30, not 5.13. Typing too fast!
ID: 15237
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 15238 - Posted: 5 Feb 2008, 22:06:17 UTC

Please switch to Advanced view and check what it says in the Status column of the Tasks tab.
ID: 15238
Justin Norman

Joined: 5 Feb 08
Posts: 4
United States
Message 15246 - Posted: 6 Feb 2008, 16:35:43 UTC

It says "running", although it appears not to be. Then again, one of the units says "ready to upload", but according to the log it was actually uploaded successfully about 2 hours earlier, so I am not sure how much stock I can put in the Status column of the Tasks tab in the Advanced view. If it is of interest, there are 14 Seti units ready to start (all with deadlines starting about 1.5 weeks away). I am not sure whether it is normal to have so many downloaded and ready; it might be completely irrelevant.
ID: 15246
Profile Ananas

Joined: 27 Jun 06
Posts: 305
Germany
Message 15248 - Posted: 6 Feb 2008, 17:00:21 UTC

You could try aborting the stuck one; the project will then send it to someone else.

You can do that from the "Tasks" tab in the Advanced view: highlight the result that is causing problems and click the "Abort" button.
ID: 15248
Justin Norman

Joined: 5 Feb 08
Posts: 4
United States
Message 15249 - Posted: 6 Feb 2008, 17:55:54 UTC

Eureka! I didn't realize that it was so easy; I kind of feel like a heel now. It seems to be progressing just fine on the next unit. I would be interested to know how this happened, although figuring it out might be beyond my technical ability. If I can help in any way by providing info, let me know.

Thanks again for your quick and helpful responses.
ID: 15249
Profile Ananas

Joined: 27 Jun 06
Posts: 305
Germany
Message 15251 - Posted: 6 Feb 2008, 19:57:06 UTC
Last modified: 6 Feb 2008, 20:02:07 UTC

Before you start to look for the problem on your side, watch the workunit and see how the other hosts handle it.

If they cannot finish it either, the problem is most likely in the workunit itself or in the application that processes it.

As it did use CPU time, it most likely went into an endless loop. That happens when a program has a loop with a condition for leaving it, but that condition is never met.

Loops with a fixed number of runs usually cannot behave like this, but if (for example) a program keeps repeating a calculation until some value gets very close to 0, rounding problems might prevent it from ever getting close enough to 0.
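
To illustrate the rounding case, here is a minimal sketch in Python. It is not taken from the SETI@home application; the function name `steps_until_exact` and its parameters are made up for this example. It shows a simpler variant of the same trap: an exit condition that tests for exact equality, which rounding can cause the loop to step right past. The `limit` argument exists only so that the demonstration itself terminates.

```python
def steps_until_exact(target=1.0, step=0.1, limit=1000):
    """Add `step` repeatedly and report when the sum equals `target` exactly.

    Hypothetical demo function. `limit` is here only so the demo stops;
    the loop it imitates ("while x != target") has no such guard.
    """
    x = 0.0
    for n in range(1, limit + 1):
        x += step
        if x == target:
            return n  # exit condition met after n additions
    return None  # exit condition never met within `limit` additions


# 0.1 has no exact binary representation, so the running sum skips past 1.0.
print(steps_until_exact(step=0.1))
# 0.25 is exact in binary, so the sum lands on 1.0 and the loop can exit.
print(steps_until_exact(step=0.25))
```

Without the `limit` guard, the first call would keep using CPU time forever while making no progress, which is roughly what a stuck task looks like from the outside.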

There are a bunch of other possible reasons. If the others cannot finish the workunit either, it can be helpful for the developers to receive that information, together with the workunit name, so they can test at which point it enters that endless loop and find a way out.

The solution might be a loop counter that says "if you have tried 10000 times and it's still not close to 0, give up", or an improved condition for ending the loop.
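
A loop counter of that kind could look roughly like the following Python sketch. It is only an illustration under my own assumptions, not the project's code: the function `sqrt_with_give_up`, the use of Newton's method for a square root, and the tolerance values are all chosen for the example. The loop repeats until the residual `x*x - a` is close enough to 0, and gives up after `max_iterations` tries.

```python
def sqrt_with_give_up(a, tolerance, max_iterations=10000):
    """Approximate sqrt(a) for a > 0 with Newton's method.

    Hypothetical demo function. Returns (estimate, converged); `converged`
    is False when the residual never dropped below `tolerance` within
    `max_iterations` tries.
    """
    x = float(a)
    for _ in range(max_iterations):
        if abs(x * x - a) < tolerance:
            return x, True   # normal exit: residual is close enough to 0
        x = (x + a / x) / 2  # Newton update for f(x) = x*x - a
    return x, False          # safety exit: give up instead of looping forever


# A reasonable tolerance: the loop exits normally after a few iterations.
print(sqrt_with_give_up(2.0, tolerance=1e-9))
# A tolerance finer than double precision can deliver near 2.0:
# the residual never gets that small, so the counter ends the loop.
print(sqrt_with_give_up(2.0, tolerance=1e-17))
```

Returning a flag along with the estimate lets the caller report "did not converge" as an error instead of silently burning CPU time.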
ID: 15251


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.