"Exit 0 status no finished file"

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9175 - Posted: 28 Mar 2007, 19:30:10 UTC - in response to Message 9172.  
Last modified: 28 Mar 2007, 19:33:56 UTC

"Bunsen" wrote:
...
Hmm, no. For me, the problem recurred repeatedly all the time that I *wasn't* dialled in, roughly once per minute (since the manager repeated its connection attempt with a 1-minute delay). The problem *stopped* as soon as the connection was established. I don't recall seeing any occurrences of the problem between when I connected and when I logged off.


Nicolas found a bug which sounds like it could be the culprit:

http://boinc.berkeley.edu/dev/forum_thread.php?id=1632&nowrap=true#8819

"Nicolas" wrote:
There is a network-related problem that can cause this.

BOINC recently switched to synchronous DNS resolution in an attempt to work around a DNS cache bug. That means the core client can't do anything while it's waiting for the DNS server to respond; it's essentially hung until it gets a reply. If the DNS server is not replying (for example, because your internet connection has problems), it takes a relatively long time (say 30 seconds) for the lookup to finally give up. During this period the science app can't communicate with the core client (as the core client is hung, it can't reply), so the app may quit with the error "No heartbeat from core client for 30 seconds, exiting".

When the core client finally gets either a reply from the DNS server or a timeout, and is able to do other things again, it notices that the science applications have suddenly disappeared. So it gives the error "Task [name] exited with zero status but no 'finished' file. If this happens repeatedly you may need to reset the project." That's the part where the clueless user follows instructions, resets the project, and makes the project lose a climate model, all because of a slow or non-working Internet connection!
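
To make the timing in the two paragraphs above concrete, here is a toy simulation. It is only a sketch and not BOINC's actual code: the 30-second limit comes from the quoted error message, while the function name `simulate`, the one-second ticks, and the start time of the lookup are assumptions made for illustration.

```python
HEARTBEAT_TIMEOUT = 30  # seconds, taken from the quoted error message


def simulate(dns_delay, blocking, dns_start=5, duration=120):
    """Return the simulated second at which the science app gives up, or None.

    All parameters are hypothetical; time advances in 1-second ticks.
    """
    last_heartbeat = 0
    for t in range(duration + 1):
        # With a synchronous lookup the client does nothing else until it returns.
        client_stalled = blocking and dns_start <= t < dns_start + dns_delay
        if not client_stalled:
            last_heartbeat = t  # the client sends a heartbeat on every free tick
        if t - last_heartbeat >= HEARTBEAT_TIMEOUT:
            return t  # "No heartbeat from core client for 30 seconds, exiting"
    return None


print(simulate(dns_delay=40, blocking=True))   # app exits during the stall
print(simulate(dns_delay=40, blocking=False))  # client keeps sending heartbeats
print(simulate(dns_delay=10, blocking=True))   # short stall, app survives
```

In this model the app only dies when the lookup is both blocking and longer than the heartbeat limit, which matches the description: a working connection never triggers it.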

Another problem this DNS thing causes is an unresponsive manager. BOINC Manager has always used blocking I/O for GUI RPCs. That means the BOINC Manager can't do anything while it's waiting for the core client to respond; it's essentially hung until it gets a reply. If the core client is hung waiting for DNS, it can't respond to the manager, so the manager can't respond to mouse clicks. It all ends in a completely unresponsive GUI, all because of a slow or non-working Internet connection!

Summary: A chain of nasty events. Fixing everything I point out in this message would take a lot of changes.
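
One general way to avoid this kind of stall is to run the slow call on a worker thread so the main loop stays free. The sketch below shows that pattern only; it is not what the BOINC developers did or planned. `slow_lookup` is a hypothetical stand-in that sleeps instead of touching the network, and `run_client` and its parameters are likewise made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(hostname, delay):
    """Stand-in for a blocking DNS call (hypothetical; no real network access)."""
    time.sleep(delay)
    return "192.0.2.1"  # documentation-only address from RFC 5737


def run_client(delay, tick=0.01):
    """Keep the main loop running while the lookup happens on a worker thread."""
    heartbeats = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_lookup, "example.org", delay)
        while not future.done():
            heartbeats += 1  # a real client would message the science app here
            time.sleep(tick)
        return future.result(), heartbeats


address, beats = run_client(delay=0.2)
print(address, beats)
```

The main loop never waits on the lookup itself, so heartbeats (or GUI replies) keep flowing however long the lookup takes.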


Basically, if there are network issues it can trash BOINC and, if you're very unlucky, any work units you have running at the time. From the description, the diagnostic symptom would be that the Manager appears to freeze for 30 seconds at a time.

Does it help if you 'suspend network' most of the time, and just do 'allow network' briefly once you know for certain that you're connected?

ID: 9175

Bunsen
Joined: 3 Mar 07
Posts: 15
Message 9204 - Posted: 30 Mar 2007, 2:29:10 UTC - in response to Message 9175.  


"MikeMarsUK" wrote:
Does it help if you 'suspend network' most of the time, and just do 'allow network' briefly once you know for certain that you're connected?


I've been using version 5.4.11 since I got a clear idea of the nature of the problem with 5.8.x and saw how it was affecting my productivity. But I've upgraded to 5.8.15 again and I'll try this for a few days, with my automated periodic checking of the work-unit progress/status. Having to enable/disable network communication manually will be a bit of a pain, but if it can help to produce useful diagnostic information... <*shrug*>
ID: 9204

Bunsen
Joined: 3 Mar 07
Posts: 15
Message 9446 - Posted: 8 Apr 2007, 23:35:33 UTC - in response to Message 9204.  

"MikeMarsUK" wrote:
Does it help if you 'suspend network' most of the time, and just do 'allow network' briefly once you know for certain that you're connected?


Okay, I've been doing this with 5.8.15 for the last week and a half or so. Having to enable/disable the network connectivity manually is indeed an annoyance, but I haven't had that error occur once in that time. It looks like you may have found the culprit.

I don't suppose that the problem is likely to be solved any time soon..?
ID: 9446

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9449 - Posted: 9 Apr 2007, 8:15:41 UTC


That depends on the Boinc developers. I hope it's sorted soon, this is potentially a big problem for the climate projects. Because they run for so long (300-3000 hours), the platform they're running on needs to be really reliable.

Probably a good idea to go back to 5.4.11 for the time being now that your problem has been pinpointed...
ID: 9449

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9469 - Posted: 9 Apr 2007, 18:18:17 UTC


(The above was referring to the networking-issues-causing-zero-exit problem rather than the time-sync-causing-zero-exit problem).

ID: 9469

Bunsen
Joined: 3 Mar 07
Posts: 15
Message 9730 - Posted: 20 Apr 2007, 22:52:58 UTC - in response to Message 9449.  

"MikeMarsUK" wrote:
That depends on the Boinc developers. I hope it's sorted soon, this is potentially a big problem for the climate projects. Because they run for so long (300-3000 hours), the platform they're running on needs to be really reliable.


Has all of the relevant information been added to the bug database? I'm reluctant to go messing around in there myself, since I'm not involved in the development process.
ID: 9730

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9737 - Posted: 21 Apr 2007, 1:58:22 UTC - in response to Message 9730.  
Last modified: 21 Apr 2007, 1:58:46 UTC

"MikeMarsUK" wrote:
That depends on the Boinc developers. I hope it's sorted soon, this is potentially a big problem for the climate projects. Because they run for so long (300-3000 hours), the platform they're running on needs to be really reliable.

"Bunsen" wrote:
Has all of the relevant information been added to the bug database? I'm reluctant to go messing around in there myself, since I'm not involved in the development process.



I'll check out the open tickets. I don't remember opening one up (as Jord and I were moving the open tickets from the old system to the new one) but that doesn't mean there's not one already open.

Please remember, BOINC is software enhanced by the community. So if you'd like to open a ticket, please go for it. You'll need to register for the Trac system. I think it's easier to use than the old BOINCZilla system.
Kathryn :o)
ID: 9737

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9744 - Posted: 21 Apr 2007, 9:25:13 UTC


I posted it onto the Boinc_dev email list a few days ago, no idea if anyone picked it up and put it into Trac.

ID: 9744

KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9749 - Posted: 21 Apr 2007, 12:15:06 UTC - in response to Message 9744.  
Last modified: 21 Apr 2007, 12:28:38 UTC


"MikeMarsUK" wrote:
I posted it onto the Boinc_dev email list a few days ago, no idea if anyone picked it up and put it into Trac.



I must have missed it when you posted because I don't remember reading that email.

:-)

I'll open a ticket highlighting the main points and then pointing back to this thread.

[edit]Done! Ticket #113[/edit]
Kathryn :o)
ID: 9749

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9755 - Posted: 21 Apr 2007, 13:43:33 UTC
Last modified: 21 Apr 2007, 13:56:24 UTC

'Twas this one; not surprising you missed it, since it was in the middle of a thread (in reply to Tigher's intention to make 5.8.x mandatory for his project, which would make the workaround of downgrading to 5.4.x impossible).

http://www.ssl.berkeley.edu/pipermail/boinc_dev/2007-April/000082.html

Thanks for adding the item into Trac :-)
ID: 9755

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15489
Netherlands
Message 9756 - Posted: 21 Apr 2007, 14:03:57 UTC - in response to Message 9754.  

Quoted from Message 9754:
On a side note, with this 'Trac' (Is that something generic in American English?), were the CVS Check-in Notes discontinued? Nothing was added since April 16.

CVS was discontinued, the new thing to use is SVN.
For checkin-notes look here: http://boinc.berkeley.edu/trac/browser/trunk/boinc/checkin_notes
ID: 9756

MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9871 - Posted: 23 Apr 2007, 21:10:42 UTC


ARGH, I knew this bug would be a problem for the climate projects. Just lost 3 climate models (my firewall crashed and as a result internet and localhost traffic was blocked for 12 hours). All 3 models died at the same time, about 2 hours after the firewall crash happened.

The firewall has crashed on rare occasions before, but without ill effect (that was with 5.4.x rather than 5.8.x).

ID: 9871

Bunsen
Joined: 3 Mar 07
Posts: 15
Message 19139 - Posted: 3 Aug 2008, 19:43:39 UTC
Last modified: 3 Aug 2008, 20:19:31 UTC

Has the DNS behaviour that seems to be the cause of this been fixed in 6.2? (It's in the database as http://boinc.berkeley.edu/trac/ticket/113 -- the item has been "modified" a couple of times in the last couple of months, but I haven't seen any changes in the item.)
ID: 19139

Thund3rb1rd
Joined: 17 Apr 08
Posts: 22
United States
Message 19566 - Posted: 18 Aug 2008, 14:56:18 UTC - in response to Message 19139.  

"Bunsen" wrote:
Has the DNS behaviour that seems to be the cause of this been fixed in 6.2? (It's in the database as http://boinc.berkeley.edu/trac/ticket/113 -- the item has been "modified" a couple of times in the last couple of months, but I haven't seen any changes in the item.)


No. I very recently upgraded to 6.2.16 and the error still exists.

Funny thing is, I wasn't receiving this error at all UNTIL version 6.2.16. I had never even seen this situation, either on my old dial-up connection OR with my current broadband connection.

I posted a message to the CPDN board and received the "mostly benign" answer.


ID: 19566

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15489
Netherlands
Message 19576 - Posted: 18 Aug 2008, 18:28:16 UTC - in response to Message 19566.  
Last modified: 18 Aug 2008, 18:28:50 UTC

"Thund3rb1rd" wrote:
No. I very recently upgraded to 6.2.16 and the error still exists.

Funny thing is, I wasn't receiving this error at all UNTIL version 6.2.16. I had never even seen this situation, either on my old dial-up connection OR with my current broadband connection.

I posted a message to the CPDN board and received the "mostly benign" answer.

That's because it mostly is. In the case of CPDN, it happens at times and it usually goes away by itself as well.

If it happens on other projects as well, we'd be interested to know about it.
Especially since you use a single-CPU computer (so it can't be the bug involving throttling in combination with multiple CPUs).
ID: 19576

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.