Failed WUs due to DNS error

Message boards : BOINC client : Failed WUs due to DNS error


hangya2

Joined: 4 Aug 11
Posts: 14
Sweden
Message 43083 - Posted: 17 Mar 2012, 16:41:18 UTC
Last modified: 17 Mar 2012, 17:03:22 UTC

I was running BOINC 7.xx, but this is true for any version.

For some reason, the internet connection in my area is disturbed now and then: the connection often times out, and there are short outages.

In these situations the BOINC client starts the WUs, runs them for some time, and then they end with "computation error".
If I click any button during one of these outages, a message window says something along the lines of "communicating with the BOINC client", and everything just freezes for a while. Afterwards there may or may not be a computation error.

These last weeks, many, many hours of work have been lost to computation errors.

Some time ago, more than 200 days back, it was pointed out to me that this was a known bug that might require a redesign of BOINC's "heartbeat" or whatnot...

Are there plans to fix this?

(P.S. English is my third language out of five, so do not complain; I am not Shakespeare :) )
Jord
Volunteer tester
Help desk expert

Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 43084 - Posted: 17 Mar 2012, 17:37:50 UTC - in response to Message 43083.  

"the WU:s"

Which project's tasks is this with?
Can you give links to tasks with the computation error?

Freezes are normally due to hardware trouble, be it faulty RAM, an overheating computer (dust!) or other malfunctions.

And actually, it's not the heartbeat that's causing this. As you point out in the title, it is a DNS problem. A very good post by Richard Haselgrove at SETI@home explained it:

In simple terms, it's in an area of networking called 'DNS' - the Domain Name System. Whenever we do anything over the internet - be it browsing the web, sending and receiving email, or crunching with BOINC - we like to refer to places by name. While I'm typing this, my browser says I'm talking to setiathome.berkeley.edu.

But my computer isn't. It works by number, and it thinks it's talking to 128.32.18.150.

It's DNS which makes this happen. And because you really, really don't want to keep a list of every web server in the world on your computer (unless you're Google), your ISP provides a DNS service for you, as part of the package.

Whenever BOINC needs to call home, to report or request new work, it knows it has to contact setiboinc.ssl.berkeley.edu, so the first thing it has to do is call directory enquiries (DNS) and get the number: 208.68.240.20.

Yes, it would have been possible to write BOINC to use numbers from first principles, but it wouldn't have helped - if the line's down, you can't do anything with the number anyway. It's very rare for a line to be working, but DNS not to be available, so it's normal practice to use the readable form of web addresses, and DNS, all the time.

So when a line goes down, the very first thing BOINC (or anything else) knows about it is when the DNS lookup fails. That's what gives you the message "can't resolve hostname" - that's right and proper, as it tells you that DNS isn't available.
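To make the lookup step concrete, here is a minimal Python sketch of what any client has to do before it can contact a server: ask DNS for the address, and treat a failed lookup as the first sign that the line is down. This is not BOINC's actual code; the function name `resolve_or_report` and its `resolver` parameter are hypothetical, added so the failure path can be tried without pulling the network cable. Only `socket.getaddrinfo` and `socket.gaierror` are real standard-library names.

```python
import socket
from typing import Callable, List, Tuple

# Hypothetical resolver type; socket.getaddrinfo has this shape.
Resolver = Callable[..., list]


def resolve_or_report(host: str, port: int = 80,
                      resolver: Resolver = socket.getaddrinfo
                      ) -> Tuple[bool, List[str]]:
    """Look up `host` the way a client must before it can connect.

    Returns (True, [addresses]) on success, or (False, [message]) when the
    DNS lookup itself fails, which is usually the first sign a link is down.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # Name resolution failed: an analogue of "can't resolve hostname".
        return False, [f"can't resolve hostname {host}: {exc}"]
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the numeric IP address.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses


if __name__ == "__main__":
    ok, result = resolve_or_report("setiboinc.ssl.berkeley.edu")
    print(ok, result)
```

With the default `socket.getaddrinfo` the function performs a real lookup; passing a stand-in resolver that raises `socket.gaierror` reproduces the outage case offline, which is exactly the situation where only the name-to-number step fails first.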


Since you're the second person in a relatively short time to point this out, I am now forwarding it to the developers. Something fun for their to-do list. ;-)
hangya2

Joined: 4 Aug 11
Posts: 14
Sweden
Message 43095 - Posted: 17 Mar 2012, 19:53:57 UTC - in response to Message 43084.  
Last modified: 17 Mar 2012, 19:56:02 UTC


Freezes are normally due to hardware trouble, be it faulty RAM, an overheating computer (dust!) or other malfunctions.

No, no, no: I use Kingston ECC memory, and my computer and I brush our teeth together and such, so no dust :) (Only the BOINC GUI froze for a while.)

Which project's tasks is this with?
Can you give links to tasks with the computation error?

I run POEM and WCG. POEM did not seem to have many problems, but WCG did; alas, I do not know how to link directly to my WUs in the WCG account. Many WCG WUs ended with computation errors, and some of the subprojects had so many download errors lately that barely any WUs finished.

But, here is an unbelievable situation, I just found out about it.
The Realtek network driver used by default in Ubuntu 11.10 (and some other Linux distros) is faulty; I had to downgrade to an older version...
http://ubuntuforums.org/showthread.php?t=1865436&page=3
some quotes from that thread:

The strange thing with kernel 3.0 and r8169 is that sometimes it works, but most of the time it doesn't.


@martin1969, it is not an Ubuntu problem, the problem is the r8169 driver in the Linux kernel. It affects many, if not all, distros. It has been on bug reports for several years. Why it is never fixed is beyond me.

Soooo, if the developers feel bored and want to replicate my problems, just get hold of any board with a Realtek network adapter and the r8169 driver, and start crunching :D
mr.larry

Joined: 6 Apr 12
Posts: 1
Finland
Message 43287 - Posted: 6 Apr 2012, 16:51:32 UTC

Hello

My team members have also seen these problems. We run only WCG. The errors are very common when using wireless mobile connections. The easiest way to reproduce this is to load a few days' worth of work into the queue and then crunch without a network connection. At some point jobs start to fail (computation error). The problem seems to have started at some point during BOINC version 6, which shipped with Ubuntu 10.04 or earlier. But it is not Ubuntu-specific; it seems to happen with many distributions. We have not seen it with the Windows version of BOINC.

Our team has discussed this on the WCG forums.

I hope this information helps fix the problem.


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.