Work units unnecessarily restarting when DNS resolution does not work


Andris Pavenis

Joined: 30 Aug 08
Posts: 3
Finland
Message 25285 - Posted: 8 Jun 2009, 4:46:12 UTC

I have noticed the following behavior when DNS resolution fails (on Linux x86_64):

Work units begin to exit with zero status and no 'finished' file (see the log below). For CPDN, repeated restarts like this eventually cause the work unit to fail.

boincmgr also freezes for rather long periods.

I suspect the problem is in the communication between the BOINC client and the project processes.
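To illustrate the suspected failure mode, here is a minimal Python sketch (not BOINC code) of an application-side watchdog that treats the client as dead once heartbeats stop arriving. The class and function names, the once-per-second heartbeat and the 30-second timeout are all assumptions for illustration; the real BOINC API uses its own mechanism and values.

```python
import time


class HeartbeatWatchdog:
    """Hypothetical app-side watchdog: the client counts as alive only if
    a heartbeat arrived within the last `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_beat = clock()

    def beat(self):
        # Called whenever a heartbeat from the client is received.
        self.last_beat = self.clock()

    def client_alive(self):
        return self.clock() - self.last_beat <= self.timeout


def simulate(dns_block_secs, timeout=30.0):
    """Simulate a single-threaded client that sends one heartbeat per second,
    then stalls for dns_block_secs inside a blocking DNS lookup.
    Returns True if the app would conclude that the client is dead."""
    now = [0.0]  # fake clock so the simulation is deterministic
    wd = HeartbeatWatchdog(timeout, clock=lambda: now[0])
    for _ in range(5):  # normal operation: one heartbeat per second
        now[0] += 1.0
        wd.beat()
    # Blocking lookup: no heartbeats are sent while the client waits.
    now[0] += dns_block_secs
    return not wd.client_alive()


print(simulate(5.0))   # False: a short stall is tolerated
print(simulate(40.0))  # True: the app gives up on the client
```

In this model a lookup that stalls longer than the watchdog timeout is enough to make every running task exit at once, which matches the pattern in the log where all four Einstein@Home tasks exit within a few seconds of the failed scheduler request.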

Is the BOINC client using synchronous (blocking) DNS queries?

If so, communication between the client and the project applications stops while the client waits for the DNS timeout (when the DNS server does not answer), and, as far as I understand, the project application then decides that the BOINC client is dead.
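If the lookup does block the client's main loop, one common remedy is to run the blocking resolver in a worker thread and keep servicing the tasks while waiting. Below is a minimal Python sketch under that assumption; the function and parameter names are hypothetical, and the `on_tick` callback stands in for whatever the client does on each loop iteration, such as sending heartbeats.

```python
import concurrent.futures
import socket
import time


def resolve_without_blocking(host, on_tick, resolver=socket.getaddrinfo,
                             tick=0.1, deadline=5.0):
    """Hypothetical helper: run a blocking resolver in a worker thread and
    call on_tick() every `tick` seconds while waiting.
    Returns the resolver's result, or None on failure or after `deadline`."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, host, 80)
    start = time.monotonic()
    try:
        while True:
            try:
                return future.result(timeout=tick)
            except concurrent.futures.TimeoutError:
                on_tick()  # the main loop keeps running (e.g. heartbeats)
                if time.monotonic() - start > deadline:
                    return None  # give up; the worker thread is abandoned
            except OSError:  # e.g. socket.gaierror: host name not resolved
                return None
    finally:
        pool.shutdown(wait=False)  # do not wait for a hung lookup
```

The design choice is that a DNS failure then only delays the scheduler request; it no longer stalls the loop that keeps the applications informed that the client is alive.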

Here is an example of a CPDN WU that failed this way: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9057275

08-Jun-2009 07:19:16 [---] Starting BOINC client version 6.4.7 for x86_64-pc-linux-gnu
08-Jun-2009 07:19:16 [---] log flags: task, file_xfer, sched_ops
08-Jun-2009 07:19:16 [---] Libraries: libcurl/7.19.4 NSS/3.12.2.0 zlib/1.2.3 libidn/0.6.14 libssh2/0.18
08-Jun-2009 07:19:16 [---] Data directory: /var/lib/boinc
08-Jun-2009 07:19:16 [---] Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
08-Jun-2009 07:19:16 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
08-Jun-2009 07:19:16 [---] OS: Linux: 2.6.27.24-170.2.68.fc10.x86_64
08-Jun-2009 07:19:16 [---] Memory: 3.87 GB physical, 22.90 GB virtual
08-Jun-2009 07:19:16 [---] Disk: 157.59 GB total, 106.63 GB free
08-Jun-2009 07:19:16 [---] Local time is UTC +3 hours
08-Jun-2009 07:19:16 [---] Not using a proxy
08-Jun-2009 07:19:16 [---] Can't load library libcudart
08-Jun-2009 07:19:16 [---] No coprocessors
08-Jun-2009 07:19:16 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 1099657; location: home; project prefs: default
08-Jun-2009 07:19:16 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 826910; location: home; project prefs: default
08-Jun-2009 07:19:16 [---] General prefs: from http://bam.boincstats.com/ (last modified 28-Mar-2009 23:42:25)
08-Jun-2009 07:19:16 [---] Host location: none
08-Jun-2009 07:19:16 [---] General prefs: using your defaults
08-Jun-2009 07:19:16 [---] Reading preferences override file
08-Jun-2009 07:19:16 [---] Preferences limit memory usage when active to 1982.26MB
08-Jun-2009 07:19:16 [---] Preferences limit memory usage when idle to 3568.07MB
08-Jun-2009 07:19:16 [---] Preferences limit disk usage to 10.00GB
08-Jun-2009 07:19:16 [Einstein@Home] Restarting task h1_0867.10_S5R4__630_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:16 [Einstein@Home] Restarting task h1_0867.10_S5R4__629_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:17 [Einstein@Home] Restarting task h1_0867.10_S5R4__628_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:17 [Einstein@Home] Restarting task h1_0867.10_S5R4__627_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:17 [climateprediction.net] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
08-Jun-2009 07:19:59 [Einstein@Home] Task h1_0867.10_S5R4__630_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:19:59 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:19:59 [Einstein@Home] Restarting task h1_0867.10_S5R4__630_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:59 [climateprediction.net] Scheduler request failed: Couldn't resolve host name
08-Jun-2009 07:20:00 [Einstein@Home] Task h1_0867.10_S5R4__629_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:20:00 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:20:01 [Einstein@Home] Restarting task h1_0867.10_S5R4__629_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:20:02 [Einstein@Home] Task h1_0867.10_S5R4__628_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:20:02 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:20:02 [Einstein@Home] Restarting task h1_0867.10_S5R4__628_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:20:03 [Einstein@Home] Task h1_0867.10_S5R4__627_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:20:03 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:20:03 [Einstein@Home] Restarting task h1_0867.10_S5R4__627_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:20:04 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 468757 seconds of work, reporting 3 completed tasks


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.