Posts by Andris Pavenis

1) Message boards : Questions and problems : Work units unnecessarily restarting when DNS resolution does not work (Message 25285)
Posted 8 Jun 2009 by Andris Pavenis
Post:
I have noticed the following behavior when DNS resolution fails (on Linux x86_64):

Work units begin to exit with status 0 and no 'finished' file (see the log below). For CPDN, this eventually causes the work unit to fail.

boincmgr freezes for rather long periods.

I suspect the problem lies in the communication between the BOINC client and the project processes.

Is the BOINC client using synchronous DNS queries?

If so, communication between the client and the project processes stops while the client waits for the DNS timeout when the DNS server does not answer, and, as far as I understand, the project process then decides that the BOINC client is dead.
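
To make the suspicion concrete, here is a minimal sketch of the mechanism. It is not taken from the BOINC source: `slow_lookup`, `max_heartbeat_gap` and all the timing values are hypothetical stand-ins. A loop emits "heartbeats" at a fixed period and triggers one slow lookup, either inline (synchronous) or in a separate thread, and reports the longest gap between two heartbeats.

```python
import threading
import time


def slow_lookup(delay):
    """Hypothetical stand-in for a DNS query that blocks until it times out."""
    time.sleep(delay)
    return None  # resolution failed


def max_heartbeat_gap(lookup_in_thread, lookup_delay=0.3, period=0.05, beats=10):
    """Run a heartbeat loop with one lookup and return the longest gap (seconds)."""
    stamps = []
    worker = None
    for i in range(beats):
        stamps.append(time.monotonic())  # one heartbeat "sent" to the task
        if i == 2:  # trigger a single lookup part-way through
            if lookup_in_thread:
                worker = threading.Thread(target=slow_lookup, args=(lookup_delay,))
                worker.start()
            else:
                slow_lookup(lookup_delay)  # the loop is stuck here
        time.sleep(period)
    if worker is not None:
        worker.join()
    return max(b - a for a, b in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    print("synchronous lookup, longest gap: %.2f s" % max_heartbeat_gap(False))
    print("threaded lookup, longest gap:    %.2f s" % max_heartbeat_gap(True))
```

With the synchronous variant the longest gap is at least as long as the lookup delay; with the threaded variant it stays close to the heartbeat period. If the task side gives up after some period without a heartbeat, a long enough DNS timeout would explain the exits, but whether the real client behaves like the synchronous variant is exactly the open question.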

Here is an example of a CPDN WU that failed in this way: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9057275

08-Jun-2009 07:19:16 [---] Starting BOINC client version 6.4.7 for x86_64-pc-linux-gnu
08-Jun-2009 07:19:16 [---] log flags: task, file_xfer, sched_ops
08-Jun-2009 07:19:16 [---] Libraries: libcurl/7.19.4 NSS/3.12.2.0 zlib/1.2.3 libidn/0.6.14 libssh2/0.18
08-Jun-2009 07:19:16 [---] Data directory: /var/lib/boinc
08-Jun-2009 07:19:16 [---] Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU @ 2.40GHz [Family 6 Model 15 Stepping 7]
08-Jun-2009 07:19:16 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr lahf_lm
08-Jun-2009 07:19:16 [---] OS: Linux: 2.6.27.24-170.2.68.fc10.x86_64
08-Jun-2009 07:19:16 [---] Memory: 3.87 GB physical, 22.90 GB virtual
08-Jun-2009 07:19:16 [---] Disk: 157.59 GB total, 106.63 GB free
08-Jun-2009 07:19:16 [---] Local time is UTC +3 hours
08-Jun-2009 07:19:16 [---] Not using a proxy
08-Jun-2009 07:19:16 [---] Can't load library libcudart
08-Jun-2009 07:19:16 [---] No coprocessors
08-Jun-2009 07:19:16 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 1099657; location: home; project prefs: default
08-Jun-2009 07:19:16 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 826910; location: home; project prefs: default
08-Jun-2009 07:19:16 [---] General prefs: from http://bam.boincstats.com/ (last modified 28-Mar-2009 23:42:25)
08-Jun-2009 07:19:16 [---] Host location: none
08-Jun-2009 07:19:16 [---] General prefs: using your defaults
08-Jun-2009 07:19:16 [---] Reading preferences override file
08-Jun-2009 07:19:16 [---] Preferences limit memory usage when active to 1982.26MB
08-Jun-2009 07:19:16 [---] Preferences limit memory usage when idle to 3568.07MB
08-Jun-2009 07:19:16 [---] Preferences limit disk usage to 10.00GB
08-Jun-2009 07:19:16 [Einstein@Home] Restarting task h1_0867.10_S5R4__630_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:16 [Einstein@Home] Restarting task h1_0867.10_S5R4__629_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:17 [Einstein@Home] Restarting task h1_0867.10_S5R4__628_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:17 [Einstein@Home] Restarting task h1_0867.10_S5R4__627_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:17 [climateprediction.net] Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
08-Jun-2009 07:19:59 [Einstein@Home] Task h1_0867.10_S5R4__630_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:19:59 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:19:59 [Einstein@Home] Restarting task h1_0867.10_S5R4__630_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:19:59 [climateprediction.net] Scheduler request failed: Couldn't resolve host name
08-Jun-2009 07:20:00 [Einstein@Home] Task h1_0867.10_S5R4__629_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:20:00 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:20:01 [Einstein@Home] Restarting task h1_0867.10_S5R4__629_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:20:02 [Einstein@Home] Task h1_0867.10_S5R4__628_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:20:02 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:20:02 [Einstein@Home] Restarting task h1_0867.10_S5R4__628_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:20:03 [Einstein@Home] Task h1_0867.10_S5R4__627_S5R5a_1 exited with zero status but no 'finished' file
08-Jun-2009 07:20:03 [Einstein@Home] If this happens repeatedly you may need to reset the project.
08-Jun-2009 07:20:03 [Einstein@Home] Restarting task h1_0867.10_S5R4__627_S5R5a_1 using einstein_S5R5 version 105
08-Jun-2009 07:20:04 [Einstein@Home] Sending scheduler request: To fetch work. Requesting 468757 seconds of work, reporting 3 completed tasks
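
To see how often each task is affected in a longer log, the relevant lines can be counted with a small script. This is only a sketch that assumes the line format shown in the log above; the function name `count_unfinished_exits` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# 08-Jun-2009 07:19:59 [Einstein@Home] Task <name> exited with zero status but no 'finished' file
NO_FINISHED = re.compile(
    r"^(?P<stamp>\S+ \S+) \[(?P<project>[^\]]+)\] "
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_unfinished_exits(lines):
    """Count 'exited with zero status' events per (project, task) pair."""
    counts = Counter()
    for line in lines:
        match = NO_FINISHED.match(line.strip())
        if match:
            counts[(match.group("project"), match.group("task"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_exits.py < client_log.txt
    for (project, task), n in sorted(count_unfinished_exits(sys.stdin).items()):
        print(f"{project}  {task}  {n}")
```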

2) Message boards : Questions and problems : WU misbehaving while starting BOINC client (Linux x86_64) (Message 19850)
Posted 30 Aug 2008 by Andris Pavenis
Post:
It sounds like a problem that occurred to a person on cpdn a few years ago, where the data in the slots folders got mixed up; the data for one wu is in the folder allocated to a different wu.



If so, what would be the best way to fix it?

The simplest approach could be to finish all WUs and then start from scratch after cleaning the BOINC directory. The only problem is that finishing the CPDN WUs would still take a rather long time. Could it be enough to detach from all projects except CPDN and then reattach?

3) Message boards : Questions and problems : WU misbehaving while starting BOINC client (Linux x86_64) (Message 19848)
Posted 30 Aug 2008 by Andris Pavenis
Post:
For a rather long time I have noticed the following behavior when starting the BOINC client:

1) A WU sometimes gets the CPU time of another one (for example, the CPU time of a Climate prediction WU is set to that of a WU from some other project). See for example:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8003980

2) The other problem is that a WU sometimes crashes at startup. With Climate prediction, I get a message that the WU has exited without a result file, and it restarts after that. The first problem seems to happen at the same time as this one. With SETI@home, I get a kernel message about SIGSEGV.
In several cases I have seen that when a SETI@home WU crashes in this way, it also gets the CPU time of another WU.

These things do not happen when the client is up and running, only when starting the BOINC client and sometimes when resuming a project.

I tried suspending all projects before shutting down the system. It did not help. For example, I first resumed the Climate prediction project and, a short time later, Einstein@Home as well. The Climate prediction WU restarted (problem 2) when I resumed the Einstein@Home project.

None of this seems to depend on the BOINC version.

Some system-related information:
Intel Core 2 Quad 2.4GHz, Fedora 9, x86_64.

Earlier I used the BOINC package provided by Fedora 9 (5.10.45). Later I replaced it with the 64-bit version of BOINC 6.2.14 and after that 6.2.15. Nothing changed; I still have these problems.



