Thread 'BOINC client exits.'

Message boards : BOINC client : BOINC client exits.
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 18234 - Posted: 5 Jul 2008, 3:22:57 UTC - in response to Message 18233.  

I do it so it comes up automatically when I boot the system.


Looks a lot like my system. :-)


I just restarted it and the error log is empty and the logfile is:


You'll need to get BOINC to actually crash before something gets written to the error log.

Those look like normal enough start up messages.

climate prediction is still suspended and the only work units that will get new tasks come from setiathome. Do you want me to enable other work units?


If the theory is that CPDN is crashing things, then you'll need to enable work fetch and see if you can get BOINC to crash.

Kathryn :o)
ID: 18234
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18242 - Posted: 5 Jul 2008, 12:02:29 UTC - in response to Message 18234.  

I do it so it comes up automatically when I boot the system.


Looks a lot like my system. :-)


Mine is a little sloppy for historical reasons. I ought to clean it up sometime, but not right now.

I just restarted it and the error log is empty and the logfile is:


You'll need to get BOINC to actually crash before something gets written to the error log.

Those look like normal enough start up messages.

climate prediction is still suspended and the only work units that will get new tasks come from setiathome. Do you want me to enable other work units?


If the theory is that CPDN is crashing things, then you'll need to enable work fetch and see if you can get BOINC to crash.


OK. It had the suspended one in the list, so I resumed it.
It has over 2000 hours to go, so I will let it run along with the setiathomes that will get new tasks. It is also finishing up a world community grid unit that has a little over 15 hours to go.

It now writes this to stderr (or stdout); either way, it shows up in my monitor window. It used to write this to the error log.

Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time
Skipping: /mod_time
Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct
Skipping: 100.000000
Skipping: /max_ncpus_pct

It may not matter, but I wonder why it changed.
ID: 18242
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 18244 - Posted: 5 Jul 2008, 13:10:46 UTC - in response to Message 18242.  

Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time
Skipping: /mod_time
Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct
Skipping: 100.000000
Skipping: /max_ncpus_pct

These are benign messages indicating that the application isn't built with the latest BOINC API. You can ignore them.
ID: 18244
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18245 - Posted: 5 Jul 2008, 13:20:21 UTC - in response to Message 18244.  

Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time
Skipping: /mod_time
Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct
Skipping: 100.000000
Skipping: /max_ncpus_pct

These are benign messages indicating that the application isn't built with the latest BOINC API. You can ignore them.


I know I can ignore it (even though I do not know which application is saying that).

What I am concerned about is that they used to appear in my error log, and now they do not. They appear on my terminal, as though the 2>>$ERRORLOG were not in my start-up script.
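
For readers less familiar with the redirection mentioned here: `2>>$ERRORLOG` appends a process's stderr to a file, separately from its stdout. A minimal Python sketch of the same arrangement follows. The function name and file paths are hypothetical, and this is not the actual start-up script, which is not shown in this thread.

```python
import subprocess


def start_with_logs(cmd, log_path, error_log_path):
    """Start cmd with stdout and stderr appended to separate files.

    Roughly equivalent to the shell form:  cmd >>log_path 2>>error_log_path
    """
    with open(log_path, "ab") as out, open(error_log_path, "ab") as err:
        # The child receives its own copies of both file descriptors, so the
        # parent's handles can be closed as soon as Popen returns.
        return subprocess.Popen(cmd, stdout=out, stderr=err)
```

Only output written to the launched process's own stderr (or to a stderr inherited by its children) ends up in the second file. Anything that opens the terminal or another file directly bypasses the redirection.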
ID: 18245
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 18246 - Posted: 5 Jul 2008, 13:44:51 UTC - in response to Message 18245.  

They are written to the stderr.txt file of the application that is running. This file is among the things uploaded to the server when you upload your work. You can find it in the slot directory that the application runs in.
ID: 18246
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18247 - Posted: 5 Jul 2008, 14:33:14 UTC - in response to Message 18246.  

They are written to the stderr.txt file of the application that is running. This file is among the things uploaded to the server when you upload your work. You can find it in the slot directory that the application runs in.


I can find those easily. Right now there are 4 slots used: 0, 1, 2, and 5.
Each has a stderr.txt file with stuff like that (and other stuff) in them.

But your comment does not explain why such messages come out on my screen, starting from when I put the 6.2.11 debug version in. Probably a "feature", I expect.
ID: 18247
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18291 - Posted: 8 Jul 2008, 14:33:44 UTC - in response to Message 18234.  


If the theory is that CPDN is crashing things, then you'll need to enable work fetch and see if you can get BOINC to crash.


I have had both my systems running boinc client 6.2.11 debug 24/7 since late Sunday evening. They are running all my usual applications, including climate prediction things. No crashes of the boinc client yet.
ID: 18291
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 18292 - Posted: 8 Jul 2008, 14:42:05 UTC - in response to Message 18291.  

Well, that is good news and bad news. Good news for you of course, but bad news if the developers try to figure out why it was crashing in the first place. Do post back when the client crashes again. Eagerly awaiting. Fingers crossed. (for bad news for once). :-))
ID: 18292
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18312 - Posted: 9 Jul 2008, 10:42:42 UTC - in response to Message 18292.  
Last modified: 9 Jul 2008, 10:44:05 UTC

Well, that is good news and bad news. Good news for you of course, but bad news if the developers try to figure out why it was crashing in the first place. Do post back when the client crashes again. Eagerly awaiting. Fingers crossed. (for bad news for once). :-))


I think of it as no news. Recall that it did not crash for years, using various versions of the boinc client. Then, a little while ago, it started crashing with 5.10.45, after that version had run for months (?) without crashing, ever since it first came out. Then it crashed a few times, then it would run a week without crashing, then the other machine, also running 5.10.45, crashed a couple of times. So the fact that 6.2.11 has run for a little over two days does not tell us much.

N.B.: the machine did not crash. The OS did not crash, just the boinc client and all its children.
ID: 18312
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18351 - Posted: 10 Jul 2008, 22:00:23 UTC - in response to Message 18292.  

Well, that is good news and bad news. Good news for you of course, but bad news if the developers try to figure out why it was crashing in the first place. Do post back when the client crashes again. Eagerly awaiting. Fingers crossed. (for bad news for once). :-))


OK, my old machine just had boinc client 6.2.11 debug and its children crash. The valinuxl.error.log file, in its entirety, said:

SIGSEGV: segmentation violation
Stack trace (2 frames):
/boinc/BOINC/boinc[0x80918c2]
/lib/tls/libc.so.6[0x138908]

Exiting...

The last few lines of valinuxl.boinc.log are:

10-Jul-2008 17:17:02 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13056 seconds of work, reporting 0 completed tasks
10-Jul-2008 17:17:07 [Hydrogen@Home] Scheduler request succeeded: got 1 new tasks
10-Jul-2008 17:17:09 [Hydrogen@Home] Started download of 1215724615_nsc45788.mol2
10-Jul-2008 17:17:09 [Hydrogen@Home] Started download of 1215724615_pdb1bxs.pdb
10-Jul-2008 17:17:10 [Hydrogen@Home] Finished download of 1215724615_nsc45788.mol2
10-Jul-2008 17:17:10 [Hydrogen@Home] Started download of ad_nsc45788.mol2_pdb1bxs.pdb_1215724615
10-Jul-2008 17:17:11 [Hydrogen@Home] Finished download of ad_nsc45788.mol2_pdb1bxs.pdb_1215724615
10-Jul-2008 17:17:13 [Hydrogen@Home] Finished download of 1215724615_pdb1bxs.pdb
10-Jul-2008 17:17:17 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 12252 seconds of work, reporting 0 completed tasks
10-Jul-2008 17:17:22 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
10-Jul-2008 17:17:22 [Hydrogen@Home] Message from server: No work sent
10-Jul-2008 17:18:23 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 12252 seconds of work, reporting 0 completed tasks
10-Jul-2008 17:18:28 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
10-Jul-2008 17:18:28 [Hydrogen@Home] Message from server: No work sent
10-Jul-2008 17:19:28 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13057 seconds of work, reporting 1 completed tasks
10-Jul-2008 17:19:33 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
10-Jul-2008 17:19:33 [Hydrogen@Home] Message from server: No work sent

Actually, I misspoke: some boinc processes did not die:

valinuxl:boinc[~]$ ps -fu boinc
UID PID PPID C STIME TTY TIME CMD
boinc 6967 6965 0 06:01 ? 00:00:54 sshd: boinc@pts/1
boinc 6968 6967 0 06:01 pts/1 00:00:00 -bash
boinc 7550 6968 0 17:51 pts/1 00:00:00 ps -fu boinc
boinc 14325 1 1 Jul04 ? 01:43:12 wcg_faah_autodock_6.05_i686-pc-linux-gnu -dpf faah4143_AB3_MIN3_xmd06240_02.dpf -gpf AB3_MIN3_xmd06240_02.gpf
boinc 14327 14325 0 Jul04 ? 00:00:00 wcg_faah_autodock_6.05_i686-pc-linux-gnu -dpf faah4143_AB3_MIN3_xmd06240_02.dpf -gpf AB3_MIN3_xmd06240_02.gpf

sshd is my connection from my main machine.
The two wcg_faah_autodock processes are work units for World Community Grid. I killed those two wcg processes and restarted the boinc client. There was only one wcg task in the list, and it was running. It was a dddt, not faah.
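
The leftover tasks above can be recognised by their parent PID: once the client died, the surviving science application was adopted by init (PPID 1). Below is a minimal Python sketch that picks such processes, and their children, out of `ps -fu` output. The function name is hypothetical, and the column layout is assumed to match the listing above.

```python
def orphaned_processes(ps_output):
    """Return (pid, command) for processes with PPID 1 and all their descendants."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8:
            rows.append((int(fields[1]), int(fields[2]), fields[7]))

    orphan_pids = {pid for pid, ppid, _ in rows if ppid == 1}
    # Repeatedly pull in children of processes already known to be orphaned.
    changed = True
    while changed:
        changed = False
        for pid, ppid, _ in rows:
            if ppid in orphan_pids and pid not in orphan_pids:
                orphan_pids.add(pid)
                changed = True
    return [(pid, cmd) for pid, _, cmd in rows if pid in orphan_pids]


# The listing from this post, used as sample input.
PS_OUTPUT = """\
UID PID PPID C STIME TTY TIME CMD
boinc 6967 6965 0 06:01 ? 00:00:54 sshd: boinc@pts/1
boinc 6968 6967 0 06:01 pts/1 00:00:00 -bash
boinc 7550 6968 0 17:51 pts/1 00:00:00 ps -fu boinc
boinc 14325 1 1 Jul04 ? 01:43:12 wcg_faah_autodock_6.05_i686-pc-linux-gnu -dpf faah4143_AB3_MIN3_xmd06240_02.dpf -gpf AB3_MIN3_xmd06240_02.gpf
boinc 14327 14325 0 Jul04 ? 00:00:00 wcg_faah_autodock_6.05_i686-pc-linux-gnu -dpf faah4143_AB3_MIN3_xmd06240_02.dpf -gpf AB3_MIN3_xmd06240_02.gpf
"""

for pid, cmd in orphaned_processes(PS_OUTPUT):
    print(pid, cmd.split()[0])
```

This is only a heuristic. A client started from an init script would likely show PPID 1 itself while it is running, so the check is meaningful only after the client has exited.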

Was this what you expected to find? I thought the debug version of the 6.2.11 boinc client would put out more than the production version of 5.10.45. Did I run it wrong or something?

ID: 18351
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 18356 - Posted: 11 Jul 2008, 0:49:55 UTC - in response to Message 18351.  

I've forwarded it to the developers. Perhaps one of them will come to this thread to ask his own questions; otherwise I'll find something in my email in the morning to ask you. ;-)
ID: 18356
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18369 - Posted: 12 Jul 2008, 0:50:37 UTC - in response to Message 18292.  

Well, that is good news and bad news. Good news for you of course, but bad news if the developers try to figure out why it was crashing in the first place. Do post back when the client crashes again. Eagerly awaiting. Fingers crossed. (for bad news for once). :-))


Here we go again. While I was at dinner, boinc client on my other machine quit.
The errorlog contained this in its entirety:

$ cat trillian.error.log
SIGSEGV: segmentation violation
Stack trace (2 frames):
/home/boinc/BOINC/boinc[0x80918c2]
[0xe6d420]

Exiting...

The boinclog contained this at the end:

11-Jul-2008 19:45:19 [rosetta@home] Computation for task t464_1_CASP8_1_t464_1_T0464_1RLYAIGNORE_THE_REST_4_4218_32_0 finished
11-Jul-2008 19:45:19 [World Community Grid] Resuming task dddt0602c0128_100442_1 using dddt version 605
11-Jul-2008 19:45:21 [rosetta@home] Started upload of t464_1_CASP8_1_t464_1_T0464_1RLYAIGNORE_THE_REST_4_4218_32_0_0
11-Jul-2008 19:45:21 [World Community Grid] Sending scheduler request: To fetch work. Requesting 7461 seconds of work, reporting 0 completed tasks
11-Jul-2008 19:45:23 [rosetta@home] Finished upload of t464_1_CASP8_1_t464_1_T0464_1RLYAIGNORE_THE_REST_4_4218_32_0_0
11-Jul-2008 19:45:26 [World Community Grid] Scheduler request succeeded: got 1 new tasks
11-Jul-2008 19:45:28 [World Community Grid] Started download of batch00045_R00045_3f39bc8c34bde0d2ca28d76f3372ea9e.sequence
11-Jul-2008 19:45:28 [World Community Grid] Started download of batch00045_R00045_3f39bc8c34bde0d2ca28d76f3372ea9e.dist
11-Jul-2008 19:45:29 [World Community Grid] Finished download of batch00045_R00045_3f39bc8c34bde0d2ca28d76f3372ea9e.sequence
11-Jul-2008 19:45:30 [World Community Grid] Finished download of batch00045_R00045_3f39bc8c34bde0d2ca28d76f3372ea9e.dist
11-Jul-2008 19:45:46 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 24929 seconds of work, reporting 0 completed tasks
11-Jul-2008 19:45:52 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
11-Jul-2008 19:45:52 [Hydrogen@Home] Message from server: No work sent
11-Jul-2008 19:46:52 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 24925 seconds of work, reporting 0 completed tasks
11-Jul-2008 19:46:57 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
11-Jul-2008 19:46:57 [Hydrogen@Home] Message from server: No work sent
11-Jul-2008 19:49:07 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 24920 seconds of work, reporting 0 completed tasks
11-Jul-2008 19:49:12 [Hydrogen@Home] Scheduler request succeeded: got 2 new tasks
11-Jul-2008 19:49:14 [Hydrogen@Home] Started download of 1215820133_nsc36877.mol2
11-Jul-2008 19:49:14 [Hydrogen@Home] Started download of 1215820133_pdb1uwq.pdb
11-Jul-2008 19:49:15 [Hydrogen@Home] Finished download of 1215820133_nsc36877.mol2
11-Jul-2008 19:49:15 [Hydrogen@Home] Started download of ad_nsc36877.mol2_pdb1uwq.pdb_1215820133
11-Jul-2008 19:49:17 [Hydrogen@Home] Finished download of 1215820133_pdb1uwq.pdb
11-Jul-2008 19:49:17 [Hydrogen@Home] Finished download of ad_nsc36877.mol2_pdb1uwq.pdb_1215820133
11-Jul-2008 19:49:17 [Hydrogen@Home] Started download of 1215820134_nsc46696.mol2
11-Jul-2008 19:49:17 [Hydrogen@Home] Started download of 1215820134_pdb1bxs.pdb
11-Jul-2008 19:49:18 [Hydrogen@Home] Finished download of 1215820134_nsc46696.mol2
11-Jul-2008 19:49:18 [Hydrogen@Home] Started download of ad_nsc46696.mol2_pdb1bxs.pdb_1215820134
11-Jul-2008 19:49:19 [Hydrogen@Home] Finished download of 1215820134_pdb1bxs.pdb
11-Jul-2008 19:49:19 [Hydrogen@Home] Finished download of ad_nsc46696.mol2_pdb1bxs.pdb_1215820134
11-Jul-2008 19:49:23 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 26948 seconds of work, reporting 0 completed tasks
11-Jul-2008 19:49:28 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
11-Jul-2008 19:49:28 [Hydrogen@Home] Message from server: No work sent
11-Jul-2008 19:50:28 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 26948 seconds of work, reporting 0 completed tasks
11-Jul-2008 19:50:33 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
11-Jul-2008 19:50:33 [Hydrogen@Home] Message from server: No work sent
11-Jul-2008 19:51:33 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 27298 seconds of work, reporting 1 completed tasks
11-Jul-2008 19:51:38 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
11-Jul-2008 19:51:38 [Hydrogen@Home] Message from server: No work sent

The dates of these two files are almost the same:

$ ls -l trillian.*.log
-rw-r----- 1 boinc boinc 424688 Jul 11 19:51 trillian.boinc.log
-rw-r----- 1 boinc boinc 114 Jul 11 19:52 trillian.error.log

The system's temperatures, etc., were normal:
Fri Jul 11 19:45:01 EDT 2008
w83627hf-isa-0290
Adapter: ISA adapter
VCore: +1.44 V (min = +1.36 V, max = +1.47 V)
+3.3V: +3.31 V (min = +3.14 V, max = +3.46 V)
VBat: +3.18 V (min = +2.40 V, max = +3.60 V)
+5V: +4.92 V (min = +4.84 V, max = +5.24 V)
+12V: +11.92 V (min = +11.49 V, max = +12.59 V)
-12V: -11.78 V (min = -13.02 V, max = -11.37 V)
V5SB: +5.43 V (min = +4.84 V, max = +5.24 V)
CPU0 fan: 4688 RPM (min = 1592 RPM, div = 8)
CPU1 fan: 3013 RPM (min = 1592 RPM, div = 8)
System: +49 C (high = +50 C, hyst = +48 C) sensor = thermistor
CPU0: +56.0 C (high = +60 C, hyst = +58 C) sensor = thermistor
CPU1: +53.5 C (high = +60 C, hyst = +58 C) sensor = thermistor
vid: +1.525 V (VRM Version 9.0)

(According to Intel, these processors should not exceed 70C, and they never have.)

I looked at the addresses where these things have crashed. They seem to be from a limited choice of addresses, and both machines use some of the same ones. From this I infer that it is not just random crashes but software ones. These machines have similar operating systems; one is Red Hat Enterprise Linux 5.2 and the other is CentOS 4.something (the latest). Both have two physical Intel processors although the newer machine has way more memory and the processors on it are hyperthreaded, so you could say it has four processors.

Those addresses come in pairs

Machine T
0x80918c2 0xe6d420
0x808e90a 0x679420
0x80918c2 0xe6d420

Machine V
0x808e90a 0xa4e908
0x808e90a 0xa4e908
0x80918c2 0x138908
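
The observation that both machines crash at the same small set of addresses can be checked mechanically. The following Python sketch tallies the first-frame addresses (the ones inside the boinc binary) from the table above. It assumes both machines run the same build of the client, so that those addresses are comparable. The helper names are hypothetical.

```python
from collections import Counter

# First-frame addresses from the table above, keyed by machine.
FIRST_FRAME = {
    "T": [0x80918c2, 0x808e90a, 0x80918c2],
    "V": [0x808e90a, 0x808e90a, 0x80918c2],
}


def tally(addresses_by_machine):
    """Count how often each address appears on each machine."""
    return {machine: Counter(addrs) for machine, addrs in addresses_by_machine.items()}


def shared_addresses(addresses_by_machine):
    """Return the addresses that appear on every machine."""
    return set.intersection(*(set(addrs) for addrs in addresses_by_machine.values()))


for machine, counts in tally(FIRST_FRAME).items():
    print(machine, {hex(addr): n for addr, n in counts.items()})
print("shared:", sorted(hex(addr) for addr in shared_addresses(FIRST_FRAME)))
```

With the six crashes listed, both first-frame addresses occur on both machines, which is consistent with the inference that the fault lies in software rather than in one machine's hardware.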

ID: 18369
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18378 - Posted: 12 Jul 2008, 11:14:22 UTC - in response to Message 18369.  

Here we go again. While I was asleep early this morning, the boinc client on my V machine quit. The errorlog contained this in its entirety:

$ cat valinuxl.error.log
SIGSEGV: segmentation violation
Stack trace (2 frames):
/boinc/BOINC/boinc[0x80918c2]
/lib/tls/libc.so.6[0x138908]

Exiting...
SIGSEGV: segmentation violation
Stack trace (2 frames):
/boinc/BOINC/boinc[0x80918c2]
/lib/tls/libc.so.6[0xa4e908]

The first one I reported Sunday (IIRC). The second one happened this morning:

$ ls -l valinuxl.*.log
-rw-r----- 1 boinc boinc 373104 Jul 12 03:52 valinuxl.boinc.log
-rw-r----- 1 boinc boinc 254 Jul 12 03:53 valinuxl.error.log

The last few lines of the boinc.log file are:

12-Jul-2008 03:02:20 [malariacontrol.net] Resuming task wu_126_158_140200_0_1215817740_1 using malariacontrol version 557
12-Jul-2008 03:02:22 [Hydrogen@Home] Started upload of nsc2560.mol2_pdb1lo8.pdb_1215845093_1_0
12-Jul-2008 03:02:26 [Hydrogen@Home] Finished upload of nsc2560.mol2_pdb1lo8.pdb_1215845093_1_0
12-Jul-2008 03:06:18 [Hydrogen@Home] Sending scheduler request: To report completed tasks. Requesting 0 seconds of work, reporting 1 completed tasks
12-Jul-2008 03:06:21 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:36:38 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13072 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:36:43 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:36:43 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:37:43 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13072 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:37:48 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:37:48 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:38:48 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13072 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:38:53 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:38:53 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:39:54 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13072 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:39:59 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:39:59 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:40:59 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13072 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:41:04 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:41:04 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:42:54 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13073 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:43:00 [Hydrogen@Home] Scheduler request succeeded: got 1 new tasks
12-Jul-2008 03:43:02 [Hydrogen@Home] Started download of 1215848565_nsc2559.mol2
12-Jul-2008 03:43:02 [Hydrogen@Home] Started download of 1215848565_pdb1e0e.pdb
12-Jul-2008 03:43:05 [Hydrogen@Home] Finished download of 1215848565_nsc2559.mol2
12-Jul-2008 03:43:05 [Hydrogen@Home] Started download of ad_nsc2559.mol2_pdb1e0e.pdb_1215848565
12-Jul-2008 03:43:06 [Hydrogen@Home] Finished download of ad_nsc2559.mol2_pdb1e0e.pdb_1215848565
12-Jul-2008 03:43:09 [Hydrogen@Home] Finished download of 1215848565_pdb1e0e.pdb
12-Jul-2008 03:43:11 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 12688 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:43:16 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:43:16 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:44:16 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 12688 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:44:21 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:44:21 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:45:22 [Hydrogen@Home] Sending scheduler request: To report completed tasks. Requesting 13074 seconds of work, reporting 1 completed tasks
12-Jul-2008 03:45:27 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:45:27 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:46:27 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13074 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:46:32 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:46:32 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:47:32 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13074 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:47:37 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:47:37 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:49:43 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13074 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:49:48 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:49:48 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 03:52:24 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 13075 seconds of work, reporting 0 completed tasks
12-Jul-2008 03:52:29 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 03:52:29 [Hydrogen@Home] Message from server: No work sent

I looked at the addresses where these things have crashed. They seem to be from a limited choice of addresses, and both machines use some of the same ones. From this I infer that it is not just random crashes but software ones. These machines have similar operating systems; one is Red Hat Enterprise Linux 5.2 and the other is CentOS 4.something (the latest). Both have two physical Intel processors although the newer machine has way more memory and the processors on it are hyperthreaded, so you could say it has four processors.

Those addresses come in pairs

Machine T
0x80918c2 0xe6d420
0x808e90a 0x679420
0x80918c2 0xe6d420

Machine V
0x808e90a 0xa4e908
0x808e90a 0xa4e908
0x80918c2 0x138908


This machine is Machine V, and the segmentation violation took place at:

0x80918c2 0xa4e908

as before. This seems to be in the library /lib/tls/libc.so.6. On Machine T, it does not seem to tell me where the segmentation violation is by name, only by address.

Here is something from machine V:

$ ldd boinc.unmodified
libz.so.1 => /usr/lib/libz.so.1 (0x00c71000)
libdl.so.2 => /lib/libdl.so.2 (0x00b55000)
libc.so.6 => /lib/tls/libc.so.6 (0x00a27000)
libm.so.6 => /lib/tls/libm.so.6 (0x00b5b000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0x00d00000)
/lib/ld-linux.so.2 (0x00a0d000)

$ rpm -qf /lib/tls/libc.so.6
glibc-2.3.4-2.39


and here is the same thing from machine T:

$ ldd boinc.unmodified
linux-gate.so.1 => (0x005b0000)
libz.so.1 => /usr/lib/libz.so.1 (0x00614000)
libdl.so.2 => /lib/libdl.so.2 (0x005f5000)
libc.so.6 => /lib/libc.so.6 (0x00110000)
libm.so.6 => /lib/libm.so.6 (0x005cc000)
libpthread.so.0 => /lib/libpthread.so.0 (0x005fb000)
/lib/ld-linux.so.2 (0x00469000)
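
The second-frame addresses differ between crashes. One possible explanation, which this thread does not confirm, is that the libraries are mapped at a different base address on each run. Mappings are page-aligned, so the offset within a page does not depend on the base. The Python sketch below compares those page offsets and also computes an offset into libc from the load address that ldd printed on Machine V. The 4 KiB page size and the helper names are assumptions.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages, assumed for these 32-bit x86 systems

# Second-frame addresses from the table of address pairs, keyed by machine.
SECOND_FRAME = {
    "T": [0xe6d420, 0x679420, 0xe6d420],
    "V": [0xa4e908, 0xa4e908, 0x138908],
}


def page_offsets(addresses):
    """Return the distinct offsets of the addresses within their pages."""
    return {addr % PAGE_SIZE for addr in addresses}


def offset_in_library(address, load_base):
    """Return how far an address lies past a library's load base."""
    if address < load_base:
        raise ValueError("address lies below the library's load base")
    return address - load_base


for machine, addrs in SECOND_FRAME.items():
    print(machine, sorted(hex(off) for off in page_offsets(addrs)))

# libc.so.6 load address as printed by ldd on Machine V.
print(hex(offset_in_library(0xa4e908, 0x00a27000)))
```

On each machine the second-frame addresses share a single page offset, which suggests the same code location at a shifting base. The libc offset is only meaningful if the base ldd reported matches the one in the process that crashed.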


ID: 18378
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18382 - Posted: 12 Jul 2008, 15:46:53 UTC - in response to Message 18378.  

Machine T just had boinc-client fail again.

12-Jul-2008 11:31:30 [malariacontrol.net] Computation for task wu_119_507_140190_0_1215816366_0 finished
12-Jul-2008 11:31:30 [rosetta@home] Resuming task h001__BOINC_CASP8_ABRELAX_KILLHAIRPINS-h001_-t482__4208_29175_0 using rosetta_beta version 598
12-Jul-2008 11:31:32 [malariacontrol.net] Started upload of wu_119_507_140190_0_1215816366_0_0
12-Jul-2008 11:31:34 [malariacontrol.net] Finished upload of wu_119_507_140190_0_1215816366_0_0
12-Jul-2008 11:32:59 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 25875 seconds of work, reporting 0 completed tasks
12-Jul-2008 11:33:04 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 11:33:04 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 11:34:04 [Hydrogen@Home] Fetching scheduler list
12-Jul-2008 11:34:09 [Hydrogen@Home] Master file download succeeded
12-Jul-2008 11:34:14 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 25870 seconds of work, reporting 0 completed tasks
12-Jul-2008 11:34:19 [Hydrogen@Home] Scheduler request succeeded: got 1 new tasks
12-Jul-2008 11:34:21 [Hydrogen@Home] Started download of 1215876846_nsc35938.mol2
12-Jul-2008 11:34:21 [Hydrogen@Home] Started download of 1215876846_pdb1od0.pdb
12-Jul-2008 11:34:22 [Hydrogen@Home] Finished download of 1215876846_nsc35938.mol2
12-Jul-2008 11:34:22 [Hydrogen@Home] Started download of ad_nsc35938.mol2_pdb1od0.pdb_1215876846
12-Jul-2008 11:34:24 [Hydrogen@Home] Finished download of 1215876846_pdb1od0.pdb
12-Jul-2008 11:34:24 [Hydrogen@Home] Finished download of ad_nsc35938.mol2_pdb1od0.pdb_1215876846
12-Jul-2008 11:34:30 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 27347 seconds of work, reporting 0 completed tasks
12-Jul-2008 11:34:35 [Hydrogen@Home] Scheduler request succeeded: got 1 new tasks
12-Jul-2008 11:34:37 [Hydrogen@Home] Started download of 1215812579_nsc21027.mol2
12-Jul-2008 11:34:37 [Hydrogen@Home] Started download of 1215812579_pdb1od0.pdb
12-Jul-2008 11:34:38 [Hydrogen@Home] Finished download of 1215812579_nsc21027.mol2
12-Jul-2008 11:34:38 [Hydrogen@Home] Started download of ad_nsc21027.mol2_pdb1od0.pdb_1215812579
12-Jul-2008 11:34:39 [Hydrogen@Home] Finished download of 1215812579_pdb1od0.pdb
12-Jul-2008 11:34:39 [Hydrogen@Home] Finished download of ad_nsc21027.mol2_pdb1od0.pdb_1215812579
12-Jul-2008 11:34:45 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 27046 seconds of work, reporting 0 completed tasks
12-Jul-2008 11:34:50 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 11:34:50 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 11:35:50 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 27046 seconds of work, reporting 0 completed tasks
12-Jul-2008 11:35:55 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 11:35:55 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 11:36:55 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 27347 seconds of work, reporting 1 completed tasks
12-Jul-2008 11:37:00 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 11:37:00 [Hydrogen@Home] Message from server: No work sent
12-Jul-2008 11:38:00 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 27347 seconds of work, reporting 0 completed tasks
12-Jul-2008 11:38:05 [Hydrogen@Home] Scheduler request succeeded: got 0 new tasks
12-Jul-2008 11:38:05 [Hydrogen@Home] Message from server: No work sent

$ cat trillian.error.log
SIGSEGV: segmentation violation
Stack trace (2 frames):
/home/boinc/BOINC/boinc[0x80918c2]
[0xcdb420]

While I do not see hydrogen@home applications start before the crashes, there seems to be a pattern: one of its tasks is downloaded shortly before each crash. I am going to let the boinc client run, but set it to fetch no more hydrogen@home work units.
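
A hunch like this one can be tested against the saved logs. The Python sketch below parses client log lines in the format shown in this thread and reports which project last finished a download, and how many seconds before the final log line. The helper names are hypothetical. The month abbreviation is parsed with `%b`, which assumes an English or C locale. The sample is taken from the 11 July log on Machine T.

```python
import re
from datetime import datetime

# Matches lines such as: 11-Jul-2008 19:51:38 [Hydrogen@Home] Message from server: ...
LINE_RE = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]+)\] (.*)$")


def parse_log(text):
    """Return a list of (timestamp, project, message) for every matching line."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S")
            events.append((when, match.group(2), match.group(3)))
    return events


def last_download_before_end(events):
    """Return (project, seconds before the final log line) for the last
    finished download, or None if the log contains none."""
    if not events:
        return None
    end = events[-1][0]
    for when, project, message in reversed(events):
        if message.startswith("Finished download"):
            return project, int((end - when).total_seconds())
    return None


TAIL = """\
11-Jul-2008 19:49:19 [Hydrogen@Home] Finished download of ad_nsc46696.mol2_pdb1bxs.pdb_1215820134
11-Jul-2008 19:49:23 [Hydrogen@Home] Sending scheduler request: To fetch work. Requesting 26948 seconds of work, reporting 0 completed tasks
11-Jul-2008 19:51:38 [Hydrogen@Home] Message from server: No work sent
"""

print(last_download_before_end(parse_log(TAIL)))
```

The final log line only approximates the time of the crash, so the gap is a rough indication rather than proof that the download is involved.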
ID: 18382
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18385 - Posted: 12 Jul 2008, 17:33:51 UTC - in response to Message 18383.  

Excuse my complete ignorance but what is trillian.error.log doing in a BOINC log? Googling this only ever shows up on Seti forum and now here.

Used to use an alternate msnmgr called Trillian, so is this something you use, and if so, would you mind running BOINC without this program active?


Sorry.

I have two computers and one is named trillian and the other is named valinuxl when it is running Linux (and it is named valinuxw when running Windows).

trillian.error.log is the error log (stderr) the boinc client writes on the trillian machine. trillian.boinc.log is the normal log (stdout) the boinc client writes on the trillian machine.

Similarly for the valinuxl machine.
ID: 18385
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 18386 - Posted: 12 Jul 2008, 17:55:28 UTC - in response to Message 18385.  

In the meantime, I've forwarded all the information you've given to the developers. I am now waiting for one of them to tell me what to ask you, or to dive into the thread himself. Whichever comes first. ;-)

ID: 18386
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18387 - Posted: 12 Jul 2008, 18:10:55 UTC - in response to Message 18386.  

In the meantime, I've forwarded all the information you've given to the developers. I am now waiting for one of them to tell me what to ask you, or to dive into the thread himself. Whichever comes first. ;-)


Should I continue to post crash data, or do you agree that I have posted enough of it until the developers have suggestions?

Just a reminder: both machines are running boinc client 6.2.11 debug version (although I see no difference between it and the 5.10.45 non-debug version I used to run).
ID: 18387
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 18388 - Posted: 12 Jul 2008, 18:20:15 UTC - in response to Message 18387.  

Should I continue to post crash data, or do you agree that I have posted enough of it until the developers have suggestions?

What seems to be missing are complete stack traces, but that's why I am waiting for Rom to show up.
ID: 18388
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 18389 - Posted: 12 Jul 2008, 18:44:39 UTC

I'm not remembering the whole thread (and it's way too early/late - depending on how you look at it - to go back through it).

Is there something that seems to trigger the crash? If there is, you could always run it under gdb. I've only done it once, and with help, so I'm not any use beyond the suggestion.
Kathryn :o)
ID: 18389
Jean-David
Joined: 19 Dec 05
Posts: 93
United States
Message 18392 - Posted: 12 Jul 2008, 20:48:45 UTC - in response to Message 18389.  

I'm not remembering the whole thread (and it's way too early/late - depending on how you look at it - to go back through it).

Is there something that seems to trigger the crash? If there is, you could always run it under gdb. I've only done it once, and with help, so I'm not any use beyond the suggestion.


Normally, I sure would not wish to do that. I see that the boinc client shipped with a symbol table, which is good. So if someone would tell me how to set up the /etc/rc.d/boinc file to run it under gdb, capture the segmentation fault, and get a trace of things, I guess I could do it, provided the output of gdb could be sent to a file or files.

It seems, but only seems, that it crashes after a hydrogen@home work unit has been downloaded. But I cannot believe downloading a work unit would do it. It seems, but only seems, that the hydrogen@home work unit does not get started. But this could be because it starts and causes the crash before the boinc client gets around to logging that it started. Many hydrogen@home work units complete, many of them successfully.
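
One way the unattended gdb run asked about above might be arranged is with gdb's batch mode. The Python sketch below builds a gdb command line that starts the client, waits for it to stop, and then prints a backtrace, with all output appended to a file. It assumes a gdb recent enough to support the `-ex` option. The function names and the trace file are hypothetical, the client path is the one seen in the error log on Machine T, and this is not an official BOINC procedure.

```python
import subprocess


def gdb_command(client_path, client_args=()):
    """Build a gdb command line that runs the client and dumps a backtrace
    once the client stops (for example on SIGSEGV)."""
    return [
        "gdb", "-batch",           # run the commands below, then exit
        "-ex", "run",              # start the client under the debugger
        "-ex", "bt full",          # after it stops, print the full stack
        "-ex", "info registers",   # and the register contents
        "--args", str(client_path), *client_args,
    ]


def run_under_gdb(client_path, trace_path, client_args=(), cwd=None):
    """Run the client under gdb, appending gdb's and the client's output
    to trace_path. Returns gdb's exit status."""
    with open(trace_path, "ab") as trace:
        return subprocess.call(
            gdb_command(client_path, client_args),
            stdout=trace, stderr=subprocess.STDOUT, cwd=cwd,
        )


print(" ".join(gdb_command("/home/boinc/BOINC/boinc")))
```

Because stdout and stderr are merged here, the client's normal log messages would end up in the same trace file as the backtrace. A start-up script that wants separate logs would need to redirect them separately.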
ID: 18392

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.