hard lock with the boinc on linux

Dmitry Morozhnikov

Joined: 18 Mar 10
Posts: 3
Russia
Message 31648 - Posted: 18 Mar 2010, 8:49:48 UTC

Hello.

I ran BOINC (with Rosetta tasks) and finished some tasks. But at some point I became unable to run it anymore: after just a few seconds of running, it locks up my computer.

This is always reproducible. I can't switch to the console, can't log in with ssh, and can't even use the Alt+PrScrn combinations. The latter might mean that the problem is in the kernel, but I'm unsure.

There is nothing suspicious in the logs. To be precise, the logs look like this (today I tried to run BOINC again after several days):

… old records …
06-Mar-2010 04:27:35 [rosetta@home] Started upload of placestub_1zvy_1yzf_ppk_ProteinInterfaceDesign_28Feb2010_18489_87_0_0
06-Mar-2010 04:27:38 [rosetta@home] Finished upload of placestub_1zvy_1yzf_ppk_ProteinInterfaceDesign_28Feb2010_18489_87_0_0
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
… several kilobytes of crap created by hard reset …
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
18-Mar-2010 13:05:32 [---] Starting BOINC client version 6.10.18 for i686-pc-linux-gnu
18-Mar-2010 13:05:32 [---] log flags: file_xfer, sched_ops, task
18-Mar-2010 13:05:32 [---] Libraries: libcurl/7.19.7 GnuTLS/2.8.5 zlib/1.2.3 libidn/1.18
18-Mar-2010 13:05:32 [---] Data directory: /var/lib/boinc
18-Mar-2010 13:05:32 [---] Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU     E7300  @ 2.66GHz [Family 6 Model 23 Stepping 6]
18-Mar-2010 13:05:32 [---] Processor: 3.00 MB cache
18-Mar-2010 13:05:32 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm
18-Mar-2010 13:05:32 [---] OS: Linux: 2.6.33-gentoo
18-Mar-2010 13:05:32 [---] Memory: 1.97 GB physical, 2.41 GB virtual
18-Mar-2010 13:05:32 [---] Disk: 17.58 GB total, 4.15 GB free
18-Mar-2010 13:05:32 [---] Local time is UTC +8 hours
18-Mar-2010 13:05:32 [---] No usable GPUs found
18-Mar-2010 13:05:32 [---] Not using a proxy
… normal boinc messages …
18-Mar-2010 13:06:09 [rosetta@home] Restarting task lr5_rama09_mix01_it04_run01_A_rlbd_1nps_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18589_930_0 using minirosetta version 205
18-Mar-2010 13:06:09 [rosetta@home] Restarting task t329__boinc_filtered_loopbuild_threading_cst_all_tex_IGNORE_THE_REST_16902_7482_0 using minirosetta version 205
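
A log damaged like this can still be read programmatically. The following is a minimal Python sketch, not part of BOINC, that strips the NUL bytes left by the hard reset and finds the last entry written before them. The function names and the regular expression are assumptions based only on the line format visible in the excerpt above.

```python
import re

# Assumed line format, taken from the excerpt above:
#   18-Mar-2010 13:05:32 [---] Starting BOINC client ...
LINE_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$"
)


def clean_log_lines(raw: bytes):
    """Drop NUL bytes and return the non-empty text lines."""
    text = raw.replace(b"\x00", b"").decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if line.strip()]


def parse_line(line: str):
    """Return (timestamp, project, message), or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groups() if match else None


def last_entry_before_nuls(raw: bytes):
    """Return the last parseable entry written before the first NUL byte.

    If the data contains no NUL bytes, the whole input is searched.
    """
    head = raw.split(b"\x00", 1)[0]
    entries = [parse_line(line) for line in clean_log_lines(head)]
    entries = [entry for entry in entries if entry is not None]
    return entries[-1] if entries else None
```

Reading the log file in binary mode (`open(path, "rb").read()`) and passing the bytes to `last_entry_before_nuls` gives the last thing the client logged before the lock-up. In the excerpt above that would be the "Finished upload" line from 06-Mar-2010.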


BOINC was installed with emerge, version 6.10.18. As you can see from the logs, the kernel is 2.6.33 and the processor is an E7300. I'm using the free drivers for ATI, so the GPU is unavailable to BOINC.

Apart from the problems with BOINC, I have no other stability problems with my computer. Right now it has been running a compilation at full load for several hours, and the processor temperature is stable at about 50°C.

My questions are:


  • Has anyone heard of the same problem?
  • Does BOINC use real-time scheduling? (To me the problem looks like improper use of RT scheduling.)
  • What else can I do to locate the problem?



Thanks in advance.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 31649 - Posted: 18 Mar 2010, 9:36:12 UTC - in response to Message 31648.  

Just to make sure it isn't processor related, can you test with a project other than Rosetta to see whether it locks your computer?

Computer lock ups are usually attributed to something horrendous going wrong on the CPU or memory level. (Also think motherboard).
Dmitry Morozhnikov

Joined: 18 Mar 10
Posts: 3
Russia
Message 31650 - Posted: 18 Mar 2010, 11:00:02 UTC - in response to Message 31649.  

That looks interesting!

As you suggested, I tried POEM@HOME. So far so good. It has run without problems for at least an hour. But when I tried to “Resume” the Rosetta@home tasks, the computer hung again.

By the way, I was wrong in saying that the Alt+PrScrn combinations do not work. They work, although Alt+PrScrn+K does not help. So the kernel is (probably) OK at the time of the lock.

Another puzzle is that I run Rosetta on two more computers (with Fedora 12 and Ubuntu 9.10) without any locks or other problems.

As far as I understand the BOINC init scripts provided with the Gentoo package, BOINC runs not with root privileges but with the privileges of a special user “boinc”, and should not get real-time scheduling rights or any other rights sufficient to hang the system.
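
One way to check this directly, rather than infer it from the init scripts, is to ask the kernel for the scheduling policy of each process owned by the “boinc” user. The sketch below is a hedged illustration for Linux using Python's standard `os.sched_getscheduler`; the helper names are hypothetical, it inspects processes (not individual threads), and it assumes /proc is mounted.

```python
import os

POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def base_policy(policy: int) -> int:
    """Strip the optional SCHED_RESET_ON_FORK flag from a policy value."""
    return policy & ~os.SCHED_RESET_ON_FORK


def policy_name(policy: int) -> str:
    """Human-readable name for a scheduling policy value."""
    policy = base_policy(policy)
    return POLICY_NAMES.get(policy, f"unknown({policy})")


def is_realtime(policy: int) -> bool:
    """True for the two real-time policies, SCHED_FIFO and SCHED_RR."""
    return base_policy(policy) in (os.SCHED_FIFO, os.SCHED_RR)


def policies_for_uid(uid: int):
    """Return [(pid, command, policy_name)] for processes owned by uid."""
    rows = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            if os.stat(f"/proc/{pid}").st_uid != uid:
                continue
            with open(f"/proc/{pid}/comm") as comm_file:
                command = comm_file.read().strip()
            policy = os.sched_getscheduler(pid)
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited or cannot be inspected
        rows.append((pid, command, policy_name(policy)))
    return rows


# Example (assumes a local user named "boinc" exists):
#   import pwd
#   for row in policies_for_uid(pwd.getpwnam("boinc").pw_uid):
#       print(row)
```

If every row reports SCHED_OTHER, SCHED_BATCH or SCHED_IDLE, the BOINC processes are not using real-time scheduling.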

I don't know what to do next…
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15483
Netherlands
Message 31651 - Posted: 18 Mar 2010, 11:17:16 UTC - in response to Message 31650.  

It could still be processor related at this point. I don't know whether POEM and Rosetta use the same method for calculations, integer or floating point. If you can figure that out (hint: ask at their forums), you may be on your way to a solution.

We usually ask people to test run prime95, another grid computing program that stresses the CPU extensively. If that can run without problems (it's doing floating point calculations), then there is no problem with the floating point capability of your CPU. If on the other hand it locks your computer as well....
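
For illustration only, the idea behind such a test (repeat a deterministic floating-point calculation and compare the results) can be sketched in a few lines of Python. This is not prime95 and not a substitute for it: it is single-threaded, puts far less stress on the CPU, and the function names and workload are made up for the example.

```python
import math


def workload(n: int) -> float:
    """Deterministic floating-point computation.

    On healthy hardware the same n always produces the same result.
    """
    acc = 0.0
    for i in range(1, n + 1):
        acc += math.sin(i) * math.sqrt(i) / (1.0 + math.cos(i) ** 2)
    return acc


def consistency_check(rounds: int, n: int = 200_000) -> bool:
    """Recompute the workload `rounds` times and compare with a reference run.

    Returns False as soon as one run differs, which would point at a
    hardware problem (CPU, memory, overheating) rather than at the software.
    """
    reference = workload(n)
    return all(workload(n) == reference for _ in range(rounds))
```

A call such as `consistency_check(100)` should return True on a stable machine; a real stress tester does the same kind of self-checking on all cores at once and with much heavier arithmetic.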

Apropos of nothing, have you opened your computer lately to clean out the dust and grime inside? CPU fans and heat sinks clogged up with dust can cause similar problems.
Dmitry Morozhnikov

Joined: 18 Mar 10
Posts: 3
Russia
Message 31663 - Posted: 19 Mar 2010, 12:18:48 UTC - in response to Message 31651.  
Last modified: 19 Mar 2010, 12:22:35 UTC

Well… all I can say is that everything is fine with prime95, with POEM@HOME and with everything else. I use the computer 24/7 with every imaginable load: compilation, games, etc.

My box has passed memtest, by the way.

Thank you very much anyway!

I will ask on the Rosetta forum.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.