World Community Grid Tasks Crash my Machine

Message boards : Questions and problems : World Community Grid Tasks Crash my Machine

ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30861 - Posted: 6 Feb 2010, 22:13:43 UTC

World Community Grid keeps crashing my machine. The machine hangs and doesn't recover, so I have to reboot. This is a recent problem. I had to suspend a WCG task last week for the same issue.

I've checked the ActiveX, etc. All are good.

The current project is Help Cure MD Phase 2 6.14. I haven't had any problems before and Rosetta@Home works fine. I believe this is the task that was failing:
CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1_0

I've suspended WCG and BOINC is working fine again with Rosetta@Home.

If you let me know which log to send, I will.

Thanks

ID: 30861
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30868 - Posted: 7 Feb 2010, 0:46:45 UTC - in response to Message 30866.  

Here's the output, but I regenerated it because the old log went back to August. I have plenty of memory and disk space and I'm running fine with Rosetta@Home. I've been using this same machine with Rosetta and WCG for at least a couple of years. This is the third time in the last 2 weeks that a WCG task has crashed this machine.

06-Feb-2010 19:42:53 [---] Starting BOINC client version 6.10.18 for windows_intelx86
06-Feb-2010 19:42:53 [---] log flags: file_xfer, sched_ops, task
06-Feb-2010 19:42:53 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
06-Feb-2010 19:42:53 [---] Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
06-Feb-2010 19:42:53 [---] Running under account Eric
06-Feb-2010 19:42:53 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2800+ [x86 Family 6 Model 10 Stepping 0]
06-Feb-2010 19:42:53 [---] Processor: 512.00 KB cache
06-Feb-2010 19:42:53 [---] Processor features: fpu tsc sse 3dnow mmx
06-Feb-2010 19:42:53 [---] OS: Microsoft Windows XP: Home x86 Edition, Service Pack 3, (05.01.2600.00)
06-Feb-2010 19:42:53 [---] Memory: 2.00 GB physical, 4.35 GB virtual
06-Feb-2010 19:42:53 [---] Disk: 74.52 GB total, 38.77 GB free
06-Feb-2010 19:42:53 [---] Local time is UTC -5 hours
06-Feb-2010 19:42:53 [---] No usable GPUs found
06-Feb-2010 19:42:53 [---] Not using a proxy
06-Feb-2010 19:42:54 [rosetta@home] URL http://boinc.bakerlab.org/rosetta/; Computer ID 372014; resource share 90
06-Feb-2010 19:42:54 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 91106; resource share 90
06-Feb-2010 19:42:54 [rosetta@home] General prefs: from rosetta@home (last modified 13-Dec-2006 10:29:55)
06-Feb-2010 19:42:54 [rosetta@home] Host location: none
06-Feb-2010 19:42:54 [rosetta@home] General prefs: using your defaults
06-Feb-2010 19:42:54 [---] Reading preferences override file
06-Feb-2010 19:42:54 [---] Preferences limit memory usage when active to 1023.74MB
06-Feb-2010 19:42:54 [---] Preferences limit memory usage when idle to 1842.74MB
06-Feb-2010 19:42:54 [---] Preferences limit disk usage to 25.00GB
BOINC initialization completed, beginning process execution...
06-Feb-2010 19:42:57 [rosetta@home] Restarting task 2cgq_Jan25_2cgq_1ise_26Jan2010_17412_49_0 using minirosetta version 205
ID: 30868
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30869 - Posted: 7 Feb 2010, 0:51:33 UTC - in response to Message 30868.  

I will get you the output when it hangs again. I turned off WCG for now... Which file holds the client's startup message log (about the first 30 lines)?
ID: 30869
Gundolf Jahn

Joined: 20 Dec 07
Posts: 1069
Germany
Message 30872 - Posted: 7 Feb 2010, 1:27:22 UTC - in response to Message 30869.  

You already found it. What you posted is what Sekerob asked for.

What is missing are the lines around a hang of your machine. You don't need to wait for a new one if you know the (approx.) time of the last hang.

Gruß,
Gundolf
Computer sind nicht alles im Leben. (Kleiner Scherz)
ID: 30872
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30874 - Posted: 7 Feb 2010, 1:56:53 UTC - in response to Message 30872.  

I'm not sure if this has any informative data. The computer hung around 20:35. Doesn't take any input from the keyboard or mouse. I have to do a hard reboot (on/off switch). The system started up around 20:47 and that's when I suspended WCG again and switched back to Rosetta.

06-Feb-2010 19:55:34 [World Community Grid] resumed by user
06-Feb-2010 19:55:35 [World Community Grid] Sending scheduler request: Requested by project.
06-Feb-2010 19:55:35 [World Community Grid] Not reporting or requesting tasks
06-Feb-2010 19:55:41 [World Community Grid] Scheduler request completed
06-Feb-2010 19:55:43 [World Community Grid] task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 resumed by user
06-Feb-2010 19:56:02 [rosetta@home] update requested by user
06-Feb-2010 19:56:06 [rosetta@home] Sending scheduler request: Requested by user.
06-Feb-2010 19:56:06 [rosetta@home] Reporting 1 completed tasks, not requesting new tasks
06-Feb-2010 19:56:12 [rosetta@home] Scheduler request completed
06-Feb-2010 19:56:46 [---] Resuming computation
06-Feb-2010 19:57:15 [---] Suspending computation - user request
06-Feb-2010 19:57:52 [---] Resuming computation
06-Feb-2010 20:26:14 [rosetta@home] task 2cgq_Jan25_2cgq_1ise_26Jan2010_17412_49_0 suspended by user
06-Feb-2010 20:26:15 [World Community Grid] Restarting task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 using hcmd2 version 614
06-Feb-2010 20:47:30 [---] Starting BOINC client version 6.10.18 for windows_intelx86
06-Feb-2010 20:47:30 [---] log flags: file_xfer, sched_ops, task
06-Feb-2010 20:47:30 [---] Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
06-Feb-2010 20:47:30 [---] Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
06-Feb-2010 20:47:30 [---] Running under account Eric
06-Feb-2010 20:47:30 [---] Processor: 1 AuthenticAMD AMD Athlon(tm) XP 2800+ [x86 Family 6 Model 10 Stepping 0]
06-Feb-2010 20:47:30 [---] Processor: 512.00 KB cache
06-Feb-2010 20:47:30 [---] Processor features: fpu tsc sse 3dnow mmx
06-Feb-2010 20:47:30 [---] OS: Microsoft Windows XP: Home x86 Edition, Service Pack 3, (05.01.2600.00)
06-Feb-2010 20:47:30 [---] Memory: 2.00 GB physical, 4.35 GB virtual
06-Feb-2010 20:47:30 [---] Disk: 74.52 GB total, 38.78 GB free
06-Feb-2010 20:47:30 [---] Local time is UTC -5 hours
06-Feb-2010 20:47:30 [---] No usable GPUs found
06-Feb-2010 20:47:31 [---] Not using a proxy
06-Feb-2010 20:47:31 [rosetta@home] URL http://boinc.bakerlab.org/rosetta/; Computer ID 372014; resource share 90
06-Feb-2010 20:47:31 [World Community Grid] URL http://www.worldcommunitygrid.org/; Computer ID 91106; resource share 90
06-Feb-2010 20:47:32 [rosetta@home] General prefs: from rosetta@home (last modified 13-Dec-2006 10:29:55)
06-Feb-2010 20:47:32 [rosetta@home] Host location: none
06-Feb-2010 20:47:32 [rosetta@home] General prefs: using your defaults
06-Feb-2010 20:47:32 [---] Reading preferences override file
06-Feb-2010 20:47:32 [---] Preferences limit memory usage when active to 1023.74MB
06-Feb-2010 20:47:32 [---] Preferences limit memory usage when idle to 1842.74MB
06-Feb-2010 20:47:42 [---] Preferences limit disk usage to 25.00GB
BOINC initialization completed, beginning process execution...
06-Feb-2010 20:47:46 [World Community Grid] Restarting task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 using hcmd2 version 614
06-Feb-2010 20:47:55 [---] Suspending computation - user request
06-Feb-2010 20:48:08 [World Community Grid] suspended by user
06-Feb-2010 20:48:14 [rosetta@home] task 2cgq_Jan25_2cgq_1ise_26Jan2010_17412_49_0 resumed by user
06-Feb-2010 20:48:19 [World Community Grid] task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 suspended by user
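
To narrow down when a hang started, one option is to look for the gap between the last entry written before each "Starting BOINC client" line and that restart line itself. The following is only a rough sketch under a few assumptions: the helper names are made up for illustration, the timestamps use English month abbreviations as in the log above, and a clean shutdown followed by a restart would show up the same way as a hang.

```python
from datetime import datetime

def parse_stamp(line):
    """Return the leading 'DD-Mon-YYYY HH:MM:SS' timestamp, or None."""
    try:
        return datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S")
    except ValueError:
        return None  # e.g. the untimestamped "BOINC initialization completed" line

def hang_windows(lines):
    """Yield (last_entry_time, restart_time) around each client restart."""
    previous = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if "Starting BOINC client" in line and previous is not None:
            yield previous, stamp
        previous = stamp

# Two lines copied from the log above (hypothetical usage).
sample = [
    "06-Feb-2010 20:26:15 [World Community Grid] Restarting task "
    "CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 "
    "using hcmd2 version 614",
    "06-Feb-2010 20:47:30 [---] Starting BOINC client version 6.10.18 "
    "for windows_intelx86",
]
for last_seen, restarted in hang_windows(sample):
    print("last entry:", last_seen, "restart:", restarted, "gap:", restarted - last_seen)
```

For the log above this brackets the hang between 20:26:15 (the HCMD2 task being restarted) and the reboot at 20:47:30, which fits a hang at around 20:35.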
ID: 30874
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30894 - Posted: 7 Feb 2010, 16:21:24 UTC - in response to Message 30880.  

Sekerob,

I thought the screensaver (SS) was the problem. Yes, I do run the SS, but Rosetta@Home runs OK and WCG worked fine before.

I did what you recommended. I checked my nVidia drivers and I have the latest. I disabled the SS and ran WCG on BOINC 6.10.18, and it ran fine. It went into SS mode and came out without any problem. I downloaded 6.10.32, and WCG went into the BOINC SS and out of it fine.

BUT... When I was typing this reply the first time the system crashed. No input... Couldn't do CTRL-ALT-DEL to shutdown or even see what process was taking 100% of the cpu. Had to do a hard reboot again. This was while running the WCG task in this thread. I've started back up and Rosetta@Home doesn't crash the system.

So, I thought we had an answer with the SS, but it seems that's not the problem.
ID: 30894
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30895 - Posted: 7 Feb 2010, 16:33:49 UTC - in response to Message 30894.  

Sekerob,

This is interesting. At the time of the crash I noticed this in the stdoutdae.txt log:

07-Feb-2010 10:58:18 [rosetta@home] suspended by user
07-Feb-2010 10:58:19 [World Community Grid] Restarting task CMD2_0315-MYH14.clustersOccur-2Z5X_A.clustersOccur_224_300775_301137_1 using hcmd2 version 614
07-Feb-2010 10:58:29 [---] Suspending computation - CPU usage is too high
07-Feb-2010 10:58:39 [---] Resuming computation


Then I looked in the stderrdae.txt log and I found this:

GLE: Another instance of BOINC is running.
GLE: Another instanc
Another instance of BOINC is running.
GLE: Another instance of BOINC is running.
GLE: Another instanc
Another instance of BOINC is running.
GLE: Another instance of BOINC is running.
GLE: Another instanc


The last line stops partway through the word "instance". I wonder if that is where the crashes are occurring?
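
One way to check how often the cut-off line shows up, and where, is to scan the file for lines that are an incomplete copy of the full message. This is just a sketch: the function name is invented for illustration, and finding a cut-off line does not by itself show that a crash happened at that point.

```python
EXPECTED = "GLE: Another instance of BOINC is running."

def truncated_lines(text, expected=EXPECTED):
    """Return (line_number, line) for lines that are a cut-off copy of `expected`."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # A non-empty strict prefix of the full message counts as cut off.
        if stripped and stripped != expected and expected.startswith(stripped):
            hits.append((number, stripped))
    return hits

# The first three lines quoted above (hypothetical usage).
sample = (
    "GLE: Another instance of BOINC is running.\n"
    "GLE: Another instanc\n"
    "Another instance of BOINC is running.\n"
)
print(truncated_lines(sample))
```

In the excerpt above the cut-off line also appears in the middle of the file, followed by more output, so it may be worth counting the occurrences and comparing that with the number of crashes.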
ID: 30895
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30899 - Posted: 7 Feb 2010, 17:13:21 UTC - in response to Message 30897.  

I checked Programs in Control Panel and there's only one BOINC. I did notice that there are 2 BOINC screensaver entries. This is kind of weird, since I followed the standard BOINC install process. You would think it would have removed the previous version.

I've reset WCG and downloaded my first RICE task. I'll let you know how it goes.

ID: 30899
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30929 - Posted: 9 Feb 2010, 2:15:14 UTC - in response to Message 30901.  

Update. Some good news...

It looks like after the last hard reboot there were no longer two BOINC entries in the screensaver drop-down.

I ran a couple of Rice tasks successfully. Then I decided to go back to Help Cure MD Phase 2. HCMD2 seems to be running OK now. I've finished 1 task and am almost done with another. Both Rice and HCMD2 were working while we were doing other things on the PC. No hangs or freezes.

Either that one HCMD2 task was a problem or the 2 instances of the BOINC screensaver were causing the issue. I'll go with the latter.

ID: 30929
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30936 - Posted: 9 Feb 2010, 19:23:19 UTC - in response to Message 30929.  

Sekerob,

Spoke too soon... I've had 2 crashes with HCMD Phase 2 since my last post. I'm aborting HCMD and deselecting it in my projects on WCG.

If you want the task I was running I can post it.

ID: 30936
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30948 - Posted: 10 Feb 2010, 12:53:05 UTC - in response to Message 30946.  

Sekerob,

I found these errors from Friday:

2007-02-08 13:49:12 [rosetta@home] Unrecoverable error for result 1wit__BOINC_ABINITIO_TRIM2__1546_685_0 ( - exit code -1073741819 (0xc0000005))
2007-02-08 15:47:25 [rosetta@home] Unrecoverable error for result 1c9oA_BOINC_ABINITIO_TRIM2__1546_904_0 ( - exit code -1073741674 (0xc0000096))
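
The negative exit codes in these two lines are the signed 32-bit form of the hex values shown in parentheses. A small sketch of the conversion follows. The lookup table is deliberately incomplete and uses the status names from the Windows `ntstatus.h` header, and the helper name is made up for illustration.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"

# Only the two codes quoted above; not a complete table.
KNOWN = {
    "0xc0000005": "STATUS_ACCESS_VIOLATION",
    "0xc0000096": "STATUS_PRIVILEGED_INSTRUCTION",
}

for code in (-1073741819, -1073741674):
    hex_code = to_ntstatus(code)
    print(code, hex_code, KNOWN.get(hex_code, "unknown"))
```

Note that both entries are Rosetta@Home results dated 2007, so they may not be related to the current hangs.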

I'm also seeing this in the error log:

Another instance of BOINC is running.
GLE: Another instance of BOINC is running.
GLE: Another instanc

The BOINC screen saver is not running the application screen saver and there's only one instance. I'm going to change to the generic Windows screen saver and see if that helps.

These are the BOINC processes that are running:
boinc.exe
boincmgr.exe
boinctray.exe

I'm going to restart and try again.
ID: 30948
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30949 - Posted: 10 Feb 2010, 13:00:04 UTC - in response to Message 30948.  

Here are my BOINC SS settings:

I run the plain BOINC SS after 10 minutes. After 30 minutes I go to a blank screen.

I also turn the monitor off after 45 minutes. Not sure if these would affect it? They didn't before...
ID: 30949
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 30993 - Posted: 13 Feb 2010, 0:20:06 UTC - in response to Message 30950.  

I decided to run for a while to rule out the power save that kicks in after 45 minutes. I turned it off and it seemed to run OK for a day or so...

Had another crash this afternoon.

This is the task that was running:

12-Feb-2010 18:44:05 [World Community Grid] Restarting task CMD2_0339-1MBM_D.clustersOccur-1RLY_A.clustersOccur_0_0 using hcmd2 version 614

I also see this in the stderr file:

Another instance of BOINC is running.
GLE: Another instance of BOINC is running.
GLE: Another instanc


Unfortunately it doesn't give me a timestamp, so I can't tell whether the last line,
"GLE: Another instanc"

marks a failure. One would think it does, since it didn't finish writing the line "GLE: Another instance of BOINC is running."
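
Although the lines themselves carry no timestamp, the file's last-modified time gives a rough idea of when the final line was written. A minimal sketch follows, assuming the stderr file is named stderrdae.txt and sits in the data directory shown in the startup log (both are assumptions here), and that it is checked before anything else writes to the file after the reboot.

```python
import os
from datetime import datetime

def last_write_time(path):
    """Return the file's last-modified time as a local datetime."""
    return datetime.fromtimestamp(os.path.getmtime(path))

# Assumed location: the data directory from the startup log plus stderrdae.txt.
path = r"C:\Documents and Settings\All Users\Application Data\BOINC\stderrdae.txt"
if os.path.exists(path):
    print("last write:", last_write_time(path))
```

If that time falls shortly before the hard reboot, it would support the idea that the cut-off line was the last thing written before the hang. If it doesn't, the cut-off line is probably unrelated.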
ID: 30993
ewordmon

Joined: 6 Feb 10
Posts: 13
United States
Message 31017 - Posted: 13 Feb 2010, 22:43:59 UTC - in response to Message 31006.  

Hi Sekerob,

I would agree with you regarding the hw/sw but I've checked the drivers and this node is running Rosetta@Home and other WCG tasks like Nutritious Rice w/o any problems. All hw drivers are up to date.

Since I've been running Rosetta@Home after suspending HCMD, I haven't had any of those errors in stderrdae.txt.

There's nothing in the file, and in particular not this:

Another instance of BOINC is running.
GLE: Another instance of BOINC is running.
GLE: Another instanc

If it were a hw/sw issue I would be having problems with Rosetta@Home and Nutritious Rice.

Since we can't find the issue, I'm going to deselect HCMD2 on WCG and press on.

Thanks.
ID: 31017

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.