BOINC keeps crashing

Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6584 - Posted: 23 Nov 2006, 8:39:50 UTC

I've got BOINC on a number of PCs, but over the last 2-3 weeks the core client keeps crashing, sometimes as often as twice a day.

The PCs were running 5.4.11 but I've upgraded them to different client versions (5.6.5, 5.7.2 and 5.7.4) to see if anything changes, and they still crash. The crash addresses are always the same for each version of BOINC.

5.4.11:

*** UNHANDLED EXCEPTION ****
Reason: Access Violation (0xc0000005) at address 0x0033B014 read attempt to address 0x00000008

*** Dump of the (offending) thread: ***
eax=00d91880 ebx=00944160 ecx=00000000 edx=00944208 esi=00d99fe8 edi=00944208
eip=0033b014 esp=01e7fee0 ebp=01188ff0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202

ChildEBP RetAddr Args to Child
01188ff0 45537d92 00000000 000a3e65 00030026 0308012e libcurl!curl_strnequal+0x0
SymFromAddr(): GetLastError = '126' Address = '45537d92'
SymGetLineFromAddr(): GetLastError = '126' Address = '45537d92'
SymGetModuleInfo(): GetLastError = '126' Address = '45537d92'
01188ff4 00000000 000a3e65 00030026 0308012e 65736f72 libcurl!+0x0

Exiting...


5.7.2:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0033BA64 read attempt to address 0x00000008


The 5.7.4 client crashed at the same address as 5.7.2 above.


The 5.4.11 crashes seem to dump some extra info compared to 5.7.* and libcurl!curl_strnequal always seems to get a mention.

Anyone got any ideas?
ID: 6584 · Report as offensive
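Since the same "Reason:" line shows up in every report, comparing crashes across machines and client versions is easier if it is split into fields. The following is a minimal sketch, assuming the line always has the shape quoted above; the function name `parse_reason` and the `near_null` heuristic are illustrative and not part of BOINC.

```python
import re

# Matches the "Reason:" line in the shape seen in the crash reports above.
REASON_RE = re.compile(
    r"Reason: (?P<name>.+?) \((?P<code>0x[0-9a-fA-F]+)\) "
    r"at address (?P<fault>0x[0-9a-fA-F]+) "
    r"(?P<access>read|write) attempt to address (?P<target>0x[0-9a-fA-F]+)"
)


def parse_reason(line):
    """Return the fields of a crash 'Reason:' line, or None if it does not match."""
    m = REASON_RE.search(line)
    if m is None:
        return None
    target = int(m.group("target"), 16)
    return {
        "name": m.group("name"),
        "code": int(m.group("code"), 16),
        "fault_address": int(m.group("fault"), 16),
        "access": m.group("access"),
        "target_address": target,
        # Heuristic (assumption of this sketch): a target in the first 64 KiB
        # usually means a null pointer plus a small member offset.
        "near_null": target < 0x10000,
    }
```

For the 5.4.11 line above this reports a read of address 0x8 by code at 0x0033B014, which is the pattern one would expect from dereferencing a null pointer rather than from random memory corruption.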
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6585 - Posted: 23 Nov 2006, 10:08:51 UTC

Just to add, the last line in the stdoutdae.txt file just prior to a crash always seems to be a "Started download of file " message, so maybe it is libcurl crashing each time. The downloads are always gz files from rosetta@home, but that's the only project those PCs are crunching for.
ID: 6585 · Report as offensive
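The observation above can be checked mechanically after each crash. This is a rough sketch, assuming stdoutdae.txt is a plain text log and that the message text is exactly as quoted in the post; the helper names are made up for illustration.

```python
from pathlib import Path


def last_nonempty_line(path):
    """Return the last non-blank line of a text file, or None if there is none."""
    lines = Path(path).read_text(errors="replace").splitlines()
    for line in reversed(lines):
        if line.strip():
            return line.strip()
    return None


def ended_during_download(path, marker="Started download of file"):
    """True if the log's final entry is a 'Started download' message."""
    last = last_nonempty_line(path)
    return last is not None and marker in last
```

Running `ended_during_download("stdoutdae.txt")` on a copy of the log taken right after a crash, before restarting the client, would show whether the pattern holds on every machine.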
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 6586 - Posted: 23 Nov 2006, 10:30:14 UTC


Did anything change in your network or firewall around the time the crashes started? (When did you download the most recent Microsoft update?)

ID: 6586 · Report as offensive
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6587 - Posted: 23 Nov 2006, 10:41:54 UTC

It looks like the crashes started on 07 Nov, which was a week before the latest updates from Microsoft were installed. All of the PCs seemed stable before that date. Nothing has changed on the network for ages either. That date doesn't seem to coincide with any new applications from rosetta.

I can't seem to find anything that happened around that date.
ID: 6587 · Report as offensive
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6665 - Posted: 27 Nov 2006, 15:40:50 UTC

I had a couple more crashes over the weekend, at the same addresses. I've also been looking around other message boards and found I'm far from alone with this problem! It's being reported on the Rosetta and Leiden message boards, and there are also several threads here in the BOINC Manager forum (they all talk about suddenly losing connection to localhost and having to restart BOINC, but I'm fairly positive it's the same problem: boinc.exe has terminated).

Everywhere I look, the problems started in the first week of November.
ID: 6665 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 6666 - Posted: 27 Nov 2006, 15:49:21 UTC

OK, any error in BOINC that uses a (0xXXXXXXXX) error message can be compared to a Windows STOP error.

The 0xc0000005 error is usually because of:
# Over stressed hardware, usually by heat.
# Flaky, or bad device drivers.

In most cases it's caused by bad device drivers for the video card/DirectX corruption. If you are running the screen saver, please try to run without the screen saver for 24 hours and see if the problem goes away.

If it does, then there's something wrong with your video card drivers or OpenGL settings. Update the video card drivers, or your DirectX version. Don't allow Windows Update to update any of your drivers.
ID: 6666 · Report as offensive
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6667 - Posted: 27 Nov 2006, 16:03:05 UTC

0xC0000005 is an Access Violation; it happens when software tries to access memory it does not own.

I don't run the screen saver. I also have not touched the drivers on the PCs for months, so I'm somewhat curious that several different PCs with different video and network cards can start crashing at around the same time at the same code addresses within BOINC.

It's not the science applications themselves that crash, and they're placing far higher demands on the hardware.

I'm wondering if something really subtle changed on the project servers themselves and it's causing the BOINC client to crash when it downloads something from them. The crashes always happen when the client is downloading files.
ID: 6667 · Report as offensive
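To illustrate why an access violation that reads address 0x00000008 points at software rather than hardware: reading a member through a null pointer touches address 0 plus the member's offset. The sketch below uses a made-up three-field record (not a real BOINC or libcurl type) purely to show the arithmetic.

```python
import ctypes


class Record(ctypes.Structure):
    # Hypothetical layout for illustration only; not a real BOINC or libcurl type.
    _fields_ = [
        ("first", ctypes.c_uint32),   # offset 0
        ("second", ctypes.c_uint32),  # offset 4
        ("third", ctypes.c_uint32),   # offset 8
    ]


def member_address(base, field_name):
    """Address that would be touched when reading `field_name` via pointer `base`."""
    return base + getattr(Record, field_name).offset
```

Here `member_address(0, "third")` is 8, so code that reads the third field through a null pointer would fault on a read of address 0x00000008, the same kind of target address as in the crash reports. Which structure is actually involved in the BOINC crash is not known from the dumps.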
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 6669 - Posted: 27 Nov 2006, 16:12:36 UTC

How do you connect to the internet?

When you download work from the project servers, all you get in is a raw data file. The project doesn't send you anything else, unless their science application has changed and they send you a new executable.

Thinking about that, which anti virus do you use and does it scan all files upon downloading?
ID: 6669 · Report as offensive
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6680 - Posted: 28 Nov 2006, 12:53:20 UTC

It's a permanent connection to the internet.

I was thinking more along the lines of the project sending something in the scheduler reply that the client barfs on, or maybe it tells the client to download a data file using a URL so long that it overflows some internal buffer and it crashes.

Good point about the AV software, I'll try disabling it on one of the PCs (they're running AVG, but I've seen the same crash on a PC running Avast too).


Of all the users I've seen with this same problem, the one I wasn't sure about has turned out to be running Rosetta too, so that's a common factor now. Maybe Rosetta is triggering a previously unknown buffer overflow in the BOINC client...
ID: 6680 · Report as offensive
zombie67
Joined: 14 Feb 06
Posts: 139
United States
Message 6726 - Posted: 1 Dec 2006, 19:11:25 UTC

This is happening to me as well. It started a couple of weeks ago, and has happened 15-20 times since then, across 10 machines (all Windows).

Apparently it is happening to a number of other people too. They are talking about it on the Rosetta forum:

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=2599

Reno, NV
Team: SETI.USA
ID: 6726 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 6728 - Posted: 1 Dec 2006, 19:35:29 UTC

A lot of projects seem to have been having problems since a few weeks ago. Personally I'm sure it's something to do with Microsoft's update, not that there's any proof of course...
ID: 6728 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 6729 - Posted: 1 Dec 2006, 19:48:35 UTC
Last modified: 1 Dec 2006, 19:57:17 UTC

Can people who have this problem please post links to crashed results?
Check whether the result has a stack dump. I'll forward the dumps to Rom Walton so he can check whether it's a Windows problem or a BOINC problem.

I just checked quickly on Rosetta and saw that the results don't show crashes.
Does anyone have a minidump?
Otherwise, send your stderrdae.txt to Rom Walton at rwalton at ssl dot berkeley dot edu
Reference it as BOINC crashing on Windows. If you want to send a minidump, make sure you ZIP it first.
ID: 6729 · Report as offensive
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6730 - Posted: 1 Dec 2006, 20:04:53 UTC
Last modified: 1 Dec 2006, 20:05:39 UTC

I think many people don't realise boinc.exe is crashing; they just see blank pages in BOINC Manager and find that it can't connect to localhost anymore. Then they find they have to stop and start it to get it going again.

In my case the crash is 'silent' - there's no error dialog or DrWatson dialog etc. It's only from looking in the stderrdae.txt file that I know that boinc.exe terminated with an unhandled exception. The WUs themselves don't fail.

I'm waiting for another crash so I can grab some more files.
ID: 6730 · Report as offensive
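Because the crash is silent, one way to notice it is to test whether the core client still accepts connections on localhost, which is what BOINC Manager fails to do after boinc.exe has terminated. This is a minimal sketch, assuming the client listens on the default GUI RPC port 31416 and that nothing else uses that port; the function name is illustrative.

```python
import socket


def client_is_listening(host="127.0.0.1", port=31416, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Open and immediately close a connection; no RPC is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat as "not running".
        return False
```

A scheduled task could call `client_is_listening()` every few minutes and record the time of the first failure, which can then be matched against the timestamps in stderrdae.txt.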
RamCharger

Joined: 17 Nov 06
Posts: 10
United States
Message 6733 - Posted: 1 Dec 2006, 20:32:23 UTC

I can tell BOINC has crashed because I have Task Manager running and will see the CPU usage at zero when it should be at 100%. Also, when maximizing BOINC there will be a message asking to connect to localhost. That does not do anything, and I have to exit and restart to get everything going again.

It is only happening on one PC out of a handful that have the current Windows updates installed.
ID: 6733 · Report as offensive
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6734 - Posted: 1 Dec 2006, 20:36:01 UTC - in response to Message 6728.  
Last modified: 1 Dec 2006, 20:37:38 UTC

Personally I'm sure it's something to do with microsoft's update, not that there's any proof of course...

I can't speak for anyone else, but in my case the crashes started around 7th November, a full week before the November updates went out on the 14th November.

My money would be on a previously undiscovered bug in boinc.exe being triggered by something Rosetta is doing, but I have no proof of that either...
ID: 6734 · Report as offensive
Les Bayliss
Help desk expert

Joined: 25 Nov 05
Posts: 1654
Australia
Message 6735 - Posted: 1 Dec 2006, 20:38:28 UTC

Didn't updates to IE go out a week before the 'normal' monthly updates?

ID: 6735 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 6736 - Posted: 1 Dec 2006, 20:43:11 UTC - in response to Message 6734.  

My money would be on a previously undiscovered bug in boinc.exe being triggered by something Rosetta is doing, but I have no proof of that either...

So send your stderrdae.txt file to Rom. Let him check it. (zip it though ;))
ID: 6736 · Report as offensive
Marky-UK

Joined: 12 Jul 06
Posts: 35
United Kingdom
Message 6737 - Posted: 1 Dec 2006, 21:14:35 UTC

I've sent one; I should be able to get hold of some more next week.
ID: 6737 · Report as offensive
David Ball

Joined: 2 Dec 06
Posts: 69
United States
Message 6755 - Posted: 2 Dec 2006, 21:33:40 UTC

I've experienced the same problem on Linux. From the stack dump, it looks like it's related to a vsprintf in the threading library (/lib/libpthread.so.0). That would explain why Rosetta is active but sleeping while the BOINC core client has exited. I just sent the info to Rom at the email address provided. BOINC often switches projects after downloading a work unit, so that would fit the scenario some people are describing. I think it's trying to either put Rosetta to sleep or wake it up and the huge command line is overflowing some buffer. I'm not sure if BOINC uses native Win32 thread functions or uses a PThread library on Windows.

-- David Ball

David Ball
ID: 6755 · Report as offensive
Profile KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 6757 - Posted: 2 Dec 2006, 21:54:00 UTC

Hmmmm...


Chu recently posted this at Ralph.

The command line file is added for the project team. To test a lot of Rosetta parameters without changing the executable, we made them as input arguments from the command line. One impact of doing so is that Rosetta command line becomes longer and longer, difficult to remember and difficult to set up (and more errors could slip through). The file is meant to help that aspect. In my personal opinion, this is a positive step, though still far away to go, to provide a more friendly control interface for Rosetta, such as to build up a graphic interface and a pull-down menu etc in the future.

Kathryn :o)
ID: 6757 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.