Message boards : BOINC client : BOINC keeps crashing
Author | Message |
---|---|
Send message Joined: 12 Jul 06 Posts: 35 |
I've got BOINC on a number of PCs, but over the last 2-3 weeks the core client keeps crashing, sometimes as often as twice a day. The PCs were running 5.4.11 but I've upgraded them to different client versions (5.6.5, 5.7.2 and 5.7.4) to see if anything changes, and they still crash. The crash addresses are always the same for each version of BOINC.
5.4.11:
*** UNHANDLED EXCEPTION ****
Reason: Access Violation (0xc0000005) at address 0x0033B014 read attempt to address 0x00000008
*** Dump of the (offending) thread: ***
eax=00d91880 ebx=00944160 ecx=00000000 edx=00944208 esi=00d99fe8 edi=00944208
eip=0033b014 esp=01e7fee0 ebp=01188ff0
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202
ChildEBP RetAddr Args to Child
01188ff0 45537d92 00000000 000a3e65 00030026 0308012e libcurl!curl_strnequal+0x0
SymFromAddr(): GetLastError = '126' Address = '45537d92'
SymGetLineFromAddr(): GetLastError = '126' Address = '45537d92'
SymGetModuleInfo(): GetLastError = '126' Address = '45537d92'
01188ff4 00000000 000a3e65 00030026 0308012e 65736f72 libcurl!+0x0
Exiting...
5.7.2:
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0033BA64 read attempt to address 0x00000008
The 5.7.4 client crashed at the same address as 5.7.2 above. The 5.4.11 crashes seem to dump some extra info compared to 5.7.* and libcurl!curl_strnequal always seems to get a mention. Anyone got any ideas? |
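The "Reason:" lines quoted in this post share a fixed shape, so crash reports collected from several machines can be compared mechanically instead of by eye. What follows is a minimal sketch, assuming the line format matches the two samples quoted here; the function name `parse_reason` and the dictionary keys are made up for illustration and are not part of BOINC.

```python
import re

# Pattern modelled on the two "Reason:" lines quoted in the post.
# Assumption: other client versions print the same shape; "write" is
# included as a guess at the wording used for write faults.
REASON_RE = re.compile(
    r"Reason:\s*(?P<name>.+?)\s*\((?P<code>0x[0-9a-fA-F]+)\)\s*"
    r"at address\s*(?P<fault>0x[0-9a-fA-F]+)\s*"
    r"(?P<access>read|write) attempt to address\s*(?P<target>0x[0-9a-fA-F]+)"
)

def parse_reason(line):
    """Return the fields of a 'Reason:' line as a dict, or None if no match."""
    m = REASON_RE.search(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "code": int(m.group("code"), 16),
        "fault_address": int(m.group("fault"), 16),
        "access": m.group("access"),
        "target_address": int(m.group("target"), 16),
    }

# The two lines from the post (5.4.11 and 5.7.2).
old = parse_reason(
    "Reason: Access Violation (0xc0000005) at address 0x0033B014 "
    "read attempt to address 0x00000008"
)
new = parse_reason(
    "Reason: Access Violation (0xc0000005) at address 0x0033BA64 "
    "read attempt to address 0x00000008"
)
```

Both samples report the same exception code and the same target address; only the faulting instruction address differs between client versions.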
Send message Joined: 12 Jul 06 Posts: 35 |
Just to add, the last line in the stdoutdae.txt file just prior to a crash always seems to be a "Started download of file " message, so maybe it is libcurl crashing each time. The downloads are always gz files from rosetta@home, but that's all those PCs are crunching for. |
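One way to check this observation across many machines is to test whether the final line of each saved stdoutdae.txt is a download message. This is a rough sketch that relies only on the "Started download of file" text quoted in the post; the function name is hypothetical, and the real log lines may carry timestamps or project names that are not modelled here.

```python
# Substring taken from the post; everything else about the log format
# is an assumption.
DOWNLOAD_MARKER = "Started download of file"

def last_line_is_download(log_text):
    """Return True if the last non-empty line contains the download marker."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    return bool(lines) and DOWNLOAD_MARKER in lines[-1]
```

To use it, read a copy of the log saved right after a crash and pass its contents to `last_line_is_download`.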
Send message Joined: 16 Apr 06 Posts: 386 |
Has anything changed in your network or firewall around the time the crashes started? (When did you download the most recent Microsoft update?) |
Send message Joined: 12 Jul 06 Posts: 35 |
It looks like the crashes started on 07 Nov, which was a week before the latest updates from Microsoft were installed. All of the PCs seemed stable before that date. Nothing has changed on the network for ages either. That date doesn't seem to coincide with any new applications from rosetta. I can't seem to find anything that happened around that date. |
Send message Joined: 12 Jul 06 Posts: 35 |
I had a couple more crashes over the weekend, at the same addresses. I've also been looking around other message boards and found I'm far from alone with this problem! It's being reported on the Rosetta and Leiden message boards, and there are also several threads here in the BOINC Manager forum (they all talk about suddenly losing connection to localhost and having to restart BOINC, but I'm fairly positive it's the same problem: boinc.exe has terminated). Everywhere I look, the problems started in the first week of November. |
Send message Joined: 29 Aug 05 Posts: 15585 |
OK, any error in BOINC that uses a (0xXXXXXXXX) error message can be compared to a Windows STOP error. The 0xc0000005 error is usually caused by one of: 1. Overstressed hardware, usually from heat. 2. Flaky or bad device drivers. In most cases it's caused by bad video card drivers or DirectX corruption. If you are running the screen saver, please try to run without it for 24 hours and see if the problem goes away. If it does, then there's something wrong with your video card drivers or OpenGL settings. Update the video card drivers or your DirectX version. Don't allow Windows Update to update any of your drivers. |
Send message Joined: 12 Jul 06 Posts: 35 |
0xC0000005 is an Access Violation; it happens when software tries to access memory it does not own. I don't run the screen saver. I also have not touched the drivers on the PCs for months, so I'm somewhat curious that several different PCs with different video and network cards can start crashing at around the same time at the same code addresses within BOINC. It's not the science applications themselves that crash, and they're placing far higher demands on the hardware. I'm wondering if something really subtle changed on the project servers themselves and it's causing the BOINC client to crash when it downloads something from them. The crashes always happen when the client is downloading files. |
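Every dump quoted in this thread reports a read from address 0x00000008. An address that small is commonly what results when code reads a structure field through a null pointer: the field's offset is added to a base address of zero. The sketch below only computes such an offset and never dereferences anything. The `Handle` layout is invented for illustration and is not taken from BOINC or libcurl.

```python
import ctypes

class Handle(ctypes.Structure):
    # Hypothetical layout: three 32-bit fields, so the third sits at offset 8.
    _fields_ = [
        ("flags", ctypes.c_uint32),   # offset 0
        ("length", ctypes.c_uint32),  # offset 4
        ("data", ctypes.c_uint32),    # offset 8
    ]

NULL_BASE = 0  # a null pointer's numeric value

# Address the CPU would try to read for handle->data when handle is NULL.
field_address = NULL_BASE + Handle.data.offset
```

If this guess is right, the crash site is reading a member of an object that was never set up, which would fit a bug in the client rather than a hardware or driver fault. That remains a hypothesis, not a diagnosis.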
Send message Joined: 29 Aug 05 Posts: 15585 |
How do you connect to the internet? When you download work from the project servers, all you get in is a raw data file. The project doesn't send you anything else, unless their science application has changed and they send you a new executable. Thinking about that, which anti virus do you use and does it scan all files upon downloading? |
Send message Joined: 12 Jul 06 Posts: 35 |
It's a permanent connection to the internet. I was thinking more along the lines of the project sending something in the scheduler reply that the client barfs on, or maybe it tells the client to download a data file using a URL so long that it overflows some internal buffer and it crashes. Good point about the AV software, I'll try disabling it on one of the PCs (they're running AVG, but I've seen the same crash on a PC running Avast too). Of all the users I've seen with this same problem, the one I wasn't sure about has turned out to be running Rosetta too, so that's a common factor now. Maybe Rosetta is triggering a previously unknown buffer overflow in the BOINC client... |
Send message Joined: 14 Feb 06 Posts: 139 |
This is happening to me as well. It started a couple of weeks ago and has happened 15-20 times since then, across 10 machines (all Windows). Apparently it is happening to a number of other people too. They are talking about it on Rosetta: http://boinc.bakerlab.org/rosetta/forum_thread.php?id=2599 Reno, NV Team: SETI.USA |
Send message Joined: 16 Apr 06 Posts: 386 |
A lot of projects seem to have been having problems since a few weeks ago. Personally I'm sure it's something to do with Microsoft's update, not that there's any proof of course... |
Send message Joined: 29 Aug 05 Posts: 15585 |
Can people who have this problem please post links to crashed results? See if they have a stack dump. I'll forward the dumps to Rom Walton so he can check whether it's a Windows problem or a BOINC problem. I just checked quickly on Rosetta and saw the results don't show crashes. Does anyone have a mini dump? Otherwise, send your stderrdae.txt to Rom Walton at rwalton at ssl dot berkeley dot edu and mention that it concerns BOINC crashing on Windows. If you want to send a mini dump, make sure you ZIP it first. |
Send message Joined: 12 Jul 06 Posts: 35 |
I think many people don't realise boinc.exe is crashing, they just see blank pages in BOINC Manager and find that it can't connect to localhost anymore. Then they find they have to stop and start that to get it going again. In my case the crash is 'silent' - there's no error dialog or DrWatson dialog etc. It's only from looking in the stderrdae.txt file that I know that boinc.exe terminated with an unhandled exception. The WUs themselves don't fail. I'm waiting for another crash so I can grab some more files. |
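Because the crash is silent, one way to notice it without opening BOINC Manager is to probe the local port the Manager connects to. This is a minimal sketch, assuming the core client listens for GUI RPC connections on TCP port 31416 on localhost; that port number is an assumption about the default and should be verified for the client version in use. The function name is hypothetical.

```python
import socket

# Assumption: default GUI RPC port of the core client; verify for your version.
DEFAULT_GUI_RPC_PORT = 31416

def client_is_listening(host="127.0.0.1", port=DEFAULT_GUI_RPC_PORT, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening there.
        return False
```

A successful connection only shows that some process is listening on that port, not that the core client is healthy, so treat a `False` result as the useful signal.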
Send message Joined: 17 Nov 06 Posts: 10 |
I can tell BOINC has crashed because I have Task Manager running and will see the CPU usage at zero when it should be at 100%. Also, when maximizing BOINC there will be a message asking to connect to localhost. That does not do anything, and I have to exit and restart to get everything going again. It's only happening on one PC out of a handful that have had the current Windows updates applied. |
Send message Joined: 12 Jul 06 Posts: 35 |
"Personally I'm sure it's something to do with microsoft's update, not that there's any proof of course..." I can't speak for anyone else, but in my case the crashes started around 7th November, a full week before the November updates went out on the 14th November. My money would be on a previously undiscovered bug in boinc.exe being triggered by something Rosetta is doing, but I have no proof of that either... |
Send message Joined: 25 Nov 05 Posts: 1654 |
Didn't updates to IE go out a week before the 'normal' monthly updates? |
Send message Joined: 29 Aug 05 Posts: 15585 |
"My money would be on a previously undiscovered bug in boinc.exe being triggered by something Rosetta is doing, but I have no proof of that either..." So send your stderrdae.txt file to Rom. Let him check it. (zip it though ;)) |
Send message Joined: 12 Jul 06 Posts: 35 |
I've sent one; I should be able to get hold of some more next week. |
Send message Joined: 2 Dec 06 Posts: 69 |
I've experienced the same problem on Linux. From the stack dump, it looks like it's related to a vsprintf in the threading library (/lib/libpthread.so.0). That would explain why Rosetta is active but sleeping while the BOINC core client has exited. I just sent the info to Rom at the email address provided. BOINC often switches projects after downloading a work unit, so that would fit the scenario some people are describing. I think it's trying to either put Rosetta to sleep or wake it up, and the huge command line is overflowing some buffer. I'm not sure whether BOINC uses native Win32 thread functions or a PThread library on Windows. -- David Ball |
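The suspicion here is that a very long command line gets formatted into a fixed-size buffer by an unbounded call such as vsprintf. The usual remedy in C is a bounded call such as vsnprintf, which writes at most the buffer size (including the terminator) and returns the length the full text would have needed, so the caller can detect truncation. The Python sketch below only models that contract. It is not BOINC code, and the buffer size and command line are invented for illustration.

```python
def bounded_format(fmt, args, size):
    """Model of the vsnprintf contract.

    Returns (text that fits in `size` bytes with room for a terminator,
    number of bytes the full text would have needed).
    """
    full = fmt % args
    encoded = full.encode("utf-8")
    kept = encoded[: max(size - 1, 0)]  # leave one byte for the terminator
    return kept.decode("utf-8", errors="ignore"), len(encoded)

BUF_SIZE = 256                   # arbitrary illustrative buffer size
cmdline = "-flag value " * 100   # hypothetical 1200-character argument list

text, needed = bounded_format("%s %s", ("rosetta", cmdline), BUF_SIZE)
truncated = needed >= BUF_SIZE   # same test a C caller applies to vsnprintf's result
```

With an unbounded call, the bytes past `BUF_SIZE` would be written into adjacent memory instead of being dropped, which is the kind of corruption that can surface later as an access violation somewhere unrelated.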
Send message Joined: 30 Oct 05 Posts: 1239 |
Hmmmm... Chu recently posted this at Ralph: "The command line file is added for the project team. To test a lot of Rosetta parameters without changing the executable, we made them as input arguments from the command line. One impact of doing so is that Rosetta command line becomes longer and longer, difficult to remember and difficult to set up (and more errors could slip through). The file is meant to help that aspect. In my personal opinion, this is a positive step, though still far away to go, to provide a more friendly control interface for Rosetta, such as to build up a graphic interface and a pull-down menu etc in the future." Kathryn :o) |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.