Thread 'manager not responding'

Message boards : BOINC Manager : manager not responding
ProfileJord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 16219 - Posted: 31 Mar 2008, 22:03:20 UTC - in response to Message 16217.  
Last modified: 31 Mar 2008, 22:06:37 UTC

I'm using 5.10.45, not the debug version, but the stderrdae.txt log DOES have a callstack in it if that helps.

It does.

Is it of today's crash? Can you email it to Rom?

Anyone who has a stack trace of this happening, can you zip it up and email it to Rom Walton at romw at romwnet dot org
ID: 16219 · Report as offensive
ProfileJord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 16221 - Posted: 31 Mar 2008, 22:07:55 UTC
Last modified: 31 Mar 2008, 22:09:45 UTC

To get past the crashing, edit the client_state.xml file, scroll all the way down and edit the network option to show <user_network_request>3</user_network_request>

That means you suspended network activity.

Do know that this means none of your projects will upload, download or report work.
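
For anyone who would rather script that edit than do it by hand, here is a minimal Python sketch. It is an illustration only, not part of BOINC: the function name suspend_network is made up, it works on the file's text rather than a fixed path (where client_state.xml lives depends on your install), and it assumes the BOINC client is not running while the file is changed.

```python
import re

# Matches the element named in the post above, whatever number it currently holds.
_NETWORK_MODE = re.compile(
    r"(<user_network_request>)\s*\d+\s*(</user_network_request>)"
)


def suspend_network(state_xml: str) -> str:
    """Return the client_state.xml text with <user_network_request> set to 3.

    Hypothetical helper: it only rewrites the one value and leaves the rest
    of the text untouched.
    """
    new_xml, count = _NETWORK_MODE.subn(r"\g<1>3\g<2>", state_xml)
    if count == 0:
        # Refuse to guess where the element should go if it is missing.
        raise ValueError("no <user_network_request> element found")
    return new_xml
```

To use it, read client_state.xml into a string, pass it through suspend_network, and write the result back, keeping a backup copy of the original file first.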
ID: 16221 · Report as offensive
quiggibub

Send message
Joined: 31 Mar 08
Posts: 8
United States
Message 16223 - Posted: 31 Mar 2008, 22:15:57 UTC

Does anyone know if it'll be fixed when LHC comes back online? I have several work units in another project due tomorrow morning; it would be a shame to have the time spent crunching them wasted.
ID: 16223 · Report as offensive
ProfileJord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 16224 - Posted: 31 Mar 2008, 22:19:25 UTC - in response to Message 16223.  

It should be fixed when LHC comes back online. The problem is that we don't know when that will be, as no one knows why they are offline in the first place.

So my edit there is only of temporary help.

If you don't mind losing the work for LHC, I can tell you how to edit client_state.xml to take out LHC... Just let me know.
ID: 16224 · Report as offensive
ProfileGuy
Avatar

Send message
Joined: 9 Feb 08
Posts: 54
United Kingdom
Message 16226 - Posted: 31 Mar 2008, 22:27:08 UTC - in response to Message 16221.  
Last modified: 31 Mar 2008, 22:40:29 UTC


[quote]To get past the crashing, edit the client_state.xml file, scroll all the way down and edit the network option to show <user_network_request>3</user_network_request>

That means you suspended network activity.

Do know that this means none of your projects will upload, download or report work.[/quote]

Jord,
So will this edit wipe out completed tasks? Or just stop communication?

Oh - I think this message answers that question.

Oh no it doesn't! Wow. I'm not going anywhere near it. I'm just going to wait.
Come on you LHC SysOps! Yaaaay!
ID: 16226 · Report as offensive
Les Bayliss
Help desk expert

Send message
Joined: 25 Nov 05
Posts: 1654
Australia
Message 16228 - Posted: 31 Mar 2008, 22:31:43 UTC
Last modified: 31 Mar 2008, 22:32:29 UTC

How about:

In Projects tab, select LHC and then click No new tasks
In Tasks tab, select a LHC wu, then click Suspend. Repeat for all LHC wus.

Would this stop the LHC problem, while allowing other projects to continue?

Just as a temporary measure, to upload wus for other projects, then turn off Network access again.

Also, doing this just before uploading the other wus will allow LHC work to continue to this point.
Rather fiddly for those with many projects and wus, but ...
ID: 16228 · Report as offensive
ProfileJord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 16230 - Posted: 31 Mar 2008, 22:36:31 UTC - in response to Message 16228.  

Les,
The problem with this is that the BOINC core client has crashed already, so you can't set LHC to NNT. And it won't stop the client from crashing, because BOINC crashes as soon as LHC tries to contact the scheduler.

Guy,
It will wipe out tasks to LHC that are trying to upload and those that are ready to report. It won't harm other projects.
ID: 16230 · Report as offensive
Les Bayliss
Help desk expert

Send message
Joined: 25 Nov 05
Posts: 1654
Australia
Message 16232 - Posted: 31 Mar 2008, 22:41:31 UTC

Sorry, I meant IN ADDITION to your temp fix. (After the client_state edit.)

ID: 16232 · Report as offensive
ProfileJord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 16233 - Posted: 31 Mar 2008, 22:44:04 UTC - in response to Message 16232.  

Sorry, I meant IN ADDITION to your temp fix. (After the client_state edit.)

Ah.. yes. Although, the scheduler isn't there, so people should get an error that BOINC can't parse a scheduler reply. It won't crash BOINC. At least, it doesn't in my case.
ID: 16233 · Report as offensive
ProfileJord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 16241 - Posted: 31 Mar 2008, 22:56:37 UTC - in response to Message 16237.  

OK, I'll write a new thread for that. With a clear warning.
ID: 16241 · Report as offensive
ProfileJord
Volunteer tester
Help desk expert
Avatar

Send message
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 16247 - Posted: 31 Mar 2008, 23:20:06 UTC

I have a how-to in this thread.
Read it all first, please.
If you have questions, do ask.
If you don't want to try it, then don't try it.
ID: 16247 · Report as offensive
Professor Ray

Send message
Joined: 31 Mar 08
Posts: 59
United States
Message 16249 - Posted: 31 Mar 2008, 23:23:57 UTC
Last modified: 31 Mar 2008, 23:26:26 UTC

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0035D9FC read attempt to address 0x65772075

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 5.10.45


Dump Timestamp : 03/31/08 18:21:19
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\Program Files\BOINC;C:\Program Files\BOINC;srv*C:\DOCUME~1\Raygun\LOCALS~1\Temp\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\DOCUME~1\Raygun\LOCALS~1\Temp\symbols*http://boinc.berkeley.edu/symstore

/////////////////////////////////////////////////////
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

Then a bunch of mod-loads follow (none of which appear to be issues).

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
/////////////////////////////////////////////////////

*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 346, Write: 0, Other 755

- I/O Transfers Counters -
Read: 0, Write: 14483, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 21504, QuotaPeakPagedPoolUsage: 23616
QuotaNonPagedPoolUsage: 34880, QuotaPeakNonPagedPoolUsage: 38016

- Virtual Memory Usage -
VirtualSize: 28655616, PeakVirtualSize: 30306304

- Pagefile Usage -
PagefileUsage: 5455872, PeakPagefileUsage: 5861376

- Working Set Size -
WorkingSetSize: 8048640, PeakWorkingSetSize: 8445952, PageFaultCount: 4303

*** Dump of the thread (454): ***

- Information -
Status: Ready, Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
eax=004603d0 ebx=0012f8c4 ecx=00000061 edx=00000000 esi=7813e457 edi=7813ed1f
eip=7c90eb94 esp=0012f1dc ebp=0012f1ec
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000212

- Callstack -
ChildEBP RetAddr Args to Child
0012f1d8 7c90e57c 7c80a027 00000078 00000000 0012f89c ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0]
0012f1dc 7c80a027 00000078 00000000 0012f89c 00445648 ntdll!_NtSetEvent@8+0x0 FPO: [2,0,0]
0012f1ec 00445648 00000078 0012f780 004455d0 7c884780 kernel32!_SetEvent@4+0x0
0012f200 7c863016 0012f8c4 00000000 00000000 00000000 boinc!boinc_catch_signal+0x0 (c:srcboincsvnbranchesboinc_core_release_5_10libdiagnostics_win.c:2142)
0012f89c 7c8436da 0012f8c4 7c839b09 0012f8cc 00000000 kernel32!_UnhandledExceptionFilter@4+0x0 (c:srcboincsvnbranchesboinc_core_release_5_10libdiagnostics_win.c:2142)
0012f8a4 7c839b09 0012f8cc 00000000 0012f8cc 00000000 kernel32!_BaseProcessStart@4+0x0 (c:srcboincsvnbranchesboinc_core_release_5_10libdiagnostics_win.c:2142) FPO: [0,0,0]
0012f8cc 7c9037bf 0012f9b8 0012ffe0 0012f9d4 0012f98c kernel32!__except_handler3+0x0 (c:srcboincsvnbranchesboinc_core_release_5_10libdiagnostics_win.c:2142) FPO: [3,0,7]
0012f8f0 7c90378b 0012f9b8 0012ffe0 0012f9d4 0012f98c ntdll!ExecuteHandler2@20+0x0 (c:srcboincsvnbranchesboinc_core_release_5_10libdiagnostics_win.c:2142)
0012f9a0 7c90eafa 00000000 0012f9d4 0012f9b8 0012f9d4 ntdll!ExecuteHandler@20+0x0 (c:srcboincsvnbranchesboinc_core_release_5_10libdiagnostics_win.c:2142)
0012f9a4 00000000 0012f9d4 0012f9b8 0012f9d4 c0000005 ntdll!_KiUserExceptionDispatcher@8+0x0 (c:srcboincsvnbranchesboinc_core_release_5_10libdiagnostics_win.c:2142) FPO: [2,0,0]

*** Dump of the thread (7e4): ***

- Information -
Status: Waiting, Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Registers -
eax=71a5d5af ebx=c0000000 ecx=7c913288 edx=ffffffff esi=00000000 edi=71a87558
eip=7c90eb94 esp=014aff7c ebp=014affb4
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
014aff78 7c90e31b 71a5d609 0000010c 014affbc 014affb0 ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0]
014aff7c 71a5d609 0000010c 014affbc 014affb0 014affa4 ntdll!_ZwRemoveIoCompletion@20+0x0 FPO: [5,0,0]
014affb4 7c80b683 71a5d8ec 0012f700 7c90ee18 0015c378 mswsock!_SockAsyncThread@4+0x0
014affec 00000000 71a5d5af 0015c378 00000000 000000c8 kernel32!_BaseThreadStart@8+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...
ID: 16249 · Report as offensive
Nicolas

Send message
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16250 - Posted: 31 Mar 2008, 23:28:04 UTC - in response to Message 16249.  

Professor Ray wrote:
*** Dump of the Process Statistics: ***

What BOINC version are you using? Unfortunately, that stack dump seems useless. I think it's a crash in the code that generates the crash report, so the stack dump points at the crash reporter code.
ID: 16250 · Report as offensive
Professor Ray

Send message
Joined: 31 Mar 08
Posts: 59
United States
Message 16252 - Posted: 31 Mar 2008, 23:46:06 UTC

Plain vanilla 5.10.45
ID: 16252 · Report as offensive
Ralph

Send message
Joined: 30 Sep 05
Posts: 50
Message 16260 - Posted: 1 Apr 2008, 0:47:38 UTC

Hi,

I took a look, and find I have company.
Yes, I'm running LHC as well. That would be when it broke.

Thanks for all the help, guys. I haven't lost anything, and it reminded me how to install properly again. I learned that much.

I wouldn't have suspected LHC. I've had tons of problems with Milkyway, and recently some with Cosmology, but this is the first I've heard with LHC.

I apologize to any BOINC people that are within reading distance. I was wrong. BOINC has been, and is still, the most reliable piece of software I've run across, using three OSs over the years. I use it as a personal benchmark for my machines.

I have added projects in the last couple years that are unfortunately not so well written.
ID: 16260 · Report as offensive
genes
Avatar

Send message
Joined: 14 Dec 06
Posts: 16
United States
Message 16262 - Posted: 1 Apr 2008, 0:58:19 UTC - in response to Message 16219.  
Last modified: 1 Apr 2008, 1:00:13 UTC

I'm using 5.10.45, not the debug version, but the stderrdae.txt log DOES have a callstack in it if that helps.

It does.

Is it of today's crash? Can you email it to Rom?

Anyone who has a stack trace of this happening, can you zip it up and email it to Rom Walton at romw at romwnet dot org


Sent. I chopped off all the stuff from before today, and there were several crashes from 5.10.42 and several from 5.10.45, all due to today's crash.
ID: 16262 · Report as offensive
Professor Ray

Send message
Joined: 31 Mar 08
Posts: 59
United States
Message 16263 - Posted: 1 Apr 2008, 1:01:07 UTC

Does this help?

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0035D79C read attempt to address 0x65772075

Engaging BOINC Windows Runtime Debugger...

********************

BOINC Windows Runtime Debugger Version 6.1.12

Dump Timestamp : 03/31/08 20:58:27
Debugger Engine : 4.0.5.0
Symbol Search Path: C:\Program Files\BOINC;C:\Program Files\BOINC;srv*C:\DOCUME~1\Raygun\LOCALS~1\Temp\symbols*http://msdl.microsoft.com/download/symbols;srv*C:\DOCUME~1\Raygun\LOCALS~1\Temp\symbols*http://boinc.berkeley.edu/symstore

[bla bla bla bla - mod loads - bla bla bla]

*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 14579, Write: 0, Other 1102

- I/O Transfers Counters -
Read: 0, Write: 12715, Other 0

- Paged Pool Usage -
QuotaPagedPoolUsage: 23984, QuotaPeakPagedPoolUsage: 26052
QuotaNonPagedPoolUsage: 35544, QuotaPeakNonPagedPoolUsage: 38352

- Virtual Memory Usage -
VirtualSize: 30932992, PeakVirtualSize: 33034240

- Pagefile Usage -
PagefileUsage: 5541888, PeakPagefileUsage: 5943296

- Working Set Size -
WorkingSetSize: 8396800, PeakWorkingSetSize: 8765440, PageFaultCount: 5448

*** Dump of thread ID 3132 (state: Waiting): ***

- Information -
Status: Wait Reason: UserRequest, , Kernel Time: 9413536.000000, User Time: 8512240.000000, Wait Time: 1862366.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0035D79C read attempt to address 0x65772075

- Registers -
eax=00f8a228 ebx=00c64580 ecx=00c64580 edx=003f0608 esi=65772065 edi=00000000
eip=0035d79c esp=0012fca0 ebp=0012fd6c
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010206

- Callstack -
ChildEBP RetAddr Args to Child
0012fd6c 0040c72f 00467c28 00000000 3fd322b0 7813eab9 libcurl!curl_multi_remove_handle+0x0
0012fdf0 00431501 00000000 3fd322b0 00000000 003f4ff8 boinc!+0x0
0012fe64 0043189d 00471aec 00000001 0012ffc0 00000000 boinc!+0x0
0012fe90 78136d6c f64df3d9 003f34b0 003f3498 7c90e027 boinc!+0x0
0012fedc 781323ff 781c3bc8 00000094 00000005 00000001 MSVCR80!__msize+0x0
0012ff28 0043c27e 0043c2ed f643d6b9 00471aec 0044e4d0 MSVCR80!__unlock+0x0
0012ff64 0043c2ed 0043c300 0044d670 0044d63a 0044d670 boinc!+0x0
0012ff68 0043c300 0044d670 0044d63a 0044d670 f643d7ad boinc!+0x0
0043c2ed 00000000 74ffc359 52e80424 f7ffffff f7c01bd8 boinc!+0x0

*** Dump of thread ID 3012 (state: Waiting): ***

- Information -
Status: Wait Reason: EventPairLow, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 1862306.000000

- Registers -
eax=71a5d5af ebx=c0000000 ecx=7c913288 edx=ffffffff esi=00000000 edi=71a87558
eip=7c90eb94 esp=014bff7c ebp=014bffb4
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202

- Callstack -
ChildEBP RetAddr Args to Child
014bff78 7c90e31b 71a5d609 00000164 014bffbc 014bffb0 ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0]
014bff7c 71a5d609 00000164 014bffbc 014bffb0 014bffa4 ntdll!_ZwRemoveIoCompletion@20+0x0 FPO: [5,0,0]
014bffb4 7c80b683 71a5d8ec 0012f714 7c90ee18 00161708 mswsock!_SockAsyncThread@4+0x0
014bffec 00000000 71a5d5af 00161708 00000000 000000c8 kernel32!_BaseThreadStart@8+0x0


*** Debug Message Dump ****


*** Foreground Window Data ***
Window Name :
Window Class :
Window Process ID: 0
Window Thread ID : 0

Exiting...
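
A side note on the fault address in the dump above, offered as a hedged sketch and not a diagnosis: the address the client tried to read (0x65772075) and the esi register value (0x65772065) both decode to printable ASCII when their bytes are read in little-endian order, which is the byte order on x86. That pattern often means a pointer was overwritten with string data, but nothing in this thread confirms it. The helper name address_as_ascii is made up for the example.

```python
import struct


def address_as_ascii(addr: int) -> str:
    """Show a 32-bit value as the four bytes it occupies in x86 memory.

    Printable ASCII bytes are shown as characters, anything else as '.'.
    """
    raw = struct.pack("<I", addr)  # "<I" = little-endian unsigned 32-bit
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(repr(address_as_ascii(0x65772075)))  # faulting read address
    print(repr(address_as_ascii(0x65772065)))  # esi at the time of the crash
```

Run on the two values from the dump, this prints 'u we' and 'e we'.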
ID: 16263 · Report as offensive
Nicolas

Send message
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16264 - Posted: 1 Apr 2008, 1:25:25 UTC - in response to Message 16263.  

- Callstack -
ChildEBP RetAddr Args to Child
0012fd6c 0040c72f 00467c28 00000000 3fd322b0 7813eab9 libcurl!curl_multi_remove_handle+0x0
0012fdf0 00431501 00000000 3fd322b0 00000000 003f4ff8 boinc!+0x0
0012fe64 0043189d 00471aec 00000001 0012ffc0 00000000 boinc!+0x0
0012fe90 78136d6c f64df3d9 003f34b0 003f3498 7c90e027 boinc!+0x0
0012fedc 781323ff 781c3bc8 00000094 00000005 00000001 MSVCR80!__msize+0x0
0012ff28 0043c27e 0043c2ed f643d6b9 00471aec 0044e4d0 MSVCR80!__unlock+0x0
0012ff64 0043c2ed 0043c300 0044d670 0044d63a 0044d670 boinc!+0x0
0012ff68 0043c300 0044d670 0044d63a 0044d670 f643d7ad boinc!+0x0
0043c2ed 00000000 74ffc359 52e80424 f7ffffff f7c01bd8 boinc!+0x0

Wow, a useful Windows stack dump for once :) It seems to show the same thing I found in the Linux-based stack dumps, so the problem is definitely pinpointed. Now to find the cause... :\
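
For anyone comparing their own stderrdae.txt against the callstack quoted above, here is a rough Python sketch that pulls the module and symbol out of each frame, so the top frame is easy to spot. The line layout it expects (ChildEBP, RetAddr, four argument words, then module!symbol+0xoffset) is an assumption taken only from the dumps posted in this thread, and the function names are made up.

```python
import re

# One callstack frame: ChildEBP, RetAddr, four argument words, module!symbol+0xNN.
# The symbol may be empty, as in "boinc!+0x0" when no symbols were resolved.
FRAME = re.compile(
    r"^(?P<ebp>[0-9a-fA-F]{8})\s+(?P<ret>[0-9a-fA-F]{8})\s+"
    r"(?:[0-9a-fA-F]{8}\s+){4}"
    r"(?P<module>\w+)!(?P<symbol>\S*?)\+0x(?P<offset>[0-9a-fA-F]+)"
)


def parse_frames(dump: str):
    """Return (module, symbol) pairs for every frame line found in the dump text.

    symbol is None when the debugger could not resolve a name.
    """
    frames = []
    for line in dump.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group("module"), match.group("symbol") or None))
    return frames


def top_frame(dump: str):
    """Return the first (innermost) frame, or None if no frame lines matched."""
    frames = parse_frames(dump)
    return frames[0] if frames else None
```

Fed the callstack quoted above, top_frame returns the libcurl frame, curl_multi_remove_handle, while the unresolved boinc!+0x0 frames come back with a symbol of None.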
ID: 16264 · Report as offensive
Nicolas

Send message
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16265 - Posted: 1 Apr 2008, 1:31:12 UTC - in response to Message 16155.  

Anybody who can reproduce this crash on Linux, could you give me access to an SSH account on the machine having the problem? I can't reproduce the crash myself.

ID: 16265 · Report as offensive
Nicolas

Send message
Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16273 - Posted: 1 Apr 2008, 4:24:43 UTC - in response to Message 16272.  

I'm going to look into this tomorrow, could be a problem that cropped up with this version of libcurl & perhaps also a new Microsoft patch level?

Look at the threads in the "core client" forum. It happens on Linux too; the trigger is definitely the LHC redirect.

I tracked down which part of the code crashes, but I'm still not sure why.

--sent from my iPod

ID: 16273 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.