Bug: BOINC RPC locks up on full disk

boinc.schreiter.info

Joined: 12 Feb 07
Posts: 4
Germany
Message 8881 - Posted: 19 Mar 2007, 11:35:27 UTC
Last modified: 19 Mar 2007, 11:45:01 UTC

Hi there,

One of my BOINC clients (running on a BOINCpe machine) seems to have run out of disk space on its RAM disk. It stopped responding to RPC requests even though it still connects to BAM. It doesn't return any results either. This happened after about 6 weeks of continuous operation. It's the first time this has occurred, and the client has not been working for about a week now.

Currently, I don't have physical access to the host itself. I only have remote access via VPN and BoincView. This morning, for the first time in a week, BoincView was able to connect to this host (no idea why).
These are the relevant disk stats:


And this is the last snippet of the message log:

	Host	Project	Date	Message

	...

	ha-tak02.ts-home.local	---	19.03.2007 10:25:47	Couldn't write state file: system fwrite
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:26:47	Task 703010016.001678_0 exited with zero status but no 'finished' file
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:26:47	If this happens repeatedly you may need to reset the project.
	ha-tak02.ts-home.local	---	19.03.2007 10:26:47	Rescheduling CPU: application exited
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:26:47	Restarting task 703010016.001678_0 using simap version 510
	ha-tak02.ts-home.local	---	19.03.2007 10:26:47	Couldn't write state file: system fwrite
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:27:47	Task 703010016.001678_0 exited with zero status but no 'finished' file
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:27:47	If this happens repeatedly you may need to reset the project.
	ha-tak02.ts-home.local	---	19.03.2007 10:27:47	Rescheduling CPU: application exited
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:27:47	Restarting task 703010016.001678_0 using simap version 510
	ha-tak02.ts-home.local	---	19.03.2007 10:27:47	Couldn't write state file: system fwrite
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:28:47	Task 703010016.001678_0 exited with zero status but no 'finished' file
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:28:47	If this happens repeatedly you may need to reset the project.
	ha-tak02.ts-home.local	---	19.03.2007 10:28:47	Rescheduling CPU: application exited
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:28:47	Restarting task 703010016.001678_0 using simap version 510
	ha-tak02.ts-home.local	---	19.03.2007 10:28:47	Couldn't write state file: system fwrite
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:29:47	Task 703010016.001678_0 exited with zero status but no 'finished' file
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:29:47	If this happens repeatedly you may need to reset the project.
	ha-tak02.ts-home.local	---	19.03.2007 10:29:47	Rescheduling CPU: application exited
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:29:47	Restarting task 703010016.001678_0 using simap version 510
	ha-tak02.ts-home.local	---	19.03.2007 10:29:47	Couldn't write state file: system fwrite
	ha-tak02.ts-home.local	---	19.03.2007 10:30:47	Suspending computation - running CPU benchmarks
	ha-tak02.ts-home.local	boincsimap	19.03.2007 10:30:47	Pausing task 703010016.001678_0 (removed from memory)
	ha-tak02.ts-home.local	---	19.03.2007 10:30:47	Suspending network activity - running CPU benchmarks
	ha-tak02.ts-home.local	---	19.03.2007 10:30:47	Couldn't write state file: system fwrite
	ha-tak02.ts-home.local	---	19.03.2007 10:31:47	Running CPU benchmarks
	ha-tak02.ts-home.local	---	19.03.2007 10:31:57	Account manager contact succeeded
	ha-tak02.ts-home.local	---	19.03.2007 10:31:57	Couldn't write state file: system fwrite
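
The pattern in the log above can be summarized with a short script. This is only a minimal sketch: it assumes the BoincView export is tab-separated (host, project, date, message) as pasted here, and the helper names `parse_log` and `summarize` are made up for illustration. The sample lines are copied from the log above.

```python
from datetime import datetime

# A few lines copied from the log above (tab-separated: host, project, date, message).
SAMPLE = "\n".join([
    "\tHost\tProject\tDate\tMessage",
    "\tha-tak02.ts-home.local\t---\t19.03.2007 10:25:47\tCouldn't write state file: system fwrite",
    "\tha-tak02.ts-home.local\tboincsimap\t19.03.2007 10:26:47\tRestarting task 703010016.001678_0 using simap version 510",
    "\tha-tak02.ts-home.local\t---\t19.03.2007 10:26:47\tCouldn't write state file: system fwrite",
    "\tha-tak02.ts-home.local\tboincsimap\t19.03.2007 10:27:47\tRestarting task 703010016.001678_0 using simap version 510",
    "\tha-tak02.ts-home.local\t---\t19.03.2007 10:27:47\tCouldn't write state file: system fwrite",
])

def parse_log(text):
    """Return (host, project, timestamp, message) tuples; skip headers and filler lines."""
    entries = []
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if len(fields) != 4:
            continue  # blank lines, "..." and similar
        host, project, stamp, message = fields
        try:
            when = datetime.strptime(stamp, "%d.%m.%Y %H:%M:%S")
        except ValueError:
            continue  # header row: "Date" is not a timestamp
        entries.append((host, project, when, message))
    return entries

def summarize(entries):
    """Count state-file write failures and the seconds between task restarts."""
    failures = sum(1 for e in entries if e[3].startswith("Couldn't write state file"))
    restarts = [e[2] for e in entries if e[3].startswith("Restarting task")]
    intervals = [(b - a).total_seconds() for a, b in zip(restarts, restarts[1:])]
    return failures, intervals

entries = parse_log(SAMPLE)
failures, intervals = summarize(entries)
print(failures, intervals)
```

On the sample lines this reports one failed state-file write per minute and a task restart every 60 seconds, which matches what the full log shows.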


My guess is that boinc.exe stopped responding to RPC because some default output stream cannot be written to disk.
It's a kind of deadlock, since I can't reset the projects (SIMAP takes up about 118 MB on the disk) without RPC access.

The interesting part is that RPC seemed to work during the CPU benchmarks but not during normal operation. After the benchmarks finished, the connection was lost again. This is possibly a bug in the client.

Thanks for your help.

Regards,
Torben
BOINCpe: Live-CD for BOINC (for your diskless, headless BOINC farm)

ID: 8881
River~~
Joined: 12 Mar 07
Posts: 59
Message 8899 - Posted: 19 Mar 2007, 20:19:14 UTC - in response to Message 8881.  

... that RPC seemed to work during the CPU benchmarks but not during normal operation. After the finished benchmarks the connection is lost again. This is possibly a bug in the client. ...


My guess is that during normal operation the client is continuously busy trying to write error messages to the log, so busy that it simply does not have time to service RPC requests. During benchmarking there is a lull in the error-writing attempts, so there is time to get a few RPC packets in and out.

Only a guess... hope it is helpful
ID: 8899
Pepo
Joined: 3 Apr 06
Posts: 547
Slovakia
Message 8995 - Posted: 22 Mar 2007, 20:48:33 UTC - in response to Message 8881.  

Currently, I don't have physical access to the host itself. I only have remote access via VPN and BoincView. [...] It's sort of a deadlock since I can't reset the projects (SIMAP has about 118 MB on the disk) without RPC access.

The interesting part is, that RPC seemed to work during the CPU benchmarks but not during normal operation. After the finished benchmarks the connection is lost again. This is possibly a bug in the client.

You could wait for the next benchmark (5 days? 24.03.2007 10:30:47) and then act FAST during the short RPC access window :-)
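
For reference, the date above is just the logged benchmark start plus five days. A small sketch of that arithmetic follows; the 5-day interval is only the guess made here and is not confirmed from the client, and `next_benchmark` is a made-up helper name.

```python
from datetime import datetime, timedelta

LOG_FORMAT = "%d.%m.%Y %H:%M:%S"

# Assumption: benchmarks repeat every 5 days (a guess, not a confirmed client setting).
ASSUMED_BENCHMARK_INTERVAL = timedelta(days=5)

def next_benchmark(last_start, interval=ASSUMED_BENCHMARK_INTERVAL):
    """Add the assumed interval to the last logged benchmark start time."""
    return datetime.strptime(last_start, LOG_FORMAT) + interval

# "Suspending computation - running CPU benchmarks" was logged at this time.
expected = next_benchmark("19.03.2007 10:30:47")
print(expected.strftime(LOG_FORMAT))  # 24.03.2007 10:30:47
```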

Peter
ID: 8995

Copyright © 2017 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.