Bug: BOINC RPC locks up on full disk
Author | Message |
---|---|
Joined: 12 Feb 07 Posts: 4 |
Hi there,

one of my BOINC clients (running on a BOINCpe machine) seems to have problems with its disk space (on the RAM disk). It stopped responding to RPC requests even though it still connects to BAM. It doesn't return any results either. This happened after about 6 weeks of continuous operation. It's the first time, and the client has not been working for about a week now.

Currently, I don't have physical access to the host itself. I only have remote access via VPN and BoincView. This morning, for the first time in a week, BoincView could connect to this host (no idea why).

These are the interesting disk stats: [image: disk stats]

And this is the last snippet of the message log:

Host | Project | Date | Message |
---|---|---|---|
... | | | |
ha-tak02.ts-home.local | --- | 19.03.2007 10:25:47 | Couldn't write state file: system fwrite |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:26:47 | Task 703010016.001678_0 exited with zero status but no 'finished' file |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:26:47 | If this happens repeatedly you may need to reset the project. |
ha-tak02.ts-home.local | --- | 19.03.2007 10:26:47 | Rescheduling CPU: application exited |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:26:47 | Restarting task 703010016.001678_0 using simap version 510 |
ha-tak02.ts-home.local | --- | 19.03.2007 10:26:47 | Couldn't write state file: system fwrite |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:27:47 | Task 703010016.001678_0 exited with zero status but no 'finished' file |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:27:47 | If this happens repeatedly you may need to reset the project. |
ha-tak02.ts-home.local | --- | 19.03.2007 10:27:47 | Rescheduling CPU: application exited |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:27:47 | Restarting task 703010016.001678_0 using simap version 510 |
ha-tak02.ts-home.local | --- | 19.03.2007 10:27:47 | Couldn't write state file: system fwrite |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:28:47 | Task 703010016.001678_0 exited with zero status but no 'finished' file |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:28:47 | If this happens repeatedly you may need to reset the project. |
ha-tak02.ts-home.local | --- | 19.03.2007 10:28:47 | Rescheduling CPU: application exited |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:28:47 | Restarting task 703010016.001678_0 using simap version 510 |
ha-tak02.ts-home.local | --- | 19.03.2007 10:28:47 | Couldn't write state file: system fwrite |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:29:47 | Task 703010016.001678_0 exited with zero status but no 'finished' file |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:29:47 | If this happens repeatedly you may need to reset the project. |
ha-tak02.ts-home.local | --- | 19.03.2007 10:29:47 | Rescheduling CPU: application exited |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:29:47 | Restarting task 703010016.001678_0 using simap version 510 |
ha-tak02.ts-home.local | --- | 19.03.2007 10:29:47 | Couldn't write state file: system fwrite |
ha-tak02.ts-home.local | --- | 19.03.2007 10:30:47 | Suspending computation - running CPU benchmarks |
ha-tak02.ts-home.local | boincsimap | 19.03.2007 10:30:47 | Pausing task 703010016.001678_0 (removed from memory) |
ha-tak02.ts-home.local | --- | 19.03.2007 10:30:47 | Suspending network activity - running CPU benchmarks |
ha-tak02.ts-home.local | --- | 19.03.2007 10:30:47 | Couldn't write state file: system fwrite |
ha-tak02.ts-home.local | --- | 19.03.2007 10:31:47 | **Running CPU benchmarks** |
ha-tak02.ts-home.local | --- | 19.03.2007 10:31:57 | Account manager contact succeeded |
ha-tak02.ts-home.local | --- | 19.03.2007 10:31:57 | Couldn't write state file: system fwrite |

My guess is that boinc.exe stopped responding to RPC because some default output stream cannot be written to the disk. It's sort of a deadlock, since I can't reset the projects (SIMAP has about 118 MB on the disk) without RPC access.

The interesting part is that RPC seemed to work during the CPU benchmarks but not during normal operation. After the benchmarks finished, the connection was lost again. This is possibly a bug in the client.

Thanks for your help.

Regards,
Torben

BOINCpe: Live-CD for BOINC (for your diskless, headless BOINC farm) |
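The loop visible in the log (task exits, gets restarted, state file write fails, repeat) can be checked mechanically. Below is a minimal sketch in Python, assuming only that each log line carries a `DD.MM.YYYY HH:MM:SS` timestamp as in the excerpt above. The helpers `parse` and `summarize` are hypothetical names made up for this illustration; they are not part of BOINC or BoincView, and the sample lines are a shortened copy of the log with the host column left out.

```python
import re
from datetime import datetime

# A few lines copied from the message log above (host column omitted).
LOG = """\
--- 19.03.2007 10:25:47 Couldn't write state file: system fwrite
boincsimap 19.03.2007 10:26:47 Restarting task 703010016.001678_0 using simap version 510
--- 19.03.2007 10:26:47 Couldn't write state file: system fwrite
boincsimap 19.03.2007 10:27:47 Restarting task 703010016.001678_0 using simap version 510
--- 19.03.2007 10:27:47 Couldn't write state file: system fwrite
"""

# Timestamp in the format used by the log, followed by the message text.
STAMP = re.compile(r"(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) (.*)")


def parse(log):
    """Yield (timestamp, message) for every line that contains a timestamp."""
    for line in log.splitlines():
        match = STAMP.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%d.%m.%Y %H:%M:%S")
            yield when, match.group(2)


def summarize(log):
    """Return (number of state file write failures, seconds between restarts)."""
    entries = list(parse(log))
    failures = sum(
        1 for _, msg in entries if msg.startswith("Couldn't write state file")
    )
    restarts = [when for when, msg in entries if msg.startswith("Restarting task")]
    gaps = [(b - a).total_seconds() for a, b in zip(restarts, restarts[1:])]
    return failures, gaps


print(summarize(LOG))  # (3, [60.0])
```

On the sample lines this reports three failed state file writes and a 60 second gap between restarts, which matches the one-minute rhythm in the full log.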
Joined: 12 Mar 07 Posts: 59 |
"... that RPC seemed to work during the CPU benchmarks but not during normal operation. After the finished benchmarks the connection is lost again. This is possibly a bug in the client. ..."

My guess is that during normal operation the client is continuously busy trying to write error messages to the log, so busy that it simply does not have time to service RPC requests. During benchmarking there is a lull in the error-writing attempts, so there is time to get a few RPC packets in and out.

Only a guess... hope it is helpful. |
Joined: 3 Apr 06 Posts: 547 |
"Currently, I don't have physical access to the host itself. I only have remote access via VPN and BoincView. [...] It's sort of a deadlock since I can't reset the projects (SIMAP has about 118 MB on the disk) without RPC access."

You could wait for the next benchmark (5 days? 24.03.2007 10:30:47) and then act FAST during the short RPC access window :-)

Peter |
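The "wait for the benchmark and act fast" idea could be automated roughly like this. This is only a sketch under assumptions: the 5-day benchmark interval is the guess from the reply above, 31416 is the default BOINC GUI RPC port, and `next_benchmark`, `rpc_port_open`, `wait_for_rpc` and the `on_open` callback are hypothetical names for illustration. Note that a plain TCP connect is a rough indicator at best, because a starved client may still accept connections without answering requests, so the probe is passed in as a parameter and can be replaced by a real RPC check.

```python
import socket
import time
from datetime import datetime, timedelta


def next_benchmark(last_run, interval_days=5):
    """Estimate the next benchmark time (interval is an assumption)."""
    return last_run + timedelta(days=interval_days)


def rpc_port_open(host, port=31416, timeout=2.0):
    """Return True if a TCP connection to the GUI RPC port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_rpc(host, on_open, port=31416, poll_seconds=5.0,
                 max_polls=None, probe=rpc_port_open):
    """Poll until the probe succeeds, then call on_open(host) once.

    Returns True if on_open was called, False if max_polls ran out.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if probe(host, port):
            on_open(host)  # e.g. trigger the project reset here
            return True
        polls += 1
        time.sleep(poll_seconds)
    return False


# The benchmark in the log started suspending tasks at 19.03.2007 10:30:47.
print(next_benchmark(datetime(2007, 3, 19, 10, 30, 47)))  # 2007-03-24 10:30:47
```

In practice one would start `wait_for_rpc` shortly before the estimated benchmark time with a short `poll_seconds`, and have `on_open` issue the project reset through whatever RPC tool is available on the remote side.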
Copyright © 2021 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.