BOINC Manager forum: message timeout
Joined: 1 Apr 07 Posts: 13

I am running BOINC 5.8.15 on Mac OS X 10.4.9. I keep getting these messages:

Fri Mar 30 19:55:32 2007|SETI@home|Task 25ja04ab.7635.24577.554814.3.13_1 exited with zero status but no 'finished' file
Fri Mar 30 19:55:32 2007|SETI@home|If this happens repeatedly you may need to reset the project.
Fri Mar 30 19:55:32 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513
Sat Mar 31 06:22:27 2007||Restarting 25ja04ab.7635.24577.554814.3.13_1 - message timeout
Sat Mar 31 06:22:27 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513
Sat Mar 31 06:22:28 2007||[error] Process 27600 not found
Sat Mar 31 10:43:26 2007||Restarting 25ja04ab.7635.24577.554814.3.13_1 - message timeout
Sat Mar 31 10:43:27 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513

The trouble is that BOINC Manager stays on that task continuously and does not switch to the next one. If I suspend SETI@home, it moves on to the next project, only to get another set of message timeouts:

Sat Mar 31 20:32:08 2007|rosetta@home|Starting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0
Sat Mar 31 20:32:09 2007|rosetta@home|Starting task CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 using rosetta version 554
Sat Mar 31 20:35:16 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:35:17 2007||[error] Process 6854 not found
Sat Mar 31 20:38:23 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:38:24 2007||[error] Process 6875 not found
Sat Mar 31 20:41:28 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:41:29 2007||[error] Process 6896 not found
Sat Mar 31 20:44:33 2007||Restarting CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout
Sat Mar 31 20:44:33 2007|rosetta@home|Restarting task CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 using rosetta version 554
Sat Mar 31 20:44:34 2007||[error] Process 6916 not found

I posted this on the project's website, and the answer I got was to wait for the next stable release of BOINC Manager to correct the issue. When will the next stable version be released? Or is this a different problem that I can fix on my end?
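When a log like this gets long, it can help to count how often each task hits the "Restarting ... - message timeout" condition. Below is a minimal Python sketch that does that with a regular expression. It assumes only that the timeout messages read "Restarting <task name> - message timeout" as in the messages quoted in this thread, and that task names contain no spaces; count_timeouts is a hypothetical helper, not part of BOINC.

```python
import re
from collections import Counter

# Matches messages such as:
#   "Sat Mar 31 20:35:16 2007||Restarting <task> - message timeout"
# Task names in this thread contain no spaces, so \S+ captures the whole name.
# "Restarting task <name> using ..." lines do not match, because they lack
# the " - message timeout" suffix right after the captured token.
TIMEOUT_RE = re.compile(r"Restarting (\S+) - message timeout")


def count_timeouts(log_text):
    """Count 'message timeout' restarts per task name in pasted log text."""
    return Counter(TIMEOUT_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "Sat Mar 31 20:35:16 2007||Restarting "
        "CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout\n"
        "Sat Mar 31 20:35:17 2007||[error] Process 6854 not found\n"
        "Sat Mar 31 20:38:23 2007||Restarting "
        "CNTRL_ABRELAX_SAVE_ALL_OUT_-1bm8_-_filters_1615_7460_0 - message timeout\n"
    )
    for task, n in count_timeouts(sample).items():
        print(f"{task}: {n} timeout restart(s)")
```

Because it scans the text rather than parsing line by line, it also works on messages that were pasted without line breaks.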
Joined: 30 Oct 05 Posts: 1239

I've sent an email to David, Rom and Charlie about this. There's a similar report to yours on the Einstein boards. At some point they might want to get some debugging logs from you. When they let me know which ones, I'll post the file you'll need to create.

Kathryn :o)
Joined: 30 Oct 05 Posts: 1239

A few things...

First... What kind of Mac is it? PPC or Intel?

Second... This is from Charlie, the Mac developer:

> Are you running on an Intel or PowerPC Mac? A new version 5.18 of the SETI@home client for Intel Macs, which should fix many of the crashes, has been ready for some time now, but its release has been delayed for various reasons.

Third... I've gotten the message timeout message in the past. I emailed David about it a couple of alpha versions back, and his response was...

> "Restarting XXX - message timeout" is what happens when

So my question would be: are you using the CPU throttling feature of BOINC?

Kathryn :o)
Joined: 1 Apr 07 Posts: 13

It is a PPC iMac, and no, I am not using the CPU throttling feature. I don't even know what that is, let alone where it is set.
Joined: 30 Oct 05 Posts: 1239

Thanks, Cheryl. I've passed that additional information along to the developers.

And just for your future knowledge, CPU throttling lets you limit how much CPU time BOINC uses. Many people use it to control temperatures when running on laptops. You can find the setting under "Your Account" in the "General Preferences" on most of the project websites (as long as their server code is recent enough to have the field on the website). You can also set it through the Simple View of BOINC Manager by clicking the preferences button.

Kathryn :o)
Joined: 1 Apr 07 Posts: 13

The CPU usage was set to 100%. Should this be set lower?

Checking my results on the project website, I saw that the last two work units showed Client Error and Compute Error. I assume this means nothing is being done on these work units.

Where would I find the BOINC Manager logs? I could not find any among my logs.
Joined: 30 Oct 05 Posts: 1239

Unless your computer is running hot, I wouldn't fool with that feature. I use it because my laptop runs *really* hot.

The debugging logs would be generated after one of us gives you instructions on how to set up a cc_config.xml file. These messages will be logged in the same place as most of the other messages: the Messages tab/window. The contents of your Messages tab/window (depending on which view of the Manager you're looking at) are also saved in a file called stdoutdae. Where you might find it is a different question... I know BOINC on Macs keeps things in slightly different places depending on the file. With Windows, it's all in the same directory.

Kathryn :o)
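Since the location of stdoutdae is the open question here, a small script can look for it. Below is a minimal Python sketch that lists any stdoutdae* files in a BOINC data directory. It assumes the default Mac location /Library/Application Support/BOINC Data, which is the data directory the client reports in its startup messages later in this thread; find_message_logs is a hypothetical helper, not a BOINC tool.

```python
from pathlib import Path

# Assumption: default Mac data directory, as reported by the 5.8.15 client in this thread.
DEFAULT_DATA_DIR = Path("/Library/Application Support/BOINC Data")


def find_message_logs(data_dir=DEFAULT_DATA_DIR):
    """Return stdoutdae* files (e.g. stdoutdae.txt, stdoutdae.old), sorted by path.

    Returns an empty list if the directory does not exist.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(p for p in data_dir.glob("stdoutdae*") if p.is_file())


if __name__ == "__main__":
    for path in find_message_logs():
        print(path)
```

If nothing is printed, the file lives somewhere else on that machine, and a Spotlight search for "stdoutdae" is the quicker route.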
Joined: 1 Apr 07 Posts: 13

Here is an update, and something to inquire further about. I suspended network activity, and for the last 2 1/2 hours I have not been getting those 'message timeout' errors. I have also noticed that the work units are running faster than before I suspended network activity.

This is a home DSL connection to the internet, with a Linksys router connected to the modem so I can share printers with a Windows XP machine, which is not running BOINC (only the Mac is).
Joined: 30 Oct 05 Posts: 1239

> Here is an update and something to inquire further about.

Interesting. Try re-enabling the network and see if you start getting the messages again. If you do, disable it again and see what happens. Basically, see if you can reproduce the behavior.

Kathryn :o)
Joined: 16 Apr 06 Posts: 386
Joined: 1 Apr 07 Posts: 13

I resumed network activity, and after four hours there were no 'message timeout' errors, but... the Manager did not switch tasks after 60 minutes as per my preferences.
Joined: 1 Apr 07 Posts: 13

No, it is not from clock sync. Mine is set to manual sync, and I rarely do that.
Joined: 19 Jan 07 Posts: 1179

That post doesn't mention clock syncing. Maybe you read a different one? The bug I described was in "Message 8819".
Joined: 30 Oct 05 Posts: 1239

> I resumed Network Activity and after four hours there were no 'message timeout' errors, but...

What project was it running? The 5.8.x versions of BOINC won't switch tasks until a checkpoint has been reached. If it was Rosetta, it's possible that the first model hadn't finished. If I'm remembering right, Rosetta only checkpoints at the end of a model.

Kathryn :o)
Joined: 1 Apr 07 Posts: 13

Kathryn,
SETI@home was running at the time. I restarted BOINC Manager. I'll watch it and let you know what happens. I now have Network Activity set to "Based on preferences" rather than "Always available".

Nicolas,
My internet connection has always been a good one. I got the message timeout errors for several hours on end, no matter which project it was working on. Turning off network activity, then turning it back on several hours later, may have corrected those timeouts. I never had this problem with earlier versions of the Manager; it only started with 5.8.11 and 5.8.15.
Joined: 30 Oct 05 Posts: 1239

Well... the checkpointing issue can't be it. SETI checkpoints pretty frequently, definitely more than once an hour.

Let me create that cc_config.xml file for you. I'll set some flags that might shed some light on the problem. Disclaimer... I'm guessing at the best flags to set... I haven't heard back from David or Rom on this yet.

Kathryn :o)
Joined: 30 Oct 05 Posts: 1239

Exit BOINC. Create the following file with TextEdit:

<cc_config>
    <log_flags>
        <task>1</task>
        <file_xfer>1</file_xfer>
        <sched_ops>1</sched_ops>
        <cpu_sched>1</cpu_sched>
        <cpu_sched_debug>0</cpu_sched_debug>
        <debt_debug>0</debt_debug>
        <state_debug>0</state_debug>
        <task_debug>1</task_debug>
        <file_xfer_debug>0</file_xfer_debug>
        <sched_op_debug>0</sched_op_debug>
        <http_debug>0</http_debug>
        <work_fetch_debug>0</work_fetch_debug>
        <proxy_debug>0</proxy_debug>
        <time_debug>0</time_debug>
        <http_xfer_debug>0</http_xfer_debug>
        <measurement_debug>0</measurement_debug>
        <poll_debug>0</poll_debug>
        <guirpc_debug>0</guirpc_debug>
        <scrsave_debug>0</scrsave_debug>
        <rr_simulation>0</rr_simulation>
        <app_msg_send>0</app_msg_send>
        <app_msg_receive>0</app_msg_receive>
        <unparsed_xml>0</unparsed_xml>
        <network_status_debug>0</network_status_debug>
    </log_flags>
</cc_config>

Save it as cc_config.xml and make sure it doesn't get an extension like .txt (I don't know whether Macs add one the way Windows does). Put this file in Macintosh HD --> Library --> Application Support --> BOINC Data.

If any of the flags that are turned off (<flag>0</flag>) need to be turned on, I'll let you know. You'll just need to change the 0 to a 1 (<flag>1</flag>) and then tell BOINC to re-read the file. But if this is needed, I'll give better directions for that.

Let BOINC run for a while. If you see those errors, exit BOINC. I'm guessing there should be a file called stdoutdae in that same directory. You can use Spotlight to search for it if it's not there. Then either post it here for us to look at or email it to me, and I'll pass it along to the developers.

[edited to fix the cc_config.xml]

Kathryn :o)
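Before restarting BOINC, it can help to sanity-check the file. Below is a minimal Python sketch using the standard library's xml.etree.ElementTree: it confirms the file parses as XML, lists the log flags set to 1, and reports any flag listed more than once. It does not know which tags a particular BOINC version recognizes (the client itself reports unrecognized tags when it starts). summarize_log_flags is a hypothetical name, and the default path assumes the BOINC Data location given above.

```python
import os
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: default Mac location of the BOINC data directory.
DEFAULT_PATH = "/Library/Application Support/BOINC Data/cc_config.xml"


def summarize_log_flags(xml_text):
    """Return (enabled, duplicated) flag names from the <log_flags> section.

    enabled    -- flags whose value is 1, in file order, each listed once
    duplicated -- flags that appear more than once, sorted by name
    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xml_text)
    flags = root.find("log_flags")
    if flags is None:
        return [], []
    names = [child.tag for child in flags]
    enabled = list(dict.fromkeys(
        child.tag for child in flags if (child.text or "").strip() == "1"))
    duplicated = sorted(tag for tag, n in Counter(names).items() if n > 1)
    return enabled, duplicated


if __name__ == "__main__":
    if os.path.isfile(DEFAULT_PATH):
        with open(DEFAULT_PATH, encoding="utf-8") as f:
            enabled, duplicated = summarize_log_flags(f.read())
        print("enabled:", ", ".join(enabled) or "(none)")
        if duplicated:
            print("listed more than once:", ", ".join(duplicated))
    else:
        print(f"not found: {DEFAULT_PATH}")
```

The enabled list can be compared against the "log flags:" line the client prints at startup to confirm the file was actually read.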
Joined: 1 Apr 07 Posts: 13

Kathryn, I followed your instructions and I am letting it run. I have two stdoutdae files now: one with .old and the other with .txt. When I restarted BOINC I got:

Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: <measurement_debug>
Mon Apr 2 08:24:34 2007||Unexpected text 0 in cc_config.xml
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: </measurement_debug>
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: <checkpoint_debug>
Mon Apr 2 08:24:34 2007||Unexpected text 0 in cc_config.xml
Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: </checkpoint_debug>
Mon Apr 2 08:24:35 2007||Starting BOINC client version 5.8.15 for powerpc-apple-darwin
Mon Apr 2 08:24:35 2007||log flags: task, file_xfer, sched_ops, cpu_sched, task_debug
Mon Apr 2 08:24:35 2007||Libraries: libcurl/7.15.5 OpenSSL/0.9.7l zlib/1.2.3
Mon Apr 2 08:24:35 2007||Data directory: /Library/Application Support/BOINC Data
Mon Apr 2 08:24:37 2007||Processor: 1 Power Macintosh Power Macintosh [Power Macintosh Model PowerMac4,5] [AltiVec]

As it runs I get:

Mon Apr 2 08:24:37 2007||General prefs: no separate prefs for home; using your defaults
Mon Apr 2 08:24:37 2007|SETI@home|[cpu_sched] Starting 25ja04ab.7635.24577.554814.3.13_1(resume)
Mon Apr 2 08:24:37 2007||[task_debug] ACTIVE_TASK::start(): forked process: pid 20249
Mon Apr 2 08:24:37 2007|SETI@home|[task_debug] task_state=EXECUTING for 25ja04ab.7635.24577.554814.3.13_1 from start
Mon Apr 2 08:24:37 2007|SETI@home|Restarting task 25ja04ab.7635.24577.554814.3.13_1 using setiathome_enhanced version 513
Mon Apr 2 08:25:41 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:26:42 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:27:42 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:28:43 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:29:43 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:30:43 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:31:44 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 08:32:44 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed

I appreciate your help with this.
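The [task_debug] "checkpointed" messages make it possible to measure how often a task actually checkpoints on this machine. Below is a minimal Python sketch that extracts those timestamps per result and returns the gaps between them in seconds. It assumes the timestamp form seen in these messages (e.g. "Mon Apr 2 08:25:41 2007") and English day/month names, since strptime's %a and %b are locale-dependent; checkpoint_intervals is a hypothetical helper name.

```python
import re
from datetime import datetime

# Timestamp, then "|project|", then the 5.8 [task_debug] checkpoint message.
# Not anchored to line starts, so it also works on text pasted without line breaks.
CKPT_RE = re.compile(
    r"(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\|[^|]*\|"
    r"\[task_debug\] result (\S+) checkpointed"
)


def checkpoint_intervals(log_text):
    """Return {result name: [seconds between consecutive checkpoints]}."""
    times = {}
    for stamp, result in CKPT_RE.findall(log_text):
        t = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        times.setdefault(result, []).append(t)
    return {
        result: [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
        for result, ts in times.items()
    }


if __name__ == "__main__":
    sample = (
        "Mon Apr 2 08:25:41 2007|SETI@home|[task_debug] result "
        "25ja04ab.7635.24577.554814.3.13_1 checkpointed\n"
        "Mon Apr 2 08:26:42 2007|SETI@home|[task_debug] result "
        "25ja04ab.7635.24577.554814.3.13_1 checkpointed\n"
    )
    print(checkpoint_intervals(sample))
```

For the eight checkpoint messages above, the gaps come out at 60 or 61 seconds, far more often than the 60-minute task-switch interval.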
Joined: 29 Aug 05 Posts: 15573

> Mon Apr 2 08:24:34 2007||Unrecognized tag in cc_config.xml: <measurement_debug>

<measurement_debug> is no longer in use. <checkpoint_debug> isn't in use yet; it will be available in 5.10 and later. In 5.8, checkpoints are logged with the <task_debug> flag. So you can ignore those messages.
Joined: 30 Oct 05 Posts: 1239

Great! Now I guess just let it run and see if it starts throwing the original error messages.

Sorry about those extra flags. I didn't check to make sure the list was up to date.

Kathryn :o)
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.