message timeout

Cheryl

Joined: 1 Apr 07
Posts: 13
Message 9308 - Posted: 2 Apr 2007, 16:03:17 UTC - in response to Message 9307.  

Kathryn,

I have not gotten those message timeout errors, nor those process errors (not since I fooled with network activity). But since I created that file, Boinc Manager still has not changed clients. The only messages I am getting are:
(First one)
Mon Apr 2 08:25:41 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed

and the latest one

Mon Apr 2 11:00:58 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed

What is next?
ID: 9308 · Report as offensive
Profile KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9310 - Posted: 2 Apr 2007, 17:22:23 UTC

Log into your account on the Seti webpage (click on "Your Account"). Scroll down to the preferences section.

Copy and paste all of them into a message here.

Also, what projects are you running (Seti and Rosetta as far as I can tell right now) and how many work units are on that computer, their estimated time to completion and their deadlines?

It's possible that the scheduler thinks you are in Earliest Deadline First mode and will work on that Seti unit until it's finished.
Kathryn :o)
ID: 9310 · Report as offensive
Cheryl

Joined: 1 Apr 07
Posts: 13
Message 9311 - Posted: 2 Apr 2007, 17:48:50 UTC

Kathryn,
Here are my Seti preferences

Do work while computer is running on batteries? (matters only for portable computers) no
Do work while computer is in use? yes
Do work only between the hours of (no restriction)
Leave applications in memory while suspended? no
Switch between applications every 60 minutes (recommended: 60 minutes)
On multiprocessors, use at most 1 processors
Use at most 100 percent of CPU time (enforced by version 5.6 and greater)

Disk and memory usage
Use at most 8 GB disk space
Leave at least 40 GB disk space free (values smaller than 0.001 are ignored)
Use at most 25% of total disk space
Write to disk at most every 60 seconds
Use at most 50% of page file (swap space)
Use at most 50% of memory when computer is in use (enforced by version 5.8 and greater)
Use at most 90% of memory when computer is idle (enforced by version 5.8 and greater)

Network usage
Connect to network about every 0.1 days (determines size of work cache; maximum 10 days)
Confirm before connecting to Internet? (matters only if you have a modem, ISDN or VPN connection) no
Disconnect when done? (matters only if you have a modem, ISDN or VPN connection) no
Maximum download rate: 1000 KB/s
Maximum upload rate: 600 KB/s
Use network only between the hours of (no restriction)
Skip image file verification? no

Projects:
Rosetta - to completion 21:46:10 - Due April 10
Einstein - to completion 24:18:18 - Due April 7
Seti - to completion 12:49:17 - Due April 5
I also have ClimatePrediction - but there are no workunits.
ID: 9311 · Report as offensive
Profile KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 9312 - Posted: 2 Apr 2007, 18:03:57 UTC

I'll have to see if I can find some more details on EDF. But the deadline for Seti is the closest. So it would make sense that EDF might be in effect.
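
For what it's worth, the "earliest deadline first" idea itself fits in a few lines. This is only an illustration of the concept using the three deadlines posted above, not BOINC's actual scheduler code; the `Task` class and `pick_edf` function are made-up names, and the year 2007 is taken from the post dates.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    project: str
    deadline: date


def pick_edf(tasks):
    """Return the task whose deadline comes first (hypothetical helper)."""
    return min(tasks, key=lambda t: t.deadline)


# Deadlines as posted in this thread; the year is assumed from the post dates.
tasks = [
    Task("Rosetta", date(2007, 4, 10)),
    Task("Einstein", date(2007, 4, 7)),
    Task("Seti", date(2007, 4, 5)),
]

print(pick_edf(tasks).project)  # prints "Seti"
```

With these dates the pick is Seti, which is why EDF is a plausible guess here. It does not prove EDF was actually in effect.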

Let me do a bit more checking. I might not be able to get back to you until late tonight.
Kathryn :o)
ID: 9312 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 9314 - Posted: 2 Apr 2007, 18:22:27 UTC

Please open client_state.xml (with a browser, or with TextEdit).
Find and copy the following:

* Everything between the <time_stats></time_stats> flags.
(it'll be 4 flags here with numbers between them)

* For each project the <short_term_debt></short_term_debt> and <long_term_debt></long_term_debt> flags with the numbers between them.

Post all those in an answer window here.

Close the client_state.xml file, don't save it. Just exit it.
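
If copying by hand is a nuisance, the same values could probably be pulled out with a short read-only script. This is a rough sketch: it assumes the file is well-formed XML, that each project sits in a `<project>` element with a `<project_name>` child, and that the debts are direct children of that element. `summarize_state` is a made-up helper name, and the sample text below is a cut-down stand-in, not a real client_state.xml.

```python
import xml.etree.ElementTree as ET


def summarize_state(xml_text):
    """Return (time_stats dict, {project_name: (short_debt, long_debt)})."""
    root = ET.fromstring(xml_text)
    # Every child of <time_stats> is a tag holding a single number.
    stats = {child.tag: float(child.text) for child in root.find("time_stats")}
    debts = {}
    for project in root.iter("project"):
        name = project.findtext("project_name", default="(unnamed)")
        debts[name] = (
            float(project.findtext("short_term_debt", default="0")),
            float(project.findtext("long_term_debt", default="0")),
        )
    return stats, debts


# Tiny stand-in for the real file; the element layout is an assumption.
SAMPLE = """<client_state>
<time_stats>
<on_frac>0.940208</on_frac>
<active_frac>0.816090</active_frac>
</time_stats>
<project>
<project_name>SETI@home</project_name>
<short_term_debt>-41364.144986</short_term_debt>
<long_term_debt>-366455.212463</long_term_debt>
</project>
</client_state>"""

stats, debts = summarize_state(SAMPLE)
print(stats)
print(debts)
```

The script only reads text and never writes anything back, in keeping with the advice above not to save the file.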
ID: 9314 · Report as offensive
Cheryl

Joined: 1 Apr 07
Posts: 13
Message 9316 - Posted: 2 Apr 2007, 19:48:29 UTC
Last modified: 2 Apr 2007, 19:53:25 UTC

Jord,

Here they are:

<time_stats>
<on_frac>0.940208</on_frac>
<connected_frac>-1.000000</connected_frac>
<active_frac>0.816090</active_frac>
<cpu_efficiency>0.647537</cpu_efficiency>
<last_update>1175542468.906536</last_update>
</time_stats>

Rosetta
<short_term_debt>20852.940989</short_term_debt>
<long_term_debt>189799.004157</long_term_debt>

ClimatePrediction (No Work)
<short_term_debt>0.000000</short_term_debt>
<long_term_debt>9243.277100</long_term_debt>

Einstein
<short_term_debt>20511.203997</short_term_debt>
<long_term_debt>218639.863178</long_term_debt>

lhcathome (No Work)
<short_term_debt>0.000000</short_term_debt>
<long_term_debt>-51226.931973</long_term_debt>

Seti
<short_term_debt>-41364.144986</short_term_debt>
<long_term_debt>-366455.212463</long_term_debt>

I also should post the messages from BoincManager:

Mon Apr 2 14:27:56 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 14:28:27 2007|SETI@home|[cpu_sched] Preempting 25ja04ab.7635.24577.554814.3.13_1 (removed from memory)
Mon Apr 2 14:28:27 2007|SETI@home|[task_debug] task_state=QUIT_PENDING for 25ja04ab.7635.24577.554814.3.13_1 from preempt
Mon Apr 2 14:28:27 2007|Einstein@Home|[cpu_sched] Starting h1_0707.5_S5R1__8447_S5RIa_1(resume)
Mon Apr 2 14:28:27 2007||[task_debug] ACTIVE_TASK::start(): forked process: pid 22591
Mon Apr 2 14:28:27 2007|Einstein@Home|[task_debug] task_state=EXECUTING for h1_0707.5_S5R1__8447_S5RIa_1 from start
Mon Apr 2 14:28:27 2007|Einstein@Home|Restarting task h1_0707.5_S5R1__8447_S5RIa_1 using einstein_S5RI version 426
Mon Apr 2 14:28:28 2007|SETI@home|[task_debug] Process for 25ja04ab.7635.24577.554814.3.13_1 exited
Mon Apr 2 14:28:28 2007|SETI@home|[task_debug] task_state=UNINITIALIZED for 25ja04ab.7635.24577.554814.3.13_1 from handle_exited_app
Mon Apr 2 14:28:28 2007|SETI@home|[task_debug] exit status 0
Mon Apr 2 14:29:30 2007|Einstein@Home|[task_debug] result h1_0707.5_S5R1__8447_S5RIa_1 checkpointed
Mon Apr 2 14:30:31 2007|Einstein@Home|[task_debug] result h1_0707.5_S5R1__8447_S5RIa_1 checkpointed

So Seti ran for 6 hours before it switched to Einstein.
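
The "6 hours" figure can be checked from the timestamps. Below is a small sketch that assumes each log line has the `timestamp|project|message` layout seen above and uses the first checkpoint message from the earlier post plus the preempt message from this one. `parse_line` is a made-up helper, and the first logged checkpoint is not necessarily when the task actually started.

```python
from datetime import datetime

# Two lines copied from the messages posted in this thread.
LOG = """\
Mon Apr 2 08:25:41 2007|SETI@home|[task_debug] result 25ja04ab.7635.24577.554814.3.13_1 checkpointed
Mon Apr 2 14:28:27 2007|SETI@home|[cpu_sched] Preempting 25ja04ab.7635.24577.554814.3.13_1 (removed from memory)
"""


def parse_line(line):
    """Split 'timestamp|project|message' into (datetime, project, message)."""
    stamp, project, message = line.split("|", 2)
    return datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"), project, message


entries = [parse_line(line) for line in LOG.splitlines() if line.strip()]
first_seen = entries[0][0]
preempted = next(when for when, _, msg in entries if "Preempting" in msg)
elapsed = preempted - first_seen
print(elapsed)  # prints 6:02:46
```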
ID: 9316 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 9321 - Posted: 2 Apr 2007, 21:06:36 UTC - in response to Message 9316.  

Rosetta
<short_term_debt>20852.940989</short_term_debt>
<long_term_debt>189799.004157</long_term_debt>

ClimatePrediction (No Work)
<short_term_debt>0.000000</short_term_debt>
<long_term_debt>9243.277100</long_term_debt>

Einstein
<short_term_debt>20511.203997</short_term_debt>
<long_term_debt>218639.863178</long_term_debt>

lhcathome (No Work)
<short_term_debt>0.000000</short_term_debt>
<long_term_debt>-51226.931973</long_term_debt>

Seti
<short_term_debt>-41364.144986</short_term_debt>
<long_term_debt>-366455.212463</long_term_debt>

OK, these numbers here are seconds. The mean of all those seconds is always zero. So Seti won't crunch or download work for a while: it won't crunch for 41,364 seconds and it won't download new work for 366,455 seconds.

The next project that will run, measured by the short term debt figures, is either Einstein or Rosetta. That depends on which has work.
Both those projects have the highest positive debts.

Short term debt tells BOINC which project will get the CPU next.
Long term debt will tell BOINC which project to download work from next.
When the numbers are positive they are active projects.
When the numbers are negative, they are inactive projects.
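
As a quick sanity check, the posted figures can be added up to see that they balance out to zero, and sorted to see who is first in line for the CPU. This is just arithmetic on the numbers quoted above, not a model of the real scheduler, which looks at more than the debts. The dictionary layout and variable names are my own.

```python
import math

# (short_term_debt, long_term_debt) in seconds, copied from the quoted post.
debts = {
    "Rosetta": (20852.940989, 189799.004157),
    "ClimatePrediction": (0.000000, 9243.277100),
    "Einstein": (20511.203997, 218639.863178),
    "lhcathome": (0.000000, -51226.931973),
    "Seti": (-41364.144986, -366455.212463),
}

short_total = sum(short for short, _ in debts.values())
long_total = sum(long for _, long in debts.values())

# Both totals should be zero, up to rounding in the posted figures.
print(math.isclose(short_total, 0.0, abs_tol=1e-3))  # prints True
print(math.isclose(long_total, 0.0, abs_tol=1e-3))   # prints True

# Rank projects by short term debt, highest first.
by_short = sorted(debts, key=lambda name: debts[name][0], reverse=True)
print(by_short[0], by_short[-1])  # prints Rosetta Seti
```

Rosetta and Einstein are within a few hundred seconds of each other at the top, and Seti is far at the bottom.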

I don't think anything ran in EDF (Earliest Deadline First) mode. It was probably just that Seti had a lot of time to catch up. Just let it run like this for now.

As for the message timeouts, still looking into it.
ID: 9321 · Report as offensive
Cheryl

Joined: 1 Apr 07
Posts: 13
Message 9323 - Posted: 2 Apr 2007, 21:18:28 UTC
Last modified: 2 Apr 2007, 21:19:53 UTC

Jord,

So should I continue to allow the Task Debug to run?

I have not been getting any message timeouts right now and Einstein is running now (for over 2 hours).

Maybe it is playing catchup?
ID: 9323 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 9324 - Posted: 2 Apr 2007, 21:28:05 UTC - in response to Message 9323.  

Yes, let it run with the cc_config.xml flags on. Then in case your trouble returns, you have a log of it, either in the txt file or the old file (the *.old files are just backup files for when the original log gets too big).
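
For anyone landing here without the earlier page of this thread, a cc_config.xml with logging flags switched on could be generated roughly like this. It is a sketch: the two flag names are inferred from the `[task_debug]` and `[cpu_sched]` prefixes in the log lines posted above, and `build_cc_config` is a made-up helper name. The resulting text would go into a file named cc_config.xml in the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build cc_config.xml text that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        # A value of 1 turns the flag on.
        ET.SubElement(log_flags, flag).text = "1"
    return ET.tostring(root, encoding="unicode")


# Flag names assumed from the message prefixes seen in this thread.
print(build_cc_config(["task_debug", "cpu_sched"]))
```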
ID: 9324 · Report as offensive
Cheryl

Joined: 1 Apr 07
Posts: 13
Message 9335 - Posted: 4 Apr 2007, 1:19:04 UTC

So far, I have not gotten any errors and it seems like the work units are taking turns every one to two hours. I have Network Activity set to 'based on preferences'.

I do hope (knock on wood) that those errors do not come back.
ID: 9335 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9679 - Posted: 19 Apr 2007, 7:06:28 UTC
Last modified: 19 Apr 2007, 7:17:24 UTC


Another 'message timeout' on the following thread, although it started off as a 'missing uploads' problem (hinting at network issues?). The 'message timeout' occurred while networking was suspended.

Also some error code 500s (is that serverside or clientside?).

http://www.climateprediction.net/board/viewtopic.php?p=62485#62485

ID: 9679 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 9680 - Posted: 19 Apr 2007, 7:38:42 UTC - in response to Message 9679.  

Also some error code 500s (is that serverside or clientside?).

http://www.climateprediction.net/board/viewtopic.php?p=62485#62485

I haven't seen errors 500 in a long time and to be honest, I don't see one in that post either. But if they are around, it's a condition of the route between the client and the server. It can only be solved from the client side.

All that that specific post talks about is hadcm3 version 5.15 and scheduler version 5.09 (server version 509). Perhaps that confused you?
ID: 9680 · Report as offensive
MikeMarsUK

Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 9685 - Posted: 19 Apr 2007, 12:40:15 UTC


It was this bit (same user, a few posts earlier):

Scheduler request failed: HTTP internal server error

I searched on that, and the post I found said that was an error code 500 (but perhaps that was a jump too far).


The CPDN servers are in a state of flux at the moment due to being moved from pillar to post, so I'm not sure which of his problems are due to problems on the client side, and which are due to the server.

ID: 9685 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 9686 - Posted: 19 Apr 2007, 14:16:31 UTC - in response to Message 9685.  

As far as I know, the ordinary HTTP "internal server error" is error 500. An internal server error is server side. The BOINC Wiki still details what to do with errors 500.
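
The code-to-name mapping can be checked with nothing but Python's standard library. In this sketch `side_of` is a made-up helper that simply applies the usual convention that 4xx statuses are attributed to the client and 5xx to the server. It says nothing about what actually went wrong on the CPDN servers.

```python
from http import HTTPStatus


def side_of(code):
    """Say which side an HTTP error status is attributed to by its class."""
    if 400 <= code <= 499:
        return "client"
    if 500 <= code <= 599:
        return "server"
    return "not an error"


status = HTTPStatus(500)
print(status.value, status.phrase, side_of(status.value))
# prints: 500 Internal Server Error server
```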
ID: 9686 · Report as offensive
Profile Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15482
Netherlands
Message 9687 - Posted: 19 Apr 2007, 15:10:10 UTC

BOINC Wiki article on Errors 500.
ID: 9687 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.