Stalled downloads

Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95851 - Posted: 16 Feb 2020, 11:09:23 UTC

I keep getting Rosetta@home downloads of 3 kB files that get stuck (a known problem with that project just now). Aborting the download, then aborting the task, then updating the project usually works. But sometimes I still can't get new work until I actually reboot the computer! BOINC thinks the download is still stalled:

Rosetta@home 16/02/2020 11:00:16 AM Not requesting tasks: some download is stalled

Why does BOINC not realise I cancelled it?
ID: 95851 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95852 - Posted: 16 Feb 2020, 11:27:12 UTC - in response to Message 95851.  

Which version of BOINC?
ID: 95852 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95853 - Posted: 16 Feb 2020, 11:40:40 UTC - in response to Message 95852.  

> Which version of BOINC?


7.16.3 (x64)

Happens on all 4 of my computers. Only with Rosetta@home. Others experiencing the same thing judging by their forums.

I was going to paste a log here. Do you know how I can get log entries from before I rebooted it?
ID: 95853 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95854 - Posted: 16 Feb 2020, 12:27:35 UTC - in response to Message 95853.  
Last modified: 16 Feb 2020, 12:58:29 UTC

Which operating system? And if it's Linux, how was it installed?

If Linux, try variations on

journalctl --boot=-1 --unit=boinc-client
If Windows, file 'stdoutdae.txt' in data directory.
ID: 95854 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95855 - Posted: 16 Feb 2020, 13:37:22 UTC - in response to Message 95854.  

> Which operating system? And if it's Linux, how was it installed?
>
> If Linux, try variations on
>
> journalctl --boot=-1 --unit=boinc-client
> If Windows, file 'stdoutdae.txt' in data directory.


Windows 10.

16-Feb-2020 07:49:29 [Rosetta@home] Started download of 9v1nm_gb_c815_9mer_gb_001245.zip
16-Feb-2020 07:54:36 [Rosetta@home] Temporarily failed download of 9v1nm_gb_c815_9mer_gb_001245.zip: transient HTTP error
16-Feb-2020 07:54:36 [Rosetta@home] Backing off 03:44:45 on download of 9v1nm_gb_c815_9mer_gb_001245.zip
16-Feb-2020 07:54:37 [---] Project communication failed: attempting access to reference site
16-Feb-2020 07:54:38 [---] Internet access OK - project servers may be temporarily down.
16-Feb-2020 09:29:54 [Rosetta@home] Computation for task rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0 finished
16-Feb-2020 09:29:59 [Rosetta@home] Starting task 7ub7ru9a_3h_design1_893125_1_0
16-Feb-2020 09:30:00 [Rosetta@home] Started upload of rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0_r614618089_0
16-Feb-2020 09:30:04 [Rosetta@home] Finished upload of rb_02_15_16183_16041__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_894816_222_0_r614618089_0
16-Feb-2020 10:30:15 [Rosetta@home] Sending scheduler request: To report completed tasks.
16-Feb-2020 10:30:15 [Rosetta@home] Reporting 1 completed tasks
16-Feb-2020 10:30:15 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:30:17 [Rosetta@home] Scheduler request completed
16-Feb-2020 10:59:01 [Rosetta@home] task 9v1nm_gb_c815_9mer_gb_001245_SAVE_ALL_OUT_892880_29_0 aborted by user
16-Feb-2020 10:59:06 [Rosetta@home] update requested by user
16-Feb-2020 10:59:07 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 10:59:07 [Rosetta@home] Reporting 1 completed tasks
16-Feb-2020 10:59:07 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:59:08 [Rosetta@home] Scheduler request completed
16-Feb-2020 10:59:25 [Rosetta@home] update requested by user
16-Feb-2020 10:59:28 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 10:59:28 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:59:30 [Rosetta@home] Scheduler request completed
16-Feb-2020 11:00:14 [Rosetta@home] update requested by user
16-Feb-2020 11:00:16 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 11:00:16 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 11:00:17 [Rosetta@home] Scheduler request completed
16-Feb-2020 11:10:02 [Rosetta@home] update requested by user
16-Feb-2020 11:10:07 [Rosetta@home] Sending scheduler request: Requested by user.
16-Feb-2020 11:10:07 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 11:10:09 [Rosetta@home] Scheduler request completed
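
A log like the one above can be scanned mechanically rather than by eye. The following is a minimal sketch, assuming every Event Log line has the "DD-Mon-YYYY HH:MM:SS [project] message" shape seen in this excerpt; the function names and the SAMPLE constant are illustrative, not part of BOINC.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this thread:
# 16-Feb-2020 10:30:15 [Rosetta@home] Not requesting tasks: some download is stalled
LINE_RE = re.compile(r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.+?)\] (.*)$")

FAILED_PREFIX = "Temporarily failed download of "

# A few lines copied from the log posted above.
SAMPLE = """
16-Feb-2020 07:54:36 [Rosetta@home] Temporarily failed download of 9v1nm_gb_c815_9mer_gb_001245.zip: transient HTTP error
16-Feb-2020 07:54:37 [---] Project communication failed: attempting access to reference site
16-Feb-2020 10:30:15 [Rosetta@home] Not requesting tasks: some download is stalled
16-Feb-2020 10:59:01 [Rosetta@home] task 9v1nm_gb_c815_9mer_gb_001245_SAVE_ALL_OUT_892880_29_0 aborted by user
16-Feb-2020 10:59:07 [Rosetta@home] Not requesting tasks: some download is stalled
"""


def parse_event_log(text):
    """Yield (timestamp, project, message) for each line that fits the assumed shape."""
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            yield match.groups()


def summarize_stalls(text):
    """Return (per-project count of 'stalled' refusals, list of files whose download failed)."""
    entries = list(parse_event_log(text))
    stalled = Counter(
        project for _, project, msg in entries if "some download is stalled" in msg
    )
    failed = [
        msg[len(FAILED_PREFIX):].split(":", 1)[0]
        for _, _, msg in entries
        if msg.startswith(FAILED_PREFIX)
    ]
    return stalled, failed


if __name__ == "__main__":
    stalled, failed = summarize_stalls(SAMPLE)
    print("stalled refusals per project:", dict(stalled))
    print("files with failed downloads:", failed)
```

To run it against a real log, read the text of stdoutdae.txt from the data directory and pass it to summarize_stalls in place of SAMPLE.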
ID: 95855 · Report as offensive
Dave

Joined: 28 Jun 10
Posts: 1437
United Kingdom
Message 95856 - Posted: 16 Feb 2020, 14:00:40 UTC - in response to Message 95853.  

> Happens on all 4 of my computers. Only with Rosetta@home. Others experiencing the same thing judging by their forums.


If already raised on the Rosetta forums, my guess is you will just have to wait till someone kicks the appropriate server.
ID: 95856 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95857 - Posted: 16 Feb 2020, 14:09:13 UTC - in response to Message 95855.  

Unfortunately, the log doesn't show you cancelling the download - which is probably the log's fault, not yours.

Two suggestions:

1) if you see it happening, set <http_debug> in Event Log options, and retry the transfer - find out what's happening behind that 'transient HTTP error'.
2) make a careful and exact note of the file name in question. Cancel the download, and make sure it disappears from the transfers tab. Restart the client, and if the 'stalled download' message reappears, have a very careful 'read only' (no edits) peek inside client_state.xml - same folder. Find the reference (if any) to the file you cancelled, and post the whole of the

<file>
...
</file>
section it's enclosed in.
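
The read-only look inside client_state.xml described in suggestion 2 could be sketched as below. This is only an illustration under stated assumptions: that each entry is a <file> ... </file> block containing a <name> child (older clients may use a different element name), and that the other children shown in SAMPLE_STATE are placeholders, not real data. The helper names are hypothetical. The file is only ever read, never written.

```python
import re
from pathlib import Path

# Non-greedy match of one <file> ... </file> block at a time.
# "<file>" does not match "<file_ref>" because the ">" must follow "file" directly.
FILE_BLOCK_RE = re.compile(r"<file>.*?</file>", re.DOTALL)

# Illustrative stand-in for part of client_state.xml; the child elements
# other than <name> are assumptions made for the example.
SAMPLE_STATE = """
<file>
    <name>9v1nm_gb_c815_9mer_gb_001245.zip</name>
    <nbytes>3072.000000</nbytes>
    <status>0</status>
</file>
<file>
    <name>some_other_file.zip</name>
    <nbytes>7168.000000</nbytes>
    <status>1</status>
</file>
"""


def read_state(path):
    """Read the state file as text; nothing is ever written back."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def find_file_blocks(state_text, file_name):
    """Return every <file> block whose <name> is exactly file_name."""
    needle = f"<name>{file_name}</name>"
    return [block for block in FILE_BLOCK_RE.findall(state_text) if needle in block]


if __name__ == "__main__":
    for block in find_file_blocks(SAMPLE_STATE, "9v1nm_gb_c815_9mer_gb_001245.zip"):
        print(block)
```

Plain text matching is used instead of an XML parser so that an unexpected or partly written state file still yields whatever blocks are present.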
ID: 95857 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95858 - Posted: 16 Feb 2020, 14:10:29 UTC - in response to Message 95856.  

>> Happens on all 4 of my computers. Only with Rosetta@home. Others experiencing the same thing judging by their forums.
> If already raised on the Rosetta forums, my guess is you will just have to wait till someone kicks the appropriate server.

Cancelling a download and requesting new work shouldn't be prevented - that sounds more like a client issue to me.
ID: 95858 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95859 - Posted: 16 Feb 2020, 15:05:28 UTC - in response to Message 95856.  

>> Happens on all 4 of my computers. Only with Rosetta@home. Others experiencing the same thing judging by their forums.
>
> If already raised on the Rosetta forums, my guess is you will just have to wait till someone kicks the appropriate server.


Yes, the server is busted, but the BOINC client is being stupid. I cancelled that download and it hasn't noticed I've done so. It can be fixed in the client too.
ID: 95859 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95860 - Posted: 16 Feb 2020, 15:07:42 UTC - in response to Message 95857.  

> Unfortunately, the log doesn't show you cancelling the download - which is probably the log's fault, not yours.
>
> Two suggestions:
>
> 1) if you see it happening, set <http_debug> in Event Log options, and retry the transfer - find out what's happening behind that 'transient HTTP error'.
> 2) make a careful and exact note of the file name in question. Cancel the download, and make sure it disappears from the transfers tab. Restart the client, and if the 'stalled download' message reappears, have a very careful 'read only' (no edits) peek inside client_state.xml - same folder. Find the reference (if any) to the file you cancelled, and post the whole of the
>
> <file>
> ...
> </file>
> section it's enclosed in.


Will do. But restarting the machine (or presumably just the BOINC client) always clears the problem. My problem is that it doesn't clear without restarting.
ID: 95860 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95861 - Posted: 16 Feb 2020, 15:33:08 UTC - in response to Message 95860.  

Your log said

16-Feb-2020 10:30:15 [Rosetta@home] Not requesting tasks: some download is stalled
That's entirely a client decision - the server isn't involved at all (and can't over-rule it). The problem seems to be that the download isn't completely forgotten about when it's cancelled. We need to find out why not.
ID: 95861 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95862 - Posted: 16 Feb 2020, 16:04:05 UTC - in response to Message 95861.  

> Your log said
>
> 16-Feb-2020 10:30:15 [Rosetta@home] Not requesting tasks: some download is stalled
> That's entirely a client decision - the server isn't involved at all (and can't over-rule it). The problem seems to be that the download isn't completely forgotten about when it's cancelled. We need to find out why not.


Agreed (although there's clearly a server problem as well, as this happens about every 3 days on each computer, and others have reported the same problem). It's always a tiny 3 kB zip file (the task, presumably) that sticks. I've never seen a problem with a large executable being downloaded. Perhaps it's their server that issues the tasks that's screwed.

Next time it happens I'll follow your instructions and see if we can spot why it thinks it's still downloading.

Someone on Rosetta says it does clear after about an hour (presumably once he's cancelled the download and task):
At the end of this thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893
I also posted this but nobody has replied yet: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=13519
ID: 95862 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95863 - Posted: 16 Feb 2020, 16:37:45 UTC - in response to Message 95862.  

Well, I thought I'd try and see what happens for myself. Allowed new work for an old test attachment - got four new Rosetta v4.07 tasks. All downloaded cleanly, although there wasn't a 3KB file amongst them (nothing between 1KB FLAGS and 7KB robetta.zip files). Download server was slow (nearly 20 minutes for the database file - my internet link is a lot faster than theirs). Nothing obviously wrong in client_state.xml

They'll run in due course - I'll watch what happens.
ID: 95863 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95864 - Posted: 16 Feb 2020, 16:59:15 UTC - in response to Message 95863.  

> Well, I thought I'd try and see what happens for myself. Allowed new work for an old test attachment - got four new Rosetta v4.07 tasks. All downloaded cleanly, although there wasn't a 3KB file amongst them (nothing between 1KB FLAGS and 7KB robetta.zip files). Download server was slow (nearly 20 minutes for the database file - my internet link is a lot faster than theirs). Nothing obviously wrong in client_state.xml
>
> They'll run in due course - I'll watch what happens.


Most do download quite normally; only about 1 every 3 days goes wrong if you're running 4 cores 24/7 on Rosetta. You could try downloading a huge cache, but the problem might depend on elapsed time rather than on how many tasks you fetch.

Not sure why you say the download server is that slow. I have a 38 Mbit connection and it maxed it out: a computer freshly attaching to Rosetta downloaded everything it needed in a couple of minutes.

The files I've seen stick always seem to be associated with WUs with a name beginning multistate, like http://boinc.bakerlab.org/rosetta/workunit.php?wuid=1010934932
ID: 95864 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95865 - Posted: 16 Feb 2020, 17:09:23 UTC - in response to Message 95864.  

My line is running at 71.17 Mbps, but it still took 20 minutes.

> The files I've seen stick always seem to be associated with WUs with a name beginning multistate, like http://boinc.bakerlab.org/rosetta/workunit.php?wuid=1010934932
Thanks - it's helpful to know that. Mine all start 'rb_02'
ID: 95865 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95866 - Posted: 16 Feb 2020, 17:57:44 UTC - in response to Message 95865.  
Last modified: 16 Feb 2020, 17:59:41 UTC

> My line is running at 71.17 Mbps, but it still took 20 minutes.


I can get 54 Mbps, but it costs extra each month.

I've never actually seen it download slowly, but then it's not often big files are needed - only when attaching to the project.

>> The files I've seen stick always seem to be associated with WUs with a name beginning multistate, like http://boinc.bakerlab.org/rosetta/workunit.php?wuid=1010934932
> Thanks - it's helpful to know that. Mine all start 'rb_02'


The multistate ones are rare. And they don't always fail; the one I linked to completed successfully.

Unless their server gets fixed, it shouldn't be long before I see another failure, then I'll try the troubleshooting. All 4 machines are only running Rosetta just now.
ID: 95866 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95868 - Posted: 16 Feb 2020, 19:57:24 UTC - in response to Message 95865.  

> My line is running at 71.17 Mbps, but it still took 20 minutes.
>
>> The files I've seen stick always seem to be associated with WUs with a name beginning multistate, like http://boinc.bakerlab.org/rosetta/workunit.php?wuid=1010934932
> Thanks - it's helpful to know that. Mine all start 'rb_02'


Unfortunately, unlike Einstein, you can't set it to give you certain applications.
ID: 95868 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95874 - Posted: 17 Feb 2020, 11:45:17 UTC

Just had another 'rb_' download. This time it allocated me the 64-bit version of the app, so two more big downloads.

Both of them stopped at 99.65%. That sounds to me like a dropped packet somewhere in the middle, and a wait for the resend.

I tried the usual trick for that - suspend networking for a few seconds via the 'Activity' menu in BOINC Manager, and then revert to networking 'always'. Worked a treat - the task is now ready to run.
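
The same suspend-then-resume trick can also be scripted instead of using the Manager's 'Activity' menu. The sketch below swaps in the boinccmd command-line tool and its --set_network_mode option; that substitution, the 5 second pause, and the function names are assumptions for illustration, not what was done above. boinccmd may need to be run from the BOINC data directory (or be given the GUI RPC password) before the client accepts the commands.

```python
import subprocess
import time


def network_toggle_commands(boinccmd="boinccmd"):
    """Return the two commands: suspend networking, then set it back to 'always'."""
    return [
        [boinccmd, "--set_network_mode", "never"],
        [boinccmd, "--set_network_mode", "always"],
    ]


def bounce_network(pause_seconds=5.0, run=subprocess.run, sleep=time.sleep):
    """Suspend networking, wait a few seconds, then re-enable it.

    run and sleep are injectable so the sequence can be checked without a
    BOINC client installed.
    """
    suspend, resume = network_toggle_commands()
    run(suspend, check=True)   # raises CalledProcessError if boinccmd reports failure
    sleep(pause_seconds)
    run(resume, check=True)


if __name__ == "__main__":
    bounce_network()
```

Note that this leaves the network mode at 'always', matching the description above; anyone who normally runs on 'auto' (network activity based on preferences) would want to restore that instead.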
ID: 95874 · Report as offensive
Peter Hucker
Joined: 6 Oct 06
Posts: 1144
United Kingdom
Message 95879 - Posted: 17 Feb 2020, 18:34:15 UTC - in response to Message 95874.  

> Just had another 'rb_' download. This time it allocated me the 64-bit version of the app, so two more big downloads.
>
> Both of them stopped at 99.65%. That sounds to me like a dropped packet somewhere in the middle, and a wait for the resend.
>
> I tried the usual trick for that - suspend networking for a few seconds via the 'Activity' menu in BOINC Manager, and then revert to networking 'always'. Worked a treat - the task is now ready to run.


OK, I'll bear that in mind if it keeps happening. There must be a way to fix it though. Did you manage to get any information on the transient HTTP error using <http_debug>, like you asked me to do? I've not had another one sticking yet....
ID: 95879 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4536
United Kingdom
Message 95880 - Posted: 17 Feb 2020, 19:12:19 UTC - in response to Message 95879.  

Sorry - no, I didn't. (The morning was getting a bit busy with other tests around that time). I'll try and remember next time.

It did remind me of SETI message 1343415 from 2013: SETI was having major problems with stalled downloads at the time, and somebody found RFC 1323 ('TCP Extensions for High Performance') - an Internet standards-track document whose TCP options help on slow or congested internet links. Linux and OS X have it enabled by default; Windows supports it, but it has to be enabled manually. It worked a treat for the SETI problems at that time.

I haven't looked at it again for Windows 10. It works in Windows 7, and is enabled on this machine - so it didn't prevent this morning's glitch. But it's worth knowing about.
ID: 95880 · Report as offensive

Copyright © 2021 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.