Suggestion: Retry Connection countdown

Message boards : BOINC client : Suggestion: Retry Connection countdown

dcdc

Joined: 29 Aug 06
Posts: 82
United Kingdom
Message 11786 - Posted: 25 Jul 2007, 23:03:41 UTC

Hi all

I'm not sure if this has been discussed or considered before, but I've noticed that if a machine can't connect for a while and builds up a queue of completed jobs, they don't all upload at once when the connection comes back. The countdowns continue, and each task is retried only when its own countdown completes. It seems to me it would be beneficial to retry all transfers as soon as a connection has been established. That would increase the likelihood of the jobs being uploaded when the connection is intermittent, and would also reduce the number of connections to the project servers.

I don't think it would have to be on a project-by-project basis either: if the problems are due to an intermittent internet connection, then it's beneficial to retry communication with all projects.
ID: 11786
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 11792 - Posted: 26 Jul 2007, 9:23:52 UTC - in response to Message 11786.  

I'm not sure if this has been discussed or considered before, but I've noticed that if a machine can't connect for a while and builds up a queue of completed jobs, they don't all upload at once when the connection comes back. The countdowns continue, and each task is retried only when its own countdown completes. It seems to me it would be beneficial to retry all transfers as soon as a connection has been established. That would increase the likelihood of the jobs being uploaded when the connection is intermittent, and would also reduce the number of connections to the project servers.

I don't think it would have to be on a project-by-project basis either: if the problems are due to an intermittent internet connection, then it's beneficial to retry communication with all projects.


Your thinking has some merit. However, the behaviour you're suggesting would be beneficial to dial-up users but wouldn't be so beneficial to projects.

Here's an example: S@H has weekly scheduled downtimes during which uploads are often not possible. All the clients start back-off timers, which differ for each result pending upload. When the project comes back online, clients notice its availability at random times and upload their results gradually. It can take several hours to get all results uploaded this way. This gradual spread keeps the load on the servers under some control.

If BOINC behaved as you propose, any given client would try to upload all its results at the same time, while the delay between the project coming back online and the start of a client's upload would still differ (randomly) from client to client. However, a client with many results pending would try to upload all of them as soon as its shortest back-off timer expired, and the more results are pending, the shorter that minimum back-off time is (statistically). This means the project's upload server would face a huge load immediately after coming back online, and the load would then fall off exponentially. As S@H experience shows, many upload connections (as many as 90%?) would fail, and clients would retry uploads again and again.
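
A rough sketch of the statistical point above (the more results pending, the sooner the earliest timer fires). It assumes each pending result gets an independent back-off delay drawn uniformly between zero and a 4-hour maximum. That distribution, and the names MAX_BACKOFF_S, first_retry_time and mean_first_retry, are illustrative assumptions only, not how the BOINC client actually computes its back-off.

```python
import random

# Assumed maximum back-off; the thread only guesses "somewhere near 4 hours".
MAX_BACKOFF_S = 4 * 3600


def first_retry_time(pending_results, rng):
    """Earliest of the independent back-off timers held by one client."""
    return min(rng.uniform(0, MAX_BACKOFF_S) for _ in range(pending_results))


def mean_first_retry(pending_results, clients=10_000, seed=1):
    """Average time (seconds) until a client's earliest timer fires."""
    rng = random.Random(seed)
    total = sum(first_retry_time(pending_results, rng) for _ in range(clients))
    return total / clients


if __name__ == "__main__":
    for n in (1, 5, 20):
        print(n, "pending:", round(mean_first_retry(n) / 60), "minutes on average")
```

Under this uniform assumption the expected minimum of n timers is MAX_BACKOFF_S / (n + 1), so a client holding 20 results would, on average, start retrying all of them far sooner than a client holding one. That is the mechanism that would pile load onto the server right after it comes back.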

Both scenarios practically ensure that the vast majority of results will get uploaded within the maximum back-off time (I believe this is somewhere near 4 hours).

The whole issue is really a non-issue for users with a permanent internet connection and only matters for those on dial-up. I'm pretty sure those users regularly force uploads manually.

My experience with dial-up machines also goes like this: it's a good idea to let the BOINC CC know that the internet connection is not available by setting 'Network connection: never'. While network connection is set to never, the client won't attempt any connection at all (which nicely prevents the log from filling up with error messages). It will try any pending connection immediately after 'network connection' is set to always (or to automatic, while the criteria for an enabled connection are met). That includes immediately starting to upload all pending results, as well as contacting any project schedulers if needed.
Metod ...
ID: 11792
William Roeder

Joined: 31 May 07
Posts: 42
Message 11797 - Posted: 26 Jul 2007, 16:57:33 UTC - in response to Message 11792.  

My experience with dial-up machines also goes like this: it's a good idea to let the BOINC CC know that the internet connection is not available by setting 'Network connection: never'. While network connection is set to never, the client won't attempt any connection at all (which nicely prevents the log from filling up with error messages). It will try any pending connection immediately after 'network connection' is set to always (or to automatic, while the criteria for an enabled connection are met). That includes immediately starting to upload all pending results, as well as contacting any project schedulers if needed.


Exactly what I do. You can also set up two shortcuts to run
"c:\Program Files\BOINC\boinccmd.exe" --set_network_mode auto
"c:\Program Files\BOINC\boinccmd.exe" --set_network_mode never
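
For anyone who would rather script this than click shortcuts, here is a minimal sketch that wraps the same two commands. The install path is simply the one from the shortcuts above and may differ on your machine. The helper names network_mode_command and set_network_mode are made up for this example. Passing the command as a list sidesteps quoting problems with the space in "Program Files".

```python
import subprocess

# Assumed install location, copied from the shortcut commands above.
BOINCCMD = r"c:\Program Files\BOINC\boinccmd.exe"

# Modes accepted by boinccmd --set_network_mode.
VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode):
    """Build the boinccmd argument list for the requested network mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [BOINCCMD, "--set_network_mode", mode]


def set_network_mode(mode):
    """Run boinccmd; raises CalledProcessError if it exits with an error."""
    subprocess.run(network_mode_command(mode), check=True)
```

Calling set_network_mode("never") before hanging up and set_network_mode("auto") after dialling in would mirror the two shortcuts.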
ID: 11797
dcdc

Joined: 29 Aug 06
Posts: 82
United Kingdom
Message 11862 - Posted: 29 Jul 2007, 10:42:19 UTC

I wasn't really thinking of dial-up users in particular - more about machines that aren't on that much which will finish tasks but then not upload them before being switched off.

I'd have thought retrying all connections from a given client (in serial, as they do) would be better than making a connection, sending one task, disconnecting, and then repeating the same later on... I understand the point about too many connections swamping the server, but surely the server will just tell the client to wait if it's busy, in which case you've not lost out over the current functionality.
ID: 11862
Metod, S56RKO

Joined: 9 Sep 05
Posts: 128
Slovenia
Message 11867 - Posted: 29 Jul 2007, 14:12:09 UTC - in response to Message 11862.  
Last modified: 29 Jul 2007, 14:17:26 UTC

I wasn't really thinking of dial-up users in particular - more about machines that aren't on that much which will finish tasks but then not upload them before being switched off.


Well, once upon a time there was an option that made the BOINC CC report any uploaded results to the project schedulers immediately after a successful upload. This option basically made the BOINC developers and several project administrators quite unhappy.

Basically it boils down to a proper selection of projects to participate in. If a user runs his/her machine only from time to time, then he/she has to choose projects that are less strict on deadlines, or else pay attention and do some manual work in the BOINC CC. If that is not acceptable, the user can always decide not to take part in such a project and/or in BOINC as a whole. BOINC developers and project developers have to think about their project as a whole, and sometimes this is in direct conflict with individual users' expectations.

Both have to live with it.


I'd have thought retrying all connections from a given client (in serial, as they do) would be better than making a connection, sending one task, disconnecting, and then repeating the same later on... I understand the point about too many connections swamping the server, but surely the server will just tell the client to wait if it's busy, in which case you've not lost out over the current functionality.


To avoid re-inventing the wheel, they used an already existing protocol for downloading work unit files and uploading result files. It is called HTTP. In its original version it doesn't support any kind of persistent connection, which means that a new connection (with all the IP overhead) is needed for every file transferred. There are a number of HTTP proxy servers out there that still support only this kind of HTTP transport. The newer version of HTTP defines persistent connections, but to be on the safe side, one cannot assume their availability.
Understanding this, one can see that there's no difference between making the next connection immediately after the previous one has finished and making it after a random period of time.

The problem is that if the upload server gets hit by too many requests, most of the time the client side will not get enough feedback to notice that there are problems. Whenever S@H had such problems, most of the time clients were not able to get through the initial TCP handshake, which takes place at the OS level. If the connection got past this stage, the upload was mostly successful.
Accepting an inbound file and sending an error message don't differ with regard to TCP overhead, so this would by no means be a solution in case of congestion.
This approach does work with scheduler requests, though: when a client phones home to get more work and there is none, the scheduler can tell the client to wait a certain amount of time before another attempt. This works because the connection has already been made, so the cost of sending a reply with a back-off request is low compared to the cost of a new incoming connection arriving too soon.
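
The difference between the two cases can be sketched roughly as follows. This is a hypothetical illustration, not BOINC's actual logic. The function next_attempt_delay, the 60-second base and the 4-hour cap are all assumptions. It uses the server's requested delay when a reply was received, and otherwise falls back to a local exponential back-off, because a failed TCP handshake gives the client nothing to go on.

```python
from typing import Optional


def next_attempt_delay(
    connected: bool,
    server_delay_s: Optional[float],
    failures: int,
    base_s: float = 60.0,       # assumed starting back-off
    cap_s: float = 4 * 3600.0,  # assumed maximum back-off
) -> float:
    """Seconds to wait before the next attempt (illustrative only)."""
    if connected and server_delay_s is not None:
        # Scheduler case: the reply itself carries the back-off request.
        return server_delay_s
    if connected:
        # Connection worked and the server asked for no delay.
        return 0.0
    # Upload case under congestion: no feedback at all, so the client
    # can only double its own delay after each consecutive failure.
    return min(cap_s, base_s * 2 ** failures)
```

A real client would also add some randomness to the local delay so that clients don't all come back at the same moment; that is left out here to keep the sketch short.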
Metod ...
ID: 11867

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.