Ticket #139 (reopened Enhancement)

Opened 3 years ago

Last modified 3 months ago

Project-by-project network disable (similar to communications deferred)

Reported by: MikeMarsUK
Assigned to: davea
Priority: Major
Milestone: Undetermined
Component: Manager
Version:
Keywords:
Cc:

Description

Hi,

One thing that would be useful is a way for the user to prevent communications with one particular project at a time (for example, when that project has server troubles). If the existing 'communications deferred' facility could be extended to let users defer communications themselves for a period of time, that would be sufficient.

Perhaps the facility should take into account the deadlines of any work units in progress.
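One possible reading of the deadline suggestion is that a user-requested deferral should never run past the point where work in progress could still be reported in time. The sketch below clamps a requested deferral to the earliest task deadline minus a safety margin. The `Task` struct, the `clamp_deferral` function and the margin parameter are hypothetical illustrations, not part of BOINC.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical: a task in progress for the project, with its report
// deadline in seconds since some fixed epoch.
struct Task {
    double report_deadline;
};

// Hypothetical helper: returns the time until which communication with the
// project may be deferred. The user's requested end time is pulled back so
// that it falls at least `margin` seconds before the earliest deadline, but
// the result is never earlier than `now`.
double clamp_deferral(double now, double requested_until,
                      const std::vector<Task>& tasks, double margin) {
    double limit = requested_until;
    for (const Task& t : tasks) {
        limit = std::min(limit, t.report_deadline - margin);
    }
    return std::max(limit, now);
}
```

With no tasks the user's request is honoured unchanged; a task due soon shortens the deferral, and a task already inside the margin cancels it.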

This request is prompted by the recent server-out-of-space problems at CPDN, which are causing difficulties for multi-project users on dial-up connections (other ideas have also been suggested for handling the same problem).

-Cheers,

Mike

Attachments

trac139_trunk.patch (47.6 kB) - added by Thyme Lawn on 08/14/09 10:43:06.
Patch for boinc_trunk (revision 18844)
trac139_6_6a.patch (48.3 kB) - added by Thyme Lawn on 08/14/09 10:44:44.
Patch for boinc_core_release_6_6a (revision 18844)
trac139_6_8.patch (48.1 kB) - added by Thyme Lawn on 08/14/09 10:57:03.
Patch for boinc_core_release_6_8 (revision 18844)

Change History

06/08/07 08:02:16 changed by KSMarksPsych

  • owner changed from davea to romw.
  • priority changed from Undetermined to Major.
  • component changed from Client - Scheduler Policy to Client - Manager.

This has also been suggested on a per-file basis.

06/08/07 17:12:38 changed by mo.v

I agree with Mike. When the CPDN servers could not accept intermediate or final file uploads for a recent prolonged period, some multi-project crunchers who followed our advice to suspend BOINC network activity while the problem lasted ran out of work on other projects. I think this problem arose whether members had dial-up or broadband.

Mo

10/09/07 10:24:01 changed by romw

  • owner changed from romw to davea.

06/03/09 14:23:15 changed by davea

  • status changed from new to closed.
  • resolution set to wontfix.

probably not worth the trouble

06/04/09 03:43:35 changed by Richard Haselgrove

Ironic that this was closed on the same day that the CPDN administrator wrote:

"cpdn-upload1.comlab is shown as down at the moment. The server is running but it's shut down apache as the data partition is full. I've got nowhere else to put the data at the moment so this may well cause a problem for hadam3p uploads until I can obtain more hardware."

06/05/09 11:04:26 changed by mo.v

  • status changed from closed to reopened.
  • resolution deleted.

I think such a facility would be worth the trouble and not just for members of CPDN. The disk of the CPDN server that takes file uploads has filled up more than once and I would be surprised if this has never occurred on any other project. File upload servers of course also occasionally crash or fail.

When this has happened we have advised members either to suspend BOINC network activity altogether or to suspend tasks before they complete. Not all members read this advice and some do not realise in time. They can find themselves with partially uploaded files, which I consider to be in a fragile state. Such files can be the end result of long periods of processing.

If members in this position urgently need to fetch work to keep the computer busy, the only options at the moment are either to suspend network activity altogether, leaving at least part of the computer's processing capacity idle, or to allow BOINC network activity and put partially uploaded files at greater risk.

Now that CUDA devices and multi-core computers are increasingly common, requiring large amounts of work to keep all cores busy, I consider MikeMarsUK's proposal even more desirable than it was two years ago.

I am therefore taking the liberty of reopening this ticket in the hope that Mike's proposal will be reconsidered.

06/05/09 11:36:22 changed by davea

I'm not sure what you mean by "fragile state". BOINC has a mechanism for backing off and retrying file transfers. If this mechanism doesn't work we need to fix it. We can't rely on user actions to provide reliability.
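As a rough illustration of the kind of back-off-and-retry mechanism referred to here, the sketch below doubles the delay after each consecutive failure up to a cap and then applies random jitter. The constants (60 s minimum, 4 h maximum) and the function name are assumptions chosen for illustration; they are not BOINC's actual values or code.

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Hypothetical bounds; BOINC's real values may differ.
constexpr double kMinBackoff = 60.0;        // seconds
constexpr double kMaxBackoff = 4 * 3600.0;  // seconds

// Returns the delay before the next retry after `failures` consecutive
// failed attempts: exponential growth from kMinBackoff, capped at
// kMaxBackoff, then scaled by a random factor in [0.5, 1.0) so that many
// clients do not all retry against a recovering server at the same moment.
double next_backoff(int failures, std::mt19937& rng) {
    double base = kMinBackoff * std::pow(2.0, std::max(failures - 1, 0));
    base = std::min(base, kMaxBackoff);
    std::uniform_real_distribution<double> jitter(0.5, 1.0);
    return base * jitter(rng);
}
```

Note that under this kind of scheme repeated failures only lengthen the wait between attempts; nothing in the retry logic itself touches the file being uploaded.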

06/06/09 06:52:13 changed by mo.v

I consider files waiting for transfer, whether already partially uploaded or not, to be in a fragile state for several reasons:

* the moment a transfer is attempted, files are on a 14-day countdown to abandonment by BOINC. An extension of this period was requested on one of the BOINC mailing lists but did not meet with universal approval, it being thought that two weeks is plenty of time to obtain funding for a new server, order it, take delivery of it and install it

* many users with files already waiting for transfer give priority to fetching new work and leave network activity enabled. This causes multiple failed upload attempts of the untransferable files. Every failed attempt increases the likelihood of file corruption

* the greatest risk to waiting untransferable files is probably user action. Because the only apparent solution within the BOINC Manager Transfers tab is the Retry button, users may attempt this repeatedly. When this does not work users may resort to increasingly desperate attempts, for example repeatedly disallowing and reallowing BOINC network activity. (I have seen this action cause BOINC to abandon a file.)

It would therefore be helpful if users had a button within the Transfers tab to suspend transfers, either of selected files or to a specific project. It would allow users to take a safe precaution, in many cases avoiding all the risks I have outlined. Some users, having taken safe action, would be spared some worry, and I believe that the proportion of successfully uploaded results would increase.

Milo, the CPDN programmer who usually looks after the servers, said yesterday after reading this ticket 'It sounds like it would be very useful, particularly now.'

06/18/09 09:27:11 changed by Richard Haselgrove

Users at SETI have now also realised how useful this would be, during their current server outage:

http://setiathome.berkeley.edu/forum_thread.php?id=54188&nowrap=true#908786

08/13/09 15:50:40 changed by Thyme Lawn

I have implemented the requested functionality, and it has been tested by a number of users over the past two months.

The changes allow networking to be suspended and resumed for selected projects, adding a new "Suspend network"/"Resume network" button to BOINC Manager's Projects tab.

When project networking is suspended, any in-progress uploads will have their timers reset and will not be restarted until project networking is resumed. No scheduler requests will be made, but any pending downloads for the project will be completed. The project's status will be displayed as "Network activity suspended by user".

If a network-suspended project generates a trickle-up, this will be shown in the project's status message as "Network activity suspended by user, Trickle upload pending".

A scheduler request can be forced at any time by clicking the Update button. This will send any pending trickle-up messages and, if required, request new work for the project. If new tasks are allocated, any required downloads will be made automatically without the need to enable project networking.

The status message on the Tasks tab for completed tasks which haven't been uploaded will be "Uploading, project networking suspended".

The status message on the Transfers tab for blocked uploads will be "Upload pending, project networking suspended".

When project networking is resumed, any blocked uploads will be started.
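The behaviour described above amounts to a per-project flag that gates uploads and automatic scheduler requests while leaving downloads and user-forced updates alone. The sketch below models that reading of the description. `ProjectState` and the function names are hypothetical and are not taken from the attached patches; only the status strings are quoted from the comment.

```cpp
#include <string>

// Hypothetical per-project state; not the patch's actual data structures.
struct ProjectState {
    bool network_suspended = false;   // toggled by "Suspend network"/"Resume network"
    bool trickle_up_pending = false;  // a trickle-up is waiting to be sent
};

// Uploads wait while project networking is suspended; clearing the flag
// ("Resume network") lets blocked uploads start again.
bool may_start_upload(const ProjectState& p) {
    return !p.network_suspended;
}

// Pending downloads for the project are always allowed to complete.
bool may_start_download(const ProjectState&) {
    return true;
}

// Automatic scheduler requests are suppressed while suspended; a
// user-forced update (the Update button) goes through regardless.
bool may_contact_scheduler(const ProjectState& p, bool user_forced_update) {
    return user_forced_update || !p.network_suspended;
}

// Projects tab status text; empty when networking is not suspended.
std::string project_status(const ProjectState& p) {
    if (!p.network_suspended) return "";
    std::string s = "Network activity suspended by user";
    if (p.trickle_up_pending) s += ", Trickle upload pending";
    return s;
}

// Transfers tab text for an upload held back by the suspension
// (empty when the upload is not blocked).
std::string blocked_upload_status(const ProjectState& p) {
    return p.network_suspended ? "Upload pending, project networking suspended"
                               : "";
}
```

Keeping downloads and forced updates outside the gate is what lets a user keep fetching work for a project whose upload server is down without exposing pending uploads to repeated failed attempts.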

I have patches (at revision 18840) available for boinc_core_release_6_6a, boinc_core_release_6_8 and boinc_trunk but the attachment option seems to be disabled at the moment.

08/13/09 16:42:24 changed by mo.v

Thank you, Thyme, and thanks also to your testers.

I assume that for the time being this is only available in your own private build.

08/14/09 10:43:06 changed by Thyme Lawn

  • attachment trac139_trunk.patch added.

Patch for boinc_trunk (revision 18844)

08/14/09 10:44:44 changed by Thyme Lawn

  • attachment trac139_6_6a.patch added.

Patch for boinc_core_release_6_6a (revision 18844)

08/14/09 10:57:03 changed by Thyme Lawn

  • attachment trac139_6_8.patch added.

Patch for boinc_core_release_6_8 (revision 18844)

08/14/09 11:14:12 changed by Nicolas

Replying to mo.v:

> * many users with files already waiting for transfer give priority to fetching new work and leave network activity enabled. This causes multiple failed upload attempts of the untransferable files. Every failed attempt increases the likelihood of file corruption

How could a file get corrupted by just trying to upload it too many times?

> * the greatest risk to waiting untransferable files is probably user action. Because the only apparent solution within the BOINC Manager Transfers tab is the Retry button, users may attempt this repeatedly. When this does not work users may resort to increasingly desperate attempts, for example repeatedly disallowing and reallowing BOINC network activity. (I have seen this action cause BOINC to abandon a file.)

BOINC abandons files if they spend more than 14 days trying to upload, but there is no limit on the number of retries (I thought there was, but I checked the code and there isn't).
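The rule described here can be expressed as a give-up test keyed on elapsed time rather than on the number of attempts. The sketch below assumes the 14-day clock starts at the first upload attempt, as mo.v describes earlier in the thread. `UploadAttempts` and `should_abandon` are hypothetical names, and this is an illustration rather than BOINC's code.

```cpp
// 14 days in seconds, the limit discussed in this ticket.
constexpr double kGiveUpSeconds = 14 * 24 * 3600.0;

// Hypothetical record of an upload's retry history.
struct UploadAttempts {
    double first_attempt_time;  // seconds since epoch
    int retries;                // tracked, but deliberately not consulted below
};

// A file is abandoned only once it has been trying to upload for longer
// than the give-up period; the retry count plays no part in the decision.
bool should_abandon(const UploadAttempts& u, double now) {
    return now - u.first_attempt_time > kGiveUpSeconds;
}
```

Under this rule a file retried a million times in its first week is kept, while a file retried once is still abandoned after 14 days.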

