News on Project Outages

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 104934 - Posted: 4 Aug 2021, 10:04:09 UTC

And it seems the working day in Oxford still hasn't finished. The forums and server status page are not accessible, and there is a "project servers may be down" message in the event log.


Andy tells me the work is still ongoing and they expect things to be restored sometime later today.
ID: 104934

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 104952 - Posted: 5 Aug 2021, 10:00:08 UTC - in response to Message 104934.  

However:

Hi Dave,

We've been told that this work is still ongoing unfortunately and is going to be continued to be worked on into today.

Best regards,

Andy


This was about three-quarters of an hour ago.
ID: 104952

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 104971 - Posted: 6 Aug 2021, 16:24:50 UTC - in response to Message 104952.  

The CPDN website is back up, but I am getting

Fri 06 Aug 2021 17:21:54 BST | climateprediction.net | Project is temporarily shut down for maintenance

on project update. Unless someone is doing overtime, it will be Monday before everything stands a chance of returning to normal.
ID: 104971

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 104973 - Posted: 6 Aug 2021, 17:25:16 UTC - in response to Message 104971.  

On project update. Unless someone is doing overtime, it will be Monday before everything stands a chance of returning to normal.


1 hour later, 9 completed tasks uploaded and 8 new ones now downloading.

Clearly someone is doing overtime.
ID: 104973

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 104975 - Posted: 6 Aug 2021, 19:27:52 UTC - in response to Message 104973.  

Two tasks reported, four tasks allocated, four sets of downloads failed.

You'd have thought they knew about that one by now :-(
ID: 104975

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 104978 - Posted: 7 Aug 2021, 8:29:58 UTC - in response to Message 104975.  

Got this from Andy.

Hi Dave,

Thanks. Engineering IT Support have partially restored networking to a number of machines, but a number of key machines still have no networking access following the switch work on Tuesday. I have submitted a ticket to them for the other machines and I will follow this up on Monday with them.

Best regards,

Andy
ID: 104978

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 104979 - Posted: 7 Aug 2021, 8:46:44 UTC - in response to Message 104978.  

I'm not very impressed by the Oxford University Engineering IT Support team. They will have scheduled this work for the summer vacation, when the undergraduate demand is low: but university postgrad and faculty research continues 52 weeks of the year. This is also a very busy time of year for university administration, dealing with applications from next year's intake of new students.

Letting a planned infrastructure upgrade over-run by a week is bad management, to say the least.
ID: 104979

Bryn Mawr
Help desk expert
Joined: 31 Dec 18
Posts: 284
United Kingdom
Message 104982 - Posted: 7 Aug 2021, 15:04:01 UTC - in response to Message 104979.  

I'm not very impressed by the Oxford University Engineering IT Support team. They will have scheduled this work for the summer vacation, when the undergraduate demand is low: but university postgrad and faculty research continues 52 weeks of the year. This is also a very busy time of year for university administration, dealing with applications from next year's intake of new students.

Letting a planned infrastructure upgrade over-run by a week is bad management, to say the least.


And then not working the weekend to clear the problem - I know I’d never have got away with that.
ID: 104982

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 104993 - Posted: 9 Aug 2021, 19:38:51 UTC - in response to Message 104982.  

And then not working the weekend to clear the problem - I know I’d never have got away with that.


Sadly, when I worked in the NHS they were as bad or worse about sorting out problems after "upgrades". However, having had a normal work day to sort things out and with no signs of progress, I am beginning to despair of them.
ID: 104993

Bryn Mawr
Help desk expert
Joined: 31 Dec 18
Posts: 284
United Kingdom
Message 104997 - Posted: 10 Aug 2021, 7:51:49 UTC - in response to Message 104993.  

And then not working the weekend to clear the problem - I know I’d never have got away with that.


Sadly, when I worked in the NHS they were as bad or worse about sorting out problems after "upgrades". However, having had a normal work day to sort things out and with no signs of progress, I am beginning to despair of them.


When I was supporting system upgrades you worked until the system worked - either fix forward or pull the upgrade and fall back to the starting position. You did not break the system then go home.
ID: 104997

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 104998 - Posted: 10 Aug 2021, 8:23:06 UTC

So, does anyone know whether the CPDN download servers have been re-connected to the internet yet? I'm on Andy Bowery's email distribution list, and I haven't seen anything yet, and I've completed upgrading my machines to Linux Mint v20.2.

Memo to project staff: the project shouldn't be restarted after maintenance until all components are tested and working.
ID: 104998

Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 105000 - Posted: 10 Aug 2021, 9:50:24 UTC - in response to Message 104998.  

It doesn't appear to be.
And Andy is probably "in a mood" by now, so I'm staying well away from it.

If Oxford IT hired external workers to do this, the air in the place has probably turned blue by now. :)
ID: 105000

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 105001 - Posted: 10 Aug 2021, 10:08:01 UTC - in response to Message 104998.  

I am enabling internet access whenever I have an upload or two ready to go. One is almost uploading at the moment. I am going to suspend access again when it has finished, as there is no movement on the downloads.

If it were possible to just suspend uploads or downloads I could leave internet access on and just check once a day to see whether the download server problem was fixed.
ID: 105001

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 105002 - Posted: 10 Aug 2021, 10:29:35 UTC - in response to Message 105001.  

CPDN needs to be aware that BOINC is designed to manage multiple projects in parallel, and that many of us use it that way. There was once a proposal by, I think, user 'Thyme Lawn' to allow/suspend transfers by project: he coded it for precisely this scenario, but it was rejected by the gatekeepers.

For that reason, I can't follow your example: all my recent tasks have declared their download errors to be permanent and have reported their task status as 'download failed'. I've set 'no new tasks' until I receive positive confirmation that the network is operating properly again.
ID: 105002

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 105003 - Posted: 10 Aug 2021, 10:35:19 UTC - in response to Message 105002.  
Last modified: 10 Aug 2021, 10:59:26 UTC

CPDN needs to be aware that BOINC is designed to manage multiple projects in parallel, and that many of us use it that way. There was once a proposal by, I think, user 'Thyme Lawn' to allow/suspend transfers by project: he coded it for precisely this scenario, but it was rejected by the gatekeepers.

For that reason, I can't follow your example: all my recent tasks have declared their download errors to be permanent and have reported their task status as 'download failed'. I've set 'no new tasks' until I receive positive confirmation that the network is operating properly again.


My downloads are now shifting: two 10 MB atmos.gz files have downloaded. The slow speed is, I think, my broadband rather than the servers getting hammered, though I guess that is probably happening as well.

Edit: the trickle server isn't running again yet though.

Edit2: Well, the server status page says that, at least. I will know in about ten minutes whether trickles are going through as well. One task has finished downloading, so that side seems to have been fixed.

Edit3: Does suspending the project stop the uploads/downloads?
ID: 105003

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 105005 - Posted: 10 Aug 2021, 12:08:10 UTC

Trickle server still showing as down after last update to server status page.
ID: 105005

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 105008 - Posted: 10 Aug 2021, 13:45:42 UTC - in response to Message 105003.  
Last modified: 10 Aug 2021, 13:46:09 UTC

Edit3: Does suspending the project stop the uploads/downloads?
I think not.
ID: 105008

Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 105009 - Posted: 10 Aug 2021, 15:37:59 UTC

I turned my net access back on a few hours ago, and the four that I had from before downloaded while I was sleeping.
ID: 105009

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 105010 - Posted: 10 Aug 2021, 17:26:48 UTC - in response to Message 105008.  

Edit3: Does suspending the project stop the uploads/downloads?
I think not.


A shame. That would be a simple solution.
ID: 105010

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 105048 - Posted: 12 Aug 2021, 16:48:32 UTC
Last modified: 12 Aug 2021, 17:22:14 UTC

At 15:24 UTC on 12 Aug 2021 Andy Bowery wrote:
All services have been restored now to climateprediction.net infrastructure. The Department of Engineering IT Support decided to roll back the changes they made to the networking. This has allowed us to restore all the CPDN services.
Edit: Yes, I can confirm that all files for new tasks are being downloaded cleanly.
ID: 105048

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.