Message boards : Projects : News on Project Outages
Author | Message |
---|---|
Joined: 5 Oct 06 Posts: 5129 |
Two tasks reported, four tasks allocated, four sets of downloads failed. You'd have thought they knew about that one by now :-( |
Joined: 28 Jun 10 Posts: 2706 |
Got this from Andy. Hi Dave, Thanks. Engineering IT Support have partially restored networking to a number of machines, but a number of key machines still have no networking access following the switch work on Tuesday. I have submitted a ticket to them for the other machines and I will follow this up on Monday with them. Best regards, Andy |
Joined: 5 Oct 06 Posts: 5129 |
I'm not very impressed by the Oxford University Engineering IT Support team. They will have scheduled this work for the summer vacation, when the undergraduate demand is low: but university postgrad and faculty research continues 52 weeks of the year. This is also a very busy time of year for university administration, dealing with applications from next year's intake of new students. Letting a planned infrastructure upgrade over-run by a week is bad management, to say the least. |
Joined: 31 Dec 18 Posts: 296 |
I'm not very impressed by the Oxford University Engineering IT Support team. They will have scheduled this work for the summer vacation, when the undergraduate demand is low: but university postgrad and faculty research continues 52 weeks of the year. This is also a very busy time of year for university administration, dealing with applications from next year's intake of new students. And then not working the weekend to clear the problem - I know I’d never have got away with that. |
Joined: 28 Jun 10 Posts: 2706 |
And then not working the weekend to clear the problem - I know I’d never have got away with that. Sadly, when I worked in the NHS they were as bad or worse about sorting out problems after "upgrades". However, having had a normal work day to sort things out with no signs of progress, I am beginning to despair of them. |
Joined: 31 Dec 18 Posts: 296 |
And then not working the weekend to clear the problem - I know I’d never have got away with that. When I was supporting system upgrades you worked until the system worked - either fix forward or pull the upgrade and fall back to the starting position. You did not break the system then go home. |
Joined: 5 Oct 06 Posts: 5129 |
So, does anyone know whether the CPDN download servers have been re-connected to the internet yet? I'm on Andy Bowery's email distribution list, and I haven't seen anything yet - and I've completed upgrading my machines to Linux Mint v20.2. Memo to project staff: the project shouldn't be restarted after maintenance until all components are tested and working. |
Joined: 25 Nov 05 Posts: 1654 |
It doesn't appear to be. And Andy is probably "in a mood" by now, so I'm staying well away from it. If Oxford IT hired external workers to do this, the air in the place has probably turned blue by now. :) |
Joined: 28 Jun 10 Posts: 2706 |
I am enabling internet access if I have an upload or two ready to go. One is almost uploading at the moment. I am going to suspend it again when it has finished as no movement on the downloads. If it were possible to just suspend uploads or downloads I could leave internet access on and just check once a day to see whether the download server problem was fixed. |
Joined: 5 Oct 06 Posts: 5129 |
CPDN needs to be aware that BOINC is designed to manage multiple projects in parallel, and that many of us use it that way. There was once a proposal by, I think, user 'Thyme Lawn' to allow/suspend transfers by project: he coded it for precisely this scenario, but it was rejected by the gatekeepers. For that reason, I can't follow your example: all my recent tasks have declared their download errors to be permanent and have reported their task status as 'download failed'. I've set 'no new tasks' until I receive positive confirmation that the network is operating properly again. |
Joined: 28 Jun 10 Posts: 2706 |
CPDN needs to be aware that BOINC is designed to manage multiple projects in parallel, and that many of us use it that way. There was once a proposal by, I think, user 'Thyme Lawn' to allow/suspend transfers by project: he coded it for precisely this scenario, but it was rejected by the gatekeepers. My downloads are now shifting - two 10MB atmos.gz files have downloaded. The slow speed is, I think, my bored band rather than the servers getting hammered, though I guess that is probably happening as well. Edit: the trickle server isn't running again yet though. Edit2: Well, the server status page says that at least. I will know in about ten minutes whether trickles are going through as well. One task has finished downloading so that side seems to have been fixed. Edit3: Does suspending the project stop the uploads/downloads? |
Joined: 28 Jun 10 Posts: 2706 |
Trickle server still showing as down after last update to server status page. |
Joined: 5 Oct 06 Posts: 5129 |
Edit3: Does suspending the project stop the uploads/downloads? I think not. |
Joined: 25 Nov 05 Posts: 1654 |
I turned my net access back on a few hours ago, and the four that I had from before downloaded while I was sleeping. |
Joined: 28 Jun 10 Posts: 2706 |
Edit3: Does suspending the project stop the uploads/downloads? I think not. A shame. That would be a simple solution. |
Joined: 5 Oct 06 Posts: 5129 |
At 15:24 UTC on 12 Aug 2021 Andy Bowery wrote: All services have been restored now to climateprediction.net infrastructure. The Department of Engineering IT Support decided to roll back the changes they made to the networking. This has allowed us to restore all the CPDN services. Edit: Yes, I can confirm that all files for new tasks are being downloaded cleanly. |
Joined: 28 Jun 10 Posts: 2706 |
But their having to roll back the network changes on top of the time taken over and above that scheduled has me joining your verdict on the IT staff at Oxford. |
Joined: 5 Oct 06 Posts: 5129 |
Staff, or outside contractors? |
Joined: 28 Jun 10 Posts: 2706 |
Staff, or outside contractors? I have no idea but the level of incompetence is the same. |
Joined: 5 Oct 06 Posts: 5129 |
Andy reports that a new problem - tentatively identified as a hardware failure - has been observed on the CPDN 'dev' (test) servers. He has shut down "the project" to minimise data loss, and fears that this closure may last for several days. The main, production, CPDN server is also reporting 'shut down for maintenance'. I have asked Andy to clarify whether both versions of the project need to be shut down because of the single hardware failure, and am awaiting his reply. |
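
For anyone wanting to script the workaround discussed earlier in the thread (setting 'no new tasks' for one project and switching network access on only when an upload is ready), here is a minimal sketch that builds the corresponding `boinccmd` command lines. It only prints the commands rather than running them. The project URL is an assumption (use whatever URL your client shows for CPDN), and the function names are made up for illustration. Note that the network mode applies to the whole client, not to a single project, which is exactly the limitation raised above.

```python
import shlex

# Assumption: placeholder project URL; replace with the URL your BOINC client lists.
PROJECT_URL = "https://www.cpdn.org/"


def no_new_tasks_cmd(project_url, allow=False):
    """Build the boinccmd call that sets or clears 'no new tasks' for one project."""
    op = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, op]


def network_mode_cmd(mode, duration_s=0):
    """Build the boinccmd call that sets client-wide network activity.

    mode is 'always', 'auto' or 'never'; a positive duration_s makes the
    setting temporary (in seconds).
    """
    if mode not in ("always", "auto", "never"):
        raise ValueError("mode must be 'always', 'auto' or 'never'")
    cmd = ["boinccmd", "--set_network_mode", mode]
    if duration_s > 0:
        cmd.append(str(duration_s))
    return cmd


if __name__ == "__main__":
    # Print the commands instead of executing them, so nothing changes by accident.
    for cmd in (
        no_new_tasks_cmd(PROJECT_URL),
        network_mode_cmd("never"),
        network_mode_cmd("always", duration_s=3600),
    ):
        print(shlex.join(cmd))
```

Running the printed commands by hand (or passing the lists to `subprocess.run`) is left to the reader; `boinccmd` may need to be run from the BOINC data directory or given a password, depending on the installation.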
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.