Thread 'News on Project Outages'

Message boards : Projects : News on Project Outages

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17356 - Posted: 16 May 2008, 0:11:53 UTC


What is wrong with the SAP server?

If you don't want to know the technical details, you only need to read the green section below.

Milo says
Apologies for the SAP problems.
What is going on is as follows. The server software needs upgrading as well as the whole OS re-installing, new disks adding and the database fixing. Some of this we don't have time to do and others we can't due to the location of the server. So, as a temporary measure I changed the server so that requests would be re-directed to the main site instead, thus avoiding some of the broken bits.
This was supposed to not affect BOINC, but has not worked as planned. I can quite quickly restore the server to its previous state which will fix BOINC, but I can't ssh in to Physics right now. When I do, and revert it to its previous state, it will need some more updating and so it will end up going up and down again as before.
With luck I can get this done tomorrow morning.


The SAP server is located in a different department at Oxford University, which is why Milo hasn't got full and normal access to it. The problem is exacerbated by a recent serious Linux security problem/vulnerability affecting computers with Debian and Ubuntu. Most of the Physics department computers use Ubuntu. As a result of this, the Physics department has further restricted access to its computers and servers, including the SAP server, until they are sure the security problem is fixed.

In addition, the Physics Department only allows Milo and Tolu to access the SAP server remotely via a single CPDN computer, and this computer isn't behaving very well at the moment.

We will try to keep members informed about what's happening, but of course Milo can't constantly keep in touch with us while he's fixing problems.

If you have a SAP HADAM model:

* Don't detach from the CPDN or SAP projects
* Don't reset the ClimatePrediction or SAP projects
* Don't abort any SAP HADAM models
* Do suspend HADAM models in the Tasks tab of BOINC manager
* If you are already receiving unpleasant BOINC manager messages about your HADAM model, suspend BOINC network activity if possible.


ID: 17356
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17359 - Posted: 16 May 2008, 11:16:17 UTC

SAP

Milo has fixed the immediate problem of the SAP redirect to CPDN that didn't work as planned and produced the dreadful BOINC manager messages quoted two posts above this. If you have messages like that, please ignore them and don't do what they suggest!

The SAP website and forum are now running but the SAP server status page looks as if the SAP server won't accept trickle and zip file uploads at the moment.

ID: 17359
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17366 - Posted: 17 May 2008, 15:20:12 UTC

The problem of the SAP server redirect to CPDN doesn't appear to have been really solved. Thyme Lawn has posted a fix on the SAP forum here.
ID: 17366
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17368 - Posted: 18 May 2008, 16:09:57 UTC

Thyme Lawn has now modified those instructions. The link to them in the above post is still correct.
ID: 17368
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17369 - Posted: 18 May 2008, 21:22:43 UTC

SAP

This afternoon Milo managed to get into the Physics department. He has cancelled the SAP server redirect to the CPDN server and upgraded the server software. The SAP server is now completely up and running, though Milo can't guarantee its future reliability.

It should now be possible to run SAP models again and allow them to send trickles and uploads to the server.
ID: 17369
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15554
Netherlands
Message 17370 - Posted: 18 May 2008, 21:23:27 UTC

Hydrogen@Home has problems.

Jack Shultz wrote:
So I tried to restore the snapshot. Turns out that snapshot had a corruption. Good news is I got the service provider to recover the project directory and post it on an FTP so I can rebuild and restore everything like new. I have a database backup. I expect to have it back up in the next two days.

Jack
ID: 17370
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15554
Netherlands
Message 17448 - Posted: 23 May 2008, 15:56:18 UTC

You wonder what happened to Hydrogen@Home?

Jack Shultz wrote:
Hi All,

I just made a snapshot. After which I deleted something unintentionally, I just restored the snapshot, when I boot, none of the services are initiating. No apache, ssh, no ping. I submitted a ticket for this provider to look at it. Hopefully I don't have to rebuild again. If that's the case, I'm going to start looking at new providers. I can't run the project if the snapshots are unreliable.

Jack
ID: 17448
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 17528 - Posted: 29 May 2008, 8:06:27 UTC

CPDN & Seasonal Attribution Project

Some more work has been done on the servers, which should fix the last problems with trickles, on all sites.

ID: 17528
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17594 - Posted: 1 Jun 2008, 12:37:58 UTC

CPDN main project

CPDN server status currently appears normal

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/server_status.php

but model results pages are in many cases hanging and refusing to display. Milo will look at the database on the server tomorrow to see whether there's an overload.
ID: 17594
Thyme Lawn
Joined: 2 Sep 05
Posts: 103
United Kingdom
Message 17598 - Posted: 1 Jun 2008, 17:22:24 UTC

CPDN main project

The CPDN server (climateapps2.oucs.ox.ac.uk) has been unstable since about 1000 UTC today. Milo did sort out the initial problem by restarting the HTTP server, but most connections have been failing since about 1200 UTC. This is likely to be the case until at least Monday morning.

The current problems will affect all scheduler requests (i.e. trickles, reporting completed results and work requests). Uploads will continue to work as they are sent to different servers.

The other CPDN projects (BBC Climate Change Experiment, Seasonal Attribution Project and CPDN Beta) are unaffected.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 17598
The_Shadow_Knows
Joined: 4 Jun 08
Posts: 1
Canada
Message 17662 - Posted: 4 Jun 2008, 22:26:18 UTC

What happened to The Lattice Project?
ID: 17662
mewbysea
Joined: 9 Jul 06
Posts: 6
United States
Message 17735 - Posted: 8 Jun 2008, 15:21:47 UTC - in response to Message 17662.  

What happened to The Lattice Project?


June 6, 2008 Unexpected Outage
Some days ago we experienced an HVAC failure which caused our main file server to overheat. We have since been working on fixing the problems that resulted, and are still in the process of doing so, but we hope to be back online and fully functional shortly. As for the outstanding work in the system, we'll just have to see how well the system recovers. Sorry for the extended absence!


The servers are back up, but there is no work available this weekend. Perhaps by mid-day on 9 June.
ID: 17735
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17764 - Posted: 10 Jun 2008, 18:16:00 UTC

CPDN main project

Tolu worked on the database this morning to try to speed it up. He disabled the database server over lunchtime so the CPDN-BOINC forum was down for a while as well as access to our account and model pages. Everything appears to be running normally now.
ID: 17764
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17790 - Posted: 12 Jun 2008, 1:58:21 UTC

CPDN Beta2 project only

Here's the Beta2 server status page showing one of the server programs down:

http://cpdnbeta.oerc.ox.ac.uk/server_status.php

If possible, suspend network activity to avoid multiple failed trickle attempts. Zip files will probably upload successfully, but the accompanying trickle will not.
ID: 17790
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17821 - Posted: 12 Jun 2008, 19:54:55 UTC
Last modified: 12 Jun 2008, 19:55:33 UTC

The CPDN Beta2 server programs have all now been up and running for several hours.
ID: 17821
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15554
Netherlands
Message 17861 - Posted: 13 Jun 2008, 17:49:19 UTC

Seeing all the errors at Einstein@Home, I think they're now technically off line. No news from anyone on what is happening though.
ID: 17861
Nothing But Idle Time
Joined: 7 Nov 05
Posts: 17
United States
Message 17869 - Posted: 14 Jun 2008, 12:50:59 UTC - in response to Message 17861.  

Seeing all the errors at Einstein@Home, I think they're now technically off line. No news from anyone on what is happening though.

Einstein is working fine for me, didn't notice any complaints on the msg boards; what are the "errors" to which you refer?
ID: 17869
Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 17870 - Posted: 14 Jun 2008, 15:11:43 UTC - in response to Message 17869.  
Last modified: 14 Jun 2008, 15:18:50 UTC

Einstein is working fine for me, didn't notice any complaints on the msg boards; what are the "errors" to which you refer?

It's working for me too (again). Einstein was offline for some time yesterday. I was wondering about the lack of information about the outage.
ID: 17870
mewbysea
Joined: 9 Jul 06
Posts: 6
United States
Message 17981 - Posted: 21 Jun 2008, 11:21:10 UTC

Looks like the Lattice website is off-line again since about 19 June. The servers have been out of work all week, but are still responding to BOINC mgr queries -- to say the project is down for maintenance.
ID: 17981
Ralph
Joined: 30 Sep 05
Posts: 50
Message 18087 - Posted: 28 Jun 2008, 22:20:57 UTC

Milkyway seems to be inaccessible.
I tried this morning and it connected and gave the message 'the servers may be down.'

Now I get a 'connect error'.

My dialup won't go to the website.
ID: 18087

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.