News on project outages.

Ageless
Volunteer moderator
Joined: 29 Aug 05
Posts: 8743
Message 20269 - Posted: 15 Sep 2008, 22:11:18 UTC

Continuing the news outage thread in a brand new coating.

Last news was posted by mo.v

CPDN main project

One of the upload servers has been down but it's now up and running again.

An extra server has been ordered for CPDN which is good news.


For clarity, please don't add your signature when you post to this thread.

mo.v
Joined: 13 Aug 06
Posts: 775
Message 20278 - Posted: 16 Sep 2008, 6:32:05 UTC
Last modified: 16 Sep 2008, 6:53:57 UTC

I'm sorry also to be starting the new thread.

CPDN, Beta, BBC and SAP

The forums and websites of all CPDN projects lost communication with their regional network about 30 minutes ago. My Beta models have just been unable to upload trickles so I expect the situation is the same on all projects. All Oxford University servers on the ja.net network are probably affected.

Edit: The network is now fully up again - Oxford Uni including all our projects is no longer incommunicado.

Ageless
Volunteer moderator
Joined: 29 Aug 05
Posts: 8743
Message 20314 - Posted: 16 Sep 2008, 22:46:01 UTC

The Einstein database server crashed; David H. is busy bringing the project back online now.

Ageless
Volunteer moderator
Joined: 29 Aug 05
Posts: 8743
Message 20321 - Posted: 17 Sep 2008, 0:02:52 UTC

Einstein's servers are back up.

mo.v
Joined: 13 Aug 06
Posts: 775
Message 20709 - Posted: 8 Oct 2008, 16:00:07 UTC

CPDN main project and CPDN Beta

Milo says

We've just received a large server that will give us another 10TB of space, and will try to set it up this week. This means that there may be outages on uploader.oerc and cpdn-upload1.comlab that may prevent uploading.

This is good news for CPDN and the programmers.

It would be a good idea for members to keep BOINC network activity suspended most of the time this week and only enable it when they are present and can check that their trickles and files upload. We realise that many multi-project crunchers cannot do this.

You can check CPDN server status here and Beta server status here.

Ralph
Joined: 30 Sep 05
Posts: 50
Message 20738 - Posted: 10 Oct 2008, 13:12:55 UTC

Cosmology's quite happy to start a download, then it gives interminable HTTP errors.

mo.v
Joined: 13 Aug 06
Posts: 775
Message 20896 - Posted: 20 Oct 2008, 11:23:25 UTC

CPDN main project

The CPDN database including the CPDN-BOINC forum has been down this morning for maintenance. Trickles and file uploads should not have been affected but the awarding of credits and their export to the stats sites may be delayed.

Edit: The BBC database and forum also appear to be down.

Ageless
Volunteer moderator
Joined: 29 Aug 05
Posts: 8743
Message 20904 - Posted: 20 Oct 2008, 16:56:28 UTC

It looks like malariacontrol.net is offline. Does anyone in the know have information?

mo.v
Joined: 13 Aug 06
Posts: 775
Message 20908 - Posted: 20 Oct 2008, 22:53:38 UTC

CPDN main project and BBC

Update to my post two above this:

BBC and CPDN trickles and uploads are failing. Milo has had to turn off all BOINC services for both projects so that the database isn't updated while he makes a copy of it. If possible, please suspend BOINC network activity until the server status page is available and shows the upload server programs working.

Chap
Joined: 7 Aug 08
Posts: 4
Message 20916 - Posted: 21 Oct 2008, 18:31:28 UTC - in response to Message 20904.

Malariacontrol.net seems to be back on-line again after a couple of days away. Nothing about the outage on their home page so far tho...

Ageless
Volunteer moderator
Joined: 29 Aug 05
Posts: 8743
Message 20917 - Posted: 21 Oct 2008, 18:38:41 UTC - in response to Message 20916.

From Nick at Malariacontrol.net:

It seems that the web-server was very unreliable over the last few days. As a consequence, the website and also the job scheduler here on the server could be reached only intermittently.

We have restarted the server and currently it seems to work fine. We have increased the grace period for overdue results to 3 three days to minimize the impact of this problem on your credit accounts. In addition, we have set up an additional monitoring service that will help us respond faster in case this should happen again.

We apologize for the inconvenience!
Nick

Chap
Joined: 7 Aug 08
Posts: 4
Message 20918 - Posted: 21 Oct 2008, 19:24:34 UTC - in response to Message 20917.
Last modified: 21 Oct 2008, 19:25:15 UTC

Thanks for the info Mr Ageless, sir.

"grace period"? I hadn't realised there was a grace period! :-)

Ageless
Volunteer moderator
Joined: 29 Aug 05
Posts: 8743
Message 20919 - Posted: 21 Oct 2008, 19:30:44 UTC - in response to Message 20918.
Last modified: 21 Oct 2008, 19:32:03 UTC

"grace period"? I hadn't realised there was a grace period! :-)

I like it that it's "3 three days" .. does that mean it's 9 or 33 days? ;-)

mo.v
Joined: 13 Aug 06
Posts: 775
Message 20949 - Posted: 24 Oct 2008, 13:41:49 UTC

CPDN main project and perhaps also BBC

Milo has posted on the independent forum:

The main database has been back on-line since yesterday; thanks for your patience during this outage.
We are still undergoing some database maintenance and cleanup and so there may well be periods in the near future when it goes down again. We will attempt to give as much warning as possible when this is likely to happen.

mo.v
Joined: 13 Aug 06
Posts: 775
Message 21149 - Posted: 6 Nov 2008, 3:20:43 UTC

CPDN main project

A member trying to upload a model's zip file has these BOINC messages:

|climateprediction.net|[error] Error on file upload: can't open file /home/cpdn/boinc/hadsm3fub_k2xu_005968597_1_3.zip: No space left on device
|climateprediction.net|Temporarily failed upload of hadsm3fub_k2xu_005968597_1_3.zip: transient upload error

The 'device' doesn't mean your own computer's disk. A server disk in Oxford has filled up. Members who see this message should, if possible, suspend BOINC network activity until Milo can fix the server problem. Trickles may also be affected.

Because there are several CPDN upload servers, only some models will be affected.

mo.v
Joined: 13 Aug 06
Posts: 775
Message 21164 - Posted: 7 Nov 2008, 23:16:54 UTC
Last modified: 7 Nov 2008, 23:17:30 UTC

That server problem has now been fixed.

But please see the CPDN website index page for details of a planned server maintenance outage on Monday 10 Nov. Members who can do so should suspend BOINC network activity before the outage begins. Credits will probably not be exported to the stats sites until Tuesday or Wednesday.

mo.v
Joined: 13 Aug 06
Posts: 775
Message 21207 - Posted: 12 Nov 2008, 12:07:28 UTC

CPDN main project and BBC

Milo says 'The database and BOINC daemons are now back on line - thanks again for everyone's patience.'

Heidi1
Joined: 30 Mar 08
Posts: 18
Message 21238 - Posted: 14 Nov 2008, 6:06:46 UTC

I'm wondering if the Rosetta servers are down. My computer is communicating with every project except theirs.

Heidi1
Joined: 30 Mar 08
Posts: 18
Message 21246 - Posted: 14 Nov 2008, 17:48:31 UTC

It turns out their file servers crashed the night of Nov 13-14. The project may still have intermittent problems, but it is up and running right now.

mo.v
Joined: 13 Aug 06
Posts: 775
Message 21308 - Posted: 17 Nov 2008, 22:45:36 UTC

One of the CPDN main project upload servers has been intermittently refusing to accept trickle and file uploads, and Milo hasn't yet discovered why. Because of this server's location, he can only restart it during working hours.

If your model's uploads are affected, you can suspend BOINC network activity and from time to time check CPDN server status here.

Copyright © 2014 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.