News on Project Outages

Les Bayliss
Help desk expert

Joined: 25 Nov 05
Posts: 1357
Australia
Message 65759 - Posted: 3 Dec 2015, 20:04:49 UTC - in response to Message 65738.  

There are two possibilities:
1) Software failure, and nobody's noticed yet.
2) This is the long-anticipated (and dreaded) hardware failure, and the "maintenance" message is just the standard fallback message. Again, possibly no one has noticed yet, so it hasn't been changed to a "bad news" message.

I'm guessing the latter.
ID: 65759
Christopher Taylor
Joined: 3 Dec 15
Posts: 2
United States
Message 65760 - Posted: 3 Dec 2015, 20:30:28 UTC - in response to Message 65738.  

QCN has been down for quite a few days now. Any updates?
ID: 65760
Christopher Taylor
Joined: 3 Dec 15
Posts: 2
United States
Message 65761 - Posted: 3 Dec 2015, 20:31:38 UTC - in response to Message 65760.  

It also seems QCN no longer accepts my e-mail address when I try to log in at http://qcn.emsc-csem.org/sensor/login_form.php

Perhaps related to the outage?
ID: 65761
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1357
Australia
Message 65807 - Posted: 4 Dec 2015, 18:41:17 UTC

QCN is back up.
It was just a long maintenance break after all.

However, NO credits for a while.

There's a news item on the message boards.
ID: 65807
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1357
Australia
Message 65818 - Posted: 5 Dec 2015, 4:56:44 UTC

Message from Carl regarding QCN:
It should be up now - I took it down to do one final update for BOINC server software, and created a million workunits which should last two years if the servers stay up.
The project is now on "autopilot" as there is no funding or personnel left. It's possible CalTech will apply for funding & hire staff in the future.
I'm sorry about the credits, but BOINC has no automatic function for "non-compute-intensive" (NCI) applications such as QCN. Running the credit script means that the entire result and workunit tables (which are huge) have to be kept, and with our old hardware there's no way the project can go even a few months without these huge tables being pruned and archived.
ID: 65818
Dr Who Fan
Joined: 10 May 07
Posts: 559
United States
Message 65853 - Posted: 7 Dec 2015, 7:21:27 UTC

Have not been able to get BoincStats.com to load for around 6 hours now...

When attempting to load the web page, the CloudFlare server replies with:
Error 522 Ray ID: 250e771952982647 • 2015-12-07 07:20:15 UTC
Connection timed out

If you're the owner of this website:
Contact your hosting provider letting them know your web server is not completing requests. An Error 522 means that the request was able to connect to your web server, but that the request didn't finish. The most likely cause is that something on your server is hogging resources.


Confirmed it's down for the count: It's not just you! http://boincstats.com looks down from here
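
For anyone who wants to run the same kind of check from their own machine, here is a minimal sketch in Python. It assumes nothing about BoincStats itself: the function names `describe_status` and `check_site` are made up for illustration, and the table only covers the Cloudflare-specific 52x codes, which mean the proxy answered but the origin server behind it did not.

```python
import urllib.error
import urllib.request

# Cloudflare-specific status codes (not part of the HTTP standard).
# They are returned by the proxy when the origin server misbehaves.
CLOUDFLARE_ORIGIN_ERRORS = {
    520: "origin returned an unknown error",
    521: "origin web server is down",
    522: "connection to origin timed out",
    523: "origin is unreachable",
    524: "origin accepted the connection but timed out responding",
}


def describe_status(code: int) -> str:
    """Turn an HTTP status code into a short human-readable verdict."""
    if code in CLOUDFLARE_ORIGIN_ERRORS:
        return f"{code}: proxy is up, but {CLOUDFLARE_ORIGIN_ERRORS[code]}"
    if 200 <= code < 400:
        return f"{code}: site is responding"
    return f"{code}: site returned an error"


def check_site(url: str, timeout: float = 10.0) -> str:
    """Fetch a URL once and report what came back (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe_status(response.status)
    except urllib.error.HTTPError as err:
        # The server (or proxy) answered, but with an error status.
        return describe_status(err.code)
    except OSError as err:
        # DNS failure, refused connection, local timeout, and so on.
        return f"no HTTP response: {err}"


if __name__ == "__main__":
    print(check_site("http://boincstats.com"))
```

A 522 from this check would match what the error page above reports: the problem is between CloudFlare and the site's own server, not on the visitor's side.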
ID: 65853
Dr Who Fan
Joined: 10 May 07
Posts: 559
United States
Message 65858 - Posted: 7 Dec 2015, 15:15:40 UTC - in response to Message 65853.  

BoincStats.com was down due to a power failure and is back up...
News Item at BoincStats from Willy:
There was a power failure at the DC in one of the two mains power supplies and without power the servers don't do so well.

We do not have an automatic transfer switch (ATS) to switch between the two power supplies, so I had to drive up here to get things going again.

We had about a twelve hour downtime.

Addendum: I ordered an ATS to prevent this problem in the future.


and this addendum:
There will be another downtime in a couple of days when the ATS is placed.
ID: 65858
Thyme Lawn
Joined: 2 Sep 05
Posts: 101
United Kingdom
Message 65879 - Posted: 8 Dec 2015, 14:38:11 UTC
Last modified: 8 Dec 2015, 14:38:38 UTC

Announcement by the CPDN project team:

We will be taking the project offline tomorrow (Wednesday 9th December) from 10am (UK time) in order to take a snapshot of the database. This is part of the process of the re-configuration of a slave database machine. Once this snapshot process has completed we will bring the project back online again. We anticipate that this process will take a minimum of 24 hours to complete. We apologise in advance for any inconvenience.
ID: 65879
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 14159
Netherlands
Message 65912 - Posted: 9 Dec 2015, 15:39:44 UTC

Outage due to primary disk failure
Due to the primary disk finally failing completely, there was about 8h of downtime for the project today. For some reason the disk failure brought the server down even though we have been running from the backup disk since July.

The failed disk has been replaced now and the project is slowly catching up with work distribution. Apologies for the outage!

https://moowrap.net/
ID: 65912
Thyme Lawn
Joined: 2 Sep 05
Posts: 101
United Kingdom
Message 65947 - Posted: 10 Dec 2015, 15:11:41 UTC - in response to Message 65879.  

Announcement by the CPDN project team:

We will be taking the project offline tomorrow (Wednesday 9th December) from 10am (UK time) in order to take a snapshot of the database. This is part of the process of the re-configuration of a slave database machine. Once this snapshot process has completed we will bring the project back online again. We anticipate that this process will take a minimum of 24 hours to complete. We apologise in advance for any inconvenience.

The planned work has been completed and CPDN is now back online.
ID: 65947
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 65952 - Posted: 10 Dec 2015, 17:33:01 UTC

The Collatz database crash sequence seems to come down to this: expect it to happen, and plan for it by having alternative projects available.

The crash happens when MySQL goes unresponsive. There is no solid way to anticipate when that will happen, and thus no means to pre-empt an unplanned crash via a planned shutdown (or a scripted shutdown and restart).

The database crash happens after the server has been running for somewhere between 20 and 48 hours.

When it crashes, the recovery cycle takes between 4 hours and 24 hours, depending on a number of variables.

Until the length of the work units is significantly increased (something that is being worked on), the workload on the server (one of the apparent variables behind the semi-regular crashes) will continue to produce crashes at the current frequency.

The hope is that with a shift to the longer work units, the server load will be abated and the frequency of crashes reduced from the current 2 to 3 times a week to less than once a week.
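
To put rough numbers on the figures above, here is a small back-of-the-envelope sketch in Python. It assumes a simplified model that is not from the project itself: the server repeats a fixed cycle of running for some hours and then recovering for some hours. The function names are made up for illustration.

```python
HOURS_PER_WEEK = 7 * 24


def availability(uptime_hours: float, recovery_hours: float) -> float:
    """Fraction of time the server is up in a repeating run/recover cycle."""
    return uptime_hours / (uptime_hours + recovery_hours)


def crashes_per_week(uptime_hours: float, recovery_hours: float) -> float:
    """Number of full run/recover cycles (one crash each) that fit in a week."""
    return HOURS_PER_WEEK / (uptime_hours + recovery_hours)


if __name__ == "__main__":
    # Bounds taken from the post: 20 to 48 hours of running,
    # 4 to 24 hours of recovery.
    for up, down in [(48, 4), (48, 24), (20, 4), (20, 24)]:
        print(
            f"run {up}h, recover {down}h: "
            f"{availability(up, down):.0%} available, "
            f"{crashes_per_week(up, down):.1f} crashes/week"
        )
```

Under this model the server is available somewhere between 20/44 and 48/52 of the time, and the 2 to 3 crashes a week mentioned above correspond to cycles of roughly 56 to 84 hours in total.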
ID: 65952
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 14159
Netherlands
Message 65975 - Posted: 11 Dec 2015, 15:37:16 UTC

Cosmology@Home is back up.

Marius wrote:
We were down for the last ~24hrs due to some networking issues which are now fixed. We apologize for the inconvenience.
ID: 65975
Jesse Viviano
Joined: 14 Feb 11
Posts: 61
United States
Message 66304 - Posted: 20 Dec 2015, 11:33:10 UTC

Constellation appears to be down.
ID: 66304
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 66325 - Posted: 21 Dec 2015, 16:31:10 UTC

The POEM site currently can't be accessed at all.
ID: 66325
Jesse Viviano
Joined: 14 Feb 11
Posts: 61
United States
Message 66437 - Posted: 26 Dec 2015, 17:48:26 UTC - in response to Message 66325.  
Last modified: 26 Dec 2015, 17:48:41 UTC

I am able to access the POEM site.
ID: 66437
Jesse Viviano
Joined: 14 Feb 11
Posts: 61
United States
Message 66649 - Posted: 8 Jan 2016, 2:35:49 UTC

NFS@home has been going up and down for the last few days.
ID: 66649
Jesse Viviano
Joined: 14 Feb 11
Posts: 61
United States
Message 66702 - Posted: 9 Jan 2016, 1:07:57 UTC

It looks like NFS@home is fully down for today.
ID: 66702
Jesse Viviano
Joined: 14 Feb 11
Posts: 61
United States
Message 66707 - Posted: 9 Jan 2016, 3:29:32 UTC

NFS@home is up. It turns out that it was moving to another data center, and I had missed the announcement that it was moving.
ID: 66707
Matt Kowal
Joined: 16 Dec 15
Posts: 15
United States
Message 66864 - Posted: 16 Jan 2016, 2:35:00 UTC

Collatz is down as of Friday, Jan 15th
ID: 66864
Matt Kowal
Joined: 16 Dec 15
Posts: 15
United States
Message 66925 - Posted: 18 Jan 2016, 1:02:04 UTC - in response to Message 66864.  

Collatz is back up as of Sunday, Jan 17th
ID: 66925

Copyright © 2020 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.