Thread 'News on Project Outages'

Bernd
Joined: 24 Aug 09
Posts: 91
United States
Message 35011 - Posted: 29 Sep 2010, 18:20:25 UTC - in response to Message 35006.  

Is there an issue with SETI@home servers? They seem to have been down more than a day.

Yes, the servers have been down for several days. The issue is that the upload server filled its storage area for returned results. The server was scheduled for decommissioning within the next week. The move to the newer server appears to have hit a bump. They had hoped to have it online Tuesday.
ID: 35011
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 35012 - Posted: 29 Sep 2010, 19:17:21 UTC

SETI remains offline -- at this point, in the absence of any follow-up information since Sunday, all guesses as to when their servers will come back online are just that: guesses. Could be today, could be a week from now.

DNETC has been offline since last night (about 15 hours so far). I believe they were looking to do a software upgrade. In the past month, each upgrade they have deployed (software about a month ago, hardware two weeks ago, and the current software upgrade) has resulted in them being offline for 2 to 4 days. I suspect we will see the same sequence here. Either later today or early tomorrow, they will begin to get their home page back up. Getting the servers back online (they may only have a single server) will take another day or 2. Not sure why the process for DNETC is so disruptive, but we can hope that this is their last upgrade for quite a while (given how troublesome each upgrade has been for them).
ID: 35012
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15540
Netherlands
Message 35014 - Posted: 29 Sep 2010, 21:00:13 UTC

Bruce Allen answered me on my query about the Einstein trouble going on today:

Culprit was a DB query that one of the admins started. It should have been run on a replica DB, but was mistakenly run on the live project DB instead. Query was killed, all should be back to normal.
ID: 35014
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 35015 - Posted: 29 Sep 2010, 23:43:37 UTC

Looks like they are making progress over at SETI -- might be back online for uploads and downloads tomorrow (Thursday).

To compensate for that, looks like MilkyWay is now offline (no connectivity at all as of 4:30 PM PDT).

Dnetc remains offline as well.
ID: 35015
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 35016 - Posted: 30 Sep 2010, 0:03:48 UTC

Not so much an outage -- but I am guessing that a recent update to the BOINC server-side software is causing some problems with the Validator side of things -- looks like it has affected a handful of projects this week (Malaria, Aqua, MilkyWay, Einstein).
ID: 35016
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15540
Netherlands
Message 35017 - Posted: 30 Sep 2010, 0:29:25 UTC

Seti update:

Jeff Cobb, Seti admin wrote:
Upload service is successfully migrated. We now have over 4TB of space for uploads! Not that we'd ever use that much. There are still a couple of kinks to work out with other minor server migrations. Thus, we will leave the back end (validation, etc) down tonight. The splitters and data service are up.
ID: 35017
Bernd
Joined: 24 Aug 09
Posts: 91
United States
Message 35020 - Posted: 30 Sep 2010, 4:32:28 UTC

Rosetta seems to be offline. The main website is down and BOINC can't connect.
ID: 35020
Warped
Joined: 25 Aug 08
Posts: 39
South Africa
Message 35046 - Posted: 1 Oct 2010, 3:09:34 UTC - in response to Message 35020.  

Rosetta seems to be offline. The main website is down and BOINC can't connect.


Indeed - they're still down, with no indication of what is happening.
ID: 35046
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15540
Netherlands
Message 35049 - Posted: 1 Oct 2010, 9:51:15 UTC - in response to Message 35046.  

Rosetta pages are back. Uploads are still down.
ID: 35049
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15540
Netherlands
Message 35059 - Posted: 2 Oct 2010, 13:00:09 UTC - in response to Message 35049.  

Rosetta Project News wrote:
Oct 1, 2010
A few days ago our main filesystem went down but we are finally back to normal operation. Sorry for any inconvenience.
ID: 35059
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 35099 - Posted: 4 Oct 2010, 15:57:29 UTC

SETI is offline again. On Sunday, they focused on allowing uploads and downloads to run and turned off everything else. Overnight, uploads and downloads stopped working as well. At this point, since SETI can't indicate the status or schedule, it is unclear when things will return to SETI 'normal'.

I am really beginning to think they are running out of duct tape over there.


Einstein has a problem with misdirected results -- forcing them into manual validation of reported work. Overnight their scheduler went offline -- good news -- that stops the incoming stream of results they have to manually validate. Bad news, that stops the outgoing stream of work (and downloads as well). Haven't seen a schedule for returning to service from Einstein -- though their newsgroups are functioning.

ID: 35099
Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 35102 - Posted: 4 Oct 2010, 16:36:48 UTC - in response to Message 35099.  
Last modified: 4 Oct 2010, 16:37:26 UTC

SETI isn't offline at all, only the forum is down. My host uploaded at 14:17 UTC:
04/10/2010 16:17:05|SETI@home|Started upload of 08my10ac.30489.18881.9.10.55.vlar_2_0
04/10/2010 16:17:10|SETI@home|Finished upload of 08my10ac.30489.18881.9.10.55.vlar_2_0
04/10/2010 16:17:10|SETI@home|[file_xfer_debug] Throughput 11545 bytes/sec
And the SETI homepage explains:
The database used for web requests is down, and the main database cannot handle both web requests and work distribution. We have disabled web access to allow work distribution to occur smoothly.
3 Oct 2010 18:29:16 UTC

Gruß,
Gundolf
ID: 35102
Bernd
Joined: 24 Aug 09
Posts: 91
United States
Message 35107 - Posted: 5 Oct 2010, 2:19:10 UTC

Something broke at SETI in the last few hours. The server status page went from almost all green to 3/4 red (not running). Hate it when things go bump in the night.

No update posted to the front page yet.
ID: 35107
Gary Charpentier
Joined: 23 Feb 08
Posts: 2486
United States
Message 35108 - Posted: 5 Oct 2010, 2:34:48 UTC

SETI@Home:
Projects are down due to a database machine crash.
The machine that serves the BOINC database crashed. The projects are down until the recovery is complete. 5 Oct 2010 2:27:57 UTC

ID: 35108
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 35110 - Posted: 5 Oct 2010, 3:44:40 UTC

Einstein scheduler is working now -- things are a tad 'pushed' there while they recover from a batch of work units that got uploaded to the wrong server a week ago -- so they are doing some extra work to get things sorted with the validator. While they clear that, there is additional stress on all the processes -- should clear up over the next week.

SETI is, well, SETI. Possibly simply a case of too many users, too many workunits, and 'band-edge' of the existing hardware (and perhaps software).

Folks onsite are probably working very hard to get things back together -- though I suspect once that happens, the surge effect of thousands of work units hitting the server may cause additional issues. Pretty much a constant struggle these days over there.
ID: 35110
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 35128 - Posted: 6 Oct 2010, 15:46:39 UTC

SETI remains essentially totally offline as they attempt to cope with their latest problem (apparently a total database collapse). Once they resolve that, given the length of this outage on top of the previous outages and problems they have endured over the past few weeks, it is reasonable to expect a pretty severe and extended post-outage traffic jam.

One hopes that SETI will eventually get their hardware and software environment up to a level which can handle the still very large group of SETI users.

Personally, another hope I have is that the current issues that SETI is working through will encourage SETI-only users to spread their CPU/GPU wealth to other worthy projects, thus not only helping those other projects in their endeavors, but also, by reducing the very high I/O and processing load on SETI, giving the SETI project a bit of room to breathe.
ID: 35128
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 35129 - Posted: 6 Oct 2010, 16:02:56 UTC
Last modified: 6 Oct 2010, 16:06:14 UTC

CPDN main project

Climateapps2, which holds the CPDN database, is down for maintenance while Milo copies data from it to prevent its disk from filling up. He's been copying data for several hours and this is a long job. Members may still be able to download new models, though this depends on which parts of climateapps2 he needs to close. We can still upload files, which go to the upload servers.

The CPDN server status page
ID: 35129
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 35135 - Posted: 7 Oct 2010, 21:09:34 UTC

CPDN main project

The copying from climateapps2 is taking a long time. We can upload files but not trickles. We cannot download new models, so if your computer has idle core(s) please fetch work from another project. Milo will probably complete this maintenance before the weekend. We will need to wait patiently for our credits.
ID: 35135
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 35148 - Posted: 8 Oct 2010, 18:54:51 UTC

CPDN main project

Milo has now succeeded in backing up the entire directory of climateapps2 but has to copy data back to a storage disk. This could be another long, slow job. We still can't upload trickles, report completed tasks or download new models. But we can still upload files and Milo has turned on the CPDN-Boinc forum. Our accounts and task web pages are now available.

He is running the program that gives us our credits, though these may not be exported to the stats sites until Saturday or even Sunday.

Thank you to all members for your patience.
ID: 35148
whynot
Joined: 8 May 10
Posts: 90
Ukraine
Message 35155 - Posted: 9 Oct 2010, 15:57:11 UTC

RCN seems to have been down (server and forums) since sometime before midnight Oct 7 (EEST). That has happened before, but it is taking too long this time. Any news?

I'm counting for science,
points just make me sick.
ID: 35155

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.