Thread 'News on Project Outages'

elgordodude
Joined: 9 Jan 11
Posts: 2
United States
Message 40715 - Posted: 19 Oct 2011, 3:07:00 UTC

Any word on Virtual Prairie? They've been out of work for six months or so, nothing new there, but the site has been down for a week as well. Starting to look like it might be done...
ID: 40715
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 40722 - Posted: 19 Oct 2011, 19:10:26 UTC

From the Climate sysadmin:

Security incident: project down
by jaamiller » Wed Oct 19, 2011 12:00 pm

On 18 October 2011 the CPDN project server was attacked.

We have had to respond by taking the project offline.

I will provide more info as I have it, but I will be devoting my time to fixing the issues and determining the extent of the problem.

Jonathan
Jonathan Miller
CPDN SysAdmin
ID: 40722
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 40738 - Posted: 20 Oct 2011, 18:13:14 UTC - in response to Message 40714.  

Dnet came back online about 8 hours later --- no comment on the website regarding the outage. (10/18)

Dnet went offline again about 3 hours ago (8AM PDT), like on Tuesday. Perhaps the site will go back online in another 5 hours (history repeats itself). Perhaps we will see some explanation of the problem -- perhaps not (history repeats itself).

Dnet went offline around 8:30AM PDT -- still offline (1PM PDT).

ID: 40738
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 40757 - Posted: 21 Oct 2011, 18:41:16 UTC

Dnet shutting down (perhaps for real -- this being the third time this year):



We (OxyOne and Sesef) have decided to close the DNETC@HOME project.

Feel free to move to MooWrapper@home.

Thank you for your cooperation.

The server will keep running while the stock of WUs lasts.
ID: 40757
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 40767 - Posted: 21 Oct 2011, 22:27:46 UTC

Further on Dnetc being shut down:

From the developer/administrator (and this is a quote):

The project is boring. I'm tired of it.
ID: 40767
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 40768 - Posted: 21 Oct 2011, 22:33:22 UTC - in response to Message 40767.  
Last modified: 21 Oct 2011, 22:33:29 UTC

The project is boring. I'm tired of it.

Tell them to go use a VM... ;-)
ID: 40768
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 40770 - Posted: 21 Oct 2011, 23:40:28 UTC

Seti News:
Jeff Cobb wrote:
For the last few months, network routing issues have been interfering with the connectivity of some participants. The actual problem turned out to be a lack of sufficient memory in our router at the PAIX in Palo Alto. Two days ago we increased the memory in that router by a factor of four. This fixed the problem.
ID: 40770
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 40780 - Posted: 22 Oct 2011, 17:22:05 UTC

Looks like Leiden Classical has either database trouble or is under heavy load.

Unable to connect to database - please try again later
Error: 2013 Lost connection to MySQL server during query
ID: 40780
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 40791 - Posted: 23 Oct 2011, 10:57:09 UTC - in response to Message 40780.  

Leiden Classical:

22 October 2011
Due to a disk failure on one of the servers, the database and the replicated copy of it got corrupted. A restore was done from yesterday's known working backup. Some WUs that were sent out in the meantime might have been lost due to this. Currently the project runs on just one of the servers; later this week the second server will be put into action again. Sorry about that!
ID: 40791
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 40849 - Posted: 26 Oct 2011, 9:54:07 UTC

CPDN main project

As many members already know from Barry's post above, CPDN was down for a week before being brought back up yesterday.

Last week Jonathan posted on the independent forum:

'On 18 October 2011 the CPDN project server was attacked. We have had to respond by taking the project offline.

I will provide more info as I have it, but I will be devoting my time to fixing the issues and determining the extent of the problem.

Jonathan'

Yesterday Andy added this:

'Update to: Security incident

On 18 October 2011 we suffered a hacking attack on the servers of the CPDN project.

As a volunteer-driven research project for studying climate change, we were very dismayed that such an action disrupted the project. This attack unfortunately forced us to take the project down whilst the issue was investigated. During this time we conducted a full security analysis; now that we are satisfied with the results of that analysis, we have brought the project back online.

Unfortunately some of our users' data has been compromised. We have contacted by email all users affected by this; they represent less than 0.5% of participants. If you have not been emailed (at the email address associated with your CPDN account), then you are not affected by this issue and need take no action.

We can only send this email to the email addresses associated with the accounts concerned. Some of these addresses may now be inactive, so please check any old addresses that you may have used when registering with the CPDN project. There is also a possibility that your spam filter may have withheld the email.

We offer our apologies for any inconvenience caused by this. This is an exciting time for the project in terms of science, and we deeply value all the continued contribution of participants.'
ID: 40849
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 40910 - Posted: 31 Oct 2011, 6:22:41 UTC

Milkyway went dark about 2 hours ago -- no access to the home page (or their servers for uploads/downloads). Standard scenario -- with their micro-queue, any outage over a half hour or so runs folks out of work to process.
ID: 40910
KAMasud
Joined: 13 Feb 07
Posts: 21
Pakistan
Message 40911 - Posted: 31 Oct 2011, 6:26:54 UTC - in response to Message 40910.  

LoL. Like your statement about micro-queue.
ID: 40911
Ralph
Joined: 30 Sep 05
Posts: 50
Message 40913 - Posted: 31 Oct 2011, 10:33:15 UTC

LHC@home is not giving out any work, and I just picked the time for Milkyway to be down, too. Orbit's fallen off again. Back to Cosmology.
ID: 40913
Blurf
Joined: 18 Jul 11
Posts: 217
United States
Message 40915 - Posted: 31 Oct 2011, 16:10:58 UTC

Noon EST...MW still down. Working on it....
ID: 40915
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 40916 - Posted: 31 Oct 2011, 16:25:46 UTC - in response to Message 40915.  

Want me to make you a thread in which you can tell people when MW is actually up? ;-)
ID: 40916
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 40917 - Posted: 31 Oct 2011, 16:56:52 UTC - in response to Message 40916.  

MW runs more often than not; it seems to have outages on the order of maybe once a week (unplanned outages -- as opposed to the weekly planned outages for SETI). The major difference with MW is that, for various reasons, folks running MW on a GPU get a maximum queue size of maybe an hour, and depending on the GPU much less than that. So with MW, EVERY outage starves users of work.

MW is still offline this morning (10AM PDT), so they are in 'extended outage mode' at this point.

The basic approach for GPU folks running MW: you MUST have a backup GPU project queued up. For me, I find that if I have an alternate project (Collatz and Moo, as I'm running AMD/ATI GPUs), that alternate project will tend to bump out MW in terms of what gets requested and processed. So the only way I can 'push' MW is to suspend the alternates for some period of time -- and if I do that, I need to watch closely due to the frequent MW outages and micro-queue.



ID: 40917
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 40918 - Posted: 31 Oct 2011, 17:02:45 UTC

It seems that MW over the years has had to contend with two 'outage components'. First is the project resources and configuration itself. Sometimes their software burps, sometimes their hardware belches, sometimes scripts meant to process don't.

The second is RPI (their host academic institution), which sometimes has issues that shut down the project.

The first 'outage component' is made more frequent as it seems MW operates quite close to the edge in terms of processing capability and storage space.

Of course, as has often been noted over the years, MW is also plagued by its micro-queues, which are oft discussed but seemingly will never get addressed due to other constraints.
ID: 40918
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 40919 - Posted: 31 Oct 2011, 18:49:01 UTC

Since we requested that Enigma disable the upload certificates in anticipation of the 6.13 clients, the project is now down. Sorry. ;-)
ID: 40919
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 40920 - Posted: 2 Nov 2011, 19:36:29 UTC - in response to Message 40919.  
Last modified: 2 Nov 2011, 20:32:37 UTC

Enigma has been back for 2 days. Sorry for the BOINC domain outage; a broken RAID array threw a spanner in the works. Thanks to Matt Lebofsky for painstakingly repairing everything.

Thus far it looks like everything came back, except for Trac and the Wiki. If you find weird things around here, please let us know. I'll forward them to development then.

Edit: not everything then. The actual download servers are still down.
In case you need any version, please use the Einstein mirror at http://einstein.phys.uwm.edu/download/boinc/dl/?C=M;O=D for now. The BOINC/dl/ server is severely out of date.
ID: 40920
Blurf
Posts: 217
United States
Message 40921 - Posted: 2 Nov 2011, 21:12:19 UTC

Milkyway will be down Thursday 11/3.

We'll be taking the server down tomorrow to figure out what this new hardware problem is. It looks like some issue with the interconnect is causing the server to crash repeatedly.
ID: 40921

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.