News on Project Outages

elgordodude Send message Joined: 9 Jan 11 Posts: 2	Message 40715 - Posted: 19 Oct 2011, 3:07:00 UTC Any word on Virtual Prairie? They've been out of work for six months or so, nothing new there, but the site has been down for a week as well. Starting to look like it might be done... ID: 40715 ·

BarryAZ Send message Joined: 4 Sep 09 Posts: 381	Message 40722 - Posted: 19 Oct 2011, 19:10:26 UTC From the Climate sysadmin: Security incident: project down by jaamiller » Wed Oct 19, 2011 12:00 pm On 18 October 2011 the CPDN project server was attacked. We have had to respond by taking the project offline. I will provide more info as I have it, but I will be devoting my time to fixing the issues and determining the extent of the problem. Jonathan Jonathan Miller CPDN SysAdmin ID: 40722 ·

BarryAZ Send message Joined: 4 Sep 09 Posts: 381	Message 40738 - Posted: 20 Oct 2011, 18:13:14 UTC - in response to Message 40714. Dnet came back online about 8 hours later --- no comment on the website regarding the outage. (10/18) Dnet went offline again about 3 hours ago (8AM PDT), like on Tuesday. Perhaps the site will go back online in another 5 hours (history repeats itself). Perhaps we will see some explanation of the problem -- perhaps not (history repeats itself). Dnet went offline around 8:30AM PDT -- still offline (1PM PDT). ID: 40738 ·

BarryAZ Send message Joined: 4 Sep 09 Posts: 381	Message 40757 - Posted: 21 Oct 2011, 18:41:16 UTC Dnet shutting down (perhaps for real -- this being the third time this year): We decided (OxyOne and Sesef) to close the project DNETC @ HOME. Feel free to MooWrapper @ home. Thank you for your cooperation. The server runs while stocks WU. ID: 40757 ·

BarryAZ Send message Joined: 4 Sep 09 Posts: 381	Message 40767 - Posted: 21 Oct 2011, 22:27:46 UTC Further on Dnetc being shut down: From the developer/administrator (and this is a quote): The project is boring. I'm tired of it. ID: 40767 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15519	Message 40768 - Posted: 21 Oct 2011, 22:33:22 UTC - in response to Message 40767. Last modified: 21 Oct 2011, 22:33:29 UTC "The project is boring. I'm tired of it." Tell them to go use a VM... ;-) ID: 40768 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15519	Message 40770 - Posted: 21 Oct 2011, 23:40:28 UTC Seti News: Jeff Cobb wrote: For the last few months, network routing issues have been interfering with the connectivity of some participants. The actual problem turned out to be a lack of sufficient memory in our router at the PAIX in Palo Alto. Two days ago we increased the memory in that router by a factor of four. This fixed the problem. ID: 40770 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15519	Message 40780 - Posted: 22 Oct 2011, 17:22:05 UTC Looks like Leiden Classical has either database trouble or is under heavy load: "Unable to connect to database - please try again later. Error: 2013 Lost connection to MySQL server during query" ID: 40780 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15519	Message 40791 - Posted: 23 Oct 2011, 10:57:09 UTC - in response to Message 40780. Leiden Classical, 22 October 2011: "Due to a disk failure on one of the servers, the database and the replicated copy of it got corrupted. A restore was done from a known working backup from yesterday. Some WUs which were sent out in the meantime might have been lost due to this. Currently the project runs on just one of the servers; later this week the second server will be put back into action. Sorry about that!" ID: 40791 ·

mo.v Send message Joined: 13 Aug 06 Posts: 778	Message 40849 - Posted: 26 Oct 2011, 9:54:07 UTC CPDN main project: As many members already know from Barry's post above, CPDN was down for a week before being brought back up yesterday. Last week Jonathan posted on the independent forum: 'On 18 October 2011 the CPDN project server was attacked. We have had to respond by taking the project offline. I will provide more info as I have it, but I will be devoting my time to fixing the issues and determining the extent of the problem. Jonathan' Yesterday Andy added this: 'Update to: Security incident. On 18 October 2011 we suffered a hacking attack on the servers of the CPDN project. As a volunteer-driven research project for studying climate change, we were very dismayed that such an action disrupted the project. This attack unfortunately forced us to take the project down whilst the issue was investigated. During this time we conducted a full security analysis; now that we are satisfied with its results, we have brought the project back online. Unfortunately some data of our users had been compromised. We have contacted by email all those users that are affected by this; this represents less than 0.5% of participants. If you have not been emailed (at the email address associated with that CPDN account) then you won't be affected by this issue and you need take no action. We can only send this email to the email addresses associated with the accounts concerned; some of these addresses may now be inactive, so please check any old addresses that you may have used when registering with the CPDN project. There is also a possibility that your spam filter may have withheld the email. We offer our apologies for any inconvenience caused by this. This is an exciting time for the project in terms of science, and we deeply value the continued contribution of all participants.' ID: 40849 ·

BarryAZ Send message Joined: 4 Sep 09 Posts: 381	Message 40910 - Posted: 31 Oct 2011, 6:22:41 UTC Milkyway went dark about 2 hours ago -- no access to the home page (or their servers for uploads/downloads). Standard scenario -- with their micro-queue, any outage over a half hour or so runs folks out of work to process. ID: 40910 ·

KAMasud Send message Joined: 13 Feb 07 Posts: 21	Message 40911 - Posted: 31 Oct 2011, 6:26:54 UTC - in response to Message 40910. LoL. Like your statement about micro-queue. ID: 40911 ·

Ralph Send message Joined: 30 Sep 05 Posts: 50	Message 40913 - Posted: 31 Oct 2011, 10:33:15 UTC LHC@home is not giving out any work, and I just picked the time for Milkyway to be down, too. Orbit's fallen off again. Back to Cosmology. ID: 40913 ·

Blurf Send message Joined: 18 Jul 11 Posts: 217	Message 40915 - Posted: 31 Oct 2011, 16:10:58 UTC Noon EST...MW still down. Working on it.... ID: 40915 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15519	Message 40916 - Posted: 31 Oct 2011, 16:25:46 UTC - in response to Message 40915. Want me to make you a thread in which you can tell people when MW is actually up? ;-) ID: 40916 ·

BarryAZ Send message Joined: 4 Sep 09 Posts: 381	Message 40917 - Posted: 31 Oct 2011, 16:56:52 UTC - in response to Message 40916. "Want me to make you a thread in which you can tell people when MW is actually up? ;-)" MW runs more often than not; it seems to have outages on the order of maybe once a week (unplanned outages -- as opposed to the weekly planned outages for SETI). The major difference with MW is that, for various reasons, folks running MW GPU get a maximum queue size of maybe an hour, and depending on the GPU much less than that. So with MW, EVERY outage starves users of work. MW is still offline this morning (10AM PDT), so they are in 'extended outage mode' at this point. The basic approach for GPU folks running MW: you MUST have a backup GPU project queued up. For me, I find that if I have an alternate project (Collatz and Moo, as I'm running AMD/ATI GPUs), that alternate project will tend to bump out MW in terms of what gets requested and processed. So the only way I can 'push' MW is to suspend the alternates for some period of time -- and if I do that, I need to watch closely due to the frequent MW outages and micro-queue. ID: 40917 ·

BarryAZ Send message Joined: 4 Sep 09 Posts: 381	Message 40918 - Posted: 31 Oct 2011, 17:02:45 UTC It seems that MW over the years has had to contend with two 'outage components'. First is the project resources and configuration itself. Sometimes their software burps, sometimes their hardware belches, sometimes scripts meant to process don't. The second is RPI (their host academic resource); sometimes they have issues which shut down the project. The first 'outage component' is made more frequent as it seems MW operates quite close to the edge in terms of processing capability and storage space. Of course, as noted many times over the years, MW is also plagued by its micro-queues, which is something oft discussed but also something that seemingly will never get addressed due to other constraints. ID: 40918 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15519	Message 40919 - Posted: 31 Oct 2011, 18:49:01 UTC Since we requested that Enigma disable the upload certificates to anticipate the 6.13 clients, the project is now down. Sorry. ;-) ID: 40919 ·

Jord Volunteer tester Help desk expert Send message Joined: 29 Aug 05 Posts: 15519	Message 40920 - Posted: 2 Nov 2011, 19:36:29 UTC - in response to Message 40919. Last modified: 2 Nov 2011, 20:32:37 UTC Enigma has been back for 2 days. Sorry for the BOINC domain outage; a broken RAID array threw a spanner in the works. Thanks to Matt Lebofsky for painstakingly repairing everything. Thus far it looks like everything came back except for Trac and the Wiki. If you find weird things around here, please let us know. I'll forward it to development then. Edit: not everything, then. The actual download servers are still down. In case you need any version, please use the Einstein mirror at http://einstein.phys.uwm.edu/download/boinc/dl/?C=M;O=D for now. The BOINC/dl/ server is severely out of date. ID: 40920 ·

Blurf Send message Joined: 18 Jul 11 Posts: 217	Message 40921 - Posted: 2 Nov 2011, 21:12:19 UTC Milkyway will be down Thursday 11/3. We'll be taking the server down tomorrow to figure out what this new hardware problem is. It looks like some issue with the interconnect is causing the server to crash repeatedly. ID: 40921 ·

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.