Thread 'News on Project Outages'

Message boards : Projects : News on Project Outages

Bernd

Joined: 24 Aug 09
Posts: 91
United States
Message 38876 - Posted: 11 Jul 2011, 3:36:05 UTC

SETI has hiccuped again. The site is offline and the router graph has flatlined. Hope they can fix it easily in the morning when the staff comes in for the week.
ID: 38876 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15573
Netherlands
Message 38878 - Posted: 11 Jul 2011, 10:28:46 UTC - in response to Message 38876.  

Which is weird, as it's two different connections and systems. I mean, the 100 Mbit pipe is only for data; their forums run off a separate 1 Gbit pipe through campus. It also all runs on different servers, so it sounds like a collective power outage, but then how do we type on these boards? ;-)
ID: 38878 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5133
United Kingdom
Message 38879 - Posted: 11 Jul 2011, 10:58:36 UTC - in response to Message 38878.  

It can't be either a connection or a power issue, for the reasons you state.

SETI has a number of different servers, and tends to have a separate server for each major BOINC function. At the moment, the file upload server and the scheduling server are both intact, and so is the BOINC database - reported results are being validated. But the download server and the web server are inaccessible.
ID: 38879 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15573
Netherlands
Message 38887 - Posted: 11 Jul 2011, 17:38:42 UTC

SETI is back.
ID: 38887 · Report as offensive
Tomas

Joined: 16 Nov 08
Posts: 28
Sweden
Message 38900 - Posted: 12 Jul 2011, 14:59:03 UTC

And now it seems to be down again, and Milkyway too.
ID: 38900 · Report as offensive
BarryAZ

Joined: 4 Sep 09
Posts: 381
United States
Message 38901 - Posted: 12 Jul 2011, 18:08:24 UTC

Milkyway is still totally offline (been so for about 5 hours so far - 11AM PDT as I post). Of course with Milkyway, with their minimalist queues, being offline for more than an hour means out of work for users (on the GPU side).

Aqua remains in credit conundrum stasis -- perhaps they are discussing the best approaches to deal with it -- though as of this morning, there has been no follow-up from the project (nothing since Friday morning) on their home page or the message boards.

SETI of course is offline for their planned weekly outage (distinct from unplanned outages).
ID: 38901 · Report as offensive
Bernd

Joined: 24 Aug 09
Posts: 91
United States
Message 38902 - Posted: 12 Jul 2011, 18:10:11 UTC - in response to Message 38900.  
Last modified: 12 Jul 2011, 18:13:06 UTC

I think the last time Milkyway had a big outage, Travis said that over the summer RPI would shut down non-key IT functions and server rooms any time daytime temps were expected to top 90F. They say they are not in full session, so students are not harmed by facilities being offline. Supposedly RPI gives them something like half an hour's notice to shut things down, and then they turn the AC and power off to that room or facility. This is supposed to be a cost saving for the University.
ID: 38902 · Report as offensive
BarryAZ

Joined: 4 Sep 09
Posts: 381
United States
Message 38904 - Posted: 12 Jul 2011, 18:33:13 UTC - in response to Message 38902.  

OK -- that makes some sense -- I shut down some of my farm during the day here. The issue for Milkyway, of course, is that with their very short time-cycle queues, folks run out of work very quickly. Then again, that is simply an incentive to carry multiple projects. These days there are at least four 'native' ATI GPU projects out there: Collatz, MooWrapper, Dnet, and Milkyway.


ID: 38904 · Report as offensive
Tomas

Joined: 16 Nov 08
Posts: 28
Sweden
Message 38905 - Posted: 12 Jul 2011, 18:57:28 UTC - in response to Message 38902.  

It looks like it is a scheduled downtime. I'm glad they informed us ;)
ID: 38905 · Report as offensive
BarryAZ

Joined: 4 Sep 09
Posts: 381
United States
Message 38906 - Posted: 12 Jul 2011, 19:14:15 UTC - in response to Message 38905.  

Regarding Milkyway, to a large degree it doesn't matter whether downtimes are planned or not: the short queues mean you can't plan for an outage by stocking up in any manner. It just means that you need second (and third) projects configured and active.

The thing is, with MW, about the only way you can run it as a primary is to have the other GPU projects suspended. For me the reverse ends up being the case: the other projects get the GPU cycle time unless I manually intervene to push MW work units.



ID: 38906 · Report as offensive
Tomas

Joined: 16 Nov 08
Posts: 28
Sweden
Message 38909 - Posted: 12 Jul 2011, 19:34:42 UTC - in response to Message 38906.  

I run Folding@home as a backup to Milkyway, so in my case it DOES matter to be informed if the Milkyway server will shut down, so I can put Milkyway on standby and start Folding before going to work.
ID: 38909 · Report as offensive
BarryAZ

Joined: 4 Sep 09
Posts: 381
United States
Message 38910 - Posted: 12 Jul 2011, 19:35:32 UTC

It looks like Aqua may end up in permanent stasis -- they feel they must rely on BOINC central for credit schemes and until (or if) those schemes are reasonable, the project won't proceed.
ID: 38910 · Report as offensive
Claggy

Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 38911 - Posted: 12 Jul 2011, 20:10:38 UTC - in response to Message 38910.  
Last modified: 12 Jul 2011, 20:10:53 UTC

Slicker at Collatz has managed to bypass NewCredit, and still runs his own credit system with recent BOINC server software.

Claggy
ID: 38911 · Report as offensive
BarryAZ

Joined: 4 Sep 09
Posts: 381
United States
Message 38915 - Posted: 12 Jul 2011, 22:37:33 UTC - in response to Message 38911.  
Last modified: 12 Jul 2011, 22:41:29 UTC

I understand -- the admin over at Aqua is aware that this can be done, but doesn't want to have DA get upset with Aqua by 'hacking' the server releases to avoid submission to the NewCredit diktat -- even if the NewCredit credits result in some ridiculously high numbers (making a mockery of the concept behind the NewCredit scheme). The thing is, while I have no problems with seeing projects within a range of credit per cycle parity, trying to control that for all projects, in all stages of application development, for all ranges of hardware seems a tad ... utopian to me. (Noting that not all utopias are good utopias.)



ID: 38915 · Report as offensive
BarryAZ

Joined: 4 Sep 09
Posts: 381
United States
Message 38924 - Posted: 13 Jul 2011, 13:23:22 UTC - in response to Message 38909.  

It seems that the scheduled (but not pre-reported) MW outage is continuing into a second day -- no evidence that it will conclude all that soon.
ID: 38924 · Report as offensive
Tomas

Joined: 16 Nov 08
Posts: 28
Sweden
Message 38925 - Posted: 13 Jul 2011, 13:33:45 UTC - in response to Message 38924.  
Last modified: 13 Jul 2011, 13:36:14 UTC

null
ID: 38925 · Report as offensive
BarryAZ

Joined: 4 Sep 09
Posts: 381
United States
Message 38931 - Posted: 13 Jul 2011, 16:49:22 UTC - in response to Message 38920.  

See my private message on this -- but for other eyes, it *may* have been a misread on my part on that thread, but that is qualitatively different from 'a deliberate lie' and I apologize for providing the grist for your mischaracterization.


That is a deliberate lie intended to slander the character of the AQUA@home admins. Anybody interested in the truth should read the Credit rollback!! and Crazy Credit threads at AQUA. The admin has stated in at least 2 posts that the reason they don't want to hack the server code is because it would require a lot of code merging in the future. You don't know squat about their problems so you make up bullshit. Those boys running AQUA are Canadians, they're my friends and they ain't afraid of DA or any other Yankee.

ID: 38931 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.