News on project outages.

Message boards : Projects : News on project outages.

ZPM
Joined: 14 Mar 09
Posts: 215
United States
Message 26610 - Posted: 15 Aug 2009, 3:03:10 UTC - in response to Message 26609.  

This type of crash has happened before and is an issue they are working on, so be patient...

All projects have a bad time every once in a while.
ID: 26610
Phil
Joined: 2 May 09
Posts: 9
United Kingdom
Message 26631 - Posted: 16 Aug 2009, 15:46:18 UTC - in response to Message 26609.  

EINSTEIN@HOME -

So are these sparse and vague little messages on the root index page, days apart, the best they can do?


Well, let's see: it's a weekend, and instead of doing family stuff they have to spend it in the lab sending out for sandwiches and pizza. And they should sit composing polished HTML as well?

But there are folks out here who also take a great interest in the projects - and to have so little information about what is going on just kinda sucks...

I remember a lot of moaning at SETI@HOME which prompted this reply:

Matt Lebofsky wrote:

Jan 25 2001, 12:52 am

3.03 sucks so much because we cleaned up a bunch of sucking code in this version. We found several cases existed in 3.0 where the clients could process entire workunits without ever touching the suck functions. This was an undetected conflict between the science and suck code, and has been fixed in 3.03.
Now we are efficiently maximizing both science and sucking.
You are able to use 3.0 during random phases when I don't have mandatory upgrade messages turned on. I turn these on whenever I feel like it, and turn them off when 3.0 clients in tight loops load down our server. As of time of writing, you cannot download workunits with 3.0, but you may be able to later. Within a week, no matter what, 3.0 will be dead in the water. We can only accept results from suckless clients for so long.

- Matt - SETI@home

ID: 26631
charleys
Joined: 15 Aug 09
Posts: 2
United States
Message 26632 - Posted: 16 Aug 2009, 18:18:52 UTC
Last modified: 16 Aug 2009, 18:35:20 UTC

Well thanks for clearing all that up Phil! I feel so much better about participating.

There was an offer of help in addition to a frustrated plea about the lack of information. I've been doing Einstein@Home nearly since its inception.

Polished HTML, no. A little more information *would* have been nice.

Cheers and thanks again for your sage comments.
ID: 26632
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26636 - Posted: 17 Aug 2009, 9:53:19 UTC

CPDN main project

CPDN upload server cpdn-upload1.comlab has been down since yesterday for data copying/transfer, which is a long job.

This server accepts _2.zip files from HadAM3P models. These files cannot upload while this server is down.

These models' _1 and _3 files should upload without problems as they are allocated to different servers. While cpdn-upload1.comlab is down, here are some suggestions:

- You could suspend Boinc network activity as much as possible until the server is running again.
- You could suspend your HadAM3Ps before they complete and generate their zip files.
- You could do nothing at all because we do not expect this outage to last many days.
- Your network connection will be fine. Please do not repeatedly activate and suspend Boinc network activity.
- Do not use the Retry now button in your Boinc Manager Transfers tab. Repeated failed attempts are not good for files.
- Do not abort the file transfers or the models. They are all good!
ID: 26636
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26658 - Posted: 18 Aug 2009, 10:50:43 UTC

CPDN main project

Cpdn-upload1.comlab upload server has been running since yesterday evening.

ID: 26658
RGiskard
Joined: 14 Dec 08
Posts: 5
United States
Message 26671 - Posted: 18 Aug 2009, 21:20:37 UTC
Last modified: 18 Aug 2009, 21:21:07 UTC

It would appear that Einstein@Home is back up & running.
Rob

ID: 26671
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26676 - Posted: 19 Aug 2009, 1:03:25 UTC

CPDN main project

On Wednesday 19 Aug Milo hopes to begin work on upload server uploader1.atm. This job is part of the installation of the second new server. He will need to disable the server and could begin 7 hours from now. Here is the CPDN server status page.

This server accepts _3.zip files from HadAM3P models. These files cannot upload while uploader1.atm is down.

These models' _1 and _2.zip files should upload without problems as they are allocated to different servers. When uploader1.atm is down here are some suggestions:

- You could suspend Boinc network activity as much as possible until the server is running again.
- You could suspend your HadAM3Ps before they complete and generate their zip files.
- You could take no action because we do not expect this outage to last many days.
- Please do not repeatedly activate and suspend Boinc network activity. Your network connection will be fine.
- Do not use the Retry now button in your Boinc Manager Transfers tab. Repeated failed attempts are not good for files.
- Do not abort the file transfers or the models. They are all good!
ID: 26676
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 4797
United Kingdom
Message 26936 - Posted: 31 Aug 2009, 10:45:48 UTC

It looks as if Einstein@Home has gone down again.

IIRC, both previous major filesystem outages started when SETI was having problems (Einstein is a popular backup project for SETI users). SETI ran out of raw data loaded for workunit generation about four hours ago, and is likely to remain dry for another five or six hours until the start of the working week in Berkeley.

Could/should there be any sort of cross-project early warning system to prevent this 'domino' or cascading effect of project failures?
ID: 26936
Victor
Joined: 3 Sep 09
Posts: 1
Brazil
Message 27022 - Posted: 3 Sep 2009, 2:32:50 UTC

FreeHAL -> Down (and I have 1 unit to be sent)
lhcathome -> Down, as expected
ramsey -> Down for a long time...
Spinhenge -> Down (probably went offline today, but their website is fine)
wanless2 -> Down for a long time, but their website is fine


And another 30 projects are running fine on my cluster...

If any of the above projects has definitely been shut down, could someone tell me?
ID: 27022
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1637
Australia
Message 27023 - Posted: 3 Sep 2009, 3:39:12 UTC

Please note that this thread is really for posting notes about outages, not for asking about them; questions belong in their own thread.

***********
***********

Front page of FreeHAL, as cached by Google on 29 Aug 2009:

New version
August 29, 2009
There is a new version (called 0.50) available for Windows and Linux (32 and 64 bit). Please post issues in the forum or in the bug tracker: http://bugs.freehal.org -- Best Regards, Tobias Schulz

Denial of Service attack again
August 28, 2009
There has been a DDoS attack against our server again (like in June 2009); fortunately the new server seemed to deal quite well with it. I have blocked up to ten thousand IP addresses now (including the ones from June). If somebody is unable to reach the web site, please contact me (mail: info@freehal.org)... -- Best Regards, Tobias Schulz


******************

Front page of Ramsey@Home, as cached by Google on 26 Aug 2009:
Still working! New Collaborators - Ben Ferenchak.
August 5, 2009
I'm still here guys! Ben Ferenchak, a student at Widener University, PA, and some of his fellow classmates are collaborating with me on the new direction of this project. I'm very excited to have their assistance in working out some of the mathematics and CUDA development we hope to do. As of now we are still brushing up on our group theory, developing small test applications, and corresponding with Dr. Exoo who has been a great source of information and promising leads. We're still fairly far away from an alpha test but please stay tuned! Thanks everyone!
This is taking forever! :)


******************

BOINCstats shows that a lot of users crunching Spinhenge@home got credit in the last 24 hours.

******************

Same as above for WEP-M+2@home, and also for FreeHAL.

ID: 27023
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27071 - Posted: 4 Sep 2009, 23:30:50 UTC

Looks like the Collatz project went offline a few hours ago and remains offline.

Collatz is a small and relatively recent project -- and one of the few with CUDA and ATI GPU support.
ID: 27071
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27072 - Posted: 4 Sep 2009, 23:33:16 UTC - in response to Message 27022.  

Spinhenge appears to be up and running (if there was an outage yesterday, it wasn't for a long time).

Spinhenge -> Down (probably went offline today, but their website is fine)

ID: 27072
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27085 - Posted: 5 Sep 2009, 16:50:32 UTC - in response to Message 27071.  

Collatz remains offline 24 hours later -- their home page is down, so there is no indication of the nature or expected duration of the outage.

Looks like the Collatz project went offline a few hours ago and remains offline.

Collatz is a small and relatively recent project -- and one of the few with CUDA and ATI GPU support.

ID: 27085
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 27138 - Posted: 7 Sep 2009, 19:03:16 UTC

Milo announced on Tuesday morning on the main CPDN website:

Web Server Upgrade

We plan to shut down the web server hosting [the CPDN website] some time around 9AM BST [08.00 UTC] on the 8th of September to upgrade the RAM and processors. This isn't expected to be a long procedure, but whilst it is taking place BOINC clients will not be able to contact the scheduler and the main [CPDN] pages and PHP boards [the independent forum] will not be available. The upload servers and BOINC boards will continue as normal.

Thanks for your patience during this upgrade and the earlier, and much more extensive, database upgrades.

[While the scheduler is down it will be impossible to download new models. My extra comments are in brackets.]
ID: 27138
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 27154 - Posted: 8 Sep 2009, 14:22:23 UTC

QMC@Home

I just got an email from Martin:

qah went down about an hour ago. Unfortunately, neither Robert nor I am in Muenster at the moment (Robert is at a conference and I am working in Prague). I hope that I will be able to describe what has to be done to our computer technician at the institute, but the whole thing might take longer than usual for our project...
ID: 27154
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27165 - Posted: 8 Sep 2009, 18:49:09 UTC - in response to Message 27085.  

Collatz did come back online on Friday, but they had a power outage on Monday morning which apparently took out the database on the way down. Collatz remains offline today -- it isn't clear how long it will take to recover.


Collatz remains offline 24 hours later -- their home page is down, so there is no indication of the nature or expected duration of the outage.

ID: 27165
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27177 - Posted: 8 Sep 2009, 21:20:43 UTC - in response to Message 27165.  

Collatz remains offline this afternoon:

Collatz Conjecture

Warning: mysql_pconnect() [function.mysql-pconnect]: Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2) in /home/boincadm/projects/collatz/html/inc/db.inc on line 39
Unable to connect to database - please try again later Error: 2002Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)

Collatz did come back online on Friday, but they had a power outage on Monday morning which apparently took out the database on the way down. Collatz remains offline today -- it isn't clear how long it will take to recover.

ID: 27177
Gipsel
Joined: 8 Sep 09
Posts: 5
Germany
Message 27181 - Posted: 8 Sep 2009, 21:49:22 UTC - in response to Message 27177.  

Collatz remains offline this afternoon

And maybe even for a few days. There is a statement by Slicker, the Collatz admin:

Due to a mixture of water and electricity, the power blew and brought down the server. I'm working on getting it working and/or rebuilding it, but really have no idea whether it will be one day or one week. The good news is I have a very recent database backup from just an hour or so prior to the crash.
ID: 27181
BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 27187 - Posted: 8 Sep 2009, 22:29:51 UTC - in response to Message 27181.  

Ah -- OK -- thanks for that.

Of course with that outage, GPUGrid is happier with me I suppose.
ID: 27187
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 27201 - Posted: 9 Sep 2009, 12:57:08 UTC
Last modified: 9 Sep 2009, 12:57:35 UTC

CPDN main project

Today upload server climateapps3.oucs is down temporarily while Milo moves some data from there to uploader1.atm.
ID: 27201

Copyright © 2022 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.