Thread 'News on project outages.'

ZPM
Joined: 14 Mar 09
Posts: 215
United States
Message 26310 - Posted: 28 Jul 2009, 13:45:28 UTC - in response to Message 26304.  

The forums at Rosetta have this problem in their number section.... Let's hope it is a temporary problem and gets fixed in the next 72 hrs.
ID: 26310

Byron Leigh Hatch @ team Carl ...
Joined: 30 Aug 05
Posts: 505
Canada
Message 26342 - Posted: 29 Jul 2009, 18:26:39 UTC

SETI@home down for maintenance
ID: 26342

Gundolf Jahn
Joined: 20 Dec 07
Posts: 1069
Germany
Message 26345 - Posted: 29 Jul 2009, 18:57:07 UTC - in response to Message 26342.  

SETI@home down for maintenance

???
Are you sure? As I see it, only the AP splitters are offline.

The outage notice on the homepage has either been forgotten or intentionally left there.

Gruß,
Gundolf
ID: 26345

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26432 - Posted: 2 Aug 2009, 16:56:52 UTC
Last modified: 2 Aug 2009, 16:57:27 UTC

CPDN main project

Upload server climateapps3.oucs is still down. The server status page can be seen here. The programmers hope to have it running again within a couple of days.

Climateapps3 takes decadal zip files uploaded from HadCM models if the files end in an odd number eg _1 or _3. (HadCM files that end in an even number like _4 go to uploader.oerc which should accept them without problems.) Zip files from some HadAM (but not HadAM3P) models may also be affected.

While climateapps is down it would be a good idea to suspend Boinc network activity as much as possible. In the Boinc Transfers tab please do not press the Retry now button, and certainly not the Abort transfer button which would cause the file to be completely lost. Do not repeatedly suspend and reactivate Boinc network activity; your network connection will be fine.

If you wish you can suspend your HadCM and HadAM models until climateapps3 is running again. This would prevent these models creating more files.
ID: 26432

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26458 - Posted: 3 Aug 2009, 12:18:18 UTC

CPDN main project

Milo has restarted climateapps3.oucs so all CPDN servers are now running.

The migration of the database to the new server has been postponed to next week at the earliest ie 10 August or later. This will involve a server shutdown. Please subscribe to a CPDN forum News thread (eg here) and in your account enable emails from the project to receive advance email notification of announcements about this.

Milo says that the combination of Carl's code revision and the new server has resulted in a credit calculation that takes only around half an hour. But the new scripts can only be activated when the new server starts running. Until then the old scripts will run but only about twice a week. So our credits will only be exported to the stats sites twice a week until the new server is installed.
ID: 26458

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26495 - Posted: 5 Aug 2009, 15:39:52 UTC

CPDN main project

Upload server uploader.oerc has been down for several hours. The server status page can be seen here. Its disk has filled up while Milo's been transferring data from one server disk to another.

Uploader.oerc takes decadal zip files uploaded from HadCM models if the files end in an even number eg _2 or _4. (HadCM files that end in an odd number like _1 go to climateapps3 which should accept them without problems.) Zip files numbered _1 from HadAM3P models will also be affected. (HadAM3P files numbered _2 and _3 upload to different servers and should transfer without problems.)

While uploader.oerc is down it would be a good idea to suspend Boinc network activity as much as possible. In the Boinc Transfers tab please do not press the Retry now button, and certainly not the Abort transfer button which would cause the file to be completely lost. Do not repeatedly suspend and reactivate Boinc network activity; your network connection will be fine.

If you wish you could suspend your HadCM and HadAM3P models until uploader.oerc is running again. This would prevent these models creating more files.
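
The routing rule described in these outage notices can be sketched in a few lines of Python. This is only an illustration of the rule as posted, not CPDN's actual code: the function name `upload_host`, the example file names, and the assumption that HadCM model names start with "hadcm" are all hypothetical, and the host names are the abbreviated forms used in the posts. Cases the posts do not spell out return None.

```python
import re
from typing import Optional

# Host names as abbreviated in the posts; full domain names are not given there.
ODD_HOST = "climateapps3.oucs"
EVEN_HOST = "uploader.oerc"


def upload_host(model: str, zip_name: str) -> Optional[str]:
    """Return the upload server the posts describe for this file, or None if unknown."""
    # Pull the trailing number out of names like "example_4" or "example_4.zip".
    match = re.search(r"_(\d+)(?:\.zip)?$", zip_name)
    if match is None:
        return None
    number = int(match.group(1))
    model = model.lower()
    if model.startswith("hadcm"):
        # HadCM decadal zips: odd numbers go to climateapps3, even to uploader.oerc.
        return ODD_HOST if number % 2 == 1 else EVEN_HOST
    if model == "hadam3p" and number == 1:
        # HadAM3P _1 files share uploader.oerc; _2 and _3 go to other servers.
        return EVEN_HOST
    return None


print(upload_host("HadCM3", "example_4.zip"))
print(upload_host("HadAM3P", "example_2.zip"))
```

A volunteer could use a check like this to see which pending files are affected by a given server outage before deciding whether to suspend a model.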
ID: 26495

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26526 - Posted: 7 Aug 2009, 13:41:58 UTC

CPDN main project

Uploader.oerc is now up and running.
ID: 26526

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26527 - Posted: 7 Aug 2009, 14:03:04 UTC
Last modified: 7 Aug 2009, 14:04:02 UTC

CPDN main project

Milo has announced:

'We are replacing a database server starting this Sunday night around 10PM BST (5PM EDT/2PM PDT), so we will probably be down through Monday morning. This is a much needed upgrade and we appreciate your patience during this outage. The PHP bulletin boards will still be available (as will the main CPDN website), but the CPDN/BOINC database access (credits/trickles/new signups/etc) will not be available.'

I expect that zip file uploads will still be possible, but not trickles. The CPDN-BOINC forum will not be available.

Carl has also posted about the outage here.
ID: 26527

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26551 - Posted: 9 Aug 2009, 10:59:38 UTC

CPDN main project

Amended time for start of outage

Milo and Carl are going to shut down the database server earlier than they originally planned. They now say:

We are replacing a database server starting this Sunday 5PM BST (12 noon EDT/9AM PDT) [16.00 UTC], so we will probably be down through Monday morning. This is a much needed upgrade and we appreciate your patience during this outage. The PHP bulletin boards will still be available (as will this website), but the CPDN/BOINC database access (credits/trickles/new signups/etc) will not be available.
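
For anyone who wants to check the quoted start time in their own zone, here is a minimal Python sketch. It assumes "this Sunday" means 9 August 2009 (the date of this post), and the function name `outage_start_in` is made up for the example. It uses the standard `zoneinfo` module, which needs time zone data to be available on the system.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed start of the outage: Sunday 9 August 2009, 5PM UK time (BST).
OUTAGE_START = datetime(2009, 8, 9, 17, 0, tzinfo=ZoneInfo("Europe/London"))


def outage_start_in(zone_name: str) -> str:
    """Return the outage start as HH:MM local time in the given IANA zone."""
    return OUTAGE_START.astimezone(ZoneInfo(zone_name)).strftime("%H:%M")


for zone in ("UTC", "America/New_York", "America/Los_Angeles"):
    print(zone, outage_start_in(zone))
```

Any other IANA zone name (for example "Australia/Sydney") can be passed in the same way.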

It will not be possible to upload trickles, report completed models, visit our CPDN accounts or model web pages, use the CPDN-Boinc forum or fetch new work. The server status page will be down.

It will be possible to upload models' zip files, look at the Climateprediction website (except the Boinc part) and post on the independent forum.

You can allow your models to continue processing but may wish to suspend Boinc network activity as much as possible during the outage.

Carl has started forum threads about the outage here and here.

ID: 26551

Thyme Lawn
Joined: 2 Sep 05
Posts: 103
United Kingdom
Message 26565 - Posted: 11 Aug 2009, 19:27:00 UTC
Last modified: 11 Aug 2009, 19:28:37 UTC

CPDN Main Project

Progress thru Processors seems to have taken off big time. Soon after the new server came on-line it was hit by 4,000 new PtP users, all attempting to download 100MB to run their first task. This sudden network load has effectively become a denial of service attack, making the server all but inaccessible. Carl and Tolu are feverishly working to mirror the downloads to reduce the server's network load. They are hoping that normal service will be resumed within a couple of hours.
ID: 26565

Gary Roberts
Joined: 7 Sep 05
Posts: 130
Australia
Message 26568 - Posted: 12 Aug 2009, 2:18:44 UTC

Einstein is down at the moment. There is no information available as to the likely length of the outage. The only available information so far is a fairly terse email message from Oliver Bock which was posted some 10 hours ago, saying

Apparently our main file server has some issues again.
We're looking into it...

ID: 26568

Gary Roberts
Joined: 7 Sep 05
Posts: 130
Australia
Message 26569 - Posted: 12 Aug 2009, 4:00:04 UTC

Einstein@home Project

There is a message on the home page that suggests the outage may last for several days :-).

Cheers,
Gary.
ID: 26569

Thyme Lawn
Joined: 2 Sep 05
Posts: 103
United Kingdom
Message 26571 - Posted: 12 Aug 2009, 10:31:22 UTC

CPDN Main Project

The new scheduler and database server is now fully functional (at about 10 times the speed of the system it replaced). Work is only available for the HADAM3P application at the moment; work for the other applications is currently being generated to make use of the new mirrored download system and should start appearing soon.

Milo has finally managed to get all of the HADAM3P data he'd stashed away back on to a server where the results website can access it. At some point today he will be taking down uploader1.atm.ox.ac.uk to copy its data off to yet another new server. uploader1 is the destination for one of the HADAM3P result files (_3.zip) and uploading of that file will not be possible while the server is being replaced.
ID: 26571

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 26593 - Posted: 13 Aug 2009, 18:05:22 UTC

CPDN main project

Uploader1.atm upload server is still down and Milo doesn't think he can activate the new server until Friday. This server takes _3.zip files from HadAM3P models when they complete. These models' _1 and _2 files should upload without problems as they are allocated to different servers. While uploader1.atm is still down here are some suggestions:

- You could suspend Boinc network activity as much as possible until the server's running.
- You could suspend your HadAM3Ps before they complete and generate their zip files.
- You could do nothing at all because we do not expect this outage to last much longer.
- Your network connection will be fine. Please do not repeatedly activate and suspend Boinc network activity.
- Do not use the Retry now button in your Boinc Manager Transfers tab. Repeated failed attempts are not good for files.
- Do not abort the file transfers or the models. They are all good!
ID: 26593

Magic Quantum Mechanic
Joined: 14 Aug 09
Posts: 8
United States
Message 26602 - Posted: 14 Aug 2009, 10:17:22 UTC
Last modified: 14 Aug 2009, 10:32:56 UTC

"Good" even though they will get turned in after the actual due dates?

I have so many and didn't have a chance to see what happened on the previous occasion.

(I don't abort them)

edit: I guess this is the answer

http://boinc.berkeley.edu/dev/forum_thread.php?id=4058&nowrap=true#25770
ID: 26602

Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 26605 - Posted: 14 Aug 2009, 17:10:36 UTC - in response to Message 26602.  

The actual due date is: "Finish them as quickly as possible, because the researchers want their results".
The long deadline date is a dummy value to try and prevent BOINC from going into panic mode.

ID: 26605

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5130
United Kingdom
Message 26606 - Posted: 14 Aug 2009, 18:00:15 UTC

Einstein

Update on their temporary front page:

Fri Aug 14 17:01:20 UTC 2009

The first attempt to fix the filesystem failed and the second attempt is
underway. We will have more news in about 36 hours.
ID: 26606

charleys
Joined: 15 Aug 09
Posts: 2
United States
Message 26609 - Posted: 15 Aug 2009, 2:28:31 UTC
Last modified: 15 Aug 2009, 2:29:30 UTC

EINSTEIN@HOME -

So are these sparse and vague little messages on the root index page, days apart, the best they can do?

I understand there is probably a lot of work to do, and too few to do it - who are either not paid, not paid enough or doing this at low priority relative to the tasks they are paid for...

But there are folks out here who also take a great interest in the projects - and to have so little information about what is going on just kinda sucks...

What else can we do to help?
ID: 26609

ZPM
Joined: 14 Mar 09
Posts: 215
United States
Message 26610 - Posted: 15 Aug 2009, 3:03:10 UTC - in response to Message 26609.  

This type of crash has happened before and is an issue which they are working on. Be patient...

All projects have a bad time every once in a while.
ID: 26610

Phil
Joined: 2 May 09
Posts: 9
United Kingdom
Message 26631 - Posted: 16 Aug 2009, 15:46:18 UTC - in response to Message 26609.  

EINSTEIN@HOME -

So are these sparse and vague little messages on the root index page, days apart, the best they can do?


Well, let's see: it's a weekend and instead of doing family stuff they have to spend it in the lab sending out for sandwiches and pizza. And they should sit composing polished HTML as well?

But there are folks out here who also take a great interest in the projects - and to have so little information about what is going on just kinda sucks...

I remember a lot of moaning at SETI@HOME which prompted this reply:

Matt Lebofsky wrote (Jan 25 2001, 12:52 am):

3.03 sucks so much because we cleaned up a bunch of sucking code in this version. We found several cases existed in 3.0 where the clients could process entire workunits without ever touching the suck functions. This was an undetected conflict between the science and suck code, and has been fixed in 3.03.
Now we are efficiently maximizing both science and sucking.
You are able to use 3.0 during random phases when I don't have mandatory upgrade messages turned on. I turn these on whenever I feel like it, and turn them off when 3.0 clients in tight loops load down our server. As of time of writing, you cannot download workunits with 3.0, but you may be able to later. Within a week, no matter what, 3.0 will be dead in the water. We can only accept results from suckless clients for so long.

- Matt - SETI@home



ID: 26631

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.