Thread 'News on Project Outages'

Message boards : Projects : News on Project Outages

Byron Leigh Hatch @ team Carl ...
Joined: 30 Aug 05
Posts: 505
Canada
Message 45501 - Posted: 31 Aug 2012, 17:11:26 UTC

From the CPDN - News and Announcements thread:

The hard disk running the operating system on upload server uploader1.atm has failed.
A replacement disk has been ordered and will be installed when it arrives on Monday.
Until that has been completed all uploads to that server (primarily the end of month uploads for most HadAM3P EU tasks) will fail.

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=5447&nowrap=true#44782
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7449&nowrap=true#44783

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 45998 - Posted: 15 Oct 2012, 20:20:41 UTC
Last modified: 15 Oct 2012, 20:25:01 UTC

Looks like Einstein just took a nosedive. Even http://www.downforeveryoneorjustme.com reports it as down.

And they're back, slow but back.

Bernd
Joined: 24 Aug 09
Posts: 91
United States
Message 46049 - Posted: 19 Oct 2012, 15:41:51 UTC

CPDN has been having upload problems for the last 48 hours, and I have seen no posts about it on the CPDN website forums. Does anyone know what is going on with them?

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 46050 - Posted: 19 Oct 2012, 16:09:45 UTC
Last modified: 19 Oct 2012, 16:10:21 UTC

CPDN Main Project

The Server Status page shows one of the upload servers out of action. The upload problem is primarily affecting Hadam EU (European) models, but files from other regional areas (SAF and PNW) and Hadcm global models may also be included in the transfer backoffs imposed by BOINC.

There is also some evidence that some (not all) files of some Hadam EU models cannot download and become stuck in the Transfers tab. These models cannot begin to run, but do not abort them as we hope that their final file downloads will complete when the servers are all fully functional again.

Please do not try to obtain new CPDN models for your computer by pressing the Update button in the Projects tab: almost no new models are available, and our computers are only allowed to contact the CPDN server once per hour. Pressing the Retry now button in the Transfers tab will probably not make uploads happen any faster.

Thank you to all CPDN members for your patience.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 46103 - Posted: 24 Oct 2012, 14:52:58 UTC

Seti and Seti Beta forgot to pay their electricity bill?

WezH
Joined: 1 Oct 12
Posts: 90
Finland
Message 46105 - Posted: 24 Oct 2012, 15:24:32 UTC - in response to Message 46103.  

Seti and Seti Beta forgot to pay their electricity bill?


It seems that BOINC master database 'carolyn' is down.

Unable to connect to database - please try again later
Error: 2003 Can't connect to MySQL server on 'carolyn' (111)
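
For readers puzzling over the trailing (111): in MySQL client error 2003, the number in parentheses is the operating-system error code from the failed connection attempt. Below is a minimal Python sketch for looking such a code up. The helper name describe_os_errno is made up for illustration, and the numbering is platform-dependent: on Linux, 111 is ECONNREFUSED, meaning nothing accepted the connection on the database port.

```python
import errno
import os


def describe_os_errno(code: int) -> str:
    """Return the symbolic name and message for an OS error number.

    Hypothetical helper for illustration; it only consults the local
    platform's errno table, so results differ between operating systems.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux this is expected to name ECONNREFUSED.
    print(describe_os_errno(111))
```

A refused connection is consistent with the database server process being stopped while the machine itself is still reachable, though the error alone does not prove that.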


Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 46111 - Posted: 24 Oct 2012, 20:10:21 UTC - in response to Message 46105.  
Last modified: 24 Oct 2012, 20:12:58 UTC

And they're back.

Edit: The forums at least. BOINC still ain't able to report work. Not weird with the master database still disabled.

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 46115 - Posted: 25 Oct 2012, 10:01:05 UTC

CPDN main project

Andy says that CPDN researcher Neil Massey is preparing a new batch of workunits but he is not yet sure exactly when they'll be released.

Jonathan is freeing up space on uploader1.atm so that uploads can proceed. The amount of data involved is enormous, which is why the job takes time. Until Jonathan enables this server again, the uploads assigned to it cannot proceed and our files will remain stuck in the Transfers tab. You may find that your _13 files do upload because they are assigned to a different server.

The waiting files are allowed to remain in Transfers for weeks, so we do not need to worry about them. They are allowed a total of 100 upload attempts. This number is more than enough even if the wait is long, but please do not use up your 100 attempts by repeatedly pressing the Retry now button. Similarly, pressing the Update button in the Projects tab cannot make more models download if none are available.

Check the Server status page for the condition of the servers and the availability of models. It is a good idea to subscribe to the News thread to receive email notifications of new announcements.

Thanks to all members for your patience.

Christoph
Joined: 16 Feb 12
Posts: 5
Germany
Message 46136 - Posted: 28 Oct 2012, 6:31:39 UTC

T4T is offline. No idea why.

mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 46170 - Posted: 31 Oct 2012, 1:11:03 UTC

CPDN main project

The server problems are gradually being sorted out.

Download server climateapps2 filled up last week. Among other problems, this caused the credit script to fail, with the result that trickles uploaded from 24 October have not yet received their credits. Jonathan has fixed this, so the script should start to run again tomorrow, Wednesday. Our credits will not, however, appear on the stats sites immediately, as they first have to be exported by CPDN.


Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 46328 - Posted: 17 Nov 2012, 0:06:55 UTC

Einstein@Home/Albert@Home pre-outage warning:
There is a planned power outage on the UWM campus affecting the physics building on Sat Nov 17, lasting from 7 AM - 2 PM local time (should be 1 PM - 8 PM UTC).

According to current plans the network connection to the outside world will be available, so we'll keep the core project servers running on UPS for that time, and ideally you here on Einstein@Home shouldn't notice anything (Albert@Home will be shut down, though). Not all may actually go according to plans, though.
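
As a quick check of the local-to-UTC conversion quoted above, here is a small Python sketch. It assumes that UWM local time on 17 Nov 2012 is US Central Standard Time (UTC-6); the announcement does not name the time zone, and the helper to_utc is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: local time is US Central Standard Time, a fixed UTC-6 offset.
CST = timezone(timedelta(hours=-6), "CST")


def to_utc(year: int, month: int, day: int, hour: int, tz: timezone = CST) -> datetime:
    """Interpret a wall-clock hour in the given zone and convert it to UTC."""
    return datetime(year, month, day, hour, tzinfo=tz).astimezone(timezone.utc)


start = to_utc(2012, 11, 17, 7)   # 7 AM local
end = to_utc(2012, 11, 17, 14)    # 2 PM local

if __name__ == "__main__":
    print(start.strftime("%H:%M"), "to", end.strftime("%H:%M"), "UTC")
```

Under that assumption the window works out to 13:00 to 20:00 UTC, which matches the "1 PM - 8 PM UTC" in the announcement.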

Bernd Machenschalk
Joined: 7 Sep 05
Posts: 15
Message 46333 - Posted: 17 Nov 2012, 13:39:16 UTC - in response to Message 46328.  
Last modified: 17 Nov 2012, 13:41:24 UTC

Einstein@Home/Albert@Home pre-outage warning:
There is a planned power outage on the UWM campus affecting the physics building on Sat Nov 17, lasting from 7 AM - 2 PM local time (should be 1 PM - 8 PM UTC).

According to current plans the network connection to the outside world will be available, so we'll keep the core project servers running on UPS for that time, and ideally you here on Einstein@Home shouldn't notice anything (Albert@Home will be shut down, though). Not all may actually go according to plans, though.


The UPS failed. The project will be shut down immediately.

BM

Edit: Keep fingers crossed that it will come up again without problems.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 46356 - Posted: 19 Nov 2012, 12:06:37 UTC
Last modified: 29 Nov 2012, 12:01:27 UTC

Seti:

Continued server problems.
We're continuing to have issues due to a database problem early last week and a botched attempt to fix it.

The problem is that the result and host tables in the database have grown large enough, and hosts have gotten fast enough, that the lookup of results in process for a host and the enumeration of new results to send don't finish before the web connection times out on either the server or the client side. This resulted in hosts being assigned large numbers of results to compute without the transaction that tells them about these results being completed. The host, thinking it had received no results, would then contact the server for more results, which it would again not receive.

This isn't a hardware problem. The database currently fits in memory and the processors are fast. We've just crossed a threshold where each host computes fast enough that host queues and the result table have become large enough to cause this problem. To solve it, we've put per host limits on results in process back in place. But hosts that are having this problem will probably continue to have it until the average number of results per host has fallen to a workable level. That could take weeks.

For a more permanent fix, we plan to do more work in each result by quadrupling the size of the workunits. But that fix will probably take months to implement and test.
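
The runaway assignment described above, and the effect of a per-host limit, can be illustrated with a toy model. This is only a sketch under simplifying assumptions and not the actual BOINC scheduler code: the function simulate, its parameters, and the batch and limit numbers are all invented for illustration, and the model treats every reply as lost rather than modelling query time.

```python
from typing import Optional, Tuple


def simulate(requests: int, batch: int,
             per_host_limit: Optional[int] = None,
             reply_lost: bool = True) -> Tuple[int, int]:
    """Toy model of one host repeatedly asking a server for work.

    Returns (results the server has assigned to the host,
             results the host actually received).
    """
    assigned = 0   # what the database says the host holds
    received = 0   # what the host really got
    for _ in range(requests):
        grant = batch
        if per_host_limit is not None:
            # Cap the grant so the assigned total never exceeds the limit.
            grant = max(0, min(batch, per_host_limit - assigned))
        assigned += grant
        if not reply_lost:
            received += grant
        # If the reply is lost, the host still has nothing and asks again.
    return assigned, received


if __name__ == "__main__":
    print("no limit:  ", simulate(requests=10, batch=20))
    print("limit 100: ", simulate(requests=10, batch=20, per_host_limit=100))
```

With no limit, ten lost replies leave the host with 200 results assigned in the database and none received; with a limit of 100 the assigned total stops growing once it reaches the cap. The sketch does not capture why smaller per-host queues also make the lookups finish in time, which is the other half of the fix described in the post.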

Gary Charpentier
Joined: 23 Feb 08
Posts: 2495
United States
Message 46446 - Posted: 23 Nov 2012, 19:28:15 UTC
Last modified: 23 Nov 2012, 19:28:38 UTC

Seti:

Scheduler crashes have continued, so we'll be down until we've isolated and solved the problem.

Blurf
Joined: 18 Jul 11
Posts: 217
United States
Message 46447 - Posted: 23 Nov 2012, 20:42:48 UTC - in response to Message 46446.  

Seti:

Scheduler crashes have continued, so we'll be down until we've isolated and solved the problem.


I would expect it to be down through the weekend.

BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 46469 - Posted: 25 Nov 2012, 15:25:51 UTC

Moowrap has been down -- no access to home page since early Saturday -- perhaps we will see it live during the coming week.

BarryAZ
Joined: 4 Sep 09
Posts: 381
United States
Message 46490 - Posted: 28 Nov 2012, 17:21:57 UTC - in response to Message 46469.  

Moowrap came back up late Sunday night, no explanation, no user comment. It went back down again early Wednesday morning -- no explanation. Still offline. I suspect the best approach is to suspend processing and set no new work. When it does come back up (I'm hopeful that happens), I'll let my existing work upload and report and process existing queues, but let things drain out until the folks back at the project can provide some idea of what we can expect in the coming weeks.



Moowrap has been down -- no access to home page since early Saturday -- perhaps we will see it live during the coming week.


TJ
Joined: 17 Oct 09
Posts: 90
Netherlands
Message 46492 - Posted: 29 Nov 2012, 11:59:24 UTC
Last modified: 29 Nov 2012, 11:59:45 UTC

Am I the only one with problems reaching Rosetta?

Can't upload, get new work, report, or go to the main page.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15567
Netherlands
Message 46493 - Posted: 29 Nov 2012, 12:01:17 UTC - in response to Message 46492.  

Firefox can't find the server at boinc.bakerlab.org

Says all, I think. ;)

TJ
Joined: 17 Oct 09
Posts: 90
Netherlands
Message 46495 - Posted: 29 Nov 2012, 13:03:32 UTC - in response to Message 46493.  

Firefox can't find the server at boinc.bakerlab.org

Says all, I think. ;)


Just to be sure that it is not at my end...

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.