Thread 'News on Project Outages'

Message boards : Projects : News on Project Outages

frank
Joined: 21 Nov 10
Posts: 3
United States
Message 50104 - Posted: 7 Aug 2013, 0:22:38 UTC - in response to Message 50094.  

flashawk: the electrical work was supposed to be complete by Monday...then the system re-build/repair work would start...don't forget Rule 1 of IT: everything takes longer than it takes !!!
ID: 50104
JIM
Joined: 19 Sep 10
Posts: 24
United States
Message 50115 - Posted: 7 Aug 2013, 14:02:54 UTC

CPDN was down last Friday even before they began the scheduled test of the electrical system at OERC. As you have pointed out they had a broken RAID last week that took several days to fix. They probably couldn’t even work on it over the weekend. I think that Weather@home works off the same servers so it too would be down.

I too have WUs ready to report. At least the zip files are uploading properly.

ID: 50115
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 50116 - Posted: 7 Aug 2013, 14:54:32 UTC - in response to Message 50115.  

At least the zip files are uploading properly...

They are? Lucky you... :-) I've been getting the dreaded "Internet access OK - project servers may be temporarily down" message for the last 48-hours plus. :-(
ID: 50116
Blurf
Joined: 18 Jul 11
Posts: 217
United States
Message 50123 - Posted: 7 Aug 2013, 21:24:04 UTC

Seti forum appears to be down...
ID: 50123
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5124
United Kingdom
Message 50124 - Posted: 7 Aug 2013, 21:27:12 UTC - in response to Message 50123.  

Seti forum appears to be down...

There appears to be some maintenance work scheduled on backup servers at the CoLo facility.

http://systemstatus.berkeley.edu/ (CMR: 2278)
ID: 50124
[CSF] Thomas H.V. Dupont
Joined: 30 May 12
Posts: 356
France
Message 50125 - Posted: 7 Aug 2013, 21:30:56 UTC - in response to Message 50124.  

Thanks Richard ! :)
I'll post this in the SETI "is down" cafe.
ID: 50125
Gary Charpentier
Joined: 23 Feb 08
Posts: 2491
United States
Message 50130 - Posted: 7 Aug 2013, 23:07:18 UTC

Looks like Seti is back up.
ID: 50130
Gary Charpentier
Joined: 23 Feb 08
Posts: 2491
United States
Message 50133 - Posted: 8 Aug 2013, 0:29:26 UTC

Looks like they crashed again.
ID: 50133
JIM
Joined: 19 Sep 10
Posts: 24
United States
Message 50139 - Posted: 8 Aug 2013, 16:31:24 UTC - in response to Message 50116.  
Last modified: 8 Aug 2013, 16:32:08 UTC

At least the zip files are uploading properly...

They are? Lucky you... :-) I've been getting the dreaded "Internet access OK - project servers may be temporarily down" message for the last 48-hours plus. :-(


Yes, one of my hadcm3n WUs finished overnight and the zip files uploaded just fine. Of course, the WU won’t be able to report until the outage is over. That makes 2 that I have sitting on my machines that are “ready to report”. Hopefully they will get it sorted out soon, as I now have a vacant core that I am running MalariaControl on until I can get new work from CP.
ID: 50139
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 50140 - Posted: 8 Aug 2013, 17:48:42 UTC

Andy Bowery, system admin CPDN wrote:
Hi All,

So you will see that we have the majority of services restored now! climateapps2 has now been replaced completely. However, I am aware of a number of issues which still need to be solved. These are:

- cpdn_restarts is producing error messages
- server_status is not running
- project stats page is reporting an error
- uploader.oerc not available

I am aiming to get these solved by some time tomorrow. In the meantime please let me know of any other issues.

Thanks,

Andy
ID: 50140
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 50144 - Posted: 8 Aug 2013, 18:32:27 UTC - in response to Message 50140.  

Andy Bowery, system admin CPDN wrote:
So you will see that we have the majority of services restored now!

Unfortunately, I cannot see that... :-( I don't seem to be able to access any parts of the site that I used to be able to access: climateprediction.net, my account page, the forums, etc. The only response I get in the BOINC manager to update requests is:

8/8/2013 2:27:45 PM update requested by user
8/8/2013 2:27:46 PM Fetching scheduler list
8/8/2013 2:28:11 PM Project communication failed: attempting access to reference site
8/8/2013 2:28:13 PM Internet access OK - project servers may be temporarily down.

This is the same on all hosts running on three different networks.

Are these parts of the project that still aren't back? Did something else change that I might have missed? Or could it be that there's so much traffic now that not everybody is getting through?


ID: 50144
MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 50145 - Posted: 8 Aug 2013, 19:08:59 UTC - in response to Message 50144.  
Last modified: 8 Aug 2013, 19:18:41 UTC

... Unfortunately, I cannot see that... :-( I don't seem to be able to access any parts of the site that I used to be able to access: climateprediction.net, my account page, the forums, etc. ...


Yes, doesn't look like it stayed up for very long. I'm not altogether surprised, they had to unexpectedly move everything onto a brand-new server because the original one was falling to pieces. The new configuration probably isn't quite right yet (CPDN has a lot of customisations on top of the standard version of Boinc). I have to confess, I will be pleased when everything is up again because I only have one model remaining. My money would be on tomorrow.
ID: 50145
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 50148 - Posted: 8 Aug 2013, 19:38:40 UTC - in response to Message 50145.  

Yes, doesn't look like it stayed up for very long. I'm not altogether surprised, they had to unexpectedly move everything onto a brand-new server because the original one was falling to pieces. The new configuration probably isn't quite right yet (CPDN has a lot of customisations on top of the standard version of Boinc). I have to confess, I will be pleased when everything is up again because I only have one model remaining. My money would be on tomorrow.

Thanks, Mike.

I'm sure it hasn't been easy for them and hopefully some stability and reliability will be reached soon. I'll keep crunching my models and take another dose of patience pills... ;-)

MarkR
ID: 50148
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15552
Netherlands
Message 50151 - Posted: 8 Aug 2013, 21:53:34 UTC - in response to Message 50145.  

It could also be a DNS problem, where we just have to wait until the DNS record has propagated again to all the DNS servers out there before we can see the site or upload to it.
ID: 50151
JIM
Joined: 19 Sep 10
Posts: 24
United States
Message 50153 - Posted: 9 Aug 2013, 7:20:25 UTC

It is now early Friday morning in the U.K. If they don’t get the problems fixed by end of business today does that mean that they will be shut out of the server room at OERC until Monday morning?

ID: 50153
MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 50155 - Posted: 9 Aug 2013, 15:07:46 UTC
Last modified: 9 Aug 2013, 15:12:06 UTC

Just had an update from the CPDN admins - climateprediction.net is up & accessible, but while climateapps2 (the key boinc server) is now up, it does not seem to be accessible outside their local network. They are trying to figure out the problem, but at least we know that there are signs of life!!
ID: 50155
MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 50158 - Posted: 9 Aug 2013, 16:59:41 UTC

A further update - the firewall is now fixed, and should be letting through connections to climateapps2. But the DNS settings for the server need to propagate through the internet (allow up to a day for this to happen everywhere).


ID: 50158
ritterm
Joined: 4 Jul 08
Posts: 82
United States
Message 50160 - Posted: 9 Aug 2013, 17:29:42 UTC

Thanks for the updates, Mike. I can get to climateprediction.net but am now getting "403 Forbidden" trying to get anywhere on climateapps2. Is that DNS related? (No sarcasm intended...I honestly don't know).
ID: 50160
MikeMarsUK
Joined: 16 Apr 06
Posts: 386
United Kingdom
Message 50161 - Posted: 9 Aug 2013, 19:51:55 UTC - in response to Message 50160.  

Nope, it means that you are getting through to the server (hence the DNS must be OK), but the web server itself is refusing to serve the pages. Jord has let the admins know.
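
For anyone who wants to tell a DNS problem apart from a 403 on their own machine, here is a rough Python sketch using only the standard library. It is an illustration, not anything the CPDN admins use: the function names `diagnose` and `probe` are made up here, and the host you pass in is whatever name you are trying to reach. A failed name lookup points at DNS; a 403 means the name resolved and a web server answered, so DNS is not the culprit.

```python
import socket
import urllib.error
import urllib.request
from typing import Optional


def diagnose(resolved: bool, http_status: Optional[int]) -> str:
    """Map the outcome of a name lookup plus an HTTP request to a rough diagnosis."""
    if not resolved:
        return "DNS: name did not resolve (propagation may still be in progress)"
    if http_status is None:
        return "network: name resolved but no HTTP response (firewall or server down)"
    if http_status == 403:
        return "server: reached the web server, but it refused the request (403)"
    if 200 <= http_status < 400:
        return "ok"
    return f"server: reached the web server, HTTP status {http_status}"


def probe(host: str, timeout: float = 10.0) -> str:
    """Resolve `host`, then fetch http://host/ and classify what happened."""
    try:
        socket.getaddrinfo(host, 80)
    except socket.gaierror:
        return diagnose(False, None)
    try:
        with urllib.request.urlopen(f"http://{host}/", timeout=timeout) as resp:
            return diagnose(True, resp.status)
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code (e.g. 403).
        return diagnose(True, err.code)
    except (urllib.error.URLError, OSError):
        # Resolved, but the connection failed or timed out.
        return diagnose(True, None)
```

Calling `probe("example.org")` (a placeholder host) returns one of the strings above. Note that a successful lookup on your machine says nothing about whether other people's DNS servers have picked up the change yet.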


ID: 50161
Saenger
Joined: 9 Nov 05
Posts: 123
Germany
Message 50180 - Posted: 11 Aug 2013, 16:39:15 UTC
Last modified: 11 Aug 2013, 16:41:57 UTC

Docking has had some failure; their home page looks like this:
Warning: session_start() [function.session-start]: open(/var/lib/php/session/sess_mtr1qu1l7jbg8tlivr4f3lg6c4, O_RDWR) failed: Read-only file system (30) in /boinc/projects/docking/html_v2/project/project.inc on line 32

Warning: mysql_pconnect() [function.mysql-pconnect]: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111) in /boinc/projects/docking/html_v2/inc/db.inc on line 23
Unable to connect to database - please try again later Error: 2002Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)
Warning: Unknown: open(/var/lib/php/session/sess_mtr1qu1l7jbg8tlivr4f3lg6c4, O_RDWR) failed: Read-only file system (30) in Unknown on line 0

Warning: Unknown: Failed to write session data (files). Please verify that the current setting of session.save_path is correct (/var/lib/php/session) in Unknown on line 0


And my account page says this:
Forbidden

You don't have permission to access /home.php on this server.
Apache/2.2.6 (Fedora) Server at docking.cis.udel.edu Port 80


And BOINC says:
So 11 Aug 2013 16:44:26 CEST | Docking | [fxd] starting upload, upload_offset -1
So 11 Aug 2013 16:44:26 CEST | Docking | Started upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_0
So 11 Aug 2013 16:44:26 CEST | Docking | [file_xfer] URL: http://docking.cis.udel.edu/docking_cgi/file_upload_handler
So 11 Aug 2013 16:44:26 CEST | Docking | [fxd] starting upload, upload_offset -1
So 11 Aug 2013 16:44:26 CEST | Docking | Started upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_1
So 11 Aug 2013 16:44:26 CEST | Docking | [file_xfer] URL: http://docking.cis.udel.edu/docking_cgi/file_upload_handler
So 11 Aug 2013 16:44:26 CEST | Docking | [fxd] starting upload, upload_offset 0
So 11 Aug 2013 16:44:26 CEST | Docking | Started upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_2
So 11 Aug 2013 16:44:26 CEST | Docking | [file_xfer] URL: http://docking.cis.udel.edu/docking_cgi/file_upload_handler
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] http op done; retval 0 (Success)
So 11 Aug 2013 16:44:27 CEST | Docking | [error] Error reported by file upload server: can't open log file
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] parsing upload response: <data_server_reply>    <status>1</status>    <message>can't open log file</message></data_server_reply>
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] parsing status: -127
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] http op done; retval 0 (Success)
So 11 Aug 2013 16:44:27 CEST | Docking | [error] Error reported by file upload server: can't open log file
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] parsing upload response: <data_server_reply>    <status>1</status>    <message>can't open log file</message></data_server_reply>
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] parsing status: -127
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] http op done; retval 0 (Success)
So 11 Aug 2013 16:44:27 CEST | Docking | [error] Error reported by file upload server: can't open log file
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] parsing upload response: <data_server_reply>    <status>1</status>    <message>can't open log file</message></data_server_reply>
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] parsing status: -127
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] file transfer status -127 (transient upload error)
So 11 Aug 2013 16:44:27 CEST | Docking | Temporarily failed upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_0: transient upload error
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] project-wide xfer delay for 18296.980512 sec
So 11 Aug 2013 16:44:27 CEST | Docking | Backing off 1 hr 6 min 47 sec on upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_0
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] file transfer status -127 (transient upload error)
So 11 Aug 2013 16:44:27 CEST | Docking | Temporarily failed upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_1: transient upload error
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] project-wide xfer delay for 14446.881880 sec
So 11 Aug 2013 16:44:27 CEST | Docking | Backing off 2 hr 0 min 20 sec on upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_1
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] file transfer status -127 (transient upload error)
So 11 Aug 2013 16:44:27 CEST | Docking | Temporarily failed upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_2: transient upload error
So 11 Aug 2013 16:44:27 CEST | Docking | [file_xfer] project-wide xfer delay for 14269.344979 sec
So 11 Aug 2013 16:44:27 CEST | Docking | Backing off 1 hr 14 min 32 sec on upload of 1m0b1hsg_mod0014crossdockinghiv1_119842_387017_0_2
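
The `<data_server_reply>` fragments in the log above carry the upload handler's status and message. As a minimal sketch, assuming replies always have the shape seen in these lines, they can be pulled out of a log line in Python like this (`parse_upload_reply` is a hypothetical helper, not part of BOINC):

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_upload_reply(log_line: str) -> Optional[Tuple[int, str]]:
    """Return (status, message) from a <data_server_reply> fragment, or None if absent."""
    match = re.search(
        r"<data_server_reply>.*?</data_server_reply>", log_line, re.DOTALL
    )
    if match is None:
        return None
    root = ET.fromstring(match.group(0))
    # Assumption: <status> holds an integer and <message> holds free text.
    status = int(root.findtext("status", default="0"))
    message = root.findtext("message", default="")
    return status, message
```

Fed one of the "parsing upload response" lines from the log, it gives back the status `1` and the message `can't open log file`, which fits the read-only file system errors shown on the project's web pages.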

Greetings from Saenger

For questions about Boinc look in the BOINC-Wiki
ID: 50180

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.