Thread 'News on Project Outages'

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15569
Netherlands
Message 16641 - Posted: 13 Apr 2008, 21:17:26 UTC - in response to Message 16640.  

Any info about it?

This thread.
ID: 16641 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 16655 - Posted: 14 Apr 2008, 11:34:47 UTC

CPDN model availability

Although the CPDN server status page shows that plenty of seasonal attribution HADAM models are available, these models in fact come from the Seasonal Attribution Project (SAP) server, which is still down for maintenance. Members who want a HADAM model will have to wait two or three days to get one, or can select a different model type in the CPDN preferences of their account.
ID: 16655 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 16692 - Posted: 16 Apr 2008, 23:19:08 UTC

CPDN upload server climateapps3 is down, which means that some computers won't be able to upload trickles and zip files. At the moment we don't know what has caused the problem.

If you encounter this problem, the best idea is to suspend BOINC network activity in the BOINC manager Activity menu. Multi-project crunchers who can't suspend network activity may prefer to suspend their climate models before they reach the next trickle or zip file creation point, until the server problem is resolved. (If you do this, set CPDN to No New Tasks first to avoid getting extra unwanted models.)

CPDN server status page

ID: 16692 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 16693 - Posted: 17 Apr 2008, 11:57:50 UTC

SAP server

This server is, to quote Milo, in 'a bit of a mess' and needs to have its operating system reinstalled (not by one of CPDN's programmers). Milo says that if this is done later today as planned, the SAP server should be up and running again tomorrow.

SAP members still crunching models may have final zip files waiting in their Transfers window to be uploaded. After the first upload attempt is made, these zip files can safely stay in the Transfers window for up to two weeks. Then they time out. At the moment no-one's SAP zip files should be in danger of timing out.
ID: 16693 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 16694 - Posted: 17 Apr 2008, 12:07:14 UTC

CPDN servers

Upload server climateapps3 is again up and running, accepting trickles and zip files.

SAP HADAM models will not be available for CPDN crunchers until the SAP server is fixed (see the above post).
ID: 16694 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15569
Netherlands
Message 16736 - Posted: 19 Apr 2008, 13:39:43 UTC

Predictor is offline due to some emergency repairs to the Chemistry Building. It'll be Monday before it is fixed.
ID: 16736 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 16764 - Posted: 21 Apr 2008, 12:14:33 UTC

Server update

SAP

The SAP server is running again but is under great pressure and may or may not allow trickle and zip file uploads, and may or may not provide HADAM seasonal models for CPDN members who have selected this model option in their CPDN preferences.

At some point this week the SAP server will be shut down for a system upgrade. Milo cannot say when this will be, because it is entirely up to the IT person in AOPP (a different Oxford department). He was planning to do it last Friday...

Until the SAP server is back to normal, SAP members may want to suspend their models and crunch something else. CPDN members unable to get a SAP model could consider selecting a different type of model.

CPDN

* There's a big job running on the database server. It has been running since Saturday morning. This is to do with urgent firefighting that relates to processing data for physicists. This may cause slow response times on the CPDN-BOINC website and forum and slow access to CPDN project web pages (e.g. those showing model details).

* One of the CPDN upload servers, uploadcomlab, is down. Milo shut it down to check what disks are in it (ready for an upgrade), and when he reconnected it one of the power supplies blew. It will return when Milo can order some new disks and a power supply, so this upload server problem could last for several days.

This means that some CPDN members will not be able to upload trickle and zip files. If you encounter this problem, please consider suspending network activity (BOINC manager Activity menu) to avoid repeated failed upload attempts and multiple BOINC messages. We realise that this may be impossible for multi-project crunchers.

You can check CPDN server status here.

Many thanks to all members for your patience, cooperation and good humour.
ID: 16764 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15569
Netherlands
Message 16766 - Posted: 21 Apr 2008, 13:43:29 UTC

Predictor@Home is back on line. The building work seems to have been concluded.
ID: 16766 · Report as offensive
Nicolas

Joined: 19 Jan 07
Posts: 1179
Argentina
Message 16829 - Posted: 24 Apr 2008, 16:16:54 UTC

Cels@Home is closing.
*** ATTENTION - Cels@Home will be shutting down ***

Unfortunately, we are going to have to stop the Cels@Home project.
The project may start up again in the future, but if it does, the new server will have a different url name.

We are going to run one last set of jobs before we stop, however, issuing workunits until about 3 May.
We expect to keep this server running for some time after that, collecting the results of these jobs for analysis, and keeping the message board open.
[...]
I personally will have to stop working on this project after 10 May. Since I have been doing the technical work, this means the project will have to be shut down.

ID: 16829 · Report as offensive
KSMarksPsych
Joined: 30 Oct 05
Posts: 1239
United States
Message 16883 - Posted: 26 Apr 2008, 8:54:25 UTC - in response to Message 16881.  

Seems QMC@Home is entirely off the air, forum, servers, front page.... no work:
26/04/2008 09:48:48|QMC@HOME|Sending scheduler request: To fetch work. Requesting 14865 seconds of work, reporting 0 completed tasks
26/04/2008 09:48:53|QMC@HOME|Scheduler request succeeded: got 0 new tasks
26/04/2008 09:48:53|QMC@HOME|Message from server: Server can't open log file (../log_QAH/scheduler.log)


Should be back up now.

A hardware failure of our new RAID system brought the server down. Now everything is up and running again. I will try to find out more about the error and whether it is likely to occur again.

Have a nice weekend,
Martin

Kathryn :o)
ID: 16883 · Report as offensive
Les Bayliss
Help desk expert

Joined: 25 Nov 05
Posts: 1654
Australia
Message 17152 - Posted: 3 May 2008, 20:44:51 UTC

cpdn

As of about 12 hours ago, the file server uploadcomlab is running.
There were some problems with climateapps2 at about that time, no doubt due to its multi-function role, but this has also been working OK since then.

ID: 17152 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17183 - Posted: 4 May 2008, 15:28:54 UTC

With regard to CPDN upload server uploadcomlab, Milo says
Uploadcomlab has a new power supply but no new disks, so it's back in service to take some more uploads over the weekend. It will go down again for new disks when the missing part of my order actually turns up (some PCI SATA adapter cards).

Because there are several CPDN upload servers, this uploading problem only affected some CPDN crunchers. If you turned off BOINC network activity because your trickles and zip files couldn't upload, please enable network activity again now.

Milo also says he now expects the SAP server to remain problem-free.
ID: 17183 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15569
Netherlands
Message 17267 - Posted: 9 May 2008, 11:14:59 UTC

Cosmology news:

May 8, 2008 Project down later on today
Project will be going down later on today. Since we'll be doing OS updates, the web server will also be down.

May 9, 2008 Maintenance Updates
The website is mostly up right now, but I'm still ironing out some bugs having to do with logging in. The project daemons seem to be up and running but I haven't verified that they are working properly yet. More updates to come.

May 9, 2008 Update
After much shenanigans, logging in is now working again, so the message board is up. Still working on the rest.

May 9, 2008 Break for a while
Still having problems with the validator. After I grab some shut-eye, I'll work on it some more (it's only permission problems).
ID: 17267 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15569
Netherlands
Message 17285 - Posted: 9 May 2008, 22:55:26 UTC

Cosmology News update:

May 9, 2008 Another update
The project daemons seem to be mostly working at this point. There are still some problems with scheduling, though, and there are *far* too many WUs in the queue due to the population script not being able to get queries out in time, so we're going to wait a while to see if it settles. However, it looks like the database is doing *much* better and there are no hanging queries. I'll keep you posted on changes.
ID: 17285 · Report as offensive
7cures

Joined: 10 May 08
Posts: 1
United States
Message 17290 - Posted: 10 May 2008, 17:01:43 UTC - in response to Message 17285.  

Cosmology News update:

May 9, 2008 Another update
The project daemons seem to be mostly working at this point. There are still some problems with scheduling, though, and there are *far* too many WUs in the queue due to the population script not being able to get queries out in time, so we're going to wait a while to see if it settles. However, it looks like the database is doing *much* better and there are no hanging queries. I'll keep you posted on changes.


Hey. The WCG website is down. I was last able to post on their forum, though with problems (hang-ups, freezes, etc.), and now I can't even log in.
ID: 17290 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17300 - Posted: 11 May 2008, 14:55:06 UTC

CPDN main project

CPDN upload server phkup is down. Milo says
That server is down because the filesystem containing the cpdn data was corrupted when the admin upgraded the OS. They are now restoring from backups and say it should be back on-line tomorrow.

This will affect the upload of some computers' trickles and zip files. If you can, suspend BOINC network activity while the problem lasts.
ID: 17300 · Report as offensive
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15569
Netherlands
Message 17301 - Posted: 11 May 2008, 19:03:49 UTC

Cosmology@Home update:

May 10, 2008 Database and Bandwidth errors
After more work, the source of a number of the database and bandwidth errors was located and fixed. Please report if the situation pops up again.

May 11, 2008 Work generator back online
It looks like work is flowing a bit better now, so I'm going to restart the work generator and see what happens.


ID: 17301 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17304 - Posted: 12 May 2008, 19:33:27 UTC

CPDN main project

The phkup upload server is up and running again, but upload server uploadcomlab is down. Milo is backing up its data today so it can have its new disk installed tomorrow. This means that some different CPDN computers are now unable to upload trickles and zip files. If you find that your computer can't upload to CPDN, if possible please suspend BOINC network activity for the time being.

SAP project

The SAP server has been problematic for several days. SAP crunchers have not been able to upload trickles and files. As SAP models are provided for CPDN crunchers from this server, there may also be difficulty in downloading new SAP HADAM models. Milo will investigate this server tomorrow.
ID: 17304 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17348 - Posted: 15 May 2008, 6:59:33 UTC


CPDN main project

Upload server uploadcomlab is now working.

There was no CPDN credit export to the stats sites on 12 and 13 May, but on 14 May the credit export started again.

ID: 17348 · Report as offensive
mo.v
Joined: 13 Aug 06
Posts: 778
United Kingdom
Message 17355 - Posted: 15 May 2008, 23:12:34 UTC

SAP project and SAP HADAM models from the CPDN main project

There is a temporary but serious problem with the SAP server. CPDN SAP-HADAM models come from this server. When SAP models contact the server they will produce BOINC manager messages like this or similar:

15/05/2008 10:35:34|CPDN Seasonal Attribution Project|Fetching scheduler list
15/05/2008 10:35:39|CPDN Seasonal Attribution Project|Master file download succeeded
15/05/2008 10:35:45|CPDN Seasonal Attribution Project|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
15/05/2008 10:35:50|climateprediction.net|Scheduler request succeeded: got 0 new tasks
15/05/2008 10:35:50|climateprediction.net|You used the wrong URL for this project
15/05/2008 10:35:50|climateprediction.net|The correct URL is http://climateprediction.net/
15/05/2008 10:35:50|climateprediction.net|You seem to be attached to this project twice
15/05/2008 10:35:50|climateprediction.net|We suggest that you detach projects named climateprediction.net,
15/05/2008 10:35:50|climateprediction.net|then reattach to http://climateprediction.net/
15/05/2008 10:35:50|climateprediction.net|Already attached to a project named climateprediction.net (possibly with wrong URL)
15/05/2008 10:35:50|climateprediction.net|Consider detaching this project, then trying again
15/05/2008 10:35:50|climateprediction.net|Message from server: Invalid or missing account key. Visit this project's web site to get an account key.

Please do not detach! You have not really got an extra account and you are not really attached twice.
Please do not reset the SAP or CPDN project!
Please do not abort your SAP HADAM models from CPDN or SAP!
CPDN needs your SAP models - only last week a new experiment using these HADAM models was launched!
Please suspend your SAP HADAM models for the time being.
If you are already getting messages like those quoted above, please suspend BOINC network activity, if possible, until the problem is solved.
Milo will be doing everything he can to resolve the problem on Friday.
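
For anyone who would rather check a saved copy of their message log than read it by eye, here is a minimal Python sketch. It assumes the log lines use the `date time|project|message` layout of the messages quoted above. The names `LogEntry`, `parse_line`, `misattach_warnings` and the `MISATTACH_HINTS` phrase list are made up for illustration and are not part of BOINC.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: datetime
    project: str
    message: str


# Phrases copied from the messages quoted above. The list is an assumption
# made for illustration, not an official BOINC catalogue of errors.
MISATTACH_HINTS = (
    "You used the wrong URL for this project",
    "You seem to be attached to this project twice",
    "Invalid or missing account key",
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one 'DD/MM/YYYY HH:MM:SS|project|message' line, or return None."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    try:
        stamp = datetime.strptime(parts[0], "%d/%m/%Y %H:%M:%S")
    except ValueError:
        return None
    return LogEntry(stamp, parts[1], parts[2])


def misattach_warnings(lines):
    """Return the parsed entries whose message contains a hint phrase."""
    entries = (parse_line(line) for line in lines)
    return [
        e for e in entries
        if e is not None and any(h in e.message for h in MISATTACH_HINTS)
    ]


sample = [
    "15/05/2008 10:35:45|CPDN Seasonal Attribution Project|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks",
    "15/05/2008 10:35:50|climateprediction.net|You used the wrong URL for this project",
    "15/05/2008 10:35:50|climateprediction.net|Message from server: Invalid or missing account key. Visit this project's web site to get an account key.",
]

for entry in misattach_warnings(sample):
    print(entry.timestamp.isoformat(), entry.project, entry.message, sep=" | ")
```

If this flags any lines, the advice above still applies: do not detach, reset or abort; just suspend the models or network activity until the server is fixed.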
ID: 17355 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.