Posts by Thyme Lawn

21) Message boards : Projects : News on Project Outages (Message 50608)
Posted 23 Sep 2013 by Thyme Lawn
Post:
CPDN main project

The CPDN BOINC webpages are back up and trickles are being accepted again.

Jonathan says the 403 access forbidden problem was caused by a failure to mount the project's NFS partitions after an unexpected reboot at 0300 UTC yesterday:

The servers running on our VM infrastructure seem to have rebooted at 4 am BST on 22 Sept.

The main webserver failed to re-mount its NFS partitions, upon which the website resides.

This is now fixed, and I am making investigations.
22) Message boards : Projects : News on Project Outages (Message 46707)
Posted 10 Dec 2012 by Thyme Lawn
Post:
climateapps2 at Oxford has been down since some time on Saturday, so climateprediction.net is not available.

I've just received the following update from Jonathan:

On Saturday there was a power outage in Oxford, that took out OeRC's machine room.

Currently CPDN is completely offline. ClimateApps2, the database, and the webserver are all down.

Lots of things in the machine room did not come back on properly, and so I took this opportunity to work on some of the infrastructure here.
I am in the process of moving our critical servers to a cabinet that is protected by a UPS, and routing the networking through proper switches rather than the current series of daisy-chained 8 port hubs.

I should have it up again tomorrow.
23) Message boards : Questions and problems : Microsoft update SNAFU (Message 44260)
Posted 22 May 2012 by Thyme Lawn
Post:
Microsoft have fixed the detection problem which was causing this (see here).
24) Message boards : Questions and problems : Microsoft update SNAFU (Message 44257)
Posted 22 May 2012 by Thyme Lawn
Post:
There's an ongoing problem with .NET Framework 2.0 and 3.5 updates for XP and Server 2003 which causes 3 security updates (KB2518864, KB2572073 and KB2633880) to be repeatedly installed. The problem started some time today (possibly at around midnight EST).

If you have opted to install updates automatically (the default) it's possible the updates are being repeatedly applied in the background. That's not happening with my XP system as I have that option set to "Download updates for me, but let me choose when to install them" (I get a notification that they need to be installed soon after successfully installing them). You can change the option in Start - Settings - Control Panel - Automatic Updates.

According to my update history KB2633880 was included in the February 2012 set of updates, KB2572073 in the October 2011 set and KB2518864 in the June 2011 set. It also shows they've all been successfully installed 5 times today.

There are loads of threads about this on the Microsoft support forums (and elsewhere), including windows update keep installing "kb2518864"/"kb2572073"/"kb2633880" repeatedly

Apparently Microsoft have acknowledged there's a problem and have promised a fix. The 4th reply in the thread I've linked is a sledgehammer solution from a M$ support engineer (it uninstalls all of the .NET Framework versions and reinstalls them from scratch), but I'm not going to take that route unless Microsoft officially announce it can't be fixed in any other way.
25) Message boards : Projects : News on Project Outages (Message 37482)
Posted 13 Apr 2011 by Thyme Lawn
Post:
WUProp has had an unannounced outage since 0500 UTC this morning.

It's back up now. Nothing posted about the outage yet.
26) Message boards : Projects : News on Project Outages (Message 37131)
Posted 9 Mar 2011 by Thyme Lawn
Post:
CPDN main project

We are fairly sure we have identified the problem which resulted in 5,429 CPDN users losing varying amounts of credit after the database work. Jonathan is working to fix this.

Jonathan and Milo are still working to make more space available on climateapps1 to allow completion of stalled uploads of HadAM3P regional restart dumps (the *_13.zip files).
27) Message boards : Projects : News on Project Outages (Message 37114)
Posted 8 Mar 2011 by Thyme Lawn
Post:
CPDN main project

The scheduler has been restarted and it is now possible to upload trickles, report completed tasks, request new work, create new accounts and attach new computers.

It is still not possible to upload the final file generated by HadAM3P regional tasks (*_13.zip) as climateapps1.oucs.ox.ac.uk is currently out of disk space. Jonathan is working to make more space available.

A significant number of users are currently affected by credit anomalies, mostly with credits below the level calculated before the database maintenance started. We've been here after previous major periods of database maintenance. As before these credit problems will be resolved as a background task by the project team.
28) Message boards : Projects : News on Project Outages (Message 37089)
Posted 5 Mar 2011 by Thyme Lawn
Post:
CPDN main project

We are now into the final stages of the database maintenance.

Although the BOINC message board is back online the scheduler is still disabled. This means it is still not possible to create accounts or attach new computers to the project and all scheduler requests will continue to fail (this includes uploading trickles, reporting completed tasks and requesting new work).

Upload of most result files is possible, but the final upload file generated by HadAM3P regional tasks (*_13.zip) can't be uploaded at the moment. These files contain the restart dumps required to generate follow-up tasks and are sent to climateapps1.oucs.ox.ac.uk.

By keeping these features disabled the project team can make a direct comparison between the credits calculated before the old database was archived and those calculated using the optimised database.

The project will not be brought fully back to service until the project team are confident that the credit script is working correctly.
29) Message boards : Projects : News on Project Outages (Message 37067)
Posted 3 Mar 2011 by Thyme Lawn
Post:
CPDN main project

The database backup has been completed but the maintenance is taking longer than expected. It is now unlikely that the database will be back up before Friday afternoon. If there are any further delays the down time could extend into next week.
30) Message boards : Projects : News on Project Outages (Message 37041)
Posted 1 Mar 2011 by Thyme Lawn
Post:
CPDN main project

Unfortunately the backup script (which takes 16 hours to run) failed overnight. It will have to be run again before the planned database maintenance can start, so it will be at least another 36 hours before the database will be running again.
31) Message boards : Projects : News on Project Outages (Message 37036)
Posted 28 Feb 2011 by Thyme Lawn
Post:
CPDN main project

The scheduled database maintenance has started and it is anticipated that service will be restored within 36 hours.
32) Message boards : Projects : News on Project Outages (Message 37012)
Posted 25 Feb 2011 by Thyme Lawn
Post:
CPDN main project - important database maintenance on Monday 28 February 2011

The CPDN main project database will be offline on Monday in order to facilitate some much-needed maintenance.

During the maintenance period the BOINC forums will be inaccessible, it will not be possible to create accounts or attach new computers to the project and all scheduler requests will fail (this includes uploading trickles, reporting completed tasks and requesting new work). Upload of result files will be unaffected.

The aim is to reduce the time taken by the database backup, which is thought to be the current cause both of the BOINC board and scheduler being unavailable for long periods and of the slow connections from clients when they are available.

The work will take many hours because it involves running the tortuously slow backup script and then archiving old data in the existing (huge) tables.

The database will certainly be offline all of Monday and possibly longer. Updates on the progress will be posted in the News thread on the phpBB forum.
33) Message boards : Projects : News on Project Outages (Message 36808)
Posted 10 Feb 2011 by Thyme Lawn
Post:
CPDN main project

Uploads to kraken.oerc.ox.ac.uk are now possible, but there may be delays in the upload of FAMOUS files until the server deals with the backlog of files buffered up on our computers.

Milo is now making space on uploader1.atm.ox.ac.uk for the _3, _6, _9 and _12 files from HadAM3P regional models. He is moving 1.6TB of FAMOUS data first to enable the machine to be set running and will then start transferring the 18TB of HadAM3P files ...
34) Message boards : Projects : News on Project Outages (Message 36780)
Posted 8 Feb 2011 by Thyme Lawn
Post:
CPDN main project

The upload server used by FAMOUS tasks on CPDN main project (kraken.oerc.ox.ac.uk) is currently offline.

This is necessary to complete the work required after transferring 11TB of FAMOUS data to a new server to make space for more of our upload files. FAMOUS uploads will remain stuck in the transfer queue until kraken comes back online (hopefully later today).
35) Message boards : Projects : News on Project Outages (Message 34462)
Posted 31 Aug 2010 by Thyme Lawn
Post:
CPDN main project

Milo has temporarily shut the project down to perform a database archive.

This means that all scheduler requests will fail (including trickles, work requests, reporting completed tasks and attaching to the project) and the BOINC forums are inaccessible.

uploader.oerc is also unavailable while more space is being made available by moving files elsewhere. This affects FAMOUS upload files 3, 6, 9, 12, 15 and 18.
36) Message boards : Projects : News on Project Outages (Message 34330)
Posted 23 Aug 2010 by Thyme Lawn
Post:
CPDN Main Project Upload Problem

uploader.oerc.ox.ac.uk is back up but Milo is not sure how stable it is going to be (a disk in the RAID array was showing a SMART failure). He is contacting the manufacturer for support.
37) Message boards : Projects : News on Project Outages (Message 34306)
Posted 21 Aug 2010 by Thyme Lawn
Post:
CPDN Main Project Upload Problem

uploader.oerc.ox.ac.uk has a problem which is causing uploads to fail with errors like the following:

21/08/2010 20:23:57	climateprediction.net	Started upload of famous_uab6_1799_200_006646205_2_6.zip
21/08/2010 20:23:58	climateprediction.net	[error] Error reported by file upload server: can't open file
21/08/2010 20:23:58	climateprediction.net	Temporarily failed upload of famous_uab6_1799_200_006646205_2_6.zip: transient upload error

The cause of the error cannot be investigated until Monday and the resumption of normal service will depend on what's found then.

This condition applies to FAMOUS upload files 3, 6, 9, 12, 15 and 18. You might find that other files are also stuck due to the way BOINC backs off uploads under error conditions, but you should be able to force their upload by selecting them in the Transfers tab of BOINC Manager's advanced view and clicking the Retry Now button.

As said before, if you have files stuck in Transfers and unable to upload you can either suspend BOINC network activity or take no action.
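
For anyone wanting to check which stuck transfers belong to this outage, here is a minimal Python sketch that scans message log text like the lines quoted above. It assumes the upload file number is the trailing number before ".zip" (as in the example file name) and that FAMOUS file names start with "famous_"; the function names are made up for illustration and are not part of BOINC.

```python
import re

# FAMOUS upload file numbers the post says are affected.
AFFECTED_FAMOUS_FILES = {3, 6, 9, 12, 15, 18}

# Matches the "Temporarily failed upload of <name>.zip" message text.
FAILED_RE = re.compile(r"Temporarily failed upload of (\S+\.zip)")
# Assumption: the upload file number is the last "_<n>" before ".zip".
FILE_NUMBER_RE = re.compile(r"_(\d+)\.zip$")


def failed_uploads(log_text: str) -> list[str]:
    """Return the file names reported as temporarily failed uploads."""
    return FAILED_RE.findall(log_text)


def is_affected_famous_file(name: str) -> bool:
    """True if name looks like a FAMOUS upload with an affected file number."""
    match = FILE_NUMBER_RE.search(name)
    return (
        name.startswith("famous_")
        and match is not None
        and int(match.group(1)) in AFFECTED_FAMOUS_FILES
    )


if __name__ == "__main__":
    sample = (
        "21/08/2010 20:23:58\tclimateprediction.net\tTemporarily failed upload of "
        "famous_uab6_1799_200_006646205_2_6.zip: transient upload error\n"
    )
    for name in failed_uploads(sample):
        print(name, is_affected_famous_file(name))
```

Files flagged by this check will stay stuck until the server is fixed; anything else in the transfer queue is a candidate for Retry Now.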
38) Message boards : Projects : News on project outages (Message 30885)
Posted 7 Feb 2010 by Thyme Lawn
Post:
CPDN main project

Upload server cpdn-upload1.comlab is down and Milo will have no access to it until Monday. Here's the server status page.

This server accepts the #2 file produced at the end of HadAM3P models. Files #1 and #3 are uploaded to different servers and are not affected by this outage. But #2 will remain stuck in the Transfers tab, unable to upload at the moment. Please do not try to force it to upload as it cannot. Other model types are not affected.

You could:
* temporarily suspend HadAM3P models before they complete so they do not generate their three files while the server is down (but remember that CPDN cannot send you new models while any of its tasks are suspended)
* or in the BOINC Manager Activity menu temporarily suspend network activity
* or do nothing and continue as normal

All 4 upload files for HadSM3MH tasks are also affected by this outage (no tasks are currently being issued for this application but many are still being run). The upload files are generated at 25%, 50%, 75% and on completion.
39) Message boards : Questions and problems : 6.6.40 SIGSEGV on Gentoo (Message 27707)
Posted 1 Oct 2009 by Thyme Lawn
Post:
By default, the stderrdae.txt and stdoutdae.txt files can be found in your BOINC Data directory. Its position can be read in the 2nd or 3rd line when starting up BOINC.

On Linux that will only happen if BOINC is started with the --redirectio command-line argument.
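
A quick way to see whether redirection is actually in effect is to check whether the two files exist. This is a minimal Python sketch, assuming you already know your BOINC data directory; the function name is made up for illustration and the path in the usage line is only a placeholder.

```python
from pathlib import Path

# The two log files named in the post.
LOG_NAMES = ("stdoutdae.txt", "stderrdae.txt")


def redirected_logs(data_dir: str) -> dict[str, bool]:
    """Map each log file name to whether it exists in data_dir."""
    base = Path(data_dir)
    return {name: (base / name).is_file() for name in LOG_NAMES}


if __name__ == "__main__":
    # Placeholder path: substitute your own BOINC data directory.
    print(redirected_logs("/path/to/boinc/data"))
```

If both entries come back False, the client was most likely started without --redirectio and its output is going wherever the start-up script sends it.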
40) Message boards : Questions and problems : % of time BOINC client is running 0.675 % (Message 26849)
Posted 28 Aug 2009 by Thyme Lawn
Post:
If you're happy editing an XML file you could stop BOINC and edit the file client_state.xml in your BOINC data directory.

At about line 25 you'll see something like

<on_frac>0.006750</on_frac>

Change the ".00" to ".99", save the file, restart BOINC and your work requests should start working.
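
If you would rather script the edit than do it by hand, something like the following Python sketch should work. It assumes BOINC has been stopped first, that the file contains an <on_frac> element formatted like the one above, and that on_frac is a fraction between 0 and 1; the function name and the default of 0.99 are my own choices, not anything defined by BOINC.

```python
import re

# Matches the <on_frac> element as it appears in the example line above.
ON_FRAC_RE = re.compile(r"<on_frac>\s*([0-9.]+)\s*</on_frac>")


def raise_on_frac(xml_text: str, new_value: float = 0.99) -> str:
    """Return xml_text with the first <on_frac> value replaced by new_value."""
    # Assumption: on_frac is a fraction of time, so it must lie in (0, 1].
    if not 0.0 < new_value <= 1.0:
        raise ValueError("on_frac must be in (0, 1]")
    replacement = f"<on_frac>{new_value:.6f}</on_frac>"
    new_text, count = ON_FRAC_RE.subn(replacement, xml_text, count=1)
    if count == 0:
        raise ValueError("no <on_frac> element found")
    return new_text


if __name__ == "__main__":
    sample = "<client_state>\n<on_frac>0.006750</on_frac>\n</client_state>\n"
    print(raise_on_frac(sample))
```

Read client_state.xml into a string, pass it through the function and write the result back, keeping a backup copy of the original file in case anything goes wrong.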



Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.