Thread 'News on Project Outages'

Gary Charpentier
Joined: 23 Feb 08
Posts: 2493
United States
Message 90210 - Posted: 22 Feb 2019, 5:44:51 UTC - in response to Message 90209.  

Gary can you upload?

Answer is, of the 5 boxes I have checked, only 1 (a Raspberry Pi) has a W/U stuck uploading, but it may be that the other 4 haven't had a completed W/U since the project went down, so they haven't tried to upload.

Also https://albert.phys.uwm.edu is down.
ID: 90210
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5129
United Kingdom
Message 90212 - Posted: 22 Feb 2019, 9:21:21 UTC

I run the Binary Radio Pulsar Search application. Tasks were downloading, uploading, and reporting OK yesterday while the website was down. But uploads failed around the time that Jim1348 reported that his uploads were stuck, and they haven't moved since.
ID: 90212
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 90213 - Posted: 22 Feb 2019, 10:07:23 UTC

My CPU uploads are OK; five have gone in the last 10 hours. But GPU uploads are still stuck.
Maybe they go to a different place?
ID: 90213
Gary Roberts
Joined: 7 Sep 05
Posts: 130
Australia
Message 90214 - Posted: 22 Feb 2019, 10:26:28 UTC - in response to Message 90212.  

... uploads failed around the time that Jim1348 reported that his uploads were stuck, and they haven't moved since.
Uploads had started failing somewhat earlier. On one of my machines an upload succeeded at 7:42PM UTC (Feb 21st), whilst the next one at 7:59PM, and all subsequent ones, have failed. The problem started at least half an hour before Jim's report.

Interestingly, on that same machine, there were a couple of uploaded tasks that were successfully reported at 8:41PM UTC, nearly an hour after the uploads started failing. Not too long after that however, even reporting must have failed since I've noticed another machine with 3 uploaded tasks that hasn't been able to report them.

All my machines are out of work with zillions of tasks stuck in upload and with multi-hour back-offs ticking down. It's around 8:30PM here and I've been hoping the problem gets fixed so that I can run a script to cancel the back-offs and then head off home. The script will also replenish data files from my cache to stop unnecessary downloads for resend tasks that will inevitably turn up once each host is able to start getting fresh work. Hopefully this might get sorted soonish :-(.
Cheers,
Gary.
ID: 90214
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5129
United Kingdom
Message 90215 - Posted: 22 Feb 2019, 10:31:11 UTC - in response to Message 90213.  

My BRPS tasks are also GPU, and the uploads are sent in the first instance to einstein4.aei.uni-hannover.de.

But the ultimate failure, as it has been several times in recent weeks, is HTTP/1.1 504 Gateway Time-out: that's an onwards transmission to another server, which might be either in Hannover or in Milwaukee, Wisconsin.
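
For anyone who wants to tell a gateway failure apart from a failure on the upload server itself, here is a minimal Python sketch. It only parses a status line like the one quoted above; the function name and the labels it returns are made up for illustration and are not part of BOINC or the Einstein@Home setup.

```python
import re

# Status codes that a gateway or proxy generates when the server behind it
# returns an invalid response (502) or does not answer in time (504).
GATEWAY_CODES = {502: "bad gateway", 504: "gateway timeout"}


def classify_status_line(status_line: str) -> str:
    """Return a rough label for an HTTP status line (hypothetical helper)."""
    match = re.match(r"HTTP/\d(?:\.\d)?\s+(\d{3})\b", status_line.strip())
    if match is None:
        return "not an HTTP status line"
    code = int(match.group(1))
    if code in GATEWAY_CODES:
        return f"upstream problem ({GATEWAY_CODES[code]})"
    if 500 <= code <= 599:
        return "server error"
    if 400 <= code <= 499:
        return "client error"
    return "not an error"


print(classify_status_line("HTTP/1.1 504 Gateway Time-out"))
```

A 504 therefore says that the machine we talked to was reachable, but whatever it forwards the upload to did not respond in time.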
ID: 90215
anniet
Joined: 12 Jul 14
Posts: 656
Zambia
Message 90216 - Posted: 22 Feb 2019, 12:40:34 UTC

If it's any consolation, my work fetch blunder
of Tuesday is turning into a much happier
event than it felt at the time. Yes :)

I don't usually have more than 9 Einstein tasks
in progress, but as of 7 this morning, I still had
84 yet to start.

One of the h1 cpu tasks did somehow upload
and report itself midst the carnage of others
that didn't though. I'm not sure what time
that was, although my head is saying it was
around 11pm.
ID: 90216
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 90219 - Posted: 22 Feb 2019, 13:19:20 UTC

The website is now up, but GPU uploads are not working yet.
I see no explanation on the forums, and will leave it to a knowledgeable person to query them.
ID: 90219
Richie
Joined: 2 Jul 14
Posts: 186
Finland
Message 90220 - Posted: 22 Feb 2019, 13:21:10 UTC
Last modified: 22 Feb 2019, 13:22:08 UTC

Yep. Server status says scheduler daemon is the only one currently running. "The database server is not accessible".
ID: 90220
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5129
United Kingdom
Message 90222 - Posted: 22 Feb 2019, 14:30:46 UTC

Shawn Kwang has posted in the Technical News area.

It appears they had two separate problems: a power outage and a cooling failure.

On 2019-02-20, at about 1930 UTC there was a power outage at UWM. The E@H Web site front-end went down when the power shut off, but power has been restored.

Update: After power was restored to UWM, the data-center which houses the E@H infrastructure had a cooling failure. The end-result is that we moved servers to a new data-center, ...
Then, there's a second post about networking:

Re the Server Status page: It looks like the server status page is not working; it says everything is down. This is probably due to the networking at UWM is not fully operational yet after the power outage and data-center migration.
I think that's our problem with uploads too. Either the network still hasn't been properly configured for the new IP addresses and routing, or we're still waiting for new DNS settings to propagate. We can't do anything about either.
ID: 90222
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 90224 - Posted: 22 Feb 2019, 16:08:39 UTC

After a manual retry, all of my GPU work units have uploaded and are reported.
Case closed.
ID: 90224
Richie
Joined: 2 Jul 14
Posts: 186
Finland
Message 90498 - Posted: 4 Mar 2019, 17:49:38 UTC

Asteroids is down...
ID: 90498
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2703
United Kingdom
Message 90691 - Posted: 14 Mar 2019, 14:21:59 UTC

cpdn.org went down a while ago. climateprediction.net front page and static pages are there but no response from servers or forums. Perhaps the wind has brought something down?
ID: 90691
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 90692 - Posted: 14 Mar 2019, 14:25:21 UTC - in response to Message 90691.  
Last modified: 14 Mar 2019, 14:35:05 UTC

cpdn.org went down a while ago.

I have not been able to access my statistics for a day or two. Now I don't even get the front page.
EDIT: Now I get the forums, and now my statistics too. I hope it lasts.
ID: 90692
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2703
United Kingdom
Message 90693 - Posted: 14 Mar 2019, 14:34:58 UTC - in response to Message 90692.  

I have not been able to access my statistics for a day or two. Now I don't even get the front page.


The forums are back up and scheduler request just completed. Stats have also just updated.

I think someone has changed the weekly weekend running of the stats batch file for a random number generator.
ID: 90693
Tigers Dave
Joined: 24 Dec 05
Posts: 52
United States
Message 91303 - Posted: 30 Apr 2019, 23:06:18 UTC

Looks like Collatz is down. I was having trouble downloading and uploading tasks for the last few hours prior to the shutdown.
ID: 91303
Pierre A Renaud
Joined: 19 Jan 18
Posts: 66
Canada
Message 91323 - Posted: 1 May 2019, 13:42:07 UTC
Last modified: 1 May 2019, 13:43:37 UTC

WCG - Planned Maintenance on Wednesday, May 1, 2019

30 Apr 2019
https://www.worldcommunitygrid.org/about_us/viewNewsArticle.do?articleId=594

Summary
We are updating the operating system on our servers on Wednesday, May 1, beginning at 19:00 UTC.

We will be applying an important operating system update to our servers on Wednesday, May 1, beginning at 19:00 UTC. We anticipate that the work will take approximately two hours.

During some of this time, volunteers will not be able to upload or download new work, and the website will not be accessible.

Volunteers will not need to take any particular action, as your devices will automatically retry their connections after the maintenance work is completed.

ID: 91323
Tigers Dave
Joined: 24 Dec 05
Posts: 52
United States
Message 91324 - Posted: 1 May 2019, 13:48:02 UTC - in response to Message 91303.  

Collatz now appears to be functioning properly.
ID: 91324
Dougga
Joined: 24 Mar 08
Posts: 16
Message 91487 - Posted: 14 May 2019, 17:19:21 UTC

Collatz Conjecture - Down
Even their website throws errors for me. No communications at all.
Work Units piling up.
ID: 91487
Dougga
Joined: 24 Mar 08
Posts: 16
Message 91488 - Posted: 14 May 2019, 18:15:23 UTC - in response to Message 91487.  
Last modified: 14 May 2019, 18:25:22 UTC

Collatz Conjecture Outage: further information

URL: boinc.thesonntags.com/collatz
The URL resolves to 67.167.89.131
This in turn resolves back to "c-67-167-89-131.hsd1.il.comcast.net".

A traceroute suggests the server is down or unreachable:
traceroute to 67.167.89.131 (67.167.89.131), 30 hops max, 40 byte packets using UDP

1 192.168.0.1 (192.168.0.1) 0.769 ms 0.552 ms 0.464 ms
2 * * * LOCAL DETAILS HIDDEN
3 * * * LOCAL DETAILS HIDDEN
4 * * * LOCAL DETAILS HIDDEN
5 * * * LOCAL DETAILS HIDDEN
6 68.86.86.225 (68.86.86.225) 3.131 ms 5.244 ms 4.176 ms
7 68.86.84.206 (68.86.84.206) 32.530 ms 31.859 ms *
8 * * *
9 68.86.85.169 (68.86.85.169) 51.131 ms * *
10 * * *
11 * * *
12 * * *
13 68.87.204.74 (68.87.204.74)(H!) 52.094 ms * *

Note: "H!" = host, network or protocol unreachable
Did someone forget to pay Comcast?
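
If you want to pull the same conclusion out of a saved traceroute automatically, a rough Python sketch follows. It assumes output shaped like the listing above, with the flag printed in parentheses after the address; other traceroute builds print flags differently (often as !H after the timing), so the pattern would need adjusting. The function and variable names are made up for illustration.

```python
import re

# Matches a responding hop such as
# "13 68.87.204.74 (68.87.204.74)(H!) 52.094 ms * *".
# Hops that only show "* * *" and the header line do not match.
HOP_RE = re.compile(
    r"^\s*(\d+)\s+\S+\s+\((\d{1,3}(?:\.\d{1,3}){3})\)(?:\(([^()]*![^()]*)\))?"
)


def summarize_traceroute(output: str) -> dict:
    """Report the last hop that answered and any '!' flag printed on it."""
    last_hop = None
    flag = None
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if match is None:
            continue  # header, blank line, or a hop with no replies
        last_hop = (int(match.group(1)), match.group(2))
        flag = match.group(3)  # None when the hop carries no flag
    return {"last_hop": last_hop, "flag": flag}


sample = """traceroute to 67.167.89.131 (67.167.89.131), 30 hops max, 40 byte packets using UDP
 1 192.168.0.1 (192.168.0.1) 0.769 ms 0.552 ms 0.464 ms
 2 * * * LOCAL DETAILS HIDDEN
 9 68.86.85.169 (68.86.85.169) 51.131 ms * *
10 * * *
13 68.87.204.74 (68.87.204.74)(H!) 52.094 ms * *
"""
print(summarize_traceroute(sample))
```

A flag on the last responding hop, with the target address never appearing, is consistent with the route dying inside the provider's network rather than at the project server itself.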
ID: 91488
Dougga
Joined: 24 Mar 08
Posts: 16
Message 91490 - Posted: 14 May 2019, 20:31:52 UTC - in response to Message 91488.  

It's quite possible that ping is restricted and the root problem is simply an expired certificate or something similarly annoying.
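
One way to test the certificate theory without relying on ping or traceroute is sketched below in Python, using only the standard library. It assumes the server is reached over HTTPS on port 443, which I have not confirmed for this project; the function names are my own.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if expired)."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate the server presents.

    With default verification, an already expired certificate makes the
    handshake raise ssl.SSLCertVerificationError, and an unreachable server
    raises OSError before TLS is even attempted.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling days_until_expiry(fetch_not_after("boinc.thesonntags.com")) would give the remaining days if the handshake succeeds. A connection timeout points to the server or route being down, while a certificate verification error points to the certificate.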
ID: 91490

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.