News on Project Outages

Grumpy Swede
Joined: 30 Mar 20
Posts: 372
Sweden
Message 111254 - Posted: 10 Mar 2023, 16:46:56 UTC
Last modified: 10 Mar 2023, 16:50:15 UTC

Yeah, no support after business hours. Incredible.

To the question "How long should I expect to wait for support?", on this page: https://helpwiki.sharcnet.ca/wiki/FAQ,
The answer is:

"Unfortunately Compute Canada/SHARCNET does not have adequate funding to provide support 24 hours a day, 7 days a week.
User support and system monitoring is limited to regular business hours: there is no official support on weekends or holidays,
or outside 9:00 - 17:00 EST .

Please note that this includes monitoring of our systems and operations, so typically when there are problems overnight or on
weekends/holidays system notices will not be posted until the next business day."


So, no wonder then that everything, including the migration from IBM, takes such a long time compared to when WCG was run by IBM.
That state of affairs is not going to work in the long run. If there's no support outside of business hours, WCG will slowly fade away.
ID: 111254

Grumpy Swede
Joined: 30 Mar 20
Posts: 372
Sweden
Message 111255 - Posted: 10 Mar 2023, 19:35:56 UTC
Last modified: 10 Mar 2023, 19:47:25 UTC

WCG New update, 15 minutes ago:

"Update #5: The storage server was revived yesterday late afternoon. Both database filesystems mounted as before,
but the science filesystem did not. It needs a repair; erasing the old log first."
ID: 111255

[CSF] Aleksey Belkov
Joined: 3 Mar 23
Posts: 14
Russia
Message 111256 - Posted: 10 Mar 2023, 23:59:17 UTC - in response to Message 111253.  

Just find other projects to use your computer time. No use complaining. Nothing is going to change.

"Came, offended, left." (=

Perhaps, before whipping up further hysteria that "everything is lost", wait for this story to end and only THEN draw any conclusions (especially calls to abandon the project)?
ID: 111256

Contact
Joined: 29 Aug 05
Posts: 71
Canada
Message 111259 - Posted: 11 Mar 2023, 16:12:36 UTC - in response to Message 111246.  

as we have learned by now, SHARCNET (Shared Hierarchical Academic Research Computing Network),
does not help their customers, (at least not WCG) during Weekends, evenings, and nights.
SHARCNET has free access to Compute Canada for academic research.
https://youtu.be/hWkWAaNBILs?t=146

Free makes sense. I don't see a flow of cash to the project. Limited service makes sense from a free service. It's actually amazing to have any service at all for no charge! After all, somebody (Canadian taxpayer) is paying for replacement parts and labour and delivery etc...
It also makes sense that this system is now overburdened by World Community Grid. It was not set up with the intention to host anything like a huge BOINC project.
Good on these people for still trying to help us.
They are relentless :)
ID: 111259

Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 863
United States
Message 111261 - Posted: 11 Mar 2023, 21:19:47 UTC

Asteroids@home is back online.


ID: 111261

Yavanius
Joined: 19 May 15
Posts: 123
Antarctica
Message 111270 - Posted: 12 Mar 2023, 21:03:11 UTC - in response to Message 111215.  

Dennis currently telling me it has no work available.


(Wonders if anybody ever reads anything on the projects or just connect blindly...)


DENIS is releasing work in large batches as they fine-tune their models. They just finished the last batch and posted the results to News.


Ironically, it's one main researcher who is overseeing the project, and he is a professor at the University (there seems to be a team in the background analyzing things, though). He posts and communicates more than the whole Krembil team... I do wonder whether the communications intern doesn't know what to post or they aren't letting her post.

Someday I'll see in an interview: I was a communications intern at Krembil but they never wanted to let me post updates about failures occurring at the project...
ID: 111270

Yavanius
Joined: 19 May 15
Posts: 123
Antarctica
Message 111271 - Posted: 12 Mar 2023, 21:06:57 UTC - in response to Message 111261.  
Last modified: 12 Mar 2023, 21:20:57 UTC

Asteroids@home is back online.


Asteroids@home periodically runs out of work. They came back to activity only recently, after a hiatus of a few years when their old hardware bit the dust. It's one person running the project, probably on a shoestring budget. I'm sure he'd be ecstatic if he got a rounding error of the budget LHC has. ^_^
ID: 111271

Steven Gaber
Joined: 28 Jun 20
Posts: 68
United States
Message 111272 - Posted: 13 Mar 2023, 2:06:12 UTC - in response to Message 111261.  
Last modified: 13 Mar 2023, 2:07:50 UTC

Asteroids@home is back online.


Maybe.

But I have tasks stuck in uploading, can't access my account, can't get to their message boards or the Home Page.

S. Gaber
Oldsmar, FL
ID: 111272

Dr Who Fan
Joined: 10 May 07
Posts: 1329
United States
Message 111273 - Posted: 13 Mar 2023, 3:26:54 UTC - in response to Message 111272.  

Asteroids@home is back online.


Maybe.

But I have tasks stuck in uploading, can't access my account, can't get to their message boards or the Home Page.

S. Gaber
Oldsmar, FL

I have not had any problems accessing the website from DFW Metro area in Texas nor server access to send/receive tasks since the new certificate was installed.

Restart your web browser and/or empty the browser cache to clean out old information it might contain. Then try accessing the forums.

For stuck tasks in BOINC, go to the Transfers tab in Advanced view, select 4 to 6 Asteroids tasks, and retry the upload until you have successfully transferred all of them.
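
The same retry can be scripted from the command line. What follows is only a minimal sketch, assuming `boinccmd` is installed and on the PATH, and using its `--file_transfer URL filename retry` operation. The project URL below is an assumption (use the URL exactly as your own client lists it), and the function names and the batch size of 5 are hypothetical choices that mirror the "4 to 6 tasks at a time" advice above.

```python
import subprocess
from typing import Callable, Iterator, List, Sequence

# Assumption: replace with the project URL exactly as your BOINC client shows it.
PROJECT_URL = "https://asteroidsathome.net/boinc/"


def batches(items: Sequence[str], size: int = 5) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def retry_command(project_url: str, filename: str) -> List[str]:
    """Build the boinccmd argument list that retries one file transfer."""
    return ["boinccmd", "--file_transfer", project_url, filename, "retry"]


def retry_uploads(
    project_url: str,
    filenames: Sequence[str],
    batch_size: int = 5,
    run: Callable[..., object] = subprocess.run,
) -> int:
    """Ask the client to retry each stuck transfer, a few files at a time.

    `run` is injectable so the logic can be exercised without boinccmd.
    Returns the number of retry requests issued.
    """
    attempted = 0
    for group in batches(filenames, batch_size):
        for name in group:
            # check=False: one failed retry should not stop the remaining ones.
            run(retry_command(project_url, name), check=False)
            attempted += 1
    return attempted
```

The file names to pass in are the ones shown in the Transfers tab, or listed by `boinccmd --get_file_transfers`. A manual retry asks the client to try again right away, but it cannot help if the project's upload server itself is still unreachable.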
ID: 111273

Steven Gaber
Joined: 28 Jun 20
Posts: 68
United States
Message 111274 - Posted: 13 Mar 2023, 6:28:35 UTC - in response to Message 111273.  

Asteroids@home is back online.


Maybe.

But I have tasks stuck in uploading, can't access my account, can't get to their message boards or the Home Page.

S. Gaber
Oldsmar, FL

I have not had any problems accessing the website from DFW Metro area in Texas nor server access to send/receive tasks since the new certificate was installed.

Restart your web browser and/or empty the browser cache to clean out old information it might contain. Then try accessing the forums.

For stuck tasks in BOINC, go to the Transfers tab in Advanced view, select 4 to 6 Asteroids tasks, and retry the upload until you have successfully transferred all of them.


Still getting downloads from Universe. But all 26 of my tasks in Transfers say "Upload pending: project backoff."
ID: 111274

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2518
United Kingdom
Message 111276 - Posted: 13 Mar 2023, 9:07:47 UTC

(Wonders if anybody ever reads anything on the projects or just connect blindly...)
I did have a look at their forums but obviously not carefully enough!
ID: 111276

Dr Who Fan
Joined: 10 May 07
Posts: 1329
United States
Message 111279 - Posted: 13 Mar 2023, 16:19:33 UTC - in response to Message 111259.  

It has been nearly 2 weeks since WCG crashed & burned into the ether.

Another Monday half gone and nothing but crickets from Krembil about what, if anything, is happening with the RAID storage failure at WCG.

WCG Facebook page: https://facebook.com/197379135651/
ID: 111279

Grumpy Swede
Joined: 30 Mar 20
Posts: 372
Sweden
Message 111281 - Posted: 13 Mar 2023, 18:44:04 UTC
Last modified: 13 Mar 2023, 18:44:42 UTC

Yup, last update from WCG according to the timestamp of the tweet on Twitter, was March 10, at 19:19 UTC.
Now, it's March 13, 18:44 UTC.
ID: 111281

Grumpy Swede
Joined: 30 Mar 20
Posts: 372
Sweden
Message 111283 - Posted: 13 Mar 2023, 21:30:06 UTC
Last modified: 13 Mar 2023, 21:43:33 UTC

New WCG Update, 20 minutes ago:

"The web pages and forums are back online, but the recovery process continues.
As a result, performance is slower than usual, and not all functionality is there.
Until we can restart the science database and BOINC, stats/contributions are not
accurate. We will provide further updates as we progress. Thank you for your patience."


Edit, added: Well, to say that the website is back was going a bit too far. It is not possible to log in: "System Error" or "503 Service Unavailable" is the response to any attempt.
ID: 111283

Phillip Spencer
Joined: 3 Mar 23
Posts: 10
France
Message 111288 - Posted: 14 Mar 2023, 9:28:35 UTC
Last modified: 14 Mar 2023, 9:29:04 UTC

WCG website back and forums working (but, sadly, no official communication update yesterday)
ID: 111288

Phillip Spencer
Joined: 3 Mar 23
Posts: 10
France
Message 111289 - Posted: 14 Mar 2023, 13:47:02 UTC - in response to Message 111288.  

WCG website back and forums working (but, sadly, no official communication update yesterday)

It looks like I spoke too soon. Website and forums down once more with the "System Error" message again. This does not bode well for the overall recovery.
ID: 111289

Warped
Joined: 25 Aug 08
Posts: 39
South Africa
Message 111290 - Posted: 14 Mar 2023, 13:50:24 UTC - in response to Message 111288.  

Our website is currently down and we are looking into the root cause and a method to fix it. We will post a follow-up when it has been resolved.

The latest news on WCG from their Twitter feed. Seems to be going backwards!
ID: 111290

Grumpy Swede
Joined: 30 Mar 20
Posts: 372
Sweden
Message 111291 - Posted: 14 Mar 2023, 13:52:50 UTC
Last modified: 14 Mar 2023, 13:54:29 UTC

New WCG Update, on FB and Twitter, 15 minutes ago:

Our website is currently down and we are looking into the root cause and a method to fix it.
We will post a follow-up when it has been resolved.
ID: 111291

Grumpy Swede
Joined: 30 Mar 20
Posts: 372
Sweden
Message 111292 - Posted: 14 Mar 2023, 15:36:26 UTC
Last modified: 14 Mar 2023, 15:53:08 UTC

First the problem was with the RAID card. Then a card borrowed from the data center was installed, and they managed to rebuild the RAID array successfully.
That didn't help, so now they say the problem was the PCI bus. (How could they successfully rebuild the RAID array with a broken PCI bus?)

So another storage system (DSS 7000) was installed by the data center, and the RAID array was rebuilt again. "The "new" system did recognize the data hardware RAIDs.
All have been rebuilt, and the data center is attempting to repair the OS drives/RAID." Later on "The storage server was revived yesterday late afternoon. Both database
filesystems mounted as before, but the science filesystem did not. It needs a repair; erasing the old log first." So, yesterday the website came back, but then took a dive
again some hours later. BOINC is still MIA of course.

I think they are chasing ghosts and looking in the wrong direction. As said before: how could they successfully rebuild the RAID array the first time (after they first changed
only the RAID card) with a broken PCI bus?
ID: 111292

Grumpy Swede
Joined: 30 Mar 20
Posts: 372
Sweden
Message 111293 - Posted: 14 Mar 2023, 16:55:24 UTC
Last modified: 14 Mar 2023, 16:59:45 UTC

New WCG Update, on FB and Twitter. 30 minutes ago:

Update: The system error has been resolved and all users should regain access to the website. Thank you for your patience.

I doubt the website will stay up for long. Still no BOINC....
ID: 111293

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.