Thread 'World Community Grid has announced an extended outage from Feb 14 to April 22, 2022'

Message boards : Projects : World Community Grid has announced an extended outage from Feb 14 to April 22, 2022

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2704
United Kingdom
Message 110708 - Posted: 12 Dec 2022, 13:06:27 UTC - in response to Message 110707.  

Quote (Message 110707): lol.... Another day, more "stuck" downloads for ARP tasks
They are behaving much better for me today than they did yesterday.
Dr Who Fan
Joined: 10 May 07
Posts: 1444
United States
Message 110716 - Posted: 13 Dec 2022, 7:47:19 UTC - in response to Message 110707.  

News from WCG:
2022-12-12 update (ARP & HTTP errors)
Hi everyone, we are currently working to fix the BOINC HTTP errors being reported, and investigating what may have caused them, and the possible correlation with ARP workunits.

Last week, we identified an issue with the ARP pipeline where new workunits were not being distributed due to a backlog of volunteers’ completed results accumulating on WCG servers. Normally, completed results would be downloaded by the specific research teams, archived to tape in our datacenter, marked as done, and removed from the production system to make room in the pipeline for new work. However, due to a download issue on the ARP side, a backlog of completed WUs started to grow, triggering an automated project pause. We have since been in contact with the ARP team to address the backlog. They confirmed it is only a temporary issue, and the team plans to resume downloading completed results tomorrow. Once that happens, we will be able to revert the changes we put into effect late last week that increased the upper limit on the backlog we allow before distribution of new ARP workunits is halted. While we did not expect these changes to have any adverse consequences, we are investigating the possibility and examining the specific workunits that have since erred.

We will share a technical breakdown of the situation once we can distinguish cause from coincidence.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team at Krembil Research Institute
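The backlog gate described in the 2022-12-12 update can be modelled in a few lines. This is a minimal Python sketch under stated assumptions, not WCG's actual server code: the class `ResultPipeline`, its method names and the limit values are hypothetical, and it assumes new work is halted once the backlog of completed-but-not-yet-retrieved results exceeds the limit.

```python
from dataclasses import dataclass, field


@dataclass
class ResultPipeline:
    """Hypothetical model of the backlog gate described in the update (not WCG code)."""

    backlog_limit: int  # assumed upper limit on completed results awaiting retrieval
    backlog: list[str] = field(default_factory=list)   # completed, not yet downloaded
    archived: list[str] = field(default_factory=list)  # downloaded, archived, marked done

    def distribution_paused(self) -> bool:
        # Assumption: new workunits are withheld once the backlog exceeds the limit.
        return len(self.backlog) > self.backlog_limit

    def complete(self, result_id: str) -> None:
        # A volunteer's completed result waits on the server for the research team.
        self.backlog.append(result_id)

    def research_team_download(self, n: int) -> None:
        # The research team retrieves up to n results; they are archived,
        # marked done and removed from the production backlog.
        taken, self.backlog = self.backlog[:n], self.backlog[n:]
        self.archived.extend(taken)


if __name__ == "__main__":
    pipeline = ResultPipeline(backlog_limit=3)
    for i in range(4):
        pipeline.complete(f"wu{i}")
    print(pipeline.distribution_paused())  # True: 4 results exceed the limit of 3
    pipeline.backlog_limit = 5             # temporary increase, as in the update
    print(pipeline.distribution_paused())  # False
    pipeline.research_team_download(4)     # research team resumes downloading
    print(len(pipeline.backlog), len(pipeline.archived))  # 0 4
```

The sketch shows why raising the limit only buys time: distribution resumes, but the backlog keeps growing until the research team resumes downloading, after which the original limit can be restored.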
Dr Who Fan
Joined: 10 May 07
Posts: 1444
United States
Message 110774 - Posted: 19 Dec 2022, 17:43:33 UTC

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2704
United Kingdom
Message 110775 - Posted: 19 Dec 2022, 17:53:24 UTC - in response to Message 110774.  

WCG has posted this message to FarceBook:
We are aware of the issues with WUs. We are investigating the cause of it and will report more details in the technical update.


Interestingly, I just got a resend of an ARP task, and four downloads at once were going faster than a single one had been before. I think I was getting about six times the data throughput, albeit still not maxing out even my bored band.
Grumpy Swede
Joined: 30 Mar 20
Posts: 420
Sweden
Message 110790 - Posted: 21 Dec 2022, 13:42:39 UTC

And the website is screwed up again.

"System error
World Community Grid is currently experiencing an unexpected error. Please check Facebook or Twitter for more information."

And of course, there's nothing about that on Facebook or Twitter.
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2704
United Kingdom
Message 110791 - Posted: 21 Dec 2022, 17:44:17 UTC - in response to Message 110790.  

Seems to be working normally again for me right now.
Bill Freauff
Joined: 26 Mar 11
Posts: 192
United States
Message 110817 - Posted: 25 Dec 2022, 20:02:09 UTC - in response to Message 109507.  
Last modified: 25 Dec 2022, 20:03:20 UTC

For the Mini-nukes... do they offer a contractor's discount for half a dozen?

Bill F
Dr Who Fan
Joined: 10 May 07
Posts: 1444
United States
Message 110901 - Posted: 9 Jan 2023, 23:03:54 UTC

About 30 minutes ago, WCG posted on Facebook that they are experiencing website problems and are looking into the cause.
Grumpy Swede
Joined: 30 Mar 20
Posts: 420
Sweden
Message 110902 - Posted: 10 Jan 2023, 2:42:04 UTC - in response to Message 110901.  
Last modified: 10 Jan 2023, 2:43:01 UTC

The whole UHN hospital network in Toronto is down; in other words, UHN has major problems, so WCG is still down. It is most likely a cyberattack.

https://globalnews.ca/news/9397133/uhn-network-outage/
https://www.cbc.ca/news/canada/toronto/ont-uhn-1.6708319
https://www.cp24.com/news/expect-delays-toronto-s-uhn-hospital-network-remains-under-code-grey-amid-digital-systems-outage-1.6223289
Dr Who Fan
Joined: 10 May 07
Posts: 1444
United States
Message 110903 - Posted: 10 Jan 2023, 9:56:49 UTC - in response to Message 110902.  

Apparently the Krembil / UHN IT department got things working again.

Completed tasks upload & new ones download fine for me.
Dr Who Fan
Joined: 10 May 07
Posts: 1444
United States
Message 110915 - Posted: 13 Jan 2023, 22:43:24 UTC

WCG has posted a new update about the status of the project.
Dr Who Fan
Joined: 10 May 07
Posts: 1444
United States
Message 110966 - Posted: 26 Jan 2023, 8:16:28 UTC

NEWS from WCG: 2023-01-25 Update (ARP & OPN1 workunits)
ARP & OPN1 workunits

On Monday afternoon, many volunteers reported receiving new ARP1 and OPN1 workunits. These workunits are not from a new batch; these are older WUs that were never sent out due to an overloaded server causing problems in our workunit-distribution process. ARP1 and OPN1/OPNG teams remain on temporary pause, preparing new workunits.

In addition, this infusion of about 2 million WUs helped us to confirm that the networking/download issues we have in the data center persist under a normal load. Improvements made by the SHARCNET team did reduce network congestion. However, based on these results, they are now implementing further modifications to the network, which should resolve these issues for the future. We will keep you updated with further details about the upcoming maintenance, once we receive more information from the SHARCNET team.

Thank you for sending reports of HTTP errors that were experienced by volunteers processing the recent ARP1/OPN1 workunits, which helped us diagnose these errors. The effect is especially strong after an outage, because of the pent-up demand by all the connected BOINC clients. The backlog of workunits released for distribution over the last few days produced the same effect. We continue working together with the SHARCNET team on improving our network. In parallel, we are finalizing the SSD storage upgrade we mentioned in December, and this will also help in improving WCG backend performance.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team
Sir LanDroid
Joined: 7 Apr 13
Posts: 64
United States
Message 111223 - Posted: 8 Mar 2023, 15:56:01 UTC

There has been another round of cascading hardware failures since about March 1: they replace one thing and find another problem. Here's the latest update from WCG; they're posting to Facebook & Twitter since their website is down.

Hello everyone, hope you had a great weekend. We are still working with the data centre to resolve the hardware failure so we can restart the storage, BOINC and website ASAP. We will post updates as we receive them. Thank you for your patience.

Update: Unfortunately, additional hardware problems on the storage server besides the RAID card are preventing us from restarting. We are working with the data center on alternative solutions.

Update #2: Unfortunately, the RAID controller was not the root cause of our storage system failure; the PCI bus failed. The data center is in the process of moving the disks to an alternate system, and we will post updates as we progress. Once again, thank you for your patience.

March 6, 2023
https://www.facebook.com/worldcommunitygrid.org/
Grumpy Swede
Joined: 30 Mar 20
Posts: 420
Sweden
Message 111224 - Posted: 8 Mar 2023, 16:13:49 UTC - in response to Message 111223.  
Last modified: 8 Mar 2023, 16:14:42 UTC

There is much more about the WCG problems in the following thread, from this post onward: https://boinc.berkeley.edu/forum_thread.php?id=10279&postid=111141#111141
Dr Who Fan
Joined: 10 May 07
Posts: 1444
United States
Message 111413 - Posted: 24 Mar 2023, 21:31:37 UTC

All future posts about World Community Grid (WCG) go in:
* Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid *

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.