Thread 'Anything and Everything to do with (WCG) World Community Grid'

Author	Message
kasdashdfjsah Joined: 29 Jan 24 Posts: 60	Message 115478 - Posted: 21 Feb 2025, 8:55:08 UTC Still no tasks available.

Dave Help desk expert Joined: 28 Jun 10 Posts: 2789	Message 115481 - Posted: 21 Feb 2025, 10:20:11 UTC - in response to Message 115478. In reply to kasdashdfjsah's message of 21 Feb 2025: "still no tasks available" The last task I got was about 14 hours ago, an ARP resend. Until they release the next batch of ARP, resends are about all we are likely to get. I have my system set to take only ARP, so I don't know about MCM.

Dave Help desk expert Joined: 28 Jun 10 Posts: 2789	Message 115482 - Posted: 21 Feb 2025, 11:26:40 UTC - in response to Message 115478. In reply to kasdashdfjsah's message of 21 Feb 2025: "still no tasks available" ARP141s now released.

Doug Joined: 11 Mar 22 Posts: 4	Message 115490 - Posted: 22 Feb 2025, 18:25:47 UTC Hi all, Does anyone have any actual info on whether there will ever be any more OPN work? There seems to be no useful info on the WCG site. Thanks. Doug

[CSF] Aleksey Belkov Joined: 3 Mar 23 Posts: 15	Message 115491 - Posted: 22 Feb 2025, 23:13:17 UTC - in response to Message 115490. Last modified: 22 Feb 2025, 23:13:37 UTC In reply to Doug's message of 22 Feb 2025: "Does anyone have any actual info on whether there will ever be any more OPN work?" Hi. Obviously, you should ask that on the WCG forum.

kasdashdfjsah Joined: 29 Jan 24 Posts: 60	Message 115512 - Posted: 28 Feb 2025, 18:33:35 UTC - in response to Message 115482. Still no tasks, for over a week now, on my M4 Mac mini.

kasdashdfjsah Joined: 29 Jan 24 Posts: 60	Message 115539 - Posted: 2 Mar 2025, 21:01:37 UTC - in response to Message 115512. In reply to kasdashdfjsah's message of 28 Feb 2025: "Still no tasks, for over a week now on my M4 Mac mini" Update: Now getting 10 concurrent WCG tasks again, but only from the Mapping Cancer Markers subproject, not from the Africa Rainfall Project or OpenPandemics - COVID-19 subprojects. Didn't get anything from these before, but still.

Dave Help desk expert Joined: 28 Jun 10 Posts: 2789	Message 115547 - Posted: 4 Mar 2025, 10:02:10 UTC And now getting a "Feeder not running" error from WCG.

PMH_UK Joined: 24 Dec 10 Posts: 50	Message 115549 - Posted: 4 Mar 2025, 14:27:10 UTC - in response to Message 115547. From https://www.cs.toronto.edu/~juris/jlab/wcg.html: "March 4, 2025: Services seem to be down. We are working on identifying and fixing the issue." Paul.

PMH_UK Joined: 24 Dec 10 Posts: 50	Message 115550 - Posted: 4 Mar 2025, 19:22:41 UTC - in response to Message 115549. "March 4, 2025: Services seem to be down. We are working on identifying and fixing the issue. BOINC db node crashed. Thus, all running BOINC services, API services and message queues that need to talk to db01 die similarly; the connection is closed, although the node itself is still running. 10:38 am ET: Crash recovery starting now. We should be able to restart all the services soon. 12:21 pm ET: Crash recovery successful; bounced all services; restarted the feeder; should start to see work going out again." Paul.

Dave Help desk expert Joined: 28 Jun 10 Posts: 2789	Message 115551 - Posted: 5 Mar 2025, 13:45:20 UTC "March 5, 2025: The system seems to be down (again) - we will investigate." (https://www.cs.toronto.edu/~juris/jlab/wcg.html) I managed to get one ARP task this morning before it all fell over again. I can't get onto any of the user pages on their site currently, but BOINC seems to be contacting the server OK. I just can't change my project settings to allow MCM tasks. Getting "no tasks available" for the Africa Rainfall Project at the moment.

Grumpy Swede Joined: 30 Mar 20 Posts: 451	Message 115554 - Posted: 5 Mar 2025, 19:34:43 UTC The website is still down, but the BOINC part is working. I'm getting MCM tasks.

PMH_UK Joined: 24 Dec 10 Posts: 50	Message 115555 - Posted: 5 Mar 2025, 20:01:46 UTC - in response to Message 115554. "March 5, 2025: The DHCP lease issue, or whatever the root cause of our production VMs losing all network access at an increasing rate such that we are almost sure to experience a server crash multiple times a week, is being investigated by hosting. Our plan to resolve this, regardless of the outcome of the investigation, is to fully migrate most production boxes to Kubernetes, including the DB2, WebSphere, and IBM MQ "axis" of the website/forums and webservices provided by WCG. Previously, we had only provisioned QA on the Kubernetes cluster. We intended to further provision and deploy containers running Mesos workers as our first production boxes orchestrated by Kubernetes on the new hardware, to blue/green deploy and eventually move the coordinator responsibilities, and finally all workunit management pipeline responsibilities, to Kubernetes running Mesos. That would give us fault tolerance at last as we pick apart all the old Mesos job descriptions and crontabs to fully migrate to Kubernetes, Slurm, Redpanda, and distributed Postgres (Citus Data). Once finished, we will decommission Aurora/Mesos and the old CentOS 7 boxes that run and coordinate the Mesos cluster, and provision new VMs with an LTS version of Ubuntu, as we have on the new hardware, to add that capacity to the Kubernetes cluster. We apologize for the delays to the start of the MAM project; we did not account for the sword of Damocles hanging over every production server and falling with increasing frequency, nor for the reduced capacity of the environment in the new year. Thank you for your patience and understanding. We will be starting MAM shortly, as soon as we are through this issue." Paul.

Dr Who Fan Joined: 10 May 07 Posts: 1490	Message 115556 - Posted: 5 Mar 2025, 22:14:05 UTC - in response to Message 115555. The latest update on WCG: "4:27pm ET: db02 is back online; the website is back up; the DHCP agents were flushed on the old WCG nodes, which should resolve the issue." I can confirm the website is back online.

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.