Anything and Everything to do with (WCG) World Community Grid
Author | Message |
---|---|
Joined: 29 Jan 24 Posts: 60 |
still no tasks available |
Joined: 28 Jun 10 Posts: 2789 |
In reply to kasdashdfjsah's message of 21 Feb 2025: "still no tasks available" The last task I got was about 14 hours ago, an ARP resend. Till they release the next batch of ARP, resends are about all we are likely to get. I have my system set to only take ARP, so I don't know about MCM. |
Joined: 28 Jun 10 Posts: 2789 |
In reply to kasdashdfjsah's message of 21 Feb 2025: "still no tasks available" ARP141s now released. |
Joined: 11 Mar 22 Posts: 4 |
Hi all, Does anyone have any actual info on whether there will ever be any more OPN work? There seems to be no useful info on the WCG site. Thanks. Doug |
Joined: 3 Mar 23 Posts: 15 |
In reply to Doug's message of 22 Feb 2025: "Does anyone have any actual info on whether there will ever be any more OPN work?" Hi. Obviously, you should ask that on the WCG forum. |
Joined: 29 Jan 24 Posts: 60 |
Still no tasks, for over a week now on my M4 Mac mini |
Joined: 29 Jan 24 Posts: 60 |
In reply to kasdashdfjsah's message of 28 Feb 2025: "Still no tasks, for over a week now on my M4 Mac mini" Update: Now getting 10 concurrent WCG tasks again, but only from the Mapping Cancer Markers subproject, not from the Africa Rainfall Project or OpenPandemics - COVID-19 subprojects. Didn't get anything from these before, but still. |
Joined: 28 Jun 10 Posts: 2789 |
And now getting a "Feeder not running" error from WCG. |
Joined: 24 Dec 10 Posts: 50 |
From https://www.cs.toronto.edu/~juris/jlab/wcg.html: "March 4, 2025: Services seem to be down. We are working on identifying and fixing the issue. Paul." |
Joined: 24 Dec 10 Posts: 50 |
March 4, 2025 Services seem to be down. We are working on identifying and fixing the issue. BOINC db node crashed. Thus, all running BOINC services, API services and message queues that need to talk to db01 die similarly; the connection is closed, although the node itself is still running. 10:38 am ET: Crash recovery starting now. We should be able to restart all the services soon. 12:21 pm ET: crash recovery successful; bounced all services; restarted the feeder; should start to see work going out again. Paul. |
Joined: 28 Jun 10 Posts: 2789 |
"March 5, 2025: The system seems to be down (again) - we will investigate." (from https://www.cs.toronto.edu/~juris/jlab/wcg.html) I managed to get one ARP task this morning before it all fell over again. I can't get onto any of the user pages on their site currently, but BOINC seems to be contacting the server OK. I just can't change my project settings to allow MCM tasks. Getting "no tasks available" for Africa Rainfall Project at the moment. |
Joined: 30 Mar 20 Posts: 451 |
The website is still down, but the BOINC part is working. I'm getting MCM tasks. |
Joined: 24 Dec 10 Posts: 50 |
March 5, 2025 The DHCP lease issue, or whatever the root cause of our production VMs losing all network access at an increasing rate such that we are almost sure to experience a server crash multiple times a week, is being investigated by hosting. Our plan to resolve this regardless of the outcome of the investigation is to fully migrate most production boxes to Kubernetes, including the DB2, WebSphere, and IBM MQ "axis" of the website/forums and webservices provided by WCG. Previously, we had only provisioned QA on the Kubernetes cluster, and intended to further provision and deploy containers running Mesos workers as our first production boxes orchestrated by Kubernetes on the new hardware to blue/green deploy and eventually move the coordinator responsibilities and finally all workunit management pipeline responsibilities to Kubernetes running Mesos, which would give us fault tolerance at last as we pick apart all the old Mesos job descriptions and crontabs to fully migrate to Kubernetes, Slurm, Redpanda, and distributed postgres (Citus-Data). Once finished, we will decommission Aurora/Mesos and the old CentOS 7 boxes that run and coordinate the Mesos cluster, and provision new VMs with an LTS version of Ubuntu as we have on the new hardware to add that capacity to the Kubernetes cluster. We apologize for the delays to the start of the MAM project; we did not account for the sword of Damocles hanging over every production server and falling with increasing frequency, nor for the reduced capacity of the environment in the new year. Thank you for your patience and understanding. We will be starting MAM shortly, as soon as we are through this issue. Paul. |
Joined: 10 May 07 Posts: 1490 |
The latest update on WCG: "4:27pm ET: db02 is back online; the website is back up; the DHCP agents were flushed on the old WCG nodes, which should resolve the issue." I can confirm the website is back online. |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.