| Info | Message |
|---|---|
| 1) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117568 Posted 15 days ago by PMH_UK |
Bear in mind that WCG started with United Devices before BOINC. The system had to cope with that so was customized more than most. |
| 2) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117516 Posted 20 days ago by PMH_UK |
November 21, 2025 We are testing required changes to the scheduler and feeder to resolve the corrupt/truncated "os_name" and "os_version" entries such as "W"/"W" for some hosts, as reported by users in the forums, and to resolve frequent "stuck" feeder states where "No tasks available for platform" is logically incorrect by hr_class, yet the tasks populating the feeder shared memory segment remain unassigned by the scheduler passes and manual intervention is required to get work flowing again. Passes through uploaded results that have not been credited by the new system will begin next week, to backfill missing credits. We have been performing dry runs to establish correctness. As a precaution, we will be running the program in multiple passes starting with the oldest uploads, to the most recent. Volunteers have reported that the API sometimes shows an invalid state for multiple results, where only one result is marked valid, which should be impossible. Preliminary investigation points to the new MCM1 assimilation procedure interacting with the transitioner. The new MCM1 assimilation procedure acts to validate and credit all in progress results for a workunit as soon as it has consumed any pair/quorum of files, whether original 0 and 1 results or resends 2 and up, that have passed validation. We will review this issue in full and report our findings, whether a bug in the assimilator, or poorly modeled interaction between assimilator transactions and the transitioner, which is where we expect to find an explanation. https://www.cs.toronto.edu/~juris/jlab/wcg.html |
| 3) Message boards : Projects : iThena.Measurements
Message 117403 Posted 11 Nov 2025 by PMH_UK |
Currently space I think but see below. OONI added around 16:30 UTC, none of mine ended yet. CNode running for several hours, last 40 minute ones around 10:00 UTC. Perf running for 3 hours since about 10:50 UTC. Also Hex on computational. |
| 4) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117261 Posted 28 Oct 2025 by PMH_UK |
October 28, 2025 We have fixed the main validation throughput issues with the new Kafka-based workflow, and reprocessed uploads from around the time we started sending out test batches. We are reviewing the Kafka topics and BOINC database to see if the volunteer reports of both results for a test workunit uploaded but no validation/assimilation occurred during the reprocess is another bug to fix, and if so, whether it is severe enough to block regular MCM1_024% batch distribution until resolved. In reviewing the transitioner implementation (which we intended to start yesterday to begin triggering resends for test batches), we found the new paradigm for storing configuration details that are required to populate resends in the result table needed to be incorporated into key functions. We are testing these relatively minor changes to the transitioner now. Our plan is to deploy the updated transitioner, verify resends work, verify it times out expired workunits, and depending on how that and the review of "missed validations" noted above goes we may then be ready to resume MCM1 batches in the normal range. Regarding uploads that span the downtime for migration, we will reconcile validation and credit for these workunits as soon as the production path for MCM1 described above is running. We should be able to use the new components to do that, after walking the filesystems where those uploads live, double-checking the list that need validation and crediting in the database, and pursuing a similar "reprocessing" path which worked well to re-attempt validation and crediting of the test MCM1 batches. Then, we will begin testing beta30/MAM1, and ARP1 using the new system, which we expect to progress much faster now that we have ironed out the logic with MCM1. Stats updates will be restarted as soon as the MCM1 workflow is stable; that will include the daily export to https://download.worldcommunitygrid.org/boinc/stats/ |
| 5) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117075 Posted 13 Oct 2025 by PMH_UK |
October 13, 2025 Happy Thanksgiving to our Canadian volunteers and partners. Work on finishing deployment setup will resume tomorrow. |
| 6) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116918 Posted 22 Sep 2025 by PMH_UK |
Below from Dylan near midnight UTC. Email should be working normally now, in moving the postfix/sendgrid relay to Ubuntu 24.04 there were additional parameters I needed in the /etc/postfix/main.cf file, until added the JavaMail threads would just wait forever (no timeout passed to Websphere JVM), eventually filling up the thread pool and crashing Websphere. Knock on wood, after that and a bunch of fixes to the database connection pools which were similarly crashing the website and forums, website should remain up and working. Working on BOINC now. |
| 7) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116885 Posted 17 Sep 2025 by PMH_UK |
I was able to post a response to forum that BOINC was not responding yet. |
| 8) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116878 Posted 17 Sep 2025 by PMH_UK |
Now getting below, so may be progressing. 403 Forbidden Request forbidden by administrative rules. |
| 9) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116854 Posted 12 Sep 2025 by PMH_UK |
September 12, 2025 Configuration of Websphere and IBM MQ is taking longer than expected. We are moving all provisioning, build, and deploy stages for all repos from Ansible and Gitlab CI to Dockerfiles and docker compose files, which is a step that precedes running these containers as StatefulSets on Kubernetes. So far, we have functional containers for IBM MQ, Websphere, DB2, MariaDB, and all BOINC endpoints up and running, and what we are still struggling through is configuration. This approach will benefit site reliability and scalability in an obvious way on Kubernetes, and will improve our development and QA lifecycles drastically. It was also necessary to preserve maximum compatibility with the CentOS 7 virtual machines that the legacy stack was previously running on, a requirement for the redirected restore of the DB2 data, for example (https://www.ibm.com/docs/en/db2/11.5.x?topic=restore-performing-redirected-operation). So why are we not up, and when will we be up? We are debugging the entrypoint scripts for Websphere and IBM MQ containers. Website cannot be brought up until Websphere is up and configured correctly, receiving messages from all MQ sidecars across the stack, sending emails, etc. The databases, the webserver, and the scheduler each have to run MQ, and we are still adapting some of the previous mqsc and other runtime configuration for the MQ service to work with this new setup where each important container that requires one gets an MQ sidecar container that uses the Ubuntu 24.04 host VM network. |
| 10) Message boards : Questions and problems : New install 8.2.4 does not request tasks
Message 116821 Posted 8 Sep 2025 by PMH_UK |
WCG is down for migration, issues found, expected back this week. See https://www.cs.toronto.edu/~juris/jlab/wcg.html |
| 11) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116683 Posted 21 Aug 2025 by PMH_UK |
Fora and other pages are now loading. |
| 12) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116678 Posted 21 Aug 2025 by PMH_UK |
Scheduler still responding to BOINC. |
| 13) Message boards : Projects : News on Project Outages
Message 116661 Posted 19 Aug 2025 by PMH_UK |
So far that has been worse than https://stats.free-dc.org in that it did not load much of the page. A few refreshes did not help but on the site above I got my stats, albeit after several refreshes. |
| 14) Message boards : Projects : News on Project Outages
Message 116626 Posted 14 Aug 2025 by PMH_UK |
FREE-DC responded OK when I retried after getting 500. WUProp uploaded OK but got "feeder not running" trying to report. |
| 15) Message boards : Projects : iThena.Measurements
Message 116590 Posted 8 Aug 2025 by PMH_UK |
Both websites active, measure has work. |
| 16) Message boards : Projects : News on Project Outages
Message 116290 Posted 30 Jun 2025 by PMH_UK |
Still sometimes seeing this on stats6.free-dc.org with expiry 23/06/2025. |
| 17) Message boards : Projects : News on Project Outages
Message 116280 Posted 26 Jun 2025 by PMH_UK |
Free-DC now gets "expired certificate". Forum refers to discord, I am not on that. Anyone know if this is in hand? |
| 18) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116255 Posted 19 Jun 2025 by PMH_UK |
https://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=704358 ...scheduler requests use a per-host lock file to ensure that there aren't two concurrent requests from one host. The file is created at the start of the request, holds the PID of the scheduler instance, and is deleted at the end of the request. |
| 19) Message boards : Projects : News on Project Outages
Message 116198 Posted 5 Jun 2025 by PMH_UK |
At least some parts of the website are working for me, e.g. https://stats6.free-dc.org/user/wup/<user number> Forum link fails as you say. It has been slow since a hardware failure and now sometimes fails to load for a while. |
| 20) Message boards : Projects : .Denis at Home.
Message 116191 Posted 3 Jun 2025 by PMH_UK |
New BETA units version 0.05 now out. |
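
The per-host lock file quoted in message 116255 (create at the start of the request, store the PID, delete at the end) can be sketched roughly as below. This is a minimal Python illustration only, not the scheduler's actual code: the `host_lock` and `demo` names, the `host_<id>.lock` file naming, and the lock directory are all assumptions made for the example, and it does not deal with stale files left behind by a crashed process.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def host_lock(lock_dir, host_id):
    """Hold a per-host lock file for one request (hypothetical helper)."""
    # File naming is an assumption for this sketch.
    path = os.path.join(lock_dir, f"host_{host_id}.lock")
    # O_CREAT | O_EXCL makes creation atomic: it raises FileExistsError
    # if another request for the same host already holds the lock.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the PID of the process handling the request.
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        yield path
    finally:
        # Delete the file at the end of the request.
        os.remove(path)


def demo():
    """Show that a second concurrent request for the same host is refused."""
    with tempfile.TemporaryDirectory() as lock_dir:
        with host_lock(lock_dir, 42) as path:
            with open(path) as f:
                held_pid = int(f.read())
            try:
                with host_lock(lock_dir, 42):
                    second = "acquired"
            except FileExistsError:
                second = "rejected"
        # After the first request ends, the lock file is gone.
        return held_pid == os.getpid(), second, os.path.exists(path)


if __name__ == "__main__":
    print(demo())
```

Creating the file with exclusive-create semantics means the existence check and the creation happen in one step, so two requests cannot both believe they hold the lock.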
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.