Posts by PMH_UK

1) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117568
Posted 15 days ago by PMH_UK
Bear in mind that WCG started with United Devices before BOINC.
The system had to cope with that so was customized more than most.
2) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117516
Posted 20 days ago by PMH_UK
November 21, 2025

We are testing required changes to the scheduler and feeder to resolve the corrupt/truncated "os_name" and "os_version" entries such as "W"/"W" for some hosts, as reported by users in the forums, and to resolve frequent "stuck" feeder states where "No tasks available for platform" is logically incorrect by hr_class, yet the tasks populating the feeder shared memory segment remain unassigned by the scheduler passes and manual intervention is required to get work flowing again.
Passes through uploaded results that have not been credited by the new system will begin next week, to backfill missing credits. We have been performing dry runs to establish correctness. As a precaution, we will be running the program in multiple passes starting with the oldest uploads, to the most recent.
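The "oldest first, in multiple passes" approach described above could be planned roughly as follows. This is only a sketch: the record fields (`uploaded_at`, `credited`) and the function name are assumptions made for illustration, not names from the WCG code.

```python
from datetime import datetime


def plan_backfill_passes(uploads, pass_size):
    """Split uncredited uploads into passes, oldest uploads first.

    `uploads` is a list of dicts with hypothetical keys
    "uploaded_at" (datetime) and "credited" (bool).
    """
    if pass_size <= 0:
        raise ValueError("pass_size must be positive")
    # Only results that were never credited need a backfill.
    pending = sorted(
        (u for u in uploads if not u["credited"]),
        key=lambda u: u["uploaded_at"],
    )
    # Each slice is one pass; earlier passes hold older uploads.
    return [pending[i:i + pass_size] for i in range(0, len(pending), pass_size)]
```

Running each pass separately means an error found in an early pass can be corrected before the more recent uploads are touched.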
Volunteers have reported that the API sometimes shows an invalid state for multiple results, where only one result is marked valid, which should be impossible. Preliminary investigation points to the new MCM1 assimilation procedure interacting with the transitioner. The new MCM1 assimilation procedure acts to validate and credit all in progress results for a workunit as soon as it has consumed any pair/quorum of files, whether original 0 and 1 results or resends 2 and up, that have passed validation. We will review this issue in full and report our findings, whether a bug in the assimilator, or poorly modeled interaction between assimilator transactions and the transitioner, which is where we expect to find an explanation.
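A consistency check for the state volunteers reported (a workunit with fewer valid results than a quorum) might look like the sketch below. The field names `workunit` and `state`, the `"valid"` label, and the default quorum of 2 are assumptions for illustration, not the actual WCG API schema.

```python
from collections import Counter


def find_partially_valid_workunits(results, quorum=2):
    """Return workunits with at least one valid result but fewer than `quorum`.

    `results` is a list of dicts with hypothetical keys "workunit" and "state".
    """
    # Count only results marked valid; workunits with none never appear here.
    valid = Counter(r["workunit"] for r in results if r["state"] == "valid")
    return sorted(wu for wu, n in valid.items() if n < quorum)
```

Workunits with no valid results at all are deliberately ignored, since those may simply still be in progress.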
https://www.cs.toronto.edu/~juris/jlab/wcg.html
3) Message boards : Projects : iThena.Measurements
Message 117403
Posted 11 Nov 2025 by PMH_UK
Currently space I think but see below.
OONI added around 16:30 UTC, none of mine ended yet.
CNode running for several hours, last 40 minute ones around 10:00 UTC.
Perf running for 3 hours since about 10:50 UTC.

Also Hex on computational.
4) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117261
Posted 28 Oct 2025 by PMH_UK
October 28, 2025

We have fixed the main validation throughput issues with the new Kafka-based workflow, and reprocessed uploads from around the time we started sending out test batches. Volunteers have reported cases where both results for a test workunit were uploaded but no validation/assimilation occurred during the reprocess. We are reviewing the Kafka topics and BOINC database to see whether this is another bug to fix, and if so, whether it is severe enough to block regular MCM1_024% batch distribution until resolved.
In reviewing the transitioner implementation (which we intended to start yesterday to begin triggering resends for test batches), we found the new paradigm for storing configuration details that are required to populate resends in the result table needed to be incorporated into key functions. We are testing these relatively minor changes to the transitioner now.
Our plan is to deploy the updated transitioner, verify resends work, verify it times out expired workunits, and depending on how that and the review of "missed validations" noted above goes we may then be ready to resume MCM1 batches in the normal range.
Regarding uploads that span the downtime for migration, we will reconcile validation and credit for these workunits as soon as the production path for MCM1 described above is running. We should be able to use the new components to do that, after walking the filesystems where those uploads live, double-checking the list that need validation and crediting in the database, and pursuing a similar "reprocessing" path which worked well to re-attempt validation and crediting of the test MCM1 batches.
Then, we will begin testing beta30/MAM1, and ARP1 using the new system, which we expect to progress much faster now that we have ironed out the logic with MCM1.
Stats updates will be restarted as soon as the MCM1 workflow is stable; that will include the daily export to https://download.worldcommunitygrid.org/boinc/stats/
5) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 117075
Posted 13 Oct 2025 by PMH_UK
October 13, 2025

Happy Thanksgiving to our Canadian volunteers and partners.
Work on finishing deployment setup will resume tomorrow.
6) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116918
Posted 22 Sep 2025 by PMH_UK
Below from Dylan near midnight UTC.
Email should be working normally now. In moving the postfix/sendgrid relay to Ubuntu 24.04, there were additional parameters I needed in the /etc/postfix/main.cf file; until they were added, the JavaMail threads would just wait forever (no timeout passed to Websphere JVM), eventually filling up the thread pool and crashing Websphere. Knock on wood, after that and a bunch of fixes to the database connection pools, which were similarly crashing the website and forums, the website should remain up and working. Working on BOINC now.
7) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116885
Posted 17 Sep 2025 by PMH_UK
I was able to post a response to forum that BOINC was not responding yet.
8) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116878
Posted 17 Sep 2025 by PMH_UK
Now getting below, so may be progressing.
403 Forbidden
Request forbidden by administrative rules.
9) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116854
Posted 12 Sep 2025 by PMH_UK
September 12, 2025

Configuration of Websphere and IBM MQ is taking longer than expected. We are moving all provisioning, build, and deploy stages for all repos from Ansible and Gitlab CI to Dockerfiles and docker compose files, which is a step that precedes running these containers as StatefulSets on Kubernetes. So far, we have functional containers for IBM MQ, Websphere, DB2, MariaDB, and all BOINC endpoints up and running, and what we are still struggling through is configuration.
This approach will benefit site reliability and scalability in an obvious way on Kubernetes, and will improve our development and QA lifecycles drastically. It was also necessary to preserve maximum compatibility with the CentOS 7 virtual machines that the legacy stack was previously running on, which is a requirement for the redirected restore of the DB2 data, for example: https://www.ibm.com/docs/en/db2/11.5.x?topic=restore-performing-redirected-operation.
So why are we not up, and when will we be up? We are debugging the entrypoint scripts for Websphere and IBM MQ containers. Website cannot be brought up until Websphere is up and configured correctly, receiving messages from all MQ sidecars across the stack, sending emails, etc. Each of the databases, the webserver, and the scheduler have to run MQ, and we are still adapting some of the previous mqsc and other runtime configuration for the MQ service to work with this new setup where each important container that requires one gets an MQ sidecar container that uses the Ubuntu 24.04 host VM network.
10) Message boards : Questions and problems : New install 8.2.4 does not request tasks
Message 116821
Posted 8 Sep 2025 by PMH_UK
WCG is down for migration, issues found, expected back this week.
See https://www.cs.toronto.edu/~juris/jlab/wcg.html
11) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116683
Posted 21 Aug 2025 by PMH_UK
Fora and other pages are now loading.
12) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116678
Posted 21 Aug 2025 by PMH_UK
Scheduler still responding to BOINC.
13) Message boards : Projects : News on Project Outages
Message 116661
Posted 19 Aug 2025 by PMH_UK
So far that has been worse than https://stats.free-dc.org in that it did not load much of the page.
A few refreshes did not help but on the site above I got my stats, albeit after several refreshes.
14) Message boards : Projects : News on Project Outages
Message 116626
Posted 14 Aug 2025 by PMH_UK
FREE-DC responded OK when I retried after getting 500.
WUProp uploaded OK but got "feeder not running" trying to report.
15) Message boards : Projects : iThena.Measurements
Message 116590
Posted 8 Aug 2025 by PMH_UK
Both websites active, measure has work.
16) Message boards : Projects : News on Project Outages
Message 116290
Posted 30 Jun 2025 by PMH_UK
Still sometimes seeing this on stats6.free-dc.org with expiry 23/06/2025.
17) Message boards : Projects : News on Project Outages
Message 116280
Posted 26 Jun 2025 by PMH_UK
Free-DC now gets "expired certificate".
Forum refers to discord, I am not on that.
Anyone know if this is in hand?
18) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 116255
Posted 19 Jun 2025 by PMH_UK
https://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=704358
...scheduler requests use a per-host lock file to ensure that there aren't two concurrent requests from one host. The file is created at the start of the request, holds the PID of the scheduler instance, and is deleted at the end of the request.

There are two possible error conditions, one of which is that the lock file can't be acquired in the first place, the other that there is an existing lock. Unfortunately, although the message written to the server log distinguishes the two cases, the message sent to the client does not.

In this case, I suspect the issue is an inability to create the lock file in the first place :-(
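To illustrate the two error conditions described in the quoted post, here is a minimal sketch of a per-host lock file that holds a PID. It is not the BOINC scheduler's code; the file naming, the function names, and the `LockResult` type are assumptions for illustration only.

```python
import os
from enum import Enum


class LockResult(Enum):
    ACQUIRED = "acquired"
    ALREADY_LOCKED = "already_locked"  # another request holds the lock
    CANNOT_CREATE = "cannot_create"    # lock file could not be created at all


def acquire_host_lock(lock_dir, host_id):
    """Try to create a per-host lock file containing this process's PID."""
    path = os.path.join(lock_dir, f"host_{host_id}.lock")
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return LockResult.ALREADY_LOCKED, path
    except OSError:
        # Any other failure, e.g. missing directory or no permission.
        return LockResult.CANNOT_CREATE, path
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return LockResult.ACQUIRED, path


def release_host_lock(path):
    """Delete the lock file at the end of the request."""
    os.remove(path)
```

Keeping the two failure cases as separate return values is what would let a client-facing message distinguish "request already in progress" from "server could not create the lock".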
19) Message boards : Projects : News on Project Outages
Message 116198
Posted 5 Jun 2025 by PMH_UK
At least some parts of the website are working for me, e.g. https://stats6.free-dc.org/user/wup/<user number>
Forum link fails as you say.
It is slow since a h/w fail and now sometimes fails to load for a while.
20) Message boards : Projects : .Denis at Home.
Message 116191
Posted 3 Jun 2025 by PMH_UK
New BETA units version 0.05 now out.

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.