Posts by PMH_UK

1) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 119305
Posted 13 days ago by PMH_UK
Forum currently up.
2) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 119295
Posted 15 days ago by PMH_UK
Forum now up.
3) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 119259
Posted 22 days ago by PMH_UK
Forum now up, no new posts since this morning yet.
4) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 119194
Posted 29 days ago by PMH_UK
Forum now up, no new posts yet.
5) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 119179
Posted 19 May 2026 by PMH_UK
Forum still down, BOINC & other web pages appear OK.
6) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 119175
Posted 18 May 2026 by PMH_UK
Update from https://www.cs.toronto.edu/~juris/jlab/wcg.html
(WCG Forum still down)
May 18, 2026
Recent MAM1/beta30 smoke testing batches caused unbounded memory use on BOINC clients - the cause was a bug in the dataset loader, combined with a placeholder dataset file that was pushed from staging to prod with the batch generation logic. In place of the correctly structured and non-empty MAM1 dataset, this smaller file was read by the dataloader without guards against the invalid formatting, which caused the OOM crashes reported by users. We deprecated the application and cancelled all workunits when we saw this happening, and we have fixed the issue in the dataset loader in a new build of the MAM1 application. We then released a handful (10) beta30 project workunits tonight May 15th, 2026 to confirm the fix, and we will resume smoke testing once we have Windows, WSL, and Docker support tested through the beta30 project. Expect the beta testing thread about MAM1 next week, preceding any further smoke testing in the MAM1_9999900+ batch range to exercise the production lifecycle. We apologize for the inconvenience; we did test locally and in our staging environment but did not catch this because, with the correct dataset, it worked.
Results API not showing IN_PROGRESS, not consistent with authoritative BOINC database - we will be switching out the connection to the legacy database that currently serves the Results API for a connection to the new postgres cluster coordinator, so that the website will then fetch authoritative data from postgres including IN_PROGRESS results. When this is ready, it should resolve many of the mysterious missing or inconsistent states for results reported by volunteers in the forum.
Found the bug in the file_upload_handler causing the missing validations issue generally for MCM1, affecting stats and results - working on the fix after sweeping the database and file system to find leftover inflated credit values and issues resulting from the outages at the data center, partition by partition.
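As an illustration of the kind of guard the MAM1 item in this update says was missing, here is a minimal sketch of a loader that rejects an empty or malformed dataset before doing any further work. It is not the MAM1 code: the comma-separated format, the function name `load_dataset`, and the `InvalidDatasetError` type are all assumptions made for the example.

```python
class InvalidDatasetError(ValueError):
    """Raised when a dataset file is empty or malformed (hypothetical type)."""


def load_dataset(text, expected_columns, min_rows=1):
    """Parse comma-separated numeric rows, failing fast on bad input.

    The file format here is an assumption for illustration only.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split(",")
        if len(fields) != expected_columns:
            raise InvalidDatasetError(
                f"line {line_no}: expected {expected_columns} fields, "
                f"got {len(fields)}"
            )
        try:
            rows.append([float(field) for field in fields])
        except ValueError as exc:
            raise InvalidDatasetError(
                f"line {line_no}: non-numeric value"
            ) from exc
    if len(rows) < min_rows:
        raise InvalidDatasetError(
            f"expected at least {min_rows} rows, got {len(rows)}"
        )
    return rows
```

With a check like this, a placeholder file fails with a clear error at load time instead of being handed to the rest of the application.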
7) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118992
Posted 29 Apr 2026 by PMH_UK
Update on https://www.cs.toronto.edu/~juris/jlab/wcg.html
April 29, 2026
BOINC traffic resumed around 12:30 UTC on April 25th, 2026 and the cluster is currently stable as of this writing (18:00 UTC on April 29th, 2026) - the issues at the data center were resolved, recovery was successful, and the issues causing large numbers of 404s on download and 503s on upload, unrelated to the issue at the data center, were both resolved as well. We implemented backpressure for the validators, juggled some services like the backfill validations off the database cluster nodes to other nodes in the cluster, further tuned postgres for our workload, and modified some BOINC components to harden the cluster against future outages.
BOINC stats export to https://download.worldcommunitygrid.org/boinc/stats resumed - server status page will follow once ARP1 and MAM1 are up and running again.
IN_PROGRESS results do not display on the website - until the credit_flusher batch upserts those rows into the legacy MariaDB database after validation, these workunits are not visible to the website APIs. We are working to add to the Results API a fetch and cache from the new BOINC database postgres cluster so that these IN_PROGRESS results can be seen on the website and retrieved from the APIs. Likely, this fix will coincide with improvements to allow users with Result sets large enough to time out the API to see or at least download their results, and fixes and tooltips for the new Summary feature on the Results page.
Data Sharing radio button does not work - thank you for the report, working to fix this.
403 Forbidden - frustrating forum users - when we fixed the team challenge registration, the issue causing 403 Forbidden on that page was the updated mod-security rules for apache2 on the load balancer server. We will start there and look at the mod-security rules from the load balancer through to the container that hosts the website behind HAProxy, which also has its own set of rules, and hopefully provide relief soon.
When will ARP1 be released? - current blocker is the geographical split with overlapping edges between regions to match our partitioned backend, so that downloads and uploads will be routed to a mostly contiguous geographic region of the overall sub-Saharan region for which the project is predicting the weather. As this involves fetching ARP1 results across boundaries of those mostly contiguous regions so that "halo" domains can have their next generations of workunit inputs generated from the completed work of all their neighbours within, and the neighbours across the partition border, it requires more development and testing. Now that we are seeing stability in the new architecture, this is a priority and we will update as we get a better sense for the exact timing. We may release workunits that are completely within a geographic partition to test the ARP1 BOINC components in general, before we can announce that the project is back up and running.
When will MAM1 and the GPU build "MAMG" workunits be released? - while MAM1 and the corresponding beta30 project are up to date with the MCM1 pipeline and could have work issued at any time, we are exploring the options afforded to us by upgrading our BOINC build such as the BOINC Universal Docker App "BUDA" (https://github.com/BOINC/boinc/wiki/BUDA-overview), and also using Mojo with the existing LibTorch support within the application to run on newer AMD and NVIDIA GPUs (https://docs.modular.com/max/develop/custom-kernels-pytorch). We expect to start sending out batches of MAM1 workunits for the mt CPU build through the beta30 app and possibly the MAM1_9999900+ testing range this week, barring some new blocker, and will update on GPU support as we release new builds through the beta30 application.
8) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118815
Posted 13 Apr 2026 by PMH_UK
Edit: now back for me, had to back-page tabs I had tried to refresh.

https://www.cs.toronto.edu/~juris/jlab/wcg.html

April 13, 2026

We are aware of the web site and forum issues - looking into it. Our certificates are valid.
9) Message boards : Projects : iThena.Measurements
Message 118756
Posted 6 Apr 2026 by PMH_UK
iThena.Measurements now back with PERF and CNode but CNode tasks still failing.
10) Message boards : Projects : News on Project Outages
Message 118734
Posted 4 Apr 2026 by PMH_UK
WUProp is not counting hours returned but workunits appear to be flowing OK so far.
iThena is down.
WCG is wobbling.
11) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118722
Posted 2 Apr 2026 by PMH_UK
Suggest using the WCG forum while it is up.

From thread Project Status (First Post Updated):
https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,47663_lastpage,yes#lastpost

According to Igor ~21k users are affected by this issue, and they are working hard now to correct this bug.

I'll let him post an update, with the rest of the information I got about this. I do not want to post the full content from a private mail.

Edit: They have deliberately shut down access to the BOINC system, while they are working on correcting this issue.
12) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118352
Posted 21 Feb 2026 by PMH_UK
New update on:
https://www.cs.toronto.edu/~juris/jlab/wcg.html

February 21, 2026

Not a perfect state yet - but hopefully the systems are now stable enough. Congratulations to BOINC@AUSTRALIA for reaching 2,289,999 score on In Memory of Dylan Bucci - 2026 Challenge.
There is still time to join Dylan's challenge - it runs till March 9:
https://www.worldcommunitygrid.org/team/challenge/viewTeamChallenge.do?challengeId=11075
13) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118341
Posted 18 Feb 2026 by PMH_UK
New update:
https://www.cs.toronto.edu/~juris/jlab/wcg.html
14) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118291
Posted 9 Feb 2026 by PMH_UK
Doubling is an error during catch-up processing, see the link below.
https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,47633
15) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118274
Posted 7 Feb 2026 by PMH_UK
OK for me now.
16) Message boards : Projects : Axiom project
Message 118236
Posted 3 Feb 2026 by PMH_UK
See also messages on WUProp forum:
https://wuprop.statseb.fr/forum_thread.php?id=191#12098
17) Message boards : Projects : iThena.Measurements
Message 118110
Posted 17 Jan 2026 by PMH_UK
Both back with tasks.
Even got credit for tasks from November.
18) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118097
Posted 15 Jan 2026 by PMH_UK
Now up, below from Dylan.
Interim update -

We have regained access to our project at Nibicloud. We have ssh access to our servers again, and I am in the process of damage control now before restarting the feeder.

Most of our servers/VMs remained online during the outage, but some appear to have been soft rebooted, losing in-memory caches that I need to repopulate from Kafka/Redpanda. Should have everything back up and running "soon", somewhere in the hours to tomorrow morning range as my current best estimate.

Validation should improve when I am done, as I have the opportunity to push some changes and separate the validation streams for old result pair upload events, vs. new result pair upload events, and launch additional validators with code changes to stripe them on workunit ID within the node-local partition, and do a second tier of batching to keep load on the BOINC db from spiking from multiple validator_assimilator daemons trying to batch update state and credit at once.

I will update here in the forums once I get through everything, and hopefully can address some of the concerns raised in the forums. If all goes well, plan is to start MAM1 beta for Windows as soon as MCM1 is flowing and validating again, along with a new build for Linux, both will run through rounds of beta30 before we run some smoke tests in the MAM1_9999903+ range.
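The striping and second-tier batching mentioned in the update above could look roughly like the sketch below. This is only an illustration under assumptions, not the project's code: modulo striping, the function names, and the batch size are all hypothetical, since the update does not say how workunit IDs are mapped to validators.

```python
from collections import defaultdict


def stripe_for(workunit_id, num_validators):
    """Pick a validator index for a workunit (modulo striping is an assumption)."""
    return workunit_id % num_validators


def assign_stripes(workunit_ids, num_validators):
    """Group workunit IDs so each validator handles a disjoint stripe."""
    stripes = defaultdict(list)
    for workunit_id in workunit_ids:
        stripes[stripe_for(workunit_id, num_validators)].append(workunit_id)
    return dict(stripes)


def batches(items, batch_size):
    """Yield fixed-size chunks so database updates go out in bounded groups."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


stripes = assign_stripes([10, 11, 12, 13, 14], num_validators=2)
first_stripe_batches = list(batches(stripes[0], batch_size=2))
```

Because each workunit ID lands in exactly one stripe, two validators never process the same workunit, and batching within a stripe bounds the size of each update sent to the database.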
19) Message boards : Projects : Anything and Everything to do with (WCG) World Community Grid
Message 118018
Posted 9 Jan 2026 by PMH_UK
In reply to Sir LanDroid's message of 9 Jan 2026:
Our fix adds fallback/fallthrough logic to the validator_assimilator daemon to facilitate remote file retrieval and process the tens of millions of backlog events we published to the queue it consumes from.

Wowzer! Appears it's gonna be quite a while...

Over 16 million were validated in 1 day recently so not too long, once ready to go.
20) Message boards : Projects : News on Project Outages
Message 117997
Posted 7 Jan 2026 by PMH_UK
Spacious now appears to be working OK but credit still high.
At 100x I suspect an issue with the thousands marker between locales and import/export.
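A small sketch of how that kind of locale mix-up can produce exactly a 100x error, assuming (purely for illustration) that credit is exported with a decimal comma and two decimal places, then imported by code that treats the comma as a thousands marker. The function names are hypothetical and nothing here is taken from the project's actual import/export code.

```python
def parse_decimal_comma(text):
    """Parse a number written as 1.234,56 ('.' thousands, ',' decimal)."""
    return float(text.replace(".", "").replace(",", "."))


def parse_thousands_comma(text):
    """Parse a number written as 1,234.56 (',' thousands, '.' decimal)."""
    return float(text.replace(",", ""))


exported = "12,34"  # 12.34 credits written with a decimal comma

intended = parse_decimal_comma(exported)    # comma read as the decimal point
misread = parse_thousands_comma(exported)   # comma stripped as a thousands marker
inflation = round(misread / intended)       # factor by which the value grew
```

Here the misread value is 1234.0 against an intended 12.34, so the inflation factor is 100.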

Copyright © 2026 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.