Thread 'Anything and Everything to do with (WCG) World Community Grid'

Grumpy Swede
Joined: 30 Mar 20
Posts: 425
Sweden
Message 114785 - Posted: 4 Nov 2024, 16:52:41 UTC - in response to Message 114783.  
Last modified: 4 Nov 2024, 16:55:15 UTC

In reply to Dave's message of 4 Nov 2024:
Over 2 hours since tasks allocated to my box. I am not sure any will download before my CPDN tasks are all finished in a few days' time. Have they actually moved to the new infrastructure for the storage for these tasks yet, or are they still sending them out from Krembil?
AFAIK, the new ARP infrastructure has nothing to do with the crunching of ARP. The WUs, and the preparation of WUs, are still going to be handled by Krembil: downloads, uploads, and reporting.

There's some info on these locations about it all:

https://www.worldcommunitygrid.org/about_us/article.s?articleId=811
https://www.worldcommunitygrid.org/about_us/article.s?articleId=814

Those two pages link to older updates about the ARP restart and what it will take.

Dr Who Fan
Joined: 10 May 07
Posts: 1452
United States
Message 114788 - Posted: 4 Nov 2024, 17:52:04 UTC

Still waiting on multiple timeouts/retries on ARP files on 2 PCs.

One PC has been trying for over 4 hours to get the files with me mashing the retry button about every 15 minutes.

All downloads time out almost instantly and none of the files have any successful bytes received so far.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2721
United Kingdom
Message 114790 - Posted: 4 Nov 2024, 20:35:49 UTC - in response to Message 114788.  

Now got one task running, but still more than a full page of downloads queued. Now that I have one running, I am not going to bother hitting the retry button any more, as I have CPDN work to keep me going.

Grumpy Swede
Joined: 30 Mar 20
Posts: 425
Sweden
Message 114791 - Posted: 4 Nov 2024, 23:08:17 UTC
Last modified: 4 Nov 2024, 23:30:41 UTC

You don't need to babysit and manually click the retry button. Instead, use a script or batch file that runs this command at your chosen interval, in the folder where boinccmd.exe is located:

boinccmd.exe --network_available

That will make BOINC automatically try stalled downloads.
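
A minimal sketch of the scripted retry described above, written in Python rather than as a batch file. The function name `retry_transfers`, the 15-minute default interval, and the round count are illustrative assumptions; only the `boinccmd --network_available` command comes from the post. The `run` and `sleep` parameters exist only so the loop can be exercised without a BOINC client.

```python
import subprocess
import time


def retry_transfers(boinccmd="boinccmd", interval_s=900, rounds=4,
                    run=subprocess.run, sleep=time.sleep):
    """Run `boinccmd --network_available` `rounds` times, interval_s apart.

    Returns the list of exit codes, one per invocation.
    """
    codes = []
    for i in range(rounds):
        # Ask the client to retry deferred network activity right away.
        result = run([boinccmd, "--network_available"], check=False)
        codes.append(result.returncode)
        # No need to wait after the final attempt.
        if i < rounds - 1:
            sleep(interval_s)
    return codes
```

Calling `retry_transfers("boinccmd.exe")` from the folder that holds the executable would issue the command four times, 15 minutes apart.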

Edit: But by the looks of how the WCG website behaves now, that may not help, because I would not be surprised if we're heading for another "SYSTEM ERROR" and a total breakdown of WCG pretty soon.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2721
United Kingdom
Message 114792 - Posted: 5 Nov 2024, 8:45:13 UTC - in response to Message 114791.  

You don't need to babysit and manually click the retry button. Instead, use a script or batch file that runs this command at your chosen interval, in the folder where boinccmd.exe is located:

I have never tried running scripts using WINE. Once the current batch of CPDN work is complete, I will be returning to the native Linux client. In WINE the manager is a bit flaky (at least 8.0.4 is), crashing sometimes on an action and sometimes seemingly for no reason. The client continues fine. I am now down to 11 tasks downloading and 24 files still to download between them, with one task running and two more to start when other work finishes.

I see from the WCG fora that the amount of data is now slowing downloads of MCM work too, unless WCG have decided to slow that down as an equal-opportunities thing?

Dr Who Fan
Joined: 10 May 07
Posts: 1452
United States
Message 114798 - Posted: 5 Nov 2024, 15:29:18 UTC - in response to Message 114792.  

I suspect the WCG servers are overloaded: too many simultaneous connections, and possibly limited bandwidth to the internet, are the problem.

I also see the same thing on WCG with MCM data, where downloads are slowed down to multiple retries.

About 2 hours ago, 1 ARP task of 6 across 3 PCs finally finished downloading all its data files after almost 18 hours of retries. Now to see if it will finish by the deadline.

Dr Who Fan
Joined: 10 May 07
Posts: 1452
United States
Message 114800 - Posted: 5 Nov 2024, 21:11:59 UTC

WCG servers still overwhelmed. Have several Android devices with MCM tasks stuck in download back off.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2721
United Kingdom
Message 114802 - Posted: 5 Nov 2024, 21:54:04 UTC - in response to Message 114800.  

In reply to Dr Who Fan's message of 5 Nov 2024:
WCG servers still overwhelmed. Have several Android devices with MCM tasks stuck in download back off.

Though my uploads from ARP seem to get through with no intervention now, albeit slowly.

Dr Who Fan
Joined: 10 May 07
Posts: 1452
United States
Message 114804 - Posted: 6 Nov 2024, 8:55:19 UTC

The MCM tasks on my Android devices have a huge download:
World Community Grid
7.61 Mapping Cancer Markers
MCM1_0227607_2550_1
e55b6bdba4ed0b4b6e315c6767d68e3f.txt
47,048.72 K

WCG is finally downloading the task data file now after several stalled attempts.

robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1302
United Kingdom
Message 114807 - Posted: 6 Nov 2024, 10:27:38 UTC

47 MB is not exactly huge; ARP downloads are coming in at about 75-100 MB.

I've just had a chance to do a speed comparison between WCG & CPDN. The WCG download was running at about 35kbs, while the CPDN download was running at about 355kbs, so it finished a long time before WCG despite having started after it. Real or throttled bandwidth issue?
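
As a rough worked example of what those rates mean, the sketch below estimates transfer times for the 47,048.72 K MCM file mentioned earlier in the thread. It assumes "kbs" means kilobytes per second and that the rate stays constant, neither of which the post confirms; `transfer_seconds` is an illustrative helper.

```python
def transfer_seconds(size_kb, rate_kb_per_s):
    """Time to move size_kb kilobytes at a constant rate in KB/s."""
    return size_kb / rate_kb_per_s


size_kb = 47_048.72  # file size quoted earlier in the thread, in K

# Rates from the comparison above, read as KB/s (an assumption).
for label, rate in (("WCG", 35), ("CPDN", 355)):
    secs = transfer_seconds(size_kb, rate)
    print(f"{label}: {secs / 60:.1f} min")
```

Under those assumptions the same file takes roughly ten times longer from WCG, which is simply the ratio of the two rates.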

rilian
Joined: 31 Aug 09
Posts: 11
Ukraine
Message 114814 - Posted: 6 Nov 2024, 15:15:50 UTC - in response to Message 114800.  

In reply to Dr Who Fan's message of 5 Nov 2024:
WCG servers still overwhelmed. Have several Android devices with MCM tasks stuck in download back off.

Seems like downloads are working much better than on the first day.

I already have Pending Validation tasks on 3 different machines (macOS, Linux, Windows). No "Valid" yet.
I crunch for Ukraine

Dr Who Fan
Joined: 10 May 07
Posts: 1452
United States
Message 114819 - Posted: 7 Nov 2024, 1:54:51 UTC

WCG / Krembil have posted an "explanation" of why there have been download problems lately:
Regarding ARP1 and MCM1 download issues since ARP1's launch on Monday Nov 4th, 2024
We have been working with hosting at SHARCNET today to identify potential bottlenecks and solutions to the issue of stalled downloads in the BOINC client for ARP1 and MCM1 workunit inputs. A failing drive on one of the download servers appears to have been contributory, and SHARCNET have migrated this VM to a healthy host. However, there are additional measures we have taken and plan to take.

We have decreased the app weight of ARP1 relative to MCM1 in the feeder, and increased the number of automatic retries per HTTP connection to backend download servers in the load balancer config (HAProxy). We have also decreased the number of concurrent connections allowed per IP recorded in the stick-table for the download server group only in HAProxy. We will decrease the upper limit on workunits to produce and index in BOINC if necessary in the coming days, currently set at 10,000.

Obvious high leverage solutions to the problem as suggested on the forums are to scale out the download server group and reduce the number of downloads for ARP1 workunits to a single file. Also, we have taken steps and will continue to take steps to aggressively pursue complete file transfers after the first request from the BOINC client. Manual intervention from the user clicking "Retry Now" and running "auto-clickers" should be essentially useless by design and provide negligible benefit - that is our goal, we apologize that it is not already met.

In general, these HTTP errors are due to unavailable/busy backend servers that HAProxy cannot establish a connection with - thus a 503 service unavailable is returned. With the help of SHARCNET today, we have the hardware to scale downloads both out and up, and we are provisioning these additional servers now.

HAProxy will be upgraded to a more recent version, we specifically look forward to the potential impact of the retry-on 503 directive and option redispatch directive. Until we can handle transient HTTP errors on our end, we will also look to adjust the project backoff cadence for ARP1 and MCM1 to be less conservative.

With regard to deadlines, though it may not be reflected in your BOINC client and we apologize to users whose deadlines we did not extend in time, we have extended deadlines by 5 days for every single ARP1 workunit in flight this week, on two occasions. Once yesterday Nov 4th, and once today Nov 6th, for all workunits with server_state=IN_PROGRESS in the BOINC result table (https://github.com/BOINC/boinc/wiki/BackendState). The earliest deadline for any ARP1 workunit in the result table of the BOINC db at the moment is Nov 8th, and we will extend proximate deadlines again tomorrow Nov 7th if necessary.

We also agree with feedback on the forums throughout this debacle that the deadlines are too short in general, and we will be extending the deadlines for ARP1 going forward once we get through this. We are currently thinking ~24h additional deadline would work out okay, and we can adjust from there. Please provide feedback if you believe this is the wrong direction or the wrong duration, we appreciate all volunteer feedback on the forums around the issues since launch even those who are rightly upset by our insufficiency. Thank you.
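
For readers unfamiliar with the retry and redispatch behaviour the announcement refers to, here is a toy Python model of the idea: when one backend answers 503, the request is retried on the next backend instead of failing straight away. This is only an illustration under assumed names (`fetch_with_redispatch`, simple round-robin order); it is not HAProxy's implementation or WCG's configuration.

```python
import itertools


def fetch_with_redispatch(backends, path, retries=3):
    """Try a request on successive backends until one does not return 503.

    backends: list of callables that take a path and return an HTTP status.
    Makes at most retries + 1 attempts and returns (status, attempts),
    where attempts is a list of (backend_index, status) pairs.
    """
    attempts = []
    rotation = itertools.cycle(range(len(backends)))
    for _ in range(retries + 1):
        idx = next(rotation)
        status = backends[idx](path)
        attempts.append((idx, status))
        if status != 503:
            # A backend accepted the request; stop retrying.
            return status, attempts
    # Every attempt hit a busy backend, so the client sees the 503.
    return 503, attempts
```

With two busy backends and one healthy one, the third attempt succeeds and the client never sees the 503s. If every backend is busy, the 503 still reaches the client once the retries run out, which is where the client-side backoff comes in.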

Dr Who Fan
Joined: 10 May 07
Posts: 1452
United States
Message 114820 - Posted: 7 Nov 2024, 22:25:41 UTC
Last modified: 7 Nov 2024, 23:08:50 UTC

Well so much for yesterday's "fixing broken things." Downloads are still borked at WCG.

I currently have 2 MCM Android tasks & 1 ARP task that have been stuck in download timeouts for several hours now.

Edit: After several retries on each individual file I finally got them all downloaded.

Grumpy Swede
Joined: 30 Mar 20
Posts: 425
Sweden
Message 114824 - Posted: 9 Nov 2024, 19:46:45 UTC
Last modified: 9 Nov 2024, 20:24:42 UTC

There was a short "System Error" event on the WCG website a few minutes ago. It came back pretty fast, though.
It's clear by now, though, that WCG doesn't have the required infrastructure to run ARP. Downloading and uploading of even the small MCM files still suffers from lots of HTTP errors.

Edit, added: I'm taking a break from WCG, because this is not very fun at the moment. I'll do some research about the new project dodo@home, and maybe I'll join it.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2721
United Kingdom
Message 114829 - Posted: 10 Nov 2024, 8:29:18 UTC

A few days ago I put my phone on to WCG. I managed to get MCM tasks, but the website didn't want to let me see it to assign it a different profile. In retrospect, I should have just changed the profile of my computer, which I only want to download ARP tasks. The option to change the profile of the phone only seemed to be there after I had returned some results. Odd.

With respect to downloads, they are happening fast enough to fill my empty cores as they are vacated by CPDN work but were I only running these tasks it would be a pain in the proverbial.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2721
United Kingdom
Message 114868 - Posted: 17 Nov 2024, 18:08:24 UTC

Uploads are borked again. Typical speed is 7KB/s. Even crunching only one task, I was producing data faster than it uploaded. Now, with CPDN work running out and more cores available, I have too many uploads pending to get any more work until some clear.

#hurryupandwait

Dr Who Fan
Joined: 10 May 07
Posts: 1452
United States
Message 114869 - Posted: 17 Nov 2024, 18:31:12 UTC - in response to Message 114868.  

Uploads are borked again. Typical speed is 7KB/s. Even crunching only one task, I was producing data faster than it uploaded. Now, with CPDN work running out and more cores available, I have too many uploads pending to get any more work until some clear.

#hurryupandwait

The upload/download problems have not gone away at WCG and won't get any better until one of several things happens:
increased up/down bandwidth to the affected servers; a limit on the number of simultaneous connections to the servers; restrictions on the number of ARP tasks being sent out; people quitting aborting tasks; people no longer doing project resets to try to "fix" what they think is broken on their PCs.

Bryn Mawr
Help desk expert
Joined: 31 Dec 18
Posts: 304
United Kingdom
Message 114870 - Posted: 18 Nov 2024, 1:24:30 UTC - in response to Message 114868.  

In reply to Dave's message of 17 Nov 2024:
Uploads are borked again. Typical speed is 7KB/s. Even crunching only one task, I was producing data faster than it uploaded. Now, with CPDN work running out and more cores available, I have too many uploads pending to get any more work until some clear.

#hurryupandwait


Odd, my uploads run very quickly to 100% but then hang for a while before failing. They’re small files admittedly, but the actual data load appears to be well under 1 second.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2721
United Kingdom
Message 114871 - Posted: 18 Nov 2024, 5:34:01 UTC - in response to Message 114870.  

Odd, my uploads run very quickly to 100% but then hang for a while before failing. They’re small files admittedly, but the actual data load appears to be well under 1 second.
I am only running ARP tasks on my computer. MCM on phone but haven't looked at what is happening there.

Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2721
United Kingdom
Message 114883 - Posted: 19 Nov 2024, 21:59:48 UTC

Don't know how long it will last but my uploads are now going through at the maximum rate of my bored band. (100KB/s between 2 uploads.) Did someone kick something or did the wind change direction?

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.