World Community Grid gets stuck

Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104082 - Posted: 24 Apr 2021, 9:06:12 UTC
Last modified: 24 Apr 2021, 9:11:42 UTC

WCG runs for a while and gets stuck. The task shown below has been stuck at 01:48:50, occasionally showing 01:48:49, then going back to 01:48:50.

OPNG_0004819_00109_0 -- Microsoft Windows 7 Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) 728 User Aborted 4/19/21 22:45:47 4/23/21 04:55:15 0.03 72.8 / 0.0

Yesterday I aborted a running task because the time remaining was increasing instead of decreasing, and it was up to 3 days, 17+ hours with no progress being made.
That one was:
OPNG_0004819_00109_0 Steve-PC User Aborted 4/19/21 22:45:47 4/23/21 04:55:15 0.03 / 78.01 72.8 /

Today this one timed out.
OPNG_0003451_00211_3 -- Microsoft Windows 7 Home Premium x64 Edition, Service Pack 1, (06.01.7601.00) 728 Too Late 4/24/21 06:26:56 4/24/21 07:54:55 0.06 0.1 / 0.0

Stuff like this has happened several times in the past two weeks. What is going on here?

Somebody said that perhaps my GPU was not compatible with these tasks. Why would BOINC repeatedly send me tasks my computer can't process?

It's wasting my computer time and interfering with my other projects and is pissing me off. The computer could be doing some good research crunching, but is not. My stats are way down.

Any help would be appreciated.

SGaber
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4498
United Kingdom
Message 104083 - Posted: 24 Apr 2021, 9:55:10 UTC - in response to Message 104082.  

Sadly, this is a known problem which the WCG team are aware of, but haven't yet addressed.

I'm guessing that you're running an Intel GPU (iGPU) under Windows. What follows only applies to that combination: if your situation is different, please stop reading here and post back with details of your GPU.

Not every iGPU is affected, which is why they haven't turned off the supply entirely. But for those affected - it's mainly the i5 range of processors and below - the iGPU is simply too slow for the WCG application, and triggers a safeguard/watchdog in Windows. The app continues to run, but isn't actually using the GPU any longer.

There are several possible courses of action, but most of them would need to be carried out by the WCG project staff working with the Scripps Institute science team. I think they're all too busy to work on it yet.

It is possible to disable the Windows watchdog, but it's dangerous: depending on what else you use the computer for, you could encounter other problems far worse than this one. So I'm not going to post details in public: if anyone else understands Windows internals enough to deduce what I'm alluding to, feel free to use your own judgement as to whether it's safe in your environment.

Otherwise, the Scripps programmers are going to have to redesign their application pretty substantially. I think it's probably best for you to withdraw your computer from the iGPU part of the project for the time being, and watch out for any new version announcements on the WCG project forums.
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104092 - Posted: 24 Apr 2021, 21:06:07 UTC - in response to Message 104083.  

Richard Haselgrove:
Thank you for that response.

However, the computer does not have an Intel GPU. It has a two-core AMD A-6 6400 APU with Radeon(TM) HD Graphics and a Gigabyte motherboard, 8 gigs of RAM and a 1TB hard drive.

What can I do to get WCG admins to fix this glitch? I can't be the only one to suffer these setbacks.

But thanks for your help with this issue.

Cheers,
Steven Gaber
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104094 - Posted: 24 Apr 2021, 21:19:06 UTC - in response to Message 104092.  

I submitted a problem query via the WCG "Contact Us" link on the main page. We'll see if they care enough to do something about it. I guess it depends on how prevalent the issue is within the wider community of users.

Steven Gaber
Dr Who Fan
Joined: 10 May 07
Posts: 690
United States
Message 104095 - Posted: 24 Apr 2021, 21:53:48 UTC - in response to Message 104094.  

Good luck getting an answer. Anything project-related should be posted on the project's web forums. So, for WCG, YOU NEED TO POST HERE:


World Community Grid Forums
Category: Support
Forum: GPU Support Forum
Thread: GPU Work Units - Post Your Tech Support Questions Here
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4498
United Kingdom
Message 104100 - Posted: 25 Apr 2021, 9:22:04 UTC - in response to Message 104092.  
Last modified: 25 Apr 2021, 9:22:36 UTC

From your original description of the problem, it does sound as if the AMD APU is triggering the same Windows protection mechanism as the Intel iGPU. My own iGPUs have performed perfectly since I traced the cause of the problem: but unfortunately, I don't have any APUs here, so - although it seems likely - I can't confirm whether the same tweak will work on the equivalent AMD platform.

If you - or anyone else with a similar APU, and experiencing the same problem - would be prepared to test my suspicion, I could send you the details by PM or email (initial dot surname at btinternet dot com - be careful with spelling). It's a simple one-line tweak, and completely reversible - but it would require a Windows restart.
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104142 - Posted: 27 Apr 2021, 7:00:18 UTC - in response to Message 104095.  

I did get an answer.

Natasha told me to send them the event log. But there were no WCG tasks running. The event log only had Rosetta tasks, which I don't think was what they wanted to see.

Now, there's an even BIGGER problem.

BOINC Manager won't let me connect to a client. It said I should reload BOINC, which I did.

Now it shows no projects or tasks and won't let me do anything.

This is really frustrating.
SGaber
Les Bayliss
Help desk expert

Send message
Joined: 25 Nov 05
Posts: 1486
Australia
Message 104144 - Posted: 27 Apr 2021, 7:09:58 UTC - in response to Message 104142.  

The Manager is only a GUI, which gets its info from the BOINC client.
So, is the client running?
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104365 - Posted: 12 May 2021, 8:02:43 UTC - in response to Message 104100.  
Last modified: 12 May 2021, 8:07:14 UTC

Richard Haselgrove:

Thanks for your help with my WCG problem and for offering to send me a suggested fix. However, I have no experience with the inner workings of Windows or any other software, and my attempt to resolve the issue would probably make things worse.

I did put this computer together from a Tiger Direct bare-bones kit. But that was simple assembly. It's been running 24/7/365 for the past three years, mostly crunching four (now three) projects. It does little else. I built my first computer, a Zenith PC clone from a Heathkit, in 1986. That was thousands of solder joints. Before I souped it up, it ran DOS at 4.77 MHz. I have patience and can do mechanical assembly. But programming is not among my skill set. I also make a lot of typos.

I was crunching SETI@Home for nearly 20 years, and other projects since it went away, but I am scarcely aware of how BOINC works. I just let the computer do the work. How it does it is largely a mystery to me. You guys make me feel stupid.

So I sent another request to WCG for their assistance. The previous response from WCG was more sympathetic than I expected.

But if they can't help, I'll find another project.

Cheers,
S. Gaber.
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104366 - Posted: 12 May 2021, 8:17:06 UTC - in response to Message 104144.  

Les Bayliss:

Yes, WCG is running. Or was, until I suspended it.

Yesterday it worked on a task that was indicated to finish in 5 hours. Instead, it took more than 24 hours to complete.

This evening it ran another task which was supposed to be completed in a little over 3 hours. But after 3 hours and 50 minutes of running, it was showing 0.500% progress, unchanged for the past 2 1/2 hours. And while it's doing that, the computer won't work on any of my other projects.

So I suspended WCG and let Rosetta run.

This is pissing me off.

S. Gaber
Bryn Mawr
Help desk expert

Send message
Joined: 31 Dec 18
Posts: 154
United Kingdom
Message 104367 - Posted: 12 May 2021, 11:38:34 UTC - in response to Message 104366.  

You could always go back to the position before WCG introduced GPU work units and continue to run their CPU tasks, which is what I had to do.

Just deselect GPUs in the preferences.
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104372 - Posted: 13 May 2021, 5:19:54 UTC - in response to Message 104367.  
Last modified: 13 May 2021, 5:21:06 UTC

Bryn Mawr (a prestigious college near Philadelphia where I'm from):

Thanks for your suggestion. I just did that and later will resume WCG. We'll see if that results in any change. I'll let Asteroids catch up first.

Except now, the computer is running two Asteroids tasks, which it had not done for several months. Interesting.

S. Gaber
Bryn Mawr
Help desk expert

Send message
Joined: 31 Dec 18
Posts: 154
United Kingdom
Message 104380 - Posted: 13 May 2021, 14:27:31 UTC - in response to Message 104372.  

Bryn Mawr (a prestigious college near Philadelphia where I'm from):

Named after a mining village in South Wales where the original settlers (and I) came from.

The fact it's a ladies' college has caused me grief in the past, leading people to assume I'm of the feminine persuasion.
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104382 - Posted: 14 May 2021, 17:13:10 UTC - in response to Message 104380.  

Well, these days there is some controversy about whether Bryn Mawr should admit some men. Maybe you should apply and not tell them, in the interest of equal opportunity. It is an expensive and somewhat elitist school, rated #1 among women's colleges in the USA. Some famous alumnae. There are several areas around the Philadelphia suburbs with Welsh names.
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104383 - Posted: 15 May 2021, 4:45:24 UTC - in response to Message 104380.  

Actually, although the Bryn Mawr undergraduates are all women, the graduate school does admit men.

Back to BOINC: in the Activity menu of the BOINC Manager, I selected Suspend GPU. I also set the four graphics card options on the Advanced options tab of the WCG device profile page as below:

Graphics Card Usage
Do work on my graphics card while computer is in use? No
Use my AMD/ATI graphics card if possible: No
Use my Intel graphics card if possible: No
Use my NVIDIA graphics card if possible: No

And saved that device profile. I searched the entire BOINC Manager and WCG site for anything else mentioning GPUs that I could deselect. But those efforts produced no change in processing of the latest WCG tasks (25-hour completion times, preventing the computer from working on other projects, etc.). Today, the computer ran a Rosetta task and an Asteroids task simultaneously. Now it's running two Asteroids tasks. Those things haven't happened for a long time.

I don't know where to find an earlier version of WCG that doesn't feature GPUs.

So now I have suspended the WCG project until I can find a solution.

I appreciate all the suggestions offered on this message board.

Thank you.
Cheers.
S. Gaber
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 14626
Netherlands
Message 104391 - Posted: 16 May 2021, 14:52:47 UTC - in response to Message 104383.  

Back to BOINC, In the Activities tab of the BOINC Manager I selected Suspend GPU.
That only suspends work in progress; it doesn't stop you from getting work for the GPU.

I did selected the four boxes in the WCG page Advanced

And saved that device profile.
Ah, but did you also tell your device to use that device profile? In https://www.worldcommunitygrid.org/ms/device/viewProfiles.do you can see you have 4 different profiles. If you saved the changes in the default profile, but chose to use the Home profile here, it won't use the changes you saved.
At https://www.worldcommunitygrid.org/ms/device/viewDevices.do, click on your device's name and make sure the profile selected there matches the profile you made the changes in.

And crucially, what is the status of "If there is no work available for the project(s) I have selected above, please send me work from another project." ?? If it's checked, this will probably send you GPU work when you don't want it.
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104412 - Posted: 19 May 2021, 5:20:18 UTC - in response to Message 104391.  

I tried to do all the things you suggested. Then I resumed (un-suspended) WCG.

Now, WCG isn't sending me anything at all, despite several attempts to update.

This is beginning to be more trouble than anything.

Can you suggest any other projects that feature consistency, stability, doing important science and will actually work on my computer?? I'm already running Rosetta and Asteroids.

S. Gaber
Steven Gaber

Joined: 28 Jun 20
Posts: 31
United States
Message 104608 - Posted: 21 Jun 2021, 5:40:09 UTC
Last modified: 21 Jun 2021, 5:40:53 UTC

Somehow, most of the problems with extra-long completion times and logging me off have resolved themselves, whether through divine intervention, the ministrations of support entities, or maybe even something I did without knowing it.

Now all four of my projects are running: WCG, Asteroids, Milky Way and Rosetta.

Most of the problem was with World Community Grid, or so I thought. Lately I've been getting mostly normal W.U.s that take 2-3 hours to complete. Now and then I get some that show 2% completion after 5 hours of running. I abort those.

And WCG may have been wrongly accused of bumping me off BOINC, deleting all my projects and tasks, and preventing the Manager from connecting.

Turns out, I think it was Rosetta doing all of that when I update the project or report finished tasks. When it does happen, it sometimes resolves itself, with my projects and tasks blinking on and off the monitor before stabilizing. When it does not stabilize, I restart the computer, and that fixes it till the next time.

Sorry, WCG, if I bore false witness. It was not intentional. So all is mostly right with the world.

Unless you listen to the news or pay attention to social media.

S. Gaber

Copyright © 2021 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.