BOINC 7.16.11 / Intel GPU Jobs - stuck or running indefinitely

Bogdan

Joined: 26 May 21
Posts: 2
Message 104476 - Posted: 26 May 2021, 7:55:10 UTC

Hi BOINC Team and Community,

I've noticed that in the last few days my laptop started receiving Intel GPU Jobs [OpenPandemics COVID 19 GPU 7.28 (opencl_intel_gpu_102)].

Their estimated runtime is around 15 minutes, but some of them have been running for more than 90 minutes, stuck at the same percentage (which varies across Jobs). The way to get them running again is to close BOINC, or to suspend and re-enable GPU Jobs.
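A rough sketch of how such a stall could be spotted automatically, rather than by watching the Manager. This is not part of the original report: the `Sample` record, the `find_stalled` function, and the 30-minute threshold are all hypothetical. It assumes you collect (timestamp, fraction done) readings per task yourself, for example from the per-task progress figure that `boinccmd --get_tasks` reports, and it only does the comparison.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    """One progress reading for a task (hypothetical record type)."""
    timestamp: float      # seconds, any monotonic reference
    fraction_done: float  # 0.0 .. 1.0


def find_stalled(history: Dict[str, List[Sample]],
                 stall_seconds: float = 1800.0,
                 epsilon: float = 1e-4) -> List[str]:
    """Return names of unfinished tasks whose progress has not moved
    for at least `stall_seconds` (threshold is an assumption)."""
    stalled = []
    for name, samples in history.items():
        if len(samples) < 2:
            continue  # not enough readings to judge
        ordered = sorted(samples, key=lambda s: s.timestamp)
        latest = ordered[-1]
        # Walk backwards to the earliest reading with the same progress
        # as the latest one; that is when the task stopped advancing.
        since = latest.timestamp
        for s in reversed(ordered):
            if abs(s.fraction_done - latest.fraction_done) > epsilon:
                break
            since = s.timestamp
        if latest.fraction_done < 1.0 and latest.timestamp - since >= stall_seconds:
            stalled.append(name)
    return stalled
```

A task that reports the same fraction for longer than the threshold is flagged; one that is merely slow but still advancing is not. What to do with a flagged task (suspend and resume GPU work, as described above) is left to the person running it.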

Could you please advise whether this is a known issue?

If not, please let me know what logs I should provide, in order to troubleshoot this.

Config info below:
5/25/2021 4:51:44 PM | | Starting BOINC client version 7.16.11 for windows_x86_64
5/25/2021 4:51:44 PM | | log flags: file_xfer, sched_ops, task
5/25/2021 4:51:44 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2s zlib/1.2.8
5/25/2021 4:51:44 PM | | Data directory: C:\ProgramData\BOINC

5/25/2021 4:51:45 PM | | CUDA: NVIDIA GPU 0: NVIDIA GeForce GTX 1050 Ti (driver version 466.47, CUDA version 11.3, compute capability 6.1, 4096MB, 3379MB available, 2488 GFLOPS peak)
5/25/2021 4:51:45 PM | | OpenCL: NVIDIA GPU 0: NVIDIA GeForce GTX 1050 Ti (driver version 466.47, device version OpenCL 3.0 CUDA, 4096MB, 3379MB available, 2488 GFLOPS peak)
5/25/2021 4:51:45 PM | | OpenCL: Intel GPU 0: Intel(R) HD Graphics 630 (driver version 27.20.100.8681, device version OpenCL 2.1 NEO, 9787MB, 9787MB available, 211 GFLOPS peak)
5/25/2021 4:51:45 PM | | Windows processor group 0: 8 processors

5/25/2021 4:51:45 PM | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz [Family 6 Model 158 Stepping 9]
5/25/2021 4:51:45 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 tm2 pbe fsgsbase bmi1 smep bmi2
5/25/2021 4:51:45 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.21387.00)
5/25/2021 4:51:45 PM | | Memory: 23.89 GB physical, 27.86 GB virtual



Thank you,

Bogdan
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4793
United Kingdom
Message 104477 - Posted: 26 May 2021, 8:37:53 UTC - in response to Message 104476.  

It's a known issue. I ran into it during the Beta phase, observed something which gave me a clue what was going wrong, and then tested/confirmed my suspicions. I've reported the problem and the likely cause to the project staff.

It's difficult. I've deliberately written that last sentence very vaguely, because the solution which I've adopted (and which is working fine on my personal machines to this day) involves disabling a safety feature which might be needed in other circumstances: the manufacturer's advice is very strongly that this should not be done except by software developers for testing purposes. I don't come into that category, but I can accept responsibility for my own machines only. I'm not writing details of the cause and solution in public, because I can't take responsibility for rendering another volunteer's machine inoperable - that's the worst-case scenario.

In the end, WCG will have to ask Scripps to re-write part of the application code so it can run, in particular, on the slower variants of the Intel GPU range. Until then, the only safe advice is to avoid using those iGPUs for the WCG covid-19 application.
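For anyone who wants to follow the "avoid the iGPU" advice without touching the other GPU, one possible route is the `<exclude_gpu>` option in the client's `cc_config.xml`, which the BOINC client configuration documentation describes with `<url>` and `<type>` children (`intel_gpu` being the type name for Intel GPUs). The sketch below only builds that fragment as text. The function name is hypothetical, and the project URL is a placeholder you must replace with the URL exactly as BOINC shows it for the attached project.

```python
import xml.etree.ElementTree as ET


def build_exclude_intel_gpu(project_url: str) -> str:
    """Build a cc_config.xml document that excludes Intel GPUs for one
    project. `project_url` must be supplied by the caller; nothing here
    knows the real project URL."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "type").text = "intel_gpu"
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

The result would go into `cc_config.xml` in the BOINC data directory. If that file already exists, merge the `<exclude_gpu>` element into its `<options>` section by hand rather than overwriting it, and restart the client to be sure the exclusion takes effect.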

Having said that, if you want to PM me and confirm that you understand the risks, I can point you to the official documentation with the warnings and the (simple) change which might be enough to get you running again.
Bogdan

Joined: 26 May 21
Posts: 2
Message 104478 - Posted: 26 May 2021, 9:30:20 UTC - in response to Message 104477.  

Thank you for the quick and detailed reply - I've just reached out to you via PM.

Best,

Bogdan
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 4793
United Kingdom
Message 104480 - Posted: 26 May 2021, 13:42:30 UTC - in response to Message 104478.  

Received and replied. Sorry, I got distracted by the real world.

Copyright © 2022 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.