Message boards : GPUs : Can't get the AMD Radeon RX 7900 XTX to crunch for BOINC projects on Ubuntu 22.04 LTS
Author | Message |
---|---|
Joined: 7 Feb 23 Posts: 10 |
Dear all, I have a new computer dedicated to BOINC, featuring an AMD Ryzen 9 7950X CPU and an AMD Radeon RX 7900 XTX GPU. I'm getting good results with projects like Universe@Home and Asteroids@Home on the CPU front, but so far I've been unable to get the GPU to work with BOINC. The computer is running Ubuntu 22.04 LTS x86_64, and I followed the official procedure to install the AMD drivers: https://amdgpu-install.readthedocs.io/en/latest/ I tried two GPU projects, Einstein@Home and Milkyway@Home, and both failed in similar ways: Error at Einstein@Home: https://einsteinathome.org/fr/task/1423890400 Error at Milkyway@Home: https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=685395746 I believe the drivers are correctly installed; at least "clinfo" is working just fine. I know BOINC complains about a lack of memory in the logs, but this is nonsense IMHO: the GPU has 24 GB of RAM, and the host itself has over 10 GB of RAM available, out of 16 GB. What am I doing wrong? Should I report a bug over at AMD, as per https://amdgpu-install.readthedocs.io/en/latest/install-bugrep.html? Thank you all in advance for your help :) Best regards, Samuel |
Joined: 17 Nov 16 Posts: 888 |
I find the use of BOINC 7.18.1 the most suspect issue. This release was NEVER intended for x86 architectures. The 7.18.1 release was intended ONLY for Android. The issue is that the distro maintainers grabbed the wrong branch to include in their distros, and have never fixed it even though they have been informed of the problem by multiple sources. I highly recommend changing to the latest BOINC release for Linux, 7.20.5, from the Costamagna PPA: https://launchpad.net/~costamagnagianfranco/+archive/ubuntu/boinc |
Joined: 7 Feb 23 Posts: 10 |
Dear Keith, I upgraded BOINC to 7.20.5 as you told me to and tried to run some Einstein@Home jobs on the GPU. No luck so far: https://einsteinathome.org/fr/task/1428468904 What am I doing wrong? Best regards, Samuel |
Joined: 29 Aug 05 Posts: 15552 |
Before the data is lost (tasks don't stay in the database forever; they get deleted once they have run to completion on two different systems, plus 24 hours), here's the error output: <core_client_version>7.20.5</core_client_version> With this being the main culprit, I think: Couldn't create OpenCL command queue (error: -6)! OpenCL shutdown complete! initialize_ocl returned error [2013] OCL context null OCL queue null Error generating generic FFT context object [5] 16:58:06 (2027): [CRITICAL]: ERROR: MAIN() returned with error '5' |
Joined: 7 Feb 23 Posts: 10 |
Thank you @Jord for copying the error output to this thread :) There are two devices showing in "clinfo": I believe they are the Radeon RX 7900 XTX GPU and the embedded GPU part on board the Ryzen 9 7950X, right? https://www.amd.com/en/products/graphics/amd-radeon-rx-7900xtx https://www.amd.com/en/product/12151 One thing I find odd in the "clinfo" output: "max compute units" is half what it should be, i.e. 48 vs. 96 CU on the GPU, and 1 vs. 2 on the CPU's integrated GPU: root@kymera:~# clinfo Number of platforms: 1 Platform Profile: FULL_PROFILE Platform Version: OpenCL 2.1 AMD-APP (3513.0) Platform Name: AMD Accelerated Parallel Processing Platform Vendor: Advanced Micro Devices, Inc. Platform Extensions: cl_khr_icd cl_amd_event_callback Platform Name: AMD Accelerated Parallel Processing Number of devices: 2 Device Type: CL_DEVICE_TYPE_GPU Vendor ID: 1002h Board name: Radeon RX 7900 XTX Device Topology: PCI[ B#3, D#0, F#0 ] Max compute units: 48 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 256 Preferred vector width char: 4 Preferred vector width short: 2 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 4 Native vector width short: 2 Native vector width int: 1 Native vector width long: 1 Native vector width float: 1 Native vector width double: 1 Max clock frequency: 3220Mhz Address bits: 64 Max memory allocation: 21890072576 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 16384 Max image 3D height: 16384 Max image 3D depth: 8192 Max samplers within kernel: 29772 Max size of kernel argument: 1024 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes 
Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 64 Cache size: 32768 Global memory size: 25753026560 Constant buffer size: 21890072576 Max number of constant args: 8 Local memory type: Scratchpad Local memory size: 65536 Max pipe arguments: 16 Max pipe active reservations: 16 Max pipe packet size: 415236096 Max global variable size: 21890072576 Max global variable preferred total size: 25753026560 Max read/write image args: 64 Max on device events: 1024 Queue on device max size: 8388608 Max on device queues: 1 Queue on device preferred size: 262144 SVM capabilities: Coarse grain buffer: Yes Fine grain buffer: Yes Fine grain system: No Atomics: No Preferred platform atomic alignment: 0 Preferred global atomic alignment: 0 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 32 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue on Host properties: Out-of-Order: No Profiling : Yes Queue on Device properties: Out-of-Order: Yes Profiling : Yes Platform ID: 0x7f46a119ceb0 Name: gfx1100 Vendor: Advanced Micro Devices, Inc. 
Device OpenCL C version: OpenCL C 2.0 Driver version: 3513.0 (HSA1.1,LC) Profile: FULL_PROFILE Version: OpenCL 2.0 Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program Device Type: CL_DEVICE_TYPE_GPU Vendor ID: 1002h Board name: Device Topology: PCI[ B#17, D#0, F#0 ] Max compute units: 1 Max work items dimensions: 3 Max work items[0]: 1024 Max work items[1]: 1024 Max work items[2]: 1024 Max work group size: 256 Preferred vector width char: 4 Preferred vector width short: 2 Preferred vector width int: 1 Preferred vector width long: 1 Preferred vector width float: 1 Preferred vector width double: 1 Native vector width char: 4 Native vector width short: 2 Native vector width int: 1 Native vector width long: 1 Native vector width float: 1 Native vector width double: 1 Max clock frequency: 2200Mhz Address bits: 64 Max memory allocation: 456340272 Image support: Yes Max number of images read arguments: 128 Max number of images write arguments: 8 Max image 2D width: 16384 Max image 2D height: 16384 Max image 3D width: 16384 Max image 3D height: 16384 Max image 3D depth: 8192 Max samplers within kernel: 5710 Max size of kernel argument: 1024 Alignment (bits) of base address: 1024 Minimum alignment (bytes) for any datatype: 128 Single precision floating point capability Denorms: Yes Quiet NaNs: Yes Round to nearest even: Yes Round to zero: Yes Round to +ve and infinity: Yes IEEE754-2008 fused multiply-add: Yes Cache type: Read/Write Cache line size: 64 Cache size: 16384 Global memory size: 536870912 Constant buffer size: 456340272 Max number of constant 
args: 8 Local memory type: Scratchpad Local memory size: 65536 Max pipe arguments: 16 Max pipe active reservations: 16 Max pipe packet size: 456340272 Max global variable size: 456340272 Max global variable preferred total size: 536870912 Max read/write image args: 64 Max on device events: 1024 Queue on device max size: 8388608 Max on device queues: 1 Queue on device preferred size: 262144 SVM capabilities: Coarse grain buffer: Yes Fine grain buffer: Yes Fine grain system: No Atomics: No Preferred platform atomic alignment: 0 Preferred global atomic alignment: 0 Preferred local atomic alignment: 0 Kernel Preferred work group size multiple: 32 Error correction support: 0 Unified memory for Host and Device: 0 Profiling timer resolution: 1 Device endianess: Little Available: Yes Compiler available: Yes Execution capabilities: Execute OpenCL kernels: Yes Execute native function: No Queue on Host properties: Out-of-Order: No Profiling : Yes Queue on Device properties: Out-of-Order: Yes Profiling : Yes Platform ID: 0x7f46a119ceb0 Name: gfx1036 Vendor: Advanced Micro Devices, Inc. Device OpenCL C version: OpenCL C 2.0 Driver version: 3513.0 (HSA1.1,LC) Profile: FULL_PROFILE Version: OpenCL 2.0 Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program Best regards, Samuel |
Joined: 29 Aug 05 Posts: 15552 |
Samuel wrote: "One thing I find odd in the "clinfo" output: "max compute units" is half what it should be, i.e. 48 vs. 96 CU on the GPU, and 1 vs. 2 on the CPU." For the GPU it says it has 2 devices, so if each device takes 48 compute units, that's also 96. Anyway, your OpenCL is ROCr, and I am not sure that it is supported by the projects. One OpenCL (ROCr) isn't the same as the next OpenCL (ROCm), especially on Linux. ROCm/ROCr isn't for typical users; it seems to have been developed for specific industrial uses, it does not support graphical applications (AMD said that's temporary, but that can last for a long time), and it only supports a very small amount of hardware: a tiny selection of PCIe graphics cards and not a single integrated graphics solution from AMD APUs. Currently only three chips are said to be supported by ROCm. (Source.) Hardware support is spotty: Hardware and Software Support. Have you tried these drivers instead? https://www.amd.com/en/support/linux-drivers |
Joined: 7 Feb 23 Posts: 10 |
This is where I downloaded the drivers: https://www.amd.com/en/support/graphics/amd-radeon-rx-7000-series/amd-radeon-rx-7900-series/amd-radeon-rx-7900xtx I followed these instructions: https://amdgpu-install.readthedocs.io/en/latest/ There are two OpenCL implementations available, ROCr and "legacy": https://amdgpu-install.readthedocs.io/en/latest/install-script.html#specifying-an-opencl-implementation Should I give the "legacy" one a try? Samuel |
Joined: 7 Feb 23 Posts: 10 |
I just installed both OpenCL implementations (ROCr and legacy), still no luck: How could I be sure which implementation is used ? <core_client_version>7.20.5</core_client_version> <![CDATA[ <message> process exited with code 69 (0x45, -187)</message> <stderr_txt> 20:14:43 (2080): [normal]: This Einstein@home App was built at: Jan 16 2017 08:09:16 20:14:43 (2080): [normal]: Start of BOINC application '../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati'. 20:14:43 (2080): [debug]: 1e+16 fp, 1e+09 fp/s, 10500000 s, 2916h40m00s00 command line: ../../projects/einstein.phys.uwm.edu/hsgamma_FGRPB1G_1.18_x86_64-pc-linux-gnu__FGRPopencl1K-ati --inputfile ../../projects/einstein.phys.uwm.edu/LATeah4021L00.dat --alpha 0.943218186562 --delta 1.30995332125 --skyRadius 8.726650e-08 --ldiBins 30 --f0start 1180.0 --f0Band 8.0 --firstSkyPoint 0 --numSkyPoints 1 --f1dot -1e-13 --f1dotBand 1e-13 --df1dot 1.413729381e-15 --ephemdir ../../projects/einstein.phys.uwm.edu/JPLEPH --Tcoh 2097152.0 --toplist 10 --cohFollow 10 --numCells 1 --useWeights 1 --Srefinement 1 --CohSkyRef 1 --cohfullskybox 1 --mmfu 0.1 --reftime 56100 --model 0 --f0orbit 0.005 --mismatch 0.1 --demodbinary 1 --BinaryPointFile ../../projects/einstein.phys.uwm.edu/templates_LATeah4021L00_1188_9484839.dat --debug 0 --device 0 -o LATeah4021L00_1188.0_0_0.0_9484839_1_0.out output files: 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out' '../../projects/einstein.phys.uwm.edu/LATeah4021L00_1188.0_0_0.0_9484839_1_0' 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu' '../../projects/einstein.phys.uwm.edu/LATeah4021L00_1188.0_0_0.0_9484839_1_1' 20:14:43 (2080): [debug]: Flags: X64 SSE SSE2 GNUC X86 GNUX86 20:14:43 (2080): [debug]: glibc version/release: 2.35/stable 20:14:43 (2080): [debug]: Set up communication with graphics process. boinc_get_opencl_ids returned [0x2804a50 , 0x7f0017225eb0] Using OpenCL platform provided by: Advanced Micro Devices, Inc. 
Using OpenCL device "gfx1100" by: Advanced Micro Devices, Inc. Max allocation limit: 21890072576 Global mem size: 25753026560 Couldn't create OpenCL command queue (error: -6)! OpenCL shutdown complete! initialize_ocl returned error [2013] OCL context null OCL queue null Error generating generic FFT context object [5] 20:14:43 (2080): [CRITICAL]: ERROR: MAIN() returned with error '5' FPU status flags: mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory mv: cannot stat 'LATeah4021L00_1188.0_0_0.0_9484839_1_0.out.cohfu': No such file or directory 20:14:55 (2080): [normal]: done. calling boinc_finish(69). 20:14:55 (2080): called boinc_finish </stderr_txt> ]]> |
Joined: 10 Sep 05 Posts: 726 |
This is an issue with the E@h and MW@h apps. Please report the problem on their message boards (hopefully the developers will see it) -- D |
Joined: 7 Feb 23 Posts: 10 |
Dear David, I followed your advice and posted a message at the Einstein@Home boards: https://einsteinathome.org/fr/content/cant-get-amd-radeon-rx-7900-xtx-crunch-einsteinhome-ubuntu-2204-lts Best regards, Samuel |
Joined: 7 Feb 23 Posts: 10 |
Dear all, Problem solved :) https://einsteinathome.org/fr/content/cant-get-amd-radeon-rx-7900-xtx-crunch-einsteinhome-ubuntu-2204-lts I had to install the full ROCm package (https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html) and edit /usr/lib/systemd/system/boinc-client.service, commenting out the "ProtectSystem=strict" line and adding "ProtectSystem=off" in its place. Milkyway@Home GPU tasks still refuse to run, but that appears to be a well-known issue: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4885&postid=75074#75074 Best regards, Samuel |
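
For anyone applying the same boinc-client.service change, here is a minimal Python sketch for checking which ProtectSystem value a unit file ends up with after editing. The function name effective_protect_system and the sample unit text are hypothetical and not taken from the thread. The sketch assumes that the last uncommented assignment in the [Service] section is the one that takes effect, and it does not look at drop-in overrides.

```python
from typing import Optional


def effective_protect_system(unit_text: str) -> Optional[str]:
    """Return the last uncommented ProtectSystem= value found in the
    [Service] section of a unit file's text, or None if there is none.

    Assumption: later assignments override earlier ones; drop-in
    override files are not considered here.
    """
    section = None
    value = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        # Track which [Section] we are currently inside.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Service" and "=" in line:
            key, _, val = line.partition("=")
            if key.strip() == "ProtectSystem":
                value = val.strip()
    return value


# Hypothetical excerpt mirroring the edit described in the post above.
SAMPLE_UNIT = """\
[Unit]
Description=hypothetical excerpt

[Service]
#ProtectSystem=strict
ProtectSystem=off
"""

if __name__ == "__main__":
    print(effective_protect_system(SAMPLE_UNIT))  # prints: off
```

To check the installed file instead of the sample, the text could be read with pathlib.Path("/usr/lib/systemd/system/boinc-client.service").read_text(). After a unit file is edited, systemd normally needs "systemctl daemon-reload" and a restart of the service before the change applies. Files under /usr/lib/systemd/system may also be replaced on package upgrades, so the edit may have to be repeated.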
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.