AMD's Radeon Open Compute (ROCm) has a problem with BOINC

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 538
United States
Message 95996 - Posted: 21 Feb 2020, 22:05:32 UTC
Last modified: 21 Feb 2020, 22:06:10 UTC

Been running the tests at Einstein using their beta app that relies on the ROCm driver. BOINC does not see multiple GPUs, only the one in the x16 slot. Rather than duplicate it all here, I am posting the URL of the message over at Einstein. I was actually unaware of this driver until I saw the complaint over there that the app was not working.

https://einsteinathome.org/content/clbuildprogramfailure-02mdf-gw-opencl-ati?page=1#comment-175716
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 366
United States
Message 95998 - Posted: 22 Feb 2020, 2:52:21 UTC - in response to Message 95996.  

It is supposed to be the next big thing for compute on AMD. It is still in its infancy as far as compatibility with most BOINC projects goes.
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 538
United States
Message 96003 - Posted: 22 Feb 2020, 15:56:16 UTC - in response to Message 95998.  
Last modified: 22 Feb 2020, 15:57:25 UTC

Yeah, the next big thing if you buy the best AMD motherboard and GPUs money can buy. I'm not going to repost everything from that thread over at Einstein; a summary as it applies to Linux is below:

ROCm, or rocM, or WTF it is called, requires that you have as many Gen3 PCIe lanes to the GPU as you have GPUs. It is part of all recent amdgpu-pro drivers and just needs to be activated to work.

Once it is installed, if you do not have enough "PCIe atomics" (a nice word for Gen3), the GPUs that lack a Gen3 lane to the CPU disappear for apps like BOINC. They still show up in, for example, "clinfo", "sensors", or any other diagnostics program. So if, like me, you have 5 GPUs but BOINC sees only one, while every diagnostic you run indicates that 4 more are available, you start suspecting that something is wrong with BOINC, not realizing the problem is the AMD driver, which wants to run only on a superhighway and ignores all the little x1 back roads.
akostadinov

Joined: 19 Mar 20
Posts: 1
Bulgaria
Message 96922 - Posted: 19 Mar 2020, 10:19:42 UTC - in response to Message 96003.  

Are you saying that BOINC works for you with a single card and ROCm? That would be very interesting to know.

You shouldn't need atomics for Vega and later cards, only for older ones. I ran a Vega 10 on PCIe 1.1 with ROCm 1.9.
I haven't managed to run Folding@Home, so it would be nice if I could run BOINC.
Gary Roberts

Joined: 7 Sep 05
Posts: 124
Australia
Message 96947 - Posted: 20 Mar 2020, 0:41:57 UTC - in response to Message 95996.  
Last modified: 20 Mar 2020, 0:49:03 UTC

"Been running the tests at Einstein using their beta app that relies on the ROCm driver."
The bit about the app relying on the ROCm driver is just plain wrong. There is no app that needs ROCm. Maybe it would be a good idea to read the whole thread, and the following key points would then emerge.

  • A user who had a properly configured ROCm system posted about the GW app crashing.
  • The precise error message (clearly listed) pointed to a coding style in the Einstein app that wasn't supported under ROCm.
  • When Bernd's attention was eventually attracted, the problem got passed to the app author/developer.
  • A 'fix' was quickly developed and a test app was distributed that solved the problem for ROCm.
  • I ran this app deliberately on non-ROCm systems to make sure there were no regressions. There was no problem.
  • Bernd has since taken the app out of beta. It became the default app. To my knowledge, there continue to be no app related problems.



I suspect any problem you have is down to you trying to use non-ROCm-compliant hardware configurations. Have you really tried reading some documentation? Here are a couple of key points from that link that might be appropriate to your setup. You should read the whole document carefully to really be sure your setup is fully compliant with all the restrictions.

As such, by default ROCm requires that these GPUs be installed in PCIe slots with PCI Express 3.0 or higher capabilities with transfer rates of 8.0 GT/s in either x16 or x8 lanes. The system configuration can have the PCIe slots directly on CPU’s root port or a PCIe switch, but everything between the CPU and the GPU must support atomics.


Note that the physical PCIe slot size does not guarantee support for ROCm. Some motherboards have physical x16 PCIe slots, but the PCIe connector is electrically connected as PCIe Express 2.0 to the southbridge. Since the PCIe slot connector matters to the GPU, care must be taken to not place them in on motherboards configured this way.
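
The link requirement quoted above (8.0 GT/s, x8 or x16) can be checked roughly from Linux sysfs. The following is only a sketch: it assumes the `current_link_speed` and `current_link_width` attributes under `/sys/bus/pci/devices` are readable and contain text such as `8.0 GT/s PCIe` and `16`. The function names are made up for illustration. A passing result does not prove that every hop between the CPU and the GPU supports atomics, and the reported speed may be lower while a card is idle.

```python
import re
from pathlib import Path

MIN_SPEED_GTS = 8.0        # PCIe 3.0 transfer rate quoted in the ROCm documentation
MIN_WIDTH = 8              # x8 or x16 lanes
AMD_VENDOR_ID = "0x1002"   # PCI vendor ID for AMD/ATI, as in the "1002:7300" log line


def parse_speed(text):
    """Return the link speed in GT/s from text such as '8.0 GT/s PCIe', or None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*GT/s", text)
    return float(match.group(1)) if match else None


def link_meets_requirement(speed_text, width_text):
    """True if the link is at least 8.0 GT/s and at least x8 wide."""
    speed = parse_speed(speed_text)
    try:
        width = int(width_text.strip())
    except ValueError:
        return False
    return speed is not None and speed >= MIN_SPEED_GTS and width >= MIN_WIDTH


def scan_amd_gpus(sysfs_root="/sys/bus/pci/devices"):
    """Yield (pci_address, speed_text, width_text, ok) for AMD display devices."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip()
            pci_class = (dev / "class").read_text().strip()
            if vendor != AMD_VENDOR_ID or not pci_class.startswith("0x03"):
                continue  # not an AMD display-class device
            speed = (dev / "current_link_speed").read_text().strip()
            width = (dev / "current_link_width").read_text().strip()
        except OSError:
            continue  # attribute missing or unreadable; skip this device
        yield dev.name, speed, width, link_meets_requirement(speed, width)


if __name__ == "__main__":
    for addr, speed, width, ok in scan_amd_gpus():
        status = "ok" if ok else "below the documented PCIe 3.0 x8 minimum"
        print(f"{addr}: {speed}, x{width}: {status}")
```

On a rig with several cards on x1 risers, each of those cards would be expected to show up as below the minimum, which matches the symptom described earlier in the thread.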


The ROCm kernel driver logs if ROCm capable GPUs are installed on system that does not support PCIe atomics.

Example text from kernel log:

kfd: skipped device 1002:7300, PCI rejects atomics
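
A quick way to look for that message is to scan the kernel log text for it. This is a minimal sketch that assumes the message is worded like the example line above; other kernel or driver versions may phrase it differently. The function name `find_rejected_devices` is hypothetical.

```python
import re

# Pattern based on the example kernel log line quoted above; other kernel
# versions may word the message differently (assumption).
REJECT_RE = re.compile(
    r"skipped device\s+([0-9a-fA-F]{4}):([0-9a-fA-F]{4}),\s*PCI rejects atomics"
)


def find_rejected_devices(log_text):
    """Return (vendor_id, device_id) pairs for GPUs skipped over PCIe atomics."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in REJECT_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = "kfd: skipped device 1002:7300, PCI rejects atomics\n"
    for vendor, device in find_rejected_devices(sample):
        print(f"GPU {vendor}:{device} was skipped: its PCIe path lacks atomics")
```

On a live system the text could come from `dmesg` or `journalctl -k`, which may need elevated privileges. One entry per skipped card would point at the hardware setup rather than at BOINC.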


You actually posted that very message (shown above) over at Einstein. I know nothing about ROCm but the documentation (which I've just now found and read) seems to be saying that your problem is your non-ROCm-compliant hardware setup. In other words, nothing to do with any purported conflict between ROCm and BOINC. You should try to be less misleading in your choice of thread title.
Cheers,
Gary.

Copyright © 2020 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.