PCI express risers to use multiple GPUs on one motherboard - not detecting card?

Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card?
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95322 - Posted: 19 Jan 2020, 2:58:40 UTC - in response to Message 95320.  
Last modified: 19 Jan 2020, 2:59:09 UTC

friend of mine had something similar happen where he works (government/military installation), an employee was caught using their computers for coin mining.

he's in jail now.
ID: 95322
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95327 - Posted: 19 Jan 2020, 16:55:09 UTC - in response to Message 95322.  

friend of mine had something similar happen where he works (government/military installation), an employee was caught using their computers for coin mining.

he's in jail now.


Ouch. Well I was running Boinc, which isn't for my profit, and it wasn't a secret organisation I was borrowing the computers from. Some of the staff actually appreciated what I was doing.
ID: 95327
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95328 - Posted: 19 Jan 2020, 16:57:18 UTC - in response to Message 95175.  
Last modified: 19 Jan 2020, 16:57:51 UTC

Since there aren't too many PCIE 4.0 devices out, I wouldn't know about PCIE 4.0.


Hang on, Wikipedia says PCIE 4.0 came out in 2017, and 5.0 in 2019, and 6.0 is being planned for 2021. So why are most cards still on 3.0?

https://en.wikipedia.org/wiki/PCI_Express#History_and_revisions
ID: 95328
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95329 - Posted: 19 Jan 2020, 17:41:32 UTC - in response to Message 95327.  

friend of mine had something similar happen where he works (government/military installation), an employee was caught using their computers for coin mining.

he's in jail now.


Ouch. Well I was running Boinc, which isn't for my profit, and it wasn't a secret organisation I was borrowing the computers from. Some of the staff actually appreciated what I was doing.

yeah, if you have permission it's no big deal.

about PCIe releases: those dates are when the specs were released, not when mainstream products started using them. PCIe 4.0 motherboards weren't widely available until July 2019 when AMD released their X570 boards, and GPUs weren't available with PCIe 4.0 until AMD released the RX 5700/XT. There may have been some niche/custom/enterprise stuff available before this, but not widely available as consumer products.

as for "why"? lack of a need. GPUs weren't bottlenecking on PCIe 3.0 x16 yet, so there was no rush. I think only just now are the fastest GPUs starting to creep up on that limit, where the extra bandwidth from 4.0 will matter in some applications.

5.0 now is where 4.0 was a few years ago, so it will be several more years before we start to see any 5.0 products.
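For reference, each PCIe generation roughly doubles per-lane throughput. A small Python sketch with approximate one-direction bandwidth figures from the PCIe specs (1.0/2.0 use 8b/10b encoding, 3.0 onward 128b/130b):

```python
# Approximate usable per-lane bandwidth in GB/s, one direction.
per_lane_gb_s = {1: 0.25, 2: 0.5, 3: 0.985, 4: 1.969, 5: 3.938}

def slot_bandwidth(gen, lanes):
    """Approximate one-direction bandwidth of a slot in GB/s."""
    return per_lane_gb_s[gen] * lanes

print(round(slot_bandwidth(3, 16), 1))  # ~15.8 GB/s for a 3.0 x16 slot
print(round(slot_bandwidth(4, 16), 1))  # ~31.5 GB/s for a 4.0 x16 slot
```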
ID: 95329
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95330 - Posted: 19 Jan 2020, 18:15:46 UTC - in response to Message 95329.  

about PCIe releases. those dates are releases of the specs/technology, but not that mainstream products were using them. PCIe 4.0 motherboards weren't largely available until July 2019 when AMD released their X570 boards, and GPUs weren't available with PCIe 4.0 until AMD released the RX5700/XT. There may have been some niche/custom/enterprise stuff available before this, but not widely available as consumer products.

as for "why"? lack of a need. GPUs weren't bottlenecking on PCIe 3.0 x16 yet, so there was no rush. I think only just now, are the fastest GPUs starting to creep up on that limit, where the extra bandwidth from 4.0 will matter in some applications.

5.0 now is where 4.0 was a few years ago. so it will be several more years before we start to see any 5.0 products.


Even the fastest current NVMe drive only needs PCIe 3.0 x4. Well, I guess it's good to make everything as fast as possible. Especially for nuts like us that connect billions of things to the bus :-)

Even just two NVMe drives in a mirror arrangement, plus two GPUs in crossfire for a game, would use 40 lanes of 3.0, which I assume is more than is available. Get everything on 4.0 and they can all share nicely.
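The lane count in that example tallies up like this; a little Python sketch summing standard device widths (two NVMe drives at x4, two GPUs at x16 - how many lanes a given consumer CPU actually provides varies, often 16-24):

```python
def lane_budget(devices):
    """Sum the PCIe lanes requested by (name, lanes) pairs."""
    return sum(lanes for _, lanes in devices)

devices = [("NVMe 1", 4), ("NVMe 2", 4), ("GPU 1", 16), ("GPU 2", 16)]
print(lane_budget(devices))  # 40 lanes - more than most consumer CPUs provide
```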
ID: 95330
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95345 - Posted: 20 Jan 2020, 15:55:02 UTC
Last modified: 20 Jan 2020, 15:56:28 UTC

circling back to the original discussion. I signed up for Einstein at home and did some PCIe testing.

I had always heard from other users that Einstein was PCIe dependent, to the point that anything less than x16 links caused tasks to run slower. But actual testing on numerous different cards and PCIe lane widths shows that's not true: Einstein is barely PCIe dependent at all, on both the Gamma-Ray and Gravity Wave tasks. I saw about 1% PCIe bus use on both types of tasks on just a PCIe 3.0 x1 link, so it's no surprise that you haven't seen a slowdown. In light of this, it looks like SETI actually uses more PCIe bandwidth (at least on the optimized CUDA special app). Maybe in the past with old tasks Einstein used to have more reliance on PCIe, but that does not appear to be the case anymore.

As far as how many cards you can run, you will have to test and find the limiting factor of how many GPUs can be attached before the system will no longer boot. my guess is it will be somewhere between 3-7 GPUs. no way to tell without testing.

The next limit will be CPU resources to support the GPU tasks. you only have a 4c/4t CPU, and a rather old/weak one at that compared to modern chips. luckily Gamma-ray tasks don't seem to mind running on a weak CPU, but you may have bad results with the Gravity wave tasks.

you'll have to test the impact to Milkyway though. I'm not going to attach to that one with the machines I have now since it relies so heavily on DP performance, and recent Nvidia cards like mine have abysmal DP performance for the cost/power used. I might build a Radeon VII based system in the future for Milkyway though, that's the best bang-for-buck card on that project.
ID: 95345
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 538
United States
Message 95349 - Posted: 20 Jan 2020, 17:16:18 UTC - in response to Message 95345.  
Last modified: 20 Jan 2020, 17:57:42 UTC

circling back to the original discussion. I signed up for Einstein at home and did some PCIe testing.

I had always heard from other users that Einstein was PCIe dependent, to the point that anything less than x16 links caused tasks to run slower. But actual testing on numerous different cards and PCIe lane widths shows that's not true: Einstein is barely PCIe dependent at all, on both the Gamma-Ray and Gravity Wave tasks. I saw about 1% PCIe bus use on both types of tasks on just a PCIe 3.0 x1 link, so it's no surprise that you haven't seen a slowdown. In light of this, it looks like SETI actually uses more PCIe bandwidth (at least on the optimized CUDA special app). Maybe in the past with old tasks Einstein used to have more reliance on PCIe, but that does not appear to be the case anymore.

As far as how many cards you can run, you will have to test and find the limiting factor of how many GPUs can be attached before the system will no longer boot. my guess is it will be somewhere between 3-7 GPUs. no way to tell without testing.

The next limit will be CPU resources to support the GPU tasks. you only have a 4c/4t CPU, and a rather old/weak one at that compared to modern chips. luckily Gamma-ray tasks don't seem to mind running on a weak CPU, but you may have bad results with the Gravity wave tasks.

you'll have to test the impact to Milkyway though. I'm not going to attach to that one with the machines I have now since it relies so heavily on DP performance, and recent Nvidia cards like mine have abysmal DP performance for the cost/power used. I might build a Radeon VII based system in the future for Milkyway though, that's the best bang-for-buck card on that project.


That gravity wave "2.07" consistently takes 100% of a CPU thread on my 4c/8t, and I had to limit concurrent tasks to 6 (not just 8) and also exclude the Zotac P106-90 card, which was OK on SETI but not too useful on Einstein. However, Asteroids at home uses only 0.01 CPU and that seems to work OK on the two slowest GPUs. Currently running 6 of the gravity and 2 Asteroids. Maybe you can comment on this post and bump up my question.

[edit] Should be running 3 Asteroids as there are a total of 9 GPUs. The CPU count should be six at 1.0 and three at 0.01, but unaccountably only 2 Asteroids are running. Took a while, and scheduling priority went from -1,000 to only -0.29, but I am still not running the additional Asteroids. Something in 7.16.3, I think.
ID: 95349
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95351 - Posted: 20 Jan 2020, 18:59:44 UTC - in response to Message 95345.  
Last modified: 20 Jan 2020, 19:02:53 UTC

circling back to the original discussion. I signed up for Einstein at home and did some PCIe testing.

I had always heard from other users that Einstein was PCIe dependent, to the point that anything less than x16 links caused tasks to run slower. But actual testing on numerous different cards and PCIe lane widths shows that's not true: Einstein is barely PCIe dependent at all, on both the Gamma-Ray and Gravity Wave tasks. I saw about 1% PCIe bus use on both types of tasks on just a PCIe 3.0 x1 link, so it's no surprise that you haven't seen a slowdown. In light of this, it looks like SETI actually uses more PCIe bandwidth (at least on the optimized CUDA special app). Maybe in the past with old tasks Einstein used to have more reliance on PCIe, but that does not appear to be the case anymore.

As far as how many cards you can run, you will have to test and find the limiting factor of how many GPUs can be attached before the system will no longer boot. my guess is it will be somewhere between 3-7 GPUs. no way to tell without testing.


According to the Windows task manager, each card uses one seventh of the bandwidth of PCIE 1.0 x1. So I can run 7 cards per 1.0 lane. Since I have two of those (=14 cards) and two 2.0 x16s (448 cards), I have a theoretical limit of 462 cards :-)
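The arithmetic behind that tongue-in-cheek 462 figure can be sketched in Python, assuming each PCIe generation doubles per-lane bandwidth and each card really does need only 1/7 of a 1.0 x1 link:

```python
CARDS_PER_1_0_X1 = 7  # observed: each card uses ~1/7 of a PCIe 1.0 x1 link

def cards_per_slot(gen, lanes):
    """Cards a slot could feed, relative to a PCIe 1.0 x1 baseline."""
    bandwidth_multiplier = 2 ** (gen - 1) * lanes
    return CARDS_PER_1_0_X1 * bandwidth_multiplier

# two 1.0 x1 slots plus two 2.0 x16 slots
total = 2 * cards_per_slot(1, 1) + 2 * cards_per_slot(2, 16)
print(total)  # 462
```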

The next limit will be CPU resources to support the GPU tasks. you only have a 4c/4t CPU, and a rather old/weak one at that compared to modern chips.


Yip, it's an old (free) machine I cobbled together; it runs the noisy large GPUs in the garage. My main PC is an i5 8600K, and its graphics card is slower (but quiet!), so it manages Gravity just fine.

With two R9 280x GPUs, the CPU is used 10.03% for Milkyway, and 23.65% for Gamma. So a limit of about 18 cards for Milkyway or 7 cards for Gamma.
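One way to reproduce those card-count limits from the measured CPU figures (the ~10% CPU headroom here is my assumption, which is why it lands on 17 rather than 18 for Milkyway):

```python
def max_cards(cpu_percent_total, gpus_measured, headroom=0.9):
    """Estimate how many GPUs the CPU can feed before saturating.
    headroom leaves ~10% of the CPU free (an assumption)."""
    per_card = cpu_percent_total / gpus_measured
    return int(100 * headroom / per_card)

print(max_cards(10.03, 2))  # 17 - roughly the "18 cards" Milkyway figure
print(max_cards(23.65, 2))  # 7 cards for Gamma
```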

luckily Gamma-ray tasks don't seem to mind running on a weak CPU,


Indeed, I changed my Einstein settings to only get Gamma for that machine (I put that PC in a different group to the others).

but you may have bad results with the Gravity wave tasks.


Agreed - Gravity tasks are impossible on that machine. Even with one card running 4 tasks at once, so they can use the 4 CPU cores, I only get 40% GPU usage. I'd get through Gravity damn fast on those cards, but a) I'd need a better CPU, and b) since they have a good DP:SP ratio, I'm using them for MW whenever it has tasks available.

I also notice I never get the CPU above 90%, no matter what is running, I assume that's probably the slow DDR2 memory?

you'll have to test the impact to Milkway though. I'm not going to attach to that one with the machines I have now since it relies so heavily on DP performance and recent Nvidia cards like I have have abysmal DP performance for the cost/power use. I might build a Radeon VII based system in the future for Milkyway though, that's the best bang for buck card on that project.


I deliberately bought 280X as they are amazingly fast at DP and cost virtually nothing to buy 2nd hand. The electricity isn't a problem, I can get that quite cheaply.
ID: 95351
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95352 - Posted: 20 Jan 2020, 19:10:20 UTC - in response to Message 95349.  

That gravity wave "2.07"


How come I have Gravity 2.02 tasks? Is the windows version a bit behind?

consistently take 100% of CPU on my 4/8t and I had to limit concurrent tasks to 6 (not just 8) and also exclude the Zotac P106-90 card which was OK on SETI but not too useful on Einstein. However, Asteroids at home uses only 0.01 CPU and that seem to work OK on the two slowest GPUs. Currently running 6 of the gravity and 2 Asteroids.


Yip, I guess you need many projects to get 100% out of every chip you own. You could always do Gamma as well if you want to do all Einstein on that machine. Gamma uses much less CPU.

Maybe you can comment on this post and bump up my question.


Tried to, but I apparently need some kind of "reputation"? Stackexchange seems a rather harsh place. Everybody gets yelled at for slight grammatical errors etc. I tend to avoid it.
ID: 95352
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95376 - Posted: 21 Jan 2020, 15:47:07 UTC - in response to Message 95351.  

Agreed - Gravity tasks are impossible on that machine. Even one card running 4 tasks at once, so they can use the 4 CPU cores, I only get 40% GPU usage. I'd get through Gravity damn fast on those cards, but a) I'd need a better CPU, and b) since they're a good DP:SP ratio, I'm using them for MW whenever they have tasks available.

I also notice I never get the CPU above 90%, no matter what is running, I assume that's probably the slow DDR2 memory?


on such a slow system, i think your only shot at running GW would be to run 1 GPU only with 1 GW task and no other CPU tasks running. but maybe not even then.

I did some playing around with the GW tasks, and I was only able to get good results running them on my monstrous 10-GPU (RTX 2070) system with 2x 10c/20t CPUs, 40 threads total. the GW GPU tasks use a lot more CPU support than any other task i've seen, about 1.2-1.5 CPU threads for each GPU WU. running 1x GW WU per GPU i can get ~70-80% GPU utilization with 30-35% CPU used (12-14 threads for 10x GPU WUs). running 2x GW WU per GPU i can get 95-98% GPU utilization, with 60-70% CPU used (24-28 threads for 20x GPU WUs). that's with 2x E5-2680v2 running at about 3.1GHz.
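Those percentages line up with the thread counts on a 40-thread box; a quick sanity check in Python:

```python
def cpu_percent(busy_threads, total_threads=40):
    """Percent of the machine's threads that are busy."""
    return 100 * busy_threads / total_threads

def threads_per_wu(busy_threads, wus):
    """Average CPU threads consumed per GPU work unit."""
    return busy_threads / wus

print(cpu_percent(12), cpu_percent(14))  # 30.0 35.0 -> "30-35% CPU"
print(cpu_percent(24), cpu_percent(28))  # 60.0 70.0 -> "60-70% CPU"
print(threads_per_wu(12, 10))            # 1.2 threads per GW work unit
```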
ID: 95376
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95380 - Posted: 21 Jan 2020, 16:06:16 UTC - in response to Message 95376.  

on such a slow system, i think your only shot at running GW would be to run 1 GPU only with 1 GW task and no other CPU tasks running. but maybe not even then.


No, that would be worse - that would give only 1 CPU core to help it. Running four is better, as then the CPU gets all its cores used. Then I'd run another non-GW task on the GPU as well to fill it up.

I did some playing around with the GW tasks, and I was only able to get good results running them on my monstrous 10-GPU (RTX 2070) system with 2x 10c/20t CPUs, 40 threads total. the GW GPU tasks use a lot more CPU support than any other task i've seen, about 1.2-1.5 CPU threads for each GPU WU. running 1x GW WU per GPU i can get ~70-80% GPU utilization with 30-35% CPU used (12-14 threads for 10x GPU WUs). running 2x GW WU per GPU i can get 95-98% GPU utilization, with 60-70% CPU used (24-28 threads for 20x GPU WUs). that's with 2x E5-2680v2 running at about 3.1GHz.


Indeed, GW are CPU/GPU tasks, rather than more or less pure GPU tasks like Gamma and Milkyway. I'll just stick to running what suits my system best. I run GW on my other system which has more CPU than GPU in it.
ID: 95380
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95384 - Posted: 21 Jan 2020, 16:37:28 UTC - in response to Message 95380.  

No, that would be worse. That would give only 1 CPU core to help it. Running four is better, then the CPU gets all its cores used. Then I'd run another non GW task on the GPU aswell to fill it up.


incorrect. the job will use as much CPU as it can and/or needs. running 1 WU on the GPU does not limit the CPU to only 1 thread. my system overflows CPU use into a second thread for each GPU WU. try it. you'll probably see >25% CPU used, meaning more than one CPU thread being used.

running 4 is worse because now you are pegging the CPU out and it can't give any more, forcing all 4 jobs to run super slow.
ID: 95384
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95385 - Posted: 21 Jan 2020, 16:50:45 UTC - in response to Message 95384.  
Last modified: 21 Jan 2020, 16:52:05 UTC

No, that would be worse. That would give only 1 CPU core to help it. Running four is better, then the CPU gets all its cores used. Then I'd run another non GW task on the GPU aswell to fill it up.


incorrect. the job will use as much CPU as it can and/or needs. running 1 WU on the GPU does not limit the CPU to only 1 thread. my system overflows CPU use into a second thread for each GPU WU. try it. you'll probably see >25% CPU used meaning more than one CPU thread being used.

running 4 is worse because now you are pegging the CPU out and it can't give anymore. forcing all 4 jobs to run super slow.


Maybe it depends on how well the CPU is designed. I have tried what you say above on my old CPU. Running one GW task on the GPU causes the CPU to be 25% used (presumably 1 of 4 cores fully utilised). Running two GW tasks causes it to be 50% used, with the GPU completing both tasks in the same time (therefore doing twice the work), etc.

On that subject I did notice a more modern but slower laptop CPU seemed to do what you say with programs I've never seen be able to use more than one core. Perhaps multi-threading has been improved on the CPU level?

P.S. I now have my 4 way multiplexer and it works perfectly for MW and Gamma. Two GPUs sharing one PCI 1.1 x1 lane, both GPUs maxed out and tasks completing in the same time as before.
ID: 95385
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95386 - Posted: 21 Jan 2020, 17:10:21 UTC - in response to Message 95385.  

On that subject I did notice a more modern but slower laptop CPU seemed to do what you say with programs I've never seen be able to use more than one core. Perhaps multi-threading has been improved on the CPU level?

P.S. I now have my 4 way multiplexer and it works perfectly for MW and Gamma. Two GPUs sharing one PCI 1.1 x1 lane, both GPUs maxed out and tasks completing in the same time as before.


perhaps. if somehow the app is able to distinguish between a core with 2 threads and a core with 1 thread. but I was under the impression that at the OS level where the app runs it's all just threads, and the app won't know if a thread is a whole core or "half" a core. I really don't know.

system memory, both speed and bandwidth, could be slowing you down also. DDR2 is quite slow by itself, and on that board you're only running dual channel @ 800MHz or less. my system in comparison is running 1600MHz RAM in quad channel mode. it's possible it can't push past a single core because the memory is the bottleneck. just spitballing.

good to know that MW also doesn't care about PCIe bandwidth. should be a fine system if you remain aware of the limited capabilities on the CPU side.
ID: 95386
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95387 - Posted: 21 Jan 2020, 17:19:57 UTC - in response to Message 95386.  
Last modified: 21 Jan 2020, 17:22:14 UTC

perhaps. if somehow the app is able to distinguish between a core with 2 threads and a core with 1 thread. but I was under the impression that at the OS level where the app runs it's all just threads and the app won't know if a thread is a whole core of "half" a core. I really don't know.


I wasn't considering threads and cores, since this CPU doesn't have HT. But on this CPU, a GW task will only use one core, even though since the GPU is hardly used, it could clearly take more. I assumed it was a single threaded application (for the CPU part).

system memory, both speed and bandwidth could be slowing you down also. DDR2 is quite slow by itself, and on that board you're only running dual channel @800MHz or less. my system in comparison is running 1600MHz ram in quad channel mode. possible it can't push past a single core because the memory is the bottleneck. just spitballing.


So how come the memory allows 4 cores to be used if 4 GW tasks are run?

And what on earth is "quad channel memory"? Is this a new thing? My 2 year old https://www.gigabyte.com/Motherboard/Z370-HD3P-rev-10#kf only has dual channel, even though it takes 4 DIMMs.

good to know that MW also doesn't care about PCIe bandwidth. should be a fine system if you remain aware of the limited capabilities on the CPU side.


It's not that limited really, since MW uses bugger all CPU. By my calculations, 18 GPUs can go in there. Boy will that make a noise.... definitely being run in the garage - but when I've finished my house extension, the heat can flow through to the house so it isn't wasted.
ID: 95387
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95388 - Posted: 21 Jan 2020, 17:33:29 UTC - in response to Message 95387.  

So how come the memory allows 4 cores to be used if 4 GW tasks are run?


Because you are spinning up 4 separate instances of the same application. Each new instance starts on a new thread. But it’s likely running slower than just running 1.

And what on earth is "quad channel memory"? Is this a new thing? My 2 year old https://www.gigabyte.com/Motherboard/Z370-HD3P-rev-10#kf only has dual channel, even though it takes 4 DIMMs.


Exactly what it sounds like: 4 memory channels instead of 2. I'm running old enterprise grade stuff - Xeon E5-2680v2 CPUs on a Supermicro motherboard. On the consumer desktop/"prosumer" end this would be called "HEDT", or high end desktop. HEDT and enterprise stuff has supported quad channel memory for a long time, since about 2011/2012 or so. Quad channel is also standard in current HEDT platforms such as Intel X299 or AMD Threadripper. In the server space, Intel has 6-channel memory and the latest AMD Epyc CPUs have 8-channel memory.
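The difference is easy to quantify: peak theoretical bandwidth is channels x 8 bytes (64-bit bus per channel) x transfer rate. A sketch comparing the DDR2-800 dual-channel board to a DDR3-1600 quad-channel Xeon setup (the exact memory speeds are assumptions based on the figures in this thread):

```python
def peak_bandwidth_gb_s(channels, mt_per_s):
    """Theoretical peak memory bandwidth in GB/s (8-byte bus per channel)."""
    return channels * 8 * mt_per_s / 1000

print(peak_bandwidth_gb_s(2, 800))   # 12.8 GB/s - DDR2-800, dual channel
print(peak_bandwidth_gb_s(4, 1600))  # 51.2 GB/s - DDR3-1600, quad channel
```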
ID: 95388
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95389 - Posted: 21 Jan 2020, 17:43:30 UTC - in response to Message 95388.  

Because you are spinning up 4 separate instances of the same application. Each new instance starts on a new thread. But it’s likely running slower than just running 1.


So, similar to CPU cores, memory can do two things at once, but not one thing at double speed? But since I've only got dual channel memory, I'm surprised I got more processing done with 4 GW tasks instead of 2.

Exactly what it sounds like. 4 memory channels instead of 2. I’m running old enterprise grade stuff. Xeon E5-2680v2 CPUs on a Supermicro motherboard. On the consumer desktop/“prosumer” end this would be called “HEDT” or high end desktop. HEDT and enterprise stuff has supported quad channel memory for a long time since about 2011/2012 or so. Quad channel is also standard in current HEDT platforms such as Intel X299 or AMD threadripper. In the server space, Intel has 6-channel memory and the latest AMD Epyc CPUs have 8-channel memory.


[Grumble] I thought I had a good board here.
ID: 95389
ProDigit

Joined: 8 Nov 19
Posts: 460
United States
Message 95502 - Posted: 24 Jan 2020, 2:22:38 UTC
Last modified: 24 Jan 2020, 2:52:08 UTC

Boinc is closer to folding than it is to mining.
Mining sends a minuscule amount of data to the GPU, lets it crunch, and returns a minuscule result.

Folding sends a somewhat larger (but still small) amount of data, and returns a packet of data.

The amount of data read and written over the PCIe lanes is approximately 20-200MB for less than half a second, every ten or so seconds.

That means if the PCIe link runs at half the speed, the transfer takes twice as long.
The transfer becomes 1 second out of every 10 (an extra half second), which means about 5% slower, or 10% lower PPD (due to QRB, folding's quick return bonus).

For an RTX 2060 with 1920 cores, that threshold is at PCIe 3.0 x1 speeds.
Plug it into a PCIe 2.0 x1 slot, and you'll lose 20% PPD compared to a 3.0 slot.

PCIe 2.0 x1 slots show a 5% speed drop (10% PPD drop) with a GTX 1050 or equivalent. With any slower GPU, the speed/PPD drop will be less; with any faster GPU, the drop will be more significant.

For mining, you could run almost any GPU on a PCIe 1.0 x1 slot just fine (excepting the fastest RTX GPUs, which didn't exist when most of that hardware was made).

For BOINC, you'll need to keep about the same parameters in mind as folding. If you split a PCIe x1 slot up into 4 slots, you'll be running each at 1/4 of the speed, which might be too slow for modern GPUs.

That mining board I posted? It provides PCIe 2.0 x1 slots (good for up to a GTX 1050, or a 1060 if you push it).
Those GPUs run well at 65-90W (power capped and overclocked), which means you can easily connect 16 of them to that board (on 2 PSUs).
For that, you'll preferably want a Core i7 with 8 threads or more, at 4GHz or more (so the CPU won't bottleneck); that way, you'll be running about 4 GPUs per core.
Preferably a 10-core Xeon, but those don't fit the board.
If you run anything slower, the CPU won't keep up.

Each GPU gets its own PCIe lane, which means a lot more data speed than running through a PCIe splitter (which usually means 1/4 of the speed at best, not counting traffic collisions).
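The transfer-overhead argument above can be sketched numerically (using the post's rough figures, not measurements):

```python
def slowdown(transfer_s, cycle_s, speed_factor):
    """Fractional slowdown when the link runs at speed_factor of full speed."""
    new_transfer = transfer_s / speed_factor  # slower link, longer transfer
    extra = new_transfer - transfer_s         # added stall time per cycle
    return extra / cycle_s

# 0.5 s transfer every 10 s; halving the link adds 0.5 s per cycle = 5% slower
print(slowdown(0.5, 10, 0.5))  # 0.05
```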
ID: 95502
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95514 - Posted: 24 Jan 2020, 18:15:57 UTC - in response to Message 95502.  
Last modified: 24 Jan 2020, 18:46:41 UTC

Boinc is closer to folding than it is to mining.
Mining is sending a miniscule amount of data to a GPU, and let it crunch it, and return a miniscule result.

Folding is sending less of a miniscule amount of data, and return a packet of data.

The amount of data read and written over the pcie Lanes is approximately 20-200MB for less than half a second, every ten or so seconds.

That means if the PCIe link runs at half the speed, the transfer takes twice as long.
The transfer becomes 1 second out of every 10 (an extra half second), which means about 5% slower, or 10% lower PPD (due to QRB).

For an RTX 2060 with 1920 cores, that threshold is at PCIE 3.0 1x speeds.
Plug it into a pcie 2.0 1x slot, and you'll lose 20% PPD compared to a 3.0 slot.

Pcie 2.0 1x slots have a 5% speed drop (10% PPD drop) from a GTX1050 or equivalent. Any slower GPUs, and the speed/ppd drop will be less. Any faster GPUs, and the speed drop will be more significant.

On mining, you could plug almost any GPU on a pcie 1.0 1x slot just fine. (Provided not the fastest RTX GPUs, they didn't exist for most of that hardware).

For BOINC, you'll need to keep about the same parameters in mind as folding. If you could split up a pcie 1x slot to 4 slots, you'll be running it at 1/4 th the speed, which might be too slow for modern GPUs.

That mining board I posted? It provides PCIE 2.0 1x slots (for up to a GTX 1050, a 1060 if you push it).
Those run well at 65-90W (power capped and overclocked), which means you can easily connect 16 GPUs to that board (on 2 PSUs).
For that, you'll preferably run a core i7, with 8threads or more, at 4Ghz or more (so the CPU won't bottleneck). That way, you'll be running about 4gpus per core.
Preferably a 10 core Xeon , but they don't fit the board.
If you run it with anything slower, the CPU won't keep up.

Each GPU has it's own pcie Lane, which means a lot more data speed than running through a pcie splitter (which usually means 1/4th the speed at best, not counting on network traffic collisions).


Lots of good information, thanks. I seem to be fine with Milkyway (it must have a pretty low amount of data to transfer) sharing two R9 280X GPUs on one PCIe 1.1 x1 slot, two tasks running on each card, so they're never waiting for data transfer or CPU. Also fine with Einstein Gamma tasks. When I hit a limit with more GPUs, I'll connect some cards to another computer - I have 4 now; I just inherited a laptop slightly more powerful than the desktop I have the GPUs on. I wonder if I can connect external GPUs to that somehow? [Searches Ebay] Yes I can :-) Well, one per slot anyway: https://www.ebay.co.uk/itm/283755298920 The laptop only has one slot (it's internal, for the wireless card I don't need) and no room inside, but I've found a small card that raises the socket up a bit first. Also, I'll try putting the 4-way multiplexer onto the end afterwards :-)
ID: 95514
ProDigit

Joined: 8 Nov 19
Posts: 460
United States
Message 95580 - Posted: 29 Jan 2020, 2:12:33 UTC

I know for Folding, there's a benchmark utility.
This isn't available for Boinc.
But the best way to see how your GPU is doing is to run Boinc for a few days on a single project, and at the same time run the same project on a GPU placed in a PCIe x16 slot, then see how their scores differ after a few days.
The x1 slot may seem like it's crunching fine, but it could be crunching at only 80% (or less) of its potential.
ID: 95580

Copyright © 2020 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.