PCI express risers to use multiple GPUs on one motherboard - not detecting card?

Message boards : GPUs : PCI express risers to use multiple GPUs on one motherboard - not detecting card?
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95322 - Posted: 19 Jan 2020, 2:58:40 UTC - in response to Message 95320.  
Last modified: 19 Jan 2020, 2:59:09 UTC

friend of mine had something similar happen where he works (government/military installation), an employee was caught using their computers for coin mining.

he's in jail now.
ID: 95322
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95327 - Posted: 19 Jan 2020, 16:55:09 UTC - in response to Message 95322.  

friend of mine had something similar happen where he works (government/military installation), an employee was caught using their computers for coin mining.

he's in jail now.


Ouch. Well I was running Boinc, which isn't for my profit, and it wasn't a secret organisation I was borrowing the computers from. Some of the staff actually appreciated what I was doing.
ID: 95327
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95328 - Posted: 19 Jan 2020, 16:57:18 UTC - in response to Message 95175.  
Last modified: 19 Jan 2020, 16:57:51 UTC

Since there aren't too many PCIE 4.0 devices out, I wouldn't know about PCIE 4.0.


Hang on, Wikipedia says PCIE 4.0 came out in 2017, and 5.0 in 2019, and 6.0 is being planned for 2021. So why are most cards still on 3.0?

https://en.wikipedia.org/wiki/PCI_Express#History_and_revisions
ID: 95328
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95329 - Posted: 19 Jan 2020, 17:41:32 UTC - in response to Message 95327.  

friend of mine had something similar happen where he works (government/military installation), an employee was caught using their computers for coin mining.

he's in jail now.


Ouch. Well I was running Boinc, which isn't for my profit, and it wasn't a secret organisation I was borrowing the computers from. Some of the staff actually appreciated what I was doing.

yeah, if you have permission it's no big deal.

about PCIe releases: those dates are when the specs were released, not when mainstream products started using them. PCIe 4.0 motherboards weren't widely available until July 2019 when AMD released their X570 boards, and GPUs weren't available with PCIe 4.0 until AMD released the RX 5700/XT. There may have been some niche/custom/enterprise stuff available before this, but not widely available as consumer products.

as for "why"? lack of a need. GPUs weren't bottlenecking on PCIe 3.0 x16 yet, so there was no rush. I think only just now are the fastest GPUs starting to creep up on that limit, where the extra bandwidth from 4.0 will matter in some applications.

5.0 now is where 4.0 was a few years ago, so it will be several more years before we start to see any 5.0 products.
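For reference, each PCIe generation roughly doubles per-lane throughput. A small Python sketch with approximate one-direction bandwidth figures from the PCIe specs (1.0/2.0 use 8b/10b encoding, 3.0 onward 128b/130b):

```python
# Approximate usable per-lane bandwidth in GB/s, one direction.
per_lane_gb_s = {1: 0.25, 2: 0.5, 3: 0.985, 4: 1.969, 5: 3.938}

def slot_bandwidth(gen, lanes):
    """Approximate one-direction bandwidth of a slot in GB/s."""
    return per_lane_gb_s[gen] * lanes

print(round(slot_bandwidth(3, 16), 1))  # ~15.8 GB/s for a 3.0 x16 slot
print(round(slot_bandwidth(4, 16), 1))  # ~31.5 GB/s for a 4.0 x16 slot
```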
ID: 95329
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95330 - Posted: 19 Jan 2020, 18:15:46 UTC - in response to Message 95329.  

about PCIe releases. those dates are releases of the specs/technology, but not that mainstream products were using them. PCIe 4.0 motherboards weren't largely available until July 2019 when AMD released their X570 boards, and GPUs weren't available with PCIe 4.0 until AMD released the RX5700/XT. There may have been some niche/custom/enterprise stuff available before this, but not widely available as consumer products.

as for "why"? lack of a need. GPUs weren't bottlenecking on PCIe 3.0 x16 yet, so there was no rush. I think only just now, are the fastest GPUs starting to creep up on that limit, where the extra bandwidth from 4.0 will matter in some applications.

5.0 now is where 4.0 was a few years ago. so it will be several more years before we start to see any 5.0 products.


Even the fastest current NVMe drive only needs PCIe 3.0 x4. Well, I guess it's good to make everything as fast as possible. Especially for nuts like us that connect billions of things to the bus :-)

Even just two NVMe drives in a mirror arrangement, plus two GPUs in crossfire for a game, would use 40 lanes of 3.0, which I assume is more than is available. Get everything on 4.0 and they can all share nicely.
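The lane count in that example tallies up like this; a little Python sketch summing standard device widths (two NVMe drives at x4, two GPUs at x16 - how many lanes a given consumer CPU actually provides varies, often 16-24):

```python
def lane_budget(devices):
    """Sum the PCIe lanes requested by (name, lanes) pairs."""
    return sum(lanes for _, lanes in devices)

devices = [("NVMe 1", 4), ("NVMe 2", 4), ("GPU 1", 16), ("GPU 2", 16)]
print(lane_budget(devices))  # 40 lanes - more than most consumer CPUs provide
```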
ID: 95330
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95345 - Posted: 20 Jan 2020, 15:55:02 UTC
Last modified: 20 Jan 2020, 15:56:28 UTC

circling back to the original discussion. I signed up for Einstein at home and did some PCIe testing.

I had always heard from other users that Einstein was PCIe dependent, to the point that anything less than x16 links caused tasks to run slower. But actual testing on numerous different cards and PCIe lane widths shows that's not true: Einstein is barely PCIe dependent at all, on both the Gamma-Ray and Gravity Wave tasks. I saw about 1% PCIe bus use on both types of tasks on just a PCIe 3.0 x1 link, so it's no surprise that you haven't seen a slowdown. In light of this, it looks like SETI actually uses more PCIe bandwidth (at least on the optimized CUDA special app). Maybe in the past with old tasks Einstein used to have more reliance on PCIe, but that does not appear to be the case anymore.

As far as how many cards you can run, you will have to test and find the limiting factor of how many GPUs can be attached before the system will no longer boot. my guess is it will be somewhere between 3-7 GPUs. no way to tell without testing.

The next limit will be CPU resources to support the GPU tasks. you only have a 4c/4t CPU, and a rather old/weak one at that compared to modern chips. luckily Gamma-ray tasks don't seem to mind running on a weak CPU, but you may have bad results with the Gravity wave tasks.

you'll have to test the impact to Milkyway though. I'm not going to attach to that one with the machines I have now since it relies so heavily on DP performance, and recent Nvidia cards like mine have abysmal DP performance for the cost/power used. I might build a Radeon VII based system in the future for Milkyway though, that's the best bang-for-buck card on that project.
ID: 95345
Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 538
United States
Message 95349 - Posted: 20 Jan 2020, 17:16:18 UTC - in response to Message 95345.  
Last modified: 20 Jan 2020, 17:57:42 UTC

circling back to the original discussion. I signed up for Einstein at home and did some PCIe testing.

I had always heard from other users that Einstein was PCIe dependent, to the point that anything less than x16 links caused tasks to run slower. But actual testing on numerous different cards and PCIe lane widths shows that's not true: Einstein is barely PCIe dependent at all, on both the Gamma-Ray and Gravity Wave tasks. I saw about 1% PCIe bus use on both types of tasks on just a PCIe 3.0 x1 link, so it's no surprise that you haven't seen a slowdown. In light of this, it looks like SETI actually uses more PCIe bandwidth (at least on the optimized CUDA special app). Maybe in the past with old tasks Einstein used to have more reliance on PCIe, but that does not appear to be the case anymore.

As far as how many cards you can run, you will have to test and find the limiting factor of how many GPUs can be attached before the system will no longer boot. my guess is it will be somewhere between 3-7 GPUs. no way to tell without testing.

The next limit will be CPU resources to support the GPU tasks. you only have a 4c/4t CPU, and a rather old/weak one at that compared to modern chips. luckily Gamma-ray tasks don't seem to mind running on a weak CPU, but you may have bad results with the Gravity wave tasks.

you'll have to test the impact to Milkyway though. I'm not going to attach to that one with the machines I have now since it relies so heavily on DP performance, and recent Nvidia cards like mine have abysmal DP performance for the cost/power used. I might build a Radeon VII based system in the future for Milkyway though, that's the best bang-for-buck card on that project.


That gravity wave "2.07" consistently takes 100% of a CPU thread on my 4c/8t, and I had to limit concurrent tasks to 6 (not just 8) and also exclude the Zotac P106-90 card, which was OK on SETI but not too useful on Einstein. However, Asteroids at home uses only 0.01 CPU and that seems to work OK on the two slowest GPUs. Currently running 6 of the gravity and 2 Asteroids. Maybe you can comment on this post and bump up my question.

[edit] Should be running 3 Asteroids as there are a total of 9 GPUs. The CPU count should be six at 1.0 and three at 0.01, but unaccountably only 2 Asteroids are running. Took a while, and scheduling priority went from -1,000 to only -0.29, but I am still not running the additional Asteroids. Something in 7.16.3, I think.
ID: 95349
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95351 - Posted: 20 Jan 2020, 18:59:44 UTC - in response to Message 95345.  
Last modified: 20 Jan 2020, 19:02:53 UTC

circling back to the original discussion. I signed up for Einstein at home and did some PCIe testing.

I had always heard from other users that Einstein was PCIe dependent, to the point that anything less than x16 links caused tasks to run slower. But actual testing on numerous different cards and PCIe lane widths shows that's not true: Einstein is barely PCIe dependent at all, on both the Gamma-Ray and Gravity Wave tasks. I saw about 1% PCIe bus use on both types of tasks on just a PCIe 3.0 x1 link, so it's no surprise that you haven't seen a slowdown. In light of this, it looks like SETI actually uses more PCIe bandwidth (at least on the optimized CUDA special app). Maybe in the past with old tasks Einstein used to have more reliance on PCIe, but that does not appear to be the case anymore.

As far as how many cards you can run, you will have to test and find the limiting factor of how many GPUs can be attached before the system will no longer boot. my guess is it will be somewhere between 3-7 GPUs. no way to tell without testing.


According to the Windows task manager, each card uses one seventh of the bandwidth of PCIE 1.0 x1. So I can run 7 cards per 1.0 lane. Since I have two of those (=14 cards) and two 2.0 x16s (448 cards), I have a theoretical limit of 462 cards :-)
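The arithmetic behind that tongue-in-cheek 462 figure can be sketched in Python, assuming each PCIe generation doubles per-lane bandwidth and each card really does need only 1/7 of a 1.0 x1 link:

```python
CARDS_PER_1_0_X1 = 7  # observed: each card uses ~1/7 of a PCIe 1.0 x1 link

def cards_per_slot(gen, lanes):
    """Cards a slot could feed, relative to a PCIe 1.0 x1 baseline."""
    bandwidth_multiplier = 2 ** (gen - 1) * lanes
    return CARDS_PER_1_0_X1 * bandwidth_multiplier

# two 1.0 x1 slots plus two 2.0 x16 slots
total = 2 * cards_per_slot(1, 1) + 2 * cards_per_slot(2, 16)
print(total)  # 462
```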

The next limit will be CPU resources to support the GPU tasks. you only have a 4c/4t CPU, and a rather old/weak one at that compared to modern chips.


Yip, it's an old (free) machine I cobbled together; it runs the noisy large GPUs in the garage. My main PC is an i5 8600K, and its graphics card is slower (but quiet!), so it manages Gravity just fine.

With two R9 280x GPUs, the CPU is used 10.03% for Milkyway, and 23.65% for Gamma. So a limit of about 18 cards for Milkyway or 7 cards for Gamma.
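One way to reproduce those card-count limits from the measured CPU figures (the ~10% CPU headroom here is my assumption, which is why it lands on 17 rather than 18 for Milkyway):

```python
def max_cards(cpu_percent_total, gpus_measured, headroom=0.9):
    """Estimate how many GPUs the CPU can feed before saturating.
    headroom leaves ~10% of the CPU free (an assumption)."""
    per_card = cpu_percent_total / gpus_measured
    return int(100 * headroom / per_card)

print(max_cards(10.03, 2))  # 17 - roughly the "18 cards" Milkyway figure
print(max_cards(23.65, 2))  # 7 cards for Gamma
```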

luckily Gamma-ray tasks don't seem to mind running on a weak CPU,


Indeed, I changed my Einstein settings to only get Gamma for that machine (I put that PC in a different group to the others).

but you may have bad results with the Gravity wave tasks.


Agreed - Gravity tasks are impossible on that machine. Even with one card running 4 tasks at once, so they can use the 4 CPU cores, I only get 40% GPU usage. I'd get through Gravity damn fast on those cards, but a) I'd need a better CPU, and b) since they have a good DP:SP ratio, I'm using them for MW whenever it has tasks available.

I also notice I never get the CPU above 90%, no matter what is running, I assume that's probably the slow DDR2 memory?

you'll have to test the impact to Milkway though. I'm not going to attach to that one with the machines I have now since it relies so heavily on DP performance and recent Nvidia cards like I have have abysmal DP performance for the cost/power use. I might build a Radeon VII based system in the future for Milkyway though, that's the best bang for buck card on that project.


I deliberately bought 280X as they are amazingly fast at DP and cost virtually nothing to buy 2nd hand. The electricity isn't a problem, I can get that quite cheaply.
ID: 95351
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95352 - Posted: 20 Jan 2020, 19:10:20 UTC - in response to Message 95349.  

That gravity wave "2.07"


How come I have Gravity 2.02 tasks? Is the windows version a bit behind?

consistently take 100% of CPU on my 4/8t and I had to limit concurrent tasks to 6 (not just 8) and also exclude the Zotac P106-90 card which was OK on SETI but not too useful on Einstein. However, Asteroids at home uses only 0.01 CPU and that seem to work OK on the two slowest GPUs. Currently running 6 of the gravity and 2 Asteroids.


Yip, I guess you need many projects to get 100% out of every chip you own. You could always do Gamma as well if you want to do all Einstein on that machine. Gamma uses much less CPU.

Maybe you can comment on this post and bump up my question.


Tried to, but I apparently need some kind of "reputation"? Stackexchange seems a rather harsh place. Everybody gets yelled at for slight grammatical errors etc. I tend to avoid it.
ID: 95352
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95376 - Posted: 21 Jan 2020, 15:47:07 UTC - in response to Message 95351.  

Agreed - Gravity tasks are impossible on that machine. Even one card running 4 tasks at once, so they can use the 4 CPU cores, I only get 40% GPU usage. I'd get through Gravity damn fast on those cards, but a) I'd need a better CPU, and b) since they're a good DP:SP ratio, I'm using them for MW whenever they have tasks available.

I also notice I never get the CPU above 90%, no matter what is running, I assume that's probably the slow DDR2 memory?


on such a slow system, i think your only shot at running GW would be to run 1 GPU only with 1 GW task and no other CPU tasks running. but maybe not even then.

I did some playing around with the GW tasks, and I was only able to get good results running them on my monstrous 10-GPU (RTX 2070) system with 2x 10c/20t CPUs, 40 threads total. the GW GPU tasks use a lot more CPU support than any other task i've seen, about 1.2-1.5 CPU threads for each GPU WU. running 1x GW WU per GPU i can get ~70-80% GPU utilization with 30-35% CPU used (12-14 threads for 10x GPU WUs). running 2x GW WU per GPU i can get 95-98% GPU utilization, with 60-70% CPU used (24-28 threads for 20x GPU WUs). that's with 2x E5-2680v2 running at about 3.1GHz.
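Those percentages line up with the thread counts on a 40-thread box; a quick sanity check in Python:

```python
def cpu_percent(busy_threads, total_threads=40):
    """Percent of the machine's threads that are busy."""
    return 100 * busy_threads / total_threads

def threads_per_wu(busy_threads, wus):
    """Average CPU threads consumed per GPU work unit."""
    return busy_threads / wus

print(cpu_percent(12), cpu_percent(14))  # 30.0 35.0 -> "30-35% CPU"
print(cpu_percent(24), cpu_percent(28))  # 60.0 70.0 -> "60-70% CPU"
print(threads_per_wu(12, 10))            # 1.2 threads per GW work unit
```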
ID: 95376
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95380 - Posted: 21 Jan 2020, 16:06:16 UTC - in response to Message 95376.  

on such a slow system, i think your only shot at running GW would be to run 1 GPU only with 1 GW task and no other CPU tasks running. but maybe not even then.


No, that would be worse - that would give only 1 CPU core to help it. Running four is better, as then the CPU gets all its cores used. Then I'd run another non-GW task on the GPU as well to fill it up.

I did some playing around with the GW tasks, and I was only able to get good results running them on my monstrous 10-GPU (RTX 2070) system with 2x 10c/20t CPUs, 40 threads total. the GW GPU tasks use a lot more CPU support than any other task i've seen, about 1.2-1.5 CPU threads for each GPU WU. running 1x GW WU per GPU i can get ~70-80% GPU utilization with 30-35% CPU used (12-14 threads for 10x GPU WUs). running 2x GW WU per GPU i can get 95-98% GPU utilization, with 60-70% CPU used (24-28 threads for 20x GPU WUs). that's with 2x E5-2680v2 running at about 3.1GHz.


Indeed, GW are CPU/GPU tasks, rather than more or less pure GPU tasks like Gamma and Milkyway. I'll just stick to running what suits my system best. I run GW on my other system which has more CPU than GPU in it.
ID: 95380
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95384 - Posted: 21 Jan 2020, 16:37:28 UTC - in response to Message 95380.  

No, that would be worse. That would give only 1 CPU core to help it. Running four is better, then the CPU gets all its cores used. Then I'd run another non GW task on the GPU aswell to fill it up.


incorrect. the job will use as much CPU as it can and/or needs. running 1 WU on the GPU does not limit the CPU to only 1 thread. my system overflows CPU use into a second thread for each GPU WU. try it. you'll probably see >25% CPU used, meaning more than one CPU thread being used.

running 4 is worse because now you are pegging the CPU out and it can't give any more, forcing all 4 jobs to run super slow.
ID: 95384
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95385 - Posted: 21 Jan 2020, 16:50:45 UTC - in response to Message 95384.  
Last modified: 21 Jan 2020, 16:52:05 UTC

No, that would be worse. That would give only 1 CPU core to help it. Running four is better, then the CPU gets all its cores used. Then I'd run another non GW task on the GPU aswell to fill it up.


incorrect. the job will use as much CPU as it can and/or needs. running 1 WU on the GPU does not limit the CPU to only 1 thread. my system overflows CPU use into a second thread for each GPU WU. try it. you'll probably see >25% CPU used meaning more than one CPU thread being used.

running 4 is worse because now you are pegging the CPU out and it can't give anymore. forcing all 4 jobs to run super slow.


Maybe it depends on how well the CPU is designed. I have tried what you say above on my old CPU. Running one GW task on the GPU causes the CPU to be 25% used (presumably 1 of 4 cores fully utilised). Running two GW tasks causes it to be 50% used, with the GPU completing both tasks in the same time (therefore doing twice the work), etc.

On that subject I did notice a more modern but slower laptop CPU seemed to do what you say with programs I've never seen be able to use more than one core. Perhaps multi-threading has been improved on the CPU level?

P.S. I now have my 4 way multiplexer and it works perfectly for MW and Gamma. Two GPUs sharing one PCI 1.1 x1 lane, both GPUs maxed out and tasks completing in the same time as before.
ID: 95385
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95386 - Posted: 21 Jan 2020, 17:10:21 UTC - in response to Message 95385.  

On that subject I did notice a more modern but slower laptop CPU seemed to do what you say with programs I've never seen be able to use more than one core. Perhaps multi-threading has been improved on the CPU level?

P.S. I now have my 4 way multiplexer and it works perfectly for MW and Gamma. Two GPUs sharing one PCI 1.1 x1 lane, both GPUs maxed out and tasks completing in the same time as before.


perhaps. if somehow the app is able to distinguish between a core with 2 threads and a core with 1 thread. but I was under the impression that at the OS level where the app runs it's all just threads, and the app won't know if a thread is a whole core or "half" a core. I really don't know.

system memory, both speed and bandwidth, could be slowing you down also. DDR2 is quite slow by itself, and on that board you're only running dual channel @ 800MHz or less. my system in comparison is running 1600MHz RAM in quad channel mode. it's possible it can't push past a single core because the memory is the bottleneck. just spitballing.

good to know that MW also doesn't care about PCIe bandwidth. should be a fine system if you remain aware of the limited capabilities on the CPU side.
ID: 95386
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95387 - Posted: 21 Jan 2020, 17:19:57 UTC - in response to Message 95386.  
Last modified: 21 Jan 2020, 17:22:14 UTC

perhaps. if somehow the app is able to distinguish between a core with 2 threads and a core with 1 thread. but I was under the impression that at the OS level where the app runs it's all just threads and the app won't know if a thread is a whole core of "half" a core. I really don't know.


I wasn't considering threads and cores, since this CPU doesn't have HT. But on this CPU, a GW task will only use one core, even though since the GPU is hardly used, it could clearly take more. I assumed it was a single threaded application (for the CPU part).

system memory, both speed and bandwidth could be slowing you down also. DDR2 is quite slow by itself, and on that board you're only running dual channel @800MHz or less. my system in comparison is running 1600MHz ram in quad channel mode. possible it can't push past a single core because the memory is the bottleneck. just spitballing.


So how come the memory allows 4 cores to be used if 4 GW tasks are run?

And what on earth is "quad channel memory"? Is this a new thing? My 2 year old https://www.gigabyte.com/Motherboard/Z370-HD3P-rev-10#kf only has dual channel, even though it takes 4 DIMMs.

good to know that MW also doesn't care about PCIe bandwidth. should be a fine system if you remain aware of the limited capabilities on the CPU side.


It's not that limited really, since MW uses bugger all CPU. By my calculations, 18 GPUs can go in there. Boy will that make a noise.... definitely being run in the garage - but when I've finished my house extension, the heat can flow through to the house so it isn't wasted.
ID: 95387
Ian&Steve C.

Joined: 24 Dec 19
Posts: 115
United States
Message 95388 - Posted: 21 Jan 2020, 17:33:29 UTC - in response to Message 95387.  

So how come the memory allows 4 cores to be used if 4 GW tasks are run?


Because you are spinning up 4 separate instances of the same application. Each new instance starts on a new thread. But it’s likely running slower than just running 1.

And what on earth is "quad channel memory"? Is this a new thing? My 2 year old https://www.gigabyte.com/Motherboard/Z370-HD3P-rev-10#kf only has dual channel, even though it takes 4 DIMMs.


Exactly what it sounds like: 4 memory channels instead of 2. I'm running old enterprise grade stuff - Xeon E5-2680v2 CPUs on a Supermicro motherboard. On the consumer desktop/"prosumer" end this would be called "HEDT", or high end desktop. HEDT and enterprise stuff has supported quad channel memory for a long time, since about 2011/2012 or so. Quad channel is also standard in current HEDT platforms such as Intel X299 or AMD Threadripper. In the server space, Intel has 6-channel memory and the latest AMD Epyc CPUs have 8-channel memory.
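The difference is easy to quantify: peak theoretical bandwidth is channels x 8 bytes (64-bit bus per channel) x transfer rate. A sketch comparing the DDR2-800 dual-channel board to a DDR3-1600 quad-channel Xeon setup (the exact memory speeds are assumptions based on the figures in this thread):

```python
def peak_bandwidth_gb_s(channels, mt_per_s):
    """Theoretical peak memory bandwidth in GB/s (8-byte bus per channel)."""
    return channels * 8 * mt_per_s / 1000

print(peak_bandwidth_gb_s(2, 800))   # 12.8 GB/s - DDR2-800, dual channel
print(peak_bandwidth_gb_s(4, 1600))  # 51.2 GB/s - DDR3-1600, quad channel
```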
ID: 95388
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95389 - Posted: 21 Jan 2020, 17:43:30 UTC - in response to Message 95388.  

Because you are spinning up 4 separate instances of the same application. Each new instance starts on a new thread. But it’s likely running slower than just running 1.


So, similar to CPU cores, memory can do two things at once, but not one thing at double speed? But since I've only got dual channel memory, I'm surprised I got more processing done with 4 GW tasks instead of 2.

Exactly what it sounds like. 4 memory channels instead of 2. I’m running old enterprise grade stuff. Xeon E5-2680v2 CPUs on a Supermicro motherboard. On the consumer desktop/“prosumer” end this would be called “HEDT” or high end desktop. HEDT and enterprise stuff has supported quad channel memory for a long time since about 2011/2012 or so. Quad channel is also standard in current HEDT platforms such as Intel X299 or AMD threadripper. In the server space, Intel has 6-channel memory and the latest AMD Epyc CPUs have 8-channel memory.


[Grumble] I thought I had a good board here.
ID: 95389
ProDigit

Joined: 8 Nov 19
Posts: 460
United States
Message 95502 - Posted: 24 Jan 2020, 2:22:38 UTC
Last modified: 24 Jan 2020, 2:52:08 UTC

Boinc is closer to folding than it is to mining.
Mining sends a minuscule amount of data to the GPU, lets it crunch, and returns a minuscule result.

Folding sends a somewhat larger (but still small) amount of data, and returns a packet of data.

The amount of data read and written over the PCIe lanes is approximately 20-200MB for less than half a second, every ten or so seconds.

That means if the PCIe link runs at half the speed, the transfer takes twice as long.
The transfer becomes 1 second out of every 10 (an extra half second), which means about 5% slower, or 10% lower PPD (due to QRB, folding's quick return bonus).

For an RTX 2060 with 1920 cores, that threshold is at PCIe 3.0 x1 speeds.
Plug it into a PCIe 2.0 x1 slot, and you'll lose 20% PPD compared to a 3.0 slot.

PCIe 2.0 x1 slots show a 5% speed drop (10% PPD drop) with a GTX 1050 or equivalent. With any slower GPU, the speed/PPD drop will be less; with any faster GPU, the drop will be more significant.

For mining, you could run almost any GPU on a PCIe 1.0 x1 slot just fine (excepting the fastest RTX GPUs, which didn't exist when most of that hardware was made).

For BOINC, you'll need to keep about the same parameters in mind as folding. If you split a PCIe x1 slot up into 4 slots, you'll be running each at 1/4 of the speed, which might be too slow for modern GPUs.

That mining board I posted? It provides PCIe 2.0 x1 slots (good for up to a GTX 1050, or a 1060 if you push it).
Those GPUs run well at 65-90W (power capped and overclocked), which means you can easily connect 16 of them to that board (on 2 PSUs).
For that, you'll preferably want a Core i7 with 8 threads or more, at 4GHz or more (so the CPU won't bottleneck); that way, you'll be running about 4 GPUs per core.
Preferably a 10-core Xeon, but those don't fit the board.
If you run anything slower, the CPU won't keep up.

Each GPU gets its own PCIe lane, which means a lot more data speed than running through a PCIe splitter (which usually means 1/4 of the speed at best, not counting traffic collisions).
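The transfer-overhead argument above can be sketched numerically (using the post's rough figures, not measurements):

```python
def slowdown(transfer_s, cycle_s, speed_factor):
    """Fractional slowdown when the link runs at speed_factor of full speed."""
    new_transfer = transfer_s / speed_factor  # slower link, longer transfer
    extra = new_transfer - transfer_s         # added stall time per cycle
    return extra / cycle_s

# 0.5 s transfer every 10 s; halving the link adds 0.5 s per cycle = 5% slower
print(slowdown(0.5, 10, 0.5))  # 0.05
```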
ID: 95502
Peter Hucker
Joined: 6 Oct 06
Posts: 566
United Kingdom
Message 95514 - Posted: 24 Jan 2020, 18:15:57 UTC - in response to Message 95502.  
Last modified: 24 Jan 2020, 18:46:41 UTC

Boinc is closer to folding than it is to mining.
Mining is sending a miniscule amount of data to a GPU, and let it crunch it, and return a miniscule result.

Folding is sending less of a miniscule amount of data, and return a packet of data.

The amount of data read and written over the pcie Lanes is approximately 20-200MB for less than half a second, every ten or so seconds.

That means if the PCIe link runs at half the speed, the transfer takes twice as long.
The transfer becomes 1 second out of every 10 (an extra half second), which means about 5% slower, or 10% lower PPD (due to QRB).

For an RTX 2060 with 1920 cores, that threshold is at PCIE 3.0 1x speeds.
Plug it into a pcie 2.0 1x slot, and you'll lose 20% PPD compared to a 3.0 slot.

Pcie 2.0 1x slots have a 5% speed drop (10% PPD drop) from a GTX1050 or equivalent. Any slower GPUs, and the speed/ppd drop will be less. Any faster GPUs, and the speed drop will be more significant.

On mining, you could plug almost any GPU on a pcie 1.0 1x slot just fine. (Provided not the fastest RTX GPUs, they didn't exist for most of that hardware).

For BOINC, you'll need to keep about the same parameters in mind as folding. If you could split up a pcie 1x slot to 4 slots, you'll be running it at 1/4 th the speed, which might be too slow for modern GPUs.

That mining board I posted? It provides PCIE 2.0 1x slots (for up to a GTX 1050, a 1060 if you push it).
Those run well at 65-90W (power capped and overclocked), which means you can easily connect 16 GPUs to that board (on 2 PSUs).
For that, you'll preferably run a core i7, with 8threads or more, at 4Ghz or more (so the CPU won't bottleneck). That way, you'll be running about 4gpus per core.
Preferably a 10 core Xeon , but they don't fit the board.
If you run it with anything slower, the CPU won't keep up.

Each GPU has it's own pcie Lane, which means a lot more data speed than running through a pcie splitter (which usually means 1/4th the speed at best, not counting on network traffic collisions).


Lots of good information, thanks. I seem to be fine with Milkyway (it must have a pretty low amount of data to transfer) sharing two R9 280X GPUs on one PCIe 1.1 x1 slot, two tasks running on each card, so they're never waiting for data transfer or CPU. Also fine with Einstein Gamma tasks. When I hit a limit with more GPUs, I'll connect some cards to another computer - I have 4 now; I just inherited a laptop slightly more powerful than the desktop I have the GPUs on. I wonder if I can connect external GPUs to that somehow? [Searches Ebay] Yes I can :-) Well, one per slot anyway: https://www.ebay.co.uk/itm/283755298920 The laptop only has one slot (it's internal, for the wireless card I don't need) and no room inside, but I've found a small card that raises the socket up a bit first. Also, I'll try putting the 4-way multiplexer onto the end afterwards :-)
ID: 95514
ProDigit

Joined: 8 Nov 19
Posts: 460
United States
Message 95580 - Posted: 29 Jan 2020, 2:12:33 UTC

I know for Folding, there's a benchmark utility.
This isn't available for Boinc.
But the best way to see how your GPU is doing is to run Boinc for a few days on a single project, and at the same time run the same project on a GPU placed in a PCIe x16 slot, then see how their scores differ after a few days.
The x1 slot may seem like it's crunching fine, but it could be crunching at only 80% (or less) of its potential.
ID: 95580

Copyright © 2020 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.