Multiple AMD GPUs = crash?

Peter Hucker
Joined: 6 Oct 06
Posts: 609
United Kingdom
Message 98623 - Posted: 18 May 2020, 23:26:41 UTC
Last modified: 18 May 2020, 23:29:15 UTC

Has anyone tried to get lots of AMD GPUs running on one machine? If so, were you successful? And what motherboard, CPU and GPUs do you have? On a variety of computers I've had no luck getting more than two to run. I'm using those USB cable risers intended for bitcoin mining, and they don't cause a problem unless I try to use lots at once.

I usually get a "thread stuck in device driver" blue screen of death in Windows 10 - sometimes on booting, before BOINC runs, sometimes within an hour or a day of starting BOINC. It doesn't seem to be a limit on the number of risers, but on the number of cards. For example, one machine will take two cards on ribbons, but adding a third on a riser causes crashes. On another machine I have two on risers with no problem. Some machines allow the use of a quad multiplexer riser and some crash a lot. There must be differences in the way the CPU or motherboard operates the PCI Express buses?
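
One quick way to see how many GPUs the driver is actually exposing (which is what BOINC sees) is to enumerate the OpenCL devices - a minimal sketch, assuming Python 3 and the pyopencl package are installed on the machine:

    # List the GPUs the OpenCL runtime exposes, per platform.
    # Assumes Python 3 with the pyopencl package installed.
    import pyopencl as cl

    for platform in cl.get_platforms():
        gpus = [d for d in platform.get_devices() if d.type & cl.device_type.GPU]
        print(f"{platform.name}: {len(gpus)} GPU(s)")
        for dev in gpus:
            # global_mem_size is reported in bytes
            print(f"  {dev.name} - {dev.global_mem_size // (1024 ** 2)} MB")

If a card that is physically installed doesn't show up here, BOINC won't see it either.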
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 397
United States
Message 98628 - Posted: 19 May 2020, 2:14:30 UTC

Ian has lots of experience with those USB-based risers. He has always said you have to use quality shielded USB cables for anything to work, or the cards fall off the bus.
Ian&Steve C.

Joined: 24 Dec 19
Posts: 116
United States
Message 98629 - Posted: 19 May 2020, 3:36:04 UTC - in response to Message 98628.  

Not always, but they do have a high failure rate at PCIe 3.0 speeds; sometimes they work. I had about 2 work OK out of over 10 cables.

Peter has old boards not even capable of PCIe 3.0 though, and I think some of the slots are even PCIe 1.0 if I recall correctly.

But it's probably a motherboard limitation - it can't handle multiple GPUs.

I don't have any experience running multiple AMD GPUs for BOINC, but I had a machine running 8x RX 570s mining a few years ago. That system used a Pentium dual-core CPU on a Z270 motherboard and ran Windows 10.

There's nothing special about an AMD system vs. an Nvidia system. You just need the right platform to begin with. I think if you want to seriously run a multi-GPU setup you need to invest in newer hardware: Z170 or newer, something with lots of PCIe slots. Intel motherboards seem to play nicer with this kind of thing than AMD motherboards.
Peter Hucker
Joined: 6 Oct 06
Posts: 609
United Kingdom
Message 98637 - Posted: 19 May 2020, 12:25:45 UTC - in response to Message 98629.  

Not always, but they do have a high failure rate at PCIe 3.0 speeds; sometimes they work. I had about 2 work OK out of over 10 cables.

Peter has old boards not even capable of PCIe 3.0 though, and I think some of the slots are even PCIe 1.0 if I recall correctly.

But it's probably a motherboard limitation - it can't handle multiple GPUs.

I don't have any experience running multiple AMD GPUs for BOINC, but I had a machine running 8x RX 570s mining a few years ago. That system used a Pentium dual-core CPU on a Z270 motherboard and ran Windows 10.

There's nothing special about an AMD system vs. an Nvidia system. You just need the right platform to begin with. I think if you want to seriously run a multi-GPU setup you need to invest in newer hardware: Z170 or newer, something with lots of PCIe slots. Intel motherboards seem to play nicer with this kind of thing than AMD motherboards.


Thanks. I'll just accept that some machines only take a few cards. If I run out of places to connect cards reliably, I'll build another machine. I have this perhaps stupid habit of gathering free and cheap parts and making things out of them - it's fun. I guess if I were rich I'd build high-spec stuff, but it's more satisfying to me making something for £100 than something for £2000. I get this sense of making use of old stuff that would have gone in the bin. I'm the same with cars; my car is 18 years old.

What's odd is things suddenly becoming unreliable. My Q8400 (oldest) machine used to be the best at it - it would take 4 cards. Then it only took 3. Now it's down to 2 or it won't run for 24 hours. And nothing will take the quad connectors any more (4 cards on USB cables from one PCIe slot), yet two of them used to be fine with it. Only 2 months ago I went on holiday for a week and left all 4 cards connected to one machine and they were still running when I got back. Try that now and it won't last a day. A driver change maybe? Board/cards getting older and more tired?
Peter Hucker
Joined: 6 Oct 06
Posts: 609
United Kingdom
Message 98638 - Posted: 19 May 2020, 12:44:40 UTC - in response to Message 98628.  
Last modified: 19 May 2020, 12:46:45 UTC

Ian has lots of experience with those USB-based risers. He has always said you have to use quality shielded USB cables for anything to work, or the cards fall off the bus.


Yip, I've got good quality connections. I'll go with what Ian & Steve (which one am I talking to?) said. I suspect it's my old motherboards, since some are more tolerant than others. I shall just connect what connects reliably, and if I run out of places to plug the cards in, I'll build another PC.

My BOINC stuff (5 machines - the 6th is a normal computer I'm typing this on) now occupies a wooden cabinet of dimensions 5 foot by 4 foot by 1.3 foot. Worryingly it's in my recently built conservatory, which I've just realised gets very strong sunlight - I should have faced the cabinet the other way - but there are plenty of cooling fans. For example, a row of four of these to blast air over the two dual-Xeon server motherboards (which aren't in their cases, just on a shelf): https://gridchoice.com/shop/fans-blowers/8489-used-computer-case-cooling-fan-torin-ta450-115vac.html Yes, they're 115 volt; I just wired them in pairs and they run off the UK's more manly 240V OK. But they're LOUD! I think they must be mainframe coolers or something.

I acquired them 2nd hand 25 years ago from a company (Bull Electrical, which doesn't seem to be trading any more) specialising in bankrupt stock and miscellaneous old electrical tat. I even got hold of a bunch of runway lights! I could replace the fans with quiet ones, but I refuse to throw out anything that's functional!
Peter Hucker
Joined: 6 Oct 06
Posts: 609
United Kingdom
Message 98640 - Posted: 19 May 2020, 14:14:34 UTC - in response to Message 98638.  

My BOINC stuff (5 machines - the 6th is a normal computer I'm typing this on) now occupies a wooden cabinet of dimensions 5 foot by 4 foot by 1.3 foot. Worryingly it's in my recently built conservatory, which I've just realised gets very strong sunlight - I should have faced the cabinet the other way...


Moving the computer setup to the cooler garage - no south-facing windows there.
ProDigit

Joined: 8 Nov 19
Posts: 546
United States
Message 98649 - Posted: 19 May 2020, 17:13:32 UTC

I've run multiple RTX 2080 Tis on PCIe 3.0 with cheap x16 risers for many years, and for the most part they run without problems.

But when I do run them, I make sure I get the 'double ribbon' x16 risers - the ones where 2 ribbons run in parallel with one another.

They're good enough for up to 20cm and have never really failed me. I've had 1 or 2 bad ones out of a good 10: one was because I had bent the cable and the solder joints came off, and one had worn contact pins from swapping GPUs too often (over 20 swaps) and only ran at x4 or x2 speeds - probably bad contact points. And possibly 1 more with some errors that could have had a variety of non-PCIe-related causes, but I just threw it away anyway and installed a new one.

Once you go beyond 20cm (~8in), you need shielded risers - the more expensive black ones. They have much better cable mounting and can withstand torquing and flexing much better. For mounting a GPU once and never touching it, the regular grey ones (with parallel ribbons) are good enough.



If I were you, I'd try installing the GPUs directly in the motherboard without risers, just to see if the motherboard accepts them like that (to rule out any bad risers).
You might also need to go into the BIOS to see whether the right speeds (PCIe 2.0 or 3.0) are selected, and whether you need to enable a second GPU slot.
On some boards you first need to set the "Above 4G Decoding" option if your GPU has more than 4GB of VRAM.

If your BIOS supports it, you can see whether the GPUs are recognized, what slots they populate, what the slot width is (x4/x8/x16), and what PCIe speeds they run at (PCIe 2.0/3.0/...).

If you can't even boot into the BIOS with 2 GPUs, try swapping out one of the GPUs for a spare older one you may or may not have lying around.
A GT710 is a rather poor GPU, but a GT730 (with DDR3) is a decent GPU for regular day-to-day use, and it only costs around $25-50 on the second-hand market.
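
As a complement to the BIOS checks above, it can help to confirm what Windows itself can see once it has booted - a minimal sketch, assuming Python 3 on Windows 10 where the (now deprecated) wmic tool is still present:

    # Ask Windows (via WMI) which display adapters it currently sees,
    # along with driver version and PCI device IDs.
    import subprocess

    result = subprocess.run(
        ["wmic", "path", "win32_VideoController",
         "get", "Name,DriverVersion,PNPDeviceID"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)

If a card is listed here but keeps disappearing or crashing under load, that points more at the riser/driver side than at basic detection.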
Peter Hucker
Joined: 6 Oct 06
Posts: 609
United Kingdom
Message 98653 - Posted: 19 May 2020, 17:38:11 UTC - in response to Message 98649.  
Last modified: 19 May 2020, 17:40:26 UTC

I've run multiple RTX 2080 Tis on PCIe 3.0 with cheap x16 risers for many years, and for the most part they run without problems.

But when I do run them, I make sure I get the 'double ribbon' x16 risers - the ones where 2 ribbons run in parallel with one another.


That's my problem: my motherboards don't have many x16 slots.

If I were you, I'd try installing the GPUs directly in the motherboard without risers, just to see if the motherboard accepts them like that (to rule out any bad risers).


Can't. One computer only has one slot. The other has two, but they're only two slots apart, so installing two cards means they actually touch each other - no chance of cooling; stupid motherboard design. I use ribbons instead of one-lane USB risers wherever possible to eliminate extra problems.

You might also need to go into the BIOS to see whether the right speeds (PCIe 2.0 or 3.0) are selected, and whether you need to enable a second GPU slot.
On some boards you first need to set the "Above 4G Decoding" option if your GPU has more than 4GB of VRAM.


I tried that 4G setting once and it stopped the whole thing booting. A Google search shows it's quite problematic. Anyway, I only have 3GB cards on the difficult machines. I have one 4GB card, but that's in this gaming machine, which never goes wrong - a normal machine with a single graphics card plugged straight in.

Should I need the 4G option turned on if I have, say, three 3GB cards = a total of 9GB?

If your BIOS supports it, you can see whether the GPUs are recognized, what slots they populate, what the slot width is (x4/x8/x16), and what PCIe speeds they run at (PCIe 2.0/3.0/...).


I don't think I've ever seen them detected in there.

If you can't even boot into the BIOS with 2 GPUs, try swapping out one of the GPUs for a spare older one you may or may not have lying around.
A GT710 is a rather poor GPU, but a GT730 (with DDR3) is a decent GPU for regular day-to-day use, and it only costs around $25-50 on the second-hand market.


Booting is usually fine; it just crashes after anywhere from 5 minutes to 24 hours of BOINC running.
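
Since the crash only shows up after minutes or hours, one way to narrow it down is to log what Windows can see over time, so that after a reboot the log shows whether a card dropped off the bus before the blue screen. A minimal sketch of such a watchdog, assuming Python 3 on Windows 10 with the wmic tool available; the log file name is an arbitrary choice:

    # Crude watchdog: once a minute, record the display adapters Windows
    # reports to a log file, so a post-crash look at the log shows whether
    # a card vanished before the BSOD.
    import datetime
    import subprocess
    import time

    LOG = "gpu_watch.log"

    while True:
        out = subprocess.run(
            ["wmic", "path", "win32_VideoController", "get", "Name"],
            capture_output=True, text=True,
        ).stdout
        # First line of wmic output is the "Name" header; skip it and blanks.
        cards = [line.strip() for line in out.splitlines()[1:] if line.strip()]
        stamp = datetime.datetime.now().isoformat()
        with open(LOG, "a") as f:
            f.write(f"{stamp} {len(cards)} adapter(s): {cards}\n")
        time.sleep(60)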
ProDigit

Joined: 8 Nov 19
Posts: 546
United States
Message 98654 - Posted: 19 May 2020, 18:01:29 UTC

It seems like your motherboard supports multiple GPUs, so that's out of the equation.

You could try installing a USB riser if you can, or replace the ribbon riser with a powered ribbon riser.
Chances are the motherboard is either not capable of powering 2 GPUs through the PCIe slots, or only barely.
Using a powered ribbon or USB riser might alleviate the situation.

Other causes might be:
a driver issue,
bad hardware (broken GPU),
bad overclocking settings,
...

The 4G BIOS option is only needed if you have GPUs with more than 4GB of VRAM.
For 4GB and below (even if you run 10 of them), you don't need to enable the option.

Older motherboard BIOSes don't show much info on PCIe devices - the blue-background AMIBIOS type, for instance, doesn't.
Though sometimes they will show whichever PCIe slot (00:02.0, 00:04.0, etc.) is populated...
Peter Hucker
Joined: 6 Oct 06
Posts: 609
United Kingdom
Message 98664 - Posted: 19 May 2020, 19:26:34 UTC - in response to Message 98654.  
Last modified: 19 May 2020, 19:27:44 UTC

It seems like your motherboard supports multiple GPUs, so that's out of the equation.

You could try installing a USB riser if you can,


I have USB risers. I own two good-quality ribbons, about 10 PCIe-to-USB-cable-to-PCIe risers with roughly 3-foot cables (I got them in bulk from someone who used to use them for bitcoin), and a couple of 4-way PCIe-to-USB cards I can use instead of the single ones. Ribbons always work, but need enough x16 slots on the motherboard. Single USB risers usually work, but if I use too many the computer gets upset. Quad risers used to work, but are increasingly not working. Something is either wearing out, or newer drivers are not so lenient.

or replace the ribbon riser with a powered ribbon riser.


Never had a problem with a non-powered ribbon, but they are decent ones. One is shielded; the other isn't, but has dual wires for each connection, so plenty of power. My problem is with the USB ones, which I have to use as there aren't enough x16 connectors on the motherboards, or they're too close together to fit the cards in. What idiot designed the motherboards so the graphics card slots are right next to each other? You need them three apart: the card is two wide, and you need at least one space for the fans to take in air.

Chances are the motherboard is either not capable of powering 2 GPUs through the PCIe slots, or only barely.
Using a powered ribbon or USB riser might alleviate the situation.


I don't think it's power, as a non-powered ribbon works perfectly, and a powered USB riser does not. I think it's more to do with the motherboard not liking me using fewer lanes for the cards.

Other causes might be:
a driver issue,
bad hardware (broken GPU),
bad overclocking settings,


I do have one GPU that's annoying me more and more often; I'm currently swapping things around to see if I can isolate the problem. I don't overclock.

The 4G BIOS option is only needed if you have GPUs with more than 4GB of VRAM.
For 4GB and below (even if you run 10 of them), you don't need to enable the option.


Please explain. Surely if I have two 4GB cards, the address space goes up to 8GB?
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 704
United Kingdom
Message 98682 - Posted: 20 May 2020, 11:46:00 UTC

I do have one GPU that's annoying me more and more often; I'm currently swapping things around to see if I can isolate the problem. I don't overclock.

The 4G BIOS option is only needed if you have GPUs with more than 4GB of VRAM.
For 4GB and below (even if you run 10 of them), you don't need to enable the option.




Please explain. Surely if I have two 4GB cards, the address space goes up to 8GB?

The motherboard memory addressing by GPUs is not a simple "x GB on the GPU = x GB on the motherboard" - there is a whole pile of other things that get in the way, including, but not limited to: how the PCIe bus is mapped onto physical memory at both ends of the link, driver mapping, data compression, whatever else is going on, GPU hardware and BIOS, etc.
The whole subject is a real pain and is well outside the scope of BOINC to even consider managing - indeed, as you suggest, it is an issue that resides somewhere in the infernal triangle of BIOSes, operating system and GPU drivers :-(
There is no simple solution. First make sure all the components work well "on their own" - for GPUs that means only one plugged in directly, checking each one in turn in a given slot without an extender in place; then check each one on its own on several extenders (this should weed out a few of the problematic extenders); then try each GPU in turn singly on a splitter, and so on. As you rightly imply, it takes a lot of time to go through all the combinations. Eventually you may end up with a working system with multiple GPUs connected to the motherboard via splitters and extenders.
Don't forget to have an adequate supply of coffee to hand....
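
That elimination procedure boils down to working through every GPU/slot/riser combination one at a time, so it can help to print the whole matrix up front as a checklist. A minimal sketch - the card, slot and riser labels below are placeholders to replace with your own:

    # Print a checklist of GPU/slot/riser combinations to test one at a time.
    # The labels are placeholders; substitute your actual cards, slots and risers.
    from itertools import product

    gpus   = ["GPU-A", "GPU-B", "GPU-C"]
    slots  = ["slot 1 (x16)", "slot 2 (x4)"]
    risers = ["direct (no riser)", "ribbon", "USB riser 1", "USB riser 2"]

    for n, (gpu, slot, riser) in enumerate(product(gpus, slots, risers), start=1):
        print(f"{n:3d}. {gpu} in {slot} via {riser}")

Ticking each line off as it passes or fails makes it much easier to spot which riser or card is the common factor in the failures.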
Peter Hucker
Joined: 6 Oct 06
Posts: 609
United Kingdom
Message 98695 - Posted: 20 May 2020, 20:47:49 UTC - in response to Message 98682.  

The motherboard memory addressing by GPUs is not a simple "x GB on the GPU = x GB on the motherboard" - there is a whole pile of other things that get in the way, including, but not limited to: how the PCIe bus is mapped onto physical memory at both ends of the link, driver mapping, data compression, whatever else is going on, GPU hardware and BIOS, etc.
The whole subject is a real pain and is well outside the scope of BOINC to even consider managing - indeed, as you suggest, it is an issue that resides somewhere in the infernal triangle of BIOSes, operating system and GPU drivers :-(
There is no simple solution. First make sure all the components work well "on their own" - for GPUs that means only one plugged in directly, checking each one in turn in a given slot without an extender in place; then check each one on its own on several extenders (this should weed out a few of the problematic extenders); then try each GPU in turn singly on a splitter, and so on. As you rightly imply, it takes a lot of time to go through all the combinations. Eventually you may end up with a working system with multiple GPUs connected to the motherboard via splitters and extenders.
Don't forget to have an adequate supply of coffee to hand....


Indeed :-/

I just shifted 5 computers, mostly without cases, from my conservatory to my garage, to eliminate the possibility they were overheating (it was 30C in there when I woke up, plus direct sunlight). That included the bloody heavy bookshelf they live on. I am too old to be moving furniture.

Most of it works; there's just one GPU playing up. I swapped it with one in a different computer (both computers have one GPU on a single USB riser), and now everything is fine. Bloody hell. A compatibility problem? I don't know. I was trying to narrow down what was broken, and now nothing is broken. I guess some things just don't like other things.
