Thread 'Pretty happy about my current dual GPU Linux setup!'

Message boards : GPUs : Pretty happy about my current dual GPU Linux setup!

ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 107167 - Posted: 22 Feb 2022, 23:19:00 UTC

I've noticed that my little portable PC, equipped with dual RTX GPUs, has been running pretty stable under Linux lately!
I've upgraded to the latest version, Lubuntu 18.04.6.

While it's no longer officially supported, this is the last version I could run with dual GPUs without any issues.
Newer Linux versions gave me a lot more trouble getting two GPUs to run CUDA/OpenCL.

That said, a single RTX 3080 or 3090 driven by a cheap dual-core CPU like a Celeron would probably outperform two older GPUs in raw performance and/or PPD (points per day).
So in most scenarios you won't need to run two GPUs, and a newer Linux release is fine.

I know Intel CPUs are usually more stable than AMD CPUs, but I was still used to finding my system down, or in need of a reboot, at least once a week.

Still, I've been amazed that my three-year-old Celeron, with RTX 2070 and 2060 GPUs, has been running for weeks without any downtime!
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 883
United States
Message 107168 - Posted: 23 Feb 2022, 1:08:31 UTC - in response to Message 107167.  

Hmmm, I have no issues running three or more GPUs on more current Ubuntu releases. For example, I have a teammate who runs 6 or 7 GPUs on some of his hosts.
We run both OpenCL and CUDA applications at the same time, too.
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 107173 - Posted: 23 Feb 2022, 21:41:45 UTC

I think the problems were with the GUI not working, and so on.
I have to admit that it's been a year or two since I last looked into it.
Maybe the bugs have been fixed over time.
Ian&Steve C.

Joined: 24 Dec 19
Posts: 228
United States
Message 107184 - Posted: 24 Feb 2022, 18:12:02 UTC - in response to Message 107173.  
Last modified: 24 Feb 2022, 18:38:48 UTC

I have two systems running right now with 7 GPUs and 5 GPUs respectively.

I previously had another system with 8 GPUs, and a second 7-GPU setup that I recently took offline. At one point I even had a 10-GPU setup.

I've never had any GUI issues. Using NVIDIA GPUs and their drivers, it all pretty much just works. There's a bug with the NVIDIA driver and auto-login, which is easily worked around with a quick edit to the GRUB kernel command line, but that doesn't necessarily have anything to do with multi-GPU.

I've always used the vanilla Ubuntu flavor. I used 18.04 in the past, but I upgraded to 20.04 pretty much right when it was released and have been on it ever since. I've had high-GPU-count systems for about 3-4 years now and have never had a problem running multiple GPUs.

All of my systems are on AMD EPYC platforms now, but in the past I have used Intel C602-based systems (E5-2600 v2 series), Intel Z270 (i7-7700K), AMD X570 (with both Ryzen 3000 and 5000), and Intel X79 systems. I know Keith and several others have also used other AMD-based systems like X470 and X399/TR, and Intel X99, with success. Really, there aren't many configurations that don't work for multi-GPU.
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 107211 - Posted: 27 Feb 2022, 2:30:50 UTC - in response to Message 107184.  
Last modified: 27 Feb 2022, 2:32:29 UTC


The issue is when you try to enable Coolbits for overclocking and lowering the TDP.
That's when the desktop on 20.04 started crashing.
18.04 isn't perfect either; the desktop still has some issues. But at least I can fold now while setting the TDP and GPU core clock values, as well as adjusting fan speeds manually.

If you just run it stock, you won't have any issues running multiple GPUs.
Perhaps running it without a GUI, as a headless (terminal-only) install, would work fine, but I need the automatic network configuration that comes with the GUI.
I don't want to have to install and enable networking manually. I've had too many frustrating experiences with that.
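For context, Coolbits is a bitmask option in the Device section of xorg.conf. Per NVIDIA's Linux driver README, the value 4 unlocks manual fan control, 8 unlocks clock offsets, and 16 unlocks overvoltage, which is why 28 (4 + 8 + 16) is a commonly used value. Below is a minimal Python sketch of that arithmetic. The helper names are hypothetical, and the sketch only builds the `nvidia-xconfig --cool-bits=` command without running it:

```python
"""Sketch: compute an NVIDIA Coolbits value and build the nvidia-xconfig command.

Bit values follow the NVIDIA Linux driver README (Coolbits option):
4 = manual fan control, 8 = clock/memory offsets, 16 = overvoltage.
Helper names are hypothetical, for illustration only.
"""
import shlex

COOLBITS_FLAGS = {
    "fan": 4,        # manual GPU fan speed control
    "clocks": 8,     # per-performance-level clock offsets
    "overvolt": 16,  # overvoltage
}


def coolbits_value(*features: str) -> int:
    """OR together the bit values for the requested features."""
    value = 0
    for name in features:
        value |= COOLBITS_FLAGS[name]  # raises KeyError for an unknown feature name
    return value


def nvidia_xconfig_command(value: int) -> list[str]:
    """Build (but do not run) the command that writes the Coolbits option."""
    return ["sudo", "nvidia-xconfig", f"--cool-bits={value}"]


if __name__ == "__main__":
    bits = coolbits_value("fan", "clocks", "overvolt")
    print(bits)
    print(shlex.join(nvidia_xconfig_command(bits)))
```

Running `nvidia-xconfig` regenerates /etc/X11/xorg.conf, so it is worth saving a copy of the file first.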
Ian&Steve C.

Joined: 24 Dec 19
Posts: 228
United States
Message 107212 - Posted: 27 Feb 2022, 3:34:40 UTC - in response to Message 107211.  
Last modified: 27 Feb 2022, 3:36:52 UTC

All of my systems have, and always have had, Coolbits enabled, and they are overclocked with custom fan speeds and power limits. That's not the issue. I also run the normal desktop version of Ubuntu, not headless.

What's likely happening is that the PCI bus IDs in your xorg.conf file are getting swapped around, so a different GPU ends up set as the one driving the monitor. This is a very common problem. All you have to do is go back into the xorg.conf file and edit it so the correct GPU drives the monitor.
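To check which card each Device section points at, you can list the `Identifier` and `BusID` entries and compare them with the PCI addresses that `lspci` or `nvidia-smi` report. Note that xorg.conf expects decimal bus numbers while those tools print hexadecimal, so a card at `0a:00.0` appears as `PCI:10:0:0`. Below is a minimal Python sketch. It assumes one `Identifier` and one `BusID` per Device section. The function names and sample config are hypothetical, and the regexes do not cover every legal xorg.conf syntax:

```python
"""Sketch: list which PCI BusID each xorg.conf Device section points at.

Assumes one Identifier and one BusID per Device section.
Function names and the sample config are hypothetical, for illustration only.
"""
import re

# Body of each `Section "Device" ... EndSection` block (captured as group 1).
DEVICE_SECTION = re.compile(
    r'^\s*Section\s+"Device"(.*?)^\s*EndSection', re.MULTILINE | re.DOTALL | re.IGNORECASE
)
# `Identifier "..."` and `BusID "..."` lines inside a section body.
OPTION = re.compile(r'^\s*(Identifier|BusID)\s+"([^"]*)"', re.MULTILINE | re.IGNORECASE)


def device_bus_ids(xorg_text: str) -> dict[str, str]:
    """Map each Device Identifier to its BusID (sections missing either are skipped)."""
    result: dict[str, str] = {}
    for body in DEVICE_SECTION.findall(xorg_text):
        fields = {key.lower(): value for key, value in OPTION.findall(body)}
        if "identifier" in fields and "busid" in fields:
            result[fields["identifier"]] = fields["busid"]
    return result


def xorg_bus_id(pci_address: str) -> str:
    """Convert a hex lspci/nvidia-smi address ("0a:00.0" or "00000000:0A:00.0")
    to xorg.conf's decimal "PCI:bus:device:function" form (PCI domain is ignored)."""
    *_, bus, dev_fn = pci_address.split(":")
    device, function = dev_fn.split(".")
    return f"PCI:{int(bus, 16)}:{int(device, 16)}:{int(function, 16)}"


# Hypothetical two-GPU config, for illustration only.
SAMPLE_XORG = """Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    BusID          "PCI:1:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    BusID          "PCI:10:0:0"
EndSection
"""

if __name__ == "__main__":
    print(device_bus_ids(SAMPLE_XORG))
    print(xorg_bus_id("0a:00.0"))
```

To check a real host, pass the contents of /etc/X11/xorg.conf to `device_bus_ids` and compare the result with `xorg_bus_id` applied to the addresses `lspci` lists for each card.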
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 107224 - Posted: 28 Feb 2022, 17:48:09 UTC - in response to Message 107212.  


I guess whenever I happen to have a spare minute, which is rarer than spotting a unicorn, I'll have to take a look.
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 883
United States
Message 107226 - Posted: 28 Feb 2022, 21:04:18 UTC - in response to Message 107224.  
Last modified: 28 Feb 2022, 21:07:16 UTC

Easy enough to diagnose, too. If you enable Coolbits, reboot, and end up with no video output, simply unplug the monitor cable from the original card and move it to each of the other cards in the host until your desktop is displayed. Now you know which card has had its BusID swapped with the originally connected card.

To prevent that from happening in the first place, just print out the /etc/X11/xorg.conf file before you enable Coolbits. Then look at the file after Coolbits is enabled and note the BusID changes.

Edit the file to swap the BusIDs back to their original locations and save the file.

Now you can reboot the host to pick up the Coolbits setting, and you won't lose your desktop.
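The before/after comparison can also be scripted. Below is a minimal Python sketch. It assumes you saved a copy of /etc/X11/xorg.conf before enabling Coolbits and that each Device section has one `Identifier` and at most one `BusID`. It reports which devices' BusIDs moved and writes each Device's original BusID back into the new text, leaving the Coolbits options alone. The function names and sample configs are hypothetical; review the output and keep a backup before replacing the live file:

```python
"""Sketch: compare xorg.conf BusIDs before/after enabling Coolbits and restore them.

Assumes one Identifier and at most one BusID per Device section.
Function names and sample configs are hypothetical, for illustration only.
"""
import re

# Whole `Section "Device" ... EndSection` block (captured as group 1).
DEVICE_SECTION = re.compile(
    r'(^\s*Section\s+"Device".*?^\s*EndSection)', re.MULTILINE | re.DOTALL | re.IGNORECASE
)
IDENTIFIER = re.compile(r'^\s*Identifier\s+"([^"]*)"', re.MULTILINE | re.IGNORECASE)
# Group 1 keeps the keyword and its original spacing; group 2 is the BusID value.
BUS_ID = re.compile(r'^(\s*BusID\s+)"([^"]*)"', re.MULTILINE | re.IGNORECASE)


def bus_ids(xorg_text: str) -> dict[str, str]:
    """Map each Device Identifier to its BusID."""
    ids: dict[str, str] = {}
    for section in DEVICE_SECTION.findall(xorg_text):
        ident = IDENTIFIER.search(section)
        bus = BUS_ID.search(section)
        if ident and bus:
            ids[ident.group(1)] = bus.group(2)
    return ids


def changed_bus_ids(before: str, after: str) -> dict[str, tuple[str, str]]:
    """Return {identifier: (old_bus_id, new_bus_id)} for devices whose BusID moved."""
    old, new = bus_ids(before), bus_ids(after)
    return {
        name: (old[name], new[name])
        for name in sorted(old.keys() & new.keys())
        if old[name] != new[name]
    }


def restore_bus_ids(before: str, after: str) -> str:
    """Put the `before` BusIDs back into `after`; all other lines (e.g. Coolbits) are kept."""
    old = bus_ids(before)

    def fix_section(match: re.Match) -> str:
        section = match.group(1)
        ident = IDENTIFIER.search(section)
        if ident is None or ident.group(1) not in old:
            return section  # device not in the saved copy: leave it untouched
        original = old[ident.group(1)]
        return BUS_ID.sub(lambda m: f'{m.group(1)}"{original}"', section, count=1)

    return DEVICE_SECTION.sub(fix_section, after)


# Hypothetical configs: the saved copy, and the file after enabling Coolbits.
BEFORE = """Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    BusID          "PCI:1:0:0"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    BusID          "PCI:2:0:0"
EndSection
"""

AFTER = """Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    BusID          "PCI:2:0:0"
    Option         "Coolbits" "28"
EndSection

Section "Device"
    Identifier     "Device1"
    Driver         "nvidia"
    BusID          "PCI:1:0:0"
    Option         "Coolbits" "28"
EndSection
"""

if __name__ == "__main__":
    print(changed_bus_ids(BEFORE, AFTER))
    print(restore_bus_ids(BEFORE, AFTER))
```

Matching on `Identifier` assumes the Screen section still refers to the same Device name as before, so putting the original BusID back under that name points the monitor's Screen at the same physical card again.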


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.