Only 1 out of 2 GPUs working

ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 102470 - Posted: 7 Jan 2021, 17:21:37 UTC

You'd think by now I'd know what I'm doing.
However, nothing changed other than a system update.
I'm running a Celeron CPU, Ubuntu 18.04.5 LTS, 2x8GB of RAM, and a 256GB SSD.
The system has an RTX2070 as main GPU, and an RTX2060 as secondary GPU (and Intel IGP as tertiary GPU).

Everything seems to work fine, except that BOINC doesn't use the 2060.
My CC config even states to use all gpus:

<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <max_file_xfers>12</max_file_xfers>
    <max_file_xfers_per_project>8</max_file_xfers_per_project>
  </options>
</cc_config>
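
A minimal sketch for double-checking that the option is really set in the file, assuming Python 3 is available on the host. The function name `use_all_gpus_enabled` and the sample string are made up for illustration; on a real system you would pass in the contents of cc_config.xml from the BOINC data directory (the path in the comment is an assumption).

```python
import xml.etree.ElementTree as ET


def use_all_gpus_enabled(xml_text: str) -> bool:
    """Return True if <options><use_all_gpus> holds a non-zero integer."""
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        return False
    value = root.findtext("options/use_all_gpus")
    if value is None:
        return False
    try:
        return int(value.strip()) != 0
    except ValueError:
        return False


# Illustrative sample only. On a real host, read the file instead, e.g.
# open("/var/lib/boinc-client/cc_config.xml").read()  (assumed location).
SAMPLE = """<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>"""

print(use_all_gpus_enabled(SAMPLE))
```

This only confirms what the file says. Whether the running client actually read it is a separate question that only the event log can answer.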


Not sure what else I can do?

I'm running projects Milkyway and Einstein on the GPUs.
It worked flawlessly before the update.
I even enabled all GPUs and made sure they're in the list.
I wish I could show you the log, but I'm on SSH and don't know its location.
ID: 102470
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 102472 - Posted: 7 Jan 2021, 19:26:31 UTC

The only thing I can think of is to try re-installing the driver for the one that isn't working. But then until recently I didn't have any GPUs that would work with BOINC, and I still only have one.
ID: 102472
robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1283
United Kingdom
Message 102474 - Posted: 7 Jan 2021, 19:31:55 UTC - in response to Message 102472.  

They should be the same driver...
But it might need to be configured to work with both GPUs, so re-installation might work. Do a "clean" installation: first remove the existing driver, then install it again. It should find both cards and do the correct configuration to cope with the differences between the two.
ID: 102474
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 863
United States
Message 102475 - Posted: 7 Jan 2021, 20:38:02 UTC - in response to Message 102474.  

I've found that adding a new card to an already initialized system sometimes does not pick up the new card until I reinstall the drivers. Seen it on both Windows and Linux hosts.
ID: 102475
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 102476 - Posted: 7 Jan 2021, 20:50:51 UTC - in response to Message 102470.  

My CC config even states to use all gpus:
Did you verify that it had been picked up and acknowledged in the event log?

30-Dec-2020 08:56:30 [---] Config: use all coprocessors
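
A rough sketch of how that check could be scripted over SSH, assuming Python 3 and that `boinccmd` is installed and allowed to talk to the client (it normally needs the GUI RPC password, for example by running it from the data directory). The helper names are made up for illustration; the marker text is the line quoted above.

```python
import subprocess

MARKER = "Config: use all coprocessors"


def config_acknowledged(log_text: str) -> bool:
    """Return True if any event-log line contains the acknowledgement."""
    return any(MARKER in line for line in log_text.splitlines())


def fetch_messages() -> str:
    """Fetch the client's message log via boinccmd (not called here)."""
    result = subprocess.run(
        ["boinccmd", "--get_messages"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example using the line quoted in this thread.
print(config_acknowledged("30-Dec-2020 08:56:30 [---] Config: use all coprocessors"))
```

On a live host, `config_acknowledged(fetch_messages())` would combine the two. The line is written at client startup, so it may have scrolled out of a short message buffer on a long-running client.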
ID: 102476
ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 102478 - Posted: 8 Jan 2021, 2:25:01 UTC - in response to Message 102476.  
Last modified: 8 Jan 2021, 2:31:47 UTC

yes, it sees 2 gpus, it just doesn't use the second one.
before the 18.04.5 update, it ran both just fine.

I did sudo nvidia-xconfig --enable-all-gpus
and did coolbits after that, confirming both GPUs have the fan slider and OC enabled.
Both are shown in lspci, both are shown in nvidia-smi, and both show up as OpenCL GPUs in the log too.

Here's part of the BoincTUI log from SSH; it's the best I could get hold of by way of a log:

Starting BOINC client version 7.16.14 for x86_64-pc-linux-gnu
log flags: file_xfer, sched_ops, task
Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Data directory: /var/lib/boinc-client
CUDA: NVIDIA GPU 0: GeForce RTX 2070 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 4096MB, 3968MB available, 7465 GFLOPS peak)
CUDA: NVIDIA GPU 1: GeForce RTX 2060 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 4096MB, 3970MB available, 6451 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce RTX 2070 (driver version 440.64, device version OpenCL 1.2 CUDA, 7982MB, 3968MB available, 7465 GFLOPS peak)
OpenCL: NVIDIA GPU 1: GeForce RTX 2060 (driver version 440.64, device version OpenCL 1.2 CUDA, 5935MB, 3970MB available, 6451 GFLOPS peak)
OpenCL: Intel GPU 0: Intel(R) Gen9 HD Graphics NEO (driver version 20.08.15750, device version OpenCL 2.1 NEO, 6288MB, 6288MB available, 202 GFLOPS peak)
libc: Ubuntu GLIBC 2.27-3ubuntu1.4 version 2.27
Host name: Port2
Processor: 4 GenuineIntel Intel(R) Pentium(R) Gold G5600 CPU @ 3.90GHz [Family 6 Model 158 Stepping 11]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_go
OS: Linux Ubuntu: Ubuntu 18.04.5 LTS [4.15.0-129-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1.4)]
Memory: 7.68 GB physical, 2.00 GB virtual
Disk: 54.28 GB total, 36.75 GB free
Local time is UTC -5 hours
VirtualBox version: 5.2.42_Ubuntur137960



It finds GFLOPS ratings on all GPUs.
I even stopped all CPU data crunching for a few minutes, to just let the GPUs crunch, but after at least 6 hours, the second GPU still hasn't kicked in.
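
For what it's worth, the detection lines in a startup log like the one above can be pulled apart with a short script. This is only a sketch: the regular expression is fitted to the lines shown in this post, not to any documented log format, and `detected_gpus` is a made-up helper name.

```python
import re

# Pattern fitted to the startup lines quoted in this thread (an assumption,
# not a documented format).
GPU_LINE = re.compile(
    r"(CUDA|OpenCL): (\w+) GPU (\d+): (.+?) \(driver version .*?(\d+) GFLOPS peak\)"
)


def detected_gpus(log_text: str):
    """Return (api, vendor, index, name, gflops) for each detection line."""
    found = []
    for line in log_text.splitlines():
        match = GPU_LINE.search(line)
        if match:
            api, vendor, index, name, gflops = match.groups()
            found.append((api, vendor, int(index), name, int(gflops)))
    return found


SAMPLE_LOG = """\
CUDA: NVIDIA GPU 0: GeForce RTX 2070 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 4096MB, 3968MB available, 7465 GFLOPS peak)
CUDA: NVIDIA GPU 1: GeForce RTX 2060 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 4096MB, 3970MB available, 6451 GFLOPS peak)
OpenCL: Intel GPU 0: Intel(R) Gen9 HD Graphics NEO (driver version 20.08.15750, device version OpenCL 2.1 NEO, 6288MB, 6288MB available, 202 GFLOPS peak)
"""

for gpu in detected_gpus(SAMPLE_LOG):
    print(gpu)
```

Seeing a card listed here only shows that the client detected it at startup. It says nothing about whether work is being scheduled on it.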
ID: 102478
ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 102481 - Posted: 8 Jan 2021, 13:58:30 UTC

reinstalled using 'sudo apt install --reinstall boinc-client', but no luck :(

Any other suggestions on how to test if the second GPU is working?
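
One option that works over SSH is to ask nvidia-smi for per-GPU utilization and see which card sits idle. A minimal sketch, assuming Python 3 and the NVIDIA driver tools are on the path. The helper names, the 5% threshold and the sample numbers are all made up for illustration.

```python
import csv
import io
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text: str) -> dict:
    """Map GPU index -> (name, utilization percent) from the CSV output."""
    gpus = {}
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        # Skip blank lines and rows whose utilization is not a plain number.
        if len(row) == 3 and row[0].isdigit() and row[2].strip().isdigit():
            gpus[int(row[0])] = (row[1], int(row[2]))
    return gpus


def idle_gpus(csv_text: str, threshold: int = 5) -> list:
    """Return indices of GPUs below the (arbitrary) utilization threshold."""
    return [i for i, (_, util) in parse_utilization(csv_text).items() if util < threshold]


def query_nvidia_smi() -> str:
    """Run nvidia-smi and return its raw CSV output (not called here)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


# Hypothetical readings, not measured on this host.
SAMPLE = "0, GeForce RTX 2070, 98\n1, GeForce RTX 2060, 0\n"
print(idle_gpus(SAMPLE))
```

On the real machine, `idle_gpus(query_nvidia_smi())` would do the same thing live. A single reading can catch a busy card between kernels, so it is worth sampling a few times before drawing conclusions.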
ID: 102481
Les Bayliss
Help desk expert
Joined: 25 Nov 05
Posts: 1654
Australia
Message 102482 - Posted: 8 Jan 2021, 15:07:26 UTC - in response to Message 102481.  

Pull out the other card, and just run it by itself.
Also, try swapping the 2 cards around in the slots.
ID: 102482
Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 102483 - Posted: 8 Jan 2021, 17:06:50 UTC - in response to Message 102478.  
Last modified: 8 Jan 2021, 17:08:28 UTC

yes, it sees 2 gpus

That's not what Richard asked, and your log doesn't show what was asked for, but maybe your log got cut off before that line was shown.
Is there mention of this in the log?

08/01/2021 18:04:42 |  | Config: use all coprocessors
Because if there isn't, the <use_all_gpus> option in your cc_config.xml isn't being used.
ID: 102483
Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5077
United Kingdom
Message 102484 - Posted: 8 Jan 2021, 17:20:06 UTC - in response to Message 102481.  

reinstalled using 'sudo apt install --reinstall boinc-client', but no luck :(
And reinstalling BOINC wasn't what Keith suggested, either.

He said reinstall the drivers - and that means the entire driver suite, not just the fan controller.
ID: 102484
robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1283
United Kingdom
Message 102486 - Posted: 8 Jan 2021, 17:49:45 UTC - in response to Message 102484.  

...and it should probably be preceded by removing the drivers, probably including one or more reboots of the computer as the removal takes place.
Yes, it takes time, and can produce some rather alarming visual effects during the process, including warnings about some features not being available. Fear not - they will come back once the drivers are re-instated.

HOWEVER - it would appear that you now have BOINC recognising both GPUs. In a GUI environment it would be a simple process to see what is being run using BOINC Manager's "advanced" view, where one can see exactly what tasks are running and what processor(s) are in use. I think BOINCTasks allows a similar view across a network, but I don't use it, so I can't make any recommendation either way. There is a way of doing it using the command line, but I'm sorry, I can't remember what it is - others may well be able to tell you.
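
One command-line route is `boinccmd --get_tasks`, which lists the client's tasks as indented "key: value" lines. Below is a rough sketch for filtering that output down to the tasks that are actually running. The field names used here (`name`, `active_task_state`, `resources`), the `EXECUTING` value and the sample text are assumptions about the output format and may differ between client versions; the helper names are made up.

```python
import subprocess


def parse_tasks(text: str) -> list:
    """Split 'key: value' lines into one dict per task.

    Assumes each task's block starts with a 'name:' line.
    """
    tasks = []
    for raw in text.splitlines():
        line = raw.strip()
        if ": " not in line:
            continue
        key, value = line.split(": ", 1)
        if key == "name":
            tasks.append({})
        if tasks:
            tasks[-1][key] = value
    return tasks


def running_tasks(text: str) -> list:
    """Keep only tasks whose active_task_state is EXECUTING (assumed value)."""
    return [t for t in parse_tasks(text) if t.get("active_task_state") == "EXECUTING"]


def get_tasks_output() -> str:
    """Run boinccmd --get_tasks (not called here; needs RPC access)."""
    return subprocess.run(
        ["boinccmd", "--get_tasks"], capture_output=True, text=True, check=True
    ).stdout


# Hypothetical output, shaped the way the sketch expects it.
SAMPLE_OUTPUT = """\
1) -----------
   name: task_a
   project URL: https://example.org/
   active_task_state: EXECUTING
   resources: 1 CPU + 1 NVIDIA GPU
2) -----------
   name: task_b
   project URL: https://example.org/
   active_task_state: UNINITIALIZED
   resources: 1 CPU + 1 NVIDIA GPU
"""

for task in running_tasks(SAMPLE_OUTPUT):
    print(task["name"], "|", task.get("resources", "?"))
```

If only one GPU task ever shows up as running while others wait, that points at scheduling on the client side, not at the driver.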
ID: 102486
ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 102493 - Posted: 9 Jan 2021, 14:29:09 UTC - in response to Message 102484.  

reinstalled using 'sudo apt install --reinstall boinc-client', but no luck :(
And reinstalling BOINC wasn't what Keith suggested, either.

He said reinstall the drivers - and that means the entire driver suite, not just the fan controller.

Sorry, neglected to mention that I had done that.

So after all this, I came home this morning, and it was working...
After running for 24+ hours on one GPU, now both are working... :/

No idea why...
Not like an RTX2060 is that odd of a GPU, that there wouldn't be any work for it from 2 big projects (Einstein and Milkyway) for over 24 hours?
ID: 102493
Dave
Help desk expert
Joined: 28 Jun 10
Posts: 2515
United Kingdom
Message 102495 - Posted: 9 Jan 2021, 14:42:58 UTC - in response to Message 102493.  

No idea why...
Not like an RTX2060 is that odd of a GPU, that there wouldn't be any work for it from 2 big projects (Einstein and Milkyway) for over 24 hours?

I don't know what it is like on those projects, but on CPDN there is a page available to the moderators showing, I think, the last 50 batches released, which would let them work that out. For whatever reason, they decided not to make it publicly visible to crunchers. It may be worth asking on their forums. The server status page tells you what is available at any one time, but anyone not familiar with the model types would have to look a bit harder to find out which tasks work with which operating systems, and that ARM processors such as those used in the Pi series of computers are not supported.
ID: 102495
robsmith
Volunteer tester
Help desk expert
Joined: 25 May 09
Posts: 1283
United Kingdom
Message 102496 - Posted: 9 Jan 2021, 14:47:29 UTC - in response to Message 102493.  

Not like an RTX2060 is that odd of a GPU, that there wouldn't be any work for it from 2 big projects (Einstein and Milkyway) for over 24 hours?


Unless a project has specifically set a performance limit that excludes the RTX2060, there's no difference between it and any other member of the RTX 2xxx family of GPUs.

BOINC will preferentially use what it perceives to be the most powerful GPU when presented with two of different performance capability, and after being told to use all GPUs it can take some hours before the lower-powered one of a pair starts to get work. So, as with many things to do with BOINC, waiting a day or so before panicking or fiddling further can pay off.
ID: 102496
ProDigit
Joined: 8 Nov 19
Posts: 718
United States
Message 102516 - Posted: 10 Jan 2021, 22:57:49 UTC - in response to Message 102496.  

Not like an RTX2060 is that odd of a GPU, that there wouldn't be any work for it from 2 big projects (Einstein and Milkyway) for over 24 hours?


Unless a project has specifically set a performance limit that excludes the RTX2060, there's no difference between it and any other member of the RTX 2xxx family of GPUs.

BOINC will preferentially use what it perceives to be the most powerful GPU when presented with two of different performance capability, and after being told to use all GPUs it can take some hours before the lower-powered one of a pair starts to get work. So, as with many things to do with BOINC, waiting a day or so before panicking or fiddling further can pay off.


Yup.
Previously it would take a few minutes to an hour at most.
Apparently now it can take a day or two...
ID: 102516
jp
Joined: 15 Dec 19
Posts: 13
Canada
Message 103380 - Posted: 2 Mar 2021, 22:56:46 UTC - in response to Message 102470.  

I had the same problem with 2 of my PCs: one i3-2120 with a GTX 1070 and a GTX 1050 Ti on a riser card,
and another i3-2120 with a GTX 1660 and a GTX 1060 on a riser card. I found that running only GPU work from
Einstein@Home and CPU work from LHC@home solved my problem. Einstein kept increasing the number
of threads it was using; I caught it using 5 threads on a 4-thread CPU. The setup has worked fine so far.
Try it, good luck.
JPB
ID: 103380

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.