Fedora 25 GPUs sometimes there but reported missing

mike8347569357

Joined: 14 Jun 11
Posts: 30
United Kingdom
Message 76403 - Posted: 14 Mar 2017, 12:25:20 UTC

I am in the process of setting up a new PC running Fedora 25. When the system boots, BOINC reports that the NVIDIA drivers are present but finds no usable GPUs, e.g.

    14-Mar-2017 11:50:23 [---] Starting BOINC client version 7.6.22 for x86_64-pc-linux-gnu
    14-Mar-2017 11:50:23 [---] log flags: file_xfer, sched_ops, task, coproc_debug
    14-Mar-2017 11:50:23 [---] Libraries: libcurl/7.51.0 NSS/3.27 zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.11) libssh2/1.8.0 nghttp2/1.13.0
    14-Mar-2017 11:50:23 [---] Running as a daemon
    14-Mar-2017 11:50:23 [---] Data directory: /var/lib/boinc
    14-Mar-2017 11:50:23 [---] [coproc] launching child process at /usr/bin/boinc_client
    14-Mar-2017 11:50:23 [---] [coproc] relative to directory /var/lib/boinc
    14-Mar-2017 11:50:23 [---] [coproc] with data directory /var/lib/boinc
    14-Mar-2017 11:50:23 [---] [coproc] NVIDIA drivers present but no GPUs found
    14-Mar-2017 11:50:23 [---] [coproc] ATI: libaticalrt.so: cannot open shared object file: No such file or directory
    14-Mar-2017 11:50:23 [---] [coproc] clGetPlatformIDs() failed to return any OpenCL platforms
    14-Mar-2017 11:50:23 [---] No usable GPUs found
    14-Mar-2017 11:50:23 [---] app version refers to missing GPU type NVIDIA
    14-Mar-2017 11:50:23 [Einstein@Home] Application uses missing NVIDIA GPU
    14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_941250_1
    14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_770570_1
    14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_962585_0
    14-Mar-2017 11:50:23 [Einstein@Home] Missing coprocessor for task LATeah0017L_644.0_0_0.0_769315_1
    14-Mar-2017 11:50:23 [---] Host name: modron
    14-Mar-2017 11:50:23 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]
    14-Mar-2017 11:50:23 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
    14-Mar-2017 11:50:23 [---] OS: Linux: 4.9.13-201.fc25.x86_64
    14-Mar-2017 11:50:23 [---] Memory: 15.63 GB physical, 14.90 GB virtual
    14-Mar-2017 11:50:23 [---] Disk: 205.02 GB total, 151.14 GB free
    14-Mar-2017 11:50:23 [---] Local time is UTC +0 hours
    14-Mar-2017 11:50:23 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12504473; resource share 100
    14-Mar-2017 11:50:23 [---] General prefs: from http://setiathome.berkeley.edu/ (last modified 07-Nov-2012 15:55:30)
    14-Mar-2017 11:50:23 [---] Host location: none
    14-Mar-2017 11:50:23 [---] General prefs: using your defaults
    14-Mar-2017 11:50:23 [---] Reading preferences override file
    14-Mar-2017 11:50:23 [---] Preferences:
    14-Mar-2017 11:50:23 [---] max memory usage when active: 8000.31MB
    14-Mar-2017 11:50:23 [---] max memory usage when idle: 14400.56MB
    14-Mar-2017 11:50:23 [---] max disk usage: 151.26GB
    14-Mar-2017 11:50:23 [---] max CPUs used: 2
    14-Mar-2017 11:50:23 [---] suspend work if non-BOINC CPU load exceeds 50%
    14-Mar-2017 11:50:23 [---] (to change preferences, visit a project web site or select Preferences in the Manager)



So then I do a systemctl restart boinc-client, and I get ...


    14-Mar-2017 11:52:24 [---] Received signal 15
    14-Mar-2017 11:52:24 [---] Exiting
    14-Mar-2017 11:52:30 [---] Starting BOINC client version 7.6.22 for x86_64-pc-linux-gnu
    14-Mar-2017 11:52:30 [---] log flags: file_xfer, sched_ops, task, coproc_debug
    14-Mar-2017 11:52:30 [---] Libraries: libcurl/7.51.0 NSS/3.27 zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.11) libssh2/1.8.0 nghttp2/1.13.0
    14-Mar-2017 11:52:30 [---] Running as a daemon
    14-Mar-2017 11:52:30 [---] Data directory: /var/lib/boinc
    14-Mar-2017 11:52:30 [---] [coproc] launching child process at /usr/bin/boinc_client
    14-Mar-2017 11:52:30 [---] [coproc] relative to directory /var/lib/boinc
    14-Mar-2017 11:52:30 [---] [coproc] with data directory /var/lib/boinc
    14-Mar-2017 11:52:30 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 378.13, device version OpenCL 1.2 CUDA, 1996MB, 1996MB available, 495 GFLOPS peak)
    14-Mar-2017 11:52:30 [---] [coproc] NVIDIA drivers present but no GPUs found
    14-Mar-2017 11:52:30 [---] [coproc] ATI: libaticalrt.so: cannot open shared object file: No such file or directory
    14-Mar-2017 11:52:30 [---] Host name: modron
    14-Mar-2017 11:52:30 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]
    14-Mar-2017 11:52:30 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
    14-Mar-2017 11:52:30 [---] OS: Linux: 4.9.13-201.fc25.x86_64
    14-Mar-2017 11:52:30 [---] Memory: 15.63 GB physical, 14.90 GB virtual
    14-Mar-2017 11:52:30 [---] Disk: 205.02 GB total, 151.14 GB free
    14-Mar-2017 11:52:30 [---] Local time is UTC +0 hours
    14-Mar-2017 11:52:30 [Einstein@Home] URL http://einstein.phys.uwm.edu/; Computer ID 12504473; resource share 100
    14-Mar-2017 11:52:30 [---] General prefs: from http://setiathome.berkeley.edu/ (last modified 07-Nov-2012 15:55:30)
    14-Mar-2017 11:52:30 [---] Host location: none
    14-Mar-2017 11:52:30 [---] General prefs: using your defaults
    14-Mar-2017 11:52:30 [---] Reading preferences override file
    14-Mar-2017 11:52:30 [---] Preferences:
    14-Mar-2017 11:52:30 [---] max memory usage when active: 8000.31MB
    14-Mar-2017 11:52:30 [---] max memory usage when idle: 14400.56MB
    14-Mar-2017 11:52:30 [---] max disk usage: 151.26GB
    14-Mar-2017 11:52:30 [---] max CPUs used: 2
    14-Mar-2017 11:52:30 [---] suspend work if non-BOINC CPU load exceeds 50%
    14-Mar-2017 11:52:30 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
    14-Mar-2017 11:52:30 [Einstein@Home] [coproc] Assigning NVIDIA instance 0 to LATeah0017L_644.0_0_0.0_962585_0
    [mike@modron ~]$



Note that it still says "14-Mar-2017 11:52:30 [---] [coproc] NVIDIA drivers present but no GPUs found", but the GPU is now crunching away.

Can someone explain this?
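For anyone comparing logs like the two above, here is a minimal sketch that counts how many CUDA and OpenCL NVIDIA devices the client reported at startup. The function name `gpu_summary` is hypothetical, and the patterns are based only on the line formats quoted in this thread, so other client versions may word these lines differently.

```shell
# gpu_summary: read a BOINC startup log on stdin and print how many
# CUDA and OpenCL NVIDIA devices the client reported.
# Hypothetical helper; patterns are taken from the log lines in this thread.
gpu_summary() {
    awk '
        /CUDA: NVIDIA GPU [0-9]+:/   { cuda++ }
        /OpenCL: NVIDIA GPU [0-9]+:/ { opencl++ }
        END { printf "cuda=%d opencl=%d\n", cuda, opencl }
    '
}

# Example use (the unit name comes from the post; where the log ends up
# on your system is an assumption):
# journalctl -u boinc-client -b | gpu_summary
```

A result with `opencl=1` and `cuda=0` would match the second log: the GPU is usable through OpenCL even though the CUDA probe still found nothing.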

ID: 76403 · Report as offensive
Richard Haselgrove
Volunteer moderator
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 2723
United Kingdom
Message 76409 - Posted: 14 Mar 2017, 12:48:36 UTC - in response to Message 76403.  

The operating system is starting BOINC too early in the initialisation process, before all components are fully available for use.

There have been posts over the years with guidance on changing the boot sequence, but I'll leave it to the Linux specialists to pick out one appropriate to Fedora 25.
ID: 76409 · Report as offensive
Agentb
Help desk expert

Joined: 30 May 15
Posts: 265
United Kingdom
Message 76448 - Posted: 14 Mar 2017, 18:55:45 UTC
Last modified: 14 Mar 2017, 19:02:44 UTC

I'm not well versed in Fedora, but searching for Fedora in the advanced search reveals the thread "GPU CUDA issues on Fedora 23 with NVIDIA GTX 970".

If it is the same issue, then it looks like you have not installed the CUDA libraries, or the BOINC client cannot find them, but you are crunching using the OpenCL libraries.

There are likely to be several other issues (which would explain the lack of detection at start). These may be addressed with permissions settings and/or by adding a delay in the start script.

Upgrading BOINC to something more recent (7.6.31 or later), which has better GPU detection, may also help.
ID: 76448 · Report as offensive
Richard Haselgrove
Volunteer moderator
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 2723
United Kingdom
Message 76449 - Posted: 14 Mar 2017, 19:02:12 UTC - in response to Message 76448.  
Last modified: 14 Mar 2017, 19:06:06 UTC

But look at the second log, after the BOINC restart - BOINC detects the GPU, and even starts running an Einstein task.

It's purely a timing thing - the drivers haven't initialised before the first attempt, so BOINC gets no response when it queries them. But later, everything is ready, and works as planned.

Edit - but it might be worth a try to install the newer CUDA drivers and a later version of BOINC to iron out some of those remaining warning messages.
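If the cause really is timing, one workaround is to hold back the client until the driver's device node exists. The following is only a sketch under that assumption: `wait_for_path` is a hypothetical helper, and `/dev/nvidia0` is an assumed device node that should be checked against what the driver actually creates on the machine in question.

```shell
#!/bin/sh
# Sketch only: poll until a path exists or a timeout (in seconds) expires.
# wait_for_path is a hypothetical helper, not part of BOINC or the NVIDIA driver.
wait_for_path() {
    path=$1
    timeout=${2:-30}   # default: give up after 30 seconds
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1   # path never appeared
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example use (assumed device node; run as root):
# wait_for_path /dev/nvidia0 60 && systemctl restart boinc-client
```

A fixed `sleep` in the start script would also work, but polling returns as soon as the node appears and fails visibly if it never does.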
ID: 76449 · Report as offensive
Agentb
Help desk expert

Joined: 30 May 15
Posts: 265
United Kingdom
Message 76451 - Posted: 14 Mar 2017, 19:17:17 UTC - in response to Message 76449.  

But look at the second log, after the BOINC restart - BOINC detects the GPU, and even starts running an Einstein task.


OP asked why no CUDA tasks.

I don't disagree that there is probably a timing issue here. However, the first log is likely to have come from a start by systemd, and the second from a later restart in an X Window client session with sudo privileges.

To "detect the GPU", the BOINC client forks a copy of itself and makes several dlopen calls.

OpenCL and CUDA libraries are different and are often installed separately (depends on the tribe).

Older versions of the BOINC client make outdated assumptions about where these libraries are found, and so these calls fail.
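A quick way to see whether the dynamic loader knows about these libraries is to search the `ldconfig -p` cache listing. The sketch below assumes the library base names `libcuda.so` and `libOpenCL.so`; the exact names a given client version tries to open are not stated in this thread, and `has_lib` is a hypothetical helper.

```shell
# has_lib NAME: read an `ldconfig -p` style listing on stdin and succeed
# if a library called NAME, or NAME followed by a version suffix, is listed.
# Hypothetical helper; it only inspects the loader cache, not dlopen itself.
has_lib() {
    awk -v name="$1" '
        $1 == name || index($1, name ".") == 1 { found = 1 }
        END { exit !found }
    '
}

# Example use (library names are assumptions):
# ldconfig -p | has_lib libcuda.so   && echo "CUDA driver library visible"
# ldconfig -p | has_lib libOpenCL.so && echo "OpenCL loader visible"
```

This does not prove that the client's dlopen calls will succeed, since permissions or SELinux policy can still block them, but a missing entry is a strong hint that the library is not installed where the loader looks.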

hth
ID: 76451 · Report as offensive
mike8347569357

Joined: 14 Jun 11
Posts: 30
United Kingdom
Message 76471 - Posted: 15 Mar 2017, 16:34:53 UTC - in response to Message 76451.  

Thanks for the replies.

CUDA is not part of the standard Red Hat Fedora repositories. Looking back at some old PCs, it appears that CUDA was installed from rpmfusion.

I don't have much time to investigate more at the moment, so for now I'll simply restart the boinc-client service when I log on.
ID: 76471 · Report as offensive
mike8347569357

Joined: 14 Jun 11
Posts: 30
United Kingdom
Message 76925 - Posted: 27 Mar 2017, 11:59:10 UTC - in response to Message 76471.  
Last modified: 27 Mar 2017, 12:00:58 UTC

As a footnote to this thread, I just need to say I worked out what was going on.

It seems that SELinux was the problem.

To recap: Linux version 4.9.14-200.fc25.x86_64 (Fedora 25 distribution), Nvidia driver 375.39, and BOINC client version 7.6.22 for x86_64-pc-linux-gnu. Everything comes from the standard Fedora repositories except the Nvidia driver, which is the only code from elsewhere.

From what I can remember, I cleared the problems with boinc-client first with ...

    ausearch -c 'boinc_client' --raw | audit2allow -M my-boincclient
    semodule -i my-boincclient.pp

I then had to set the SELinux permissions with "restorecon -v /dev/nvidia-uvm" and then restart boinc-client.

That gave me a whole new set of abuse from SELinux about nvidia-modprobe, which I cleared in the standard way with...

    ausearch -c 'nvidia-modprobe' --raw | audit2allow -M my-nvidiamodprobe
    semodule -i my-nvidiamodprobe.pp
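Since the same two commands are repeated for each denied process, here is a small dry-run sketch that prints them for a given process name instead of running them. The helper name `selinux_allow_plan` is hypothetical, and the module naming rule ("my-" plus the process name without "_" or "-") is simply inferred from the two examples in this post.

```shell
# selinux_allow_plan COMM: print (do not run) the commands used in this post
# to build and load a local policy module for the process name COMM.
# Hypothetical helper; the naming rule is inferred from the post's examples.
selinux_allow_plan() {
    comm=$1
    module="my-$(printf '%s' "$comm" | tr -d '_-')"
    printf "ausearch -c '%s' --raw | audit2allow -M %s\n" "$comm" "$module"
    printf 'semodule -i %s.pp\n' "$module"
}

# Example use:
# selinux_allow_plan boinc_client
# selinux_allow_plan nvidia-modprobe
```

Printing first leaves room to read the `.te` file that `audit2allow -M` writes next to the `.pp` package before loading it, because a module generated this way allows everything that was logged as denied.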

So that cleared up all the SELinux problems, but then I discovered I had to add an extra line to the [Service] section of the boinc-client.service file to load the nvidia-uvm module before the client starts, i.e.

    ExecStartPre=/sbin/modprobe nvidia-uvm
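An alternative to editing the packaged unit file is a systemd drop-in, which should survive package updates. The sketch below writes one with the same ExecStartPre line; the helper `write_dropin` and the file name `nvidia-uvm.conf` are hypothetical, and whether the modprobe succeeds from there depends on how the packaged unit is set up, which is not shown in this thread.

```shell
# write_dropin DIR: create DIR and write a drop-in file that adds the
# ExecStartPre line from this post. Hypothetical helper and file name.
write_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/nvidia-uvm.conf" <<'EOF'
[Service]
ExecStartPre=/sbin/modprobe nvidia-uvm
EOF
}

# Example use (as root; the directory follows the usual
# <unit name>.d convention for drop-ins):
# write_dropin /etc/systemd/system/boinc-client.service.d
# systemctl daemon-reload
# systemctl restart boinc-client
```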

Now the system boots and starts running my GPU stuff without any (more) fiddling.

Mon 27 Mar 2017 10:26:35 BST | | Starting BOINC client version 7.6.22 for x86_64-pc-linux-gnu
Mon 27 Mar 2017 10:26:35 BST | | log flags: file_xfer, sched_ops, task
Mon 27 Mar 2017 10:26:35 BST | | Libraries: libcurl/7.51.0 NSS/3.27 zlib/1.2.8 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.11) libssh2/1.8.0 nghttp2/1.13.0
Mon 27 Mar 2017 10:26:35 BST | | Running as a daemon
Mon 27 Mar 2017 10:26:35 BST | | Data directory: /var/lib/boinc
Mon 27 Mar 2017 10:26:36 BST | | CUDA: NVIDIA GPU 0: GeForce GTX 660 (driver version 375.39, CUDA version 8.0, compute capability 3.0, 1996MB, 1970MB available, 1982 GFLOPS peak)
Mon 27 Mar 2017 10:26:36 BST | | OpenCL: NVIDIA GPU 0: GeForce GTX 660 (driver version 375.39, device version OpenCL 1.2 CUDA, 1996MB, 1970MB available, 1982 GFLOPS peak)
Mon 27 Mar 2017 10:26:36 BST | | Host name: modron
Mon 27 Mar 2017 10:26:36 BST | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz [Family 6 Model 158 Stepping 9]
Mon 27 Mar 2017 10:26:36 BST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
Mon 27 Mar 2017 10:26:36 BST | | OS: Linux: 4.9.14-200.fc25.x86_64
Mon 27 Mar 2017 10:26:36 BST | | Memory: 15.63 GB physical, 14.90 GB virtual
Mon 27 Mar 2017 10:26:36 BST | | Disk: 205.02 GB total, 150.26 GB free


I hope this information is of use to someone.
ID: 76925 · Report as offensive
Germano

Joined: 21 May 16
Posts: 26
Italy
Message 78685 - Posted: 6 Jun 2017, 14:55:49 UTC
Last modified: 6 Jun 2017, 14:59:11 UTC

Hi there, I am the BOINC co-maintainer for Fedora / RHEL / CentOS.
I found your topic because I have just managed to fix GPU detection issues on AMD Radeon, and I wanted to test the fix on NVIDIA too. By the way, you seem to have fixed your CUDA troubles on your own, in a different way from what I would do to enable OpenCL. I haven't pushed an update yet, as I am still doing some polishing work, and I would need your help to get the best results (I do not have a machine with an NVIDIA card installed).
Could you please provide the output of:
# dnf list installed | grep nvidia

# dnf list installed | grep boinc

# lsmod | grep nvidia

Thank you
ID: 78685 · Report as offensive


Copyright © 2018 University of California. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.