Another GPU detection problem with AMD - ubuntu 14.04 - SIGSEGV

Agentb
Joined: 30 May 15
Posts: 265
United Kingdom
Message 68380 - Posted: 19 Mar 2016, 11:04:36 UTC

I had noticed GPU detection would stop after an X or display manager (lightdm) crash.

Restarting lightdm would restore the screen, but BOINC would no longer recognize the GPU.

Normally I just restart and all is well, and I lock my X session screen rather than log out, which also keeps things working. This time I thought I'd dig a bit into the problem.

After enabling the coproc_debug log flag and restarting in the normal fashion:
sudo service boinc-client restart

I see

Sat 19 Mar 2016 09:35:51 GMT | | [coproc] calInit() returned 1
Sat 19 Mar 2016 09:35:51 GMT | | [coproc] Caught SIGSEGV in OpenCL detection
Sat 19 Mar 2016 09:35:51 GMT | | No usable GPUs found

and

coproc_info.xml shows
<coprocs>
    <warning>NVIDIA: libcuda.so: cannot open shared object file: No such file or directory</warning>
    <warning>calInit() returned 1</warning>
    <warning>Caught SIGSEGV in OpenCL detection</warning>
</coprocs>
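
For anyone comparing several runs, the warnings can be pulled out of coproc_info.xml with a few lines of Python. This is only a minimal sketch: it assumes the file is well-formed XML with a single <coprocs> root, as in the excerpt above, and the function name coproc_warnings is made up for illustration.

```python
import xml.etree.ElementTree as ET


def coproc_warnings(xml_text):
    """Return the text of every <warning> element in coproc_info.xml content."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so this works whether <warning> sits
    # directly under <coprocs> or deeper. Whitespace runs are collapsed
    # so that wrapped lines come back as a single string.
    return [" ".join((w.text or "").split()) for w in root.iter("warning")]


if __name__ == "__main__":
    # Sample content copied from the excerpt above, not read from disk.
    sample = """<coprocs>
<warning>calInit() returned 1</warning>
<warning>Caught SIGSEGV in OpenCL detection</warning>
</coprocs>"""
    for line in coproc_warnings(sample):
        print(line)
```

To use it on the real file, read the file contents and pass them in. An empty list would mean detection wrote no warnings.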


I know (see detect gpus) that boinc forks a copy of itself with the undocumented option "--detect_gpus"; looking at the source code, part of its job is to generate the coproc_info.xml file.

If I run the gpu detection like this

sudo -u boinc boinc --detect_gpus

coproc_info.xml is created correctly.

If I run boinc straight from the command line to see the on-screen output, I get:

19-Mar-2016 10:39:17 [---] GPU detection failed. error code 512
19-Mar-2016 10:39:17 [---] [coproc] read_coproc_info_file() returned error -108
19-Mar-2016 10:39:17 [---] No usable GPUs found

It no longer shows a SIGSEGV, but there are new errors.

In the meantime I'm just going to restart, but I can recreate the problem, and I suspect it has to do with fglrx.

Any thoughts?

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 68393 - Posted: 19 Mar 2016, 18:48:23 UTC - in response to Message 68380.  

BOINC error -108 can mean:

1. "cannot find the file or the directory it's in", because the file or directory is hidden.
2. BOINC finds that the file is open and in use by another process, or BOINC cannot write to the file.
3. The file being written to is still locked due to an earlier abnormally terminated reading or writing process.
4. Permission problems. BOINC is running as a different user than the one that installed it/has permission to write to files in the directory.

Solutions are:
1. not to hide files/directories.
2. exit & restart BOINC.
3. restart computer.
4. run as the user with full permissions, or adjust the directory/file permissions so that this user can write to them.
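
The visibility and permission causes (1 and 4) can be checked quickly for coproc_info.xml. The following is a minimal sketch, not part of BOINC: it assumes the data directory is /var/lib/boinc-client (the usual location for the Ubuntu package, but verify on your system) and that the script is run as the boinc user, for example through sudo -u boinc. The function name diagnose_access is hypothetical. It does not detect files held open or locked by another process (causes 2 and 3).

```python
import os


def diagnose_access(path):
    """Report whether the current user can see, read and write `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    return {
        "dir_exists": os.path.isdir(directory),
        # Creating or replacing a file needs write and search permission
        # on the containing directory.
        "dir_writable": os.access(directory, os.W_OK | os.X_OK),
        "file_exists": os.path.isfile(path),
        "file_readable": os.access(path, os.R_OK),
        "file_writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    # Assumed location; adjust to your BOINC data directory.
    target = "/var/lib/boinc-client/coproc_info.xml"
    for check, ok in diagnose_access(target).items():
        print(f"{check}: {'ok' if ok else 'FAILED'}")
```

The checks apply to whichever user runs the script, so running it as your own login account says nothing about what the boinc user can do.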

Juha
Volunteer developer
Volunteer tester
Help desk expert
Joined: 20 Nov 12
Posts: 801
Finland
Message 68403 - Posted: 19 Mar 2016, 22:01:52 UTC - in response to Message 68380.  

As long as we can't see the stack trace, blaming fglrx is fine by me :P

It is quite possible you have cause and effect the wrong way round. fglrx may have corrupted itself and that crashed X or lightdm and that corruption also broke OpenCL.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.