Cuda Computation error caused by unknown Nvidia driver version

tyf

Joined: 9 Mar 13
Posts: 4
Message 48123 - Posted: 9 Mar 2013, 15:04:27 UTC

Hi all, I run BOINC 7.0.27 on Ubuntu 12.04 on a laptop with Nvidia Optimus technology, and I crunch for the World Community Grid project.
Naturally, I tried running BOINC through Bumblebee using 'sudo optirun boinc'. My graphics card is detected successfully, as shown in the event log:

 NVIDIA GPU 0: GeForce GT 630M (driver version unknown, CUDA version 4.20, compute capability 2.1, 2048MB, 2017MB available, 182 GFLOPS peak)


However, when I fetched new tasks for the GPU and ran them, a computation error occurred. From the result log, I saw that the computation refused to start because the driver version is unknown:

<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process exited with code 232 (0xe8, -24)
</message>
<stderr_txt>
../../projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.08_i686-pc-linux-gnu__nvidia_hcc1: /usr/lib/nvidia-current/libOpenCL.so.1: no version information available (required by ../../projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.08_i686-pc-linux-gnu__nvidia_hcc1)
Commandline: ../../projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.08_i686-pc-linux-gnu__nvidia_hcc1 --zipfile X0960122690753201010010943.zip --imagelist images.txt --device 0
<app_init_data>
<major_version>7</major_version>
<minor_version>0</minor_version>
<release>27</release>
<app_version>708</app_version>
<app_name>hcc1</app_name>
<project_preferences>


I installed the nvidia-current driver, version 295.33.
What should I do to help the BOINC client recognize the driver version?
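
As a quick cross-check outside BOINC, the proprietary NVIDIA kernel module reports its version in /proc/driver/nvidia/version while it is loaded. The sketch below is only an illustration of reading that file; the function names are made up for this example, and on a Bumblebee setup the file may be missing whenever the discrete GPU is powered down, in which case the sketch simply reports "unknown". It does not change what BOINC itself detects.

```python
import re
from pathlib import Path
from typing import Optional

# File exposed by the proprietary NVIDIA kernel module while it is loaded.
NVRM_VERSION_FILE = Path("/proc/driver/nvidia/version")


def parse_nvrm_version(text: str) -> Optional[str]:
    """Return the version number from the 'NVRM version:' line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            # First dotted number on the line, e.g. 295.33
            match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", line)
            return match.group(1) if match else None
    return None


def read_driver_version(path: Path = NVRM_VERSION_FILE) -> Optional[str]:
    """Read the version file; return None if the module is not loaded."""
    try:
        return parse_nvrm_version(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_driver_version() or "driver version unknown")
```

If this prints a version number while the BOINC event log still says "driver version unknown", the driver itself is installed and loaded, and the gap is in the detection step.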

ID: 48123 · Report as offensive
SekeRob2

Joined: 6 Jul 10
Posts: 585
Italy
Message 48126 - Posted: 9 Mar 2013, 16:57:20 UTC - in response to Message 48123.  
Last modified: 9 Mar 2013, 16:58:06 UTC

This error has been reported at WCG in 3 posts: http://www.worldcommunitygrid.org/forums/wcg/search?offset=0&key=process+AND+exited+with+AND+code+AND+232+%280xe8%2C+-24%29&member=&scopeinpost=3&forum=0&date=0&beforeafter=1&minattach=0&sort=1&rows=20 At least at WCG, no one has posted a resolution, and given that it is encountered on both NVidia and ATI, seemingly only on the Linux platform, it is hard to crack.

Is the GT630M a mobile (laptop) card? The GT630 is certainly blacklisted at WCG, i.e. there is nothing you can do.
https://secure.worldcommunitygrid.org/help/viewTopic.do?shortName=GPU#610
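
For anyone puzzled by the three numbers in "process exited with code 232 (0xe8, -24)": they are the same value written in decimal, in hexadecimal, and as a signed 8-bit integer. The short sketch below just reproduces that arithmetic; the helper name is invented for the example, and it says nothing about what the code means to the WCG app, which the thread does not establish.

```python
def as_signed_byte(code: int) -> int:
    """Interpret the low 8 bits of an exit status as a signed value."""
    low = code & 0xFF
    return low - 256 if low >= 128 else low


code = 232
summary = f"{code} (0x{code:x}, {as_signed_byte(code)})"
print(summary)  # prints: 232 (0xe8, -24)
```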
Coelum Non Animum Mutant, Qui Trans Mare Currunt
ID: 48126 · Report as offensive
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5082
United Kingdom
Message 48127 - Posted: 9 Mar 2013, 19:26:55 UTC
Last modified: 9 Mar 2013, 20:24:47 UTC

We came across a similar problem with Windows versions of BOINC in August 2011:

http://boinc.berkeley.edu/trac/changeset/32f4f68614a9e740423d38d33c0172703b9a6ed4/boinc

One of the discovery machines was my Optimus 420M laptop: it has dual graphics adapters, a low-power Intel chip for extended battery life, and a high-power NVidia chip for demanding applications.

The code as originally written in coproc_detect.cpp checked the first graphics adapter, found it couldn't get an NVidia driver version from it, and didn't check any further - it broke out of the test and returned "driver version unknown". We later heard that desktop computers with multiple graphics cards, and no monitor attached to the NVidia card, were affected the same way, and were cured by that fix.

For Windows machines, and the projects we were aware of at that time, the missing driver version was a cosmetic irritation only, and didn't affect computation - evidently WCG is more touchy.

Going by dates alone, Windows versions from 6.12.37 onwards, and all versions in the 7.0.xx line, should be fixed - I can't help with Linux, I'm afraid.

Edit - looking at the top hosts lists for Einstein, there are plenty of both ATI and NVidia GPUs, running both Linux and Windows. Driver versions are listed for every combo, except Linux/NVidia. And looking at what is now http://boinc.berkeley.edu/trac/browser/boinc/client/gpu_nvidia.cpp, I see no attempt to retrieve an NVidia driver version, except within an #ifdef _WIN32 directive.

If WCG science apps require driver information (they seem to be the only project which does), they'll have to ask for it to be added to BOINC.
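
To make the multi-adapter problem described above concrete, here is a minimal sketch of the two behaviours. It is not the BOINC source (that is C++ in coproc_detect.cpp / gpu_nvidia.cpp); the Adapter class and both function names are hypothetical and exist only to show why stopping at the first adapter yields "driver version unknown" on an Optimus machine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Adapter:
    name: str
    # None when no NVidia driver version can be read from this adapter.
    nvidia_driver_version: Optional[str]


def driver_version_first_only(adapters: List[Adapter]) -> Optional[str]:
    """Behaviour as described before the fix: only the first adapter is checked."""
    for adapter in adapters:
        return adapter.nvidia_driver_version  # leaves the loop immediately
    return None


def driver_version_any(adapters: List[Adapter]) -> Optional[str]:
    """Behaviour after the fix: keep looking until an adapter reports a version."""
    for adapter in adapters:
        if adapter.nvidia_driver_version is not None:
            return adapter.nvidia_driver_version
    return None


# Hypothetical Optimus layout: the Intel chip is enumerated first.
optimus = [
    Adapter("Intel integrated graphics", None),
    Adapter("NVidia GeForce", "295.33"),
]
print(driver_version_first_only(optimus) or "driver version unknown")
print(driver_version_any(optimus) or "driver version unknown")
```

With the Intel adapter listed first, the first function gives up with no version while the second finds the one reported by the NVidia adapter.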
ID: 48127 · Report as offensive
tyf

Joined: 9 Mar 13
Posts: 4
Message 48133 - Posted: 10 Mar 2013, 2:08:55 UTC - in response to Message 48126.  

Yes, the GT630M is the mobile version (i.e. for laptops).


I think I will have to try updating BOINC to the newest development release; let's see how it works.
ID: 48133 · Report as offensive
tyf

Joined: 9 Mar 13
Posts: 4
Message 48135 - Posted: 10 Mar 2013, 3:43:05 UTC

I reinstalled the Nvidia CUDA drivers by other means, but that didn't work either; I still get the same error.

Is there a way to specify the driver version manually instead of relying on automatic detection? I really need a quick hack for this :)
ID: 48135 · Report as offensive
Juha
Volunteer developer
Volunteer tester
Help desk expert

Joined: 20 Nov 12
Posts: 801
Finland
Message 48209 - Posted: 14 Mar 2013, 23:06:46 UTC - in response to Message 48135.  

A bit late to the party but I guess I'd better post anyway.

I wonder if this line

../../projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.08_i686-pc-linux-gnu__nvidia_hcc1: /usr/lib/nvidia-current/libOpenCL.so.1: no version information available (required by ../../projects/www.worldcommunitygrid.org/wcg_hcc1_img_7.08_i686-pc-linux-gnu__nvidia_hcc1)

is causing some confusion here.

That line doesn't come from the WCG app but from the system dynamic linker/loader.

There isn't much documentation of the different messages the dynamic linker prints. Google tells me that this message is printed when the app was linked against a newer version of a library than the one you have on your system. Google also tells me that it isn't a fatal error, that is, loading the program doesn't stop there, as can be seen from the next lines, which (I think) come from the WCG app.

Whether or not the different library version causes trouble later in the app is something that needs to be asked of the WCG developers.

(Btw. a post by the OP(?) shows that BOINC can get the NVIDIA driver version number as part of OpenCL detection.)
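
To check which detection path actually reported a driver version, the GPU lines in the event log can be compared mechanically. The following is a small sketch, assuming the log lines keep the "driver version ..." wording seen in this thread; the function name and regular expression are my own and are not part of BOINC.

```python
import re
from typing import Optional

# Matches "driver version 304.84" as well as "driver version unknown".
DRIVER_RE = re.compile(r"driver version ([0-9]+(?:\.[0-9]+)*|unknown)")


def driver_version_from_log(line: str) -> Optional[str]:
    """Return the driver version in an event-log GPU line, or None if unknown."""
    match = DRIVER_RE.search(line)
    if match is None or match.group(1) == "unknown":
        return None
    return match.group(1)


cuda_line = (
    "NVIDIA GPU 0: GeForce GT 630M (driver version unknown, CUDA version 4.20, "
    "compute capability 2.1, 2048MB, 2017MB available, 182 GFLOPS peak)"
)
opencl_line = (
    "10-Mar-2013 16:26:48 [---] OpenCL: NVIDIA GPU 0: GeForce GT 630M "
    "(driver version 304.84, device version OpenCL 1.1 CUDA, 2048MB, "
    "2027MB available, 182 GFLOPS peak)"
)
print(driver_version_from_log(cuda_line))
print(driver_version_from_log(opencl_line))
```

Both sample lines are copied from posts in this thread: the CUDA line carries no version, while the OpenCL line does.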
ID: 48209 · Report as offensive
Robert Gammon

Joined: 19 Aug 08
Posts: 8
United States
Message 48258 - Posted: 18 Mar 2013, 22:01:58 UTC

While not a CUDA problem, I too see problems with BOINC on Fedora 18:

1. BOINC initially does not see the client, but on dismissing the error message, the manager successfully connects almost all the time.

2. The stock Nvidia driver, 304.64, pulled from the rpm repositories is one that worked well in Ubuntu 12.04. However, when BOINC manager starts, the second line of the Event Log that deals with the GPU is not there, and OpenGL is not enabled. Several apps depend on this capability, so they abort before they get downloaded.

3. BOINC v7.0.29 cannot communicate with the account manager BAM! at Boincstats.com. There are four error messages that seem to point to one or more .xml files.
ID: 48258 · Report as offensive
tyf

Joined: 9 Mar 13
Posts: 4
Message 48307 - Posted: 22 Mar 2013, 8:10:06 UTC

Somehow I got it working, and saw this line in the event log:

10-Mar-2013 16:26:48 [---] OpenCL: NVIDIA GPU 0: GeForce GT 630M (driver version 304.84, device version OpenCL 1.1 CUDA, 2048MB, 2027MB available, 182 GFLOPS peak)


But after I updated some Xorg drivers as part of normal updates, it stopped working and no longer displays that line, so all the GPU apps stopped crunching.

Actually, I am not sure how I got it working, but I remember that I installed CUDA 5.0, the nvidia-current driver and Bumblebee. However, I am no longer able to reproduce the working state. Perhaps I don't understand how it all works behind the scenes...

Any good ideas on how to fix it?
ID: 48307 · Report as offensive

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.