nVidia GTX-295 x 2 = GPU Hell

JockMacMad

Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 24234 - Posted: 12 Apr 2009, 1:21:53 UTC
Last modified: 12 Apr 2009, 1:23:25 UTC

Hi guys

This has been driving me nuts for a month or so.

I have 2 GTX-295s, not in SLI, with the desktop extended across the 4 GPUs: 2 monitors connected and 2 dummy VGA plugs. All this is on Vista 32-bit SP1 (because the nVidia drivers on 64-bit were showing only 2 CUDA devices). I have tried nVidia drivers 182.08, 181.xx and 185.xx.

What I'd like to know is why the BOINC client reports only 3 devices found, even though the nVidia control panel shows 4 and a FurMark test reports 4 GPUs in use. The problem occurs on both GPUGrid and SETI, so it seems to be a client issue. I only get the line:-

04/12/09 01:59:43 CUDA devices: GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS), GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS), GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS)

which is obviously 1 short. I have 5 working dummy VGA plugs and have tried all of them. I know they work, as I use them in another machine.

For 4 hours today all 4 showed up, but as soon as I (foolishly) rebooted we were back to the same scenario.

This has happened on client 6.4.5, 6.6.15 and 6.6.20.

Many Thanks

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 24235 - Posted: 12 Apr 2009, 1:34:06 UTC - in response to Message 24234.  
Last modified: 12 Apr 2009, 1:34:23 UTC

As far as I understand it, this is a driver problem. BOINC will only detect all GPUs correctly if the drivers for all of them have loaded. Does this occur when BOINC Manager starts automatically at startup, before everything else has loaded?

What happens if you now exit BOINC and restart it? Does it then detect all GPUs?
If it does, you may want to stop BOINC Manager from auto-loading at Windows startup and start it only after Windows has completely loaded. Starting BM will start the client, and if not everything has loaded completely yet, it may give these problems (that's my guess).

Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 24236 - Posted: 12 Apr 2009, 1:40:18 UTC - in response to Message 24234.  
Last modified: 12 Apr 2009, 1:41:04 UTC

A link to the host would help, please.

Note that other people (Al) have posted BOINC showing 3 GPUs with the fourth getting cut off partway through. What does the website show? Do 4 CUDA apps run together?

Claggy

JockMacMad
Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 24237 - Posted: 12 Apr 2009, 2:10:20 UTC - in response to Message 24235.  
Last modified: 12 Apr 2009, 2:47:35 UTC

Hi

BOINC Manager was set to start at login. I have disabled this. I booted to the desktop, checked that the number of GPUs was 4 in the nVidia control panel, and checked that the displays were all extended. I ran FurMark, which finished and said there were 4 active GPUs. I then started BOINC Manager and found that BOINC thought there were 2 GPUs, but it is showing 3 on the host page.

The 2 monitors and 2 dummy VGA plugs should ensure all the GPUs are active. Indeed, as I mentioned, FurMark is telling me all 4 are active.

The machine is at:-

http://setiathome.berkeley.edu/show_host_detail.php?hostid=4837048

Also seems there have been some errors:-

Cuda error 'cudaAcc_GetPowerSpectrum_kernel' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_PowerSpectrum.cu' in line 56 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaAcc_summax32_kernel' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 147 : unknown error.
Cuda error 'cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, cudaAcc_NumDataPoints / fftlen * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)' in file 'c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_summax.cu' in line 160 : unknown error.

Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 24244 - Posted: 12 Apr 2009, 19:22:47 UTC - in response to Message 24237.  

Have you tried each GTX-295 on its own to see if you can get both GPU cores working,
then swapped the GTX-295s over to see if both GPU cores work on the other one?
How many power connections are on each card? Have you checked them?

It should work: some of the top hosts have 2 or 3 GTX-295s on XP, XP x64 and Vista x64, using BOINC 6.6.20,
but I couldn't find anyone else with Vista x86 and 2 GTX-295s.
I wouldn't worry about the compute errors, we all get a few.
Made your host link live, click reply to see how.

Claggy

JockMacMad
Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 24253 - Posted: 13 Apr 2009, 12:48:54 UTC - in response to Message 24244.  
Last modified: 13 Apr 2009, 12:50:45 UTC

Yes indeed. I have swapped all the cards around and all is fine with them. As mentioned, FurMark itself sees and uses all 4 GPUs. If I suspend all BOINC tasks with 3 GPUs running, close down BOINC Manager and stop the clients, then run FurMark, all 4 GPUs work. When I restart the Manager and watch the processes start in Task Manager, I have 3 again.

I'm not particularly trying to run 32-bit Vista; I only used it because there was a known 64-bit problem at one time. I am still trying to get it to work on both and am getting the same issue on both. 64-bit would actually be better, but the results are the same.

JockMacMad
Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 24297 - Posted: 14 Apr 2009, 11:13:43 UTC - in response to Message 24253.  

Ahh, now I get the cutting-off thing :) I read the thread twice before I saw what you meant :)

I'm not sure if it is cutting off the 4th card in the string, but I'm sure it still isn't allocating a task to it.

JockMacMad
Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 24300 - Posted: 14 Apr 2009, 13:54:21 UTC - in response to Message 24297.  
Last modified: 14 Apr 2009, 14:09:00 UTC

So I did some screenshots.

The compound image shows:-

2x 295 with 4 displays enabled and the image stretched across the 4.
SLI off and PhysX on.

4 GPUs shown in Riva Tuner.

A PrintScreen of the desktop showing all 4 desktops are present in the screen shot.

Yet I still have 3 GPU tasks running and 3 showing when BOINCManager starts up.

Now, quite why the screens are numbered 1, 2, 4 and 5, and where 3 went, is a mystery, and maybe one that indicates where the issue is. I.e., is BOINC looking for 1, 2, 3 and 4, and 3 is not present? If BOINC does not look for 5, that would explain why I get 3 GPU tasks (1, 2 and 4) rather than 4 GPU tasks (1, 2, 4 and 5).

I doubt this assumption, as according to RivaTuner they are GPU0 through GPU3, which makes more programmatic sense.

I am sooo confuddled.

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 24301 - Posted: 14 Apr 2009, 14:58:12 UTC - in response to Message 24300.  

Now, quite why the screens are numbered 1, 2, 4 and 5, and where 3 went, is a mystery, and maybe one that indicates where the issue is. I.e., is BOINC looking for 1, 2, 3 and 4, and 3 is not present? If BOINC does not look for 5, that would explain why I get 3 GPU tasks (1, 2 and 4) rather than 4 GPU tasks (1, 2, 4 and 5).

The driver that BOINC looks for should take care of this. BOINC won't physically look for your video card, how many GPUs it has and how many screens you have connected.

That said, I can understand that this may be the cause of your problem. I have already forwarded this thread to the BOINC developer who is going over this code, but haven't heard back from him yet.

In the meantime, can't you force screen 3 back in the nVidia control panel?

JockMacMad
Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 24307 - Posted: 14 Apr 2009, 16:58:08 UTC - in response to Message 24301.  

Alas, not that I can see, no :(
The British nation is unique in this respect. They are the only people who like to be told how bad things are, who like to be told the worst.
Winston Churchill

JockMacMad
Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 24321 - Posted: 15 Apr 2009, 23:46:12 UTC - in response to Message 24307.  
Last modified: 16 Apr 2009, 0:10:30 UTC

Okay the 4th GPU has entered the building.

After I taught myself CUDA I wrote my own program to enumerate the CUDA devices present and was seeing 3, even though the nVidia control panel and the MMC device panel were showing 4. So it's not BOINC Manager.

I uninstalled all the nVidia drivers, rebooted and checked in the registry at

HKEY_LOCAL_MACHINE\HARDWARE\DEVICEMAP\VIDEO

to see what I had. There were 3 devices present (Video0, Video1 and Video2), all of which were standard VGA drivers. I then installed Forceware 185.68 and rebooted.

I now had 10 devices in the registry. I navigated down to

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video\

and looked to see what was present. Here we see a list of registry keys like

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video\(GPU ID)\0000

with one key for each GPU. I went through the list and, for each GPU, checked the Settings field, e.g. under

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video\{D82C4D9A-396A-4292-A538-A8F8F22FFEB5}\0000

(where {D82C4D9A-396A-4292-A538-A8F8F22FFEB5} is a GPU ID), looking for the words NVIDIA GeForce GTX 295.

I presumed (luckily correctly) that the 1st GPU ID with those words in the Settings field had the monitor on it. To the following 3 GPUs I added the following registry values:-

DisplayLessPolicy DWORD 1
LimitVideoPresentSources DWORD 1

at the

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video\(GPU ID)\0000

level in the registry tree and rebooted.
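
For anyone who would rather not click through regedit, the two values described above could be written out as a .reg file instead. This is only a sketch in Python: build_reg_patch is a hypothetical helper, the GPU ID is the example from this post (yours will differ), and it assumes the values are plain DWORDs set to 1 as described. Back up the registry before importing anything.

```python
VIDEO_ROOT = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Video"

def build_reg_patch(gpu_ids, values=("DisplayLessPolicy", "LimitVideoPresentSources")):
    """Return .reg file text that sets each named value to DWORD 1 under <GPU ID>\\0000."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for gpu_id in gpu_ids:
        # One section per GPU key, e.g. ...\Video\{GUID}\0000
        lines.append(f"[{VIDEO_ROOT}\\{gpu_id}\\0000]")
        for name in values:
            lines.append(f'"{name}"=dword:00000001')
        lines.append("")
    return "\n".join(lines)

if __name__ == "__main__":
    # Example GPU ID from the post; substitute the IDs of your own display-less GPUs.
    print(build_reg_patch(["{D82C4D9A-396A-4292-A538-A8F8F22FFEB5}"]))
```

Generating text rather than writing to the registry directly keeps the change reviewable before it is applied.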

Now I only have 2 monitors showing in the nVidia control panel as having my desktop on them, but my own CUDA program shows 4 GPUs. I fired up BOINC Manager, which still displays that it found only 3 (due to the string length being exceeded, I expect), but I have 4 running GPUs.
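
To illustrate the string-length guess, here is a small Python sketch of how a fixed-length message could drop the last device from the display while all four are still in use. The 255-character limit is purely an assumption for illustration; nothing in this thread confirms what limit, if any, the client applies, and startup_line is a made-up name.

```python
GTX295 = "GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS)"

def startup_line(descriptions, limit=None):
    """Build the 'CUDA devices:' line, optionally cut to `limit` characters."""
    line = "CUDA devices: " + ", ".join(descriptions)
    return line if limit is None else line[:limit]

full = startup_line([GTX295] * 4)
cut = startup_line([GTX295] * 4, limit=255)  # assumed limit, not a known BOINC value

# Compare how many complete device descriptions survive in each version.
print(len(full), full.count(GTX295))
print(len(cut), cut.count(GTX295))
```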




Maybe the string output in the Manager can be changed to be more like the CPU one. i.e.

Processor: 8 GenuineIntel Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz [x86 Family 6 Model 26 Stepping 4]

so we could have

CUDA devices: (4) GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS), GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS), GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS)

which would at least partly mitigate the fact that it shows only 3, or

(4) GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS)

and in the case of mixed cards

(2) GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS) (1) GeForce 9800GT (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS)

etc. Granted, if they have 4 totally different models, the second option would still not fit. Or allow it to span more than 1 line in the output :)
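
A rough Python sketch of the grouped format suggested above, with identical descriptions collapsed into a single entry prefixed by a count. summarize_devices is a hypothetical name and this is not the client's code, just an illustration of the idea.

```python
from collections import Counter

GTX295 = "GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS)"

def summarize_devices(descriptions):
    """Collapse identical device descriptions into '(count) description' entries."""
    counts = Counter(descriptions)  # keeps first-seen order on Python 3.7+
    return "CUDA devices: " + ", ".join(f"({n}) {desc}" for desc, n in counts.items())

if __name__ == "__main__":
    print(summarize_devices([GTX295] * 4))
```

With mixed cards, each distinct description gets its own count, so the line only grows with the number of different models rather than the number of GPUs.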

MarkJ
Volunteer tester
Help desk expert
Joined: 5 Mar 08
Posts: 272
Australia
Message 24395 - Posted: 19 Apr 2009, 0:53:49 UTC - in response to Message 24321.  

They could spread it across as many lines as there are cards in the machine (i.e. 1 line for each card). The CUDA device number could also be useful for each one.
MarkJ
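
The one-line-per-card idea might look something like this in Python. It is only a sketch: device_lines is a hypothetical helper, and numbering from 0 is an assumption based on the GPU0 through GPU3 numbering mentioned earlier in the thread.

```python
GTX295 = "GeForce GTX 295 (driver version 18568, CUDA version 1.3, 896MB, est. 106GFLOPS)"

def device_lines(descriptions):
    """Return one message line per device, prefixed with its device number."""
    return [f"CUDA device {i}: {desc}" for i, desc in enumerate(descriptions)]

if __name__ == "__main__":
    for line in device_lines([GTX295] * 4):
        print(line)
```

Since each line carries only one description, no single line gets longer as cards are added.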

RottenMutt
Joined: 19 Nov 05
Posts: 7
United States
Message 24411 - Posted: 20 Apr 2009, 0:07:31 UTC - in response to Message 24395.  

I added DisplayLessPolicy and LimitVideoPresentSources as binary "01 00 00 00" on all four "GPU ID" keys and was able to get four CUDA devices to show up in GPU-Z. Now the blue LED is lit on the second card.

I suspect only LimitVideoPresentSources is required. I have only one monitor, and in display settings I have four under intensified monitor icons.

RottenMutt
Joined: 19 Nov 05
Posts: 7
United States
Message 24464 - Posted: 22 Apr 2009, 5:29:21 UTC

LimitVideoPresentSources is all that is required. Search for it, and add it to the other GPU keys that look just like the one you found, with the same binary value.

pharrg
Joined: 8 Jan 09
Posts: 24
United States
Message 24873 - Posted: 15 May 2009, 4:35:30 UTC

Make sure you do NOT have SLI enabled on any of them. That would cause two GPUs to function as one, thus reducing the visible CUDA devices by one. CUDA is not yet SLI compatible.

JockMacMad
Joined: 12 Apr 09
Posts: 10
United Kingdom
Message 25150 - Posted: 31 May 2009, 22:25:55 UTC - in response to Message 24873.  

Correct, but seeing as I was seeing 3 GPUs, not 2, SLI could not have been on.

Drevil
Joined: 7 Jun 09
Posts: 1
United States
Message 25262 - Posted: 7 Jun 2009, 2:16:26 UTC - in response to Message 25150.  

Hey - I have the exact same issue!! ASUS P6T Deluxe with SLI 295 GTX cards. Sometimes I see 3 GPUs in Device Manager, sometimes 4!! I even sent screenshots to FalconNW to help.

Did you find an easy solution??

Bruce

Far
Joined: 18 Jun 09
Posts: 9
Australia
Message 25518 - Posted: 18 Jun 2009, 6:03:35 UTC

I want to make the LimitVideoPresentSources registry mod so I can run 2x295's.

Will this mod cause issues if I want to later set SLI on to play games?

Thanks,
Far

Far
Joined: 18 Jun 09
Posts: 9
Australia
Message 25542 - Posted: 19 Jun 2009, 7:05:44 UTC

Anyone know? I won't be running BOINC when SLI is on, but will changing the registry cause any catastrophic stuff-ups if it is switched back to SLI for a while?

(I would just go ahead and try it but hard pressed to find time for any more rebuilds of this thing)

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15477
Netherlands
Message 25543 - Posted: 19 Jun 2009, 7:20:21 UTC - in response to Message 25542.  

All that SLI does is make two GPUs act as if there is only one in your system. It'll combine all the memory available on both cards and, for all intents and purposes, show as if you only have one video card in the system.

So BOINC will then also treat it as one videocard, one GPU, one task at a time running.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.