Dual gpu problem: 99% utilization hangs two gtx-280s

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 37442 - Posted: 9 Apr 2011, 8:35:15 UTC
Last modified: 9 Apr 2011, 8:55:48 UTC

This is possibly a hardware problem, but I am unsure how to solve it. Starting about 2 days ago (April 6th), my Core 2 Quad with a pair of GTX 280s started getting sluggish and then began hanging. About the same time, the Milkyway project went offline and my system started processing PrimeGrid (99% GPU utilization) and Collatz (86% GPU utilization) instead. This system can no longer process two PrimeGrid tasks concurrently without hanging. There are no errors in the Windows event log, and temps are easily under 70 °C.

The system was running Windows 7 64-bit with BOINC 6.12.13 and NVIDIA driver 260.99. I upgraded to 6.12.23 and 266.58, and things got worse: instead of taking 60 seconds or so before the system hung after bringing up boincmgr, it hung within seconds.

I pulled the boards, swapped them, and eventually put one board in another system. Both boards work fine in separate systems. They are very sluggish when processing PrimeGrid, and I suspend the GPU when I want to do other things. Collatz at 86% utilization is not sluggish and the system is usable while Collatz is running, i.e. I do not have to suspend the GPU.

Is there a way to decrease utilization of the GPU or to restrict the same project from using both GPUs at the same time?

This system has DVDFab installed. That program uses CUDA to handle video encoding/decoding for dvd & bluray copying. AFAICT it is only active when I am running the program itself although there is a service associated with it.

PrimeGrid has not changed their CUDA app since January and I don't see anything unusual in the Microsoft Win7 updates that could cause this problem.

any help would be appreciated

thanks for looking


[EDIT] Want to clarify that with both boards installed and the system set to use the GPU after 1 minute of idle, the system hangs instantly upon the minute expiring with two PrimeGrid tasks and requires a hard reset. Prior to the upgrade to 266.58 and 6.12.23 it would hang within 30 seconds, and if I moved the mouse quickly enough I could suspend the project before the system completely hung. The system also hangs with two Collatz tasks running, but it runs longer before it finally hangs.
I have another system with a pair of 9800 GTX+ cards that run PrimeGrid without hanging. However, they are not double precision video boards like the GTX 280.
ID: 37442

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 37447 - Posted: 9 Apr 2011, 12:54:39 UTC - in response to Message 37442.  
Last modified: 9 Apr 2011, 12:55:36 UTC

Is there a way to decrease utilization of the GPU or to restrict the same project from using both GPUs at the same time?

No, there isn't; answer to both the questions. At least, not without going into manual app_info.xml country, and even then... I don't think so.

Sounds more like a power issue though. What kind of PSU do you have in that system? PrimeGrid will utilize the whole GPU, so it will draw more power than Collatz. And the GTX280s are power hungry beasts, in the same category as the GTX295s. You need a beefy PSU just to run one, let alone two, let alone at average speed. So could it be that your humble PSU isn't good enough to run 2 GTX280s at full bore?

(Btw, for the 2x GTX295 example, one needs at least a 1,050 KW PSU; 700 MW for one 295, half that more for the second.)
ID: 37447

Claggy
Joined: 23 Apr 07
Posts: 1112
United Kingdom
Message 37449 - Posted: 9 Apr 2011, 14:24:39 UTC - in response to Message 37447.  
Last modified: 9 Apr 2011, 14:40:04 UTC

(Btw, for the 2x GTX295 example, one needs at least a 1,050 KW PSU; 700 MW for one 295, half that more for the second.)

1,050 Kilowatt?, 700 Megawatt?

According to Wikipedia, U.S. nuclear power plants have net summer capacities between about 500 and 1300 MW, so he'll need a nuclear power station per GTX280, and some sort of substation to power the PSU ;-)

Claggy
ID: 37449

Jord
Volunteer tester
Help desk expert
Joined: 29 Aug 05
Posts: 15480
Netherlands
Message 37451 - Posted: 9 Apr 2011, 15:16:02 UTC - in response to Message 37449.  
Last modified: 9 Apr 2011, 15:16:49 UTC

lol... excuse me. I was half and half busy with my electricity bill.
Sorry. Just plain Watt of course, everywhere. 1,050 Watt and 700 Watt.

(But just wait. What with the kW PSUs around, it'll be only a matter of time before you need MW PSUs. ;-))
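
The corrected rule of thumb above (700 Watt for one GTX295, plus half that again for the second) can be written out as a small sketch. The function name is hypothetical, and extending the rule beyond two cards is an assumption; the post itself only covers the two-card case.

```python
def psu_rule_of_thumb(single_card_watts: float, cards: int) -> float:
    """Rough PSU size: the full figure for the first card,
    plus half of it for each additional card (assumed to generalize)."""
    if cards < 1:
        raise ValueError("need at least one card")
    return single_card_watts * (1 + 0.5 * (cards - 1))


# The 2x GTX295 example from the post: 700 W + 350 W.
print(psu_rule_of_thumb(700, 2))
```

This is only a sizing heuristic for the whole system, not a measurement of what the cards actually draw.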
ID: 37451

Richard Haselgrove
Volunteer tester
Help desk expert
Joined: 5 Oct 06
Posts: 5080
United Kingdom
Message 37452 - Posted: 9 Apr 2011, 16:05:25 UTC - in response to Message 37451.  

lol... excuse me. I was half and half busy with my electricity bill.
Sorry. Just plain Watt of course, everywhere. 1,050 Watt and 700 Watt.

(But just wait. What with the kW PSUs around, it'll be only a matter of time before you need MW PSUs. ;-))

That's inflation for you. It does things like that - especially to the bill ;-)
ID: 37452

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 37458 - Posted: 10 Apr 2011, 1:41:40 UTC
Last modified: 10 Apr 2011, 1:43:10 UTC

I suspect the problem is lack of power as Jord suggested. This system has an XFX 850 watt modular power supply. However, I recently added a 3rd 2TB drive for a total of 4 HDs (6.5TB) and 2 opticals. According to http://www.eggxpert.com/forums/thread/493743.aspx at the Newegg forum, 800 watts are needed for a pair of GTX 280s. This system worked fine for about 6 months after I added the 2nd GPU, but I suspect adding the additional 2TB drive created too great a power load.

I saw a better power calculating tool somewhere but I don't remember where I found it. The above link is probably just a guess. I can measure AC current flow into the XFX power supply, but the 850 watts is the delivery rating, not the consumed power. There is probably some way to actually measure how close I am to the 850 watt rating. In the meantime I can start adding up HD power requirements and assume worst case with all disks spinning.

I have a single ATI 5850 that is having its fan replaced under warranty. I may stick that in place of the GTX 280 that I pulled when I get it back. According to the above link, the 5850 only requires 600 watts for a pair whereas the GTX 280 requires 800.
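
The difference between the 850 watt delivery rating and the AC power measured at the wall can be sketched roughly as below. This is an estimate under stated assumptions: the 85% efficiency, the power factor of 1.0, the sample voltage and current readings, and all function names are hypothetical and not values measured on this system.

```python
def wall_watts(volts: float, amps: float, power_factor: float = 1.0) -> float:
    """AC power drawn at the wall from a measured voltage and current.
    power_factor=1.0 is an assumption; a real supply may be lower."""
    return volts * amps * power_factor


def dc_load_watts(ac_watts: float, efficiency: float = 0.85) -> float:
    """Estimate the DC power the PSU delivers; the rest is lost as heat.
    The 0.85 efficiency is an assumed figure, not a spec for any model."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return ac_watts * efficiency


def headroom_fraction(ac_watts: float, psu_rating_watts: float,
                      efficiency: float = 0.85) -> float:
    """Fraction of the rated output still unused (negative if overloaded)."""
    load = dc_load_watts(ac_watts, efficiency)
    return (psu_rating_watts - load) / psu_rating_watts


# Hypothetical reading: 120 V at 6 A on an 850 W supply.
ac = wall_watts(120.0, 6.0)
print(f"At the wall: {ac:.0f} W")
print(f"Estimated DC load: {dc_load_watts(ac):.0f} W")
print(f"Headroom: {headroom_fraction(ac, 850.0):.0%}")
```

The point of the sketch is that the wall reading overstates the load on the supply, so the wall reading can exceed 850 W before the supply is actually at its delivery rating.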
ID: 37458

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 37542 - Posted: 20 Apr 2011, 7:54:40 UTC

I installed nVidia 270.61 and it started working again in a Win7-64 system. I think the problem was 266.58.

The gtx280 that I thought was defective worked perfectly in a linux system processing collatz and that was when I went back to nVidia and discovered the even newer 270.61 drivers.

Getting the dust out helped but the system still hung with 266.58 and a new power supply I had bought thinking it was a lack of power.

Some other thoughts - The software DVDFab uses CUDA for encoding / decoding of DVD & Blu-ray. It is significantly faster than just the CPU. They do not support ATI's equivalent software. It took me over 4 hours to copy a Blu-ray on a system with a Cypress 5850 and a Core 2 Duo. A similar Blu-ray movie copied in 30 minutes on my GTX 280 system with a Core 2 Quad. Currently, I stop BOINC when using DVDFab as I am unsure whether they can coexist.
ID: 37542

Joseph Stateson
Volunteer tester
Joined: 27 Jun 08
Posts: 641
United States
Message 37563 - Posted: 23 Apr 2011, 13:55:01 UTC

Followup: I always want to correct a post that was wrong. The problem was NOT the driver. The heatsink compound had dried out on the defective GTX 280. In the process of swapping the board between systems, it ended up in a slot that was cooler, so it simply took longer before it finally failed. I had thought the problem was the driver since it worked for a couple of days before it failed again.

XFXForce has a lifetime warranty, but only for the original buyer, not for a used board from eBay, so I disassembled it. All the grease on the memory chips had totally dried out, turning to a white fluffy powder; the silver stuff on the main nVidia chip had also dried out but had stayed silvery. The board ran very hot to the touch, hotter than the other board of the pair, but its temp sensor never read very high. The temp pickup must have been near the big chip that still had the silver stuff, while the rest of the chips overheated.

Anyway, I replaced it with a GTX 570 (another story) and ordered a replacement cooler from Zalman to try to salvage the board.
Joseph "Beemer Biker" Stateson
ID: 37563

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.