Message boards : GPUs : GPU Throttle?
Author | Message |
---|---|
Send message Joined: 13 Apr 07 Posts: 18 |
Greetings: I've just burned out my third GTX-670 in as many months. The only thing I can think of that might be doing this is my GPU processing for SETI. Is there a method for throttling the GPU so it uses only, say, half of the GPU processing capability? Any other suggestions? (Yes, I've been working with ASUS regarding the problem; they're stumped.) Thanks! Neil |
Send message Joined: 29 Aug 05 Posts: 15566 |
You can use eFMer's TThrottle to throttle the GPU (set a maximum temperature it may reach). There's no OpenCL or CUDA option yet to use only part of the GPU's processors. But if you burn through them that quickly, it's possible that either you don't have enough cooling or there's a problem with your power supply. |
Send message Joined: 25 Apr 08 Posts: 21 |
Same here: I burned out a graphics card some years ago (on a weekend when I wasn't at home), so until now I've only let GPU applications run while I was keeping an eye on the PC. Now I've found a workaround using an AutoHotkey script. BOINC can suspend CPU and GPU usage while the PC is in use, and the script simulates use by periodically moving the mouse 1 pixel up, giving the CPU and GPU some time to chill. To do this, you'll have to:

1. Install AutoHotkey
2. Create an AHK script with a text editor (e.g. "MouseBump.AHK")
3. Write something like this in the file:

    ; every x minutes, move the mouse 1 pixel up
    If (ProcessExist("boinc.exe")) {
        ; suspend same as in BOINC:
        sleep 60000
        ; repeat this:
        loop {
            ; after 5 minutes:
            sleep 300000
            ; "use PC":
            MouseMove, 0, -1, 0, R
            ; suspend same as in BOINC:
            sleep 60000
        }
    }

    ; function which checks if BOINC is running (used above):
    ProcessExist(Name) {
        Process, Exist, %Name%
        return ErrorLevel
    }

This will check if BOINC is running. If so, it waits 1 minute (60,000 ms) and then starts the following loop:

1. wait 5 minutes (300,000 ms)
2. move the mouse 1 pixel up (so BOINC "thinks" the PC is in use)
3. wait 1 minute (60,000 ms) until BOINC resumes its work

You can change the sleep values as you like. The value which appears twice (sleep 60000 = 1 minute) should be the same as the one you've set in BOINC at "Suspend work if no mouse/keyboard activity in last X Minutes". |
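To get a feel for how much crunching time this trade costs, the timing of the script above can be put into a small calculation. This is only a rough Python sketch: the function name `duty_cycle` is made up for illustration, and it assumes BOINC suspends the moment the mouse moves and resumes exactly when the idle period ends, which a real client will only approximate.

```python
def duty_cycle(run_ms: int, pause_ms: int) -> float:
    """Return the fraction of each cycle during which BOINC may compute.

    run_ms   -- time the script waits before bumping the mouse
    pause_ms -- time BOINC stays suspended after the bump
    (Hypothetical helper; assumes suspend/resume happen instantly.)
    """
    if run_ms < 0 or pause_ms < 0 or run_ms + pause_ms == 0:
        raise ValueError("times must be non-negative and not both zero")
    return run_ms / (run_ms + pause_ms)


# Values from the script: 5 minutes of work, then 1 minute of rest.
share = duty_cycle(300_000, 60_000)
print(f"GPU is busy about {share:.1%} of the time")
```

With the default values the GPU works for 5 out of every 6 minutes, so lengthening the pause or shortening the run time is the knob to turn if the card still runs too warm.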
Send message Joined: 25 Apr 08 Posts: 21 |
If your CPUs don't have cooling problems, you can set them to "run always" and just uncheck this option for GPU, so the script will only interrupt GPU usage periodically to give it a break. |
Send message Joined: 13 Apr 07 Posts: 18 |
Greetings: Thanks for the comments! I've checked that the power supply is adequate (an 850-watt Antec should be more than sufficient) and there appears to be no heat problem in the case. Admittedly, this is one of those designs (goofy, in my opinion) that doesn't pump the heat out of the case. GPU-Z indicates that the GPU temp is in the mid-30s Celsius (this is currently an older ATI Radeon 5750); I can't recall how hot the NVidia card ran. I don't think that should be too hot. How hot is too hot? The card in question is completely stock: no overclocking, no cooling modifications. The CPUs have never seen 50 degrees Celsius. That's an AMD FX-8350 (4.0+ GHz, eight-core) with a Cooler Master Seidon 240M water cooling system. NO issues with BOINC projects there. I've downloaded TThrottle and will play with that when I get my card replaced. Otherwise, I'll look at the AutoHotkey script. Thanks!!! Neil |
Send message Joined: 25 Apr 08 Posts: 21 |
@Neil: It was an old ATI Radeon that I burned out, too (AGP 8x). 30°C is pretty good. I'm not up to date, but as far as I can remember, 60°C for the CPU and 70°C for the GPU are critical. ...or was it vice versa? A friend (who sells PC hardware) told me that some cards are not made to run at full load 24/7.

Keep this in mind: graphics cards that are usable for BOINC have programmable shaders, so they are mainly designed for 3D games. In 3D games, the GPU workload rises with the number of objects that have to be rendered. In scenes with few objects, the card has some time to chill. Plus, most games cap the frame rate at the refresh rate of the monitor (you can also force this by activating Vsync). This means that the card COULD render more frames, but it doesn't, because it makes no sense to calculate more frames than the monitor can show. If I cap my new card at 60 FPS in old games, I hardly hear its fan. If I "unleash" the card and deactivate FPS capping, it generates about 600 FPS and makes noise like a hovercraft. So this is what BOINC does.

What I've noticed is that even if the cooling works well, a 24/7 full load makes the fan's bearings wear out faster. I guess this was the actual reason why my old graphics card got toasted. It would be cool if BOINC had separate settings for run time and pause time (in minutes) for CPU and GPU each, resulting in 4 parameters instead of 1 in percent. This CPU-time-in-percent thingy causes a wobbling fan noise, because the fan is periodically braked and accelerated at short intervals. With these 4 time parameters, you wouldn't have to care about hardware readouts. |
Send message Joined: 29 Aug 05 Posts: 15566 |
"GPU-Z indicates that the GPU temp is in the mid-30s Celsius (this is currently an older ATI Radeon 5750); I can't recall how hot the NVidia card ran. I don't think that should be too hot. How hot is too hot?"

In my opinion, mid-30s means the GPU isn't being used for calculations. I am now doing SETI on my AMD HD 7870 and it's running at between 58 and 63C. When it's idling, it's in the high 20s to low 30s as well. Too hot for a GPU is anything above 90C. The GTX-670 has a max temp from the manufacturer of 97C. Run it for long at, around or beyond that and it'll burn out. Magic smoke. Poof. But then you can compare that to a CPU. When you find your CPU runs at 90C+ you'll be looking into better cooling (liquid nitrogen ;-)) as well.

"The card in question is completely stock, no overclocking, no cooling modifications."

Which card are you talking about now, the ATI or the Nvidia? If the ATI, is there a fan on it? They also released these as passively cooled cards, with just a big heat sink and no active (fan) cooling. |
Send message Joined: 2 Jul 14 Posts: 186 |
There's also this one thing in case you're interested in trying some hardcore stuff: it might be possible to flash a different BIOS onto your GTX 670, so it could then have the newer Nvidia GPU Boost 2.0 feature (which was introduced in 700-series cards). With that new BIOS you would be able to adjust the "temp limit" for that card. GPU Boost 2.0 will start throttling the GPU once the temperature reaches, for example, 80 C. It is actually very difficult to burn out a modern Nvidia card with GPU Boost 2.0 built in if voltages are not modified. You can run the card full blast and it will throttle itself.

Information:
http://www.overclock.net/t/1409584/cl-bios-flash-can-enable-gpu-boost-2-0-on-gtx-600-series-gpus
http://www.overclock.net/t/1396335/turn-your-gtx-680-in-to-a-stock-gtx-770
(There is also information on flashing a GTX 760 BIOS onto a GTX 670.)

Video BIOS collection:
http://www.techpowerup.com/vgabios/index.php?architecture=NVIDIA&manufacturer=&model=GTX+670&interface=&memType=GDDR5&memSize=

* Of course you could also sell your GTX 670 and buy a GTX 760, which has that throttling feature already built in. |
Send message Joined: 25 Apr 08 Posts: 21 |
Thanks, you just gave me another idea, even for older cards: underclocking the hardware! |
Send message Joined: 4 Apr 15 Posts: 4 |
It's strange that supposedly one is unable to limit or control GPU usage through OpenCL, for example. I had an idea and tried to play around with coproc_info.xml, where I halved the GPU specifications hoping it would define a limit for BOINC, but strangely it had no impact. My basic idea was to give BOINC false information about the graphics card's specification so it would not torture it so much, thereby effectively reducing GPU load. I can't imagine there is no way to control or limit GPU usage. I'm sure there is a way with the config files or something! It would be nice if we could figure it out, because I'd really love to have it run on 50% of my GPU. Otherwise I have to disable GPU computing, and I find that very sad given the lack of control. |
Send message Joined: 2 Jul 14 Posts: 186 |
"Thanks, you just gave me another idea, even for older cards: underclocking the hardware!"

I use MSI Afterburner to keep a GTX 780 from going too wild at the moment. Einstein@Home recently introduced the BRP6-Beta-cuda32 application and it can really beat the shit out of the GPU. The earlier BRP4 application warmed this card up to about 77C max (and BRP5 to about 73C), but I was looking at 85C when the card was crunching the new BRP6-Beta. It also meant that the card was fighting to cool down a little, with the fans spinning at much higher rpm and causing plenty of additional noise.

I tested a few different settings in MSI Afterburner and ended up setting "power limit" to 71% and overclocking the memory by +350 while keeping the core clock at stock speed. That forced the card to keep its power consumption (TDP) just a notch lower. 100 --> 71 looks like a big change in values, but the actual change in power consumption was much, much smaller, because normally the card already ran far below "100%" (in this computer, with this application...). That small amount of actual limiting was enough to force an important change: it makes the core clock stay at constant base speed all the time. The core speed doesn't "boost up" anymore in any situation, which it would do with the normal "power limit 100%". The card also runs at a constantly lower voltage now. The GPU doesn't get as hot and the cooling fans spin slower. The noise level is lower.

Then, if I open GPU-Z and look at "GPU load" and "Memory controller load", they still show surprisingly high percentages. There's not much difference in crunching power compared to the "power limit 100%" settings. I can also see that in how fast tasks complete. To sum it up: I reduced the performance level just a little bit... and now there's a nice balance point where the temperature stays at 80C and the card runs cooler and quieter.

I have to sleep in the same room as this computer, and in this situation I feel those "underclocking" settings really give me more than they take away. At the moment it seems to me that the additional "productivity" this card could give me above this balance point wouldn't scale linearly with the additional "costs". |
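The point about 100 --> 71 being a smaller change than it looks can be illustrated with a little arithmetic. The Python sketch below is hypothetical: the post does not say how much power the card actually drew, so the 75% figure is an invented number used only to show the effect, and `power_reduction` is a name made up for this example.

```python
def power_reduction(limit_pct: float, draw_pct: float) -> float:
    """Percentage points of power actually cut by a power limit.

    limit_pct -- the "power limit" slider value, in percent of TDP
    draw_pct  -- what the card draws without a limit, in percent of TDP
    A limit above the current draw changes nothing, hence the max().
    """
    return max(0.0, draw_pct - limit_pct)


# Assumption for illustration only: the card draws 75% of TDP on this task.
typical_draw = 75.0

print(power_reduction(100.0, typical_draw))  # the default limit cuts nothing
print(power_reduction(71.0, typical_draw))   # the 71% limit cuts only a few points
```

Under that assumption the slider moves by 29 points while the real cut is only 4 points, which matches the observation that crunching speed barely changed.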
Send message Joined: 4 Apr 15 Posts: 4 |
Is it possible to run the GPU as a non_gpu coprocessor? If so, it should adjust with the CPU settings. The documentation says:

<coproc>
    Specify a coprocessor. Used in combination with the Anonymous platform mechanism. The element has the form

    <coproc>
        <type>some_name</type>
        <count>1</count>
        <device_nums>0 2</device_nums>
        [ <peak_flops>1e10</peak_flops> ]
        [ <non_gpu/> ]
    </coproc>

    The name given in <type> must match that in the <coproc> element in app_info.xml. <count> is the number of GPU instances, and <device_nums> is their device numbers. <peak_flops> is the GPU peak FLOPS; it can be omitted if your app_info.xml specifies estimated application FLOPS. If <non_gpu/> is specified, the coprocessor is not treated as a GPU; i.e. "Suspend GPU" does not affect it. This mechanism has two purposes: to provide fine-grained control of the coprocessors recognized by BOINC (NVIDIA, AMD, and Intel GPUs), and to let you use coprocessors not recognized by BOINC.

I tried to get it to work with a cc_config.xml file looking like this:

    <cc_config>
        <options>
            <coproc>
                <type>ATI</type>
                <count>1</count>
                <device_nums>0</device_nums>
                <non_gpu/>
            </coproc>
        </options>
    </cc_config>

I'm not sure if this is correct and working, since my GPU load is still 0%. But I also did not use an app_info.xml. Can someone clarify? I'm trying to use the GPU as a non_gpu coprocessor. |
Send message Joined: 25 Apr 08 Posts: 21 |
Here is a new version of my script. Now it tracks mouse movement, so it won't just interrupt periodically. The time value can now be adjusted in seconds, and the seconds are also shown as a tooltip. If you don't want the tooltips to be shown, remove the related script lines or put ";" at the start of the line to turn them into comments.

    ; every 300 seconds (counted with i) move mouse 1 pixel up

    ; initialize seconds counter
    i := 0
    ; initialize tooltip at mouse
    Tooltip [%i%]

    ; boinc running?
    If (ProcessExist("boinc.exe")) {
        loop {
            ; check mouse position
            MouseGetPos, StartVarX, StartVarY
            ; wait 1 second
            sleep 1000
            ; check mouse position again
            MouseGetPos, CheckVarX, CheckVarY
            ; mouse position changed?
            If (StartVarX != CheckVarX) or (StartVarY != CheckVarY) {
                ; reset seconds counter
                i := 0
                ; reset tooltip
                Tooltip [%i%]
            }
            ; mouse position NOT changed?
            else {
                ; increase seconds counter
                i++
                ; refresh tooltip
                Tooltip [%i%]
                ; 300 seconds = 5 minutes passed?
                if (i = 300) {
                    ; bump mouse
                    MouseMove, 0, -1, 0, R
                    ; reset seconds counter
                    i := 0
                    ; reset tooltip
                    Tooltip [%i%]
                }
            }
        }
    }

    ; function which is used above
    ProcessExist(Name) {
        Process, Exist, %Name%
        return ErrorLevel
    }
|
Send message Joined: 25 Apr 08 Posts: 21 |
After

    ; bump mouse
    MouseMove, 0, -1, 0, R

you can append:

    MouseMove, 0, 1, 0, R

to move the mouse cursor 1 pixel down again after it was moved up 1 pixel. Work is still suspended, but the cursor won't do a "slow travel northwards". |
Send message Joined: 25 Apr 08 Posts: 21 |
Code update:

1. The initial mouse position is only checked once at script start and again if the mouse was moved.
2. The tooltip refresh needs only 1 line now instead of 3 lines.

    ; move mouse cursor 1 pixel up and down every 300 seconds (using i to count)

    ; boinc running?
    If (ProcessExist("boinc.exe")) {
        ; initialize seconds counter
        i := 0
        ; store initial mouse position
        MouseGetPos, StartVarX, StartVarY
        loop {
            ; refresh tooltip
            Tooltip %i%
            ; wait 1 second
            sleep 1000
            ; check mouse position again
            MouseGetPos, CheckVarX, CheckVarY
            ; mouse position changed?
            If (StartVarX != CheckVarX) or (StartVarY != CheckVarY) {
                ; reset seconds counter
                i := 0
                ; store new (current) mouse position as initial position
                MouseGetPos, StartVarX, StartVarY
            }
            ; mouse position NOT changed?
            else {
                ; increase seconds counter
                i++
                ; time passed?
                if (i = 300) {
                    ; bump mouse
                    MouseMove, 0, -1, 0, R
                    MouseMove, 0, 1, 0, R
                    ; reset seconds counter
                    i := 0
                }
            }
        }
    }

    ; function which is used above
    ProcessExist(Name) {
        Process, Exist, %Name%
        return ErrorLevel
    }
|
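For anyone who wants to check the counting logic of the updated script without waiting five minutes, the loop can be modelled as a pure function. This is a Python sketch of the same idea, not a replacement for the AutoHotkey script: `bump_ticks` and its parameters are names invented here, mouse positions are passed in as a list (one sample per second) instead of being read from the system, and the bump is assumed to leave the cursor where it was, as the up-and-down move does.

```python
from typing import Iterable, List, Tuple

Position = Tuple[int, int]


def bump_ticks(samples: Iterable[Position], start: Position,
               threshold: int = 300) -> List[int]:
    """Return the 1-based sample numbers at which the mouse would be bumped.

    samples   -- cursor positions, one per second
    start     -- cursor position when the script starts
    threshold -- idle seconds required before a bump (300 in the script)
    """
    idle = 0          # plays the role of the counter i
    last = start      # plays the role of StartVarX / StartVarY
    bumps: List[int] = []
    for tick, pos in enumerate(samples, start=1):
        if pos != last:
            # The user moved the mouse: restart the count from here.
            idle = 0
            last = pos
        else:
            idle += 1
            if idle == threshold:
                bumps.append(tick)
                idle = 0
    return bumps


# Three idle seconds, then the user moves the mouse and leaves it alone again.
print(bump_ticks([(0, 0)] * 3 + [(5, 5)] * 4, start=(0, 0), threshold=3))
```

The example prints `[3, 7]`: a bump after the first three idle seconds, the user's movement at second 4 resets the count, and the next bump follows three idle seconds later.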
Send message Joined: 13 Apr 07 Posts: 18 |
Greetings: Got my replacement card yesterday. Hooked it up and... nothing. Decided to rearrange the PCI-E power cables and got it to work! So I'm thinking the issue was the power supply, as you suggested. I've created a ticket with Antec to discuss it with them. It's an HCG-850M, so I think it should be sufficient. However, I like the idea of limiting the temp of the GPU, so I'm continuing to play around with TThrottle. Might also look into upgrading the card to the 700 series as suggested by Richie.

I've got some additional items I should mention, one in response to your later questions:

a. I frequently get errors from Windows about the AMD driver stopping and recovering (I've researched this and found no working solution).
b. The temps I mentioned are for the ATI, and, yes, it does have a fan on it. However, GPU-Z often reports the GPU Load as 0%; other times it will be at 100%. Maybe this is normal. It's currently reporting 100% and the three GPU temps are 42.5, 43, and 44.
c. Since SETI is currently out of work units, I have no GPU crunching to do on the NVidia card, so I can't report on its temperatures, other than that, at idle, it's running at 33 (GPU Load 0%).

Thanks all!!! Neil |
Send message Joined: 25 Apr 08 Posts: 21 |
@all: I tested TThrottle and like it ;) @Neil: How about another project using GPU? |
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.