Nvidia GPUs Hot Hours After Running SETI

Message boards : GPUs : Nvidia GPUs Hot Hours After Running SETI

Josh

Joined: 10 Mar 11
Posts: 12
Australia
Message 50288 - Posted: 23 Aug 2013, 10:35:18 UTC

Hello BOINC Community

I've recently built a new desktop machine, in part for the purpose of running BOINC.

Relevant Info
-------------
OS: Windows 8 x64
Client: 7.0.64
GPU: 2 x ASUS Nvidia 560Ti in SLI
App: SETI@home v7 (cuda42)

I've set my machine to run SETI and other apps overnight. Compute finishes at 7am. When I return to my computer hours after compute finishes, MSI Afterburner reports GPU temps of around 58°C (GPU 1) and 40°C (GPU 2). These are 15-20°C higher than normal idle temps.

Can anyone explain what is going on? There doesn't seem to be a problem with other GPU apps. A restart solves the problem.

Additionally, I've now set <ignore_nvidia_dev>0</ignore_nvidia_dev>, since I throttle my CPU to 10% and at that level it can't feed 2 GPUs. Even with this option set, GPU 1 still gets hot while GPU 2 is working, even though it's not being used (verified with Afterburner). Perversely, due to airflow constraints, it gets hotter than GPU 2.
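
For anyone following along: the option goes in cc_config.xml in the BOINC data directory, inside the <options> element. Below is a minimal sketch in Python that builds such a file. The function name is made up for illustration, and the device number should be checked against the numbering BOINC prints in its event log at start-up (BOINC counts from 0, so device 0 is presumably what Afterburner shows as GPU 1).

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignored_nvidia_devices):
    """Return cc_config.xml text that tells BOINC to skip the given NVIDIA devices."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for dev in ignored_nvidia_devices:
        # One <ignore_nvidia_dev> element per device number (BOINC counts from 0).
        ET.SubElement(options, "ignore_nvidia_dev").text = str(dev)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Ignore device 0 only, as described in this post.
    print(build_cc_config([0]))
```

Restarting the BOINC client after saving the file is the safe way to make sure the setting is picked up.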

Regards
Josh

ID: 50288
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 50290 - Posted: 23 Aug 2013, 11:23:47 UTC - in response to Message 50288.  

I've never heard of behaviour like that, although not many people who post about BOINC use that sort of scheduling. We don't have a lot of experience of Windows 8 yet, either.

I've passed a message to the developer who contributed that particular cuda42 application to SETI. He might need more information, like the NVidia driver version you're using. Could you post your SETI account ID here, or a direct link to the host ID for that computer? There may be clues buried in the stderr_txt output for the tasks you've completed. Thanks in advance.
ID: 50290
Josh

Joined: 10 Mar 11
Posts: 12
Australia
Message 50300 - Posted: 24 Aug 2013, 1:10:16 UTC - in response to Message 50290.  

Thanks for your response Richard

Is this what you were after?

23/08/2013 6:37:26 PM | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7034435; resource share 200

By scheduling, do you mean running for fixed hours overnight? The CPU limitation is to keep power usage and heat under control. I'm using the GeForce 320.49 drivers at present.

Happy to post any further required information.

Regards
Josh
ID: 50300
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 50303 - Posted: 24 Aug 2013, 13:45:32 UTC

Thanks - that gets us a bit further down the road.

Unfortunately, it doesn't look as if SETI has been running on that machine, even during the overnight scheduled hours:

All tasks for computer 7034435

All my most relevant experience comes from Windows 7 (not 8), but some may help.

What state do you leave the machine in when you leave in the evening? If you log off from your Windows account, neither BOINC nor SETI will have access to the NVidia driver, and cuda applications can't run. [In fact, I suspect BOINC will close completely when you log off].

For CPU-based applications, the solution is to install BOINC in 'service' or 'Protected Application Execution' mode - but unfortunately the protection is so strong that, again, applications can't communicate with the graphics driver in that mode. So, that's a no-go.

If you (or your employer) are worried about the security aspects of leaving the machine logged on while you are away from your desk, you could either

a) lock the machine when you leave - Ctrl+Alt+Del, or Windows+L: I think this can only be done manually.
b) use a blank screensaver with a password required to resume.

But that doesn't explain the high temperatures. Since you're using MSI Afterburner anyway, could you leave that running overnight behind the lock or screensaver? It might give some clues in the other traces (GPU usage, memory usage, clock speeds etc.) which you could compare with the 'cool' state after reboot.
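
If Afterburner's log is awkward to compare by hand, a similar trace could be collected with NVIDIA's nvidia-smi tool, assuming it is installed with the driver and reports these fields for your card (on older GeForce cards some fields may come back as "N/A" or "[Not Supported]"). Here is a rough sketch in Python; the function names are made up for illustration.

```python
import subprocess

# Fields requested from nvidia-smi; whether each one is reported depends on card and driver.
FIELDS = ["index", "temperature.gpu", "clocks.sm", "clocks.mem", "memory.used"]


def parse_line(line):
    """Turn one CSV line from nvidia-smi into a dict; unreadable values become None."""
    values = [v.strip() for v in line.split(",")]
    row = {}
    for name, value in zip(FIELDS, values):
        try:
            row[name] = int(value)
        except ValueError:
            # Covers "N/A", "[Not Supported]" and anything else that is not a number.
            row[name] = None
    return row


def sample_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"],
        text=True,
    )
    return [parse_line(line) for line in out.splitlines() if line.strip()]
```

Calling sample_gpus() every minute or so overnight and appending the rows to a file would give a trace to set against the 'cool' state after a reboot.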

There have been problems reported with the Mac OS X version of BOINC on some MacBooks - even the simple BOINC start-up action of querying the GPU device capabilities switched the GPU into high-speed, high-power mode, and it never reverted to low power, even if no actual computation was being carried out. Maybe something similar is happening in Win8? Again, Afterburner should reveal it if so.

BTW, I assume you've adjusted the Windows default power management settings? You can blank the monitor while BOINC runs, but nothing else should be shut down when the machine is idle - use the 'never' settings.
ID: 50303
Josh

Joined: 10 Mar 11
Posts: 12
Australia
Message 50308 - Posted: 25 Aug 2013, 2:19:05 UTC - in response to Message 50303.  

Hi Richard

I apologise. I should have given more detailed information.

I've been allowing my PC to sleep overnight (thus not working) for the last month until I have this problem resolved. I would prefer to allow my computer to sleep (after 15 mins) during daylight hours, and then use Insomnia (or another app) to keep it awake overnight.

Before I changed the sleep behaviour SETI was running (at least the event log reported completed work and I was awarded credit). Should I allow SETI to run for the next few evenings to give you data to work with?

I have MSI Afterburner on always to preserve custom fan curves and limit FPS for gaming. I'll run SETI and get back to you with some more detailed information.

Regards
Josh
ID: 50308
Josh

Joined: 10 Mar 11
Posts: 12
Australia
Message 50309 - Posted: 25 Aug 2013, 13:13:37 UTC - in response to Message 50308.  

Hi Richard and community. I've some more detailed information to describe my issue.

I've just run SETI work on demand (outside my normal scheduled hours) and noted some key values from MSI Afterburner.

After a restart, both GPUs have the following values.
Core Clock: 51 MHz
Shader Clock: 101 MHz
Memory Clock: 135 MHz

During BOINC Work, the following values are displayed. Remember that only GPU 2 (of two) is performing work, yet the values for both GPUs are identical.
Core Clock: 830 MHz
Shader Clock: 1661 MHz
Memory Clock: 2005 MHz

These values are maintained after stopping SETI from working. So it appears that even after finishing work, the GPUs remain at full clocks. Furthermore, GPU 1 is clocking up even though BOINC is instructed to ignore it.

Clocks return to normal following a restart. I also noticed that whatever data SETI loads into the GDDR5 is not being released after work stops. 367 MB is occupied both during and after work.
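
To put the comparison in concrete terms, here is a small Python sketch that checks a set of readings against the idle baseline above. The function name and the 10% tolerance are arbitrary choices for illustration, not anything Afterburner or BOINC provides.

```python
# Idle clocks after a restart and clocks under load, in MHz, as read from Afterburner.
IDLE = {"core": 51, "shader": 101, "memory": 135}
LOAD = {"core": 830, "shader": 1661, "memory": 2005}


def stuck_clocks(reading, idle=IDLE, tolerance=0.10):
    """Return the names of clocks that are more than `tolerance` above their idle value."""
    return [name for name, mhz in reading.items() if mhz > idle[name] * (1 + tolerance)]


if __name__ == "__main__":
    # The readings taken after SETI was stopped matched the load values.
    print(stuck_clocks(LOAD))  # ['core', 'shader', 'memory']
    print(stuck_clocks(IDLE))  # []
```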

It may be worth mentioning that these GPUs were not purchased 1st hand. I bought both from a single seller to keep me going until I can afford a better solution in a later generation.

As before, this behaviour is only evident after running SETI v7 cuda 42.

Thank you for the assistance thus far.

Regards
Josh
ID: 50309
Richard Haselgrove
Volunteer tester
Help desk expert

Joined: 5 Oct 06
Posts: 5081
United Kingdom
Message 50310 - Posted: 25 Aug 2013, 13:46:02 UTC - in response to Message 50309.  

Thanks. That does point to a specific issue with SETI's CUDA app - cuda42 in this case, though there are other cuda generations built from the same code base. I'll keep passing messages back.

The clock increases when running SETI are to be expected. The clock increase for GPU 1 (the inactive one) is probably a function of SLI - in my experience, if you disable SLI in the NVidia control panel (leaving the SLI strap connector in place), the clock speeds of the two cards vary independently according to load.

The curious issues - and potential problems - are the failure to downclock when idle, and the failure to release GPU RAM. Did you wait and allow the SETI tasks to finish normally, or did you suspend them manually? That shouldn't make any difference, but it would be nice to know.

I'm still not seeing any reported results on your machine's results page at SETI. If you have any tasks 'Ready to report', could you please update the project, so that we can read the output generated by the application?
ID: 50310
Josh

Joined: 10 Mar 11
Posts: 12
Australia
Message 50314 - Posted: 26 Aug 2013, 1:38:12 UTC - in response to Message 50310.  
Last modified: 26 Aug 2013, 1:39:57 UTC

Hi Richard

I suspended the tasks manually. I'll run SETI for a number of hours today to give tasks a chance to complete so you can examine the results.

Disabling SLI manually every night is not ideal, since I use the cards in SLI for games. I'll learn to live with GPU 1 clocking up.

On a positive note, I'm loving GPU compute. I've done more work on this machine in 1 month than in 3 years on my laptops :)

Thanks for all your help so far!

Regards
Josh
ID: 50314
Josh

Joined: 10 Mar 11
Posts: 12
Australia
Message 50324 - Posted: 27 Aug 2013, 0:37:30 UTC - in response to Message 50314.  

Hi Richard

SETI returned 3 or 4 work units yesterday for you to examine.

Regards
Josh
ID: 50324


Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.