Message boards : Questions and problems : Inaccurate "time left"
Joined: 5 Oct 06 Posts: 5082
I'm told that the OpenCL runtime libraries on NVidia support only primitive spin-wait synchronisation for the parallel kernels on the GPU. That means the CPU is constantly 'active' (clocking up time), but it's doing nothing useful - just twiddling its thumbs, waiting for something to happen. The CUDA environment is far more sophisticated, with callback synchronisation available, but it's different. Projects prefer to code their science apps only once, so they tend to program in OpenCL, since it supports all three GPU vendors: AMD (efficient runtime), NVidia (lousy runtime), and Intel iGPU (a special runtime - it needs little CPU time, but needs it promptly). I run my Einstein Intel GPU apps at real-time priority - it sounds dangerous, but it speeds them up seven-fold with no downside, even on 100% utilised CPUs, except briefly during the swapover at the end of a task.
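The practical difference between the two synchronisation styles can be sketched in a few lines of Python - purely an illustration of spin-wait versus blocking (callback-style) waits, not NVidia's actual driver code:

```python
import threading
import time

def spin_wait(done):
    # OpenCL-on-NVidia style: poll in a tight loop until the "kernel"
    # finishes - the CPU accrues time while doing nothing useful.
    while not done.is_set():
        pass

def blocking_wait(done):
    # CUDA-callback style: the thread sleeps until it is signalled,
    # accruing almost no CPU time.
    done.wait()

def cpu_seconds(waiter):
    done = threading.Event()
    # Pretend the GPU kernel takes 200 ms of wall-clock time to complete.
    threading.Timer(0.2, done.set).start()
    start = time.process_time()
    waiter(done)
    return time.process_time() - start

spin_cost = cpu_seconds(spin_wait)       # roughly 0.2 s of CPU burned
block_cost = cpu_seconds(blocking_wait)  # close to zero
```

Both waits take about the same wall-clock time; the difference is that the spin-wait charges all of it to the CPU, which is exactly why the task appears 'active' while doing nothing.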
Joined: 5 Oct 06 Posts: 5082
> How come OpenCL was never written as well for NV?

Marketing (monetization). They want to promote their proprietary technology - CUDA.

> I see no point in using Intel GPUs. Might as well use the CPU as a CPU; it can't do both at once, one will slow down.

Intel GPUs work fine if the CPU cores are doing low-power work (integer maths). You need to keep the entire die within its TDP to avoid throttling.

> SEVENfold? How is that possible? Do you mean if your CPU is also doing other projects, so the GPU tasks never get a chance?

Probably voodoo, but it works. Partly because the app is kept in on-die cache memory, and the time overhead of a full context switch is enough to kill the performance of the GPU app.
Joined: 5 Oct 06 Posts: 5082
TDP = Thermal Design Power - the electrical input, in watts, that the chip's cooling is designed to dissipate. The Intel GPUs run at full speed if there is a free core, same as other GPUs. They slow down seven-fold if all cores are busy, but you get that back with real-time priority (I use Process Lasso). That's enough for tonight, UK time :-)
Joined: 17 Nov 16 Posts: 869
The science application itself is what determines how much CPU resource it needs to feed the GPU task. For AMD and Nvidia, the applications are different. I have no problem keeping the GTX 1070 Ti busy at 98% utilization with just a single task running on the GPU - it uses all of the card. The Nvidia cards don't run the Einstein application as efficiently as AMD. AMD cards have no issues running multiple tasks per card; in fact that is preferred, to utilize all of the card's resources. That's why AMD cards are always preferred at Einstein, and somewhat at MilkyWay. But I have always crunched for Seti as my primary project, and the fastest applications there are always via Linux and Nvidia, so that has always tipped my card choice towards Nvidia.
Joined: 8 Nov 19 Posts: 718
Did you add multiple tasks per GPU?
Joined: 17 Nov 16 Posts: 869
> How come Seti is faster on Nvidia? Does Seti do a CUDA version?

Actually Seti does have an optimized CUDA application from third-party developers: 5x-10x faster than the stock OpenCL applications. The charts from Shaggie76 are not representative of any hosts running anonymous platform with non-stock applications - his script only scrapes stock hosts and applications. The few CUDA entries in his list are from the CUDA60 app from 2016, which performs much worse than the stock OpenCL apps, so it is not in the least representative of the current fastest applications available for Linux. With the special Linux app, I do Seti GPU tasks in 30-60 seconds, compared to 600 seconds for the OpenCL applications. It is not even a question of which applications are faster. The special app uses either CUDA 9.0 or CUDA 10.2. If you look at the Top 100 Hosts list at Seti, you will see it is dominated by Linux hosts running the special app. https://setiathome.berkeley.edu/top_hosts.php

[Edit] FYI, this is the most current chart of GPU performance at Seti: https://setiathome.berkeley.edu/forum_thread.php?id=81962&postid=2018703#2018703
Joined: 25 May 09 Posts: 1284
The "special app" only works for one combination of GPU and operating system: mid-to-high-end, recent nVidia GPUs running under Linux (and then possibly not all flavours of Linux). To become mainstream, some of the restrictions would have to be addressed - the GPU age and level restriction probably top of the list, followed by making sure it works under a wider range of Linux, then Windows....

Sometime located elsewhere in the (un)known Universe, but most often found somewhere near the middle of the UK
Joined: 5 Oct 06 Posts: 5082
> If that special app is so much faster, ought Seti not incorporate that programming into the mainstream ones? They'd get huge amounts more work done.

Partly because the extra speed comes at the expense of much higher memory usage: the special app can't be used on every GPU. Managing the distribution to compatible cards only is not handled easily or well under the BOINC framework, especially when two (or more) dissimilar cards are installed in the same computer.
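The gate itself would be conceptually trivial - the hard part is getting trustworthy per-card data to feed it. A hypothetical sketch (the function name, vendor string and the 4 GB threshold are all invented for illustration, not BOINC's actual scheduler logic):

```python
def eligible_for_special_app(gpu_vendor, gpu_ram_gb, min_ram_gb=4.0):
    """Decide whether a host's GPU could be sent the memory-hungry app.

    gpu_vendor and gpu_ram_gb would have to come from the client's
    hardware report; min_ram_gb is an illustrative cutoff.
    """
    return gpu_vendor == "nvidia" and gpu_ram_gb >= min_ram_gb

# A host with an 8 GB nVidia card qualifies; a 2 GB card or an AMD card does not.
```

The catch described above is that with two dissimilar cards in one machine, the server only ever sees one of them, so even this simple check can be fed the wrong answer.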
Joined: 5 Oct 06 Posts: 5082
> Perhaps it could be set to only send tasks out if the client had > x GB of RAM? Or could identify what GPU(s) are on the client? I think both these pieces of data are available to the project server.

The client detects each and every GPU in the system, but it only notifies the server about the 'best' one (plus a count of all the others) - look inside any sched_request file at each separate <coproc_xxx> section. That decision was made in 2008, and by 2014 it was acknowledged as a mistake - but it would be a hugely complex piece of programming to report and store details of each GPU separately whilst maintaining compatibility. It hasn't been attempted, and it is unlikely to be in the near future.
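For the curious, pulling the "best GPU plus count" information out of such a section is straightforward. The fragment below is simplified and hypothetical - modelled on the <coproc_xxx> shape described above, not copied from a real sched_request file, which carries many more fields:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: one <coproc_*> section per vendor, holding only
# the 'best' card's name and a count of all cards of that vendor.
SAMPLE = """\
<coprocs>
  <coproc_cuda>
    <count>2</count>
    <name>GeForce GTX 1070 Ti</name>
  </coproc_cuda>
</coprocs>
"""

def summarise_coprocs(xml_text):
    """Return {section_tag: (best_gpu_name, count)} for each coproc section."""
    summary = {}
    for coproc in ET.fromstring(xml_text):
        summary[coproc.tag] = (coproc.findtext("name"),
                               int(coproc.findtext("count")))
    return summary

print(summarise_coprocs(SAMPLE))
```

Note what is missing: with two dissimilar cards, the second card's name and memory size never appear, which is exactly the reporting limitation being described.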
Joined: 25 May 09 Posts: 1284
That entirely depends on what you are using. For SETI, the best you can run under Windows is the "SoG" application, which is one of the "stock" set of applications. There is some tuning possible for most GPUs which will improve things over the default settings - if you want to use them, the best thing to do is to have a look around on the SETI forum for your GPUs and see what is suggested.
Joined: 25 May 09 Posts: 1284
My previous post applies. Sadly for you, the "tricks" used to gain the huge performance improvement in the nVidia "CUDA" applications are not transferable to the AMD/ATI families of GPU.
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.