Message boards : Questions and problems : Inaccurate "time left"
Joined: 5 Oct 06 Posts: 5082
I'm told that the OpenCL runtime libraries on NVidia support only primitive spin-wait synchronisation for the parallel kernels on the GPU. That means the CPU is constantly 'active' (clocking up time), but it's doing nothing useful - just twiddling its thumbs, waiting for something to happen. The CUDA environment is far more sophisticated, with callback synchronisation available, but it's different. Projects prefer to code their science apps only once, so they tend to program in OpenCL, since it supports all three GPU vendors: AMD (efficient runtime), NVidia (lousy runtime), and Intel iGPU (a special runtime - it needs little CPU time, but needs it promptly). I run my Einstein Intel GPU apps at real-time priority - it sounds dangerous, but it speeds them up seven-fold with no downside, even on 100% utilised CPUs, except briefly during the swapover at the end of a task.
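The practical difference between the two synchronisation styles can be sketched in a few lines of Python - purely an illustration of spin-wait versus blocking (callback-style) waits, not NVidia's actual driver code:

```python
import threading
import time

def spin_wait(done):
    # OpenCL-on-NVidia style: poll in a tight loop until the "kernel"
    # finishes - the CPU accrues time while doing nothing useful.
    while not done.is_set():
        pass

def blocking_wait(done):
    # CUDA-callback style: the thread sleeps until it is signalled,
    # accruing almost no CPU time.
    done.wait()

def cpu_seconds(waiter):
    done = threading.Event()
    # Pretend the GPU kernel takes 200 ms of wall-clock time to complete.
    threading.Timer(0.2, done.set).start()
    start = time.process_time()
    waiter(done)
    return time.process_time() - start

spin_cost = cpu_seconds(spin_wait)       # roughly 0.2 s of CPU burned
block_cost = cpu_seconds(blocking_wait)  # close to zero
```

Both waits take about the same wall-clock time; the difference is that the spin-wait charges all of it to the CPU, which is exactly why the task appears 'active' while doing nothing.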
Joined: 5 Oct 06 Posts: 5082
> How come OpenCL was never written as well for NV?

Marketing (monetization). They want to promote their proprietary technology - CUDA.

> I see no point in using Intel GPUs. Might as well use the CPU as a CPU; it can't do both at once, one will slow down.

Intel GPUs work fine if the CPU cores are doing low-power work (integer maths). You need to keep the entire die within its TDP to avoid throttling.

> SEVENfold? How is that possible? Do you mean if your CPU is also doing other projects, so the GPU tasks never get a chance?

Probably voodoo, but it works. Partly because the app is kept in on-die cache memory, and the time overhead of a full context switch is enough to kill the performance of the GPU app.
Joined: 5 Oct 06 Posts: 5082
TDP = Thermal Design Power - the electrical input, in watts, that the chip's cooling is designed to dissipate. The Intel GPUs run at full speed if there is a free core, same as other GPUs. They slow down seven-fold if all cores are busy, but you get that back with real-time priority (I use Process Lasso). That's enough for tonight, UK time :-)
Joined: 17 Nov 16 Posts: 869
The science application itself is what determines how much CPU resource it needs to feed the GPU task. For AMD and Nvidia, the applications are different. I have no problem keeping the GTX 1070 Ti busy at 98% utilization with just a single task running on the GPU - it uses all of the card. The Nvidia cards don't run the Einstein application as efficiently as AMD. AMD cards have no issues running multiple tasks per card; in fact that is preferred, to utilize all of the card's resources. That's why AMD cards are always preferred at Einstein, and somewhat at MilkyWay. But I have always crunched for Seti as my primary project, and the fastest applications there are always via Linux and Nvidia, so that has always tipped my card choice towards Nvidia.
Joined: 8 Nov 19 Posts: 718
Did you add multiple tasks per GPU?
Joined: 17 Nov 16 Posts: 869
> How come Seti is faster on Nvidia? Does Seti do a CUDA version?

Actually Seti does have an optimized CUDA application from third-party developers: 5x-10x faster than the stock OpenCL applications. The charts from Shaggie76 are not representative of any hosts running anonymous platform with non-stock applications - his script only scrapes stock hosts and applications. The few CUDA entries in his list are from the CUDA60 app from 2016, which performs much worse than the stock OpenCL apps, so it is not in the least representative of the current fastest applications available for Linux. With the special Linux app, I do Seti GPU tasks in 30-60 seconds, compared to 600 seconds for the OpenCL applications. It is not even a question of which applications are faster. The special app uses either CUDA 9.0 or CUDA 10.2. If you look at the Top 100 Hosts list at Seti, you will see it is dominated by Linux hosts running the special app. https://setiathome.berkeley.edu/top_hosts.php

[Edit] FYI, this is the most current chart of GPU performance at Seti: https://setiathome.berkeley.edu/forum_thread.php?id=81962&postid=2018703#2018703
Joined: 25 May 09 Posts: 1284
The "special app" only works for one combination of GPU and operating system: mid-to-high-end, recent nVidia GPUs running under Linux (and then possibly not all flavours of Linux). To become mainstream, some of the restrictions would have to be addressed - the GPU age and level restriction probably top of the list, followed by making sure it works under a wider range of Linux, then Windows....

Sometime located elsewhere in the (un)known Universe, but most often found somewhere near the middle of the UK
Joined: 5 Oct 06 Posts: 5082
> If that special app is so much faster, ought Seti not incorporate that programming into the mainstream ones? They'd get huge amounts more work done.

Partly because the extra speed comes at the expense of much higher memory usage: the special app can't be used on every GPU. Managing the distribution to compatible cards only is not handled easily or well under the BOINC framework, especially when two (or more) dissimilar cards are installed in the same computer.
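The gate itself would be conceptually trivial - the hard part is getting trustworthy per-card data to feed it. A hypothetical sketch (the function name, vendor string and the 4 GB threshold are all invented for illustration, not BOINC's actual scheduler logic):

```python
def eligible_for_special_app(gpu_vendor, gpu_ram_gb, min_ram_gb=4.0):
    """Decide whether a host's GPU could be sent the memory-hungry app.

    gpu_vendor and gpu_ram_gb would have to come from the client's
    hardware report; min_ram_gb is an illustrative cutoff.
    """
    return gpu_vendor == "nvidia" and gpu_ram_gb >= min_ram_gb

# A host with an 8 GB nVidia card qualifies; a 2 GB card or an AMD card does not.
```

The catch described above is that with two dissimilar cards in one machine, the server only ever sees one of them, so even this simple check can be fed the wrong answer.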
Joined: 5 Oct 06 Posts: 5082
> Perhaps it could be set to only send tasks out if the client had > x GB of RAM? Or could identify what GPU(s) are on the client? I think both these pieces of data are available to the project server.

The client detects each and every GPU in the system, but it only notifies the server about the 'best' one (plus a count of all the others) - look inside any sched_request file at each separate <coproc_xxx> section. That decision was made in 2008, and by 2014 it was acknowledged as a mistake - but it would be a hugely complex piece of programming to report and store details of each GPU separately whilst maintaining compatibility. It hasn't been attempted, and it is unlikely to be in the near future.
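For the curious, pulling the "best GPU plus count" information out of such a section is straightforward. The fragment below is simplified and hypothetical - modelled on the <coproc_xxx> shape described above, not copied from a real sched_request file, which carries many more fields:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment: one <coproc_*> section per vendor, holding only
# the 'best' card's name and a count of all cards of that vendor.
SAMPLE = """\
<coprocs>
  <coproc_cuda>
    <count>2</count>
    <name>GeForce GTX 1070 Ti</name>
  </coproc_cuda>
</coprocs>
"""

def summarise_coprocs(xml_text):
    """Return {section_tag: (best_gpu_name, count)} for each coproc section."""
    summary = {}
    for coproc in ET.fromstring(xml_text):
        summary[coproc.tag] = (coproc.findtext("name"),
                               int(coproc.findtext("count")))
    return summary

print(summarise_coprocs(SAMPLE))
```

Note what is missing: with two dissimilar cards, the second card's name and memory size never appear, which is exactly the reporting limitation being described.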
Joined: 25 May 09 Posts: 1284
That entirely depends on what you are using. For SETI, the best you can run under Windows is the "SoG" application, which is one of the "stock" set of applications. There is some tuning possible for most GPUs which will improve things over the default settings - if you want to use them, the best thing to do is to have a look around on the SETI forum for your GPUs and see what is suggested.
Joined: 25 May 09 Posts: 1284
My previous post applies. Sadly for you, the "tricks" used to gain the huge performance improvement in the nVidia "CUDA" applications are not transferable to the AMD/ATI families of GPU.
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.