Nvidia/AMD Cuda/OpenCL on Boinc projects - which card to buy?

ProDigit (Joined: 8 Nov 19, Posts: 718)
Message 101570 - Posted: 14 Nov 2020, 9:52:05 UTC - in response to Message 101547. Last modified: 14 Nov 2020, 9:56:20 UTC

Quote: I normally consider which card to buy for Boinc based on the "theoretical floating point speed" quoted for the card in Techpowerup reviews, taking account of whether the project I want to run it on is single or double precision. But I noticed on PrimeGrid that Cuda tasks are horrendously faster than they should be (by a factor of 2, looking at other people's results - I only have AMD cards at present). On Einstein there seems to be very little difference. Does anyone know which projects write more efficient Cuda code and how much of a difference it makes? I'm trying to work out if it's worth buying a more expensive and theoretically slower Nvidia card because it can run the better Cuda coding on some projects.

The best bang for your buck is not always the GPU that'll get you the best performance.

To get the best performance, get the GPU with the most cuda cores. That'll be the RTX 3090. If it has too many cuda cores for a single task (most likely), you'll be able to double, triple, quadruple, octuple, whatever-uple the number of tasks running at once, until the card is running at maximum performance.

If you're looking for the best bang for the buck, it'll be a toss-up between the $700 3080 and the $500 3070. I'm leaning towards the 3080, because it's faster and has faster RAM, but it also uses 325W, vs 225W on the 3070 (stock). The 3080 can be power-limited to 250W, or perhaps even lower, and still run about as fast as it does at stock. The 3070 is supposed to go for $500, but it'll be a while until it reaches this price. Its wattage can also be lowered.

So it'll be up to you to decide what $$$ amount you want to spend in terms of initial purchase price and running cost, and what your case can handle in terms of heat.

All RTX 3000 series GPUs (3090, 3080, 3070) are currently in high demand, and scalpers are trying to sell them for 2 to 3x MSRP. Give them 1 to 2 months for prices to come down.

The $500 MSRP 3070 would perform very similarly to the $1200 MSRP RTX 2080Ti. On average it's <5% slower. The 3080 is on average 10-15% faster than the 2080Ti.

So: 2060 < 2060 KO < 2060 Super < 2070 < 2070 Super < 2080 < 2080 Super < 3070 < 2080 Ti < 3080 < 3090
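
As a rough check on the bang-for-the-buck reasoning in this post, the quoted figures can be turned into performance per dollar and per watt. This is only a sketch under stated assumptions: the 2080 Ti is taken as 1.00, "<5% slower" is read as 0.95 and "10-15% faster" as 1.125, and the prices and wattages are the MSRP and stock figures given in the post rather than street prices or measured draw. The names `Card` and `value_table` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    rel_perf: float  # performance relative to an RTX 2080 Ti (= 1.00); an assumption
    msrp_usd: float  # MSRP as quoted in the post
    watts: float     # stock power draw as quoted in the post


# Figures taken from the post; the relative-performance values are rough readings of it.
CARDS = [
    Card("RTX 3070", 0.95, 500, 225),
    Card("RTX 3080", 1.125, 700, 325),
]


def value_table(cards):
    """Return (name, perf per $1000, perf per 100 W), best perf-per-dollar first."""
    rows = [
        (c.name, 1000 * c.rel_perf / c.msrp_usd, 100 * c.rel_perf / c.watts)
        for c in cards
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, per_kusd, per_100w in value_table(CARDS):
    print(f"{name}: {per_kusd:.2f} perf per $1000, {per_100w:.3f} perf per 100 W")
```

On these assumptions the 3070 comes out ahead on both measures at stock settings. A power-limited 3080, or different street prices, would change the picture.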

robsmith (Volunteer tester, Help desk expert; Joined: 25 May 09, Posts: 1283)
Message 101571 - Posted: 14 Nov 2020, 10:33:58 UTC - in response to Message 101547.

You need to look at the individual projects' forums to see what people have found. You have identified one thing to consider - not every project has well optimised applications, and this applies to all three major GPU types. For now we can discount Intel as they are so far behind the other two. Not every project has the resources available to highly optimise for the most recent nVidia cards. Also consider that some projects require double precision, which is not always available on nVidia but is available on most AMD cards.

All this means that the on-paper performance is not always a good guide to what happens in the real world when faced with the real data and applications we see from projects. In your position I would stick with what you know (for you it's AMD, for me it's nVidia), as there is a much smaller learning curve when moving up a vendor's own performance-price curve than when flipping from one "breed" to the other.

Keith Myers (Volunteer tester, Help desk expert; Joined: 17 Nov 16, Posts: 868)
Message 101577 - Posted: 15 Nov 2020, 1:24:57 UTC

You'll never find consumer Nvidia cards matching AMD FP64 performance. Nvidia deliberately hamstrings FP64 performance on consumer cards to force users who want that kind of calculation onto their professional series, Tesla and Quadro, which cost 3X-4X the price of the consumer cards. They have in the past made "prosumer" cards like the TitanX or TitanV which have full 1:2 FP64 performance. The latest Nvidia Ampere series halved the FP64 performance of the previous Turing series. It is now at a pathetic 1:64 ratio. The worst the AMD cards have ever been is a 1:16 FP64 ratio, which their latest series has.

But luckily, only a few projects utilize FP64 calculations, so you are not always penalized for choosing Nvidia over AMD.
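
To put the ratios in this post into numbers: a card's theoretical FP64 rate is roughly its FP32 rate divided by the second number in the ratio. The sketch below uses a made-up FP32 figure of 20 TFLOPS purely for illustration (it is not the spec of any real card); only the ratios 1:2, 1:16 and 1:64 come from the post, and `fp64_tflops` is a hypothetical helper name.

```python
def fp64_tflops(fp32_tflops: float, ratio_denominator: int) -> float:
    """Theoretical FP64 rate for a card with a 1:ratio_denominator FP64:FP32 ratio."""
    return fp32_tflops / ratio_denominator


# Assumption: an arbitrary round number, NOT a real card's specification.
HYPOTHETICAL_FP32_TFLOPS = 20.0

for label, denominator in [("1:2", 2), ("1:16", 16), ("1:64", 64)]:
    rate = fp64_tflops(HYPOTHETICAL_FP32_TFLOPS, denominator)
    print(f"FP64 ratio {label}: {rate:.4f} TFLOPS FP64")
```

With the same FP32 rate, a 1:16 card has four times the theoretical FP64 throughput of a 1:64 card, which only matters on projects that actually lean on FP64.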

Tigers Dave (Joined: 24 Dec 05, Posts: 52)
Message 101579 - Posted: 15 Nov 2020, 4:04:18 UTC - in response to Message 101576.

Here are Collatz @ Home performance data, generated using my Macs and optimized crunching parameters.

AMD RX 5700 XT - ~7 M credits/day
NVIDIA GTX 1080 Ti - ~7 M credits/day
AMD RX Vega 56/64 - ~6.5 M credits/day
NVIDIA GTX 1080 - ~5 M credits/day
AMD RX 580 - ~4 M credits/day
NVIDIA GTX 980 Ti - ~4 M credits/day
NVIDIA GTX 980 - ~3 M credits/day

Some folks may achieve better numbers than me through undervolting, overclocking, and better optimization of the crunching parameters. And, of course, the performance of a particular GPU under MacOS may be different than under Windows or Linux. Finally, some projects may be better at utilizing AMD GPUs than NVIDIA GPUs and vice versa. So, I agree with the advice of one of the previous posters that you check the message boards of the projects for which you crunch.

FWIW, NVIDIA GPUs appear to be a much better bang for the buck than AMD GPUs when crunching Collatz @ Home tasks. But the most recent versions of MacOS are not compatible with contemporary NVIDIA GPUs.

Good luck!

ProDigit (Joined: 8 Nov 19, Posts: 718)
Message 101644 - Posted: 19 Nov 2020, 16:11:08 UTC. Last modified: 19 Nov 2020, 16:12:26 UTC

I don't think DP makes that big of a difference. Most projects use some DP, but mostly 32 bit float. Even projects that use more DP still use single precision. You'd have to compare how the GPU performs per watt, or its PPD performance in a project. That's more significant than its DP performance.

Most projects I run favor Nvidia over AMD, especially those running CUDA.

Jord (Volunteer tester, Help desk expert; Joined: 29 Aug 05, Posts: 15480)
Message 101655 - Posted: 19 Nov 2020, 23:42:34 UTC - in response to Message 101644.

Quote: Even projects that use more DP, still use single precision.

Maybe so, but the problem with the projects that require Double Precision is that you cannot use a Single Precision GPU at all. So whether the project only uses DP or also (in part) SP is moot.

Keith Myers (Volunteer tester, Help desk expert; Joined: 17 Nov 16, Posts: 868)
Message 101663 - Posted: 20 Nov 2020, 8:20:21 UTC - in response to Message 101661.

Surprised you never ran tasks at Seti. Nvidia was much, much faster than any other vendor. Magnitudes difference.

GPUGrid is where my cards run best. No other vendors allowed.

robsmith (Volunteer tester, Help desk expert; Joined: 25 May 09, Posts: 1283)
Message 101666 - Posted: 20 Nov 2020, 8:37:09 UTC

Quite a number of compilers, or more specifically linkers, will compile part of an application using DP and other parts using SP; however, such applications will be flagged as requiring DP capability to run. This means that they will fail to run on a processor that does not have either built-in DP or some form of "DP emulation". Of course it is possible to compile DP emulation into the application so it will run on SP processors, but that really does lead to a considerable loss in performance (last time I used one of these compilers, execution time was more than ten times that of the same code running in SP-only mode).

Discussions about speed advantages are fairly moot in applications where accuracy is vital. Let's say one uses SP to calculate the orbital mechanics for an asteroid and that shows it will miss Earth by a safe margin; however, running the same code using DP, one finds that on the same orbit the asteroid will collide with Earth. In such a case I would rather take a bit longer (even ten times longer) to get to the correct answer than rush in and get the wrong answer.
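
The accuracy point can be illustrated with a toy accumulation (nothing to do with real orbital mechanics): add 0.1 a hundred thousand times, once rounding every intermediate result to single precision and once in double precision. Python floats are double precision, so single precision is emulated here by round-tripping each value through `struct`; `to_f32` and `accumulate` are illustrative names, not part of any project's code.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total `n` times, optionally rounding to SP after each add."""
    if single:
        step = to_f32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_f32(total)
    return total


N = 100_000
EXACT = 10_000.0  # the mathematically exact value of N * 0.1

err_sp = abs(accumulate(0.1, N, single=True) - EXACT)
err_dp = abs(accumulate(0.1, N, single=False) - EXACT)

print(f"single precision error: {err_sp:.6g}")
print(f"double precision error: {err_dp:.6g}")
```

The single-precision total drifts by a clearly visible amount, while the double-precision error stays tiny. Long-running simulations accumulate rounding error in the same way, which is why some projects insist on DP.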

robsmith (Volunteer tester, Help desk expert; Joined: 25 May 09, Posts: 1283)
Message 101667 - Posted: 20 Nov 2020, 8:58:31 UTC - in response to Message 101665.

Quote: Surprised you never ran tasks at Seti.

Quote: I've always had other projects I wanted to run and never got to SETI. I was actually just about to when they closed down!

First inaccuracy - SETI@Home HAS NOT closed down, it has gone into hibernation, probably for a year or so, but it is NOT CLOSED.

Quote: Nvidia was much, much faster than any other vendor.

Yes, for a number of reasons, the most significant being that their GPUs were better at the set of calculations needed for SETI than AMD processors.

Quote: Magnitudes difference. GPUGrid is where my cards run best. No other vendors allowed.

If a project codes and compiles their application for a particular processor then that is their choice - perhaps they only had a person who could code for AMD GPUs?

Quote: Only due to the programmer deciding to write better code for Cuda.

Not quite right. Someone, after applying a lot of heavyweight number theory, decided to try a different approach to how one block of code was arranged, this block of code being common to all processors. By applying these changes he was able to get quite a jump in performance, but it only worked with any degree of accuracy on nVidia GPUs. Later he used a few tricks available in the operating system to improve the way the GPU and CPU parts of the application communicated with each other - again this only worked on nVidia cards and under Linux (not for the want of trying by people who were AMD GPU programmers and Windows programmers).

Quote: Every time I look at buying a card, I find AMD gives more bang for the buck. They're best at CPUs too, the Ryzens are way better than Intel.

Your money, your decision (I happen to agree with you about Ryzen CPUs - they have jumped over Intel both in terms of price and performance in the last couple of years).

Quote: As for Nvidia deliberately crippling double precision and OpenCL, that's criminal in my book. They're boycotted!

Again, not true. AMD have OpenCL built into their firmware, and the whole processing unit is optimised around OpenCL. nVidia came late to the OpenCL party and had to work it into their existing system, and since that is optimised around CUDA there are features of OpenCL and CUDA that do not play well with each other in terms of performance. It is interesting that many of the mega-computers that use GPUs and GPU-like processors use nVidia and not AMD because of the scaling potential that nVidia have over AMD.

robsmith (Volunteer tester, Help desk expert; Joined: 25 May 09, Posts: 1283)
Message 101690 - Posted: 21 Nov 2020, 16:10:11 UTC - in response to Message 101684.

Your comment shows how little you know about GPUs. Sadly not all GPUs have DP burnt into their hardware. AMD have had it across the majority of their GPUs for a fair time (if you needed DP back when that arrived, then that was certainly one up to AMD), whereas on nVidia consumer-grade GPUs it's not been there that long; on the majority, nVidia achieves DP by emulation - hence the performance hit.

However, the very high end nVidia processors, normally considered to be for "workstation" or "HPC" use, have had DP for some considerable time. These cards are monstrously expensive (topping out at about £5.5k each), their performance on DP is eye-wateringly fast, and they are very capable of being used in high-count parallel arrays to give even more spectacular performance.

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
