An undoubtedly very fascinating thread about GPU capabilities running multiple tasks

ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 107393 - Posted: 15 Mar 2022, 18:49:27 UTC - in response to Message 107352.  
Last modified: 15 Mar 2022, 18:53:39 UTC

Glory is crunching GWs 24/7 on 3 GPUs over at Einstein

I'm running 5 out of 6 threads on Einstein GW.
One thread is MLC.
MLC is actually more demanding than GW, sometimes requiring well over 2 GB of RAM per WU and most of the time using about 1.2 GB.
GW reserves 14 GB but only uses 500 MB to 1.5 GB, averaging around 800 MB per WU (or thread).

I run two GPUs, an RTX 2070 and an RTX 2060, with 3 tasks per GPU.
The 2070 is significantly faster than the 2060.
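In case anyone wants to replicate this setup: running several tasks per GPU is normally done with an app_config.xml in the project directory. Here is a minimal sketch, written as a small Python helper; the app name (einstein_O3AS) and the data-directory path are assumptions on my part, so check client_state.xml or the project's applications page for the exact names on your host.

from pathlib import Path

# Rough sketch: write an app_config.xml that runs three GW tasks per GPU.
# "einstein_O3AS" is an assumed app name -- verify it in client_state.xml.
APP_CONFIG = """\
<app_config>
  <app>
    <name>einstein_O3AS</name>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""

# Default Linux data directory; adjust for your install and make sure you have
# permission to write into the project folder.
project_dir = Path("/var/lib/boinc-client/projects/einstein.phys.uwm.edu")
(project_dir / "app_config.xml").write_text(APP_CONFIG)
print("Wrote app_config.xml - use Options > Read config files in the BOINC Manager to apply it.")

With gpu_usage at 0.33 the client schedules three tasks on each GPU at once; raise cpu_usage if the app needs a full CPU thread per task.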
ID: 107393
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 869
United States
Message 107397 - Posted: 15 Mar 2022, 23:44:27 UTC - in response to Message 107388.  

The reason is the programmers want better FP64 precision than a gpu can provide for sorting the toplist so they transfer that last bit of computing from the gpu back to the cpu.
You misunderstood this part

Not all FP64 devices are equivalent in their precision. I'm not talking about throughput capability here. Yes, modern GPUs have FP64 capability in some fashion, but their math precision is deficient compared to CPUs, even if they are faster than doing the math on a CPU.

The project admins stated this was the reason for moving the toplist processing back to the CPU at the end of the task.


ID: 107397
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 869
United States
Message 107405 - Posted: 16 Mar 2022, 22:52:39 UTC - in response to Message 107402.  

The reason is the programmers want better FP64 precision than a gpu can provide for sorting the toplist so they transfer that last bit of computing from the gpu back to the cpu.
You misunderstood this part

Not all FP64 devices are equivalent in their precision. I'm not talking about throughput capability here. Yes, modern GPUs have FP64 capability in some fashion, but their math precision is deficient compared to CPUs, even if they are faster than doing the math on a CPU.

The project admins stated this was the reason for moving the toplist processing back to the CPU at the end of the task.
Interesting, I would have thought if any chip does 64 bit precision it would produce the same answer as any other 64 bit precision chip. What exactly do the GPUs get wrong?

It's not that they get it wrong, it's that the GPU uses a different math library than the CPU does. And some CPUs have higher internal precision, like the 80-bit extended-precision registers in x87 FPUs versus 64-bit doubles in a GPU.

Also, GPU programmers often sacrifice a bit of precision for some speedup in calculations.

https://forums.developer.nvidia.com/t/why-accuracy-cpu-and-gpu-not-equal/35316/5

I'll call your attention to this post in the thread.

Neither the IEEE-754 (2008) floating-point standard nor the ISO C and ISO C++ language standards prescribe a particular accuracy for the exponential functions exp() or expf(). This means that library implementers are free to make their own choices regarding tradeoffs between accuracy and performance. As a consequence, no two standard math libraries are likely to deliver bit-identical results for a particular math function called with a particular argument. You will make similar observations when comparing results from different math libraries on the CPU. Some libraries even offer multiple selectable accuracy modes within the library itself, and the results differ based on mode.

In the case at hand both libraries deliver results with a small ulp error, which in the case of CUDA is within the math function error bound stated in an appendix of the Programming Guide:

CPU res: 1.00000095e+000 bit pattern: 3f800008 ulp error vs mathematical: 0.388612
GPU res: 1.00000107e+000 bit pattern: 3f800009 ulp error vs mathematical: 0.611388

The CPU delivers the correctly rounded single-precision result, while CUDA’s result differs from the correctly rounded result by 1 ulp.
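To make the ulp arithmetic in that quote concrete, here is a small Python sketch that decodes the two quoted bit patterns and measures their error against a higher-precision reference. The argument passed to expf() is not given in the linked post; x = 1e-6 is an assumption that happens to place the exact value between those two bit patterns.

import math

import numpy as np

def float32_from_bits(hexbits: str) -> np.float32:
    # Decode a hex pattern such as "3f800008" into the float32 it encodes.
    return np.frombuffer(bytes.fromhex(hexbits), dtype=">f4")[0]

def ulp_error(approx: np.float32, exact: float) -> float:
    # Error of a float32 result, measured in units of its own last place.
    one_ulp = float(np.spacing(np.float32(approx)))
    return abs(float(approx) - exact) / one_ulp

x = 1e-06            # assumed argument, not stated in the quoted post
exact = math.exp(x)  # double precision stands in for the "mathematical" value

for name, bits in (("CPU", "3f800008"), ("GPU", "3f800009")):
    res = float32_from_bits(bits)
    print(f"{name} res: {float(res):.8e}  bit pattern: {bits}  "
          f"ulp error vs mathematical: {ulp_error(res, exact):.6f}")

Both results sit within one ulp of the exact value, just on opposite sides of it, which is the kind of small library-to-library difference the quote is describing.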



ID: 107405
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 869
United States
Message 107412 - Posted: 17 Mar 2022, 6:50:10 UTC - in response to Message 107406.  

GPUs are either used for gaming (which is only 32 bit) or calculations such as Boinc (which requires decent precision). I can see the GPU manufacturers making 32 bit faster and less accurate, but not 64 bit.

Well, for one thing, as far as the manufacturers are concerned we are using their GPUs in an off-label way. Consumer GPUs are not meant to be used for computing. If you want to compute with a GPU you should purchase their datacenter products, which are designed for computing. You should only game on consumer cards as far as the manufacturers are concerned.

Why do you think they gimp the video drivers to run only in the P2 power state whenever a compute operation is detected and knock the designed clocks down, sometimes by as much as a GHz?
They have to penalize the user for using the wrong type of card for the operation.


ID: 107412
Keith Myers
Volunteer tester
Help desk expert
Joined: 17 Nov 16
Posts: 869
United States
Message 107414 - Posted: 17 Mar 2022, 7:09:38 UTC - in response to Message 107413.  
Last modified: 17 Mar 2022, 7:13:54 UTC

Yes, AMD has been pretty much immune from such practices until the latest cards. Nvidia has been applying this penalty for the last four card generations.

But it can be undone by knowledgeable users who actually pay attention to what is happening. Most people never even look at the clocks their cards are running and are blissfully ignorant of what has been done to them without their permission.

Anyone using an Nvidia card for compute should overclock the card stuck in the P2 power state to at least the same clocks it would run at in the P0 power state when gaming.

If you don't, you are leaving performance on the table.
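A quick way to actually look at those clocks is to ask the driver. This is only a sketch around nvidia-smi (assuming it is on the PATH); the fix itself is platform-specific, typically nvidia-settings clock offsets with Coolbits enabled on Linux, or a tool such as MSI Afterburner on Windows.

import subprocess

# Report each GPU's current power state and clocks so the P2 penalty is visible.
query = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,pstate,clocks.sm,clocks.mem",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)

for line in query.stdout.strip().splitlines():
    index, name, pstate, sm_clock, mem_clock = [f.strip() for f in line.split(",")]
    note = "  <-- P2: compare against the P0 clocks you see while gaming" if pstate == "P2" else ""
    print(f"GPU {index} ({name}): {pstate}, SM {sm_clock}, mem {mem_clock}{note}")

If that shows P2 with clocks well below what the card runs while gaming, that is the penalty described above.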


ID: 107414
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 107416 - Posted: 17 Mar 2022, 8:18:33 UTC - in response to Message 107415.  

Gaming tends to produce short bursts of activity (even when there is a lot happening on the screen), while computational work runs in long bursts or continuously. This tends to push temperatures up during computation.
Sometime located elsewhere in the (un)known Universe
But most often found somewhere near the middle of the UK
ID: 107416
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 107496 - Posted: 19 Mar 2022, 19:21:19 UTC

Just wait for the Intel Arc GPUs to come out.
They're aimed at Bitcoin mining and a whole plethora of other 64-bit applications.
Granted, they're not as fast as a 3090, much closer to a 1060/2060, but they should be much cheaper and presumably consume only about 80 watts.
They're not released yet and there are no actual specs, but the laptop GPUs come out first so Intel can gain experience, with the desktop versions to follow later.

More news on March 30 according to:
https://www.techradar.com/news/pc-gamers-rejoice-we-finally-have-a-launch-date-for-the-intel-arc-gpu
ID: 107496
Ian&Steve C.

Joined: 24 Dec 19
Posts: 228
United States
Message 107533 - Posted: 22 Mar 2022, 15:02:13 UTC
Last modified: 22 Mar 2022, 15:02:48 UTC

Glad to see you took my advice on the driver version, Rob. Your 1.28 task ran much faster, and you'll be in a better place if you decide to move back to this well-optimized app :) You'll get even better production if you run 2 or 3 at a time.
ID: 107533
robsmith
Volunteer tester
Help desk expert

Joined: 25 May 09
Posts: 1283
United Kingdom
Message 107534 - Posted: 22 Mar 2022, 15:06:20 UTC - in response to Message 107533.  

Thanks - as this was a very recently rebuilt computer, I (wrongly) assumed that it had the latest drivers.
ID: 107534
Ian&Steve C.

Joined: 24 Dec 19
Posts: 228
United States
Message 107535 - Posted: 22 Mar 2022, 15:08:40 UTC - in response to Message 107498.  

Uhh.... 2x the speed of built in graphics? So double bugger all.


I have a laptop with Intel Xe graphics (the 96 EU model). It indeed performs at about 2x the previous-gen Intel HD graphics: the iGPUs in 10th-gen Intel CPUs took twice as long to crunch the same task (Einstein BRP4/BRP4G) at 30 W pulled from the wall, and that's pretty impressive.

The larger discrete Intel Xe GPUs that are coming ("Arc Alchemist") should be very interesting once the full-size models are out. Don't expect much from the laptop models yet; look out for the discrete models in the future.
ID: 107535
Ian&Steve C.

Joined: 24 Dec 19
Posts: 228
United States
Message 107544 - Posted: 22 Mar 2022, 16:22:31 UTC - in response to Message 107543.  
Last modified: 22 Mar 2022, 16:24:19 UTC

I guess that’s why I have over a dozen high end discrete GPUs?

No one is saying that you should use laptop GPUs for crunching as your main system, just that you can use the smaller chips to get an idea of the performance you might see at full scale. A 2x performance uplift from the previous gen to the Xe models is nothing to sneeze at. The whole idea is to be excited about Intel's upcoming DISCRETE GPUs based on the performance the laptop chips are showing us.
ID: 107544
ProDigit

Joined: 8 Nov 19
Posts: 718
United States
Message 107555 - Posted: 23 Mar 2022, 3:51:09 UTC - in response to Message 107544.  

I guess that’s why I have over a dozen high end discrete GPUs?

No one is saying that you should use laptop GPUs for crunching as your main system, just that you can use the smaller chips to get an idea of the performance you might see at full scale. A 2x performance uplift from the previous gen to the Xe models is nothing to sneeze at. The whole idea is to be excited about Intel's upcoming DISCRETE GPUs based on the performance the laptop chips are showing us.


I'm just happy that there's a third player in the market.
That means more supply, and lower prices.
And perhaps more and faster development, even if team Blue is behind right now.
I believe Intel has what it takes to be a significant player in the market.
Perhaps it will bring GT 1030 performance in the beginning, but build up to RTX performance soon.
Scaling isn't the issue, and lithography nodes aren't the issue.
What Intel brings is stringent quality checks, reliability, and the ability to improve on last-gen products through its know-how of CPUs.
GPUs and CPUs are very much alike in the way they are built.
And I think Intel will take about 2 years to get to where AMD is today (after 7 years of research).
ID: 107555