Message boards : GPUs : Nvidia/AMD Cuda/OpenCL on Boinc projects - which card to buy?
Joined: 25 May 09 · Posts: 1295
First inaccuracy - SETI@Home has NOT closed down; it has gone into hibernation, probably for a year or so, but it is NOT CLOSED. Well, that's not quite right - what is actually happening is that the second phase of the project is being developed. This involves correlating all the potential signals (billions of them) that have been sifted out by the volunteers over nearly twenty years of processing all the data from (initially) the Arecibo telescope, later supplemented by "targeted" data from the Green Bank Telescope. For a blow-by-blow description of what is happening, read Dr. Anderson's thread (and all the stuff he links to) here: https://setiathome.berkeley.edu/forum_forum.php?id=1511

I thought it was because a 3rd party rewrote the SETI code to make it way more efficient and he happened to be a CUDA programmer. The same could have been done in OpenCL. I can't believe AMD cards are missing a key bit of code!

That's a somewhat simplistic view. From the very outset of the use of GPUs the nVidia GPUs were faster than the ATI/AMD price equivalents, but not by very much - when I bought my first GTS250 it was about 5% faster than whatever the top-of-the-line ATI GPU was at the time. Some years later, after a lot of tweaking by people on both the ATI/AMD and nVidia applications, the former had a brief period when they were a few percent ahead. Then one person did some very serious number-theory work and, because he only had nVidia cards, wrote the code for them - it flew. The same tricks worked a little on some CPUs, but failed miserably on AMD GPUs. Next the same person tried a few operating-system-related tricks to do with the way data is synchronised between GPU and CPU; these only worked when using Linux with nVidia GPUs. At about the same time someone else developed an OpenCL application that worked on AMD GPUs, but not so well on nVidia GPUs under Windows; with a bit more work it ran better on nVidia than it did on AMD GPUs, but the nVidia application in particular needed a very large helping hand from the CPU. Around this time the then-new AMD GPUs suffered a bit of a setback in terms of performance, and more importantly in accuracy (a new driver set sorted that), and so nVidia running OpenCL was ahead of AMD when running Windows, while under Linux nVidia was a long way ahead, with nobody being able to get the same code running properly on AMD GPUs. A far from simple story.

Well I was told (by someone in one of these forums) they slowed OpenCL down to make Cuda look better.

No; nVidia at the outset had to emulate OpenCL rather than run it in native form, but after a few years did manage to get to comparable performance. Even then that performance was lower than CUDA on the same hardware, probably due to having to run in an emulation mode instead of a native mode.
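For anyone curious what that kind of CPU/GPU synchronisation tuning can look like in code, here is a minimal, hypothetical sketch in C against the CUDA runtime API. It only illustrates the general idea of choosing how the CPU waits for the GPU; it is not the actual SETI@Home special application, and the flag choice shown is an assumption for illustration.

```c
/* Hypothetical sketch: picking how the CPU waits while the GPU works.
 * A spin wait reacts fastest but keeps a CPU core busy; a blocking sync
 * frees the core at the cost of a little latency. Not SETI@Home code.
 * Build and link against the CUDA runtime (e.g. with nvcc). */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    /* Must be set before the CUDA context is created. Alternatives include
     * cudaDeviceScheduleSpin and cudaDeviceScheduleYield. */
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDeviceFlags: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* ... launch kernels here ... */

    /* The CPU now waits for the GPU according to the policy chosen above. */
    cudaDeviceSynchronize();
    return 0;
}
```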
Joined: 17 Nov 16 · Posts: 884
Well, the OpenCL developer had two years of development time on an OpenCL app, with help from the Nvidia app developer, trying to get the Nvidia CUDA source code to work on the OpenCL platform. He was never successful. Also, the code only ran on Linux because of some major issues with what Windows allowed. So no Intel or AMD GPUs ever had the benefit of the CUDA app speedup. But the OpenCL application was the stock SETI application and ran very well on any platform and card vendor. It was just much slower than the CUDA application on the Linux platform.
Joined: 25 May 09 · Posts: 1295
What is it that Windows disallows? I thought Linux was more stringent with code; it is the more secure system, after all.

Windows does not allow the use of a particular data-transfer mode that is available under Linux. This mode is to do with the movement of data between pairs of processors within a single computer, not between computers. I'm sure Keith will be along to give you the War and Peace version, but let's just say it is very efficient at moving data between CPUs and nVidia processors, less so between CPUs and AMD processors, and Windows doesn't have the appropriate segments in its kernel to let it work (and, unlike Linux, Microsoft is very protective of its Windows kernel code and actively pursues those who would do anything to it).
Joined: 25 May 09 · Posts: 1295
They probably did think about it, but for reasons unknown have never bothered with it :-(

Protecting the kernel in this context is not like a programmer finding a way round an obstacle (something a very large number of Windows application programmers have done over the years); it would actually require changing the kernel code itself to enable features that are (as far as I'm aware) hard-blocked from use. (I do know that, as Keith alluded to, there were a few Windows application programmers who attempted to get around this issue, but Windows always got in the way and stopped it from being accessed.)
Joined: 25 May 09 · Posts: 1295
I also found one of the two Universe@home (CPU) apps ran faster on Linux. Not sure why; someone said it was memory access handling.

Yes - that is how the data is transferred between the CPU and GPU: one writes it to memory and the other reads it (albeit over the PCIe bus).
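As a minimal sketch of that write-then-read transfer, here is what the usual copy from host memory to the card looks like with the CUDA runtime API in C. It is an illustration only, not project code, and error checks are omitted for brevity.

```c
/* Minimal sketch (illustration only, error checks omitted): the host
 * writes a buffer into its own memory, then copies it across the PCIe
 * bus into GPU memory, where kernels can read it. */
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t n = 1 << 20;                 /* a million floats */
    float *host = malloc(n * sizeof *host);   /* ordinary host memory */
    float *device = NULL;

    for (size_t i = 0; i < n; i++)
        host[i] = (float)i;                   /* CPU writes the data */

    cudaMalloc((void **)&device, n * sizeof *device);

    /* The transfer itself: host memory -> device memory over PCIe. */
    cudaMemcpy(device, host, n * sizeof *host, cudaMemcpyHostToDevice);

    /* ... GPU kernels read `device` here ... */

    cudaFree(device);
    free(host);
    return 0;
}
```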
Joined: 17 Nov 16 · Posts: 884
Is it not possible to design Cuda and OpenCL directly into the same chip?

You don't design CUDA or OpenCL into the chip architecture. You write an API that accesses the hardware in the fashion you want to use. You can run either CUDA or OpenCL on Nvidia hardware; you are not locked into CUDA only. But the AMD OpenCL API does not do CUDA. You can do CUDA emulation in OpenCL, though obviously not as fast as running CUDA instructions natively.

I also found one of the two Universe@home (CPU) apps ran faster on Linux. Not sure why, someone said it was memory access handling.

No, it's a culmination of a lot of little things, from the memory access mechanism and the kernel scheduler to the high-level libraries each OS uses. The latest big bump in performance for Universe@home tasks on the Linux OSes was simply moving from the 2.27 glibc library to the 2.31 glibc library. This happened in the move from Ubuntu 18 and its ilk to Ubuntu 20. If you were on the Ubuntu 18.04 HWE branch you were already running the same kernel as the default in Ubuntu 20.04, so with the kernel being the same, the only thing that changed between the two versions was the glibc library update. There were improvements and updates to the GNU C library that were very beneficial to the way the Universe@home CPU application works. It dropped the crunching times dramatically as soon as you updated your OS from 18 to 20, on every host you ran.
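If you want to check which GNU C library one of your Linux hosts is actually running (2.27 on stock Ubuntu 18.04 versus 2.31 on Ubuntu 20.04), `ldd --version` will print it, or a tiny glibc-specific program can report it at run time. A minimal sketch:

```c
/* Minimal sketch (glibc-specific): print the GNU C library version the
 * program is actually running against, e.g. 2.27 on Ubuntu 18.04 or
 * 2.31 on Ubuntu 20.04. */
#include <stdio.h>
#include <gnu/libc-version.h>

int main(void)
{
    printf("glibc version: %s\n", gnu_get_libc_version());
    return 0;
}
```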
Joined: 28 Jun 10 · Posts: 2636
Whatever changed in there needs to be suggested to the writers of the Windows equivalent.

Suggesting is fine, but it still depends on having someone with exactly the right skill set and the inclination to do so.
Joined: 25 May 09 · Posts: 1295
The API needs at least two parts to be in place. At its simplest it is a list of calls and parameters that applications use to interface with the rest of the system. It lays down a set of "rules" for all sorts of things, including inter-processor data transfer and synchronisation. Part of the OS monitors these interfaces to make sure that the rules are being obeyed. Now, if the rules do not allow a particular mode of data transfer, but the application makes a call through the API (the only way it can do the transfer), the call will be either blocked or re-cast into an allowed mode. Which of the two happens is a function of the OS, and there is little or nothing the application can do about the outcome. One might argue that a "well behaved" application, on receiving a "don't do that again" message from the operating system, would then revert to an allowed transfer mode. In reality there is a lot more to the communication process, the detail of which is highly dependent upon the operating system; let's just say that Linux generally has a reduced number of API calls for standard operations, but each call carries more information.
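As a rough illustration of that "blocked or re-cast" behaviour, here is a hypothetical sketch in C using the CUDA runtime: the application asks for mapped (zero-copy) host memory and, if the platform refuses, falls back to an ordinary allocation with explicit copies. The mapped allocation is only a stand-in; the thread does not say which specific transfer mode Windows blocks.

```c
/* Hypothetical sketch: try a preferred transfer mode, fall back to an
 * allowed one if the platform refuses it. Mapped (zero-copy) host memory
 * stands in for the preferred mode; it is not necessarily the mode the
 * thread is talking about. Error handling kept minimal. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t n = 1 << 20;
    float *host = NULL;
    float *device = NULL;

    /* Ask for mapped host allocations before the CUDA context is created. */
    cudaSetDeviceFlags(cudaDeviceMapHost);

    /* Preferred path: page-locked host memory the GPU can address directly. */
    cudaError_t err = cudaHostAlloc((void **)&host, n * sizeof *host,
                                    cudaHostAllocMapped);
    if (err == cudaSuccess) {
        cudaHostGetDevicePointer((void **)&device, host, 0);
        printf("using mapped host memory (no explicit copies needed)\n");
    } else {
        /* Fallback: ordinary pageable memory plus an explicit copy. */
        printf("mapped allocation refused (%s); falling back to cudaMemcpy\n",
               cudaGetErrorString(err));
        host = malloc(n * sizeof *host);
        memset(host, 0, n * sizeof *host);
        cudaMalloc((void **)&device, n * sizeof *device);
        cudaMemcpy(device, host, n * sizeof *host, cudaMemcpyHostToDevice);
    }

    /* ... kernels use `device` either way ... */
    return 0;
}
```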
Joined: 17 Nov 16 · Posts: 884
I'm lost now. I'm wondering why Nvidia cards can't do OpenCL well. If it's nothing to do with the hardware, I see no reason they can't make it run as well as AMD's.

I'm pretty sure that the Nvidia developers feel no urgent need to optimize their OpenCL software on their hardware when their CUDA API is so much faster and works better. Why would they make their competitors' cards a viable crunching option to a purchaser when they have an alluring high-performance product of their own that pads the bottom line of the profit sheet?

I don't know anything about Windows or what libraries it uses.

Since the beneficial change occurred in the GNU C library between release versions, and I assume that Windows does not use the GNU C library but its own, the problem with the long-running Universe tasks can be laid at the feet of the unoptimized libraries that Windows uses.
Joined: 25 May 09 · Posts: 1295
Yes and no - there is the core set of C++ libraries, but there are a number of additional libraries used (and the same applies to both Linux and Windows), the operating-system-specific OpenCL library being just one (or more!) of them.
Joined: 25 May 09 · Posts: 1295
Why would they make their competitors' cards a viable crunching option to a purchaser when they have an alluring high-performance product of their own that pads the bottom line of the profit sheet?

That is almost certainly a major consideration. The scalability of the nVidia/CUDA offering is really quite staggering, coupled with nVidia's very high performance offerings in the Quadro family, which are most certainly in the "Loadza dosh, loadza go" camp.
Joined: 17 Nov 16 · Posts: 884
While Nvidia in the past has been reluctant to move off the old OpenCL 1.2 standard, I saw recent encouraging news that they are contributing to the new OpenCL 3.0 standard and making efforts to have their hardware run more efficiently on the new standard. We will have to wait and see whether that is just propaganda to appease the masses, or whether their OpenCL performance improves to something approaching parity with AMD's OpenCL performance.
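One way to see which standard a driver actually exposes is to ask each installed OpenCL platform for its version string; Nvidia's platform has historically reported OpenCL 1.2, so this is a quick check for whether a 3.0-capable driver has arrived. A minimal sketch, assuming the OpenCL headers and ICD loader are installed:

```c
/* Minimal sketch: list each installed OpenCL platform and the standard it
 * reports, e.g. "OpenCL 1.2 CUDA ..." versus "OpenCL 3.0 ...".
 * Build with something like: gcc cl_version.c -lOpenCL */
#define CL_TARGET_OPENCL_VERSION 120
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id platforms[8];
    cl_uint count = 0;

    if (clGetPlatformIDs(8, platforms, &count) != CL_SUCCESS || count == 0) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }

    for (cl_uint i = 0; i < count; i++) {
        char name[256], version[256];
        clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                          sizeof name, name, NULL);
        clGetPlatformInfo(platforms[i], CL_PLATFORM_VERSION,
                          sizeof version, version, NULL);
        printf("%s : %s\n", name, version);
    }
    return 0;
}
```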
Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.