AI WUs possible in the future?

ProDigit Joined: 8 Nov 19 Posts: 718	Message 107046 - Posted: 11 Feb 2022, 0:45:04 UTC
I think our biggest problem today is how processing-intensive WUs are. Most of them are run on 32-bit FP or 64-bit DP shaders. While those are precise, there are other ways of processing data. One of the newer ways is a total redesign of how WUs are processed, through AI. Rather than calculating exact points in a matrix, AI uses smaller shaders (often 4 or 8 bit) that produce results by calculating a piece of the problem and forwarding the rest to other shaders.
This method of processing is much less power hungry, as each pipeline basically runs only 4 bits (~170 transistors) vs 64 bits (~2720 transistors) per command. Much like ARM vs x86, the 4-bit pipelines will need to process more commands more frequently, meaning they'll be active most of the time. And keeping each transistor busy with work is more efficient than using only part of, e.g., a long 64-bit command.
AI thrives on low latency and on many (millions to billions of) transistors in 4 to 8 bit pipelines that are each either active or sleeping, whereas in 32/64-bit pipelines only a small portion of each command performs actual work, and the pipelines are more passive (they're turned off most of the time, waiting for data).
From a hardware perspective, AI calculations are the way of the future. Are we going to see WUs transition to AI?
ID: 107046
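To make the precision trade-off in the post above concrete, here is a minimal sketch (not anything a BOINC project is known to run) that computes the same dot product in full floating point and again after squeezing the inputs into 8-bit and 4-bit signed integers. The helper names `quantize` and `quantized_dot`, the symmetric linear scaling scheme, and the random test data are all assumptions made for illustration.

```python
import random

def quantize(values, bits=8):
    """Symmetric linear quantization to signed integers of the given width (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(v) for v in values) / qmax
    if scale == 0.0:                           # all-zero input: any scale works
        scale = 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def quantized_dot(a, b, bits=8):
    """Dot product with integer multiplies and adds, rescaled to a float at the end."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))   # integer-only accumulation
    return acc * sa * sb

random.seed(0)                                 # hypothetical test data
a = [random.uniform(-1.0, 1.0) for _ in range(1024)]
b = [random.uniform(-1.0, 1.0) for _ in range(1024)]

exact = sum(x * y for x, y in zip(a, b))       # Python floats are 64-bit doubles
approx8 = quantized_dot(a, b, bits=8)
approx4 = quantized_dot(a, b, bits=4)

print("64-bit float:", exact)
print("8-bit error :", abs(approx8 - exact))
print("4-bit error :", abs(approx4 - exact))
```

The integer versions only approximate the 64-bit result, and the coarser 4-bit grid leaves less room for each value. Whether that loss is acceptable depends on the application, which is the crux of the replies below.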

Les Bayliss Help desk expert Joined: 25 Nov 05 Posts: 1654	Message 107047 - Posted: 11 Feb 2022, 1:39:05 UTC - in response to Message 107046.
Unlikely for climate models.
ID: 107047

Richard Haselgrove Volunteer tester Help desk expert Joined: 5 Oct 06 Posts: 5080	Message 107051 - Posted: 11 Feb 2022, 9:11:57 UTC - in response to Message 107046.
GPUGrid has been experimenting with such an application (it's the application which will have to change, not the data in the WUs) for some months now. The researcher concerned described their plans some time ago in GPUGrid message 57766. Most of the development stages described in that post have now been solved, and the most recent test tasks have run to completion without error. We await the next step.
ID: 107051

Ian&Steve C. Joined: 24 Dec 19 Posts: 228	Message 107053 - Posted: 11 Feb 2022, 14:03:18 UTC - in response to Message 107051.
To be fair, what the OP is talking about isn't really the same as what GPUGRID is doing. The OP is talking about using actual AI on different hardware, with the effect of power savings. GPUGRID is using more or less normal computation techniques to train their AI agent. This trained agent will then be able to use a technique like the one the OP is referencing, but that's not what GPUGRID is distributing to us.
In my opinion, while I think the machine learning technique that GPUGRID is using is cool and interesting, it's quite wasteful from a resources perspective (using up to 32 threads of the CPU, with very little GPU utilization). They could train their agent MUCH faster and more efficiently if they reworked their code to leverage the tensor cores present in many Nvidia graphics cards (Titan V and the RTX line). This is hardware that was designed specifically for this kind of processing.
ID: 107053

ProDigit Joined: 8 Nov 19 Posts: 718	Message 107055 - Posted: 11 Feb 2022, 17:39:47 UTC Last modified: 11 Feb 2022, 17:40:53 UTC
There's more than tensor cores. Many ARM devices (like single-board computers), as well as a lot of GPUs, have 8-bit (or in some cases even 4-bit) shaders that could do the work. Think Nvidia Jetson/Xavier boards, and even Intel is experimenting with additional low-bit shader clusters as add-ons to their CPUs (kind of like the opposite of AVX256 and AVX512) to do very specific calculations.
If only Nvidia could cut their 32-bit shaders into 4 or 8 separate sections through software, current Nvidia GPUs could perform close to hundreds of Tflops, and upcoming GPUs could possibly reach single-digit Exaflops of data processing.
ID: 107055
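The idea of cutting a 32-bit lane into separate sections can be sketched in software. The example below packs four signed 8-bit values into one 32-bit word and then computes a four-way dot product with accumulation from two such words. This is comparable in spirit to the DP4A integer dot-product instruction that Nvidia exposes in CUDA as `__dp4a`, but the code is only a plain-Python emulation written for illustration: the function names `pack4`, `unpack4` and `dot4_accumulate` are hypothetical, and it says nothing about real hardware throughput.

```python
import struct

def pack4(lanes):
    """Pack four signed 8-bit integers into one unsigned 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *lanes))[0]

def unpack4(word):
    """Split a 32-bit word back into its four signed 8-bit lanes."""
    return list(struct.unpack("<4b", struct.pack("<I", word)))

def dot4_accumulate(word_a, word_b, acc=0):
    """Multiply the lanes pairwise, sum the four products, and add them to acc."""
    return acc + sum(x * y for x, y in zip(unpack4(word_a), unpack4(word_b)))

# Hypothetical inputs: two 32-bit words, each carrying four small integers.
wa = pack4([1, 2, 3, 4])
wb = pack4([5, 6, 7, 8])
print(hex(wa), hex(wb), dot4_accumulate(wa, wb, acc=10))
```

One 32-bit operand here carries four values instead of one, which is where the claimed throughput gain would come from. The catch is that each value is limited to the range -128 to 127.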

Copyright © 2024 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.