The NVIDIA A100 Tensor Core GPU is setting a new standard for the GPU world. Specially designed for data analytics, artificial intelligence, and high-performance computing, it delivers high-speed performance along with exciting new features.
E2E Networks recently conducted a webinar featuring Dr. Pallab Maji, Senior Solution Architect for Deep Learning, who talked about the NVIDIA A100 GPU and the new features available inside this remarkable GPU.
NVIDIA has been in the field of accelerated computing for more than 25 years. Initially, they used their GPUs to accelerate graphics, but they soon realized that graphics processing units (GPUs) are also excellent candidates for accelerating parallel workloads. Since artificial intelligence and many high-performance computing components are extremely parallel, GPUs are a natural choice of hardware for running these kinds of algorithms with very low latency and very high throughput.
GPUs are extremely fast, and each new generation performs faster still. However, to keep up with the speed of the GPUs, data needs to be delivered to them with very low latency.
On top of the hardware, they have a full-stack software platform developed on the CUDA architecture, a single architecture that spans the different GPUs across their portfolio. Achieving all of this together rests on three pieces: very high-speed hardware, full-stack software built on top of CUDA, and data-center-scale devices.
Coupled with this one architecture, they have been building out these accelerated computing workloads and serving their customers with the kind of processing power needed to handle today's workloads in the AI, ML, and HPC domains.
Numerous architectures have been developed that today surpass the human baseline of accuracy, both in computer vision and in natural language processing, and over this period the computational demand has grown more than 3,000-fold.
Today, millions upon millions of devices are being equipped with artificial intelligence; practically everyone alive is touched by the field in some form. Across these many kinds of workloads, the computational need is similar: artificial intelligence and machine learning undoubtedly require a massive amount of computational resources, but those resources also need to be provided at scale.
Today we need an elastic data center. The problem is that different kinds of GPUs and hardware accelerators are available for training, and separate ones for data analytics workloads and for inference workloads. So the main problem they see in a data center is that different kinds of hardware are needed to target the different workloads of artificial intelligence and machine learning, and at their different stages of development as well.
At NVIDIA, they started reimagining this situation and working towards a solution, and they came up with a few breakthroughs that they believe can power the next era of accelerated data centers.
In this quest, they came up with a new architecture and recently launched an Ampere-based GPU called the A100.
These A100 GPUs are available in two form factors, SXM and PCIe, and the performance gain they observed over the last generation is quite huge. They got around a 20X improvement over FP32 training on deep learning workloads, and on INT8 inference they again got around a 20X improvement over their previous-generation Volta architecture. And since the majority of high-performance computing workloads are carried out in FP64, the 2.5X improvement over the Volta architecture on HPC workloads is also a big gain.
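Much of the FP32 training speedup comes from the A100's TF32 math mode (a detail from NVIDIA's Ampere documentation rather than from the talk itself): TF32 keeps FP32's 8-bit exponent range but carries only 10 mantissa bits. As a rough, hypothetical sketch of what that precision reduction means, the helper below rounds float32 values to a 10-bit mantissa; it ignores edge cases such as infinities and NaNs.

```python
import numpy as np

def simulate_tf32(x):
    """Round float32 values to roughly TF32 precision (10 mantissa bits).

    TF32 keeps FP32's 8-bit exponent but only 10 of FP32's 23 mantissa
    bits, so this sketch rounds away the lowest 13 mantissa bits.
    Illustrative only; inf/NaN are not handled.
    """
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # round half up, then clear the 13 low mantissa bits
    bits = (bits + np.uint32(1 << 12)) & np.uint32(0xFFFFE000)
    return bits.view(np.float32)

# 2**-11 is below TF32's mantissa resolution at 1.0, so it rounds
# up to the nearest representable value, 1 + 2**-10
print(simulate_tf32(np.float32(1.0 + 2**-11)))
```

On real hardware none of this is done by hand; the Tensor Cores apply the reduced-precision multiply internally while accumulating in full FP32.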
Along with that, they have also introduced a new structured-sparsity acceleration feature, which gives them a huge performance boost: in the majority of deep learning models, the feature vectors become sparse in the deeper layers, and sparsity hurts performance heavily if not handled properly. With the newer sparsity acceleration in the Ampere architecture, they can turn that sparsity into a substantial speedup.
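Per NVIDIA's Ampere documentation, the accelerated pattern is a fine-grained 2:4 structured sparsity: in every group of four values, at most two are non-zero, and the sparse Tensor Cores skip the zeros. A minimal sketch of imposing that pattern on a weight matrix follows; the helper name is ours, and real deployments use NVIDIA's pruning tools rather than this plain magnitude rule.

```python
import numpy as np

def prune_2_of_4(weights):
    """Impose a 2:4 structured-sparsity pattern: in every contiguous
    group of four weights, keep the two largest magnitudes and zero
    the other two. Assumes the last dimension is a multiple of 4."""
    flat = weights.reshape(-1, 4).copy()
    # indices of the two smallest |w| in each group of four
    drop = np.argsort(np.abs(flat), axis=1)[:, :2]
    np.put_along_axis(flat, drop, 0.0, axis=1)
    return flat.reshape(weights.shape)

w = np.array([[0.9, -0.1, 0.05, -0.7],
              [0.2,  0.3, -0.4,  0.1]])
# each row keeps only its two largest-magnitude entries
print(prune_2_of_4(w))
```

Because the zero positions follow a fixed 2:4 layout, the hardware can store the surviving weights compactly and skip half the multiply-accumulates, which is where the performance boost comes from.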
Dr. Pallab Maji also discussed how the entire GPU can be broken into smaller GPUs as and when required, giving an elastic GPU that can be molded and shaped based on the requirement. Multi-Instance GPU (MIG) is the feature that gives these GPUs their elasticity.
With this technology, multiple different people can work on the same GPU, and each of them is guaranteed a fixed share of the GPU's resources.
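As a sketch of how this looks in practice, MIG is driven from `nvidia-smi` (commands per NVIDIA's MIG user guide; profile names such as `3g.20gb` are specific to the A100, and exact output depends on the driver version):

```shell
# Enable MIG mode on GPU 0 (requires admin rights and a GPU reset)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this GPU supports
sudo nvidia-smi mig -lgip

# Split GPU 0 into two 3g.20gb instances, each with a default compute instance
sudo nvidia-smi mig -i 0 -cgi 3g.20gb,3g.20gb -C

# The MIG devices now show up as separately schedulable GPUs
nvidia-smi -L
```

Each resulting MIG device has its own dedicated slice of compute, memory, and memory bandwidth, which is what makes the per-user guarantee possible.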
With this, any engineer, scientist, or researcher gains the ability to train their systems with multiple different hyperparameters, and to spend their time actually curating, formulating, and finding out how to build a use case or a model that is better in terms of accuracy, throughput, and so on.
So, as data scientists, engineers, and researchers, they can devote more time to hyperparameter tuning: experimenting with different types of architectures, figuring out how to add different components to their loss function, how to add diversity to their datasets, and how to apply different kinds of augmentation. It gives engineers the flexibility and time to address the different aspects of developing their model, rather than waiting for the model to train, getting the results, and only then trying to draw insights.
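The workflow described above can be sketched as a simple hyperparameter sweep; with MIG, each configuration below could run on its own GPU slice in parallel instead of queueing on one large GPU. `train_and_score` here is a hypothetical placeholder for a real training run.

```python
import itertools

def train_and_score(lr, batch_size):
    """Hypothetical stand-in for training a model and returning a
    validation score; a real version would launch a training job on
    one MIG slice. This toy score just makes the sketch runnable."""
    return 1.0 / (1.0 + abs(lr - 3e-4)) + batch_size / 1024

# the grid of hyperparameter configurations to try
grid = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [64, 128],
}
runs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
best = max(runs, key=lambda cfg: train_and_score(**cfg))
print(best)
```

The point is scheduling, not the search itself: with seven 1g.5gb slices, six of these runs could complete in roughly the wall-clock time of one.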