Nvidia, whose heritage lies in making chips for avid gamers, has introduced its first new GPU architecture in three years, and it's clearly designed to support the varied computing needs of artificial intelligence and machine learning.
The architecture, called Ampere, and its first iteration, the A100 processor, surpass the performance of Nvidia's current Volta architecture, whose V100 chip was in 94 of the top 500 supercomputers last November. The A100 has a remarkable 54 billion transistors, 2.5 times as many as the V100.
Tensor performance, so vital in AI and machine learning, has been significantly improved. FP16 floating-point calculations are almost 2.5x as fast as on the V100, and Nvidia introduced a new math mode called TF32. Nvidia claims TF32 can deliver up to 10-fold speedups compared with single-precision floating-point math on Volta GPUs.
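TF32 gets its speed by keeping FP32's 8-bit exponent range but trimming the mantissa to 10 explicit bits, the same as FP16. A minimal sketch of that trade-off, simulating the rounding in pure Python (this mimics the numeric format only, not the Tensor Core hardware):

```python
import struct

def round_to_tf32(x: float) -> float:
    """Round a float to TF32 precision: FP32's 8-bit exponent,
    but only the top 10 of FP32's 23 mantissa bits."""
    # Reinterpret the value as its 32-bit IEEE-754 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest on the 13 dropped mantissa bits, then clear them.
    bits = (bits + (1 << 12)) & ~((1 << 13) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 1/3 keeps only about three decimal digits at TF32 precision.
print(round_to_tf32(1 / 3))   # 0.333251953125
```

The dynamic range stays identical to FP32, which is why networks usually train in TF32 without code changes; only the last bits of precision are given up.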
This is significant because FP16 is useful for training, the compute-intensive part of machine learning, but overkill for inference, where the trained models are used to infer an outcome or result. So Nvidia added INT8 and INT4 support to the A100 to handle the simpler inference side and draw less power in the process. That means best-case performance for both training and inference from a single chip.
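The idea behind INT8 inference is simple quantization: trained weights are mapped onto small integers with a shared scale factor, cutting memory traffic and power while keeping enough precision for inference. A minimal sketch in plain Python (the symmetric per-tensor scheme and the sample values are illustrative, not Nvidia's implementation):

```python
def quantize_int8(values, scale):
    """Map floats to the INT8 range (-128..127) with a shared scale."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize_int8(q, scale):
    """Recover approximate floats from the INT8 representation."""
    return [v * scale for v in q]

weights = [0.91, -0.42, 0.07, -1.20]
scale = max(abs(w) for w in weights) / 127   # one scale for the whole tensor
q = quantize_int8(weights, scale)
print(q)                         # one byte per weight instead of four
print(dequantize_int8(q, scale)) # close to, but not exactly, the originals
```

INT4 follows the same pattern with a -8..7 range, halving the storage again at a further cost in precision.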
Memory performance is also significantly improved thanks to 40GB of HBM2 memory in the package delivering a total of 1.6TB/s of bandwidth. And from the looks of the A100 die, Nvidia did what Fujitsu has done with its A64FX processor and put the HBM2 right next to the processor.
The A100 also sports a new feature called Multi-Instance GPU (MIG), where a single A100 can be partitioned into up to seven virtual GPUs, each of which gets its own dedicated allocation of cores, L2 cache, and memory controllers. Think of it as virtualization for a GPU.
Finally, Ampere comes with a new version of Nvidia's high-speed interconnect, NVLink. The third generation nearly doubles the signaling rate, from 25.78Gbps per lane on NVLink 2 to 50Gbps on NVLink 3. Nvidia has also cut the number of lanes needed in half to reach the same per-link speed, which in turn lets the same lane budget carry twice the total throughput.
Nvidia CEO Jensen Huang made the Ampere announcement via video from his kitchen during the virtual GPU Technology Conference (GTC).
New cards and servers are ready
Nvidia is wasting no time bringing the A100 to market. It says the A100 is in production and announced the DGX A100 system. The box comes with eight A100 accelerators, as well as 15TB of storage, a pair of AMD Epyc 7742 CPUs with 64 cores each (you didn't think they were going to use Intel processors, did you?), 1TB of RAM, and Mellanox HDR InfiniBand controllers.
The DGX A100 will set you back $199,000, but it also packs 5 petaflops into a box the size of a small refrigerator, all dedicated to AI and machine learning.
Also, Nvidia's $7 billion merger with Mellanox is already bearing fruit in the form of the EGX A100 card, which combines an A100 Ampere-based GPU package with a Mellanox ConnectX-6 Dx NIC on a single card.
That gives the A100 200Gbps of networking without requiring any CPU processing and will let A100 GPUs communicate directly rather than go through the CPU. All of this means greater speed, since GPU-to-CPU communication adds steps and thus latency. The card can also connect to either InfiniBand or Ethernet fabrics. GPU-to-GPU communication over InfiniBand means HPC is about to see a major leap in performance.