
Blackwell GPU Boosts Performance Beyond Hopper Architecture

March 19, 2024


Accelerated computing has reached a tipping point, according to Nvidia CEO Jensen Huang, who outlined the company's strategy at the Nvidia GTC event in San Jose. Rather than focusing on cost reduction, the plan is to enhance performance capabilities, and Huang used his keynote address to introduce the Blackwell AI superchip.

The Blackwell AI superchip is designed to democratize trillion-parameter AI, with Huang citing models on the order of 1.8 trillion parameters as the class of workload it targets. At the event, held in the home arena of the San Jose Sharks ice hockey team, Huang unveiled the Blackwell platform and highlighted key features such as the second-generation Transformer Engine and the fifth-generation NVLink high-speed GPU interconnect.

Named after the renowned mathematician and game theorist David Blackwell, the Blackwell AI superchip is a groundbreaking development in the field of generative AI. It delivers an impressive 20 PFLOPS (FP4) or 10 PFLOPS (FP8) on a single GPU, thanks to an innovative design that joins two reticle-sized dies so they operate as one unified CUDA GPU.

This unique architecture represents a new era in supercomputing, akin to seamlessly connecting the two hemispheres of a brain. With 192GB of HBM3e, 8TBps of HBM bandwidth, and 1.8TBps of NVLink bandwidth, the superchip offers four times the training speed, 30 times the inference throughput, and 25 times the energy efficiency of its Hopper predecessor, according to Huang.

With its 192GB of HBM3e fast memory, the Blackwell AI superchip is poised to revolutionize AI datacenter scalability, potentially enabling configurations with over 100k GPUs. Another notable innovation highlighted by Huang is the second-generation Transformer Engine, which optimizes performance by tracking the dynamic range of every tensor in every layer and applying fine-grained 4-bit precision where accuracy allows.
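
To make that idea concrete, here is a minimal sketch of block-wise scaled 4-bit quantization, the general principle behind tracking a tensor's dynamic range before reducing its precision. It is only an illustration: the block size, symmetric integer rounding, and NumPy helper functions are assumptions made for this example, not Nvidia's FP4 format or the Transformer Engine's actual implementation.

    # Illustrative sketch only: block-wise ("micro-tensor") scaling to a signed
    # 4-bit range, i.e. track each block's dynamic range, then quantize.
    # Block size and rounding scheme are assumptions, not Nvidia's FP4 format.
    import numpy as np

    BLOCK = 32   # assumed number of values sharing one scale factor
    QMAX = 7     # signed 4-bit integers span [-8, 7]; use symmetric [-7, 7]

    def quantize_4bit(x: np.ndarray):
        """Quantize a 1-D float tensor to 4-bit integers, one scale per block."""
        pad = (-len(x)) % BLOCK
        blocks = np.pad(x, (0, pad)).reshape(-1, BLOCK)
        scales = np.abs(blocks).max(axis=1, keepdims=True) / QMAX
        scales[scales == 0] = 1.0                       # avoid division by zero
        q = np.clip(np.rint(blocks / scales), -QMAX, QMAX).astype(np.int8)
        return q, scales, len(x)

    def dequantize_4bit(q, scales, n):
        """Reconstruct an approximate float tensor from 4-bit codes and scales."""
        return (q.astype(np.float32) * scales).reshape(-1)[:n]

    x = np.random.randn(100).astype(np.float32)
    q, s, n = quantize_4bit(x)
    x_hat = dequantize_4bit(q, s, n)
    print("max abs error:", np.abs(x - x_hat).max())

The design point the sketch illustrates is that each small block carries its own scale factor, so an outlier in one part of a tensor does not force the rest of the tensor onto a coarser 4-bit grid.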
