NVIDIA Blackwell Configurations & Specifications
Available in four models, Blackwell GPUs cater to a range of computing needs:
- NVIDIA DGX™ GB200 NVL72: A liquid-cooled Blackwell platform that connects 36 Grace CPUs and 72 Blackwell GPUs in a rack-scale design.
- NVIDIA DGX™ B200: A Blackwell platform that combines eight NVIDIA Blackwell GPUs (on an HGX baseboard) with next-gen data center x86 processors to deliver 72 petaFLOPS of training and 144 petaFLOPS of inference GPU compute. DGX B200 will be available to purchase from Exxact upon release.
- NVIDIA HGX™ B200: A Blackwell platform with the same architecture as the DGX B200. It is the eight-Blackwell-GPU baseboard that will be offered in custom configurations by solutions partners and by us at Exxact.
- NVIDIA HGX™ B100: An x86 platform based on a cut-down eight-Blackwell-GPU baseboard, delivering 112 petaFLOPS of AI compute. HGX B100 is a drop-in-compatible upgrade for NVIDIA Hopper™ systems in existing data center infrastructure.
Configuration Specifications
| | GB200 NVL72 | HGX B200 | HGX B100 |
|---|---|---|---|
| Blackwell GPUs | 72 | 8 | 8 |
| FP4 Tensor Core | 1,440 petaFLOPS | 144 petaFLOPS | 112 petaFLOPS |
| FP8/FP6 Tensor Core | 720 petaFLOPS | 72 petaFLOPS | 56 petaFLOPS |
| INT8 Tensor Core | 720 petaOPS | 72 petaOPS | 56 petaOPS |
| FP16/BF16 Tensor Core | 360 petaFLOPS | 36 petaFLOPS | 28 petaFLOPS |
| TF32 Tensor Core | 180 petaFLOPS | 18 petaFLOPS | 14 petaFLOPS |
| FP32 | 6,480 teraFLOPS | 640 teraFLOPS | 480 teraFLOPS |
| FP64 | 3,240 teraFLOPS | 320 teraFLOPS | 240 teraFLOPS |
| Total GPU Memory | 13.5TB | Up to 1.5TB | Up to 1.5TB |
| Aggregate Memory Bandwidth | 576TB/s | Up to 64TB/s | Up to 64TB/s |
| Aggregate NVLink Bandwidth | 130TB/s | 14.4TB/s | 14.4TB/s |
| CPU Cores | 2,592 Arm Neoverse V2 cores | -- | -- |
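The rack-scale figures above follow from the per-chip numbers in the next table. As a quick sanity check, here is a minimal sketch in Python; the 2:1 sparsity factor and the Grace LPDDR5X capacity are assumptions based on NVIDIA's published footnotes, not values from the tables:

```python
# Sanity-check the GB200 NVL72 rack-scale figures against per-chip specs.
gpus = 72
superchips = 36

hbm_gb_per_gpu = 192                 # 2 dies x 96GB HBM3e (see GPU table below)
hbm_bw_tbps_per_gpu = 8              # 8TB/s per Blackwell GPU
nvlink_tbps_per_gpu = 1.8            # NVLink 5
fp4_dense_pflops_per_superchip = 20  # GB200 dense FP4 (see GPU table below)

print(gpus * hbm_gb_per_gpu / 1024)   # 13.5  -> 13.5TB total GPU memory
print(gpus * hbm_bw_tbps_per_gpu)     # 576   -> 576TB/s aggregate bandwidth
print(gpus * nvlink_tbps_per_gpu)     # 129.6 -> ~130TB/s aggregate NVLink
# The 1,440 petaFLOPS FP4 figure appears to include 2:1 structured sparsity:
print(superchips * fp4_dense_pflops_per_superchip * 2)  # 1440
# 36 Grace CPUs x up to 480GB LPDDR5X (assumed) plus HBM: ~30TB fast memory
print(36 * 480 / 1024 + gpus * hbm_gb_per_gpu / 1024)   # ~30.4
```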
GPU Option Specifications
| | GB200 | B200 | B100 |
|---|---|---|---|
| Type | Grace Blackwell Superchip | GPU Accelerator | GPU Accelerator |
| Memory Clock | 8Gbps HBM3e | 8Gbps HBM3e | 8Gbps HBM3e |
| Memory Bandwidth | 16TB/s | 8TB/s | 8TB/s |
| VRAM | 384GB (2x2x96GB) | 192GB (2x96GB) | 192GB (2x96GB) |
| FP4 Dense Tensor | 20 PFLOPS | 9 PFLOPS | 7 PFLOPS |
| INT8/FP8 Dense Tensor | 10 POPS/PFLOPS | 4.5 POPS/PFLOPS | 3.5 POPS/PFLOPS |
| FP16 Dense Tensor | 5 PFLOPS | 2.2 PFLOPS | 1.8 PFLOPS |
| TF32 Dense Tensor | 2.5 PFLOPS | 1.1 PFLOPS | 0.9 PFLOPS |
| FP64 Dense Tensor | 90 TFLOPS | 40 TFLOPS | 30 TFLOPS |
| NVLink Bandwidth | NVLink 5 (2x 1,800GB/s) | NVLink 5 (1,800GB/s) | NVLink 5 (1,800GB/s) |
| PCIe 6.0 Bandwidth | 2x 256GB/s | 256GB/s | 256GB/s |
| GPU | 2x Blackwell GPU | Blackwell GPU | Blackwell GPU |
| GPU Transistor Count | 416B (2x2x104B) | 208B (2x104B) | 208B (2x104B) |
| TDP | 2700W | 1000W | 700W |
| Manufacturing Process | TSMC 4NP | TSMC 4NP | TSMC 4NP |
| Interface | Superchip | SXM-Next (unconfirmed) | SXM-Next (unconfirmed) |
| Architecture | Grace + Blackwell | Blackwell | Blackwell |
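The memory bandwidth rows follow from the HBM3e memory clock. A rough back-of-the-envelope check, assuming eight 1,024-bit HBM3e stacks per B200 package (the stack count is an assumption rather than a value from the table):

```python
# Derive B200 memory bandwidth from the 8Gbps HBM3e memory clock.
# Assumption: eight 1,024-bit HBM3e stacks, i.e. an 8,192-bit aggregate bus.
pin_speed_gbps = 8           # per-pin data rate ("Memory Clock" above)
bus_width_bits = 8 * 1024    # assumed stack count x HBM stack width

bandwidth_gb_per_s = pin_speed_gbps * bus_width_bits / 8  # bits -> bytes
print(bandwidth_gb_per_s)    # 8192 GB/s, i.e. the ~8TB/s figure per B200
```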
Blackwell DGX GB200 NVL72
The NVIDIA GB200 NVL72 cluster connects 36 GB200 Superchips (36 Grace CPUs and 72 Blackwell GPUs) in a rack-scale design. The GB200 NVL72 is a liquid-cooled, rack-scale, 72-GPU NVLink-connected powerhouse that can act as one massive GPU with up to 13.5TB of GPU memory or 30TB of total fast memory.
It introduces new performance capabilities for trillion-parameter AI, complex data analytics, and high-performance computing workloads. A rack-integrated NVLink Spine and full-rack liquid cooling deliver 1.8TB/s GPU-to-GPU interconnect speeds while addressing the power, cooling, and rack-density challenges posed by a 72-GPU deployment.
The NVLink Spine used in the DGX GB200 NVL72 turns 72 individual GPUs into a single compute node that scales further through advanced networking options. Paired with NVIDIA Quantum InfiniBand, Spectrum-X800 Ethernet, and BlueField-3 DPUs, AI factories spanning thousands of Grace Blackwell compute nodes set a new standard for performance, efficiency, and density in data centers running the largest AI models.
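On the software side, a 72-GPU NVLink domain means collectives such as all-reduce run entirely over NVLink without touching the InfiniBand or Ethernet fabric. Here is a minimal sketch of such a collective with PyTorch and NCCL; this is generic distributed code, not GB200-specific, and it assumes ranks are launched with a tool like `torchrun`:

```python
# Minimal all-reduce across GPUs in one NVLink domain (PyTorch + NCCL).
# NCCL automatically routes intra-domain traffic over NVLink; rank and
# world-size environment variables are assumed to be set by the launcher.
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes a gradient-sized tensor; the 1.8TB/s per-GPU
    # NVLink bandwidth is what keeps this fast at rack scale.
    grad = torch.full((1 << 20,), float(rank), device="cuda")
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)

    if rank == 0:
        print(grad[0].item())  # sum over all ranks: 0 + 1 + ... + (world-1)

if __name__ == "__main__":
    main()
```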
NVIDIA DGX/HGX B200 - The DGX H100 Successor
Alongside the AI factories built with NVIDIA’s Grace CPU, NVIDIA continues to support x86 deployments with its classic gold-box DGX. The NVIDIA DGX B200 is designed as the true successor to the DGX H100, offering one of the most powerful GPU compute nodes for AI development and deployment. DGX B200 offers 3x the AI training performance and 15x the real-time inference throughput of the DGX H100 on GPT-MoE-1.8T (a 1.8-trillion-parameter mixture-of-experts model).
DGX B200 houses an HGX system board featuring eight Blackwell GPUs (16 Blackwell dies) connected through NVIDIA’s fifth-generation NVLink. Multiple NVIDIA DGX B200 systems can be further interconnected via the NVLink Switch System, similar to deployments with Hopper.
Apart from the DGX, the NVIDIA HGX B200 system board will be the building block for HGX servers from solutions partners, each with dedicated features. Expect HGX B200-equipped servers from the likes of Supermicro, ASUS, and Exxact.
NVIDIA HGX B100 - Easy Upgrade from Hopper HGX
A slightly cut-down version of the HGX B200, the HGX B100 offers easy deployment for existing NVIDIA Hopper HGX users, including owners of DGX H100 and H200 as well as HGX H100 and H200 systems.
The NVIDIA HGX B100 offers the same new features, such as the second-generation Transformer Engine, next-gen Tensor Cores, and next-gen NVLink, but can be easily swapped into existing systems. The system board is a plug-and-play, drop-in replacement: slide out the Hopper HGX and replace it with a Blackwell HGX for fast deployment and increased compute.
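After a swap, the first sanity check is that the host enumerates the new board's GPUs with the expected memory capacity. A short sketch using NVIDIA's NVML Python bindings (the `nvidia-ml-py` package); it simply reports whatever board is installed, Hopper or Blackwell:

```python
# Enumerate GPUs via NVML to verify a baseboard swap was detected.
# Requires: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    print(f"{count} GPUs detected")
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)   # GPU model string from the driver
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB")
finally:
    pynvml.nvmlShutdown()
```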
NVIDIA Blackwell’s Role in the Age of AI
As generative AI models grow ever larger to accomplish tasks at higher fidelity, there needs to be a way to power these trillion-parameter models. These highly complex models are the future of accelerated computing, from finding cures for cancer to predicting weather events to automating entire robotic fleets.
The journey to large-scale artificial intelligence starts with computational resources delivered by the best hardware available. New, extremely large LLMs and generative AI models require vast amounts of compute not only to train but also to run for inference.
NVIDIA Blackwell is a generational leap, with the performance and energy efficiency needed to train and serve these generative AI models and the goal of democratizing the usage and deployment of foundation models. With NVIDIA Blackwell and the NVIDIA GPU architectures that follow it, we can expect AI models to become more complex and capable than ever before.