HPC

Comparing NVIDIA Tensor Core GPUs - NVIDIA B200, B100, H200, H100, A100

July 25, 2024
6 min read

What are NVIDIA Tensor Core GPUs?

Formerly sold under the Tesla family moniker, NVIDIA Tensor Core GPUs are the gold standard for AI computation thanks to an architecture designed specifically to execute the calculations found in AI and neural networks.

Tensors are AI's most fundamental data type: multidimensional arrays that hold a model's weights. To train on these arrays, matrix multiplication is run extensively to update the weights and enable neural networks to learn.
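
To make this concrete, here is a minimal sketch of one training step on a tiny linear layer, written in PyTorch (our choice of framework for illustration, not anything prescribed by NVIDIA). The forward pass, the backward pass, and the weight update all reduce to the tensor matrix multiplications that Tensor Cores accelerate.

```python
# A minimal sketch: one gradient-descent step on a 2-D weight tensor.
import torch

weights = torch.randn(512, 256, requires_grad=True)  # multidimensional weight array
inputs = torch.randn(64, 512)                        # a batch of input activations
targets = torch.randn(64, 256)

outputs = inputs @ weights                  # matrix multiplication (matmul)
loss = torch.nn.functional.mse_loss(outputs, targets)
loss.backward()                             # the backward pass is also built on matmuls

with torch.no_grad():
    weights -= 1e-3 * weights.grad          # update the weights; the network "learns"
```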

Tensor Cores were first introduced in the Tesla V100. With the next GPU generation, Ampere, NVIDIA ditched the Tesla name in favor of the Tensor Core GPU naming convention to sharpen the marketing behind its flagship GPUs. The NVIDIA A100 Tensor Core GPU was a revolutionary high-performance GPU dedicated to accelerating AI computation.

It’s 2024, and NVIDIA’s lineup of Tensor Core GPUs has expanded. We want to offer a full-scale comparison of these GPUs’ specifications and comprehensive recommendations for deploying the world’s most powerful GPUs for AI. For reference, we will focus on the SXM variants of each GPU, which can be found in the NVIDIA DGX and NVIDIA HGX platforms.

  • NVIDIA B200 Tensor Core GPU (Blackwell 2025)
  • NVIDIA B100 Tensor Core GPU (Blackwell 2025)
  • NVIDIA H200 Tensor Core GPU (Hopper 2024)
  • NVIDIA H100 Tensor Core GPU (Hopper 2022)
  • NVIDIA A100 Tensor Core GPU (Ampere 2020)

Read this blog to learn more about the various NVIDIA Blackwell Deployments.

B200 vs B100 vs H200 vs H100 vs A100 SXM

Before we get into the details, it is worth noting that the A100 is EOL and the H100 has been supplanted by the H200. Existing stock of the H100 is still available, but the H200 will supersede any new orders. The NVIDIA B200 and B100 will be available to order soon and are expected to arrive in 2025. Even though some of these GPUs are EOL, we include the legacy NVIDIA A100 and NVIDIA H100 for those who have existing deployments.

| Specification | NVIDIA B200 | NVIDIA B100 | NVIDIA H200 | NVIDIA H100 | NVIDIA A100 |
| --- | --- | --- | --- | --- | --- |
| Architecture | Blackwell | Blackwell | Hopper | Hopper | Ampere |
| FP64 | 40 teraFLOPS | 30 teraFLOPS | 34 teraFLOPS | 34 teraFLOPS | 9.7 teraFLOPS |
| FP64 Tensor Core | 40 teraFLOPS | 30 teraFLOPS | 67 teraFLOPS | 67 teraFLOPS | 19.5 teraFLOPS |
| FP32 | 80 teraFLOPS | 60 teraFLOPS | 67 teraFLOPS | 67 teraFLOPS | 19.5 teraFLOPS |
| TF32 Tensor Core | 2.2 petaFLOPS | 1.8 petaFLOPS | 989 teraFLOPS | 989 teraFLOPS | 312 teraFLOPS |
| FP16/BF16 Tensor Core | 4.5 petaFLOPS | 3.5 petaFLOPS | 1,979 teraFLOPS | 1,979 teraFLOPS | 624 teraFLOPS |
| INT8 Tensor Core | 9 petaOPS | 7 petaOPS | 3,958 teraOPS | 3,958 teraOPS | 1,248 teraOPS |
| FP8 Tensor Core | 9 petaFLOPS | 7 petaFLOPS | 3,958 teraFLOPS | 3,958 teraFLOPS | - |
| FP4 Tensor Core | 18 petaFLOPS | 14 petaFLOPS | - | - | - |
| GPU Memory | 192GB HBM3e | 192GB HBM3e | 141GB HBM3e | 80GB HBM3 | 80GB HBM2e |
| Memory Bandwidth | Up to 8TB/s | Up to 8TB/s | 4.8TB/s | 3.2TB/s | 2TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG | 5 NVDEC, 5 JPEG |
| Multi-Instance GPU | Up to 7 MIGs @ 23GB | Up to 7 MIGs @ 23GB | Up to 7 MIGs @ 16.5GB | Up to 7 MIGs @ 16.5GB | Up to 7 MIGs @ 10GB |
| Interconnect | NVLink 1.8TB/s | NVLink 1.8TB/s | NVLink 900GB/s | NVLink 900GB/s | NVLink 600GB/s |
| NVIDIA AI Enterprise | Yes | Yes | Yes | Yes | EOL |
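
If you want to check an existing deployment against this table, a quick sketch like the one below can report what each node actually exposes. It uses PyTorch's CUDA introspection; the script is only an illustration and assumes a CUDA-enabled PyTorch build.

```python
# Inventory the GPUs visible to this node along with their memory capacity.
import torch

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name} | "
          f"{props.total_memory / 1024**3:.0f} GB memory | "
          f"{props.multi_processor_count} SMs | "
          f"compute capability {props.major}.{props.minor}")
```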

Looking at raw performance, defined as the number of floating-point operations performed per second at a given precision, the NVIDIA Blackwell GPUs sacrifice FP64 Tensor Core performance in favor of heavily increased Tensor Core throughput at FP32 and below. Training AI doesn’t require full 64-bit precision in its weight and parameter calculations, so by trading away FP64 Tensor Core performance, NVIDIA extracts more throughput at the more standard 32-bit and 16-bit precisions.

The NVIDIA B200 delivers over 2x the H100’s Tensor Core throughput in TF32, FP16, and FP8 (4.5 petaFLOPS versus 1,979 teraFLOPS in FP16/BF16, roughly 2.3x), and adds the ability to calculate in FP4. These lower precisions won’t be used for entire calculations, but when incorporated into mixed-precision workloads, the performance gains are unparalleled.
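
In practice, those gains come from mixed-precision training rather than running an entire model at low precision. The sketch below uses PyTorch autocast as an illustration (it is not Blackwell-specific): matmuls execute in BF16 on Tensor Cores while the master weights and optimizer state stay in FP32. FP8 and FP4 paths require additional tooling, such as NVIDIA's Transformer Engine, and are not shown here.

```python
# Minimal mixed-precision training step: forward pass under autocast (BF16),
# gradients and weight updates kept in FP32.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x = torch.randn(32, 1024, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).square().mean()  # matmul runs in BF16 on Tensor Cores
loss.backward()                      # gradients flow back into FP32 weights
optimizer.step()
optimizer.zero_grad()
```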

NVIDIA B200 vs NVIDIA H100 in AI Training & Inference

Do You Need to Upgrade?

“Newer is always better” is often true in computer hardware. In the case of these enterprise-level GPUs, we prefer to think of a new generation as an opportunity for additional hardware scalability.

Take the NVIDIA H100 and the NVIDIA B200:

  • If your organization is looking to purchase a new deployment such as a BasePOD or SuperPOD, keep your existing NVIDIA deployment in service but shift the workloads: Blackwell is extremely effective and revolutionary in inference performance, with up to a 15x speedup, and delivers a substantial AI training uplift as well.
  • If your organization is looking to replace your H100s or H200s, we recommend shifting the workloads around instead. Continue training on your older NVIDIA H100s and dedicate Blackwell’s inference performance to deploying and delivering your models to clients faster. The H100 and H200 also retain strong FP64 performance for HPC workloads (67 teraFLOPS FP64 Tensor Core versus 40 on the B200), so simulation and analytics workloads can run on Hopper while AI tasks are allocated to Blackwell.
  • If your organization wants to deploy a computing infrastructure for AI now, the Hopper H200 is available today and, we expect, offers competitive AI training performance at a lower cost than the Blackwell B200. You can continue to build out your AI data center when Blackwell becomes available.

As NVIDIA continues to innovate, you can transition gradually toward newer hardware. Be mindful of return on investment when deploying these machines, since they can get quite costly. Large-scale infrastructures take a long time to build and need time to realize their value, and even when a new generation of Tensor Core GPUs arrives, last-generation hardware can still deliver exceptional performance.

As an NVIDIA Elite Partner, Exxact has dedicated experts to get you to your computing infrastructure goals. If you have any questions about how to better utilize or scale up your NVIDIA hardware, contact Exxact today.

Accelerate AI Training on an NVIDIA DGX

Training AI models on massive datasets can be accelerated dramatically with the right system. It's not just a high-performance computer; it's a tool to propel and accelerate your research. Deploy multiple NVIDIA DGX nodes for increased scalability. DGX H200 is available to quote today, and the B200 is coming soon!

Get a Quote Today