
NVIDIA A100 Deep Learning Benchmarks for TensorFlow

December 8, 2020
6 min read

Showdown of the Data Center GPUs: A100 vs V100S

For this blog article, we conducted deep learning performance benchmarks for TensorFlow on NVIDIA A100 GPUs. We also compared these GPUs with their top-of-the-line predecessor, the Volta-powered NVIDIA V100S.

Our deep learning server was fitted with 8 NVIDIA A100 PCIe GPUs. We ran the standard “tf_cnn_benchmarks.py” benchmark script from the official TensorFlow benchmarks repository on GitHub. The neural networks we tested were ResNet-50, ResNet-152, Inception v3, and Inception v4, and we ran each test in 1-, 2-, 4-, and 8-GPU configurations. For each test, we used the largest batch size that fit into available GPU memory.
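The sweep above can be sketched as a small driver script. The flag names (`--model`, `--num_gpus`, `--batch_size`, `--use_fp16`) are real tf_cnn_benchmarks flags, but the script below is illustrative, not the exact harness used for this article:

```python
# Illustrative sweep over the GPU counts and precisions benchmarked above.
# Note: in tf_cnn_benchmarks, --batch_size is the batch size per GPU.

def benchmark_cmd(model, num_gpus, batch_size, fp16=False):
    """Build a tf_cnn_benchmarks.py command line for one configuration."""
    cmd = ["python", "tf_cnn_benchmarks.py",
           f"--model={model}", f"--num_gpus={num_gpus}",
           f"--batch_size={batch_size}"]
    if fp16:
        cmd.append("--use_fp16")
    return cmd

# Print the ResNet-50 command lines for every GPU count and both precisions.
for num_gpus in (1, 2, 4, 8):
    for fp16, batch in ((True, 512), (False, 256)):
        print(" ".join(benchmark_cmd("resnet50", num_gpus, batch, fp16)))
```

Each printed command would then be run on the benchmark host from a checkout of the benchmarks repository.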

Key Points and Observations

  • The NVIDIA A100 is an exceptional GPU for deep learning with performance unseen in previous generations.
  • The NVIDIA A100 scales very well up to 8 GPUs (and likely beyond, though we did not test further) in both FP16 and FP32.
  • Compared to the V100S, the A100 offers roughly 2x the performance in most cases, in both FP16 and FP32.

Interested in upgrading your deep learning server?
Learn more about Exxact deep learning servers featuring NVIDIA GPUs

NVIDIA A100 Deep Learning Benchmarks FP16

All throughput figures are in images per second.

| Model | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs | Batch Size |
| --- | --- | --- | --- | --- | --- |
| ResNet 50 | 2357.09 | 4479.18 | 8830.78 | 12481.25 | 512 |
| ResNet 152 | 988.9 | 1746.16 | 3036.46 | 5224.41 | 256 |
| Inception V3 | 1377.38 | 2639.79 | 4994.27 | 8117.57 | 512 |
| Inception V4 | 702.27 | 1318.51 | 2414.93 | 4305.89 | 256 |
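As a quick sanity check on the scaling claim, the ResNet 50 FP16 row above can be turned into speedup and scaling-efficiency figures with a few lines of plain Python (numbers transcribed from the table):

```python
# Multi-GPU scaling check using the ResNet 50 FP16 throughput (images/sec)
# from the table above.

resnet50_fp16 = {1: 2357.09, 2: 4479.18, 4: 8830.78, 8: 12481.25}

for n, throughput in resnet50_fp16.items():
    speedup = throughput / resnet50_fp16[1]      # vs a single GPU
    efficiency = speedup / n                     # fraction of ideal linear scaling
    print(f"{n} GPU(s): {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

Scaling stays near-linear through 4 GPUs (about 94% efficiency) and drops to roughly two-thirds efficiency at 8 GPUs.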

NVIDIA A100 Deep Learning Benchmarks FP32

All throughput figures are in images per second.

| Model | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs | Batch Size |
| --- | --- | --- | --- | --- | --- |
| ResNet 50 | 853.09 | 1652.98 | 3152.71 | 5871.22 | 256 |
| ResNet 152 | 364.65 | 666.88 | 1192.32 | 2110.92 | 128 |
| Inception V3 | 587.8 | 1130.1 | 2175.39 | 4062.41 | 256 |
| Inception V4 | 289.94 | 539.5 | 1012.28 | 1835.12 | 128 |
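Comparing the single-GPU columns of the two tables above also shows what mixed precision buys on the A100. A short calculation (values transcribed from the tables):

```python
# Mixed-precision gain: single-GPU FP16 throughput divided by FP32
# throughput, per model (images/sec, from the two tables above).

fp16 = {"ResNet 50": 2357.09, "ResNet 152": 988.9,
        "Inception V3": 1377.38, "Inception V4": 702.27}
fp32 = {"ResNet 50": 853.09, "ResNet 152": 364.65,
        "Inception V3": 587.8, "Inception V4": 289.94}

for model in fp16:
    print(f"{model}: FP16 is {fp16[model] / fp32[model]:.2f}x faster than FP32")
```

In these runs, FP16 delivers roughly 2.3x to 2.8x the FP32 throughput on a single A100, depending on the model.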


NVIDIA A100 PCIe vs NVIDIA V100S PCIe FP16 Comparison

The NVIDIA A100 simply outperforms the Volta-powered V100S, with performance gains upwards of 2x. These tests cover only image-classification workloads, but the results are in line with previous tests by NVIDIA showing similar performance gains.

All throughput figures are in images per second.

| Model | 4x NVIDIA A100 PCIe | 4x NVIDIA V100S |
| --- | --- | --- |
| ResNet 50 | 8830.78 | 3218 |
| ResNet 152 | 3036.46 | 1415.56 |
| Inception V3 | 4994.27 | 2161.02 |
| Inception V4 | 2414.93 | 1205.97 |

NVIDIA A100 PCIe vs NVIDIA V100S PCIe FP32 Comparison

As with the FP16 tests, the A100 handily outperforms the V100S, again by a factor of roughly 2x.

All throughput figures are in images per second.

| Model | 4x NVIDIA A100 PCIe | 4x NVIDIA V100S |
| --- | --- | --- |
| ResNet 50 | 3152.71 | 1432.69 |
| ResNet 152 | 1192.32 | 577.26 |
| Inception V3 | 2175.39 | 926.93 |
| Inception V4 | 1012.28 | 455.65 |
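To put one number on the "roughly 2x" claim, the per-model ratios from the two comparison tables above can be summarized with a geometric mean (values transcribed from the tables):

```python
from math import prod

# A100 vs V100S throughput pairs (images/sec) from the 4-GPU comparison
# tables above, and a geometric-mean speedup per precision.

fp16 = {"ResNet 50": (8830.78, 3218.0), "ResNet 152": (3036.46, 1415.56),
        "Inception V3": (4994.27, 2161.02), "Inception V4": (2414.93, 1205.97)}
fp32 = {"ResNet 50": (3152.71, 1432.69), "ResNet 152": (1192.32, 577.26),
        "Inception V3": (2175.39, 926.93), "Inception V4": (1012.28, 455.65)}

def geomean_ratio(table):
    ratios = [a100 / v100s for a100, v100s in table.values()]
    return prod(ratios) ** (1 / len(ratios))

print(f"FP16 geometric-mean speedup: {geomean_ratio(fp16):.2f}x")
print(f"FP32 geometric-mean speedup: {geomean_ratio(fp32):.2f}x")
```

Both precisions come out around 2.2x to 2.3x on average, consistent with the roughly 2x headline figure.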

Benchmark System Specs

System: Exxact AI Server
CPU: 2x AMD EPYC 7552
GPU: NVIDIA A100 PCIe
System Memory: 512 GB
Storage: 2x 480 GB + 3.84 TB
TensorFlow Version: NVIDIA Release 20.10-tf2 (build 16775790), TensorFlow 2.3.1

More Info and Specs About NVIDIA A100 PCIe GPU

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration and flexibility to power the world’s highest-performing elastic data centers for AI, data analytics, and HPC applications. As the engine of the NVIDIA data center platform, the A100 provides massive performance upgrades over V100 GPUs and can efficiently scale up to thousands of GPUs, or be partitioned into seven isolated GPU instances to accelerate workloads of all sizes.

Peak FP64: 9.7 TF
Peak FP64 Tensor Core: 19.5 TF
Peak FP32: 19.5 TF
Peak FP32 Tensor Core: 156 TF | 312 TF*
Peak BFLOAT16 Tensor Core: 312 TF | 624 TF*
Peak FP16 Tensor Core: 312 TF | 624 TF*
Peak INT8 Tensor Core: 624 TOPS | 1,248 TOPS*
Peak INT4 Tensor Core: 1,248 TOPS | 2,496 TOPS*
GPU Memory: 40 GB
GPU Memory Bandwidth: 1,555 GB/s
Interconnect: NVIDIA NVLink 600 GB/s; PCIe Gen4 64 GB/s
Multi-Instance GPU: various instance sizes, up to 7 MIG instances at 5 GB each
Form Factor: PCIe
Max TDP Power: 250 W

* The higher figure is effective performance with structured sparsity.

Have any questions about NVIDIA GPUs or AI Servers?
Contact Exxact Today
