Benchmarks

NVIDIA A100 Deep Learning Benchmarks for TensorFlow

December 8, 2020

6 min read

Showdown of the Data Center GPUs: A100 vs V100S

For this blog article, we conducted deep learning performance benchmarks for TensorFlow on the NVIDIA A100 GPUs. We also compared these GPU’s with their top of the line predecessor the Volta powered NVIDIA V100S.

Our Deep Learning Server was fitted with 8 NVIDIA A100 PCIe GPUs. We ran the standard “tf_cnn_benchmarks.py” benchmark script found in the official TensorFlow github. The neural networks we tested were: ResNet50, ResNet152, Inception v3, Inception v4. Furthermore, we ran the same tests using 1, 2, 4, and 8 GPU configurations. Determined batch size was the largest that could fit into available GPU memory.

Key Points and Observations

The NVIDIA A100 is an exceptional GPU for deep learning with performance unseen in previous generations.
The NVIDIA A100 scales very well up to 8 GPUs (and probably more had we tested) using FP16 and FP32.
When compared to the V100S, in most cases the A100 offers 2x the performance in FP16 and FP32.

Interested in upgrading your deep learning server?
Learn more about Exxact deep learning servers featuring NVIDIA GPUs

NVIDIA A100 Deep Learning Benchmarks FP16

	1x GPU	2x GPU	4x GPU	8x GPU	Batch Size
ResNet 50	2357.09	4479.18	8830.78	12481.2	512
ResNet 152	988.9	1746.16	3036.46	5224.41	256
Inception V3	1377.38	2639.79	4994.27	8117.57	512
Inception V4	702.27	1318.51	2414.93	4305.89	256

NVIDIA A100 Deep Learning Benchmarks FP32

	1x GPU	2x GPU	4x GPU	8x GPU	Batch Size
ResNet 50	853.09	1652.98	3152.71	5871.22	256
ResNet 152	364.65	666.88	1192.32	2110.92	128
Inception V3	587.8	1130.1	2175.39	4062.41	256
Inception V4	289.94	539.5	1012.28	1835.12	128

NVIDIA A100 PCIe vs NVIDIA V100S PCIe FP16 Comparison

The NVIDIA A100 simply outperforms the Volta V100S with a performance gains upwards of 2x. These tests only show image processing, however the results are in line with previous tests done by NVIDIA showing similar performance gains.

	4x NVIDIA A100 PCIe	4x NVIDIA V100S
ResNet 50	8830.78	3218
ResNet 152	3036.46	1415.56
Inception V3	4994.27	2161.02
Inception V4	2414.93	1205.97

NVIDIA A100 PCIe vs NVIDIA V100S PCIe FP32 Comparison

As with the FP16 tests, the A100 handily outperforms the V100S by a factor of 2.

	4x NVIDIA A100 PCIe	4x NVIDIA V100S
ResNet 50	3152.71	1432.69
ResNet 152	1192.32	577.26
Inception V3	2175.39	926.93
Inception V4	1012.28	455.65

Benchmark System Specs

System	Exxact AI Server
CPU	2x AMD EPYC 7552
GPU	NVIDIA A100 PCIe
System Memory	512GB
Storage	2x 480GB + 3.84TB
TensorFlow Version	NVIDIA Release 20.10-tf2 (build 16775790) TensorFlow Version 2.3.1

More Info and Specs About NVIDIA A100 PCIe GPU

NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration and flexibility to power the world’s highest-performing elastic data centers for AI, data analytics, and HPC applications. As the engine of the NVIDIA data center platform, A100 provides massive performance upgrades over V100 GPUs and can efficiently scale up to thousands of GPUs, or be partitioned into seven isolated GPU instances to accelerate workloads of all sizes.

Peak FP64	9.7 TF
Peak FP64 Tensor Core	19.5 TF
Peak FP32	19.5 TF
Peak FP32 Tensor Core	156 TF \| 312 TF
Peak BFLOAT16 Tensor Core	312 TF \| 624 TF
Peak FP16 Tensor Core	312 TF \| 624 TF
Peak INT8 Tensor Core	624 TOPS \| 1,248 TOPS
Peak INT4 Tensor Core	1,248 TOPS \| 2,496 TOPS
GPU Memory	40GB
GPU Memory Bandwidth	1,555 GB/s
Interconnect	NVIDIA NVLink 600 GB/s PCIe Gen4 64 GB/s
Multi-Instance GPUs	Various instance sizes with up to 7 MIGs at 5GB
Form Factor	PCIe
Max TDP Power	250 W

Have any questions about NVIDIA GPUs or AI Servers?
Contact Exxact Today

Topics

Have any questions?

Benchmarks