Benchmarks

TensorFlow Benchmarks for Exxact Server Featuring NVIDIA V100S

July 9, 2020

In this post, we present deep learning benchmarks for TensorFlow on an Exxact TensorEX server outfitted with 4 NVIDIA V100S GPUs.

We ran the standard “tf_cnn_benchmarks.py” benchmark script from TensorFlow’s GitHub repository on the following networks: ResNet-50, ResNet-152, Inception V3, Inception V4, and GoogLeNet. We compared FP16 and FP32 performance at a batch size of 128, and ran each test in both 2-GPU and 4-GPU configurations. All benchmarks used ‘vanilla’ TensorFlow settings for FP16 and FP32.

NVIDIA V100S Deep Learning Benchmark Snapshot

As the tables below show, running FP16 substantially raises the overall images/sec metric: across the models tested, FP16 delivers roughly 1.5x to 2.7x the FP32 throughput. If your model can train in FP16 rather than FP32, we recommend doing so.

NVIDIA V100S Deep Learning Benchmarks FP16

Model	2 GPU img/sec	4 GPU img/sec	Batch Size
ResNet-50	1735.56	3218	128
ResNet-152	760.57	1415.56	128
Inception V3	1134.88	2161.02	128
Inception V4	602.36	1205.97	128
GoogLeNet	2820.47	5265.14	128
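
To put the 2-GPU and 4-GPU columns in perspective, the short Python sketch below divides the 4-GPU throughput by the 2-GPU throughput for each model in the FP16 table. Ideal linear scaling would give a factor of 2.0. The numbers are copied from the table above; the variable names and the scaling_factor helper are illustrative and not part of the benchmark script.

```python
# FP16 throughput in images/sec, copied from the table above: (2 GPUs, 4 GPUs).
fp16_img_per_sec = {
    "ResNet-50": (1735.56, 3218.0),
    "ResNet-152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "GoogLeNet": (2820.47, 5265.14),
}


def scaling_factor(two_gpu, four_gpu):
    """Return 4-GPU throughput divided by 2-GPU throughput (2.0 is ideal)."""
    return four_gpu / two_gpu


fp16_scaling = {
    model: scaling_factor(two_gpu, four_gpu)
    for model, (two_gpu, four_gpu) in fp16_img_per_sec.items()
}

for model, factor in fp16_scaling.items():
    # Efficiency is the measured factor relative to the ideal factor of 2.0.
    print(f"{model:<13} {factor:.2f}x  ({factor / 2.0:.0%} of linear)")
```

On these numbers, going from 2 to 4 GPUs multiplies FP16 throughput by about 1.85x (ResNet-50) at the low end and about 2.00x (Inception V4) at the high end.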

NVIDIA V100S Deep Learning Benchmarks FP32

Model	2 GPU img/sec	4 GPU img/sec	Batch Size
ResNet-50	762.21	1432.69	128
ResNet-152	278.17	577.26	128
Inception V3	495.51	926.93	128
Inception V4	227.05	455.65	128
GoogLeNet	1692.94	3393.91	128
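
The FP16 advantage can be read directly off the two tables. As a minimal sketch, the Python below computes the FP16-to-FP32 throughput ratio per model and GPU count. All figures are copied from the FP16 and FP32 tables above; the dictionary and function names are illustrative only.

```python
# Throughput in images/sec from the two tables above: (2 GPUs, 4 GPUs).
fp16 = {
    "ResNet-50": (1735.56, 3218.0),
    "ResNet-152": (760.57, 1415.56),
    "Inception V3": (1134.88, 2161.02),
    "Inception V4": (602.36, 1205.97),
    "GoogLeNet": (2820.47, 5265.14),
}
fp32 = {
    "ResNet-50": (762.21, 1432.69),
    "ResNet-152": (278.17, 577.26),
    "Inception V3": (495.51, 926.93),
    "Inception V4": (227.05, 455.65),
    "GoogLeNet": (1692.94, 3393.91),
}


def speedup(fp16_rate, fp32_rate):
    """Return how many times faster FP16 is than FP32 at the same GPU count."""
    return fp16_rate / fp32_rate


# For each model: (speedup with 2 GPUs, speedup with 4 GPUs).
speedups = {
    model: tuple(speedup(h, s) for h, s in zip(fp16[model], fp32[model]))
    for model in fp16
}

for model, (two_gpu, four_gpu) in speedups.items():
    print(f"{model:<13} 2 GPU: {two_gpu:.2f}x   4 GPU: {four_gpu:.2f}x")
```

The ratios range from about 1.55x (GoogLeNet, 4 GPUs) to about 2.73x (ResNet-152, 2 GPUs), so the benefit of FP16 depends noticeably on the network.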

System Specifications:

Model	Exxact TensorEX Deep Learning Server
GPU	NVIDIA Tesla V100S 32 GB PCIe
CPU	Intel Xeon Silver 4116
RAM	128GB DDR4
SSD (OS)	120 GB
SSD (Data)	1024.2 GB
OS	CentOS Linux 7
NVIDIA Driver	440.82
CUDA Version	10.2
Python	3.6.9
TensorFlow	20.02-tf1-py3
Docker Image	nvcr.io/nvidia/tensorflow:20.02-tf1-py3

Training Parameters

Dataset:	ImageNet
Mode:	training
SingleSess:	False
Batch Size:	128
Num Batches:	100
Num Epochs:	0.16
Devices:	[‘/gpu:0’]…(varied)
NUMA bind:	False
Data format:	NCHW
Optimizer:	momentum
Variables:	parameter_server
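
For readers who want to set up a similar sweep, the Python sketch below assembles one tf_cnn_benchmarks.py command line per model, GPU count, and precision, using the parameters listed above. It only builds and prints the commands; it does not run them. The exact invocation used for this post is not given, so treat this as an assumption-laden reconstruction: the flag names and model identifiers are those of tf_cnn_benchmarks.py as we understand them, the script's batch_size flag is per device (whether the 128 above is per GPU is not stated), and no data directory is passed. The build_command helper is hypothetical.

```python
import itertools
import shlex

# Model identifiers as tf_cnn_benchmarks.py is assumed to spell them.
MODEL_FLAGS = {
    "ResNet-50": "resnet50",
    "ResNet-152": "resnet152",
    "Inception V3": "inception3",
    "Inception V4": "inception4",
    "GoogLeNet": "googlenet",
}


def build_command(model_flag, num_gpus, use_fp16):
    """Build an argument list mirroring the Training Parameters table."""
    args = [
        "python", "tf_cnn_benchmarks.py",
        f"--model={model_flag}",
        f"--num_gpus={num_gpus}",
        "--batch_size=128",          # per-device batch size in the script
        "--num_batches=100",
        "--data_format=NCHW",
        "--optimizer=momentum",
        "--variable_update=parameter_server",
    ]
    if use_fp16:
        args.append("--use_fp16")    # omitted for the FP32 runs
    return args


# 5 models x 2 GPU counts x 2 precisions = 20 benchmark runs.
commands = [
    build_command(flag, gpus, fp16)
    for flag, gpus, fp16 in itertools.product(
        MODEL_FLAGS.values(), (2, 4), (True, False)
    )
]

for cmd in commands:
    print(" ".join(shlex.quote(a) for a in cmd))
```

Each printed line could then be run inside the container listed under System Specifications, adjusting paths and flags to match your own checkout of the benchmark script.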
