Benchmarks

NVIDIA RTX A4000, A5000 and A6000 Comparison: Deep Learning Benchmarks for TensorFlow

August 12, 2021

7 min read

Deep Learning Benchmarks for TensorFlow

For this blog article, we conducted deep learning performance benchmarks for TensorFlow comparing the NVIDIA RTX A4000 to NVIDIA RTX A5000 and A6000 GPUs.

Our Deep Learning Server was fitted with four RTX A4000 GPUs and we ran the standard “tf_cnn_benchmarks.py” benchmark script found in the official TensorFlow GitHub. We tested on the following networks: ResNet50, ResNet152, Inception v3, and Inception v4. Furthermore, we ran the same tests using 1, 2, and 4 GPU configurations with largest batch size for FP16.

Key Points and Observations

In our benchmarks the NVIDIA RTX A4000 exhibits near linear scaling up to 4 GPUs.
The RTX A4000 GPU memory foot print of 16GB limited our batch size to 256 against much larger datasets.
Powered by the NVIDIA Ampere architecture, the RTX A4000 features 3rd gen Tensor Core, 2nd gen RT Cores. Plus 16GB of superfast GDDR6X memory, as well as Single Slot form factor and PCI Express Gen 4.
The A4000 offers full support for all our latest and greatest technologies.

NVIDIA RTX A4000 Highlights:

Spec
CUDA Cores	6144
Tensor Cores	192
RT Cores	48
Single Precision Performance	19.2 TFLOPS
RT Core Performance	37.4 TFLOPS
Tensor Performance	153.4 TFLOPS
GPU Memory	16 GB GDDR6 with ECC
Memory Interface	256-bit
Memory Bandwidth	448 GB/sec

Interested in getting faster results?
Learn more about Exxact deep learning workstations starting at $3,700

Exxact RTX A4000 Workstation System Specs:

Spec
Nodes	1
Processor / Count	2x AMD EPYC 7552
Total Logical Cores	48
Memory	DDR4 512 GB
Storage	NVMe 3.84 TB
OS	Ubuntu 18.04
CUDA Version	11.2
Tensorflow Version	2.40

TensorFlow FP16 Benchmarks comparison on ResNet50

Screen_Shot_2021-08-05_at_1.55.46_PM.png

GPUType	1x GPU	2x GPU	Batch Size
RTX A4000	684.56	844.57	256
RTX A5000	1096.25	2154.67	512
RTX A6000	1145.6	2404.77	1024

Largest Batch Size for all FP16 tests.

TensorFlow FP16 Benchmarks comparison on ResNet152

GPU Type	1x GPU	2x GPU	Batch Size
RTX A4000	303.87	551.86	128
RTX A5000	450.79	869.87	256
RTX A6000	605.67	1128.83	512

Largest Batch Size for all FP16 tests.

TensorFlow FP16 Benchmarks comparison on Inception3

Screen_Shot_2021-08-05_at_2.04.13_PM.png

GPU Type	1x GPU	2x GPU	Batch Size
RTX A4000	488.24	943.46	256
RTX A5000	724.44	1353.78	256

Largest Batch Size for all FP16 tests.

TensorFlow FP16 Benchmarks comparison on Inception4

GPU Type	1x GPU	2x GPU	Batch Size
RTX A4000	245.6	451.38	128
RTX A5000	383.13	737.86	256

Largest Batch Size for all FP16 tests.

4x GPU TensorFlow FP16 Benchmarks comparison on ResNet50

GPU Type	1x GPU	2x GPU	4x GPU	Batch Size
RTX A4000	684.56	844.57	1482.31	256
RTX A5000	1096.25	2154.67	3107.32	512

Largest Batch Size for all FP16 tests.

NVIDIA RTX (Ampere) / Quadro RTX (Turing) comparison

	RTX A4000	RTX A5000	RTX A6000	Quadro RTX 4000	Quadro RTX 5000	Quadro RTX 6000
Architecture	Ampere	Ampere	Ampere	Turing	Turing	Turing
GPU memory	16 GB GDDR6	24 GB GDDR6	48 GB GDDR6	8 GB GDDR6	16 GB GDDR6	24 GB GDDR6
ECC memory	Yes	Yes	Yes	No	Yes	Yes
CUDA cores	6,144	8,192	10,752	2304	3,072	4,608
Tensor Cores	192	256	336	288	384	576
RT Cores	48	64	84	36	48	72
SP perf	19.2 TFLOPS	27.8 TFLOPS	38.7 TFLOPS	7.1 TFLOPS	11.2 TFLOPS	16.3 TFLOPS
RT Core perf	37.4 TFLOPS	54.2 TFLOPS	75.6 TFLOPS	N/A	N/A	N/A
Tensor perf	153.4 TFLOPS	222.2 TFLOPS	309.7 TFLOPS	57.0 TFLOPS	89.2 TFLOPS	130.5 TFLOPS
Max Power	140W	230W	300W	160W	265W	295W
Graphic bus	PCI-E 4.0 x16	PCI-E 4.0 x16	PCI-E 4.0 x16	PCI-E 3.0 x16	PCI-E 3.0 x16	PCI-E 3.0 x16
Connectors	DP 1.4 (4)	DP 1.4 (4)	DP 1.4 (4)	DP 1.4 (3), USB-C	DP 1.4 (4), USB-C	DP 1.4 (4), USB-C
Form Factor	Single slot	Dual Slot	Dual Slot	Single slot	Dual Slot	Dual Slot
vGPU Software	N/A	NVIDIA RTX vWS	NVIDIA RTX vWS	N/A	N/A	NVIDIA RTX vWS
Nvlink	N/A	2x RTX A5000	2x RTX A6000	N/A	2x RTX 5000	2x RTX 6000
Power Connector	1x 6-pin PCIe	1x 8-pin PCIe	1x 8-pin CPU	1x 6-pin PCIe	1x 8-pin PCIe	2x 8-pin PCIe

Additional GPU Benchmarks

Final Thoughts for NVIDIA RTX A4000

The NVIDIA RTX A4000 is the most powerful single-slot GPU for professionals, delivering real-time ray tracing, AI-accelerated compute, and high-performance graphics performance to your desktop.

Built on the NVIDIA Ampere architecture, the RTX A4000 combines 48 second-generation RT Cores, 192 third-generation Tensor Cores, and 6144 CUDA cores with 16 GB of graphics memory. So you can engineer products and solutions from your desktop workstation.

Have any questions?
Contact Exxact Today

Topics

Have any questions?

Benchmarks