
Hugging Face Benchmarks - Natural Language Processing for PyTorch
Hugging Face Benchmark Overview
The following performance benchmarks were run with the Hugging Face Benchmark Suite, which measures the performance of Transformer models for NLP using libraries from the Hugging Face ecosystem. For more information, visit the Hugging Face website.
Natural Language Processing (NLP)
NLP is a field of linguistics and machine learning focused on understanding everything related to human language. The aim of NLP tasks is not only to understand single words individually, but to be able to understand the context of those words through speech and/or text.
Transformers
Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet, and more) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with 32+ pretrained architectures in 100+ languages and deep interoperability with PyTorch.
Transformers provides thousands of pretrained models to perform tasks on text such as classification, information extraction, question answering, summarization, translation, text generation, and more in over 100 languages. The aim is to make cutting-edge NLP easier to use for everyone.
Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on the Hugging Face model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
Transformers is backed by the three most popular deep learning libraries, JAX, PyTorch, and TensorFlow, with seamless integration between them. It is straightforward to train models with one and then load them for inference with another.
In the following benchmarks we ran, PyTorch was used.
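As a quick illustration of the workflow described above, the sketch below downloads a pretrained model from the Hugging Face model hub and runs a single inference pass with the PyTorch backend (the model choice and example sentence are ours, not part of the benchmark):

```python
from transformers import pipeline

# Download a pretrained sentiment classifier from the Hugging Face model hub
# (PyTorch backend) and run one inference pass; device=0 places it on the first GPU.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,
)
print(classifier("Benchmarking GPUs for NLP workloads is straightforward."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```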
Hugging Face does offer tiered pricing for premium AutoNLP capabilities and an accelerated inference application programming interface (API). However, basic access to the inference API is included in the free tier, and the core NLP libraries (transformers, tokenizers, datasets, and accelerate) are developed in the open and freely available under an Apache 2.0 license.
Exxact Workstation System Specs:
Nodes | 1 |
Processor / Count | 1x AMD Ryzen Threadripper 3960X 24-Core Processor |
Total Logical Cores | 48 |
Memory | DDR4 128 GB |
Storage | NVMe 3.7T /data |
OS | Ubuntu 20.04 |
CUDA Version | 11.3 |
PyTorch Version | 1.9.0+cu111 |
Python Version | 3.8.10 |
Interested in a deep learning workstation that can handle NLP training?
Learn more about Exxact AI workstations starting around $5,500
Transformers Inference Benchmark Comparison for NVIDIA RTX 3060 Ti, 3070, 3080 and 3090
Here, inference is defined as a single forward pass, and training as a single forward pass plus a backward pass. Three arguments are given to the benchmark argument data classes: models, batch_sizes, and sequence_lengths. By default, the time and the required memory for inference are benchmarked.
In the output of each benchmark run, the first two sections show the results for inference time and inference memory. A third section lists all relevant information about the computing environment, e.g. the GPU type, the system, and the library versions (see the Environment Information table below).
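For reference, here is a minimal sketch of how such a run can be configured with the benchmark utilities in the transformers version used here (4.11.3). The models and sequence lengths mirror the tables that follow; the batch size is an assumption, since the article does not state it:

```python
from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments

# models, batch_sizes and sequence_lengths are the three arguments passed to the
# benchmark argument data class; by default, inference time and required memory
# are measured for every (model, batch size, sequence length) combination.
args = PyTorchBenchmarkArguments(
    models=["bert-base-uncased", "bert-base-cased", "gpt2"],
    batch_sizes=[8],                      # assumed; the library default
    sequence_lengths=[8, 32, 128, 512],
)
benchmark = PyTorchBenchmark(args)
results = benchmark.run()  # prints the speed, memory and environment sections
```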
For more information visit: https://github.com/huggingface
Environment Information
Parameter | 3060 Ti | 3070 | 3080 | 3090 |
---|---|---|---|---|
transformers_version | 4.11.3 | 4.11.3 | 4.11.3 | 4.11.3 |
framework | PyTorch | PyTorch | PyTorch | PyTorch |
use_torchscript | False | False | False | False |
framework_version | 1.9.0+cu111 | 1.9.0+cu111 | 1.9.0+cu111 | 1.9.0+cu111 |
python_version | 3.8.10 | 3.8.10 | 3.8.10 | 3.8.10 |
system | Linux | Linux | Linux | Linux |
cpu | x86_64 | x86_64 | x86_64 | x86_64 |
architecture | 64bit | 64bit | 64bit | 64bit |
date | 2021-11-17 | 2021-10-28 | 2021-10-28 | 2021-10-14 |
time | 14:36:02.274199 | 11:46:55.409272 | 10:53:02.353928 | 14:15:11.014509 |
fp16 | False | False | False | False |
use_multiprocessing | True | True | True | True |
only_pretrain_model | False | False | False | False |
cpu_ram_mb | 128747 | 128747 | 128747 | 128747 |
use_gpu | True | True | True | True |
num_gpus | 1 | 1 | 1 | 1 |
gpu | NVIDIA GeForce RTX 3060 Ti | NVIDIA RTX 3070 | NVIDIA RTX 3080 | NVIDIA RTX 3090 |
gpu_ram_mb | 7979 | 7979 | 10010 | 24260 |
gpu_power_watts | 200.0 | 220.0 | 320.0 | 350.0 |
gpu_performance_state | 0 | 0 | 0 | 0 |
use_tpu | False | False | False | False |
Model: BERT base model (uncased)
Model pretrained on English text using a masked language modeling (MLM) objective. It was introduced in the BERT paper (Devlin et al., 2018) and first released in the google-research/bert repository. This model is uncased: it does not make a difference between english and English.
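As a usage sketch for this checkpoint (the example sentence is ours), the MLM objective lets the model predict a masked token in context:

```python
from transformers import pipeline

# bert-base-uncased fills in the [MASK] placeholder using its masked language
# modeling head; each candidate comes back with a confidence score.
fill_mask = pipeline("fill-mask", model="bert-base-uncased", device=0)
for candidate in fill_mask("The benchmark measures inference [MASK] and memory."):
    print(candidate["token_str"], round(candidate["score"], 3))
```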
Inference - Speed and Memory Results
GPU / Sequence Length | Time in Seconds | Memory in MB |
---|---|---|
NVIDIA RTX 3070 Seq 8 | 0.0056 | 2340 |
NVIDIA RTX 3080 Seq 8 | 0.0055 | 2249 |
NVIDIA RTX 3090 Seq 8 | 0.0055 | 2312 |
NVIDIA RTX 3070 Seq 32 | 0.0058 | 2374 |
NVIDIA RTX 3080 Seq 32 | 0.0055 | 2283 |
NVIDIA RTX 3090 Seq 32 | 0.0055 | 2346 |
NVIDIA RTX 3070 Seq 128 | 0.0189 | 2490 |
NVIDIA RTX 3080 Seq 128 | 0.0113 | 2399 |
NVIDIA RTX 3090 Seq 128 | 0.0099 | 2462 |
NVIDIA RTX 3070 Seq 512 | 0.0684 | 3080 |
NVIDIA RTX 3080 Seq 512 | 0.0447 | 2989 |
NVIDIA RTX 3090 Seq 512 | 0.0392 | 3052 |
Benchmark Notes
- Memory performance is similar across the tested GPUs for each sequence length.
- The NVIDIA RTX 3060 Ti and the NVIDIA RTX 3070 provide similar speed performance with sequence lengths 32 and 128, but the NVIDIA RTX 3070 has an 8% advantage with Seq 512.
- Factoring in cost, the NVIDIA RTX 3070 ($499 MSRP) is 25% more expensive than the NVIDIA RTX 3060 Ti ($399 MSRP). The 8% speed performance advantage with Seq 512 may not justify the extra cost for the NVIDIA RTX 3070.
- The NVIDIA RTX 3080 and the NVIDIA RTX 3090 provide speed performance similar to the other GPUs with sequence lengths 8 and 32, but they clearly outperform the others by at least 40% with Seq 128 and at least 35% with Seq 512.
- The NVIDIA RTX 3090 has a 12% advantage over the NVIDIA RTX 3080 in speed performance with sequence lengths 128 and 512.
- Factoring in cost, the NVIDIA RTX 3090 ($1,499 MSRP) can cost more than double the NVIDIA RTX 3080 ($699 MSRP). The 12% speed performance gain may not justify paying twice as much for the NVIDIA RTX 3090.
- However, the NVIDIA RTX 3080 has a 40% advantage over the NVIDIA RTX 3070 with Seq 128 and a 35% advantage with Seq 512, for a comparatively fair 40% increase in price (the arithmetic behind these percentages is sketched below).
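The percentages in these notes can be reproduced from the table above; a quick sketch of the arithmetic, using the Seq 128 and Seq 512 timings:

```python
# Relative speed advantage, computed as (slower - faster) / slower.
def advantage(slower: float, faster: float) -> float:
    return (slower - faster) / slower * 100

print(f"3080 vs 3070, Seq 128: {advantage(0.0189, 0.0113):.0f}%")  # ~40%
print(f"3080 vs 3070, Seq 512: {advantage(0.0684, 0.0447):.0f}%")  # ~35%
print(f"3090 vs 3080, Seq 128: {advantage(0.0113, 0.0099):.0f}%")  # ~12%

# MSRP comparison: the $699 RTX 3080 costs 40% more than the $499 RTX 3070.
print(f"Price increase: {(699 - 499) / 499 * 100:.0f}%")
```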
Model: BERT base model (cased)
Model pretrained on English text using a masked language modeling (MLM) objective. It was introduced in the BERT paper (Devlin et al., 2018) and first released in the google-research/bert repository. This model is case-sensitive: it makes a difference between english and English.
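To see the case sensitivity in practice, here is a small sketch (word pair chosen by us) comparing the cased and uncased tokenizers:

```python
from transformers import AutoTokenizer

cased = AutoTokenizer.from_pretrained("bert-base-cased")
uncased = AutoTokenizer.from_pretrained("bert-base-uncased")

# The cased tokenizer preserves capitalization, so "english" and "English"
# produce different token sequences; the uncased tokenizer lowercases its
# input first, so both spellings encode identically.
print(cased.encode("English") == cased.encode("english"))      # False
print(uncased.encode("English") == uncased.encode("english"))  # True
```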
Inference - Speed and Memory Results
GPU / Sequence Length | Time in Seconds | Memory in MB |
---|---|---|
NVIDIA RTX 3070 Seq 8 | 0.0054 | 2323 |
NVIDIA RTX 3070 Seq 32 | 0.0058 | 2357 |
NVIDIA RTX 3070 Seq 128 | 0.0187 | 2467 |
NVIDIA RTX 3070 Seq 512 | 0.0678 | 3039 |
NVIDIA RTX 3080 Seq 8 | 0.0054 | 2245 |
NVIDIA RTX 3080 Seq 32 | 0.0054 | 2279 |
NVIDIA RTX 3080 Seq 128 | 0.0111 | 2389 |
NVIDIA RTX 3080 Seq 512 | 0.0442 | 2961 |
NVIDIA RTX 3090 Seq 8 | 0.0056 | 2308 |
NVIDIA RTX 3090 Seq 32 | 0.0056 | 2342 |
NVIDIA RTX 3090 Seq 128 | 0.0098 | 2452 |
NVIDIA RTX 3090 Seq 512 | 0.0387 | 3024 |
Benchmark Notes
- Memory performance is similar across the tested GPUs for each sequence length.
- The NVIDIA RTX 3060 Ti and the NVIDIA RTX 3070 provide similar speed performance with sequence lengths 32 and 128, but the NVIDIA RTX 3070 has a 9% advantage with Seq 512.
- Factoring in cost, the NVIDIA RTX 3070 ($499 MSRP) is 25% more expensive than the NVIDIA RTX 3060 Ti ($399 MSRP). The 9% speed performance advantage with Seq 512 may not justify the extra cost for the NVIDIA RTX 3070.
- The NVIDIA RTX 3080 and the NVIDIA RTX 3090 provide speed performance similar to the other GPUs with sequence lengths 8 and 32, but they clearly outperform the others by at least 41% with Seq 128 and at least 35% with Seq 512.
- The NVIDIA RTX 3090 has a 12% advantage over the NVIDIA RTX 3080 in speed performance with sequence lengths 128 and 512.
- Factoring in cost, the NVIDIA RTX 3090 ($1,499 MSRP) can cost more than double the NVIDIA RTX 3080 ($699 MSRP). The 12% speed performance gain may not justify paying twice as much for the NVIDIA RTX 3090.
- However, the NVIDIA RTX 3080 has a 41% advantage over the NVIDIA RTX 3070 with Seq 128 and a 35% advantage with Seq 512, for a comparatively fair 40% increase in price.
Model: GPT2
Model pretrained on English text using a causal language modeling (CLM) objective.
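A minimal sketch of using this checkpoint for its intended task, text generation (the prompt and generation settings are ours):

```python
from transformers import pipeline

# GPT2 continues a prompt token by token, which is exactly what the causal
# language modeling objective trains it to do.
generator = pipeline("text-generation", model="gpt2", device=0)
output = generator(
    "GPU benchmarks for NLP inference show that",
    max_length=40,
    num_return_sequences=1,
)
print(output[0]["generated_text"])
```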
Inference - Speed and Memory Results
GPU / Sequence Length | Time in Seconds | Memory in MB |
---|---|---|
NVIDIA RTX 3070 Seq 8 | 0.0052 | 2403 |
NVIDIA RTX 3080 Seq 8 | 0.0055 | 2343 |
NVIDIA RTX 3090 Seq 8 | 0.0054 | 2406 |
NVIDIA RTX 3070 Seq 32 | 0.0065 | 2473 |
NVIDIA RTX 3080 Seq 32 | 0.0052 | 2413 |
NVIDIA RTX 3090 Seq 32 | 0.0055 | 2476 |
NVIDIA RTX 3070 Seq 128 | 0.0234 | 2751 |
NVIDIA RTX 3080 Seq 128 | 0.0149 | 2691 |
NVIDIA RTX 3090 Seq 128 | 0.0123 | 2754 |
NVIDIA RTX 3070 Seq 512 | 0.0880 | 4027 |
NVIDIA RTX 3080 Seq 512 | 0.0572 | 3967 |
NVIDIA RTX 3090 Seq 512 | 0.0504 | 4030 |
Benchmark Notes
- Memory performance is similar across the tested GPUs for each sequence length.
- Unexpectedly, the NVIDIA RTX 3080 performed 5.5% slower than the NVIDIA RTX 3070 with Seq 8. Moreover, the NVIDIA RTX 3080 was actually 5.5% faster than the NVIDIA RTX 3090 with Seq 32.
- The NVIDIA RTX 3070 has an 8.5% speed advantage over the NVIDIA RTX 3060 Ti with Seq 32 and an 8% advantage with Seq 512. (Seq 128 was similar.)
- Factoring in cost, the NVIDIA RTX 3070 ($499 MSRP) is 25% more expensive than the NVIDIA RTX 3060 Ti ($399 MSRP). The 8-8.5% speed performance advantage may not justify the extra cost for the NVIDIA RTX 3070.
- The NVIDIA RTX 3090 has a 17.5% advantage over the NVIDIA RTX 3080 in speed performance with Seq 128 and 12% advantage with Seq 512.
- Factoring in cost, the NVIDIA RTX 3090 ($1,499 MSRP) can cost more than double the NVIDIA RTX 3080 ($699 MSRP). The 12-17.5% speed performance gain may not justify paying twice as much for the NVIDIA RTX 3090.
- However, the NVIDIA RTX 3080 has a 36% advantage over the NVIDIA RTX 3070 with Seq 128 and a 35% advantage with Seq 512, for a comparatively fair 40% increase in price.
NVIDIA GeForce RTX 30-series GPUs
GPU | NVIDIA RTX 3060 Ti | NVIDIA RTX 3070 | NVIDIA RTX 3080 | NVIDIA RTX 3090 |
---|---|---|---|---|
NVIDIA CUDA Cores | 4,864 | 5,888 | 8,704 | 10,496 |
Boost Clock (GHz) | 1.67 | 1.73 | 1.71 | 1.70 |
Memory Size | 8 GB | 8 GB | 10 GB | 24 GB |
Memory Type | GDDR6 | GDDR6 | GDDR6X | GDDR6X |
Dimensions | 9.5 x 4.4 inches | 9.5 x 4.4 inches | 11.2 x 4.4 inches | 12.3 x 5.4 inches |
Power Draw | 200 W | 220 W | 320 W | 350 W |
Have any questions?
