GROMACS GPU Benchmark and Hardware Recommendations

August 16, 2023

GROMACS Benchmark Scope

As a value-added supplier of scientific workstations and servers, Exxact regularly publishes reference benchmarks across GPU configurations to guide molecular dynamics scientists looking to procure systems optimized for their research. In this blog, we benchmark several workstation and server platforms with different GPU configurations running GROMACS, and we report performance as nanoseconds of simulation completed per day (ns/day).
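For readers new to the metric, here is a minimal sketch of how nanoseconds per day relates to step count, timestep, and wall-clock time. The function name and the example run (50,000 steps, 2 fs timestep, 60 seconds) are illustrative assumptions, not values from our benchmarks.

```python
SECONDS_PER_DAY = 86_400
PS_PER_NS = 1_000


def ns_per_day(n_steps: int, timestep_ps: float, wall_seconds: float) -> float:
    """Return simulated nanoseconds per day of wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    # Total simulated time, converted from picoseconds to nanoseconds.
    simulated_ns = n_steps * timestep_ps / PS_PER_NS
    # Scale the measured wall-clock time up to one full day.
    return simulated_ns * SECONDS_PER_DAY / wall_seconds


# Hypothetical run: 50,000 steps at a 2 fs (0.002 ps) timestep in 60 s.
print(f"{ns_per_day(50_000, 0.002, 60.0):.1f} ns/day")  # prints "144.0 ns/day"
```

A higher ns/day means more simulated time for the same wall-clock time, so higher is better in every chart that follows.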


Looking for an HPC workstation or server for molecular dynamics research like drug discovery?

Configure your GROMACS system with our experienced system engineers today!




GROMACS Summary

GROMACS is a powerful open-source molecular dynamics package primarily designed for simulations of proteins, lipids, and nucleic acids, as well as non-biological systems such as polymers. GROMACS supports all the usual algorithms expected from a modern molecular dynamics implementation. GROMACS can be run in parallel in a multi-node environment using the standard MPI communication protocol, and since GROMACS 4.6, has implemented CUDA-based GPU acceleration on NVIDIA GPUs.

GROMACS is known for being user-friendly, with topologies and parameter files written in clear text format. It performs extensive consistency checking and issues clear error messages when something goes wrong. Because GROMACS exposes many variables and optimizations that can alter performance, we keep the run parameters the same from test to test and change only the hardware configuration.
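One way to hold run parameters constant is to build every `gmx mdrun` command from a single template. The sketch below is an assumption about how such a harness could look, not the exact settings used for these benchmarks: the input file name `adh.tpr`, the step count, and the thread count are hypothetical. The flags themselves (`-s`, `-nsteps`, `-ntmpi`, `-ntomp`, `-pin`, `-nb`, `-pme`, `-resethway`, `-noconfout`) are standard `gmx mdrun` options.

```python
import shlex


def build_mdrun_command(tpr_file, n_steps=20_000, omp_threads=16, use_gpu=True):
    """Build a gmx mdrun argument list with fixed benchmark settings."""
    device = "gpu" if use_gpu else "cpu"
    return [
        "gmx", "mdrun",
        "-s", tpr_file,              # run input file (hypothetical name below)
        "-nsteps", str(n_steps),     # same step count for every run
        "-ntmpi", "1",               # one thread-MPI rank
        "-ntomp", str(omp_threads),  # OpenMP threads per rank
        "-pin", "on",                # pin threads to cores
        "-nb", device,               # where nonbonded interactions run
        "-pme", device,              # where PME long-range electrostatics run
        "-resethway",                # reset timers halfway through the run
        "-noconfout",                # skip writing the final configuration
    ]


# Print the commands instead of running them, so this works without GROMACS.
print(shlex.join(build_mdrun_command("adh.tpr")))
print(shlex.join(build_mdrun_command("adh.tpr", use_gpu=False)))
```

Only `use_gpu` differs between the two printed commands, so any change in ns/day comes from the hardware rather than from the run settings.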

Exxact System Specification

Specification        | Xeon W Workstation  | Xeon Scalable Server           | Threadripper PRO Workstation
Processor            | Intel Xeon W9-3495X | Dual Intel Xeon Scalable 8490H | AMD Threadripper PRO 5995WX
Total Cores          | 56 Cores            | 120 Cores (60 Each)            | 64 Cores
Base/Max Boost Clock | 1.9GHz/4.8GHz       | 1.9GHz/3.5GHz                  | 2.7GHz/4.5GHz
Memory               | 512GB DDR5 ECC      | 512GB DDR5 ECC                 | 512GB DDR4 ECC
Storage #1           | 1.92TB M.2 NVMe SSD | 4.09TB M.2 NVMe SSD            | 4.09TB M.2 NVMe SSD
CUDA Version         | 12.0                | 12.0                           | 12.0

GPU Performance Benchmarks for GROMACS

Our first test benchmarks single-GPU configurations to see which performs best in the ADH and RNASE workloads. We also include CPU-only numbers to see whether GPUs make a meaningful difference (spoiler: they do). In the RNASE workloads, the CPU-only configuration did not finish because of constant crashes.

[Figure: GROMACS ADH GPU benchmark: RTX 6000 Ada, RTX A5500, RTX A4500]

[Figure: GROMACS RNASE GPU benchmark: RTX 6000 Ada, RTX A5500, RTX A4500]
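To collect ns/day figures like these from your own runs, you can read them from the log that `gmx mdrun` writes. The sketch below assumes the log ends with the usual `Performance:` line, which reports ns/day followed by hour/ns. The sample log text and its numbers are made up for illustration and are not our measurements. A run that crashed before finishing has no such line, so the function returns `None`.

```python
import re
from typing import Optional

# Matches the summary line, e.g. "Performance:      123.456        0.194"
PERFORMANCE_LINE = re.compile(
    r"^Performance:\s+([0-9.]+)\s+([0-9.]+)", re.MULTILINE
)


def parse_ns_per_day(log_text: str) -> Optional[float]:
    """Return ns/day from mdrun log text, or None if the run did not finish."""
    match = PERFORMANCE_LINE.search(log_text)
    return float(match.group(1)) if match else None


# Illustrative log tail with made-up numbers.
sample_log = """\
                 (ns/day)    (hour/ns)
Performance:      123.456        0.194
"""

print(parse_ns_per_day(sample_log))                # prints 123.456
print(parse_ns_per_day("Fatal error: run died"))   # prints None
```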

Unexpectedly, the most powerful GPU, the RTX 6000 Ada, does not always perform best in the group. The RTX A4500 posts lower yet still respectable numbers, and its lower cost makes it an attractive choice. The standout is the NVIDIA RTX A5500, which averages about 90% of the performance of the RTX 6000 Ada, a substantially pricier and newer GPU. The RTX A5500 matches the RTX 6000 Ada in some tests and trails not far behind in others.
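An "average relative performance" figure of this kind can be computed by dividing each benchmark's ns/day by the baseline GPU's ns/day and averaging the ratios. The sketch below shows that arithmetic only. The ns/day values are placeholders chosen so the result lands near 90%; they are not the measured results from the charts above, and the function name is our own.

```python
from statistics import mean


def relative_performance(candidate: dict, baseline: dict) -> float:
    """Average per-benchmark ratio of candidate ns/day to baseline ns/day."""
    shared = candidate.keys() & baseline.keys()
    if not shared:
        raise ValueError("no benchmarks in common")
    return mean(candidate[name] / baseline[name] for name in shared)


# PLACEHOLDER ns/day values for illustration only, not measurements.
baseline_gpu = {"adh": 100.0, "rnase": 200.0}
candidate_gpu = {"adh": 95.0, "rnase": 170.0}

ratio = relative_performance(candidate_gpu, baseline_gpu)
print(f"{ratio:.0%} of baseline performance")  # prints "90% of baseline performance"
```

Averaging ratios rather than raw ns/day keeps a fast-running benchmark from dominating the result.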

CPU Performance Benchmark for GROMACS

Dual vs Single Processor Configuration - Are More Cores Better?

First things first, we tested CPU-only configurations. While this test is not representative of how we run GROMACS, we wanted to address some common misconceptions that further testing will reveal.

[Figure: GROMACS benchmark, RTX 6000 Ada: 1 processor vs. 2 processors]

RNASE is not tested on CPU-only configurations because using a large number of CPU cores as the sole compute resource for the RNASE simulation causes crashes. More cores are expected to perform better when there is no GPU accelerator, as demonstrated by the sizable performance lead of the Dual Intel Xeon Scalable. But we are wary of extrapolating these results; running GROMACS with a GPU yields a completely different picture.

We then tested the two Intel Xeon configurations with the same NVIDIA RTX 6000 Ada to see whether the additional cores in a Dual Xeon Scalable solution contribute to higher performance. The comparison is the Intel Xeon W9-3495X versus a Dual Intel Xeon Scalable 8490H system.

[Figure: GROMACS ADH benchmark on RTX 6000 Ada]

[Figure: GROMACS RNASE benchmark on RTX 6000 Ada]

In both of these tests, throwing more cores at the workload doesn’t boost performance, especially in the RNASE benchmark. Yes, the dual Xeon Scalable performed better in the CPU-only test, but when running with a GPU, the dual Xeon Scalable configuration performs worse than a single Xeon W9 in every benchmark. With a GPU in the system, GROMACS did not scale with the additional CPU or its extra cores. Instead, GROMACS workloads perform better with processors running at higher clock speeds.

Having ruled out dual-CPU configurations, we test single-processor systems. The AMD Threadripper PRO is the HEDT competitor to the Intel Xeon W9, while the RTX 4090 and RTX 6000 Ada are the consumer and high-end workstation counterparts of each other. We vary only the CPU platform to see whether GROMACS runs better on AMD or Intel processors with comparable GPUs.

Side note: the RTX 6000 Ada and RTX 4090 are evenly matched in performance.

  • The RTX 6000 Ada prioritizes stability and scalability, featuring more VRAM, lower power draw, a lower failure rate, and a dual-slot design for multi-GPU configurations, which makes it well suited to enterprise use.
  • The RTX 4090 prioritizes gaming performance and draws more power. At 3.5 slots wide, the RTX 4090 is not very scalable for multi-GPU configurations.

[Figure: GROMACS ADH benchmark: RTX 6000 Ada and RTX 4090, Intel Xeon W9 vs. AMD Threadripper PRO]

[Figure: GROMACS RNASE benchmark: RTX 6000 Ada and RTX 4090, Intel Xeon W9 vs. AMD Threadripper PRO]

The Intel Xeon W and AMD Threadripper PRO are evenly matched in most of the ADH and RNASE benchmarks. Where there is a performance difference, however, the Threadripper PRO system edges out the Intel Xeon W. Both CPUs perform admirably, so the choice between team blue and team red is up to the user.

Full Hardware Recommendations for GROMACS

It is difficult to pinpoint the exact configuration that will run every GROMACS simulation at its best. Numerous optimizations can increase or decrease performance in GROMACS workloads. Still, we have put together general hardware recommendations that suit the majority of GROMACS workloads.

For the CPU, avoid having too many cores. High-core-count CPUs often cost far too much, and GROMACS won’t be able to use all of the CPU resources efficiently. However, a fast CPU is still needed. Opt for a processor with high clock speeds like the AMD Threadripper PRO 5995WX or the Intel Xeon W9-3495X.

For the GPU, the highest-spec GPU yields great performance numbers, but GROMACS may not always be able to use all of the GPU’s memory. The RTX 6000 Ada performed very well but was matched in some tests by the last-generation RTX A5500, a midrange professional RTX GPU with half the VRAM at a fraction of the cost. The value proposition of the RTX A4500 also puts this GPU in the conversation when scaling across multiple systems. If your system will not be used at scale, a simple workstation outfitted with an RTX 4090 is also sufficient, with performance comparable to the RTX 6000 Ada.

In the end, your workload may not be 100% GROMACS. Balancing GROMACS optimization against the needs of your other applications is imperative in choosing the right components. If you have any questions about building your next molecular dynamics system for GROMACS, Exxact engineers can answer them and help you choose the best hardware for the best price.


Have any questions about how to optimize the best solution for your varying workload?
Contact Exxact today to talk to an experienced engineer about building your perfect computing infrastructure.


