Engineering MPD

GPU Speeds Up Fluid & Particle Simulation - Particleworks Benchmarks with EnginSoft

December 5, 2024

Introduction

Particle-based simulations have revolutionized engineering and design by enabling precise modeling of fluid dynamics. Particleworks, a leading Computational Fluid Dynamics (CFD) simulation software, leverages high-performance computing hardware, particularly GPUs, to achieve exceptional performance.

Particleworks employs a mesh-free Moving Particle Simulation (MPS) method to solve complex fluid dynamics problems such as multiphase flow, large deformation, conjugate heat transfer, and fluid interactions with moving parts. Its applications span diverse industries, including automotive, civil engineering, manufacturing, and more.

In collaboration with EnginSoft, we explore how GPUs surpass CPUs in Particleworks simulations, delivering significant performance gains and shaping the future of engineering simulation. EnginSoft provided industry expertise to conduct the simulations, while Exxact supplied the high-performance hardware.

A special thanks to EnginSoft for conducting these benchmarks using Exxact’s cutting-edge hardware!

Accelerate Simulations in Particleworks with GPUs

Accelerate your Particleworks simulations and CFD projects with the latest CPUs and most powerful GPUs available. Configure an optimized solution that fits your deployment, budget, and desired performance!

Benchmark Setup

To measure the impact of GPU acceleration, we benchmarked Particleworks simulations against a powerful 128-core CPU baseline and evaluated the relative speedup of several GPUs.

Graph 1: GPU Speedup Relative to CPU in Single-Precision (FP32)

This graph highlights the relative speedup achieved when comparing GPU performance to a 128-core CPU baseline for single-precision (FP32) simulations. FP32 calculations are suitable for scenarios where peak accuracy is not critical to the overall simulation results.

The NVIDIA GeForce RTX 4090 demonstrates an impressive 11.26x speedup over the CPU, and the NVIDIA RTX 6000 Ada demonstrates a 9.34x speedup. For single-GPU setups, the RTX 4090 offers exceptional performance, outpacing the RTX 6000 Ada in this context, but there are other factors to consider:
- The RTX 4090 features higher clock speeds but is limited to 24GB of GPU memory, making it less suitable for larger-scale simulations. Furthermore, its larger cooler limits the number of GPUs you can deploy in a single system.
- The RTX 6000 Ada shares the same GPU architecture as the RTX 4090 but runs at lower clock speeds. Its advantages are 48GB of GPU memory and professional application validation for enhanced stability. It is ideal for scalable GPU compute tasks and can be deployed in multi-GPU setups, with up to 4x GPUs in workstations or up to 8x GPUs in servers.

High-end professional GPUs like the NVIDIA H100 may not show their full potential in single-precision workloads, as GPUs like the RTX 4090 and RTX 6000 Ada benefit from higher clock speeds. That is not to say the H100's 7.71x speedup is slow by any means. However, as shown in Graph 2, the H100 excels in double-precision (FP64) computations where other GPUs fall short.
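
To put the FP32 figures above in practical terms, a relative speedup can be converted into an estimated wall-clock time. The following is a minimal sketch using the speedups quoted in this section. The 24-hour CPU baseline is a hypothetical value chosen for illustration, not a measured result, and the function name is our own. It also assumes the benchmark speedup carries over to your model, which may not hold.

```python
# Relative FP32 speedups over the 128-core CPU baseline, as quoted above.
FP32_SPEEDUP = {
    "RTX 4090": 11.26,
    "RTX 6000 Ada": 9.34,
    "H100": 7.71,
}


def estimated_gpu_hours(cpu_hours: float, speedup: float) -> float:
    """Estimate GPU wall-clock hours, given speedup = cpu_time / gpu_time."""
    if cpu_hours < 0 or speedup <= 0:
        raise ValueError("cpu_hours must be >= 0 and speedup must be > 0")
    return cpu_hours / speedup


# Hypothetical job that takes 24 hours on the CPU baseline (assumed value).
CPU_BASELINE_HOURS = 24.0

for gpu, speedup in FP32_SPEEDUP.items():
    hours = estimated_gpu_hours(CPU_BASELINE_HOURS, speedup)
    print(f"{gpu}: about {hours:.1f} h")
```

Under these assumptions, a day-long CPU run would finish in roughly two to three hours on any of the three GPUs.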

Particleworks GPU Speedup Relative to CPU in Single-Precision (FP32)

Best Options: RTX 4090, RTX 6000 Ada, RTX A6000, and other professional RTX or GeForce RTX GPUs.

- Professional RTX GPUs: Offer dedicated drivers and professional-level support for greater stability and reliability. These GPUs are optimized for scalability, supporting up to 4 GPUs in a full-tower workstation, and offer server deployment flexibility.
- GeForce RTX GPUs: Designed primarily for gaming, these cards deliver excellent performance for single-precision workloads but lack the same level of professional support. Their larger slot widths can limit scalability: cards like the RTX 4090 often restrict workstations to 1 or 2 GPUs and often don't fit inside servers.

Graph 2: GPU Speedup Relative to CPU in Double-Precision (FP64)

This second graph shows relative speedup for the same kind of simulations as Graph 1, this time in double precision (FP64). Double precision is the preferred choice when a simulation calls for peak accuracy.
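
To illustrate why precision can matter, the sketch below repeatedly adds a small value to a running total, once with FP32 rounding and once in FP64. It is a generic floating-point demonstration, not Particleworks code, and the step size and iteration count are arbitrary assumptions. FP32 is emulated with Python's standard `struct` module so the example needs no GPU or extra packages.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, n: int, single_precision: bool) -> float:
    """Add `step` to a running total n times, rounding to FP32 if requested."""
    if single_precision:
        step = to_fp32(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single_precision:
            total = to_fp32(total)  # emulate storing the total in FP32
    return total


N = 100_000        # number of additions (illustrative)
STEP = 0.1         # value added each time (illustrative)
EXACT = 10_000.0   # mathematically exact result of N * STEP

err32 = abs(accumulate(STEP, N, single_precision=True) - EXACT)
err64 = abs(accumulate(STEP, N, single_precision=False) - EXACT)
print(f"FP32 accumulated error: {err32:.6g}")
print(f"FP64 accumulated error: {err64:.6g}")
```

The FP32 total drifts visibly from the exact value while the FP64 total stays very close to it. Whether that drift matters depends on the simulation, which is why FP32 remains suitable when peak accuracy is not critical.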

The top performers from Graph 1 don't chart well here because of their limited FP64 capabilities. However, with the NVIDIA H100 and NVIDIA A100, we see 7.33x and 6.02x speedups respectively.
- Both the NVIDIA H100 and A100 are data center GPUs with passive coolers that require a server deployment. These GPUs are not available for desktop/workstation deployments.
- The NVIDIA H100 and A100 feature 80GB of HBM memory, delivering extremely fast memory bandwidth and a large memory capacity, perfect for larger model sizes. These GPUs work better together when connected via NVLink.

The NVIDIA A800 40GB Active is a workstation-class card based on the NVIDIA A100. The NVIDIA A800 delivers a 5.52x speedup.
- Because the A800 is a blower-style card similar to the RTX 6000 Ada, it can be fitted in workstations.

Particleworks GPU Speedup Relative to CPU in Double-Precision (FP64)

Best Options: Enterprise-grade GPUs such as the NVIDIA H100 and NVIDIA A100 Tensor Core GPUs. The recently released NVIDIA H200 NVL with 141GB of memory (not tested) may offer even more performance.

- These GPUs are designed for server deployments, making them ideal for shared HPC compute environments.
- Flexibility with NVIDIA A800: While technically a workstation-class GPU, the A800 40GB Active provides enterprise-grade performance in a workstation form factor. This makes it a versatile choice for users requiring double-precision capabilities in smaller systems.

Graphs 3 & 4: Particleworks Multi-GPU Scalability

We also tested scalability in multi-GPU setups to confirm whether additional GPUs increase performance. Do 2 GPUs double the performance? Do 4 GPUs quadruple it?

We ran these tests in both single precision (FP32) and double precision (FP64), with a single GPU as the baseline.
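
One common way to answer the "double or quadruple?" question is scaling efficiency: the measured speedup over one GPU divided by the ideal, linear speedup. The sketch below shows the calculation. The speedup values in it are hypothetical placeholders, not numbers read from Graphs 3 and 4, and the function name is our own.

```python
def scaling_efficiency(speedup_vs_one_gpu: float, n_gpus: int) -> float:
    """Parallel efficiency: measured speedup divided by ideal (linear) speedup."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return speedup_vs_one_gpu / n_gpus


# Hypothetical speedups relative to 1 GPU; NOT values taken from the graphs.
example_speedups = {1: 1.0, 2: 1.9, 4: 3.6}

for n_gpus, speedup in example_speedups.items():
    eff = scaling_efficiency(speedup, n_gpus)
    print(f"{n_gpus} GPU(s): {speedup:.1f}x speedup, {eff:.0%} efficiency")
```

An efficiency of 100% means perfect linear scaling, so 2 GPUs would exactly double performance. Values below that show how much is lost to communication and other overheads.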

Particleworks Multi-GPU Scalability in FP32

Particleworks Multi-GPU Scalability in FP64

As highlighted in Graphs 3 and 4, multi-GPU deployments offer strong scalability. More GPUs not only shorten simulation times but also provide additional GPU memory, enabling high-particle-count models to be simulated at higher fidelity. Models that previously had their particle count reduced to fit a project timeline can retain their high resolution and still be solved in a feasible time. These higher-resolution models deliver more true-to-life representations.

Key Takeaways & Conclusions

GPU Acceleration Transforms Particleworks

Reducing computation time with GPUs allows for more simulation iterations, minimizing idle time and significantly boosting productivity. GPU-accelerated computing is transforming the way engineers approach CFD and particle dynamics simulations. Leading software providers Ansys, Siemens, and Dassault are increasingly integrating GPU acceleration into their solvers, signaling a shift toward faster and more efficient simulation workflows.

A single NVIDIA H100 GPU delivers performance equivalent to approximately 900 CPU cores in double precision. This performance uplift allows for compute consolidation from a multi-CPU cluster to a single GPU-accelerated compute node.
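
The core-equivalence figure can be reproduced from the benchmark numbers in this article. The sketch below multiplies the H100's FP64 speedup by the core count of the CPU baseline. It assumes CPU performance scales linearly with core count, which is a simplification, and the function name is our own.

```python
CPU_BASELINE_CORES = 128   # cores in the CPU baseline used for the benchmarks
H100_FP64_SPEEDUP = 7.33   # H100 speedup over that baseline in double precision


def equivalent_cpu_cores(speedup: float, baseline_cores: int) -> float:
    """CPU cores needed to match the GPU, assuming linear scaling with cores."""
    return speedup * baseline_cores


cores = equivalent_cpu_cores(H100_FP64_SPEEDUP, CPU_BASELINE_CORES)
print(f"One H100 is roughly equivalent to {cores:.0f} CPU cores in FP64")
```

The result, about 938 cores, is in line with the rounded figure of approximately 900.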

Selection Matters

Choosing the right GPU for your simulation workloads is essential to maximizing performance and efficiency. Both consumer and professional GPUs deliver impressive single-precision (FP32) speedups, but enterprise-grade GPUs provide additional robustness, scalability, and support for diverse workflows.

Matching your hardware to your simulation requirements is crucial. For single-precision-heavy workloads, consumer GPUs like the RTX 4090 are cost-effective powerhouses, while enterprise GPUs like the H100 or A100 are better suited for versatile, memory-intensive, or double-precision tasks. Scalability across multiple GPUs further enhances performance, offering users the flexibility to handle even the most demanding simulations.
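
As a rough illustration of matching hardware to requirements, the sketch below filters the GPUs discussed in this article by precision, memory, and form factor, then sorts them by the quoted speedup. The memory sizes, speedups, and form factors come from the sections above. The data structure, function name, and example requirements are our own assumptions. A GPU is skipped for a given precision only because no figure is quoted for it here, not because it cannot run that workload.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gpu:
    name: str
    memory_gb: int
    fp32_speedup: Optional[float]  # None where this article quotes no figure
    fp64_speedup: Optional[float]  # None where this article quotes no figure
    fits_workstation: bool


GPUS = [
    Gpu("RTX 4090", 24, 11.26, None, True),
    Gpu("RTX 6000 Ada", 48, 9.34, None, True),
    Gpu("H100", 80, 7.71, 7.33, False),
    Gpu("A100", 80, None, 6.02, False),
    Gpu("A800 40GB Active", 40, None, 5.52, True),
]


def shortlist(precision: str, min_memory_gb: int, workstation: bool) -> List[str]:
    """Return matching GPU names, fastest first, for 'fp32' or 'fp64'."""
    if precision not in ("fp32", "fp64"):
        raise ValueError("precision must be 'fp32' or 'fp64'")

    def speed(gpu: Gpu) -> Optional[float]:
        return gpu.fp32_speedup if precision == "fp32" else gpu.fp64_speedup

    matches = [
        gpu for gpu in GPUS
        if speed(gpu) is not None
        and gpu.memory_gb >= min_memory_gb
        and (gpu.fits_workstation or not workstation)
    ]
    matches.sort(key=speed, reverse=True)
    return [gpu.name for gpu in matches]


# Example: an FP64 workload needing at least 40GB, in a workstation.
print(shortlist("fp64", min_memory_gb=40, workstation=True))
```

A real selection would also weigh budget, multi-GPU plans, and the memory your model actually needs, none of which this sketch captures.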

Scalability

You can increase your compute capabilities by increasing the number of GPUs in your deployment. Particleworks has great GPU scalability, as demonstrated in Graphs 3 and 4. Furthermore, multiple GPUs also mean you can run separate simulations in parallel, each on its own GPU.

For running CFD simulations, leveraging GPUs is a no-brainer. The continued scalability as you increase the number of GPUs shows that continued investment in hardware will pay off and lower total cost of ownership. Stay ahead of the curve.
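
One generic way to run independent jobs side by side on a multi-GPU machine is to start one process per GPU and restrict each process to a single device with the CUDA `CUDA_VISIBLE_DEVICES` environment variable. The sketch below shows the pattern. The command it launches is a stand-in that only prints the GPU it was assigned, not a Particleworks command. How the Particleworks solver is launched, and whether it offers its own GPU selection option, is not covered here, so treat this as an assumption-laden outline.

```python
import os
import subprocess
import sys

# Stand-in for a real solver launch: it only prints the GPU it was given.
# Replace with your actual solver command; this is NOT a Particleworks CLI.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
]


def gpu_env(gpu_id: int) -> dict:
    """Copy the current environment, exposing only one GPU to the process."""
    return dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))


def launch_one_job_per_gpu(gpu_ids):
    """Start one process per GPU, then wait and collect each job's output."""
    procs = [
        subprocess.Popen(
            STAND_IN_COMMAND,
            env=gpu_env(gpu_id),
            stdout=subprocess.PIPE,
            text=True,
        )
        for gpu_id in gpu_ids
    ]
    # All processes are already running concurrently; now gather results.
    return [proc.communicate()[0].strip() for proc in procs]


print(launch_one_job_per_gpu([0, 1, 2, 3]))
```

Each process sees only the GPU it was assigned, so four independent simulations can run at once on a 4-GPU system without competing for the same device.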

Closing Thoughts

Particleworks users stand to gain immense benefits from investing in GPU hardware. By leveraging the right GPUs, simulation times can be drastically reduced, leading to faster iterations and better outcomes.

The RTX 6000 Ada and RTX 4090 are great GPU options for single-precision FP32 calculations. For mixed-precision work and simulations that require FP64, an enterprise GPU like the NVIDIA H100 delivers the compute you need for critical workloads.

Are you ready to accelerate your Particleworks simulations? At Exxact we strive to offer the optimal solutions for each and every customer. Configure one of our platforms tailored for Particleworks and contact us today for a quote. We would also be happy to assist with any new cluster deployments or compute infrastructure consolidations.
