PyTorch 2.0 Preview Announced

December 7, 2022

PyTorch 2.0 was announced last week with some game-changing additions to the machine learning framework. While PyTorch 2.0 is still in early development, a usable nightly build is already available:

pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

PyTorch 2.0 moves parts of PyTorch from C++ back into Python. Its main new feature is torch.compile, which compiles a model through a single wrapping call, without any changes to the model code itself.

import torch
import torchvision.models as models

# Build the model on the GPU and create its optimizer
model = models.resnet18().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Inserting torch.compile here:
compiled_model = torch.compile(model)

x = torch.randn(16, 3, 224, 224).cuda()
optimizer.zero_grad()

# Modifying 'out = model(x)' to:
out = compiled_model(x)

out.sum().backward()
optimizer.step()

PyTorch 2.0 is Faster

The PyTorch team ran torch.compile on 163 open-source models from Hugging Face, TIMM, and TorchBench to evaluate the potential performance gain across tasks such as image classification, object detection, image generation, NLP tasks (language modeling, Q&A, sequence classification), recommender systems, and reinforcement learning. As of pre-release, torch.compile worked on 93% of these models, and training ran 43% faster on average on an NVIDIA A100: 21% faster at Float32 precision and 51% faster with AMP (automatic mixed precision).
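
To put these percentages in wall-clock terms, here is a small arithmetic sketch. It assumes that "N% faster" means throughput rises by N%, so the time per training step shrinks by a factor of 1 + N/100. The function name and the 100 ms baseline are made up for illustration, and the quoted figures are averages, so individual models will differ.

```python
def compiled_step_time(eager_ms: float, percent_faster: float) -> float:
    """Return the step time after a speedup of `percent_faster` percent.

    Assumption: "43% faster" means throughput is 1.43x the eager baseline,
    so the time per step is the eager time divided by 1.43.
    """
    return eager_ms / (1.0 + percent_faster / 100.0)


if __name__ == "__main__":
    # Hypothetical 100 ms eager training step, using the averages quoted above.
    for label, pct in [("overall", 43), ("Float32", 21), ("AMP", 51)]:
        print(f"{label}: {compiled_step_time(100.0, pct):.1f} ms per step")
```

Under this reading, a 43% speedup removes roughly 30% of the step time, not 43% of it.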

As of pre-release, torch.compile is only available for CPUs and for NVIDIA Volta and Ampere GPUs, with the NVIDIA A100 showing the best performance. The final release of PyTorch 2.0 is slated for March 2023, with hopes of compatibility with more accelerators.

Speaking of compatibility, PyTorch 2.0 is completely backward compatible since this release is fully additive. Under the hood, torch.compile relies on four new technologies: TorchDynamo, AOTAutograd, PrimTorch, and TorchInductor.

  • TorchDynamo captures PyTorch programs safely using Python Frame Evaluation Hooks. It is a significant innovation that resulted from five years of the PyTorch team's R&D into safe graph capture.
  • AOTAutograd overloads PyTorch’s autograd engine as a tracing autodiff for generating ahead-of-time backward traces.
  • PrimTorch canonicalizes the 2000+ PyTorch operators down to a closed set of ~250 primitive operators that developers can target to build a complete PyTorch backend. This substantially lowers the barrier to writing a PyTorch feature or backend.
  • TorchInductor is a deep-learning compiler that generates fast code for multiple accelerators and backends. For NVIDIA GPUs, it uses OpenAI Triton as a key building block.
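
To make the idea of graph capture concrete, the sketch below plugs a do-nothing backend into torch.compile. It relies on the custom-backend convention that a backend is a callable receiving a torch.fx.GraphModule plus example inputs and returning a callable. The names inspecting_backend, captured_graphs, and toy_fn are hypothetical, and the sketch performs no optimization; it only shows what TorchDynamo hands to a backend such as TorchInductor.

```python
from typing import Callable, List

import torch
import torch.fx

captured_graphs: List[torch.fx.GraphModule] = []


def inspecting_backend(
    gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]
) -> Callable:
    """Hypothetical backend: record the captured FX graph, then run it unchanged."""
    captured_graphs.append(gm)
    return gm.forward  # no optimization; just execute the captured graph


def toy_fn(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x) + 1.0


compiled_fn = torch.compile(toy_fn, backend=inspecting_backend)

x = torch.randn(4)
out = compiled_fn(x)  # the first call triggers graph capture

print(f"captured {len(captured_graphs)} graph(s)")
for gm in captured_graphs:
    print(gm.graph)  # the operations TorchDynamo recorded
```

A real backend would transform or compile the graph before returning a callable; returning gm.forward simply runs the recorded operations as they are.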

These four components are written in Python and support dynamic shapes (the ability to pass in tensors of different sizes without triggering a recompilation), which makes them flexible and easy to hack on and lowers the barrier to entry for developers and vendors.
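
As a rough illustration of dynamic shapes, the sketch below calls one compiled function with several batch sizes. It assumes the dynamic=True argument of torch.compile and uses backend="eager" (graph capture without kernel generation) to keep the example light; scale_and_sum is a made-up function. Whether a given shape change avoids recompilation depends on the PyTorch build and the operations involved.

```python
import torch


def scale_and_sum(x: torch.Tensor) -> torch.Tensor:
    return (x * 2.0).sum(dim=-1)


# dynamic=True asks the compiler to trace with symbolic shapes where it can.
# backend="eager" captures the graph with TorchDynamo but generates no kernels.
compiled = torch.compile(scale_and_sum, dynamic=True, backend="eager")

for batch in (2, 5, 9):
    x = torch.randn(batch, 8)
    result = compiled(x)
    # The compiled function should agree with the plain Python version.
    assert torch.allclose(result, scale_and_sum(x))
    print(batch, tuple(result.shape))
```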

Out of the box, PyTorch 2.0 behaves the same as previous PyTorch 1.X versions. The difference is the optional call model = torch.compile(model), which goes through 3 steps:

  1. Graph acquisition: first, the model is rewritten as blocks of subgraphs. Subgraphs that TorchDynamo can compile are flattened; subgraphs that aren’t supported fall back to regular eager execution, as in PyTorch 1.X.
  2. Graph lowering: all PyTorch operations are decomposed into their constituent kernels specific to the chosen backend.
  3. Graph compilation: the kernels call their corresponding low-level device-specific operations.
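
The fallback in step 1 can be observed with a small experiment. In the sketch below, a branch that depends on tensor data typically splits the function into more than one captured subgraph, with ordinary Python deciding the branch in between. The backend counting_backend and the function branchy are hypothetical names, and the exact number of subgraphs may vary between PyTorch versions.

```python
from typing import Callable, List

import torch
import torch.fx

subgraphs: List[torch.fx.GraphModule] = []


def counting_backend(
    gm: torch.fx.GraphModule, example_inputs: List[torch.Tensor]
) -> Callable:
    """Hypothetical backend that records each captured subgraph and runs it as-is."""
    subgraphs.append(gm)
    return gm.forward


def branchy(x: torch.Tensor) -> torch.Tensor:
    y = torch.sin(x)
    # The branch depends on tensor data, which a static graph cannot express,
    # so graph capture stops here and Python evaluates the condition.
    if y.sum() > 0:
        return y + 1.0
    return y - 1.0


compiled_branchy = torch.compile(branchy, backend=counting_backend)

x = torch.ones(3)  # sin(1) > 0, so the first branch is taken
out = compiled_branchy(x)

print(f"captured {len(subgraphs)} subgraph(s)")
```

The result is still correct either way; a graph break only limits how much of the function a backend gets to optimize as one unit.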

PyTorch 2.0 FAQ

How to Install PyTorch 2.0?

For GPUs on CUDA 11.7:

pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cu117

For CPUs:

pip3 install numpy --pre torch[dynamo] torchvision torchaudio --force-reinstall --extra-index-url https://download.pytorch.org/whl/nightly/cpu

Is PyTorch 2.0 backward compatible?

Yes! Using PyTorch 2.0 does not require any code changes to your existing workflows, since the release is additive: existing code runs as before, and the new compilation path is opt-in. A single line of code, model = torch.compile(model), opts your model into the PyTorch 2.0 stack.
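
Because the compiled path is opt-in, a script can guard the call so that it still runs on PyTorch 1.X or when compilation fails. The helper below is a minimal sketch, not an official pattern: compile_or_fallback is a hypothetical name, and it assumes that one warm-up forward pass is acceptable (compilation happens lazily on the first call, so errors only surface then).

```python
import torch
import torch.nn as nn


def compile_or_fallback(
    model: nn.Module, example_input: torch.Tensor, **compile_kwargs
) -> nn.Module:
    """Return a compiled model, or the original if compilation is unavailable or fails."""
    if not hasattr(torch, "compile"):  # PyTorch 1.X has no torch.compile
        return model
    try:
        compiled = torch.compile(model, **compile_kwargs)
        # Compilation is lazy, so run one forward pass to surface errors now.
        with torch.no_grad():
            compiled(example_input)
        return compiled
    except Exception as err:  # broad on purpose: this is only a sketch
        print(f"torch.compile failed, using eager mode instead: {err}")
        return model


model = nn.Linear(8, 2)
example = torch.randn(4, 8)
# backend="eager" keeps the demo light; omit it to use the default backend.
safe_model = compile_or_fallback(model, example, backend="eager")
print(safe_model(example).shape)
```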

PyTorch 2.0 Release Date

PyTorch 2.0 is still in its experimental phase and available only in nightlies. The stable PyTorch 2.0 release is slated for March 2023. In PyTorch’s initial testing on 163 open-source models from Hugging Face, TIMM (pyTorch IMage Models), and TorchBench, torch.compile worked 93% of the time. Your mileage may vary at this time.

Does PyTorch 2.0 work with all GPUs?

Pre-release PyTorch 2.0 currently supports only CPUs and NVIDIA Volta and Ampere generation GPUs on CUDA 11.6 or 11.7. Support for other accelerators should expand as the stable release rolls out.
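
One way to check a machine against this list is to look at the GPU's CUDA compute capability. The sketch below assumes the usual mapping of capability 7.0 and 7.2 to Volta and 8.0 through 8.7 to Ampere; the function names are hypothetical, and the check only mirrors the list in this article, so other architectures may or may not work in practice.

```python
from typing import Optional, Tuple


def preview_supported_arch(capability: Tuple[int, int]) -> Optional[str]:
    """Map a CUDA compute capability to Volta or Ampere, else None.

    Assumption: 7.0-7.2 is Volta (7.5 is Turing) and 8.0-8.7 is Ampere (8.9 is Ada).
    """
    major, minor = capability
    if major == 7 and minor < 5:
        return "Volta"
    if major == 8 and minor < 9:
        return "Ampere"
    return None


def describe_current_device() -> str:
    """Report whether the current machine matches the pre-release support list."""
    import torch  # imported lazily so the helper above works without PyTorch

    if not torch.cuda.is_available():
        return "No CUDA GPU found; the CPU build applies."
    capability = torch.cuda.get_device_capability()
    arch = preview_supported_arch(capability)
    if arch is None:
        return f"Compute capability {capability} is not on the pre-release list."
    return f"{arch} GPU detected (compute capability {capability})."


if __name__ == "__main__":
    print(preview_supported_arch((8, 0)))  # an A100 reports capability 8.0
```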

Wrapping it Up

Exxact specializes in GPU acceleration for HPC, but hardware is just half the picture. We are just as excited as you are for a faster, more dynamic, and more Pythonic version of our favorite deep learning framework.

The next generations of hardware from NVIDIA, AMD, and Intel, along with innovations in software and AI discovery, are propelling the world toward a digital, compute-dominant landscape. In the meantime, if you’re getting started in deep learning or looking to switch to an on-premise solution, Exxact offers custom-built workstations, servers, and clusters to drive groundbreaking discoveries. Contact Us today!

