Deep Learning

PyTorch Release v1.3.0 - Mobile Support, Named Tensors, Quantization, Type Promotion

October 10, 2019
66 min read

PyTorch is a widely used, open-source deep learning platform for easily writing neural network layers in Python, enabling a seamless workflow from research to production. Based on Torch, PyTorch has become a powerful machine learning framework favored by esteemed researchers around the world.

Here is the newest PyTorch release, v1.3.0, featuring mobile support, named tensors, quantization, type promotion, and many more new features.

Table of Contents

  • Breaking Changes
  • Highlights
    • [Experimental]: Mobile Support
    • [Experimental]: Named Tensor Support
    • [Experimental]: Quantization support
    • Type Promotion
    • Deprecations
  • New Features
    • TensorBoard: 3D Mesh and Hyperparameter Support
    • Distributed
    • Libtorch Binaries with C++11 ABI
    • New TorchScript features
  • Improvements
    • C++ Frontend Improvements
      • Autograd
      • New torch::nn modules
      • New torch::nn::functional functions
      • tensor Construction API
      • Other C++ Improvements
    • Distributed Improvements
    • Performance Improvements
    • JIT Improvements
    • ONNX Exporter Improvements
      • Adding Support for ONNX IR v4
      • Adding Support for ONNX Opset 11
      • Exporting More Torch Operators/Models to ONNX
      • Enhancing ONNX Export Infra
    • Other Improvements
  • Bug Fixes
    • TensorBoard Bug Fixes
    • C++ API Bug fixes
    • JIT
    • Other Bug Fixes
  • Documentation Updates
    • Distributed
    • JIT
    • Other documentation improvements


Breaking Changes

Type Promotion: Mixed dtype operations may return a different dtype and value than in previous versions. (22273, 26981)

Previous versions of PyTorch supported a limited number of mixed dtype operations. These operations could result in loss of precision by, for example, truncating floating-point zero-dimensional tensors or Python numbers.

In Version 1.3, PyTorch supports NumPy-style type promotion (with slightly modified rules, see full documentation). These rules generally will retain precision and be less surprising to users.

Version 1.2:

>>> torch.tensor(1) + 2.5
tensor(3)
>>> torch.tensor([1]) + torch.tensor(2.5)
tensor([3])
>>> torch.tensor(True) + 5
tensor(True)

Version 1.3:

>>> torch.tensor(1) + 2.5
tensor(3.5000)
>>> torch.tensor([1]) + torch.tensor(2.5)
tensor([3.5000])
>>> torch.tensor(True) + 5
tensor(6)

Type Promotion: in-place operations whose result_type is a lower dtype category (bool < integer < floating-point) than the in-place operand now throw an Error. (22273, 26981)

Version 1.2:

>>> int_tensor = torch.tensor(1)
>>> int_tensor.add_(1.5)
tensor(2)
>>> bool_tensor = torch.tensor(True)
>>> bool_tensor.add_(5)
tensor(True)

Version 1.3:

>>> int_tensor = torch.tensor(1)
>>> int_tensor.add_(1.5)
RuntimeError: result type Float cannot be cast to the desired output type Long
>>> bool_tensor = torch.tensor(True)
>>> bool_tensor.add_(5)
RuntimeError: result type Long cannot be cast to the desired output type Bool

These rules can be checked at runtime via torch.can_cast.

torch.flatten: 0-dimensional inputs now return a 1-dim tensor. (25406).

Version 1.2:

>>> torch.flatten(torch.tensor(0))
tensor(0)

Version 1.3:

>>> torch.flatten(torch.tensor(0))
tensor([0])

nn.functional.affine_grid: when align_corners = True, changed the behavior of 2D affine transforms on 1D data and 3D affine transforms on 2D data (i.e., when one of the spatial dimensions has unit size).

Previously, all grid points along a unit dimension were arbitrarily considered to be at -1; now they are considered to be at 0 (the center of the input image).

torch.gels: removed deprecated operator, use torch.lstsq instead. (26480).

utils.data.DataLoader: made a number of Iterator attributes private (e.g. num_workers, pin_memory). (22273)

[C++] Variable::backward will no longer implicitly create a gradient for non-1-element Variables. Previously, a gradient tensor of all 1s would be implicitly created. This behavior matches the Python API. (26150)

auto x = torch::randn({5, 5}, torch::requires_grad());
auto y = x * x;
y.backward();
// ERROR: "grad can be implicitly created only for scalar outputs"
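// To keep the previous behavior for non-scalar outputs, pass an explicit
// gradient, e.g. y.backward(torch::ones_like(y));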

[C++] All option specifiers (e.g. GRUOptions::bidirectional_) are now private, use the function variants (GRUOptions::bidirectional(...)) instead. (26419).

Highlights

[Experimental]: Mobile Support

In PyTorch 1.3, we are launching experimental support for mobile. Now you can run any TorchScript model directly without any conversion. Here is the full list of features in this release:

  • Support for full TorchScript inference on mobile;
  • Prebuilt LibTorch libraries for Android/iOS on JCenter/CocoaPods;
  • Java wrapper for Android with functionality to cover common inference cases (loading and invoking the model);
  • Support for all forward ops on mobile CPU (backward ops are not supported yet);
  • Some optimized fp32 operator implementations for ARM CPUs (based on Caffe2Go);
  • Some optimized int8 operator implementations for ARM CPUs (based on QNNPACK);

We decided not to create a new framework for mobile so that you can use the same APIs you are already familiar with to run the same TorchScript models on Android/iOS devices without any format conversion. This way you can have the shortest path from research ideas to production-ready mobile apps.

The tutorials, demo apps, and download links for prebuilt libraries can be found at https://pytorch.org/mobile/

This is an experimental release. We are working on other features like customized builds to make PyTorch smaller, faster, and better for your specific use cases. Stay tuned and give us your feedback!
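
As a minimal sketch of the Python side of this workflow (assuming torchvision is installed; the model and file name are illustrative), a model can be traced to TorchScript and saved, and the resulting file is what the Android/iOS wrappers load:

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
example = torch.rand(1, 3, 224, 224)      # example input used for tracing
traced = torch.jit.trace(model, example)  # produce a TorchScript model
traced.save("resnet18.pt")                # load this file from the mobile app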

[Experimental]: Named Tensor Support

Named Tensors aim to make tensors easier to use by allowing users to associate explicit names with tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check that APIs are being used correctly at runtime, providing extra safety. Names can also be used to rearrange dimensions, for example, to support "broadcasting by name" rather than "broadcasting by position".

Create a named tensor by passing a names argument into most tensor factory functions.

>>> tensor = torch.zeros(2, 3, names=('C', 'N'))
    tensor([[0., 0., 0.],
            [0., 0., 0.]], names=('C', 'N'))

Named tensors propagate names across operations.

>>> tensor.abs()
    tensor([[0., 0., 0.],
            [0., 0., 0.]], names=('C', 'N'))
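
Operations that take dimension arguments can also be given a name instead of a positional index; a minimal sketch, shown via the resulting names:

>>> named = torch.zeros(2, 3, names=('N', 'C'))
>>> named.sum('C').names
('N',)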

Rearrange to the desired ordering by using align_to.

>>> tensor = tensor.align_to('N', 'C', 'H', 'W')
>>> tensor.names, tensor.shape
    (('N', 'C', 'H', 'W'), torch.Size([3, 2, 1, 1]))

And more! Please see our documentation on named tensors.

[Experimental]: Quantization support

PyTorch now supports quantization from the ground up, starting with support for quantized tensors. Convert a float tensor to a quantized tensor and back by:

x = torch.rand(10, 1, dtype=torch.float32)
xq = torch.quantize_per_tensor(x, scale=0.5, zero_point=8, dtype=torch.quint8)
# xq is a quantized tensor with data represented as quint8
xdq = xq.dequantize()
# convert back to floating point

We also support 8-bit quantized implementations of most common operators in CNNs, including:

  • Tensor operations:
    • view, clone, resize, slice
    • add, multiply, cat, mean, max, sort, topk
  • Modules/Functionals (in torch.nn.quantized)
    • Conv2d
    • Linear
    • Avgpool2d, AdaptiveAvgpool2d, MaxPool2d, AdaptiveMaxPool2d
    • Interpolate
    • Upsample
  • Fused operations that help preserve accuracy (in torch.nn.intrinsic)
    • ConvReLU2d, ConvBnReLU2d, ConvBn2d
    • LinearReLU
    • add_relu

We also support dynamic quantized operators, which take in floating point activations but use quantized weights (in torch.nn.quantized.dynamic).

  • LSTM
  • Linear

Quantization also requires support for methods that collect statistics from tensors and calculate quantization parameters (implementing the torch.quantization.Observer interface). We support several such methods:

  • MinMaxObserver
  • MovingAverageMinMaxObserver
  • PerChannelMinMaxObserver
  • MovingAveragePerChannelMinMaxObserver
  • HistogramObserver

For quantization aware training, we support fake-quantization operators and modules to mimic quantization during training:

  • torch.fake_quantize_per_tensor_affine, torch.fake_quantize_per_channel_affine
  • torch.quantization.FakeQuantize

In addition, we also support workflows in torch.quantization for the following (a short sketch of the dynamic workflow appears after the list):

  • post-training dynamic quantization
  • static post-training quantization
  • quantization aware training
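
For example, post-training dynamic quantization can be applied in a few lines. This is a minimal sketch; the toy model and layer sizes below are illustrative:

import torch
import torch.nn as nn

# Post-training dynamic quantization: Linear weights are converted to int8
# ahead of time, while activations are quantized on the fly at inference.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)
output = quantized_model(torch.randn(1, 16))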

All quantized operators are compatible with TorchScript.

For more details, see the documentation at: https://pytorch.org/docs/master/quantization.html

Type Promotion

Arithmetic and comparison operations may now perform mixed-type operations that promote to a common dtype.

The example below was not allowed in version 1.2. In version 1.3, the same code returns a tensor with dtype=torch.float32.

>>> torch.tensor([1], dtype=torch.int) + torch.tensor([1], dtype=torch.float32)

See the full documentation for more details.

  • torch.result_type Provide a function to determine the result of mixed-type operations (26012).
  • torch.can_cast Expose casting rules for type promotion (26805).
  • torch.promote_types Expose promotion logic (26655).
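
A quick illustration of these helpers (a minimal sketch; the outputs follow the promotion rules described above):

>>> torch.result_type(torch.tensor([1], dtype=torch.int), 2.5)
torch.float32
>>> torch.promote_types(torch.int32, torch.float32)
torch.float32
>>> torch.can_cast(torch.float32, torch.int32)
False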

Deprecations

nn.functional.affine_grid / nn.functional.grid_sample: using the default value of align_corners is now deprecated, because the default will change in the 1.4 release.

The align_corners parameter was added in this release; the behavior in the previous release was equivalent to setting it to True. This is also the current default value, but it will change to False in the 1.4 release. Note that using the default will trigger a warning, as demonstrated below; set the value explicitly to remove the warning.

>>> torch.nn.functional.affine_grid(torch.randn(1,2,3),
                                    (1,3,2,2))
UserWarning: Default grid_sample and affine_grid behavior will be changed
to align_corners=False from 1.4.0. 
See the documentation of grid_sample for details.
...

>>> torch.nn.functional.affine_grid(torch.randn(1,2,3),
                                    (1,3,2,2),
                                    align_corners=True)
# NO WARNING!
...

[C++] Deprecate torch::Tensor::data<T>() in favor of torch::Tensor::data_ptr<T>() (24847, 24886).

New Features

TensorBoard: 3D Mesh and Hyperparameter Support

torch.utils.tensorboard now supports 3D meshes and points, plus hyperparameter logging. More details can be found in the documentation for SummaryWriter's add_mesh and add_hparams methods.

A simple example of exercising both methods:

from torch.utils.tensorboard import SummaryWriter

vertices_tensor = torch.as_tensor([
    [1, 1, 1],
    [-1, -1, 1],
    [1, -1, -1],
    [-1, 1, -1],
], dtype=torch.float).unsqueeze(0)
colors_tensor = torch.as_tensor([
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
    [255, 0, 255],
], dtype=torch.int).unsqueeze(0)
faces_tensor = torch.as_tensor([
    [0, 2, 3],
    [0, 3, 1],
    [0, 1, 2],
    [1, 3, 2],
], dtype=torch.int).unsqueeze(0)

with SummaryWriter() as w:
    w.add_mesh('my_mesh', vertices=vertices_tensor, colors=colors_tensor, faces=faces_tensor)
    for i in range(5):
        w.add_hparams({'lr': 0.1*i, 'bsize': i},
                      {'hparam/accuracy': 10*i, 'hparam/loss': 10*i})

Distributed

This release adds macOS support for torch.distributed with the Gloo backend. You can more easily switch from development (e.g. on macOS) to deployment (e.g. on Linux) without having to change a single line of code. The prebuilt binaries for macOS (stable and nightly) include support out of the box.

  • torch.distributed.all_reduce_coalesced Support all reduce of a list of same-device tensors (24949, 25470, 24876)
  • torch.distributed.all_reduce Add bitwise reduction ops (BAND, BOR, BXOR) (26824)
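
A minimal sketch of the new bitwise reductions, assuming a process group has already been initialized with a backend that supports them:

import torch
import torch.distributed as dist

# Bitwise AND of an integer tensor across all ranks in the default group.
flags = torch.tensor([1, 0, 1], dtype=torch.uint8)
dist.all_reduce(flags, op=dist.ReduceOp.BAND)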

Libtorch Binaries with C++11 ABI

We now provide LibTorch binaries for building applications compatible with the C++11 ABI. The download links for LibTorch binaries with the C++11 ABI can be found on https://pytorch.org/ under “QUICK START LOCALLY”.

New TorchScript features

  • Add not in support for TorchScript (23637).
  • You can now raise exceptions in one side of an if branch (23565).
  • Add torch.jit.is_scripting() API (25955).
  • Make assertions like x is not None unwrap the optional type of x (23949).
  • Add dictionary augmented assignment (+=) support to TorchScript (23639).
  • Support the grad and data attributes for tensors in TorchScript (23842).
  • Add @ignore for TorchScript classes (23614).
  • Support nn.GRU in script (23266).
  • Support tensor as a key type in TorchScript (23638).
  • Add support for ModuleDict (25715).
  • Bind set_grad_enabled() into TorchScript (25350).
  • Add in membership checks for lists (25796).
  • Add tuple keyword (25474).
  • Add __getitem__ to class types (25664).
  • Add __setitem__ to class types (25750).
  • Make JIT dicts ordered, matching Python 3.6+ semantics (26465).
  • Added invert bitwise operation to TorchScript (22324).
  • Add min() and max() for lists to TorchScript (26351).
  • Support iterables and ranges in list comprehensions (26768).
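
As a minimal sketch, the snippet below exercises a couple of these additions (not in support and membership checks for lists) in a scripted function; the function itself is illustrative:

from typing import List

import torch

@torch.jit.script
def dedupe(words: List[str]) -> List[str]:
    # Uses the new 'not in' / list membership support in TorchScript.
    seen = torch.jit.annotate(List[str], [])
    for w in words:
        if w not in seen:
            seen.append(w)
    return seen

print(dedupe(["a", "b", "a"]))  # ['a', 'b']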

Improvements

C++ Frontend Improvements

We are on our way to better API parity between our Python and C++ frontends. Specifically, we made the following improvements:

Autograd

  • Tensor autograd APIs
    • torch::Tensor::data Added (26008).
    • torch::Tensor::grad Don’t create a gradient for non-1-element Variables [BC-breaking] (26150).
    • torch::Tensor::is_leaf Added (26186).
    • torch::Tensor::output_nr Added (26216).
    • torch::Tensor::_version Added (26217).
  • Add support for custom autograd functions in C++ API
    • For example usage, please see the PR description and test cases in (23572, 23628, and 23803)
  • torch::autograd::backward and torch::autograd::grad (24342)
  • torch::autograd::Variable::register_hook (24393).

New torch::nn modules

  • Containers
    • torch::nn::ModuleList (24317).
  • Linear layers
    • torch::nn::Identity (26713).
  • Convolution layers
  • Pooling layers
    • torch::nn::MaxPool1d / MaxPool2d / MaxPool3d (24860, 26521).
    • torch::nn::AvgPool1d / AvgPool2d / AvgPool3d (25800).
    • torch::nn::AdaptiveMaxPool1d / AdaptiveMaxPool2d / AdaptiveMaxPool3d (26755, 26772, 26775).
  • Loss functions
    • torch::nn::L1Loss (25902).
  • Distance functions
    • torch::nn::CosineSimilarity (26424)
    • torch::nn::PairwiseDistance (26424)

New torch::nn::functional functions

  • Pooling functions
    • torch::nn::functional::max_pool1d / max_pool2d / max_pool3d (26262).
    • torch::nn::functional::max_pool1d_with_indices / max_pool2d_with_indices / max_pool3d_with_indices (26521).
    • torch::nn::functional::avg_pool1d / avg_pool2d / avg_pool3d (26262).
    • torch::nn::functional::adaptive_max_pool1d / adaptive_max_pool2d / adaptive_max_pool3d (26755, 26772, 26775).
    • torch::nn::functional::adaptive_max_pool1d_with_indices / adaptive_max_pool2d_with_indices / adaptive_max_pool3d_with_indices (26755, 26772, 26775).
  • Distance functions
    • torch::nn::functional::cosine_similarity (26424).
    • torch::nn::functional::pairwise_distance (26424).

tensor Construction API

  • Add support for multidimensional inputs to torch::tensor (26210, 26890, 26756).
    • From now on, we can use torch::tensor({{1, 2}, {3, 4}}) in C++ to construct the same tensor as torch.tensor([[1, 2], [3, 4]]) in Python. Some caveats are noted in this comment.
  • Add support for bool and BFloat16 dtypes to torch::tensor (23337).

Other C++ Improvements

  • Add torch::nn::Module::unregister_module function, for unregistering a submodule from a torch::nn::Module (26088).

Distributed Improvements

  • torch.distributed Detect and handle NCCL errors appropriately instead of blocking peers until a timeout in ProcessGroupNCCL (25012, 25905)
  • torch.distributed Make scatter/gather arguments optional (25575)
  • torch.distributed.launch Add a -m flag to allow users to launch python modules (24910).
  • torch.distributed Add function to get NCCL version for logging (26583).
  • torch.distributed Add timeout parameter to connect function in TCPStore (26554).
  • torch.distributed Use timeout in the connect function to prevent an infinite loop (26364).
  • torch.nn.modules.batchnorm Allow SyncBatchNorm to run without DDP in inference mode (24815)

Performance Improvements

  • torch.argmax/argmin Rewrite as TensorIterator reductions (26181).
  • torch.erfinv Vectorize unary operator (26629).
  • torch.sin/cos/tan Use intrinsics for trigonometric functions on CPU (26431).
  • Fix possible deadlock in SharedCache inside a forked child proc (25158).
  • torch.qr Fix a regression (23591).
  • nn.Conv Use Caffe2's implementation of grouped depthwise 3x3 convolutions (26556).
  • nn.Conv Use parallel_for in DepthwiseConvKernel (26879).
  • nn.Conv Change shape for conv and unary ops (25477).
  • Fix pin_memory_thread not exiting quickly (23646).
  • Increase predefined_minimum_secs to reduce variation (23734).
  • Enhance Tensor indexSelect performance (23055).
  • Separate input shapes to reduce default execution time (24136).
  • constraints.lower_cholesky Vectorize LowerCholeskyTransform (24131).
  • Speed up an integer to the power of a positive integer on CPU (26020).
  • [ROCm] Enable jit fusion (22872).
  • [ROCm] Use MIOpen for transpose convolutions (26172).

JIT Improvements

  • Enable CPU fused kernel on Windows (25578).
  • Expose an API to iterate all the registered operators (23207).
  • Include recursive class compilations in error call stack (23454).
  • Substantial improvements to saved model format speed and size.
    • Compress debug symbols when serializing TorchScript models. (23659).
    • Compress all non-Tensor components of a serialized TorchScript model. (23723).
    • Perform string uniquing by value in pickle serialization. (23741).
    • Implement a bunch of pickle serialization features that optimize for size. (23759).
    • Implement more size-oriented opcodes in the depickler. (26454).
  • Cache node operators to speed up optimization (24827).
  • Allow forward hooks in tracing (23613).
  • Add Pickler C++ API (23241).
  • Open up AliasAnalysisKind for any ops (23810).
  • Add the ability to compile exports on traced modules (24298).
  • Make NoneType a subtype of Optional[T] (25361).

ONNX Exporter Improvements

In PyTorch 1.3, we have added support for exporting graphs with ONNX IR v4 semantics and made it the default. We have achieved good initial coverage for ONNX Opset 11, which was released recently with ONNX 1.6; further enhancement to Opset 11 coverage will follow in the next release. We have enabled export for about 20 new PyTorch operators. We have also focused on enabling export for all models in torchvision and have introduced some of the necessary groundwork in this release, e.g., accepting PyTorch models with inputs/outputs of Dict or String. We continue to work on torchvision models, such as FasterRCNN and MaskRCNN, to enable their export.
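
As a minimal sketch of targeting the new opset when exporting (this assumes torchvision is installed; the model and file name are illustrative):

import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
# Export the model using ONNX Opset 11 semantics.
torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=11)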

Adding Support for ONNX IR v4

  • Provide an option to exclude the weights from model inputs (#23284)
  • Make graph inputs without weights the default (#26146)

Adding Support for ONNX Opset 11

  • Introduce ONNX Opset 11 support (#23739)
  • Add export for torch.Interpolate in Opset 11 (#24805, #27179)
  • Add export for tensor.gather, tensor.scatter and tensor.scatter_add in Opset 11 (#24790)
  • Add export for tensor.clamp in Opset 11 (#25797)
  • Add export for torch.topk and torch.sort in Opset 11 (#25739)

Exporting More Torch Operators/Models to ONNX

  • Export torch.pixel_shuffle (#23739)
  • Export torch.multinomial (#23581)
  • Export torch.norm’s frobenius_norm (#23536)
  • Export torch.std (#22310)
  • Export torch.empty and torch.empty_like (#24166)
  • Export torch.rsqrt (#24153)
  • Export torch.log1p (#25808)
  • Export torch.unique (#25050)
  • Export torch.gelu (#24475)
  • Export tensor.index_fill and tensor.index_copy (#23052)
  • Export torch.round (#26126)
  • Export torch.baddbmm (#25738)
  • Export torch.remainder (#24410)
  • Export torch.cumsum (#24476)
  • Export tensor.size with negative axis (#26436)
  • Export RNN/LSTM with h0/c0 initial state (#22813)

Enhancing ONNX Export Infra

  • Enable exporting PyTorch models which have Dict and String as inputs and outputs (#25889)
  • Systematically solve mismatched types caused by implicit type conversion for binary arithmetic operators by adding an ONNX type-conversion pass. (#24378)
  • Correctly validate dynamic axes names. (#23974)
  • Enable ONNX Runtime tests for Opset 10 and partially for Opset 11 (#22993)

Other Improvements

  • Error checking: many operators now check the strides of the output tensor and raise an error if it has internal overlaps that would lead to incorrect results (23063).
  • torch.det/logdet/slogdet Allow batching (22909).
  • torch.logical_not Add new operator (23839).
  • torch.logical_xor Add new operator (23847).
  • torch.symeig Improve the stability of gradient updates (23018).
  • torch.eye Enable for bool and half (24148).
  • torch.tril / triu Enable for bool and half (24163).
  • torch.logical_not/xor support non-bool tensors. (23916, 23978).
  • torch.index_select Implement indexing methods for sparse tensors (24937).
  • torch.lu_solve Enable broadcasting of batch dimensions (24333).
  • torch.cholesky Enable batches greater than 262140 (24438).
  • torch.det Simplify generation of singular matrices to avoid numerical issue on PowerPC (25773).
  • torch.erfinv In the CUDA implementation, use erfinv() for double to preserve accuracy (25337).
  • torch.erfinv Add a float version of erfinv on CPU (26070).
  • torch.cuda.stream Updates autograd engine to respect streams set in forward (8354).
  • torch.backends.mkldnn.enabled Allow disabling MKLDNN at runtime (25459).
  • torch.cholesky_solve Add derivative (26185).
  • torch.cholesky_inverse Add derivative (26451).
  • torch.polygamma Ensure that n is non-negative (26294).
  • torch.pinverse Enable batching (26095).
  • torch.digamma/trigamma Fix type mismatches on CUDA (25791).
  • torch.where Enable for bool tensor on CUDA (26430).
  • torch.load Default encoding changed to 'utf-8' (26421).
  • torch.repeat_interleave Respect the current stream (26946).
  • torch.bernoulli_ Implement for bool tensors (25076).
  • torch.norm Fix nuclear norm with requires_grad=True (26303).
  • torch.hub.download_url_to_file Make function public (26723).
  • nn.modules.conv add padding_mode to repr (23996).
  • nn.Transformer Extend to support BERT (gelu) (24181).
  • nn.BatchNorm2d Add support for non-affine batch norm with float stats and half inputs (22750).
  • nn.Parameter Fix type hints (25586).
  • nn.CTCLoss Improve error message (26325).
  • nn.Conv Allow batch size of 0 (26214).
  • nn.LSTM/GRU enable double backward for non-cudnn (26660).
  • optim.Adagrad Add epsilon argument (24980).
  • optim.LBFGS Change default tolerance_grad to 1e-7 (25240).
  • optim.lr_scheduler.OneCycleLR Add new 1cycle learning rate scheduler (25324).
  • optimizer.step Fix type annotation (26930).
  • bfloat16 Add support for sub, mul, and div on CPU (22851).
  • bfloat16 Enabled comparison ops on CPU (24182).
  • bfloat16 Enabled masked methods (24183).
  • bfloat16 Enabled torch.mm and torch.mv (24224).
  • bfloat16 Enable log_softmax and CrossEntropyLoss (24457).
  • bfloat16 Enabled conv methods (26167).
  • bfloat16 Enabled dtype on CUDA (26407).
  • quasirandom.SobolEngine Use random seed if not specified (24884).
  • utils.data.dataloader Add possible out of shared memory error message (25730).
  • cuda.set_rng_state Add type hint (26200).
  • Zero sized tensor support for repeat_interleave (23717).
  • Recommend ~ and bitwise_not() when a user tries to apply neg (-) to a bool tensor. (23621).
  • Fix double backward of inplace op on view (23502).
  • autograd.grad Validate shapes of outputs (25349).
  • Enable libflame as a LAPACK choice (25795).
  • Fix race condition in CUDA initialization (25788).
  • Include iteration_ in SGD optimizer serialization (26906).
  • [C++] torch::tensor Fix ambiguous overload issues in the constructor (26890).
  • [XLA] Check device before accessing data_ptr in PackLayer (26056).
  • [XLA] Allow overwriting catch-all kernels (25947).

Bug Fixes

TensorBoard Bug Fixes

  • SummaryWriter.add_graph: Fix empty graph output in some cases (25599).
  • Update Caffe2 contrib TensorBoard logging to not require TensorFlow (25259).
  • SummaryWriter.make_video: Fix write_gif call to moviepy for newer lib (21218).

C++ API Bug fixes

  • Fixes mismatch of device and data type when computing step_size in LBFGS optimizer (25909).

JIT

  • Fix list comprehensions that change the type of the original iterable (24271).
  • Fix double copying of constants during recursive scripting (24412).
  • Fix frontend error message (23576).
  • Clear recursive error stack on each compilation (23458).
  • Fix bugs in assignment to optionals (25059).
  • Make torch.jit.Attribute work when PYTORCH_ENABLED=0 (23851).
  • Fix unicode in comments causing compilation errors (24218).
  • Correctly raise an error if an nn.Module has not been initialized but you try to script it (24852).
  • Fix annotated assignment to variables (25094).
  • dictPop: dereference dict.find() iterator before calling dict.erase() (25056).
  • fix closures which always throw. (25278).
  • Add source location to class instantiation error (24990).
  • Fix AliasAnalysisKind::PURE on MSVC (25375).
  • Emit script function calls during tracing. (25089).
  • Resolve NamedTuple types properly in Python (26443).
  • Fix schema matching of tuples to vartype lists (25944).
  • Correctly preserve ignored function return value type (25262).
  • Fix missing newline in compiled from source range highlight (25802).
  • Fix use-after-free bug in optional (25965).
  • Fix torch.arange traced as constant (25363).
  • Preserve module names in recursive script (24505).
  • Properly resolve ignored module method type annotations (26683).
  • Make is_optional check more robust (26312).
  • Fix builtin lookup for Python functions (26688).
  • Typevar matching fix + implicit conversions from Scalar to int/float (26453).
  • Fix range for non-int inputs and pow implementation (26926).

Other Bug Fixes

  • torch.is_pinned pin_memory should not copy on already pinned tensors (23484).
  • torch.cdist Fix incorrect gradients on CUDA non-batch tensors (22915).
  • torch.from_numpy Fix failure on windows for int32 (25139).
  • torch.tensor Fix memory leak creating a tensor from numpy (24267).
  • torch.index Don't save self in index backward (25594).
  • torch.bincount Fix int32 overflow on CUDA (25748).
  • torch.bernoulli Fix the distribution sampler (26864).
  • torch.pow Fix precision (25476).
  • torch.cdist Fix gradient computation when first arg is 1xn (26254).
  • torch.scatter_add_ Fix scatter CPU kernel when (input size, src size) > index size (25839).
  • nn.ConvTranspose2d Fixed an error with float16 inputs and weights on CUDA. (23552).
  • nn.CTCLoss Fix zero-length targets on CUDA (23298).
  • nn.Conv2d Correct an overflow in an error message (25146).
  • optim.Adam apply a small mathematical fix. (23737).
  • dataloader Fix IndexError on shutdown if not all workers are started (23761).
  • Tensor.repeat Fix crash for 0 repeats (23766).
  • torch.pin_memory only use one thread (25111).
  • distributions.Uniform,HalfCauchy,Gamma Fix log_prob when value is a float (23017).
  • Fix typing error for Padding with asymmetric signatures (24895).
  • Avoid race condition in intrusive_ptr.reset_() (24464).
  • torch.hub: Fix SSL cert issue for hub in Python 2 (25042).
  • Fix int overflow issue in CUDA kernels. (24818).
  • Module.cuda Fix type hints (25018).
  • Fix bug in assertNotEqual for int tensors (25412).
  • Fix 'in' returning true incorrectly (24156).
  • Fix bugs in bulk loader when batch_size=None or with namedtuple (26065).
  • Fix serialization issue in big endian arch (26383).
  • Fix Vec256::abs() for floating point when applied on -0.0 (26422).
  • Fix cyclic reference in _LRScheduler (25776).
  • Fix a build failure on s390x (26233).
  • [XLA] Fix tensor construction from array (24283).

Documentation Updates

Distributed

  • torch.distributed Improve error phrasing in torch.distributed helper functions (25574)
  • torch.distributions.negative_binomial clarified ambiguous doc string in NegativeBinomial (25923)

JIT

  • Add technical documentation for the serialization format (23456).
  • Fix trace docs (24191).
  • Add trace_module to docs (24258).
  • Cleanup distinction around script and trace (24208).
  • Fix item() call in docs (25404).
  • Misc doc updates / fixes (24371, 24445).

Other documentation improvements

  • torch.record_stream Add documentation (24078).
  • torch.fold Describe the relation between fold and unfold operations (24840).
  • torch.argmax Fix incorrect doc (23775).
  • torch.random add docs (23553).
  • torch.empty_strided Add docs (23735).
  • torch.bitwise_not Document for bool tensors (23800).
  • torch.cdist Add documentation (25221).
  • torch.where Update parameter names in doc (25554).
  • torch.atan2 Clarify and correct the doc (26180).
  • nn.functional.bilinear Added documentation (24951).
  • nn.functional.upsample Fix align_corners doc (23707).
  • nn.Transformer Fixed an error in the example (24837).
  • optim.lr_scheduler.CosineAnnealingWarmRestarts Add documentation (25421).
  • optim.SGD Updated with subscripts (23985).
  • optim.RMSprop Highlighting in the doc that square root comes before adding epsilon (26735).
  • autograd.detect_anomaly Add a warning (26615).
  • Improve dataloader docs on when auto-batching is disabled (23671).
  • Updated docs and added deprecation warnings to acknowledge bool tensors (22261).
  • Document benchmarking practice for CUDA (23910).
  • Add ASAN instructions to CONTRIBUTING.md (24848).