PyTorch is a widely used, open-source deep learning platform for writing neural network layers in Python, enabling a seamless workflow from research to production. Based on Torch, PyTorch has become a powerful machine learning framework favored by esteemed researchers around the world.
Here is the newest PyTorch release, v1.3.0, featuring mobile support, named tensors, quantization, type promotion, and many more new features.
Table of Contents
- Breaking Changes
- Highlights
  - [Experimental]: Mobile Support
  - [Experimental]: Named Tensor Support
  - [Experimental]: Quantization support
  - Type Promotion
- Deprecations
- New Features
  - TensorBoard: 3D Mesh and Hyperparameter Support
  - Distributed
  - Libtorch Binaries with C++11 ABI
  - New TorchScript features
- Improvements
  - C++ Frontend Improvements
    - Autograd
    - New torch::nn modules
    - New torch::nn::functional functions
    - tensor Construction API
    - Other C++ Improvements
  - Distributed Improvements
  - Performance Improvements
  - JIT Improvements
  - ONNX Exporter Improvements
    - Adding Support for ONNX IR v4
    - Adding Support for ONNX Opset 11
    - Exporting More Torch Operators/Models to ONNX
    - Enhancing ONNX Export Infra
  - Other Improvements
  - C++ Frontend Improvements
- Bug Fixes
  - TensorBoard Bug Fixes
  - C++ API Bug fixes
  - JIT
  - Other Bug Fixes
- Documentation Updates
  - Distributed
  - JIT
  - Other documentation improvements
Breaking Changes
Type Promotion: Mixed dtype operations may return a different dtype and value than in previous versions. (22273, 26981)
Previous versions of PyTorch supported a limited number of mixed dtype operations. These operations could result in loss of precision by, for example, truncating floating-point zero-dimensional tensors or Python numbers.
In version 1.3, PyTorch supports NumPy-style type promotion (with slightly modified rules; see the full documentation). These rules generally retain precision and are less surprising to users.
Expression | Version 1.2 | Version 1.3 |
---|---|---|
`torch.tensor(1) + 2.5` | `tensor(3)` | `tensor(3.5000)` |
`torch.tensor([1]) + torch.tensor(2.5)` | `tensor([3])` | `tensor([3.5000])` |
`torch.tensor(True) + 5` | `tensor(True)` | `tensor(6)` |
Type Promotion: in-place operations whose in-place operand has a lower dtype category (bool < integer < floating-point) than the result_type now raise an error (22273, 26981).
With `int_tensor = torch.tensor(1)` and `bool_tensor = torch.tensor(True)`:
Expression | Version 1.2 | Version 1.3 |
---|---|---|
`int_tensor.add_(1.5)` | `tensor(2)` | `RuntimeError: result type Float cannot be cast to the desired output type Long` |
`bool_tensor.add_(5)` | `tensor(True)` | `RuntimeError: result type Long cannot be cast to the desired output type Bool` |
These rules can be checked at runtime via torch.can_cast.
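A minimal sketch of how these checks can guard an in-place update. The helper safe_add_ is hypothetical, not a PyTorch API; it uses only torch.result_type and torch.can_cast, and falls back to an out-of-place add when the promoted result cannot be written back into the original dtype.

```python
import torch

def safe_add_(t: torch.Tensor, other) -> torch.Tensor:
    """Hypothetical helper: add in place only if the promoted result type
    can be cast back to t's dtype; otherwise return a new, promoted tensor."""
    result_dtype = torch.result_type(t, other)
    if torch.can_cast(result_dtype, t.dtype):
        return t.add_(other)
    return t + other  # out-of-place: keeps the promoted dtype, t is untouched

# float -> integer would be a move to a lower category, integer -> float is not.
print(torch.can_cast(torch.float32, torch.int64))  # False
print(torch.can_cast(torch.int64, torch.float32))  # True

int_tensor = torch.tensor(1)
print(safe_add_(int_tensor, 1.5))  # tensor(2.5000); int_tensor is still 1
print(safe_add_(int_tensor, 2))    # tensor(3); int_tensor updated in place
```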
torch.flatten: 0-dimensional inputs now return a 1-dim tensor (25406).
Expression | Version 1.2 | Version 1.3 |
---|---|---|
`torch.flatten(torch.tensor(0))` | `tensor(0)` | `tensor([0])` |
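In practice the change means the output of torch.flatten is always at least 1-D, so indexing it with [0] works even for a 0-dim input. A quick check, assuming PyTorch 1.3 or later:

```python
import torch

scalar = torch.tensor(0)         # 0-dim tensor, shape torch.Size([])
flat = torch.flatten(scalar)     # now 1-D with a single element
print(scalar.dim(), flat.shape)  # 0 torch.Size([1])
print(flat[0])                   # indexing works on the flattened result

# Inputs that already have dimensions are flattened as before.
print(torch.flatten(torch.zeros(2, 3)).shape)  # torch.Size([6])
```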
nn.functional.affine_grid: when align_corners=True, the behavior of 2D affine transforms on 1D data and 3D affine transforms on 2D data (i.e., when one of the spatial dimensions has unit size) has changed.
Previously, all grid points along a unit dimension were arbitrarily considered to be at -1; now they are considered to be at 0 (the center of the input image).
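A small sketch of the unit-dimension case, assuming an identity transform and an output size with H == 1. With align_corners=True the y coordinate of every grid point is 0 rather than -1; newer releases may also print a UserWarning noting this behavior change for unit-size grids.

```python
import torch
import torch.nn.functional as F

# Identity 2D affine transform for a batch of one.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# Output size (N, C, H, W) with H == 1: a "1D" image whose y dimension has unit size.
grid = F.affine_grid(theta, size=(1, 1, 1, 3), align_corners=True)

print(grid.shape)    # torch.Size([1, 1, 3, 2]); the last dim holds (x, y)
print(grid[..., 0])  # x coordinates: -1, 0, 1
print(grid[..., 1])  # y coordinates: all 0, the center of the unit dimension
```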
torch.gels: removed deprecated operator; use torch.lstsq instead (26480).
utils.data.DataLoader: made a number of Iterator attributes private (e.g. num_workers, pin_memory) (22273).
[C++] Variable::backward will no longer implicitly create a gradient for non-1-element Variables. Previously, a gradient tensor of all 1s would be implicitly created. This behavior matches the Python API (26150).
```cpp
#include <torch/torch.h>

auto x = torch::randn({5, 5}, torch::requires_grad());
auto y = x * x;
y.backward();
// ERROR: "grad can be implicitly created only for scalar outputs"
// Pass the gradient explicitly instead, e.g. y.backward(torch::ones_like(y));
```
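For comparison, the Python behavior that the C++ API now matches, as a minimal sketch: backward() on a non-scalar output needs an explicit gradient, or the output can be reduced to a scalar first.

```python
import torch

x = torch.randn(5, 5, requires_grad=True)
y = x * x  # non-scalar (5x5) output

try:
    y.backward()  # no gradient supplied for a non-scalar output
except RuntimeError as err:
    print("RuntimeError:", err)

# Option 1: supply the upstream gradient explicitly (all ones, the old implicit default).
y.backward(torch.ones_like(y))
print(torch.allclose(x.grad, 2 * x))  # True, since d(sum(x*x))/dx = 2x

# Option 2: reduce to a scalar before calling backward().
x.grad.zero_()
(x * x).sum().backward()
print(torch.allclose(x.grad, 2 * x))  # True
```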
[C++] All option specifiers (e.g. GRUOptions::bidirectional_) are now private; use the function variants (GRUOptions::bidirectional(...)) instead (26419).
Highlights
[Experimental]: Mobile Support
In PyTorch 1.3, we are launching experimental support for mobile. Now you can run any TorchScript model directly without any conversion. Here is the full list of features in this release:
- Support for full TorchScript inference on mobile;
- Prebuilt LibTorch libraries for Android/iOS on JCenter/CocoaPods;
- Java wrapper for Android with functionality to cover common inference cases (loading and invoking the model);
- Support for all forward ops on mobile CPU (backward ops are not supported yet);
- Some optimized fp32 operator implementations for ARM CPUs (based on Caffe2Go);
- Some optimized int8 operator implementations for ARM CPUs (based on QNNPACK).
We decided not to create a new framework for mobile so that you can use the same APIs you are already familiar with to run the same TorchScript models on Android/iOS devices without any format conversion. This way you can have the shortest path from research ideas to production-ready mobile apps.
The tutorials, demo apps, and download links for prebuilt libraries can be found at https://pytorch.org/mobile/
This is an experimental release. We are working on other features like customized builds to make PyTorch smaller, faster, and better for your specific use cases. Stay tuned and give us your feedback!
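Because mobile runs the same TorchScript models, the export side is ordinary scripting and serialization. A minimal sketch, assuming a toy model (TinyNet is hypothetical) and an in-memory buffer standing in for the .pt file you would ship with an Android/iOS app; loading it on device through the prebuilt LibTorch libraries is not shown.

```python
import io
import torch

class TinyNet(torch.nn.Module):  # hypothetical example model
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
scripted = torch.jit.script(model)  # compile to TorchScript, no format conversion needed

# In a real app you would call scripted.save("model.pt") and bundle the file.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)

loaded = torch.jit.load(buffer)  # the same model, loaded back from the serialized bytes
x = torch.randn(1, 4)
out = loaded(x)
print(out.shape, torch.allclose(out, model(x)))  # torch.Size([1, 2]) True
```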
[Experimental]: Named Tensor Support
Named Tensors aim to make tensors easier to use by allowing users to associate explicit names with tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check that APIs are being used correctly at runtime, providing extra safety. Names can also be used to rearrange dimensions, for example, to support "broadcasting by name" rather than "broadcasting by position".
Create a named tensor by passing a names argument into most tensor factory functions.
```python
>>> tensor = torch.zeros(2, 3, names=('C', 'N'))
>>> tensor
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))
```
Named tensors propagate names across operations.
```python
>>> tensor.abs()
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))
```
Rearrange to the desired ordering by using align_to.
```python
>>> tensor = tensor.align_to('N', 'C', 'H', 'W')
>>> tensor.names, tensor.shape
(('N', 'C', 'H', 'W'), torch.Size([3, 2, 1, 1]))
```
And more! Please see our documentation on named tensors.
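A short sketch of two of the ideas above: reducing over a dimension by name, and name-checked broadcasting. Named tensors remain experimental, so exact warnings and error messages may differ between releases; the dimension names here are arbitrary.

```python
import warnings
import torch

# Newer releases warn that named tensors are experimental; silence that for the demo.
warnings.filterwarnings("ignore", message="Named tensors")

imgs = torch.randn(2, 3, names=('C', 'N'))

# Reduce over a dimension by name instead of by position.
per_channel = imgs.sum('N')
print(per_channel.names)  # ('C',)

# Broadcasting aligns dimensions from the right and checks that the names agree.
bias = torch.randn(3, names=('N',))
print((imgs + bias).names)  # ('C', 'N')

wrong = torch.randn(3, names=('C',))  # same size, but 'C' lines up against 'N'
try:
    imgs + wrong
except RuntimeError as err:
    print("RuntimeError:", err)
```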
[Experimental]: Quantization support
PyTorch now supports quantization from the ground up, starting with support for quantized tensors. Convert a float tensor to a quantized tensor and back by:
```python
import torch

x = torch.rand(10, 1, dtype=torch.float32)
# xq is a quantized tensor with data represented as quint8
xq = torch.quantize_per_tensor(x, scale=0.5, zero_point=8, dtype=torch.quint8)
# convert back to floating point
xdq = xq.dequantize()
```
We also support 8-bit quantized implementations of most common operators in CNNs, including:
- Tensor operations:
  - view, clone, resize, slice
  - add, multiply, cat, mean, max, sort, topk
- Modules/Functionals (in torch.nn.quantized)
  - Conv2d
  - Linear
  - AvgPool2d, AdaptiveAvgPool2d, MaxPool2d, AdaptiveMaxPool2d
  - Interpolate
  - Upsample
- Fused operations for preserving better accuracy (in torch.nn.intrinsic)
  - ConvReLU2d, ConvBnReLU2d, ConvBn2d
  - LinearReLU
  - add_relu
We also support dynamic quantized operators, which take in floating point activations but use quantized weights (in torch.nn.quantized.dynamic).
- LSTM
- Linear
Quantization also requires support for methods to collect statistics from tensors and calculate quantization parameters (implementing interface torch.quantization.Observer). We support several methods to do so:
- MinMaxObserver
- MovingAverageMinMaxObserver
- PerChannelMinMaxObserver
- MovingAveragePerChannelMinMaxObserver
- HistogramObserver
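As a sketch of how an observer feeds quantization parameters into torch.quantize_per_tensor, assuming per-tensor affine quantization to quint8 with MinMaxObserver (newer releases also expose it under torch.ao.quantization):

```python
import torch
from torch.quantization import MinMaxObserver

x = torch.rand(10, 1, dtype=torch.float32)

# Record the min/max of the data, then derive affine quantization parameters from them.
observer = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
observer(x)
scale, zero_point = observer.calculate_qparams()
scale, zero_point = float(scale), int(zero_point)

xq = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point, dtype=torch.quint8)
print(xq.q_scale(), xq.q_zero_point())
print(xq.int_repr().flatten()[:5])  # raw uint8 storage

# Inside the observed range, the round-trip error is at most half a quantization step.
max_err = (xq.dequantize() - x).abs().max().item()
print(max_err <= scale / 2 + 1e-6)  # True
```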
For quantization aware training, we support fake-quantization operators and modules to mimic quantization during training:
- torch.fake_quantize_per_tensor_affine, torch.fake_quantize_per_channel_affine
- torch.quantization.FakeQuantize
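A minimal sketch of fake quantization, assuming hand-picked parameters (scale=0.1, zero_point=0, an int8-style range of -128 to 127) rather than ones computed by an observer. The forward pass rounds values to the quantization grid while staying in float32, and the backward pass lets gradients through where the input is inside the representable range.

```python
import torch

x = torch.randn(8, requires_grad=True)
scale, zero_point = 0.1, 0   # assumed quantization parameters
quant_min, quant_max = -128, 127

# Forward: values snap to multiples of `scale` but remain float32 tensors.
x_fq = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, quant_min, quant_max)
print(x_fq.dtype)  # torch.float32
print((x_fq - x).abs().max() <= scale / 2 + 1e-5)  # rounding error within half a step

# Backward: straight-through estimator, gradient 1 wherever x is in [-12.8, 12.7].
x_fq.sum().backward()
print(x.grad)
```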
In addition, we support workflows in torch.quantization for:
- post-training dynamic quantization
- static post-training quantization
- quantization aware training
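For the post-training dynamic workflow, a minimal sketch using torch.quantization.quantize_dynamic on a hypothetical two-layer model. It assumes a build with a quantized CPU engine (for example FBGEMM on x86); only the nn.Linear layers are swapped for dynamic quantized versions.

```python
import torch

class TinyClassifier(torch.nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(16, 32)
        self.fc2 = torch.nn.Linear(32, 4)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

float_model = TinyClassifier().eval()

# Weights of every nn.Linear are stored as int8; activations are quantized on the fly.
qmodel = torch.quantization.quantize_dynamic(float_model, {torch.nn.Linear}, dtype=torch.qint8)
print(type(qmodel.fc1))  # a dynamic quantized Linear module

x = torch.randn(2, 16)
with torch.no_grad():
    diff = (qmodel(x) - float_model(x)).abs().max()
print(diff)  # small quantization error
```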
All quantized operators are compatible with TorchScript.
For more details, see the documentation at: https://pytorch.org/docs/master/quantization.html
Type Promotion
Arithmetic and comparison operations may now perform mixed-type operations that promote to a common dtype.
The example below was not allowed in version 1.2. In version 1.3, the same code returns a tensor with dtype=torch.float32.
```python
>>> torch.tensor([1], dtype=torch.int) + torch.tensor([1], dtype=torch.float32)
tensor([2.])
```
See the full documentation for more details.
- `torch.result_type`: Provide a function to determine the result of mixed-type operations (26012).
- `torch.can_cast`: Expose casting rules for type promotion (26805).
- `torch.promote_types`: Expose promotion logic (26655).
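A sketch of these functions on a few mixed-dtype cases, including how Python scalars and 0-dim tensors are treated: they only change the result when they belong to a higher category than the dimensioned operands.

```python
import torch

int_tensor = torch.ones(1, dtype=torch.int)       # int32, 1-D
long_tensor = torch.ones(1, dtype=torch.long)     # int64, 1-D
long_zerodim = torch.tensor(1, dtype=torch.long)  # int64, 0-D
float_tensor = torch.ones(1, dtype=torch.float32)

# Dimensioned operands of different categories promote to the higher category.
print(torch.result_type(int_tensor, float_tensor))  # torch.float32
# Same-category Python scalars and 0-dim tensors do not widen a dimensioned operand...
print(torch.result_type(int_tensor, 5))             # torch.int32
print(torch.result_type(int_tensor, long_zerodim))  # torch.int32
# ...but a second dimensioned operand does.
print(torch.result_type(long_tensor, int_tensor))   # torch.int64
# A higher-category scalar changes the category (to the default float dtype).
print(torch.result_type(int_tensor, 5.5))           # torch.float32

# promote_types works on dtypes alone, without any tensors.
print(torch.promote_types(torch.uint8, torch.int8))  # torch.int16
```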
Deprecations
nn.functional.affine_grid / nn.functional.grid_sample: using the default value of align_corners is now deprecated, because the default will change in the 1.4 release.
The align_corners parameter was added in this release; the behavior in the previous release was equivalent to setting the parameter to True. This is also the current default value, but it will be changed to False from the 1.4 release. Note that using the default will trigger a warning, as demonstrated below; set the value explicitly to remove the warning.
```python
>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3), (1, 3, 2, 2))
UserWarning: Default grid_sample and affine_grid behavior will be changed
to align_corners=False from 1.4.0.
See the documentation of grid_sample for details.
...
>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3), (1, 3, 2, 2),
...                                 align_corners=True)
# NO WARNING!
...
```
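A sketch of the recommended pattern: pass align_corners explicitly, and with the same value, to both functions. With an identity transform the resampled image should match the input, and no align_corners warning should be recorded. The image and transform are arbitrary examples.

```python
import warnings
import torch
import torch.nn.functional as F

img = torch.arange(12, dtype=torch.float32).reshape(1, 1, 3, 4)  # (N, C, H, W)
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])                        # identity transform

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Use the same explicit align_corners value for grid generation and sampling.
    grid = F.affine_grid(theta, size=img.shape, align_corners=False)
    out = F.grid_sample(img, grid, align_corners=False)

print([str(w.message) for w in caught if "align_corners" in str(w.message)])  # []
print(torch.allclose(out, img, atol=1e-5))  # True: the identity warp reproduces the input
```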
[C++] Deprecate torch::Tensor::data<T>() in favor of torch::Tensor::data_ptr<T>() (24847, 24886).
New Features
TensorBoard: 3D Mesh and Hyperparameter Support
torch.utils.tensorboard now supports 3D mesh and points plus hyperparameter logging. More details can be found in the documentation for the SummaryWriter methods add_mesh and add_hparams.
A simple example of exercising both methods:
```python
import torch
from torch.utils.tensorboard import SummaryWriter

# A tetrahedron: 4 vertices, per-vertex RGB colors and 4 triangular faces.
# unsqueeze(0) adds the leading batch dimension.
vertices_tensor = torch.as_tensor([
    [1, 1, 1],
    [-1, -1, 1],
    [1, -1, -1],
    [-1, 1, -1],
], dtype=torch.float).unsqueeze(0)
colors_tensor = torch.as_tensor([
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
    [255, 0, 255],
], dtype=torch.int).unsqueeze(0)
faces_tensor = torch.as_tensor([
    [0, 2, 3],
    [0, 3, 1],
    [0, 1, 2],
    [1, 3, 2],
], dtype=torch.int).unsqueeze(0)

with SummaryWriter() as w:
    w.add_mesh('my_mesh', vertices=vertices_tensor, colors=colors_tensor, faces=faces_tensor)
    # Log one entry per hyperparameter setting together with its metrics.
    for i in range(5):
        w.add_hparams({'lr': 0.1*i, 'bsize': i},
                      {'hparam/accuracy': 10*i, 'hparam/loss': 10*i})
```
Distributed
This release adds macOS support for torch.distributed with the Gloo backend. You can more easily switch from development (e.g. on macOS) to deployment (e.g. on Linux) without having to change a single line of code. The prebuilt binaries for macOS (stable and nightly) include support out of the box.
- `torch.distributed.all_reduce_coalesced`: Support all reduce of a list of same-device tensors (24949, 25470, 24876).
- `torch.distributed.all_reduce`: Add bitwise reduction ops (BAND, BOR, BXOR) (26824).
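A single-process sketch of a bitwise reduction with the Gloo backend, assuming a Linux or macOS build with torch.distributed available and using a temporary file for rendezvous. With one rank the result equals the input; in a real job each rank would contribute its own tensor and receive the bitwise OR of all of them.

```python
import os
import tempfile
import torch
import torch.distributed as dist

# A one-process "cluster": rank 0 of world size 1, rendezvous through a temp file.
init_file = os.path.join(tempfile.mkdtemp(), "pg_init")
dist.init_process_group("gloo", init_method=f"file://{init_file}", rank=0, world_size=1)

flags = torch.tensor([0b1010, 0b0110], dtype=torch.int64)
dist.all_reduce(flags, op=dist.ReduceOp.BOR)  # bitwise OR across all ranks
print(flags.tolist())  # [10, 6]: unchanged, since there is only one rank

dist.destroy_process_group()
```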
Libtorch Binaries with C++11 ABI
We now provide Libtorch binaries for building applications compatible with the C++11 ABI. The download links for libtorch binaries with the C++11 ABI can be found at https://pytorch.org/ under “QUICK START LOCALLY”.
New TorchScript features
- Add `not in` support for TorchScript (23637).
- You can now raise exceptions in one side of an if branch (23565).
- Add `torch.jit.is_scripting()` API (25955).
- Make assertions like `x is not None` unwrap the optional type of `x` (23949).
- Add dictionary augmented assignment (`+=`) support to TorchScript (23639).
- Support `grad` and `data` attributes for tensors in TorchScript (23842).
- Add `@ignore` for TorchScript classes (23614).
- Support nn.GRU in script (23266).
- Support tensor as a key type in TorchScript (23638).
- Add support for ModuleDict (25715).
- Bind `set_grad_enabled()` into TorchScript (25350).
- Add `in` membership checks for lists (25796).
- Add `tuple` keyword (25474).
- Add `__getitem__` to class types (25664).
- Add `__setitem__` to class types (25750).
- Make JIT dicts ordered, matching Python 3.6+ semantics (26465).
- Add invert bitwise operation to TorchScript (22324).
- Add `min()` and `max()` for lists to TorchScript (26351).
- Support iterables and ranges in list comprehensions (26768).
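A small sketch exercising several of these features in scripted functions. The function names are illustrative, and support for individual constructs may vary slightly across releases.

```python
from typing import Dict, List, Optional
import torch

@torch.jit.script
def summarize(xs: List[int], counts: Dict[str, int], bias: Optional[int]) -> int:
    if 3 not in xs:                            # `not in` membership on a list
        xs.append(3)
    counts["total"] += len(xs)                 # augmented assignment on a dict entry
    squares = [i * i for i in range(len(xs))]  # list comprehension over a range
    result = max(xs) - min(xs) + counts["total"] + squares[-1]  # min()/max() on lists
    if bias is not None:                       # refines Optional[int] to int
        result += bias
    return result

def where_am_i() -> str:
    if torch.jit.is_scripting():  # True only inside compiled TorchScript
        return "scripted"
    return "eager"

print(summarize([1, 5, 2], {"total": 0}, 10))        # 27
print(where_am_i(), torch.jit.script(where_am_i)())  # eager scripted
```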
Improvements
C++ Frontend Improvements
We are on our way to better API parity between our Python and C++ frontends. Specifically, we made the following improvements:
Autograd
- Tensor autograd APIs
  - torch::autograd::backward and torch::autograd::grad (24342).
  - torch::autograd::Variable::register_hook (24393).
- Add support for custom autograd functions in C++ API
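These C++ functions mirror existing Python APIs. As a reference for the expected behavior, a minimal Python sketch of torch.autograd.grad and Tensor.register_hook:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# torch.autograd.grad returns the gradients instead of accumulating them into .grad.
(g,) = torch.autograd.grad((x ** 2).sum(), x)
print(g)       # tensor([2., 4., 6.])
print(x.grad)  # None

# register_hook runs whenever a gradient for x is computed; a returned tensor replaces it.
handle = x.register_hook(lambda grad: grad * 10)
(x ** 2).sum().backward()
print(x.grad)    # tensor([20., 40., 60.])
handle.remove()  # detach the hook once it is no longer needed
```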
New torch::nn modules
- Containers
  - torch::nn::ModuleList (24317).
- Linear layers
  - torch::nn::Identity (26713).
- Convolution layers
  - torch::nn::Fold (24160).
- Pooling layers
- Loss functions
  - torch::nn::L1Loss (25902).
- Distance functions
New torch::nn::functional functions
- Pooling functions
  - torch::nn::functional::max_pool1d / max_pool2d / max_pool3d (26262).
  - torch::nn::functional::max_pool1d_with_indices / max_pool2d_with_indices / max_pool3d_with_indices (26521).
  - torch::nn::functional::avg_pool1d / avg_pool2d / avg_pool3d (26262).
  - torch::nn::functional::adaptive_max_pool1d / adaptive_max_pool2d / adaptive_max_pool3d (26755, 26772, 26775).
  - torch::nn::functional::adaptive_max_pool1d_with_indices / adaptive_max_pool2d_with_indices / adaptive_max_pool3d_with_indices (26755, 26772, 26775).
- Distance functions
tensor Construction API
- Add support for multidimensional inputs to torch::tensor (26210, 26890, 26756).
  - From now on, we can use torch::tensor({{1, 2}, {3, 4}}) in C++ to construct the same tensor as torch.tensor([[1, 2], [3, 4]]) in Python. Some caveats are noted in this comment.
- Add support for bool and BFloat16 dtypes to torch::tensor (23337).
Other C++ Improvements
- Add torch::nn::Module::unregister_module function, for unregistering a submodule from a torch::nn::Module (26088).
Distributed Improvements
- `torch.distributed`: Detect and handle NCCL errors appropriately instead of blocking peers until a timeout in `ProcessGroupNCCL` (25012, 25905).
- `torch.distributed`: Make scatter/gather arguments optional (25575).
- `torch.distributed.launch`: Add a -m flag to allow users to launch python modules (24910).
- `torch.distributed`: Add function to get NCCL version for logging (26583).
- `torch.distributed`: Add timeout parameter to connect function in TCPStore (26554).
- `torch.distributed`: Use timeout in connect function to guard against an infinite loop (26364).
- `torch.nn.modules.batchnorm`: Allow SyncBatchNorm to run without DDP in inference mode (24815).
Performance Improvements
- `torch.argmax/argmin`: Rewrite as TensorIterator reductions (26181).
- `torch.erfinv`: Vectorize unary operator (26629).
- `torch.sin/cos/tan`: Use intrinsics for trigonometric functions on CPU (26431).
- Fix possible deadlock in SharedCache inside a forked child proc (25158).
- `torch.qr`: Fix a regression (23591).
- `nn.Conv`: Use Caffe2's implementation of grouped depthwise 3x3 convolutions (26556).
- `nn.Conv`: Use parallel_for in DepthwiseConvKernel (26879).
- `nn.Conv`: Change shape for conv and unary ops (25477).
- Fix pin_memory_thread not exiting quickly (23646).
- Increase predefined_minimum_secs to reduce variation (23734).
- Enhance Tensor indexSelect performance (23055).
- Separate input shapes to reduce default execution time (24136).
- `constraints.lower_cholesky`: Vectorize LowerCholeskyTransform (24131).
- Speed up an integer to the power of a positive integer on CPU (26020).
- [ROCm] Enable jit fusion (22872).
- [ROCm] Use MIOpen for transpose convolutions (26172).
JIT Improvements
- Enable CPU fused kernel on Windows (25578).
- Expose an API to iterate all the registered operators (23207).
- Include recursive class compilations in error call stack (23454).
- Substantial improvements to saved model format speed and size.
  - Compress debug symbols when serializing TorchScript models (23659).
  - Compress all non-Tensor components of a serialized TorchScript model (23723).
  - Perform string uniquing by value in pickle serialization (23741).
  - Implement a bunch of pickle serialization features that optimize for size (23759).
  - Implement more size-oriented opcodes in the depickler (26454).
- Cache node operators to speed up optimization (24827).
- Allow forward hooks in tracing (23613).
- Add Pickler C++ API (23241).
- Open up AliasAnalysisKind for any ops (23810).
- Add the ability to compile exports on traced modules (24298).
- Make `NoneType` a subtype of `Optional[T]` (25361).
ONNX Exporter Improvements
In PyTorch 1.3, we have added support for exporting graphs with ONNX IR v4 semantics, and set it as default. We have achieved good initial coverage for ONNX Opset 11, which was released recently with ONNX 1.6. Further enhancement to Opset 11 coverage will follow in the next release. We have enabled export for about 20 new PyTorch operators. Also, we have focused on enabling the export for all models in torchvision. We have introduced some necessary groundwork for that in this release, e.g., accepting PyTorch models with inputs/outputs of Dict or String. We continue to work on torchvision models, such as FasterRCNN and MaskRCNN, to enable their export.
Adding Support for ONNX IR v4
- Provide an option to exclude the weights from model inputs (#23284)
- Make graph inputs without weights as default (#26146)
Adding Support for ONNX Opset 11
- Introduce ONNX Opset 11 support (#23739)
- Add export for torch.Interpolate in Opset 11 (#24805, #27179)
- Add export for tensor.gather, tensor.scatter and tensor.scatter_add in Opset 11 (#24790)
- Add export for tensor.clamp in Opset 11 (#25797)
- Add export for torch.topk and torch.sort in Opset 11 (#25739)
Exporting More Torch Operators/Models to ONNX
- Export torch.pixel_shuffle (#23739)
- Export torch.multinomial (#23581)
- Export torch.norm’s frobenius_norm (#23536)
- Export torch.std (#22310)
- Export torch.empty and torch.empty_like (#24166)
- Export torch.rsqrt (#24153)
- Export torch.log1p (#25808)
- Export torch.unique (#25050)
- Export torch.gelu (#24475)
- Export tensor.index_fill and tensor.index_copy (#23052)
- Export torch.round (#26126)
- Export torch.baddbmm (#25738)
- Export torch.remainder (#24410)
- Export torch.cumsum (#24476)
- Export tensor.size with negative axis (#26436)
- Export RNN/LSTM with h0/c0 initial state (#22813)
Enhancing ONNX Export Infra
- Enable exporting PyTorch models which have Dict and String as inputs and outputs (#25889)
- Systematically solve mismatched types caused by implicit type conversion for binary arithmetic operators by adding an ONNX type conversions pass (#24378)
- Correctly validate dynamic axes names. (#23974)
- Enable ONNX Runtime tests for Opset 10 and partially for Opset 11 (#22993)
Other Improvements
- Error checking: many operators now check the strides of the output tensor and raise an error if it contains internal overlap that would produce incorrect results (23063).
- `torch.det/logdet/slogdet`: Allow batching (22909).
- `torch.logical_not`: Add new operator (23839).
- `torch.logical_xor`: Add new operator (23847).
- `torch.symeig`: Improve the stability of gradient updates (23018).
- `torch.eye`: Enable for bool and half (24148).
- `torch.tril / triu`: Enable for bool and half (24163).
- `torch.logical_not/xor`: Support non-bool tensors (23916, 23978).
- `torch.index_select`: Implement indexing methods for sparse tensors (24937).
- `torch.lu_solve`: Enable broadcasting of batch dimensions (24333).
- `torch.cholesky`: Enable batches greater than 262140 (24438).
- `torch.det`: Simplify generation of singular matrices to avoid numerical issue on PowerPC (25773).
- `torch.erfinv`: In the CUDA implementation, use erfinv() for double to preserve accuracy (25337).
- `torch.erfinv`: Add a float version of erfinv on CPU (26070).
- `torch.cuda.stream`: Update autograd engine to respect streams set in forward (8354).
- `torch.backends.mkldnn.enabled`: Allow disabling MKLDNN at runtime (25459).
- `torch.cholesky_solve`: Add derivative (26185).
- `torch.cholesky_inverse`: Add derivative (26451).
- `torch.polygamma`: Ensure that n is non-negative (26294).
- `torch.pinverse`: Enable batching (26095).
- `torch.digamma/trigamma`: Fix type mismatches on CUDA (25791).
- `torch.where`: Enable for bool tensor on CUDA (26430).
- `torch.load`: Change default encoding to 'utf-8' (26421).
- `torch.repeat_interleave`: Respect the current stream (26946).
- `torch.bernoulli_`: Implement for bool tensors (25076).
- `torch.norm`: Fix nuclear norm with requires_grad=True (26303).
- `torch.hub.download_url_to_file`: Make function public (26723).
- `nn.modules.conv`: Add padding_mode to repr (23996).
- `nn.Transformer`: Extend to support BERT (gelu) (24181).
- `nn.BatchNorm2d`: Add support for non-affine batch norm with float stats and half inputs (22750).
- `nn.Parameter`: Fix type hints (25586).
- `nn.CTCLoss`: Improve error message (26325).
- `nn.Conv`: Allow batch size of 0 (26214).
- `nn.LSTM/GRU`: Enable double backward for non-cudnn (26660).
- `optim.Adagrad`: Add epsilon argument (24980).
- `optim.LBFGS`: Change default tolerance_grad to 1e-7 (25240).
- `optim.lr_scheduler.OneCycleLR`: Add new 1cycle learning rate scheduler (25324).
- `optimizer.step`: Fix type annotation (26930).
- `bfloat16`: Add support for sub, mul, and div on CPU (22851).
- `bfloat16`: Enable comparison ops on CPU (24182).
- `bfloat16`: Enable masked methods (24183).
- `bfloat16`: Enable torch.mm and torch.mv (24224).
- `bfloat16`: Enable log_softmax and CrossEntropyLoss (24457).
- `bfloat16`: Enable conv methods (26167).
- `bfloat16`: Enable dtype on CUDA (26407).
- `quasirandom.SobolEngine`: Use random seed if not specified (24884).
- `utils.data.dataloader`: Add possible out of shared memory error message (25730).
- `cuda.set_rng_state`: Add type hint (26200).
- Zero sized tensor support for repeat_interleave (23717).
- Recommend `~` and `bitwise_not()` when user tries to apply neg (`-`) on a bool tensor (23621).
- Fix double backward of inplace op on view (23502).
- `autograd.grad`: Validate shapes of outputs (25349).
- Enable libflame as a LAPACK choice (25795).
- Fix race condition in CUDA initialization (25788).
- Include `iteration_` in SGD optimizer serialization (26906).
- [C++] `torch::tensor`: Fix an ambiguous overload issue in constructor (26890).
- [XLA] Check device before accessing data_ptr in PackLayer (26056).
- [XLA] Allow overwriting catch-all kernels (25947).
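A minimal sketch of the new OneCycleLR scheduler, assuming SGD with momentum (momentum is cycled by default), max_lr=0.1 and a 10-step schedule. Under the default settings the learning rate warms up from max_lr/25 to max_lr and then anneals to a much smaller final value.

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
total_steps = 10
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=total_steps)

lrs = [optimizer.param_groups[0]["lr"]]
for _ in range(total_steps - 1):
    optimizer.step()  # a real loop would compute a loss and call backward() first
    scheduler.step()  # OneCycleLR is stepped once per batch, not once per epoch
    lrs.append(optimizer.param_groups[0]["lr"])

print([round(lr, 5) for lr in lrs])  # rises to 0.1 at step 2, then decays toward ~4e-7
```

Because the schedule length is fixed up front, stepping the scheduler more than total_steps times raises an error, so total_steps (or epochs and steps_per_epoch) should match the actual number of batches.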
Bug Fixes
TensorBoard Bug Fixes
- `SummaryWriter.add_graph`: Fix empty graph output in some cases (25599).
- Update Caffe2 contrib TensorBoard logging to not require TensorFlow (25259).
- `SummaryWriter.make_video`: Fix write_gif call to moviepy for newer lib (21218).
C++ API Bug fixes
- Fix mismatch of device and data type when computing `step_size` in LBFGS optimizer (25909).
JIT
- Fix list comprehension that change the type of the original iterable (24271).
- Fix double copying of constants during recursive scripting (24412).
- Fix frontend error message (23576).
- Clear recursive error stack on each compilation (23458).
- Fix bugs in assignment to optionals (25059).
- Make `torch.jit.Attribute` work when `PYTORCH_ENABLED=0` (23851).
- Fix unicode in comments causing compilation errors (24218).
- Correctly raise an error if an `nn.Module` has not been initialized but you try to script it (24852).
- Fix annotated assignment to variables (25094).
- dictPop: dereference dict.find() iterator before calling dict.erase() (25056).
- Fix closures which always throw (25278).
- Add source location to class instantiation error (24990).
- Fix `AliasAnalysisKind::PURE` on MSVC (25375).
- Emit script function calls during tracing (25089).
- Resolve `NamedTuple` types properly in Python (26443).
- Fix schema matching of tuples to vartype lists (25944).
- Correctly preserve ignored function return value type (25262).
- Fix missing newline in compiled from source range highlight (25802).
- Fix use-after-free bug in `optional` (25965).
- Fix torch.arange traced as constant (25363).
- Preserve module names in recursive script (24505).
- Properly resolve ignored module method type annotations (26683).
- Make `is_optional` check more robust (26312).
- Fix builtin lookup for Python functions (26688).
- Typevar matching fix + implicit conversions from Scalar to int/float (26453).
- Fix range for non-int inputs and pow implementation (26926).
Other Bug Fixes
- `torch.is_pinned`: pin_memory should not copy on already pinned tensors (23484).
- `torch.cdist`: Fix incorrect gradients on CUDA non-batch tensors (22915).
- `torch.from_numpy`: Fix failure on Windows for int32 (25139).
- `torch.tensor`: Fix memory leak creating a tensor from numpy (24267).
- `torch.index`: Don't save `self` in `index` backward (25594).
- `torch.bincount`: Fix int32 overflow on CUDA (25748).
- `torch.bernoulli`: Fix the distribution sampler (26864).
- `torch.pow`: Fix precision (25476).
- `torch.cdist`: Fix gradient computation when first arg is 1xn (26254).
- `torch.scatter_add_`: Fix scatter CPU kernel when (input size, src size) > index size (25839).
- `nn.ConvTranspose2d`: Fix an error with float16 inputs and weights on CUDA (23552).
- `nn.CTCLoss`: Fix zero-length targets on CUDA (23298).
- `nn.Conv2d`: Correct an overflow in an error message (25146).
- `optim.Adam`: Apply a small mathematical fix (23737).
- `dataloader`: Fix IndexError on shutdown if not all workers are started (23761).
- `Tensor.repeat`: Fix crash for 0 repeats (23766).
- `torch.pin_memory`: Only use one thread (25111).
- `distributions.Uniform,HalfCauchy,Gamma`: Fix `log_prob` when value is a float (23017).
- Fix typing error for Padding with asymmetric signatures (24895).
- Avoid race condition in `intrusive_ptr.reset_()` (24464).
- `torch.hub`: Fix SSL cert issue for hub in Python 2 (25042).
- Fix int overflow issue in CUDA kernels (24818).
- `Module.cuda`: Fix type hints (25018).
- Fix bug in assertNotEqual for int tensors (25412).
- Fix 'in' incorrectly returning true (24156).
- Fix bugs in bulk loader when `batch_size=None` or with namedtuple (26065).
- Fix serialization issue on big-endian architectures (26383).
- Fix `Vec256::abs()` for floating point when applied on -0.0 (26422).
- Fix cyclic reference in _LRScheduler (25776).
- Fix a build failure on s390x (26233).
- [XLA] Fix tensor construction from array (24283).
Documentation Updates
Distributed
- `torch.distributed`: Error phrasing in torch.distributed helper functions (25574).
- `torch.distributions.negative_binomial`: Clarify ambiguous doc string in NegativeBinomial (25923).
JIT
- Add technical documentation for the serialization format (23456).
- Fix trace docs (24191).
- Add `trace_module` to docs (24258).
- Clean up the distinction between `script` and `trace` (24208).
- Fix `item()` call in docs (25404).
- Misc doc updates / fixes (24371, 24445).
Other documentation improvements
- `torch.record_stream`: Add documentation (24078).
- `torch.fold`: Describe the relation between fold and unfold operations (24840).
- `torch.argmax`: Fix incorrect doc (23775).
- `torch.random`: Add docs (23553).
- `torch.empty_strided`: Add docs (23735).
- `torch.bitwise_not`: Document for bool tensors (23800).
- `torch.cdist`: Add documentation (25221).
- `torch.where`: Update parameter names in doc (25554).
- `torch.atan2`: Clarify and correct the doc (26180).
- `nn.functional.bilinear`: Add documentation (24951).
- `nn.functional.upsample`: Fix align_corners doc (23707).
- `nn.Transformer`: Fix an error in the example (24837).
- `optim.lr_scheduler.CosineAnnealingWarmRestarts`: Add documentation (25421).
- `optim.SGD`: Update with subscripts (23985).
- `optim.RMSprop`: Highlight in the doc that the square root comes before adding epsilon (26735).
- `autograd.detect_anomaly`: Add a warning (26615).
- Improve dataloader docs on when auto-batching is disabled (23671).
- Updated docs and added deprecation warnings to acknowledge a bool tensor (22261).
- Document benchmarking practice for CUDA (23910).
- Add ASAN instructions to CONTRIBUTING.md (24848).
PyTorch Release v1.3.0 - Mobile Support, Named Tensors, Quantization, Type Promotion
PyTorch is a widely used, open-source deep learning platform used for easily writing neural network layers in Python enabling seamless workflow from research to production. Based on Torch, PyTorch has become a powerful machine learning framework favored by esteemed researchers around the world.
Here is the newest PyTorch release v1.3.0 featuring new mobile support, named tensors, quantization, type promotion, and many more new features.
Table of Contents
- Breaking Changes
- Highlights
- [Experimental]: Mobile Support
- [Experimental]: Named Tensor Support
- [Experimental]: Quantization support
- Type Promotion
- Deprecations
- New Features
- TensorBoard: 3D Mesh and Hyperparameter Support
- Distributed
- Libtorch Binaries with C++11 ABI
- New TorchScript features
- Improvements
- C++ Frontend Improvements
- Autograd
- New torch::nn modules
- New torch::nn::functional functions
- tensor Construction API
- Other C++ Improvements
- Distributed Improvements
- Performance Improvements
- JIT Improvements
- ONNX Exporter Improvements
- Adding Support for ONNX IR v4
- Adding Support for ONNX Opset 11
- Exporting More Torch Operators/Models to ONNX
- Enhancing ONNX Export Infra
- Other Improvements
- C++ Frontend Improvements
- Bug Fixes
- TensorBoard Bug Fixes
- C++ API Bug fixes
- JIT
- Other Bug Fixes
- Documentation Updates
- Distributed
- JIT
- Other documentation improvements
Breaking Changes
Type Promotion: Mixed dtype operations may return a different dtype and value than in previous versions. (22273, 26981)
Previous versions of PyTorch supported a limited number of mixed dtype operations. These operations could result in loss of precision by, for example, truncating floating-point zero-dimensional tensors or Python numbers.
In Version 1.3, PyTorch supports NumPy-style type promotion (with slightly modified rules, see full documentation). These rules generally will retain precision and be less surprising to users.
Version 1.2 | Version 1.3 |
---|---|
>>> torch.tensor(1) + 2.5 tensor(3) >>> torch.tensor([1]) + torch.tensor(2.5) tensor([3]) >>> torch.tensor(**True**) + 5 tensor(True) | >>> torch.tensor(1) + 2.5 tensor(3.5000) >>> torch.tensor([1]) + torch.tensor(2.5) tensor([3.5000]) >>> torch.tensor(True) + 5 tensor(6) |
Type Promotion: in-place operations whose result_type is a lower dtype category (bool < integer < floating-point) than the in-place operand now throw an Error. (22273, 26981)
Version 1.2 | Version 1.3 |
---|---|
>>> int_tensor = torch.tensor(1) >>> int_tensor.add_(1.5) tensor(2) >>> bool_tensor = torch.tensor(True) >>> bool_tensor.add_(5) tensor(True) | >>> int_tensor = torch.tensor(1) >>> int_tensor.add_(1.5) RuntimeError: result type Float cannot be cast to the desired output type Long >>> bool_tensor = torch.tensor(True) >>> bool_tensor.add_(5) RuntimeError: result type Long cannot be cast to the desired output type Bool |
These rules can be checked at runtime via torch.can_cast.
torch.flatten
: 0-dimensional inputs now return a 1-dim tensor. (25406).
Version 1.2 | Version 1.3 |
---|---|
>>> torch.flatten(torch.tensor(0)) tensor(0) | >>> torch.flatten(torch.tensor(0)) tensor([0]) |
nn.functional.affine_grid
: when align_corners = True
, changed the behavior of 2D affine transforms on 1D data and 3D affine transforms on 2D data (i.e., when one of the spatial dimensions has unit size).
Previously, all grid points along a unit dimension were considered arbitrarily to be at -1, now they are considered to be at 0 (the center of the input image).
torch.gels:
removed deprecated operator, use torch.lstsq
instead. (26480).
utils.data.DataLoader:
made a number of Iterator attributes private (e.g. num_workers
, pin_memory
). (22273)
[C++] Variable::backward no longer implicitly creates a gradient for Variables with more than one element. Previously, a gradient tensor of all 1s was implicitly created. The new behavior matches the Python API (26150).
auto x = torch::randn({5, 5}, torch::requires_grad());
auto y = x * x;
y.backward();  // ERROR: "grad can be implicitly created only for scalar outputs"
// Instead, pass the gradient explicitly (as in Python):
y.backward(torch::ones_like(y));
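For comparison, here is a minimal Python sketch of the same situation. Calling backward() on a non-scalar output raises an error. Passing an explicit gradient of ones works; this is the gradient the old C++ behavior created implicitly. The 5x5 shape simply mirrors the C++ snippet.

```python
import torch

x = torch.randn(5, 5, requires_grad=True)
y = x * x

error_message = ""
try:
    y.backward()  # y is not a scalar, so no gradient is created implicitly
except RuntimeError as err:
    error_message = str(err)
print(error_message)

# Pass the gradient explicitly; d(x * x)/dx = 2x accumulates into x.grad
y.backward(torch.ones_like(y))
print(torch.allclose(x.grad, 2 * x.detach()))
```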
[C++] All option specifiers (e.g. GRUOptions::bidirectional_) are now private; use the function variants (GRUOptions::bidirectional(...)) instead (26419).
Highlights
[Experimental]: Mobile Support
In PyTorch 1.3, we are launching experimental support for mobile. Now you can run any TorchScript model directly without any conversion. Here is the full list of features in this release:
- Support for full TorchScript inference on mobile;
- Prebuilt LibTorch libraries for Android/iOS on JCenter/CocoaPods;
- Java wrapper for Android with functionality to cover common inference cases (loading and invoking the model);
- Support for all forward ops on mobile CPU (backward ops are not supported yet);
- Some optimized fp32 operator implementations for ARM CPUs (based on Caffe2Go);
- Some optimized int8 operator implementations for ARM CPUs (based on QNNPACK);
We decided not to create a new framework for mobile so that you can use the same APIs you are already familiar with to run the same TorchScript models on Android/iOS devices without any format conversion. This way you can have the shortest path from research ideas to production-ready mobile apps.
The tutorials, demo apps, and download links for prebuilt libraries can be found at https://pytorch.org/mobile/
This is an experimental release. We are working on other features like customized builds to make PyTorch smaller, faster, and better for your specific use cases. Stay tuned and give us your feedback!
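Here is a minimal sketch of the desktop side of this workflow. It assumes a hypothetical two-layer model called TinyNet. The sketch scripts the model, saves it, and confirms that the saved file reproduces eager-mode outputs. The file name tiny_net.pt is arbitrary. Loading the file on Android/iOS uses the prebuilt libraries from the mobile page linked above and is not shown here.

```python
import os
import tempfile

import torch


class TinyNet(torch.nn.Module):
    """Hypothetical example model; any scriptable nn.Module works the same way."""

    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, x):
        return torch.relu(self.fc(x))


model = TinyNet().eval()
scripted = torch.jit.script(model)  # compile to TorchScript, no format conversion

# Save the TorchScript file; this is the artifact a mobile app would load
path = os.path.join(tempfile.mkdtemp(), "tiny_net.pt")
scripted.save(path)

# Sanity check on desktop: the reloaded module matches eager mode
reloaded = torch.jit.load(path)
example = torch.randn(3, 4)
with torch.no_grad():
    print(torch.allclose(model(example), reloaded(example)))
```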
[Experimental]: Named Tensor Support
Named Tensors aim to make tensors easier to use by allowing users to associate explicit names with tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check that APIs are being used correctly at runtime, providing extra safety. Names can also be used to rearrange dimensions, for example, to support "broadcasting by name" rather than "broadcasting by position".
Create a named tensor by passing a names argument into most tensor factory functions.
>>> tensor = torch.zeros(2, 3, names=('C', 'N'))
>>> tensor
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))
Named tensors propagate names across operations.
>>> tensor.abs()
tensor([[0., 0., 0.],
        [0., 0., 0.]], names=('C', 'N'))
Rearrange to the desired ordering by using align_to.
>>> tensor = tensor.align_to('N', 'C', 'H', 'W')
>>> tensor.names, tensor.shape
(('N', 'C', 'H', 'W'), torch.Size([3, 2, 1, 1]))
And more! Please see our documentation on named tensors.
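Here are a few more operations, as a sketch against the experimental API. Names, shapes, and values are illustrative only. Newer releases may print a warning that named tensors are experimental. The sketch reduces over a dimension by name, catches a name mismatch, fixes the mismatch with align_as, and names an existing tensor with refine_names.

```python
import torch

# A batch of images with named dimensions
imgs = torch.randn(2, 3, 4, 5, names=("N", "C", "H", "W"))

# Reduce over a dimension by name instead of by position
gray = imgs.sum("C")
print(gray.names)

# Names are checked when tensors are combined
a = torch.zeros(2, 2, names=("N", "C"))
b = torch.zeros(2, 2, names=("C", "N"))
mismatch_raised = False
try:
    a + b  # same positions, different names
except RuntimeError as err:
    mismatch_raised = True
    print("RuntimeError:", err)

# align_as permutes b so its dimension names line up with a's
c = a + b.align_as(a)
print(c.names)

# refine_names attaches names to an unnamed tensor
plain = torch.ones(3, 2).refine_names("N", "C")
print(plain.names)
```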
[Experimental]: Quantization support
PyTorch now supports quantization from the ground up, starting with support for quantized tensors. Convert a float tensor to a quantized tensor and back by:
import torch
x = torch.rand(10, 1, dtype=torch.float32)
# xq is a quantized tensor with data represented as quint8
xq = torch.quantize_per_tensor(x, scale=0.5, zero_point=8, dtype=torch.quint8)
# convert back to floating point
xdq = xq.dequantize()
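Here is a minimal sketch that makes the affine mapping concrete. It compares torch.quantize_per_tensor against the formula q = clamp(round(x / scale) + zero_point, 0, 255), computed by hand. It also computes dequantization as (q - zero_point) * scale. The input values are arbitrary and were chosen so that some fall outside the quint8 range.

```python
import torch

scale, zero_point = 0.5, 8
x = torch.tensor([-10.0, -1.0, 0.0, 0.3, 1.2, 200.0])

xq = torch.quantize_per_tensor(x, scale=scale, zero_point=zero_point, dtype=torch.quint8)

# Affine quantization by hand: q = clamp(round(x / scale) + zero_point, 0, 255)
q_manual = torch.clamp(torch.round(x / scale) + zero_point, 0, 255).to(torch.uint8)
print(xq.int_repr())  # the stored uint8 values
print(torch.equal(xq.int_repr(), q_manual))

# Dequantization: x_hat = (q - zero_point) * scale
x_hat = (q_manual.float() - zero_point) * scale
print(torch.allclose(xq.dequantize(), x_hat))
```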
We also support 8-bit quantized implementations of most common operators in CNNs, including:
- Tensor operations:
  - view, clone, resize, slice
  - add, multiply, cat, mean, max, sort, topk
- Modules/Functionals (in torch.nn.quantized)
  - Conv2d
  - Linear
  - AvgPool2d, AdaptiveAvgPool2d, MaxPool2d, AdaptiveMaxPool2d
  - Interpolate
  - Upsample
- Fused operations for preserving better accuracy (in torch.nn.intrinsic)
  - ConvReLU2d, ConvBnReLU2d, ConvBn2d
  - LinearReLU
  - add_relu
We also support dynamic quantized operators, which take in floating point activations but use quantized weights (in torch.nn.quantized.dynamic).
- LSTM
- Linear
Quantization also requires methods that collect statistics from tensors and calculate quantization parameters (implementing the torch.quantization.Observer interface). We support several such methods:
- MinMaxObserver
- MovingAverageMinMaxObserver
- PerChannelMinMaxObserver
- MovingAveragePerChannelMinMaxObserver
- HistogramObserver
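Here is a sketch of how an observer produces quantization parameters. It assumes the default MinMaxObserver settings in your installed release; the exact scale and zero-point formula may differ slightly between versions. The sketch records the min and max of some calibration tensors, derives scale and zero_point with calculate_qparams, and then uses them to quantize new data.

```python
import torch
from torch.quantization import MinMaxObserver

observer = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)

# Feed calibration data through the observer; it tracks the running min and max
observer(torch.tensor([-1.0, 0.5]))
observer(torch.tensor([3.0, 0.0]))

scale, zero_point = observer.calculate_qparams()
print(scale, zero_point)

# Quantize new data with the derived parameters and measure the round-trip error
x = torch.tensor([-1.0, 0.0, 1.5, 3.0])
xq = torch.quantize_per_tensor(x, float(scale), int(zero_point), torch.quint8)
max_error = (xq.dequantize() - x).abs().max().item()
print(max_error <= float(scale) / 2 + 1e-6)
```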
For quantization aware training, we support fake-quantization operators and modules to mimic quantization during training:
- torch.fake_quantize_per_tensor_affine, torch.fake_quantize_per_channel_affine
- torch.quantization.FakeQuantize
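Here is a minimal sketch of fake quantization. It reuses the arbitrary scale 0.5 and zero_point 8 from the quantization snippet above, with the quint8 range [0, 255]. The forward pass rounds and clamps like real quantization but returns a float tensor. The backward pass lets gradients through unchanged except where values were clamped, which is a straight-through estimator.

```python
import torch

x = torch.tensor([-10.0, -1.0, 0.0, 0.3, 1.2, 200.0], requires_grad=True)

# Arguments: input, scale, zero_point, quant_min, quant_max
y = torch.fake_quantize_per_tensor_affine(x, 0.5, 8, 0, 255)
print(y)

# Same values as a real quantize/dequantize round trip with these parameters
ref = torch.quantize_per_tensor(x.detach(), 0.5, 8, torch.quint8).dequantize()
print(torch.allclose(y.detach(), ref))

# Straight-through gradient: 1 inside the representable range, 0 where clamped
y.sum().backward()
print(x.grad)
```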
In addition, we support workflows in torch.quantization for:
- post-training dynamic quantization
- static post-training quantization
- quantization aware training
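Here is a minimal sketch of post-training dynamic quantization. It assumes a hypothetical small Sequential model and a build that includes a quantized CPU backend (such as FBGEMM on x86). quantize_dynamic replaces each nn.Linear with a dynamically quantized version that uses int8 weights. Activations stay in floating point.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)).eval()

# Weights of every nn.Linear are stored as int8; activations are
# quantized on the fly at inference time
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(2, 16)
with torch.no_grad():
    out_fp32 = model(x)
    out_int8 = qmodel(x)

print(type(qmodel[0]))                   # a dynamic quantized Linear
print((out_fp32 - out_int8).abs().max())  # small quantization error
```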
All quantized operators are compatible with TorchScript.
For more details, see the documentation at: https://pytorch.org/docs/master/quantization.html
Type Promotion
Arithmetic and comparison operations may now perform mixed-type operations that promote to a common dtype.
The example below was not allowed in version 1.2. In version 1.3, the same code returns a tensor with dtype=torch.float32.
>>> torch.tensor([1], dtype=torch.int) + torch.tensor([1], dtype=torch.float32)
See the full documentation for more details.
- torch.result_type: Provide a function to determine the result of mixed-type operations (26012).
- torch.can_cast: Expose casting rules for type promotion (26805).
- torch.promote_types: Expose promotion logic (26655).
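Here is a short sketch of these three functions. The dtypes were chosen for illustration. Python float scalars promote to the default dtype, which is float32 unless changed with torch.set_default_dtype.

```python
import torch

int_tensor = torch.tensor([1], dtype=torch.int32)
float_tensor = torch.tensor([1], dtype=torch.float32)

# The mixed-type addition from the example above: integer promotes to float
print((int_tensor + float_tensor).dtype)

# result_type reports the same dtype without running the operation
print(torch.result_type(int_tensor, float_tensor))

# A Python float promotes an integer tensor to the default float dtype
print(torch.result_type(int_tensor, 2.5))

# A zero-dim tensor of the same category does not widen a dimensioned tensor
print(torch.result_type(int_tensor, torch.tensor(5, dtype=torch.int64)))

# promote_types works purely on dtypes
print(torch.promote_types(torch.uint8, torch.int8))

# can_cast exposes the casting rules used for outputs and in-place ops
print(torch.can_cast(torch.float64, torch.float32))  # same category: allowed
print(torch.can_cast(torch.float32, torch.int32))    # lower category: not allowed
```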
Deprecations
nn.functional.affine_grid / nn.functional.grid_sample: using the default value of align_corners is now deprecated, because the default will change in the 1.4 release.
The align_corners parameter was added in this release; the behavior in the previous release was equivalent to setting the parameter to True. This is also the current default value, but it will change to False in the 1.4 release. Note that using the default will trigger a warning as demonstrated below; set the value explicitly to remove the warning.
>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3), (1, 3, 2, 2))
UserWarning: Default grid_sample and affine_grid behavior will be changed to align_corners=False from 1.4.0. See the documentation of grid_sample for details.
...
>>> torch.nn.functional.affine_grid(torch.randn(1, 2, 3), (1, 3, 2, 2), align_corners=True)
# NO WARNING!
...
[C++] Deprecate torch::Tensor::data<T>() in favor of torch::Tensor::data_ptr<T>() (24847, 24886).
New Features
TensorBoard: 3D Mesh and Hyperparameter Support
torch.utils.tensorboard now supports logging 3D meshes and points, as well as hyperparameters. More details can be found in the documentation for SummaryWriter's add_mesh and add_hparams methods.
A simple example of exercising both methods:
import torch
from torch.utils.tensorboard import SummaryWriter
vertices_tensor = torch.as_tensor([
    [1, 1, 1],
    [-1, -1, 1],
    [1, -1, -1],
    [-1, 1, -1],
], dtype=torch.float).unsqueeze(0)
colors_tensor = torch.as_tensor([
    [255, 0, 0],
    [0, 255, 0],
    [0, 0, 255],
    [255, 0, 255],
], dtype=torch.int).unsqueeze(0)
faces_tensor = torch.as_tensor([
    [0, 2, 3],
    [0, 3, 1],
    [0, 1, 2],
    [1, 3, 2],
], dtype=torch.int).unsqueeze(0)
with SummaryWriter() as w:
    # log a mesh (batch of one) with per-vertex colors and triangular faces
    w.add_mesh('my_mesh', vertices=vertices_tensor, colors=colors_tensor, faces=faces_tensor)
    # log one hyperparameter set and its metrics per iteration
    for i in range(5):
        w.add_hparams({'lr': 0.1*i, 'bsize': i},
                      {'hparam/accuracy': 10*i, 'hparam/loss': 10*i})
Distributed
This release adds macOS support for torch.distributed
with the Gloo backend. You can more easily switch from development (e.g. on macOS) to deployment (e.g. on Linux) without having to change a single line of code. The prebuilt binaries for macOS (stable and nightly) include support out of the box.
- torch.distributed.all_reduce_coalesced: Support all-reduce of a list of same-device tensors (24949, 25470, 24876).
- torch.distributed.all_reduce: Add bitwise reduction ops (BAND, BOR, BXOR) (26824).
Libtorch Binaries with C++11 ABI
We now provide Libtorch binaries for building applications compatible with the C++11 ABI. The download links for libtorch binaries with C++11 ABI can be found on https://pytorch.org/ under “QUICK START LOCALLY”.
New TorchScript features
- Add not in support for TorchScript (23637).
- You can now raise exceptions in one side of an if branch (23565).
- Add torch.jit.is_scripting() API (25955).
- Make assertions like x is not None unwrap the optional type of x (23949).
- Add dictionary augmented assignment (+=) support to TorchScript (23639).
- Support grad and data attributes for tensors in TorchScript (23842).
- Add @ignore for TorchScript classes (23614).
- Support nn.GRU in script (23266).
- Support tensor as a key type in TorchScript (23638).
- Add support for ModuleDict (25715).
- Bind set_grad_enabled() into TorchScript (25350).
- Add in membership checks for lists (25796).
- Add tuple keyword (25474).
- Add __getitem__ to class types (25664).
- Add __setitem__ to class types (25750).
- Make JIT dicts ordered, matching Python 3.6+ semantics (26465).
- Added invert bitwise operation to TorchScript (22324).
- Add min() and max() for lists to TorchScript (26351).
- Support iterables and ranges in list comprehensions (26768).
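Here is a minimal sketch that exercises several of these additions in scripted functions. The function names word_stats and running_in_script are hypothetical. The sketch uses not in, dictionary +=, an assert that refines an Optional, max() on a list, list comprehensions over a list and over a range, and torch.jit.is_scripting().

```python
from typing import Dict, List, Optional, Tuple

import torch


@torch.jit.script
def word_stats(words: List[str], n: Optional[int]) -> Tuple[Dict[str, int], int, List[int]]:
    # `assert n is not None` refines Optional[int] to int below this line
    assert n is not None
    counts: Dict[str, int] = {}
    for w in words:
        if w not in counts:  # `not in` membership check
            counts[w] = 0
        counts[w] += 1  # dictionary augmented assignment
    # max() on a list, built with a list comprehension over a list
    longest = max([len(w) for w in words])
    # list comprehension over a range
    squares = [i * i for i in range(n)]
    return counts, longest, squares


@torch.jit.script
def running_in_script() -> bool:
    return torch.jit.is_scripting()


print(word_stats(["a", "bb", "a"], 3))
print(running_in_script(), torch.jit.is_scripting())
```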
Improvements
C++ Frontend Improvements
We are on our way to better API parity between our Python and C++ frontends. Specifically, we made the following improvements:
Autograd
- Tensor autograd APIs
  - Add support for custom autograd functions in C++ API
  - torch::autograd::backward and torch::autograd::grad (24342)
  - torch::autograd::Variable::register_hook (24393).
New torch::nn modules
- Containers
  - torch::nn::ModuleList (24317).
- Linear layers
  - torch::nn::Identity (26713).
- Convolution layers
  - torch::nn::Fold (24160).
- Pooling layers
- Loss functions
  - torch::nn::L1Loss (25902).
- Distance functions
New torch::nn::functional functions
- Pooling functions
  - torch::nn::functional::max_pool1d / max_pool2d / max_pool3d (26262).
  - torch::nn::functional::max_pool1d_with_indices / max_pool2d_with_indices / max_pool3d_with_indices (26521).
  - torch::nn::functional::avg_pool1d / avg_pool2d / avg_pool3d (26262).
  - torch::nn::functional::adaptive_max_pool1d / adaptive_max_pool2d / adaptive_max_pool3d (26755, 26772, 26775).
  - torch::nn::functional::adaptive_max_pool1d_with_indices / adaptive_max_pool2d_with_indices / adaptive_max_pool3d_with_indices (26755, 26772, 26775).
- Distance functions
tensor Construction API
- Add support for multidimensional inputs to torch::tensor (26210, 26890, 26756).
  - From now on, we can use torch::tensor({{1, 2}, {3, 4}}) in C++ to construct the same tensor as torch.tensor([[1, 2], [3, 4]]) in Python. Some caveats are noted in this comment.
- Add support for bool and BFloat16 dtypes to torch::tensor (23337).
Other C++ Improvements
- Add torch::nn::Module::unregister_module function, for unregistering a submodule from a torch::nn::Module (26088).
Distributed Improvements
- torch.distributed: Detect and handle NCCL errors appropriately in ProcessGroupNCCL instead of blocking peers until a timeout (25012, 25905).
- torch.distributed: Make scatter/gather arguments optional (25575).
- torch.distributed.launch: Add a -m flag to allow users to launch Python modules (24910).
- torch.distributed: Add function to get NCCL version for logging (26583).
- torch.distributed: Add timeout parameter to connect function in TCPStore (26554).
- torch.distributed: Use timeout in connect function to prevent an infinite loop (26364).
- torch.nn.modules.batchnorm: Allow SyncBatchNorm to run without DDP in inference mode (24815).
Performance Improvements
- torch.argmax/argmin: Rewrite as TensorIterator reductions (26181).
- torch.erfinv: Vectorize unary operator (26629).
- torch.sin/cos/tan: Use intrinsics for trigonometric functions on CPU (26431).
- Fix possible deadlock in SharedCache inside a forked child process (25158).
- torch.qr: Fix a regression (23591).
- nn.Conv: Use Caffe2's implementation of grouped depthwise 3x3 convolutions (26556).
- nn.Conv: Use parallel_for in DepthwiseConvKernel (26879).
- nn.Conv: Change shape for conv and unary ops (25477).
- Fix pin_memory_thread not exiting quickly (23646).
- Increase predefined_minimum_secs to reduce variation (23734).
- Enhance Tensor indexSelect performance (23055).
- Separate input shapes to reduce default execution time (24136).
- constraints.lower_cholesky: Vectorize LowerCholeskyTransform (24131).
- Speed up an integer to the power of a positive integer on CPU (26020).
- [ROCm] Enable jit fusion (22872).
- [ROCm] Use MIOpen for transpose convolutions (26172).
JIT Improvements
- Enable CPU fused kernel on Windows (25578).
- Expose an API to iterate all the registered operators (23207).
- Include recursive class compilations in error call stack (23454).
- Substantial improvements to saved model format speed and size.
  - Compress debug symbols when serializing TorchScript models (23659).
  - Compress all non-Tensor components of a serialized TorchScript model (23723).
  - Perform string uniquing by value in pickle serialization (23741).
  - Implement a bunch of pickle serialization features that optimize for size (23759).
  - Implement more size-oriented opcodes in the depickler (26454).
- Cache node operators to speed up optimization (24827).
- Allow forward hooks in tracing (23613).
- Add Pickler C++ API (23241).
- Open up AliasAnalysisKind for any ops (23810).
- Add the ability to compile exports on traced modules (24298).
- Make NoneType a subtype of Optional[T] (25361).
ONNX Exporter Improvements
In PyTorch 1.3, we have added support for exporting graphs with ONNX IR v4 semantics, and set it as default. We have achieved good initial coverage for ONNX Opset 11, which was released recently with ONNX 1.6. Further enhancement to Opset 11 coverage will follow in the next release. We have enabled export for about 20 new PyTorch operators. Also, we have focused on enabling the export for all models in torchvision. We have introduced some necessary groundwork for that in this release, e.g., accepting PyTorch models with inputs/outputs of Dict or String. We continue to work on torchvision models, such as FasterRCNN and MaskRCNN, to enable their export.
Adding Support for ONNX IR v4
- Provide an option to exclude the weights from model inputs (#23284)
- Make graph inputs without weights as default (#26146)
Adding Support for ONNX Opset 11
- Introduce ONNX Opset 11 support (#23739)
- Add export for torch.Interpolate in Opset 11 (#24805, #27179)
- Add export for tensor.gather, tensor.scatter and tensor.scatter_add in Opset 11 (#24790)
- Add export for tensor.clamp in Opset 11 (#25797)
- Add export for torch.topk and torch.sort in Opset 11 (#25739)
Exporting More Torch Operators/Models to ONNX
- Export torch.pixel_shuffle (#23739)
- Export torch.multinomial (#23581)
- Export torch.norm’s frobenius_norm (#23536)
- Export torch.std (#22310)
- Export torch.empty and torch.empty_like (#24166)
- Export torch.rsqrt (#24153)
- Export torch.log1p (#25808)
- Export torch.unique (#25050)
- Export torch.gelu (#24475)
- Export tensor.index_fill and tensor.index_copy (#23052)
- Export torch.round (#26126)
- Export torch.baddbmm (#25738)
- Export torch.remainder (#24410)
- Export torch.cumsum (#24476)
- Export tensor.size with negative axis (#26436)
- Export RNN/LSTM with h0/c0 initial state (#22813)
Enhancing ONNX Export Infra
- Enable exporting PyTorch models which have Dict and String as inputs and outputs (#25889)
- Systematically resolve mismatched types caused by implicit type conversion for binary arithmetic operators by adding an ONNX type conversions pass (#24378)
- Correctly validate dynamic axes names. (#23974)
- Enable ONNX Runtime tests for Opset 10 and partially for Opset 11 (#22993)
Other Improvements
- Error checking: many operators now check the strides of the output tensor and raise an error if it contains internal overlap that would produce incorrect results (23063).
- torch.det/logdet/slogdet: Allow batching (22909).
- torch.logical_not: Add new operator (23839).
- torch.logical_xor: Add new operator (23847).
- torch.symeig: Improve the stability of gradient updates (23018).
- torch.eye: Enable for bool and half (24148).
- torch.tril / triu: Enable for bool and half (24163).
- torch.logical_not/xor: Support non-bool tensors (23916, 23978).
- torch.index_select: Implement indexing methods for sparse tensors (24937).
- torch.lu_solve: Enable broadcasting of batch dimensions (24333).
- torch.cholesky: Enable batches greater than 262140 (24438).
- torch.det: Simplify generation of singular matrices to avoid numerical issue on PowerPC (25773).
- torch.erfinv: In the CUDA implementation, use erfinv() for double to preserve accuracy (25337).
- torch.erfinv: Add a float version of erfinv on CPU (26070).
- torch.cuda.stream: Update the autograd engine to respect streams set in forward (8354).
- torch.backends.mkldnn.enabled: Allow disabling MKLDNN at runtime (25459).
- torch.cholesky_solve: Add derivative (26185).
- torch.cholesky_inverse: Add derivative (26451).
- torch.polygamma: Ensure that n is non-negative (26294).
- torch.pinverse: Enable batching (26095).
- torch.digamma/trigamma: Fix type mismatches on CUDA (25791).
- torch.where: Enable for bool tensor on CUDA (26430).
- torch.load: Change the default encoding to 'utf-8' (26421).
- torch.repeat_interleave: Respect the current stream (26946).
- torch.bernoulli_: Implement for bool tensors (25076).
- torch.norm: Fix nuclear norm with requires_grad=True (26303).
- torch.hub.download_url_to_file: Make function public (26723).
- nn.modules.conv: Add padding_mode to repr (23996).
- nn.Transformer: Extend to support BERT (gelu) (24181).
- nn.BatchNorm2d: Add support for non-affine batch norm with float stats and half inputs (22750).
- nn.Parameter: Fix type hints (25586).
- nn.CTCLoss: Improve error message (26325).
- nn.Conv: Allow batch size of 0 (26214).
- nn.LSTM/GRU: Enable double backward for non-cuDNN (26660).
- optim.Adagrad: Add epsilon argument (24980).
- optim.LBFGS: Change default tolerance_grad to 1e-7 (25240).
- optim.lr_scheduler.OneCycleLR: Add new 1cycle learning rate scheduler (25324).
- optimizer.step: Fix type annotation (26930).
- bfloat16: Add support for sub, mul, and div on CPU (22851).
- bfloat16: Enable comparison ops on CPU (24182).
- bfloat16: Enable masked methods (24183).
- bfloat16: Enable torch.mm and torch.mv (24224).
- bfloat16: Enable log_softmax and CrossEntropyLoss (24457).
- bfloat16: Enable conv methods (26167).
- bfloat16: Enable dtype on CUDA (26407).
- quasirandom.SobolEngine: Use random seed if not specified (24884).
- utils.data.dataloader: Add possible out of shared memory error message (25730).
- cuda.set_rng_state: Add type hint (26200).
- Zero-sized tensor support for repeat_interleave (23717).
- Recommend ~ and bitwise_not() when user tries to apply neg (-) on a bool tensor (23621).
- Fix double backward of inplace op on view (23502).
- autograd.grad: Validate shapes of outputs (25349).
- Enable libflame as a LAPACK choice (25795).
- Fix race condition in CUDA initialization (25788).
- Include iteration_ in SGD optimizer serialization (26906).
- [C++] torch::tensor: Fix an ambiguous overload issue in constructor (26890).
- [XLA] Check device before accessing data_ptr in PackLayer (26056).
- [XLA] Allow overwriting catch-all kernels (25947).
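To illustrate one item from this list, here is a minimal sketch of the new OneCycleLR scheduler with its default settings. The defaults are pct_start=0.3, div_factor=25, final_div_factor=1e4, and cosine annealing. A dummy parameter and a 10-step schedule stand in for a real training loop. The learning rate starts at max_lr / div_factor and rises to max_lr over the first 30% of steps. It then anneals down to that starting value divided by final_div_factor. OneCycleLR also cycles momentum by default, so the optimizer must be given a momentum setting.

```python
import torch

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1, momentum=0.9)

total_steps = 10
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=total_steps)

lrs = [optimizer.param_groups[0]["lr"]]
for _ in range(total_steps - 1):
    optimizer.step()  # a real loop would compute a loss and call backward() first
    scheduler.step()  # advance the schedule once per batch
    lrs.append(optimizer.param_groups[0]["lr"])

print([round(lr, 7) for lr in lrs])
```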
Bug Fixes
TensorBoard Bug Fixes
- SummaryWriter.add_graph: Fix empty graph output in some cases (25599).
- Update Caffe2 contrib TensorBoard logging to not require TensorFlow (25259).
- SummaryWriter.make_video: Fix write_gif call to moviepy for newer lib (21218).
C++ API Bug fixes
- Fix mismatch of device and data type when computing step_size in LBFGS optimizer (25909).
JIT
- Fix list comprehensions that change the type of the original iterable (24271).
- Fix double copying of constants during recursive scripting (24412).
- Fix frontend error message (23576).
- Clear recursive error stack on each compilation (23458).
- Fix bugs in assignment to optionals (25059).
- Make torch.jit.Attribute work when PYTORCH_ENABLED=0 (23851).
- Fix unicode in comments causing compilation errors (24218).
- Correctly raise an error if an nn.Module has not been initialized but you try to script it (24852).
- Fix annotated assignment to variables (25094).
- dictPop: dereference dict.find() iterator before calling dict.erase() (25056).
- Fix closures that always throw (25278).
- Add source location to class instantiation error (24990).
- Fix AliasAnalysisKind::PURE on MSVC (25375).
- Emit script function calls during tracing (25089).
- Resolve NamedTuple types properly in Python (26443).
- Fix schema matching of tuples to vartype lists (25944).
- Correctly preserve ignored function return value type (25262).
- Fix missing newline in compiled from source range highlight (25802).
- Fix use-after-free bug in optional (25965).
- Fix torch.arange traced as constant (25363).
- Preserve module names in recursive script (24505).
- Properly resolve ignored module method type annotations (26683).
- Make is_optional check more robust (26312).
- Fix builtin lookup for Python functions (26688).
- Typevar matching fix + implicit conversions from Scalar to int/float (26453).
- Fix range for non-int inputs and pow implementation (26926).
Other Bug Fixes
- torch.is_pinned: pin_memory should not copy on already pinned tensors (23484).
- torch.cdist: Fix incorrect gradients on CUDA non-batch tensors (22915).
- torch.from_numpy: Fix failure on Windows for int32 (25139).
- torch.tensor: Fix memory leak when creating a tensor from numpy (24267).
- torch.index: Don't save self in index backward (25594).
- torch.bincount: Fix int32 overflow on CUDA (25748).
- torch.bernoulli: Fix the distribution sampler (26864).
- torch.pow: Fix precision (25476).
- torch.cdist: Fix gradient computation when first arg is 1xn (26254).
- torch.scatter_add_: Fix scatter CPU kernel when (input size, src size) > index size (25839).
- nn.ConvTranspose2d: Fixed an error with float16 inputs and weights on CUDA (23552).
- nn.CTCLoss: Fix zero-length targets on CUDA (23298).
- nn.Conv2d: Correct an overflow in an error message (25146).
- optim.Adam: Apply a small mathematical fix (23737).
- dataloader: Fix IndexError on shutdown if not all workers are started (23761).
- Tensor.repeat: Fix crash for 0 repeats (23766).
- torch.pin_memory: Only use one thread (25111).
- distributions.Uniform, HalfCauchy, Gamma: Fix log_prob when value is a float (23017).
- Fix typing error for Padding with asymmetric signatures (24895).
- Avoid race condition in intrusive_ptr.reset_() (24464).
- torch.hub: Fix SSL cert issue for hub in Python 2 (25042).
- Fix int overflow issue in CUDA kernels (24818).
- Module.cuda: Fix type hints (25018).
- Fix bug in assertNotEqual for int tensors (25412).
- Fix 'in' incorrectly returning True (24156).
- Fix bugs in bulk loader when batch_size=None or with namedtuple (26065).
- Fix serialization issue on big-endian architectures (26383).
- Fix Vec256::abs() for floating point when applied on -0.0 (26422).
- Fix cyclic reference in _LRScheduler (25776).
- Fix a build failure on s390x (26233).
- [XLA] Fix tensor construction from array (24283).
Documentation Updates
Distributed
- torch.distributed: Error phrasing in torch.distributed helper functions (25574).
- torch.distributions.negative_binomial: Clarified ambiguous doc string in NegativeBinomial (25923).
JIT
- Add technical documentation for the serialization format (23456).
- Fix trace docs (24191).
- Add trace_module to docs (24258).
- Clean up the distinction between script and trace (24208).
- Fix item() call in docs (25404).
- Misc doc updates / fixes (24371, 24445).
Other documentation improvements
- torch.record_stream: Add documentation (24078).
- torch.fold: Describe the relation between fold and unfold operations (24840).
- torch.argmax: Fix incorrect doc (23775).
- torch.random: Add docs (23553).
- torch.empty_strided: Add docs (23735).
- torch.bitwise_not: Document for bool tensors (23800).
- torch.cdist: Add documentation (25221).
- torch.where: Update parameter names in doc (25554).
- torch.atan2: Clarify and correct the doc (26180).
- nn.functional.bilinear: Added documentation (24951).
- nn.functional.upsample: Fix align_corners doc (23707).
- nn.Transformer: Fixed an error in the example (24837).
- optim.lr_scheduler.CosineAnnealingWarmRestarts: Add documentation (25421).
- optim.SGD: Updated with subscripts (23985).
- optim.RMSprop: Highlight in the doc that the square root is taken before adding epsilon (26735).
- autograd.detect_anomaly: Add a warning (26615).
- Improve dataloader docs on when auto-batching is disabled (23671).
- Updated docs and added deprecation warnings to acknowledge a bool tensor (22261).
- Document benchmarking practice for CUDA (23910).
- Add ASAN instructions to CONTRIBUTING.md (24848).