Releases: pytorch/vision

Added version suffix back to package

27 Oct 21:22
45f960c

Issues resolved:

  • Cannot pip install torchvision==0.8.0+cu110 - #2912

Improved transforms, native image IO, new video API and more

27 Oct 16:17
291f7e2

This release brings new additions to torchvision that improve support for model deployment. Most notably, transforms in torchvision are now torchscript-compatible, and can thus be serialized together with your model for simpler deployment. Additionally, we provide native image IO with torchscript support, and a new video reading API (released as Beta) which is more flexible than torchvision.io.read_video.

Highlights

Transforms now support Tensor, batch computation, GPU and TorchScript

torchvision transforms now inherit from nn.Module and can be torchscripted and applied to torch Tensor inputs as well as to PIL images. They also support Tensors with a batch dimension and work seamlessly on CPU/GPU devices:

import torch
import torchvision.transforms as T

# to fix random seed, use torch.manual_seed
# instead of random.seed
torch.manual_seed(12)

transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
)
scripted_transforms = torch.jit.script(transforms)
# Note: we can similarly use T.Compose to define transforms
# transforms = T.Compose([...]) and 
# scripted_transforms = torch.jit.script(torch.nn.Sequential(*transforms.transforms))

tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
# works directly on Tensors
out_image1 = transforms(tensor_image)
# on the GPU
out_image1_cuda = transforms(tensor_image.cuda())
# with batches
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
out_image_batched = transforms(batched_image)
# and has torchscript support
out_image2 = scripted_transforms(tensor_image)

These improvements enable the following new features:

  • support for GPU acceleration
  • batched transformations e.g. as needed for videos
  • transform multi-band torch tensor images (with more than 3-4 channels)
  • torchscript transforms together with your model for deployment

Note: Exceptions to TorchScript support include Compose, RandomChoice, RandomOrder, Lambda and transforms applied to PIL images, such as ToPILImage.

Native image IO for JPEG and PNG formats

torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. These operators support TorchScript and return CxHxW tensors in uint8 format, and can thus now be part of your model for deployment in C++ environments.

import torch
from torchvision.io import read_image

# tensor_image is a CxHxW uint8 Tensor
tensor_image = read_image('path_to_image.jpeg')

# or equivalently
from torchvision.io.image import read_file, decode_image
# raw_data is a 1d uint8 Tensor with the raw bytes
raw_data = read_file('path_to_image.jpeg')
tensor_image = decode_image(raw_data)

# all operators are torchscriptable and can be
# serialized together with your model torchscript code
scripted_read_image = torch.jit.script(read_image)

New detection model

This release adds a pretrained model for RetinaNet with a ResNet50 backbone from Focal Loss for Dense Object Detection, with the following accuracies on COCO val2017:

IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.364
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.558
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.383
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.315
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.558
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699

[BETA] New Video Reader API

This release introduces a new video reading abstraction, which gives more fine-grained control over how to iterate over videos. It supports video and audio streams, and implements an iterator interface so that it can be combined with the rest of the Python ecosystem, such as itertools.

from torchvision.io import VideoReader

# stream indicates if reading from audio or video
reader = VideoReader('path_to_video.mp4', stream='video')
# can change the stream after construction
# via reader.set_current_stream

# to read all frames in a video starting at 2 seconds
for frame in reader.seek(2):
    # frame is a dict with "data" and "pts" metadata
    print(frame["data"], frame["pts"])

# because reader is an iterator you can combine it with
# itertools
from itertools import takewhile, islice
# read 10 frames starting from 2 seconds
for frame in islice(reader.seek(2), 10):
    pass
    
# or to return all frames between 2 and 5 seconds
for frame in takewhile(lambda x: x["pts"] < 5, reader.seek(2)):
    pass

Note: In order to use the Video Reader API, you need to compile torchvision from source and make sure that you have ffmpeg installed on your system.
Note: The VideoReader API is currently released as beta and its API can change following user feedback.
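
Since the real VideoReader needs a source build, the iterator patterns can be tried out on a stand-in. The generator below is hypothetical (fake_seek is not part of torchvision); it only mimics the frame dicts with "data" and "pts" keys so that the itertools recipes can run anywhere.

```python
from itertools import islice, takewhile


def fake_seek(start_seconds, fps=10, duration_seconds=8):
    """Yield frame dicts shaped like VideoReader frames, starting at start_seconds."""
    first_index = int(start_seconds * fps)
    for index in range(first_index, duration_seconds * fps):
        yield {"data": f"frame-{index}", "pts": index / fps}


# read 10 frames starting from 2 seconds
first_ten = list(islice(fake_seek(2), 10))

# all frames between 2 and 5 seconds
window = list(takewhile(lambda frame: frame["pts"] < 5, fake_seek(2)))

# every fifth frame of the whole stream, i.e. temporal subsampling
every_fifth = list(islice(fake_seek(0), 0, None, 5))
```

The same calls should apply to reader.seek(2) on a real VideoReader, since they only rely on the iterator protocol.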

Backwards Incompatible Changes

  • [Transforms] Random seed now should be set with torch.manual_seed instead of random.seed (#2292)
  • [Transforms] RandomErasing.get_params function’s argument was previously value=0 and is now value=None which is interpreted as Gaussian random noise (#2386)
  • [Transforms] RandomPerspective and F.perspective changed the default value of interpolation to be BILINEAR instead of BICUBIC (#2558, #2561)
  • [Transforms] Fixes incoherence in affine transformation when center is defined as half image size + 0.5 (#2468)

New Features

Improvements

Datasets

Models

  • Removed hard coded value in DeepLabV3 (#2793)
  • Changed the anchor generator default argument to an equivalent one (#2722)
  • Moved model construction location in resnet_fpn_backbone into after docstring (#2482)
  • Partially enabled type hints for models (#2668)

Ops

  • Moved RoIs shape check to C++ (#2794)
  • Use autocast built-in cast-helper functions (#2646)
  • Added type annotations for torchvision.ops (#2331, #2462)

References

  • [References] Removed redundant target send to device in detection evaluation (#2503)
  • [References] Removed obsolete import in segmentation. (#2399)

Misc

  • [Transforms] Added support for negative padding in pad (#2744)
  • [IO] Added type hints for torchvision.io (#2543)
  • [ONNX] Export ROIAlign with aligned=True (#2613)

Internal

Bug Fixes

  • [Ops] Fixed crash in deformable convolutions (#2604)
  • [Ops] Added empty batch support for DeformConv2d (#2782)
  • [Transforms] Enforced contiguous output in to_tensor (#2483)
  • [Transforms] Fixed fill parameter for PIL pad (#2515)
  • [Models] Fixed deprecation warning in nonzero for R-CNN models (#2705)
  • [IO] Explicitly cast to size_t in video decoder (#2389)
  • [ONNX] Fixed dynamic resize in Mask R-CNN (#2488)
  • [C++ API] Fixed function signatures for torch::nn::Functional (#2463)

Deprecations

  • [Transforms] Deprecated dedicated implementations functional_tensor of F_t.center_crop, F_t.five_crop, `F_t.te...

Mixed precision training, new models and improvements

28 Jul 15:04
78ed10c

Highlights

Mixed precision support for all models

torchvision models now support mixed-precision training via the new torch.cuda.amp package. Using mixed precision support is easy: just wrap the model and the loss inside a torch.cuda.amp.autocast context manager. Here is an example with Faster R-CNN:

import torch, torchvision

device = torch.device('cuda')

model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
model.to(device)

input = [torch.rand(3, 300, 400, device=device)]
boxes = torch.rand((5, 4), dtype=torch.float32, device=device)
boxes[:, 2:] += boxes[:, :2]
target = [{"boxes": boxes,
          "labels": torch.zeros(5, dtype=torch.int64, device=device),
          "image_id": 4,
          "area": torch.zeros(5, dtype=torch.float32, device=device),
          "iscrowd": torch.zeros((5,), dtype=torch.int64, device=device)}]

# use automatic mixed precision
with torch.cuda.amp.autocast():
    loss_dict = model(input, target)
losses = sum(loss for loss in loss_dict.values())
# perform backward outside of autocast context manager
losses.backward()
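
In a full training loop, autocast is commonly paired with gradient scaling. The sketch below is an addition to the example above rather than part of it: it uses torch.cuda.amp.GradScaler with a small linear model as a hypothetical stand-in for a detection model, and disables both features when no GPU is present so that it also runs on CPU.

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

# Hypothetical stand-in model; any torchvision model follows the same pattern
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# The scaler is a no-op when enabled=False
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

inputs = torch.rand(8, 16, device=device)
targets = torch.rand(8, 4, device=device)
weights_before = model.weight.detach().clone()

optimizer.zero_grad()
# forward pass and loss inside autocast
with torch.cuda.amp.autocast(enabled=use_cuda):
    loss = nn.functional.mse_loss(model(inputs), targets)

# backward outside of autocast, on the scaled loss
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

Scaling the loss is meant to keep small float16 gradients from underflowing; scaler.step unscales them before the optimizer update.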

New pre-trained segmentation models

This release adds pre-trained weights for the ResNet50 variants of Fully-Convolutional Networks (FCN) and DeepLabV3.
They are available under torchvision.models.segmentation, and can be obtained as follows:

torchvision.models.segmentation.fcn_resnet50(pretrained=True)
torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)

They obtain the following accuracies:

Network               mean IoU   global pixelwise acc
FCN ResNet50          60.5       91.4
DeepLabV3 ResNet50    66.4       92.4
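
The release notes do not spell out how the two metrics are computed. A minimal sketch, assuming the usual confusion-matrix definitions (rows are ground-truth classes, columns are predicted classes), is given below; the 2x2 matrix is made-up data for illustration.

```python
def segmentation_metrics(confusion):
    """Return (mean IoU, global pixelwise accuracy) for a square confusion matrix."""
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = [confusion[i][i] for i in range(num_classes)]
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[r][c] for r in range(num_classes))
                for c in range(num_classes)]
    # per-class IoU = intersection / (ground truth + predicted - intersection)
    ious = [correct[i] / (row_sums[i] + col_sums[i] - correct[i])
            for i in range(num_classes)]
    return sum(ious) / num_classes, sum(correct) / total


# Made-up pixel counts for a two-class problem
confusion = [[3, 1],
             [1, 5]]
mean_iou, global_acc = segmentation_metrics(confusion)
```

Global accuracy weights every pixel equally, while mean IoU averages over classes, which is why the two columns can differ noticeably on imbalanced datasets.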

Improved ONNX support for Faster / Mask / Keypoint R-CNN

This release restores ONNX support for the R-CNN family of models that had been temporarily dropped in the 0.6.0 release, and additionally fixes a number of corner cases in the ONNX export for these models.
Notable improvements include support for dynamic input shape exports, including images with no detections.

Backwards Incompatible Changes

  • [Transforms] Fix for integer fill value in constant padding (#2284)
  • [Models] Replace L1 loss with smooth L1 loss in Faster R-CNN for better performance (#2113)
  • [Transforms] Use torch.rand instead of random.random() for random transforms (#2520)

New Features

  • [Models] Add mixed-precision support (#2366, #2384)
  • [Models] Add fcn_resnet50 and deeplabv3_resnet50 pretrained models. (#2086, #2091)
  • [Ops] Added eps attribute to FrozenBatchNorm2d (#2190)
  • [Transforms] Add convert_image_dtype to functionals (#2078)
  • [Transforms] Add pil_to_tensor to functionals (#2092)

Bug Fixes

  • [JIT] Fix virtualenv and torchhub support by removing eager scripting calls (#2248)
  • [IO] Fix write_video when floating point FPS is passed (#2334)
  • [IO] Fix missing compilation files for video-reader (#2183)
  • [IO] Fix missing include for OSX in video decoder (#2224)
  • [IO] Fix overflow error for large buffers. (#2303)
  • [Ops] Fix wrong clamping in RoIAlign with aligned=True (#2438)
  • [Ops] Fix corner case in interpolate (#2146)
  • [Ops] Fix the use of contiguous() in C++ kernels (#2131)
  • [Ops] Restore support of tuple of Tensors for region pooling ops (#2199)
  • [Datasets] Fix bug related with trailing slash on UCF-101 dataset (#2186)
  • [Models] Make copy of targets in GeneralizedRCNNTransform (#2227)
  • [Models] Fix DenseNet issue with gradient checkpoints (#2236)
  • [ONNX] Fix ONNX implementation of heatmaps_to_keypoints in KeypointRCNN (#2312)
  • [ONNX] Fix export of images with no detection for Faster / Mask / Keypoint R-CNN (#2126, #2215, #2272)

Deprecations

  • [Ops] Deprecate Conv2d, ConvTranspose2d and BatchNorm2d (#2244)
  • [Ops] Deprecate interpolate in favor of PyTorch's implementation (#2252)
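
Code that relied on the deprecated wrappers can move to the PyTorch modules and functions directly. A minimal sketch of the replacement calls, with arbitrary shapes chosen for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(1, 3, 8, 8)

# torch.nn.functional.interpolate in place of the torchvision wrapper
upsampled = F.interpolate(x, scale_factor=2.0, mode="nearest")

# torch.nn layers in place of the deprecated torchvision layer wrappers
conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
features = conv(upsampled)
```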

Improvements

Datasets

  • Fix DatasetFolder error message (#2143)
  • Change range(len) to enumerate in DatasetFolder (#2153)
  • [DOC] Fix link URL to Flickr8k (#2178)
  • [DOC] Add CelebA to docs (#2107)
  • [DOC] Improve documentation of DatasetFolder and ImageFolder (#2112)

TorchHub

  • Fix torchhub tests due to numerical changes in torch.sum (#2361)
  • Add all the latest models to hubconf (#2189)

Transforms

  • Add fill argument to __repr__ of RandomRotation (#2340)
  • Add tensor support for adjust_hue (#2300, #2355)
  • Make ColorJitter torchscriptable (#2298)
  • Make RandomHorizontalFlip and RandomVerticalFlip torchscriptable (#2282)
  • [DOC] Use consistent symbols in the doc of Normalize to avoid confusion (#2181)
  • [DOC] Fix typo in hflip in functional.py (#2177)
  • [DOC] Fix spelling errors in functional.py (#2333)

IO

  • Refactor video.py to improve clarity (#2335)
  • Save memory by not storing full frames in read_video_timestamps (#2202, #2268)
  • Improve warning when video_reader backend is not available (#2225)
  • Set should_buffer to True by default in _read_from_stream (#2201)
  • [Test] Temporarily disable one PyAV test (#2150)

Models

  • Improve target checks in GeneralizedRCNN (#2207, #2258)
  • Use Module objects instead of functions for some layers of Inception3 (#2287)
  • Add support for other normalizations in MobileNetV2 (#2267)
  • Expose layer freezing option to detection models (#2160, #2242)
  • Make ASPP-Layer in DeepLab more generic (#2174)
  • Faster initialization for Inception family of models (#2170, #2211)
  • Make norm_layer as parameters in models/detection/backbone_utils.py (#2081)
  • Updates integer division to use floor division operator (#2234, #2243)
  • [JIT] Clean up no longer needed workarounds for torchscript support (#2249, #2261, #2210)
  • [DOC] Add docs to clarify aspect ratio definition in RPN. (#2185)
  • [DOC] Fix roi_heads argument name in docstring of GeneralizedRCNN (#2093)
  • [DOC] Fix type annotation in RPN docstring (#2149)
  • [DOC] add clarifications to Object detection reference documentation (#2241)
  • [Test] Add tests for negative samples for Mask R-CNN and Keypoint R-CNN (#2069)

Reference scripts

  • Add support for SyncBatchNorm in QAT reference script (#2230, #2280)
  • Fix training resuming in references/segmentation (#2142)
  • Rename image to images in references/detection/engine.py (#2187)

ONNX

  • Add support for dynamic input shape export in R-CNN models (#2087)

Ops

  • Added number of features in FrozenBatchNorm2d __repr__ (#2168)
  • improve consistency among box IoU CPU / GPU calculations (#2072)
  • Avoid using in header files (#2257)
  • Make ceil_div __host__ __device__ (#2217)
  • Don't include CUDAApplyUtils.cuh (#2127)
  • Add namespace to avoid conflict with ATen version of channel_shuffle() (#2206)
  • [DOC] Update the statement of supporting torchscript ops (#2343)
  • [DOC] Update torchvision ops in doc (#2341)
  • [DOC] Improve documentation for NMS (#2159)
  • [Test] Add more tests to NMS (#2279)

Misc

  • Add PyTorch version compatibility table to README (#2260)
  • Fix lint (#2182, #2226, #2070)
  • Update version to 0.6.0 in CMake (#2140)
  • Remove mock (#2096)
  • Remove warning about deprecated (#2064)
  • Cleanup unused import (#2067)
  • Type annotations for torchvision/utils.py (#2034)

CI

  • Add version suffix to build version
  • Add backslash to escape
  • Add workflows to run on tag
  • Bump version to 0.7.0, pin PyTorch to 1.6.0
  • Update link for cudnn 10.2 (#2277)
  • Fix binary builds with CUDA 9.2 on Windows (#2273)
  • Remove Python 3.5 from CI (#2158)
  • Improvements to CI infra (#2075, #2071, #2058, #2073, #2099, #2137, #2204, #2264, #2274, #2319)
  • Master version bump 0.6 -> 0.7 (#2102)
  • Add test channels for pytorch version functions (#2208)
  • Add static type check with mypy (#2195, #1696, #2247)

v0.6.1

22 Jun 18:20
fe36f06

Highlights

  • Bump pinned PyTorch version to v1.5.1

Drop Python 2 support, several improvements and bugfixes

21 Apr 14:31

This release is the first one that officially drops support for Python 2.
It contains a number of improvements and bugfixes.

Highlights

Faster/Mask/Keypoint RCNN supports negative samples

It is now possible to feed training images to Faster / Mask / Keypoint R-CNN that do not contain any positive annotations.
This enables increasing the number of negative samples during training. For those images, the annotations expect a tensor with 0 in the number of objects dimension, as follows:

import torch

# spatial size of the training image (placeholder values)
image_height, image_width = 300, 400

target = {"boxes": torch.zeros((0, 4), dtype=torch.float32),
          "labels": torch.zeros(0, dtype=torch.int64),
          "image_id": 4,
          "area": torch.zeros(0, dtype=torch.float32),
          "masks": torch.zeros((0, image_height, image_width), dtype=torch.uint8),
          # keypoints are laid out as (num_objects, num_keypoints, 3)
          "keypoints": torch.zeros((0, 17, 3), dtype=torch.float32),
          "iscrowd": torch.zeros((0,), dtype=torch.int64)}

Aligned flag for RoIAlign

RoIAlign now supports the aligned flag, which aligns two neighboring pixel indices more precisely.

Refactored abstractions for C++ video decoder

This change is transparent to Python users, but the whole C++ backend for video reading (which needs torchvision to be compiled from source for it to be enabled for now) has been refactored into more modular abstractions.
The core abstractions are in https://github.com/pytorch/vision/tree/master/torchvision/csrc/cpu/decoder, and the video reader functions exposed to Python, by leveraging those abstractions, can be written in a much more concise way.

Backwards Incompatible Changes

  • Dropping Python2 support (#1761, #1792, #1984, #1976, #2037, #2033, #2017)
  • [Models] Fix inception quantized pre-trained model (#1954, #1969, #1975)
  • ONNX support for Mask R-CNN and Keypoint R-CNN has been temporarily dropped, but will be fixed in next releases

New Features

  • [Transforms] Add Perspective fill option (#1973)
  • [Ops] aligned flag in ROIAlign (#1908)
  • [IO] Update video reader to use new decoder (#1978)
  • [IO] torchscriptable functions for video io (#1653, #1794)
  • [Models] Support negative samples in Faster R-CNN, Mask R-CNN and Keypoint R-CNN (#1911, #2069)

Improvements

Datasets

  • STL10: don't check integrity twice when download=True (#1787)
  • Improve code readability and docstring of video datasets (#2020)
  • [DOC] Fixed typo in Cityscapes docs (#1851)

Transforms

  • Allow passing list to the input argument 'scale' of RandomResizedCrop (#1997) (#2008)
  • F.normalize unsqueeze mean & std only for 1-d arrays (#2002)
  • Improved error messages for transforms.functional.normalize(). (#1915)
  • generalize number of bands calculation in to_tensor (#1781)
  • Replace 2 transpose ops with 1 permute in ToTensor (#2018)
  • Fixed Pillow version check for Pillow >= 10 (#2039)
  • [DOC]: Improve transforms.Normalize docs (#1784, #1858)
  • [DOC] Fixed missing new line in transforms.Crop docstring (#1922)

Ops

  • Check boxes shape in RoIPool / Align (#1968)
  • [ONNX] Export new_empty_tensor (#1733)
  • Fix Tensor::data<> deprecation. (#2028)
  • Fix deprecation warnings (#2055)

Models

  • Add warning and note docs for scipy (#1842) (#1966)
  • Added repr attribute to GeneralizedRCNNTransform (#1834)
  • Replace mean on dimensions 2,3 by adaptive_avg_pooling2d in mobilenet (#1838)
  • Add init_weights keyword argument to Inception3 (#1832)
  • Add device to torch.tensor. (#1979)
  • ONNX export for variable input sizes in Faster R-CNN (#1840)
  • [JIT] Cleanup torchscript constant annotations (#1721, #1923, #1907, #1727)
  • [JIT] use // now that it is supported (#1658)
  • [JIT] add @torch.jit.script to ImageList (#1919)
  • [DOC] Improved docs for Faster R-CNN (#1886, #1868, #1768, #1763)
  • [DOC] add comments for the modified implementation of ResNet (#1983)
  • [DOC] Add comments to AnchorGenerator (#1941)
  • [DOC] Add comment in GoogleNet (#1932)

Documentation

  • Document int8 quantization model (#1951)
  • Update Doc with ONNX support (#1752)
  • Update README to reflect strict dependency on torch==1.4.0 (#1767)
  • Update sphinx theme (#2031)
  • Document origin of preprocessing mean / std (#1965)
  • Fix docstring formatting issues (#2049)

Reference scripts

  • Add return statement in evaluate function of detection reference script (#2029)
  • [DOC] Add default training parameters to classification reference README (#1998)
  • [DOC] Add README to references/segmentation (#1864)

Tests

  • Improve stability of test_nms_cuda (#2044)
  • [ONNX] Disable model tests since export of interpolate script module is broken (#1989)
  • Skip inception v3 in test/test_quantized_models (#1885)
  • [LINT] Small indentation fix (#1831)

Misc

  • Remove unintentional -O0 option in setup.py (#1770)
  • Create CODE_OF_CONDUCT.md
  • Update issue templates (#1913, #1914)
  • master version bump 0.5 → 0.6
  • replace torch 1.5.0 items flagged with deprecation warnings (fix #1906) (#1918)
  • CUDA_SUFFIX → PYTORCH_VERSION_SUFFIX

CI

  • Remove av from the binary requirements (#2006)
  • ci: Add cu102 to CI and packaging, remove cu100 (#1980)
  • .circleci: Switch to use token for conda uploads (#1960)
  • Improvements to CI infra (#2051, #2032, #2046, #1735, #2048, #1789, #1731, #1961)
  • typing only needed for python 3.5 and previous (#1778)
  • Move C++ and Python linter to CircleCI (#2056, #2057)

Bug Fixes

Datasets

  • bug fix on downloading voc2007 test dataset (#1991)
  • fix lsun docstring example (#1935)
  • Fixes EMNIST classes attribute is wrong #1716 (#1736)
  • Force object annotation to be a list in VOC (#1790)

Models

  • Fix for AnchorGenerator when device switch happen (#1745)
  • [JIT] fix len error (#1981)
  • [JIT] fix googlenet no aux logits (#1949)
  • [JIT] Fix quantized googlenet (#1974)

Transforms

  • Fix for rotate fill with Images of type F (#1828)
  • Fix fill in rotate (#1760)

Ops

  • Fix bug in DeformConv2d for batch sizes > 32 (#2027, #2040)
  • Fix for roi_align ONNX export (#1988)
  • Fix torchscript issue in ConvTranspose2d (#1917)
  • Fix interpolate when no scale_factor is passed (#1785)
  • Fix Windows build by renaming Python init functions (#1779)
  • fix for loading models with num_batches_tracked in frozen bn (#1728)

Deprecations

  • the pts_unit of pts from read_video and read_video_timestamp is deprecated, and will be replaced in next releases with seconds.

Towards better research to production support

15 Jan 22:11
85b8fbf

This release brings several new additions to torchvision that improve support for deployment. Most notably, all models in torchvision are torchscript-compatible, and can be exported to ONNX. Additionally, a few classification models have quantized weights.

Note: this is the last version of torchvision that officially supports Python 2.

Breaking changes

Updated KeypointRCNN pre-trained weights

The pre-trained weights for keypointrcnn_resnet50_fpn have been updated and now correspond to the results reported in the documentation. The previous weights corresponded to an intermediate training checkpoint. (#1609)

Corrected the implementation for MNASNet

The previous implementation contained a bug which affects all MNASNet variants other than mnasnet1_0. The bug was that the first few layers needed to also be scaled in terms of width multiplier, along with all the rest. We now provide a new checkpoint for mnasnet0_5, which gives 32.17 top1 error. (#1224)

Highlights

TorchScript support for all models

All models in torchvision have native support for torchscript, for both training and testing. This includes complex models such as DeepLabV3, Mask R-CNN and Keypoint R-CNN.
Using torchscript with torchvision models is easy:

import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# convert to torchscript
model_script = torch.jit.script(model)
model_script.eval()

# compute predictions
predictions = model_script([torch.rand(3, 300, 300)])

Warning: the return type for the scripted version of Faster R-CNN, Mask R-CNN and Keypoint R-CNN is different from its eager counterpart, and it always returns a tuple of losses, detections. This discrepancy will be addressed in a future release.

ONNX

All models in torchvision can now be exported to ONNX for deployment. This includes models such as Mask R-CNN.

import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
inputs = [torch.rand(3, 300, 300)]
predictions = model(inputs)

# convert to ONNX
torch.onnx.export(model, inputs, "model.onnx",
                  do_constant_folding=True,
                  opset_version=11  # opset_version 11 required for Mask R-CNN
                  )

Warning: for Faster R-CNN / Mask R-CNN / Keypoint R-CNN, the current exported model is dependent on the input shape during export. As such, make sure that once the model has been exported to ONNX that all images that are fed to it have the same shape as the shape used to export the model to ONNX. This behavior will be made more general in a future release.

Quantized models

torchvision now provides quantized models for ResNet, ResNext, MobileNetV2, GoogleNet, InceptionV3 and ShuffleNetV2, as well as reference scripts for quantizing your own model in references/classification/train_quantization.py (https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py). A pre-trained quantized model can be obtained with a few lines of code:

import torch
import torchvision

model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model.eval()

# run the model with quantized inputs and weights
out = model(torch.rand(1, 3, 224, 224))

We provide pre-trained quantized weights for the following models:

Model                Acc@1    Acc@5
MobileNet V2         71.658   90.150
ShuffleNet V2        68.360   87.582
ResNet 18            69.494   88.882
ResNet 50            75.920   92.814
ResNext 101 32x8d    78.986   94.480
Inception V3         77.084   93.398
GoogleNet            69.826   89.404

Torchscript support for torchvision.ops

torchvision ops are now natively supported by torchscript. This includes operators such as nms, roi_align and roi_pool, and for the ops that support backpropagation, both eager and torchscript modes are supported in autograd.

New operators

Deformable Convolution (#1586) (#1660) (#1637)

As described in Deformable Convolutional Networks (https://arxiv.org/abs/1703.06211), torchvision now supports deformable convolutions. The model expects as input both the input as well as the offsets, and can be used as follows:

import torch
from torchvision import ops

module = ops.DeformConv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
x = torch.rand(1, 1, 10, 10)

# number of channels for offset should be a multiple
# of 2 * module.weight.size(2) * module.weight.size(3), which correspond
# to the kernel_size
offset = torch.rand(1, 2 * 3 * 3, 10, 10)

# the output requires both the input and the offsets
out = module(x, offset)

If needed, the user can create their own wrapper module that imposes constraints on the offset. Here is an example, using a single convolution layer to compute the offset:

from torch import nn
from torchvision import ops


class BasicDeformConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
                 dilation=1, groups=1, offset_groups=1):
        super().__init__()
        offset_channels = 2 * kernel_size * kernel_size
        # the offset conv uses the same kernel geometry as the deformable conv,
        # so that the offset map has the same spatial size as the output
        self.conv2d_offset = nn.Conv2d(
            in_channels,
            offset_channels * offset_groups,
            kernel_size=kernel_size,
            stride=stride,
            padding=dilation,
            dilation=dilation,
        )
        self.conv2d = ops.DeformConv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=dilation,
            dilation=dilation,
            groups=groups,
            bias=False
        )

    def forward(self, x):
        # predict the offsets from the input, then apply the deformable conv
        offset = self.conv2d_offset(x)
        return self.conv2d(x, offset)

Position-sensitive RoI Pool / Align (#1410)

This release adds the Position-Sensitive Region of Interest (RoI) Pool and Align operators mentioned in Light-Head R-CNN (https://arxiv.org/abs/1711.07264). They are available as ops.ps_roi_align and ops.ps_roi_pool, with the module equivalents ops.PSRoIAlign and ops.PSRoIPool, and have the same interface as RoIAlign / RoIPool.

New Features

TorchScript support

  • Bugfix in BalancedPositiveNegativeSampler introduced during torchscript support (#1670)
  • Make R-CNN models less verbose in script mode (#1671)
  • Minor torchscript fixes for Mask R-CNN (#1639)
  • remove BC-breaking changes (#1560)
  • Make maskrcnn scriptable (#1407)
  • Add Script Support for Video Resnet Models (#1393)
  • fix ASPPPooling (#1575)
  • Test that torchhub models are scriptable (#1242)
  • Make GoogleNet & InceptionNet scriptable (#1349)
  • Make fcn_resnet Scriptable (#1352)
  • Make Densenet Scriptable (#1342)
  • make resnext scriptable (#1343)
  • make shufflenet and resnet scriptable (#1270)

ONNX

  • Enable KeypointRCNN test (#1673)
  • enable mask rcnn test (#1613)
  • Changes to Enable KeypointRCNN ONNX Export (#1593)
  • Disable Profiling in Failing Test (#1585)
  • Enable ONNX Test for FasterRcnn (#1555)
  • Support Exporting Mask Rcnn to ONNX (#1461)
  • Lahaidar/export faster rcnn (#1401)
  • Support Exporting RPN to ONNX (#1329)
  • Support Exporting MultiScaleRoiAlign to ONNX (#1324)
  • Support Exporting GeneralizedRCNNTransform to ONNX (#1325)

Quantization

  • Update quantized shufflenet weights (#1715)
  • Add commands to run quantized model with pretrained weights (#1547)
  • Quantizable googlenet, inceptionv3 and shufflenetv2 models (#1503)
  • Quantizable resnet and mobilenet models (#1471)
  • Remove model download from test_quantized_models (#1526)

Improvements

Bugfixes

  • Bugfix on GroupedBatchSampler for corner case where there are not enough examples in a category to form a batch (#1677)
  • Fix rpn memory leak and dataType errors. (#1657)
  • Fix torchvision install due to zipped egg (#1536)

Transforms

  • Make shear operation area preserving (#1529)
  • PILLOW_VERSION deprecation updates (#1501)
  • Adds optional fill colour to rotate (#1280)

Ops

  • Add Deformable Convolution operation. (#1586) (#1660) (#1637)
  • Fix inconsistent NMS implementation between CPU and CUDA (#1556)
  • Speed up nms_cuda (#1704)
  • Implementation for Position-sensitive ROI Pool/Align (#1410)
  • Remove cpp extensions in favor of torch ops (#1348)
  • Make custom ops differentiable (#1314)
  • Fix Windows build in Torchvision Custom op Registration (#1320)
  • Revert "Register Torchvision Ops as Custom Ops (#1267)" (#1316)
  • Register Torchvision Ops as Custom Ops (#1267)
  • Use Tensor.data_ptr instead of .data (#1262)
  • Fix header includes for cpu (#1644)

Datasets

  • fixed test for windows by closing the created temporary files (#1662)
  • VideoClips windows fixes (#1661)
  • Fix VOC on Windows (#1641)
  • update dead LSUN link (#1626)
  • DatasetFolder should follow links when searching for data (#1580)
  • add .tgz support to extract_archive (#1650)
  • expose audio_channels as a parameter to kinetics dataset (#1559)
  • Implemented integrity check (md5 hash) after dataset download (#1456)
  • Move VideoClips dummy dataset to top level for pickling (#1649)
  • Remove download for ImageNet (#1457)
  • add tar.xz archive handler (#1361)
  • Fix DeprecationWarning for collections.Iterable import in LSUN (#1417)
  • Support empty target_type for CelebA dataset (#1351)
  • VOC2007 support test set (#1340)
  • Fix EMNIST download URL (#1297) (#1318)
  • Refactored clip_sampler (#1562)

Documentation

  • Fix documentation for NMS (#1614)
  • More examples of functional transforms (#1402)
  • Fixed doc of crop functionals (#1388)
  • Added Training Sample code for fasterrcnn_resnet50_fpn (#1695)
  • Fix rpn.py typo (#1276)
  • Update README with minimum required version of PyTorch (#1272)
  • fix alignment of README (#1396)
  • fixed typo in DatasetFolder and ImageFolder (#1284)

Models

  • Bugfix for MNASNet (#1224)
  • Fix anchor dtype in AnchorGenerator (#1341)

Utils

  • Adding File...

Optimized video reader backend

07 Nov 16:33
efb0b26

This minor release introduces an optimized video_reader backend for torchvision. It is implemented in C++, and uses FFmpeg internally.

The new video_reader backend can be up to 6 times faster compared to the pyav backend.

  • When decoding all video/audio frames in the video, the new video_reader is 1.2x - 6x faster depending on the codec and video length.
  • When decoding a fixed number of video frames (e.g. [4, 8, 16, 32, 64, 128]), video_reader runs equally fast for small values (i.e. [4, 8, 16]) and runs up to 3x faster for large values (e.g. [32, 64, 128]).

Using the optimized video backend

Switch to the new backend by calling torchvision.set_video_backend('video_reader'). By default, torchvision uses a backend built on top of PyAV.

Due to packaging issues with FFmpeg, using the video_reader backend requires first having ffmpeg available on the system, and then compiling torchvision from source following the instructions at https://github.com/pytorch/vision#installation

Deprecations

In torchvision 0.4.0, the read_video and read_video_timestamps functions used pts relative to the video stream. This could lead to unaligned video-audio being returned in some cases.

torchvision now allows specifying a pts_unit argument in those functions. The default value is 'pts' (with the same behavior as before), and the user can now specify pts_unit='sec', which produces consistently aligned results for both video and audio. The 'pts' value is now deprecated and is kept only for backwards compatibility.

In the next release, the default value of pts_unit will change to 'sec', so that calling read_video without specifying pts_unit returns consistently aligned audio-video results. This will require users to update their VideoClips checkpoints, which used to store the information in pts by default.
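
To make the two units concrete: a presentation timestamp (pts) is an integer count of ticks in a stream's time base, so converting it to seconds is a multiplication by that time base. Video and audio streams generally use different time bases, so the same instant has different pts values in each stream. The sketch below illustrates this with plain arithmetic; the helper pts_to_seconds and the example time bases are assumptions for illustration, not part of the torchvision API.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base):
    """Convert a presentation timestamp (in stream ticks) to seconds."""
    return float(pts * time_base)


# Hypothetical time bases: video ticks every 1/15360 s, audio every 1/44100 s.
video_time_base = Fraction(1, 15360)
audio_time_base = Fraction(1, 44100)

# The same instant (2 seconds in) has a different pts in each stream...
video_pts = 30720
audio_pts = 88200

# ...but maps to the same value once expressed in seconds.
video_sec = pts_to_seconds(video_pts, video_time_base)
audio_sec = pts_to_seconds(audio_pts, audio_time_base)
```

Expressing ranges in seconds avoids having to know which stream's time base a given number refers to.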

Changelog

Compat with PyTorch 1.3 and bugfix

30 Oct 15:18

This minor release provides binaries compatible with PyTorch 1.3.

Compared to version 0.4.0, it contains a single bugfix for the HMDB51 and UCF101 datasets, fixed in #1240.

Video support, new datasets and models

08 Aug 15:09
d31eafa

This release adds support for video models and datasets, and brings several improvements.

Note: torchvision 0.4 requires PyTorch 1.2 or newer

Highlights

Video and IO

Video is now a first-class citizen in torchvision. The 0.4 release includes:

  • efficient IO primitives for reading and writing video files
  • Kinetics-400, HMDB51 and UCF101 datasets for action recognition, which are compatible with torch.utils.data.DataLoader
  • Pre-trained models for action recognition, trained on Kinetics-400
  • Training and evaluation scripts for reproducing the training results.

Writing your own video dataset is easy. We provide a utility class, VideoClips, that simplifies the task of enumerating all possible clips of fixed size in a list of video files by creating an index of all clips in a set of videos. It additionally allows specifying a fixed frame rate for the videos.

from torchvision.datasets.video_utils import VideoClips

class MyVideoDataset(object):
    def __init__(self, video_paths):
        # index every 16-frame clip, resampling the videos to 15 fps
        self.video_clips = VideoClips(video_paths,
                                      clip_length_in_frames=16,
                                      frames_between_clips=1,
                                      frame_rate=15)

    def __getitem__(self, idx):
        # idx selects a clip across all videos, not a video
        video, audio, info, video_idx = self.video_clips.get_clip(idx)
        return video, audio

    def __len__(self):
        return self.video_clips.num_clips()
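
To illustrate what "enumerating all possible clips of fixed size" means, the sketch below lists the frame ranges of every full-length clip in each video and flattens them into a single index. It only mirrors the idea; the functions enumerate_clips and build_index are hypothetical and are not how VideoClips is implemented internally.

```python
def enumerate_clips(num_frames, clip_length, step):
    """Return (start, end) frame index pairs, end exclusive, for every
    full-length clip in a video with `num_frames` frames."""
    last_start = num_frames - clip_length
    # If the video is shorter than one clip, the range is empty.
    return [(s, s + clip_length) for s in range(0, last_start + 1, step)]


def build_index(frames_per_video, clip_length, step):
    """Flatten the clips of several videos into one list of
    (video_idx, start, end) entries, so one integer selects one clip."""
    index = []
    for video_idx, num_frames in enumerate(frames_per_video):
        for start, end in enumerate_clips(num_frames, clip_length, step):
            index.append((video_idx, start, end))
    return index
```

With such an index, the length of the dataset is the number of entries, and item idx maps back to a specific video and frame range.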

We provide pre-trained models for action recognition, trained on Kinetics-400, which reproduce the results of the original papers in which they were first introduced, as well as the corresponding training scripts.

model          clip acc@1
r3d_18         52.748
mc3_18         53.898
r2plus1d_18    57.498

Bugfixes

  • change aspect ratio calculation formula in references/detection (#1194)
  • bug fixes in ImageNet (#1149)
  • fix save_image when height or width equals 1 (#1059)
  • Fix STL10 __repr__ (#969)
  • Fix wrong behavior of GeneralizedRCNNTransform in Python2. (#960)

Datasets

New

  • Add USPS dataset (#961)(#1117)
  • Added support for the QMNIST dataset (#995)
  • Add HMDB51 and UCF101 datasets (#1156)
  • Add Kinetics400 dataset (#1077)

Improvements

  • Miscellaneous dataset fixes (#1174)
  • Standardize str argument verification in datasets (#1167)
  • Always pass transform and target_transform to abstract dataset (#1126)
  • Remove duplicate transform assignment in FakeDataset (#1125)
  • Automatic extraction for Cityscapes Dataset (#1066) (#1068)
  • Use joint transform in Cityscapes (#1024)(#1045)
  • CelebA: track attr names, support split="all", code cleanup (#1008)
  • Add folds option to STL10 (#914)

Models

New

  • Add pretrained Wide ResNet (#912)
  • Memory efficient densenet (#1003) (#1090)
  • Implementation of the MNASNet family of models (#829)(#1043)(#1092)
  • Add VideoModelZoo models (#1130)

Improvements

  • Fix resnet fpn backbone for resnet18 and resnet34 (#1147)
  • Add checks to roi_heads in detection module (#1091)
  • Make shallow copy of input list in GeneralizedRCNNTransform (#1085)(#1111)(#1084)
  • Make MobileNetV2 number of channel divisible by 8 (#1005)
  • typo fix: ouput -> output in Inception and GoogleNet (#1034)
  • Remove empty proposals from the RPN (#1026)
  • Remove empty boxes before NMS (#1019)
  • Reduce code duplication in segmentation models (#1009)
  • allow user to define residual settings in MobileNetV2 (#965)
  • Use flatten instead of view (#1134)
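
The "divisible by 8" item above (#1005) refers to rounding channel counts, after a width multiplier has been applied, to a multiple of 8. A common way to do this in MobileNet implementations is sketched below; it is shown under the assumption that torchvision follows a similar rule, and the function name make_divisible is chosen here for illustration.

```python
def make_divisible(value, divisor=8, min_value=None):
    """Round `value` to the nearest multiple of `divisor`, never going
    below `min_value` and never shrinking the value by more than 10%."""
    if min_value is None:
        min_value = divisor
    # Round to the nearest multiple (ties go up).
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    # If rounding down lost more than 10%, bump up to the next multiple.
    if rounded < 0.9 * value:
        rounded += divisor
    return rounded
```

For example, a 32-channel layer scaled by a width multiplier of 1.4 gives 44.8 channels, which this rule rounds to 48.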

Documentation

  • Consistency in detection box format (#1110)
  • Fix Mask R-CNN docs (#1089)
  • Add paper references to VGG and Resnet variants (#1088)
  • Doc, Test Fixes in Normalize (#1063)
  • Add transforms doc to more datasets (#1038)
  • Corrected typo: 5 to 0.5 (#1041)
  • Update doc for torchvision.transforms.functional.perspective (#1017)
  • Improve documentation for fillcolor option in RandomAffine (#994)
  • Fix COCO_INSTANCE_CATEGORY_NAMES (#991)
  • Added models information to documentation. (#985)
  • Add missing import in faster_rcnn.py documentation (#979)
  • Improve make_grid docs (#964)

Tests

  • Add test for SVHN (#1086)
  • Add tests for Cityscapes Dataset (#1079)
  • Update CI to Python 3.6 (#1044)
  • Make test_save_image more robust (#1037)
  • Add a generic test for the datasets (#1015)
  • moved fakedata generation to separate module (#1014)
  • Create imagenet fakedata on-the-fly (#1012)
  • Minor test refactorings (#1011)
  • Add test for CIFAR10(0) (#1010)
  • Mock MNIST download for less flaky tests (#1004)
  • Add test for ImageNet (#976)(#1006)
  • Add tests for datasets (#966)

Transforms

New

Improvements

  • Allowing 'F' mode for 1 channel FloatTensor in ToPILImage (#1100)
  • Add shear parallel to y-axis (#1070)
  • fix error message in to_tensor (#1000)
  • Fix TypeError in RandomResizedCrop.get_params (#1036)
  • Fix normalize for different dtype than float32 (#1021)

Ops

  • Renamed vision.h files to vision_cpu.h and vision_cuda.h (#1051)(#1052)
  • Optimize nms_cuda by avoiding extra torch.cat call (#945)

Reference scripts

  • Expose data-path in the detection reference scripts (#1109)
  • Make utils.py work with pytorch-cpu (#1023)
  • Add mixed precision training with Apex (#972)(#1124)
  • Add reference code for similarity learning (#1101)

Build

  • Add windows build steps and wheel build scripts (#998)
  • add packaging scripts (#996)
  • Allow forcing GPU build with FORCE_CUDA=1 (#927)

Misc

  • Misc lint fixes (#1020)
  • Reraise error on failed downloading (#1013)
  • add more hub models (#974)
  • make C extension lazy-import (#971)

Training scripts, detection/segmentation models and more

22 May 19:45

This release brings several new features to torchvision, including models for semantic segmentation, object detection, instance segmentation and person keypoint detection, and custom C++ / CUDA ops specific to computer vision.

Note: torchvision 0.3 requires PyTorch 1.1 or newer

Highlights

Reference training / evaluation scripts

We now provide under the references/ folder scripts for training and evaluation of the following tasks: classification, semantic segmentation, object detection, instance segmentation and person keypoint detection.
Their purpose is twofold:

  • serve as a log of how to train a specific model.
  • provide baseline training and evaluation scripts to bootstrap research

They all have an entry-point train.py which performs both training and evaluation for a particular task. Other helper files, specific to each training script, are also present in the folder, and they might get integrated into the torchvision library in the future.

We expect users to copy-paste and modify those reference scripts for their own needs.

TorchVision Ops

TorchVision now contains custom C++ / CUDA operators in torchvision.ops. Those operators are specific to computer vision, and make it easier to build object detection models.
Those operators currently do not support PyTorch script mode, but support for it is planned for future releases.

List of supported ops

  • roi_pool (and the module version RoIPool)
  • roi_align (and the module version RoIAlign)
  • nms, for non-maximum suppression of bounding boxes
  • box_iou, for computing the intersection over union metric between two sets of bounding boxes
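
As a rough picture of what nms and box_iou compute, here is a pure-Python sketch of intersection over union and greedy non-maximum suppression on boxes in (x0, y0, x1, y1) format. The helper names iou and greedy_nms are illustrative, and this is not the C++ / CUDA implementation; it assumes a box is suppressed when its IoU with an already kept, higher-scoring box is strictly greater than the threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes in (x0, y0, x1, y1) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The quadratic loop is fine for illustration; the library ops exist precisely because this is too slow for real detection workloads.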

All the other ops present in torchvision.ops and its subfolders are experimental, in particular:

  • FeaturePyramidNetwork is a module that adds a FPN on top of a module that returns a set of feature maps.
  • MultiScaleRoIAlign is a wrapper around roi_align that works with multiple feature map scales

Here are a few examples of using torchvision ops:

import torch
import torchvision

# create 10 random boxes
boxes = torch.rand(10, 4) * 100
# they need to be in [x0, y0, x1, y1] format
boxes[:, 2:] += boxes[:, :2]
# create a random image
image = torch.rand(1, 3, 200, 200)
# extract regions in `image` defined in `boxes`, rescaling
# them to have a size of 3x3
pooled_regions = torchvision.ops.roi_align(image, [boxes], output_size=(3, 3))
# check the size
print(pooled_regions.shape)
# torch.Size([10, 3, 3, 3])

# or compute the intersection over union between
# all pairs of boxes
print(torchvision.ops.box_iou(boxes, boxes).shape)
# torch.Size([10, 10])

Models for more tasks

The 0.3 release of torchvision includes pre-trained models for tasks other than image classification on ImageNet.
We include two new categories of models: region-based models, like Faster R-CNN, and dense pixelwise prediction models, like DeepLabV3.

Object Detection, Instance Segmentation and Person Keypoint Detection models

Warning: The API is currently experimental and might change in future versions of torchvision

The 0.3 release contains pre-trained models for Faster R-CNN, Mask R-CNN and Keypoint R-CNN, all of them using ResNet-50 backbone with FPN.
They have been trained on COCO train2017 following the reference scripts in references/, and give the following results on COCO val2017

Network                         box AP    mask AP    keypoint AP
Faster R-CNN ResNet-50 FPN      37.0      -          -
Mask R-CNN ResNet-50 FPN        37.9      34.6       -
Keypoint R-CNN ResNet-50 FPN    54.6      -          65.0

The implementations of the models for object detection, instance segmentation and keypoint detection are fast, especially during training.

In the following table, we use 8 V100 GPUs, with CUDA 10.0 and CUDNN 7.4 to report the results. During training, we use a batch size of 2 per GPU, and during testing a batch size of 1 is used.

For test time, we report the time for the model evaluation and post-processing (including mask pasting in image), but not the time for computing the precision-recall.

Network                         train time (s / it)    test time (s / it)    memory (GB)
Faster R-CNN ResNet-50 FPN      0.2288                 0.0590                5.2
Mask R-CNN ResNet-50 FPN        0.2728                 0.0903                5.4
Keypoint R-CNN ResNet-50 FPN    0.3789                 0.1242                6.8

You can load and use pre-trained detection and segmentation models with a few lines of code

import torchvision
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
# set it to evaluation mode, as the model behaves differently
# during training and during evaluation
model.eval()

image = Image.open('/path/to/an/image.jpg')
image_tensor = torchvision.transforms.functional.to_tensor(image)

# pass a list of (potentially different sized) tensors
# to the model, in 0-1 range. The model will take care of
# batching them together and normalizing
output = model([image_tensor])
# output is a list of dict, containing the postprocessed predictions

Pixelwise Semantic Segmentation models

Warning: The API is currently experimental and might change in future versions of torchvision

The 0.3 release also contains models for dense pixelwise prediction on images.
It adds FCN and DeepLabV3 segmentation models, using ResNet50 and ResNet101 backbones.
Pre-trained weights for ResNet101 backbone are available, and have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.

The pre-trained models give the following results on the subset of COCO val2017 which contains the same 20 categories as those present in Pascal VOC:

Network                mean IoU    global pixelwise acc
FCN ResNet101          63.7        91.9
DeepLabV3 ResNet101    67.4        92.4
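
For readers unfamiliar with the two metrics in the table, the sketch below computes them from a confusion matrix using their usual definitions: global pixelwise accuracy is the fraction of correctly labelled pixels, and mean IoU averages the per-class intersection over union. The function name segmentation_metrics is hypothetical, and the reference scripts may organise this computation differently.

```python
def segmentation_metrics(confusion):
    """Compute (global_accuracy, mean_iou) from a square confusion matrix,
    where confusion[i][j] counts pixels of true class i predicted as j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    global_accuracy = correct / total

    ious = []
    for i in range(n):
        true_positive = confusion[i][i]
        actual = sum(confusion[i])                          # pixels of class i
        predicted = sum(confusion[r][i] for r in range(n))  # pixels labelled i
        union = actual + predicted - true_positive
        if union > 0:  # skip classes that never appear
            ious.append(true_positive / union)
    mean_iou = sum(ious) / len(ious)
    return global_accuracy, mean_iou
```

Mean IoU is the stricter of the two numbers because every class counts equally, whereas global accuracy is dominated by frequent classes such as background.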

New Datasets

New Models

Classification

Segmentation

  • Fully-Convolutional Network (FCN) with ResNet 101 backbone
  • DeepLabV3 with ResNet 101 backbone

Detection

  • Faster R-CNN R-50 FPN trained on COCO train2017 (#898) (#921)
  • Mask R-CNN R-50 FPN trained on COCO train2017 (#898) (#921)
  • Keypoint R-CNN R-50 FPN trained on COCO train2017 (#898) (#921) (#922)

Breaking changes

  • Make CocoDataset ids deterministically ordered (#868)

New Transforms

  • Add bias vector to LinearTransformation (#793) (#843) (#881)
  • Add Random Perspective transform (#781) (#879)

Bugfixes

  • Fix user warning when applying normalize (#810)
  • Fix logic error in check_integrity (#871)

Improvements

  • Fixing mutation of 2d tensors in to_pil_image (#762)
  • Replace tensor.view with tensor.unsqueeze(0) in make_grid (#765)
  • Change usage of view to reshape in resnet to enable running with mkldnn (#890)
  • Improve normalize to work with tensors located on any device (#787)
  • Raise an IndexError for FakeData.__getitem__() if the index would be out of range (#780)
  • Aspect ratio is now sampled from a logarithmic distribution in RandomResizedCrop. (#799)
  • Modernize inception v3 weight initialization code (#824)
  • Remove duplicate code from densenet load_state_dict (#827)
  • Replace endswith calls in a loop with a single endswith call in DatasetFolder (#832)
  • Added missing dot in webp image extensions (#836)
  • fix inconsistent behavior for ~ expression (#850)
  • Minor Compressions in statements in folder.py (#874)
  • Minor fix to evaluation formula of PILLOW_VERSION in transforms.functional.affine (#895)
  • added is_valid_file parameter to DatasetFolder (#867)
  • Add support for joint transformations in VisionDataset (#872)
  • Auto calculating return dimension of squeezenet forward method (#884)
  • Added progress flag to model getters (#875) (#910)
  • Add support for other normalizations (i.e., GroupNorm) in ResNet (#813)
  • Add dilation option to ResNet (#866)
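
The RandomResizedCrop item above (#799) says the aspect ratio is sampled from a logarithmic distribution. One way to read this is that the ratio is drawn uniformly in log space, so that a ratio r and its inverse 1/r are equally likely. The sketch below shows that sampling step only; the function name sample_aspect_ratio is hypothetical, and the default range of 3/4 to 4/3 is used here as an example.

```python
import math
import random


def sample_aspect_ratio(ratio=(3.0 / 4.0, 4.0 / 3.0), rng=random):
    """Draw an aspect ratio uniformly in log space between the two bounds."""
    log_low, log_high = math.log(ratio[0]), math.log(ratio[1])
    return math.exp(rng.uniform(log_low, log_high))
```

Sampling the ratio uniformly in linear space instead would favour wide crops over tall ones, since the interval from 1 to 4/3 is longer than the interval from 3/4 to 1.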

Testing

  • Add basic model testing. (#811)
  • Add test for num_class in test_model.py (#815)
  • Added test for normalize functionality in make_grid function. (#840)
  • Added downloaded directory not empty check in test_datasets_utils (#844)
  • Added test for save_image in utils (#847)
  • Added tests for check_md5 and check_integrity (#873)

Misc

  • Remove shebang in setup.py (#773)
  • configurable version and package names (#842)
  • More hub models (#851)
  • Update travis to use more recent GCC (#891)

Documentation

  • Add comments regarding downsampling layers of resnet (#794)
  • Remove unnecessary bullet point in InceptionV3 doc (#814)
  • Fix crop and resized_crop docs in functional.py (#817)
  • Added dimensions in the comments of googlenet (#788)
  • Update transform doc with random offset of padding due to pad_if_needed (#791)
  • Added the argument transform_input in docs of InceptionV3 (#789)
  • Update documentation for MNIST datasets (#778)
  • Fixed typo in normalize() function. (#823)
  • Fix typo in squeezenet (#841)
  • Fix typo in DenseNet comment (#857)
  • Typo and syntax fixes to transform docstrings (#887)