diff --git a/images/lraspp2.png b/images/lraspp2.png
new file mode 100644
index 00000000..cf4c8f1e
Binary files /dev/null and b/images/lraspp2.png differ
diff --git a/scripts/pytorch_vision_lraspp.md b/scripts/pytorch_vision_lraspp.md
new file mode 100644
index 00000000..e6756377
--- /dev/null
+++ b/scripts/pytorch_vision_lraspp.md
@@ -0,0 +1,100 @@
+---
+layout: hub_detail
+background-class: hub-background
+body-class: hub
+title: LR-ASPP
+summary: Lite Reduced Atrous Spatial Pyramid Pooling model with MobileNetV3-Large backbone.
+category: researchers
+image: lraspp2.png
+author: Pytorch Team
+tags: [vision, scriptable]
+github-link: https://github.com/pytorch/vision/blob/main/torchvision/models/segmentation/lraspp.py
+github-id: pytorch/vision
+featured_image_1: deeplab1.png
+featured_image_2: lraspp2.png
+accelerator: cuda-optional
+order: 10
+---

```python
import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'lraspp_mobilenet_v3_large', pretrained=True)
model.eval()
```

All pre-trained models expect input images normalized in the same way,
i.e. mini-batches of 3-channel RGB images of shape `(N, 3, H, W)`, where `N` is the number of images, and `H` and `W` are expected to be at least `224` pixels.
The images have to be loaded into a range of `[0, 1]` and then normalized using `mean = [0.485, 0.456, 0.406]`
and `std = [0.229, 0.224, 0.225]`.

The model returns an `OrderedDict` with a single Tensor, `output['out']`, which has the same height and width as the input Tensor but 21 channels, one per class.
`output['out']` contains the per-pixel semantic masks, so it is of shape `(N, 21, H, W)`.
(Unlike the FCN and DeepLabV3 models, LR-ASPP does not use an auxiliary classifier, so there is no `output['aux']` entry.)
More documentation can be found [here](https://pytorch.org/vision/stable/models.html#semantic-segmentation).


```python
# Download an example image from the PyTorch website
import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/images/deeplab1.png", "deeplab1.png")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)
```

```python
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
input_image = input_image.convert("RGB")
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)['out'][0]
output_predictions = output.argmax(0)
```

The output here is of shape `(21, H, W)`, and at each location there are unnormalized scores corresponding to each of the 21 classes.
To get the predicted class at each pixel, which you can then use for a downstream task, take the argmax over the class dimension: `output_predictions = output.argmax(0)`.
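The 21 channels correspond to the Pascal VOC categories (background plus 20 object classes) that torchvision's pre-trained segmentation models are trained on. As a quick sanity check (not part of the original snippet; the class list below assumes the standard Pascal VOC ordering), you can print which categories appear in the predicted mask:

```python
# Pascal VOC class names, assuming the standard ordering used by
# torchvision's pre-trained segmentation models (background + 20 classes)
voc_classes = [
    'background', 'aeroplane', 'bicycle', 'bird', 'boat', 'bottle',
    'bus', 'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
    'motorbike', 'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor',
]

# list every class present in the prediction and its pixel count
for class_idx in output_predictions.unique().tolist():
    n_pixels = (output_predictions == class_idx).sum().item()
    print(f"{voc_classes[class_idx]}: {n_pixels} pixels")
```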
Here's a small snippet that plots the predictions, with a different color assigned to each class (see the visualized image on the left).

```python
# create a color palette, selecting a color for each class
palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
colors = (colors % 255).numpy().astype("uint8")

# plot the semantic segmentation predictions of 21 classes in each color
r = Image.fromarray(output_predictions.byte().cpu().numpy()).resize(input_image.size)
r.putpalette(colors)

import matplotlib.pyplot as plt
plt.imshow(r)
# plt.show()
```

### Model Description

LR-ASPP (Lite Reduced Atrous Spatial Pyramid Pooling) is a lightweight semantic segmentation head introduced in the [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244) paper. Paired with a MobileNetV3-Large backbone, it is designed for fast segmentation in mobile and other resource-constrained settings. The pre-trained model has been trained on a subset of COCO train2017, on the 20 categories that are present in the Pascal VOC dataset.

The accuracy of the pre-trained model evaluated on the COCO val2017 dataset is listed below.

| Model structure           | Mean IOU | Global Pixelwise Accuracy |
| ------------------------- | -------- | ------------------------- |
| lraspp_mobilenet_v3_large | 57.9     | 91.2                      |

### Resources

 - [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244)
 - [PyTorch Blog on Training the LR-ASPP Model](https://pytorch.org/blog/torchvision-mobilenet-v3-implementation)
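Finally, since this page is tagged `scriptable`, the model can also be compiled with TorchScript for deployment without a Python runtime. A minimal sketch (the output filename is arbitrary):

```python
# compile the model with TorchScript and serialize it to disk;
# the resulting file can be loaded later with torch.jit.load
scripted_model = torch.jit.script(model)
scripted_model.save("lraspp_mobilenet_v3_large.pt")
```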