Fully Convolutional Networks for Semantic Segmentation
- Our key insight is to build “fully convolutional” networks.
- We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task.
- We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.
- This is the first work to train FCNs end-to-end (1) for pixelwise prediction and (2) from supervised pre-training.
- This approach does so without pre- or post-processing complications such as superpixels, proposals, or post-hoc refinement by random fields or local classifiers.
- The skip architecture takes advantage of this feature spectrum, combining deep, coarse semantic information with shallow, fine appearance information.
- Fully connected layers can also be viewed as convolutions with kernels that cover their entire input regions (see the first sketch after this list).
- Furthermore, while the resulting maps are equivalent to the evaluation of the original net on particular input patches, the computation is highly amortized over the overlapping regions of those patches.
- An FCN naturally operates on an input of any size, and produces an output of corresponding (possibly resampled) spatial dimensions.
- The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation.
- Thus upsampling is performed in-network for end-to-end learning by backpropagation from the pixelwise loss.
- Note that the deconvolution filter in such a layer need not be fixed (e.g., to bilinear upsampling) but can be learned (see the second sketch after this list).
- The 32-pixel stride at the final prediction layer limits the scale of detail in the upsampled output, so the output is dissatisfyingly coarse.
- We address this by adding skips that combine the final prediction layer with lower layers with finer strides.
- Combining fine layers and coarse layers lets the model make local predictions that respect global structure.
- We add a 1×1 convolution layer on top of pool4 to produce additional class predictions. We fuse this output with the predictions computed on top of conv7 (convolutionalized fc7) at stride 32 by adding a 2× upsampling layer and summing both predictions (see the third sketch after this list). We call this net FCN-16s.
- We continue in this fashion by fusing predictions from pool3 with a 2× upsampling of the predictions fused from pool4 and conv7, building the net FCN-8s.
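
The sketches below are illustrative only: the paper's original implementation was in Caffe, and everything here (PyTorch, the helper names, the exact layer sizes) is an assumption made for the sake of runnable examples. First, viewing a fully connected layer as a convolution: VGG-16's fc6 maps the 7×7×512 pool5 features to 4096 units, so the same weights, reshaped, define a 7×7 convolution that slides over inputs of any size.

```python
import torch
import torch.nn as nn

# VGG-16's fc6 takes the 7x7x512 pool5 map, flattened to 25088 inputs.
fc6 = nn.Linear(512 * 7 * 7, 4096)

# The same weights, reshaped, define a 7x7 convolution over 512 channels.
conv6 = nn.Conv2d(512, 4096, kernel_size=7)
with torch.no_grad():
    conv6.weight.copy_(fc6.weight.view(4096, 512, 7, 7))
    conv6.bias.copy_(fc6.bias)

# conv6 now accepts feature maps of any spatial size and produces a spatial
# grid of fc6 activations instead of a single vector.
x = torch.randn(1, 512, 16, 16)   # features from a larger-than-224 image
print(conv6(x).shape)             # torch.Size([1, 4096, 10, 10])
```

Because overlapping 7×7 windows share computation, evaluating this convolution once over a large feature map amortizes the cost that repeated patchwise evaluation of the original fc6 would incur.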
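
Second, in-network learnable upsampling: a transposed convolution whose weights are initialized to bilinear interpolation but remain trainable. The `bilinear_kernel` helper and the 21-class (PASCAL VOC) setting are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels: int, kernel_size: int) -> torch.Tensor:
    """Per-channel bilinear upsampling weights, shape (C, C, k, k)."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - torch.abs(og - center) / factor
    kernel = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel   # each class channel is upsampled independently
    return weight

num_classes = 21  # PASCAL VOC: 20 object classes plus background

# Kernel 4, stride 2, padding 1 gives exact 2x upsampling.
up2 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4,
                         stride=2, padding=1, bias=False)
with torch.no_grad():
    up2.weight.copy_(bilinear_kernel(num_classes, 4))

# Initialized to bilinear interpolation, but the weights stay learnable,
# so backpropagation from the pixelwise loss can refine them.
scores = torch.randn(1, num_classes, 16, 16)
print(up2(scores).shape)  # torch.Size([1, 21, 32, 32])
```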
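
Third, the skip fusion that builds FCN-16s and FCN-8s: 1×1 score layers over conv7, pool4, and pool3, with 2× upsampling and elementwise summation at each fusion step, then a final 8× upsampling. Channel counts follow VGG-16; `fcn8s_head` and the dummy feature-map shapes are assumptions of this sketch (the real net's padding arithmetic differs slightly).

```python
import torch
import torch.nn as nn

num_classes = 21  # PASCAL VOC setting, as above

# 1x1 "score" layers producing class predictions at each feature scale.
score_conv7 = nn.Conv2d(4096, num_classes, 1)  # conv7 (convolutionalized fc7), stride 32
score_pool4 = nn.Conv2d(512, num_classes, 1)   # pool4, stride 16
score_pool3 = nn.Conv2d(256, num_classes, 1)   # pool3, stride 8

# In the paper these upsampling layers are bilinear-initialized and learned.
up2_a = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1, bias=False)
up2_b = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1, bias=False)
up8   = nn.ConvTranspose2d(num_classes, num_classes, 16, stride=8, padding=4, bias=False)

def fcn8s_head(conv7, pool4, pool3):
    # FCN-16s step: 2x upsample conv7 scores, sum with pool4 scores.
    fuse16 = up2_a(score_conv7(conv7)) + score_pool4(pool4)
    # FCN-8s step: 2x upsample the fused scores, sum with pool3 scores.
    fuse8 = up2_b(fuse16) + score_pool3(pool3)
    # Final 8x upsampling back to input resolution.
    return up8(fuse8)

# Dummy feature maps for a 256x256 input at strides 32, 16, and 8.
conv7 = torch.randn(1, 4096, 8, 8)
pool4 = torch.randn(1, 512, 16, 16)
pool3 = torch.randn(1, 256, 32, 32)
print(fcn8s_head(conv7, pool4, pool3).shape)  # torch.Size([1, 21, 256, 256])
```

Dropping the pool3 branch and upsampling the first fused scores by 16× instead yields FCN-16s.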
- Results 1–5: qualitative segmentation result figures (images omitted).