
Fully Convolutional Networks for Semantic Segmentation


Resources

Abstract

  • Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
  • We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task.
  • We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations.

Fully Convolutional Network

Implication

  • This is the first work to train FCNs end-to-end (1) for pixelwise prediction and (2) from supervised pre-training.
  • This approach does not rely on pre- and post-processing complications such as superpixels, proposals, or post-hoc refinement.
  • We define a skip architecture to take advantage of this feature spectrum that combines deep, coarse, semantic information and shallow, fine, appearance information.

Converting fully connected layers to convolutional layers



  • Fully connected layers can also be viewed as convolutions with kernels that cover their entire input regions (see the sketch after this list).
  • Furthermore, while the resulting maps are equivalent to the evaluation of the original net on particular input patches, the computation is highly amortized over the overlapping regions of those patches.
  • An FCN naturally operates on an input of any size, and produces an output of corresponding (possibly resampled) spatial dimensions.
  • The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation.
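Below is a minimal sketch of this "convolutionalization", assuming a VGG16-style backbone and PyTorch; the class count, layer names, and feature-map sizes are illustrative choices, not the authors' code.

```python
# Sketch: converting the fully connected classifier head of a VGG16-style net
# into convolutions whose kernels cover their entire input region, so the net
# accepts inputs of any size and emits a spatial map of class scores.
import torch
import torch.nn as nn

num_classes = 21  # e.g., PASCAL VOC (20 classes + background); illustrative

# VGG16's classifier expects a 7x7x512 feature map flattened to 25088 units.
# fc6 (25088 -> 4096) becomes a 7x7 convolution over 512 channels;
# fc7 (4096 -> 4096) and the scoring layer become 1x1 convolutions.
conv6 = nn.Conv2d(512, 4096, kernel_size=7)
conv7 = nn.Conv2d(4096, 4096, kernel_size=1)
score = nn.Conv2d(4096, num_classes, kernel_size=1)

# Weights from a pretrained fc layer can be carried over by reshaping, e.g.:
# conv6.weight.data.copy_(fc6.weight.data.view(4096, 512, 7, 7))

head = nn.Sequential(conv6, nn.ReLU(inplace=True),
                     conv7, nn.ReLU(inplace=True),
                     score)

features = torch.randn(1, 512, 14, 14)   # features from a larger-than-224 input
print(head(features).shape)              # -> torch.Size([1, 21, 8, 8]), a coarse score map
```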

Upsampling is backwards strided convolution (transposed convolution)



  • Thus upsampling is performed in-network for end-to-end learning by backpropagation from the pixelwise loss.
  • Note that the deconvolution filter in such a layer need not be fixed (e.g., to bilinear upsampling) but can be learned, as in the sketch below.
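A minimal sketch of such an in-network upsampling layer, assuming PyTorch; the kernel size, stride, and bilinear-initialization helper are illustrative rather than the paper's exact configuration.

```python
# Sketch: upsampling as a transposed ("backwards strided") convolution whose
# filter is initialized to bilinear interpolation but remains a learnable
# parameter, trained end-to-end by backpropagation from the pixelwise loss.
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    """Build a (channels, channels, k, k) weight that bilinearly upsamples each channel."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size).float()
    filt = 1 - (og - center).abs() / factor
    kernel2d = filt[:, None] * filt[None, :]
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = kernel2d
    return weight

num_classes = 21  # illustrative
upscore2 = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=4,
                              stride=2, padding=1, bias=False)
upscore2.weight.data.copy_(bilinear_kernel(num_classes, 4))  # start as bilinear, then learn

coarse = torch.randn(1, num_classes, 8, 8)   # a coarse score map, e.g. from conv7
print(upscore2(coarse).shape)                # -> torch.Size([1, 21, 16, 16])
```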

Skip Connections



  • The 32-pixel stride at the final prediction layer limits the scale of detail in the upsampled output, leaving it dissatisfyingly coarse.
  • We address this by adding skips that combine the final prediction layer with lower layers with finer strides.
  • Combining fine layers and coarse layers lets the model make local predictions that respect global structure.
  • We add a 1×1 convolution layer on top of pool4 to produce additional class predictions. We fuse this output with the predictions computed on top of conv7 (convolutionalized fc7) at stride 32 by adding a 2× upsampling layer and summing both predictions. We call this net FCN-16s.
  • We continue in this fashion by fusing predictions from pool3 with a 2× upsampling of the predictions fused from pool4 and conv7, building the net FCN-8s (see the sketch after this list).
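A minimal sketch of this FCN-16s/FCN-8s skip fusion, assuming PyTorch and VGG16 channel counts (256 for pool3, 512 for pool4); the toy feature-map sizes and padding choices are illustrative, not the authors' exact network.

```python
# Sketch: score pool4 with a 1x1 convolution, 2x-upsample the conv7 predictions,
# and sum them (FCN-16s); repeat with pool3 (FCN-8s), then upsample to image size.
import torch
import torch.nn as nn

num_classes = 21  # illustrative

score_pool4 = nn.Conv2d(512, num_classes, kernel_size=1)  # pool4 has 512 channels in VGG16
score_pool3 = nn.Conv2d(256, num_classes, kernel_size=1)  # pool3 has 256 channels

up2_conv7 = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1, bias=False)
up2_fuse4 = nn.ConvTranspose2d(num_classes, num_classes, 4, stride=2, padding=1, bias=False)
up8_final = nn.ConvTranspose2d(num_classes, num_classes, 16, stride=8, padding=4, bias=False)

# Toy feature maps for a 256x256 input (strides 8, 16, 32 relative to the image):
pool3 = torch.randn(1, 256, 32, 32)
pool4 = torch.randn(1, 512, 16, 16)
conv7_scores = torch.randn(1, num_classes, 8, 8)  # stride-32 predictions from conv7

fuse4 = score_pool4(pool4) + up2_conv7(conv7_scores)  # FCN-16s fusion (stride 16)
fuse3 = score_pool3(pool3) + up2_fuse4(fuse4)         # FCN-8s fusion (stride 8)
out = up8_final(fuse3)                                # upsample back to image resolution
print(out.shape)                                      # -> torch.Size([1, 21, 256, 256])
```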

Results
