Perceptual losses like the VGG-loss were reported to be much more correlated with human perception of image similarity than standard losses like L2. Replacing the L2 loss in tasks like autoencoders or super-resolution models leads to better-looking results.
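As a rough illustration, a VGG perceptual loss can be implemented as an L2 distance between intermediate VGG activations of the two images. The layer choice and the lack of per-layer weighting below are my own assumptions for the sketch, not the exact configuration used in these experiments:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

class VGGPerceptualLoss(torch.nn.Module):
    """L2 distance between intermediate VGG16 activations (illustrative layer choice)."""
    def __init__(self, layer_ids=(3, 8, 15, 22)):
        super().__init__()
        self.features = vgg16(pretrained=True).features.eval()
        for p in self.features.parameters():
            p.requires_grad_(False)
        self.layer_ids = set(layer_ids)

    def forward(self, x, y):
        # assumes x, y are already normalized with ImageNet statistics
        loss = x.new_zeros(())
        for i, layer in enumerate(self.features):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + F.mse_loss(x, y)
        return loss
```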
This work continues the work of Dan Amir, Understanding and Simplifying Perceptual Distances, where he analyzes the VGG-loss as a kernel-MMD between the patch distributions of the input images and proposes a new non-parametric loss, 'MMD++', which is perceptually comparable to VGG while being much simpler, faster and more extensible.
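To make the patch-distribution view concrete, here is a minimal sketch of a kernel-MMD between the patch distributions of two images. It only illustrates the general idea; the exact MMD++ formulation is defined in Dan's paper, and everything below (patch size, stride, the RBF kernel and its bandwidth) is an assumption:

```python
import torch
import torch.nn.functional as F

def extract_patches(img, patch_size=5, stride=2):
    # img: (B, C, H, W) -> (B, N, C*patch_size*patch_size)
    patches = F.unfold(img, kernel_size=patch_size, stride=stride)
    return patches.transpose(1, 2)

def rbf_kernel(a, b, sigma=1.0):
    # a: (N, D), b: (M, D) -> (N, M)
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def patch_mmd(x, y, patch_size=5, stride=2, sigma=1.0):
    # kernel-MMD between the patch sets of the first image in each batch
    px = extract_patches(x, patch_size, stride)[0]
    py = extract_patches(y, patch_size, stride)[0]
    k_xx = rbf_kernel(px, px, sigma).mean()
    k_yy = rbf_kernel(py, py, sigma).mean()
    k_xy = rbf_kernel(px, py, sigma).mean()
    return k_xx + k_yy - 2 * k_xy
```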
In this repository I perform various experiments that mostly aim at comparing VGG perceptual losses to patch-distribution losses, trying to verify Dan's results and extend them to more practical, high-end tasks like image generation.
The first clue about the nature of the VGG-loss comes from the Adobe-2AFC test. Originally proposed in The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, this test is referenced in Dan's work as well. The Adobe-2AFC dataset contains patch triplets labeled by humans according to which of the first two patches is perceptually closer to the third. These annotations are used as ground truth on which perceptual losses can be tested. As opposed to the original paper, where a randomly initialized VGG works worse than a trained VGG, Dan shows a simple random-VGG variant that achieves comparable results, and so does the MMD++ loss.
2AFC sample | results table |
---|---|
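For reference, a 2AFC score for an arbitrary distance metric can be computed with the standard scoring scheme sketched below. The triplet format (`p0`, `p1`, `ref`, `human_pref`, with `human_pref` the fraction of annotators preferring `p1`) is a placeholder of mine, not the actual dataset API:

```python
def two_afc_score(dist_fn, triplets):
    """Average agreement between a distance metric and human 2AFC judgments."""
    scores = []
    for p0, p1, ref, human_pref in triplets:
        d0 = dist_fn(p0, ref)
        d1 = dist_fn(p1, ref)
        model_pref = 1.0 if d1 < d0 else 0.0  # does the metric prefer p1?
        # full credit when the metric agrees with a unanimous human vote,
        # partial credit when the human vote was split
        scores.append(model_pref * human_pref + (1 - model_pref) * (1 - human_pref))
    return sum(scores) / len(scores)
```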
This type of experiment also appears in Dan's work under the name Generalized Image Mean (GIM). The L2 mean of a set of images is a blurry, unrealistic image. Optimizing for the mean of the same images while using the VGG-loss as the metric leads to much smoother and perceptually better-looking results. This motivates a set of experiments that compare the mean images obtained by optimizing with different image losses. Here, in some cases, MMD++ shows better performance than VGG.
2 sets of 6 similar images | results with different losses |
---|---|
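A sketch of how such a generalized mean can be computed for any differentiable image loss: initialize from the L2 mean and optimize the image directly. The optimizer, step count and learning rate below are illustrative assumptions:

```python
import torch

def generalized_image_mean(images, loss_fn, steps=1000, lr=0.05):
    # images: (N, C, H, W) tensor; loss_fn can be L2, a VGG perceptual loss or a patch-MMD loss
    mean_img = images.mean(dim=0, keepdim=True).clone().requires_grad_(True)  # start from the L2 mean
    opt = torch.optim.Adam([mean_img], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(loss_fn(mean_img, img.unsqueeze(0)) for img in images) / len(images)
        loss.backward()
        opt.step()
    return mean_img.detach()
```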
Cluster datasets using k-means/one-hot autoencoders while using perceptual distance metrics.
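One possible way to set up the k-means variant is to plug the perceptual metric into the assignment step and reuse the GIM-style optimization above for the centroid update. Everything in this sketch (initialization, iteration counts) is an assumption about how such an experiment could look:

```python
import torch

def perceptual_kmeans(images, k, dist_fn, iters=10):
    # images: (N, C, H, W); start from k random images as centroids
    centroids = images[torch.randperm(len(images))[:k]].clone()
    for _ in range(iters):
        # assignment step: nearest centroid under the perceptual metric
        dists = torch.stack([
            torch.tensor([float(dist_fn(img.unsqueeze(0), c.unsqueeze(0))) for c in centroids])
            for img in images
        ])                                    # (N, k)
        labels = dists.argmin(dim=1)
        # update step: perceptual "mean" of each cluster via GIM-style optimization
        for j in range(k):
            members = images[labels == j]
            if len(members) > 0:
                centroids[j] = generalized_image_mean(members, dist_fn, steps=200)[0]
    return labels, centroids
```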
The most prominent line of work in neural style transfer surged from a series of papers by Gatys et al., Image Style Transfer Using Convolutional Neural Networks. The key concept is using the L2 distance between VGG feature maps / the Gram matrices of those feature maps in order to preserve content / style respectively. This is very much related to the concept of perceptual loss, but experiments with a random VGG network show poor results.
*content + style* | VGG pretrained | VGG random |
---|---|---|
VGG pretrained | VGG random |
---|---|
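For context, here is a minimal sketch of the Gatys-style objective: L2 between VGG feature maps for the content term and L2 between Gram matrices for the style term. The layer indices and the style weight are common illustrative defaults, not necessarily the configuration used in these experiments:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

VGG = vgg19(pretrained=True).features.eval()
for p in VGG.parameters():
    p.requires_grad_(False)

def vgg_features(x, layer_ids):
    feats = {}
    for i, layer in enumerate(VGG):
        x = layer(x)
        if i in layer_ids:
            feats[i] = x
    return feats

def gram(f):
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def style_transfer_loss(x, content_img, style_img,
                        content_layers=(21,), style_layers=(0, 5, 10, 19, 28),
                        style_weight=1e3):
    layer_ids = set(content_layers) | set(style_layers)
    fx = vgg_features(x, layer_ids)
    fc = vgg_features(content_img, layer_ids)
    fs = vgg_features(style_img, layer_ids)
    content = sum(F.mse_loss(fx[i], fc[i]) for i in content_layers)
    style = sum(F.mse_loss(gram(fx[i]), gram(fs[i])) for i in style_layers)
    return content + style_weight * style
```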
Training an autoencoder/GLO with the VGG-loss instead of L2 is known to work better. Here I show this and try to achieve comparable results with random VGGs and MMD++. Below are train-set reconstruction results of autoencoders (a DCGAN generator as the decoder and a similar encoder) trained on the 128x128 FFHQ dataset with different losses.
L2 | VGG pretrained |
---|---|
*VGG random* | MMD++ |
---|---|
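The training setup can be summarized by the sketch below: an ordinary autoencoder training loop where the reconstruction term is any of the losses above instead of pixel-wise L2. `encoder`, `decoder` and `perceptual_loss` are placeholders for the DCGAN-style modules and the loss being compared:

```python
import torch

def train_autoencoder(encoder, decoder, dataloader, perceptual_loss,
                      epochs=50, lr=1e-4, device="cuda"):
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for imgs in dataloader:          # assumes the loader yields image batches
            imgs = imgs.to(device)
            recon = decoder(encoder(imgs))
            loss = perceptual_loss(recon, imgs)   # instead of F.mse_loss(recon, imgs)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder
```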