DeepLab EDDL and PyTorch comparison #302

lauracanalini · 2021-10-29T10:49:30Z

Hi,
I'm trying to reproduce DeepLab for image segmentation with EDDL starting from this PyTorch model, but the EDDL version achieves an IoU about 0.1 lower.
I'm using EDDL v1.0.3b with CUDNN and ECVL v0.4.2.
This is the configuration of EDDL and PyTorch trainings:

BCE loss
Adam optimizer
Learning rate 7e-5
Batch size 4
Image size 512
1 gpu
Same augmentations

I tried also with the dice loss and results are similar.

This is the model used:

class DeepLab
{
    int num_classes_;
    float bn_momentum = 0.1f, bn_eps = 1e-5f;

    layer ASPPModule(layer x, int planes, int kernel_size, int padding, int dilation)
    {
        x = ReLu(BatchNormalization(Conv2D(x, planes, { kernel_size,kernel_size }, { 1,1 }, "same", false, 1, { dilation,dilation }), true, bn_momentum, bn_eps));
        return x;
    }
    layer ASPP(layer x, int output_stride)
    {
        vector<int> dilations;
        if (output_stride == 16) {
            dilations = { 1, 6, 12, 18 };
        }
        else {
            throw "Not implemented output_stride";
        }

        layer x1 = ASPPModule(x, 256, 1, true, dilations[0]);
        layer x2 = ASPPModule(x, 256, 3, true, dilations[1]);
        layer x3 = ASPPModule(x, 256, 3, true, dilations[2]);
        layer x4 = ASPPModule(x, 256, 3, true, dilations[3]);
        layer x5 = GlobalAveragePool2D(x);
        x5 = ReLu(BatchNormalization(Conv2D(x5, 256, { 1,1 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps));
        x5 = UpSampling2D(x5, { x4->getShape()[2], x4->getShape()[3] }, "bilinear");
        x = Concat({ x1,x2,x3,x4,x5 });
        x = ReLu(BatchNormalization(Conv2D(x, 256, { 1,1 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps));
        x = Dropout(x, 0.5f);
        return x;
    }

    layer Decoder(layer x, layer low_level_feat)
    {
        low_level_feat = ReLu(BatchNormalization(Conv2D(low_level_feat, 48, { 1,1 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps));
        x = UpSampling2D(x, { 4,4 }, "bilinear");
        x = Concat({ x, low_level_feat });

        x = Dropout(ReLu(BatchNormalization(Conv2D(x, 256, { 3,3 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps)), 0.5f);
        x = Dropout(ReLu(BatchNormalization(Conv2D(x, 256, { 3,3 }, { 1,1 }, "same", false), true, bn_momentum, bn_eps)), 0.1f);
        x = Conv2D(x, num_classes_, { 1,1 }, { 1,1 }, "same", false);
        return x;
    }

public:

    DeepLab(int num_classes = 1) : num_classes_{ num_classes } {}

    layer init(layer& input, int output_stride = 16)
    {
        // Import the pretrained onnx obtained from the pretrained PyTorch ResNet101
        auto resnet101 = import_net_from_onnx_file("resnet101_simpl.onnx", { input->getShape()[1], input->getShape()[2], input->getShape()[3] });
        input = getLayer(resnet101, "input"); // set input layer
        auto low_level_feat = getLayer(resnet101, "Relu_35");
        auto x = getLayer(resnet101, "Relu_341");

        x = ASPP(x, output_stride);
        x = Decoder(x, low_level_feat);
        x = UpSampling2D(x, { 4,4 }, "bilinear");
        x = Sigmoid(x);
        return x;
    }
};

To reproduce the experiments, you can replace an ECVL example with this skin_lesion_segmentation code, changing dataset_path to point to the ISIC segmentation dataset (link to the pretrained resnet101 ONNX).

Do you spot some mistakes in the model definition or in the way I use the layers of the pretrained network combined with those of DeepLab that have to be trained from scratch?


chavicoski · 2021-11-05T15:38:15Z

Hi,
I am currently reproducing the experiment. I have some questions.

Are the plots that you provided of the BCE and Dice losses from the training split? Looking at the code to reproduce the experiment, I see that the only metric tracked during validation is IoU.
The code seems correct to me, but one thing that might be causing the problem is that the "bilinear" interpolation of the UpSampling2D layer is not supported: we have only implemented the "nearest" mode. When the layer is executed, it prints a warning saying that it is going to use "nearest" instead of "bilinear".

I think there is a bug in the experiment code that you provided. In the part that computes the IoU, the variable passed to the IoU function as the target is not valid, and it raises a segfault. With this change it works.

 // Compute metric and optionally save the output images
 for (int k = 0; k < current_bs; ++k, ++n) {
     unique_ptr<Tensor> pred(output->select({ to_string(k) }));
     TensorToView(pred.get(), pred_t);
     unique_ptr<Tensor> target(y->select({ to_string(k) }));
     TensorToView(target.get(), target_t);
     // ** BEFORE **
     //cout << " - IoU: " << BinaryIoU(pred_t, orig_gt, 0.5, metric_list_iou);
     // ** AFTER **
     cout << " - IoU: " << BinaryIoU(pred_t, target_t, 0.5, metric_list_iou);
 }
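For readers unfamiliar with the metric, a thresholded binary IoU can be sketched as follows. This is a minimal Python illustration and not ECVL's BinaryIoU implementation: the function name `binary_iou`, the flat-list inputs, and the convention of returning 1.0 when both masks are empty are assumptions made for the example.

```python
def binary_iou(pred, target, threshold=0.5):
    """Intersection over union of two masks after thresholding.

    `pred` and `target` are flat sequences of floats with the same length.
    Hypothetical helper for illustration only.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    pred_mask = [value > threshold for value in pred]
    target_mask = [value > threshold for value in target]

    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    union = sum(p or t for p, t in zip(pred_mask, target_mask))

    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return intersection / union


if __name__ == "__main__":
    prediction = [0.9, 0.8, 0.2, 0.1]
    ground_truth = [1.0, 0.0, 1.0, 0.0]
    print(binary_iou(prediction, ground_truth))
```

The point of the fix above is that both arguments must be views over tensors of the same shape; comparing the prediction against a differently sized buffer is what the sketch guards against with the length check.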

lauracanalini · 2021-11-05T16:36:00Z

No, they represent the IoU computed at each epoch on the validation split. The first plot compares an EDDL and a PyTorch training process with BCE loss, the second with Dice loss. I changed the plot titles to make this clearer.
Yes, we didn't change the interpolation in the UpSampling layer, but in EDDL (correct me if I'm wrong) the "nearest" interpolation is applied automatically after the warning, so I don't think this alone could have caused such a difference. Anyway, I can launch a PyTorch training with all the interpolations set to "nearest" to see if there is any variation.
Sorry, you're right. I changed some variable names right before sending you the code and I missed that.

RParedesPalacios · 2021-11-07T17:50:26Z

We still don't support dilations

chavicoski · 2021-11-08T08:02:02Z

We support them only with CuDNN. On CPU, or on GPU without CuDNN, they are not supported.
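As context for why dilation support matters for the ASPP block in the model above: a dilated convolution spreads its taps apart, so a 3x3 kernel with dilation d covers a window of size k + (k - 1)(d - 1). The short Python sketch below computes that extent and the per-side padding needed for a "same" output at stride 1, for the dilations 6, 12 and 18 used in the model. The function names are made up for the example and are not part of EDDL.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size, dilation):
    """Per-side padding that keeps the output size at stride 1 (odd kernels)."""
    return (effective_kernel_size(kernel_size, dilation) - 1) // 2


if __name__ == "__main__":
    # Dilations of the three 3x3 ASPP branches for output_stride == 16.
    for dilation in (6, 12, 18):
        extent = effective_kernel_size(3, dilation)
        padding = same_padding(3, dilation)
        print(f"dilation={dilation}: extent={extent}, padding={padding}")
```

With dilation 1 the extent is just 3, so a backend that did not apply the dilation would give all three 3x3 branches the same small receptive field. The thread does not say what EDDL does in that case, so this is only meant to show what is at stake.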

lauracanalini · 2021-11-08T09:56:30Z

I ran a PyTorch training with all the interpolations in "nearest" mode, but it makes essentially no difference.

chavicoski · 2021-11-10T15:29:18Z

Hi,
I am doing some experiments with the code that you provided. I would like to know the difference without data augmentation. Can you provide the results with Pytorch without using data augmentation? Also, if you share the full pytorch code to reproduce the experiment that would be great.

lauracanalini · 2021-11-15T10:57:31Z

Hi,
this is all the Python code, usually launched with main.py /path/to/isic_segmentation.yml --workers 6 --gpu 1. With the augmentations removed (only Resize is applied), there is still a difference between EDDL and PyTorch, although it is a bit smaller because PyTorch gets slightly worse while EDDL stays at about the same level.

chavicoski · 2021-11-15T13:42:43Z

Hi,

I am doing some tests with the PyTorch and EDDL versions and I think that there is a problem with the data that is being fed to the model.
With EDDL I removed the data augmentation and I am using only resizing and normalization:

auto training_augs = make_shared<SequentialAugmentationContainer>(
    AugResizeDim(size, InterpolationType::cubic),
    AugToFloat32(255, 255),
    AugNormalize({0.67501814, 0.5663187, 0.52339128}, {0.11092593, 0.10669603, 0.119005}) // isic stats
);

auto validation_augs = make_shared<SequentialAugmentationContainer>(
    AugResizeDim(size, InterpolationType::cubic),
    AugToFloat32(255, 255),
    AugNormalize({0.67501814, 0.5663187, 0.52339128}, {0.11092593, 0.10669603, 0.119005}) // isic stats
);

And with Pytorch the same:

train_transform = A.Compose([
    A.Resize(args.size, args.size, cv2.INTER_CUBIC),
    A.Normalize(norm_mean, norm_std),
    ToTensorV2(),
])
valid_test_transform = A.Compose([
    A.Resize(args.size, args.size, cv2.INTER_CUBIC),
    A.Normalize(norm_mean, norm_std),
    ToTensorV2(),
])

Now I am printing the max, min and mean values of each batch loaded during training, and I get very different results with EDDL and PyTorch.
With EDDL (x is the input and y is the ground truth), for example:

x_max = 4.06464 - x_min = -6.08531 - x_mean = 0.0575973
y_max = 1 - y_min = 0 - y_mean = 0.238226

With Pytorch:

x_max = 0.015939775854349136 - x_min = -0.023863941431045532 - x_mean = 0.0009028838248923421
y_max = 1.0 - y_min = 0.0 - y_mean = 0.25823211669921875

Can you see if the same thing happens to you? Maybe the transformations that are being made to the images are not equivalent.

lauracanalini · 2021-11-15T15:24:23Z

I think you have to comment out line 58 in dataset.py, where images are divided by 255. You asked for the code without augmentations, so I also removed the Normalize, but I had to add line 58 because in PyTorch the division is done inside the Normalize (while in ECVL there are two separate augmentations).
If you comment out that line, the results are quite similar.
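The statistics printed earlier in the thread are consistent with this explanation: the PyTorch extrema are the EDDL extrema divided by 255, which is what an extra division on already-normalized data would produce. The Python check below only does that arithmetic on the numbers quoted above. It does not run either pipeline, and the two frameworks may have printed different batches, so only the extrema are compared and not the means.

```python
# Values copied from the batch statistics quoted in the thread.
eddl_stats = {"x_max": 4.06464, "x_min": -6.08531}
pytorch_stats = {
    "x_max": 0.015939775854349136,
    "x_min": -0.023863941431045532,
}


def scale_ratios(reference, scaled):
    """Ratio reference/scaled for every key the two dicts share."""
    return {key: reference[key] / scaled[key] for key in reference if key in scaled}


if __name__ == "__main__":
    for name, ratio in scale_ratios(eddl_stats, pytorch_stats).items():
        # Both ratios come out very close to 255.
        print(f"{name}: EDDL / PyTorch = {ratio:.4f}")
```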

RParedesPalacios · 2021-11-24T16:38:23Z

The BatchNorm momentum in EDDL (and in Keras) is (1 - PyTorch momentum).

Then

float bn_momentum = 0.1f, bn_eps = 1e-5f;

Should be:

float bn_momentum = 0.9f, bn_eps = 1e-5f;

CostantinoGrana · 2021-11-25T08:20:27Z

So if I understand correctly the momentum in EDDL is how much of the running average is kept, and not how much of the current batch average is used to update:

eddl/src/hardware/cpu/nn/cpu_bn.cpp

Lines 147 to 150 in e6de5aa

    
    if (momentum != 0.0) {
        global_mean[j] = momentum * global_mean[j] + (1.0 - momentum) * mean[j];
        global_variance[j] = momentum * global_variance[j] + (1.0 - momentum) * variance[j];
    }

Let's see if this fixes it.
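The two conventions can be compared side by side. The Python sketch below is a simplified model of the running-statistic update only: the EDDL-style rule mirrors the cpu_bn.cpp lines quoted above, and the PyTorch-style rule follows the update documented for torch.nn.BatchNorm2d, where momentum weights the new batch statistic. The function names and the sample batch means are invented for the example.

```python
def eddl_style_update(running, batch_stat, momentum):
    """Momentum is the fraction of the running average that is kept."""
    return momentum * running + (1.0 - momentum) * batch_stat


def pytorch_style_update(running, batch_stat, momentum):
    """Momentum is the fraction of the new batch statistic that is mixed in."""
    return (1.0 - momentum) * running + momentum * batch_stat


def track(update, momentum, batch_stats, start=0.0):
    """Apply `update` over a sequence of batch statistics."""
    running = start
    for stat in batch_stats:
        running = update(running, stat, momentum)
    return running


if __name__ == "__main__":
    batch_means = [1.0, 2.0, 3.0]  # made-up per-batch means
    print("PyTorch, momentum 0.1:", track(pytorch_style_update, 0.1, batch_means))
    print("EDDL,    momentum 0.9:", track(eddl_style_update, 0.9, batch_means))
    print("EDDL,    momentum 0.1:", track(eddl_style_update, 0.1, batch_means))
```

The first two lines print the same running mean up to floating-point rounding, while the third, which reuses the PyTorch value 0.1 unchanged in the EDDL rule, tracks the latest batch far more aggressively.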

RParedesPalacios · 2021-11-25T08:34:49Z

Yes you are right.

Alvaro just tried it; however, it seems that it doesn't fix the problem... which is strange.

lauracanalini mentioned this issue Nov 29, 2021

ONNX loader doesn't load the BatchNorm parameters epsilon and momentum? #305

Closed

salvacarrion added the bug, Hard, and help wanted labels on Dec 7, 2021
