About training-testing strategies of downstream datasets #19

Open · jsadu826 opened this issue Jul 29, 2024 · 19 comments

@jsadu826

Hello! I'm trying to reproduce the results on CC-CCII and MM-WHS.

  • For CC-CCII, what is the training-validation-testing setting? Is it a 3-fold cross-validation?
  • For MM-WHS, is the testing done by running the MATLAB script (matlab_evaluation_script.m) provided by the authors on the encrypted testing labels?

Thank you!

Luffy03 commented Jul 29, 2024

Hi, thanks for your attention to our work!

  1. For CC-CCII, you can run three-fold cross-validation and report the mean results. In my experiments, the results of the three folds are approximately the same. You can have a try (a minimal sketch of such a fold split is below).
  2. For WHS, since most of the existing works did not report their test results, for convenience you can also report 5-fold cross-validation results on the training set.
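
For illustration, a minimal sketch of such a fold split; the case IDs, fold count, and json layout are placeholders, not taken from the repository:

# Minimal sketch of a k-fold protocol; case IDs, fold count, and json layout
# are placeholders, not the repository's actual settings.
import json
from sklearn.model_selection import KFold

def make_folds(case_ids, n_folds=3, seed=42):
    """Split case IDs into n_folds train/validation partitions."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    folds = []
    for train_idx, val_idx in kf.split(case_ids):
        folds.append({
            "training": [case_ids[i] for i in train_idx],
            "validation": [case_ids[i] for i in val_idx],
        })
    return folds

# Write one json per fold, train on each, and report the mean metric over folds.
case_ids = [f"case_{i:04d}" for i in range(100)]  # placeholder IDs
for k, fold in enumerate(make_folds(case_ids, n_folds=3)):
    with open(f"fold_{k}.json", "w") as f:
        json.dump(fold, f, indent=2)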

jsadu826 commented Aug 1, 2024

For CC-CCII data pre-processing: for each scan of each patient, are all the PNGs stacked to get a 3D npy volume with shape (n_slices, n_channels, height, width)? Do we need to select only the lesion slices? When training, are the volumes resized to (n_slices, n_channels, 256, 256) and sent directly to the model, without further cropping into fixed-size ROIs (e.g. 64x64x64) as in the segmentation tasks?

Luffy03 commented Aug 1, 2024

We don't need to select the lesion slices. Instead, we input the 3D volume to train a 3D network. Please refer to "https://github.com/Luffy03/VoCo/blob/main/Finetune/CC-CCII/utils/data_utils.py" for details.
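
For illustration, a rough sketch of stacking one scan's PNG slices into a 3D volume; the directory layout, slice ordering, and 256x256 in-plane size are assumptions, and the repository's data_utils.py remains the authoritative pipeline:

# Rough sketch: stack one scan's PNG slices into a (n_slices, 256, 256) volume.
# Directory layout, slice ordering, and target size are assumptions.
import os
import numpy as np
from PIL import Image

def pngs_to_volume(scan_dir, size=(256, 256)):
    """Read all slices of one scan, resize in-plane, and stack along depth."""
    slice_files = sorted(f for f in os.listdir(scan_dir) if f.endswith(".png"))
    slices = []
    for name in slice_files:
        img = Image.open(os.path.join(scan_dir, name)).convert("L")  # grayscale
        img = img.resize(size, Image.BILINEAR)
        slices.append(np.asarray(img, dtype=np.float32))
    return np.stack(slices, axis=0)  # shape: (n_slices, 256, 256)

# volume = pngs_to_volume("CC-CCII/NCP/patient_0001/scan_0")  # placeholder path
# np.save("patient_0001_scan_0.npy", volume)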

@jsadu826

How is the BraTS 2021 dataset preprocessed for training and testing, and which modalities (t1, t1ce, t2, flair) are used?

Luffy03 commented Aug 13, 2024

Hi, we used all four modalities.

@jsadu826

So the model input has 4 channels?

Luffy03 commented Aug 13, 2024

Yes, exactly. And we don't load the first layer of our pre-trained models.
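
For illustration, a hedged sketch of stacking the four modalities into a 4-channel input; the file naming follows the standard BraTS21 convention and the repository's actual loader may differ:

# Hedged sketch: build a (4, H, W, D) input from the four BraTS21 modalities.
# The file naming convention here is an assumption.
import numpy as np
import nibabel as nib

MODALITIES = ("t1", "t1ce", "t2", "flair")

def load_brats_case(case_dir, case_id):
    """Stack t1/t1ce/t2/flair into a 4-channel array for the model input."""
    channels = []
    for mod in MODALITIES:
        path = f"{case_dir}/{case_id}_{mod}.nii.gz"
        channels.append(nib.load(path).get_fdata().astype(np.float32))
    return np.stack(channels, axis=0)  # shape: (4, H, W, D)

# x = load_brats_case("BraTS2021_Training_Data/BraTS2021_00000", "BraTS2021_00000")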

@jsadu826

Great, thanks!

jsadu826 commented Sep 5, 2024

Also, could you please share the train-valid-test json for BraTS21? Thank you~

Luffy03 commented Sep 5, 2024

Hi, the json file of BraTS21 is copied from https://drive.google.com/file/d/1i-BXYe-wZ8R9Vp3GXoajGyqaJ65Jybg1/view?usp=sharing. We will also release our implementation for BraTS21 soon.

jsadu826 commented Sep 5, 2024

So VoCo used 5-fold cross-validation for BraTS21, as in https://arxiv.org/pdf/2201.01266?

Luffy03 commented Sep 5, 2024

Yes, you can have a try. In my experiments, the results of different folds are approximately the same, since BraTS21 already has an adequate number of cases. I notice that some previous works report the results of the first fold only, so 5-fold cross-validation may not be an essential setting.

jsadu826 commented Sep 6, 2024

> Yes, exactly. And we don't load the first layer of our pre-trained models.

Does it mean not loading swinViT.patch_embed.proj.weight, encoder1.layer.conv1.conv.weight, and encoder1.layer.conv3.conv.weight?

Luffy03 commented Sep 6, 2024

Hi, you can use this code to check.

def load(model, model_dict):
    # Unwrap the checkpoint to get the raw state dict.
    if "state_dict" in model_dict.keys():
        state_dict = model_dict["state_dict"]
    elif "network_weights" in model_dict.keys():
        state_dict = model_dict["network_weights"]
    elif "net" in model_dict.keys():
        state_dict = model_dict["net"]
    else:
        state_dict = model_dict

    # Strip DistributedDataParallel prefixes.
    if "module." in list(state_dict.keys())[0]:
        print("Tag 'module.' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("module.", "")] = state_dict.pop(key)

    # Strip pre-training backbone prefixes.
    if "backbone." in list(state_dict.keys())[0]:
        print("Tag 'backbone.' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("backbone.", "")] = state_dict.pop(key)

    # Rename the Swin backbone keys to match the fine-tuning model.
    if "swin_vit" in list(state_dict.keys())[0]:
        print("Tag 'swin_vit' found in state dict - fixing!")
        for key in list(state_dict.keys()):
            state_dict[key.replace("swin_vit", "swinViT")] = state_dict.pop(key)

    # Keep a pre-trained weight only if the key exists and the shapes match;
    # otherwise fall back to the model's own (randomly initialized) weight.
    current_model_dict = model.state_dict()
    new_state_dict = {
        k: state_dict[k] if (k in state_dict.keys()) and (state_dict[k].size() == current_model_dict[k].size()) else current_model_dict[k]
        for k in current_model_dict.keys()}

    model.load_state_dict(new_state_dict, strict=True)
    print("Using VoCo pretrained backbone weights !!!!!!!")

    return model
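
For illustration, a usage sketch; the checkpoint filename and SwinUNETR arguments below are placeholder assumptions, not the repository's exact settings. Because the fine-tuning model takes 4 input channels while the pre-trained checkpoint presumably used 1, the first-layer weights fail the shape check above and fall back to the model's own initialization:

# Illustrative usage; checkpoint filename and model arguments are placeholders.
import torch
from monai.networks.nets import SwinUNETR

model = SwinUNETR(img_size=(128, 128, 128), in_channels=4, out_channels=3, feature_size=48)
checkpoint = torch.load("voco_pretrained.pt", map_location="cpu")
model = load(model, checkpoint)
# Keys whose shapes differ from the checkpoint (e.g. swinViT.patch_embed.proj.weight
# and the encoder1 convs when in_channels changes from 1 to 4) keep the model's own
# randomly initialized weights instead of the pre-trained ones.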

@jsadu826

> Hi, the json file of BraTS21 is copied from https://drive.google.com/file/d/1i-BXYe-wZ8R9Vp3GXoajGyqaJ65Jybg1/view?usp=sharing. We will also release our implementation for BraTS21 soon.

Hi, I'd like to know why the BraTS21 results obtained by training SwinUNETR from scratch are much higher in the SwinUNETR paper than in the VoCo paper, especially for enhancing tumor. These two works seem to have used the same data split.

In SwinUNETR paper:
[screenshot of the BraTS21 results table from the SwinUNETR paper]

In VoCo paper:
[screenshot of the BraTS21 results table from the VoCo paper]

Luffy03 commented Sep 25, 2024

Hi, it is a mistake in our previous implementation 😟, since we inherited the implementation of SwinUNETR's official code in the CVPR version. Our current version achieves higher performance and we will release it soon. By the way, can you reproduce the results that SwinUNETR reported? We cannot reproduce them.

jsadu826 commented Sep 25, 2024

I copied the data split from this IEEE TMI paper, which splits the 1251 BraTS21 training cases into train/valid/test = 833/209/209. The training code is based on this. The Dice scores on the testing set were approximately TC = 90, WT = 93, ET = 86 (almost the same for both fine-tuning from VoCo and training from scratch, but fine-tuning from VoCo indeed converged much faster).

Luffy03 commented Sep 25, 2024

Thanks for sharing, it is encouraging to hear that. Our reproduced result is about 91% Dice, but it is not based on this implementation. Your results also seem good. We also find that VoCo does not improve significantly on this dataset (less than 2%); maybe it is caused by the modality gap.

Luffy03 commented Oct 14, 2024

Dear researchers, our work is now available at Large-Scale-Medical, if you are still interested in this topic. Thank you very much for your attention to our work, it does encourage me a lot!
