
Problems with model training #1

Open
anothersin opened this issue Jul 28, 2021 · 3 comments

Comments

@anothersin
Thanks a lot for your contributions. We are very interested in your work, but after retraining the model (with everything set to match the parameters in the paper), we found the new model's output unacceptable: it appears to be outputting the wavelet components. It is worth mentioning that we found no problem in the output of the provided pre-trained model, which confused us. We then switched to another dataset and the issue persisted. We are sure it is not a problem with the model's input and output. We are at a loss now; can you help us?
Our training environment is PyTorch 1.7, CUDA 10.1, on a Tesla V100.
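An output that looks like raw wavelet sub-bands usually suggests the inverse wavelet transform was never applied to the network's prediction. As a minimal sanity-check sketch (plain NumPy, not the repository's code), a one-level 2D Haar decomposition and its inverse should round-trip exactly; a correctly recombined output must satisfy this:

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar transform of an even-sized array into four sub-bands."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2  # low-low (approximation)
    lh = (a - b + c - d) / 2  # horizontal detail
    hl = (a + b - c - d) / 2  # vertical detail
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: recombine the four sub-bands into the image."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x
```

If the trained model's output resembles a grid of `ll`/`lh`/`hl`/`hh` tiles rather than an image, the recombination step (the `haar_idwt2` analogue in the actual code) is likely being skipped during training or saving.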


hhb072 (Owner) commented Aug 2, 2021

I am happy you are interested in our work. May I ask when this problem occurs: during early training, or later?

I have retrained the code on the DDN dataset and do not find the problem.

@anothersin (Author)

Thank you very much for your reply. To be precise, we encountered this problem during training: we trained on a new dataset, tested checkpoints at 10, 100, and 500 epochs, and the output at each checkpoint looked like the attached image. To rule out the training data, we trained directly on Rain100L, with similar results. To rule out the test code, we tested with the model you provided, and those results were normal.
The only change we made to the code was in the DataLoader, and we double-checked that the data was not loaded incorrectly.
The specific change is as follows: we removed 'args.trainfiles' and instead traverse the input folder to get the input image names.

In `main.py`, in the dataset-loading part:

```python
# opt.trainroot = './Deraining/Datasets/train/input/'
trainfiles = os.listdir(opt.trainroot)
# train_list = readlinesFromFile(trainfiles)  # original: read names from args.trainfiles
train_list = trainfiles
assert len(train_list) > 0

train_set = ImageDatasetFromFile(train_list, opt.trainroot, crop_height=opt.output_height,
                                 output_height=opt.output_height, is_random_crop=True, is_mirror=True,
                                 normalize=None)
```

In `dataset.py`, in the `load_image` function:

```python
imgR = Image.open(file_path)
# pair each rainy input with the clean target of the same basename
image_L_path = os.path.join('./Deraining/Datasets/train/target', os.path.basename(file_path))
imgL = Image.open(image_L_path)
```
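One pitfall with `os.listdir` is that it returns entries in arbitrary order, may include non-image files, and gives no guarantee that every input has a matching target. A hypothetical sanity check (not part of the repository; `build_pairs` is an illustrative helper) that builds explicit (input, target) pairs before training could look like this:

```python
import os

def build_pairs(input_dir, target_dir):
    """Pair each input image with the target of the same basename,
    failing loudly if a target is missing."""
    names = sorted(os.listdir(input_dir))  # sort for deterministic order
    pairs = []
    for name in names:
        target = os.path.join(target_dir, name)
        if not os.path.isfile(target):
            raise FileNotFoundError(f"no target image for {name}")
        pairs.append((os.path.join(input_dir, name), target))
    return pairs
```

Running this once on the train folders would quickly confirm whether the rainy/clean pairs line up, independently of the DataLoader change.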


ghost commented Sep 24, 2021


I also encountered the same problem. Have you solved it?
