input/output size for inference / transfer #24

Open
materialvision opened this issue Feb 24, 2024 · 5 comments
@materialvision

Hi, in the "original" PyTorch CycleGAN it is possible to train on larger images (e.g. 2048) while cutting them into square crops of 256 or 512 pixels, for example with the arguments --load_size 2048 --crop_size 256, as described here: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/docs/tips.md#trainingtesting-with-high-res-images

When using such a model I can run inference on large images even though it was trained on 256. Would something like that be possible, in theory, with uvcgan2? Any pointers on how to modify it for this? Being able to use the model on larger images in the end is very useful.

@usert5432 (Collaborator) commented Feb 26, 2024

Hi @materialvision,

Thank you for your interest in our work.

The uvcgan2 data handling is controlled through the transform_test and transform_train parameters of the training configuration.

For example, the code below, from scripts/celeba/train_celeba_male2female_translation.py, demonstrates a configuration that:
a. During training: resizes images so that the smallest side is 256 pixels, then takes a random crop of 256x256 pixels.

```python
'transform_train' : [
    'random-flip-horizontal',
    { 'name' : 'resize',      'size' : 256, },
    { 'name' : 'random-crop', 'size' : 256, },
],
```

b. During inference: resizes images so that the smallest side is 256 pixels, then takes a center crop of 256x256 pixels.

```python
'transform_test' : [
    { 'name' : 'resize',      'size' : 256, },
    { 'name' : 'center-crop', 'size' : 256, },
],
```

These configuration options allow uvcgan2 to handle input images of any size.
If more complicated data transformations are required, a separate data loader can be created that implements them manually; a rough torchvision-based sketch is given below.
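For illustration, the named transforms above correspond roughly to a torchvision pipeline like the following (a minimal sketch; the actual uvcgan2 loader internals may differ):

```python
# Rough torchvision equivalent of the transform_train / transform_test
# configurations above; this mapping is an approximation.
from torchvision import transforms

transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),   # 'random-flip-horizontal'
    transforms.Resize(256),              # smallest side -> 256 px, aspect ratio kept
    transforms.RandomCrop(256),          # random 256x256 crop
])

transform_test = transforms.Compose([
    transforms.Resize(256),              # smallest side -> 256 px
    transforms.CenterCrop(256),          # central 256x256 crop
])
```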

Please let me know if I should elaborate more on these points.

@usert5432 usert5432 self-assigned this Feb 26, 2024
@usert5432 usert5432 added the question Further information is requested label Feb 26, 2024
@materialvision (Author) commented Feb 27, 2024

Thanks for your answer and great work. Just to make things clearer for myself: I have tried to train a model with the following config:
```python
'shape' : (3, 512, 512),
'transform_train' : [
    'random-flip-horizontal',
    { 'name' : 'resize',      'size' : 2048, },
    { 'name' : 'random-crop', 'size' : 512,  },
],
'transform_test' : [
    { 'name' : 'resize',      'size' : 2048, },
    { 'name' : 'center-crop', 'size' : 2048, },
],
} for domain in [ 'A', 'B' ]
```

But when testing inference on images of size 2048x2048, I get the following error:

```
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1024 but got size 16384 for tensor number 1 in the list.
```

Did I miss something here? Maybe it was wrong to adjust the "shape" argument?

Thanks again for your help.

@usert5432 (Collaborator)

I think that is an expected outcome.

> Maybe it was wrong to adjust the "shape" argument?

No, I think this is correct. The shape argument needs to match the crop size. If you intend to train the network on crops of size (512, 512), then the shape argument is correct.

The problem happens because the network was trained on random crops of size 512 (`{ 'name' : 'random-crop', 'size' : 512, }`), but the test crops (`{ 'name' : 'center-crop', 'size' : 2048, }`) are of size 2048, so inference fails. To fix this, the transformations need to be adjusted a bit. The precise configuration depends on the exact use case; without knowing the details, I can only recommend setting all the size parameters to 512, e.g. as sketched below.
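For concreteness, a minimal sketch of that configuration with every size parameter set to 512 (whether resizing the smallest side all the way down to 512 is acceptable depends on your images):

```python
# Every size parameter matches the 512x512 training crop.
'shape' : (3, 512, 512),
'transform_train' : [
    'random-flip-horizontal',
    { 'name' : 'resize',      'size' : 512, },
    { 'name' : 'random-crop', 'size' : 512, },
],
'transform_test' : [
    { 'name' : 'resize',      'size' : 512, },
    { 'name' : 'center-crop', 'size' : 512, },
],
```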

@materialvision (Author)

Thank you. Yes, changing the center-crop in the test config to 512 fixes the error. But to explain the use case: the goal was to train on 512px images (or crops) to keep the GPU load down and train faster, but to run inference on larger 2048px images (not downsized or cropped, keeping the full quality). My test project is a "de-blur / deconvolution" of images, so the model needs to work at larger resolutions.

Are there some "adjustments to the transformations", as you mention, that would achieve this?

Thanks again for guidance and advice.

@usert5432 (Collaborator)

> but infer on larger 2048 images (not sized down or cropped but keeping the full quality).

Oh, I see now. Unfortunately, this is not possible with UVCGAN. CycleGAN uses an FCN-type (fully convolutional) generator, which can transparently work with images of any size. The UVCGAN generator is not fully convolutional, so one cannot easily train it on crops but run inference on full images.
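A generic workaround (not specific to uvcgan2, and with the usual caveats of visible seams at tile borders and a receptive field limited to each tile) is to run the crop-trained generator over 512x512 tiles of the large image and stitch the results back together. A minimal sketch, assuming a `generator` that maps (N, C, 512, 512) tensors to outputs of the same shape:

```python
# Naive (non-overlapping) tiled inference; seam artifacts at tile
# borders are a known drawback of this approach.
import torch

def translate_tiled(generator, image, tile = 512):
    # image : (C, H, W) tensor with H and W divisible by `tile`
    _, h, w = image.shape
    result = torch.empty_like(image)

    with torch.no_grad():
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                patch = image[:, y : y + tile, x : x + tile].unsqueeze(0)
                result[:, y : y + tile, x : x + tile] = generator(patch).squeeze(0)

    return result
```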
