input/output size for inference / transfer #24

Open
materialvision opened this issue Feb 24, 2024 · 5 comments
@materialvision

Hi, in the "original" PyTorch CycleGAN it is possible to train on larger images (e.g. 2048) while cutting them into square crops of 256 or 512 pixels, for example with the arguments --load_size 2048 --crop_size 256, as described here: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/docs/tips.md#trainingtesting-with-high-res-images

When using such a model I can run inference on large images even though it was trained on 256. Would something like that be possible, in theory, with uvcgan2? Any pointers on how to modify it for this? Being able to use the model on larger images in the end is very useful.

@usert5432 (Collaborator) commented Feb 26, 2024

Hi @materialvision,

Thank you for your interest in our work.

The uvcgan2 data handling is controlled through the transform_test and transform_train parameters of the training configuration.

For example, the code below, from scripts/celeba/train_celeba_male2female_translation.py, demonstrates a configuration that:
a. During training: resizes images so that the smallest side is 256 pixels, then takes a random crop of 256x256 pixels.

```python
'transform_train' : [
    'random-flip-horizontal',
    { 'name' : 'resize',      'size' : 256, },
    { 'name' : 'random-crop', 'size' : 256, },
],
```

b. During inference: resizes images so that the smallest side is 256 pixels, then takes a center crop of 256x256 pixels.

```python
'transform_test' : [
    { 'name' : 'resize',      'size' : 256, },
    { 'name' : 'center-crop', 'size' : 256, },
],
```

These configuration options allow uvcgan2 to handle input images of any size.
If more complicated data transformations are required, a separate data loader can be created that implements them manually; a rough torchvision-based sketch is given below.
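For illustration, the named transforms above correspond roughly to a torchvision pipeline like the following (a minimal sketch; the actual uvcgan2 loader internals may differ):

```python
# Rough torchvision equivalent of the transform_train / transform_test
# configurations above; this mapping is an approximation.
from torchvision import transforms

transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),   # 'random-flip-horizontal'
    transforms.Resize(256),              # smallest side -> 256 px, aspect ratio kept
    transforms.RandomCrop(256),          # random 256x256 crop
])

transform_test = transforms.Compose([
    transforms.Resize(256),              # smallest side -> 256 px
    transforms.CenterCrop(256),          # central 256x256 crop
])
```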

Please let me know if I should elaborate more on these points.

@usert5432 usert5432 self-assigned this Feb 26, 2024
@usert5432 usert5432 added the question Further information is requested label Feb 26, 2024
@materialvision (Author) commented Feb 27, 2024

Thanks for your answer and great work. Just to make things clearer for myself: I have tried to train a model with the following config:
```python
'shape' : (3, 512, 512),
'transform_train' : [
    'random-flip-horizontal',
    { 'name' : 'resize',      'size' : 2048, },
    { 'name' : 'random-crop', 'size' : 512,  },
],
'transform_test' : [
    { 'name' : 'resize',      'size' : 2048, },
    { 'name' : 'center-crop', 'size' : 2048, },
],
} for domain in [ 'A', 'B' ]
```

But when testing inference on images of size 2048x2048, I get the following error:

```
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1024 but got size 16384 for tensor number 1 in the list.
```

Did I miss something here? Maybe it was wrong to adjust the "shape" argument?

Thanks again for your help.

@usert5432 (Collaborator)

I think that is an expected outcome.

> Maybe it was wrong to adjust the "shape" argument?

No, I think this is correct. The shape argument needs to match the crop size. If you intend to train the network on crops of size (512, 512), then the shape argument is correct.

The problem happens because the network was trained on random crops of size 512 (`{ 'name' : 'random-crop', 'size' : 512, }`), but the test crops (`{ 'name' : 'center-crop', 'size' : 2048, }`) are of size 2048, so inference fails. To fix this, the transformations need to be adjusted a bit. The precise configuration depends on the exact use case; without knowing the details, I can only recommend setting all the size parameters to 512, e.g. as sketched below.
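For concreteness, a minimal sketch of that configuration with every size parameter set to 512 (whether resizing the smallest side all the way down to 512 is acceptable depends on your images):

```python
# Every size parameter matches the 512x512 training crop.
'shape' : (3, 512, 512),
'transform_train' : [
    'random-flip-horizontal',
    { 'name' : 'resize',      'size' : 512, },
    { 'name' : 'random-crop', 'size' : 512, },
],
'transform_test' : [
    { 'name' : 'resize',      'size' : 512, },
    { 'name' : 'center-crop', 'size' : 512, },
],
```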

@materialvision (Author)

Thank you. Yes, changing the center-crop in the test config to 512 fixes the error. But to explain the use case: the goal was to train on 512px images (or crops) to keep the GPU load down and train faster, but to run inference on larger 2048px images (not downsized or cropped, keeping the full quality). My test project is a "de-blur / deconvolution" of images, so the model needs to work at larger resolutions.

Are there some "adjustments to the transformations", as you mention, that would achieve this?

Thanks again for guidance and advice.

@usert5432 (Collaborator)

> but infer on larger 2048 images (not sized down or cropped but keeping the full quality).

Oh, I see now. Unfortunately, this is not possible with UVCGAN. CycleGAN uses an FCN-type (fully convolutional) generator, which can transparently work with images of any size. The UVCGAN generator is not fully convolutional, so one cannot easily train it on crops but run inference on full images.
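A generic workaround (not specific to uvcgan2, and with the usual caveats of visible seams at tile borders and a receptive field limited to each tile) is to run the crop-trained generator over 512x512 tiles of the large image and stitch the results back together. A minimal sketch, assuming a `generator` that maps (N, C, 512, 512) tensors to outputs of the same shape:

```python
# Naive (non-overlapping) tiled inference; seam artifacts at tile
# borders are a known drawback of this approach.
import torch

def translate_tiled(generator, image, tile = 512):
    # image : (C, H, W) tensor with H and W divisible by `tile`
    _, h, w = image.shape
    result = torch.empty_like(image)

    with torch.no_grad():
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                patch = image[:, y : y + tile, x : x + tile].unsqueeze(0)
                result[:, y : y + tile, x : x + tile] = generator(patch).squeeze(0)

    return result
```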
