I have difficulty understanding the parameters of affine_grid.
The corresponding lines are 233 and 234 in train.py.
As per my understanding, the following things happen in the function process_boxes, to which those lines belong:
1. The localization part of the network has already predicted all the bounding boxes (BBs) of the scene text.
2. While iterating over the predicted BBs, an STN (Spatial Transformer Network) is used to crop each individual text word from the full image.
3. The cropped images are passed through the OCR branch.
4. The OCR loss is backpropagated.
The affine_grid call, which is part of the STN, takes the parameter theta (line 233 in train.py).
This theta is a 2×3 matrix whose last column is the center coordinate of the predicted crop; the first two columns perform transformations such as rotation, scaling, etc.
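To make sure I am reading this correctly, here is a minimal sketch (not the repository's actual code; the tensor names and sizes are placeholders, not the ones in train.py) of how I understand a 2×3 theta being consumed by torch.nn.functional.affine_grid and grid_sample to sample a crop:

```python
import torch
import torch.nn.functional as F

# Placeholder image: batch of 1, 3 channels, H x W (not the sizes in train.py)
image = torch.randn(1, 3, 256, 512)

# One 2x3 affine matrix per sample: the 2x2 block is the linear part
# (rotation / scale / shear) and the last column is the translation,
# all in affine_grid's normalized [-1, 1] coordinate system.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])  # identity -> samples the full image

# Desired output size of the sampled crop: (N, C, out_h, out_w)
out_size = (1, 3, 64, 256)

grid = F.affine_grid(theta, out_size, align_corners=False)
crop = F.grid_sample(image, grid, align_corners=False)
print(crop.shape)  # torch.Size([1, 3, 64, 256])
```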
When this part is used, I find that the cropped image gets distorted by the affine_grid transformation, which may affect the OCR output.
What I want is just the cropped text image, without any additional transformation from the STN (affine_grid). I have tried the following values for the theta matrix:
[ 1 0 predX
0 1 predY ]
where predX and predY are the centres of the predicted bounding boxes.
Even after applying this, the crops are sometimes unrecognizable or look significantly different from the original box regions.
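For reference, this is roughly how that attempt looks in code (a sketch only; predX, predY, the image size, and the crop size are illustrative stand-ins for the variables in process_boxes). As far as I understand affine_grid, with an identity 2×2 block the sampling grid still spans the full image extent, only shifted by (predX, predY), which might explain why the results do not look like tight crops:

```python
import torch
import torch.nn.functional as F

# Box centre in affine_grid's normalized [-1, 1] coordinates
# (illustrative values, not taken from train.py)
predX, predY = 0.25, -0.10

# The theta I tried: identity linear part plus translation to the box centre
theta = torch.tensor([[[1.0, 0.0, predX],
                       [0.0, 1.0, predY]]])

image = torch.randn(1, 3, 256, 512)   # placeholder input image
out_size = (1, 3, 64, 256)            # fixed crop size fed to the OCR branch

grid = F.affine_grid(theta, out_size, align_corners=False)
crop = F.grid_sample(image, grid, align_corners=False)
# Because the 2x2 block is the identity, the grid covers the whole image
# (shifted by predX, predY and padded with zeros outside the borders),
# so the output is a resized view of the full image rather than a tight
# crop of the predicted box.
```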
So, in short, can you suggest theta parameters such that the STN only crops the bounding box predicted by the network, without applying any other transformation?