How did you happen to use the data from SynthText #72

Open
mohammedayub44 opened this issue Apr 8, 2021 · 6 comments

Comments

@mohammedayub44

Hi,

Sorry for the naïve question. I have downloaded the synthetic images (~38GB), depth maps (15GB), segmentation maps (7GB), and raw images (~9GB) from the SynthText repo. I'm wondering how you converted all of these into a format accepted by your network (which, from the readme, seems to be YOLO or ICDAR format).

I'm aiming to run some trials for English only using your E2E network.

Thanks in advance !

@MichalBusta
Owner

Hi Mohammed,
you may find some scripts here:

@mohammedayub44
Author

Thanks I'll check it out and let you know.

@mohammedayub44
Author

mohammedayub44 commented Apr 13, 2021

Using some inspiration from your conversion scripts and SynthText, I managed to create a train folder (~35GB) that contains gt_image_name.txt and image_name.jpg, as suggested by your train readme file for ICDAR format. I'm also planning to create a crop folder which contains all crops (JPGs) and a txt file linking them to the words.
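
For readers attempting the same conversion, here is a minimal sketch of turning the word boxes of one SynthText image into ICDAR-style ground-truth lines. It assumes a `wordBB` layout of 2 x 4 x N (x and y rows, four corners, N words) and that each `txt` entry may hold several whitespace-separated words. The function names `split_words` and `to_icdar_lines` are hypothetical, and the exact line format the training readme expects (for example an extra script field) should be checked against the repository. This is not the owner's conversion script.

```python
from typing import List, Sequence


def split_words(txt_entries: Sequence[str]) -> List[str]:
    """Flatten text instances into individual words.

    Assumption: one entry can hold several words separated by spaces
    or newlines, and the word boxes follow the same order.
    """
    words: List[str] = []
    for entry in txt_entries:
        words.extend(entry.split())
    return words


def to_icdar_lines(word_bb, txt_entries: Sequence[str]) -> List[str]:
    """Build one 'x1,y1,x2,y2,x3,y3,x4,y4,word' line per word box.

    word_bb is indexed as word_bb[axis][corner][word], with axis 0 = x
    and axis 1 = y (the assumed 2 x 4 x N layout).
    """
    words = split_words(txt_entries)
    n_boxes = len(word_bb[0][0])
    if n_boxes != len(words):
        raise ValueError(f"{n_boxes} boxes but {len(words)} words")

    lines = []
    for k, word in enumerate(words):
        coords = []
        for corner in range(4):
            coords.append(str(int(round(word_bb[0][corner][k]))))  # x
            coords.append(str(int(round(word_bb[1][corner][k]))))  # y
        lines.append(",".join(coords + [word]))
    return lines
```

Each returned line would go into the gt_image_name.txt file that sits next to image_name.jpg. When an image contains a single word, the box array loaded from gt.mat may come back as 2 x 4 and would need a trailing axis added before calling the function.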

I'm slightly confused about how to start the training with the correct folder locations:

  1. For -train_list
    a) Should I also create another file like sample_train_data/MLT/trainMLT.txt which lists all image locations?
    b) I'm guessing the done folder gets populated during training, so I don't have to create and populate it prior to training?
    c) Can I skip a) and b) and just give my train folder location?

  2. For -ocr_feed_list: this is simple, I can directly give the gt.txt that I create from my crop folder (no doubts here).

  3. For -model: if I skip giving this parameter, my guess is it trains a model from scratch rather than fine-tuning an already trained one?

Sorry for the basic questions.

Thanks in advance !

@MichalBusta
Owner

> Using some inspiration from your conversion scripts and SynthText, I managed to create a train folder (~35GB) that contains gt_image_name.txt and image_name.jpg, as suggested by your train readme file for ICDAR format. I'm also planning to create a crop folder which contains all crops (JPGs) and a txt file linking them to the words.
>
> I'm slightly confused about how to start the training with the correct folder locations:
>
> 1. For -train_list
>    a) Should I also create another file like sample_train_data/MLT/trainMLT.txt which lists all image locations?

Yes. I would recommend at least reading the data feeding script: it is simple Python, and errors are usually caused by wrong data feeding.

>    b) I'm guessing the done folder gets populated during training, so I don't have to create and populate it prior to training?

The done folder is just a folder; you can ignore it.

>    c) Can I skip a) and b) and just give my train folder location?

No, you have to provide a list. If your data are clean, you can dump it with one command, something like: ls -R *.png >> list.txt

> 2. For -ocr_feed_list: this is simple, I can directly give the gt.txt that I create from my crop folder (no doubts here).
> 3. For -model: if I skip giving this parameter, my guess is it trains a model from scratch rather than fine-tuning an already trained one?

Yes.

> Sorry for the basic questions.
> Thanks in advance!

You are welcome.
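
As a cross-platform alternative to the `ls` one-liner mentioned above, the list file could be produced with a short Python sketch like the following. The function name `write_image_list` and the default extensions are hypothetical, and writing one absolute path per line is an assumption; how the feeder resolves each line is worth confirming in the data feeding script.

```python
import os
from typing import List, Tuple


def write_image_list(
    root: str,
    list_path: str,
    exts: Tuple[str, ...] = (".jpg", ".png"),
) -> List[str]:
    """Walk `root` recursively and write one absolute image path per line.

    Files are matched by extension (case-insensitive), so the gt_*.txt
    files stored next to the images are skipped.
    """
    paths: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(exts):
                paths.append(os.path.abspath(os.path.join(dirpath, name)))
    paths.sort()  # deterministic order makes runs easier to compare

    with open(list_path, "w", encoding="utf-8") as fh:
        for path in paths:
            fh.write(path + "\n")
    return paths
```

A call such as `write_image_list("train", "train_list.txt")` would then give a file that can be passed as -train_list, provided the path convention matches what the feeding script expects.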

@mohammedayub44
Author

Thanks. I'll check and let you know.

@mohammedayub44
Author

mohammedayub44 commented Apr 26, 2021

@MichalBusta I got the training to work correctly as per your suggestion. The model seems to be doing okay, but not great.
I'm wondering whether this is because of the input_size parameter.
I see that in data_gen.py the input images are cropped and rescaled to input_size, which is 512 by default.

...
   resize_h = input_size
   resize_w = input_size
...
  scaled = cut_image(im,  (resize_w, resize_w), text_polys)

However, all my input images are 450x600. Do I have to resize all my synthetic images to one particular height and width before starting to train? I was hoping not.

I can share the metrics and results with you to be specific.

Thanks !
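
The question above is left open in the thread. Purely as an illustration of what resizing would involve if it were done by hand, the sketch below computes an aspect-preserving scale for a square input_size and applies the same scale to the text polygons, since the annotations have to move with the pixels. It is not the repository's `cut_image`; the helper names `fit_to_square` and `scale_polys` are hypothetical, and the example assumes the images are 600 pixels wide and 450 high.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def fit_to_square(width: int, height: int, input_size: int = 512):
    """Return (scale, new_w, new_h) so the longer side equals input_size."""
    scale = input_size / max(width, height)
    new_w = int(round(width * scale))
    new_h = int(round(height * scale))
    return scale, new_w, new_h


def scale_polys(polys: List[Polygon], scale: float) -> List[Polygon]:
    """Scale every polygon vertex by the same factor as the image."""
    return [[(x * scale, y * scale) for x, y in poly] for poly in polys]


# Assumed 600 x 450 (width x height) image with one word box.
scale, new_w, new_h = fit_to_square(600, 450, input_size=512)
polys = scale_polys([[(0, 0), (600, 0), (600, 450), (0, 450)]], scale)
```

With these assumed dimensions the image would become 512 x 384, and the remaining rows would have to be padded or cropped to reach a square input.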
