This repository has been archived by the owner on Sep 1, 2021. It is now read-only.

Question about images.csv #4

Open
qimingfeijin opened this issue Feb 18, 2019 · 5 comments

Comments

@qimingfeijin

Running download_images.py fails with the error No such file or directory: 'images.csv'. How can I fix this?

@lars76
Owner

lars76 commented Feb 19, 2019

Hi,

Instead of download_images.py, just use the COCO dataset. It is much smaller, and for OCR you don't actually need that many images. You can download 5K images directly here: http://images.cocodataset.org/zips/val2017.zip. Then you don't need download_images.py at all.
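A minimal sketch of fetching and unpacking that archive with the Python standard library, assuming the images only serve as backgrounds and that a local `data` directory is an acceptable destination (the function name `fetch_background_images` and the default `dest` are hypothetical, not part of this repository):

```python
from __future__ import annotations

import shutil
import urllib.request
import zipfile
from pathlib import Path

# URL from the comment above: the COCO 2017 validation split (about 5K images).
COCO_VAL2017_URL = "http://images.cocodataset.org/zips/val2017.zip"


def fetch_background_images(url: str = COCO_VAL2017_URL, dest: str = "data") -> list[Path]:
    """Download the zip (if not already present), extract it and return the
    sorted .jpg paths. Hypothetical helper, not part of the original project."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / url.rsplit("/", 1)[-1]  # e.g. data/val2017.zip

    # Skip the (large) download when the archive is already on disk.
    if not archive.exists():
        with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
            shutil.copyfileobj(response, out)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)  # creates dest_dir/val2017/*.jpg

    return sorted(dest_dir.rglob("*.jpg"))
```

Calling `fetch_background_images()` with no arguments downloads the full archive; because `urlopen` also accepts `file://` URLs, the same function works on a zip that was downloaded manually.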

Hope this helps.

@qimingfeijin
Author

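Thank you for your help and for sharing. I want to do Chinese text detection and need some images containing Chinese text for training and testing. Where did you download your Chinese dataset?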

@lars76
Owner

lars76 commented Feb 20, 2019

I generated the dataset myself from a subtitle (SRT) file and then annotated it manually. I don't think there are any such datasets available for download.
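The comment does not describe the pipeline in detail. One plausible reading is that the SRT cues indicate which video frames contain text and supply draft transcriptions to correct during manual annotation. A minimal, hypothetical sketch of that first step (parsing SRT cues and choosing one frame index per cue, assuming a constant frame rate) is shown below; it is an illustration, not the owner's actual method:

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# SRT timing line, e.g. "00:00:01,000 --> 00:00:03,500" ("." tolerated too).
TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # subtitle lines joined with "\n"


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Split an SRT file into cues (blocks separated by blank lines)."""
    cues = []
    for block in re.split(r"\r?\n\s*\r?\n", content.lstrip("\ufeff").strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIME_RE.search(line)
            if m:
                g = m.groups()
                text = "\n".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def midpoint_frames(cues: list[Cue], fps: float) -> list[int]:
    """Hypothetical choice: grab the frame in the middle of each cue."""
    return [int((c.start + c.end) / 2 * fps) for c in cues]
```

The resulting frame indices could then be extracted with any video tool, and each cue's text used as a starting point for the manual labels.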

Most papers actually generate their own training/test images by rendering random text onto background images. Look at this GitHub project: https://github.com/JarveeLee/SynthText_Chinese_version. The corresponding paper is described here: https://blog.csdn.net/u010167269/article/details/52389676. I tried something similar myself, and it produced results equal to or better than those from a real dataset.
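As a rough illustration of the "random text on images" idea, here is a standard-library sketch of the layout-and-label half: sampling random Chinese strings, placing them at non-overlapping positions and emitting box annotations. It is much simpler than SynthText (no depth, segmentation or perspective), and several things are assumptions: the tiny `CHARSET`, the square-glyph size estimate, and the ICDAR-2015-style output line, whose format may not match what this repository expects. `sample_layout` and `to_annotation` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sample charset; a real generator would use a large list of
# common Chinese characters.
CHARSET = "的一是不了人我在有他这中大来上国个到说们为子和你地出道也时年得就那要下以会可过天去能对生"


@dataclass
class TextBox:
    text: str
    x: int  # top-left corner, pixels
    y: int
    w: int
    h: int


def overlaps(a: TextBox, b: TextBox) -> bool:
    """Axis-aligned rectangle intersection test."""
    return not (a.x + a.w <= b.x or b.x + b.w <= a.x
                or a.y + a.h <= b.y or b.y + b.h <= a.y)


def sample_layout(img_w: int, img_h: int, charset: str, n_lines: int = 5,
                  font_size: int = 32, max_len: int = 10, max_tries: int = 100,
                  rng: Optional[random.Random] = None) -> List[TextBox]:
    """Place up to n_lines random horizontal text lines without overlap."""
    rng = rng or random.Random()
    boxes: List[TextBox] = []
    for _ in range(n_lines):
        for _ in range(max_tries):
            text = "".join(rng.choice(charset) for _ in range(rng.randint(1, max_len)))
            # Assumption: CJK glyphs are roughly square, so a line is about
            # len(text) * font_size wide and font_size tall.
            w, h = len(text) * font_size, font_size
            if w > img_w or h > img_h:
                continue
            cand = TextBox(text, rng.randint(0, img_w - w), rng.randint(0, img_h - h), w, h)
            if not any(overlaps(cand, b) for b in boxes):
                boxes.append(cand)
                break  # placed; move on to the next line
    return boxes


def to_annotation(box: TextBox) -> str:
    """Four corners clockwise from top-left, then the transcription."""
    x1, y1, x2, y2 = box.x, box.y, box.x + box.w, box.y + box.h
    return f"{x1},{y1},{x2},{y1},{x2},{y2},{x1},{y2},{box.text}"
```

To produce an actual image, each box would then be drawn onto a background (for example a COCO photo) with an imaging library, e.g. Pillow's `ImageDraw.Draw(img).text((box.x, box.y), box.text, font=ImageFont.truetype(cjk_font_path, 32))`, where `cjk_font_path` is a hypothetical path to a font with Chinese glyphs.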

@qimingfeijin
Author

I see. Thanks for sharing.

@wushilian

@lars76 Can you share your method for synthesizing the dataset?
