Please do NOT use this project for text censorship!
---
Major Changes:
- Based on RapidOCR, integrate the latest PP-OCRv4 OCR models, providing more model options
- Add support for PP-OCRv4 recognition models, including standard and server versions
- Modify the implementation of reading files to support Chinese paths on Windows
- Fix bug: When using multiple processes, the transform_func cannot be serialized
- Fix bug: Compatible with albumentations=1.4.*
Major Changes:
- All models have been retrained, offering higher accuracy than the previous version.
- Models are categorized into several types based on usage scenarios (see Pre-trained Recognition Models):
  - scene: For scene images, suitable for recognizing text in general photography. Models in this category start with scene-, such as the scene-densenet_lite_136-gru model.
  - doc: For document images, suitable for recognizing text in regular document screenshots, like scanned book pages. Models in this category start with doc-, such as the doc-densenet_lite_136-gru model.
  - number: Specifically for recognizing only numbers (able to recognize only the ten digits 0~9), suitable for scenarios like bank card numbers, ID numbers, etc. Models in this category start with number-, such as the number-densenet_lite_136-gru model.
  - general: For general scenarios, suitable for images without a clear preference. Models in this category do not have a specific prefix and maintain the same naming convention as older versions, such as the densenet_lite_136-gru model.

  Note ⚠️: The above descriptions are for reference only. It is recommended to choose models based on actual performance.
- Two larger series of models have been added:
  - *-densenet_lite_246-gru_base: Initially available for Planet of Knowledge CnOCR/CnSTD Private Group members, to be open-sourced for free after one month.
  - *-densenet_lite_666-gru_large: Pro models, available for use after purchase. Purchase link can be found at https://ocr.lemonsqueezy.com.
For more details, please refer to: CnOCR V2.3 New Release: Better, More, and Larger Models | Breezedeus.com.
CnOCR is an Optical Character Recognition (OCR) toolkit for Python 3. It supports recognition of common English characters and numbers, Simplified Chinese, and Traditional Chinese (some models), as well as vertical text. It comes with 20+ well-trained models for different application scenarios and can be used directly after installation. CnOCR also provides simple training commands for users to train their own models. You are welcome to join the WeChat contact group.
The author also maintains Planet of Knowledge CnOCR/CnSTD Private Group, where questions are more likely to receive prompt responses from the author. You are welcome to join. Knowledge Planet Members can enjoy the following benefits:
- Access to download some non-open source paid models for free.
- A 20% discount on the purchase of all other paid models.
- Rapid responses from the author to various difficulties encountered during usage.
- The author offers two free training services per month using unique data.
- The group will continuously release exclusive materials related to CnOCR/CnSTD.
- The group will regularly publish the latest research materials related to OCR, STD, CV, and more.
See the CnOCR online documentation (in Chinese).
Starting from V2.2, CnOCR internally uses the text detection engine CnSTD for text detection and positioning. So CnOCR V2.2 can recognize not only typographically simple printed text images, such as screenshot images, scanned copies, etc., but also scene text in general images.
Here are some usage examples for different scenarios.
Just use the default values for all parameters. If the result is not good enough, try different parameter values and compare the effect; this usually leads to better accuracy.
from cnocr import CnOcr
img_fp = './docs/examples/huochepiao.jpeg'
ocr = CnOcr() # Use default values for all parameters
out = ocr.ocr(img_fp)
print(out)
Recognition results:
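The output can be post-processed with plain Python. The following is a minimal sketch, assuming that ocr.ocr() returns a list with one dict per detected text line and that each dict has 'text' and 'score' keys; that format is an assumption here and should be checked against the online documentation. The helper name lines_to_text and the min_score threshold are hypothetical, and the sample data is hand-written so the snippet runs without CnOCR installed.

```python
from typing import Any, Dict, List


def lines_to_text(results: List[Dict[str, Any]], min_score: float = 0.5) -> str:
    """Join recognized lines into one string, dropping low-confidence lines.

    `results` is assumed to look like the output of `ocr.ocr(img_fp)`:
    a list of dicts with at least a 'text' key and (usually) a 'score' key.
    """
    kept = [r["text"] for r in results if r.get("score", 1.0) >= min_score]
    return "\n".join(kept)


# Hand-written sample in the assumed format (NOT real CnOCR output).
sample = [
    {"text": "Hello", "score": 0.98},
    {"text": "w0rld", "score": 0.31},
    {"text": "CnOCR", "score": 0.87},
]

print(lines_to_text(sample))
```

With a real model one would pass `ocr.ocr(img_fp)` instead of `sample`. A suitable threshold depends on the model and the images, so 0.5 is only a placeholder.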
Although Chinese detection and recognition models can also recognize English, detectors and recognizers trained specifically for English texts tend to be more accurate. For English-only application scenarios, it is recommended to use the English detection model det_model_name='en_PP-OCRv3_det' and the English recognition model rec_model_name='en_PP-OCRv3' from PaddleOCR (also called ppocr).
from cnocr import CnOcr
img_fp = './docs/examples/en_book1.jpeg'
ocr = CnOcr(det_model_name='en_PP-OCRv3_det', rec_model_name='en_PP-OCRv3')
out = ocr.ocr(img_fp)
print(out)
Recognition results:
For typographically simple text images, such as screenshots and scanned images, you can use det_model_name='naive_det', which is equivalent to not using a text detection model and instead splitting the image into lines with simple rules.
Note
det_model_name='naive_det' is equivalent to CnOCR versions before V2.2 (V2.0.*, V2.1.*).
The biggest advantage of det_model_name='naive_det' is speed; the disadvantage is that it is picky about images. How do you decide whether to use the detection model 'naive_det'? The easiest way is to try it on your own application images: if it works well, use it; if not, don't.
from cnocr import CnOcr
img_fp = './docs/examples/multi-line_cn1.png'
ocr = CnOcr(det_model_name='naive_det')
out = ocr.ocr(img_fp)
print(out)
Recognition results:
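The "try it and see" advice can be partly automated. The sketch below picks the detector whose output has the highest mean confidence, assuming each result is a list of dicts with a 'score' key (an assumption about the output format). The helper names mean_score and pick_detector are hypothetical and the sample outputs are hand-written. Mean confidence is only a rough proxy, so checking the recognized text by eye remains the real test.

```python
from statistics import mean
from typing import Any, Dict, List

Result = List[Dict[str, Any]]


def mean_score(results: Result) -> float:
    """Average confidence over all recognized lines (0.0 if nothing was found)."""
    return mean(r["score"] for r in results) if results else 0.0


def pick_detector(results_by_detector: Dict[str, Result]) -> str:
    """Return the detector name whose output has the highest mean score."""
    return max(
        results_by_detector,
        key=lambda name: mean_score(results_by_detector[name]),
    )


# Hand-written sample outputs for one image, NOT real CnOCR results.
outputs = {
    "naive_det": [
        {"text": "line one", "score": 0.95},
        {"text": "line two", "score": 0.91},
    ],
    "ch_PP-OCRv3_det": [
        {"text": "line one", "score": 0.90},
        {"text": "line tw0", "score": 0.72},
    ],
}

print(pick_detector(outputs))
```

In practice each entry of `outputs` would come from `CnOcr(det_model_name=name).ocr(img_fp)`, ideally averaged over several representative images.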
The Chinese recognition model rec_model_name='ch_PP-OCRv3' from ppocr is used for recognition.
from cnocr import CnOcr
img_fp = './docs/examples/shupai.png'
ocr = CnOcr(rec_model_name='ch_PP-OCRv3')
out = ocr.ocr(img_fp)
print(out)
Recognition results:
Use the Traditional Chinese recognition model rec_model_name='chinese_cht_PP-OCRv3' from ppocr for recognition.
from cnocr import CnOcr
img_fp = './docs/examples/fanti.jpg'
ocr = CnOcr(rec_model_name='chinese_cht_PP-OCRv3') # use the traditional Chinese recognition model
out = ocr.ocr(img_fp)
print(out)
When using this model, please note the following issues:
- The recognition accuracy is only average.
- Apart from Traditional Chinese characters, the recognition of punctuation, English, and numbers is not good.
- This model does not support the recognition of vertical text.
If it is clear that the image to be recognized contains a single line of text, you can use the method CnOcr.ocr_for_single_line() for recognition. This skips text detection and is more than twice as fast.
The code is as follows:
from cnocr import CnOcr
img_fp = './docs/examples/helloworld.jpg'
ocr = CnOcr()
out = ocr.ocr_for_single_line(img_fp)
print(out)
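To check the speed difference on your own images, a small timing helper is enough. This is a minimal sketch: average_seconds is a hypothetical helper, and the two lambdas are stand-in workloads so that the snippet runs without CnOCR installed.

```python
import time
from typing import Callable


def average_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Average wall-clock time of calling fn() `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Stand-in workloads (NOT real OCR calls). With CnOCR installed one would
# pass `lambda: ocr.ocr(img_fp)` and `lambda: ocr.ocr_for_single_line(img_fp)`.
full_pipeline = lambda: sum(range(200_000))
single_line = lambda: sum(range(50_000))

t_full = average_seconds(full_pipeline)
t_single = average_seconds(single_line)
print(f"full: {t_full:.6f}s, single line: {t_single:.6f}s")
```

The first call to a model typically includes loading time, so it is reasonable to run each function once before timing it.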
- Recognition of Vaccine App Screenshot
- Recognition of ID Card
- Recognition of Restaurant Ticket
Well, one line of command is enough if it goes well.
$ pip install cnocr[ort-cpu]
If you are using a GPU environment with an ONNX model, please install using the following command:
$ pip install cnocr[ort-gpu]
If you want to train new models with your own data, please install using the following command:
$ pip install cnocr[dev]
If the installation is slow, you can specify a PyPI mirror inside mainland China, such as the Aliyun mirror:
$ pip install cnocr -i https://mirrors.aliyun.com/pypi/simple
Note
Please use Python 3 (3.6 and later should work). Python 2 has not been tested.
More instructions can be found in the installation documentation (in Chinese).
Warning
If you have never installed the PyTorch or OpenCV Python packages on your computer, you may encounter problems with the first installation, but they are usually common problems that can be solved by searching Baidu/Google.
You can directly pull the image with CnOCR installed from Docker Hub.
$ docker pull breezedeus/cnocr:latest
More instructions can be found in the installation documentation (in Chinese).
Refer to CnSTD for details.
det_model_name | PyTorch Version | ONNX Version | Model original source | Model File Size | Supported Language | Whether to support vertical text detection |
---|---|---|---|---|---|---|
en_PP-OCRv3_det | X | √ | ppocr | 2.3 M | English, Numbers | √ |
db_shufflenet_v2 | √ | X | cnocr | 18 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
db_shufflenet_v2_small | √ | X | cnocr | 12 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
db_mobilenet_v3 | √ | X | cnocr | 16 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
db_mobilenet_v3_small | √ | X | cnocr | 7.9 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
db_resnet34 | √ | X | cnocr | 86 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
db_resnet18 | √ | X | cnocr | 47 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
ch_PP-OCRv4_det | X | √ | ppocr | 4.5 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
ch_PP-OCRv4_det_server | X | √ | ppocr | 108 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
ch_PP-OCRv3_det | X | √ | ppocr | 2.3 M | Simplified Chinese, Traditional Chinese, English, Numbers | √ |
Compared to the CnOCR V2.2.* versions, most models in V2.3 have been retrained and fine-tuned, offering higher accuracy than the older versions. Additionally, two series of models with larger parameter volumes have been added:
- *-densenet_lite_246-gru_base: Currently available for Knowledge Planet CnOCR/CnSTD Private Group members; it will be open-sourced afterward.
- *-densenet_lite_666-gru_large: Pro models, available for use after purchase. The purchase link: https://ocr.lemonsqueezy.com.
Models in V2.3 are categorized into the following types based on usage scenarios:
- scene: For scene images, suitable for recognizing text in general photography. Models in this category start with scene-, such as the scene-densenet_lite_136-gru model.
- doc: For document images, suitable for recognizing text in regular document screenshots, like scanned book pages. Models in this category start with doc-, such as the doc-densenet_lite_136-gru model.
- number: Specifically for recognizing only numbers (able to recognize only the ten digits 0~9), suitable for scenarios like bank card numbers, ID numbers, etc. Models in this category start with number-, such as the number-densenet_lite_136-gru model.
- general: For general scenarios, suitable for images without a clear preference. Models in this category do not have a specific prefix and maintain the same naming convention as older versions, such as the densenet_lite_136-gru model.

Note
⚠️: The above descriptions are for reference only. It is recommended to choose models based on actual performance.
For more details, see: Available Models.
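The prefix convention above can be captured in a few lines. This is a minimal sketch: CATEGORY_PREFIX and rec_model_name are hypothetical helpers, not part of the CnOCR API, and not every prefix/base combination exists as a released model, so the result should be checked against the table of available models.

```python
# Scenario category -> model-name prefix, as described in the list above.
CATEGORY_PREFIX = {
    "scene": "scene-",
    "doc": "doc-",
    "number": "number-",
    "general": "",  # general models keep the old, prefix-free names
}


def rec_model_name(category: str, base: str = "densenet_lite_136-gru") -> str:
    """Build a recognition model name from a scenario category and a base name."""
    if category not in CATEGORY_PREFIX:
        raise ValueError(
            f"unknown category {category!r}; expected one of {sorted(CATEGORY_PREFIX)}"
        )
    return CATEGORY_PREFIX[category] + base


for category in ("scene", "doc", "number", "general"):
    print(category, "->", rec_model_name(category))
```

The resulting string would then be passed as `CnOcr(rec_model_name=...)`.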
rec_model_name | PyTorch Version | ONNX Version | Model original source | Model File Size | Supported Language | Whether to support vertical text recognition |
---|---|---|---|---|---|---|
densenet_lite_136-gru 🆕 | √ | √ | cnocr | 12 M | Simplified Chinese, English, Numbers | X |
scene-densenet_lite_136-gru 🆕 | √ | √ | cnocr | 12 M | Simplified Chinese, English, Numbers | X |
doc-densenet_lite_136-gru 🆕 | √ | √ | cnocr | 12 M | Simplified Chinese, English, Numbers | X |
densenet_lite_246-gru_base 🆕 (Planet Members Only) | √ | √ | cnocr | 25 M | Simplified Chinese, English, Numbers | X |
scene-densenet_lite_246-gru_base 🆕 (Planet Members Only) | √ | √ | cnocr | 25 M | Simplified Chinese, English, Numbers | X |
doc-densenet_lite_246-gru_base 🆕 (Planet Members Only) | √ | √ | cnocr | 25 M | Simplified Chinese, English, Numbers | X |
densenet_lite_666-gru_large 🆕 (Purchase Link) | √ | √ | cnocr | 82 M | Simplified Chinese, English, Numbers | X |
scene-densenet_lite_666-gru_large 🆕 (Purchase Link) | √ | √ | cnocr | 82 M | Simplified Chinese, English, Numbers | X |
doc-densenet_lite_666-gru_large 🆕 (Purchase Link) | √ | √ | cnocr | 82 M | Simplified Chinese, English, Numbers | X |
number-densenet_lite_136-fc 🆕 | √ | √ | cnocr | 2.7 M | Pure Numeric (contains only the ten digits 0~9) | X |
number-densenet_lite_136-gru 🆕 (Planet Members Only) | √ | √ | cnocr | 5.5 M | Pure Numeric (contains only the ten digits 0~9) | X |
number-densenet_lite_666-gru_large 🆕 (Purchase Link) | √ | √ | cnocr | 56 M | Pure Numeric (contains only the ten digits 0~9) | X |
ch_PP-OCRv4 | X | √ | ppocr | 10 M | Simplified Chinese, English, Numbers | √ |
ch_PP-OCRv4_server | X | √ | ppocr | 86 M | Simplified Chinese, English, Numbers | √ |
ch_PP-OCRv3 | X | √ | ppocr | 10 M | Simplified Chinese, English, Numbers | √ |
ch_ppocr_mobile_v2.0 | X | √ | ppocr | 4.2 M | Simplified Chinese, English, Numbers | √ |
en_PP-OCRv3 | X | √ | ppocr | 8.5 M | English, Numbers | √ |
en_PP-OCRv4 | X | √ | ppocr | 8.6 M | English, Numbers | √ |
en_number_mobile_v2.0 | X | √ | ppocr | 1.8 M | English, Numbers | √ |
chinese_cht_PP-OCRv3 | X | √ | ppocr | 11 M | Traditional Chinese, English, Numbers | X |
japan_PP-OCRv3 | X | √ | ppocr | 9.6 M | Japanese, English, Numbers | √ |
korean_PP-OCRv3 | X | √ | ppocr | 9.4 M | Korean, English, Numbers | √ |
latin_PP-OCRv3 | X | √ | ppocr | 8.6 M | Latin, English, Numbers | √ |
arabic_PP-OCRv3 | X | √ | ppocr | 8.6 M | Arabic, English, Numbers | √ |
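The table can also be filtered programmatically when choosing a model. The sketch below transcribes a handful of rows (names, sizes, languages, and vertical-text support copied from the table above) and lists the candidates for a language, smallest first. RecModel and candidates are hypothetical helpers, not part of CnOCR, and a smaller file does not imply better accuracy.

```python
from typing import List, NamedTuple, Tuple


class RecModel(NamedTuple):
    name: str
    size_mb: float
    languages: Tuple[str, ...]
    vertical: bool  # whether vertical text recognition is supported


ZH = ("Simplified Chinese", "English", "Numbers")
EN = ("English", "Numbers")

# A few rows transcribed from the table above (not the full list).
MODELS = [
    RecModel("densenet_lite_136-gru", 12, ZH, False),
    RecModel("ch_PP-OCRv4", 10, ZH, True),
    RecModel("ch_ppocr_mobile_v2.0", 4.2, ZH, True),
    RecModel("en_PP-OCRv3", 8.5, EN, True),
    RecModel("en_number_mobile_v2.0", 1.8, EN, True),
    RecModel("chinese_cht_PP-OCRv3", 11, ("Traditional Chinese", "English", "Numbers"), False),
]


def candidates(language: str, need_vertical: bool = False) -> List[str]:
    """Names of models that support `language`, ordered by file size."""
    matches = [
        m for m in MODELS
        if language in m.languages and (m.vertical or not need_vertical)
    ]
    return [m.name for m in sorted(matches, key=lambda m: m.size_mb)]


print(candidates("Simplified Chinese", need_vertical=True))
print(candidates("Traditional Chinese", need_vertical=True))
```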
- Support for images containing multiple lines of text (Done)
- crnn model support for variable length prediction, improving flexibility (since V1.0.0)
- Refine test cases (Doing)
- Fix bugs (The code is still messy.) (Doing)
- Support space recognition (since V1.1.0)
- Try new models like DenseNet to further improve recognition accuracy (since V1.1.0)
- Optimize the training set to remove unreasonable samples; based on this, retrain each model
- Change from MXNet to PyTorch architecture (since V2.0.0)
- Train more efficient models based on PyTorch
- Support text recognition in column format (since V2.1.2)
- Integration with CnSTD (since V2.2)
- Further optimization of model accuracy
- Support more application scenarios, such as formula recognition, table recognition, layout analysis, etc.
It is not easy to maintain and evolve the project, so if it is helpful to you, please consider offering the author a cup of coffee 🥤.
Official code base: https://github.com/breezedeus/cnocr. Please cite it properly.