Merge pull request #21 from breezedeus/dev
V1.0.0
breezedeus authored Jul 25, 2019
2 parents 1d98ced + 02cd563 commit 3a4b8ec
Showing 28 changed files with 773 additions and 169 deletions.
181 changes: 155 additions & 26 deletions README.md
See the [中文README](./README_cn.md) for the Chinese version of this document.



# Update 2019.07.25: release cnocr V1.0.0

`cnocr` `v1.0.0` is released, and prediction is more efficient than before. **The new model is not compatible with models from previous versions**, so if you upgrade, please download the latest model files again. The download procedure is described below (it is the same as before).



Main changes are:

- **The new crnn model supports prediction on variable-width images, making prediction more efficient.**
- Support fine-tuning an existing model with your own specific data.
- Fix bugs, such as `train accuracy` always being `0`.
- The dependency `mxnet` is upgraded from `1.3.1` to `1.4.1`.



# cnocr

A Python package for Chinese OCR, with trained models included, so it can be used directly after installation.

The accuracy of the current crnn model is about `98.8%`.

The project originated from our own internal needs at [爱因互动 Ein+](https://einplus.cn).
Thanks for the internal support.
## Installation

Install cnocr via pip: `pip install cnocr`

## Usage

The first time cnocr is used, the model files will be downloaded automatically from
[Dropbox](https://www.dropbox.com/s/7w8l3mk4pvkt34w/cnocr-models-v1.0.0.zip?dl=0) to `~/.cnocr`.

The zip file will be extracted, and the resulting model files can be found in `~/.cnocr/models` by default.
If the automatic download fails, you can download the zip file manually
from [Baidu NetDisk](https://pan.baidu.com/s/1DWV3H2UWmzOU6d48UbTYVw) (extraction code `ss81`) and put it in `~/.cnocr`. The code will do the rest.
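
If you prefer to stage a manually downloaded zip with a script instead of moving it by hand, a minimal sketch (assuming the zip sits in the current working directory under the file name used above) might look like this:

```python
import os
import shutil

# Minimal sketch: copy a manually downloaded model zip into ~/.cnocr.
# cnocr unpacks it into ~/.cnocr/models on first use, as noted above.
cnocr_dir = os.path.expanduser('~/.cnocr')
os.makedirs(cnocr_dir, exist_ok=True)
shutil.copy('cnocr-models-v1.0.0.zip', cnocr_dir)
```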



### Predict

Three functions are provided for prediction.



#### 1. `CnOcr.ocr(img_fp)`

The function `CnOcr.ocr(img_fp)` can recognize text in an image that contains multiple lines of text (or a single line).



**Function Description**

- input parameter `img_fp`: an image file path, or a color image as `mx.nd.NDArray` or `np.ndarray` with shape `(height, width, 3)` and channels in RGB order.
- return: `List(List(Char))`, such as: `[['第', '一', '行'], ['第', '二', '行'], ['第', '三', '行']]`.




**Usage Case**


```python
from cnocr import CnOcr
ocr = CnOcr()
res = ocr.ocr('examples/multi-line_cn1.png')
print("Predicted Chars:", res)
```

or:

```python
import mxnet as mx
from cnocr import CnOcr
ocr = CnOcr()
img_fp = 'examples/multi-line_cn1.png'
img = mx.image.imread(img_fp, 1)
res = ocr.ocr(img)
print("Predicted Chars:", res)
```

The code above recognizes the text in the image file [examples/multi-line_cn1.png](./examples/multi-line_cn1.png):

![examples/multi-line_cn1.png](./examples/multi-line_cn1.png)

The OCR results should be:

```python
Predicted Chars: [['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['', '', '', '', '', '', '', '', '', '', '', '', ''],
['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
['', '', '', '', '', '']]
```
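
The function description above also allows passing a NumPy array directly. Below is a small sketch of that variant (not from the original README; it assumes Pillow is installed for loading the image as an RGB array):

```python
import numpy as np
from PIL import Image
from cnocr import CnOcr

ocr = CnOcr()
# Load the image as an (height, width, 3) RGB uint8 array, as described above.
img = np.asarray(Image.open('examples/multi-line_cn1.png').convert('RGB'))
res = ocr.ocr(img)
print("Predicted Chars:", res)
```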

#### 2. `CnOcr.ocr_for_single_line(img_fp)`

If you know that the image contains only one line of text, the function `CnOcr.ocr_for_single_line(img_fp)` can be used instead. Compared with `CnOcr.ocr()`, the result of `CnOcr.ocr_for_single_line()` is more reliable because no line-splitting step is required.



**Function Description**

- input parameter `img_fp`: an image file path, or an image as `mx.nd.NDArray` or `np.ndarray` with shape `[height, width]` or `[height, width, channel]`. The optional channel should be `1` (gray image) or `3` (color image).
- return: `List(Char)`, such as: `['你', '好']`.



**Usage Case**

```python
from cnocr import CnOcr
ocr = CnOcr()
res = ocr.ocr_for_single_line('examples/rand_cn1.png')
print("Predicted Chars:", res)
```

or:

```python
import mxnet as mx
from cnocr import CnOcr
ocr = CnOcr()
img_fp = 'examples/rand_cn1.png'
img = mx.image.imread(img_fp, 1)
res = ocr.ocr_for_single_line(img)
print("Predicted Chars:", res)
```


The code above recognizes the text in the image file [examples/rand_cn1.png](./examples/rand_cn1.png):

![examples/rand_cn1.png](./examples/rand_cn1.png)

The OCR results should be:

```bash
Predicted Chars: ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
```
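
The function description also allows a two-dimensional gray image. A minimal sketch of that variant (not from the original README; it assumes Pillow is available for loading the image):

```python
import numpy as np
from PIL import Image
from cnocr import CnOcr

ocr = CnOcr()
# A gray image with shape (height, width), as allowed by the function description above.
gray = np.asarray(Image.open('examples/rand_cn1.png').convert('L'))
res = ocr.ocr_for_single_line(gray)
print("Predicted Chars:", res)
```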

#### 3. `CnOcr.ocr_for_single_lines(img_list)`

The function `CnOcr.ocr_for_single_lines(img_list)` predicts a batch of single-line-text image arrays at once. In fact, `CnOcr.ocr(img_fp)` and `CnOcr.ocr_for_single_line(img_fp)` both invoke `CnOcr.ocr_for_single_lines(img_list)` internally.



**Function Description**

- input parameter `img_list`: a list of images, in which each element is a line-image array of type `mx.nd.NDArray` or `np.ndarray`. Each element should have values ranging from `0` to `255`, and shape `[height, width]` or `[height, width, channel]`. The optional channel should be `1` (gray image) or `3` (color image).
- return: `List(List(Char))`, such as: `[['第', '一', '行'], ['第', '二', '行'], ['第', '三', '行']]`.



**Usage Case**

```python
import mxnet as mx
from cnocr import CnOcr
from cnocr.line_split import line_split
ocr = CnOcr()
img_fp = 'examples/multi-line_cn1.png'
img = mx.image.imread(img_fp, 1).asnumpy()
line_imgs = line_split(img, blank=True)
line_img_list = [line_img for line_img, _ in line_imgs]
res = ocr.ocr_for_single_lines(line_img_list)
print("Predicted Chars:", res)
```
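
The returned `res` is again a `List(List(Char))`, one element per line image. A tiny illustrative snippet (not part of the original README) for turning it into plain strings:

```python
# `res` comes from ocr.ocr_for_single_lines(...) above;
# join each per-line character list into a single string.
lines = [''.join(chars) for chars in res]
print('\n'.join(lines))
```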

More usage cases can be found at [tests/test_cnocr.py](./tests/test_cnocr.py).


### Using the Script

```bash
python scripts/cnocr_predict.py --file examples/multi-line_cn1.png
```



### Train (Not Necessary)

You can use the package without any training. But if you really really want to train your own models, follow this:

```bash
python scripts/cnocr_train.py --cpu 2 --num_proc 4 --loss ctc --dataset cn_ocr
```



Fine-tuning an existing model with your own specific data is also supported. Please refer to the following command:

```bash
python scripts/cnocr_train.py --cpu 2 --num_proc 4 --loss ctc --dataset cn_ocr --load_epoch 20
```
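
Here `--load_epoch 20` presumably loads the parameters saved at epoch `20` of the existing model as the starting point for fine-tuning; adjust the value to match the checkpoint you actually have.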



More references can be found at [scripts/run_cnocr_train.sh](./scripts/run_cnocr_train.sh).



## Future Work

* [x] support multi-line-characters recognition (`Done`)
* [x] crnn model supports prediction for variable-width image files (`Done`)
* [x] Add Unit Tests (`Doing`)
* [x] Bugfixes (`Doing`)
* [ ] Support space recognition (tried, but not successful so far)
* [ ] Try other models such as DenseNet, ResNet