Support online cls model prediction #769

Open. Wants to merge 3 commits into base: main
Binary file added configs/cls/mobilenetv3/example_cls.png
50 changes: 50 additions & 0 deletions tools/infer/text/README.md
@@ -238,6 +238,56 @@ Evaluation of the text spotting inference results on Ascend 910 with MindSpore 2
2. Unless otherwise indicated, all experiments are run with `--det_limit_type`="min" and `--det_limit_side`=720.
3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.

## Text Direction Classification
Review comment (Collaborator): This section does not need to be presented separately; it is enough to add it to the e2e part.


To run text angle classification on an input image or a directory containing multiple images, execute
```shell
python tools/infer/text/predict_cls.py --image_dir {path_to_img or dir_to_imgs} --cls_algorithm MV3
```
After the run, the inference results are saved in `{args.draw_img_save_dir}/cls_results.txt`, where `--draw_img_save_dir` is the directory for saving results and defaults to `./inference_results`. Here are some example results.

- Text angle classification

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t1.png" width=150 />
</p>
<p align="center">
<em> word_01.png </em>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t2.png" width=150 />
</p>
<p align="center">
<em> word_02.png </em>
</p>

Classification results:
```text
word_01.png 0
word_02.png 180
```
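The results file can be read back with a few lines of Python. This is a minimal sketch, assuming each line holds an image name and an angle separated by whitespace, as in the sample above; `parse_cls_results` is a hypothetical helper and not part of the repository.

```python
from typing import Dict


def parse_cls_results(text: str) -> Dict[str, int]:
    """Map each image name to its predicted angle in degrees.

    Assumes the "name angle" line layout shown in the sample results.
    """
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last whitespace run so names containing spaces still work.
        name, angle = line.rsplit(maxsplit=1)
        results[name] = int(angle)
    return results


sample = "word_01.png 0\nword_02.png 180\n"
print(parse_cls_results(sample))  # {'word_01.png': 0, 'word_02.png': 180}
```

In practice the text would come from reading `cls_results.txt` in the directory given by `--draw_img_save_dir`.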

**Note:**
- For more parameters and usage, run `python tools/infer/text/predict_cls.py -h` or view `tools/infer/text/config.py`.
- Both batch and single-image text angle classification are supported. Batch mode is enabled by default for speed. You can set the batch size with `--cls_batch_num`, or run in single-image mode by setting `--cls_batch_mode False`.
- You can specify a weight file with `--cls_model_dir`. If it is omitted, the default weight file is loaded.
- The currently supported angle classification network is mobilenet_v3. You can specify the model with `--cls_algorithm`; its corresponding `--cls_amp_level` currently supports `O0`.
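The relationship between batch mode and `--cls_batch_num` can be pictured as simple chunking of the image list. The sketch below is illustrative only: `iter_batches` is a hypothetical helper, and the actual batching logic in `predict_cls.py` may differ (for example, in how images are padded or resized to a common shape).

```python
from typing import Iterator, List, Sequence


def iter_batches(paths: Sequence[str], batch_num: int = 8, batch_mode: bool = True) -> Iterator[List[str]]:
    """Yield groups of image paths; single-image mode is a batch size of 1."""
    step = batch_num if batch_mode else 1
    for start in range(0, len(paths), step):
        yield list(paths[start:start + step])


paths = [f"word_{i:02d}.png" for i in range(1, 11)]  # ten made-up file names
print([len(b) for b in iter_batches(paths, batch_num=4)])       # [4, 4, 2]
print([len(b) for b in iter_batches(paths, batch_mode=False)])  # ten batches of one image
```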

### Supported angle classification algorithms and networks

<center>

|**Algorithm name**|**Network name**|**Language**|
| :------: | :------: | :------: |
| MV3 | mobilenet_v3 | CH/EN|

</center>

The network for each algorithm is defined in `tools/infer/text/predict_cls.py`.

Currently, the listed model classifies text as rotated by 0 or 180 degrees. Support for other angles will be added soon.
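A typical use of the predicted label is to turn a flipped text crop upright before recognition. The sketch below shows the idea on a toy nested-list image; `upright` is a hypothetical helper and is not taken from the repository. For NumPy arrays, `np.rot90(img, 2)` performs the same 180 degree rotation.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def upright(image: Image, angle_label: str) -> Image:
    """Return a copy of the image rotated back to 0 degrees."""
    if angle_label == "180":
        # Rotating by 180 degrees reverses both the row order and each row.
        return [row[::-1] for row in image[::-1]]
    if angle_label == "0":
        return [row[:] for row in image]
    raise ValueError(f"Unsupported angle label: {angle_label!r}")


img = [[1, 2, 3],
       [4, 5, 6]]
print(upright(img, "180"))  # [[6, 5, 4], [3, 2, 1]]
```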

## Argument List

All CLI argument definitions can be viewed by running `python tools/infer/text/predict_system.py -h` or by reading `tools/infer/text/config.py`.
50 changes: 50 additions & 0 deletions tools/infer/text/README_CN.md
@@ -230,6 +230,56 @@ python deploy/eval_utils/eval_pipeline.py --gt_path path/to/gt.txt --pred_path p

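3. SVTR is run in mixed precision mode (amp_level=O2) since it is optimized for O2.

## Text Direction Classifier

To run text angle classification on an input image or a directory containing multiple images, execute
```shell
python tools/infer/text/predict_cls.py --image_dir {path_to_img or dir_to_imgs} --cls_algorithm MV3
```
After the run, the inference results are saved in `{args.draw_img_save_dir}/cls_results.txt`, where `--draw_img_save_dir` is the directory for saving results and defaults to `./inference_results`. Here are some example results.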

- Text angle classification

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t1.png" width=150 />
</p>
<p align="center">
<em> word_01.png </em>
</p>

<p align="center">
<img src="https://raw.githubusercontent.com/zhangjunlongtech/Material/refs/heads/main/CRNN_t2.png" width=150 />
</p>
<p align="center">
<em> word_02.png </em>
</p>

Classification results:
```text
word_01.png 0
word_02.png 180
```

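**Note:**
- For more parameters and usage, run `python tools/infer/text/predict_cls.py -h` or view `tools/infer/text/config.py`.
- Both batch and single-image text angle classification are supported. Batch mode is enabled by default for speed. You can set the batch size with `--cls_batch_num`, or run in single-image mode by setting `--cls_batch_mode False`.
- You can specify a weight file with `--cls_model_dir`. If it is omitted, the default weight file is loaded.
- The currently supported angle classification network is mobilenet_v3. You can specify the model with `--cls_algorithm`; its corresponding `--cls_amp_level` currently supports `O0`.

### Supported angle classification algorithms and networks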

<center>

|**Algorithm name**|**Network name**|**Language**|
| :------: | :------: | :------: |
| MV3 | mobilenet_v3 | CH/EN|

</center>

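The network for each algorithm is defined in `tools/infer/text/predict_cls.py`.

Currently, the listed model classifies text as rotated by 0 or 180 degrees. Support for other angles will be added soon.

## Argument List

All CLI argument definitions can be viewed by running `python tools/infer/text/predict_system.py -h` or by reading `tools/infer/text/config.py`.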
30 changes: 30 additions & 0 deletions tools/infer/text/config.py
@@ -197,6 +197,36 @@ def create_parser():
        "--table_max_len", type=int, default=480, help="max length of the input image for table structure recognition."
    )

    # Arguments for text direction (angle) classification
    parser.add_argument(
        "--cls_algorithm",
        type=str,
        default="MV3",
        choices=["MV3"],
        help="classification algorithm",
    )
    parser.add_argument(
        "--cls_amp_level",
        type=str,
        default="O0",
        choices=["O0", "O1", "O2", "O3"],
        help="Auto Mixed Precision level. This setting only works on GPU and Ascend",
    )
    parser.add_argument(
        "--cls_model_dir",
        type=str,
        # Trailing space added so the two string literals do not run together.
        help="directory containing the classification model checkpoint best.ckpt, "
        "or path to a specific checkpoint file.",
    )
    parser.add_argument(
        "--cls_batch_mode",
        type=str2bool,
        default=True,
        help="Whether to run classification inference in batch-mode, which is faster but may degrade the accuracy "
        "due to padding or resizing to the same shape.",
    )
    parser.add_argument(
        "--cls_batch_num", type=int, default=8, help="batch size used when --cls_batch_mode is True."
    )

    return parser
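
The reason `--cls_batch_mode` uses `str2bool` rather than `bool` is that `bool("False")` is `True` in Python, so a plain `type=bool` would never turn the flag off. The sketch below reproduces three of the new arguments in a standalone parser; the `str2bool` body here is an assumed stand-in, since the repository's own implementation is not shown in this diff.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical stand-in for the repository's str2bool helper."""
    if value.lower() in ("true", "t", "yes", "1"):
        return True
    if value.lower() in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


parser = argparse.ArgumentParser()
parser.add_argument("--cls_algorithm", type=str, default="MV3", choices=["MV3"])
parser.add_argument("--cls_batch_mode", type=str2bool, default=True)
parser.add_argument("--cls_batch_num", type=int, default=8)

# Simulate: predict_cls.py --cls_batch_mode False --cls_batch_num 4
args = parser.parse_args(["--cls_batch_mode", "False", "--cls_batch_num", "4"])
print(args.cls_algorithm, args.cls_batch_mode, args.cls_batch_num)  # MV3 False 4
```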


7 changes: 6 additions & 1 deletion tools/infer/text/postprocess.py
@@ -75,7 +75,7 @@ def __init__(self, task="det", algo="DB", rec_char_dict_path=None, **kwargs):
                    character_dict_path=rec_char_dict_path,
                    use_space_char=False,
                )

            else:
                raise ValueError(f"No postprocess config defined for {algo}. Please check the algorithm name.")
        elif task == "ser":
@@ -91,6 +91,8 @@ def __init__(self, task="det", algo="DB", rec_char_dict_path=None, **kwargs):
                merge_no_span_structure=True,
                box_shape="pad",
            )
        elif task == "cls":
            postproc_cfg = dict(name="ClsPostprocess", label_list=["0", "180"])

        postproc_cfg.update(kwargs)
        self.task = task
@@ -155,3 +157,6 @@ def __call__(self, pred, data=None, **kwargs):
        elif self.task == "table":
            output = self.postprocess(pred, labels=kwargs.get("labels"))
            return output
        elif self.task == "cls":
            output = self.postprocess(pred)
            return output
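
For readers unfamiliar with classification postprocessing, the `label_list=["0", "180"]` setting suggests that the model emits one score per class and the postprocessor maps the best-scoring index to an angle label. The sketch below illustrates that idea only; `decode_angles` is a hypothetical function, it is not the `ClsPostprocess` class, and the real output format may differ.

```python
from typing import List, Sequence, Tuple


def decode_angles(
    scores: Sequence[Sequence[float]],
    label_list: Sequence[str] = ("0", "180"),
) -> List[Tuple[str, float]]:
    """Pick the highest-scoring class per image and return (label, score).

    Assumes scores[i][j] is the score of class j for image i, with class
    indices aligned to label_list.
    """
    decoded = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)  # index of the max score
        decoded.append((label_list[best], row[best]))
    return decoded


print(decode_angles([[0.9, 0.1], [0.2, 0.8]]))  # [('0', 0.9), ('180', 0.8)]
```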