diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 0000000000..e69de29bb2 diff --git a/404.html b/404.html new file mode 100644 index 0000000000..c39ce55ef9 --- /dev/null +++ b/404.html @@ -0,0 +1,4849 @@ + + + + + + + + + + + + + + + + + + + + + + + PaddleOCR Documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ +

404 - Not found

+ +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/FAQ.html b/FAQ.html new file mode 100644 index 0000000000..5430af6b0a --- /dev/null +++ b/FAQ.html @@ -0,0 +1,8624 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + FAQ - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + + + + + + + + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

FAQ

+ +
+

恭喜你发现宝藏!

+
+

PaddleOCR收集整理了自从开源以来在issues和用户群中的常见问题并且给出了简要解答,旨在为OCR的开发者提供一些参考,也希望帮助大家少走一些弯路。

+

其中通用问题一般是初次接触OCR相关算法时用户会提出的问题,在1.5 垂类场景实现思路中总结了如何在一些具体的场景中确定技术路线进行优化。PaddleOCR常见问题是开发者在使用PaddleOCR之后可能会遇到的问题也是PaddleOCR实践过程中的避坑指南。

+

同时PaddleOCR也会在review issue的过程中添加 good issuegood first issue 标签,但这些问题可能不会被立刻补充在FAQ文档里,开发者也可对应查看。我们也非常希望开发者能够帮助我们将这些内容补充在FAQ中。

+

OCR领域大佬众多,本文档回答主要依赖有限的项目实践,难免挂一漏万,如有遗漏和不足,也希望有识之士帮忙补充和修正,万分感谢。

+

1. 通用问题

+

1.1 检测

+

Q: 基于深度学习的文字检测方法有哪几种?各有什么优缺点?

+

A:常用的基于深度学习的文字检测方法一般可以分为基于回归的、基于分割的两大类,当然还有一些将两者进行结合的方法。

+

(1)基于回归的方法分为box回归和像素值回归。a. 采用box回归的方法主要有CTPN、Textbox系列和EAST,这类算法对规则形状文本检测效果较好,但无法准确检测不规则形状文本。 b. 像素值回归的方法主要有CRAFT和SA-Text,这类算法能够检测弯曲文本且对小文本效果优秀但是实时性能不够。

+

(2)基于分割的算法,如PSENet,这类算法不受文本形状的限制,对各种形状的文本都能取得较好的效果,但是往往后处理比较复杂,导致耗时严重。目前也有一些算法专门针对这个问题进行改进,如DB,将二值化进行近似,使其可导,融入训练,从而获取更准确的边界,大大降低了后处理的耗时。

+

1.2 识别

+

Q: PaddleOCR提供的文本识别算法包括哪些?

+

A:PaddleOCR主要提供五种文本识别算法,包括CRNN、StarNet、RARE、Rosetta和SRN,其中CRNN、StarNet和Rosetta是基于CTC的文字识别算法,RARE是基于attention的文字识别算法;SRN为百度自研的文本识别算法,引入了语义信息,显著提升了准确率。详情可参照如下页面:文本识别算法

+

Q: 文本识别方法CRNN关键技术有哪些?

+

A:CRNN 关键技术包括三部分。(1)CNN提取图像卷积特征。(2)深层双向LSTM网络,在卷积特征的基础上继续提取文字序列特征。(3)Connectionist Temporal Classification(CTC),解决训练时字符无法对齐的问题。

+

Q: 对于中文行文本识别,CTC和Attention哪种更优?

+

A:(1)从效果上来看,通用OCR场景CTC的识别效果优于Attention,因为待识别的字典中的字符比较多,常用中文汉字在三千字以上,如果训练样本不足,对这些字符的序列关系挖掘比较困难,中文场景下Attention模型的优势无法体现;而且Attention适合短语句识别,对长句子的识别效果比较差。

+

(2)从训练和预测速度上,Attention的串行解码结构限制了预测速度,而CTC网络结构更高效,预测速度上更有优势。

+

Q: 弯曲形变的文字识别需要怎么处理?TPS应用场景是什么,是否好用?

+

A:(1)在大多数情况下,如果遇到的场景弯曲形变不是太严重,检测4个顶点,然后直接通过仿射变换转正识别就足够了。

+

(2)如果不能满足需求,可以尝试使用TPS(Thin Plate Spline),即薄板样条插值。TPS是一种插值算法,经常用于图像变形等,通过少量的控制点就可以驱动图像进行变化。一般用在有弯曲形变的文本识别中,当检测到不规则的/弯曲的(如,使用基于分割的方法检测算法)文本区域,往往先使用TPS算法对文本区域矫正成矩形再进行识别,如,STAR-Net、RARE等识别算法中引入了TPS模块。

+
+

Warning:TPS看起来美好,在实际应用时经常发现并不够鲁棒,并且会增加耗时,需要谨慎使用。

+
+

1.3 端到端

+

Q: 请问端到端的pgnet相比于DB+CRNN在准确率上有优势吗?或者是pgnet最擅长的场景是什么场景呢?

+

A:pgnet是端到端算法,检测识别一步到位,不用分开训练2个模型,也支持弯曲文本的识别,但是在中文上的效果还没有充分验证;db+crnn的验证更充分,应用相对成熟,常规非弯曲的文本都能解的不错。

+

Q: 目前OCR普遍是二阶段,端到端的方案在业界落地情况如何?

+

A:端到端在文字分布密集的业务场景,效率会比较有保证,精度的话看自己业务数据积累情况,如果行级别的识别数据积累比较多的话two-stage会比较好。百度的落地场景,比如工业仪表识别、车牌识别都用到端到端解决方案。

+

Q: 二阶段的端到端的场景文本识别方法的不足有哪些?

+

A:这类方法一般需要设计针对ROI提取特征的方法,而ROI操作一般比较耗时。

+

Q: AAAI 2021最新的端到端场景文本识别PGNet算法有什么特点?

+

A:PGNet不需要字符级别的标注,NMS操作以及ROI操作。同时提出预测文本行内的阅读顺序模块和基于图的修正模块来提升文本识别效果。该算法是百度自研,近期会在PaddleOCR开源。

+

1.4 评估方法

+

Q: OCR领域常用的评估指标是什么?

+

A:对于两阶段的OCR系统,可以分开来看,分别评估检测和识别两个阶段:

+

(1)检测阶段:先按照检测框和标注框的IOU评估,IOU大于某个阈值判断为检测准确。这里检测框和标注框不同于一般的通用目标检测框,是采用多边形进行表示。检测准确率:正确的检测框个数在全部检测框的占比,主要是判断检测指标。检测召回率:正确的检测框个数在全部标注框的占比,主要是判断漏检的指标。

+

(2)识别阶段: +字符识别准确率,即正确识别的文本行占标注的文本行数量的比例,只有整行文本识别对才算正确识别。

+

(3)端到端统计: +端对端召回率:准确检测并正确识别文本行在全部标注文本行的占比; +端到端准确率:准确检测并正确识别文本行在 检测到的文本行数量 的占比; +准确检测的标准是检测框与标注框的IOU大于某个阈值,正确识别的检测框中的文本与标注的文本相同。
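下面给出一个按上述定义统计检测准确率/召回率的简化Python示意(非官方评测实现;假设检测框与标注框均以多边形点列表示,IOU阈值0.5与函数名均为示例,依赖shapely库):

```python
# 简化示意:按 IOU>=0.5 将检测多边形与标注多边形一一匹配,统计检测准确率与召回率
from shapely.geometry import Polygon

def poly_iou(poly_a, poly_b):
    a, b = Polygon(poly_a), Polygon(poly_b)
    if not a.is_valid or not b.is_valid:
        return 0.0
    union = a.union(b).area
    return a.intersection(b).area / union if union > 0 else 0.0

def det_precision_recall(det_polys, gt_polys, iou_thresh=0.5):
    matched, used_gt = 0, set()
    for det in det_polys:
        for i, gt in enumerate(gt_polys):
            if i not in used_gt and poly_iou(det, gt) >= iou_thresh:
                matched += 1
                used_gt.add(i)
                break
    precision = matched / len(det_polys) if det_polys else 0.0
    recall = matched / len(gt_polys) if gt_polys else 0.0
    return precision, recall
```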

+

1.5 垂类场景实现思路

+

Q:背景干扰的文字(如印章盖到落款上,需要识别落款或者印章中的文字),如何识别?

+

A:(1)在人眼确认可识别的条件下,对于背景有干扰的文字,首先要保证检测框足够准确,如果检测框不准确,需要考虑是否可以通过过滤颜色等方式对图像预处理并且增加更多相关的训练数据;在识别的部分,注意在训练数据中加入背景干扰类的扩增图像。

+

(2)如果MobileNet模型不能满足需求,可以尝试ResNet系列大模型来获得更好的效果。

+

Q:请问对于图片中的密集文字,有什么好的处理办法吗?

+

A:可以先试用预训练模型测试一下,例如DB+CRNN,判断下密集文字图片中是检测还是识别的问题,然后针对性地改善。另外,如果图像中密集文字较小,可以尝试增大图像分辨率,对图像进行一定范围内的拉伸,将文字稀疏化,提高识别效果。

+

Q: 文本行较紧密的情况下如何准确检测?

+

A:使用基于分割的方法,如DB,检测密集文本行时,最好收集一批数据进行训练,并在训练时将生成二值图像的shrink_ratio参数调小一些。

+

Q:对于一些在识别时稍微模糊的文本,有没有一些图像增强的方式?

+

A:在人类肉眼可以识别的前提下,可以考虑图像处理中的均值滤波、中值滤波或者高斯滤波等模糊算子尝试。也可以尝试从数据扩增扰动来强化模型鲁棒性,另外新的思路有对抗性训练和超分SR思路,可以尝试借鉴。但目前业界尚无普遍认可的最优方案,建议优先在数据采集阶段增加一些限制提升图片质量。
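下面是一个用OpenCV模糊算子做数据扰动的简单示意(random_blur为自拟的函数名,核大小等参数仅为示例):

```python
# 示意:训练时随机对图像做均值/中值/高斯模糊,提升模型对模糊文本的鲁棒性
import random
import cv2

def random_blur(img):
    ksize = random.choice([3, 5])
    op = random.choice(["mean", "median", "gauss", "none"])
    if op == "mean":
        return cv2.blur(img, (ksize, ksize))
    if op == "median":
        return cv2.medianBlur(img, ksize)
    if op == "gauss":
        return cv2.GaussianBlur(img, (ksize, ksize), 0)
    return img
```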

+

Q:低像素文字或者字号比较小的文字有什么超分辨率方法吗

+

A:超分辨率方法分为传统方法和基于深度学习的方法。基于深度学习的方法中,比较经典的有SRCNN,另外CVPR2020也有一篇超分辨率的工作可以参考文章:Unpaired Image Super-Resolution using Pseudo-Supervision,但是没有充分的实践验证过,需要看实际场景下的效果。

+

Q:对于一些尺寸较大的文档类图片,在检测时会有较多的漏检,怎么避免这种漏检的问题呢?

+

A:PaddleOCR中在图像最长边大于960时,将图像等比例缩放为长边960的图像再进行预测,对于这种图像,可以通过修改det_limit_side_len,增大检测的最长边:tools/infer/utility.py#L42

+

Q:文档场景中,使用DB模型会出现整行漏检的情况应该怎么解决?

+

A:可以在预测时调小 det_db_box_thresh 阈值,默认为0.5, 可调小至0.3观察效果。

+

Q: 弯曲文本(如略微形变的文档图像)漏检问题

+

A: db后处理中计算文本框平均得分时,是求rectangle区域的平均分数,容易造成弯曲文本漏检,已新增求polygon区域的平均分数,会更准确,但速度有所降低,可按需选择,在相关pr中可查看可视化对比效果。该功能通过参数 det_db_score_mode进行选择,参数值可选[fast(默认)、slow],fast对应原始的rectangle方式,slow对应polygon方式。感谢用户buptlihang提PR帮助解决该问题🌹。

+

Q:如何识别文字比较长的文本?

+

A:在中文识别模型训练时,并不是采用直接将训练样本缩放到[3,32,320]进行训练,而是先等比例缩放图像,保证图像高度为32,宽度不足320的部分补0,宽高比大于10的样本直接丢弃。预测时,如果是单张图像预测,则按上述操作直接对图像缩放,不做宽度320的限制。如果是多张图预测,则采用batch方式预测,每个batch的宽度动态变换,采用这个batch中最长宽度。
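按照上述逻辑,可以写出一个简化的缩放补零示意(与PaddleOCR实际实现可能在细节上有出入,函数名与参数为自拟):

```python
# 示意:识别训练时的等比例缩放 + 右侧补0,目标尺寸为 [32, 320]
import math
import cv2
import numpy as np

def resize_norm_img(img, img_h=32, img_w=320):
    h, w = img.shape[:2]
    ratio = w / float(h)
    resized_w = min(img_w, int(math.ceil(img_h * ratio)))  # 过长则压缩到最大宽度
    resized = cv2.resize(img, (resized_w, img_h))
    padded = np.zeros((img_h, img_w, 3), dtype=img.dtype)
    padded[:, :resized_w, :] = resized
    return padded
```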

+

Q:如何识别带空格的英文行文本图像?

+

A:空格识别可以考虑以下两种方案:

+

(1)优化文本检测算法。检测结果在空格处将文本断开。这种方案在检测数据标注时,需要将含有空格的文本行分成好多段。

+

(2)优化文本识别算法。在识别字典里面引入空格字符,然后在识别的训练数据中,对文本中的空格进行标注。此外,合成数据时,可以通过拼接训练数据,生成含有空格的文本。

+

Q:弯曲文本有试过opencv的TPS进行弯曲校正吗?

+

A:opencv的tps需要标出上下边界对应的点,这个点很难通过传统方法或者深度学习方法获取。PaddleOCR里StarNet网络中的tps模块实现了自动学点,自动校正,可以直接尝试这个。

+

Q: 如何识别招牌或者广告图中的艺术字?

+

A: 招牌或者广告图中的艺术字是文本识别一个非常有挑战性的难题,因为艺术字中的单字和印刷体相比,变化非常大。如果需要识别的艺术字在一个词典列表内,可以将每个词条视为一个待识别的图像模板,通过通用图像检索识别系统解决识别问题,可以尝试使用PaddleClas的图像识别系统。

+

Q: 印章如何识别

+

A:(1)使用带TPS的识别网络或ABCNet;(2)先使用极坐标变换将印章图片拉平,再使用CRNN进行识别。

+

Q: 使用预训练模型进行预测,对于特定字符识别识别效果较差,怎么解决?

+

A: 由于我们所提供的识别模型是基于通用大规模数据集进行训练的,部分字符可能在训练集中包含较少,因此您可以构建特定场景的数据集,基于我们提供的预训练模型进行微调。建议用于微调的数据集中,每个字符出现的样本数量不低于300,但同时需要注意不同字符的数量均衡。具体可以参考:微调。

+

Q: 在使用训练好的识别模型进行预测的时候,发现有很多重复的字,这个怎么解决呢?

+

A:可以看下训练的尺度和预测的尺度是否相同,如果训练的尺度为[3, 32, 320],预测的尺度为[3, 64, 640],则会有比较多的重复识别现象。

+

Q: 图像正常识别出来的文字是OK的,旋转90度后识别出来的结果就比较差,有什么方法可以优化?

+

A: 整图旋转90度之后效果变差是有可能的,因为目前PPOCR默认输入的图片是正向的;可以自己训练一个整图的方向分类器,放在预测的最前端(可以参照现有方向分类器的方式),或者基于规则做一些预处理,比如判断长宽比等。

+

Q: 如何识别竹简上的古文?

+

A:对于字符都是普通的汉字字符的情况,只要标注足够的数据,finetune模型就可以了。如果数据量不足,您可以尝试StyleText工具。 +而如果使用的字符是特殊的古文字、甲骨文、象形文字等,那么首先需要构建一个古文字的字典,之后再进行训练。

+

Q: 只想要识别票据中的部分片段,重新训练它的话,只需要训练文本检测模型就可以了吗?问文本识别,方向分类还是用原来的模型这样可以吗?

+

A:可以的。PaddleOCR的检测、识别、方向分类器三个模型是独立的,在实际使用中可以优化和替换其中任何一个模型。

+

Q: 如何用PaddleOCR识别视频中的文字?

+

A: 目前PaddleOCR主要针对图像做处理,如果需要视频识别,可以先对视频抽帧,然后用PPOCR识别。

+

Q: 相机采集的图像为四通道,应该如何处理?

+

A: 有两种方式处理(对应的Python示例见下方):

+
    +
  • 如果没有其他需要,可以在解码数据的时候指定模式为三通道,例如如果使用opencv,可以使用cv::imread(img_path, cv::IMREAD_COLOR)。
  • +
  • 如果其他模块需要处理四通道的图像,那也可以在输入PaddleOCR模块之前进行转换,例如使用cvCvtColor(&img,img3chan,CV_RGBA2RGB)。
  • +
+
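下面给出与上述两种方式对应的Python(OpenCV)示意写法,其中图片路径为示例:

```python
# 示意:四通道图像的两种处理方式
import cv2

# 方式一:解码时直接指定为三通道
img = cv2.imread("test.png", cv2.IMREAD_COLOR)

# 方式二:先按原通道读入,再在送入PaddleOCR之前转换为三通道
img_rgba = cv2.imread("test.png", cv2.IMREAD_UNCHANGED)
if img_rgba is not None and img_rgba.ndim == 3 and img_rgba.shape[2] == 4:
    img = cv2.cvtColor(img_rgba, cv2.COLOR_BGRA2BGR)
```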

Q: 遇到中英文识别模型不支持的字符,该如何对模型做微调?

+

A:如果希望识别中英文识别模型中不支持的字符,需要更新识别的字典,并完成微调过程。比如说如果希望模型能够进一步识别罗马数字,可以按照以下步骤完成模型微调过程。

+
    +
  1. 准备中英文识别数据以及罗马数字的识别数据,用于训练,同时保证罗马数字和中英文识别数字的效果;
  2. +
  3. 修改默认的字典文件,在后面添加罗马数字的字符;
  4. +
  5. 下载PaddleOCR提供的预训练模型,配置预训练模型和数据的路径,开始训练。
  6. +
+

Q:特殊字符(例如一些标点符号)识别效果不好怎么办?

+

A:首先请您确认要识别的特殊字符是否在字典中。如果字符已经在字典中但效果依然不好,可能是识别数据较少导致的,您可以增加相应数据finetune模型。

+
+

Q:单张图上多语种并存识别(如单张图印刷体和手写文字并存),应该如何处理?

+

A:单张图像中存在多种类型文本的情况很常见,典型的以学生的试卷为代表,一张图像同时存在手写体和印刷体两种文本,这类情况下,可以尝试“1个检测模型 + 1个N分类模型 + N个识别模型”的解决方案。其中不同类型文本共用同一个检测模型;N分类模型指额外训练一个分类器,将检测到的文本进行分类,如手写+印刷的情况就是二分类,N种语言就是N分类;在识别部分,针对每种类型的文本单独训练一个识别模型,如手写+印刷的场景,就需要训练一个手写体识别模型和一个印刷体识别模型,如果一个文本框的分类结果是手写体,就传给手写体识别模型进行识别,其他情况同理。

+

Q: 多语言的字典里是混合了不同的语种,这个是有什么讲究吗?统一到一个字典里会对精度造成多大的损失?

+

A:统一到一个字典里,会造成最后一层FC过大,增加模型大小。如果有特殊需求的话,可以把需要的几种语言合并字典训练模型,合并字典之后如果引入过多的形近字,可能会造成精度损失,字符平衡的问题可能也需要考虑一下。在PaddleOCR里暂时将语言字典分开。

+

Q:类似泰语这样的小语种,部分字会占用两个字符甚至三个字符,请问如何制作字典

+

A:处理字符的时候,把多字符的当作一个字就行,字典中每行是一个字。

+
+

Q: 想把简历上的文字识别出来后,能够把关系一一对应起来,比如姓名和它后面的名字组成一对,籍贯、邮箱、学历等等都和各自的内容关联起来,这个应该如何处理,PPOCR目前支持吗?

+

A: 这样的需求在企业应用中确实比较常见,但往往都是个性化的需求,没有非常规整统一的处理方式。常见的处理方式有如下两种:

+
    +
  1. 对于单一版式、或者版式差异不大的应用场景,可以基于识别场景的一些先验信息,将识别内容进行配对; 比如运用表单结构信息:常见表单"姓名"关键字的后面,往往紧跟的就是名字信息
  2. +
  3. 对于版式多样,或者无固定版式的场景, 需要借助于NLP中的NER技术,给识别内容中的某些字段,赋予key值
  4. +
+

由于这部分需求和业务场景强相关,难以用一个统一的模型去处理,目前PPOCR暂不支持。 如果需要用到NER技术,可以参照Paddle团队的另一个开源套件: PaddlePaddle/ERNIE, 其提供的预训练模型ERNIE, 可以帮助提升NER任务的准确率。

+

1.6 训练过程与模型调优

+

Q: 增大batch_size模型训练速度没有明显提升

+

A:如果batch_size调得太大,加速效果不明显的话,可以试一下增大初始化内存的值,运行代码前设置环境变量:export FLAGS_initial_cpu_memory_in_mb=2000  # 设置初始化内存约2G左右

+

Q: 预测时提示图像过大,显存、内存溢出了,应该如何处理?

+

A:可以按照这个PR的修改来缓解显存、内存占用 #2230

+

Q: 识别训练时,训练集精度已经到达90了,但验证集精度一直在70,涨不上去怎么办?

+

A:训练集精度90,测试集70多的话,应该是过拟合了,有两个可尝试的方法:(1)加入更多的增广方式或者调大增广prob的概率,默认为0.4。(2)调大系统的l2 decay值

+

1.7 补充资料

+

Q: 对于小白如何快速入门中文OCR项目实践?

+

A:建议可以先了解OCR方向的基础知识,大概了解基础的检测和识别模型算法,然后在Github上查看OCR方向相关的repo。目前从内容完备性来看,PaddleOCR的中英文双语教程文档有明显优势,数据集、模型训练、预测部署等文档详实,可以快速上手,而且还有微信用户群答疑,非常适合学习实践。项目地址:PaddleOCR;AI 快车道课程:https://aistudio.baidu.com/aistudio/course/introduce/1519

+

2. PaddleOCR实战问题

+

2.1 PaddleOCR repo

+

Q: PaddleOCR develop分支和dygraph分支的区别?

+

A:目前PaddleOCR有四个分支,分别是:

+
    +
  • develop:基于Paddle静态图开发的分支,推荐使用paddle1.8 或者2.0版本,该分支具备完善的模型训练、预测、推理部署、量化裁剪等功能,领先于release/1.1分支。
  • +
  • release/1.1:PaddleOCR 发布的第一个稳定版本,基于静态图开发,具备完善的训练、预测、推理部署、量化裁剪等功能。
  • +
  • dygraph:基于Paddle动态图开发的分支,目前仍在开发中,未来将作为主要开发分支,运行要求使用Paddle2.0.0版本。
  • +
  • release/2.0-rc1-0:PaddleOCR发布的第二个稳定版本,基于动态图和paddle2.0版本开发,动态图开发的工程更易于调试,目前支持模型训练、预测,暂不支持移动端部署。
  • +
+

如果您已经上手过PaddleOCR,并且希望在各种环境上部署PaddleOCR,目前建议使用静态图分支,develop或者release/1.1分支。如果您是初学者,想快速训练,调试PaddleOCR中的算法,建议尝鲜PaddleOCR dygraph分支。

+

注意:develop和dygraph分支要求的Paddle版本、本地环境有差别,请注意不同分支环境安装部分的差异。

+

Q:PaddleOCR与百度的其他OCR产品有什么区别?

+

A:PaddleOCR主要聚焦通用ocr,如果有垂类需求,您可以用PaddleOCR+垂类数据自己训练; +如果缺少带标注的数据,或者不想投入研发成本,建议直接调用开放的API,开放的API覆盖了目前比较常见的一些垂类。

+

2.2 安装环境

+

Q:OSError: [WinError 126] 找不到指定的模块。mac pro python 3.4 shapely import 问题

+

A:这个问题是因为shapely库安装有误,可以参考 #212 这个issue重新安装一下

+

Q:PaddlePaddle怎么指定GPU运行 os.environ["CUDA_VISIBLE_DEVICES"]这种不生效

+

A:通过设置 export CUDA_VISIBLE_DEVICES='0'环境变量
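如果希望在Python代码内设置,一般需要保证在paddle初始化GPU之前(例如在import paddle之前)完成设置,示意如下:

```python
# 示意:在导入paddle之前设置可见GPU
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import paddle  # 之后paddle只会看到0号卡
```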

+

Q:PaddleOCR是否支持在Windows或Mac系统上运行?

+

A:PaddleOCR已完成Windows和Mac系统适配,运行时注意两点: +(1)在快速安装时,如果不想安装docker,可跳过第一步,直接从第二步安装paddle开始。 +(2)inference模型下载时,如果没有安装wget,可直接点击模型链接或将链接地址复制到浏览器进行下载,并解压放置到相应目录。

+

2.3 数据量说明

+

Q:简单的对于精度要求不高的OCR任务,数据集需要准备多少张呢?

+

A:(1)训练数据的数量和需要解决问题的复杂度有关系。难度越大,精度要求越高,则数据集需求越大,而且一般情况实际中的训练数据越多效果越好。

+

(2)对于精度要求不高的场景,检测任务和识别任务需要的数据量是不一样的。对于检测任务,500张图像可以保证基本的检测效果。对于识别任务,需要保证识别字典中每个字符出现在不同场景的行文本图像中的数目大于200张(举例,如果字典中有5个字,每个字都需要出现在200张图片以上,那么最少要求的图像数量应该在200-1000张之间),这样可以保证基本的识别效果。

+

Q:请问PaddleOCR项目中的中文超轻量和通用模型用了哪些数据集?训练多少样本,gpu什么配置,跑了多少个epoch,大概跑了多久?

+

A: +(1)检测的话,LSVT街景数据集共3W张图像,超轻量模型,150epoch左右,2卡V100 跑了不到2天;通用模型:2卡V100 150epoch 不到4天。 +(2)识别的话,520W左右的数据集(真实数据26W+合成数据500W)训练,超轻量模型:4卡V100,总共训练了5天左右。通用模型:4卡V100,共训练6天。

+

超轻量模型训练分为2个阶段: +(1)全量数据训练50epoch,耗时3天 +(2)合成数据+真实数据按照1:1数据采样,进行finetune训练200epoch,耗时2天

+

通用模型训练: +真实数据+合成数据,动态采样(1:1)训练,200epoch,耗时 6天左右。

+

Q:训练文字识别模型,真实数据有30w,合成数据有500w,需要做样本均衡吗?

+

A:需要,一般需要保证一个batch中真实数据样本和合成数据样本的比例是5:1~10:1左右效果比较理想。如果合成数据过大,会过拟合到合成数据,预测效果往往不佳。还有一种启发性的尝试是可以先用大量合成数据训练一个base模型,然后再用真实数据微调,在一些简单场景效果也是会有提升的。

+

Q: 当训练数据量少时,如何获取更多的数据?

+

A:当训练数据量少时,可以尝试以下三种方式获取更多的数据:(1)人工采集更多的训练数据,最直接也是最有效的方式。(2)基于PIL和opencv基本图像处理或者变换。例如PIL中ImageFont, Image, ImageDraw三个模块将文字写到背景中,opencv的旋转仿射变换,高斯滤波等。(3)利用数据生成算法合成数据,例如pix2pix等算法。
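下面是第(2)种方式的一个简单示意:用PIL把语料文字写到背景图上合成识别数据(背景图、字体文件路径与函数名均为自拟的示例):

```python
# 示意:将文字渲染到背景图上,快速合成文本行训练数据
from PIL import Image, ImageDraw, ImageFont

def synth_text_image(text, bg_path="bg.jpg", font_path="simfang.ttf", font_size=28):
    bg = Image.open(bg_path).convert("RGB")
    font = ImageFont.truetype(font_path, font_size)
    draw = ImageDraw.Draw(bg)
    draw.text((10, 10), text, fill=(0, 0, 0), font=font)
    return bg

synth_text_image("测试文本").save("synth_0.jpg")
```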

+

2.4 数据标注与生成

+
+

[!NOTE] +StyleText 已经移动到 PFCCLab/StyleText

+
+

Q: Style-Text 如何不做文字风格迁移,像普通文本合成程序一样用默认字体直接输出到分割出的背景图上?

+

A:使用image_synth模式会输出fake_bg.jpg,即为背景图。如果想要批量提取背景,可以稍微修改一下代码,将fake_bg保存下来即可。要修改的位置: +https://github.com/PaddlePaddle/PaddleOCR/blob/de3e2e7cd3b8b65ee02d7a41e570fa5b511a3c1d/StyleText/engine/synthesisers.py#L68

+

Q: 能否修改StyleText配置文件中的分辨率?

+

A:StyleText目前的训练数据主要是高度32的图片,建议不要改变高度。未来我们会支持更丰富的分辨率。

+

Q: StyleText是否可以更换字体文件?

+

A:StyleText项目中的字体文件为标准字体,主要用作模型的输入部分,不能够修改。 +StyleText的用途主要是:提取style_image中的字体、背景等style信息,根据语料生成同样style的图片。

+

Q: StyleText批量生成图片为什么没有输出?

+

A:需要检查一下您配置文件中的路径是否都存在,尤其要注意label_file配置。如果您使用的style_image输入没有label信息,也依然需要提供一个图片文件列表。

+

Q:使用StyleText进行数据合成时,文本(TextInput)的长度远超StyleInput的长度,该怎么处理与合成呢?

+

A:在使用StyleText进行数据合成的时候,建议StyleInput的长度长于TextInput的长度。有2种方法可以处理上述问题:

+
    +
  1. 将StyleInput按列的方向进行复制与扩充,直到其超过TextInput的长度。
  2. +
  3. 将TextInput进行裁剪,保证每段TextInput都稍短于StyleInput,分别合成之后,再拼接在一起。
  4. +
+

实际使用中发现,使用第2种方法的效果在长文本合成的场景中的合成效果更好,StyleText中提供的也是第2种数据合成的逻辑。

+

Q: StyleText 合成数据效果不好?

+

A:StyleText模型生成的数据主要用于OCR识别模型的训练。PaddleOCR目前识别模型的输入为32 x N,因此当前版本模型主要适用高度为32的数据。 +建议要合成的数据尺寸设置为32 x N。尺寸相差不多的数据也可以生成,尺寸很大或很小的数据效果确实不佳。

+

2.5 预训练模型与微调

+

Q:如何更换文本检测/识别的backbone?

+

A:无论是文字检测,还是文字识别,骨干网络的选择是预测效果和预测效率的权衡。一般,选择更大规模的骨干网络,例如ResNet101_vd,则检测或识别更准确,但预测耗时相应也会增加。而选择更小规模的骨干网络,例如MobileNetV3_small_x0_35,则预测更快,但检测或识别的准确率会大打折扣。幸运的是不同骨干网络的检测或识别效果与在ImageNet数据集图像1000分类任务效果正相关。飞桨图像分类套件PaddleClas汇总了ResNet_vd、Res2Net、HRNet、MobileNetV3、GhostNet等23种系列的分类网络结构,在上述图像分类任务的top1识别准确率,GPU(V100和T4)和CPU(骁龙855)的预测耗时以及相应的117个预训练模型下载地址。

+

(1)文字检测骨干网络的替换,主要是确定类似于ResNet的4个stages,以方便集成后续的类似FPN的检测头。此外,对于文字检测问题,使用ImageNet训练的分类预训练模型,可以加速收敛和效果提升。

+

(2)文字识别的骨干网络的替换,需要注意网络宽高stride的下降位置。由于文本识别一般宽高比例很大,因此高度下降频率少一些,宽度下降频率多一些。可以参考PaddleOCR中MobileNetV3骨干网络的改动。

+

Q: 参照文档做实际项目时,是重新训练还是在官方训练的基础上进行训练?具体如何操作?

+

A: 基于官方提供的模型进行finetune的话,收敛会更快一些。具体操作上,以识别模型训练为例:如果修改了字符文件,可以将pretrained_model设置为官方提供的预训练模型。

+

Q: 下载的识别模型解压后缺失文件,没有期望的inference.pdiparams, inference.pdmodel等文件

+

A:用解压软件解压可能会出现这个问题,建议二次解压下或者用命令行解压tar xf

+

Q: 为什么在checkpoints中load下载的预训练模型会报错?

+

A:这里有两个不同的概念:

+

pretrained_model:指预训练模型,是已经训练完成的模型。这时会load预训练模型的参数,但并不会load学习率、优化器以及训练状态等。如果需要finetune,应该使用pretrained_model。checkpoints:指之前训练的中间结果,例如前一次训练到了100个epoch,想接着训练。这时会尝试load所有信息,包括模型参数、之前的训练状态等。

+

Q: 如何对检测模型finetune,比如冻结前面的层或某些层使用小的学习率学习?

+

A:如果是冻结某些层,可以将变量的stop_gradient属性设置为True,这样计算这个变量之前的所有参数都不会更新了,参考:https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/faq/train_cn.html#id4

+

如果对某些层使用更小的学习率学习,静态图里还不是很方便,一个方法是在参数初始化的时候,给权重的属性设置固定的学习率,参考:https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/fluid/param_attr/ParamAttr_cn.html#paramattr

+

实际上我们实验发现,直接加载模型去fine-tune,不设置某些层不同学习率,效果也都不错

+

2.6 模型超参调整

+

Q: DB检测训练输入尺寸640,可以改大一些吗?

+

A:不建议改大。检测模型训练输入尺寸是预处理中random crop后的尺寸,并非直接将原图进行resize,多数场景下这个尺寸并不小了,改大后可能反而并不合适,而且训练会变慢。另外,代码里可能有的地方参数按照预设输入尺寸适配的,改大后可能有隐藏风险。

+

Q: 预处理部分,图片的长和宽为什么要处理成32的倍数?

+

A:以检测中的resnet骨干网络为例,图像输入网络之后,需要经过5次2倍降采样,共32倍,因此建议输入的图像尺寸为32的倍数。
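例如,可以按如下方式把任意尺寸向上补齐到32的倍数(示意,函数名为自拟):

```python
# 示意:将宽高向上取整到32的倍数
import math

def pad_to_multiple_of_32(h, w):
    return int(math.ceil(h / 32) * 32), int(math.ceil(w / 32) * 32)

print(pad_to_multiple_of_32(730, 1250))  # 输出 (736, 1280)
```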

+

Q: 在识别模型中,为什么降采样残差结构的stride为(2, 1)?

+

A: stride为(2, 1),表示在图像y方向(高度方向)上stride为2,x方向(宽度方向)上为1。由于待识别的文本图像通常为长方形,这样只在高度方向做下采样,尽量保留宽度方向的序列信息,避免宽度方向下采样后丢失过多的文字信息。

+

Q:训练识别时,如何选择合适的网络输入shape?

+

A:一般高度采用32,最长宽度的选择,有两种方法:

+

(1)统计训练样本图像的宽高比分布。最大宽高比的选取考虑满足80%的训练样本。

+

(2)统计训练样本文字数目。最长字符数目的选取考虑满足80%的训练样本。然后中文字符长宽比近似认为是1,英文认为3:1,预估一个最长宽度。
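第(1)种方式可以用如下脚本粗略估算(80分位数、高度32均为示例值,函数名为自拟):

```python
# 示意:统计训练图片宽高比的80分位数,估算识别模型的最大输入宽度
import numpy as np
from PIL import Image

def estimate_max_width(img_paths, height=32, percentile=80):
    ratios = []
    for path in img_paths:
        w, h = Image.open(path).size
        ratios.append(w / float(h))
    return int(np.ceil(np.percentile(ratios, percentile) * height))
```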

+

Q:识别模型框出来的位置太紧凑,会丢失边缘的文字信息,导致识别错误

+

A:可以在命令中加入 --det_db_unclip_ratio ,参数定义位置,这个参数是检测后处理时控制文本框大小的,默认1.6,可以尝试改成2.5或者更大,反之,如果觉得文本框不够紧凑,也可以把该参数调小。

+

2.7 模型结构

+

Q:文本识别训练不加LSTM是否可以收敛?

+

A:理论上是可以收敛的,加上LSTM模块主要是为了挖掘文字之间的序列关系,提升识别效果。对于有明显上下文语义的场景效果会比较明显。

+

Q:文本识别中LSTM和GRU如何选择?

+

A:从项目实践经验来看,序列模块采用LSTM的识别效果优于GRU,但是LSTM的计算量比GRU大一些,可以根据自己实际情况选择。

+

Q:对于CRNN模型,backbone采用DenseNet和ResNet_vd,哪种网络结构更好?

+

A:Backbone在CRNN模型上的识别效果,与其在ImageNet 1000类图像分类任务上的精度和效率趋势一致。在图像分类任务上,ResNet_vd(79%+)的识别精度明显优于DenseNet(77%+);此外对于GPU,Nvidia针对ResNet系列模型做了优化,预测效率更高,所以相对而言,ResNet_vd是较好选择。如果是移动端,可以优先考虑MobileNetV3系列。

+

Q: 如何根据不同的硬件平台选用不同的backbone?

+

A:在不同的硬件上,不同的backbone的速度优势不同,可以根据不同平台的速度-精度图来确定backbone,这里可以参考PaddleClas模型速度-精度图

+

2.8 PP-OCR系统

+

Q: 在PP-OCR系统中,文本检测的骨干网络为什么没有使用SE模块?

+

A:SE模块是MobileNetV3网络一个重要模块,目的是估计特征图每个特征通道重要性,给特征图每个特征分配权重,提高网络的表达能力。但是,对于文本检测,输入网络的分辨率比较大,一般是640*640,利用SE模块估计特征图每个特征通道重要性比较困难,网络提升能力有限,但是该模块又比较耗时,因此在PP-OCR系统中,文本检测的骨干网络没有使用SE模块。实验也表明,当去掉SE模块,超轻量模型大小可以减小40%,文本检测效果基本不受影响。详细可以参考PP-OCR技术文章,https://arxiv.org/abs/2009.09941.

+

Q: PP-OCR系统中,文本检测的结果有置信度吗?

+

A:文本检测的结果有置信度,由于推理过程中没有使用,所以没有显式地返回到最终结果中。如果需要文本检测结果的置信度,可以在文本检测DB的后处理代码的155行,添加scores信息。这样,在检测预测代码的197行,就可以拿到文本检测的scores信息。

+

Q: DB文本检测,特征提取网络金字塔构建的部分代码在哪儿?

+

A:特征提取网络金字塔构建的部分:代码位置。ppocr/modeling文件夹里面是组网相关的代码,其中architectures是文本检测或者文本识别整体流程代码;backbones是骨干网络相关代码;necks是类似于FPN的颈部(neck)模块代码;heads是提取文本检测或者文本识别预测结果相关的头函数;transforms是类似于TPS的特征预处理模块。更多的信息可以参考代码组织结构

+

Q:PaddleOCR如何做到横排和竖排同时支持的?

+

A:合成了一批竖排文字,逆时针旋转90度后加入训练集与横排一起训练。预测时根据图片长宽比判断是否为竖排,若为竖排则将crop出的文本逆时针旋转90度后送入识别网络。
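判断并旋转竖排文本框的处理可以写成如下示意(宽高比阈值1.5仅为示例,与PaddleOCR内部实现的阈值不一定一致,函数名为自拟):

```python
# 示意:高明显大于宽时认为是竖排文本,逆时针旋转90度后再送入识别网络
import cv2

def maybe_rotate_vertical(crop, ratio_thresh=1.5):
    h, w = crop.shape[:2]
    if h / float(w) >= ratio_thresh:
        crop = cv2.rotate(crop, cv2.ROTATE_90_COUNTERCLOCKWISE)
    return crop
```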

+

Q: 目前知识蒸馏有哪些主要的实践思路?

+

A:知识蒸馏即利用教师模型指导学生模型的训练,目前有3种主要的蒸馏思路:

+
    +
  1. 基于输出结果的蒸馏,即让学生模型学习教师模型的软标签(分类或者OCR识别等任务中)或者概率热度图(分割等任务中)。
  2. +
  3. 基于特征图的蒸馏,即让学生模型学习教师模型中间层的特征图,拟合中间层的一些特征。
  4. +
  5. 基于关系的蒸馏,针对不同的样本(假设个数为N),教师模型会有不同的输出,那么可以基于不同样本的输出,计算一个NxN的相关性矩阵,可以让学生模型去学习教师模型关于不同样本的相关性矩阵。
  6. +
+

当然,知识蒸馏方法日新月异,也欢迎大家提出更多的总结与建议。

+

Q: 识别的解码部分,使用贪心算法和beam search有什么区别?PaddleOCR采用的是哪种?

+

A:实验发现,使用贪心的方法去做解码,识别精度影响不大,但是速度方面的优势比较明显,因此PaddleOCR中使用贪心算法去做识别的解码。

+

2.9 端到端

+

Q: 端到端算法PGNet是否支持中文识别,速度会很慢嘛?

+

A:目前开源的PGNet算法模型主要是用于检测英文数字,对于中文的识别需要自己训练,大家可以使用开源的端到端中文数据集,而对于复杂文本(弯曲文本)的识别,也可以自己构造一批数据集针对进行训练,对于推理速度,可以先将模型转换为inference再进行预测,速度应该会相当可观。

+

Q: 端到端算法PGNet提供了两种后处理方式,两者之间有什么区别呢?

+

A: 两种后处理的区别主要在于推理速度。config中PostProcess有fast/slow两种模式:slow模式后处理速度慢、精度相对较高;fast模式后处理速度快,精度也在可接受的范围之内。建议使用fast这种速度快的后处理方式。

+

Q: 使用PGNet进行eval报错?

+

A: 需要注意,我们目前在release/2.1更新了评测代码,目前支持A,B两种评测模式:

+
    +
  • A模式:该模式主要为了方便用户使用,与训练集一样的标注文件就可以正常进行eval操作, 代码中默认是A模式。
  • +
  • B模式:该模式主要为了保证我们的评测代码可以和Total Text官方的评测方式对齐,该模式下直接加载官方提供的mat文件进行eval。
  • +
+

Q: PGNet有中文预训练模型吗?

+

A: 目前我们尚未提供针对中文的预训练模型,如有需要,可以尝试自己训练。具体需要修改的地方有:

+
    +
  1. config文件中,字典文件路径及语种设置;
  2. +
  3. 网络结构中out_channels修改为字典中的字符数目+1(考虑到空格);
  4. +
  5. loss中,修改37为字典中的字符数目+1(考虑到空格);
  6. +
+

Q: 用于PGNet的训练集,文本框的标注有要求吗?

+

A: PGNet支持多点标注,比如4点、8点、14点等。但需要注意的是,标注点尽可能分布均匀(相邻标注点间隔距离均匀一致),且label文件中的标注点需要从标注框的左上角开始,按标注点顺时针顺序依次编写,以上问题都可能对训练精度造成影响。 +我们提供的,基于Total Text数据集的PGNet预训练模型使用了14点标注方式。

+

Q: 用PGNet进行端到端训练时,数据集标注的点数必须统一吗?能不能只保证从左上角开始按顺时针标注,而点数不固定?

+

A: 目前代码要求标注为统一的点数。

+

2.10 模型效果与效果不一致

+

Q: PP-OCR检测效果不好,该如何优化?

+

A: 具体问题具体分析:(1)如果在你的场景上检测效果不可用,首选是在你的数据上做finetune训练;(2)如果图像过大、文字过于密集,建议不要过度压缩图像,可以尝试修改检测预处理的resize逻辑,防止图像被过度压缩;(3)检测框大小过于紧贴文字或检测框过大,可以调整db_unclip_ratio这个参数,加大参数可以扩大检测框,减小参数可以减小检测框大小;(4)检测框存在很多漏检问题,可以减小DB检测后处理的阈值参数det_db_box_thresh,防止一些检测框被过滤掉,也可以尝试设置det_db_score_mode为'slow';(5)其他方法可以选择use_dilation为True,对检测输出的feature map做膨胀处理,一般情况下会有效果改善。

+

Q:同一张图通用检测出21个条目,轻量级检测出26个 ,难道不是轻量级的好吗?

+

A:可以主要参考可视化效果,通用模型更倾向于检测一整行文字,轻量级可能会有一行文字被分成两段检测的情况,不是数量越多,效果就越好。

+

Q: DB有些框太贴文本了反而去掉了一些文本的边角影响识别,这个问题有什么办法可以缓解吗?

+

A:可以把后处理的参数unclip_ratio适当调大一点。

+

Q: 使用合成数据精调小模型后,效果可以,但是还没开源的小infer模型效果好,这是为什么呢?

+

A:(1)要保证使用的配置文件和pretrain weights是对应的;

+

(2)在微调时,一般都需要真实数据,如果使用合成数据,效果反而可能会有下降,PaddleOCR中放出的识别inference模型也是基于预训练模型在真实数据上微调得到的,效果提升比较明显;

+

(3)在训练的时候,文本长度超过25的训练图像都会被丢弃,因此需要看下真正参与训练的图像有多少,太少的话也容易过拟合。

+

Q: 表格识别中,如何提高单字的识别结果?

+

A: 首先需要确认一下检测模型有没有有效的检测出单个字符,如果没有的话,需要在训练集当中添加相应的单字数据集。

+

Q: 动态图分支(dygraph,release/2.0),训练模型和推理模型效果不一致

+

A:当前问题表现为:使用训练完的模型直接测试结果较好,但是转换为inference model后,预测结果不一致;出现这个问题一般是两个原因:

+
    +
  1. 预处理函数设置的不一致
  2. +
  3. 后处理参数不一致 repo中config.yml文件的前后处理参数和inference预测默认的超参数有不一致的地方,建议排查下训练模型预测和inference预测的前后处理, 参考issue。
  4. +
+

Q: 自己训练的det模型,在同一张图片上,inference模型与eval模型结果差别很大,为什么?

+

A:这是由于图片预处理不同造成的。如果训练的det模型图片输入并不是默认的shape [600, 600],eval程序中的图片预处理方式与train时一致(由xxx_reader.yml中的test_image_shape参数决定缩放大小),但predict_eval.py中的图片预处理方式由程序里的preprocess_params决定,此时最好不要传入max_side_len,而是传入和训练时一样大小的test_image_shape。

+

Q: 训练模型和测试模型的检测结果差距较大

+

A:1. 检查两个模型使用的后处理参数是否是一样的,训练的后处理参数在配置文件中的PostProcess部分,测试模型的后处理参数在tools/infer/utility.py中,最新代码中两个后处理参数已保持一致。

+

Q: PaddleOCR模型Python端预测和C++预测结果不一致?

+

A:正常来说,python端预测和C++预测文本是一致的,如果预测结果差异较大,建议首先排查diff出现在检测模型还是识别模型,或者尝试换其他模型是否有类似的问题。其次,检查python端和C++端数据处理部分是否存在差异,建议保存环境,更新PaddleOCR代码再试下。如果更新代码后仍未能解决,建议在PaddleOCR微信群里或者issue中抛出您的问题。

+

用户总结的排查步骤:https://github.com/PaddlePaddle/PaddleOCR/issues/2470

+

2.11 训练调试与配置文件

+

Q: 某个类别的样本比较少,通过增加训练的迭代次数或者是epoch,变相增加小样本的数目,这样能缓解这个问题么?

+

A: 尽量保证类别均衡, 某些类别样本少,可以通过补充合成数据的方式处理;实验证明训练集中出现频次较少的字符,识别效果会比较差,增加迭代次数不能改变样本量少的问题。

+

Q:文本检测换成自己的数据没法训练,有一些”###”是什么意思?

+

A:数据格式有问题,”###” 表示要被忽略的文本区域,所以你的数据都被跳过了,可以换成其他任意字符或者就写个空的。

+

Q:如何调试数据读取程序?

+

A:tools/train.py中有一个test_reader()函数用于调试数据读取。

+

Q:中文文本检测、文本识别构建训练集的话,大概需要多少数据量

+

A:检测需要的数据相对较少,在PaddleOCR模型的基础上进行Fine-tune,一般需要500张可达到不错的效果。 识别分英文和中文,一般英文场景需要几十万数据可达到不错的效果,中文则需要几百万甚至更多。

+

Q: config yml文件中的ratio_list参数的作用是什么?

+

A: 在动态图中,ratio_list在有多个数据源的情况下使用,ratio_list中的每个值是每个epoch从对应数据源采样数据的比例。如ratio_list=[0.3,0.2],label_file_list=['data1','data2'],代表每个epoch的训练数据包含data1 30%的数据,和data2里 20%的数据,ratio_list中数值的和不需要等于1。ratio_list和label_file_list的长度必须一致。

+

静态图检测数据采样的逻辑与动态图不同,但基本不影响训练精度。

+

在静态图中,使用 检测 dataloader读取数据时,会先设置每个epoch的数据量,比如这里设置为1000,ratio_list中的值表示在1000中的占比,比如ratio_list是[0.3, 0.7],则表示使用两个数据源,每个epoch从第一个数据源采样1000*0.3=300张图,从第二个数据源采样700张图。ratio_list的值的和也不需要等于1。

+

Q: iaa里面添加的数据增强方式,是每张图像训练都会做增强还是随机的?如何添加一个数据增强方法?

+

A:iaa增强的训练配置参考:这里。其中{ 'type': Fliplr, 'args': { 'p': 0.5 } } p是概率。新增数据增强,可以参考这个方法

+

Q: 怎么加速训练过程呢?

+

A:OCR模型训练过程中一般包含大量的数据增广,这些数据增广是比较耗时的,因此可以离线生成大量增广后的图像,直接送入网络进行训练,机器资源充足的情况下,也可以使用分布式训练的方法,可以参考分布式训练教程文档

+

Q: 一些特殊场景的数据识别效果差,但是数据量很少,不够用来finetune怎么办?

+

A:您可以合成一些接近使用场景的数据用于训练。 +我们计划推出基于特定场景的文本数据合成工具,请您持续关注PaddleOCR的近期更新。

+

Q: PaddleOCR可以识别灰度图吗?

+

A:PaddleOCR的模型均为三通道输入。如果您想使用灰度图作为输入,建议直接用3通道的模式读入灰度图, +或者将单通道图像转换为三通道图像再识别。例如,opencv的cvtColor函数就可以将灰度图转换为RGB三通道模式。
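例如,可以按如下方式把灰度图转换为三通道后再送入PaddleOCR(示意,图片路径为示例):

```python
# 示意:灰度图转三通道
import cv2

gray = cv2.imread("gray.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
```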

+

Q: 如何合成手写中文数据集?

+

A: 手写数据集可以通过手写单字数据集合成得到。随机选取一定数量的单字图片和对应的label,将图片高度resize为随机的统一高度后拼接在一起,即可得到合成数据集。对于需要添加文字背景的情况,建议使用阈值化将单字图片的白色背景处理为透明背景,再与真实背景图进行合成。具体可以参考文档手写数据集
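拼接单字图片的过程可以参考下面的简化示意(统一高度48仅为示例值,函数名为自拟;透明背景与真实背景合成的部分未包含):

```python
# 示意:将手写单字图片缩放到统一高度后水平拼接,合成文本行图像
import cv2
import numpy as np

def concat_chars(char_imgs, height=48):
    resized = []
    for im in char_imgs:
        h, w = im.shape[:2]
        new_w = max(1, int(w * height / h))
        resized.append(cv2.resize(im, (new_w, height)))
    return np.concatenate(resized, axis=1)
```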

+

Q:PaddleOCR默认不是200个step保存一次模型吗?为啥文件夹下面都没有生成

+

A:因为默认保存的起始点不是0,而是4000,将eval_batch_step [4000, 5000]改为[0, 2000] 就是从第0次迭代开始,每2000迭代保存一次模型

+

Q: PaddleOCR在训练的时候一直使用cosine_decay的学习率下降策略,这是为什么呢?

+

A:cosine_decay表示在训练的过程中,学习率按照cosine的变化趋势逐渐下降至0,在迭代轮数更长的情况下,比常量的学习率变化策略会有更好的收敛效果,因此在实际训练的时候,均采用了cosine_decay,来获得精度更高的模型。

+

Q: Cosine学习率的更新策略是怎样的?训练过程中为什么会在一个值上停很久?

+

A: Cosine学习率的说明可以参考这里

+

在PaddleOCR中,为了让学习率更加平缓,我们将其中的epoch调整成了iter。 +学习率的更新会和总的iter数量有关。当iter比较大时,会经过较多iter才能看出学习率的值有变化。

+

Q: 之前的CosineWarmup方法为什么不见了?

+

A: 我们对代码结构进行了调整,目前的Cosine可以覆盖原有的CosineWarmup的功能,只需要在配置文件中增加相应配置即可。 +例如下面的代码,可以设置warmup为2个epoch:

+
lr:
+  name: Cosine
+  learning_rate: 0.001
+  warmup_epoch: 2
+
+

Q: 训练识别和检测时学习率要加上warmup,目的是什么?

+

A: Warmup机制先使学习率从一个较小的值逐步升到一个较大的值,而不是直接就使用较大的学习率,这样有助于模型的稳定收敛。在OCR检测和OCR识别中,一般会带来精度~0.5%的提升。

+

Q: 关于dygraph分支中,文本识别模型训练,要使用数据增强应该如何设置?

+

A:可以参考配置文件Train['dataset']['transforms']添加RecAug字段,使数据增强生效。可以通过添加对aug_prob设置,表示每种数据增强采用的概率。aug_prob默认是0.4。详细设置可以参考ISSUE 1744

+

Q: 训练过程中,训练程序意外退出/挂起,应该如何解决?

+

A: 考虑内存,显存(使用GPU训练的话)是否不足,可在配置文件中,将训练和评估的batch size调小一些。需要注意,训练batch size调小时,学习率learning rate也要调小,一般可按等比例调整。

+

Q: 训练程序启动后直到结束,看不到训练过程log?

+

A: 可以从以下三方面考虑: + 1. 检查训练进程是否正常退出、显存占用是否释放、是否有残留进程,如果确定是训练程序卡死,可以检查环境配置,遇到环境问题建议使用docker,可以参考说明文档安装。 + 2. 检查数据集的数据量是否太小,可调小batch size从而增加一个epoch中的训练step数量,或在训练config文件中,将参数print_batch_step改为1,即每一个step打印一次log信息。 + 3. 如果使用私有数据集训练,可先用PaddleOCR提供/推荐的数据集进行训练,排查私有数据集是否存在问题。

+

Q: 配置文件中的参数num workers是什么意思,应该如何设置?

+

A: 训练数据的读取需要硬盘IO,而硬盘IO速度远小于GPU运算速度,为了避免数据读取成为训练速度瓶颈,可以使用多进程读取数据,num workers表示数据读取的进程数量,0表示不使用多进程读取。在Linux系统下,多进程读取数据时,进程间通信需要基于共享内存,因此使用多进程读取数据时,建议设置共享内存不低于2GB,最好可以达到8GB,此时,num workers可以设置为CPU核心数。如果机器硬件配置较低,或训练进程卡死、dataloader报错,可以将num workers设置为0,即不使用多进程读取数据。

+

2.12 预测

+

Q: 为什么PaddleOCR检测预测是只支持一张图片测试?即test_batch_size_per_card=1

+

A:测试的时候,对图像等比例缩放,最长边960,不同图像等比例缩放后长宽不一致,无法组成batch,所以将test_batch_size设置为1。
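这里的等比例缩放逻辑大致如下(简化示意,与实际实现细节可能不同;最长边960、补齐到32倍数等为常见设置,函数名为自拟):

```python
# 示意:检测预测前的等比例缩放,限制最长边不超过960,并补齐到32的倍数
import cv2

def resize_limit_side(img, limit_side_len=960):
    h, w = img.shape[:2]
    ratio = min(1.0, limit_side_len / float(max(h, w)))
    new_h = max(32, int(round(h * ratio / 32) * 32))
    new_w = max(32, int(round(w * ratio / 32) * 32))
    return cv2.resize(img, (new_w, new_h))
```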

+

Q: PaddleOCR支持tensorrt推理吗?

+

A:支持的,需要在编译的时候将CMakeLists.txt文件当中,将相关代码option(WITH_TENSORRT "Compile demo with TensorRT." OFF)的OFF改成ON。关于服务器端部署的更多设置,可以参考飞桨官网

+

Q: 如何使用TensorRT加速PaddleOCR预测?

+

A: 目前paddle的dygraph分支已经支持了python和C++ TensorRT预测的代码,python端inference预测时把参数--use_tensorrt=True即可, +C++TensorRT预测需要使用支持TRT的预测库并在编译时打开-DWITH_TENSORRT=ON。 +如果想修改其他分支代码支持TensorRT预测,可以参考PR

+

注:建议使用TensorRT大于等于6.1.0.5以上的版本。

+

Q: 为什么识别模型做预测的时候,预测图片的数量还会影响预测的精度

+

A: 推理时识别模型默认的batch_size=6, 如预测图片长度变化大,可能影响预测效果。如果出现上述问题可在推理的时候设置识别bs=1,命令如下:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_4.jpg" --rec_model_dir="./ch_PP-OCRv3_rec_infer/" --rec_batch_num=1
+
+

2.13 推理部署

+

Q:PaddleOCR模型推理方式有几种?各自的优缺点是什么

+

A:目前推理方式支持基于训练引擎推理和基于预测引擎推理。

+

(1)基于训练引擎推理不需要转换模型,但是需要先组网再load参数,语言只支持python,不适合系统集成。

+

(2)基于预测引擎的推理需要先转换模型为inference格式,然后可以进行不需要组网的推理,语言支持c++和python,适合系统集成。

+

Q:PaddleOCR中,对于模型预测加速,CPU加速的途径有哪些?基于TensorRT加速GPU对输入有什么要求?

+

A:(1)CPU可以使用mkldnn进行加速;对于python inference的话,可以把enable_mkldnn改为true,参考代码,对于cpp inference的话,可参考文档

+

(2)GPU需要注意变长输入问题等,TRT6 之后才支持变长输入

+

Q:hubserving、pdserving这两种部署方式区别是什么?

+

A:hubserving原本是paddlehub的配套服务部署工具,可以很方便的将paddlehub内置的模型部署为服务,paddleocr使用了这个功能,并将模型路径等参数暴露出来方便用户自定义修改。paddle serving是面向所有paddle模型的部署工具,文档中可以看到我们提供了快速版和标准版,其中快速版和hubserving的本质是一样的,而标准版基于rpc,更稳定,更适合分布式部署。

+

Q: 目前paddle hub serving 只支持 imgpath,如果我想用imgurl 去哪里改呢?

+

A:图片是在这里读取的, 可以参考下面的写法,将url path转化为np array

+
from urllib import request  # 需要的导入(示意)
+import numpy as np
+import cv2 as cv
+response = request.urlopen('http://i1.whymtj.com/uploads/tu/201902/9999/52491ae4ba.jpg')
+img_array = np.array(bytearray(response.read()), dtype=np.uint8)
+img = cv.imdecode(img_array, -1)
+
+

Q: C++ 端侧部署可以只对OCR的检测部署吗?

+

A:可以的,识别和检测模块是解耦的。如果想对检测部署,需要自己修改一下main函数, 只保留检测相关就可以: 参考

+

Q:服务部署可以只发布文本识别,而不带文本检测模型么?

+

A:可以的。默认的服务部署是检测和识别串联预测的,也支持单独发布文本检测或文本识别模型。比如使用PaddleHub部署PaddleOCR模型时,deploy下有三个文件夹,分别是:ocr_det(检测预测)、ocr_rec(识别预测)、ocr_system(检测识别串联预测)。

+

Q: lite预测库和nb模型版本不匹配,该如何解决?

+

A: 如果可以正常预测就不用管,如果这个问题导致无法正常预测,可以尝试使用同一个commit的Paddle Lite代码编译预测库和opt文件,可以参考移动端部署教程

+

Q:如何将PaddleOCR预测模型封装成SDK

+

A:如果是Python的话,可以使用tools/infer/predict_system.py中的TextSystem进行sdk封装,如果是c++的话,可以使用deploy/cpp_infer/src下面的DBDetector和CRNNRecognizer完成封装

+

Q:为什么PaddleOCR检测预测是只支持一张图片测试?即test_batch_size_per_card=1

+

A:测试的时候,对图像等比例缩放,最长边960,不同图像等比例缩放后长宽不一致,无法组成batch,所以将test_batch_size设置为1。

+

Q:为什么第一张图预测时间很长,第二张之后预测时间会降低?

+

A:第一张图需要显存资源初始化,耗时较多。完成模型加载后,之后的预测时间会明显缩短。

+

Q: 采用Paddle-Lite进行端侧部署,出现问题,环境没问题

+

A:如果你的预测库是自己编译的,那么你的nb文件也要自己编译,用同一个lite版本。不能直接用下载的nb文件,因为版本不同。

+

Q: 如何多进程运行paddleocr?

+

A:实例化多个paddleocr服务,然后将服务注册到注册中心,之后通过注册中心统一调度即可,关于注册中心,可以搜索eureka了解一下具体使用,其他的注册中心也行。

+

Q: 如何多进程预测?

+

A: 近期PaddleOCR新增了多进程预测控制参数use_mp表示是否使用多进程,total_process_num表示在使用多进程时的进程数。具体使用方式请参考文档

+

Q: 怎么解决paddleOCR在T4卡上有越预测越慢的情况?

+

A:

+
    +
  1. T4 GPU没有主动散热,因此在测试的时候需要在每次infer之后需要sleep 30ms,否则机器容易因为过热而降频(inference速度会变慢),温度过高也有可能会导致宕机。
  2. +
  3. T4在不使用的时候,也有可能会降频,因此在做benchmark的时候需要锁频,下面这两条命令可以进行锁频。
  4. +
+
nvidia-smi -i 0 -pm ENABLED
+nvidia-smi --lock-gpu-clocks=1590 -i 0
+
+

Q: 在windows上进行cpp inference的部署时,总是提示找不到paddle_fluid.dll或opencv_world346.dll

+

A:有2种方法可以解决这个问题:

+
    +
  1. 将paddle预测库和opencv库的地址添加到系统环境变量中。
  2. +
  3. 将提示缺失的dll文件拷贝到编译产出的ocr_system.exe文件夹中。
  4. +
+

Q: win下C++部署中文识别乱码的解决方法

+

A: win下编码格式不是utf8,而ppocr_keys_v1.txt的编码格式为utf8,将ppocr_keys_v1.txt的编码从utf-8修改为Ansi编码格式就可以了。

+

Q: windows 3060显卡GPU模式启动 加载模型慢

+

A: 30系列的显卡需要使用cuda11。

+

Q:想在Mac上部署,从哪里下载预测库呢?

+

A:Mac上的Paddle预测库可以从这里下载:https://paddle-inference-lib.bj.bcebos.com/mac/2.0.0/cpu_avx_openblas/paddle_inference.tgz

+

Q:内网环境如何进行服务化部署呢?

+

A:仍然可以使用PaddleServing或者HubServing进行服务化部署,保证内网地址可以访问即可。

+

Q: 使用hub_serving部署,延时较高,可能的原因是什么呀?

+

A: 首先,测试的时候第一张图延时较高,可以多测试几张然后观察后几张图的速度;其次,如果是在cpu端部署serving端模型(如backbone为ResNet34),耗时较慢,建议在cpu端部署mobile(如backbone为MobileNetV3)模型。

+

Q: 在使用PaddleLite进行预测部署时,启动预测后卡死/手机死机?

+

A: 请检查模型转换时所用PaddleLite的版本,和预测库的版本是否对齐。即PaddleLite版本为2.8,则预测库版本也要为2.8。

+

Q: 预测时显存爆炸、内存泄漏问题?

+

A: 打开显存/内存优化开关enable_memory_optim可以解决该问题,相关代码已合入,查看详情

+ + + + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/add_new_algorithm.html b/algorithm/add_new_algorithm.html new file mode 100644 index 0000000000..6cb30cd704 --- /dev/null +++ b/algorithm/add_new_algorithm.html @@ -0,0 +1,5617 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 使用PaddleOCR架构添加新算法 - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + +
+
+
+ + + +
+
+
+ + + +
+
+ + + + + + + + + + + + + + + + + + + + +

添加新算法

+

PaddleOCR将一个算法分解为以下几个部分,并对各部分进行模块化处理,方便快速组合出新的算法。

+

下面将分别对每个部分进行介绍,并介绍如何在该部分里添加新算法所需模块。

+

1. 数据加载和处理

+

数据加载和处理由不同的模块(module)组成,其完成了图片的读取、数据增强和label的制作。这一部分在ppocr/data下。 各个文件及文件夹作用说明如下:

+
1
+2
+3
+4
+5
+6
+7
+8
ppocr/data/
+├── imaug             # 图片的读取、数据增强和label制作相关的文件
+   ├── label_ops.py  # 对label进行变换的modules
+   ├── operators.py  # 对image进行变换的modules
+   ├──.....
+├── __init__.py
+├── lmdb_dataset.py   # 读取lmdb的数据集的dataset
+└── simple_dataset.py # 读取以`image_path\tgt`形式保存的数据集的dataset
+
+

PaddleOCR内置了大量图像操作相关模块,对于没有内置的模块可通过如下步骤添加:

+
    +
  1. ppocr/data/imaug 文件夹下新建文件,如my_module.py。
  2. +
  3. +

    在 my_module.py 文件内添加相关代码,示例代码如下:

    +
     1
    + 2
    + 3
    + 4
    + 5
    + 6
    + 7
    + 8
    + 9
    +10
    +11
    +12
    +13
    class MyModule:
    +    def __init__(self, *args, **kwargs):
    +        # your init code
    +        pass
    +
    +    def __call__(self, data):
    +        img = data['image']
    +        label = data['label']
    +        # your process code
    +
    +        data['image'] = img
    +        data['label'] = label
    +        return data
    +
    +
  4. +
  5. +

    ppocr/data/imaug/_init_.py 文件内导入添加的模块。

    +
  6. +
+

数据处理的所有处理步骤由不同的模块顺序执行而成,在config文件中按照列表的形式组合并执行。如:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
# angle class data process
+transforms:
+  - DecodeImage: # load image
+      img_mode: BGR
+      channel_first: False
+  - MyModule:
+      args1: args1
+      args2: args2
+  - KeepKeys:
+      keep_keys: [ 'image', 'label' ] # dataloader will return list in this order
+
+

2. 网络

+

网络部分完成了网络的组网操作,PaddleOCR将网络划分为四部分,这一部分在ppocr/modeling下。 进入网络的数据将按照顺序(transforms->backbones-> +necks->heads)依次通过这四个部分。

+
1
+2
+3
+4
+5
├── architectures # 网络的组网代码
+├── transforms    # 网络的图像变换模块
+├── backbones     # 网络的特征提取模块
+├── necks         # 网络的特征增强模块
+└── heads         # 网络的输出模块
+
+

PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的常用模块,对于没有内置的模块可通过如下步骤添加,四个部分添加步骤一致,以backbones为例:

+
    +
  1. ppocr/modeling/backbones 文件夹下新建文件,如my_backbone.py。
  2. +
  3. +

    在 my_backbone.py 文件内添加相关代码,示例代码如下:

    +
     1
    + 2
    + 3
    + 4
    + 5
    + 6
    + 7
    + 8
    + 9
    +10
    +11
    +12
    +13
    +14
    +15
    import paddle
    +import paddle.nn as nn
    +import paddle.nn.functional as F
    +
    +
    +class MyBackbone(nn.Layer):
    +    def __init__(self, *args, **kwargs):
    +        super(MyBackbone, self).__init__()
    +        # your init code
    +        self.conv = nn.xxxx
    +
    +    def forward(self, inputs):
    +        # your network forward
    +        y = self.conv(inputs)
    +        return y
    +
    +
  4. +
  5. +

    ppocr/modeling/backbones/_init_.py文件内导入添加的模块。

    +
  6. +
+

在完成网络的四部分模块添加之后,只需要配置文件中进行配置即可使用,如:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
+11
+12
+13
+14
+15
+16
Architecture:
+  model_type: rec
+  algorithm: CRNN
+  Transform:
+    name: MyTransform
+    args1: args1
+    args2: args2
+  Backbone:
+    name: MyBackbone
+    args1: args1
+  Neck:
+    name: MyNeck
+    args1: args1
+  Head:
+    name: MyHead
+    args1: args1
+
+

3. 后处理

+

后处理实现解码网络输出获得文本框或者识别到的文字。这一部分在ppocr/postprocess下。 +PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的后处理模块,对于没有内置的组件可通过如下步骤添加:

+
    +
  1. ppocr/postprocess 文件夹下新建文件,如 my_postprocess.py。
  2. +
  3. +

    在 my_postprocess.py 文件内添加相关代码,示例代码如下:

    +
     1
    + 2
    + 3
    + 4
    + 5
    + 6
    + 7
    + 8
    + 9
    +10
    +11
    +12
    +13
    +14
    +15
    +16
    +17
    +18
    +19
    +20
    +21
    +22
    +23
    +24
    +25
    +26
    import paddle
    +
    +
    +class MyPostProcess:
    +    def __init__(self, *args, **kwargs):
    +        # your init code
    +        pass
    +
    +    def __call__(self, preds, label=None, *args, **kwargs):
    +        if isinstance(preds, paddle.Tensor):
    +            preds = preds.numpy()
    +        # you preds decode code
    +        preds = self.decode_preds(preds)
    +        if label is None:
    +            return preds
    +        # you label decode code
    +        label = self.decode_label(label)
    +        return preds, label
    +
    +    def decode_preds(self, preds):
    +        # you preds decode code
    +        pass
    +
    +    def decode_label(self, preds):
    +        # you label decode code
    +        pass
    +
    +
  4. +
  5. +

    ppocr/postprocess/_init_.py文件内导入添加的模块。

    +
  6. +
+

在后处理模块添加之后,只需要配置文件中进行配置即可使用,如:

+
1
+2
+3
+4
PostProcess:
+  name: MyPostProcess
+  args1: args1
+  args2: args2
+
+

4. 损失函数

+

损失函数用于计算网络输出和label之间的距离。这一部分在ppocr/losses下。 +PaddleOCR内置了DB,EAST,SAST,CRNN和Attention等算法相关的损失函数模块,对于没有内置的模块可通过如下步骤添加:

+
    +
  1. ppocr/losses 文件夹下新建文件,如 my_loss.py。
  2. +
  3. +

    在 my_loss.py 文件内添加相关代码,示例代码如下:

    +
     1
    + 2
    + 3
    + 4
    + 5
    + 6
    + 7
    + 8
    + 9
    +10
    +11
    +12
    +13
    +14
    +15
    import paddle
    +from paddle import nn
    +
    +
    +class MyLoss(nn.Layer):
    +    def __init__(self, **kwargs):
    +        super(MyLoss, self).__init__()
    +        # you init code
    +        pass
    +
    +    def __call__(self, predicts, batch):
    +        label = batch[1]
    +        # your loss code
    +        loss = self.loss(input=predicts, label=label)
    +        return {'loss': loss}
    +
    +
  4. +
  5. +

    ppocr/losses/_init_.py文件内导入添加的模块。

    +
  6. +
+

在损失函数添加之后,只需要配置文件中进行配置即可使用,如:

+
1
+2
+3
+4
Loss:
+  name: MyLoss
+  args1: args1
+  args2: args2
+
+

5. 指标评估

+

指标评估用于计算网络在当前batch上的性能。这一部分在ppocr/metrics下。 PaddleOCR内置了检测,分类和识别等算法相关的指标评估模块,对于没有内置的模块可通过如下步骤添加:

+
    +
  1. ppocr/metrics 文件夹下新建文件,如my_metric.py。
  2. +
  3. +

    在 my_metric.py 文件内添加相关代码,示例代码如下:

    +
     1
    + 2
    + 3
    + 4
    + 5
    + 6
    + 7
    + 8
    + 9
    +10
    +11
    +12
    +13
    +14
    +15
    +16
    +17
    +18
    +19
    +20
    +21
    +22
    +23
    +24
    +25
    +26
    +27
    +28
    +29
    +30
    +31
    +32
    class MyMetric(object):
    +    def __init__(self, main_indicator='acc', **kwargs):
    +        # main_indicator is used for select best model
    +        self.main_indicator = main_indicator
    +        self.reset()
    +
    +    def __call__(self, preds, batch, *args, **kwargs):
    +        # preds is out of postprocess
    +        # batch is out of dataloader
    +        labels = batch[1]
    +        cur_correct_num = 0
    +        cur_all_num = 0
    +        # you metric code
    +        self.correct_num += cur_correct_num
    +        self.all_num += cur_all_num
    +        return {'acc': cur_correct_num / cur_all_num, }
    +
    +    def get_metric(self):
    +        """
    +        return metrics {
    +                'acc': 0,
    +                'norm_edit_dis': 0,
    +            }
    +        """
    +        acc = self.correct_num / self.all_num
    +        self.reset()
    +        return {'acc': acc}
    +
    +    def reset(self):
    +        # reset metric
    +        self.correct_num = 0
    +        self.all_num = 0
    +
    +
  4. +
  5. +

    ppocr/metrics/_init_.py文件内导入添加的模块。

    +
  6. +
+

在指标评估模块添加之后,只需要配置文件中进行配置即可使用,如:

+
1
+2
+3
Metric:
+  name: MyMetric
+  main_indicator: acc
+
+

6. 优化器

+

优化器用于训练网络。优化器内部还包含了网络正则化和学习率衰减模块。这一部分在ppocr/optimizer下。PaddleOCR内置了Momentum、Adam和RMSProp等常用的优化器模块,Linear、Cosine、Step和Piecewise等常用的学习率衰减模块,以及L1Decay、L2Decay等常用的正则化模块。对于没有内置的模块可通过如下步骤添加,以optimizer为例:

+
    +
  1. +

    ppocr/optimizer/optimizer.py 文件内创建自己的优化器,示例代码如下:

    +
     1
    + 2
    + 3
    + 4
    + 5
    + 6
    + 7
    + 8
    + 9
    +10
    +11
    +12
    +13
    from paddle import optimizer as optim
    +
    +
    +class MyOptim(object):
    +    def __init__(self, learning_rate=0.001, *args, **kwargs):
    +        self.learning_rate = learning_rate
    +
    +    def __call__(self, parameters):
    +        # It is recommended to wrap the built-in optimizer of paddle
    +        opt = optim.XXX(
    +            learning_rate=self.learning_rate,
    +            parameters=parameters)
    +        return opt
    +
    +
  2. +
+

在优化器模块添加之后,只需要配置文件中进行配置即可使用,如:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
Optimizer:
+  name: MyOptim
+  args1: args1
+  args2: args2
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  regularizer:
+    name: 'L2'
+    factor: 0
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/end_to_end/algorithm_e2e_pgnet.html b/algorithm/end_to_end/algorithm_e2e_pgnet.html new file mode 100644 index 0000000000..6dfe8cb436 --- /dev/null +++ b/algorithm/end_to_end/algorithm_e2e_pgnet.html @@ -0,0 +1,5672 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + PGNet - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

PGNet

+ +

一、简介

+

OCR算法可以分为两阶段算法和端对端的算法。二阶段OCR算法一般分为两个部分,文本检测和文本识别算法,文本检测算法从图像中得到文本行的检测框,然后识别算法去识别文本框中的内容。而端对端OCR算法可以在一个算法中完成文字检测和文字识别,其基本思想是设计一个同时具有检测单元和识别模块的模型,共享其中两者的CNN特征,并联合训练。由于一个算法即可完成文字识别,端对端模型更小,速度更快。

+

PGNet算法介绍

+

近些年来,端对端OCR算法得到了良好的发展,包括MaskTextSpotter系列、TextSnake、TextDragon、PGNet系列等算法。在这些算法中,PGNet算法具备其他算法不具备的优势,包括:

+
    +
  • 设计PGNet loss指导训练,不需要字符级别的标注
  • +
  • 不需要NMS和ROI相关操作,加速预测
  • +
  • 提出预测文本行内的阅读顺序模块;
  • +
  • 提出基于图的修正模块(GRM)来进一步提高模型识别性能
  • +
  • 精度更高,预测速度更快
  • +
+

PGNet算法细节详见论文 ,算法原理图如下所示:

+

+

输入图像经过特征提取送入四个分支,分别是:文本边缘偏移量预测TBO模块,文本中心线预测TCL模块,文本方向偏移量预测TDO模块,以及文本字符分类图预测TCC模块。 +其中TBO以及TCL的输出经过后处理后可以得到文本的检测结果,TCL、TDO、TCC负责文本识别。

+

其检测识别效果图如下:

+

+

+

性能指标

+

测试集: Total Text

+

测试环境: NVIDIA Tesla V100-SXM2-16GB

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PGNetAdet_precisiondet_recalldet_f_scoree2e_precisione2e_recalle2e_f_scoreFPS下载
Paper85.3086.8086.10--61.7038.20 (size=640)-
Ours87.0382.4884.6961.7158.4360.0348.73 (size=768)下载链接
+

note:PaddleOCR里的PGNet实现针对预测速度做了优化,在精度下降可接受范围内,可以显著提升端对端预测速度

+

二、环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目

+

三、快速使用

+

inference模型下载

+

本节以训练好的端到端模型为例,快速使用模型预测,首先下载训练好的端到端inference模型下载地址

+
1
+2
+3
mkdir inference && cd inference
+# 下载英文端到端模型并解压
+wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/e2e_server_pgnetA_infer.tar && tar xf e2e_server_pgnetA_infer.tar
+
+
    +
  • windows 环境下如果没有安装wget,下载模型时可将链接复制到浏览器中下载,并解压放置在相应目录下
  • +
+

解压完毕后应有如下文件结构:

+
1
+2
+3
+4
├── e2e_server_pgnetA_infer
+│   ├── inference.pdiparams
+│   ├── inference.pdiparams.info
+│   └── inference.pdmodel
+
+

单张图像或者图像集合预测

+
1
+2
+3
+4
+5
+6
+7
+8
# 预测image_dir指定的单张图像
+python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img623.jpg" --e2e_model_dir="./inference/e2e_server_pgnetA_infer/" --e2e_pgnet_valid_set="totaltext"
+
+# 预测image_dir指定的图像集合
+python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/" --e2e_model_dir="./inference/e2e_server_pgnetA_infer/" --e2e_pgnet_valid_set="totaltext"
+
+# 如果想使用CPU进行预测,需设置use_gpu参数为False
+python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img623.jpg" --e2e_model_dir="./inference/e2e_server_pgnetA_infer/" --e2e_pgnet_valid_set="totaltext" --use_gpu=False
+
+

可视化结果

+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'e2e_res'。结果示例如下:

+

+

四、模型训练、评估、推理

+

本节以totaltext数据集为例,介绍PaddleOCR中端到端模型的训练、评估与测试。

+

准备数据

+

下载解压totaltext 数据集到PaddleOCR/train_data/目录,数据集组织结构:

+
1
+2
+3
+4
+5
/PaddleOCR/train_data/total_text/train/
+  |- rgb/            # total_text数据集的训练数据
+      |- img11.jpg
+      | ...
+  |- train.txt       # total_text数据集的训练标注
+
+

train.txt标注文件格式如下,文件名和标注信息中间用"\t"分隔:

+
1
+2
" 图像文件名                    json.dumps编码的图像标注信息"
+rgb/img11.jpg    [{"transcription": "ASRAMA", "points": [[214.0, 325.0], [235.0, 308.0], [259.0, 296.0], [286.0, 291.0], [313.0, 295.0], [338.0, 305.0], [362.0, 320.0], [349.0, 347.0], [330.0, 337.0], [310.0, 329.0], [290.0, 324.0], [269.0, 328.0], [249.0, 336.0], [231.0, 346.0]]}, {...}]
+
+

json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 points 表示文本框各个标注点的坐标(x, y),从左上角的点开始顺时针排列。transcription 表示当前文本框的文字,当其内容为“###”时,表示该文本框无效,在训练时会跳过。如果您想在其他数据集上训练,可以按照上述形式构建标注文件。
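下面是解析这种标注文件中一行内容的简单示意(示例行中的points做了截断,仅用于演示格式):

```python
# 示意:解析train.txt中的一行标注,"\t"左侧为图片路径,右侧为json格式的标注列表
import json

line = 'rgb/img11.jpg\t[{"transcription": "ASRAMA", "points": [[214.0, 325.0], [235.0, 308.0]]}]'
img_path, label_str = line.strip().split("\t")
for item in json.loads(label_str):
    if item["transcription"] == "###":   # 无效文本框,训练时跳过
        continue
    print(img_path, item["transcription"], len(item["points"]))
```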

+

启动训练

+

PGNet训练分为两个步骤:step1: 在合成数据上训练,得到预训练模型,此时模型精度依然较低;step2: 加载预训练模型,在totaltext数据集上训练;为快速训练,我们直接提供了step1的预训练模型。

+
1
+2
+3
+4
+5
+6
+7
+8
+9
cd PaddleOCR/
+# 下载step1 预训练模型
+wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/train_step1.tar
+
+# 可以得到以下的文件格式
+./pretrain_models/train_step1/
+  └─ best_accuracy.pdopt
+  └─ best_accuracy.states
+  └─ best_accuracy.pdparams
+
+

如果您安装的是cpu版本,请将配置文件中的 use_gpu 字段修改为false

+
1
+2
+3
+4
# 单机单卡训练 e2e 模型
+python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.pretrained_model=./pretrain_models/train_step1/best_accuracy Global.load_static_weights=False
+# 单机多卡训练,通过 --gpus 参数设置使用的GPU ID
+python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.pretrained_model=./pretrain_models/train_step1/best_accuracy  Global.load_static_weights=False
+
+

上述指令中,通过-c 选择训练使用configs/e2e/e2e_r50_vd_pg.yml配置文件。 +有关配置文件的详细解释,请参考链接

+

您也可以通过-o参数在不需要修改yml文件的情况下,改变训练的参数,比如,调整训练的学习率为0.0001

+
python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Optimizer.base_lr=0.0001
+
+

断点训练

+

如果训练程序中断,如果希望加载训练中断的模型从而恢复训练,可以通过指定Global.checkpoints指定要加载的模型路径:

+
python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.checkpoints=./your/trained/model
+
+

注意Global.checkpoints的优先级高于Global.pretrain_weights的优先级,即同时指定两个参数时,优先加载Global.checkpoints指定的模型,如果Global.checkpoints指定的模型路径有误,会加载Global.pretrain_weights指定的模型。

+

PaddleOCR计算三个OCR端到端相关的指标,分别是:Precision、Recall、Hmean。

+

运行如下代码,根据配置文件e2e_r50_vd_pg.ymlsave_res_path指定的测试集检测结果文件,计算评估指标。

+

评估时设置后处理参数max_side_len=768,使用不同数据集、不同模型训练,可调整参数进行优化 +训练中模型参数默认保存在Global.save_model_dir目录下。在评估指标时,需要设置Global.checkpoints指向保存的参数文件。

+
python3 tools/eval.py -c configs/e2e/e2e_r50_vd_pg.yml  -o Global.checkpoints="{path/to/weights}/best_accuracy"
+
+

模型预测

+

测试单张图像的端到端识别效果

+
python3 tools/infer_e2e.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.infer_img="./doc/imgs_en/img_10.jpg" Global.pretrained_model="./output/e2e_pgnet/best_accuracy" Global.load_static_weights=false
+
+

测试文件夹下所有图像的端到端识别效果

+
python3 tools/infer_e2e.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.infer_img="./doc/imgs_en/" Global.pretrained_model="./output/e2e_pgnet/best_accuracy" Global.load_static_weights=false
+
+

预测推理

+

(1). 四边形文本检测模型(ICDAR2015)

+

首先将PGNet端到端训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,以英文数据集训练的模型为例模型下载地址 ,可以使用如下命令进行转换:

+
1
+2
wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/pgnet/en_server_pgnetA.tar && tar xf en_server_pgnetA.tar
+python3 tools/export_model.py -c configs/e2e/e2e_r50_vd_pg.yml -o Global.pretrained_model=./en_server_pgnetA/best_accuracy Global.load_static_weights=False Global.save_inference_dir=./inference/e2e
+
+

PGNet端到端模型推理,需要设置参数--e2e_algorithm="PGNet" 和 --e2e_pgnet_valid_set="partvgg",可以执行如下命令:

+
python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img_10.jpg" --e2e_model_dir="./inference/e2e/" --e2e_pgnet_valid_set="partvgg"
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'e2e_res'。结果示例如下:

+

+

(2). 弯曲文本检测模型(Total-Text)

+

对于弯曲文本样例

+

PGNet端到端模型推理,需要设置参数--e2e_algorithm="PGNet",同时,还需要增加参数--e2e_pgnet_valid_set="totaltext"可以执行如下命令:

+
python3 tools/infer/predict_e2e.py --e2e_algorithm="PGNet" --image_dir="./doc/imgs_en/img623.jpg" --e2e_model_dir="./inference/e2e/" --e2e_pgnet_valid_set="totaltext"
+
+

可视化文本端到端结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'e2e_res'。结果示例如下:

+

+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/end_to_end/images/e2e_res_img293_pgnet.png b/algorithm/end_to_end/images/e2e_res_img293_pgnet.png new file mode 100644 index 0000000000..232f8293ad Binary files /dev/null and b/algorithm/end_to_end/images/e2e_res_img293_pgnet.png differ diff --git a/algorithm/end_to_end/images/e2e_res_img295_pgnet.png b/algorithm/end_to_end/images/e2e_res_img295_pgnet.png new file mode 100644 index 0000000000..69337e3adf Binary files /dev/null and b/algorithm/end_to_end/images/e2e_res_img295_pgnet.png differ diff --git a/algorithm/end_to_end/images/e2e_res_img623_pgnet.jpg b/algorithm/end_to_end/images/e2e_res_img623_pgnet.jpg new file mode 100644 index 0000000000..b45dc05f7b Binary files /dev/null and b/algorithm/end_to_end/images/e2e_res_img623_pgnet.jpg differ diff --git a/algorithm/end_to_end/images/e2e_res_img_10_pgnet.jpg b/algorithm/end_to_end/images/e2e_res_img_10_pgnet.jpg new file mode 100644 index 0000000000..a0962993f8 Binary files /dev/null and b/algorithm/end_to_end/images/e2e_res_img_10_pgnet.jpg differ diff --git a/algorithm/end_to_end/images/pgnet_framework.png b/algorithm/end_to_end/images/pgnet_framework.png new file mode 100644 index 0000000000..88fbca3947 Binary files /dev/null and b/algorithm/end_to_end/images/pgnet_framework.png differ diff --git a/algorithm/formula_recognition/algorithm_rec_can.html b/algorithm/formula_recognition/algorithm_rec_can.html new file mode 100644 index 0000000000..91006219bb --- /dev/null +++ b/algorithm/formula_recognition/algorithm_rec_can.html @@ -0,0 +1,5514 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CAN - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

手写数学公式识别算法-CAN

+

1. 算法简介

+

论文信息:

+
+

When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition +Bohan Li, Ye Yuan, Dingkang Liang, Xiao Liu, Zhilong Ji, Jinfeng Bai, Wenyu Liu, Xiang Bai +ECCV, 2022

+
+

CAN使用CROHME手写公式数据集进行训练,在对应测试集上的精度如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件ExpRate下载链接
CANDenseNetrec_d28_can.yml51.72%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练CAN识别模型时需要更换配置文件CAN配置文件

+

启动训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_d28_can.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_d28_can.yml
+
+

注意:

+
    +
  • 我们提供的数据集,即CROHME数据集将手写公式存储为黑底白字的格式,若您自行准备的数据集与之相反,即以白底黑字模式存储,请在训练时做出如下修改
  • +
+
python3 tools/train.py -c configs/rec/rec_d28_can.yml -o Train.dataset.transforms.GrayImageChannelFormat.inverse=False
+
+
    +
  • 默认每训练1个epoch(1105次iteration)进行1次评估,若您更改训练的batch_size,或更换数据集,请在训练时作出如下修改
  • +
+
python3 tools/train.py -c configs/rec/rec_d28_can.yml -o Global.eval_batch_step=[0, {length_of_dataset//batch_size}]
+
+

3.2 评估

+

可下载已训练完成的模型文件,使用如下命令进行评估:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。若使用自行训练保存的模型,请注意修改路径和文件名为{path/to/weights}/{model_name}。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_d28_can.yml -o Global.pretrained_model=./rec_d28_can_train/best_accuracy.pdparams
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
+4
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/rec_d28_can.yml -o Architecture.Head.attdecoder.is_train=False Global.infer_img='./doc/datasets/crohme_demo/hme_00.jpg' Global.pretrained_model=./rec_d28_can_train/best_accuracy.pdparams
+
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/datasets/crohme_demo/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例(模型下载地址 ),可以使用如下命令进行转换:

+
1
+2
+3
+4
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/rec_d28_can.yml -o Global.pretrained_model=./rec_d28_can_train/best_accuracy.pdparams Global.save_inference_dir=./inference/rec_d28_can/ Architecture.Head.attdecoder.is_train=False
+
+# 目前的静态图模型默认的输出长度最大为36,如果您需要预测更长的序列,请在导出模型时指定其输出序列为合适的值,例如 Architecture.Head.max_text_length=72
+
+

注意:如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意检查配置文件中的character_dict_path是否为所需要的字典文件。

+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
/inference/rec_d28_can/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
+3
+4
+5
python3 tools/infer/predict_rec.py --image_dir="./doc/datasets/crohme_demo/hme_00.jpg" --rec_algorithm="CAN" --rec_batch_num=1 --rec_model_dir="./inference/rec_d28_can/" --rec_char_dict_path="./ppocr/utils/dict/latex_symbol_dict.txt"
+
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/datasets/crohme_demo/'。
+
+# 如果您需要在白底黑字的图片上进行预测,请设置 --rec_image_inverse=False
+
+

测试图片样例

+

执行命令后,上面图像的预测结果(识别的文本)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_hme/hme_00.jpg:['x _ { k } x x _ { k } + y _ { k } y x _ { k }', []]
+
+

注意

+
    +
  • 需要注意预测图像为黑底白字,即手写公式部分为白色,背景为黑色的图片。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中CAN的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

由于C++预处理后处理还未支持CAN,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+
    +
  1. CROHME数据集来自于CAN源repo
  2. +
+

引用

+
@misc{https://doi.org/10.48550/arxiv.2207.11463,
+  doi = {10.48550/ARXIV.2207.11463},
+  url = {https://arxiv.org/abs/2207.11463},
+  author = {Li, Bohan and Yuan, Ye and Liang, Dingkang and Liu, Xiao and Ji, Zhilong and Bai, Jinfeng and Liu, Wenyu and Bai, Xiang},
+  keywords = {Computer Vision and Pattern Recognition (cs.CV), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
+  title = {When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition},
+  publisher = {arXiv},
+  year = {2022},
+  copyright = {arXiv.org perpetual, non-exclusive license}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/formula_recognition/algorithm_rec_latex_ocr.html b/algorithm/formula_recognition/algorithm_rec_latex_ocr.html new file mode 100644 index 0000000000..428aec3ec5 --- /dev/null +++ b/algorithm/formula_recognition/algorithm_rec_latex_ocr.html @@ -0,0 +1,5455 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + LaTeX-OCR - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

印刷数学公式识别算法-LaTeX-OCR

+

1. 算法简介

+

原始项目:

+
+

https://github.com/lukas-blecher/LaTeX-OCR

+
+

LaTeX-OCR使用LaTeX-OCR印刷公式数据集进行训练,在对应测试集上的精度如下:

+ + + + + + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件BLEU scorenormed edit distanceExpRate下载链接
LaTeX-OCRHybrid ViTrec_latex_ocr.yml0.88210.082340.01%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

此外,需要安装额外的依赖: +

pip install -r docs/algorithm/formula_recognition/requirements.txt
+

+

3. 模型训练、评估、预测

+

3.1 pickle 标签文件生成

+

谷歌云盘中下载 formulae.zip 和 math.txt,之后,使用如下命令,生成 pickle 标签文件。

+
# 创建 LaTeX-OCR 数据集目录
+mkdir -p train_data/LaTeXOCR
+# 解压formulae.zip ,并拷贝math.txt
+unzip -d train_data/LaTeXOCR path/formulae.zip
+cp path/math.txt train_data/LaTeXOCR
+# 将原始的 .txt 文件转换为 .pkl 文件,从而对不同尺度的图像进行分组
+# 训练集转换
+python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/train --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
+# 验证集转换
+python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/val --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
+# 测试集转换
+python ppocr/utils/formula_utils/math_txt2pkl.py --image_dir=train_data/LaTeXOCR/test --mathtxt_path=train_data/LaTeXOCR/math.txt --output_dir=train_data/LaTeXOCR/
+
+

3.2 模型训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练LaTeX-OCR识别模型时需要更换配置文件LaTeX-OCR配置文件

+

启动训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下: +

#单卡训练 (默认训练方式)
+python3 tools/train.py -c configs/rec/rec_latex_ocr.yml
+#多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_latex_ocr.yml
+

+

注意:

+
    +
  • 默认每训练22个epoch(60000次iteration)进行1次评估,若您更改训练的batch_size,或更换数据集,请在训练时作出如下修改 +
    python3 tools/train.py -c configs/rec/rec_latex_ocr.yml -o Global.eval_batch_step=[0,{length_of_dataset//batch_size*22}]
    +
  • +
+

3.3 评估

+

可下载已训练完成的模型文件,使用如下命令进行评估:

+
# 注意将pretrained_model的路径设置为本地路径。若使用自行训练保存的模型,请注意修改路径和文件名为{path/to/weights}/{model_name}。
+# 验证集评估
+python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams
+# 测试集评估
+python3 tools/eval.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Eval.dataset.data_dir=./train_data/LaTeXOCR/test Eval.dataset.data=./train_data/LaTeXOCR/latexocr_test.pkl
+
+

3.4 预测

+

使用如下命令进行单张图片预测: +

# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/rec_latex_ocr.yml  -o  Global.infer_img='./docs/datasets/images/pme_demo/0000013.png' Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/datasets/pme_demo/'。
+

+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例(模型下载地址),可以使用如下命令进行转换:

+

# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/rec_latex_ocr.yml -o Global.pretrained_model=./rec_latex_ocr_train/best_accuracy.pdparams Global.save_inference_dir=./inference/rec_latex_ocr_infer/ 
+
+# 目前的静态图模型支持的最大输出长度为512
+
+注意: +- 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请检查配置文件中的rec_char_dict_path是否为所需要的字典文件。 +- 转换后模型下载地址

+

转换成功后,在目录下有三个文件: +

/inference/rec_latex_ocr_infer/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+

+

执行如下命令进行模型推理:

+

python3 tools/infer/predict_rec.py --image_dir='./docs/datasets/images/pme_demo/0000295.png' --rec_algorithm="LaTeXOCR" --rec_batch_num=1 --rec_model_dir="./inference/rec_latex_ocr_infer/"  --rec_char_dict_path="./ppocr/utils/dict/latex_ocr_tokenizer.json"
+
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/datasets/pme_demo/'。
+

+

测试图片样例

+

执行命令后,上面图像的预测结果(识别的文本)会打印到屏幕上,示例如下: +

Predicts of ./doc/datasets/pme_demo/0000295.png:\zeta_{0}(\nu)=-{\frac{\nu\varrho^{-2\nu}}{\pi}}\int_{\mu}^{\infty}d\omega\int_{C_{+}}d z{\frac{2z^{2}}{(z^{2}+\omega^{2})^{\nu+1}}}{\tilde{\Psi}}(\omega;z)e^{i\epsilon z}~~~,
+

+

注意

+
    +
  • 需要注意预测图像为白底黑字,即手写公式部分为黑色,背景为白色的图片。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中 LaTeX-OCR 的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

由于C++预处理后处理还未支持 LaTeX-OCR,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+
    +
  1. LaTeX-OCR 数据集来自于LaTeXOCR源repo
  2. +
+ + + + + + + + + + + + + + + + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/formula_recognition/images/hme_00.jpg b/algorithm/formula_recognition/images/hme_00.jpg new file mode 100644 index 0000000000..66ff27db26 Binary files /dev/null and b/algorithm/formula_recognition/images/hme_00.jpg differ diff --git a/algorithm/formula_recognition/requirements.txt b/algorithm/formula_recognition/requirements.txt new file mode 100644 index 0000000000..d2f0e739bf --- /dev/null +++ b/algorithm/formula_recognition/requirements.txt @@ -0,0 +1,2 @@ +tokenizers==0.19.1 +imagesize diff --git a/algorithm/kie/algorithm_kie_layoutxlm.html b/algorithm/kie/algorithm_kie_layoutxlm.html new file mode 100644 index 0000000000..834719e333 --- /dev/null +++ b/algorithm/kie/algorithm_kie_layoutxlm.html @@ -0,0 +1,5495 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + LayoutLM - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

关键信息抽取算法-LayoutXLM

+

1. 算法简介

+

论文信息:

+
+

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

+

Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei

+

2021

+
+

在XFUND_zh数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 任务 | 配置文件 | hmean | 下载链接
LayoutXLM | LayoutXLM-base | SER | ser_layoutxlm_xfund_zh.yml | 90.38% | 训练模型/推理模型
LayoutXLM | LayoutXLM-base | RE | re_layoutxlm_xfund_zh.yml | 74.83% | 训练模型/推理模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考关键信息抽取教程。PaddleOCR对代码进行了模块化,训练不同的关键信息抽取模型只需要更换配置文件即可。
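例如,以上表中SER任务的配置文件为例,训练命令的形式可参考如下(仅为示意,输出目录等参数为示例值,完整流程与数据准备请以关键信息抽取教程和配置文件为准):
python3 tools/train.py -c configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml -o Global.save_model_dir=./output/ser_layoutxlm/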

+

4. 推理部署

+

4.1 Python推理

+

SER

+

首先将训练得到的模型转换成inference model。以LayoutXLM模型在XFUND_zh数据集上训练的模型为例(模型下载地址),可以使用下面的命令进行转换。

+
wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar
+tar -xf ser_LayoutXLM_xfun_zh.tar
+python3 tools/export_model.py -c configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./ser_LayoutXLM_xfun_zh Global.save_inference_dir=./inference/ser_layoutxlm_infer
+
+

LayoutXLM模型基于SER任务进行推理,可以执行如下命令:

+
1
+2
+3
+4
+5
+6
+7
cd ppstructure
+python3 kie/predict_kie_token_ser.py \
+  --kie_algorithm=LayoutXLM \
+  --ser_model_dir=../inference/ser_layoutxlm_infer \
+  --image_dir=./docs/kie/input/zh_val_42.jpg \
+  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
+  --vis_font_path=../doc/fonts/simfang.ttf
+
+

SER可视化结果默认保存到./output文件夹里面,结果示例如下:

+

+

RE

+

首先将训练得到的模型转换成inference model。以LayoutXLM模型在XFUND_zh数据集上训练的模型为例(模型下载地址),可以使用下面的命令进行转换。

+
wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar
+tar -xf re_LayoutXLM_xfun_zh.tar
+python3 tools/export_model.py -c configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./re_LayoutXLM_xfun_zh Global.save_inference_dir=./inference/re_layoutxlm_infer
+
+

LayoutXLM模型基于RE任务进行推理,可以执行如下命令:

+
1
+2
+3
+4
+5
+6
+7
+8
cd ppstructure
+python3 kie/predict_kie_token_ser_re.py \
+  --kie_algorithm=LayoutXLM \
+  --re_model_dir=../inference/re_layoutxlm_infer \
+  --ser_model_dir=../inference/ser_layoutxlm_infer \
+  --image_dir=./docs/kie/input/zh_val_42.jpg \
+  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
+  --vis_font_path=../doc/fonts/simfang.ttf
+
+

RE可视化结果默认保存到./output文件夹里面,结果示例如下:

+

+

4.2 C++推理部署

+

暂不支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{DBLP:journals/corr/abs-2104-08836,
+  author    = {Yiheng Xu and
+               Tengchao Lv and
+               Lei Cui and
+               Guoxin Wang and
+               Yijuan Lu and
+               Dinei Flor{\^{e}}ncio and
+               Cha Zhang and
+               Furu Wei},
+  title     = {LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich
+               Document Understanding},
+  journal   = {CoRR},
+  volume    = {abs/2104.08836},
+  year      = {2021},
+  url       = {https://arxiv.org/abs/2104.08836},
+  eprinttype = {arXiv},
+  eprint    = {2104.08836},
+  timestamp = {Thu, 14 Oct 2021 09:17:23 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-08836.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+@article{DBLP:journals/corr/abs-1912-13318,
+  author    = {Yiheng Xu and
+               Minghao Li and
+               Lei Cui and
+               Shaohan Huang and
+               Furu Wei and
+               Ming Zhou},
+  title     = {LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
+  journal   = {CoRR},
+  volume    = {abs/1912.13318},
+  year      = {2019},
+  url       = {http://arxiv.org/abs/1912.13318},
+  eprinttype = {arXiv},
+  eprint    = {1912.13318},
+  timestamp = {Mon, 01 Jun 2020 16:20:46 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-1912-13318.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+@article{DBLP:journals/corr/abs-2012-14740,
+  author    = {Yang Xu and
+               Yiheng Xu and
+               Tengchao Lv and
+               Lei Cui and
+               Furu Wei and
+               Guoxin Wang and
+               Yijuan Lu and
+               Dinei A. F. Flor{\^{e}}ncio and
+               Cha Zhang and
+               Wanxiang Che and
+               Min Zhang and
+               Lidong Zhou},
+  title     = {LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
+  journal   = {CoRR},
+  volume    = {abs/2012.14740},
+  year      = {2020},
+  url       = {https://arxiv.org/abs/2012.14740},
+  eprinttype = {arXiv},
+  eprint    = {2012.14740},
+  timestamp = {Tue, 27 Jul 2021 09:53:52 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-2012-14740.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/kie/algorithm_kie_sdmgr.html b/algorithm/kie/algorithm_kie_sdmgr.html new file mode 100644 index 0000000000..20514ab056 --- /dev/null +++ b/algorithm/kie/algorithm_kie_sdmgr.html @@ -0,0 +1,5429 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + SDMGR - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

关键信息抽取算法-SDMGR

+

1. 算法简介

+

论文信息:

+
+

Spatial Dual-Modality Graph Reasoning for Key Information Extraction

+

Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang

+

2021

+
+

在wildreceipt发票公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | hmean | 下载链接
SDMGR | VGG6 | configs/kie/sdmgr/kie_unet_sdmgr.yml | 86.70% | 训练模型/推理模型(coming soon)
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

SDMGR是一个关键信息提取算法,将每个检测到的文本区域分类为预定义的类别,如订单ID、发票号码、金额等。

+

训练和测试的数据采用wildreceipt数据集,通过如下指令下载数据集:

+
wget https://paddleocr.bj.bcebos.com/ppstructure/dataset/wildreceipt.tar && tar xf wildreceipt.tar
+
+

创建数据集软链到PaddleOCR/train_data目录下:

+
1
+2
cd PaddleOCR/ && mkdir train_data && cd train_data
+ln -s ../../wildreceipt ./
+
+

3.1 模型训练

+

训练采用的配置文件是configs/kie/sdmgr/kie_unet_sdmgr.yml,配置文件中默认训练数据路径是train_data/wildreceipt,准备好数据后,可以通过如下指令执行训练:

+
python3 tools/train.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.save_model_dir=./output/kie/
+
+

3.2 模型评估

+

执行下面的命令进行模型评估

+
python3 tools/eval.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy
+
+

输出信息示例如下所示:

+
1
+2
+3
[2022/08/10 05:22:23] ppocr INFO: metric eval ***************
+[2022/08/10 05:22:23] ppocr INFO: hmean:0.8670120239257812
+[2022/08/10 05:22:23] ppocr INFO: fps:10.18816520530961
+
+

3.3 模型预测

+

执行下面的命令进行模型预测,预测的时候需要预先加载存储图片路径以及OCR信息的文本文件,使用Global.infer_img进行指定。

+
python3 tools/infer_kie.py -c configs/kie/sdmgr/kie_unet_sdmgr.yml -o Global.checkpoints=./output/kie/best_accuracy  Global.infer_img=./train_data/wildreceipt/1.txt
+
+

执行预测后的结果保存在./output/sdmgr_kie/predicts_kie.txt文件中,可视化结果保存在/output/sdmgr_kie/kie_results/目录下。

+

可视化结果如下图所示:

+

img

+

4. 推理部署

+

4.1 Python推理

+

暂不支持

+

4.2 C++推理部署

+

暂不支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@misc{sun2021spatial,
+      title={Spatial Dual-Modality Graph Reasoning for Key Information Extraction},
+      author={Hongbin Sun and Zhanghui Kuang and Xiaoyu Yue and Chenhao Lin and Wayne Zhang},
+      year={2021},
+      eprint={2103.14470},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/kie/algorithm_kie_vi_layoutxlm.html b/algorithm/kie/algorithm_kie_vi_layoutxlm.html new file mode 100644 index 0000000000..fb684e9169 --- /dev/null +++ b/algorithm/kie/algorithm_kie_vi_layoutxlm.html @@ -0,0 +1,5496 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + VI-LayoutXLM - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

关键信息抽取算法-VI-LayoutXLM

+

1. 算法简介

+

VI-LayoutXLM基于LayoutXLM进行改进,在下游任务训练过程中去除视觉骨干网络模块,在最终精度基本无损的情况下,模型推理速度进一步提升。

+

在XFUND_zh数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 任务 | 配置文件 | hmean | 下载链接
VI-LayoutXLM | VI-LayoutXLM-base | SER | ser_vi_layoutxlm_xfund_zh_udml.yml | 93.19% | 训练模型/推理模型
VI-LayoutXLM | VI-LayoutXLM-base | RE | re_vi_layoutxlm_xfund_zh_udml.yml | 83.92% | 训练模型/推理模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考关键信息抽取教程。PaddleOCR对代码进行了模块化,训练不同的关键信息抽取模型只需要更换配置文件即可。
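例如,训练SER任务的VI-LayoutXLM模型时,命令的形式可参考如下(仅为示意,输出目录为示例值,完整流程与数据准备请以关键信息抽取教程为准):
python3 tools/train.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o Global.save_model_dir=./output/ser_vi_layoutxlm/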

+

4. 推理部署

+

4.1 Python推理

+

SER

+

首先将训练得到的模型转换成inference model。以VI-LayoutXLM模型在XFUND_zh数据集上训练的模型为例(模型下载地址),可以使用下面的命令进行转换。

+
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/ser_vi_layoutxlm_xfund_pretrained.tar
+tar -xf ser_vi_layoutxlm_xfund_pretrained.tar
+python3 tools/export_model.py -c configs/kie/vi_layoutxlm/ser_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./ser_vi_layoutxlm_xfund_pretrained/best_accuracy Global.save_inference_dir=./inference/ser_vi_layoutxlm_infer
+
+

VI-LayoutXLM模型基于SER任务进行推理,可以执行如下命令:

+
1
+2
+3
+4
+5
+6
+7
+8
cd ppstructure
+python3 kie/predict_kie_token_ser.py \
+  --kie_algorithm=LayoutXLM \
+  --ser_model_dir=../inference/ser_vi_layoutxlm_infer \
+  --image_dir=./docs/kie/input/zh_val_42.jpg \
+  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
+  --vis_font_path=../doc/fonts/simfang.ttf \
+  --ocr_order_method="tb-yx"
+
+

SER可视化结果默认保存到./output文件夹里面,结果示例如下:

+

+

RE

+

首先将训练得到的模型转换成inference model。以VI-LayoutXLM模型在XFUND_zh数据集上训练的模型为例(模型下载地址),可以使用下面的命令进行转换。

+
wget https://paddleocr.bj.bcebos.com/ppstructure/models/vi_layoutxlm/re_vi_layoutxlm_xfund_pretrained.tar
+tar -xf re_vi_layoutxlm_xfund_pretrained.tar
+python3 tools/export_model.py -c configs/kie/vi_layoutxlm/re_vi_layoutxlm_xfund_zh.yml -o Architecture.Backbone.checkpoints=./re_vi_layoutxlm_xfund_pretrained/best_accuracy Global.save_inference_dir=./inference/re_vi_layoutxlm_infer
+
+

VI-LayoutXLM模型基于RE任务进行推理,可以执行如下命令:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
cd ppstructure
+python3 kie/predict_kie_token_ser_re.py \
+  --kie_algorithm=LayoutXLM \
+  --re_model_dir=../inference/re_vi_layoutxlm_infer \
+  --ser_model_dir=../inference/ser_vi_layoutxlm_infer \
+  --use_visual_backbone=False \
+  --image_dir=./docs/kie/input/zh_val_42.jpg \
+  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
+  --vis_font_path=../doc/fonts/simfang.ttf \
+  --ocr_order_method="tb-yx"
+
+

RE可视化结果默认保存到./output文件夹里面,结果示例如下:

+

+

4.2 C++推理部署

+

暂不支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{DBLP:journals/corr/abs-2104-08836,
+  author    = {Yiheng Xu and
+               Tengchao Lv and
+               Lei Cui and
+               Guoxin Wang and
+               Yijuan Lu and
+               Dinei Flor{\^{e}}ncio and
+               Cha Zhang and
+               Furu Wei},
+  title     = {LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich
+               Document Understanding},
+  journal   = {CoRR},
+  volume    = {abs/2104.08836},
+  year      = {2021},
+  url       = {https://arxiv.org/abs/2104.08836},
+  eprinttype = {arXiv},
+  eprint    = {2104.08836},
+  timestamp = {Thu, 14 Oct 2021 09:17:23 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-08836.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+@article{DBLP:journals/corr/abs-1912-13318,
+  author    = {Yiheng Xu and
+               Minghao Li and
+               Lei Cui and
+               Shaohan Huang and
+               Furu Wei and
+               Ming Zhou},
+  title     = {LayoutLM: Pre-training of Text and Layout for Document Image Understanding},
+  journal   = {CoRR},
+  volume    = {abs/1912.13318},
+  year      = {2019},
+  url       = {http://arxiv.org/abs/1912.13318},
+  eprinttype = {arXiv},
+  eprint    = {1912.13318},
+  timestamp = {Mon, 01 Jun 2020 16:20:46 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-1912-13318.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+@article{DBLP:journals/corr/abs-2012-14740,
+  author    = {Yang Xu and
+               Yiheng Xu and
+               Tengchao Lv and
+               Lei Cui and
+               Furu Wei and
+               Guoxin Wang and
+               Yijuan Lu and
+               Dinei A. F. Flor{\^{e}}ncio and
+               Cha Zhang and
+               Wanxiang Che and
+               Min Zhang and
+               Lidong Zhou},
+  title     = {LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding},
+  journal   = {CoRR},
+  volume    = {abs/2012.14740},
+  year      = {2020},
+  url       = {https://arxiv.org/abs/2012.14740},
+  eprinttype = {arXiv},
+  eprint    = {2012.14740},
+  timestamp = {Tue, 27 Jul 2021 09:53:52 +0200},
+  biburl    = {https://dblp.org/rec/journals/corr/abs-2012-14740.bib},
+  bibsource = {dblp computer science bibliography, https://dblp.org}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/kie/images/sdmgr_result.png b/algorithm/kie/images/sdmgr_result.png new file mode 100644 index 0000000000..6fa4fe8be7 Binary files /dev/null and b/algorithm/kie/images/sdmgr_result.png differ diff --git a/algorithm/kie/images/zh_val_42_re.jpg b/algorithm/kie/images/zh_val_42_re.jpg new file mode 100644 index 0000000000..49a0fad352 Binary files /dev/null and b/algorithm/kie/images/zh_val_42_re.jpg differ diff --git a/algorithm/kie/images/zh_val_42_ser.jpg b/algorithm/kie/images/zh_val_42_ser.jpg new file mode 100644 index 0000000000..d69d83569b Binary files /dev/null and b/algorithm/kie/images/zh_val_42_ser.jpg differ diff --git a/algorithm/overview.html b/algorithm/overview.html new file mode 100644 index 0000000000..a7b2f18a65 --- /dev/null +++ b/algorithm/overview.html @@ -0,0 +1,5797 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 概述 - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

前沿算法与模型

+

本文给出了PaddleOCR已支持的OCR算法列表,以及每个算法在英文公开数据集上的模型和指标,主要用于算法简介和算法性能对比,更多包括中文在内的其他数据集上的模型请参考PP-OCRv3 系列模型下载

+

PaddleOCR将持续新增支持OCR领域前沿算法与模型,欢迎广大开发者合作共建,贡献更多算法。

+

新增算法可参考教程:使用PaddleOCR架构添加新算法

+

1. 两阶段算法

+

1.1 文本检测算法

+

已支持的文本检测算法列表(戳链接获取使用教程):

+ +

在ICDAR2015文本检测公开数据集上,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | precision | recall | Hmean | 下载链接
EAST | ResNet50_vd | 88.71% | 81.36% | 84.88% | 训练模型
EAST | MobileNetV3 | 78.20% | 79.10% | 78.65% | 训练模型
DB | ResNet50_vd | 86.41% | 78.72% | 82.38% | 训练模型
DB | MobileNetV3 | 77.29% | 73.08% | 75.12% | 训练模型
SAST | ResNet50_vd | 91.39% | 83.77% | 87.42% | 训练模型
PSE | ResNet50_vd | 85.81% | 79.53% | 82.55% | 训练模型
PSE | MobileNetV3 | 82.20% | 70.48% | 75.89% | 训练模型
DB++ | ResNet50 | 90.89% | 82.66% | 86.58% | 合成数据预训练模型/训练模型
+

在Total-text文本检测公开数据集上,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | precision | recall | Hmean | 下载链接
SAST | ResNet50_vd | 89.63% | 78.44% | 83.66% | 训练模型
CT | ResNet18_vd | 88.68% | 81.70% | 85.05% | 训练模型
+

在CTW1500文本检测公开数据集上,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | precision | recall | Hmean | 下载链接
FCE | ResNet50_dcn | 88.39% | 82.18% | 85.27% | 训练模型
DRRG | ResNet50_vd | 89.92% | 80.91% | 85.18% | 训练模型
+

说明: SAST模型训练额外加入了icdar2013、icdar2017、COCO-Text、ArT等公开数据集进行调优。PaddleOCR用到的经过整理格式的英文公开数据集下载:

+ +

1.2 文本识别算法

+

已支持的文本识别算法列表(戳链接获取使用教程):

+ +

参考DTRB (3)文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | Avg Accuracy | 模型存储命名 | 下载链接
Rosetta | Resnet34_vd | 79.11% | rec_r34_vd_none_none_ctc | 训练模型
Rosetta | MobileNetV3 | 75.80% | rec_mv3_none_none_ctc | 训练模型
CRNN | Resnet34_vd | 81.04% | rec_r34_vd_none_bilstm_ctc | 训练模型
CRNN | MobileNetV3 | 77.95% | rec_mv3_none_bilstm_ctc | 训练模型
StarNet | Resnet34_vd | 82.85% | rec_r34_vd_tps_bilstm_ctc | 训练模型
StarNet | MobileNetV3 | 79.28% | rec_mv3_tps_bilstm_ctc | 训练模型
RARE | Resnet34_vd | 83.98% | rec_r34_vd_tps_bilstm_att | 训练模型
RARE | MobileNetV3 | 81.76% | rec_mv3_tps_bilstm_att | 训练模型
SRN | Resnet50_vd_fpn | 86.31% | rec_r50fpn_vd_none_srn | 训练模型
NRTR | NRTR_MTB | 84.21% | rec_mtb_nrtr | 训练模型
SAR | Resnet31 | 87.20% | rec_r31_sar | 训练模型
SEED | Aster_Resnet | 85.35% | rec_resnet_stn_bilstm_att | 训练模型
SVTR | SVTR-Tiny | 89.25% | rec_svtr_tiny_none_ctc_en | 训练模型
ViTSTR | ViTSTR | 79.82% | rec_vitstr_none_ce | 训练模型
ABINet | Resnet45 | 90.75% | rec_r45_abinet | 训练模型
VisionLAN | Resnet45 | 90.30% | rec_r45_visionlan | 训练模型
SPIN | ResNet32 | 90.00% | rec_r32_gaspin_bilstm_att | 训练模型
RobustScanner | ResNet31 | 87.77% | rec_r31_robustscanner | 训练模型
RFL | ResNetRFL | 88.63% | rec_resnet_rfl_att | 训练模型
ParseQ | VIT | 91.24% | rec_vit_parseq_synth | 训练模型
CPPD | SVTR-Base | 93.8% | rec_svtrnet_cppd_base_en | 训练模型
SATRN | ShallowCNN | 88.05% | rec_satrn | 训练模型
+

1.3 文本超分辨率算法

+

已支持的文本超分辨率算法列表(戳链接获取使用教程):

+ +

在TextZoom公开数据集上,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | PSNR_Avg | SSIM_Avg | 配置文件 | 下载链接
Text Gestalt | tsrn | 19.28 | 0.6560 | configs/sr/sr_tsrn_transformer_strock.yml | 训练模型
Text Telescope | tbsrn | 21.56 | 0.7411 | configs/sr/sr_telescope.yml | 训练模型
+

1.4 公式识别算法

+

已支持的公式识别算法列表(戳链接获取使用教程):

+ +

在CROHME手写公式数据集上,算法效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | ExpRate | 下载链接
CAN | DenseNet | rec_d28_can.yml | 51.72% | 训练模型
+

2. 端到端算法

+

已支持的端到端OCR算法列表(戳链接获取使用教程):

+ +

3. 表格识别算法

+

已支持的表格识别算法列表(戳链接获取使用教程):

+ +

在PubTabNet表格识别公开数据集上,算法效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | acc | 下载链接
TableMaster | TableResNetExtra | configs/table/table_master.yml | 77.47% | 训练模型 / 推理模型
+

4. 关键信息抽取算法

+

已支持的关键信息抽取算法列表(戳链接获取使用教程):

+ +

在wildreceipt发票公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | hmean | 下载链接
SDMGR | VGG6 | configs/kie/sdmgr/kie_unet_sdmgr.yml | 86.70% | 训练模型
+

在XFUND_zh公开数据集上,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 任务 | 配置文件 | hmean | 下载链接
VI-LayoutXLM | VI-LayoutXLM-base | SER | ser_vi_layoutxlm_xfund_zh_udml.yml | 93.19% | 训练模型
LayoutXLM | LayoutXLM-base | SER | ser_layoutxlm_xfund_zh.yml | 90.38% | 训练模型
LayoutLM | LayoutLM-base | SER | ser_layoutlm_xfund_zh.yml | 77.31% | 训练模型
LayoutLMv2 | LayoutLMv2-base | SER | ser_layoutlmv2_xfund_zh.yml | 85.44% | 训练模型
VI-LayoutXLM | VI-LayoutXLM-base | RE | re_vi_layoutxlm_xfund_zh_udml.yml | 83.92% | 训练模型
LayoutXLM | LayoutXLM-base | RE | re_layoutxlm_xfund_zh.yml | 74.83% | 训练模型
LayoutLMv2 | LayoutLMv2-base | RE | re_layoutlmv2_xfund_zh.yml | 67.77% | 训练模型
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/super_resolution/algorithm_sr_gestalt.html b/algorithm/super_resolution/algorithm_sr_gestalt.html new file mode 100644 index 0000000000..28e9ddb68c --- /dev/null +++ b/algorithm/super_resolution/algorithm_sr_gestalt.html @@ -0,0 +1,5434 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Text Gestalt - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

Text Gestalt

+

1. 算法简介

+

论文信息:

+
+

Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution +Chen, Jingye and Yu, Haiyang and Ma, Jianqi and Li, Bin and Xue, Xiangyang +AAAI, 2022

+
+

参考FudanOCR 数据下载说明,在TextZoom测试集合上超分算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | PSNR_Avg | SSIM_Avg | 配置文件 | 下载链接
Text Gestalt | tsrn | 19.28 | 0.6560 | configs/sr/sr_tsrn_transformer_strock.yml | 训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/sr/sr_tsrn_transformer_strock.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/sr/sr_tsrn_transformer_strock.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/sr/sr_tsrn_transformer_strock.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_sr.py -c configs/sr/sr_tsrn_transformer_strock.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words_en/word_52.png
+
+

img

+

执行命令后,上面图像的超分结果如下:

+

img

+

4. 推理部署

+

4.1 Python推理

+

首先将文本超分训练过程中保存的模型,转换成inference model。以 Text-Gestalt 训练的模型 为例,可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/sr/sr_tsrn_transformer_strock.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.save_inference_dir=./inference/sr_out
+
+

Text-Gestalt 文本超分模型推理,可以执行如下命令:

+
python3 tools/infer/predict_sr.py --sr_model_dir=./inference/sr_out --image_dir=doc/imgs_words_en/word_52.png --sr_image_shape=3,32,128
+
+

执行命令后,图像的超分结果如下:

+

img

+

4.2 C++推理

+

暂未支持

+

4.3 Serving服务化部署

+

暂未支持

+

4.4 更多推理部署

+

暂未支持

+

5. FAQ

+

引用

+
@inproceedings{chen2022text,
+  title={Text gestalt: Stroke-aware scene text image super-resolution},
+  author={Chen, Jingye and Yu, Haiyang and Ma, Jianqi and Li, Bin and Xue, Xiangyang},
+  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
+  volume={36},
+  number={1},
+  pages={285--293},
+  year={2022}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/super_resolution/algorithm_sr_telescope.html b/algorithm/super_resolution/algorithm_sr_telescope.html new file mode 100644 index 0000000000..d41160ef3c --- /dev/null +++ b/algorithm/super_resolution/algorithm_sr_telescope.html @@ -0,0 +1,5435 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Text Telescope - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

Text Telescope

+

1. 算法简介

+

论文信息:

+
+

Scene Text Telescope: Text-Focused Scene Image Super-Resolution +Chen, Jingye, Bin Li, and Xiangyang Xue +CVPR, 2021

+
+

参考FudanOCR 数据下载说明,在TextZoom测试集合上超分算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | PSNR_Avg | SSIM_Avg | 配置文件 | 下载链接
Text Telescope | tbsrn | 21.56 | 0.7411 | configs/sr/sr_telescope.yml | 训练模型
+

TextZoom数据集 来自两个超分数据集RealSR和SR-RAW,两个数据集都包含LR-HR对,TextZoom有17367对训练数据和4373对测试数据。

+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/sr/sr_telescope.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/sr/sr_telescope.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_sr.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words_en/word_52.png
+
+

img

+

执行命令后,上面图像的超分结果如下:

+

img

+

4. 推理部署

+

4.1 Python推理

+

首先将文本超分训练过程中保存的模型,转换成inference model。以 Text-Telescope 训练的模型 为例,可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/sr/sr_telescope.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.save_inference_dir=./inference/sr_out
+
+

Text-Telescope 文本超分模型推理,可以执行如下命令:

+
python3 tools/infer/predict_sr.py --sr_model_dir=./inference/sr_out --image_dir=doc/imgs_words_en/word_52.png --sr_image_shape=3,32,128
+
+

执行命令后,图像的超分结果如下:

+

img

+

4.2 C++推理

+

暂未支持

+

4.3 Serving服务化部署

+

暂未支持

+

4.4 更多推理部署

+

暂未支持

+

5. FAQ

+

引用

+
@INPROCEEDINGS{9578891,
+  author={Chen, Jingye and Li, Bin and Xue, Xiangyang},
+  booktitle={2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  title={Scene Text Telescope: Text-Focused Scene Image Super-Resolution},
+  year={2021},
+  volume={},
+  number={},
+  pages={12021-12030},
+  doi={10.1109/CVPR46437.2021.01185}}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/super_resolution/images/sr_word_52-20240704093810101.png b/algorithm/super_resolution/images/sr_word_52-20240704093810101.png new file mode 100644 index 0000000000..c983e9ad7a Binary files /dev/null and b/algorithm/super_resolution/images/sr_word_52-20240704093810101.png differ diff --git a/algorithm/super_resolution/images/sr_word_52-20240704094309205.png b/algorithm/super_resolution/images/sr_word_52-20240704094309205.png new file mode 100644 index 0000000000..c983e9ad7a Binary files /dev/null and b/algorithm/super_resolution/images/sr_word_52-20240704094309205.png differ diff --git a/algorithm/super_resolution/images/sr_word_52.png b/algorithm/super_resolution/images/sr_word_52.png new file mode 100644 index 0000000000..c983e9ad7a Binary files /dev/null and b/algorithm/super_resolution/images/sr_word_52.png differ diff --git a/algorithm/super_resolution/images/word_52-20240704094304807.png b/algorithm/super_resolution/images/word_52-20240704094304807.png new file mode 100644 index 0000000000..493c590183 Binary files /dev/null and b/algorithm/super_resolution/images/word_52-20240704094304807.png differ diff --git a/algorithm/super_resolution/images/word_52.png b/algorithm/super_resolution/images/word_52.png new file mode 100644 index 0000000000..493c590183 Binary files /dev/null and b/algorithm/super_resolution/images/word_52.png differ diff --git a/algorithm/table_recognition/algorithm_table_master.html b/algorithm/table_recognition/algorithm_table_master.html new file mode 100644 index 0000000000..97aa8d92fa --- /dev/null +++ b/algorithm/table_recognition/algorithm_table_master.html @@ -0,0 +1,5369 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TableMaster - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

表格识别算法-TableMASTER

+

1. 算法简介

+

论文信息:

+
+

TableMaster: PINGAN-VCGROUP’S SOLUTION FOR ICDAR 2021 COMPETITION ON SCIENTIFIC LITERATURE PARSING TASK B: TABLE RECOGNITION TO HTML +Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong +2021

+
+

在PubTabNet表格识别公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | acc | 下载链接
TableMaster | TableResNetExtra | configs/table/table_master.yml | 77.47% | 训练模型/推理模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

上述TableMaster模型使用PubTabNet表格识别公开数据集训练得到,数据集下载可参考 table_datasets

+

数据下载完成后,请参考文本识别教程进行训练。PaddleOCR对代码进行了模块化,训练不同的模型只需要更换配置文件即可。
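例如,使用上表中的配置文件启动训练时,命令的形式可参考如下(仅为示意,单卡/多卡方式与具体超参数请以文本识别教程为准):
# 多卡训练,通过--gpus参数指定卡号
python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/table/table_master.yml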

+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。以基于TableResNetExtra骨干网络,在PubTabNet数据集训练的模型为例(模型下载地址),可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/table/table_master.yml -o Global.pretrained_model=output/table_master/best_accuracy Global.save_inference_dir=./inference/table_master
+
+

注意: 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请检查配置文件中的character_dict_path是否为正确的字典文件。

+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
./inference/table_master/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
+3
cd ppstructure/
+python3.7 table/predict_structure.py --table_model_dir=../output/table_master/table_structure_tablemaster_infer/ --table_algorithm=TableMaster --table_char_dict_path=../ppocr/utils/dict/table_master_structure_dict.txt --table_max_len=480 --image_dir=docs/table/table.jpg
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='docs/table'。
+
+

执行命令后,上面图像的预测结果(结构信息和表格中每个单元格的坐标)会打印到屏幕上,同时会保存单元格坐标的可视化结果,结果示例如下:

+
1
+2
+3
+4
+5
[2022/06/16 13:06:54] ppocr INFO: result: ['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>'], [[72.17591094970703, 10.759100914001465, 60.29658508300781, 16.6805362701416], [161.85562133789062, 10.884308815002441, 14.9495210647583, 16.727018356323242], [277.79876708984375, 29.54340362548828, 31.490320205688477, 18.143272399902344],
+...
+[336.11724853515625, 280.3601989746094, 39.456939697265625, 18.121286392211914]]
+[2022/06/16 13:06:54] ppocr INFO: save vis result to ./output/table.jpg
+[2022/06/16 13:06:54] ppocr INFO: Predict time of docs/table/table.jpg: 17.36806297302246
+
+

注意

+
    +
  • TableMaster在推理时速度较慢,建议使用GPU进行推理。
  • +
+

4.2 C++推理部署

+

由于C++预处理、后处理还未支持TableMaster,所以暂不支持C++推理部署。

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{ye2021pingan,
+  title={PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML},
+  author={Ye, Jiaquan and Qi, Xianbiao and He, Yelin and Chen, Yihao and Gu, Dengyi and Gao, Peng and Xiao, Rong},
+  journal={arXiv preprint arXiv:2105.01848},
+  year={2021}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/table_recognition/algorithm_table_slanet.html b/algorithm/table_recognition/algorithm_table_slanet.html new file mode 100644 index 0000000000..005f2262fe --- /dev/null +++ b/algorithm/table_recognition/algorithm_table_slanet.html @@ -0,0 +1,5381 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + TableSLANet - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

表格识别算法-SLANet-LCNetV2

+

1. 算法简介

+

该算法由来自北京交通大学机器学习与认识计算研究团队的ocr识别队研发,在PaddleOCR算法模型挑战赛 - 赛题二:通用表格识别任务中荣获排行榜一等奖,排行榜精度相比PP-Structure表格识别模型提升0.8%,推理速度提升3倍。优化思路如下:

+
    +
  1. 改善推理过程,至EOS停止,速度提升3倍;
  2. +
  3. 升级Backbone为LCNetV2(SSLD版本);
  4. +
  5. 行列特征增强模块;
  6. +
  7. 提升分辨率488至512;
  8. +
  9. 三阶段训练策略。
  10. +
+

在PubTabNet表格识别公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | acc | 下载链接
SLANet | LCNetV2 | configs/table/SLANet_lcnetv2.yml | 76.67% | 训练模型 / 推理模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

上述SLANet_LCNetv2模型使用PubTabNet表格识别公开数据集训练得到,数据集下载可参考 table_datasets

+

启动训练

+

数据下载完成后,请参考文本识别教程进行训练。PaddleOCR对代码进行了模块化,训练不同的模型只需要更换配置文件即可。

+

训练命令如下:

+
1
+2
+3
+4
# stage1
+python3 -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/table/SLANet_lcnetv2.yml
+# stage2 加载stage1的best model作为预训练模型,学习率调整为0.0001;
+# stage3 加载stage2的best model作为预训练模型,不调整学习率,将配置文件中所有的488修改为512.
+
+

4. 推理部署

+

4.1 Python推理

+

将训练得到best模型,转换成inference model,可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/table/SLANet_lcnetv2.yml -o Global.pretrained_model=path/best_accuracy Global.save_inference_dir=./inference/slanet_lcnetv2_infer
+
+

注意: 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请检查配置文件中的character_dict_path是否为正确的字典文件。

+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
./inference/slanet_lcnetv2_infer/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
+3
cd ppstructure/
+python table/predict_structure.py --table_model_dir=../inference/slanet_lcnetv2_infer/ --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt --image_dir=docs/table/table.jpg --output=../output/table_slanet_lcnetv2 --use_gpu=False --benchmark=True --enable_mkldnn=True --table_max_len=512
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='docs/table'。
+
+

执行命令后,上面图像的预测结果(结构信息和表格中每个单元格的坐标)会打印到屏幕上,同时会保存单元格坐标的可视化结果,结果示例如下:

+
1
+2
+3
+4
+5
[2022/06/16 13:06:54] ppocr INFO: result: ['<html>', '<body>', '<table>', '<thead>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</thead>', '<tbody>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '<tr>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '<td></td>', '</tr>', '</tbody>', '</table>', '</body>', '</html>'], [[72.17591094970703, 10.759100914001465, 60.29658508300781, 16.6805362701416], [161.85562133789062, 10.884308815002441, 14.9495210647583, 16.727018356323242], [277.79876708984375, 29.54340362548828, 31.490320205688477, 18.143272399902344],
+...
+[336.11724853515625, 280.3601989746094, 39.456939697265625, 18.121286392211914]]
+[2022/06/16 13:06:54] ppocr INFO: save vis result to ./output/table.jpg
+[2022/06/16 13:06:54] ppocr INFO: Predict time of docs/table/table.jpg: 17.36806297302246
+
+

4.2 C++推理部署

+

由于C++预处理、后处理还未支持SLANet,暂不支持C++推理部署。

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/algorithm_det_ct.html b/algorithm/text_detection/algorithm_det_ct.html new file mode 100644 index 0000000000..83f853182e --- /dev/null +++ b/algorithm/text_detection/algorithm_det_ct.html @@ -0,0 +1,5343 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CT - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

CT

+

1. 算法简介

+

论文信息:

+
+

CentripetalText: An Efficient Text Instance Representation for Scene Text Detection +Tao Sheng, Jie Chen, Zhouhui Lian +NeurIPS, 2021

+
+

在Total-Text文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
CT | ResNet18_vd | configs/det/det_r18_vd_ct.yml | 88.68% | 81.70% | 85.05% | 训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

CT模型使用Total-Text文本检测公开数据集训练得到,数据集下载可参考 Total-Text-Dataset, 我们将标签文件转成了paddleocr格式,转换好的标签文件下载参考train.txt, text.txt

+

请参考文本检测训练教程。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要更换配置文件即可。
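例如,使用上表中的配置文件训练CT模型时,命令的形式可参考如下(仅为示意,数据准备与具体参数请以文本检测训练教程为准):
python3 tools/train.py -c configs/det/det_r18_vd_ct.yml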

+

4. 推理部署

+

4.1 Python推理

+

首先将CT文本检测训练过程中保存的模型,转换成inference model。以基于Resnet18_vd骨干网络,在Total-Text英文数据集训练的模型为例( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/det/det_r18_vd_ct.yml -o Global.pretrained_model=./det_r18_ct_train/best_accuracy  Global.save_inference_dir=./inference/det_ct
+
+

CT文本检测模型推理,可以执行如下命令:

+
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_ct/" --det_algorithm="CT"
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为det_res。结果示例如下:

+

img

+

4.2 C++推理

+

暂不支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@inproceedings{sheng2021centripetaltext,
+    title={CentripetalText: An Efficient Text Instance Representation for Scene Text Detection},
+    author={Tao Sheng and Jie Chen and Zhouhui Lian},
+    booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
+    year={2021}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/algorithm_det_db.html b/algorithm/text_detection/algorithm_det_db.html new file mode 100644 index 0000000000..5d195a0a71 --- /dev/null +++ b/algorithm/text_detection/algorithm_det_db.html @@ -0,0 +1,5403 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + DB与DB++ - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

DB与DB++

+

1. 算法简介

+

论文信息:

+
+

Real-time Scene Text Detection with Differentiable Binarization +Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang +AAAI, 2020

+

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion +Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang +TPAMI, 2022

+
+

在ICDAR2015文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
DB | ResNet50_vd | configs/det/det_r50_vd_db.yml | 86.41% | 78.72% | 82.38% | 训练模型
DB | MobileNetV3 | configs/det/det_mv3_db.yml | 77.29% | 73.08% | 75.12% | 训练模型
DB++ | ResNet50 | configs/det/det_r50_db++_icdar15.yml | 90.89% | 82.66% | 86.58% | 合成数据预训练模型/训练模型
+

在TD_TR文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
DB++ | ResNet50 | configs/det/det_r50_db++_td_tr.yml | 92.92% | 86.48% | 89.58% | 合成数据预训练模型/训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本检测训练教程。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要更换配置文件即可。
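例如,训练上表中基于ResNet50_vd骨干网络的DB模型时,命令的形式可参考如下(仅为示意,数据准备与具体参数请以文本检测训练教程为准):
python3 tools/train.py -c configs/det/det_r50_vd_db.yml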

+

4. 推理部署

+

4.1 Python推理

+

首先将DB文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/det/det_r50_vd_db.yml -o Global.pretrained_model=./det_r50_vd_db_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_db
+
+

DB文本检测模型推理,可以执行如下命令:

+
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_db/" --det_algorithm="DB"
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为det_res。结果示例如下:

+

img

+

注意:由于ICDAR2015数据集只有1000张训练图像,且主要针对英文场景,所以上述模型对中文文本图像检测效果会比较差。

+

4.2 C++推理

+

准备好推理模型后,参考cpp infer教程进行操作即可。

+

4.3 Serving服务化部署

+

准备好推理模型后,参考pdserving教程进行Serving服务化部署,包括Python Serving和C++ Serving两种模式。

+

4.4 更多推理部署

+

DB模型还支持以下推理部署方式:

+
    +
  • Paddle2ONNX推理:准备好推理模型后,参考paddle2onnx教程操作。
  • +
+

5. FAQ

+

引用

+
@inproceedings{liao2020real,
+  title={Real-time scene text detection with differentiable binarization},
+  author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
+  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
+  volume={34},
+  number={07},
+  pages={11474--11481},
+  year={2020}
+}
+
+@article{liao2022real,
+  title={Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion},
+  author={Liao, Minghui and Zou, Zhisheng and Wan, Zhaoyi and Yao, Cong and Bai, Xiang},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  year={2022},
+  publisher={IEEE}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/algorithm_det_drrg.html b/algorithm/text_detection/algorithm_det_drrg.html new file mode 100644 index 0000000000..f8e8f42458 --- /dev/null +++ b/algorithm/text_detection/algorithm_det_drrg.html @@ -0,0 +1,5337 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + DRRG - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

DRRG

+

1. 算法简介

+

论文信息:

+
+

Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection +Zhang, Shi-Xue and Zhu, Xiaobin and Hou, Jie-Bo and Liu, Chang and Yang, Chun and Wang, Hongfa and Yin, Xu-Cheng +CVPR, 2020

+
+

在CTW1500文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
DRRG | ResNet50_vd | configs/det/det_r50_drrg_ctw.yml | 89.92% | 80.91% | 85.18% | 训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

上述DRRG模型使用CTW1500文本检测公开数据集训练得到,数据集下载可参考 ocr_datasets

+

数据下载完成后,请参考文本检测训练教程进行训练。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要更换配置文件即可。
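例如,使用上表中的配置文件训练DRRG模型时,命令的形式可参考如下(仅为示意,具体参数请以文本检测训练教程为准):
python3 tools/train.py -c configs/det/det_r50_drrg_ctw.yml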

+

4. 推理部署

+

4.1 Python推理

+

由于模型前向运行时需要多次转换为Numpy数据进行运算,因此DRRG的动态图转静态图暂未支持。

+

4.2 C++推理

+

暂未支持

+

4.3 Serving服务化部署

+

暂未支持

+

4.4 更多推理部署

+

暂未支持

+

5. FAQ

+

引用

+
@inproceedings{zhang2020deep,
+  title={Deep relational reasoning graph network for arbitrary shape text detection},
+  author={Zhang, Shi-Xue and Zhu, Xiaobin and Hou, Jie-Bo and Liu, Chang and Yang, Chun and Wang, Hongfa and Yin, Xu-Cheng},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={9699--9708},
+  year={2020}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/algorithm_det_east.html b/algorithm/text_detection/algorithm_det_east.html new file mode 100644 index 0000000000..9ab536ea4a --- /dev/null +++ b/algorithm/text_detection/algorithm_det_east.html @@ -0,0 +1,5353 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + EAST - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

EAST

+

1. 算法简介

+

论文信息:

+
+

EAST: An Efficient and Accurate Scene Text Detector +Xinyu Zhou, Cong Yao, He Wen, Yuzhi Wang, Shuchang Zhou, Weiran He, Jiajun Liang +CVPR, 2017

+
+

在ICDAR2015文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
EAST | ResNet50_vd | det_r50_vd_east.yml | 88.71% | 81.36% | 84.88% | 训练模型
EAST | MobileNetV3 | det_mv3_east.yml | 78.20% | 79.10% | 78.65% | 训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

上表中的EAST训练模型使用ICDAR2015文本检测公开数据集训练得到,数据集下载可参考 ocr_datasets

+

数据下载完成后,请参考文本检测训练教程进行训练。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要更换配置文件即可。
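例如,训练基于ResNet50_vd骨干网络的EAST模型时,命令的形式可参考如下(仅为示意,具体参数请以文本检测训练教程为准):
python3 tools/train.py -c configs/det/det_r50_vd_east.yml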

+

4. 推理部署

+

4.1 Python推理

+

首先将EAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例(训练模型),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/det/det_r50_vd_east.yml -o Global.pretrained_model=./det_r50_vd_east_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_r50_east/
+
+

EAST文本检测模型推理,需要设置参数--det_algorithm="EAST",执行预测:

+
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_r50_east/" --det_algorithm="EAST"
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为det_res

+

img

+

4.2 C++推理

+

由于后处理暂未使用CPP编写,EAST文本检测模型暂不支持CPP推理。

+

4.3 Serving服务化部署

+

暂未支持

+

4.4 更多推理部署

+

暂未支持

+

5. FAQ

+

引用

+
@inproceedings{zhou2017east,
+  title={East: an efficient and accurate scene text detector},
+  author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
+  booktitle={Proceedings of the IEEE conference on Computer Vision and Pattern Recognition},
+  pages={5551--5560},
+  year={2017}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/algorithm_det_fcenet.html b/algorithm/text_detection/algorithm_det_fcenet.html new file mode 100644 index 0000000000..d364750441 --- /dev/null +++ b/algorithm/text_detection/algorithm_det_fcenet.html @@ -0,0 +1,5349 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + FCENet - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

FCENet

+

1. 算法简介

+

论文信息:

+
+

Fourier Contour Embedding for Arbitrary-Shaped Text Detection +Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang +CVPR, 2021

+
+

在CTW1500文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
FCE | ResNet50_dcn | configs/det/det_r50_vd_dcn_fce_ctw.yml | 88.39% | 82.18% | 85.27% | 训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

上述FCE模型使用CTW1500文本检测公开数据集训练得到,数据集下载可参考 ocr_datasets

+

数据下载完成后,请参考文本检测训练教程进行训练。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要更换配置文件即可。
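例如,使用上表中的配置文件训练FCE模型时,命令的形式可参考如下(仅为示意,具体参数请以文本检测训练教程为准):
python3 tools/train.py -c configs/det/det_r50_vd_dcn_fce_ctw.yml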

+

4. 推理部署

+

4.1 Python推理

+

首先将FCE文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd_dcn骨干网络,在CTW1500英文数据集训练的模型为例( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/det/det_r50_vd_dcn_fce_ctw.yml -o Global.pretrained_model=./det_r50_dcn_fce_ctw_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_fce
+
+

FCE文本检测模型推理,执行非弯曲文本检测,可以执行如下命令:

+
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_fce/" --det_algorithm="FCE" --det_fce_box_type=quad
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

+

img

+

如果想执行弯曲文本检测,可以执行如下命令:

+
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_fce/" --det_algorithm="FCE" --det_fce_box_type=poly
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

+

img

+

注意:由于CTW1500数据集只有1000张训练图像,且主要针对英文场景,所以上述模型对中文文本图像检测效果会比较差。

+

4.2 C++推理

+

由于后处理暂未使用CPP编写,FCE文本检测模型暂不支持CPP推理。

+

4.3 Serving服务化部署

+

暂未支持

+

4.4 更多推理部署

+

暂未支持

+

5. FAQ

+

引用

+
@InProceedings{zhu2021fourier,
+  title={Fourier Contour Embedding for Arbitrary-Shaped Text Detection},
+  author={Yiqin Zhu and Jianyong Chen and Lingyu Liang and Zhanghui Kuang and Lianwen Jin and Wayne Zhang},
+  year={2021},
+  booktitle = {CVPR}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/algorithm_det_psenet.html b/algorithm/text_detection/algorithm_det_psenet.html new file mode 100644 index 0000000000..875bbad564 --- /dev/null +++ b/algorithm/text_detection/algorithm_det_psenet.html @@ -0,0 +1,5359 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + PSENet - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

PSENet

+

1. 算法简介

+

论文信息:

+
+

Shape robust text detection with progressive scale expansion network +Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai +CVPR, 2019

+
+

在ICDAR2015文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
PSE | ResNet50_vd | configs/det/det_r50_vd_pse.yml | 85.81% | 79.53% | 82.55% | 训练模型
PSE | MobileNetV3 | configs/det/det_mv3_pse.yml | 82.20% | 70.48% | 75.89% | 训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

上述PSE模型使用ICDAR2015文本检测公开数据集训练得到,数据集下载可参考 ocr_datasets

+

数据下载完成后,请参考文本检测训练教程进行训练。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要更换配置文件即可。
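例如,训练基于ResNet50_vd骨干网络的PSE模型时,命令的形式可参考如下(仅为示意,具体参数请以文本检测训练教程为准):
python3 tools/train.py -c configs/det/det_r50_vd_pse.yml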

+

4. 推理部署

+

4.1 Python推理

+

首先将PSE文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/det/det_r50_vd_pse.yml -o Global.pretrained_model=./det_r50_vd_pse_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_pse
+
+

PSE文本检测模型推理,执行非弯曲文本检测,可以执行如下命令:

+
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=quad
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

+

img

+

如果想执行弯曲文本检测,可以执行如下命令:

+
python3 tools/infer/predict_det.py --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_pse/" --det_algorithm="PSE" --det_pse_box_type=poly
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

+

img

+

注意:由于ICDAR2015数据集只有1000张训练图像,且主要针对英文场景,所以上述模型对中文或弯曲文本图像检测效果会比较差。

+

4.2 C++推理

+

由于后处理暂未使用CPP编写,PSE文本检测模型暂不支持CPP推理。

+

4.3 Serving服务化部署

+

暂未支持

+

4.4 更多推理部署

+

暂未支持

+

5. FAQ

+

引用

+
@inproceedings{wang2019shape,
+  title={Shape robust text detection with progressive scale expansion network},
+  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={9336--9345},
+  year={2019}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/algorithm_det_sast.html b/algorithm/text_detection/algorithm_det_sast.html new file mode 100644 index 0000000000..8605ed0b37 --- /dev/null +++ b/algorithm/text_detection/algorithm_det_sast.html @@ -0,0 +1,5427 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + SAST - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

SAST

+

1. 算法简介

+

论文信息:

+
+

A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning +Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming +ACM MM, 2019

+
+

在ICDAR2015文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
SAST | ResNet50_vd | configs/det/det_r50_vd_sast_icdar15.yml | 91.39% | 83.77% | 87.42% | 训练模型
+

在Total-text文本检测公开数据集上,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + +
模型 | 骨干网络 | 配置文件 | precision | recall | Hmean | 下载链接
SAST | ResNet50_vd | configs/det/det_r50_vd_sast_totaltext.yml | 89.63% | 78.44% | 83.66% | 训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本检测训练教程。PaddleOCR对代码进行了模块化,训练不同的检测模型只需要更换配置文件即可。
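例如,训练上表中基于ICDAR2015数据集的SAST模型时,命令的形式可参考如下(仅为示意,训练数据需按教程说明准备,具体参数请以文本检测训练教程为准):
python3 tools/train.py -c configs/det/det_r50_vd_sast_icdar15.yml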

+

4. 推理部署

+

4.1 Python推理

+

(1). 四边形文本检测模型(ICDAR2015)

+

首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在ICDAR2015英文数据集训练的模型为例(模型下载地址),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_icdar15.yml -o Global.pretrained_model=./det_r50_vd_sast_icdar15_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_sast_ic15
+
+

SAST文本检测模型推理,需要设置参数--det_algorithm="SAST",可以执行如下命令:

+
python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img_10.jpg" --det_model_dir="./inference/det_sast_ic15/"
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

+

img

+

(2). 弯曲文本检测模型(Total-Text)

+

首先将SAST文本检测训练过程中保存的模型,转换成inference model。以基于Resnet50_vd骨干网络,在Total-Text英文数据集训练的模型为例(模型下载地址),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/det/det_r50_vd_sast_totaltext.yml -o Global.pretrained_model=./det_r50_vd_sast_totaltext_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/det_sast_tt
+
+

SAST文本检测模型推理,需要设置参数--det_algorithm="SAST",同时,还需要增加参数--det_box_type=poly,可以执行如下命令:

+
python3 tools/infer/predict_det.py --det_algorithm="SAST" --image_dir="./doc/imgs_en/img623.jpg" --det_model_dir="./inference/det_sast_tt/" --det_box_type='poly'
+
+

可视化文本检测结果默认保存到./inference_results文件夹里面,结果文件的名称前缀为'det_res'。结果示例如下:

+

img

+

注意:本代码库中,SAST后处理Locality-Aware NMS有python和c++两种版本,c++版速度明显快于python版。由于c++版本nms编译版本问题,只有python3.5环境下会调用c++版nms,其他情况将调用python版nms。

+

4.2 C++推理

+

暂未支持

+

4.3 Serving服务化部署

+

暂未支持

+

4.4 更多推理部署

+

暂未支持

+

5. FAQ

+

引用

+
@inproceedings{wang2019single,
+  title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
+  author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
+  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
+  pages={1277--1285},
+  year={2019}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_detection/images/det_res_img623_ct.jpg b/algorithm/text_detection/images/det_res_img623_ct.jpg new file mode 100644 index 0000000000..2c5f57d96c Binary files /dev/null and b/algorithm/text_detection/images/det_res_img623_ct.jpg differ diff --git a/algorithm/text_detection/images/det_res_img623_fce.jpg b/algorithm/text_detection/images/det_res_img623_fce.jpg new file mode 100644 index 0000000000..938ae4cabf Binary files /dev/null and b/algorithm/text_detection/images/det_res_img623_fce.jpg differ diff --git a/algorithm/text_detection/images/det_res_img623_sast.jpg b/algorithm/text_detection/images/det_res_img623_sast.jpg new file mode 100644 index 0000000000..af5e2d6e2c Binary files /dev/null and b/algorithm/text_detection/images/det_res_img623_sast.jpg differ diff --git a/algorithm/text_detection/images/det_res_img_10_db.jpg b/algorithm/text_detection/images/det_res_img_10_db.jpg new file mode 100644 index 0000000000..6af89f6bb3 Binary files /dev/null and b/algorithm/text_detection/images/det_res_img_10_db.jpg differ diff --git a/algorithm/text_detection/images/det_res_img_10_east.jpg b/algorithm/text_detection/images/det_res_img_10_east.jpg new file mode 100644 index 0000000000..908d077c3e Binary files /dev/null and b/algorithm/text_detection/images/det_res_img_10_east.jpg differ diff --git a/algorithm/text_detection/images/det_res_img_10_fce.jpg b/algorithm/text_detection/images/det_res_img_10_fce.jpg new file mode 100644 index 0000000000..fb32950ffd Binary files /dev/null and b/algorithm/text_detection/images/det_res_img_10_fce.jpg differ diff --git a/algorithm/text_detection/images/det_res_img_10_pse.jpg b/algorithm/text_detection/images/det_res_img_10_pse.jpg new file mode 100644 index 0000000000..cdb7625dd0 Binary files /dev/null and b/algorithm/text_detection/images/det_res_img_10_pse.jpg differ diff --git a/algorithm/text_detection/images/det_res_img_10_pse_poly.jpg b/algorithm/text_detection/images/det_res_img_10_pse_poly.jpg new file mode 100644 index 0000000000..9c06a17ccb Binary files /dev/null and b/algorithm/text_detection/images/det_res_img_10_pse_poly.jpg differ diff --git a/algorithm/text_detection/images/det_res_img_10_sast.jpg b/algorithm/text_detection/images/det_res_img_10_sast.jpg new file mode 100644 index 0000000000..702f773e68 Binary files /dev/null and b/algorithm/text_detection/images/det_res_img_10_sast.jpg differ diff --git a/algorithm/text_recognition/algorithm_rec_abinet.html b/algorithm/text_recognition/algorithm_rec_abinet.html new file mode 100644 index 0000000000..d72521f812 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_abinet.html @@ -0,0 +1,5494 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ABINet - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

场景文本识别算法-ABINet

+

1. 算法简介

+

论文信息:

+
+

ABINet: Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition +Shancheng Fang and Hongtao Xie and Yuxin Wang and Zhendong Mao and Yongdong Zhang +CVPR, 2021

+
+

ABINet使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
ABINetResNet45rec_r45_abinet.yml90.75%预训练、训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练ABINet识别模型时需要更换配置文件ABINet配置文件

+

启动训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r45_abinet.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r45_abinet.yml
+
+

3.2 评估

+

可下载已训练完成的模型文件,使用如下命令进行评估:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r45_abinet.yml -o Global.pretrained_model=./rec_r45_abinet_train/best_accuracy
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/rec_r45_abinet.yml -o Global.infer_img='./doc/imgs_words_en/word_10.png' Global.pretrained_model=./rec_r45_abinet_train/best_accuracy
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例(模型下载地址 ),可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/rec_r45_abinet.yml -o Global.pretrained_model=./rec_r45_abinet_train/best_accuracy Global.save_inference_dir=./inference/rec_r45_abinet/
+
+

注意:

+
    +
  • 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意修改配置文件中的character_dict_path是否是所需要的字典文件。
  • +
  • 如果您修改了训练时的输入大小,请修改tools/export_model.py文件中的对应ABINet的infer_shape
  • +
+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
/inference/rec_r45_abinet/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' --rec_model_dir='./inference/rec_r45_abinet/' --rec_algorithm='ABINet' --rec_image_shape='3,32,128' --rec_char_dict_path='./ppocr/utils/ic15_dict.txt'
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/imgs_words_en/'。
+
+

img

+

执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9999995231628418)
+
+

注意

+
    +
  • 训练上述模型采用的图像分辨率是[3,32,128],需要通过参数rec_image_shape设置为您训练时的识别图像形状(预处理流程可参考本列表下方的示意代码)。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中ABINet的预处理为您的预处理方法。
  • +
+
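
下面给出一个与上述分辨率对应的识别输入预处理示意(直接缩放到 32×128 并归一化到[-1, 1],仅供参考,实际请以 tools/infer/predict_rec.py 中ABINet对应的预处理实现为准):

+
import cv2
+import numpy as np
+
+def resize_norm_img(img, image_shape=(3, 32, 128)):
+    # 仅为示意:缩放到目标大小并归一化,真实预处理请以代码库实现为准
+    c, h, w = image_shape
+    resized = cv2.resize(img, (w, h)).astype("float32")
+    resized = resized.transpose((2, 0, 1)) / 255.0
+    resized = (resized - 0.5) / 0.5
+    return resized[np.newaxis, :]     # 输出形状为 (1, 3, 32, 128)
+
+img = cv2.imread("./doc/imgs_words_en/word_10.png")
+print(resize_norm_img(img).shape)
+
+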

4.2 C++推理部署

+

由于C++预处理后处理还未支持ABINet,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+
    +
  1. MJSynth和SynthText两种数据集来自于ABINet源repo
  2. +
  3. 我们使用ABINet作者提供的预训练模型进行finetune训练。
  4. +
+

引用

+
@article{Fang2021ABINet,
+  title     = {ABINet: Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition},
+  author    = {Shancheng Fang and Hongtao Xie and Yuxin Wang and Zhendong Mao and Yongdong Zhang},
+  booktitle = {CVPR},
+  year      = {2021},
+  url       = {https://arxiv.org/abs/2103.06495},
+  pages     = {7098-7107}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_cppd.html b/algorithm/text_recognition/algorithm_rec_cppd.html new file mode 100644 index 0000000000..44fccf97a2 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_cppd.html @@ -0,0 +1,5784 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CPPD - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

场景文本识别算法-CPPD

+

1. 算法简介

+

论文信息:

+
+

Context Perception Parallel Decoder for Scene Text Recognition +Yongkun Du and Zhineng Chen and Caiyan Jia and Xiaoting Yin and Chenxia Li and Yuning Du and Yu-Gang Jiang

+
+

CPPD算法简介

+

基于深度学习的场景文本识别模型通常是Encoder-Decoder结构,其中decoder可以分为两种:(1)CTC,(2)Attention-based。目前SOTA模型大多使用Attention-based的decoder,而attention-based可以分为AR和PD两种,一般来说,AR解码器识别精度优于PD,而PD解码速度快于AR,CPPD通过精心设计的CO和CC模块,达到了“AR的精度,PD的速度”的效果。

+
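
下面用一个极简的numpy玩具示例体会AR(自回归)与PD(并行)两种解码模式在调用方式上的差异(其中的特征和权重均为随机占位,仅帮助理解两种模式的串行/并行区别,并非CPPD的真实实现):

+
import numpy as np
+
+np.random.seed(0)
+vocab_size, max_len, hidden = 37, 25, 64
+visual_feats = np.random.randn(max_len, hidden)   # 假设每个字符位置对应一组视觉特征
+W_cls = np.random.randn(hidden, vocab_size)       # 假设的分类层权重
+W_prev = np.random.randn(vocab_size, hidden)      # 假设的历史token映射
+
+# AR(自回归)解码:第t步依赖第t-1步的预测,只能串行执行,精度高但速度慢
+prev = np.eye(vocab_size)[0]
+ar_tokens = []
+for t in range(max_len):
+    logits = (visual_feats[t] + prev @ W_prev) @ W_cls
+    tok = int(np.argmax(logits))
+    ar_tokens.append(tok)
+    prev = np.eye(vocab_size)[tok]
+
+# PD(并行)解码:所有位置一次性前向预测,天然可并行,速度快
+pd_tokens = np.argmax(visual_feats @ W_cls, axis=-1)
+print(ar_tokens, pd_tokens.tolist())
+
+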

CPPD在场景文本识别公开数据集上的精度(%)和模型文件如下:

+
    +
  • 英文训练集和测试集来自于PARSeq
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型IC13
857
SVTIIIT5k
3000
IC15
1811
SVTPCUTE80Avg下载链接
CPPD Tiny97.194.496.686.688.590.392.25英文
CPPD Base98.295.597.687.990.092.793.80英文
CPPD Base 48*16097.595.597.787.792.493.794.10英文
+
    +
  • 英文合成数据集(MJ+ST)训练,英文Union14M-L benchmark测试结果U14m
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型CurveMulti-
Oriented
ArtisticContextlessSalientMulti-
word
GeneralAvg下载链接
CPPD Tiny52.412.348.254.461.553.461.449.10同上表
CPPD Base65.518.656.061.971.057.565.856.63同上表
CPPD Base 48*16071.922.160.567.978.363.967.161.69同上表
+
    +
  • Union14M-L 训练集From scratch训练,英文测试结果。
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型IC13
857
SVTIIIT5k
3000
IC15
1811
SVTPCUTE80Avg下载链接
CPPD Base 32*12898.597.799.290.394.698.396.42Coming soon
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型CurveMulti-
Oriented
ArtisticContextlessSalientMulti-
word
GeneralAvg下载链接
CPPD Base 32*12883.071.275.180.979.482.683.779.41Coming soon
+
    +
  • 加载合成数据集预训练模型,Union14M-L 训练集微调训练,英文测试结果。
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型IC13
857
SVTIIIT5k
3000
IC15
1811
SVTPCUTE80Avg下载链接
CPPD Base 32*12898.798.599.491.796.799.797.44英文
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型CurveMulti-
Oriented
ArtisticContextlessSalientMulti-
word
GeneralAvg下载链接
CPPD Base 32*12887.570.778.282.985.585.484.382.08同上表
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型SceneWebDocumentHandwritingAvg下载链接
CPPD Base74.476.198.655.376.10中文
CPPD Base + STN78.479.398.957.678.55中文
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

数据集准备

+

英文数据集下载

+

Union14M-L 下载

+

中文数据集下载

+

启动训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练CPPD识别模型时需要更换配置文件CPPD配置文件

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_svtrnet_cppd_base_en.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_svtrnet_cppd_base_en.yml
+
+

3.2 评估

+

可下载CPPD提供的模型文件和配置文件:下载地址 ,以CPPD-B为例,使用如下命令进行评估:

+
1
+2
+3
+4
# 下载包含CPPD-B的模型文件和配置文件的tar压缩包并解压
+wget https://paddleocr.bj.bcebos.com/CCPD/rec_svtr_cppd_base_en_train.tar && tar xf rec_svtr_cppd_base_en_train.tar
+# 注意将pretrained_model的路径设置为本地路径。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c ./rec_svtr_cppd_base_en_train/rec_svtrnet_cppd_base_en.yml -o Global.pretrained_model=./rec_svtr_cppd_base_en_train/best_model
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c ./rec_svtr_cppd_base_en_train/rec_svtrnet_cppd_base_en.yml -o Global.infer_img='./doc/imgs_words_en/word_10.png' Global.pretrained_model=./rec_svtr_cppd_base_en_train/best_model
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。下面以基于CPPD-B,在英文数据集训练的模型为例(模型和配置文件下载地址),可以使用如下命令进行转换:

+

注意:

+
    +
  • 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意修改配置文件中的character_dict_path是否为正确的字典文件。
  • +
+

执行如下命令进行模型导出和推理:

+
# 注意将pretrained_model的路径设置为本地路径。
+# export model
+# en
+python3 tools/export_model.py -c configs/rec/rec_svtrnet_cppd_base_en.yml -o Global.pretrained_model=./rec_svtr_cppd_base_en_train/best_model.pdparams Global.save_inference_dir=./rec_svtr_cppd_base_en_infer
+# ch
+python3 tools/export_model.py -c configs/rec/rec_svtrnet_cppd_base_ch.yml -o Global.pretrained_model=./rec_svtr_cppd_base_ch_train/best_model.pdparams Global.save_inference_dir=./rec_svtr_cppd_base_ch_infer
+
+# speed test
+# docker image https://hub.docker.com/r/paddlepaddle/paddle/tags/: sudo docker pull paddlepaddle/paddle:2.4.2-gpu-cuda11.2-cudnn8.2-trt8.0
+# install auto_log: pip install https://paddleocr.bj.bcebos.com/libs/auto_log-1.2.0-py3-none-any.whl
+# en
+python3 tools/infer/predict_rec.py --image_dir='../iiik' --rec_model_dir='./rec_svtr_cppd_base_en_infer/' --rec_algorithm='CPPD' --rec_image_shape='3,32,100' --rec_char_dict_path='./ppocr/utils/ic15_dict.txt' --warmup=True --benchmark=True --rec_batch_num=1 --use_tensorrt=True
+# ch
+python3 tools/infer/predict_rec.py --image_dir='../iiik' --rec_model_dir='./rec_svtr_cppd_base_ch_infer/' --rec_algorithm='CPPDPadding' --rec_image_shape='3,32,256' --warmup=True --benchmark=True --rec_batch_num=1 --use_tensorrt=True
+# stn_ch
+python3 tools/infer/predict_rec.py --image_dir='../iiik' --rec_model_dir='./rec_svtr_cppd_base_stn_ch_infer/' --rec_algorithm='CPPD' --rec_image_shape='3,64,256' --warmup=True --benchmark=True --rec_batch_num=1 --use_tensorrt=True
+
+

导出成功后,在目录下有三个文件:

+
/inference/rec_svtr_cppd_base_en_infer/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+
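
导出的inference模型也可以直接用Paddle Inference的Python API加载,下面是一个简化的加载与前向示意(输入这里用随机数据占位,实际应为按 rec_image_shape 预处理后的图像):

+
import numpy as np
+from paddle.inference import Config, create_predictor
+
+config = Config("./rec_svtr_cppd_base_en_infer/inference.pdmodel",
+                "./rec_svtr_cppd_base_en_infer/inference.pdiparams")
+config.disable_gpu()          # 如有GPU可改为 config.enable_use_gpu(500, 0)
+predictor = create_predictor(config)
+
+input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
+# 随机数据仅作占位,英文模型的输入形状为 (batch, 3, 32, 100)
+input_handle.copy_from_cpu(np.random.rand(1, 3, 32, 100).astype("float32"))
+
+predictor.run()
+output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
+print(output_handle.copy_to_cpu().shape)
+
+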

4.2 C++推理部署

+

由于C++预处理后处理还未支持CPPD,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

引用

+
@article{Du2023CPPD,
+  title     = {Context Perception Parallel Decoder for Scene Text Recognition},
+  author    = {Du, Yongkun and Chen, Zhineng and Jia, Caiyan and Yin, Xiaoting and Li, Chenxia and Du, Yuning and Jiang, Yu-Gang},
+  booktitle = {Arxiv},
+  year      = {2023},
+  url       = {https://arxiv.org/abs/2307.12270}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_crnn.html b/algorithm/text_recognition/algorithm_rec_crnn.html new file mode 100644 index 0000000000..5d560e4f04 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_crnn.html @@ -0,0 +1,5450 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CRNN - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

CRNN

+

1. 算法简介

+

论文信息:

+
+

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition +Baoguang Shi, Xiang Bai, Cong Yao +IEEE, 2015

+
+

参考DTRB 文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
模型骨干网络Avg Accuracy配置文件下载链接
CRNNResnet34_vd81.04%configs/rec/rec_r34_vd_none_bilstm_ctc.yml训练模型
CRNNMobileNetV377.95%configs/rec/rec_mv3_none_bilstm_ctc.yml训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
# 单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将 CRNN 文本识别训练过程中保存的模型,转换成inference model。以基于Resnet34_vd骨干网络,使用MJSynth和SynthText两个英文文本识别合成数据集训练的模型 为例,可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r34_vd_none_bilstm_ctc.yml -o Global.pretrained_model=./rec_r34_vd_none_bilstm_ctc_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/rec_crnn
+
+

CRNN 文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_crnn/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
+
+

img

+

执行命令后,上面图像的识别结果如下:

+
Predicts of ./doc/imgs_words_en/word_336.png:('super', 0.9999073)
+
+

注意:由于上述模型是参考DTRB文本识别训练和评估流程,与超轻量级中文识别模型训练有两方面不同:

+
    +
  • 训练时采用的图像分辨率不同,训练上述模型采用的图像分辨率是[3,32,100],而中文模型训练时,为了保证长文本的识别效果,训练时采用的图像分辨率是[3, 32, 320]。预测推理程序默认的形状参数是训练中文采用的图像分辨率,即[3, 32, 320]。因此,这里推理上述英文模型时,需要通过参数rec_image_shape设置识别图像的形状。
  • +
  • 字符列表,DTRB论文中的实验只针对26个小写英文字母和10个数字,总共36个字符。所有大小写字符都转成了小写字符,不在上面列表中的字符都被忽略,认为是空格。因此这里没有输入字符字典,而是通过如下代码生成字典,因此在推理时需要设置参数rec_char_dict_path,指定为英文字典"./ppocr/utils/ic15_dict.txt"(其贪心解码流程可参考下方的示意代码)。
  • +
+
1
+2
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
+dict_character = list(self.character_str)
+
+
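
基于该字典,CTC的贪心解码过程可以用下面的简化代码示意(假设 blank 占用索引0,输入为每个时间步的类别概率;仅帮助理解解码逻辑,实际请以代码库中的CTC后处理实现为准):

+
import numpy as np
+
+character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
+dict_character = ["blank"] + list(character_str)   # 假设blank占用索引0
+
+def ctc_greedy_decode(probs):
+    # probs: (T, C) 每个时间步的类别概率,C = len(dict_character)
+    idx = probs.argmax(axis=1)
+    text, prev = [], -1
+    for i in idx:
+        # 合并相邻重复字符并去掉blank
+        if i != prev and i != 0:
+            text.append(dict_character[i])
+        prev = i
+    return "".join(text)
+
+fake_probs = np.random.rand(24, len(dict_character))
+print(ctc_greedy_decode(fake_probs))
+
+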

4.2 C++推理

+

准备好推理模型后,参考cpp infer教程进行操作即可。

+

4.3 Serving服务化部署

+

准备好推理模型后,参考pdserving教程进行Serving服务化部署,包括Python Serving和C++ Serving两种模式。

+

4.4 更多推理部署

+

CRNN模型还支持以下推理部署方式:

+
    +
  • Paddle2ONNX推理:准备好推理模型后,参考paddle2onnx教程操作。
  • +
+

5. FAQ

+

引用

+
@ARTICLE{7801919,
+  author={Shi, Baoguang and Bai, Xiang and Yao, Cong},
+  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  title={An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition},
+  year={2017},
+  volume={39},
+  number={11},
+  pages={2298-2304},
+  doi={10.1109/TPAMI.2016.2646371}}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_nrtr.html b/algorithm/text_recognition/algorithm_rec_nrtr.html new file mode 100644 index 0000000000..2da82fa16e --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_nrtr.html @@ -0,0 +1,5766 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + NRTR - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

场景文本识别算法-NRTR

+

1. 算法简介

+

论文信息:

+
+

NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition +Fenfen Sheng and Zhineng Chen and Bo Xu +ICDAR, 2019

+
+

NRTR使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
NRTRMTBrec_mtb_nrtr.yml84.21%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练NRTR识别模型时需要更换配置文件NRTR配置文件

+

启动训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_mtb_nrtr.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_mtb_nrtr.yml
+
+

3.2 评估

+

可下载已训练完成的模型文件,使用如下命令进行评估:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_mtb_nrtr.yml -o Global.pretrained_model=./rec_mtb_nrtr_train/best_accuracy
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/rec_mtb_nrtr.yml -o Global.infer_img='./doc/imgs_words_en/word_10.png' Global.pretrained_model=./rec_mtb_nrtr_train/best_accuracy
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例(模型下载地址 ),可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/rec_mtb_nrtr.yml -o Global.pretrained_model=./rec_mtb_nrtr_train/best_accuracy Global.save_inference_dir=./inference/rec_mtb_nrtr/
+
+

注意:

+
    +
  • 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意修改配置文件中的character_dict_path是否是所需要的字典文件。
  • +
  • 如果您修改了训练时的输入大小,请修改tools/export_model.py文件中的对应NRTR的infer_shape
  • +
+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
/inference/rec_mtb_nrtr/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' --rec_model_dir='./inference/rec_mtb_nrtr/' --rec_algorithm='NRTR' --rec_image_shape='1,32,100' --rec_char_dict_path='./ppocr/utils/EN_symbol_dict.txt'
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/imgs_words_en/'。
+
+

img

+

执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9465042352676392)
+
+

注意

+
    +
  • 训练上述模型采用的图像分辨率是[1,32,100],需要通过参数rec_image_shape设置为您训练时的识别图像形状。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中NRTR的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

由于C++预处理后处理还未支持NRTR,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+
    +
  1. NRTR论文中使用Beam搜索进行解码字符,但是速度较慢,这里默认未使用Beam搜索,以贪婪搜索进行解码字符(贪婪解码的流程可参考下方的示意代码)。
  2. +
+
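
下面用一段简化代码示意贪婪搜索的解码流程(其中的解码器用随机函数占位,起始符/结束符id均为假设值,并非NRTR的真实实现):

+
import numpy as np
+
+np.random.seed(0)
+vocab_size, max_len = 40, 25
+sos_id, eos_id = 2, 3                     # 假设的起始符/结束符id
+
+def fake_decoder_step(tokens):
+    # 占位的单步解码器,真实场景中为NRTR Transformer解码器的一次前向
+    return np.random.randn(vocab_size)
+
+tokens = [sos_id]
+for _ in range(max_len):
+    logits = fake_decoder_step(tokens)
+    next_id = int(np.argmax(logits))      # 贪婪搜索:每一步只保留概率最大的字符
+    if next_id == eos_id:
+        break
+    tokens.append(next_id)
+print(tokens[1:])
+
+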

6. 发行公告

+
    +
  1. +

    release/2.6更新NRTR代码结构,新版NRTR可加载旧版(release/2.5及之前)模型参数,使用下面示例代码将旧版模型参数转换为新版模型参数:

    +

    +详情

    +
    import numpy as np
    +import paddle
    +
    +# 下面的 model 需为按新版(release/2.6及之后)配置构建的NRTR网络实例
    +params = paddle.load('path/' + '.pdparams') # 旧版本参数
    +state_dict = model.state_dict() # 新版模型参数
    +new_state_dict = {}
    +
    +for k1, v1 in state_dict.items():
    +
    +    k = k1
    +    if 'encoder' in k and 'self_attn' in k and 'qkv' in k and 'weight' in k:
    +
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        q = params[k_para.replace('qkv', 'conv1')].transpose((1, 0, 2, 3))
    +        k = params[k_para.replace('qkv', 'conv2')].transpose((1, 0, 2, 3))
    +        v = params[k_para.replace('qkv', 'conv3')].transpose((1, 0, 2, 3))
    +
    +        new_state_dict[k1] = np.concatenate([q[:, :, 0, 0], k[:, :, 0, 0], v[:, :, 0, 0]], -1)
    +
    +    elif 'encoder' in k and 'self_attn' in k and 'qkv' in k and 'bias' in k:
    +
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        q = params[k_para.replace('qkv', 'conv1')]
    +        k = params[k_para.replace('qkv', 'conv2')]
    +        v = params[k_para.replace('qkv', 'conv3')]
    +
    +        new_state_dict[k1] = np.concatenate([q, k, v], -1)
    +
    +    elif 'encoder' in k and 'self_attn' in k and 'out_proj' in k:
    +
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        new_state_dict[k1] = params[k_para]
    +
    +    elif 'encoder' in k and 'norm3' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        new_state_dict[k1] = params[k_para.replace('norm3', 'norm2')]
    +
    +    elif 'encoder' in k and 'norm1' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        new_state_dict[k1] = params[k_para]
    +
    +
    +    elif 'decoder' in k and 'self_attn' in k and 'qkv' in k and 'weight' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        q = params[k_para.replace('qkv', 'conv1')].transpose((1, 0, 2, 3))
    +        k = params[k_para.replace('qkv', 'conv2')].transpose((1, 0, 2, 3))
    +        v = params[k_para.replace('qkv', 'conv3')].transpose((1, 0, 2, 3))
    +        new_state_dict[k1] = np.concatenate([q[:, :, 0, 0], k[:, :, 0, 0], v[:, :, 0, 0]], -1)
    +
    +    elif 'decoder' in k and 'self_attn' in k and 'qkv' in k and 'bias' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        q = params[k_para.replace('qkv', 'conv1')]
    +        k = params[k_para.replace('qkv', 'conv2')]
    +        v = params[k_para.replace('qkv', 'conv3')]
    +        new_state_dict[k1] = np.concatenate([q, k, v], -1)
    +
    +    elif 'decoder' in k and 'self_attn' in k and 'out_proj' in k:
    +
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        new_state_dict[k1] = params[k_para]
    +
    +    elif 'decoder' in k and 'cross_attn' in k and 'q' in k and 'weight' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        k_para = k_para.replace('cross_attn', 'multihead_attn')
    +        q = params[k_para.replace('q', 'conv1')].transpose((1, 0, 2, 3))
    +        new_state_dict[k1] = q[:, :, 0, 0]
    +
    +    elif 'decoder' in k and 'cross_attn' in k and 'q' in k and 'bias' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        k_para = k_para.replace('cross_attn', 'multihead_attn')
    +        q = params[k_para.replace('q', 'conv1')]
    +        new_state_dict[k1] = q
    +
    +    elif 'decoder' in k and 'cross_attn' in k and 'kv' in k and 'weight' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        k_para = k_para.replace('cross_attn', 'multihead_attn')
    +        k = params[k_para.replace('kv', 'conv2')].transpose((1, 0, 2, 3))
    +        v = params[k_para.replace('kv', 'conv3')].transpose((1, 0, 2, 3))
    +        new_state_dict[k1] = np.concatenate([k[:, :, 0, 0], v[:, :, 0, 0]], -1)
    +
    +    elif 'decoder' in k and 'cross_attn' in k and 'kv' in k and 'bias' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        k_para = k_para.replace('cross_attn', 'multihead_attn')
    +        k = params[k_para.replace('kv', 'conv2')]
    +        v = params[k_para.replace('kv', 'conv3')]
    +        new_state_dict[k1] = np.concatenate([k, v], -1)
    +
    +    elif 'decoder' in k and 'cross_attn' in k and 'out_proj' in k:
    +
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        k_para = k_para.replace('cross_attn', 'multihead_attn')
    +        new_state_dict[k1] = params[k_para]
    +    elif 'decoder' in k and 'norm' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        new_state_dict[k1] = params[k_para]
    +    elif 'mlp' in k and 'weight' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        k_para = k_para.replace('fc', 'conv')
    +        k_para = k_para.replace('mlp.', '')
    +        w = params[k_para].transpose((1, 0, 2, 3))
    +        new_state_dict[k1] = w[:, :, 0, 0]
    +    elif 'mlp' in k and 'bias' in k:
    +        k_para = k[:13] + 'layers.' + k[13:]
    +        k_para = k_para.replace('fc', 'conv')
    +        k_para = k_para.replace('mlp.', '')
    +        w = params[k_para]
    +        new_state_dict[k1] = w
    +
    +    else:
    +        new_state_dict[k1] = params[k1]
    +
    +    if list(new_state_dict[k1].shape) != list(v1.shape):
    +        print(k1)
    +
    +
    +for k, v1 in state_dict.items():
    +    if k not in new_state_dict.keys():
    +        print(1, k)
    +    elif list(new_state_dict[k].shape) != list(v1.shape):
    +        print(2, k)
    +
    +
    +
    +model.set_state_dict(new_state_dict)
    +paddle.save(model.state_dict(), 'nrtrnew_from_old_params.pdparams')
    +
    +
    +
  2. +
  3. +

    新版相比与旧版,代码结构简洁,推理速度有所提高。

    +
  4. +
+

引用

+
@article{Sheng2019NRTR,
+  title     = {NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition},
+  author    = {Fenfen Sheng and Zhineng Chen and Bo Xu},
+  booktitle = {ICDAR},
+  year      = {2019},
+  url       = {http://arxiv.org/abs/1806.00926},
+  pages     = {781-786}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_parseq.html b/algorithm/text_recognition/algorithm_rec_parseq.html new file mode 100644 index 0000000000..b376f96923 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_parseq.html @@ -0,0 +1,5445 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ParseQ - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

ParseQ

+

1. 算法简介

+

论文信息:

+
+

Scene Text Recognition with Permuted Autoregressive Sequence Models +Darwin Bautista, Rowel Atienza +ECCV, 2022

+
+

原论文分别使用真实文本识别数据集(Real)和合成文本识别数据集(Synth)进行训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估。其中:

+
    +
  • 真实文本识别数据集(Real)包含COCO-Text, RCTW17, Uber-Text, ArT, LSVT, MLT19, ReCTS, TextOCR, OpenVINO数据集
  • +
  • 合成文本识别数据集(Synth)包含MJSynth和SynthText数据集
  • +
+

在不同数据集上训练的算法的复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
数据集模型骨干网络配置文件Acc下载链接
SynthParseQVITrec_vit_parseq.yml91.24%训练模型
RealParseQVITrec_vit_parseq.yml94.74%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
# 单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_vit_parseq.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_vit_parseq.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_vit_parseq.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_vit_parseq.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将ParseQ文本识别训练过程中保存的模型,转换成inference model。( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_vit_parseq.yml -o Global.pretrained_model=./rec_vit_parseq_real/best_accuracy Global.save_inference_dir=./inference/rec_parseq
+
+

ParseQ文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_parseq/" --rec_image_shape="3, 32, 128" --rec_algorithm="ParseQ" --rec_char_dict_path="ppocr/utils/dict/parseq_dict.txt" --max_text_length=25 --use_space_char=False
+
+

4.2 C++推理

+

由于C++预处理后处理还未支持ParseQ,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@InProceedings{bautista2022parseq,
+  title={Scene Text Recognition with Permuted Autoregressive Sequence Models},
+  author={Bautista, Darwin and Atienza, Rowel},
+  booktitle={European Conference on Computer Vision},
+  pages={178--196},
+  month={10},
+  year={2022},
+  publisher={Springer Nature Switzerland},
+  address={Cham},
+  doi={10.1007/978-3-031-19815-1_11},
+  url={https://doi.org/10.1007/978-3-031-19815-1_11}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_rare.html b/algorithm/text_recognition/algorithm_rec_rare.html new file mode 100644 index 0000000000..ef66e01a21 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_rare.html @@ -0,0 +1,5433 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + RARE - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

RARE

+

1. 算法简介

+

论文信息:

+
+

Robust Scene Text Recognition with Automatic Rectification +Baoguang Shi, Xinggang Wang, Pengyuan Lyu, Cong Yao, Xiang Bai +CVPR, 2016

+
+

使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Avg Accuracy下载链接
RAREResnet34_vdconfigs/rec/rec_r34_vd_tps_bilstm_att.yml83.60%训练模型
RAREMobileNetV3configs/rec/rec_mv3_tps_bilstm_att.yml82.50%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。以基于Resnet34_vd骨干网络为例:

+

3.1 训练

+
1
+2
+3
+4
# 单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml
+
+

3.2 评估

+
1
+2
# GPU评估, Global.pretrained_model为待评估模型
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

3.3 预测

+
python3 tools/infer_rec.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将RARE文本识别训练过程中保存的模型,转换成inference model。以基于Resnet34_vd骨干网络,在MJSynth和SynthText两个文字识别数据集训练得到的模型为例( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_att.yml -o Global.pretrained_model=./rec_r34_vd_tps_bilstm_att_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/rec_rare
+
+

RARE文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_rare/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
+
+

推理结果如下所示:

+

img

+
Predicts of doc/imgs_words/en/word_1.png:('joint ', 0.9999969601631165)
+
+

4.2 C++推理

+

暂不支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

RARE模型还支持以下推理部署方式:

+
    +
  • Paddle2ONNX推理:准备好推理模型后,参考paddle2onnx教程操作。
  • +
+

5. FAQ

+

引用

+
@inproceedings{2016Robust,
+  title={Robust Scene Text Recognition with Automatic Rectification},
+  author={ Shi, B.  and  Wang, X.  and  Lyu, P.  and  Cong, Y.  and  Xiang, B. },
+  booktitle={2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year={2016},
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_rfl.html b/algorithm/text_recognition/algorithm_rec_rfl.html new file mode 100644 index 0000000000..ccd9597953 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_rfl.html @@ -0,0 +1,5511 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + RFL - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

场景文本识别算法-RFL

+

1. 算法简介

+

论文信息:

+
+

Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition +Hui Jiang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Wenqi Ren, Fei Wu, and Wenming Tan +ICDAR, 2021

+
+

RFL使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
RFL-CNTResNetRFLrec_resnet_rfl_visual.yml93.40%训练模型
RFL-AttResNetRFLrec_resnet_rfl_att.yml88.63%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

PaddleOCR对代码进行了模块化,训练RFL识别模型时需要更换配置文件RFL配置文件

+

启动训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
#step1:训练CNT分支
+#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_resnet_rfl_visual.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_resnet_rfl_visual.yml
+
+#step2:联合训练CNT和Att分支,注意将pretrained_model的路径设置为本地路径。
+#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_resnet_rfl_att.yml -o Global.pretrained_model=./output/rec/rec_resnet_rfl_visual/best_accuracy
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_resnet_rfl_att.yml  -o Global.pretrained_model=./output/rec/rec_resnet_rfl_visual/best_accuracy
+
+

3.2 评估

+

可下载已训练完成的模型文件,使用如下命令进行评估:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_resnet_rfl_att.yml -o Global.pretrained_model=./output/rec/rec_resnet_rfl_att/best_accuracy
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/rec_resnet_rfl_att.yml -o Global.infer_img='./doc/imgs_words_en/word_10.png' Global.pretrained_model=./output/rec/rec_resnet_rfl_att/best_accuracy
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例(模型下载地址 ),可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/rec_resnet_rfl_att.yml -o Global.pretrained_model=./output/rec/rec_resnet_rfl_att/best_accuracy Global.save_inference_dir=./inference/rec_resnet_rfl_att/
+
+

注意: 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意修改配置文件中的character_dict_path是否是所需要的字典文件。

+
    +
  • 如果您修改了训练时的输入大小,请修改tools/export_model.py文件中的对应RFL的infer_shape
  • +
+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
/inference/rec_resnet_rfl_att/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' --rec_model_dir='./inference/rec_resnet_rfl_att/' --rec_algorithm='RFL' --rec_image_shape='1,32,100'
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/imgs_words_en/'。
+
+

img

+

执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9999927282333374)
+
+

注意

+
    +
  • 训练上述模型采用的图像分辨率是[1,32,100],需要通过参数rec_image_shape设置为您训练时的识别图像形状。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中RFL的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

由于C++预处理后处理还未支持RFL,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{2021Reciprocal,
+  title     = {Reciprocal Feature Learning via Explicit and Implicit Tasks in Scene Text Recognition},
+  author    = {Jiang, H.  and  Xu, Y.  and  Cheng, Z.  and  Pu, S.  and  Niu, Y.  and  Ren, W.  and  Wu, F.  and  Tan, W. },
+  booktitle = {ICDAR},
+  year      = {2021},
+  url       = {https://arxiv.org/abs/2105.06229}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_robustscanner.html b/algorithm/text_recognition/algorithm_rec_robustscanner.html new file mode 100644 index 0000000000..4e3c6c37dc --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_robustscanner.html @@ -0,0 +1,5426 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + RobustScanner - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

RobustScanner

+

1. 算法简介

+

论文信息:

+
+

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition +Xiaoyu Yue, Zhanghui Kuang, Chenhao Lin, Hongbin Sun, Wayne Zhang +ECCV, 2020

+
+

使用MJSynth和SynthText两个合成文字识别数据集训练,在IIIT, SVT, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
RobustScannerResNet31rec_r31_robustscanner.yml87.77%训练模型
+

注:除了使用MJSynth和SynthText两个文字识别数据集外,还加入了SynthAdd数据(提取码:627x),和部分真实数据,具体数据细节可以参考论文。

+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r31_robustscanner.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r31_robustscanner.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r31_robustscanner.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_r31_robustscanner.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将RobustScanner文本识别训练过程中保存的模型,转换成inference model。可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r31_robustscanner.yml -o Global.pretrained_model={path/to/weights}/best_accuracy  Global.save_inference_dir=./inference/rec_r31_robustscanner
+
+

RobustScanner文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_r31_robustscanner/" --rec_image_shape="3, 48, 48, 160" --rec_algorithm="RobustScanner" --rec_char_dict_path="ppocr/utils/dict90.txt" --use_space_char=False
+
+

4.2 C++推理

+

由于C++预处理后处理还未支持RobustScanner,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{2020RobustScanner,
+  title={RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition},
+  author={Xiaoyu Yue and Zhanghui Kuang and Chenhao Lin and Hongbin Sun and Wayne Zhang},
+  journal={ECCV2020},
+  year={2020},
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_rosetta.html b/algorithm/text_recognition/algorithm_rec_rosetta.html new file mode 100644 index 0000000000..e7e1df6c5d --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_rosetta.html @@ -0,0 +1,5433 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Rosetta - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

Rosetta

+

1. 算法简介

+

论文信息:

+
+

Rosetta: Large Scale System for Text Detection and Recognition in Images +Borisyuk F , Gordo A , V Sivakumar +KDD, 2018

+
+

使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估, 算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Avg Accuracy下载链接
RosettaResnet34_vdconfigs/rec/rec_r34_vd_none_none_ctc.yml79.11%训练模型
RosettaMobileNetV3configs/rec/rec_mv3_none_none_ctc.yml75.80%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。 以基于Resnet34_vd骨干网络为例:

+

3.1 训练

+
1
+2
+3
+4
# 单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r34_vd_none_none_ctc.yml
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r34_vd_none_none_ctc.yml
+
+

3.2 评估

+
1
+2
# GPU评估, Global.pretrained_model为待评估模型
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r34_vd_none_none_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

3.3 预测

+
python3 tools/infer_rec.py -c configs/rec/rec_r34_vd_none_none_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将Rosetta文本识别训练过程中保存的模型,转换成inference model。以基于Resnet34_vd骨干网络,在MJSynth和SynthText两个文字识别数据集训练得到的模型为例( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r34_vd_none_none_ctc.yml -o Global.pretrained_model=./rec_r34_vd_none_none_ctc_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/rec_rosetta
+
+

Rosetta文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_rosetta/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
+
+

推理结果如下所示:

+

img

+
Predicts of doc/imgs_words/en/word_1.png:('joint', 0.9999982714653015)
+
+

4.2 C++推理

+

暂不支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

Rosetta模型还支持以下推理部署方式:

+
    +
  • Paddle2ONNX推理:准备好推理模型后,参考paddle2onnx教程操作。
  • +
+

5. FAQ

+

引用

+
@inproceedings{2018Rosetta,
+  title={Rosetta: Large Scale System for Text Detection and Recognition in Images},
+  author={ Borisyuk, Fedor  and  Gordo, Albert  and  Sivakumar, Viswanath },
+  booktitle={the 24th ACM SIGKDD International Conference},
+  year={2018},
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_sar.html b/algorithm/text_recognition/algorithm_rec_sar.html new file mode 100644 index 0000000000..931ec19bf3 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_sar.html @@ -0,0 +1,5426 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + SAR - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

SAR

+

1. 算法简介

+

论文信息:

+
+

Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition +Hui Li, Peng Wang, Chunhua Shen, Guyu Zhang +AAAI, 2019

+
+

使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
SARResNet31rec_r31_sar.yml87.20%训练模型
+

注:除了使用MJSynth和SynthText两个文字识别数据集外,还加入了SynthAdd数据(提取码:627x),和部分真实数据,具体数据细节可以参考论文。

+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r31_sar.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r31_sar.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将SAR文本识别训练过程中保存的模型,转换成inference model。( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r31_sar.yml -o Global.pretrained_model=./rec_r31_sar_train/best_accuracy  Global.save_inference_dir=./inference/rec_sar
+
+

SAR文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_sar/" --rec_image_shape="3, 48, 48, 160" --rec_algorithm="SAR" --rec_char_dict_path="ppocr/utils/dict90.txt" --max_text_length=30 --use_space_char=False
+
+

4.2 C++推理

+

由于C++预处理后处理还未支持SAR,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{Li2019ShowAA,
+  title={Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition},
+  author={Hui Li and Peng Wang and Chunhua Shen and Guyu Zhang},
+  journal={ArXiv},
+  year={2019},
+  volume={abs/1811.00751}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_satrn.html b/algorithm/text_recognition/algorithm_rec_satrn.html new file mode 100644 index 0000000000..d310d2dcd9 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_satrn.html @@ -0,0 +1,5424 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + SATRN - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

SATRN

+

1. 算法简介

+

论文信息:

+
+

On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention +Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee +CVPR, 2020

+

参考DTRB 文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:

+
+ + + + + + + + + + + + + + + + + + + +
模型骨干网络Avg Accuracy配置文件下载链接
SATRNShallowCNN88.05%configs/rec/rec_satrn.yml训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
# 单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_satrn.yml
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_satrn.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_satrn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_satrn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将SATRN文本识别训练过程中保存的模型,转换成inference model。( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_satrn.yml -o Global.pretrained_model=./rec_satrn/best_accuracy  Global.save_inference_dir=./inference/rec_satrn
+
+

SATRN文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_satrn/" --rec_image_shape="3, 48, 48, 160" --rec_algorithm="SATRN" --rec_char_dict_path="ppocr/utils/dict90.txt" --max_text_length=30 --use_space_char=False
+
+

4.2 C++推理

+

由于C++预处理后处理还未支持SATRN,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{lee2019recognizing,
+      title={On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention},
+      author={Junyeop Lee and Sungrae Park and Jeonghun Baek and Seong Joon Oh and Seonghyeon Kim and Hwalsuk Lee},
+      year={2019},
+      eprint={1910.04396},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
+}
+
+ + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/algorithm/text_recognition/algorithm_rec_seed.html b/algorithm/text_recognition/algorithm_rec_seed.html new file mode 100644 index 0000000000..9eac4eec14 --- /dev/null +++ b/algorithm/text_recognition/algorithm_rec_seed.html @@ -0,0 +1,5423 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + SEED - PaddleOCR 文档 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + + + 跳转至 + + +
+
+ +
+ + + + + + +
+ + + + + + + +
+ +
+ + + + +
+
+ + + +
+
+
+ + + + + + + +
+
+
+ + + + + + + +
+
+ + + + + + + + + + + + + + + + + + + + +

SEED

+

1. 算法简介

+

论文信息:

+
+

SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition +Qiao, Zhi and Zhou, Yu and Yang, Dongbao and Zhou, Yucan and Wang, Weiping +CVPR, 2020

+
+

参考DTRB 文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络Avg Accuracy配置文件下载链接
SEEDAster_Resnet85.20%configs/rec/rec_resnet_stn_bilstm_att.yml训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

SEED模型需要额外加载FastText训练好的语言模型 ,并且安装 fasttext 依赖:

+
python3 -m pip install fasttext==0.9.1
+
+
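
fasttext的Python接口可以按如下方式加载预训练语言模型并获取词向量(模型文件名仅为示例,请替换为实际下载得到的语言模型文件):

+
import fasttext
+
+# 加载FastText预训练语言模型(文件路径仅为示例)
+lm = fasttext.load_model("cc.en.300.bin")
+vec = lm.get_word_vector("text")    # 获取单词对应的300维语义向量
+print(vec.shape)
+
+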

然后,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
# 单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_resnet_stn_bilstm_att.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_resnet_stn_bilstm_att.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_resnet_stn_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_resnet_stn_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

coming soon

+

4.2 C++推理

+

coming soon

+

4.3 Serving服务化部署

+

coming soon

+

4.4 更多推理部署

+

coming soon

+

5. FAQ

+

引用

+
@inproceedings{qiao2020seed,
+  title={Seed: Semantics enhanced encoder-decoder framework for scene text recognition},
+  author={Qiao, Zhi and Zhou, Yu and Yang, Dongbao and Zhou, Yucan and Wang, Weiping},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={13528--13537},
+  year={2020}
+}
+
diff --git a/algorithm/text_recognition/algorithm_rec_spin.html b/algorithm/text_recognition/algorithm_rec_spin.html
new file mode 100644
index 0000000000..270d68f17b
--- /dev/null
+++ b/algorithm/text_recognition/algorithm_rec_spin.html
@@ -0,0 +1,5425 @@

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition

+

1. 算法简介

+

论文信息:

+
+

SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition +Chengwei Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Yi Niu, Fei Wu, Futai Zou +AAAI, 2020

+
+

SPIN收录于AAAI2020。主要用于OCR识别任务。在任意形状文本识别中,矫正网络是一种较为常见的前置处理模块,但诸如RARE\ASTER\ESIR等只考虑了空间变换,并没有考虑色度变换。本文提出了一种结构Structure-Preserving Inner Offset Network (SPIN),可以在色彩空间上进行变换。该模块是可微分的,可以加入到任意识别器中。 +使用MJSynth和SynthText两个合成文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
SPINResNet32rec_r32_gaspin_bilstm_att.yml90.00%训练模型
+
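为帮助理解“在色度/强度空间做可微变换”的思想,下面给出一个概念性的示意代码(并非SPIN官方实现):用一组可学习的权重组合若干固定幂次的强度曲线,对归一化后的灰度图做变换,该模块可微、可插入任意识别器前端:

# 概念示意(非SPIN官方实现):用可学习权重组合若干固定幂次的强度曲线,
# 对归一化到[0, 1]的灰度图做可微的强度变换,体现“在色度/强度空间进行矫正”的思想。
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class ToyIntensityTransform(nn.Layer):
    def __init__(self, exponents=(0.5, 1.0, 2.0)):
        super().__init__()
        self.exponents = exponents
        # 可学习的组合权重,训练时可随识别器端到端更新
        self.weights = self.create_parameter(
            [len(exponents)],
            default_initializer=nn.initializer.Constant(1.0 / len(exponents)))

    def forward(self, x):
        # x: [N, 1, H, W],值域[0, 1]
        w = F.softmax(self.weights)
        return sum(w[i] * paddle.pow(x, e) for i, e in enumerate(self.exponents))

img = paddle.rand([1, 1, 32, 100])         # 假设的输入尺寸,仅作演示
print(ToyIntensityTransform()(img).shape)  # [1, 1, 32, 100]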

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r32_gaspin_bilstm_att.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r32_gaspin_bilstm_att.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r32_gaspin_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_r32_gaspin_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将SPIN文本识别训练过程中保存的模型,转换成inference model。可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r32_gaspin_bilstm_att.yml -o Global.pretrained_model={path/to/weights}/best_accuracy  Global.save_inference_dir=./inference/rec_r32_gaspin_bilstm_att
+
+

SPIN文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_r32_gaspin_bilstm_att/" --rec_image_shape="3, 32, 100" --rec_algorithm="SPIN" --rec_char_dict_path="./ppocr/utils/dict/spin_dict.txt" --use_space_char=False
+
+

4.2 C++推理

+

由于C++预处理后处理还未支持SPIN,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{2020SPIN,
+  title={SPIN: Structure-Preserving Inner Offset Network for Scene Text Recognition},
+  author={Chengwei Zhang and Yunlu Xu and Zhanzhan Cheng and Shiliang Pu and Yi Niu and Fei Wu and Futai Zou},
+  journal={AAAI2020},
+  year={2020},
+}
+
diff --git a/algorithm/text_recognition/algorithm_rec_srn.html b/algorithm/text_recognition/algorithm_rec_srn.html
new file mode 100644
index 0000000000..6c5bdf1c6c
--- /dev/null
+++ b/algorithm/text_recognition/algorithm_rec_srn.html
@@ -0,0 +1,5425 @@

SRN

+

1. 算法简介

+

论文信息:

+
+

Towards Accurate Scene Text Recognition with Semantic Reasoning Networks +Deli Yu, Xuan Li, Chengquan Zhang, Junyu Han, Jingtuo Liu, Errui Ding +CVPR,2020

+
+

使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
SRNResnet50_vd_fpnrec_r50_fpn_srn.yml86.31%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r50_fpn_srn.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将SRN文本识别训练过程中保存的模型,转换成inference model。( 模型下载地址 ),可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r50_fpn_srn.yml -o Global.pretrained_model=./rec_r50_vd_srn_train/best_accuracy  Global.save_inference_dir=./inference/rec_srn
+
+

SRN文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/en/word_1.png" --rec_model_dir="./inference/rec_srn/" --rec_image_shape="1,64,256"  --rec_algorithm="SRN" --rec_char_dict_path=./ppocr/utils/ic15_dict.txt  --use_space_char=False
+
+

4.2 C++推理

+

由于C++预处理后处理还未支持SRN,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+

引用

+
@article{Yu2020TowardsAS,
+  title={Towards Accurate Scene Text Recognition With Semantic Reasoning Networks},
+  author={Deli Yu and Xuan Li and Chengquan Zhang and Junyu Han and Jingtuo Liu and Errui Ding},
+  journal={2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year={2020},
+  pages={12110-12119}
+}
+
diff --git a/algorithm/text_recognition/algorithm_rec_starnet.html b/algorithm/text_recognition/algorithm_rec_starnet.html
new file mode 100644
index 0000000000..81bc5becdd
--- /dev/null
+++ b/algorithm/text_recognition/algorithm_rec_starnet.html
@@ -0,0 +1,5453 @@

STAR-Net

+

1. 算法简介

+

论文信息:

+
+

STAR-Net: a spatial attention residue network for scene text recognition. +Wei Liu, Chaofeng Chen, Kwan-Yee K. Wong, Zhizhong Su and Junyu Han. +BMVC, pages 43.1-43.13, 2016

+
+

参考DTRB 文字识别训练和评估流程,使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法效果如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
模型骨干网络Avg Accuracy配置文件下载链接
StarNetResnet34_vd84.44%configs/rec/rec_r34_vd_tps_bilstm_ctc.yml训练模型
StarNetMobileNetV381.42%configs/rec/rec_mv3_tps_bilstm_ctc.yml训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练不同的识别模型只需要更换配置文件即可。

+

训练

+

在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
# 单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml
+
+

评估

+
1
+2
# GPU 评估, Global.pretrained_model 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy
+
+

预测

+
1
+2
# 预测使用的配置文件必须与训练一致
+python3 tools/infer_rec.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.pretrained_model={path/to/weights}/best_accuracy Global.infer_img=doc/imgs_words/en/word_1.png
+
+

4. 推理部署

+

4.1 Python推理

+

首先将 STAR-Net 文本识别训练过程中保存的模型,转换成inference model。以基于Resnet34_vd骨干网络,使用MJSynth和SynthText两个英文文本识别合成数据集训练的模型 为例,可以使用如下命令进行转换:

+
python3 tools/export_model.py -c configs/rec/rec_r34_vd_tps_bilstm_ctc.yml -o Global.pretrained_model=./rec_r34_vd_tps_bilstm_ctc_v2.0_train/best_accuracy  Global.save_inference_dir=./inference/rec_starnet
+
+

STAR-Net 文本识别模型推理,可以执行如下命令:

+
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words_en/word_336.png" --rec_model_dir="./inference/rec_starnet/" --rec_image_shape="3, 32, 100" --rec_char_dict_path="./ppocr/utils/ic15_dict.txt"
+
+

img

+

执行命令后,上面图像的识别结果如下:

+
Predicts of ./doc/imgs_words_en/word_336.png:('super', 0.9999073)
+
+

注意:由于上述模型是参考DTRB文本识别训练和评估流程,与超轻量级中文识别模型训练有两方面不同:

  • 训练时采用的图像分辨率不同。训练上述模型采用的图像分辨率是[3, 32, 100],而中文模型训练时,为了保证长文本的识别效果,采用的图像分辨率是[3, 32, 320]。预测推理程序默认的形状参数是训练中文模型采用的图像分辨率,即[3, 32, 320]。因此,这里推理上述英文模型时,需要通过参数rec_image_shape设置识别图像的形状。

  • 字符列表不同。DTRB论文中的实验只针对26个小写英文字母和10个数字,总共36个字符;所有大小写字符都转成了小写,不在上述列表中的字符都被忽略并视为空格。因此这里没有使用输入字符字典,而是通过如下代码生成字典,在推理时需要设置参数rec_char_dict_path,指定为英文字典"./ppocr/utils/ic15_dict.txt"。
1
+2
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
+dict_character = list(self.character_str)
+
+
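结合上面的说明,下面给出一个示意脚本(非PaddleOCR官方代码),演示按照该36字符字典对标签做小写化并过滤字典外字符的处理方式:

# 示意代码(非官方实现):将标签转为小写,并把不在36字符字典中的字符视为空格
character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = set(character_str)

def normalize_label(text):
    text = text.lower()
    # 字典外的字符(标点、中文等)忽略,视为空格
    return "".join(ch if ch in dict_character else " " for ch in text)

print(normalize_label("Super-Market 2024!"))  # 输出 "super market 2024 "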

4.2 C++推理

+

准备好推理模型后,参考cpp infer教程进行操作即可。

+

4.3 Serving服务化部署

+

准备好推理模型后,参考pdserving教程进行Serving服务化部署,包括Python Serving和C++ Serving两种模式。

+

4.4 更多推理部署

+

STAR-Net模型还支持以下推理部署方式:

+
  • Paddle2ONNX推理:准备好推理模型后,参考paddle2onnx教程操作。

5. FAQ

+

引用

+
@inproceedings{liu2016star,
+  title={STAR-Net: a spatial attention residue network for scene text recognition.},
+  author={Liu, Wei and Chen, Chaofeng and Wong, Kwan-Yee K and Su, Zhizhong and Han, Junyu},
+  booktitle={BMVC},
+  volume={2},
+  pages={7},
+  year={2016}
+}
+
diff --git a/algorithm/text_recognition/algorithm_rec_svtr.html b/algorithm/text_recognition/algorithm_rec_svtr.html
new file mode 100644
index 0000000000..2693314980
--- /dev/null
+++ b/algorithm/text_recognition/algorithm_rec_svtr.html
@@ -0,0 +1,5667 @@

场景文本识别算法-SVTR

+

1. 算法简介

+

论文信息:

+
+

SVTR: Scene Text Recognition with a Single Visual Model +Yongkun Du and Zhineng Chen and Caiyan Jia and Xiaoting Yin and Tianlun Zheng and Chenxia Li and Yuning Du and Yu-Gang Jiang +IJCAI, 2022

+
+

场景文本识别旨在将自然图像中的文本转录为数字字符序列,从而传达对场景理解至关重要的高级语义。这项任务由于文本变形、字体、遮挡、杂乱背景等方面的变化具有一定的挑战性。先前的方法为提高识别精度做出了许多工作。然而文本识别器除了准确度外,还因为实际需求需要考虑推理速度等因素。

+

SVTR算法简介

+

主流的场景文本识别模型通常包含两个模块:用于特征提取的视觉模型和用于文本转录的序列模型。这种架构虽然准确,但复杂且效率较低,限制了在实际场景中的应用。SVTR提出了一种用于场景文本识别的单视觉模型,该模型在patch-wise image tokenization框架内,完全摒弃了序列建模,在精度具有竞争力的前提下,模型参数量更少,速度更快,主要有以下几点贡献:

+
  1. 首次发现单视觉模型可以达到与视觉语言模型相媲美甚至更高的准确率,并且其具有效率高和适应多语言的优点,在实际应用中很有前景。
  2. SVTR从字符组件的角度出发,逐渐地合并字符组件,自下而上地完成字符的识别。
  3. SVTR引入了局部和全局Mixing,分别用于提取字符组件特征和字符间依赖关系,与多尺度的特征一起,形成多粒度特征描述(局部与全局Mixing的差异可参考下方示意代码)。
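下面用一个简化的注意力掩码示意(非SVTR官方实现)说明局部Mixing与全局Mixing的差异:局部Mixing只允许每个字符组件(token)关注其空间近邻,全局Mixing则允许关注全部组件,其中 8×25 的token网格与 7×11 的窗口大小均为假设示例:

# 示意代码(非SVTR官方实现):构造局部/全局Mixing对应的注意力掩码
import numpy as np

def local_mask(h, w, window=(7, 11)):
    """局部Mixing:每个token只关注以自身为中心、大小为window的空间邻域。"""
    n = h * w
    ys, xs = np.unravel_index(np.arange(n), (h, w))
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i] = (np.abs(ys - ys[i]) <= window[0] // 2) & (np.abs(xs - xs[i]) <= window[1] // 2)
    return mask

h, w = 8, 25                                      # 假设的字符组件(token)网格大小
print(local_mask(h, w).sum(axis=1).min())         # 局部Mixing:每个token仅可见邻域内的token
print(np.ones((h * w, h * w), dtype=bool).sum())  # 全局Mixing:所有token两两可见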

SVTR在场景文本识别公开数据集上的精度(%)和模型文件如下:

+
  • 中文数据集来自于Chinese Benchmark,SVTR的中文训练评估策略遵循该论文。
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | IC13 857 | SVT | IIIT5k 3000 | IC15 1811 | SVTP | CUTE80 | Avg_6 | IC15 2077 | IC13 1015 | IC03 867 | IC03 860 | Avg_10 | Chinese scene_test | 下载链接
SVTR Tiny | 96.85 | 91.34 | 94.53 | 83.99 | 85.43 | 89.24 | 90.87 | 80.55 | 95.37 | 95.27 | 95.70 | 90.13 | 67.90 | 英文 / 中文
SVTR Small | 95.92 | 93.04 | 95.03 | 84.70 | 87.91 | 92.01 | 91.63 | 82.72 | 94.88 | 96.08 | 96.28 | 91.02 | 69.00 | 英文 / 中文
SVTR Base | 97.08 | 91.50 | 96.03 | 85.20 | 89.92 | 91.67 | 92.33 | 83.73 | 95.66 | 95.62 | 95.81 | 91.61 | 71.40 | 英文 / -
SVTR Large | 97.20 | 91.65 | 96.30 | 86.58 | 88.37 | 95.14 | 92.82 | 84.54 | 96.35 | 96.54 | 96.74 | 92.24 | 72.10 | 英文 / 中文
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

数据集准备

+

英文数据集下载 +中文数据集下载

+

启动训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练SVTR识别模型时需要更换配置文件SVTR配置文件

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_svtrnet.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_svtrnet.yml
+
+

3.2 评估

+

可下载SVTR提供的模型文件和配置文件:下载地址 ,以SVTR-T为例,使用如下命令进行评估:

+
1
+2
+3
+4
# 下载包含SVTR-T的模型文件和配置文件的tar压缩包并解压
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_en_train.tar && tar xf rec_svtr_tiny_none_ctc_en_train.tar
+# 注意将pretrained_model的路径设置为本地路径。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c ./rec_svtr_tiny_none_ctc_en_train/rec_svtr_tiny_6local_6global_stn_en.yml -o Global.pretrained_model=./rec_svtr_tiny_none_ctc_en_train/best_accuracy
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c ./rec_svtr_tiny_none_ctc_en_train/rec_svtr_tiny_6local_6global_stn_en.yml -o Global.infer_img='./doc/imgs_words_en/word_10.png' Global.pretrained_model=./rec_svtr_tiny_none_ctc_en_train/best_accuracy
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。下面以SVTR-T在英文数据集训练的模型为例(模型和配置文件下载地址 ),可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c ./rec_svtr_tiny_none_ctc_en_train/rec_svtr_tiny_6local_6global_stn_en.yml -o Global.pretrained_model=./rec_svtr_tiny_none_ctc_en_train/best_accuracy Global.save_inference_dir=./inference/rec_svtr_tiny_stn_en
+
+

注意: 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意检查配置文件中的character_dict_path是否为正确的字典文件。

+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
/inference/rec_svtr_tiny_stn_en/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+
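如果希望在自己的Python程序中直接加载上述inference模型(而不是使用预测脚本),可以参考如下示意代码(基于paddle.inference,模型路径沿用上文示例):

# 示意代码:使用 paddle.inference 直接加载导出的 inference 模型
from paddle.inference import Config, create_predictor

config = Config("./inference/rec_svtr_tiny_stn_en/inference.pdmodel",
                "./inference/rec_svtr_tiny_stn_en/inference.pdiparams")
config.disable_gpu()                 # 如需GPU,可改为 config.enable_use_gpu(200, 0)
predictor = create_predictor(config)
print(predictor.get_input_names())   # 查看输入变量名,后续可据此喂入预处理后的图像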

执行如下命令进行模型推理:

+
1
+2
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' --rec_model_dir='./inference/rec_svtr_tiny_stn_en/' --rec_algorithm='SVTR' --rec_image_shape='3,64,256' --rec_char_dict_path='./ppocr/utils/ic15_dict.txt'
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/imgs_words_en/'。
+
+

+

执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9999998807907104)
+
+

注意

+
    +
  • 如果您调整了训练时的输入分辨率,需要通过参数rec_image_shape设置为您需要的识别图像形状。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中SVTR的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

由于C++预处理后处理还未支持SVTR,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+
  1. GPU和CPU速度对比
     • 由于SVTR使用的算子大多为矩阵相乘,在GPU环境下速度具有优势;但在CPU开启mkldnn加速的环境下,SVTR相比于被优化过的卷积网络没有优势。
  2. SVTR模型转ONNX失败
     • 保证paddle2onnx和onnxruntime版本最新,转onnx命令参考SVTR模型转onnx步骤实例。
  3. SVTR转ONNX成功但是推理结果不正确
     • 可能的原因是模型参数out_char_num设置不正确,应设置为W//4、W//8或者W//12(计算示例见下方代码),可以参考高精度中文场景文本识别模型SVTR的3.3.3章节。
  4. 长文本识别优化
     • 参考高精度中文场景文本识别模型SVTR的3.3章节。
  5. 论文结果复现注意事项
     • 数据集使用ABINet提供的数据集;
     • 默认使用4卡GPU训练,单卡Batchsize默认为512,总Batchsize为2048,对应的学习率为0.0005;当修改Batchsize或者改变GPU卡数时,学习率应等比例修改。
  6. 进一步优化的探索点
     • 学习率调整:可以调整为默认的两倍并保持Batchsize不变;或者将Batchsize减小为默认的1/2,保持学习率不变;
     • 数据增强策略:可选RecConAug和RecAug;
     • 如果不使用STN,可以将mixer的Local替换为Conv、local_mixer全部修改为[5, 5];
     • 网格搜索最优的embed_dim、depth、num_heads配置;
     • 使用后Normalization策略,即将模型配置prenorm修改为True。

引用

+
@article{Du2022SVTR,
+  title     = {SVTR: Scene Text Recognition with a Single Visual Model},
+  author    = {Du, Yongkun and Chen, Zhineng and Jia, Caiyan and Yin, Xiaoting and Zheng, Tianlun and Li, Chenxia and Du, Yuning and Jiang, Yu-Gang},
+  booktitle = {IJCAI},
+  year      = {2022},
+  url       = {https://arxiv.org/abs/2205.00159}
+}
+
diff --git a/algorithm/text_recognition/algorithm_rec_svtrv2.html b/algorithm/text_recognition/algorithm_rec_svtrv2.html
new file mode 100644
index 0000000000..6cad0cdcb7
--- /dev/null
+++ b/algorithm/text_recognition/algorithm_rec_svtrv2.html
@@ -0,0 +1,5500 @@

场景文本识别算法-SVTRv2

+

1. 算法简介

+

SVTRv2算法简介

+

🔥 该算法由来自复旦大学视觉与学习实验室(FVL)的OpenOCR团队研发,其在PaddleOCR算法模型挑战赛 - 赛题一:OCR端到端识别任务中荣获一等奖,B榜端到端识别精度相比PP-OCRv4提升2.5%,推理速度持平。主要思路:1、检测和识别模型的Backbone升级为RepSVTR;2、识别教师模型升级为SVTRv2,可识别长文本。

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型 | 配置文件 | 端到端 | 下载链接
PP-OCRv4 | - | A榜 62.77%,B榜 62.51% | Model List
SVTRv2(Rec Server) | configs/rec/SVTRv2/rec_svtrv2_ch.yml | A榜 68.81%(使用PP-OCRv4检测模型) | 训练模型 / 推理模型
RepSVTR(Mobile) | 识别、识别蒸馏、检测 | B榜 65.07% | 识别: 训练模型 / 推理模型;识别蒸馏: 训练模型 / 推理模型;检测: 训练模型 / 推理模型
+

🚀 快速使用:参考PP-OCR推理说明文档,将检测和识别模型替换为上表中对应的RepSVTR或SVTRv2推理模型即可使用。

+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

训练命令:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/SVTRv2/rec_repsvtr_gtc.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+# Rec 学生模型
+python -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/rec/SVTRv2/rec_repsvtr_gtc.yml
+# Rec 教师模型
+python -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/rec/SVTRv2/rec_svtrv2_gtc.yml
+# Rec 蒸馏训练
+python -m paddle.distributed.launch --gpus '0,1,2,3,4,5,6,7'  tools/train.py -c configs/rec/SVTRv2/rec_svtrv2_gtc_distill.yml
+
+

3.2 评估

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/SVTRv2/rec_repsvtr_gtc.yml -o Global.pretrained_model=output/rec_repsvtr_gtc/best_accuracy
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/SVTRv2/rec_repsvtr_gtc.yml -o Global.pretrained_model=output/rec_repsvtr_gtc/best_accuracy Global.infer_img='./doc/imgs_words_en/word_10.png'
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model,以RepSVTR为例,可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/SVTRv2/rec_repsvtr_gtc.yml -o Global.pretrained_model=output/rec_repsvtr_gtc/best_accuracy Global.save_inference_dir=./inference/rec_repsvtr_infer
+
+

注意: 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意检查配置文件中的character_dict_path是否为正确的字典文件。

+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
./inference/rec_repsvtr_infer/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' --rec_model_dir='./inference/rec_repsvtr_infer/'
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/imgs_words_en/'。
+
+

+

执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9999998807907104)
+
+

注意

+
    +
  • 如果您调整了训练时的输入分辨率,需要通过参数rec_image_shape设置为您需要的识别图像形状。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中SVTR的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

准备好推理模型后,参考cpp infer教程进行操作即可。

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+
  • Paddle2ONNX推理:准备好推理模型后,参考paddle2onnx教程操作。

5. FAQ

+

引用

+
@article{Du2022SVTR,
+  title     = {SVTR: Scene Text Recognition with a Single Visual Model},
+  author    = {Du, Yongkun and Chen, Zhineng and Jia, Caiyan and Yin, Xiaoting and Zheng, Tianlun and Li, Chenxia and Du, Yuning and Jiang, Yu-Gang},
+  booktitle = {IJCAI},
+  year      = {2022},
+  url       = {https://arxiv.org/abs/2205.00159}
+}
+
diff --git a/algorithm/text_recognition/algorithm_rec_visionlan.html b/algorithm/text_recognition/algorithm_rec_visionlan.html
new file mode 100644
index 0000000000..09ec926316
--- /dev/null
+++ b/algorithm/text_recognition/algorithm_rec_visionlan.html
@@ -0,0 +1,5493 @@

场景文本识别算法-VisionLAN

+

1. 算法简介

+

论文信息:

+
+

From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network +Yuxin Wang, Hongtao Xie, Shancheng Fang, Jing Wang, Shenggao Zhu, Yongdong Zhang +ICCV, 2021

+
+

VisionLAN使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
VisionLANResNet45rec_r45_visionlan.yml90.30%预训练、训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练VisionLAN识别模型时需要更换配置文件VisionLAN配置文件

+

启动训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_r45_visionlan.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_r45_visionlan.yml
+
+

3.2 评估

+

可下载已训练完成的模型文件,使用如下命令进行评估:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/eval.py -c configs/rec/rec_r45_visionlan.yml -o Global.pretrained_model=./rec_r45_visionlan_train/best_accuracy
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/rec_r45_visionlan.yml -o Global.infer_img='./doc/imgs_words/en/word_2.png' Global.pretrained_model=./rec_r45_visionlan_train/best_accuracy
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例(模型下载地址),可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/rec_r45_visionlan.yml -o Global.pretrained_model=./rec_r45_visionlan_train/best_accuracy Global.save_inference_dir=./inference/rec_r45_visionlan/
+
+

注意:

+
    +
  • 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意修改配置文件中的character_dict_path是否是所需要的字典文件。
  • +
  • 如果您修改了训练时的输入大小,请修改tools/export_model.py文件中的对应VisionLAN的infer_shape
  • +
+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
./inference/rec_r45_visionlan/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words/en/word_2.png' --rec_model_dir='./inference/rec_r45_visionlan/' --rec_algorithm='VisionLAN' --rec_image_shape='3,64,256' --rec_char_dict_path='./ppocr/utils/ic15_dict.txt' --use_space_char=False
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/imgs_words_en/'。
+
+

img

+

执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_words/en/word_2.png:('yourself', 0.9999493)
+
+

注意

+
    +
  • 训练上述模型采用的图像分辨率是[3,64,256],需要通过参数rec_image_shape设置为您训练时的识别图像形状。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中VisionLAN的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

由于C++预处理后处理还未支持VisionLAN,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+
  1. MJSynth和SynthText两种数据集来自于VisionLAN源repo。
  2. 我们使用VisionLAN作者提供的预训练模型进行finetune训练,预训练模型配套字典为'ppocr/utils/ic15_dict.txt'。

引用

+
@inproceedings{wang2021two,
+  title={From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network},
+  author={Wang, Yuxin and Xie, Hongtao and Fang, Shancheng and Wang, Jing and Zhu, Shenggao and Zhang, Yongdong},
+  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+  pages={14194--14203},
+  year={2021}
+}
+
diff --git a/algorithm/text_recognition/algorithm_rec_vitstr.html b/algorithm/text_recognition/algorithm_rec_vitstr.html
new file mode 100644
index 0000000000..c73300d6f7
--- /dev/null
+++ b/algorithm/text_recognition/algorithm_rec_vitstr.html
@@ -0,0 +1,5493 @@

场景文本识别算法-ViTSTR

+

1. 算法简介

+

论文信息:

+
+

Vision Transformer for Fast and Efficient Scene Text Recognition +Rowel Atienza +ICDAR, 2021

+
+

ViTSTR使用MJSynth和SynthText两个文字识别数据集训练,在IIIT, SVT, IC03, IC13, IC15, SVTP, CUTE数据集上进行评估,算法复现效果如下:

+ + + + + + + + + + + + + + + + + + + +
模型骨干网络配置文件Acc下载链接
ViTSTRViTSTRrec_vitstr_none_ce.yml79.82%训练模型
+

2. 环境配置

+

请先参考《运行环境准备》配置PaddleOCR运行环境,参考《项目克隆》克隆项目代码。

+

3. 模型训练、评估、预测

+

3.1 模型训练

+

请参考文本识别训练教程。PaddleOCR对代码进行了模块化,训练ViTSTR识别模型时需要更换配置文件ViTSTR配置文件

+

启动训练

+

具体地,在完成数据准备后,便可以启动训练,训练命令如下:

+
1
+2
+3
+4
+5
#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/rec_vitstr_none_ce.yml
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/rec_vitstr_none_ce.yml
+
+

3.2 评估

+

可下载已训练完成的模型文件,使用如下命令进行评估:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/rec_vitstr_none_ce.yml -o Global.pretrained_model=./rec_vitstr_none_ce_train/best_accuracy
+
+

3.3 预测

+

使用如下命令进行单张图片预测:

+
1
+2
+3
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/infer_rec.py -c configs/rec/rec_vitstr_none_ce.yml -o Global.infer_img='./doc/imgs_words_en/word_10.png' Global.pretrained_model=./rec_vitstr_none_ce_train/best_accuracy
+# 预测文件夹下所有图像时,可修改infer_img为文件夹,如 Global.infer_img='./doc/imgs_words_en/'。
+
+

4. 推理部署

+

4.1 Python推理

+

首先将训练得到best模型,转换成inference model。这里以训练完成的模型为例(模型下载地址 ),可以使用如下命令进行转换:

+
1
+2
# 注意将pretrained_model的路径设置为本地路径。
+python3 tools/export_model.py -c configs/rec/rec_vitstr_none_ce.yml -o Global.pretrained_model=./rec_vitstr_none_ce_train/best_accuracy Global.save_inference_dir=./inference/rec_vitstr/
+
+

注意:

+
    +
  • 如果您是在自己的数据集上训练的模型,并且调整了字典文件,请注意修改配置文件中的character_dict_path是否是所需要的字典文件。
  • +
  • 如果您修改了训练时的输入大小,请修改tools/export_model.py文件中的对应ViTSTR的infer_shape
  • +
+

转换成功后,在目录下有三个文件:

+
1
+2
+3
+4
/inference/rec_vitstr/
+    ├── inference.pdiparams         # 识别inference模型的参数文件
+    ├── inference.pdiparams.info    # 识别inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 识别inference模型的program文件
+
+

执行如下命令进行模型推理:

+
1
+2
python3 tools/infer/predict_rec.py --image_dir='./doc/imgs_words_en/word_10.png' --rec_model_dir='./inference/rec_vitstr/' --rec_algorithm='ViTSTR' --rec_image_shape='1,224,224' --rec_char_dict_path='./ppocr/utils/EN_symbol_dict.txt'
+# 预测文件夹下所有图像时,可修改image_dir为文件夹,如 --image_dir='./doc/imgs_words_en/'。
+
+

img

+

执行命令后,上面图像的预测结果(识别的文本和得分)会打印到屏幕上,示例如下:

+
Predicts of ./doc/imgs_words_en/word_10.png:('pain', 0.9998350143432617)
+
+

注意

+
    +
  • 训练上述模型采用的图像分辨率是[1,224,224],需要通过参数rec_image_shape设置为您训练时的识别图像形状。
  • +
  • 在推理时需要设置参数rec_char_dict_path指定字典,如果您修改了字典,请修改该参数为您的字典文件。
  • +
  • 如果您修改了预处理方法,需修改tools/infer/predict_rec.py中ViTSTR的预处理为您的预处理方法。
  • +
+

4.2 C++推理部署

+

由于C++预处理后处理还未支持ViTSTR,所以暂未支持

+

4.3 Serving服务化部署

+

暂不支持

+

4.4 更多推理部署

+

暂不支持

+

5. FAQ

+
  1. ViTSTR论文中,使用在ImageNet1k上的预训练权重进行初始化训练,我们在训练时未采用预训练权重,最终精度没有变化甚至有所提高。
  2. 我们仅仅复现了ViTSTR中的tiny版本,如果需要使用small、base版本,可将ViTSTR源repo中的预训练权重转为Paddle权重使用(转换思路见下方示意代码)。
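对于上面第2条提到的权重转换,下面给出一个粗略的思路示意(非官方转换脚本,文件名为假设示例;实际转换时需根据两侧网络的参数命名逐一建立映射,并处理好维度差异):

# 示意代码(非官方转换脚本):PyTorch权重转Paddle权重的一般思路
import torch
import paddle

torch_state = torch.load("vitstr_small.pth", map_location="cpu")  # 假设的权重文件名
paddle_state = {}
for name, tensor in torch_state.items():
    array = tensor.detach().cpu().numpy()
    # PyTorch 的 nn.Linear 权重形状为 [out, in],Paddle 为 [in, out],二维权重需转置
    if name.endswith("weight") and array.ndim == 2:
        array = array.transpose()
    paddle_state[name] = paddle.to_tensor(array)
paddle.save(paddle_state, "vitstr_small.pdparams")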

引用

+
@article{Atienza2021ViTSTR,
+  title     = {Vision Transformer for Fast and Efficient Scene Text Recognition},
+  author    = {Rowel Atienza},
+  booktitle = {ICDAR},
+  year      = {2021},
+  url       = {https://arxiv.org/abs/2105.08582}
+}
+
diff --git a/algorithm/text_recognition/images/word_1-20240704183926496.png b/algorithm/text_recognition/images/word_1-20240704183926496.png
new file mode 100644
index 0000000000..7b915fd6da
Binary files /dev/null and b/algorithm/text_recognition/images/word_1-20240704183926496.png differ
diff --git a/algorithm/text_recognition/images/word_1-20240704184113913.png b/algorithm/text_recognition/images/word_1-20240704184113913.png
new file mode 100644
index 0000000000..7b915fd6da
Binary files /dev/null and b/algorithm/text_recognition/images/word_1-20240704184113913.png differ
diff --git a/algorithm/text_recognition/images/word_10.png b/algorithm/text_recognition/images/word_10.png
new file mode 100644
index 0000000000..07370f757e
Binary files /dev/null and b/algorithm/text_recognition/images/word_10.png differ
diff --git a/algorithm/text_recognition/images/word_336-20240705082445918.png b/algorithm/text_recognition/images/word_336-20240705082445918.png
new file mode 100644
index 0000000000..3bddd294ed
Binary files /dev/null and b/algorithm/text_recognition/images/word_336-20240705082445918.png differ
diff --git a/algorithm/text_recognition/images/word_336.png b/algorithm/text_recognition/images/word_336.png
new file mode 100644
index 0000000000..3bddd294ed
Binary files /dev/null and b/algorithm/text_recognition/images/word_336.png differ
diff --git "a/applications/PCB\345\255\227\347\254\246\350\257\206\345\210\253.html" "b/applications/PCB\345\255\227\347\254\246\350\257\206\345\210\253.html"
new file mode 100644
index 0000000000..74a63359d3
--- /dev/null
+++ "b/applications/PCB\345\255\227\347\254\246\350\257\206\345\210\253.html"
@@ -0,0 +1,6181 @@

基于PP-OCRv3的PCB字符识别

+

1. 项目介绍

+

印刷电路板(PCB)是电子产品中的核心器件,对于板件质量的测试与监控是生产中必不可少的环节。在一些场景中,通过PCB中信号灯颜色和文字组合可以定位PCB局部模块质量问题,PCB文字识别中存在如下难点:

+
  • 裁剪出的PCB图片宽高比例较小
  • 文字区域整体面积也较小
  • 包含垂直、水平多种方向文本

针对本场景,PaddleOCR基于全新的PP-OCRv3通过合成数据、微调以及其他场景适配方法完成小字符文本识别任务,满足企业上线要求。PCB检测、识别效果如 图1 所示:

+

+

注:欢迎在AIStudio领取免费算力体验线上实训,项目链接: 基于PP-OCRv3实现PCB字符识别

+

2. 安装说明

+

下载PaddleOCR源码,安装依赖环境。

+
1
+2
+3
# 如仍需安装or安装更新,可以执行以下步骤
+git clone https://github.com/PaddlePaddle/PaddleOCR.git
+#  git clone https://gitee.com/PaddlePaddle/PaddleOCR
+
+
1
+2
# 安装依赖包
+pip install -r /home/aistudio/PaddleOCR/requirements.txt
+
+

3. 数据准备

+

我们通过图片合成工具生成 图2 所示的PCB图片,整图只有高25、宽150左右、文字区域高9、宽45左右,包含垂直和水平2种方向的文本:

+

+

暂时不开源生成的PCB数据集,但是通过更换背景,通过如下代码生成数据即可:

+
cd gen_data
+python3 gen.py --num_img=10
+
+

生成图片参数解释:

+
num_img:生成图片数量
+font_min_size、font_max_size:字体最大、最小尺寸
+bg_path:文字区域背景存放路径
+det_bg_path:整图背景存放路径
+fonts_path:字体路径
+corpus_path:语料路径
+output_dir:生成图片存储路径
+
+

这里生成 100张 相同尺寸和文本的图片,如 图3 所示,方便大家跑通实验。通过如下代码解压数据集:

+

+
tar xf ./data/data148165/dataset.tar -C ./
+
+

在生成数据集的时需要生成检测和识别训练需求的格式:

+
    +
  • 文本检测
  • +
+

标注文件格式如下,中间用'\t'分隔:

+
" 图像文件名                    json.dumps编码的图像标注信息"
+ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]
+
+

json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 points 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。 transcription 表示当前文本框的文字,当其内容为“###”时,表示该文本框无效,在训练时会跳过。

+
    +
  • 文本识别
  • +
+

标注文件的格式如下, txt文件中默认请将图片路径和图片标签用'\t'分割,如用其他方式分割将造成训练报错。

+
" 图像文件名                 图像标注信息 "
+
+train_data/rec/train/word_001.jpg   简单可依赖
+train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
+...
+
+

4. 文本检测

+

选用飞桨OCR开发套件PaddleOCR中的PP-OCRv3模型进行文本检测和识别。针对检测模型和识别模型,进行了共计9个方面的升级:

+
    +
  • +

    PP-OCRv3检测模型对PP-OCRv2中的CML协同互学习文本检测蒸馏策略进行了升级,分别针对教师模型和学生模型进行进一步效果优化。其中,在对教师模型优化时,提出了大感受野的PAN结构LK-PAN和引入了DML蒸馏策略;在对学生模型优化时,提出了残差注意力机制的FPN结构RSE-FPN。

    +
  • +
  • +

    PP-OCRv3的识别模块是基于文本识别算法SVTR优化。SVTR不再采用RNN结构,通过引入Transformers结构更加有效地挖掘文本行图像的上下文信息,从而提升文本识别能力。PP-OCRv3通过轻量级文本识别网络SVTR_LCNet、Attention损失指导CTC损失训练策略、挖掘文字上下文信息的数据增广策略TextConAug、TextRotNet自监督预训练模型、UDML联合互学习策略、UIM无标注数据挖掘方案,6个方面进行模型加速和效果提升。

    +
  • +
+

更多细节请参考PP-OCRv3技术报告

+

我们使用 3种方案 进行检测模型的训练、评估:

+
  • PP-OCRv3英文超轻量检测预训练模型直接评估
  • PP-OCRv3英文超轻量检测预训练模型 + 验证集padding直接评估
  • PP-OCRv3英文超轻量检测预训练模型 + fine-tune

4.1 预训练模型直接评估

+

我们首先通过PaddleOCR提供的预训练模型在验证集上进行评估,如果评估指标能满足效果,可以直接使用预训练模型,不再需要训练。

+

使用预训练模型直接评估步骤如下:

+

1)下载预训练模型

+

PaddleOCR已经提供了PP-OCR系列模型,部分模型展示如下表所示:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
模型简介模型名称推荐场景检测模型方向分类器识别模型
中英文超轻量PP-OCRv3模型(16.2M)ch_PP-OCRv3_xx移动端&服务器端推理模型 / 训练模型推理模型 / 训练模型推理模型 / 训练模型
英文超轻量PP-OCRv3模型(13.4M)en_PP-OCRv3_xx移动端&服务器端推理模型 / 训练模型推理模型 / 训练模型推理模型 / 训练模型
中英文超轻量PP-OCRv2模型(13.0M)ch_PP-OCRv2_xx移动端&服务器端推理模型 / 训练模型推理模型 / 预训练模型推理模型 / 训练模型
中英文超轻量PP-OCR mobile模型(9.4M)ch_ppocr_mobile_v2.0_xx移动端&服务器端推理模型 / 预训练模型推理模型 / 预训练模型推理模型 / 预训练模型
中英文通用PP-OCR server模型(143.4M)ch_ppocr_server_v2.0_xx服务器端推理模型 / 预训练模型推理模型 / 预训练模型推理模型 / 预训练模型
+

更多模型下载(包括多语言),可以参考PP-OCR系列模型下载

+

这里我们使用PP-OCRv3英文超轻量检测模型,下载并解压预训练模型:

+
1
+2
+3
+4
+5
+6
+7
+8
# 如果更换其他模型,更新下载链接和解压指令就可以
+cd /home/aistudio/PaddleOCR
+mkdir pretrain_models
+cd pretrain_models
+# 下载英文预训练模型
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/english/en_PP-OCRv3_det_distill_train.tar
+tar xf en_PP-OCRv3_det_distill_train.tar && rm -rf en_PP-OCRv3_det_distill_train.tar
+%cd ..
+
+

模型评估

+

首先修改配置文件configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml中的以下字段:

+
1
+2
+3
+4
+5
Eval.dataset.data_dir:指向验证集图片存放目录,'/home/aistudio/dataset'
+Eval.dataset.label_file_list:指向验证集标注文件,'/home/aistudio/dataset/det_gt_val.txt'
+Eval.dataset.transforms.DetResizeForTest:  尺寸
+        limit_side_len: 48
+        limit_type: 'min'
+
+

然后在验证集上进行评估,具体代码如下:

+
1
+2
+3
+4
cd /home/aistudio/PaddleOCR
+python tools/eval.py \
+    -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml  \
+    -o Global.checkpoints="./pretrain_models/en_PP-OCRv3_det_distill_train/best_accuracy"
+
+

4.2 预训练模型+验证集padding直接评估

+

考虑到PCB图片比较小,宽度只有25左右、高度只有140-170左右,我们在原图的基础上进行padding,再进行检测评估,padding前后效果对比如 图4 所示:

+

+

将图片都padding到300*300大小,因为坐标信息发生了变化,我们同时要修改标注文件,在/home/aistudio/dataset目录里也提供了padding之后的图片,大家也可以尝试训练和评估:

+

同上,我们需要修改配置文件configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml中的以下字段:

+
1
+2
+3
+4
+5
Eval.dataset.data_dir:指向验证集图片存放目录,'/home/aistudio/dataset'
+Eval.dataset.label_file_list:指向验证集标注文件,/home/aistudio/dataset/det_gt_padding_val.txt
+Eval.dataset.transforms.DetResizeForTest:  尺寸
+        limit_side_len: 1100
+        limit_type: 'min'
+
+

将训练完成的模型放置在对应目录下即可完成模型推理

+
1
+2
+3
+4
cd /home/aistudio/PaddleOCR
+python tools/eval.py \
+    -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml  \
+    -o Global.checkpoints="./pretrain_models/en_PP-OCRv3_det_distill_train/best_accuracy"
+
+

4.3 预训练模型+fine-tune

+

基于预训练模型,在生成的1500图片上进行fine-tune训练和评估,其中train数据1200张,val数据300张,修改配置文件configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml中的以下字段:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
+11
+12
Global.epoch_num: 这里设置为1,方便快速跑通,实际中根据数据量调整该值
+Global.save_model_dir:模型保存路径
+Global.pretrained_model:指向预训练模型路径,'./pretrain_models/en_PP-OCRv3_det_distill_train/student.pdparams'
+Optimizer.lr.learning_rate:调整学习率,本实验设置为0.0005
+Train.dataset.data_dir:指向训练集图片存放目录,'/home/aistudio/dataset'
+Train.dataset.label_file_list:指向训练集标注文件,'/home/aistudio/dataset/det_gt_train.txt'
+Train.dataset.transforms.EastRandomCropData.size:训练尺寸改为[480,64]
+Eval.dataset.data_dir:指向验证集图片存放目录,'/home/aistudio/dataset/'
+Eval.dataset.label_file_list:指向验证集标注文件,'/home/aistudio/dataset/det_gt_val.txt'
+Eval.dataset.transforms.DetResizeForTest:评估尺寸,添加如下参数
+    limit_side_len: 64
+    limit_type:'min'
+
+

执行下面命令启动训练:

+
1
+2
+3
cd /home/aistudio/PaddleOCR/
+python tools/train.py \
+        -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml
+
+

模型评估

+

使用训练好的模型进行评估,更新模型路径Global.checkpoints:

+
1
+2
+3
+4
cd /home/aistudio/PaddleOCR/
+python3 tools/eval.py \
+    -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml  \
+    -o Global.checkpoints="./output/ch_PP-OCR_V3_det/latest"
+
+

使用训练好的模型进行评估,指标如下所示:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序号方案hmean效果提升实验分析
1PP-OCRv3英文超轻量检测预训练模型64.64%-提供的预训练模型具有泛化能力
2PP-OCRv3英文超轻量检测预训练模型 + 验证集padding72.13%+7.49%padding可以提升尺寸较小图片的检测效果
3PP-OCRv3英文超轻量检测预训练模型 + fine-tune100.00%+27.87%fine-tune会提升垂类场景效果
+

注:上述实验结果均是在1500张图片(1200张训练集,300张测试集)上训练、评估的得到,AIstudio只提供了100张数据,所以指标有所差异属于正常,只要策略有效、规律相同即可。

+

5. 文本识别

+

我们分别使用如下4种方案进行训练、评估:

+
  • 方案1:PP-OCRv3中英文超轻量识别预训练模型直接评估
  • 方案2:PP-OCRv3中英文超轻量识别预训练模型 + fine-tune
  • 方案3:PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 公开通用识别数据集
  • 方案4:PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量

5.1 预训练模型直接评估

+

同检测模型,我们首先使用PaddleOCR提供的识别预训练模型在PCB验证集上进行评估。

+

使用预训练模型直接评估步骤如下:

+

1)下载预训练模型

+

我们使用PP-OCRv3中英文超轻量文本识别模型,下载并解压预训练模型:

+
1
+2
+3
+4
+5
# 如果更换其他模型,更新下载链接和解压指令就可以
+cd /home/aistudio/PaddleOCR/pretrain_models/
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
+tar xf ch_PP-OCRv3_rec_train.tar && rm -rf ch_PP-OCRv3_rec_train.tar
+cd ..
+
+

模型评估

首先修改配置文件configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml中的以下字段:

+
1
+2
+3
Metric.ignore_space: True:忽略空格
+Eval.dataset.data_dir:指向验证集图片存放目录,'/home/aistudio/dataset'
+Eval.dataset.label_file_list:指向验证集标注文件,'/home/aistudio/dataset/rec_gt_val.txt'
+
+

我们使用下载的预训练模型进行评估:

+
1
+2
+3
+4
cd /home/aistudio/PaddleOCR
+python3 tools/eval.py \
+    -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
+    -o Global.checkpoints=pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
+
+

5.2 三种fine-tune方案

+

方案2、3、4训练和评估方式是相同的,因此在我们了解每个技术方案之后,再具体看修改哪些参数是相同,哪些是不同的。

+

方案介绍:

+

1) 方案2:预训练模型 + fine-tune

+
    +
  • 在预训练模型的基础上进行fine-tune,使用1500张PCB进行训练和评估,其中训练集1200张,验证集300张。
  • +
+

2) 方案3:预训练模型 + fine-tune + 公开通用识别数据集

+
    +
  • 当识别数据比较少的情况,可以考虑添加公开通用识别数据集。在方案2的基础上,添加公开通用识别数据集,如lsvt、rctw等。
  • +
+

3)方案4:预训练模型 + fine-tune + 增加PCB图像数量

+
    +
  • 如果能够获取足够多真实场景,我们可以通过增加数据量提升模型效果。在方案2的基础上,增加PCB的数量到2W张左右。
  • +
+

参数修改:

+

接着我们看需要修改的参数,以上方案均需要修改配置文件configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml的参数,修改一次即可

+
1
+2
+3
+4
+5
Global.pretrained_model:指向预训练模型路径,'pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy'
+Optimizer.lr.values:学习率,本实验设置为0.0005
+Train.loader.batch_size_per_card: batch size,默认128,因为数据量小于128,因此我们设置为8,数据量大可以按默认的训练
+Eval.loader.batch_size_per_card: batch size,默认128,设置为4
+Metric.ignore_space: 忽略空格,本实验设置为True
+
+

更换不同的方案每次需要修改的参数:

+
1
+2
+3
+4
+5
+6
Global.epoch_num: 这里设置为1,方便快速跑通,实际中根据数据量调整该值
+Global.save_model_dir:指向模型保存路径
+Train.dataset.data_dir:指向训练集图片存放目录
+Train.dataset.label_file_list:指向训练集标注文件
+Eval.dataset.data_dir:指向验证集图片存放目录
+Eval.dataset.label_file_list:指向验证集标注文件
+
+

同时方案3修改以下参数

+
1
+2
Eval.dataset.label_file_list:添加公开通用识别数据标注文件
+Eval.dataset.ratio_list:数据和公开通用识别数据每次采样比例,按实际修改即可
+
+

如 图5 所示:

+

+

我们提取Student模型的参数,在PCB数据集上进行fine-tune,可以参考如下代码:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
+11
import paddle
+# 加载预训练模型
+all_params = paddle.load("./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy.pdparams")
+# 查看权重参数的keys
+print(all_params.keys())
+# 学生模型的权重提取
+s_params = {key[len("student_model."):]: all_params[key] for key in all_params if "student_model." in key}
+# 查看学生模型权重参数的keys
+print(s_params.keys())
+# 保存
+paddle.save(s_params, "./pretrain_models/ch_PP-OCRv3_rec_train/student.pdparams")
+
+

修改参数后,每个方案都执行如下命令启动训练:

+
1
+2
cd /home/aistudio/PaddleOCR/
+python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml
+
+

使用训练好的模型进行评估,更新模型路径Global.checkpoints

+
1
+2
+3
+4
cd /home/aistudio/PaddleOCR/
+python3 tools/eval.py \
+    -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
+    -o Global.checkpoints=./output/rec_ppocr_v3/latest
+
+

所有方案评估指标如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序号方案acc效果提升实验分析
1PP-OCRv3中英文超轻量识别预训练模型直接评估46.67%-提供的预训练模型具有泛化能力
2PP-OCRv3中英文超轻量识别预训练模型 + fine-tune42.02%-4.65%在数据量不足的情况,反而比预训练模型效果低(也可以通过调整超参数再试试)
3PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 公开通用识别数据集77.00%+30.33%在数据量不足的情况下,可以考虑补充公开数据训练
4PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量99.99%+22.99%如果能获取更多数据量的情况,可以通过增加数据量提升效果
+

注:上述实验结果均是在1500张图片(1200张训练集,300张测试集)、2W张图片、添加公开通用识别数据集上训练、评估的得到,AIstudio只提供了100张数据,所以指标有所差异属于正常,只要策略有效、规律相同即可。

+

6. 模型导出

+

inference 模型(paddle.jit.save保存的模型) 一般是模型训练,把模型结构和模型参数保存在文件中的固化模型,多用于预测部署场景。 训练过程中保存的模型是checkpoints模型,保存的只有模型的参数,多用于恢复训练等。 与checkpoints模型相比,inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。

+
1
+2
+3
+4
+5
# 导出检测模型
+python3 tools/export_model.py \
+     -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml \
+     -o Global.pretrained_model="./output/ch_PP-OCR_V3_det/latest" \
+     Global.save_inference_dir="./inference_model/ch_PP-OCR_V3_det/"
+
+

因为上述模型只训练了1个epoch,因此我们使用训练最优的模型进行预测,存储在/home/aistudio/best_models/目录下,解压即可

+
1
+2
+3
cd /home/aistudio/best_models/
+wget https://paddleocr.bj.bcebos.com/fanliku/PCB/det_ppocr_v3_en_infer_PCB.tar
+tar xf /home/aistudio/best_models/det_ppocr_v3_en_infer_PCB.tar -C /home/aistudio/PaddleOCR/pretrain_models/
+
+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
# 检测模型inference模型预测
+cd /home/aistudio/PaddleOCR/
+python3 tools/infer/predict_det.py \
+    --image_dir="/home/aistudio/dataset/imgs/0000.jpg" \
+    --det_algorithm="DB" \
+    --det_model_dir="./pretrain_models/det_ppocr_v3_en_infer_PCB/" \
+    --det_limit_side_len=48 \
+    --det_limit_type='min' \
+    --det_db_unclip_ratio=2.5 \
+    --use_gpu=True
+
+

结果存储在inference_results目录下,检测如下图所示:

+

+

同理,导出识别模型并进行推理。

+
1
+2
+3
+4
+5
# 导出识别模型
+python3 tools/export_model.py \
+    -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml \
+    -o Global.pretrained_model="./output/rec_ppocr_v3/latest" \
+    Global.save_inference_dir="./inference_model/rec_ppocr_v3/"
+
+

同检测模型,识别模型也只训练了1个epoch,因此我们使用训练最优的模型进行预测,存储在/home/aistudio/best_models/目录下,解压即可

+
1
+2
+3
cd /home/aistudio/best_models/
+wget https://paddleocr.bj.bcebos.com/fanliku/PCB/rec_ppocr_v3_ch_infer_PCB.tar
+tar xf /home/aistudio/best_models/rec_ppocr_v3_ch_infer_PCB.tar -C /home/aistudio/PaddleOCR/pretrain_models/
+
+
1
+2
+3
+4
+5
+6
+7
+8
# 识别模型inference模型预测
+cd /home/aistudio/PaddleOCR/
+python3 tools/infer/predict_rec.py \
+    --image_dir="../test_imgs/0000_rec.jpg" \
+    --rec_model_dir="./pretrain_models/rec_ppocr_v3_ch_infer_PCB" \
+    --rec_image_shape="3, 48, 320" \
+    --use_space_char=False \
+    --use_gpu=True
+
+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
+11
+12
+13
+14
# 检测+识别模型inference模型预测
+cd /home/aistudio/PaddleOCR/
+python3 tools/infer/predict_system.py  \
+    --image_dir="../test_imgs/0000.jpg" \
+    --det_model_dir="./pretrain_models/det_ppocr_v3_en_infer_PCB" \
+    --det_limit_side_len=48 \
+    --det_limit_type='min' \
+    --det_db_unclip_ratio=2.5 \
+    --rec_model_dir="./pretrain_models/rec_ppocr_v3_ch_infer_PCB"  \
+    --rec_image_shape="3, 48, 320" \
+    --draw_img_save_dir=./det_rec_infer/ \
+    --use_space_char=False \
+    --use_angle_cls=False \
+    --use_gpu=True
+
+

端到端预测结果存储在det_res_infer文件夹内,结果如下图所示:

+

+

7. 端对端评测

+

接下来介绍文本检测+文本识别的端对端指标评估方式。主要分为三步:

+

1)首先运行tools/infer/predict_system.py,将image_dir改为需要评估的数据文件夹,得到保存的结果:

+
 1
+ 2
+ 3
+ 4
+ 5
+ 6
+ 7
+ 8
+ 9
+10
+11
+12
+13
# 检测+识别模型inference模型预测
+python3 tools/infer/predict_system.py  \
+    --image_dir="../dataset/imgs/" \
+    --det_model_dir="./pretrain_models/det_ppocr_v3_en_infer_PCB" \
+    --det_limit_side_len=48 \
+    --det_limit_type='min' \
+    --det_db_unclip_ratio=2.5 \
+    --rec_model_dir="./pretrain_models/rec_ppocr_v3_ch_infer_PCB"  \
+    --rec_image_shape="3, 48, 320" \
+    --draw_img_save_dir=./det_rec_infer/ \
+    --use_space_char=False \
+    --use_angle_cls=False \
+    --use_gpu=True
+
+

得到保存结果,文本检测识别可视化图保存在det_rec_infer/目录下,预测结果保存在det_rec_infer/system_results.txt中,格式如下:0018.jpg [{"transcription": "E295", "points": [[88, 33], [137, 33], [137, 40], [88, 40]]}]

+

2)然后将步骤一保存的数据转换为端对端评测需要的数据格式: 修改 tools/end2end/convert_ppocr_label.py中的代码,convert_label函数中设置输入标签路径,Mode,保存标签路径等,对预测数据的GTlabel和预测结果的label格式进行转换。

+
1
+2
+3
+4
+5
ppocr_label_gt =  "/home/aistudio/dataset/det_gt_val.txt"
+convert_label(ppocr_label_gt, "gt", "./save_gt_label/")
+
+ppocr_label_gt =  "/home/aistudio/PaddleOCR/PCB_result/det_rec_infer/system_results.txt"
+convert_label(ppocr_label_gt, "pred", "./save_PPOCRV2_infer/")
+
+

运行convert_ppocr_label.py:

+
python3 tools/end2end/convert_ppocr_label.py
+
+

得到如下结果:

+
1
+2
├── ./save_gt_label/
+├── ./save_PPOCRV2_infer/
+
+

3) 最后,执行端对端评测,运行tools/end2end/eval_end2end.py计算端对端指标,运行方式如下:

+
1
+2
pip install editdistance
+python3 tools/end2end/eval_end2end.py ./save_gt_label/ ./save_PPOCRV2_infer/
+
+

使用预训练模型+fine-tune的检测模型、预训练模型+2W张PCB图片fine-tune的识别模型,在300张PCB图片上评估得到如下结果,fmeasure为主要关注的指标:

+

+

注: 使用上述命令不能跑出该结果,因为数据集不相同,可以更换为自己训练好的模型,按上述流程运行

+

8. Jetson部署

+

我们只需要以下步骤就可以完成Jetson nano部署模型,简单易操作:

+

1、在Jetson nano开发板上进行环境准备:

  • 安装PaddlePaddle
  • 下载PaddleOCR并安装依赖

2、执行预测:

  • 将推理模型下载到Jetson
  • 执行检测、识别、串联预测即可

详细参考流程

+

9. 总结

+

检测实验分别使用PP-OCRv3预训练模型在PCB数据集上进行了直接评估、验证集padding、 fine-tune 3种方案,识别实验分别使用PP-OCRv3预训练模型在PCB数据集上进行了直接评估、 fine-tune、添加公开通用识别数据集、增加PCB图片数量4种方案,指标对比如下:

+
    +
  • 检测
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序号方案hmean效果提升实验分析
1PP-OCRv3英文超轻量检测预训练模型直接评估64.64%-提供的预训练模型具有泛化能力
2PP-OCRv3英文超轻量检测预训练模型 + 验证集padding直接评估72.13%+7.49%padding可以提升尺寸较小图片的检测效果
3PP-OCRv3英文超轻量检测预训练模型 + fine-tune100.00%+27.87%fine-tune会提升垂类场景效果
+
    +
  • 识别
  • +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
序号方案acc效果提升实验分析
1PP-OCRv3中英文超轻量识别预训练模型直接评估46.67%-提供的预训练模型具有泛化能力
2PP-OCRv3中英文超轻量识别预训练模型 + fine-tune42.02%-4.65%在数据量不足的情况,反而比预训练模型效果低(也可以通过调整超参数再试试)
3PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 公开通用识别数据集77.00%+30.33%在数据量不足的情况下,可以考虑补充公开数据训练
4PP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量99.99%+22.99%如果能获取更多数据量的情况,可以通过增加数据量提升效果
+
    +
  • 端到端
  • +
+ + + + + + + + + + + + + + + +
detrecfmeasure
PP-OCRv3英文超轻量检测预训练模型 + fine-tunePP-OCRv3中英文超轻量识别预训练模型 + fine-tune + 增加PCB图像数量93.30%
+

结论

+

PP-OCRv3的检测模型在未经过fine-tune的情况下,在PCB数据集上也有64.64%的精度,说明具有泛化能力。验证集padding之后,精度提升7.5%,在图片尺寸较小的情况,我们可以通过padding的方式提升检测效果。经过 fine-tune 后能够极大的提升检测效果,精度达到100%。

+

PP-OCRv3的识别模型方案1和方案2对比可以发现,当数据量不足的情况,预训练模型精度可能比fine-tune效果还要高,所以我们可以先尝试预训练模型直接评估。如果在数据量不足的情况下想进一步提升模型效果,可以通过添加公开通用识别数据集,识别效果提升30%,非常有效。最后如果我们能够采集足够多的真实场景数据集,可以通过增加数据量提升模型效果,精度达到99.99%。

+

更多资源

+ +

参考

+ + + + + + + + + + + + + + + + + + + + + + +

评论

+ + + + + + + + +
+
+ + + + + +
+ + + +
+ + + +
+
+
+
+ + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/applications/images/0639da09b774458096ae577e82b2c59e89ced6a00f55458f946997ab7472a4f8.jpeg b/applications/images/0639da09b774458096ae577e82b2c59e89ced6a00f55458f946997ab7472a4f8.jpeg new file mode 100644 index 0000000000..4f8e720195 Binary files /dev/null and b/applications/images/0639da09b774458096ae577e82b2c59e89ced6a00f55458f946997ab7472a4f8.jpeg differ diff --git a/applications/images/06af09bde845449ba0a676410f4daa1cdc3983ac95034bdbbafac3b7fd94042f.jpeg b/applications/images/06af09bde845449ba0a676410f4daa1cdc3983ac95034bdbbafac3b7fd94042f.jpeg new file mode 100644 index 0000000000..273f81077e Binary files /dev/null and b/applications/images/06af09bde845449ba0a676410f4daa1cdc3983ac95034bdbbafac3b7fd94042f.jpeg differ diff --git a/applications/images/07c3b060c54e4b00be7de8d41a8a4696ff53835343cc4981aab0555183306e79.jpeg b/applications/images/07c3b060c54e4b00be7de8d41a8a4696ff53835343cc4981aab0555183306e79.jpeg new file mode 100644 index 0000000000..8409f79c00 Binary files /dev/null and b/applications/images/07c3b060c54e4b00be7de8d41a8a4696ff53835343cc4981aab0555183306e79.jpeg differ diff --git a/applications/images/0b056be24f374812b61abf43305774767ae122c8479242f98aa0799b7bfc81d4.jpeg b/applications/images/0b056be24f374812b61abf43305774767ae122c8479242f98aa0799b7bfc81d4.jpeg new file mode 100644 index 0000000000..8bd59226fc Binary files /dev/null and b/applications/images/0b056be24f374812b61abf43305774767ae122c8479242f98aa0799b7bfc81d4.jpeg differ diff --git a/applications/images/0d582de9aa46474791e08654f84a614a6510e98bfe5f4ad3a26501cbf49ec151.jpeg b/applications/images/0d582de9aa46474791e08654f84a614a6510e98bfe5f4ad3a26501cbf49ec151.jpeg new file mode 100644 index 0000000000..bdfeeaf7a7 Binary files /dev/null and b/applications/images/0d582de9aa46474791e08654f84a614a6510e98bfe5f4ad3a26501cbf49ec151.jpeg differ diff --git a/applications/images/0e25da2ccded4af19e95c85c3d3287ab4d53e31a4eed4607b6a4cb637c43f6d3.jpeg b/applications/images/0e25da2ccded4af19e95c85c3d3287ab4d53e31a4eed4607b6a4cb637c43f6d3.jpeg new file mode 100644 index 0000000000..b9fbfeb51d Binary files /dev/null and b/applications/images/0e25da2ccded4af19e95c85c3d3287ab4d53e31a4eed4607b6a4cb637c43f6d3.jpeg differ diff --git a/applications/images/0f650c032b0f4d56bd639713924768cc820635e9977845008d233f465291a29e.jpeg b/applications/images/0f650c032b0f4d56bd639713924768cc820635e9977845008d233f465291a29e.jpeg new file mode 100644 index 0000000000..0ead4514a3 Binary files /dev/null and b/applications/images/0f650c032b0f4d56bd639713924768cc820635e9977845008d233f465291a29e.jpeg differ diff --git a/applications/images/0f7d50a0fb924b408b93e1fbd6ca64148eed34a2e6724280acd3e113fef7dc48.jpeg b/applications/images/0f7d50a0fb924b408b93e1fbd6ca64148eed34a2e6724280acd3e113fef7dc48.jpeg new file mode 100644 index 0000000000..da01579fa2 Binary files /dev/null and b/applications/images/0f7d50a0fb924b408b93e1fbd6ca64148eed34a2e6724280acd3e113fef7dc48.jpeg differ diff --git a/applications/images/0fa18b25819042d9bbf3397c3af0e21433b23d52f7a84b0a8681b8e6a308d433.png b/applications/images/0fa18b25819042d9bbf3397c3af0e21433b23d52f7a84b0a8681b8e6a308d433.png new file mode 100644 index 0000000000..b5ab7be9a1 Binary files /dev/null and b/applications/images/0fa18b25819042d9bbf3397c3af0e21433b23d52f7a84b0a8681b8e6a308d433.png differ diff --git a/applications/images/1.jpeg b/applications/images/1.jpeg new file mode 100644 index 0000000000..c6724f6506 Binary files /dev/null and 
b/applications/images/1.jpeg differ diff --git a/applications/images/12d402e6a06d482a88f979e0ebdfb39f4d3fc8b80517499689ec607ddb04fbf3.jpeg b/applications/images/12d402e6a06d482a88f979e0ebdfb39f4d3fc8b80517499689ec607ddb04fbf3.jpeg new file mode 100644 index 0000000000..fae7721f29 Binary files /dev/null and b/applications/images/12d402e6a06d482a88f979e0ebdfb39f4d3fc8b80517499689ec607ddb04fbf3.jpeg differ diff --git a/applications/images/166ce56d634c4c7589fe68fbc6e7ae663305dcc82ba144c781507341ffae7fe8.jpeg b/applications/images/166ce56d634c4c7589fe68fbc6e7ae663305dcc82ba144c781507341ffae7fe8.jpeg new file mode 100644 index 0000000000..0cfb3c34ee Binary files /dev/null and b/applications/images/166ce56d634c4c7589fe68fbc6e7ae663305dcc82ba144c781507341ffae7fe8.jpeg differ diff --git a/applications/images/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808-20240704190212828.jpg b/applications/images/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808-20240704190212828.jpg new file mode 100644 index 0000000000..6a5fd84c52 Binary files /dev/null and b/applications/images/185310636-6ce02f7c-790d-479f-b163-ea97a5a04808-20240704190212828.jpg differ diff --git a/applications/images/185381131-76b6e260-04fe-46d9-baca-6bdd7fe0d0ce.jpg b/applications/images/185381131-76b6e260-04fe-46d9-baca-6bdd7fe0d0ce.jpg new file mode 100644 index 0000000000..fc5620caaf Binary files /dev/null and b/applications/images/185381131-76b6e260-04fe-46d9-baca-6bdd7fe0d0ce.jpg differ diff --git a/applications/images/185384321-61153faa-e407-45c4-8e7c-a39540248189.jpg b/applications/images/185384321-61153faa-e407-45c4-8e7c-a39540248189.jpg new file mode 100644 index 0000000000..50e76d8aa8 Binary files /dev/null and b/applications/images/185384321-61153faa-e407-45c4-8e7c-a39540248189.jpg differ diff --git a/applications/images/185387870-dc9125a0-9ceb-4036-abf3-184b6e65dc7d-20240704190305748.jpg b/applications/images/185387870-dc9125a0-9ceb-4036-abf3-184b6e65dc7d-20240704190305748.jpg new file mode 100644 index 0000000000..a80ea6c574 Binary files /dev/null and b/applications/images/185387870-dc9125a0-9ceb-4036-abf3-184b6e65dc7d-20240704190305748.jpg differ diff --git a/applications/images/185387870-dc9125a0-9ceb-4036-abf3-184b6e65dc7d.jpg b/applications/images/185387870-dc9125a0-9ceb-4036-abf3-184b6e65dc7d.jpg new file mode 100644 index 0000000000..a80ea6c574 Binary files /dev/null and b/applications/images/185387870-dc9125a0-9ceb-4036-abf3-184b6e65dc7d.jpg differ diff --git a/applications/images/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb-20240704185610566.jpg b/applications/images/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb-20240704185610566.jpg new file mode 100644 index 0000000000..d406a52da8 Binary files /dev/null and b/applications/images/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb-20240704185610566.jpg differ diff --git a/applications/images/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb-20240704190316813.jpg b/applications/images/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb-20240704190316813.jpg new file mode 100644 index 0000000000..d406a52da8 Binary files /dev/null and b/applications/images/185393805-c67ff571-cf7e-4217-a4b0-8b396c4f22bb-20240704190316813.jpg differ diff --git a/applications/images/190596141-74f4feda-b082-46d7-908d-b0bd5839b430.jpg b/applications/images/190596141-74f4feda-b082-46d7-908d-b0bd5839b430.jpg new file mode 100644 index 0000000000..01864b867f Binary files /dev/null and b/applications/images/190596141-74f4feda-b082-46d7-908d-b0bd5839b430.jpg differ diff --git 
a/applications/images/190597086-2e685200-22d0-4042-9e46-f61f24e02e4e.jpg b/applications/images/190597086-2e685200-22d0-4042-9e46-f61f24e02e4e.jpg new file mode 100644 index 0000000000..790cfa295c Binary files /dev/null and b/applications/images/190597086-2e685200-22d0-4042-9e46-f61f24e02e4e.jpg differ diff --git a/applications/images/190599426-3415b38e-e16e-4e68-9253-2ff531b1b5ca.png b/applications/images/190599426-3415b38e-e16e-4e68-9253-2ff531b1b5ca.png new file mode 100644 index 0000000000..eafb642486 Binary files /dev/null and b/applications/images/190599426-3415b38e-e16e-4e68-9253-2ff531b1b5ca.png differ diff --git a/applications/images/23a5a19c746441309864586e467f995ec8a551a3661640e493fc4d77520309cd.jpeg b/applications/images/23a5a19c746441309864586e467f995ec8a551a3661640e493fc4d77520309cd.jpeg new file mode 100644 index 0000000000..bd16607aa4 Binary files /dev/null and b/applications/images/23a5a19c746441309864586e467f995ec8a551a3661640e493fc4d77520309cd.jpeg differ diff --git a/applications/images/268c707a62c54e93958d2b2ab29e0932953aad41819e44aaaaa05c8ad85c6491.jpeg b/applications/images/268c707a62c54e93958d2b2ab29e0932953aad41819e44aaaaa05c8ad85c6491.jpeg new file mode 100644 index 0000000000..a80ec7cb12 Binary files /dev/null and b/applications/images/268c707a62c54e93958d2b2ab29e0932953aad41819e44aaaaa05c8ad85c6491.jpeg differ diff --git a/applications/images/2854aee557a74079a82dd5cd57e48bc2ce97974d5637477fb4deea137d0e312c.png b/applications/images/2854aee557a74079a82dd5cd57e48bc2ce97974d5637477fb4deea137d0e312c.png new file mode 100644 index 0000000000..1039b1ac02 Binary files /dev/null and b/applications/images/2854aee557a74079a82dd5cd57e48bc2ce97974d5637477fb4deea137d0e312c.png differ diff --git a/applications/images/2aff41ee8fce4e9bac8295cc00720217bde2aeee7ee7473689848bed0b6fde05.jpeg b/applications/images/2aff41ee8fce4e9bac8295cc00720217bde2aeee7ee7473689848bed0b6fde05.jpeg new file mode 100644 index 0000000000..94de02d894 Binary files /dev/null and b/applications/images/2aff41ee8fce4e9bac8295cc00720217bde2aeee7ee7473689848bed0b6fde05.jpeg differ diff --git a/applications/images/2e45f297c9d44ca5b8718ae100a365f7348eaeed4cb8495b904f28a9c8075d8a.jpeg b/applications/images/2e45f297c9d44ca5b8718ae100a365f7348eaeed4cb8495b904f28a9c8075d8a.jpeg new file mode 100644 index 0000000000..afcd1eb3ab Binary files /dev/null and b/applications/images/2e45f297c9d44ca5b8718ae100a365f7348eaeed4cb8495b904f28a9c8075d8a.jpeg differ diff --git a/applications/images/31e3dbee31d441d2a36d45b5af660e832dfa2f437f4d49a1914312a15b6a29a7.jpeg b/applications/images/31e3dbee31d441d2a36d45b5af660e832dfa2f437f4d49a1914312a15b6a29a7.jpeg new file mode 100644 index 0000000000..91aa63ad3e Binary files /dev/null and b/applications/images/31e3dbee31d441d2a36d45b5af660e832dfa2f437f4d49a1914312a15b6a29a7.jpeg differ diff --git a/applications/images/3277b750159f4b68b2b58506bfec9005d49aeb5fb1d9411e83f96f9ff7eb66a5.png b/applications/images/3277b750159f4b68b2b58506bfec9005d49aeb5fb1d9411e83f96f9ff7eb66a5.png new file mode 100644 index 0000000000..15e0467c03 Binary files /dev/null and b/applications/images/3277b750159f4b68b2b58506bfec9005d49aeb5fb1d9411e83f96f9ff7eb66a5.png differ diff --git a/applications/images/37206ea48a244212ae7a821d50d1fd51faf3d7fe97ac47a29f04dfcbb377b019.png b/applications/images/37206ea48a244212ae7a821d50d1fd51faf3d7fe97ac47a29f04dfcbb377b019.png new file mode 100644 index 0000000000..f0a50b47ee Binary files /dev/null and 
b/applications/images/37206ea48a244212ae7a821d50d1fd51faf3d7fe97ac47a29f04dfcbb377b019.png differ diff --git a/applications/images/39ff30e0ab0442579712255e6a9ea6b5271169c98e624e6eb2b8781f003bfea0.png b/applications/images/39ff30e0ab0442579712255e6a9ea6b5271169c98e624e6eb2b8781f003bfea0.png new file mode 100644 index 0000000000..936dee3a54 Binary files /dev/null and b/applications/images/39ff30e0ab0442579712255e6a9ea6b5271169c98e624e6eb2b8781f003bfea0.png differ diff --git a/applications/images/3bce057a8e0c40a0acbd26b2e29e4e2590a31bc412764be7b9e49799c69cb91c.jpg b/applications/images/3bce057a8e0c40a0acbd26b2e29e4e2590a31bc412764be7b9e49799c69cb91c.jpg new file mode 100644 index 0000000000..d1e60cf7f4 Binary files /dev/null and b/applications/images/3bce057a8e0c40a0acbd26b2e29e4e2590a31bc412764be7b9e49799c69cb91c.jpg differ diff --git a/applications/images/3d762970e2184177a2c633695a31029332a4cd805631430ea797309492e45402.jpeg b/applications/images/3d762970e2184177a2c633695a31029332a4cd805631430ea797309492e45402.jpeg new file mode 100644 index 0000000000..8124a50222 Binary files /dev/null and b/applications/images/3d762970e2184177a2c633695a31029332a4cd805631430ea797309492e45402.jpeg differ diff --git a/applications/images/3dc7f69fac174cde96b9d08b5e2353a1d88dc63e7be9410894c0783660b35b76.jpeg b/applications/images/3dc7f69fac174cde96b9d08b5e2353a1d88dc63e7be9410894c0783660b35b76.jpeg new file mode 100644 index 0000000000..9d134f5da4 Binary files /dev/null and b/applications/images/3dc7f69fac174cde96b9d08b5e2353a1d88dc63e7be9410894c0783660b35b76.jpeg differ diff --git a/applications/images/3de0d475c69746d0a184029001ef07c85fd68816d66d4beaa10e6ef60030f9b4.jpeg b/applications/images/3de0d475c69746d0a184029001ef07c85fd68816d66d4beaa10e6ef60030f9b4.jpeg new file mode 100644 index 0000000000..825c2c8e28 Binary files /dev/null and b/applications/images/3de0d475c69746d0a184029001ef07c85fd68816d66d4beaa10e6ef60030f9b4.jpeg differ diff --git a/applications/images/42d2188d3d6b498880952e12c3ceae1efabf135f8d9f4c31823f09ebe02ba9d2.jpeg b/applications/images/42d2188d3d6b498880952e12c3ceae1efabf135f8d9f4c31823f09ebe02ba9d2.jpeg new file mode 100644 index 0000000000..bf72ad21c0 Binary files /dev/null and b/applications/images/42d2188d3d6b498880952e12c3ceae1efabf135f8d9f4c31823f09ebe02ba9d2.jpeg differ diff --git a/applications/images/456ae2acb27d4a94896c478812aee0bc3551c703d7bd40c9be4dc983c7b3fc8a.png b/applications/images/456ae2acb27d4a94896c478812aee0bc3551c703d7bd40c9be4dc983c7b3fc8a.png new file mode 100644 index 0000000000..5d6ad7b223 Binary files /dev/null and b/applications/images/456ae2acb27d4a94896c478812aee0bc3551c703d7bd40c9be4dc983c7b3fc8a.png differ diff --git a/applications/images/45f288ce8b2c45d8aa5407785b4b40f4876fc3da23744bd7a78060797fba0190.jpeg b/applications/images/45f288ce8b2c45d8aa5407785b4b40f4876fc3da23744bd7a78060797fba0190.jpeg new file mode 100644 index 0000000000..1737e0eaeb Binary files /dev/null and b/applications/images/45f288ce8b2c45d8aa5407785b4b40f4876fc3da23744bd7a78060797fba0190.jpeg differ diff --git a/applications/images/46258d0dc9dc40bab3ea0e70434e4a905646df8a647f4c49921e217de5142def.jpeg b/applications/images/46258d0dc9dc40bab3ea0e70434e4a905646df8a647f4c49921e217de5142def.jpeg new file mode 100644 index 0000000000..4491dd4b72 Binary files /dev/null and b/applications/images/46258d0dc9dc40bab3ea0e70434e4a905646df8a647f4c49921e217de5142def.jpeg differ diff --git a/applications/images/498119182f0a414ab86ae2de752fa31c9ddc3a74a76847049cc57884602cb269-20240704185744623.png 
b/applications/images/498119182f0a414ab86ae2de752fa31c9ddc3a74a76847049cc57884602cb269-20240704185744623.png new file mode 100644 index 0000000000..201198ae32 Binary files /dev/null and b/applications/images/498119182f0a414ab86ae2de752fa31c9ddc3a74a76847049cc57884602cb269-20240704185744623.png differ diff --git a/applications/images/498119182f0a414ab86ae2de752fa31c9ddc3a74a76847049cc57884602cb269.png b/applications/images/498119182f0a414ab86ae2de752fa31c9ddc3a74a76847049cc57884602cb269.png new file mode 100644 index 0000000000..201198ae32 Binary files /dev/null and b/applications/images/498119182f0a414ab86ae2de752fa31c9ddc3a74a76847049cc57884602cb269.png differ diff --git a/applications/images/4de19ca3e54343e88961e816cad28bbacdc807f40b9440be914d871b0a914570.jpeg b/applications/images/4de19ca3e54343e88961e816cad28bbacdc807f40b9440be914d871b0a914570.jpeg new file mode 100644 index 0000000000..1c76ef2f97 Binary files /dev/null and b/applications/images/4de19ca3e54343e88961e816cad28bbacdc807f40b9440be914d871b0a914570.jpeg differ diff --git a/applications/images/4f8f5533a2914e0a821f4a639677843c32ec1f08a1b1488d94c0b8bfb6e72d2d.jpeg b/applications/images/4f8f5533a2914e0a821f4a639677843c32ec1f08a1b1488d94c0b8bfb6e72d2d.jpeg new file mode 100644 index 0000000000..2e72a9831f Binary files /dev/null and b/applications/images/4f8f5533a2914e0a821f4a639677843c32ec1f08a1b1488d94c0b8bfb6e72d2d.jpeg differ diff --git a/applications/images/54f3053e6e1b47a39b26e757006fe2c44910d60a3809422ab76c25396b92e69b-0096905.png b/applications/images/54f3053e6e1b47a39b26e757006fe2c44910d60a3809422ab76c25396b92e69b-0096905.png new file mode 100644 index 0000000000..a1c79b4da8 Binary files /dev/null and b/applications/images/54f3053e6e1b47a39b26e757006fe2c44910d60a3809422ab76c25396b92e69b-0096905.png differ diff --git a/applications/images/54f3053e6e1b47a39b26e757006fe2c44910d60a3809422ab76c25396b92e69b.png b/applications/images/54f3053e6e1b47a39b26e757006fe2c44910d60a3809422ab76c25396b92e69b.png new file mode 100644 index 0000000000..a1c79b4da8 Binary files /dev/null and b/applications/images/54f3053e6e1b47a39b26e757006fe2c44910d60a3809422ab76c25396b92e69b.png differ diff --git a/applications/images/560c44b8dd604da7987bd25da0a882156ffcfb7f6bcb44108fe9bde77512e572.jpeg b/applications/images/560c44b8dd604da7987bd25da0a882156ffcfb7f6bcb44108fe9bde77512e572.jpeg new file mode 100644 index 0000000000..3e060015ba Binary files /dev/null and b/applications/images/560c44b8dd604da7987bd25da0a882156ffcfb7f6bcb44108fe9bde77512e572.jpeg differ diff --git a/applications/images/5939ae15a1f0445aaeec15c68107dbd897740a1ddd284bf8b583bb6242099157.jpeg b/applications/images/5939ae15a1f0445aaeec15c68107dbd897740a1ddd284bf8b583bb6242099157.jpeg new file mode 100644 index 0000000000..030ab548f6 Binary files /dev/null and b/applications/images/5939ae15a1f0445aaeec15c68107dbd897740a1ddd284bf8b583bb6242099157.jpeg differ diff --git a/applications/images/59ab0411c8eb4dfd917fb2b6e5b69a17ee7ca48351444aec9ac6104b79ff1028.jpg b/applications/images/59ab0411c8eb4dfd917fb2b6e5b69a17ee7ca48351444aec9ac6104b79ff1028.jpg new file mode 100644 index 0000000000..bb5f304b8c Binary files /dev/null and b/applications/images/59ab0411c8eb4dfd917fb2b6e5b69a17ee7ca48351444aec9ac6104b79ff1028.jpg differ diff --git a/applications/images/5a75137c5f924dfeb6956b5818812298cc3dc7992ac84954b4175be9adf83c77.jpeg b/applications/images/5a75137c5f924dfeb6956b5818812298cc3dc7992ac84954b4175be9adf83c77.jpeg new file mode 100644 index 0000000000..9858df6571 Binary files /dev/null and 
b/applications/images/5a75137c5f924dfeb6956b5818812298cc3dc7992ac84954b4175be9adf83c77.jpeg differ diff --git a/applications/images/5df160ac39ee4d9e92a937094bc53a737272f9f2abeb4ddfaebb48e8eccf1be2.jpeg b/applications/images/5df160ac39ee4d9e92a937094bc53a737272f9f2abeb4ddfaebb48e8eccf1be2.jpeg new file mode 100644 index 0000000000..ebb6feeb40 Binary files /dev/null and b/applications/images/5df160ac39ee4d9e92a937094bc53a737272f9f2abeb4ddfaebb48e8eccf1be2.jpeg differ diff --git a/applications/images/60b95b4945954f81a080a8f308cee66f83146479cd1142b9b6b1290938fd1df8.jpeg b/applications/images/60b95b4945954f81a080a8f308cee66f83146479cd1142b9b6b1290938fd1df8.jpeg new file mode 100644 index 0000000000..b2ebcd9651 Binary files /dev/null and b/applications/images/60b95b4945954f81a080a8f308cee66f83146479cd1142b9b6b1290938fd1df8.jpeg differ diff --git a/applications/images/68747470733.png b/applications/images/68747470733.png new file mode 100644 index 0000000000..60918da45b Binary files /dev/null and b/applications/images/68747470733.png differ diff --git a/applications/images/68747470733a2f2f6169.png b/applications/images/68747470733a2f2f6169.png new file mode 100644 index 0000000000..88c8dda981 Binary files /dev/null and b/applications/images/68747470733a2f2f6169.png differ diff --git a/applications/images/6afdbb77e8db4aef9b169e4e94c5d90a9764cfab4f2c4c04aa9afdf4f54d7680.jpeg b/applications/images/6afdbb77e8db4aef9b169e4e94c5d90a9764cfab4f2c4c04aa9afdf4f54d7680.jpeg new file mode 100644 index 0000000000..20e7a228ca Binary files /dev/null and b/applications/images/6afdbb77e8db4aef9b169e4e94c5d90a9764cfab4f2c4c04aa9afdf4f54d7680.jpeg differ diff --git a/applications/images/6f875b6e695e4fe5aedf427beb0d4ce8064ad7cc33c44faaad59d3eb9732639d.jpeg b/applications/images/6f875b6e695e4fe5aedf427beb0d4ce8064ad7cc33c44faaad59d3eb9732639d.jpeg new file mode 100644 index 0000000000..6783c686aa Binary files /dev/null and b/applications/images/6f875b6e695e4fe5aedf427beb0d4ce8064ad7cc33c44faaad59d3eb9732639d.jpeg differ diff --git a/applications/images/75b0e977dfb74a83851f8828460759f337b1b7a0c33c47a08a30f3570e1e2e74.jpeg b/applications/images/75b0e977dfb74a83851f8828460759f337b1b7a0c33c47a08a30f3570e1e2e74.jpeg new file mode 100644 index 0000000000..00fb5f3f2a Binary files /dev/null and b/applications/images/75b0e977dfb74a83851f8828460759f337b1b7a0c33c47a08a30f3570e1e2e74.jpeg differ diff --git a/applications/images/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7-20240704185943337.png b/applications/images/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7-20240704185943337.png new file mode 100644 index 0000000000..87c7515b86 Binary files /dev/null and b/applications/images/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7-20240704185943337.png differ diff --git a/applications/images/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7.png b/applications/images/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7.png new file mode 100644 index 0000000000..87c7515b86 Binary files /dev/null and b/applications/images/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7.png differ diff --git a/applications/images/7a8865.png b/applications/images/7a8865.png new file mode 100644 index 0000000000..72cfec34cf Binary files /dev/null and b/applications/images/7a8865.png differ diff --git a/applications/images/7d5774a.png b/applications/images/7d5774a.png new file mode 100644 index 0000000000..88c8dda981 Binary files /dev/null and 
b/applications/images/7d5774a.png differ diff --git a/applications/images/864604967256461aa7c5d32cd240645e9f4c70af773341d5911f22d5a3e87b5f.jpeg b/applications/images/864604967256461aa7c5d32cd240645e9f4c70af773341d5911f22d5a3e87b5f.jpeg new file mode 100644 index 0000000000..60e6db5ba0 Binary files /dev/null and b/applications/images/864604967256461aa7c5d32cd240645e9f4c70af773341d5911f22d5a3e87b5f.jpeg differ diff --git a/applications/images/89ba046177864d8783ced6cb31ba92a66ca2169856a44ee59ac2bb18e44a6c4b.jpeg b/applications/images/89ba046177864d8783ced6cb31ba92a66ca2169856a44ee59ac2bb18e44a6c4b.jpeg new file mode 100644 index 0000000000..13581d6701 Binary files /dev/null and b/applications/images/89ba046177864d8783ced6cb31ba92a66ca2169856a44ee59ac2bb18e44a6c4b.jpeg differ diff --git a/applications/images/89f42eccd600439fa9e28c97ccb663726e4e54ce3a854825b4c3b7d554ea21df.jpeg b/applications/images/89f42eccd600439fa9e28c97ccb663726e4e54ce3a854825b4c3b7d554ea21df.jpeg new file mode 100644 index 0000000000..dd40242afd Binary files /dev/null and b/applications/images/89f42eccd600439fa9e28c97ccb663726e4e54ce3a854825b4c3b7d554ea21df.jpeg differ diff --git a/applications/images/8bb381f164c54ea9b4043cf66fc92ffdea8aaf851bab484fa6e19bd2f93f154f.jpeg b/applications/images/8bb381f164c54ea9b4043cf66fc92ffdea8aaf851bab484fa6e19bd2f93f154f.jpeg new file mode 100644 index 0000000000..42b598e006 Binary files /dev/null and b/applications/images/8bb381f164c54ea9b4043cf66fc92ffdea8aaf851bab484fa6e19bd2f93f154f.jpeg differ diff --git a/applications/images/8d1022ac25d9474daa4fb236235bd58760039d58ad46414f841559d68e0d057f.jpeg b/applications/images/8d1022ac25d9474daa4fb236235bd58760039d58ad46414f841559d68e0d057f.jpeg new file mode 100644 index 0000000000..459b443624 Binary files /dev/null and b/applications/images/8d1022ac25d9474daa4fb236235bd58760039d58ad46414f841559d68e0d057f.jpeg differ diff --git a/applications/images/8dca91f016884e16ad9216d416da72ea08190f97d87b4be883f15079b7ebab9a.jpeg b/applications/images/8dca91f016884e16ad9216d416da72ea08190f97d87b4be883f15079b7ebab9a.jpeg new file mode 100644 index 0000000000..d435a27307 Binary files /dev/null and b/applications/images/8dca91f016884e16ad9216d416da72ea08190f97d87b4be883f15079b7ebab9a.jpeg differ diff --git a/applications/images/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373-20240704185855034.png b/applications/images/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373-20240704185855034.png new file mode 100644 index 0000000000..6a9305c62d Binary files /dev/null and b/applications/images/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373-20240704185855034.png differ diff --git a/applications/images/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373.png b/applications/images/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373.png new file mode 100644 index 0000000000..6a9305c62d Binary files /dev/null and b/applications/images/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373.png differ diff --git a/applications/images/93c66a43a69e472899c1c6732408b7a42e99a43721e94e9ca3c0a64e080306e4.jpeg b/applications/images/93c66a43a69e472899c1c6732408b7a42e99a43721e94e9ca3c0a64e080306e4.jpeg new file mode 100644 index 0000000000..c69632fc27 Binary files /dev/null and b/applications/images/93c66a43a69e472899c1c6732408b7a42e99a43721e94e9ca3c0a64e080306e4.jpeg differ diff --git a/applications/images/95d8e95bf1ab476987f2519c0f8f0c60a0cdc2c444804ed6ab08f2f7ab054880-0096678.png 
b/applications/images/95d8e95bf1ab476987f2519c0f8f0c60a0cdc2c444804ed6ab08f2f7ab054880-0096678.png new file mode 100644 index 0000000000..db1d6bd51e Binary files /dev/null and b/applications/images/95d8e95bf1ab476987f2519c0f8f0c60a0cdc2c444804ed6ab08f2f7ab054880-0096678.png differ diff --git a/applications/images/95d8e95bf1ab476987f2519c0f8f0c60a0cdc2c444804ed6ab08f2f7ab054880.png b/applications/images/95d8e95bf1ab476987f2519c0f8f0c60a0cdc2c444804ed6ab08f2f7ab054880.png new file mode 100644 index 0000000000..db1d6bd51e Binary files /dev/null and b/applications/images/95d8e95bf1ab476987f2519c0f8f0c60a0cdc2c444804ed6ab08f2f7ab054880.png differ diff --git a/applications/images/965db9f758614c6f9be301286cd5918f21110603c8aa4a1dbf5371e3afeec782.jpeg b/applications/images/965db9f758614c6f9be301286cd5918f21110603c8aa4a1dbf5371e3afeec782.jpeg new file mode 100644 index 0000000000..fc2029f01f Binary files /dev/null and b/applications/images/965db9f758614c6f9be301286cd5918f21110603c8aa4a1dbf5371e3afeec782.jpeg differ diff --git a/applications/images/981640e17d05487e961162f8576c9e11634ca157f79048d4bd9d3bc21722afe8-20240704185952731.jpeg b/applications/images/981640e17d05487e961162f8576c9e11634ca157f79048d4bd9d3bc21722afe8-20240704185952731.jpeg new file mode 100644 index 0000000000..067cd277b2 Binary files /dev/null and b/applications/images/981640e17d05487e961162f8576c9e11634ca157f79048d4bd9d3bc21722afe8-20240704185952731.jpeg differ diff --git a/applications/images/981640e17d05487e961162f8576c9e11634ca157f79048d4bd9d3bc21722afe8.jpeg b/applications/images/981640e17d05487e961162f8576c9e11634ca157f79048d4bd9d3bc21722afe8.jpeg new file mode 100644 index 0000000000..067cd277b2 Binary files /dev/null and b/applications/images/981640e17d05487e961162f8576c9e11634ca157f79048d4bd9d3bc21722afe8.jpeg differ diff --git a/applications/images/9a709f19e7174725a8cfb09fd922ade74f8e9eb73ae1438596cbb2facef9c24a.jpeg b/applications/images/9a709f19e7174725a8cfb09fd922ade74f8e9eb73ae1438596cbb2facef9c24a.jpeg new file mode 100644 index 0000000000..3fe74c42d5 Binary files /dev/null and b/applications/images/9a709f19e7174725a8cfb09fd922ade74f8e9eb73ae1438596cbb2facef9c24a.jpeg differ diff --git a/applications/images/9a7a4e19edc24310b46620f2ee7430f918223b93d4f14a15a52973c096926bad.jpeg b/applications/images/9a7a4e19edc24310b46620f2ee7430f918223b93d4f14a15a52973c096926bad.jpeg new file mode 100644 index 0000000000..0b3f6ceaee Binary files /dev/null and b/applications/images/9a7a4e19edc24310b46620f2ee7430f918223b93d4f14a15a52973c096926bad.jpeg differ diff --git a/applications/images/9bd844b970f94e5ba0bc0c5799bd819ea9b1861bb306471fabc2d628864d418e.jpeg b/applications/images/9bd844b970f94e5ba0bc0c5799bd819ea9b1861bb306471fabc2d628864d418e.jpeg new file mode 100644 index 0000000000..8c3c4d7f36 Binary files /dev/null and b/applications/images/9bd844b970f94e5ba0bc0c5799bd819ea9b1861bb306471fabc2d628864d418e.jpeg differ diff --git a/applications/images/9f45d3eef75e4842a0828bb9e518c2438300264aec0646cc9addfce860a04196.png b/applications/images/9f45d3eef75e4842a0828bb9e518c2438300264aec0646cc9addfce860a04196.png new file mode 100644 index 0000000000..f62001d2ed Binary files /dev/null and b/applications/images/9f45d3eef75e4842a0828bb9e518c2438300264aec0646cc9addfce860a04196.png differ diff --git a/applications/images/9fc78bbcdf754898b9b2c7f000ddf562afac786482ab4f2ab063e2242faa542a.jpeg b/applications/images/9fc78bbcdf754898b9b2c7f000ddf562afac786482ab4f2ab063e2242faa542a.jpeg new file mode 100644 index 0000000000..1db776bcde Binary 
files /dev/null and b/applications/images/9fc78bbcdf754898b9b2c7f000ddf562afac786482ab4f2ab063e2242faa542a.jpeg differ diff --git a/applications/images/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b-0097611.jpeg b/applications/images/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b-0097611.jpeg new file mode 100644 index 0000000000..ff0a282351 Binary files /dev/null and b/applications/images/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b-0097611.jpeg differ diff --git a/applications/images/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b.jpeg b/applications/images/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b.jpeg new file mode 100644 index 0000000000..ff0a282351 Binary files /dev/null and b/applications/images/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b.jpeg differ diff --git a/applications/images/a5973a8ddeff4bd7ac082f02dc4d0c79de21e721b41641cbb831f23c2cb8fce2.jpeg b/applications/images/a5973a8ddeff4bd7ac082f02dc4d0c79de21e721b41641cbb831f23c2cb8fce2.jpeg new file mode 100644 index 0000000000..3aabbb6c87 Binary files /dev/null and b/applications/images/a5973a8ddeff4bd7ac082f02dc4d0c79de21e721b41641cbb831f23c2cb8fce2.jpeg differ diff --git a/applications/images/a73180425fa14f919ce52d9bf70246c3995acea1831843cca6c17d871b8f5d95.jpeg b/applications/images/a73180425fa14f919ce52d9bf70246c3995acea1831843cca6c17d871b8f5d95.jpeg new file mode 100644 index 0000000000..3a13c7d35f Binary files /dev/null and b/applications/images/a73180425fa14f919ce52d9bf70246c3995acea1831843cca6c17d871b8f5d95.jpeg differ diff --git a/applications/images/ab93d3d90d77437a81c9534b2dd1d3e39ef81e8473054fd3aeff6e837ebfb827.jpeg b/applications/images/ab93d3d90d77437a81c9534b2dd1d3e39ef81e8473054fd3aeff6e837ebfb827.jpeg new file mode 100644 index 0000000000..ddcf375813 Binary files /dev/null and b/applications/images/ab93d3d90d77437a81c9534b2dd1d3e39ef81e8473054fd3aeff6e837ebfb827.jpeg differ diff --git a/applications/images/ad7c02745491498d82e0ce95f4a274f9b3920b2f467646858709359b7af9d869.png b/applications/images/ad7c02745491498d82e0ce95f4a274f9b3920b2f467646858709359b7af9d869.png new file mode 100644 index 0000000000..178add3e81 Binary files /dev/null and b/applications/images/ad7c02745491498d82e0ce95f4a274f9b3920b2f467646858709359b7af9d869.png differ diff --git a/applications/images/b7230e9964074181837e1132029f9da8178bf564ac5c43a9a93a30e975c0d8b4.jpeg b/applications/images/b7230e9964074181837e1132029f9da8178bf564ac5c43a9a93a30e975c0d8b4.jpeg new file mode 100644 index 0000000000..26bed005ba Binary files /dev/null and b/applications/images/b7230e9964074181837e1132029f9da8178bf564ac5c43a9a93a30e975c0d8b4.jpeg differ diff --git a/applications/images/bab32d32bdec4339b9a3e5f911e4b41f77996f3faabc40bd8309b5b20cad31e4.jpeg b/applications/images/bab32d32bdec4339b9a3e5f911e4b41f77996f3faabc40bd8309b5b20cad31e4.jpeg new file mode 100644 index 0000000000..11e5cae380 Binary files /dev/null and b/applications/images/bab32d32bdec4339b9a3e5f911e4b41f77996f3faabc40bd8309b5b20cad31e4.jpeg differ diff --git a/applications/images/bb7a345687814a3d83a29790f2a2b7d081495b3a920b43988c93da6039cad653.jpeg b/applications/images/bb7a345687814a3d83a29790f2a2b7d081495b3a920b43988c93da6039cad653.jpeg new file mode 100644 index 0000000000..628dd9acdc Binary files /dev/null and b/applications/images/bb7a345687814a3d83a29790f2a2b7d081495b3a920b43988c93da6039cad653.jpeg differ diff --git 
a/applications/images/c07c88f708ad43cc8cd615861626d0e8333c0e3d4dda49ac8cba1f8939fa8a94.jpeg b/applications/images/c07c88f708ad43cc8cd615861626d0e8333c0e3d4dda49ac8cba1f8939fa8a94.jpeg new file mode 100644 index 0000000000..ec071d00dd Binary files /dev/null and b/applications/images/c07c88f708ad43cc8cd615861626d0e8333c0e3d4dda49ac8cba1f8939fa8a94.jpeg differ diff --git a/applications/images/c1a7d197847a4f168848c59b8e625d1d5e8066b778144395a8b9382bb85dc364.jpeg b/applications/images/c1a7d197847a4f168848c59b8e625d1d5e8066b778144395a8b9382bb85dc364.jpeg new file mode 100644 index 0000000000..c93d380cf2 Binary files /dev/null and b/applications/images/c1a7d197847a4f168848c59b8e625d1d5e8066b778144395a8b9382bb85dc364.jpeg differ diff --git a/applications/images/c306b2f028364805a55494d435ab553a76cf5ae5dd3f4649a948ea9aeaeb28b8.png b/applications/images/c306b2f028364805a55494d435ab553a76cf5ae5dd3f4649a948ea9aeaeb28b8.png new file mode 100644 index 0000000000..ccb5c8b21f Binary files /dev/null and b/applications/images/c306b2f028364805a55494d435ab553a76cf5ae5dd3f4649a948ea9aeaeb28b8.png differ diff --git a/applications/images/c570f343c29846c792da56ebaca16c50708477514dd048cea8bef37ffa85d03f.jpeg b/applications/images/c570f343c29846c792da56ebaca16c50708477514dd048cea8bef37ffa85d03f.jpeg new file mode 100644 index 0000000000..b0e78bdd32 Binary files /dev/null and b/applications/images/c570f343c29846c792da56ebaca16c50708477514dd048cea8bef37ffa85d03f.jpeg differ diff --git a/applications/images/c7fc5e631dd44bc8b714630f4e49d9155a831d9e56c64e2482ded87081d0db22.jpeg b/applications/images/c7fc5e631dd44bc8b714630f4e49d9155a831d9e56c64e2482ded87081d0db22.jpeg new file mode 100644 index 0000000000..efeed96302 Binary files /dev/null and b/applications/images/c7fc5e631dd44bc8b714630f4e49d9155a831d9e56c64e2482ded87081d0db22.jpeg differ diff --git a/applications/images/cbda3390cb994f98a3c8a9ba88c90c348497763f6c9f4b4797f7d63d84da5f63.jpeg b/applications/images/cbda3390cb994f98a3c8a9ba88c90c348497763f6c9f4b4797f7d63d84da5f63.jpeg new file mode 100644 index 0000000000..8c3b594637 Binary files /dev/null and b/applications/images/cbda3390cb994f98a3c8a9ba88c90c348497763f6c9f4b4797f7d63d84da5f63.jpeg differ diff --git a/applications/images/char_spacing_compact.jpg b/applications/images/char_spacing_compact.jpg new file mode 100644 index 0000000000..7355792851 Binary files /dev/null and b/applications/images/char_spacing_compact.jpg differ diff --git a/applications/images/color_image.jpg b/applications/images/color_image.jpg new file mode 100644 index 0000000000..19848675c2 Binary files /dev/null and b/applications/images/color_image.jpg differ diff --git a/applications/images/d1e7780f0c7745ada4be540decefd6288e4d59257d8141f6842682a4c05d28b6.jpg b/applications/images/d1e7780f0c7745ada4be540decefd6288e4d59257d8141f6842682a4c05d28b6.jpg new file mode 100644 index 0000000000..172b496d83 Binary files /dev/null and b/applications/images/d1e7780f0c7745ada4be540decefd6288e4d59257d8141f6842682a4c05d28b6.jpg differ diff --git a/applications/images/d445cf4d850e4063b9a7fc6a075c12204cf912ff23ec471fa2e268b661b3d693.jpeg b/applications/images/d445cf4d850e4063b9a7fc6a075c12204cf912ff23ec471fa2e268b661b3d693.jpeg new file mode 100644 index 0000000000..a0db62f345 Binary files /dev/null and b/applications/images/d445cf4d850e4063b9a7fc6a075c12204cf912ff23ec471fa2e268b661b3d693.jpeg differ diff --git a/applications/images/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a-20240704185905678.jpg 
b/applications/images/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a-20240704185905678.jpg new file mode 100644 index 0000000000..2f48d60123 Binary files /dev/null and b/applications/images/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a-20240704185905678.jpg differ diff --git a/applications/images/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a.png b/applications/images/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a.png new file mode 100644 index 0000000000..c8a2989e22 Binary files /dev/null and b/applications/images/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a.png differ diff --git a/applications/images/d5143df967fa4364a38868793fe7c57b0c0b1213930243babd6ae01423dcbc4d.png b/applications/images/d5143df967fa4364a38868793fe7c57b0c0b1213930243babd6ae01423dcbc4d.png new file mode 100644 index 0000000000..d89db69446 Binary files /dev/null and b/applications/images/d5143df967fa4364a38868793fe7c57b0c0b1213930243babd6ae01423dcbc4d.png differ diff --git a/applications/images/d686a48d465a43d09fbee51924fdca42ee21c50e676646da8559fb9967b94185.png b/applications/images/d686a48d465a43d09fbee51924fdca42ee21c50e676646da8559fb9967b94185.png new file mode 100644 index 0000000000..93fdafc3b0 Binary files /dev/null and b/applications/images/d686a48d465a43d09fbee51924fdca42ee21c50e676646da8559fb9967b94185.png differ diff --git a/applications/images/d7f96effc2434a3ca2d4144ff33c50282b830670c892487d8d7dec151921cce7.jpeg b/applications/images/d7f96effc2434a3ca2d4144ff33c50282b830670c892487d8d7dec151921cce7.jpeg new file mode 100644 index 0000000000..1871630ef1 Binary files /dev/null and b/applications/images/d7f96effc2434a3ca2d4144ff33c50282b830670c892487d8d7dec151921cce7.jpeg differ diff --git a/applications/images/d9e0533cc1df47ffa3bbe99de9e42639a3ebfa5bce834bafb1ca4574bf9db684.jpeg b/applications/images/d9e0533cc1df47ffa3bbe99de9e42639a3ebfa5bce834bafb1ca4574bf9db684.jpeg new file mode 100644 index 0000000000..7e36cb6c8d Binary files /dev/null and b/applications/images/d9e0533cc1df47ffa3bbe99de9e42639a3ebfa5bce834bafb1ca4574bf9db684.jpeg differ diff --git a/applications/images/da82ae8ef8fd479aaa38e1049eb3a681cf020dc108fa458eb3ec79da53b45fd1.png b/applications/images/da82ae8ef8fd479aaa38e1049eb3a681cf020dc108fa458eb3ec79da53b45fd1.png new file mode 100644 index 0000000000..f4c5e8e6da Binary files /dev/null and b/applications/images/da82ae8ef8fd479aaa38e1049eb3a681cf020dc108fa458eb3ec79da53b45fd1.png differ diff --git a/applications/images/dc10a070018d4d27946c26ec24a2a85bc3f16422f4964f72a9b63c6170d954e1.jpeg b/applications/images/dc10a070018d4d27946c26ec24a2a85bc3f16422f4964f72a9b63c6170d954e1.jpeg new file mode 100644 index 0000000000..aedcbaa2a8 Binary files /dev/null and b/applications/images/dc10a070018d4d27946c26ec24a2a85bc3f16422f4964f72a9b63c6170d954e1.jpeg differ diff --git a/applications/images/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e.jpeg b/applications/images/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e.jpeg new file mode 100644 index 0000000000..bcbf9c6af4 Binary files /dev/null and b/applications/images/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e.jpeg differ diff --git a/applications/images/dedab7b7fd6543aa9e7f625132b24e3ba3f200e361fa468dac615f7814dfb98d.jpeg b/applications/images/dedab7b7fd6543aa9e7f625132b24e3ba3f200e361fa468dac615f7814dfb98d.jpeg new file mode 100644 index 0000000000..b82acbba3e Binary files /dev/null and 
b/applications/images/dedab7b7fd6543aa9e7f625132b24e3ba3f200e361fa468dac615f7814dfb98d.jpeg differ diff --git a/applications/images/e0dc05039c7444c5ab1260ff550a408748df8d4cfe864223adf390e51058dbd5.jpeg b/applications/images/e0dc05039c7444c5ab1260ff550a408748df8d4cfe864223adf390e51058dbd5.jpeg new file mode 100644 index 0000000000..c841c4be8b Binary files /dev/null and b/applications/images/e0dc05039c7444c5ab1260ff550a408748df8d4cfe864223adf390e51058dbd5.jpeg differ diff --git a/applications/images/e1e798c87472477fa0bfca0da12bb0c180845a3e167a4761b0d26ff4330a5ccb.jpeg b/applications/images/e1e798c87472477fa0bfca0da12bb0c180845a3e167a4761b0d26ff4330a5ccb.jpeg new file mode 100644 index 0000000000..bb260088f5 Binary files /dev/null and b/applications/images/e1e798c87472477fa0bfca0da12bb0c180845a3e167a4761b0d26ff4330a5ccb.jpeg differ diff --git a/applications/images/e61e6ba685534eda992cea30a63a9c461646040ffd0c4d208a5eebb85897dcf7-0096772.jpeg b/applications/images/e61e6ba685534eda992cea30a63a9c461646040ffd0c4d208a5eebb85897dcf7-0096772.jpeg new file mode 100644 index 0000000000..e78ea45fdd Binary files /dev/null and b/applications/images/e61e6ba685534eda992cea30a63a9c461646040ffd0c4d208a5eebb85897dcf7-0096772.jpeg differ diff --git a/applications/images/ee927ad9ebd442bb96f163a7ebbf4bc95e6bedee97324a51887cf82de0851fd3.jpeg b/applications/images/ee927ad9ebd442bb96f163a7ebbf4bc95e6bedee97324a51887cf82de0851fd3.jpeg new file mode 100644 index 0000000000..775136bd7c Binary files /dev/null and b/applications/images/ee927ad9ebd442bb96f163a7ebbf4bc95e6bedee97324a51887cf82de0851fd3.jpeg differ diff --git a/applications/images/f5acbc4f50dd401a8f535ed6a263f94b0edff82c1aed4285836a9ead989b9c13.png b/applications/images/f5acbc4f50dd401a8f535ed6a263f94b0edff82c1aed4285836a9ead989b9c13.png new file mode 100644 index 0000000000..f6a310be73 Binary files /dev/null and b/applications/images/f5acbc4f50dd401a8f535ed6a263f94b0edff82c1aed4285836a9ead989b9c13.png differ diff --git a/applications/images/f99af54fb2d14691a73b1a748e0ca22618aeddfded0c4da58bbbb03edb8c2340.png b/applications/images/f99af54fb2d14691a73b1a748e0ca22618aeddfded0c4da58bbbb03edb8c2340.png new file mode 100644 index 0000000000..83f5e738ef Binary files /dev/null and b/applications/images/f99af54fb2d14691a73b1a748e0ca22618aeddfded0c4da58bbbb03edb8c2340.png differ diff --git a/applications/images/fcdf517af5a6466294d72db7450209378d8efd9b77764e329d3f2aff3579a20c.jpeg b/applications/images/fcdf517af5a6466294d72db7450209378d8efd9b77764e329d3f2aff3579a20c.jpeg new file mode 100644 index 0000000000..bbd3b70397 Binary files /dev/null and b/applications/images/fcdf517af5a6466294d72db7450209378d8efd9b77764e329d3f2aff3579a20c.jpeg differ diff --git a/applications/images/fe350481be0241c58736d487d1bf06c2e65911bf01254a79944be629c4c10091.jpeg b/applications/images/fe350481be0241c58736d487d1bf06c2e65911bf01254a79944be629c4c10091.jpeg new file mode 100644 index 0000000000..6b380b40ef Binary files /dev/null and b/applications/images/fe350481be0241c58736d487d1bf06c2e65911bf01254a79944be629c4c10091.jpeg differ diff --git a/applications/images/steps_en.gif b/applications/images/steps_en.gif new file mode 100644 index 0000000000..e59339350a Binary files /dev/null and b/applications/images/steps_en.gif differ diff --git a/applications/images/svtr_tiny-20240708094336228.png b/applications/images/svtr_tiny-20240708094336228.png new file mode 100644 index 0000000000..29d636172f Binary files /dev/null and b/applications/images/svtr_tiny-20240708094336228.png differ diff --git 
a/applications/images/test_add_91.jpg b/applications/images/test_add_91.jpg new file mode 100644 index 0000000000..b5ded6e1de Binary files /dev/null and b/applications/images/test_add_91.jpg differ diff --git a/applications/overview.html b/applications/overview.html new file mode 100644 index 0000000000..0fcd67e736 --- /dev/null +++ b/applications/overview.html @@ -0,0 +1,5394 @@

场景应用

+

PaddleOCR场景应用覆盖通用,制造、金融、交通行业的主要OCR垂类应用,在PP-OCR、PP-Structure的通用能力基础之上,以notebook的形式展示利用场景数据微调、模型优化方法、数据增广等内容,为开发者快速落地OCR应用提供示范与启发。

+

教程文档

+

通用

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
| 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
| --- | --- | --- | --- | --- |
| 高精度中文识别模型SVTR | 比PP-OCRv3识别模型精度高3%,可用于数据挖掘或对预测效率要求不高的场景 | 模型下载 | 中文/English | img |
| 手写体识别 | 新增字形支持 | 模型下载 | 中文/English | |
+

制造

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
| --- | --- | --- | --- | --- |
| 数码管识别 | 数码管数据合成、漏识别调优 | 模型下载 | 中文/English | |
| 液晶屏读数识别 | 检测模型蒸馏、Serving部署 | 模型下载 | 中文/English | |
| 包装生产日期 | 点阵字符合成、过曝过暗文字识别 | 模型下载 | 中文/English | |
| PCB文字识别 | 小尺寸文本检测与识别 | 模型下载 | 中文/English | |
| 电表识别 | 大分辨率图像检测调优 | 模型下载 | | |
| 液晶屏缺陷检测 | 非文字字符识别 | | | |
+

金融

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
| --- | --- | --- | --- | --- |
| 表单VQA | 多模态通用表单结构化提取 | 模型下载 | 中文/English | |
| 增值税发票 | 关键信息抽取,SER、RE任务训练 | 模型下载 | 中文/English | |
| 印章检测与识别 | 端到端弯曲文本识别 | 模型下载 | 中文/English | |
| 通用卡证识别 | 通用结构化提取 | 模型下载 | 中文/English | |
| 身份证识别 | 结构化提取、图像阴影 | | | |
| 合同比对 | 密集文本检测、NLP关键信息抽取 | 模型下载 | 中文/English | |
+

交通

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
| --- | --- | --- | --- | --- |
| 车牌识别 | 多角度图像、轻量模型、端侧部署 | 模型下载 | 中文/English | |
| 驾驶证/行驶证识别 | 敬请期待 | | | |
| 快递单识别 | 敬请期待 | | | |
+

模型下载

+

🎁《动手学OCR》、《OCR产业范例20讲》电子书、OCR垂类模型、PDF2Word软件以及其他学习大礼包领取链接:百度网盘 PaddleOCR 开源大礼包,提取码:4232

+

如果您是企业开发者且未在上述场景中找到合适的方案,可以填写OCR应用合作调研问卷,免费与官方团队展开不同层次的合作,包括但不限于问题抽象、确定技术方案、项目答疑、共同研发等。如果您已经使用PaddleOCR落地项目,也可以填写此问卷,与飞桨平台共同宣传推广,提升企业技术品宣。期待您的提交!

diff --git "a/applications/\344\270\255\346\226\207\350\241\250\346\240\274\350\257\206\345\210\253.html" "b/applications/\344\270\255\346\226\207\350\241\250\346\240\274\350\257\206\345\210\253.html" new file mode 100644 index 0000000000..f9ebb5c70e --- /dev/null +++ "b/applications/\344\270\255\346\226\207\350\241\250\346\240\274\350\257\206\345\210\253.html" @@ -0,0 +1,5792 @@

智能运营:通用中文表格识别

+

1. 背景介绍

+

中文表格识别在金融行业有着广泛的应用,如保险理赔、财报分析和信息录入等领域。当前,金融行业的表格识别主要以手动录入为主,开发自动表格识别技术成为亟待解决的问题。

+

+

在金融行业中,表格图像主要有清单类的单元格密集型表格、申请表类的大单元格表格、拍照表格和倾斜表格四种形式。

+

+

+

当前的表格识别算法不能很好地处理这些场景下的表格图像。在本例中,我们使用PP-StructureV2最新发布的表格识别模型SLANet来演示如何进行中文表格识别。同时,为了方便作业流程,我们使用表格属性识别模型对表格图像的属性进行识别,对表格的难易程度进行判断,加快人工校对的速度。

+

本项目AI Studio链接:https://aistudio.baidu.com/aistudio/projectdetail/4588067

+

2. 中文表格识别

+

2.1 环境准备

+
1
+2
# 下载PaddleOCR代码
+! git clone -b dygraph https://gitee.com/paddlepaddle/PaddleOCR
+
+
1
+2
+3
# 安装PaddleOCR环境
+! pip install -r PaddleOCR/requirements.txt --force-reinstall
+! pip install protobuf==3.19
+
+

2.2 准备数据集

+

本例中使用的数据集采用表格生成工具制作。

+

使用如下命令对数据集进行解压,并查看数据集大小

+
1
+2
! cd data/data165849 && tar -xf table_gen_dataset.tar && cd -
+! wc -l data/data165849/table_gen_dataset/gt.txt
+
+

2.2.1 划分训练测试集

+

使用下述命令将数据集划分为训练集和测试集, 这里将90%划分为训练集,10%划分为测试集

+
import random
+with open('/home/aistudio/data/data165849/table_gen_dataset/gt.txt') as f:
+    lines = f.readlines()
+random.shuffle(lines)
+train_len = int(len(lines)*0.9)
+train_list = lines[:train_len]
+val_list = lines[train_len:]
+
+# 保存结果
+with open('/home/aistudio/train.txt','w',encoding='utf-8') as f:
+    f.writelines(train_list)
+with open('/home/aistudio/val.txt','w',encoding='utf-8') as f:
+    f.writelines(val_list)
+
+

划分完成后,数据集信息如下

+ + + + + + + + + + + + + + + + + + + + + + + +
| 类型 | 数量 | 图片地址 | 标注文件路径 |
| --- | --- | --- | --- |
| 训练集 | 18000 | /home/aistudio/data/data165849/table_gen_dataset | /home/aistudio/train.txt |
| 测试集 | 2000 | /home/aistudio/data/data165849/table_gen_dataset | /home/aistudio/val.txt |
+

2.2.2 查看数据集

+
import cv2
+import os, json
+import numpy as np
+from matplotlib import pyplot as plt
+%matplotlib inline
+
+def parse_line(data_dir, line):
+    data_line = line.strip("\n")
+    info = json.loads(data_line)
+    file_name = info['filename']
+    cells = info['html']['cells'].copy()
+    structure = info['html']['structure']['tokens'].copy()
+
+    img_path = os.path.join(data_dir, file_name)
+    if not os.path.exists(img_path):
+        print(img_path)
+        return None
+    data = {
+        'img_path': img_path,
+        'cells': cells,
+        'structure': structure,
+        'file_name': file_name
+    }
+    return data
+
+def draw_bbox(img_path, points, color=(255, 0, 0), thickness=2):
+    if isinstance(img_path, str):
+        img_path = cv2.imread(img_path)
+    img_path = img_path.copy()
+    for point in points:
+        cv2.polylines(img_path, [point.astype(int)], True, color, thickness)
+    return img_path
+
+
+def rebuild_html(data):
+    html_code = data['structure']
+    cells = data['cells']
+    to_insert = [i for i, tag in enumerate(html_code) if tag in ('<td>', '>')]
+
+    for i, cell in zip(to_insert[::-1], cells[::-1]):
+        if cell['tokens']:
+            text = ''.join(cell['tokens'])
+            # skip empty text
+            sp_char_list = ['<b>', '</b>', '\u2028', ' ', '<i>', '</i>']
+            text_remove_style = skip_char(text, sp_char_list)
+            if len(text_remove_style) == 0:
+                continue
+            html_code.insert(i + 1, text)
+
+    html_code = ''.join(html_code)
+    return html_code
+
+
+def skip_char(text, sp_char_list):
+    """
+    skip empty cell
+    @param text: text in cell
+    @param sp_char_list: style char and special code
+    @return:
+    """
+    for sp_char in sp_char_list:
+        text = text.replace(sp_char, '')
+    return text
+
+save_dir = '/home/aistudio/vis'
+os.makedirs(save_dir, exist_ok=True)
+image_dir = '/home/aistudio/data/data165849/'
+html_str = '<table border="1">'
+
+# 解析标注信息并还原html表格
+data = parse_line(image_dir, val_list[0])
+
+img = cv2.imread(data['img_path'])
+img_name = ''.join(os.path.basename(data['file_name']).split('.')[:-1])
+img_save_name = os.path.join(save_dir, img_name)
+boxes = [np.array(x['bbox']) for x in data['cells']]
+show_img = draw_bbox(data['img_path'], boxes)
+cv2.imwrite(img_save_name + '_show.jpg', show_img)
+
+html = rebuild_html(data)
+html_str += html
+html_str += '</table>'
+
+# 显示标注的html字符串
+from IPython.core.display import display, HTML
+display(HTML(html_str))
+# 显示单元格坐标
+plt.figure(figsize=(15,15))
+plt.imshow(show_img)
+plt.show()
+
+

2.3 训练

+

这里选用PP-StructureV2中的表格识别模型SLANet

+

SLANet是PP-StructureV2全新推出的表格识别模型,相比PP-StructureV1中的TableRec-RARE,在速度基本不变的情况下精度(Acc)提升4.7%,TEDS提升2%。

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 算法 | Acc | TEDS(Tree-Edit-Distance-based Similarity) | Speed |
| --- | --- | --- | --- |
| EDD[2] | x | 88.30% | x |
| TableRec-RARE(ours) | 71.73% | 93.88% | 779ms |
| SLANet(ours) | 76.31% | 95.89% | 766ms |
+

进行训练之前先使用如下命令下载预训练模型

+
# 进入PaddleOCR工作目录
+os.chdir('/home/aistudio/PaddleOCR')
+# 下载英文预训练模型
+! wget  -nc -P  ./pretrain_models/  https://paddleocr.bj.bcebos.com/ppstructure/models/slanet/en_ppstructure_mobile_v2.0_SLANet_train.tar --no-check-certificate
+! cd ./pretrain_models/ && tar xf en_ppstructure_mobile_v2.0_SLANet_train.tar  && cd ../
+
+

使用如下命令即可启动训练,需要修改的配置有

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| 字段 | 修改值 | 含义 |
| --- | --- | --- |
| Global.pretrained_model | ./pretrain_models/en_ppstructure_mobile_v2.0_SLANet_train/best_accuracy.pdparams | 指向英文表格预训练模型地址 |
| Global.eval_batch_step | 562 | 模型多少step评估一次,一般设置为一个epoch总的step数 |
| Optimizer.lr.name | Const | 学习率衰减器 |
| Optimizer.lr.learning_rate | 0.0005 | 学习率设为之前的0.05倍 |
| Train.dataset.data_dir | /home/aistudio/data/data165849 | 指向训练集图片存放目录 |
| Train.dataset.label_file_list | /home/aistudio/data/data165849/table_gen_dataset/train.txt | 指向训练集标注文件 |
| Train.loader.batch_size_per_card | 32 | 训练时每张卡的batch_size |
| Train.loader.num_workers | 1 | 训练集多进程数据读取的进程数,在aistudio中需要设为1 |
| Eval.dataset.data_dir | /home/aistudio/data/data165849 | 指向测试集图片存放目录 |
| Eval.dataset.label_file_list | /home/aistudio/data/data165849/table_gen_dataset/val.txt | 指向测试集标注文件 |
| Eval.loader.batch_size_per_card | 32 | 测试时每张卡的batch_size |
| Eval.loader.num_workers | 1 | 测试集多进程数据读取的进程数,在aistudio中需要设为1 |
+

已经修改好的配置存储在 /home/aistudio/SLANet_ch.yml

+
1
+2
+3
import os
+os.chdir('/home/aistudio/PaddleOCR')
+! python3 tools/train.py -c /home/aistudio/SLANet_ch.yml
+
+
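如果不想直接使用预先修改好的 yml,也可以参考下面的示意命令,通过 -o 在启动训练时覆盖上表中的部分标量配置(以仓库自带的 configs/table/SLANet.yml 为基础配置,路径与取值均取自上表,仅作演示;label_file_list 等列表类字段建议仍在 yml 文件中修改):

# 示意:用 -o 覆盖部分配置启动训练,列表类字段请在 yml 中修改
! python3 tools/train.py -c configs/table/SLANet.yml \
    -o Global.pretrained_model=./pretrain_models/en_ppstructure_mobile_v2.0_SLANet_train/best_accuracy.pdparams \
       Global.eval_batch_step=562 \
       Optimizer.lr.name=Const \
       Optimizer.lr.learning_rate=0.0005 \
       Train.dataset.data_dir=/home/aistudio/data/data165849 \
       Eval.dataset.data_dir=/home/aistudio/data/data165849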

大约在7个epoch后达到最高精度 97.49%

+

2.4 验证

+

训练完成后,可使用如下命令在测试集上评估最优模型的精度

+
! python3 tools/eval.py -c /home/aistudio/SLANet_ch.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/output/SLANet_ch/best_accuracy.pdparams
+
+

2.5 训练引擎推理

+

使用如下命令可使用训练引擎对单张图片进行推理

+
1
+2
import os;os.chdir('/home/aistudio/PaddleOCR')
+! python3 tools/infer_table.py -c /home/aistudio/SLANet_ch.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/output/SLANet_ch/best_accuracy.pdparams Global.infer_img=/home/aistudio/data/data165849/table_gen_dataset/img/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg
+
+
import cv2
+from matplotlib import pyplot as plt
+%matplotlib inline
+
+# 显示原图
+show_img = cv2.imread('/home/aistudio/data/data165849/table_gen_dataset/img/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg')
+plt.figure(figsize=(15,15))
+plt.imshow(show_img)
+plt.show()
+
+# 显示预测的单元格
+show_img = cv2.imread('/home/aistudio/PaddleOCR/output/infer/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg')
+plt.figure(figsize=(15,15))
+plt.imshow(show_img)
+plt.show()
+
+

2.6 模型导出

+

使用如下命令可将模型导出为inference模型

+
! python3 tools/export_model.py -c /home/aistudio/SLANet_ch.yml -o Global.checkpoints=/home/aistudio/PaddleOCR/output/SLANet_ch/best_accuracy.pdparams Global.save_inference_dir=/home/aistudio/SLANet_ch/infer
+
+

2.7 预测引擎推理

+

使用如下命令可使用预测引擎对单张图片进行推理

+
os.chdir('/home/aistudio/PaddleOCR/ppstructure')
+! python3 table/predict_structure.py \
+    --table_model_dir=/home/aistudio/SLANet_ch/infer \
+    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
+    --image_dir=/home/aistudio/data/data165849/table_gen_dataset/img/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg \
+    --output=../output/inference
+
+
# 显示原图
+show_img = cv2.imread('/home/aistudio/data/data165849/table_gen_dataset/img/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg')
+plt.figure(figsize=(15,15))
+plt.imshow(show_img)
+plt.show()
+
+# 显示预测的单元格
+show_img = cv2.imread('/home/aistudio/PaddleOCR/output/inference/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg')
+plt.figure(figsize=(15,15))
+plt.imshow(show_img)
+plt.show()
+
+

2.8 表格识别

+

在表格结构模型训练完成后,可结合OCR检测识别模型,对表格内容进行识别。

+

首先下载PP-OCRv3文字检测识别模型

+
1
+2
+3
+4
# 下载PP-OCRv3文本检测识别模型并解压
+! wget  -nc -P  ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_slim_infer.tar --no-check-certificate
+! wget  -nc -P  ./inference/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_slim_infer.tar --no-check-certificate
+! cd ./inference/ && tar xf ch_PP-OCRv3_det_slim_infer.tar && tar xf ch_PP-OCRv3_rec_slim_infer.tar  && cd ../
+
+

模型下载完成后,使用如下命令进行表格识别

+
import os;os.chdir('/home/aistudio/PaddleOCR/ppstructure')
+! python3 table/predict_table.py \
+    --det_model_dir=inference/ch_PP-OCRv3_det_slim_infer \
+    --rec_model_dir=inference/ch_PP-OCRv3_rec_slim_infer  \
+    --table_model_dir=/home/aistudio/SLANet_ch/infer \
+    --rec_char_dict_path=../ppocr/utils/ppocr_keys_v1.txt \
+    --table_char_dict_path=../ppocr/utils/dict/table_structure_dict.txt \
+    --image_dir=/home/aistudio/data/data165849/table_gen_dataset/img/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg \
+    --output=../output/table
+
+
# 显示原图
+show_img = cv2.imread('/home/aistudio/data/data165849/table_gen_dataset/img/no_border_18298_G7XZH93DDCMATGJQ8RW2.jpg')
+plt.figure(figsize=(15,15))
+plt.imshow(show_img)
+plt.show()
+
+# 显示预测结果
+from IPython.core.display import display, HTML
+display(HTML('<html><body><table><tr><td colspan="5">alleadersh</td><td rowspan="2">不贰过,推</td><td rowspan="2">从自己参与浙江数</td><td rowspan="2">。另一方</td></tr><tr><td>AnSha</td><td>自己越</td><td>共商共建工作协商</td><td>w.east </td><td>抓好改革试点任务</td></tr><tr><td>Edime</td><td>ImisesElec</td><td>怀天下”。</td><td></td><td>22.26 </td><td>31.61</td><td>4.30 </td><td>794.94</td></tr><tr><td rowspan="2">ip</td><td> Profundi</td><td>:2019年12月1</td><td>Horspro</td><td>444.48</td><td>2.41 </td><td>87</td><td>679.98</td></tr><tr><td> iehaiTrain</td><td>组长蒋蕊</td><td>Toafterdec</td><td>203.43</td><td>23.54 </td><td>4</td><td>4266.62</td></tr><tr><td>Tyint </td><td> roudlyRol</td><td>谢您的好意,我知道</td><td>ErChows</td><td></td><td>48.90</td><td>1031</td><td>6</td></tr><tr><td>NaFlint</td><td></td><td>一辈的</td><td>aterreclam</td><td>7823.86</td><td>9829.23</td><td>7.96 </td><td> 3068</td></tr><tr><td>家上下游企业,5</td><td>Tr</td><td>景象。当地球上的我们</td><td>Urelaw</td><td>799.62</td><td>354.96</td><td>12.98</td><td>33 </td></tr><tr><td>赛事(</td><td> uestCh</td><td>复制的业务模式并</td><td>Listicjust</td><td>9.23</td><td></td><td>92</td><td>53.22</td></tr><tr><td> Ca</td><td> Iskole</td><td>扶贫"之名引导</td><td> Papua </td><td>7191.90</td><td>1.65</td><td>3.62</td><td>48</td></tr><tr><td rowspan="2">避讳</td><td>ir</td><td>但由于</td><td>Fficeof</td><td>0.22</td><td>6.37</td><td>7.17</td><td>3397.75</td></tr><tr><td>ndaTurk</td><td>百处遗址</td><td>gMa</td><td>1288.34</td><td>2053.66</td><td>2.29</td><td>885.45</td></tr></table></body></html>'))
+
+

3. 表格属性识别

+

3.1 代码、环境、数据准备

+

3.1.1 代码准备

+

首先,我们需要准备训练表格属性的代码,PaddleClas集成了PULC方案,该方案可以快速获得一个在CPU上用时2ms的属性识别模型。PaddleClas代码可以clone下载得到。获取方式如下:

+
! git clone -b develop https://gitee.com/paddlepaddle/PaddleClas
+
+

3.1.2 环境准备

+

其次,我们需要安装训练PaddleClas相关的依赖包

+
1
+2
! pip install -r PaddleClas/requirements.txt --force-reinstall
+! pip install protobuf==3.20.0
+
+

3.1.3 数据准备

+

最后,准备训练数据。在这里,我们一共定义了表格的6个属性,分别是表格来源、表格数量、表格颜色、表格清晰度、表格有无干扰、表格角度。其可视化如下:

+

+

这里,我们提供了一个表格属性的demo子集,可以快速迭代体验。下载方式如下:

+
%cd PaddleClas/dataset
+!wget https://paddleclas.bj.bcebos.com/data/PULC/table_attribute.tar
+!tar -xf table_attribute.tar
+# 解压完成后返回 PaddleClas 根目录,便于后续启动训练
+%cd ../
+
+

3.2 表格属性识别训练

+

表格属性训练整体pipelinie如下:

+

+

1.训练过程中,图片经过预处理之后,送入到骨干网络之中,骨干网络将抽取表格图片的特征,最终该特征连接输出的FC层,FC层输出经过Sigmoid激活函数后与真实标签计算交叉熵损失,优化器通过对该损失函数做梯度下降来更新骨干网络的参数,经过多轮训练后,骨干网络的参数可以对未知图片做很好的预测;

+

2.推理过程中,图片经过预处理之后,送入到骨干网络之中,骨干网络加载学习好的权重后对该表格图片做出预测,预测的结果为一个6维向量,该向量中的每个元素反映了每个属性对应的概率值,通过对该值进一步卡阈值之后,得到最终的输出,最终的输出描述了该表格的6个属性。
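下面用一个极简的 numpy 片段示意上述"Sigmoid 激活 + 卡阈值"的推理后处理过程(logits 数值与 0.5 的阈值均为假设,属性顺序对应上文定义的 6 个属性,实际请以 PaddleClas 的配置与实现为准):

import numpy as np

# 假设骨干网络 + FC 层输出的 6 维 logits(数值仅为演示)
logits = np.array([2.3, 1.1, 0.8, -0.4, 3.0, -1.2])
# Sigmoid 激活,得到每个属性对应的概率值
probs = 1.0 / (1.0 + np.exp(-logits))
# 按假设的阈值 0.5 进一步"卡阈值",得到 6 维 0/1 输出
output = (probs > 0.5).astype(int)

names = ["表格来源", "表格数量", "表格颜色", "表格清晰度", "表格有无干扰", "表格角度"]
for name, p, o in zip(names, probs, output):
    print(f"{name}: prob={p:.3f}, output={o}")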

+

当准备好相关的数据之后,可以一键启动表格属性的训练,训练代码如下:

+
!python tools/train.py -c ./ppcls/configs/PULC/table_attribute/PPLCNet_x1_0.yaml -o Global.device=cpu -o Global.epochs=10
+
+

3.3 表格属性识别推理和部署

+

3.3.1 模型转换

+

当训练好模型之后,需要将模型转换为推理模型进行部署。转换脚本如下:

+
!python tools/export_model.py -c ppcls/configs/PULC/table_attribute/PPLCNet_x1_0.yaml -o Global.pretrained_model=output/PPLCNet_x1_0/best_model
+
+

执行以上命令之后,会在当前目录上生成inference文件夹,该文件夹中保存了当前精度最高的推理模型。

+

3.3.2 模型推理

+

安装推理所需的 paddleclas 包,此时需要下载并安装 paddleclas 的 develop 分支 whl 包

+
!pip install https://paddleclas.bj.bcebos.com/whl/paddleclas-0.0.0-py3-none-any.whl
+
+

进入deploy目录下即可对模型进行推理

+
%cd deploy/
+
+

推理命令如下:

+
1
+2
!python python/predict_cls.py -c configs/PULC/table_attribute/inference_table_attribute.yaml -o Global.inference_model_dir="../inference" -o Global.infer_imgs="../dataset/table_attribute/Table_val/val_9.jpg"
+!python python/predict_cls.py -c configs/PULC/table_attribute/inference_table_attribute.yaml -o Global.inference_model_dir="../inference" -o Global.infer_imgs="../dataset/table_attribute/Table_val/val_3253.jpg"
+
+

推理的表格图片:

+

+

预测结果如下:

+
val_9.jpg:   {'attributes': ['Scanned', 'Little', 'Black-and-White', 'Clear', 'Without-Obstacles', 'Horizontal'], 'output': [1, 1, 1, 1, 1, 1]}
+
+

推理的表格图片:

+

+

预测结果如下:

+
val_3253.jpg:    {'attributes': ['Photo', 'Little', 'Black-and-White', 'Blurry', 'Without-Obstacles', 'Tilted'], 'output': [0, 1, 1, 0, 1, 0]}
+
+

对比两张图片可以发现,第一张图片比较清晰,表格属性的识别结果也偏向于"比较容易识别",我们可以更相信表格识别的结果;第二张图片比较模糊,且存在倾斜现象,表格识别可能存在错误,需要人工进一步校验。通过表格的属性识别能力,可以进一步将"人工"和"智能"很好地结合起来,为表格识别能力落地的精度提供保障。
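落地时可以把这种判断写成一个简单的路由规则,将"难例"自动转给人工校验,下面是一个示意(字段名取自上文预测结果的格式,具体规则与属性取值组合为假设):

def need_manual_check(attr_result):
    """根据表格属性识别结果判断是否需要人工校验(规则仅为示例)。"""
    attrs = attr_result["attributes"]
    # 模糊、倾斜或拍照来源的表格,识别出错概率较高,转人工校验
    return "Blurry" in attrs or "Tilted" in attrs or "Photo" in attrs

# 以上文两张图片的预测结果为例
val_9 = {"attributes": ["Scanned", "Little", "Black-and-White", "Clear", "Without-Obstacles", "Horizontal"], "output": [1, 1, 1, 1, 1, 1]}
val_3253 = {"attributes": ["Photo", "Little", "Black-and-White", "Blurry", "Without-Obstacles", "Tilted"], "output": [0, 1, 1, 0, 1, 0]}
print(need_manual_check(val_9))     # False:图像清晰,直接采信表格识别结果
print(need_manual_check(val_3253))  # True:模糊且倾斜,转人工进一步校验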

diff --git "a/applications/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253.html" "b/applications/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253.html" new file mode 100644 index 0000000000..91bdef150c --- /dev/null +++ "b/applications/\345\205\211\345\212\237\347\216\207\350\256\241\346\225\260\347\240\201\347\256\241\345\255\227\347\254\246\350\257\206\345\210\253.html" @@ -0,0 +1,6144 @@

光功率计数码管字符识别

+

1. 背景介绍

+

光功率计(optical power meter )是指用于测量绝对光功率或通过一段光纤的光功率相对损耗的仪器。在光纤系统中,测量光功率是最基本的,非常像电子学中的万用表;在光纤测量中,光功率计是重负荷常用表。

+

+

目前光功率计缺少将数据直接输出的功能,需要人工读数。这一项工作单调重复,如果可以使用机器替代人工,将节约大量成本。针对上述问题,希望通过摄像头拍照->智能读数的方式高效地完成此任务。

+

为实现智能读数,通常会采取文本检测+文本识别的方案:

+

第一步,使用文本检测模型定位出光功率计中的数字部分;

+

第二步,使用文本识别模型获得准确的数字和单位信息。

+

本项目主要介绍如何完成第二步文本识别部分,包括:真实评估集的建立、训练数据的合成、基于 PP-OCRv3 和 SVTR_Tiny 两个模型进行训练,以及评估和推理。

+

本项目难点如下:

+
    +
  • 光功率计数码管字符数据较少,难以获取。
  • +
  • 数码管中小数点占像素较少,容易漏识别。
  • +
+

针对以上问题, 本例选用 PP-OCRv3 和 SVTR_Tiny 两个高精度模型训练,同时提供了真实数据挖掘案例和数据合成案例。基于 PP-OCRv3 模型,在构建的真实评估集上精度从 52% 提升至 72%,SVTR_Tiny 模型精度可达到 78.9%。

+

aistudio项目链接: 光功率计数码管字符识别

+

2. PaddleOCR 快速使用

+

PaddleOCR 旨在打造一套丰富、领先、且实用的OCR工具库,助力开发者训练出更好的模型,并应用落地。

+

+

官方提供了适用于通用场景的高精轻量模型,首先使用官方提供的 PP-OCRv3 模型预测图片,验证下当前模型在光功率计场景上的效果。

+

准备环境

+
1
+2
python3 -m pip install -U pip
+python3 -m pip install paddleocr
+
+

测试效果

+

测试图:

+

+
paddleocr --lang=ch --det=False --image_dir=data
+
+

得到如下测试结果:

+
('.7000', 0.6885431408882141)
+
+

发现数字识别较准,然而对负号和小数点识别不准确。 由于PP-OCRv3的训练数据大多为通用场景数据,在特定的场景上效果可能不够好。因此需要基于场景数据进行微调。

+

下面就主要介绍如何在光功率计(数码管)场景上微调训练。

+

3. 开始训练

+

3.1 数据准备

+

特定的工业场景往往很难获取开源的真实数据集,光功率计也是如此。在实际工业场景中,可以通过摄像头采集的方法收集大量真实数据,本例中重点介绍数据合成方法和真实数据挖掘方法,如何利用有限的数据优化模型精度。

+

数据集分为两个部分:合成数据,真实数据, 其中合成数据由 text_renderer 工具批量生成得到, 真实数据通过爬虫等方式在百度图片中搜索并使用 PPOCRLabel 标注得到。

+

合成数据

+

本例中数据合成工具使用的是 text_renderer, 该工具可以合成用于文本识别训练的文本行数据:

+

+

+
1
+2
export https_proxy=http://172.19.57.45:3128
+git clone https://github.com/oh-my-ocr/text_renderer
+
+
python3 setup.py develop
+python3 -m pip install -r docker/requirements.txt
+python3 main.py \
+    --config example_data/example.py \
+    --dataset img \
+    --num_processes 2 \
+    --log_period 10
+
+

给定字体和语料,就可以合成较为丰富样式的文本行数据。 光功率计识别场景,目标是正确识别数码管文本,因此需要收集部分数码管字体,训练语料,用于合成文本识别数据。
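训练语料可以按照光功率计的实际读数规律自行构造,下面是一个生成 corpus/digital.txt 的示意脚本(数值范围、小数位数与单位均为假设,请按实际设备读数调整):

import random

units = ["dBm", "dB", "mW"]  # 假设的常见单位
lines = []
for _ in range(20000):
    value = random.uniform(-99.99, 99.99)
    if random.random() < 0.5:
        lines.append(f"{value:.2f}")                        # 仅数字,保留负号与小数点
    else:
        lines.append(f"{value:.2f}{random.choice(units)}")  # 数字 + 单位

with open("corpus/digital.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))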

+

将收集好的语料存放在 example_data 路径下:

+
1
+2
ln -s ./fonts/DS* text_renderer/example_data/font/
+ln -s ./corpus/digital.txt text_renderer/example_data/text/
+
+

修改text_renderer/example_data/font_list/font_list.txt,选择需要的字体开始合成:

+
python3 main.py \
+    --config example_data/digital_example.py \
+    --dataset img \
+    --num_processes 2 \
+    --log_period 10
+
+

合成图片会被存在目录 text_renderer/example_data/digital/chn_data 下

+

查看合成的数据样例:

+

img

+

真实数据挖掘

+

模型训练需要使用真实数据作为评价指标,否则很容易过拟合到简单的合成数据中。没有开源数据的情况下,可以利用部分无标注数据+标注工具获得真实数据。

+
1. 数据搜集
+

使用爬虫工具获得无标注数据

+
2. PPOCRLabel 完成半自动标注
+

PPOCRLabel是一款适用于OCR领域的半自动化图形标注工具,内置PP-OCR模型对数据自动标注和重新识别。使用Python3和PyQT5编写,支持矩形框标注、表格标注、不规则文本标注、关键信息标注模式,导出格式可直接用于PaddleOCR检测和识别模型的训练。

+

img

+

收集完数据后就可以进行分配了,验证集中一般都是真实数据,训练集中包含合成数据+真实数据。本例中标注了155张图片,其中训练集和验证集的数目为100和55。
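划分真实数据时,可以参考下面的示意脚本,把 PPOCRLabel 导出的识别标注(此处假设文件名为 rec_gt.txt,每行格式为"图片路径\t文本")按 100/55 拆分为 real_train.txt 和 real_eval.txt:

import random

# 读取标注文件(rec_gt.txt 为假设的文件名)
with open("rec_gt.txt", "r", encoding="utf-8") as f:
    lines = [line for line in f if line.strip()]

random.shuffle(lines)
train_lines, eval_lines = lines[:100], lines[100:155]

with open("data/real_train.txt", "w", encoding="utf-8") as f:
    f.writelines(train_lines)
with open("data/real_eval.txt", "w", encoding="utf-8") as f:
    f.writelines(eval_lines)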

+

最终 data 文件夹应包含以下几部分:

+
|-data
+  |- synth_train.txt
+  |- real_train.txt
+  |- real_eval.txt
+  |- synthetic_data
+      |- word_001.png
+      |- word_002.jpg
+      |- word_003.jpg
+      | ...
+  |- real_data
+      |- word_001.png
+      |- word_002.jpg
+      |- word_003.jpg
+      | ...
+  ...
+
+

3.2 模型选择

+

本案例提供了2种文本识别模型:PP-OCRv3 识别模型 和 SVTR_Tiny:

+

PP-OCRv3 识别模型:PP-OCRv3的识别模块是基于文本识别算法SVTR优化。SVTR不再采用RNN结构,通过引入Transformers结构更加有效地挖掘文本行图像的上下文信息,从而提升文本识别能力。并进行了一系列结构改进加速模型预测。

+

SVTR_Tiny:SVTR提出了一种用于场景文本识别的单视觉模型,该模型在patch-wise image tokenization框架内,完全摒弃了序列建模,在精度具有竞争力的前提下,模型参数量更少,速度更快。

+

以上两个策略在自建中文数据集上的精度和速度对比如下:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| ID | 策略 | 模型大小 | 精度 | 预测耗时(CPU + MKLDNN) |
| --- | --- | --- | --- | --- |
| 01 | PP-OCRv2 | 8M | 74.80% | 8.54ms |
| 02 | SVTR_Tiny | 21M | 80.10% | 97.00ms |
| 03 | SVTR_LCNet(h32) | 12M | 71.90% | 6.60ms |
| 04 | SVTR_LCNet(h48) | 12M | 73.98% | 7.60ms |
| 05 | + GTC | 12M | 75.80% | 7.60ms |
| 06 | + TextConAug | 12M | 76.30% | 7.60ms |
| 07 | + TextRotNet | 12M | 76.90% | 7.60ms |
| 08 | + UDML | 12M | 78.40% | 7.60ms |
| 09 | + UIM | 12M | 79.40% | 7.60ms |
+

3.3 开始训练

+

首先下载 PaddleOCR 代码库

+
git clone -b release/2.5 https://github.com/PaddlePaddle/PaddleOCR.git
+
+

PaddleOCR提供了训练脚本、评估脚本和预测脚本,本节将以 PP-OCRv3 中文识别模型为例:

+

Step1:下载预训练模型

+

首先下载 pretrain model,您可以下载训练好的模型在自定义数据上进行finetune

+
cd PaddleOCR/
+# 下载PP-OCRv3 中文预训练模型
+wget -P ./pretrain_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
+# 解压模型参数
+cd pretrain_models
+tar -xf ch_PP-OCRv3_rec_train.tar && rm -rf ch_PP-OCRv3_rec_train.tar
+
+

Step2:自定义字典文件

+

接下来需要提供一个字典({word_dict_name}.txt),使模型在训练时,可以将所有出现的字符映射为字典的索引。

+

因此字典需要包含所有希望被正确识别的字符,{word_dict_name}.txt需要写成如下格式,并以 utf-8 编码格式保存:

+
0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+-
+.
+
+

word_dict.txt 每行有一个单字,将字符与数字索引映射在一起,“3.14” 将被映射成 [3, 11, 1, 4]
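可以用几行代码直观验证这种"字符 → 索引"的映射(字典文件名沿用上文的 word_dict.txt):

# 读取字典文件,建立 字符 -> 索引 的映射
with open("word_dict.txt", "r", encoding="utf-8") as f:
    chars = [line.strip("\n") for line in f]
char2idx = {c: i for i, c in enumerate(chars)}

print([char2idx[c] for c in "3.14"])  # 按上文字典输出 [3, 11, 1, 4]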

+
    +
  • 内置字典
  • +
+

PaddleOCR内置了一部分字典,可以按需使用。

+

ppocr/utils/ppocr_keys_v1.txt 是一个包含6623个字符的中文字典

+

ppocr/utils/ic15_dict.txt 是一个包含36个字符的英文字典

+
    +
  • 自定义字典
  • +
+

内置字典面向通用场景,而在具体的工业场景中,可能需要识别特殊字符,或者只需识别某几个字符,此时使用自定义字典能进一步提升模型精度。例如在光功率计场景中,需要识别数字和单位。

+

遍历真实数据标签中的字符,制作字典digital_dict.txt如下所示:

+
-
+.
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+B
+E
+F
+H
+L
+N
+T
+W
+d
+k
+m
+n
+o
+z
+
+
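这份字典也可以由真实数据的标注文件自动统计生成,下面是一个示意脚本(标注文件路径为假设,每行格式为"图片路径\t文本"):

# 遍历真实数据标注中出现的字符,去重排序后写入 digital_dict.txt
chars = set()
for label_file in ["data/real_train.txt", "data/real_eval.txt"]:
    with open(label_file, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) == 2:
                chars.update(parts[1])

with open("digital_dict.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sorted(chars)))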

Step3:修改配置文件

+

为了更好的使用预训练模型,训练推荐使用ch_PP-OCRv3_rec_distillation.yml配置文件,并参考下列说明修改配置文件:

+

ch_PP-OCRv3_rec_distillation.yml 为例:

+
Global:
+  ...
+  # 添加自定义字典,如修改字典请将路径指向新字典
+  character_dict_path: ppocr/utils/dict/digital_dict.txt
+  ...
+  # 识别空格
+  use_space_char: True
+
+
+Optimizer:
+  ...
+  # 添加学习率衰减策略
+  lr:
+    name: Cosine
+    learning_rate: 0.001
+  ...
+
+...
+
+Train:
+  dataset:
+    # 数据集格式,支持LMDBDataSet以及SimpleDataSet
+    name: SimpleDataSet
+    # 数据集路径
+    data_dir: ./data/
+    # 训练集标签文件
+    label_file_list:
+    - ./train_data/digital_img/digital_train.txt  #11w
+    - ./train_data/digital_img/real_train.txt     #100
+    - ./train_data/digital_img/dbm_img/dbm.txt    #3w
+    ratio_list:
+    - 0.3
+    - 1.0
+    - 1.0
+    transforms:
+      ...
+      - RecResizeImg:
+          # 修改 image_shape 以适应长文本
+          image_shape: [3, 48, 320]
+      ...
+  loader:
+    ...
+    # 单卡训练的batch_size
+    batch_size_per_card: 256
+    ...
+
+Eval:
+  dataset:
+    # 数据集格式,支持LMDBDataSet以及SimpleDataSet
+    name: SimpleDataSet
+    # 数据集路径
+    data_dir: ./data
+    # 验证集标签文件
+    label_file_list:
+    - ./train_data/digital_img/real_val.txt
+    transforms:
+      ...
+      - RecResizeImg:
+          # 修改 image_shape 以适应长文本
+          image_shape: [3, 48, 320]
+      ...
+  loader:
+    # 单卡验证的batch_size
+    batch_size_per_card: 256
+    ...
+
+

注意,训练/预测/评估时的配置文件请务必与训练一致。

+

Step4:启动训练

+

如果您安装的是cpu版本,请将配置文件中的 use_gpu 字段修改为false

+
# GPU训练 支持单卡,多卡训练
+# 训练数码管数据 训练日志会自动保存为 "{save_model_dir}" 下的train.log
+
+#单卡训练(训练周期长,不建议)
+python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
+
+# 多卡训练,通过--gpus参数指定卡号
+python3 -m paddle.distributed.launch --gpus '0,1,2,3'  tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model=./pretrain_models/ch_PP-OCRv3_rec_train/best_accuracy
+
+

PaddleOCR支持训练和评估交替进行, 可以在 configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml 中修改 eval_batch_step 设置评估频率,默认每500个iter评估一次。评估过程中默认将最佳acc模型,保存为 output/ch_PP-OCRv3_rec_distill/best_accuracy

+

如果验证集很大,测试将会比较耗时,建议减少评估次数,或训练完再进行评估。

+

SVTR_Tiny 训练

+

SVTR_Tiny 训练步骤与上面一致,SVTR支持的配置和模型训练权重可以参考算法介绍文档

+

Step1:下载预训练模型

+
# 下载 SVTR_Tiny 中文识别预训练模型和配置文件
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/rec_svtr_tiny_none_ctc_ch_train.tar
+# 解压模型参数
+tar -xf rec_svtr_tiny_none_ctc_ch_train.tar && rm -rf rec_svtr_tiny_none_ctc_ch_train.tar
+
+

Step2:自定义字典文件

+

字典依然使用自定义的 digital_dict.txt

+

Step3:修改配置文件

+

配置文件中对应修改字典路径和数据路径

+

Step4:启动训练

+
# 单卡训练
+python tools/train.py -c rec_svtr_tiny_none_ctc_ch_train/rec_svtr_tiny_6local_6global_stn_ch.yml \
+           -o Global.pretrained_model=./rec_svtr_tiny_none_ctc_ch_train/best_accuracy
+
+

3.4 验证效果

+

将训练完成的模型放置在对应目录下即可完成模型推理

+

指标评估

+

训练中模型参数默认保存在Global.save_model_dir目录下。在评估指标时,需要设置Global.checkpoints指向保存的参数文件。评估数据集可以通过 configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml 修改Eval中的 label_file_list 设置。

+
# GPU 评估, Global.checkpoints 为待测权重
+python3 -m paddle.distributed.launch --gpus '0' tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints={path/to/weights}/best_accuracy
+
+

测试识别效果

+

使用 PaddleOCR 训练好的模型,可以通过以下脚本进行快速预测。

+

默认预测图片存储在 infer_img 里,通过 -o Global.pretrained_model 加载训练好的参数文件:

+

根据配置文件中设置的 save_model_dirsave_epoch_step 字段,会有以下几种参数被保存下来:

+
output/rec/
+├── best_accuracy.pdopt
+├── best_accuracy.pdparams
+├── best_accuracy.states
+├── config.yml
+├── iter_epoch_3.pdopt
+├── iter_epoch_3.pdparams
+├── iter_epoch_3.states
+├── latest.pdopt
+├── latest.pdparams
+├── latest.states
+└── train.log
+
+

其中 best_accuracy.* 是评估集上的最优模型;iter_epoch_x.* 是以 save_epoch_step 为间隔保存下来的模型;latest.* 是最后一个epoch的模型。

+
# 预测结果
+python3 tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model={path/to/weights}/best_accuracy  Global.infer_img=test_digital.png
+
+

预测图片:

+

+

得到输入图像的预测结果:

+
infer_img: test_digital.png
+        result: ('-70.00', 0.9998967)
+
\ No newline at end of file
diff --git "a/applications/包装生产日期识别.html" "b/applications/包装生产日期识别.html"
new file mode 100644
index 0000000000..85b5e67d07
--- /dev/null
+++ "b/applications/包装生产日期识别.html"
@@ -0,0 +1,6320 @@
(页面标题:包装生产日期 - PaddleOCR 文档)

一种基于PaddleOCR的产品包装生产日期识别模型

+

1. 项目介绍

+

产品包装生产日期是计算机视觉图像识别技术在工业场景中的一种应用。产品包装生产日期识别技术要求能够将产品生产日期从复杂背景中提取并识别出来,在物流管理、物资管理中得到广泛应用。

+

+
项目难点:

  • 没有训练数据
  • 图像质量参差不齐:角度倾斜、图片模糊、光照不足、过曝等问题严重
+

针对以上问题, 本例选用PP-OCRv3这一开源超轻量OCR系统进行包装产品生产日期识别系统的开发。直接使用PP-OCRv3进行评估的精度为62.99%。为提升识别精度,我们首先使用数据合成工具合成了3k数据,基于这部分数据进行finetune,识别精度提升至73.66%。由于合成数据与真实数据之间的分布存在差异,为进一步提升精度,我们使用网络爬虫配合数据挖掘策略得到了1k带标签的真实数据,基于真实数据finetune的精度为71.33%。最后,我们综合使用合成数据和真实数据进行finetune,将识别精度提升至86.99%。各策略的精度提升效果如下:

| 策略 | 精度 |
| -- | -- |
| PP-OCRv3评估 | 62.99% |
| 合成数据finetune | 73.66% |
| 真实数据finetune | 71.33% |
| 真实+合成数据finetune | 86.99% |
+

AIStudio项目链接: 一种基于PaddleOCR的包装生产日期识别方法

+

2. 环境搭建

+

本任务基于Aistudio完成, 具体环境如下:

+
    +
  • 操作系统: Linux
  • PaddlePaddle: 2.3
  • PaddleOCR: Release/2.5
  • text_renderer: master
+

下载PaddleOCR代码并安装依赖库:

+
git clone -b dygraph https://gitee.com/paddlepaddle/PaddleOCR
+
+# 安装依赖库
+cd PaddleOCR
+pip install -r PaddleOCR/requirements.txt
+
+

3. 数据准备

+

本项目使用人工预标注的300张图像作为测试集。

+

部分数据示例如下:

+

+

标签文件格式如下:

+
数据路径 标签(中间以制表符分隔)
+
| 数据集类型 | 数量 |
| -- | -- |
| 测试集 | 300 |
+

数据集下载链接,下载后可以通过下方命令解压:

+
tar -xvf data.tar
+mv data ${PaddleOCR_root}
+
+

数据解压后的文件结构如下:

+
PaddleOCR
+├── data
+   ├── mining_images            # 挖掘的真实数据示例
+   ├── mining_train.list        # 挖掘的真实数据文件列表
+   ├── render_images            # 合成数据示例
+   ├── render_train.list        # 合成数据文件列表
+   ├── val                      # 测试集数据
+   ├── val.list                 # 测试集数据文件列表
+   ├── bg                       # 合成数据所需背景图像
+   ├── fonts                    # 合成数据所需字体
+   └── corpus                   # 合成数据所需语料
+
+

4. 直接使用PP-OCRv3模型评估

+

准备好测试数据后,可以使用PaddleOCR的PP-OCRv3模型进行识别。

+

下载预训练模型

+

首先需要下载PP-OCR v3中英文识别模型文件,下载链接可以在link获取,下载命令:

+
cd ${PaddleOCR_root}
+mkdir ckpt
+wget -nc -P ckpt https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
+pushd ckpt/
+tar -xvf ch_PP-OCRv3_rec_train.tar
+popd
+
+

模型评估

+

使用以下命令进行PP-OCRv3评估:

+
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
+                         -o Global.checkpoints=ckpt/ch_PP-OCRv3_rec_train/best_accuracy \
+                         Eval.dataset.data_dir=./data \
+                         Eval.dataset.label_file_list=["./data/val.list"]
+
+

其中各参数含义如下:

+
-c: 指定使用的配置文件,ch_PP-OCRv3_rec_distillation.yml对应于OCRv3识别模型。
+-o: 覆盖配置文件中参数
+Global.checkpoints: 指定评估使用的模型文件路径
+Eval.dataset.data_dir: 指定评估数据集路径
+Eval.dataset.label_file_list: 指定评估数据集文件列表
+
+

5. 基于合成数据finetune

+

5.1 Text Renderer数据合成方法

+

5.1.1 下载Text Renderer代码

+

首先从github或gitee下载Text Renderer代码,并安装相关依赖。

+
git clone https://gitee.com/wowowoll/text_renderer.git
+
+# 安装依赖库
+cd text_renderer
+pip install -r requirements.txt
+
+

使用text renderer合成数据之前需要准备好背景图片、语料以及字体库,下面将逐一介绍各个步骤。

+

5.1.2 准备背景图片

+

观察日常生活中常见的包装生产日期图片,我们可以发现其背景相对简单。为此我们可以从网上找一下图片,截取部分图像块作为背景图像。
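截取图像块这一步可以用如下示意脚本完成(原始图片目录、输出目录和裁剪尺寸均为假设值,可按实际情况调整):

# 从收集到的图片中随机截取图像块,作为合成数据的背景图(示意脚本)
import os
import random
import cv2

src_dir = "./raw_images/"   # 收集到的原始图片目录(假设值)
dst_dir = "./data/bg/"      # 背景图输出目录
crop_w, crop_h = 320, 48    # 背景块尺寸(假设值)
os.makedirs(dst_dir, exist_ok=True)

for idx, name in enumerate(sorted(os.listdir(src_dir))):
    img = cv2.imread(os.path.join(src_dir, name))
    if img is None or img.shape[0] < crop_h or img.shape[1] < crop_w:
        continue
    y = random.randint(0, img.shape[0] - crop_h)
    x = random.randint(0, img.shape[1] - crop_w)
    cv2.imwrite(os.path.join(dst_dir, "bg_%04d.jpg" % idx), img[y:y + crop_h, x:x + crop_w])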

+

本项目已准备了部分图像作为背景图片,在第3部分完成数据准备后,可以得到我们准备好的背景图像,示例如下:

+

+

背景图像存放于如下位置:

+
PaddleOCR
+├── data
+   ├── bg     # 合成数据所需背景图像
+
+

5.1.3 准备语料

+

观察测试集生产日期图像,可以发现这些数据有如下特点:

+
    +
  1. 由年月日组成,中间可能以“/”、“-”、“:”、“.”或者空格间隔,也可能以汉字年月日分隔
  2. 有些生产日期包含在产品批号中,此时可能包含具体时间、英文字母或数字标识
+

基于以上两点,我们编写语料生成脚本:

+
import random
+from random import choice
+import os
+
+cropus_num = 2000 #设置语料数量
+
+def get_cropus(f):
+    # 随机生成年份
+    year = random.randint(0, 22)
+    # 随机生成月份
+    month = random.randint(1, 12)
+    # 随机生成日期
+    day_dict = {31: [1,3,5,7,8,10,12], 30: [4,6,9,11], 28: [2]}
+    for item in day_dict:
+        if month in day_dict[item]:
+            day = random.randint(1, item)
+    # 随机生成小时
+    hours = random.randint(0, 23)
+    # 随机生成分钟
+    minute = random.randint(0, 59)
+    # 随机生成秒数
+    second = random.randint(0, 59)
+
+    # 随机生成产品标识字符
+    length = random.randint(0, 6)
+    file_id = []
+    flag = 0
+    my_dict = [i for i in range(48,58)] + [j for j in range(40, 42)] + [k for k in range(65,90)]  # 数字 + 括号 + 大写字母
+
+    for i in range(1, length):
+        if flag:
+            if i == flag+2:  #括号匹配
+                file_id.append(')')
+                flag = 0
+                continue
+        sel = choice(my_dict)
+        if sel == 41:
+            continue
+        if sel == 40:
+            if i == 1 or i > length-3:
+                continue
+            flag = i
+        my_ascii = chr(sel)
+        file_id.append(my_ascii)
+    file_id_str = ''.join(file_id)
+
+    #随机生成产品标识字符
+    file_id2 = random.randint(0, 9)
+
+    rad = random.random()
+    if rad < 0.3:
+        f.write('20{:02d}{:02d}{:02d} {}'.format(year, month, day, file_id_str))
+    elif 0.3 < rad < 0.5:
+        f.write('20{:02d}{:02d}{:02d}日'.format(year, month, day))
+    elif 0.5 < rad < 0.7:
+        f.write('20{:02d}/{:02d}/{:02d}'.format(year, month, day))
+    elif 0.7 < rad < 0.8:
+        f.write('20{:02d}-{:02d}-{:02d}'.format(year, month, day))
+    elif 0.8 < rad < 0.9:
+        f.write('20{:02d}.{:02d}.{:02d}'.format(year, month, day))
+    else:
+        f.write('{:02d}:{:02d}:{:02d} {:02d}'.format(hours, minute, second, file_id2))
+
+if __name__ == "__main__":
+    file_path = '/home/aistudio/text_renderer/my_data/cropus'
+    if not os.path.exists(file_path):
+        os.makedirs(file_path)
+    file_name = os.path.join(file_path, 'books.txt')
+    f = open(file_name, 'w')
+    for i in range(cropus_num):
+        get_cropus(f)
+        if i < cropus_num-1:
+            f.write('\n')
+
+    f.close()
+
+

本项目已准备了部分语料,在第3部分完成数据准备后,可以得到我们准备好的语料库,默认位置如下:

+
PaddleOCR
+├── data
+   └── corpus              #合成数据所需语料
+
+

5.1.4 下载字体

+

观察包装生产日期,我们可以发现其使用的字体为点阵体。字体可以在如下网址下载: +https://www.fonts.net.cn/fonts-en/tag-dianzhen-1.html

+

本项目已准备了部分字体,在第3部分完成数据准备后,可以得到我们准备好的字体,默认位置如下:

+
PaddleOCR
+├── data
+   └── fonts                #合成数据所需字体
+
+

下载好字体后,还需要在list文件中指定字体文件存放路径,脚本如下:

+
cd text_renderer/my_data/
+touch fonts.list
+ls /home/aistudio/PaddleOCR/data/fonts/* > fonts.list
+
+

5.1.5 运行数据合成命令

+

完成数据准备后,my_data文件结构如下:

+
my_data/
+├── cropus
+   └── books.txt #语料库
+├── eng.txt    #字符列表
+└── fonts.list #字体列表
+
+

在运行合成数据命令之前,还有两处细节需要手动修改:

+
    +
  1. +

    将默认配置文件text_renderer/configs/default.yaml中第9行enable的值设为true,即允许合成彩色图像。否则合成的都是灰度图。

    +
    # color boundary is in R,G,B format
    +font_color:
    ++  enable: true #false
    +
    +
  2. +
  3. +

    text_renderer/textrenderer/renderer.py第184行作如下修改,取消padding。否则图片两端会有一些空白。

    +
    padding = random.randint(s_bbox_width // 10, s_bbox_width // 8) #修改前
    +padding = 0 #修改后
    +
    +
  4. +
+

运行数据合成命令:

+
cd /home/aistudio/text_renderer/
+python main.py --num_img=3000 \
+                  --fonts_list='./my_data/fonts.list' \
+                  --corpus_dir "./my_data/cropus" \
+                  --corpus_mode "list" \
+                  --bg_dir "/home/aistudio/PaddleOCR/data/bg/" \
+                  --img_width 0
+
+

合成好的数据默认保存在text_renderer/output目录下,可进入该目录查看合成的数据。

+

合成数据示例如下 +

+

数据合成好后,还需要生成如下格式的训练所需的标注文件,

+
图像路径 标签
+
+

使用如下脚本即可生成标注文件:

+
import random
+
+abspath = '/home/aistudio/text_renderer/output/default/'
+
+#标注文件生成路径
+fout = open('./render_train.list', 'w', encoding='utf-8')
+
+with open('./output/default/tmp_labels.txt','r') as f:
+    lines = f.readlines()
+    for item in lines:
+        label = item[9:]
+        filename = item[:8] + '.jpg'
+        fout.write(abspath + filename + '\t' + label)
+
+    fout.close()
+
+

经过以上步骤,我们便完成了包装生产日期数据合成。数据位于text_renderer/output,标注文件位于text_renderer/render_train.list。
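在开始训练之前,可以用如下示意脚本检查标注文件中的图片路径是否都存在,避免训练时因路径错误报错:

# 检查合成数据标注文件中的图片是否存在(示意脚本)
import os

label_file = "/home/aistudio/text_renderer/render_train.list"
missing = 0
with open(label_file, "r", encoding="utf-8") as f:
    for line in f:
        img_path = line.rstrip("\n").split("\t")[0]
        if not os.path.exists(img_path):
            missing += 1
print("缺失图片数量:", missing)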

+

本项目提供了生成好的数据供大家体验,完成步骤3的数据准备后,可得数据路径位于:

+
PaddleOCR
+├── data
+   ├── render_images     # 合成数据示例
+   ├── render_train.list   #合成数据文件列表
+
+

5.2 模型训练

+

准备好合成数据后,我们可以使用以下命令,利用合成数据进行finetune:

+
cd ${PaddleOCR_root}
+python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
+                       -o Global.pretrained_model=./ckpt/ch_PP-OCRv3_rec_train/best_accuracy \
+                       Global.epoch_num=20 \
+                       Global.eval_batch_step='[0, 20]' \
+                       Train.dataset.data_dir=./data \
+                       Train.dataset.label_file_list=['./data/render_train.list'] \
+                       Train.loader.batch_size_per_card=64 \
+                       Eval.dataset.data_dir=./data \
+                       Eval.dataset.label_file_list=["./data/val.list"] \
+                       Eval.loader.batch_size_per_card=64
+
+

其中各参数含义如下:

+
-c: 指定使用的配置文件,ch_PP-OCRv3_rec_distillation.yml对应于OCRv3识别模型。
+-o: 覆盖配置文件中参数
+Global.pretrained_model: 指定finetune使用的预训练模型
+Global.epoch_num: 指定训练的epoch数
+Global.eval_batch_step: 间隔多少step做一次评估
+Train.dataset.data_dir: 训练数据集路径
+Train.dataset.label_file_list: 训练集文件列表
+Train.loader.batch_size_per_card: 训练单卡batch size
+Eval.dataset.data_dir: 评估数据集路径
+Eval.dataset.label_file_list: 评估数据集文件列表
+Eval.loader.batch_size_per_card: 评估单卡batch size
+
+

6. 基于真实数据finetune

+

使用合成数据finetune能提升我们模型的识别精度,但由于合成数据和真实数据之间的分布可能有一定差异,因此作用有限。为进一步提高识别精度,本节介绍如何挖掘真实数据进行模型finetune。

+

数据挖掘的整体思路如下:

+
    +
  1. 使用python爬虫从网上获取大量无标签数据
  2. 使用模型从大量无标签数据中构建出有效训练集
+

6.1 python爬虫获取数据

+

推荐使用爬虫工具获取无标签图片。图片获取后,可按如下目录格式组织:

+
sprider
+├── file.list
+├── data
+│   ├── 00000.jpg
+│   ├── 00001.jpg
+...
+
+
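其中 file.list 用于记录爬取到的图片路径(这里仅是一种组织方式的假设),可以用如下示意脚本生成:

# 遍历爬取到的图片目录,生成 file.list(示意脚本,路径为假设值)
import os

data_dir = "/home/aistudio/sprider/data"
with open("/home/aistudio/sprider/file.list", "w", encoding="utf-8") as f:
    for name in sorted(os.listdir(data_dir)):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            f.write(os.path.join(data_dir, name) + "\n")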

6.2 数据挖掘

+

我们使用PaddleOCR对获取到的图片进行挖掘,具体步骤如下:

+
    +
  1. 使用 PP-OCRv3检测模型+svtr-tiny识别模型,对每张图片进行预测。
  2. 使用数据挖掘策略,得到有效图片。
  3. 将有效图片对应的图像区域和标签提取出来,构建训练集。
+

首先下载预训练模型,PP-OCRv3检测模型下载链接:https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar

+

完成下载后,可将模型存储于如下位置:

+
PaddleOCR
+├── data
+   ├── rec_vit_sub_64_363_all/  # svtr_tiny高精度识别模型
+
+
# 下载解压PP-OCRv3检测模型
+cd ${PaddleOCR_root}
+wget -nc -P ckpt https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar
+pushd ckpt
+tar -xvf ch_PP-OCRv3_det_infer.tar
+popd
+
+

在使用PPOCRv3检测模型+svtr-tiny识别模型进行预测之前,有如下两处细节需要手动修改:

+
    +
  1. +

    tools/infer/predict_rec.py中第110行imgW修改为320

    +
    #imgW = int((imgH * max_wh_ratio))
    +imgW = 320
    +
    +
  2. +
  3. +

    tools/infer/predict_system.py第169行添加如下一行,将预测分数也写入结果文件中。

    +
    "scores": rec_res[idx][1],
    +
    +
  4. +
+

模型预测命令:

+
python tools/infer/predict_system.py \
+        --image_dir="/home/aistudio/sprider/data" \
+        --det_model_dir="./ckpt/ch_PP-OCRv3_det_infer/"  \
+        --rec_model_dir="/home/aistudio/PaddleOCR/data/rec_vit_sub_64_363_all/" \
+        --rec_image_shape="3,32,320"
+
+

获得预测结果后,我们使用数据挖掘策略得到有效图片。具体挖掘策略如下:

+
    +
  1. 预测置信度高于95%
  2. 识别结果包含字符‘20’,即年份
  3. 没有中文,或者有中文并且‘日’和‘月’同时在识别结果中
+
# 获取有效预测
+
+import json
+import re
+
+zh_pattern = re.compile(u'[\u4e00-\u9fa5]+')  #正则表达式,筛选字符是否包含中文
+
+file_path = '/home/aistudio/PaddleOCR/inference_results/system_results.txt'
+out_path = '/home/aistudio/PaddleOCR/selected_results.txt'
+f_out = open(out_path, 'w')
+
+with open(file_path, "r", encoding='utf-8') as fin:
+    lines = fin.readlines()
+
+
+for line in lines:
+    flag = False
+    # 读取文件内容
+    file_name, json_file = line.strip().split('\t')
+    preds = json.loads(json_file)
+    res = []
+    for item in preds:
+        transcription = item['transcription'] #获取识别结果
+        scores = item['scores']               #获取识别得分
+        # 挖掘策略
+        if scores > 0.95:
+            if '20' in transcription and len(transcription) > 4 and len(transcription) < 12:
+                word = transcription
+                if not(zh_pattern.search(word) and ('日' not in word or '月' not in word)):
+                    flag = True
+                    res.append(item)
+    save_pred = file_name + "\t" + json.dumps(
+        res, ensure_ascii=False) + "\n"
+    if flag ==True:
+        f_out.write(save_pred)
+
+f_out.close()
+
+

然后将有效预测对应的图像区域和标签提取出来,构建训练集。具体实现脚本如下:

+
import cv2
+import json
+import os
+import numpy as np
+
+PATH = '/home/aistudio/PaddleOCR/inference_results/'  #数据原始路径
+SAVE_PATH = '/home/aistudio/mining_images/'             #裁剪后数据保存路径
+file_list = '/home/aistudio/PaddleOCR/selected_results.txt' #数据预测结果
+label_file = '/home/aistudio/mining_images/mining_train.list'  #输出真实数据训练集标签list
+
+if not os.path.exists(SAVE_PATH):
+    os.mkdir(SAVE_PATH)
+
+f_label = open(label_file, 'w')
+
+
+def get_rotate_crop_image(img, points):
+    """
+    根据检测结果points,从输入图像img中裁剪出相应的区域
+    """
+    assert len(points) == 4, "shape of points must be 4*2"
+    img_crop_width = int(
+        max(
+            np.linalg.norm(points[0] - points[1]),
+            np.linalg.norm(points[2] - points[3])))
+    img_crop_height = int(
+        max(
+            np.linalg.norm(points[0] - points[3]),
+            np.linalg.norm(points[1] - points[2])))
+    pts_std = np.float32([[0, 0], [img_crop_width, 0],
+                          [img_crop_width, img_crop_height],
+                          [0, img_crop_height]])
+    M = cv2.getPerspectiveTransform(points, pts_std)
+    # 形变或倾斜,会做透视变换,reshape成矩形
+    dst_img = cv2.warpPerspective(
+        img,
+        M, (img_crop_width, img_crop_height),
+        borderMode=cv2.BORDER_REPLICATE,
+        flags=cv2.INTER_CUBIC)
+    dst_img_height, dst_img_width = dst_img.shape[0:2]
+    if dst_img_height * 1.0 / dst_img_width >= 1.5:
+        dst_img = np.rot90(dst_img)
+    return dst_img
+
+def crop_and_get_filelist(file_list):
+    with open(file_list, "r", encoding='utf-8') as fin:
+        lines = fin.readlines()
+
+    img_num = 0
+    for line in lines:
+        img_name, json_file = line.strip().split('\t')
+        preds = json.loads(json_file)
+        for item in preds:
+            transcription = item['transcription']
+            points = item['points']
+            points = np.array(points).astype('float32')
+            #print('processing {}...'.format(img_name))
+
+            img = cv2.imread(PATH+img_name)
+            dst_img = get_rotate_crop_image(img, points)
+            h, w, c = dst_img.shape
+            newWidth = int((32. / h) * w)
+            newImg = cv2.resize(dst_img, (newWidth, 32))
+            new_img_name = '{:05d}.jpg'.format(img_num)
+            cv2.imwrite(SAVE_PATH+new_img_name, dst_img)
+            f_label.write(SAVE_PATH+new_img_name+'\t'+transcription+'\n')
+            img_num += 1
+
+
+crop_and_get_filelist(file_list)
+f_label.close()
+
+

6.3 模型训练

+

通过数据挖掘,我们得到了真实场景数据和对应的标签。接下来使用真实数据finetune,观察精度提升效果。

+

利用真实数据进行finetune:

+
cd ${PaddleOCR_root}
+python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
+                       -o Global.pretrained_model=./ckpt/ch_PP-OCRv3_rec_train/best_accuracy \
+                       Global.epoch_num=20 \
+                       Global.eval_batch_step='[0, 20]' \
+                       Train.dataset.data_dir=./data \
+                       Train.dataset.label_file_list=['./data/mining_train.list'] \
+                       Train.loader.batch_size_per_card=64 \
+                       Eval.dataset.data_dir=./data \
+                       Eval.dataset.label_file_list=["./data/val.list"] \
+                       Eval.loader.batch_size_per_card=64
+
+

各参数含义参考第5部分合成数据finetune,只需要对训练数据路径做相应的修改:

+
Train.dataset.data_dir: 训练数据集路径
+Train.dataset.label_file_list: 训练集文件列表
+
+

示例使用我们提供的真实数据进行finetune,如想换成自己的数据,只需要相应的修改Train.dataset.data_dirTrain.dataset.label_file_list参数即可。

+

由于数据量不大,这里仅训练20个epoch即可。训练完成后,可以得到真实数据finetune后的精度为best acc=71.33%。

+

由于数据量比较少,精度会比合成数据finetune的略低。

+

7. 基于合成+真实数据finetune

+

为进一步提升模型精度,我们结合使用合成数据和挖掘到的真实数据进行finetune。

+

利用合成+真实数据进行finetune,各参数含义参考第5部分合成数据finetune,只需要对训练数据路径做相应的修改:

+
Train.dataset.data_dir: 训练数据集路径
+Train.dataset.label_file_list: 训练集文件列表
+
+

生成训练list文件:

+
# 生成训练集文件list
+cat /home/aistudio/PaddleOCR/data/render_train.list /home/aistudio/PaddleOCR/data/mining_train.list > /home/aistudio/PaddleOCR/data/render_mining_train.list
+
+

启动训练:

+
cd ${PaddleOCR_root}
+python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml \
+                       -o Global.pretrained_model=./ckpt/ch_PP-OCRv3_rec_train/best_accuracy \
+                       Global.epoch_num=40 \
+                       Global.eval_batch_step='[0, 20]' \
+                       Train.dataset.data_dir=./data \
+                       Train.dataset.label_file_list=['./data/render_mining_train.list'] \
+                       Train.loader.batch_size_per_card=64 \
+                       Eval.dataset.data_dir=./data \
+                       Eval.dataset.label_file_list=["./data/val.list"] \
+                       Eval.loader.batch_size_per_card=64
+
+

示例使用我们提供的真实+合成数据进行finetune,如想换成自己的数据,只需要相应的修改Train.dataset.data_dir和Train.dataset.label_file_list参数即可。

+

由于数据量不大,这里仅训练40个epoch即可。训练完成后,可以得到合成+真实数据finetune后的精度为best acc=86.99%。

+

可以看到,相较于原始PP-OCRv3的识别精度62.99%,使用合成数据+真实数据finetune后,识别精度提升了24个百分点,达到86.99%。

+

模型的推理部署方法可以参考repo文档: docs

\ No newline at end of file
diff --git "a/applications/印章弯曲文字识别.html" "b/applications/印章弯曲文字识别.html"
new file mode 100644
index 0000000000..c79ef6d4c6
--- /dev/null
+++ "b/applications/印章弯曲文字识别.html"
@@ -0,0 +1,6965 @@
(页面标题:印章检测与识别 - PaddleOCR 文档)

印章检测与识别

+ +

1. 项目介绍

+

弯曲文字识别在OCR任务中有着广泛的应用,比如:自然场景下的招牌,艺术文字,以及常见的印章文字识别。

+

在本项目中,将以印章识别任务为例,介绍如何使用PaddleDetection和PaddleOCR完成印章检测和印章文字识别任务。

+

项目难点:

+
    +
  1. 缺乏训练数据
  2. 图像质量参差不齐,图像模糊,文字不清晰
+

针对以上问题,本项目选用PaddleOCR里的PPOCRLabel工具完成数据标注。基于PaddleDetection完成印章区域检测,然后通过PaddleOCR里的端对端OCR算法和两阶段OCR算法分别完成印章文字识别任务。不同任务的精度效果如下:

| 任务 | 训练数据数量 | 精度 |
| -- | -- | -- |
| 印章检测 | 1000 | 95.00% |
| 印章文字识别-端对端OCR方法 | 700 | 47.00% |
| 印章文字识别-两阶段OCR方法 | 700 | 55.00% |
+

点击进入 AI Studio 项目

+

2. 环境搭建

+

本项目需要准备PaddleDetection和PaddleOCR的项目运行环境,其中PaddleDetection用于实现印章检测任务,PaddleOCR用于实现文字识别任务

+

2.1 准备PaddleDetection环境

+

下载PaddleDetection代码:

+
!git clone https://github.com/PaddlePaddle/PaddleDetection.git
+# 如果克隆github代码较慢,请从gitee上克隆代码
+#git clone https://gitee.com/PaddlePaddle/PaddleDetection.git
+
+

安装PaddleDetection依赖

+
!cd PaddleDetection && pip install -r requirements.txt
+
+

2.2 准备PaddleOCR环境

+

下载PaddleOCR代码:

+
!git clone https://github.com/PaddlePaddle/PaddleOCR.git
+# 如果克隆github代码较慢,请从gitee上克隆代码
+#git clone https://gitee.com/PaddlePaddle/PaddleOCR.git
+
+

安装PaddleOCR依赖

+
!cd PaddleOCR && git checkout dygraph  && pip install -r requirements.txt
+
+

3. 数据集准备

+

3.1 数据标注

+

本项目中使用PPOCRLabel工具标注印章检测数据,标注内容包括印章的位置以及印章中文字的位置和文字内容。

+

注:PPOCRLabel的使用方法参考文档

+

PPOCRlabel标注印章数据步骤:

+
    +
  • 打开数据集所在文件夹
  • 按下快捷键Q进行4点(多点)标注——针对印章文本识别
  • 印章弯曲文字包围框采用偶数点标注(比如4点、8点、16点),按照阅读顺序,以16点标注为例,从文字左上方开始,到文字右上方标注8个点,再从文字右下方到文字左下方标注8个点,一共16个点,形成包围曲线,参考下图。如果文字弯曲程度不高,为了减小标注工作量,可以采用4点、8点标注,需要注意的是,文字上下点数相同。(总点数尽量不要超过18个)
  • 对于需要识别的印章中非弯曲文字,采用4点框标注即可
  • 对应包围框的文字部分默认是“待识别”,需要修改为包围框内的具体文字内容
  • 快捷键W进行矩形标注——针对印章区域检测,印章检测区域保证标注框包围整个印章,包围框对应文字可以设置为'印章区域',方便后续处理。
  • 针对印章中的水平文字可以视情况考虑矩形或四点标注:保证按行标注即可。如果背景文字与印章文字比较接近,标注时尽量避开背景文字。
  • 标注完成后修改右侧文本结果,确认无误后点击下方check(或CTRL+V),确认本张图片的标注。
  • 所有图片标注完成后,在顶部菜单栏点击File -> Export Label导出label.txt。
+

标注完成后,可视化效果如下: +

+

数据标注完成后,标签中包含印章检测的标注和印章文字识别的标注,如下所示:

+
img/1.png    [{"transcription": "印章区域", "points": [[87, 245], [214, 245], [214, 369], [87, 369]], "difficult": false}, {"transcription": "国家税务总局泸水市税务局第二税务分局", "points": [[110, 314], [116, 290], [131, 275], [152, 273], [170, 277], [181, 289], [186, 303], [186, 312], [201, 311], [198, 289], [189, 272], [175, 259], [152, 252], [124, 257], [100, 280], [94, 312]], "difficult": false}, {"transcription": "征税专用章", "points": [[117, 334], [183, 334], [183, 352], [117, 352]], "difficult": false}]
+
+

标注中包含表示'印章区域'的坐标和'印章文字'坐标以及文字内容。

+

3.2 数据处理

+

标注时为了方便标注,没有区分印章区域的标注框和文字区域的标注框,可以通过python代码完成标签的划分。

+

在本项目的'/home/aistudio/work/seal_labeled_datas'目录下,存放了标注的数据示例,如下:

+

+

标签文件'/home/aistudio/work/seal_labeled_datas/Label.txt'中的标注内容如下:

+
img/test1.png   [{"transcription": "待识别", "points": [[408, 232], [537, 232], [537, 352], [408, 352]], "difficult": false}, {"transcription": "电子回单", "points": [[437, 305], [504, 305], [504, 322], [437, 322]], "difficult": false}, {"transcription": "云南省农村信用社", "points": [[417, 290], [434, 295], [438, 281], [446, 267], [455, 261], [472, 258], [489, 264], [498, 277], [502, 295], [526, 289], [518, 267], [503, 249], [475, 232], [446, 239], [429, 255], [418, 275]], "difficult": false}, {"transcription": "专用章", "points": [[437, 319], [503, 319], [503, 338], [437, 338]], "difficult": false}]
+
+

为了方便训练,我们需要通过python代码将用于训练印章检测和训练印章文字识别的标注区分开。

+
+ +
import numpy as np
+import json
+import cv2
+import os
+from shapely.geometry import Polygon
+
+
+def poly2box(poly):
+    xmin = np.min(np.array(poly)[:, 0])
+    ymin = np.min(np.array(poly)[:, 1])
+    xmax = np.max(np.array(poly)[:, 0])
+    ymax = np.max(np.array(poly)[:, 1])
+    return np.array([[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax]])
+
+
+def draw_text_det_res(dt_boxes, src_im, color=(255, 255, 0)):
+    for box in dt_boxes:
+        box = np.array(box).astype(np.int32).reshape(-1, 2)
+        cv2.polylines(src_im, [box], True, color=color, thickness=2)
+    return src_im
+
+class LabelDecode(object):
+    def __init__(self, **kwargs):
+        pass
+
+    def __call__(self, data):
+        label = json.loads(data['label'])
+
+        nBox = len(label)
+        seal_boxes = self.get_seal_boxes(label)
+
+        gt_label = []
+
+        for seal_box in seal_boxes:
+            seal_anno = {'seal_box': seal_box}
+            boxes, txts, txt_tags = [], [], []
+
+            for bno in range(0, nBox):
+                box = label[bno]['points']
+                txt = label[bno]['transcription']
+                try:
+                    ints = self.get_intersection(box, seal_box)
+                except Exception as E:
+                    print(E)
+                    continue
+
+                if abs(Polygon(box).area - self.get_intersection(box, seal_box)) < 1e-3 and \
+                    abs(Polygon(box).area - self.get_union(box, seal_box)) > 1e-3:
+
+                    boxes.append(box)
+                    txts.append(txt)
+                    if txt in ['*', '###', '待识别']:
+                        txt_tags.append(True)
+                    else:
+                        txt_tags.append(False)
+
+            seal_anno['polys'] = boxes
+            seal_anno['texts'] = txts
+            seal_anno['ignore_tags'] = txt_tags
+
+            gt_label.append(seal_anno)
+
+        return gt_label
+
+    def get_seal_boxes(self, label):
+
+        nBox = len(label)
+        seal_box = []
+        for bno in range(0, nBox):
+            box = label[bno]['points']
+            if len(box) == 4:
+                seal_box.append(box)
+
+        if len(seal_box) == 0:
+            return None
+
+        seal_box = self.valid_seal_box(seal_box)
+        return seal_box
+
+
+    def is_seal_box(self, box, boxes):
+        is_seal = True
+        for poly in boxes:
+            if list(np.array(box).shape) != list(np.array(poly).shape):
+                if abs(Polygon(box).area - self.get_intersection(box, poly)) < 1e-3:
+                    return False
+            else:
+                if np.sum(np.array(box) - np.array(poly)) < 1e-3:
+                    # continue when the box is same with poly
+                    continue
+                if abs(Polygon(box).area - self.get_intersection(box, poly)) < 1e-3:
+                    return False
+        return is_seal
+
+
+    def valid_seal_box(self, boxes):
+        if len(boxes) == 1:
+            return boxes
+
+        new_boxes = []
+        flag = True
+        for k in range(0, len(boxes)):
+            flag = True
+            tmp_box = boxes[k]
+            for i in range(0, len(boxes)):
+                if k == i: continue
+                if abs(Polygon(tmp_box).area - self.get_intersection(tmp_box, boxes[i])) < 1e-3:
+                    flag = False
+                    continue
+            if flag:
+                new_boxes.append(tmp_box)
+
+        return new_boxes
+
+
+    def get_union(self, pD, pG):
+        return Polygon(pD).union(Polygon(pG)).area
+
+    def get_intersection_over_union(self, pD, pG):
+        return self.get_intersection(pD, pG) / self.get_union(pD, pG)
+
+    def get_intersection(self, pD, pG):
+        return Polygon(pD).intersection(Polygon(pG)).area
+
+    def expand_points_num(self, boxes):
+        max_points_num = 0
+        for box in boxes:
+            if len(box) > max_points_num:
+                max_points_num = len(box)
+        ex_boxes = []
+        for box in boxes:
+            ex_box = box + [box[-1]] * (max_points_num - len(box))
+            ex_boxes.append(ex_box)
+        return ex_boxes
+
+
+def gen_extract_label(data_dir, label_file, seal_gt, seal_ppocr_gt):
+    label_decode_func = LabelDecode()
+    gts = open(label_file, "r").readlines()
+
+    seal_gt_list = []
+    seal_ppocr_list = []
+
+    for idx, line in enumerate(gts):
+        img_path, label = line.strip().split("\t")
+        data = {'label': label, 'img_path':img_path}
+        res = label_decode_func(data)
+        src_img = cv2.imread(os.path.join(data_dir, img_path))
+        if res is None:
+            print("ERROR! res is None!")
+            continue
+
+        anno = []
+        for i, gt in enumerate(res):
+            # print(i, box, type(box), )
+            anno.append({'polys': gt['seal_box'], 'cls':1})
+
+        seal_gt_list.append(f"{img_path}\t{json.dumps(anno)}\n")
+        seal_ppocr_list.append(f"{img_path}\t{json.dumps(res)}\n")
+
+    if not os.path.exists(os.path.dirname(seal_gt)):
+        os.makedirs(os.path.dirname(seal_gt))
+    if not os.path.exists(os.path.dirname(seal_ppocr_gt)):
+        os.makedirs(os.path.dirname(seal_ppocr_gt))
+
+    with open(seal_gt, "w") as f:
+        f.writelines(seal_gt_list)
+        f.close()
+
+    with open(seal_ppocr_gt, 'w') as f:
+        f.writelines(seal_ppocr_list)
+        f.close()
+
+def vis_seal_ppocr(data_dir, label_file, save_dir):
+
+    datas = open(label_file, 'r').readlines()
+    for idx, line in enumerate(datas):
+        img_path, label = line.strip().split('\t')
+        img_path = os.path.join(data_dir, img_path)
+
+        label = json.loads(label)
+        src_im = cv2.imread(img_path)
+        if src_im is None:
+            continue
+
+        for anno in label:
+            seal_box = anno['seal_box']
+            txt_boxes = anno['polys']
+
+             # vis seal box
+            src_im = draw_text_det_res([seal_box], src_im, color=(255, 255, 0))
+            src_im = draw_text_det_res(txt_boxes, src_im, color=(255, 0, 0))
+
+        save_path = os.path.join(save_dir, os.path.basename(img_path))
+        if not os.path.exists(save_dir):
+            os.makedirs(save_dir)
+        # print(src_im.shape)
+        cv2.imwrite(save_path, src_im)
+
+
+def draw_html(img_dir, save_name):
+    import glob
+
+    images_dir = glob.glob(img_dir + "/*")
+    print(len(images_dir))
+
+    html_path = save_name
+    with open(html_path, 'w') as html:
+        html.write('<html>\n<body>\n')
+        html.write('<table border="1">\n')
+        html.write("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />")
+
+        html.write("<tr>\n")
+        html.write(f'<td> \n GT')
+
+        for i, filename in enumerate(sorted(images_dir)):
+            if filename.endswith("txt"): continue
+            print(filename)
+
+            base = "{}".format(filename)
+            if True:
+                html.write("<tr>\n")
+                html.write(f'<td> {filename}\n GT')
+                html.write('<td>GT 310\n<img src="%s" width=640></td>' % (base))
+                html.write("</tr>\n")
+
+        html.write('<style>\n')
+        html.write('span {\n')
+        html.write('    color: red;\n')
+        html.write('}\n')
+        html.write('</style>\n')
+        html.write('</table>\n')
+        html.write('</html>\n</body>\n')
+    print("ok")
+
+
+def crop_seal_from_img(label_file, data_dir, save_dir, save_gt_path):
+
+    if not os.path.exists(save_dir):
+        os.makedirs(save_dir)
+
+    datas = open(label_file, 'r').readlines()
+    all_gts = []
+    count = 0
+    for idx, line in enumerate(datas):
+        img_path, label = line.strip().split('\t')
+        img_path = os.path.join(data_dir, img_path)
+
+        label = json.loads(label)
+        src_im = cv2.imread(img_path)
+        if src_im is None:
+            continue
+
+        for c, anno in enumerate(label):
+            seal_poly = anno['seal_box']
+            txt_boxes = anno['polys']
+            txts = anno['texts']
+            ignore_tags = anno['ignore_tags']
+
+            box = poly2box(seal_poly)
+            img_crop = src_im[box[0][1]:box[2][1], box[0][0]:box[2][0], :]
+
+            save_path = os.path.join(save_dir, f"{idx}_{c}.jpg")
+            cv2.imwrite(save_path, np.array(img_crop))
+
+            img_gt = []
+            for i in range(len(txts)):
+                txt_boxes_crop = np.array(txt_boxes[i])
+                txt_boxes_crop[:, 1] -= box[0, 1]
+                txt_boxes_crop[:, 0] -= box[0, 0]
+                img_gt.append({'transcription': txts[i], "points": txt_boxes_crop.tolist(), "ignore_tag": ignore_tags[i]})
+
+            if len(img_gt) >= 1:
+                count += 1
+            save_gt = f"{os.path.basename(save_path)}\t{json.dumps(img_gt)}\n"
+
+            all_gts.append(save_gt)
+
+    print(f"The num of all image: {len(all_gts)}, and the number of useful image: {count}")
+    if not os.path.exists(os.path.dirname(save_gt_path)):
+        os.makedirs(os.path.dirname(save_gt_path))
+
+    with open(save_gt_path, "w") as f:
+        f.writelines(all_gts)
+        f.close()
+    print("Done")
+
+
+if __name__ == "__main__":
+    # 数据处理
+    gen_extract_label("./seal_labeled_datas", "./seal_labeled_datas/Label.txt", "./seal_ppocr_gt/seal_det_img.txt", "./seal_ppocr_gt/seal_ppocr_img.txt")
+    vis_seal_ppocr("./seal_labeled_datas", "./seal_ppocr_gt/seal_ppocr_img.txt", "./seal_ppocr_gt/seal_ppocr_vis/")
+    draw_html("./seal_ppocr_gt/seal_ppocr_vis/", "./vis_seal_ppocr.html")
+    seal_ppocr_img_label = "./seal_ppocr_gt/seal_ppocr_img.txt"
+    crop_seal_from_img(seal_ppocr_img_label, "./seal_labeled_datas/", "./seal_img_crop", "./seal_img_crop/label.txt")
+
+ +
+ +

处理完成后,生成的文件如下:

+
├── seal_img_crop/
+│   ├── 0_0.jpg
+│   ├── ...
+│   └── label.txt
+├── seal_ppocr_gt/
+│   ├── seal_det_img.txt
+│   ├── seal_ppocr_img.txt
+│   └── seal_ppocr_vis/
+│       ├── test1.png
+│       ├── ...
+└── vis_seal_ppocr.html
+
+

其中seal_img_crop/label.txt文件为印章识别标签文件,其内容格式为:

+
0_0.jpg    [{"transcription": "\u7535\u5b50\u56de\u5355", "points": [[29, 73], [96, 73], [96, 90], [29, 90]], "ignore_tag": false}, {"transcription": "\u4e91\u5357\u7701\u519c\u6751\u4fe1\u7528\u793e", "points": [[9, 58], [26, 63], [30, 49], [38, 35], [47, 29], [64, 26], [81, 32], [90, 45], [94, 63], [118, 57], [110, 35], [95, 17], [67, 0], [38, 7], [21, 23], [10, 43]], "ignore_tag": false}, {"transcription": "\u4e13\u7528\u7ae0", "points": [[29, 87], [95, 87], [95, 106], [29, 106]], "ignore_tag": false}]
+
+

可以直接用于PaddleOCR的PGNet算法的训练。

+

seal_ppocr_gt/seal_det_img.txt为印章检测标签文件,其内容格式为:

+
img/test1.png    [{"polys": [[408, 232], [537, 232], [537, 352], [408, 352]], "cls": 1}]
+
+

为了使用PaddleDetection工具完成印章检测模型的训练,需要将seal_det_img.txt转换为COCO或者VOC的数据标注格式。

+

可以直接使用下述代码将印章检测标注转换成VOC格式。

+
+ +
import numpy as np
+import json
+import cv2
+import os
+from shapely.geometry import Polygon
+
+seal_train_gt = "./seal_ppocr_gt/seal_det_img.txt"
+# 注:仅用于示例,实际使用中需要分别转换训练集和测试集的标签
+seal_valid_gt = "./seal_ppocr_gt/seal_det_img.txt"
+
+def gen_main_train_txt(mode='train'):
+    if mode == "train":
+        file_path = seal_train_gt
+    if mode in ['valid', 'test']:
+        file_path = seal_valid_gt
+
+    save_path = f"./seal_VOC/ImageSets/Main/{mode}.txt"
+    save_train_path = f"./seal_VOC/{mode}.txt"
+    if not os.path.exists(os.path.dirname(save_path)):
+        os.makedirs(os.path.dirname(save_path))
+
+    datas = open(file_path, 'r').readlines()
+    img_names = []
+    train_names = []
+    for line in datas:
+        img_name = line.strip().split('\t')[0]
+        img_name = os.path.basename(img_name)
+        (i_name, extension) = os.path.splitext(img_name)
+        t_name = 'JPEGImages/'+str(img_name)+' '+'Annotations/'+str(i_name)+'.xml\n'
+        train_names.append(t_name)
+        img_names.append(i_name + "\n")
+
+    with open(save_train_path, "w") as f:
+        f.writelines(train_names)
+        f.close()
+
+    with open(save_path, "w") as f:
+        f.writelines(img_names)
+        f.close()
+
+    print(f"{mode} save done")
+
+
+def gen_xml_label(mode='train'):
+    if mode == "train":
+        file_path = seal_train_gt
+    if mode in ['valid', 'test']:
+        file_path = seal_valid_gt
+
+    datas = open(file_path, 'r').readlines()
+    img_names = []
+    train_names = []
+    anno_path = "./seal_VOC/Annotations"
+    img_path = "./seal_VOC/JPEGImages"
+
+    if not os.path.exists(anno_path):
+        os.makedirs(anno_path)
+    if not os.path.exists(img_path):
+        os.makedirs(img_path)
+
+    for idx, line in enumerate(datas):
+        img_name, label = line.strip().split('\t')
+        img = cv2.imread(os.path.join("./seal_labeled_datas", img_name))
+        cv2.imwrite(os.path.join(img_path, os.path.basename(img_name)), img)
+        height, width, c = img.shape
+        img_name = os.path.basename(img_name)
+        (i_name, extension) = os.path.splitext(img_name)
+        label = json.loads(label)
+
+        xml_file = open(("./seal_VOC/Annotations" + '/' + i_name + '.xml'), 'w')
+        xml_file.write('<annotation>\n')
+        xml_file.write('    <folder>seal_VOC</folder>\n')
+        xml_file.write('    <filename>' + str(img_name) + '</filename>\n')
+        xml_file.write('    <path>' + 'Annotations/' + str(img_name) + '</path>\n')
+        xml_file.write('    <size>\n')
+        xml_file.write('        <width>' + str(width) + '</width>\n')
+        xml_file.write('        <height>' + str(height) + '</height>\n')
+        xml_file.write('        <depth>3</depth>\n')
+        xml_file.write('    </size>\n')
+        xml_file.write('    <segmented>0</segmented>\n')
+
+        for anno in label:
+            poly = anno['polys']
+            if anno['cls'] == 1:
+                gt_cls = 'redseal'
+            xmin = np.min(np.array(poly)[:, 0])
+            ymin = np.min(np.array(poly)[:, 1])
+            xmax = np.max(np.array(poly)[:, 0])
+            ymax = np.max(np.array(poly)[:, 1])
+            xmin,ymin,xmax,ymax= int(xmin),int(ymin),int(xmax),int(ymax)
+            xml_file.write('    <object>\n')
+            xml_file.write('        <name>'+str(gt_cls)+'</name>\n')
+            xml_file.write('        <pose>Unspecified</pose>\n')
+            xml_file.write('        <truncated>0</truncated>\n')
+            xml_file.write('        <difficult>0</difficult>\n')
+            xml_file.write('        <bndbox>\n')
+            xml_file.write('            <xmin>'+str(xmin)+'</xmin>\n')
+            xml_file.write('            <ymin>'+str(ymin)+'</ymin>\n')
+            xml_file.write('            <xmax>'+str(xmax)+'</xmax>\n')
+            xml_file.write('            <ymax>'+str(ymax)+'</ymax>\n')
+            xml_file.write('        </bndbox>\n')
+            xml_file.write('    </object>\n')
+        xml_file.write('</annotation>')
+        xml_file.close()
+    print(f'{mode} xml save done!')
+
+
+gen_main_train_txt()
+gen_main_train_txt('valid')
+gen_xml_label('train')
+gen_xml_label('valid')
+
+ +
+ +

数据处理完成后,转换为VOC格式的印章检测数据存储在~/data/seal_VOC目录下,目录组织结构为:

+
├── Annotations/
+├── ImageSets/
+│   └── Main/
+│       ├── train.txt
+│       └── valid.txt
+├── JPEGImages/
+├── train.txt
+└── valid.txt
+└── label_list.txt
+
+

Annotations下为数据的标签,JPEGImages目录下为图像文件,label_list.txt为标注检测框类别标签文件。
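上面的转换脚本没有生成 label_list.txt,可以参考如下示意脚本手动创建,类别名与转换脚本中写入的 redseal 保持一致:

# 生成VOC数据所需的类别文件 label_list.txt(示意脚本)
with open("./seal_VOC/label_list.txt", "w", encoding="utf-8") as f:
    f.write("redseal\n")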

+

在接下来一节中,将介绍如何使用PaddleDetection工具库完成印章检测模型的训练。

+

4. 印章检测实践

+

在实际应用中,印章多是出现在合同,发票,公告等场景中,印章文字识别的任务需要排除图像中背景文字的影响,因此需要先检测出图像中的印章区域。

+

借助PaddleDetection目标检测库可以很容易的实现印章检测任务,使用PaddleDetection训练印章检测任务流程如下:

+
    +
  • 选择算法
  • 修改数据集配置路径
  • 启动训练
+

算法选择

+

PaddleDetection中有许多检测算法可以选择,考虑到每条数据中印章区域较为清晰,且考虑到性能需求。在本项目中,我们采用mobilenetv3为backbone的ppyolo算法完成印章检测任务,对应的配置文件是:configs/ppyolo/ppyolo_mbv3_large.yml

+

修改配置文件

+

配置文件中的默认数据路径是COCO,需要修改为印章检测的数据路径,主要修改如下:在配置文件'configs/ppyolo/ppyolo_mbv3_large.yml'末尾增加如下内容:

+
metric: VOC
+map_type: 11point
+num_classes: 2
+
+TrainDataset:
+  !VOCDataSet
+    dataset_dir: dataset/seal_VOC
+    anno_path: train.txt
+    label_list: label_list.txt
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
+
+EvalDataset:
+  !VOCDataSet
+    dataset_dir: dataset/seal_VOC
+    anno_path: test.txt
+    label_list: label_list.txt
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'difficult']
+
+TestDataset:
+  !ImageFolder
+    anno_path: dataset/seal_VOC/label_list.txt
+
+

配置文件中设置的数据路径在PaddleDetection/dataset目录下,我们可以将处理后的印章检测训练数据移动到PaddleDetection/dataset目录下或者创建一个软连接。

+
!ln -s seal_VOC ./PaddleDetection/dataset/
+
+

另外图象中印章数量比较少,可以调整NMS后处理的检测框数量,即keep_top_k,nms_top_k 从100,1000,调整为10,100。在配置文件'configs/ppyolo/ppyolo_mbv3_large.yml'末尾增加如下内容完成后处理参数的调整

+
BBoxPostProcess:
+  decode:
+    name: YOLOBox
+    conf_thresh: 0.005
+    downsample_ratio: 32
+    clip_bbox: true
+    scale_x_y: 1.05
+  nms:
+    name: MultiClassNMS
+    keep_top_k: 10  # 修改前100
+    nms_threshold: 0.45
+    nms_top_k: 100  # 修改前1000
+    score_threshold: 0.005
+
+

修改完成后,需要在PaddleDetection中增加印章数据的处理代码,即在PaddleDetection/ppdet/data/source/目录下创建seal.py文件,文件中填充如下代码:

+
+ +
import os
+import numpy as np
+from ppdet.core.workspace import register, serializable
+from .dataset import DetDataset
+import cv2
+import json
+
+from ppdet.utils.logger import setup_logger
+logger = setup_logger(__name__)
+
+
+@register
+@serializable
+class SealDataSet(DetDataset):
+    """
+    Load dataset with COCO format.
+
+    Args:
+        dataset_dir (str): root directory for dataset.
+        image_dir (str): directory for images.
+        anno_path (str): coco annotation file path.
+        data_fields (list): key name of data dictionary, at least have 'image'.
+        sample_num (int): number of samples to load, -1 means all.
+        load_crowd (bool): whether to load crowded ground-truth.
+            False as default
+        allow_empty (bool): whether to load empty entry. False as default
+        empty_ratio (float): the ratio of empty record number to total
+            record's, if empty_ratio is out of [0. ,1.), do not sample the
+            records and use all the empty entries. 1. as default
+    """
+
+    def __init__(self,
+                 dataset_dir=None,
+                 image_dir=None,
+                 anno_path=None,
+                 data_fields=['image'],
+                 sample_num=-1,
+                 load_crowd=False,
+                 allow_empty=False,
+                 empty_ratio=1.):
+        super(SealDataSet, self).__init__(dataset_dir, image_dir, anno_path,
+                                          data_fields, sample_num)
+        self.load_image_only = False
+        self.load_semantic = False
+        self.load_crowd = load_crowd
+        self.allow_empty = allow_empty
+        self.empty_ratio = empty_ratio
+
+    def _sample_empty(self, records, num):
+        # if empty_ratio is out of [0. ,1.), do not sample the records
+        if self.empty_ratio < 0. or self.empty_ratio >= 1.:
+            return records
+        import random
+        sample_num = min(
+            int(num * self.empty_ratio / (1 - self.empty_ratio)), len(records))
+        records = random.sample(records, sample_num)
+        return records
+
+    def parse_dataset(self):
+        anno_path = os.path.join(self.dataset_dir, self.anno_path)
+        image_dir = os.path.join(self.dataset_dir, self.image_dir)
+
+        records = []
+        empty_records = []
+        ct = 0
+
+        assert anno_path.endswith('.txt'), \
+            'invalid seal_gt file: ' + anno_path
+
+        all_datas = open(anno_path, 'r').readlines()
+
+        for idx, line in enumerate(all_datas):
+            im_path, label = line.strip().split('\t')
+            img_path = os.path.join(image_dir, im_path)
+            label = json.loads(label)
+            im_h, im_w, im_c = cv2.imread(img_path).shape
+
+            coco_rec = {
+                'im_file': img_path,
+                'im_id': np.array([idx]),
+                'h': im_h,
+                'w': im_w,
+            } if 'image' in self.data_fields else {}
+
+            if not self.load_image_only:
+                bboxes = []
+                for anno in label:
+                    poly = anno['polys']
+                    # poly to box
+                    x1 = np.min(np.array(poly)[:, 0])
+                    y1 = np.min(np.array(poly)[:, 1])
+                    x2 = np.max(np.array(poly)[:, 0])
+                    y2 = np.max(np.array(poly)[:, 1])
+                    # 对每个标注框做有效性过滤
+                    eps = 1e-5
+                    if x2 - x1 > eps and y2 - y1 > eps:
+                        clean_box = [
+                            round(float(x), 3) for x in [x1, y1, x2, y2]
+                        ]
+                        anno = {'clean_box': clean_box, 'gt_cls': int(anno['cls'])}
+                        bboxes.append(anno)
+                    else:
+                        logger.info("invalid box")
+
+            num_bbox = len(bboxes)
+            if num_bbox <= 0:
+                continue
+
+            gt_bbox = np.zeros((num_bbox, 4), dtype=np.float32)
+            gt_class = np.zeros((num_bbox, 1), dtype=np.int32)
+            is_crowd = np.zeros((num_bbox, 1), dtype=np.int32)
+            # gt_poly = [None] * num_bbox
+
+            for i, box in enumerate(bboxes):
+                gt_class[i][0] = box['gt_cls']
+                gt_bbox[i, :] = box['clean_box']
+                is_crowd[i][0] = 0
+
+            gt_rec = {
+                        'is_crowd': is_crowd,
+                        'gt_class': gt_class,
+                        'gt_bbox': gt_bbox,
+                        # 'gt_poly': gt_poly,
+                    }
+
+            for k, v in gt_rec.items():
+                if k in self.data_fields:
+                    coco_rec[k] = v
+
+            records.append(coco_rec)
+            ct += 1
+            if self.sample_num > 0 and ct >= self.sample_num:
+                break
+        self.roidbs = records
+
+ +
+ +

启动训练

+

启动单卡训练的命令为:

+
!python3  tools/train.py  -c configs/ppyolo/ppyolo_mbv3_large.yml  --eval
+
+# 分布式训练命令为:
+!python3 -m paddle.distributed.launch   --gpus 0,1,2,3,4,5,6,7  tools/train.py  -c configs/ppyolo/ppyolo_mbv3_large.yml  --eval
+
+

训练完成后,日志中会打印模型的精度:

+
[07/05 11:42:09] ppdet.engine INFO: Eval iter: 0
+[07/05 11:42:14] ppdet.metrics.metrics INFO: Accumulating evaluatation results...
+[07/05 11:42:14] ppdet.metrics.metrics INFO: mAP(0.50, 11point) = 99.31%
+[07/05 11:42:14] ppdet.engine INFO: Total sample number: 112, averge FPS: 26.45840794253432
+[07/05 11:42:14] ppdet.engine INFO: Best test bbox ap is 0.996.
+
+

我们可以使用训练好的模型观察预测结果:

+
!python3 tools/infer.py -c configs/ppyolo/ppyolo_mbv3_large.yml -o weights=./output/ppyolo_mbv3_large/model_final.pdparams  --infer_img=./test.jpg
+
+

预测结果如下:

+

+

5. 印章文字识别实践

+

在使用ppyolo检测到印章区域后,接下来借助PaddleOCR里的文字识别能力,完成印章中文字的识别。

+

PaddleOCR中的OCR算法包含文字检测算法,文字识别算法以及OCR端对端算法。

+

文字检测算法负责检测到图像中的文字,再由文字识别模型识别出检测到的文字,进而实现OCR的任务。文字检测+文字识别串联完成OCR任务的架构称为两阶段的OCR算法。相对应的端对端的OCR方法可以用一个算法同时完成文字检测和识别的任务。

| 文字检测 | 文字识别 | 端对端算法 |
| -- | -- | -- |
| DB\DB++\EAST\SAST\PSENet | SVTR\CRNN\NRTR\Abinet\SAR... | PGNet |
+

本节中将分别介绍端对端的文字检测识别算法以及两阶段的文字检测识别算法在印章检测识别任务上的实践。

+

5.1 端对端印章文字识别实践

+

本节介绍使用PaddleOCR里的PGNet算法完成印章文字识别。

+

PGNet属于端对端的文字检测识别算法,在PaddleOCR中的配置文件为: +PaddleOCR/configs/e2e/e2e_r50_vd_pg.yml

+

使用PGNet完成文字检测识别任务的步骤为:

+
    +
  • 修改配置文件
  • 启动训练
+

PGNet默认配置文件的数据路径为totaltext数据集路径,本次训练中,需要修改为上一节数据处理后得到的标签文件和数据目录:

+

训练数据配置修改后如下:

+
Train:
+  dataset:
+    name: PGDataSet
+    data_dir: ./train_data/seal_ppocr
+    label_file_list: [./train_data/seal_ppocr/seal_ppocr_img.txt]
+    ratio_list: [1.0]
+
+

测试数据集配置修改后如下:

+
Eval:
+  dataset:
+    name: PGDataSet
+    data_dir: ./train_data/seal_ppocr_test
+    label_file_list: [./train_data/seal_ppocr_test/seal_ppocr_img.txt]
+
+

启动训练的命令为:

+
!python3 tools/train.py -c configs/e2e/e2e_r50_vd_pg.yml
+
+

模型训练完成后,可以得到最终的精度为47.4%。数据量较少,以及数据质量较差会影响模型的训练精度,如果有更多的数据参与训练,精度将进一步提升。

+

如需获取已训练模型,请点击文末的链接,加入官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁

+

5.2 两阶段印章文字识别实践

+

上一节介绍了使用PGNet实现印章识别任务的训练流程。本小节将介绍使用PaddleOCR里的文字检测和文字识别算法分别完成印章文字的检测和识别。

+

5.2.1 印章文字检测

+

PaddleOCR中包含丰富的文字检测算法,包含DB,DB++,EAST,SAST,PSENet等等。其中DB,DB++,PSENet均支持弯曲文字检测,本项目中,使用DB++作为印章弯曲文字检测算法。

+

PaddleOCR中发布的db++文字检测算法模型是英文文本检测模型,因此需要重新训练模型。

+

修改DB++配置文件(默认配置文件位于 configs/det/det_r50_db++_icdar15.yml)中的数据路径:

+
Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/seal_ppocr
+    label_file_list: [./train_data/seal_ppocr/seal_ppocr_img.txt]
+    ratio_list: [1.0]
+
+

测试数据集配置修改后如下:

+
Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/seal_ppocr_test
+    label_file_list: [./train_data/seal_ppocr_test/seal_ppocr_img.txt]
+
+

启动训练:

+
!python3 tools/train.py  -c  configs/det/det_r50_db++_icdar15.yml -o Global.epoch_num=100
+
+

考虑到数据较少,通过Global.epoch_num设置仅训练100个epoch。模型训练完成后,在测试集上预测的可视化效果如下:

+

+

如需获取已训练模型,请点击文末的链接,加入官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁

+

5.2.2 印章文字识别

+

上一节中完成了印章文字的检测模型训练,本节介绍印章文字识别模型的训练。识别模型采用SVTR算法,SVTR算法是IJCAI收录的文字识别算法,SVTR模型具备超轻量高精度的特点。

+

在启动训练之前,需要准备印章文字识别需要的数据集,需要使用如下代码,将印章中的文字区域剪切出来构建训练集。

+
import cv2
+import json
+import numpy as np
+import os
+
+def get_rotate_crop_image(img, points):
+    '''
+    img_height, img_width = img.shape[0:2]
+    left = int(np.min(points[:, 0]))
+    right = int(np.max(points[:, 0]))
+    top = int(np.min(points[:, 1]))
+    bottom = int(np.max(points[:, 1]))
+    img_crop = img[top:bottom, left:right, :].copy()
+    points[:, 0] = points[:, 0] - left
+    points[:, 1] = points[:, 1] - top
+    '''
+    assert len(points) == 4, "shape of points must be 4*2"
+    img_crop_width = int(
+        max(
+            np.linalg.norm(points[0] - points[1]),
+            np.linalg.norm(points[2] - points[3])))
+    img_crop_height = int(
+        max(
+            np.linalg.norm(points[0] - points[3]),
+            np.linalg.norm(points[1] - points[2])))
+    pts_std = np.float32([[0, 0], [img_crop_width, 0],
+                          [img_crop_width, img_crop_height],
+                          [0, img_crop_height]])
+    M = cv2.getPerspectiveTransform(points, pts_std)
+    dst_img = cv2.warpPerspective(
+        img,
+        M, (img_crop_width, img_crop_height),
+        borderMode=cv2.BORDER_REPLICATE,
+        flags=cv2.INTER_CUBIC)
+    dst_img_height, dst_img_width = dst_img.shape[0:2]
+    if dst_img_height * 1.0 / dst_img_width >= 1.5:
+        dst_img = np.rot90(dst_img)
+    return dst_img
+
+
+def run(data_dir, label_file, save_dir):
+    datas = open(label_file, 'r').readlines()
+    for idx, line in enumerate(datas):
+        img_path, label = line.strip().split('\t')
+        img_path = os.path.join(data_dir, img_path)
+
+        label = json.loads(label)
+        src_im = cv2.imread(img_path)
+        if src_im is None:
+            continue
+
+        for anno in label:
+            seal_box = anno['seal_box']
+            txt_boxes = anno['polys']
+            crop_im = get_rotate_crop_image(src_im, txt_boxes)
+
+            save_path = os.path.join(save_dir, f'{idx}.png')
+            if not os.path.exists(save_dir):
+                os.makedirs(save_dir)
+            # print(src_im.shape)
+            cv2.imwrite(save_path, crop_im)
+
+
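上面的run函数可以按照如下方式调用。需要说明的是,下面的路径只是与前文检测数据目录保持一致的示例假设,请根据实际情况替换:

# 示例调用:data_dir、label_file、save_dir均为假设的示例路径
run(data_dir='./train_data/seal_ppocr',
    label_file='./train_data/seal_ppocr/seal_ppocr_img.txt',
    save_dir='./train_data/seal_ppocr_crop')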

数据处理完成后,即可配置训练的配置文件。SVTR配置文件选择configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml,修改SVTR配置文件中的训练数据部分如下:

+
Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/seal_ppocr_crop/
+    label_file_list:
+    - ./train_data/seal_ppocr_crop/train_list.txt
+
+

修改预测部分配置文件:

+
Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/seal_ppocr_crop/
+    label_file_list:
+    - ./train_data/seal_ppocr_crop_test/train_list.txt
+
+

启动训练:

+
!python3 tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml
+
+

训练完成后可以发现测试集指标达到了61%。由于数据较少,训练时会发现在训练集上的acc指标远大于测试集上的acc指标,即出现过拟合现象。通过补充数据和一些数据增强可以缓解这个问题。

diff --git "a/applications/\345\217\221\347\245\250\345\205\263\351\224\256\344\277\241\346\201\257\346\212\275\345\217\226.html" "b/applications/\345\217\221\347\245\250\345\205\263\351\224\256\344\277\241\346\201\257\346\212\275\345\217\226.html" new file mode 100644 index 0000000000..2f88c21808

增值税发票 - PaddleOCR 文档

基于VI-LayoutXLM的发票关键信息抽取

+

1. 项目背景及意义

+

关键信息抽取在文档场景中被广泛使用,如身份证中的姓名、住址信息抽取,快递单中的姓名、联系方式等关键字段内容的抽取。传统基于模板匹配的方案需要针对不同的场景制定模板并进行适配,较为繁琐,不够鲁棒。基于该问题,我们借助飞桨提供的PaddleOCR套件中的关键信息抽取方案,实现对增值税发票场景的关键信息抽取。

+

2. 项目内容

+

本项目基于PaddleOCR开源套件,以VI-LayoutXLM多模态关键信息抽取模型为基础,针对增值税发票场景进行适配,提取该场景的关键信息。

+

3. 安装环境

+
# 首先git官方的PaddleOCR项目,安装需要的依赖
+# 第一次运行打开该注释
+git clone https://gitee.com/PaddlePaddle/PaddleOCR.git
+cd PaddleOCR
+# 安装PaddleOCR的依赖
+pip install -r requirements.txt
+# 安装关键信息抽取任务的依赖
+pip install -r ./ppstructure/kie/requirements.txt
+
+

4. 关键信息抽取

+

基于文档图像的关键信息抽取包含3个部分:(1)文本检测(2)文本识别(3)关键信息抽取方法,包括语义实体识别或者关系抽取,下面分别进行介绍。

+

4.1 文本检测

+

本文重点关注发票的关键信息抽取模型训练与预测过程,因此在关键信息抽取过程中,直接使用标注的文本检测与识别标注信息进行测试,如果你希望自定义该场景的文本检测模型,完成端到端的关键信息抽取部分,请参考文本检测模型训练教程,按照训练数据格式准备数据,并完成该场景下垂类文本检测模型的微调过程。

+

4.2 文本识别

+

本文重点关注发票的关键信息抽取模型训练与预测过程,因此在关键信息抽取过程中,直接使用提供的文本检测与识别标注信息进行测试,如果你希望自定义该场景的文本检测模型,完成端到端的关键信息抽取部分,请参考文本识别模型训练教程,按照训练数据格式准备数据,并完成该场景下垂类文本识别模型的微调过程。

+

4.3 语义实体识别 (Semantic Entity Recognition)

+

语义实体识别指的是给定一段文本行,确定其类别(如姓名住址等类别)。PaddleOCR中提供了基于VI-LayoutXLM的多模态语义实体识别方法,融合文本、位置与版面信息,相比LayoutXLM多模态模型,去除了其中的视觉骨干网络特征提取部分,引入符合阅读顺序的文本行排序方法,同时使用UDML联合互蒸馏方法进行训练,最终在精度与速度方面均超越LayoutXLM。更多关于VI-LayoutXLM的算法介绍与精度指标,请参考:VI-LayoutXLM算法介绍

+

4.3.1 准备数据

+

以发票场景为例,我们首先需要标注出其中的关键字段,并将其标注为问题-答案的key-value pair。如下图所示,编号No为12270830,则No字段标注为question,12270830字段标注为answer。

+

+

注意:

+
    +
  • 如果文本检测模型数据标注过程中,没有标注 非关键信息内容 的检测框,那么在标注关键信息抽取任务的时候,也不需要标注该部分,如上图所示;如果标注过程中同时标注了 非关键信息内容 的检测框,那么我们需要将该部分的label记为other。
  • +
  • 标注过程中,需要以文本行为单位进行标注,无需标注单个字符的位置信息。
  • +
+

已经处理好的增值税发票数据集从这里下载:增值税发票数据集下载链接

+

下载好发票数据集,并解压在train_data目录下,目录结构如下所示。

+
train_data
+  |--zzsfp
+       |---class_list.txt
+       |---imgs/
+       |---train.json
+       |---val.json
+
+

其中class_list.txt是包含other, question, answer这3个种类的类别列表(不区分大小写),imgs目录底下,train.json与val.json分别表示训练与评估集合的标注文件。训练集中包含30张图片,验证集中包含8张图片。部分标注如下所示。

+
b33.jpg [{"transcription": "No", "label": "question", "points": [[2882, 472], [3026, 472], [3026, 588], [2882, 588]]}, {"transcription": "12269563", "label": "answer", "points": [[3066, 448], [3598, 448], [3598, 576], [3066, 576]]}]
+
+

相比于OCR检测的标注,仅多了label字段。

+
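为便于理解上述标注格式,下面给出一个简单的读取示例(仅为示意代码,其中的标注文件路径为假设值):

import json
from collections import Counter

# 假设标注文件为 train_data/zzsfp/train.json,每行格式为“图片名\t标注json”
with open('train_data/zzsfp/train.json', 'r', encoding='utf-8') as f:
    for line in f:
        img_name, label_str = line.strip().split('\t')
        annos = json.loads(label_str)
        # 统计每张图片中question/answer/other等类别的文本行数量
        label_cnt = Counter(anno['label'] for anno in annos)
        print(img_name, dict(label_cnt))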

4.3.2 开始训练

+

VI-LayoutXLM的训练配置文件为ser_vi_layoutxlm_xfund_zh_udml.yml,需要修改其中的数据路径、类别数目以及类别映射文件。

+
Architecture:
+  model_type: &model_type "kie"
+  name: DistillationModel
+  algorithm: Distillation
+  Models:
+    Teacher:
+      pretrained:
+      freeze_params: false
+      return_all_feats: true
+      model_type: *model_type
+      algorithm: &algorithm "LayoutXLM"
+      Transform:
+      Backbone:
+        name: LayoutXLMForSer
+        pretrained: True
+        # one of base or vi
+        mode: vi
+        checkpoints:
+        # 定义类别数目
+        num_classes: &num_classes 5
+   ...
+
+PostProcess:
+  name: DistillationSerPostProcess
+  model_name: ["Student", "Teacher"]
+  key: backbone_out
+  # 定义类别文件
+  class_path: &class_path train_data/zzsfp/class_list.txt
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    # 定义训练数据目录与标注文件
+    data_dir: train_data/zzsfp/imgs
+    label_file_list:
+      - train_data/zzsfp/train.json
+  ...
+
+Eval:
+  dataset:
+    # 定义评估数据目录与标注文件
+    name: SimpleDataSet
+    data_dir: train_data/zzsfp/imgs
+    label_file_list:
+      - train_data/zzsfp/val.json
+  ...
+
+

LayoutXLM与VI-LayoutXLM针对该场景的训练结果如下所示。

| 模型 | 迭代轮数 | Hmean |
| --- | --- | --- |
| LayoutXLM | 50 | 100.00% |
| VI-LayoutXLM | 50 | 100.00% |

可以看出,由于当前数据量较少,场景比较简单,因此2个模型的Hmean均达到了100%。

+

4.3.3 模型评估

+

模型训练过程中,使用的是知识蒸馏的策略,最终保留了学生模型的参数,在评估时,我们需要针对学生模型的配置文件进行修改: ser_vi_layoutxlm_xfund_zh.yml,修改内容与训练配置相同,包括类别数、类别映射文件、数据目录

+

修改完成后,执行下面的命令完成评估过程。

+
# 注意:需要根据你的配置文件地址与保存的模型地址,对评估命令进行修改
+python3 tools/eval.py -c ./fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy
+
+

输出结果如下所示:

+
[2022/08/18 08:49:58] ppocr INFO: metric eval ***************
+[2022/08/18 08:49:58] ppocr INFO: precision:1.0
+[2022/08/18 08:49:58] ppocr INFO: recall:1.0
+[2022/08/18 08:49:58] ppocr INFO: hmean:1.0
+[2022/08/18 08:49:58] ppocr INFO: fps:1.9740402401574881
+
+

4.3.4 模型预测

+

使用下面的命令进行预测:

+
python3 tools/infer_kie_token_ser.py -c fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/XFUND/zh_val/val.json Global.infer_mode=False
+
+

预测结果会保存在配置文件中的Global.save_res_path目录中。

+

部分预测结果如下所示。

+

+
    +
  • 注意:在预测时,使用的文本检测与识别结果为标注的结果,直接从json文件里面进行读取。
  • +
+

如果希望使用OCR引擎结果得到的结果进行推理,则可以使用下面的命令进行推理。

+
python3 tools/infer_kie_token_ser.py -c fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/imgs/b25.jpg Global.infer_mode=True
+
+

结果如下所示:

+

+

它会使用PP-OCRv3的文本检测与识别模型来获取文本位置与内容信息。

+

可以看出,由于训练的过程中,没有标注额外的字段为other类别,所以大多数检测出来的字段被预测为question或者answer。

+

如果希望构建基于你在垂类场景训练得到的OCR检测与识别模型,可以使用下面的方法传入检测与识别的inference 模型路径,即可完成OCR文本检测与识别以及SER的串联过程。

+
python3 tools/infer_kie_token_ser.py -c fapiao/ser_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/imgs/b25.jpg Global.infer_mode=True Global.kie_rec_model_dir="your_rec_model" Global.kie_det_model_dir="your_det_model"
+
+

4.4 关系抽取(Relation Extraction)

+

使用SER模型,可以获取图像中所有question与answer字段及其类别,但还无法确定question与answer之间的对应关系,因此需要进一步训练关系抽取模型来解决该问题。本文也基于VI-LayoutXLM多模态预训练模型,进行下游RE任务的模型训练。

+

4.4.1 准备数据

+

以发票场景为例,相比于SER任务,RE中还需要标记每个文本行的id信息以及链接关系linking,如下所示。

+

+

+

标注文件的部分内容如下所示。

+
b33.jpg [{"transcription": "No", "label": "question", "points": [[2882, 472], [3026, 472], [3026, 588], [2882, 588]], "id": 0, "linking": [[0, 1]]}, {"transcription": "12269563", "label": "answer", "points": [[3066, 448], [3598, 448], [3598, 576], [3066, 576]], "id": 1, "linking": [[0, 1]]}]
+
+

相比于SER的标注,多了id与linking的信息,分别表示唯一标识以及连接关系。

+
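借助id与linking字段,可以直接把question与answer解析成键值对。下面给出一个简化的解析示意(仅为示例代码,标注文件路径为假设值):

import json

def parse_qa_pairs(label_str):
    annos = json.loads(label_str)
    id2anno = {anno['id']: anno for anno in annos}
    pairs = []
    for anno in annos:
        if anno['label'] != 'question':
            continue
        for q_id, a_id in anno['linking']:
            # linking中记录的是[question_id, answer_id]
            if q_id == anno['id'] and a_id in id2anno:
                pairs.append((anno['transcription'], id2anno[a_id]['transcription']))
    return pairs

# 假设标注文件为 train_data/zzsfp/val.json,每行格式为“图片名\t标注json”
with open('train_data/zzsfp/val.json', 'r', encoding='utf-8') as f:
    for line in f:
        img_name, label_str = line.strip().split('\t')
        print(img_name, parse_qa_pairs(label_str))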

已经处理好的增值税发票数据集从这里下载:增值税发票数据集下载链接

+

4.4.2 开始训练

+

基于VI-LayoutXLM的RE任务配置为re_vi_layoutxlm_xfund_zh_udml.yml,需要修改数据路径、类别列表文件

+
Train:
+  dataset:
+    name: SimpleDataSet
+    # 定义训练数据目录与标注文件
+    data_dir: train_data/zzsfp/imgs
+    label_file_list:
+      - train_data/zzsfp/train.json
+    transforms:
+      - DecodeImage: # load image
+          img_mode: RGB
+          channel_first: False
+      - VQATokenLabelEncode: # Class handling label
+          contains_re: True
+          algorithm: *algorithm
+          class_path: &class_path train_data/zzsfp/class_list.txt
+  ...
+
+Eval:
+  dataset:
+    # 定义评估数据目录与标注文件
+    name: SimpleDataSet
+    data_dir: train_data/zzsfp/imgs
+    label_file_list:
+      - train_data/zzsfp/val.json
+  ...
+
+

LayoutXLM与VI-LayoutXLM针对该场景的训练结果如下所示。

| 模型 | 迭代轮数 | Hmean |
| --- | --- | --- |
| LayoutXLM | 50 | 98.00% |
| VI-LayoutXLM | 50 | 99.30% |

可以看出,VI-LayoutXLM相比LayoutXLM,Hmean高出约1.3%。

+

4.4.3 模型评估

+

模型训练过程中,使用的是知识蒸馏的策略,最终保留了学生模型的参数,在评估时,我们需要针对学生模型的配置文件进行修改: re_vi_layoutxlm_xfund_zh.yml,修改内容与训练配置相同,包括类别映射文件、数据目录

+

修改完成后,执行下面的命令完成评估过程。

+
# 注意:需要根据你的配置文件地址与保存的模型地址,对评估命令进行修改
+python3 tools/eval.py -c ./fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_udml/best_accuracy
+
+

输出结果如下所示:

+
[2022/08/18 12:17:14] ppocr INFO: metric eval ***************
+[2022/08/18 12:17:14] ppocr INFO: precision:1.0
+[2022/08/18 12:17:14] ppocr INFO: recall:0.9873417721518988
+[2022/08/18 12:17:14] ppocr INFO: hmean:0.9936305732484078
+[2022/08/18 12:17:14] ppocr INFO: fps:2.765963539771157
+
+

4.4.4 模型预测

+

使用下面的命令进行预测:

+
# -c 后面的是RE任务的配置文件
+# -o 后面的字段是RE任务的配置
+# -c_ser 后面的是SER任务的配置文件
+# -o_ser 后面的字段是SER任务的配置
+python3 tools/infer_kie_token_ser_re.py -c fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_trained/best_accuracy Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=False -c_ser fapiao/ser_vi_layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_trained/best_accuracy
+
+

预测结果会保存在配置文件中的Global.save_res_path目录中。

+

部分预测结果如下所示。

+

+
    +
  • 注意:在预测时,使用的文本检测与识别结果为标注的结果,直接从json文件里面进行读取。
  • +
+

如果希望使用OCR引擎结果得到的结果进行推理,则可以使用下面的命令进行推理。

+
python3 tools/infer_kie_token_ser_re.py -c fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=True -c_ser fapiao/ser_vi_layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy
+
+

如果希望构建基于你在垂类场景训练得到的OCR检测与识别模型,可以使用下面的方法传入,即可完成SER + RE的串联过程。

+
python3 tools/infer_kie_token_ser_re.py -c fapiao/re_vi_layoutxlm.yml -o Architecture.Backbone.checkpoints=fapiao/models/re_vi_layoutxlm_fapiao_udml/best_accuracy Global.infer_img=./train_data/zzsfp/val.json Global.infer_mode=True -c_ser fapiao/ser_vi_layoutxlm.yml -o_ser Architecture.Backbone.checkpoints=fapiao/models/ser_vi_layoutxlm_fapiao_udml/best_accuracy Global.kie_rec_model_dir="your_rec_model" Global.kie_det_model_dir="your_det_model"
+
diff --git "a/applications/\345\244\232\346\250\241\346\200\201\350\241\250\345\215\225\350\257\206\345\210\253.html" "b/applications/\345\244\232\346\250\241\346\200\201\350\241\250\345\215\225\350\257\206\345\210\253.html" new file mode 100644 index 0000000000..05466f41e8

表单VQA - PaddleOCR 文档

多模态表单识别

+

1 项目说明

+

计算机视觉在金融领域的应用覆盖文字识别、图像识别、视频识别等,其中文字识别(OCR)是金融领域中的核心AI能力,其应用覆盖客户服务、风险防控、运营管理等各项业务,针对的对象包括通用卡证票据识别(银行卡、身份证、营业执照等)、通用文本表格识别(印刷体、多语言、手写体等)以及一些金融特色票据凭证。因此,如果能够在结构化信息提取时同时利用文字、页面布局等信息,便可增强不同版式下的泛化性。

+

表单识别旨在识别各种具有表格性质的证件、房产证、营业执照、个人信息表、发票等关键键值对(如姓名-张三),其广泛应用于银行、证券、公司财务等领域,具有很高的商业价值。本次范例项目开源了全流程表单识别方案,能够在多个场景快速实现迁移能力。表单识别通常存在以下难点:

+
    +
  • 人工摘录工作效率低;
  • +
  • 国内常见表单版式多;
  • +
  • 传统技术方案泛化效果不满足。
  • +
+

表单识别包含两大阶段:OCR阶段和文档视觉问答阶段。

+

其中,OCR阶段选取了PaddleOCR的PP-OCRv2模型,主要由文本检测和文本识别两个模块组成。DOC-VQA文档视觉问答阶段基于PaddleNLP自然语言处理算法库实现的LayoutXLM模型,支持基于多模态方法的语义实体识别(Semantic Entity Recognition, SER)以及关系抽取(Relation Extraction, RE)任务。本案例流程如 图1 所示:

+

+

注:欢迎在AIStudio领取免费算力体验线上实训,项目链接: 多模态表单识别

+

2 安装说明

+

下载PaddleOCR源码,上述AIStudio项目中已经帮大家打包好的PaddleOCR(已经修改好配置文件),无需下载解压即可,只需安装依赖环境~

+
unzip -q PaddleOCR.zip
+
+
# 如仍需安装or安装更新,可以执行以下步骤
+# git clone https://github.com/PaddlePaddle/PaddleOCR.git -b dygraph
+# git clone https://gitee.com/PaddlePaddle/PaddleOCR
+
+
# 安装依赖包
+pip install -U pip
+pip install -r /home/aistudio/PaddleOCR/requirements.txt
+pip install paddleocr
+
+pip install yacs gnureadline paddlenlp==2.2.1
+pip install xlsxwriter
+
+

3 数据准备

+

这里使用XFUN数据集作为实验数据集。XFUN数据集是微软提出的一个用于KIE任务的多语言数据集,共包含7个语种的子数据集,每个子数据集包含149张训练图片和50张验证图片

+

分别为:ZH(中文)、JA(日语)、ES(西班牙)、FR(法语)、IT(意大利)、DE(德语)、PT(葡萄牙)

+

本次实验选取中文数据集作为我们的演示数据集。法语数据集作为实践课程的数据集,数据集样例图如 图2 所示。

+

+

3.1 下载处理好的数据集

+

处理好的XFUND中文数据集下载地址:https://paddleocr.bj.bcebos.com/dataset/XFUND.tar ,可以运行如下指令完成中文数据集下载和解压。

+

+
wget https://paddleocr.bj.bcebos.com/dataset/XFUND.tar
+tar -xf XFUND.tar
+
+# XFUN其他数据集使用下面的代码进行转换
+# 代码链接:https://github.com/PaddlePaddle/PaddleOCR/blob/release%2F2.4/ppstructure/vqa/helper/trans_xfun_data.py
+# %cd PaddleOCR
+# python3 ppstructure/vqa/tools/trans_xfun_data.py --ori_gt_path=path/to/json_path --output_path=path/to/save_path
+# %cd ../
+
+

运行上述指令后在 /home/aistudio/PaddleOCR/ppstructure/vqa/XFUND 目录下有2个文件夹,目录结构如下所示:

+
/home/aistudio/PaddleOCR/ppstructure/vqa/XFUND
+  └─ zh_train/                  训练集
+      ├── image/              图片存放文件夹
+      ├── xfun_normalize_train.json   标注信息
+  └─ zh_val/                    验证集
+      ├── image/          图片存放文件夹
+      ├── xfun_normalize_val.json     标注信息
+
+

该数据集的标注格式为

+
{
+    "height": 3508, # 图像高度
+    "width": 2480,  # 图像宽度
+    "ocr_info": [
+        {
+            "text": "邮政地址:",  # 单个文本内容
+            "label": "question", # 文本所属类别
+            "bbox": [261, 802, 483, 859], # 单个文本框
+            "id": 54,  # 文本索引
+            "linking": [[54, 60]], # 当前文本和其他文本的关系 [question, answer]
+            "words": []
+        },
+        {
+            "text": "湖南省怀化市市辖区",
+            "label": "answer",
+            "bbox": [487, 810, 862, 859],
+            "id": 60,
+            "linking": [[54, 60]],
+            "words": []
+        }
+    ]
+}
+
+

3.2 转换为PaddleOCR检测和识别格式

+

使用XFUND训练PaddleOCR检测和识别模型,需要将数据集格式改为训练需求的格式。

+

+

文本检测 标注文件格式如下,中间用'\t'分隔:

+

" 图像文件名 json.dumps编码的图像标注信息" +ch4_test_images/img_61.jpg [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}, {...}]

+

json.dumps编码前的图像标注信息是包含多个字典的list,字典中的 points 表示文本框的四个点的坐标(x, y),从左上角的点开始顺时针排列。 transcription 表示当前文本框的文字,当其内容为“###”时,表示该文本框无效,在训练时会跳过。

+

文本识别 标注文件的格式如下, txt文件中默认请将图片路径和图片标签用'\t'分割,如用其他方式分割将造成训练报错。

+
" 图像文件名                 图像标注信息 "
+
+train_data/rec/train/word_001.jpg   简单可依赖
+train_data/rec/train/word_002.jpg   用科技让复杂的世界更简单
+...
+
+
unzip -q /home/aistudio/data/data140302/XFUND_ori.zip -d /home/aistudio/data/data140302/
+
+

已经提供转换脚本,执行如下代码即可转换成功:

+
%cd /home/aistudio/
+python trans_xfund_data.py
+
+
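转换脚本的核心逻辑大致如下。这里只给出一个处理文本检测标签的简化示意,其中“每行为‘图片名\t标注json’”的文件组织方式属于示例假设,完整的转换逻辑请以官方提供的trans_xfund_data.py脚本为准:

import json

def xfund_to_det_label(xfund_label_path, save_path):
    """将XFUND标注中的text与bbox转换为PaddleOCR检测标注格式(示意实现)。"""
    with open(xfund_label_path, 'r', encoding='utf-8') as f, \
         open(save_path, 'w', encoding='utf-8') as fout:
        for line in f:
            img_name, label_str = line.strip().split('\t')
            info = json.loads(label_str)
            boxes = []
            for item in info['ocr_info']:
                x1, y1, x2, y2 = item['bbox']
                boxes.append({
                    'transcription': item['text'],
                    # bbox为[x1, y1, x2, y2],转换为从左上角开始顺时针的四点坐标
                    'points': [[x1, y1], [x2, y1], [x2, y2], [x1, y2]],
                })
            fout.write(img_name + '\t' + json.dumps(boxes, ensure_ascii=False) + '\n')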

4 OCR

+

选用飞桨OCR开发套件PaddleOCR中的PP-OCRv2模型进行文本检测和识别。PP-OCRv2在PP-OCR的基础上,进一步在5个方面重点优化,检测模型采用CML协同互学习知识蒸馏策略和CopyPaste数据增广策略;识别模型采用LCNet轻量级骨干网络、UDML 改进知识蒸馏策略和Enhanced CTC loss损失函数改进,进一步在推理速度和预测效果上取得明显提升。更多细节请参考PP-OCRv2技术报告

+

4.1 文本检测

+

我们使用2种方案进行训练、评估:

+
    +
  • PP-OCRv2中英文超轻量检测预训练模型
  • +
  • XFUND数据集+fine-tune
  • +
+

4.1.1 方案1:预训练模型

+
1)下载预训练模型
+

+

PaddleOCR已经提供了PP-OCR系列模型,部分模型展示如下表所示:

| 模型简介 | 模型名称 | 推荐场景 | 检测模型 | 方向分类器 | 识别模型 |
| --- | --- | --- | --- | --- | --- |
| 中英文超轻量PP-OCRv2模型(13.0M) | ch_PP-OCRv2_xx | 移动端&服务器端 | 推理模型 / 训练模型 | 推理模型 / 预训练模型 | 推理模型 / 训练模型 |
| 中英文超轻量PP-OCR mobile模型(9.4M) | ch_ppocr_mobile_v2.0_xx | 移动端&服务器端 | 推理模型 / 预训练模型 | 推理模型 / 预训练模型 | 推理模型 / 预训练模型 |
| 中英文通用PP-OCR server模型(143.4M) | ch_ppocr_server_v2.0_xx | 服务器端 | 推理模型 / 预训练模型 | 推理模型 / 预训练模型 | 推理模型 / 预训练模型 |

更多模型下载(包括多语言),可以参考PP-OCR 系列模型下载

+

这里我们使用PP-OCRv2中英文超轻量检测模型,下载并解压预训练模型:

+
%cd /home/aistudio/PaddleOCR/pretrain/
+wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_det_distill_train.tar
+tar -xf ch_PP-OCRv2_det_distill_train.tar && rm -rf ch_PP-OCRv2_det_distill_train.tar
+% cd ..
+
+
2)模型评估
+

+

接着使用下载的超轻量检测模型在XFUND验证集上进行评估,由于蒸馏需要包含多个网络,甚至多个Student网络,在计算指标的时候只需要计算一个Student网络的指标即可,key字段设置为Student则表示只计算Student网络的精度。

+
Metric:
+  name: DistillationMetric
+  base_metric_name: DetMetric
+  main_indicator: hmean
+  key: "Student"
+
+

首先修改配置文件configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml中的以下字段:

+
Eval.dataset.data_dir:指向验证集图片存放目录
+Eval.dataset.label_file_list:指向验证集标注文件
+
+

然后在XFUND验证集上进行评估,具体代码如下:

+
%cd /home/aistudio/PaddleOCR
+python tools/eval.py \
+    -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_distill.yml \
+    -o Global.checkpoints="./pretrain_models/ch_PP-OCRv2_det_distill_train/best_accuracy"
+
+

使用预训练模型进行评估,指标如下所示:

| 方案 | hmeans |
| --- | --- |
| PP-OCRv2中英文超轻量检测预训练模型 | 77.26% |

使用文本检测预训练模型在XFUND验证集上评估,达到77%左右,充分说明ppocr提供的预训练模型具有泛化能力。

+

4.1.2 方案2:XFUND数据集+fine-tune

+

PaddleOCR提供的蒸馏预训练模型包含了多个模型的参数,我们提取Student模型的参数,在XFUND数据集上进行finetune,可以参考如下代码:

+
import paddle
+# 加载预训练模型
+all_params = paddle.load("pretrain/ch_PP-OCRv2_det_distill_train/best_accuracy.pdparams")
+# 查看权重参数的keys
+# print(all_params.keys())
+# 学生模型的权重提取
+s_params = {key[len("student_model."):]: all_params[key] for key in all_params if "student_model." in key}
+# 查看学生模型权重参数的keys
+print(s_params.keys())
+# 保存
+paddle.save(s_params, "pretrain/ch_PP-OCRv2_det_distill_train/student.pdparams")
+
+
1)模型训练
+

+

修改配置文件configs/det/ch_PP-OCRv2_det_student.yml中的以下字段:

+
Global.pretrained_model:指向预训练模型路径
+Train.dataset.data_dir:指向训练集图片存放目录
+Train.dataset.label_file_list:指向训练集标注文件
+Eval.dataset.data_dir:指向验证集图片存放目录
+Eval.dataset.label_file_list:指向验证集标注文件
+Optimizer.lr.learning_rate:调整学习率,本实验设置为0.005
+Train.dataset.transforms.EastRandomCropData.size:训练尺寸改为[1600, 1600]
+Eval.dataset.transforms.DetResizeForTest:评估尺寸,添加如下参数
+       limit_side_len: 1600
+       limit_type: 'min'
+
+

执行下面命令启动训练:

+
CUDA_VISIBLE_DEVICES=0 python tools/train.py \
+        -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml
+
+
2)模型评估
+

+

使用训练好的模型进行评估,更新模型路径Global.checkpoints

+

将训练完成的模型放置在对应目录下即可完成模型评估

+
%cd /home/aistudio/PaddleOCR/
+python tools/eval.py \
+    -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml \
+    -o Global.checkpoints="pretrain/ch_db_mv3-student1600-finetune/best_accuracy"
+
+

同时我们提供了未finetune的模型,其配置文件参数为:pretrained_model设置为空,learning_rate设置为0.001

+
%cd /home/aistudio/PaddleOCR/
+python tools/eval.py \
+    -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml \
+    -o Global.checkpoints="pretrain/ch_db_mv3-student1600/best_accuracy"
+
+

使用训练好的模型进行评估,指标如下所示:

| 方案 | hmeans |
| --- | --- |
| XFUND数据集 | 79.27% |
| XFUND数据集+fine-tune | 85.24% |

对比仅使用XFUND数据集训练的模型,使用XFUND数据集+finetune训练,在验证集上评估达到85%左右,说明 finetune会提升垂类场景效果。

+
3)导出模型
+

+

在模型训练过程中保存的模型文件是包含前向预测和反向传播的过程,在实际的工业部署则不需要反向传播,因此需要将模型进行导成部署需要的模型格式。 执行下面命令,即可导出模型。

+
# 加载配置文件`ch_PP-OCRv2_det_student.yml`,从`pretrain/ch_db_mv3-student1600-finetune`目录下加载`best_accuracy`模型
+# inference模型保存在`./output/det_db_inference`目录下
+%cd /home/aistudio/PaddleOCR/
+python tools/export_model.py \
+    -c configs/det/ch_PP-OCRv2/ch_PP-OCRv2_det_student.yml \
+    -o Global.pretrained_model="pretrain/ch_db_mv3-student1600-finetune/best_accuracy" \
+    Global.save_inference_dir="./output/det_db_inference/"
+
+

转换成功后,在目录下有三个文件:

+
./output/det_db_inference/
+    ├── inference.pdiparams         # 检测inference模型的参数文件
+    ├── inference.pdiparams.info    # 检测inference模型的参数信息,可忽略
+    └── inference.pdmodel           # 检测inference模型的program文件
+
+
4)模型预测
+

+

加载上面导出的模型,执行如下命令对验证集或测试集图片进行预测:

+
det_model_dir:预测模型
+image_dir:测试图片路径
+use_gpu:是否使用GPU
+
+

检测可视化结果保存在/home/aistudio/inference_results/目录下,查看检测效果。

+
%pwd
+!python tools/infer/predict_det.py \
+    --det_algorithm="DB" \
+    --det_model_dir="./output/det_db_inference/" \
+    --image_dir="./doc/vqa/input/zh_val_21.jpg" \
+    --use_gpu=True
+
+

总结,我们分别使用PP-OCRv2中英文超轻量检测预训练模型、XFUND数据集+finetune2种方案进行评估、训练等,指标对比如下:

| 方案 | hmeans | 结果分析 |
| --- | --- | --- |
| PP-OCRv2中英文超轻量检测预训练模型 | 77.26% | ppocr提供的预训练模型有泛化能力 |
| XFUND数据集 | 79.27% | |
| XFUND数据集+finetune | 85.24% | finetune会提升垂类场景效果 |

4.2 文本识别

+

我们分别使用如下3种方案进行训练、评估:

+
    +
  • PP-OCRv2中英文超轻量识别预训练模型
  • +
  • XFUND数据集+fine-tune
  • +
  • XFUND数据集+fine-tune+真实通用识别数据
  • +
+

4.2.1 方案1:预训练模型

+
1)下载预训练模型
+

+

我们使用PP-OCRv2中英文超轻量文本识别模型,下载并解压预训练模型:

+
%cd /home/aistudio/PaddleOCR/pretrain/
+wget https://paddleocr.bj.bcebos.com/PP-OCRv2/chinese/ch_PP-OCRv2_rec_train.tar
+tar -xf ch_PP-OCRv2_rec_train.tar && rm -rf ch_PP-OCRv2_rec_train.tar
+% cd ..
+
+
2)模型评估
+

+

首先修改配置文件configs/det/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml中的以下字段:

+
Eval.dataset.data_dir:指向验证集图片存放目录
+Eval.dataset.label_file_list:指向验证集标注文件
+
+

我们使用下载的预训练模型进行评估:

+
%cd /home/aistudio/PaddleOCR
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
+    -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec_distillation.yml \
+    -o Global.checkpoints=./pretrain/ch_PP-OCRv2_rec_train/best_accuracy
+
+

使用预训练模型进行评估,指标如下所示:

| 方案 | acc |
| --- | --- |
| PP-OCRv2中英文超轻量识别预训练模型 | 67.48% |

使用文本预训练模型在XFUND验证集上评估,acc达到67%左右,充分说明ppocr提供的预训练模型具有泛化能力。

+

4.2.2 方案2:XFUND数据集+finetune

+

同检测模型,我们提取Student模型的参数,在XFUND数据集上进行finetune,可以参考如下代码:

+
import paddle
+# 加载预训练模型
+all_params = paddle.load("pretrain/ch_PP-OCRv2_rec_train/best_accuracy.pdparams")
+# 查看权重参数的keys
+print(all_params.keys())
+# 学生模型的权重提取
+s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
+# 查看学生模型权重参数的keys
+print(s_params.keys())
+# 保存
+paddle.save(s_params, "pretrain/ch_PP-OCRv2_rec_train/student.pdparams")
+
+
1)模型训练
+

+

修改配置文件configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml中的以下字段:

+
Global.pretrained_model:指向预训练模型路径
+Global.character_dict_path: 字典路径
+Optimizer.lr.values:学习率
+Train.dataset.data_dir:指向训练集图片存放目录
+Train.dataset.label_file_list:指向训练集标注文件
+Eval.dataset.data_dir:指向验证集图片存放目录
+Eval.dataset.label_file_list:指向验证集标注文件
+
+

执行如下命令启动训练:

+
%cd /home/aistudio/PaddleOCR/
+CUDA_VISIBLE_DEVICES=0 python tools/train.py \
+        -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml
+
+
2)模型评估
+

+

使用训练好的模型进行评估,更新模型路径Global.checkpoints,这里为大家提供训练好的模型./pretrain/rec_mobile_pp-OCRv2-student-finetune/best_accuracy

+
%cd /home/aistudio/PaddleOCR/
+CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
+    -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml \
+    -o Global.checkpoints=./pretrain/rec_mobile_pp-OCRv2-student-finetune/best_accuracy
+
+

使用预训练模型进行评估,指标如下所示:

| 方案 | acc |
| --- | --- |
| XFUND数据集+finetune | 72.33% |

使用XFUND数据集+finetune训练,在验证集上评估达到72%左右,说明 finetune会提升垂类场景效果。

+

4.2.3 方案3:XFUND数据集+finetune+真实通用识别数据

+

接着我们在上述XFUND数据集+finetune实验的基础上,添加真实通用识别数据,进一步提升识别效果。首先准备真实通用识别数据,并上传到AIStudio:

+
1)模型训练
+

+

在上述XFUND数据集+finetune实验中修改配置文件configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml的基础上,继续修改以下字段:

+
Train.dataset.label_file_list:指向真实识别训练集图片存放目录
+Train.dataset.ratio_list:动态采样
+
+

执行如下命令启动训练:

+
%cd /home/aistudio/PaddleOCR/
+CUDA_VISIBLE_DEVICES=0 python tools/train.py \
+        -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml
+
+
2)模型评估
+

+

使用训练好的模型进行评估,更新模型路径Global.checkpoints

+
CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
+    -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml \
+    -o Global.checkpoints=./pretrain/rec_mobile_pp-OCRv2-student-realdata/best_accuracy
+
+

使用预训练模型进行评估,指标如下所示:

| 方案 | acc |
| --- | --- |
| XFUND数据集+fine-tune+真实通用识别数据 | 85.29% |

使用XFUND数据集+finetune训练,在验证集上评估达到85%左右,说明真实通用识别数据对于性能提升很有帮助。

+
3)导出模型
+

+

导出模型只保留前向预测的过程:

+
!python tools/export_model.py \
+    -c configs/rec/ch_PP-OCRv2/ch_PP-OCRv2_rec.yml \
+    -o Global.pretrained_model=pretrain/rec_mobile_pp-OCRv2-student-realdata/best_accuracy  \
+    Global.save_inference_dir=./output/rec_crnn_inference/
+
+
4)模型预测
+

+

加载上面导出的模型,执行如下命令对验证集或测试集图片进行预测,检测可视化结果保存在/home/aistudio/inference_results/目录下,查看检测、识别效果。需要通过--rec_char_dict_path指定使用的字典路径

+
python tools/infer/predict_system.py \
+    --image_dir="./doc/vqa/input/zh_val_21.jpg" \
+    --det_model_dir="./output/det_db_inference/" \
+    --rec_model_dir="./output/rec_crnn_inference/" \
+    --rec_image_shape="3, 32, 320" \
+    --rec_char_dict_path="/home/aistudio/XFUND/word_dict.txt"
+
+

总结:我们分别使用PP-OCRv2中英文超轻量识别预训练模型、XFUND数据集+finetune、XFUND数据集+finetune+真实通用识别数据这3种方案进行评估、训练,指标对比如下:

| 方案 | acc | 结果分析 |
| --- | --- | --- |
| PP-OCRv2中英文超轻量识别预训练模型 | 67.48% | ppocr提供的预训练模型具有泛化能力 |
| XFUND数据集+fine-tune | 72.33% | finetune会提升垂类场景效果 |
| XFUND数据集+fine-tune+真实通用识别数据 | 85.29% | 真实通用识别数据对于性能提升很有帮助 |

5 文档视觉问答(DOC-VQA)

+

VQA指视觉问答,主要针对图像内容进行提问和回答,DOC-VQA是VQA任务中的一种,DOC-VQA主要针对文本图像的文字内容提出问题。

+

PaddleOCR中DOC-VQA系列算法基于PaddleNLP自然语言处理算法库实现LayoutXLM论文,支持基于多模态方法的 语义实体识别 (Semantic Entity Recognition, SER) 以及 关系抽取 (Relation Extraction, RE) 任务。

+

如果希望直接体验预测过程,可以下载我们提供的预训练模型,跳过训练过程,直接预测即可。

+
%cd pretrain
+#下载SER模型
+wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar && tar -xvf ser_LayoutXLM_xfun_zh.tar
+#下载RE模型
+wget https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar && tar -xvf re_LayoutXLM_xfun_zh.tar
+%cd ../
+
+

5.1 SER

+

SER: 语义实体识别 (Semantic Entity Recognition), 可以完成对图像中的文本识别与分类。

+

+

图19 中不同颜色的框表示不同的类别,对于XFUND数据集,有QUESTION, ANSWER, HEADER 3种类别

+
    +
  • 深紫色:HEADER
  • +
  • 浅紫色:QUESTION
  • +
  • 军绿色:ANSWER
  • +
+

在OCR检测框的左上方也标出了对应的类别和OCR识别结果。

+

5.1.1 模型训练

+

+

启动训练之前,需要修改配置文件 configs/vqa/ser/layoutxlm.yml 以下四个字段:

+
Train.dataset.data_dir:指向训练集图片存放目录
+Train.dataset.label_file_list:指向训练集标注文件
+Eval.dataset.data_dir:指向验证集图片存放目录
+Eval.dataset.label_file_list:指向验证集标注文件
+
+
%cd /home/aistudio/PaddleOCR/
+CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/vqa/ser/layoutxlm.yml
+
+

最终会打印出precision, recall, hmean等指标。 在./output/ser_layoutxlm/文件夹中会保存训练日志,最优的模型和最新epoch的模型。

+

5.1.2 模型评估

+

+

我们使用下载的预训练模型进行评估,如果使用自己训练好的模型进行评估,将待评估的模型所在文件夹路径赋值给 Architecture.Backbone.checkpoints 字段即可。

+
CUDA_VISIBLE_DEVICES=0 python tools/eval.py \
+    -c configs/vqa/ser/layoutxlm.yml \
+    -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
+
+

最终会打印出precision, recall, hmean等指标,预训练模型评估指标如下:

+

+

5.1.3 模型预测

+

+

使用如下命令即可完成OCR引擎 + SER的串联预测, 以SER预训练模型为例:

+
CUDA_VISIBLE_DEVICES=0 python tools/infer_vqa_token_ser.py \
+    -c configs/vqa/ser/layoutxlm.yml  \
+    -o Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/ \
+    Global.infer_img=doc/vqa/input/zh_val_42.jpg
+
+

最终会在config.Global.save_res_path字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为infer_results.txt。通过如下命令查看预测图片:

+
import cv2
+from matplotlib import pyplot as plt
+# 在notebook中使用matplotlib.pyplot绘图时,需要添加该命令进行显示
+%matplotlib inline
+
+img = cv2.imread('output/ser/zh_val_42_ser.jpg')
+plt.figure(figsize=(48,24))
+plt.imshow(img)
+
+

5.2 RE

+

基于 RE 任务,可以完成对图像中文本内容的关系提取,如判断问题-答案对(pair)。

+

+

图中红色框表示问题,蓝色框表示答案,问题和答案之间使用绿色线连接。在OCR检测框的左上方也标出了对应的类别和OCR识别结果。

+

5.2.1 模型训练

+

+

启动训练之前,需要修改配置文件configs/vqa/re/layoutxlm.yml中的以下四个字段

+
Train.dataset.data_dir:指向训练集图片存放目录
+Train.dataset.label_file_list:指向训练集标注文件
+Eval.dataset.data_dir:指向验证集图片存放目录
+Eval.dataset.label_file_list:指向验证集标注文件
+
+
CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/vqa/re/layoutxlm.yml
+
+

最终会打印出precision, recall, hmean等指标。 在./output/re_layoutxlm/文件夹中会保存训练日志,最优的模型和最新epoch的模型

+

5.2.2 模型评估

+

+

我们使用下载的预训练模型进行评估,如果使用自己训练好的模型进行评估,将待评估的模型所在文件夹路径赋值给 Architecture.Backbone.checkpoints 字段即可。

+
CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py \
+    -c configs/vqa/re/layoutxlm.yml \
+    -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/
+
+

最终会打印出precision, recall, hmean等指标,预训练模型评估指标如下:

+

+

5.2.3 模型预测

+

+

使用如下命令即可完成OCR引擎 + SER + RE的串联预测, 以预训练SER和RE模型为例,

+

最终会在config.Global.save_res_path字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为infer_results.txt。

+
cd /home/aistudio/PaddleOCR
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer_vqa_token_ser_re.py \
+    -c configs/vqa/re/layoutxlm.yml \
+    -o Architecture.Backbone.checkpoints=pretrain/re_LayoutXLM_xfun_zh/ \
+    Global.infer_img=test_imgs/ \
+    -c_ser configs/vqa/ser/layoutxlm.yml \
+    -o_ser Architecture.Backbone.checkpoints=pretrain/ser_LayoutXLM_xfun_zh/
+
+

最终会在config.Global.save_res_path字段所配置的目录下保存预测结果可视化图像以及预测结果文本文件,预测结果文本文件名为infer_results.txt, 每一行表示一张图片的结果,每张图片的结果如下所示,前面表示测试图片路径,后面为测试结果:key字段及对应的value字段。

+
test_imgs/t131.jpg  {"政治面税": "群众", "性别": "男", "籍贯": "河北省邯郸市", "婚姻状况": "亏末婚口已婚口已娇", "通讯地址": "邯郸市阳光苑7号楼003", "民族": "汉族", "毕业院校": "河南工业大学", "户口性质": "口农村城镇", "户口地址": "河北省邯郸市", "联系电话": "13288888888", "健康状况": "健康", "姓名": "小六", "好高cm": "180", "出生年月": "1996年8月9日", "文化程度": "本科", "身份证号码": "458933777777777777"}
+
+

展示预测结果

+
import cv2
+from matplotlib import pyplot as plt
+%matplotlib inline
+
+img = cv2.imread('./output/re/t131_ser.jpg')
+plt.figure(figsize=(48,24))
+plt.imshow(img)
+
+

6 导出Excel

+

+

为了输出信息匹配对,我们修改tools/infer_vqa_token_ser_re.py文件中的line 194-197

+
 fout.write(img_path + "\t" + json.dumps(
+                {
+                    "ser_result": result,
+                }, ensure_ascii=False) + "\n")
+
+

更改为

+
result_key = {}
+for ocr_info_head, ocr_info_tail in result:
+    result_key[ocr_info_head['text']] = ocr_info_tail['text']
+
+fout.write(img_path + "\t" + json.dumps(
+    result_key, ensure_ascii=False) + "\n")
+
+

同时将输出结果导出到Excel中,效果如 图28 所示:

+
import json
+import xlsxwriter as xw
+
+workbook = xw.Workbook('output/re/infer_results.xlsx')
+format1 = workbook.add_format({
+    'align': 'center',
+    'valign': 'vcenter',
+    'text_wrap': True,
+})
+worksheet1 = workbook.add_worksheet('sheet1')
+worksheet1.activate()
+title = ['姓名', '性别', '民族', '文化程度', '身份证号码', '联系电话', '通讯地址']
+worksheet1.write_row('A1', title)
+i = 2
+
+with open('output/re/infer_results.txt', 'r', encoding='utf-8') as fin:
+    lines = fin.readlines()
+    for line in lines:
+        img_path, result = line.strip().split('\t')
+        result_key = json.loads(result)
+        # 写入Excel
+        row_data = [result_key['姓名'], result_key['性别'], result_key['民族'], result_key['文化程度'], result_key['身份证号码'],
+                    result_key['联系电话'], result_key['通讯地址']]
+        row = 'A' + str(i)
+        worksheet1.write_row(row, row_data, format1)
+        i+=1
+workbook.close()
+
+

更多资源

+ +

参考链接

diff --git "a/applications/\345\277\253\351\200\237\346\236\204\345\273\272\345\215\241\350\257\201\347\261\273OCR.html" "b/applications/\345\277\253\351\200\237\346\236\204\345\273\272\345\215\241\350\257\201\347\261\273OCR.html" new file mode 100644 index 0000000000..61c483dcc5

通用卡证识别 - PaddleOCR 文档

快速构建卡证类OCR

+

1. 金融行业卡证识别应用

+

1.1 金融行业中的OCR相关技术

+

《“十四五”数字经济发展规划》指出,2020年我国数字经济核心产业增加值占GDP比重达7.8%,随着数字经济迈向全面扩展,到2025年该比例将提升至10%。

+

在过去数年的跨越发展与积累沉淀中,数字金融、金融科技已在对金融业的重塑与再造中充分印证了其自身价值。

+

以智能为目标,提升金融数字化水平,实现业务流程自动化,降低人力成本。

+

+

1.2 金融行业中的卡证识别场景介绍

+

应用场景:身份证、银行卡、营业执照、驾驶证等。

+

应用难点:由于数据的采集来源多样,以及实际采集数据各种噪声:反光、褶皱、模糊、倾斜等各种问题干扰。

+

+

1.3 OCR落地挑战

+

+

2. 卡证识别技术解析

+

+

2.1 卡证分类模型

+

卡证分类:基于PPLCNet

+

与其他轻量级模型相比在CPU环境下ImageNet数据集上的表现

+

+

+

模型来自模型库PaddleClas,它是一个图像识别和图像分类任务的工具集,助力使用者训练出更好的视觉模型和应用落地。

+

2.2 卡证识别模型

+

检测:DBNet 识别:SVTR

+

+

PPOCRv3在文本检测、识别进行了一系列改进优化,在保证精度的同时提升预测效率

+

+

+

3. OCR技术拆解

+

3.1技术流程

+

+

3.2 OCR技术拆解---卡证分类

+

卡证分类:数据、模型准备

+

A 使用爬虫获取无标注数据,将相同类别的放在同一文件夹下,文件名从0开始命名。具体格式如下图所示。

+

​注:卡证类数据,建议每个类别数据量在500张以上

+

+

B 一行命令生成标签文件

+
tree -r -i -f | grep -E "jpg|JPG|jpeg|JPEG|png|PNG|webp" | awk -F "/" '{print $0" "$2}' > train_list.txt
+
+
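上述命令本质上是把目录名当作类别标签写入train_list.txt。如果不方便使用shell命令,也可以用下面的Python脚本生成等价的标签文件(仅为示意实现,假设图片按“类别文件夹/图片”的结构存放,路径为示例值):

import os

def build_train_list(data_root, save_path='train_list.txt'):
    exts = ('.jpg', '.jpeg', '.png', '.webp')
    with open(save_path, 'w', encoding='utf-8') as f:
        for class_name in sorted(os.listdir(data_root)):
            class_dir = os.path.join(data_root, class_name)
            if not os.path.isdir(class_dir):
                continue
            for name in sorted(os.listdir(class_dir)):
                if name.lower().endswith(exts):
                    # 每行格式:图片路径 空格 类别(即目录名)
                    f.write(f'{os.path.join(data_root, class_name, name)} {class_name}\n')

build_train_list('./train_data')  # 示例路径,请按实际数据位置修改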

C 下载预训练模型

+

卡证分类---修改配置文件

+

配置文件主要修改三个部分:

+
    +
  • 全局参数:预训练模型路径/训练轮次/图像尺寸
  • +
  • 模型结构:分类数
  • +
  • 数据处理:训练/评估数据路径
  • +
+

+

卡证分类---训练

+

指定配置文件启动训练:

+
!python /home/aistudio/work/PaddleClas/tools/train.py -c   /home/aistudio/work/PaddleClas/ppcls/configs/PULC/text_image_orientation/PPLCNet_x1_0.yaml
+
+

+

​注:日志中显示了训练结果和评估结果(训练时可以设置固定轮数评估一次)

+

3.2 OCR技术拆解---卡证识别

+

卡证识别(以身份证检测为例) +存在的困难及问题:

+
    +
  • +

    在自然场景下,由于各种拍摄设备以及光线、角度不同等影响导致实际得到的证件影像千差万别。

    +
  • +
  • +

    如何快速提取需要的关键信息

    +
  • +
  • +

    多行的文本信息,检测结果如何正确拼接

    +
  • +
+

+
    +
  • +

    OCR技术拆解---OCR工具库

    +

    PaddleOCR是一个丰富、领先且实用的OCR工具库,助力开发者训练出更好的模型并应用落地

    +
  • +
+

身份证识别:用现有的方法识别

+

+

身份证识别:检测+分类

+
+

方法:基于现有的dbnet检测模型,加入分类方法。检测同时进行分类,从一定程度上优化识别流程

+
+

+

+

数据标注

+

使用PPOCRLabel进行快速标注

+

+
    +
  • 修改PPOCRLabel.py,将下图中的kie参数设置为True
  • +
+

+
    +
  • 数据标注踩坑分享
  • +
+

+

​ 注:两者只有标注有差别,训练参数数据集都相同

+

4 . 项目实践

+

AIStudio项目链接:快速构建卡证类OCR

+

4.1 环境准备

+

1)拉取paddleocr项目,如果从github上拉取速度慢可以选择从gitee上获取。

+
!git clone https://github.com/PaddlePaddle/PaddleOCR.git  -b release/2.6  /home/aistudio/work/
+
+

2)获取并解压预训练模型,如果要使用其他模型可以从模型库里自主选择合适模型。

+
!wget -P work/pre_trained/   https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
+!tar -vxf /home/aistudio/work/pre_trained/ch_PP-OCRv3_det_distill_train.tar -C /home/aistudio/work/pre_trained
+
+

3)安装必要依赖

+
!pip install -r /home/aistudio/work/requirements.txt
+
+

4.2 配置文件修改

+

修改配置文件 work/configs/det/detmv3db.yml

+

具体修改说明如下:

+

+

注:在上述的配置文件的Global变量中需要添加以下两个参数:

+

​ - label_list 为标签表 +​ - num_classes 为分类数 +​上述两个参数根据实际的情况配置即可

+

+

其中label_list内容如下例所示,建议第一个参数设置为 background,不要设置为实际要提取的关键信息种类

+

+

配置文件中的其他设置说明

+

+

+

+

4.3 代码修改

+

4.3.1 数据读取

+

修改 PaddleOCR/ppocr/data/imaug/label_ops.py中的DetLabelEncode

+
class DetLabelEncode(object):
+
+    # 修改检测标签的编码处,新增了参数分类数:num_classes,重写初始化方法,以及分类标签的读取
+
+    def __init__(self, label_list, num_classes=8, **kwargs):
+        self.num_classes = num_classes
+        self.label_list = []
+        if label_list:
+            if isinstance(label_list, str):
+                with open(label_list, 'r+', encoding='utf-8') as f:
+                    for line in f.readlines():
+                        self.label_list.append(line.replace("\n", ""))
+            else:
+                self.label_list = label_list
+        else:
+            # 原写法中assert一个非空字符串恒为真,起不到校验作用,这里显式抛出异常
+            raise ValueError('please check label_list whether it is none or config is right')
+
+        if num_classes != len(self.label_list): # 校验分类数和标签的一致性
+            raise ValueError('label_list length is not equal to the num_classes')
+
+    def __call__(self, data):
+        label = data['label']
+        label = json.loads(label)
+        nBox = len(label)
+        boxes, txts, txt_tags, classes = [], [], [], []
+        for bno in range(0, nBox):
+            box = label[bno]['points']
+            txt = label[bno]['key_cls']  # 此处将kie中的参数作为分类读取
+            boxes.append(box)
+            txts.append(txt)
+
+            if txt in ['*', '###']:
+                txt_tags.append(True)
+                if self.num_classes > 1:
+                    classes.append(-2)
+            else:
+                txt_tags.append(False)
+                if self.num_classes > 1:  # 将KIE内容的key标签作为分类标签使用
+                    classes.append(int(self.label_list.index(txt)))
+
+        if len(boxes) == 0:
+
+            return None
+        boxes = self.expand_points_num(boxes)
+        boxes = np.array(boxes, dtype=np.float32)
+        txt_tags = np.array(txt_tags, dtype=np.bool_)
+        classes = classes
+        data['polys'] = boxes
+        data['texts'] = txts
+        data['ignore_tags'] = txt_tags
+        if self.num_classes > 1:
+            data['classes'] = classes
+        return data
+
+

修改PaddleOCR/ppocr/data/imaug/make_shrink_map.py中的MakeShrinkMap类。这里需要注意的是,如果我们设置的label_list中的第一个参数为要检测的信息那么会得到如下的mask,

+

举例说明: +这是检测的mask图,图中有四个mask那么实际对应的分类应该是4类

+

+

label_list中第一个为关键分类,则得到的分类Mask实际如下,与上图相比,少了一个box:

+

+
class MakeShrinkMap(object):
+    r'''
+    Making binary mask from detection data with ICDAR format.
+    Typically following the process of class `MakeICDARData`.
+    '''
+
+    def __init__(self, min_text_size=8, shrink_ratio=0.4, num_classes=8, **kwargs):
+        self.min_text_size = min_text_size
+        self.shrink_ratio = shrink_ratio
+        self.num_classes = num_classes  #  添加了分类
+
+    def __call__(self, data):
+        image = data['image']
+        text_polys = data['polys']
+        ignore_tags = data['ignore_tags']
+        if self.num_classes > 1:
+            classes = data['classes']
+
+        h, w = image.shape[:2]
+        text_polys, ignore_tags = self.validate_polygons(text_polys,
+                                                         ignore_tags, h, w)
+        gt = np.zeros((h, w), dtype=np.float32)
+        mask = np.ones((h, w), dtype=np.float32)
+        gt_class = np.zeros((h, w), dtype=np.float32)  # 新增分类
+        for i in range(len(text_polys)):
+            polygon = text_polys[i]
+            height = max(polygon[:, 1]) - min(polygon[:, 1])
+            width = max(polygon[:, 0]) - min(polygon[:, 0])
+            if ignore_tags[i] or min(height, width) < self.min_text_size:
+                cv2.fillPoly(mask,
+                             polygon.astype(np.int32)[np.newaxis, :, :], 0)
+                ignore_tags[i] = True
+            else:
+                polygon_shape = Polygon(polygon)
+                subject = [tuple(l) for l in polygon]
+                padding = pyclipper.PyclipperOffset()
+                padding.AddPath(subject, pyclipper.JT_ROUND,
+                                pyclipper.ET_CLOSEDPOLYGON)
+                shrinked = []
+
+                # Increase the shrink ratio every time we get multiple polygon returned back
+                possible_ratios = np.arange(self.shrink_ratio, 1,
+                                            self.shrink_ratio)
+                possible_ratios = np.append(possible_ratios, 1)  # np.append并非原地操作,需接收返回值
+                for ratio in possible_ratios:
+                    distance = polygon_shape.area * (
+                        1 - np.power(ratio, 2)) / polygon_shape.length
+                    shrinked = padding.Execute(-distance)
+                    if len(shrinked) == 1:
+                        break
+
+                if shrinked == []:
+                    cv2.fillPoly(mask,
+                                 polygon.astype(np.int32)[np.newaxis, :, :], 0)
+                    ignore_tags[i] = True
+                    continue
+
+                for each_shirnk in shrinked:
+                    shirnk = np.array(each_shirnk).reshape(-1, 2)
+                    cv2.fillPoly(gt, [shirnk.astype(np.int32)], 1)
+                    if self.num_classes > 1:  # 绘制分类的mask
+                        cv2.fillPoly(gt_class, polygon.astype(np.int32)[np.newaxis, :, :], classes[i])
+
+
+        data['shrink_map'] = gt
+
+        if self.num_classes > 1:
+            data['class_mask'] = gt_class
+
+        data['shrink_mask'] = mask
+        return data
+
+

由于在训练数据中会对数据进行resize设置,yml中的操作为:EastRandomCropData,所以需要修改PaddleOCR/ppocr/data/imaug/random_crop_data.py中的EastRandomCropData

+
class EastRandomCropData(object):
+    def __init__(self,
+                 size=(640, 640),
+                 max_tries=10,
+                 min_crop_side_ratio=0.1,
+                 keep_ratio=True,
+                 num_classes=8,
+                 **kwargs):
+        self.size = size
+        self.max_tries = max_tries
+        self.min_crop_side_ratio = min_crop_side_ratio
+        self.keep_ratio = keep_ratio
+        self.num_classes = num_classes
+
+    def __call__(self, data):
+        img = data['image']
+        text_polys = data['polys']
+        ignore_tags = data['ignore_tags']
+        texts = data['texts']
+        if self.num_classes > 1:
+            classes = data['classes']
+        else:
+            classes = [None] * len(text_polys)  # 单分类时使用占位值,保证后续zip正常执行
+        all_care_polys = [
+            text_polys[i] for i, tag in enumerate(ignore_tags) if not tag
+        ]
+        # 计算crop区域
+        crop_x, crop_y, crop_w, crop_h = crop_area(
+            img, all_care_polys, self.min_crop_side_ratio, self.max_tries)
+        # crop 图片 保持比例填充
+        scale_w = self.size[0] / crop_w
+        scale_h = self.size[1] / crop_h
+        scale = min(scale_w, scale_h)
+        h = int(crop_h * scale)
+        w = int(crop_w * scale)
+        if self.keep_ratio:
+            padimg = np.zeros((self.size[1], self.size[0], img.shape[2]),
+                              img.dtype)
+            padimg[:h, :w] = cv2.resize(
+                img[crop_y:crop_y + crop_h, crop_x:crop_x + crop_w], (w, h))
+            img = padimg
+        else:
+            img = cv2.resize(
+                img[crop_y:crop_y + crop_h, crop_x:crop_x + crop_w],
+                tuple(self.size))
+        # crop 文本框
+        text_polys_crop = []
+        ignore_tags_crop = []
+        texts_crop = []
+        classes_crop = []
+        for poly, text, tag,class_index in zip(text_polys, texts, ignore_tags,classes):
+            poly = ((poly - (crop_x, crop_y)) * scale).tolist()
+            if not is_poly_outside_rect(poly, 0, 0, w, h):
+                text_polys_crop.append(poly)
+                ignore_tags_crop.append(tag)
+                texts_crop.append(text)
+                if self.num_classes > 1:
+                    classes_crop.append(class_index)
+        data['image'] = img
+        data['polys'] = np.array(text_polys_crop)
+        data['ignore_tags'] = ignore_tags_crop
+        data['texts'] = texts_crop
+        if self.num_classes > 1:
+            data['classes'] = classes_crop
+        return data
+
+

4.3.2 head修改

+

主要修改ppocr/modeling/heads/det_db_head.py,将Head类中的最后一层的输出修改为实际的分类数,同时在DBHead中新增分类的head。

+

+
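下面给出一个简化的示意写法,仅用于说明“在DBHead中新增分类分支”的大致思路。其中的类名DBHeadWithCls、1/4特征图假设与上采样倍数等均为示例假设,实际修改请以ppocr/modeling/heads/det_db_head.py中原有的Head/DBHead结构为准:

import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class DBHeadWithCls(nn.Layer):
    """示意:在原有概率图/阈值图分支之外,增加一个num_classes通道的分类分支。"""

    def __init__(self, binarize_head, thresh_head, in_channels, num_classes=8, k=50):
        super().__init__()
        self.k = k
        self.binarize = binarize_head   # 原有分支:输出概率图(1通道)
        self.thresh = thresh_head       # 原有分支:输出阈值图(1通道)
        # 新增分类分支:输出通道数等于分类数
        self.classes = nn.Conv2D(in_channels, num_classes, kernel_size=1)

    def step_function(self, x, y):
        return paddle.reciprocal(1 + paddle.exp(-self.k * (x - y)))

    def forward(self, x, targets=None):
        shrink_maps = self.binarize(x)
        class_maps = self.classes(x)
        # 假设输入特征图为原图的1/4大小,这里上采样回原图尺寸
        class_maps = F.interpolate(class_maps, scale_factor=4, mode='bilinear')
        if not self.training:
            # 预测时直接输出每个像素的类别索引,供后处理中的classes字段使用
            return {'maps': shrink_maps,
                    'classes': paddle.argmax(class_maps, axis=1, keepdim=True)}
        threshold_maps = self.thresh(x)
        binary_maps = self.step_function(shrink_maps, threshold_maps)
        y = paddle.concat([shrink_maps, threshold_maps, binary_maps], axis=1)
        return {'maps': y, 'classes': class_maps}

按这种写法,预测时返回的classes张量形状为[N, 1, H, W],与后文4.3.4节后处理代码中对outs_dict['classes']的取值方式保持一致。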

4.3.3 修改loss

+

修改PaddleOCR/ppocr/losses/det_db_loss.py中的DBLoss类,分类采用交叉熵损失函数进行计算。

+

+
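分类损失部分的一个简化示意如下,其中的函数名与变量组织方式均为示例,实际请以ppocr/losses/det_db_loss.py中DBLoss的完整实现为准:

import paddle
import paddle.nn.functional as F

def classification_loss(class_maps, class_mask, shrink_mask, num_classes):
    """class_maps: [N, num_classes, H, W];class_mask/shrink_mask: [N, H, W]"""
    logits = class_maps.transpose([0, 2, 3, 1]).reshape([-1, num_classes])
    labels = class_mask.reshape([-1, 1]).astype('int64')
    loss = F.cross_entropy(logits, labels, reduction='none')
    # 仅在有效(未被ignore)区域内统计交叉熵
    mask = shrink_mask.reshape([-1, 1]).astype('float32')
    return (loss * mask).sum() / (mask.sum() + 1e-6)

得到的分类损失可以按一定权重与原有的shrink_maps、threshold_maps、binary_maps损失相加,作为最终的训练损失。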

4.3.4 后处理

+

由于涉及到eval以及后续推理能否正常使用,我们需要修改后处理的相关代码,修改位置PaddleOCR/ppocr/postprocess/db_postprocess.py中的DBPostProcess类

+
class DBPostProcess(object):
+    """
+    The post process for Differentiable Binarization (DB).
+    """
+
+    def __init__(self,
+                 thresh=0.3,
+                 box_thresh=0.7,
+                 max_candidates=1000,
+                 unclip_ratio=2.0,
+                 use_dilation=False,
+                 score_mode="fast",
+                 **kwargs):
+        self.thresh = thresh
+        self.box_thresh = box_thresh
+        self.max_candidates = max_candidates
+        self.unclip_ratio = unclip_ratio
+        self.min_size = 3
+        self.score_mode = score_mode
+        assert score_mode in [
+            "slow", "fast"
+        ], "Score mode must be in [slow, fast] but got: {}".format(score_mode)
+
+        self.dilation_kernel = None if not use_dilation else np.array(
+            [[1, 1], [1, 1]])
+
+    def boxes_from_bitmap(self, pred, _bitmap, classes, dest_width, dest_height):
+        """
+        _bitmap: single map with shape (1, H, W),
+                whose values are binarized as {0, 1}
+        """
+
+        bitmap = _bitmap
+        height, width = bitmap.shape
+
+        outs = cv2.findContours((bitmap * 255).astype(np.uint8), cv2.RETR_LIST,
+                                cv2.CHAIN_APPROX_SIMPLE)
+        if len(outs) == 3:
+            img, contours, _ = outs[0], outs[1], outs[2]
+        elif len(outs) == 2:
+            contours, _ = outs[0], outs[1]
+
+        num_contours = min(len(contours), self.max_candidates)
+
+        boxes = []
+        scores = []
+        class_indexes = []
+        class_scores = []
+        for index in range(num_contours):
+            contour = contours[index]
+            points, sside = self.get_mini_boxes(contour)
+            if sside < self.min_size:
+                continue
+            points = np.array(points)
+            if self.score_mode == "fast":
+                score, class_index, class_score = self.box_score_fast(pred, points.reshape(-1, 2), classes)
+            else:
+                score, class_index, class_score = self.box_score_slow(pred, contour, classes)
+            if self.box_thresh > score:
+                continue
+
+            box = self.unclip(points).reshape(-1, 1, 2)
+            box, sside = self.get_mini_boxes(box)
+            if sside < self.min_size + 2:
+                continue
+            box = np.array(box)
+
+            box[:, 0] = np.clip(
+                np.round(box[:, 0] / width * dest_width), 0, dest_width)
+            box[:, 1] = np.clip(
+                np.round(box[:, 1] / height * dest_height), 0, dest_height)
+
+            boxes.append(box.astype(np.int16))
+            scores.append(score)
+
+            class_indexes.append(class_index)
+            class_scores.append(class_score)
+
+        if classes is None:
+            return np.array(boxes, dtype=np.int16), scores
+        else:
+            return np.array(boxes, dtype=np.int16), scores, class_indexes, class_scores
+
+    def unclip(self, box):
+        unclip_ratio = self.unclip_ratio
+        poly = Polygon(box)
+        distance = poly.area * unclip_ratio / poly.length
+        offset = pyclipper.PyclipperOffset()
+        offset.AddPath(box, pyclipper.JT_ROUND, pyclipper.ET_CLOSEDPOLYGON)
+        expanded = np.array(offset.Execute(distance))
+        return expanded
+
+    def get_mini_boxes(self, contour):
+        bounding_box = cv2.minAreaRect(contour)
+        points = sorted(list(cv2.boxPoints(bounding_box)), key=lambda x: x[0])
+
+        index_1, index_2, index_3, index_4 = 0, 1, 2, 3
+        if points[1][1] > points[0][1]:
+            index_1 = 0
+            index_4 = 1
+        else:
+            index_1 = 1
+            index_4 = 0
+        if points[3][1] > points[2][1]:
+            index_2 = 2
+            index_3 = 3
+        else:
+            index_2 = 3
+            index_3 = 2
+
+        box = [
+            points[index_1], points[index_2], points[index_3], points[index_4]
+        ]
+        return box, min(bounding_box[1])
+
+    def box_score_fast(self, bitmap, _box, classes):
+        '''
+        box_score_fast: use bbox mean score as the mean score
+        '''
+        h, w = bitmap.shape[:2]
+        box = _box.copy()
+        xmin = np.clip(np.floor(box[:, 0].min()).astype(np.int32), 0, w - 1)
+        xmax = np.clip(np.ceil(box[:, 0].max()).astype(np.int32), 0, w - 1)
+        ymin = np.clip(np.floor(box[:, 1].min()).astype(np.int32), 0, h - 1)
+        ymax = np.clip(np.ceil(box[:, 1].max()).astype(np.int32), 0, h - 1)
+
+        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
+        box[:, 0] = box[:, 0] - xmin
+        box[:, 1] = box[:, 1] - ymin
+        cv2.fillPoly(mask, box.reshape(1, -1, 2).astype(np.int32), 1)
+
+        if classes is None:
+            return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], None, None
+        else:
+            k = 999
+            class_mask = np.full((ymax - ymin + 1, xmax - xmin + 1), k, dtype=np.int32)
+
+            cv2.fillPoly(class_mask, box.reshape(1, -1, 2).astype(np.int32), 0)
+            classes = classes[ymin:ymax + 1, xmin:xmax + 1]
+
+            new_classes = classes + class_mask
+            a = new_classes.reshape(-1)
+            b = np.where(a >= k)
+            classes = np.delete(a, b[0].tolist())
+
+            class_index = np.argmax(np.bincount(classes))
+            class_score = np.sum(classes == class_index) / len(classes)
+
+            return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], class_index, class_score
+
+    def box_score_slow(self, bitmap, contour, classes):
+        """
+        box_score_slow: use polyon mean score as the mean score
+        """
+        h, w = bitmap.shape[:2]
+        contour = contour.copy()
+        contour = np.reshape(contour, (-1, 2))
+
+        xmin = np.clip(np.min(contour[:, 0]), 0, w - 1)
+        xmax = np.clip(np.max(contour[:, 0]), 0, w - 1)
+        ymin = np.clip(np.min(contour[:, 1]), 0, h - 1)
+        ymax = np.clip(np.max(contour[:, 1]), 0, h - 1)
+
+        mask = np.zeros((ymax - ymin + 1, xmax - xmin + 1), dtype=np.uint8)
+
+        contour[:, 0] = contour[:, 0] - xmin
+        contour[:, 1] = contour[:, 1] - ymin
+
+        cv2.fillPoly(mask, contour.reshape(1, -1, 2).astype(np.int32), 1)
+
+        if classes is None:
+            return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], None, None
+        else:
+            k = 999
+            class_mask = np.full((ymax - ymin + 1, xmax - xmin + 1), k, dtype=np.int32)
+
+            cv2.fillPoly(class_mask, contour.reshape(1, -1, 2).astype(np.int32), 0)
+            classes = classes[ymin:ymax + 1, xmin:xmax + 1]
+
+            new_classes = classes + class_mask
+            a = new_classes.reshape(-1)
+            b = np.where(a >= k)
+            classes = np.delete(a, b[0].tolist())
+
+            class_index = np.argmax(np.bincount(classes))
+            class_score = np.sum(classes == class_index) / len(classes)
+
+            return cv2.mean(bitmap[ymin:ymax + 1, xmin:xmax + 1], mask)[0], class_index, class_score
+
+    def __call__(self, outs_dict, shape_list):
+        pred = outs_dict['maps']
+        if isinstance(pred, paddle.Tensor):
+            pred = pred.numpy()
+        pred = pred[:, 0, :, :]
+        segmentation = pred > self.thresh
+
+        if "classes" in outs_dict:
+            classes = outs_dict['classes']
+            if isinstance(classes, paddle.Tensor):
+                classes = classes.numpy()
+            classes = classes[:, 0, :, :]
+
+        else:
+            classes = None
+
+        boxes_batch = []
+        for batch_index in range(pred.shape[0]):
+            src_h, src_w, ratio_h, ratio_w = shape_list[batch_index]
+            if self.dilation_kernel is not None:
+                mask = cv2.dilate(
+                    np.array(segmentation[batch_index]).astype(np.uint8),
+                    self.dilation_kernel)
+            else:
+                mask = segmentation[batch_index]
+
+            if classes is None:
+                boxes, scores = self.boxes_from_bitmap(pred[batch_index], mask, None,
+                                                       src_w, src_h)
+                boxes_batch.append({'points': boxes})
+            else:
+                boxes, scores, class_indexes, class_scores = self.boxes_from_bitmap(pred[batch_index], mask,
+                                                                                      classes[batch_index],
+                                                                                      src_w, src_h)
+                boxes_batch.append({'points': boxes, "classes": class_indexes, "class_scores": class_scores})
+
+        return boxes_batch
+
+

4.4. 模型启动

+

在完成上述步骤后我们就可以正常启动训练

+
!python /home/aistudio/work/PaddleOCR/tools/train.py  -c  /home/aistudio/work/PaddleOCR/configs/det/det_mv3_db.yml
+
+

其他命令:

+
!python /home/aistudio/work/PaddleOCR/tools/eval.py  -c  /home/aistudio/work/PaddleOCR/configs/det/det_mv3_db.yml
+!python /home/aistudio/work/PaddleOCR/tools/infer_det.py  -c  /home/aistudio/work/PaddleOCR/configs/det/det_mv3_db.yml
+
+

模型推理

+
!python /home/aistudio/work/PaddleOCR/tools/infer/predict_det.py --image_dir="/home/aistudio/work/test_img/" --det_model_dir="/home/aistudio/work/PaddleOCR/output/infer"
+
+

5 总结

+
    +
  1. 分类+检测在一定程度上能够缩短用时,具体的模型选取要根据业务场景恰当选择。
  2. +
  3. 数据标注需要多次进行测试调整标注方法,一般进行检测模型微调,需要标注至少上百张。
  4. +
  5. 设置合理的batch_size以及resize大小,同时注意lr设置。
  6. +
+

References

+
    +
  1. https://github.com/PaddlePaddle/PaddleOCR
  2. +
  3. https://github.com/PaddlePaddle/PaddleClas
  4. +
  5. https://blog.csdn.net/YY007H/article/details/124491217
  6. +
diff --git "a/applications/\346\211\213\345\206\231\346\226\207\345\255\227\350\257\206\345\210\253.html" "b/applications/\346\211\213\345\206\231\346\226\207\345\255\227\350\257\206\345\210\253.html" new file mode 100644 index 0000000000..597f3f2b62

手写体识别 - PaddleOCR 文档

基于PP-OCRv3的手写文字识别

+

1. 项目背景及意义

+

目前光学字符识别(OCR)技术在我们的生活当中被广泛使用,但是大多数模型在通用场景下的准确性还有待提高。针对于此我们借助飞桨提供的PaddleOCR套件较容易的实现了在垂类场景下的应用。手写体在日常生活中较为常见,然而手写体的识别却存在着很大的挑战,因为每个人的手写字体风格不一样,这对于视觉模型来说还是相当有挑战的。因此训练一个手写体识别模型具有很好的现实意义。下面给出一些手写体的示例图:

+

example

+

2. 项目内容

+

本项目基于PaddleOCR套件,以PP-OCRv3识别模型为基础,针对手写文字识别场景进行优化。

+

Aistudio项目链接:OCR手写文字识别

+

3. PP-OCRv3识别算法介绍

+

PP-OCRv3的识别模块是基于文本识别算法SVTR优化。SVTR不再采用RNN结构,通过引入Transformers结构更加有效地挖掘文本行图像的上下文信息,从而提升文本识别能力。如下图所示,PP-OCRv3采用了6个优化策略。

+

v3_rec

+

优化策略汇总如下:

+
- SVTR_LCNet:轻量级文本识别网络
- GTC:Attention指导CTC训练策略
- TextConAug:挖掘文字上下文信息的数据增广策略
- TextRotNet:自监督的预训练模型
- UDML:联合互学习策略
- UIM:无标注数据挖掘方案

详细优化策略描述请参考PP-OCRv3优化策略

+

4. 安装环境

+
# 首先git官方的PaddleOCR项目,安装需要的依赖
+git clone https://github.com/PaddlePaddle/PaddleOCR.git
+cd PaddleOCR
+pip install -r requirements.txt
+
+

5. 数据准备

+

本项目使用公开的手写文本识别数据集,包含Chinese OCR, 中科院自动化研究所-手写中文数据集CASIA-HWDB2.x,以及由中科院手写数据和网上开源数据合并组合的数据集等,该项目已经挂载处理好的数据集,可直接下载使用进行训练。

+
# 下载并解压数据
+tar -xf hw_data.tar
+
+

6. 模型训练

+

6.1 下载预训练模型

+

首先需要下载我们需要的PP-OCRv3识别预训练模型,更多选择请自行选择其他的文字识别模型

+
# 使用该指令下载需要的预训练模型
+wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
+# 解压预训练模型文件
+tar -xf ./pretrained_models/ch_PP-OCRv3_rec_train.tar -C pretrained_models
+
+

6.2 修改配置文件

+

我们使用configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml,主要修改训练轮数和学习率等相关参数,设置预训练模型路径和数据集路径。另外,batch_size可根据自己机器显存大小进行调整。具体修改如下几个地方:

+
  epoch_num: 100 # 训练epoch数
+  save_model_dir: ./output/ch_PP-OCR_v3_rec
+  save_epoch_step: 10
+  eval_batch_step: [0, 100] # 评估间隔,每隔100step评估一次
+  pretrained_model: ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy  # 预训练模型路径
+
+
+  lr:
+    name: Cosine # 修改学习率衰减策略为Cosine
+    learning_rate: 0.0001 # 修改fine-tune的学习率
+    warmup_epoch: 2 # 修改warmup轮数
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data # 训练集图片路径
+    ext_op_transform_idx: 1
+    label_file_list:
+    - ./train_data/chineseocr-data/rec_hand_line_all_label_train.txt # 训练集标签
+    - ./train_data/handwrite/HWDB2.0Train_label.txt
+    - ./train_data/handwrite/HWDB2.1Train_label.txt
+    - ./train_data/handwrite/HWDB2.2Train_label.txt
+    - ./train_data/handwrite/hwdb_ic13/handwriting_hwdb_train_labels.txt
+    - ./train_data/handwrite/HW_Chinese/train_hw.txt
+    ratio_list:
+    - 0.1
+    - 1.0
+    - 1.0
+    - 1.0
+    - 0.02
+    - 1.0
+  loader:
+    shuffle: true
+    batch_size_per_card: 64
+    drop_last: true
+    num_workers: 4
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data # 测试集图片路径
+    label_file_list:
+    - ./train_data/chineseocr-data/rec_hand_line_all_label_val.txt # 测试集标签
+    - ./train_data/handwrite/HWDB2.0Test_label.txt
+    - ./train_data/handwrite/HWDB2.1Test_label.txt
+    - ./train_data/handwrite/HWDB2.2Test_label.txt
+    - ./train_data/handwrite/hwdb_ic13/handwriting_hwdb_val_labels.txt
+    - ./train_data/handwrite/HW_Chinese/test_hw.txt
+  loader:
+    shuffle: false
+    drop_last: false
+    batch_size_per_card: 64
+    num_workers: 4
+
+

由于数据集大多是长文本,因此需要注释掉下面的数据增广策略,以便训练出更好的模型。

+
- RecConAug:
+    prob: 0.5
+    ext_data_num: 2
+    image_shape: [48, 320, 3]
+
+

6.3 开始训练

+

我们使用上面修改好的配置文件configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml,预训练模型,数据集路径,学习率,训练轮数等都已经设置完毕后,可以使用下面命令开始训练。

+
# 开始训练识别模型
+python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
+
+

7. 模型评估

+

在训练之前,我们可以直接使用下面命令来评估预训练模型的效果:

+
# 评估预训练模型
+python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"
+
+
[2022/07/14 10:46:22] ppocr INFO: load pretrain successful from ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy
+eval model:: 100%|████████████████████████████| 687/687 [03:29<00:00,  3.27it/s]
+[2022/07/14 10:49:52] ppocr INFO: metric eval ***************
+[2022/07/14 10:49:52] ppocr INFO: acc:0.03724954461811258
+[2022/07/14 10:49:52] ppocr INFO: norm_edit_dis:0.4859541065843199
+[2022/07/14 10:49:52] ppocr INFO: Teacher_acc:0.0371584699368947
+[2022/07/14 10:49:52] ppocr INFO: Teacher_norm_edit_dis:0.48718814890536477
+[2022/07/14 10:49:52] ppocr INFO: fps:947.8562684823883
+
+

可以看出,直接加载预训练模型进行评估,效果较差,因为预训练模型并不是基于手写文字进行单独训练的,所以我们需要基于预训练模型进行finetune。训练完成后,可以进行测试评估,评估命令如下:

+
# 评估finetune效果
+python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy"
+
+

评估结果如下,可以看出识别准确率为54.3%。

+
[2022/07/14 10:54:06] ppocr INFO: metric eval ***************
+[2022/07/14 10:54:06] ppocr INFO: acc:0.5430100180913
+[2022/07/14 10:54:06] ppocr INFO: norm_edit_dis:0.9203322593158589
+[2022/07/14 10:54:06] ppocr INFO: Teacher_acc:0.5401183969626324
+[2022/07/14 10:54:06] ppocr INFO: Teacher_norm_edit_dis:0.919827504507755
+[2022/07/14 10:54:06] ppocr INFO: fps:928.948733797251
+
+

将训练完成的模型放置在对应目录下即可完成模型推理

+

8. 模型导出推理

+

训练完成后,可以将训练模型转换成inference模型。inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。

+

8.1 模型导出

+

导出命令如下:

+
# 转化为推理模型
+python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"
+
+

8.2 模型推理

+

导出模型后,可以使用如下命令进行推理预测:

+
# 推理预测
+python tools/infer/predict_rec.py --image_dir="train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"
+
+
[2022/07/14 10:55:56] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
+[2022/07/14 10:55:58] ppocr INFO: Predicts of train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg:('品结构,差异化的多品牌渗透使欧莱雅确立了其在中国化妆', 0.9904912114143372)
+
+
# 可视化文字识别图片
+from PIL import Image
+import matplotlib.pyplot as plt
+import numpy as np
+import os
+
+
+img_path = 'train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg'
+
+def vis(img_path):
+    plt.figure()
+    image = Image.open(img_path)
+    plt.imshow(image)
+    plt.show()
+    # image = image.resize([208, 208])
+
+
+vis(img_path)
+
+

res

diff --git "a/applications/\346\211\253\346\217\217\345\220\210\345\220\214\345\205\263\351\224\256\344\277\241\346\201\257\346\217\220\345\217\226.html" "b/applications/\346\211\253\346\217\217\345\220\210\345\220\214\345\205\263\351\224\256\344\277\241\346\201\257\346\217\220\345\217\226.html"

金融智能核验:扫描合同关键信息抽取

+

本案例将使用OCR技术和通用信息抽取技术,实现合同关键信息审核和比对。通过本章的学习,你可以快速掌握:

+
1. 使用PaddleOCR提取扫描文本内容
2. 使用PaddleNLP抽取自定义信息

点击进入 AI Studio 项目

+

1. 项目背景

+

合同审核广泛应用于大中型企业、上市公司、证券、基金公司中,是规避风险的重要任务。

+
- 合同内容对比:合同审核场景中,快速找出不同版本合同修改区域、版本差异;如合同盖章归档场景中有效识别实际签署的纸质合同、电子版合同差异。
- 合规性检查:法务人员进行合同审核,如合同完备性检查、大小写金额检查、签约主体一致性检查、双方权利和义务对等性分析等。
- 风险点识别:通过合同审核可识别事实倾向型风险点和数值计算型风险点等,例如交付地点约定不明、合同总价款不一致、重要条款缺失等风险点。

+

传统业务中大多使用人工进行纸质版合同审核,存在成本高,工作量大,效率低的问题,且一旦出错将造成巨额损失。

+

本项目针对以上场景,使用PaddleOCR+PaddleNLP快速提取文本内容,经过少量数据微调即可准确抽取关键信息,高效完成合同内容对比、合规性检查、风险点识别等任务,提高效率,降低风险

+

+

2. 解决方案

+

2.1 扫描合同文本内容提取

+

使用PaddleOCR开源的模型可以快速完成扫描文档的文本内容提取,在清晰文档上识别准确率可达到95%+。下面来快速体验一下:

+

2.1.1 环境准备

+

PaddleOCR提供了适用于通用场景的高精轻量模型,提供数据预处理-模型推理-后处理全流程,支持pip安装:

+
python -m pip install paddleocr
+
+

2.1.2 效果测试

+

使用一张合同图片作为测试样本,感受ppocrv3模型效果:

+

+

使用中文检测+识别模型提取文本,实例化PaddleOCR类:

+
from paddleocr import PaddleOCR, draw_ocr
+
+# paddleocr目前支持中英文、英文、法语、德语、韩语、日语等80个语种,可以通过修改lang参数进行切换
+ocr = PaddleOCR(use_angle_cls=False, lang="ch")  # need to run only once to download and load model into memory
+
+

一行命令启动预测,预测结果包括检测框文本识别内容:

+
img_path = "./test_img/hetong2.jpg"
+result = ocr.ocr(img_path, cls=False)
+for line in result:
+    print(line)
+
+# 可视化结果
+from PIL import Image
+
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.show()
+
+

2.1.3 图片预处理

+

通过上图可视化结果可以看到,印章部分造成的文本遮盖,影响了文本识别结果。由于红色印章在R(红色)通道中的亮度接近纸张背景,因此可以考虑通道分离,取红色通道来弱化印章对文字的干扰:

+
import cv2
+import numpy as np
+import matplotlib.pyplot as plt
+
+#读入图像,三通道
+image=cv2.imread("./test_img/hetong2.jpg",cv2.IMREAD_COLOR) #timg.jpeg
+
+#获得三个通道
+Bch,Gch,Rch=cv2.split(image)
+
+#保存三通道图片
+cv2.imwrite('blue_channel.jpg',Bch)
+cv2.imwrite('green_channel.jpg',Gch)
+cv2.imwrite('red_channel.jpg',Rch)
+
+

2.1.4 合同文本信息提取

+

经过2.1.3的预处理后,合同照片的红色通道被分离,获得了一张相对更干净的图片,此时可以再次使用ppocr模型提取文本内容:

+
import numpy as np
+import cv2
+
+
+img_path = './red_channel.jpg'
+result = ocr.ocr(img_path, cls=False)
+
+# 可视化结果
+from PIL import Image
+
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
+im_show = Image.fromarray(im_show)
+vis = np.array(im_show)
+im_show.show()
+
+

忽略检测框内容,提取完整的合同文本:

+
txts = [line[1][0] for line in result]
+all_context = "\n".join(txts)
+print(all_context)
+
+

通过以上环节就完成了扫描合同关键信息抽取的第一步:文本内容提取,接下来可以基于识别出的文本内容抽取关键信息

+

2.2 合同关键信息抽取

+

2.2.1 环境准备

+

安装PaddleNLP

+
pip install --upgrade pip
+pip install --upgrade paddlenlp
+
+

2.2.2 合同关键信息抽取

+

PaddleNLP 使用 Taskflow 统一管理多场景任务的预测功能,其中information_extraction 通过大量的有标签样本进行训练,在通用的场景中一般可以直接使用,只需更换关键字即可。例如在合同信息抽取中,我们重新定义抽取关键字:

+

甲方、乙方、币种、金额、付款方式

+

将使用OCR提取好的文本作为输入,使用三行命令可以对上文中提取到的合同文本进行关键信息抽取:

+
from paddlenlp import Taskflow
+schema = ["甲方","乙方","总价"]
+ie = Taskflow('information_extraction', schema=schema)
+ie.set_schema(schema)
+ie(all_context)
+
+

可以看到UIE模型可以准确的提取出关键信息,用于后续的信息比对或审核。

+
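在此基础上,前文提到的合同内容对比可以通过比较两份合同抽取到的关键信息来实现。下面是一个极简的比对示意(假设 result_a、result_b 分别为对两个版本合同文本调用 ie() 得到的结果,字段格式以实际返回为准,变量名仅作示例):

schema = ["甲方", "乙方", "总价"]

def extract_fields(uie_result, keys):
    # 取每个字段概率最高的一条抽取结果,缺失时记为 None
    fields = {}
    for key in keys:
        items = uie_result[0].get(key, [])
        fields[key] = items[0]['text'] if items else None
    return fields

fields_a = extract_fields(result_a, schema)
fields_b = extract_fields(result_b, schema)
for key in schema:
    if fields_a[key] != fields_b[key]:
        print("字段[{}]存在差异:{} vs {}".format(key, fields_a[key], fields_b[key]))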

3.效果优化

+

3.1 文本识别后处理调优

+

实际图片采集过程中,可能出现部分图片弯曲等问题,导致使用默认参数识别文本时存在漏检,影响关键信息获取。

+

例如下图:

+

+

直接进行预测:

+
img_path = "./test_img/hetong3.jpg"
+# 预测结果
+result = ocr.ocr(img_path, cls=False)
+# 可视化结果
+from PIL import Image
+
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.show()
+
+

可视化结果可以看到,弯曲图片存在漏检,一般来说可以通过调整后处理参数解决,无需重新训练模型。漏检问题往往是因为检测模型获得的分割图太小,生成框的得分过低被过滤掉了,通常有两种方式调整参数:

+
- 开启 use_dilation=True 膨胀分割区域
- 调小 det_db_box_thresh 阈值
# 重新实例化 PaddleOCR
+ocr = PaddleOCR(use_angle_cls=False, lang="ch", det_db_box_thresh=0.3, use_dilation=True)
+
+# 预测并可视化
+img_path = "./test_img/hetong3.jpg"
+# 预测结果
+result = ocr.ocr(img_path, cls=False)
+# 可视化结果
+image = Image.open(img_path).convert('RGB')
+boxes = [line[0] for line in result]
+txts = [line[1][0] for line in result]
+scores = [line[1][1] for line in result]
+im_show = draw_ocr(image, boxes, txts, scores, font_path='./simfang.ttf')
+im_show = Image.fromarray(im_show)
+im_show.show()
+
+

可以看到漏检问题被很好的解决,提取完整的文本内容:

+
txts = [line[1][0] for line in result]
+context = "\n".join(txts)
+print(context)
+
+

3.2 关键信息提取调优

+

UIE通过大量有标签样本进行训练,得到了一个开箱即用的高精模型。 然而针对不同场景,可能会出现部分实体无法被抽取的情况。通常来说有以下几个方法进行效果调优:

+
- 修改 schema
- 添加正则方法(示例见下文)
- 标注小样本微调模型

修改schema

+

Prompt和原文描述越像,抽取效果越好,例如

+
三:合同价格:总价为人民币大写:参拾玖万捌仟伍佰
+元,小写:398500.00元。总价中包括站房工程建设、安装
+及相关避雷、消防、接地、电力、材料费、检验费、安全、
+验收等所需费用及其他相关费用和税金。
+
+

schema = ["总金额"] 时无法准确抽取,与原文描述差异较大。 修改 schema = ["总价"] 再次尝试:

+
from paddlenlp import Taskflow
+# schema = ["总金额"]
+schema = ["总价"]
+ie = Taskflow('information_extraction', schema=schema)
+ie.set_schema(schema)
+ie(all_context)
+
+
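添加正则方法:对金额、日期等格式相对固定的字段,也可以在UIE抽取之外用正则表达式从OCR文本中兜底提取。下面是一个极简示意(规则仅作参考,需根据实际合同文本调整):

import re

def extract_amount(text):
    # 匹配形如"小写:398500.00元"的小写金额
    m = re.search(r'小写[::]?\s*([0-9,]+(?:\.[0-9]{1,2})?)\s*元', text)
    return m.group(1) if m else None

print(extract_amount(all_context))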

模型微调

UIE的建模方式主要是通过 Prompt 方式来建模, Prompt 在小样本上进行微调效果非常有效。详细的数据标注+模型微调步骤可以参考项目:

+

PaddleNLP信息抽取技术重磅升级!

+

工单信息抽取

+

快递单信息抽取

+

总结

+

扫描合同的关键信息提取可以使用 PaddleOCR + PaddleNLP 组合实现,两个工具均有以下优势:

+
- 使用简单:whl包一键安装,3行命令调用
- 效果领先:优秀的模型效果可覆盖几乎全部的应用场景
- 调优成本低:OCR模型可通过后处理参数的调整适配略有偏差的扫描文本,UIE模型可以通过极少的标注样本微调,成本很低。

作业

+

尝试自己解析出 test_img/homework.png 扫描合同中的 [甲方、乙方] 关键词:

+

diff --git "a/applications/\346\266\262\346\231\266\345\261\217\350\257\273\346\225\260\350\257\206\345\210\253.html" "b/applications/\346\266\262\346\231\266\345\261\217\350\257\273\346\225\260\350\257\206\345\210\253.html"

基于PP-OCRv3的液晶屏读数识别

+

1. 项目背景及意义

+

目前光学字符识别(OCR)技术在我们的生活当中被广泛使用,但是大多数模型在通用场景下的准确性还有待提高,针对于此我们借助飞桨提供的PaddleOCR套件较容易的实现了在垂类场景下的应用。

+

该项目以国家质量基础(NQI)为准绳,充分利用大数据、云计算、物联网等高新技术,构建覆盖计量端、实验室端、数据端和硬件端的完整计量解决方案,解决传统计量校准中存在的难题,拓宽计量检测服务体系和服务领域;解决无数传接口或数传接口不统一、不公开的计量设备,以及计量设备所处的环境比较恶劣,不适合人工读取数据。通过OCR技术实现远程计量,引领计量行业向智慧计量转型和发展。

+

2. 项目内容

+

本项目基于PaddleOCR开源套件,以PP-OCRv3检测和识别模型为基础,针对液晶屏读数识别场景进行优化。

+

Aistudio项目链接:OCR液晶屏读数识别

+

3. 安装环境

+
# 首先git官方的PaddleOCR项目,安装需要的依赖
+# 第一次运行打开该注释
+# git clone https://gitee.com/PaddlePaddle/PaddleOCR.git
+cd PaddleOCR
+pip install -r requirements.txt
+
+

4. 文字检测

+

文本检测的任务是定位出输入图像中的文字区域。近年来学术界关于文本检测的研究非常丰富,一类方法将文本检测视为目标检测中的一个特定场景,基于通用目标检测算法进行改进适配,如TextBoxes[1]基于一阶段目标检测器SSD[2]算法,调整目标框使之适合极端长宽比的文本行,CTPN[3]则是基于Faster RCNN[4]架构改进而来。但是文本检测与目标检测在目标信息以及任务本身上仍存在一些区别,如文本一般长宽比较大,往往呈“条状”,文本行之间可能比较密集,弯曲文本等,因此又衍生了很多专用于文本检测的算法。本项目基于PP-OCRv3算法进行优化。

+

4.1 PP-OCRv3检测算法介绍

+

PP-OCRv3检测模型是对PP-OCRv2中的CML(Collaborative Mutual Learning) 协同互学习文本检测蒸馏策略进行了升级。如下图所示,CML的核心思想结合了①传统的Teacher指导Student的标准蒸馏与 ②Students网络之间的DML互学习,可以让Students网络互学习的同时,Teacher网络予以指导。PP-OCRv3分别针对教师模型和学生模型进行进一步效果优化。其中,在对教师模型优化时,提出了大感受野的PAN结构LK-PAN和引入了DML(Deep Mutual Learning)蒸馏策略;在对学生模型优化时,提出了残差注意力机制的FPN结构RSE-FPN。 +

+

详细优化策略描述请参考PP-OCRv3优化策略

+

4.2 数据准备

+

计量设备屏幕字符检测数据集数据来源于实际项目中各种计量设备的数显屏,以及在网上搜集的一些其他数显屏,包含训练集755张,测试集355张。

+
# 在PaddleOCR下创建新的文件夹train_data
+mkdir train_data
+# 下载数据集并解压到指定路径下
+unzip icdar2015.zip  -d train_data
+
+
# 随机查看文字检测数据集图片
+from PIL import Image
+import matplotlib.pyplot as plt
+import numpy as np
+import os
+
+
+train = './train_data/icdar2015/text_localization/test'
+# 从指定目录中选取一张图片
+def get_one_image(train):
+    plt.figure()
+    files = os.listdir(train)
+    n = len(files)
+    ind = np.random.randint(0,n)
+    img_dir = os.path.join(train,files[ind])
+    image = Image.open(img_dir)
+    plt.imshow(image)
+    plt.show()
+    image = image.resize([208, 208])
+
+get_one_image(train)
+
+

det_png

+

4.3 模型训练

+

4.3.1 预训练模型直接评估

+

下载我们需要的PP-OCRv3检测预训练模型,更多选择请自行选择其他的文字检测模型

+
#使用该指令下载需要的预训练模型
+wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
+# 解压预训练模型文件
+tar -xf ./pretrained_models/ch_PP-OCRv3_det_distill_train.tar -C pretrained_models
+
+

在训练之前,我们可以直接使用下面命令来评估预训练模型的效果:

+
# 评估预训练模型
+python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy"
+
+

结果如下:

| 序号 | 方案 | hmeans |
| --- | --- | --- |
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 | 47.50% |

4.3.2 预训练模型直接finetune

+
修改配置文件
+

我们使用configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml,主要修改训练轮数和学习率等相关参数,设置预训练模型路径和数据集路径。另外,batch_size可根据自己机器显存大小进行调整。具体修改如下几个地方:

+
epoch:100
+save_epoch_step:10
+eval_batch_step:[0, 50]
+save_model_dir: ./output/ch_PP-OCR_v3_det/
+pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy
+learning_rate: 0.00025
+num_workers: 0 # 如果单卡训练,建议将Train和Eval的loader部分的num_workers设置为0,否则会出现`/dev/shm insufficient`的报错
+
+
开始训练
+

使用我们上面修改的配置文件configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml,训练命令如下:

+
# 开始训练模型
+python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy
+
+

评估训练好的模型:

+
# 评估训练好的模型
+python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det/best_accuracy"
+
+

结果如下:

| 序号 | 方案 | hmeans |
| --- | --- | --- |
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 | 47.50% |
| 1 | PP-OCRv3中英文超轻量检测预训练模型finetune | 65.20% |

4.3.3 基于预训练模型Finetune_student模型

+

我们使用configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml,主要修改训练轮数和学习率等相关参数,设置预训练模型路径和数据集路径。另外,batch_size可根据自己机器显存大小进行调整。具体修改如下几个地方:

+
epoch:100
+save_epoch_step:10
+eval_batch_step:[0, 50]
+save_model_dir: ./output/ch_PP-OCR_v3_det_student/
+pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/student
+learning_rate: 0.00025
+num_workers: 0 # 如果单卡训练,建议将Train和Eval的loader部分的num_workers设置为0,否则会出现`/dev/shm insufficient`的报错
+
+

训练命令如下:

+
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/student
+
+

评估训练好的模型:

+
# 评估训练好的模型
+python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_student/best_accuracy"
+
+

结果如下:

| 序号 | 方案 | hmeans |
| --- | --- | --- |
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 | 47.50% |
| 1 | PP-OCRv3中英文超轻量检测预训练模型finetune | 65.20% |
| 2 | PP-OCRv3中英文超轻量检测预训练模型finetune学生模型 | 80.00% |

4.3.4 基于预训练模型Finetune_teacher模型

+

首先需要从提供的预训练模型best_accuracy.pdparams中提取teacher参数,组合成适合dml训练的初始化模型,提取代码如下:

+
cd ./pretrained_models/
+# transform teacher params in best_accuracy.pdparams into teacher_dml.paramers
+import paddle
+
+# load pretrained model
+all_params = paddle.load("ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
+# print(all_params.keys())
+
+# keep teacher params
+t_params = {key[len("Teacher."):]: all_params[key] for key in all_params if "Teacher." in key}
+
+# print(t_params.keys())
+
+s_params = {"Student." + key: t_params[key] for key in t_params}
+s2_params = {"Student2." + key: t_params[key] for key in t_params}
+s_params = {**s_params, **s2_params}
+# print(s_params.keys())
+
+paddle.save(s_params, "ch_PP-OCRv3_det_distill_train/teacher_dml.pdparams")
+
+

我们使用configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml,主要修改训练轮数和学习率等相关参数,设置预训练模型路径和数据集路径。另外,batch_size可根据自己机器显存大小进行调整。具体修改如下几个地方:

+
epoch:100
+save_epoch_step:10
+eval_batch_step:[0, 50]
+save_model_dir: ./output/ch_PP-OCR_v3_det_teacher/
+pretrained_model: ./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml
+learning_rate: 0.00025
+num_workers: 0 # 如果单卡训练,建议将Train和Eval的loader部分的num_workers设置为0,否则会出现`/dev/shm insufficient`的报错
+
+

训练命令如下:

+
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_dml
+
+

评估训练好的模型:

+
# 评估训练好的模型
+python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_dml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_teacher/best_accuracy"
+
+

结果如下:

| 序号 | 方案 | hmeans |
| --- | --- | --- |
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 | 47.50% |
| 1 | PP-OCRv3中英文超轻量检测预训练模型finetune | 65.20% |
| 2 | PP-OCRv3中英文超轻量检测预训练模型finetune学生模型 | 80.00% |
| 3 | PP-OCRv3中英文超轻量检测预训练模型finetune教师模型 | 84.80% |

4.3.5 采用CML蒸馏进一步提升student模型精度

+

需要从4.3.3和4.3.4训练得到的best_accuracy.pdparams中提取各自代表student和teacher的参数,组合成适合cml训练的初始化模型,提取代码如下:

+
# transform teacher params and student parameters into cml model
+import paddle
+
+all_params = paddle.load("./pretrained_models/ch_PP-OCRv3_det_distill_train/best_accuracy.pdparams")
+# print(all_params.keys())
+
+t_params = paddle.load("./output/ch_PP-OCR_v3_det_teacher/best_accuracy.pdparams")
+# print(t_params.keys())
+
+s_params = paddle.load("./output/ch_PP-OCR_v3_det_student/best_accuracy.pdparams")
+# print(s_params.keys())
+
+for key in all_params:
+    # teacher is OK
+    if "Teacher." in key:
+        new_key = key.replace("Teacher", "Student")
+        #print("{} >> {}\n".format(key, new_key))
+        assert all_params[key].shape == t_params[new_key].shape
+        all_params[key] = t_params[new_key]
+
+    if "Student." in key:
+        new_key = key.replace("Student.", "")
+        #print("{} >> {}\n".format(key, new_key))
+        assert all_params[key].shape == s_params[new_key].shape
+        all_params[key] = s_params[new_key]
+
+    if "Student2." in key:
+        new_key = key.replace("Student2.", "")
+        print("{} >> {}\n".format(key, new_key))
+        assert all_params[key].shape == s_params[new_key].shape
+        all_params[key] = s_params[new_key]
+
+paddle.save(all_params, "./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student.pdparams")
+
+

训练命令如下:

+
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model=./pretrained_models/ch_PP-OCRv3_det_distill_train/teacher_cml_student Global.save_model_dir=./output/ch_PP-OCR_v3_det_finetune/
+
+

评估训练好的模型:

+
# 评估训练好的模型
+python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_det_finetune/best_accuracy"
+
+

结果如下:

| 序号 | 方案 | hmeans |
| --- | --- | --- |
| 0 | PP-OCRv3中英文超轻量检测预训练模型直接预测 | 47.50% |
| 1 | PP-OCRv3中英文超轻量检测预训练模型finetune | 65.20% |
| 2 | PP-OCRv3中英文超轻量检测预训练模型finetune学生模型 | 80.00% |
| 3 | PP-OCRv3中英文超轻量检测预训练模型finetune教师模型 | 84.80% |
| 4 | 基于2和3训练好的模型finetune | 82.70% |

将训练完成的模型放置在对应目录下即可完成模型推理

+

4.3.6 模型导出推理

+

训练完成后,可以将训练模型转换成inference模型。inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。

+
4.3.6.1 模型导出
+

导出命令如下:

+
# 转化为推理模型
+python tools/export_model.py \
+-c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_cml.yml \
+-o Global.pretrained_model=./output/ch_PP-OCR_v3_det_finetune/best_accuracy \
+-o Global.save_inference_dir="./inference/det_ppocrv3"
+
+
4.3.6.2 模型推理
+

导出模型后,可以使用如下命令进行推理预测:

+
# 推理预测
+python tools/infer/predict_det.py --image_dir="train_data/icdar2015/text_localization/test/1.jpg" --det_model_dir="./inference/det_ppocrv3/Student"
+
+

5. 文字识别

+

文本识别的任务是识别出图像中的文字内容,一般输入来自于文本检测得到的文本框截取出的图像文字区域。文本识别一般可以根据待识别文本形状分为规则文本识别和不规则文本识别两大类。规则文本主要指印刷字体、扫描文本等,文本大致处在水平线位置;不规则文本往往不在水平位置,存在弯曲、遮挡、模糊等问题。不规则文本场景具有很大的挑战性,也是目前文本识别领域的主要研究方向。本项目基于PP-OCRv3算法进行优化。

+

5.1 PP-OCRv3识别算法介绍

+

PP-OCRv3的识别模块是基于文本识别算法SVTR优化。SVTR不再采用RNN结构,通过引入Transformers结构更加有效地挖掘文本行图像的上下文信息,从而提升文本识别能力。如下图所示,PP-OCRv3采用了6个优化策略。 +

+

优化策略汇总如下:

+
- SVTR_LCNet:轻量级文本识别网络
- GTC:Attention指导CTC训练策略
- TextConAug:挖掘文字上下文信息的数据增广策略
- TextRotNet:自监督的预训练模型
- UDML:联合互学习策略
- UIM:无标注数据挖掘方案

详细优化策略描述请参考PP-OCRv3优化策略

+

5.2 数据准备

+

计量设备屏幕字符识别数据集数据来源于实际项目中各种计量设备的数显屏,以及在网上搜集的一些其他数显屏,包含训练集19912张,测试集4099张。

+
# 解压下载的数据集到指定路径下
+unzip ic15_data.zip -d train_data
+
+
# 随机查看文字检测数据集图片
+from PIL import Image
+import matplotlib.pyplot as plt
+import numpy as np
+import os
+
+train = './train_data/ic15_data/train'
+# 从指定目录中选取一张图片
+def get_one_image(train):
+    plt.figure()
+    files = os.listdir(train)
+    n = len(files)
+    ind = np.random.randint(0,n)
+    img_dir = os.path.join(train,files[ind])
+    image = Image.open(img_dir)
+    plt.imshow(image)
+    plt.show()
+    image = image.resize([208, 208])
+
+get_one_image(train)
+
+

rec_png

+

5.3 模型训练

+

下载预训练模型

+

下载我们需要的PP-OCRv3识别预训练模型,更多选择请自行选择其他的文字识别模型

+
# 使用该指令下载需要的预训练模型
+wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
+# 解压预训练模型文件
+tar -xf ./pretrained_models/ch_PP-OCRv3_rec_train.tar -C pretrained_models
+
+

修改配置文件

+

我们使用configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml,主要修改训练轮数和学习率等相关参数,设置预训练模型路径和数据集路径。另外,batch_size可根据自己机器显存大小进行调整。具体修改如下几个地方:

+
  epoch_num: 100 # 训练epoch数
+  save_model_dir: ./output/ch_PP-OCR_v3_rec
+  save_epoch_step: 10
+  eval_batch_step: [0, 100] # 评估间隔,每隔100step评估一次
+  cal_metric_during_train: true
+  pretrained_model: ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy  # 预训练模型路径
+  character_dict_path: ppocr/utils/ppocr_keys_v1.txt
+  use_space_char: true  # 使用空格
+
+  lr:
+    name: Cosine # 修改学习率衰减策略为Cosine
+    learning_rate: 0.0002 # 修改fine-tune的学习率
+    warmup_epoch: 2 # 修改warmup轮数
+
+Train:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/ic15_data/ # 训练集图片路径
+    ext_op_transform_idx: 1
+    label_file_list:
+    - ./train_data/ic15_data/rec_gt_train.txt # 训练集标签
+    ratio_list:
+    - 1.0
+  loader:
+    shuffle: true
+    batch_size_per_card: 64
+    drop_last: true
+    num_workers: 4
+Eval:
+  dataset:
+    name: SimpleDataSet
+    data_dir: ./train_data/ic15_data/ # 测试集图片路径
+    label_file_list:
+    - ./train_data/ic15_data/rec_gt_test.txt # 测试集标签
+    ratio_list:
+    - 1.0
+  loader:
+    shuffle: false
+    drop_last: false
+    batch_size_per_card: 64
+    num_workers: 4
+
+

在训练之前,我们可以直接使用下面命令来评估预训练模型的效果:

+
# 评估预训练模型
+python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"
+
+

结果如下:

| 序号 | 方案 | accuracy |
| --- | --- | --- |
| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 | 70.40% |

开始训练

+

我们使用上面修改好的配置文件configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml,预训练模型,数据集路径,学习率,训练轮数等都已经设置完毕后,可以使用下面命令开始训练。

+
# 开始训练识别模型
+python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
+
+

训练完成后,可以对训练模型中最好的进行测试,评估命令如下:

+
# 评估finetune效果
+python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.checkpoints="./output/ch_PP-OCR_v3_rec/best_accuracy"
+
+

结果如下:

| 序号 | 方案 | accuracy |
| --- | --- | --- |
| 0 | PP-OCRv3中英文超轻量识别预训练模型直接预测 | 70.40% |
| 1 | PP-OCRv3中英文超轻量识别预训练模型finetune | 82.20% |

如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁

+

+

将下载或训练完成的模型放置在对应目录下即可完成模型推理。

+

5.4 模型导出推理

+

训练完成后,可以将训练模型转换成inference模型。inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。

+

模型导出

+

导出命令如下:

+
# 转化为推理模型
+python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"
+
+

模型推理

+

导出模型后,可以使用如下命令进行推理预测

+
# 推理预测
+python tools/infer/predict_rec.py --image_dir="train_data/ic15_data/test/1_crop_0.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"
+
+

6. 系统串联

+

我们将上面训练好的检测和识别模型进行系统串联测试,命令如下:

+
#串联测试
+python3 tools/infer/predict_system.py --image_dir="./train_data/icdar2015/text_localization/test/142.jpg" --det_model_dir="./inference/det_ppocrv3/Student"  --rec_model_dir="./inference/rec_ppocrv3/Student"
+
+

测试结果保存在./inference_results/目录下,可以用下面代码进行可视化

+
%cd /home/aistudio/PaddleOCR
+# 显示结果
+import matplotlib.pyplot as plt
+from PIL import Image
+img_path= "./inference_results/142.jpg"
+img = Image.open(img_path)
+plt.figure("test_img", figsize=(30,30))
+plt.imshow(img)
+plt.show()
+
+

sys_res_png

+

6.1 后处理

+

如果需要获取key-value信息,可以基于启发式的规则,将识别结果与关键字库进行匹配;如果匹配上了,则取该字段为key, 后面一个字段为value。

+
def postprocess(rec_res):
+    keys = ["型号", "厂家", "版本号", "检定校准分类", "计量器具编号", "烟尘流量",
+            "累积体积", "烟气温度", "动压", "静压", "时间", "试验台编号", "预测流速",
+            "全压", "烟温", "流速", "工况流量", "标杆流量", "烟尘直读嘴", "烟尘采样嘴",
+            "大气压", "计前温度", "计前压力", "干球温度", "湿球温度", "流量", "含湿量"]
+    key_value = []
+    if len(rec_res) > 1:
+        for i in range(len(rec_res) - 1):
+            rec_str, _ = rec_res[i]
+            for key in keys:
+                if rec_str in key:
+                    key_value.append([rec_str, rec_res[i + 1][0]])
+                    break
+    return key_value
+key_value = postprocess(filter_rec_res)
+
+
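上面代码中的 filter_rec_res 为系统串联预测得到的识别结果列表,每个元素形如(文本, 置信度)。下面给出一个构造该列表的极简示意(drop_score 等均为示例值,可按业务需要调整):

# 假设 rec_res 为串联预测返回的识别结果列表 [(text, score), ...]
drop_score = 0.5
filter_rec_res = [(text, score) for text, score in rec_res if score >= drop_score]
key_value = postprocess(filter_rec_res)
print(key_value)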

7. PaddleServing部署

+

首先需要安装PaddleServing部署相关的环境

+
python -m pip install paddle-serving-server-gpu
+python -m pip install paddle_serving_client
+python -m pip install paddle-serving-app
+
+

7.1 转化检测模型

+
cd deploy/pdserving/
+python -m paddle_serving_client.convert --dirname ../../inference/det_ppocrv3/Student/  \
+                                         --model_filename inference.pdmodel          \
+                                         --params_filename inference.pdiparams       \
+                                         --serving_server ./ppocr_det_v3_serving/ \
+                                         --serving_client ./ppocr_det_v3_client/
+
+

7.2 转化识别模型

+
python -m paddle_serving_client.convert --dirname ../../inference/rec_ppocrv3/Student \
+                                         --model_filename inference.pdmodel          \
+                                         --params_filename inference.pdiparams       \
+                                         --serving_server ./ppocr_rec_v3_serving/ \
+                                         --serving_client ./ppocr_rec_v3_client/
+
+

7.3 启动服务

+

首先可以将后处理代码加入到web_service.py中,具体修改如下:

+
# 代码153行后面增加下面代码
+def _postprocess(rec_res):
+    keys = ["型号", "厂家", "版本号", "检定校准分类", "计量器具编号", "烟尘流量",
+            "累积体积", "烟气温度", "动压", "静压", "时间", "试验台编号", "预测流速",
+            "全压", "烟温", "流速", "工况流量", "标杆流量", "烟尘直读嘴", "烟尘采样嘴",
+            "大气压", "计前温度", "计前压力", "干球温度", "湿球温度", "流量", "含湿量"]
+    key_value = []
+    if len(rec_res) > 1:
+        for i in range(len(rec_res) - 1):
+            rec_str, _ = rec_res[i]
+            for key in keys:
+                if rec_str in key:
+                    key_value.append([rec_str, rec_res[i + 1][0]])
+                    break
+    return key_value
+key_value = _postprocess(rec_list)
+res = {"result": str(key_value)}
+# res = {"result": str(result_list)}
+
+

启动服务端

+
python web_service.py > log.txt 2>&1
+
+

7.4 发送请求

+

然后再开启一个新的终端,运行下面的客户端代码

+
python pipeline_http_client.py --image_dir ../../train_data/icdar2015/text_localization/test/142.jpg
+
+

可以获取到最终的key-value结果:

+
大气压, 100.07kPa
+干球温度, 0000℃
+计前温度, 0000℃
+湿球温度, 0000℃
+计前压力, -0000kPa
+流量, 00.0L/min
+静压, 00000kPa
+含湿量, 00.0 %
+
diff --git "a/applications/\350\275\273\351\207\217\347\272\247\350\275\246\347\211\214\350\257\206\345\210\253.html" "b/applications/\350\275\273\351\207\217\347\272\247\350\275\246\347\211\214\350\257\206\345\210\253.html"

一种基于PaddleOCR的轻量级车牌识别模型

+

1. 项目介绍

+

车牌识别(Vehicle License Plate Recognition,VLPR) 是计算机视频图像识别技术在车辆牌照识别中的一种应用。车牌识别技术要求能够将运动中的汽车牌照从复杂背景中提取并识别出来,在高速公路车辆管理,停车场管理和城市交通中得到广泛应用。

+

本项目难点如下:

+
1. 车牌在图像中的尺度差异大、在车辆上的悬挂位置不固定
2. 车牌图像质量参差不齐:角度倾斜、图片模糊、光照不足、过曝等问题严重
3. 边缘和端侧场景应用对模型大小有限制,对推理速度有要求

针对以上问题, 本例选用 PP-OCRv3 这一开源超轻量OCR系统进行车牌识别系统的开发。基于PP-OCRv3模型,在CCPD数据集达到99%的检测和94%的识别精度,模型大小12.8M(2.5M+10.3M)。基于量化对模型体积进行进一步压缩到5.8M(1M+4.8M), 同时推理速度提升25%。

+

aistudio项目链接: 基于PaddleOCR的轻量级车牌识别范例

+

2. 环境搭建

+

本任务基于Aistudio完成, 具体环境如下:

+
- 操作系统: Linux
- PaddlePaddle: 2.3
- paddleslim: 2.2.2
- PaddleOCR: Release/2.5

下载 PaddleOCR代码

+
git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR
+
+

安装依赖库

+
pip install -r PaddleOCR/requirements.txt
+
+

3. 数据集准备

+

所使用的数据集为 CCPD2020 新能源车牌数据集,该数据集为 CCPD 数据集中针对新能源(绿色)车牌采集的子集。

+

该数据集分布如下:

| 数据集类型 | 数量 |
| --- | --- |
| 训练集 | 5769 |
| 验证集 | 1001 |
| 测试集 | 5006 |

数据集图片示例如下:

+

+

数据集可以从这里下载 https://aistudio.baidu.com/aistudio/datasetdetail/101595

+

下载好数据集后对数据集进行解压

+
unzip -d /home/aistudio/data /home/aistudio/data/data101595/CCPD2020.zip
+
+

3.1 数据集标注规则

+

CCPD数据集的图片文件名具有特殊规则,详细可查看:https://github.com/detectRecog/CCPD

+

具体规则如下:

+

例如: 025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg

+

每个名称可以分为七个字段,以-符号作为分割。这些字段解释如下:

+
- 025:车牌面积与整个图片区域的面积比。025 (25%)
- 95_113:水平倾斜程度和垂直倾斜度。水平95度,垂直113度
- 154&383_386&473:左上和右下顶点的坐标。左上(154,383),右下(386,473)
- 386&473_177&454_154&383_363&402:整个图像中车牌四个顶点的精确(x,y)坐标,这些坐标从右下角顶点开始。(386,473) (177,454) (154,383) (363,402)
- 0_0_22_27_27_33_16:CCPD中的每个图像只有一个车牌。每个车牌号码由一个汉字、一个字母和五个字母或数字组成,即有效的中文车牌由七个字符组成:省份(1个字符)、字母(1个字符)、字母+数字(5个字符)。"0_0_22_27_27_33_16"是每个字符在下面三个字符数组中的索引;每个数组的最后一个字符是字母O而不是数字0,O用作"无字符"的占位符,因为中文车牌字符中没有O。因此以上车牌拼起来即为 皖AY339S(文件名到车牌号的解析示例见下方代码)。
- 37:牌照区域的亮度。37 (37%)
- 15:车牌区域的模糊度。15 (15%)
provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
+alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W','X', 'Y', 'Z', 'O']
+ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X','Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']
+
+
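结合上面的字段说明和三个字符数组,可以把文件名直接解析为车牌号。下面是针对示例文件名的一个极简解析示意(仅作演示):

# 以上文示例文件名为例,解析出车牌号 皖AY339S
filename = '025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg'
fields = filename.split('.')[0].split('-')
# 第5个字段为各字符在 provinces/alphabets/ads 中的索引
idx = [int(x) for x in fields[4].split('_')]
plate = provinces[idx[0]] + alphabets[idx[1]] + ''.join(ads[i] for i in idx[2:])
print(plate)  # 皖AY339S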

3.2 制作符合PP-OCR训练格式的标注文件

+

在开始训练之前,可使用如下代码制作符合PP-OCR训练格式的标注文件。

+
import cv2
+import os
+import json
+from tqdm import tqdm
+import numpy as np
+
+provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
+alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'O']
+ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']
+
+def make_label(img_dir, save_gt_folder, phase):
+    crop_img_save_dir = os.path.join(save_gt_folder, phase, 'crop_imgs')
+    os.makedirs(crop_img_save_dir, exist_ok=True)
+
+    f_det = open(os.path.join(save_gt_folder, phase, 'det.txt'), 'w', encoding='utf-8')
+    f_rec = open(os.path.join(save_gt_folder, phase, 'rec.txt'), 'w', encoding='utf-8')
+
+    i = 0
+    for filename in tqdm(os.listdir(os.path.join(img_dir, phase))):
+        str_list = filename.split('-')
+        if len(str_list) < 5:
+            continue
+        coord_list = str_list[3].split('_')
+        txt_list = str_list[4].split('_')
+        boxes = []
+        for coord in coord_list:
+            boxes.append([int(x) for x in coord.split("&")])
+        boxes = [boxes[2], boxes[3], boxes[0], boxes[1]]
+        lp_number = provinces[int(txt_list[0])] + alphabets[int(txt_list[1])] + ''.join([ads[int(x)] for x in txt_list[2:]])
+
+        # det
+        det_info = [{'points':boxes, 'transcription':lp_number}]
+        f_det.write('{}\t{}\n'.format(os.path.join(phase, filename), json.dumps(det_info, ensure_ascii=False)))
+
+        # rec
+        boxes = np.float32(boxes)
+        img = cv2.imread(os.path.join(img_dir, phase, filename))
+        # crop_img = img[int(boxes[:,1].min()):int(boxes[:,1].max()),int(boxes[:,0].min()):int(boxes[:,0].max())]
+        crop_img = get_rotate_crop_image(img, boxes)
+        crop_img_save_filename = '{}_{}.jpg'.format(i,'_'.join(txt_list))
+        crop_img_save_path = os.path.join(crop_img_save_dir, crop_img_save_filename)
+        cv2.imwrite(crop_img_save_path, crop_img)
+        f_rec.write('{}/crop_imgs/{}\t{}\n'.format(phase, crop_img_save_filename, lp_number))
+        i+=1
+    f_det.close()
+    f_rec.close()
+
+def get_rotate_crop_image(img, points):
+    '''
+    img_height, img_width = img.shape[0:2]
+    left = int(np.min(points[:, 0]))
+    right = int(np.max(points[:, 0]))
+    top = int(np.min(points[:, 1]))
+    bottom = int(np.max(points[:, 1]))
+    img_crop = img[top:bottom, left:right, :].copy()
+    points[:, 0] = points[:, 0] - left
+    points[:, 1] = points[:, 1] - top
+    '''
+    assert len(points) == 4, "shape of points must be 4*2"
+    img_crop_width = int(
+        max(
+            np.linalg.norm(points[0] - points[1]),
+            np.linalg.norm(points[2] - points[3])))
+    img_crop_height = int(
+        max(
+            np.linalg.norm(points[0] - points[3]),
+            np.linalg.norm(points[1] - points[2])))
+    pts_std = np.float32([[0, 0], [img_crop_width, 0],
+                          [img_crop_width, img_crop_height],
+                          [0, img_crop_height]])
+    M = cv2.getPerspectiveTransform(points, pts_std)
+    dst_img = cv2.warpPerspective(
+        img,
+        M, (img_crop_width, img_crop_height),
+        borderMode=cv2.BORDER_REPLICATE,
+        flags=cv2.INTER_CUBIC)
+    dst_img_height, dst_img_width = dst_img.shape[0:2]
+    if dst_img_height * 1.0 / dst_img_width >= 1.5:
+        dst_img = np.rot90(dst_img)
+    return dst_img
+
+img_dir = '/home/aistudio/data/CCPD2020/ccpd_green'
+save_gt_folder = '/home/aistudio/data/CCPD2020/PPOCR'
+# phase = 'train' # change to val and test to make val dataset and test dataset
+for phase in ['train','val','test']:
+    make_label(img_dir, save_gt_folder, phase)
+
+

通过上述命令可以完成了训练集验证集测试集的制作,制作完成的数据集信息如下:

| 类型 | 数据集 | 图片地址 | 标签地址 | 图片数量 |
| --- | --- | --- | --- | --- |
| 检测 | 训练集 | /home/aistudio/data/CCPD2020/ccpd_green/train | /home/aistudio/data/CCPD2020/PPOCR/train/det.txt | 5769 |
| 检测 | 验证集 | /home/aistudio/data/CCPD2020/ccpd_green/val | /home/aistudio/data/CCPD2020/PPOCR/val/det.txt | 1001 |
| 检测 | 测试集 | /home/aistudio/data/CCPD2020/ccpd_green/test | /home/aistudio/data/CCPD2020/PPOCR/test/det.txt | 5006 |
| 识别 | 训练集 | /home/aistudio/data/CCPD2020/PPOCR/train/crop_imgs | /home/aistudio/data/CCPD2020/PPOCR/train/rec.txt | 5769 |
| 识别 | 验证集 | /home/aistudio/data/CCPD2020/PPOCR/val/crop_imgs | /home/aistudio/data/CCPD2020/PPOCR/val/rec.txt | 1001 |
| 识别 | 测试集 | /home/aistudio/data/CCPD2020/PPOCR/test/crop_imgs | /home/aistudio/data/CCPD2020/PPOCR/test/rec.txt | 5006 |

在普遍的深度学习流程中,都是在训练集训练,在验证集选择最优模型后在测试集上进行测试。在本例中,我们省略中间步骤,直接在训练集训练,在测试集选择最优模型,因此我们只使用训练集和测试集。

+

4. 实验

+

由于数据集比较少,为了模型更好和更快的收敛,这里选用 PaddleOCR 中的 PP-OCRv3 模型进行文本检测和识别,并且使用 PP-OCRv3 模型参数作为预训练模型。PP-OCRv3在PP-OCRv2的基础上,中文场景端到端Hmean指标相比于PP-OCRv2提升5%, 英文数字模型端到端效果提升11%。详细优化细节请参考PP-OCRv3技术报告。

+

由于车牌场景均为端侧设备部署,因此对速度和模型大小有比较高的要求,因此还需要采用量化训练的方式进行模型大小的压缩和模型推理速度的加速。模型量化可以在基本不损失模型的精度的情况下,将FP32精度的模型参数转换为Int8精度,减小模型参数大小并加速计算,使用量化后的模型在移动端等部署时更具备速度优势。

+

因此,本实验中对于车牌检测和识别有如下3种方案:

+
1. PP-OCRv3中英文超轻量预训练模型直接预测
2. CCPD车牌数据集在PP-OCRv3模型上fine-tune
3. CCPD车牌数据集在PP-OCRv3模型上fine-tune后量化

4.1 检测

+

4.1.1 预训练模型直接预测

+

从下表中下载PP-OCRv3文本检测预训练模型

| 模型名称 | 模型简介 | 配置文件 | 推理模型大小 | 下载地址 |
| --- | --- | --- | --- | --- |
| ch_PP-OCRv3_det | 【最新】原始超轻量模型,支持中英文、多语种文本检测 | ch_PP-OCRv3_det_cml.yml | 3.8M | 推理模型 / 训练模型 |

使用如下命令下载预训练模型

+
mkdir models
+cd models
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_distill_train.tar
+tar -xf ch_PP-OCRv3_det_distill_train.tar
+cd /home/aistudio/PaddleOCR
+
+

预训练模型下载完成后,我们使用ch_PP-OCRv3_det_student.yml 配置文件进行后续实验,在开始评估之前需要对配置文件中部分字段进行设置,具体如下:

+
1. 模型存储和训练相关:
   - Global.pretrained_model: 指向PP-OCRv3文本检测预训练模型地址
2. 数据集相关:
   - Eval.dataset.data_dir:指向测试集图片存放目录
   - Eval.dataset.label_file_list:指向测试集标注文件

上述字段均为必须修改的字段,可以通过修改配置文件的方式改动,也可在不需要修改配置文件的情况下,改变训练的参数。这里使用不改变配置文件的方式 。使用如下命令进行PP-OCRv3文本检测预训练模型的评估

+
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
+
+

上述指令中,通过-c 选择训练使用配置文件,通过-o参数在不需要修改配置文件的情况下,改变训练的参数。

+

使用预训练模型进行评估,指标如下所示:

| 方案 | hmeans |
| --- | --- |
| PP-OCRv3中英文超轻量检测预训练模型直接预测 | 76.12% |

4.1.2 CCPD车牌数据集fine-tune

+
训练
+

为了进行fine-tune训练,我们需要在配置文件中设置需要使用的预训练模型地址,学习率和数据集等参数。 具体如下:

+
1. 模型存储和训练相关:
   - Global.pretrained_model: 指向PP-OCRv3文本检测预训练模型地址
   - Global.eval_batch_step: 模型多少step评估一次,这里设为从第0个step开始每隔772个step评估一次,772为一个epoch总的step数。
2. 优化器相关:
   - Optimizer.lr.name: 学习率衰减器设为常量 Const
   - Optimizer.lr.learning_rate: 做 fine-tune 实验,学习率需要设置的比较小,此处学习率设为配置文件中的0.05倍
   - Optimizer.lr.warmup_epoch: warmup_epoch设为0
3. 数据集相关:
   - Train.dataset.data_dir:指向训练集图片存放目录
   - Train.dataset.label_file_list:指向训练集标注文件
   - Eval.dataset.data_dir:指向测试集图片存放目录
   - Eval.dataset.label_file_list:指向测试集标注文件

使用如下代码即可启动在CCPD车牌数据集上的fine-tune。

+
python tools/train.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
+    Global.save_model_dir=output/CCPD/det \
+    Global.eval_batch_step="[0, 772]" \
+    Optimizer.lr.name=Const \
+    Optimizer.lr.learning_rate=0.0005 \
+    Optimizer.lr.warmup_epoch=0 \
+    Train.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
+    Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/det.txt] \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
+
+

在上述命令中,通过-o的方式修改了配置文件中的参数。

+
评估
+

训练完成后使用如下命令进行评估

+
python tools/eval.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
+
+

使用预训练模型和CCPD车牌数据集fine-tune,指标分别如下:

| 方案 | hmeans |
| --- | --- |
| PP-OCRv3中英文超轻量检测预训练模型直接预测 | 76.12% |
| PP-OCRv3中英文超轻量检测预训练模型 fine-tune | 99.00% |

可以看到进行fine-tune能显著提升车牌检测的效果。

+

4.1.3 CCPD车牌数据集fine-tune+量化训练

+

此处采用 PaddleOCR 中提供好的量化教程对模型进行量化训练。

+

量化训练可通过如下命令启动:

+
python3.7 deploy/slim/quantization/quant.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
+    Global.save_model_dir=output/CCPD/det_quant \
+    Global.eval_batch_step="[0, 772]" \
+    Optimizer.lr.name=Const \
+    Optimizer.lr.learning_rate=0.0005 \
+    Optimizer.lr.warmup_epoch=0 \
+    Train.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
+    Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/det.txt] \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt]
+
+

量化后指标对比如下

| 方案 | hmeans | 模型大小 | 预测速度(lite) |
| --- | --- | --- | --- |
| PP-OCRv3中英文超轻量检测预训练模型 fine-tune | 99.00% | 2.5M | 223ms |
| PP-OCRv3中英文超轻量检测预训练模型 fine-tune+量化 | 98.91% | 1.0M | 189ms |

可以看到通过量化训练在精度几乎无损的情况下,降低模型体积60%并且推理速度提升15%。

+

速度测试基于PaddleOCR lite教程完成。

+

4.1.4 模型导出

+

使用如下命令可以将训练好的模型进行导出

+

非量化模型

+
python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
+    Global.save_inference_dir=output/det/infer
+
+

量化模型

+
python deploy/slim/quantization/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=output/CCPD/det_quant/best_accuracy.pdparams \
+    Global.save_inference_dir=output/det/infer
+
+

4.2 识别

+

4.2.1 预训练模型直接预测

+

从下表中下载PP-OCRv3文本识别预训练模型

| 模型名称 | 模型简介 | 配置文件 | 推理模型大小 | 下载地址 |
| --- | --- | --- | --- | --- |
| ch_PP-OCRv3_rec | 【最新】原始超轻量模型,支持中英文、数字识别 | ch_PP-OCRv3_rec_distillation.yml | 12.4M | 推理模型 / 训练模型 |

使用如下命令下载预训练模型

+
mkdir models
+cd models
+wget https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
+tar -xf ch_PP-OCRv3_rec_train.tar
+cd /home/aistudio/PaddleOCR
+
+

PaddleOCR提供的PP-OCRv3识别模型采用蒸馏训练策略,因此提供的预训练模型中会包含TeacherStudent模型的参数,详细信息可参考knowledge_distillation.md。 因此,模型下载完成后需要使用如下代码提取Student模型的参数:

+
import paddle
+# 加载预训练模型
+all_params = paddle.load("models/ch_PP-OCRv3_rec_train/best_accuracy.pdparams")
+# 查看权重参数的keys
+print(all_params.keys())
+# 学生模型的权重提取
+s_params = {key[len("Student."):]: all_params[key] for key in all_params if "Student." in key}
+# 查看学生模型权重参数的keys
+print(s_params.keys())
+# 保存
+paddle.save(s_params, "models/ch_PP-OCRv3_rec_train/student.pdparams")
+
+

预训练模型下载完成后,我们使用ch_PP-OCRv3_rec.yml 配置文件进行后续实验,在开始评估之前需要对配置文件中部分字段进行设置,具体如下:

+
1. 模型存储和训练相关:
   - Global.pretrained_model: 指向PP-OCRv3文本识别预训练模型地址
2. 数据集相关:
   - Eval.dataset.data_dir:指向测试集图片存放目录
   - Eval.dataset.label_file_list:指向测试集标注文件

使用如下命令进行PP-OCRv3文本识别预训练模型的评估

+
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
+
+

评估部分日志如下:

+
[2022/05/12 19:52:02] ppocr INFO: load pretrain successful from models/ch_PP-OCRv3_rec_train/best_accuracy
+eval model:: 100%|██████████████████████████████| 40/40 [00:15<00:00,  2.57it/s]
+[2022/05/12 19:52:17] ppocr INFO: metric eval ***************
+[2022/05/12 19:52:17] ppocr INFO: acc:0.0
+[2022/05/12 19:52:17] ppocr INFO: norm_edit_dis:0.8656084923002452
+[2022/05/12 19:52:17] ppocr INFO: Teacher_acc:0.000399520574511545
+[2022/05/12 19:52:17] ppocr INFO: Teacher_norm_edit_dis:0.8657902943394548
+[2022/05/12 19:52:17] ppocr INFO: fps:1443.1801978719905
+
+

使用预训练模型进行评估,指标如下所示:

| 方案 | acc |
| --- | --- |
| PP-OCRv3中英文超轻量识别预训练模型直接预测 | 0% |

从评估日志中可以看到,直接使用PP-OCRv3预训练模型进行评估,acc非常低,但是norm_edit_dis很高。因此,我们猜测是模型大部分文字识别是对的,只有少部分文字识别错误。使用如下命令进行infer查看模型的推理结果进行验证:

+
python tools/infer_rec.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
+    Global.infer_img=/home/aistudio/data/CCPD2020/PPOCR/test/crop_imgs/0_0_0_3_32_30_31_30_30.jpg
+
+

输出部分日志如下:

+
[2022/05/01 08:51:57] ppocr INFO: train with paddle 2.2.2 and device CUDAPlace(0)
+W0501 08:51:57.127391 11326 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.0, Runtime API Version: 10.1
+W0501 08:51:57.132315 11326 device_context.cc:465] device: 0, cuDNN Version: 7.6.
+[2022/05/01 08:52:00] ppocr INFO: load pretrain successful from models/ch_PP-OCRv3_rec_train/student
+[2022/05/01 08:52:00] ppocr INFO: infer_img: /home/aistudio/data/CCPD2020/PPOCR/test/crop_imgs/0_0_3_32_30_31_30_30.jpg
+[2022/05/01 08:52:00] ppocr INFO:      result: {"Student": {"label": "皖A·D86766", "score": 0.9552637934684753}, "Teacher": {"label": "皖A·D86766", "score": 0.9917094707489014}}
+[2022/05/01 08:52:00] ppocr INFO: success!
+
+

从infer结果可以看到,车牌中的文字大部分都识别正确,只是多识别出了一个·。针对这种情况,有如下两种方案:

+
1. 直接通过后处理去掉多识别的·
2. 进行 fine-tune。

4.2.2 预训练模型直接预测+改动后处理

+

直接通过后处理去掉多识别的·,在后处理的改动比较简单,只需在 ppocr/postprocess/rec_postprocess.py 文件的76行添加如下代码:

+
text = text.replace('·','')
+
+

改动前后指标对比:

| 方案 | acc |
| --- | --- |
| PP-OCRv3中英文超轻量识别预训练模型直接预测 | 0.20% |
| PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的· | 90.97% |

可以看到,去掉多余的·能大幅提高精度。

+

4.2.3 CCPD车牌数据集fine-tune

+
训练
+

为了进行fine-tune训练,我们需要在配置文件中设置需要使用的预训练模型地址,学习率和数据集等参数。 具体如下:

+
1. 模型存储和训练相关:
   - Global.pretrained_model: 指向PP-OCRv3文本识别预训练模型地址
   - Global.eval_batch_step: 模型多少step评估一次,这里设为从第0个step开始每隔45个step评估一次,45为一个epoch总的step数。
2. 优化器相关:
   - Optimizer.lr.name: 学习率衰减器设为常量 Const
   - Optimizer.lr.learning_rate: 做 fine-tune 实验,学习率需要设置的比较小,此处学习率设为配置文件中的0.05倍
   - Optimizer.lr.warmup_epoch: warmup_epoch设为0
3. 数据集相关:
   - Train.dataset.data_dir:指向训练集图片存放目录
   - Train.dataset.label_file_list:指向训练集标注文件
   - Eval.dataset.data_dir:指向测试集图片存放目录
   - Eval.dataset.label_file_list:指向测试集标注文件

使用如下命令启动 fine-tune

+
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
+    Global.save_model_dir=output/CCPD/rec/ \
+    Global.eval_batch_step="[0, 90]" \
+    Optimizer.lr.name=Const \
+    Optimizer.lr.learning_rate=0.0005 \
+    Optimizer.lr.warmup_epoch=0 \
+    Train.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
+    Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/rec.txt] \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
+
+
评估
+

训练完成后使用如下命令进行评估

+
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
+
+

使用预训练模型和CCPD车牌数据集fine-tune,指标分别如下:

| 方案 | acc |
| --- | --- |
| PP-OCRv3中英文超轻量识别预训练模型直接预测 | 0.00% |
| PP-OCRv3中英文超轻量识别预训练模型直接预测+后处理去掉多识别的· | 90.97% |
| PP-OCRv3中英文超轻量识别预训练模型 fine-tune | 94.54% |

可以看到进行fine-tune能显著提升车牌识别的效果。

+

4.2.4 CCPD车牌数据集fine-tune+量化训练

+

此处采用 PaddleOCR 中提供好的量化教程对模型进行量化训练。

+

量化训练可通过如下命令启动:

+
python3.7 deploy/slim/quantization/quant.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
+    Global.save_model_dir=output/CCPD/rec_quant/ \
+    Global.eval_batch_step="[0, 90]" \
+    Optimizer.lr.name=Const \
+    Optimizer.lr.learning_rate=0.0005 \
+    Optimizer.lr.warmup_epoch=0 \
+    Train.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
+    Train.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/train/rec.txt] \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
+
+

量化后指标对比如下

| 方案 | acc | 模型大小 | 预测速度(lite) |
| --- | --- | --- | --- |
| PP-OCRv3中英文超轻量识别预训练模型 fine-tune | 94.54% | 10.3M | 4.2ms |
| PP-OCRv3中英文超轻量识别预训练模型 fine-tune + 量化 | 93.40% | 4.8M | 1.8ms |

可以看到量化后能降低模型体积53%并且推理速度提升57%,但是由于识别数据过少,量化带来了1%的精度下降。

+

速度测试基于PaddleOCR lite教程完成。

+

4.2.5 模型导出

+

使用如下命令可以将训练好的模型进行导出。

+

非量化模型

+
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
+    Global.save_inference_dir=output/CCPD/rec/infer
+
+

量化模型

+
python deploy/slim/quantization/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=output/CCPD/rec_quant/best_accuracy.pdparams \
+    Global.save_inference_dir=output/CCPD/rec_quant/infer
+
+

4.3 计算End2End指标

+

端到端指标可通过 PaddleOCR内置脚本 进行计算,具体步骤如下:

+

1. 导出模型

+

通过如下命令进行模型的导出。注意,量化模型导出时,需要配置eval数据集

+
# 检测模型
+
+# 预训练模型
+python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=models/ch_PP-OCRv3_det_distill_train/student.pdparams \
+    Global.save_inference_dir=output/ch_PP-OCRv3_det_distill_train/infer
+
+# 非量化模型
+python tools/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=output/CCPD/det/best_accuracy.pdparams \
+    Global.save_inference_dir=output/CCPD/det/infer
+
+# 量化模型
+python deploy/slim/quantization/export_model.py -c configs/det/ch_PP-OCRv3/ch_PP-OCRv3_det_student.yml -o \
+    Global.pretrained_model=output/CCPD/det_quant/best_accuracy.pdparams \
+    Global.save_inference_dir=output/CCPD/det_quant/infer \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/ccpd_green \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/det.txt] \
+    Eval.loader.num_workers=0
+
+# 识别模型
+
+# 预训练模型
+python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=models/ch_PP-OCRv3_rec_train/student.pdparams \
+    Global.save_inference_dir=output/ch_PP-OCRv3_rec_train/infer
+
+# 非量化模型
+python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=output/CCPD/rec/best_accuracy.pdparams \
+    Global.save_inference_dir=output/CCPD/rec/infer
+
+# 量化模型
+python deploy/slim/quantization/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec.yml -o \
+    Global.pretrained_model=output/CCPD/rec_quant/best_accuracy.pdparams \
+    Global.save_inference_dir=output/CCPD/rec_quant/infer \
+    Eval.dataset.data_dir=/home/aistudio/data/CCPD2020/PPOCR \
+    Eval.dataset.label_file_list=[/home/aistudio/data/CCPD2020/PPOCR/test/rec.txt]
+
+

2. 用导出的模型对测试集进行预测

+

此处,分别使用PP-OCRv3预训练模型,fintune模型和量化模型对测试集的所有图像进行预测,命令如下:

+
# PP-OCRv3中英文超轻量检测预训练模型,PP-OCRv3中英文超轻量识别预训练模型
+python3 tools/infer/predict_system.py --det_model_dir=models/ch_PP-OCRv3_det_distill_train/infer --rec_model_dir=models/ch_PP-OCRv3_rec_train/infer --det_limit_side_len=736 --det_limit_type=min --image_dir=/home/aistudio/data/CCPD2020/ccpd_green/test/ --draw_img_save_dir=infer/pretrain --use_dilation=true
+
+# PP-OCRv3中英文超轻量检测预训练模型+fine-tune,PP-OCRv3中英文超轻量识别预训练模型+fine-tune
+python3 tools/infer/predict_system.py --det_model_dir=output/CCPD/det/infer --rec_model_dir=output/CCPD/rec/infer --det_limit_side_len=736 --det_limit_type=min --image_dir=/home/aistudio/data/CCPD2020/ccpd_green/test/ --draw_img_save_dir=infer/fine-tune --use_dilation=true
+
+# PP-OCRv3中英文超轻量检测预训练模型 fine-tune +量化,PP-OCRv3中英文超轻量识别预训练模型 fine-tune +量化 结果转换和评估
+python3 tools/infer/predict_system.py --det_model_dir=output/CCPD/det_quant/infer --rec_model_dir=output/CCPD/rec_quant/infer --det_limit_side_len=736 --det_limit_type=min --image_dir=/home/aistudio/data/CCPD2020/ccpd_green/test/ --draw_img_save_dir=infer/quant --use_dilation=true
+
+

3. Convert the labels and compute the metrics

Convert the ground truth and the prediction results saved in the previous step into the data format required for end-to-end evaluation, then compute the end-to-end metrics on the converted data:
python3 tools/end2end/convert_ppocr_label.py --mode=gt --label_path=/home/aistudio/data/CCPD2020/PPOCR/test/det.txt --save_folder=end2end/gt

# PP-OCRv3 detection pretrained model, PP-OCRv3 recognition pretrained model: result conversion and evaluation
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/pretrain/system_results.txt --save_folder=end2end/pretrain
python3 tools/end2end/eval_end2end.py end2end/gt end2end/pretrain

# PP-OCRv3 detection pretrained model, PP-OCRv3 recognition pretrained model + post-processing removing the extra `·`: result conversion and evaluation
# the post-processing function has to be modified manually
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/post/system_results.txt --save_folder=end2end/post
python3 tools/end2end/eval_end2end.py end2end/gt end2end/post

# PP-OCRv3 detection pretrained model fine-tune, PP-OCRv3 recognition pretrained model fine-tune: result conversion and evaluation
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/fine-tune/system_results.txt --save_folder=end2end/fine-tune
python3 tools/end2end/eval_end2end.py end2end/gt end2end/fine-tune

# PP-OCRv3 detection pretrained model fine-tune + quantization, PP-OCRv3 recognition pretrained model fine-tune + quantization: result conversion and evaluation
python3 tools/end2end/convert_ppocr_label.py --mode=pred --label_path=infer/quant/system_results.txt --save_folder=end2end/quant
python3 tools/end2end/eval_end2end.py end2end/gt end2end/quant

The log is as follows:
The convert label saved in end2end/gt
The convert label saved in end2end/pretrain
start testing...
hit, dt_count, gt_count 2 5988 5006
character_acc: 70.42%
avg_edit_dist_field: 2.37
avg_edit_dist_img: 2.37
precision: 0.03%
recall: 0.04%
fmeasure: 0.04%
The convert label saved in end2end/post
start testing...
hit, dt_count, gt_count 4224 5988 5006
character_acc: 81.59%
avg_edit_dist_field: 1.47
avg_edit_dist_img: 1.47
precision: 70.54%
recall: 84.38%
fmeasure: 76.84%
The convert label saved in end2end/fine-tune
start testing...
hit, dt_count, gt_count 4286 4898 5006
character_acc: 94.16%
avg_edit_dist_field: 0.47
avg_edit_dist_img: 0.47
precision: 87.51%
recall: 85.62%
fmeasure: 86.55%
The convert label saved in end2end/quant
start testing...
hit, dt_count, gt_count 4349 4951 5006
character_acc: 94.13%
avg_edit_dist_field: 0.47
avg_edit_dist_img: 0.47
precision: 87.84%
recall: 86.88%
fmeasure: 87.36%
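
For reference, the precision/recall/fmeasure lines in this log follow directly from the "hit, dt_count, gt_count" line. A minimal sketch, assuming the standard matching-based definitions (the exact matching rules live in tools/end2end/eval_end2end.py):

# Sketch of how the summary metrics relate to the matched-box counts in the log.
def end2end_scores(hit: int, dt_count: int, gt_count: int):
    precision = hit / dt_count if dt_count else 0.0  # matched boxes / detected boxes
    recall = hit / gt_count if gt_count else 0.0     # matched boxes / ground-truth boxes
    fmeasure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, fmeasure

# fine-tuned run in the log above: hit=4286, dt_count=4898, gt_count=5006
print(end2end_scores(4286, 4898, 5006))  # approximately (0.8751, 0.8562, 0.8655)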

The end-to-end metrics of each scheme are as follows:

| Model | Metric |
| --- | --- |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model | 0.04% |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + post-processing removing the extra `·` | 78.27% |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model + fine-tune<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + fine-tune | 87.14% |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model + fine-tune + quantization<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + fine-tune + quantization | 88.00% |

From these results we can see that, without modifying the pretrained models at all, simply adapting the post-processing to this scenario already lifts the end-to-end metric substantially, to 78.27%. Fine-tuning on the CCPD dataset raises it further to 87.14%, and after quantization-aware training the metric improves again to 88% because the detection model's recall becomes higher. However, this still falls short of what the detection and recognition models should deliver together (99% * 94% = 93%), so we need to analyze the base case in detail.
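
The 93% figure is just the product of the two stage-level metrics, under the assumption that detection and recognition errors are independent; a quick back-of-the-envelope check with the fine-tuned numbers:

# Rough upper bound for the end-to-end fmeasure if the two stages were independent
# (stage metrics quoted in the text above; the independence assumption is ours).
det_hmean = 0.99   # fine-tuned detection hmean
rec_acc = 0.94     # fine-tuned recognition accuracy (rounded)
print(f"expected end-to-end ~ {det_hmean * rec_acc:.1%}")  # ~93.1%, vs. 88.00% measured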

In the earlier end-to-end prediction results, a lot of text that does not match the license-plate annotations is recognized, so a simple filter can be applied to improve precision.

For a quick evaluation, we add the following code at line 58 of the tools/end2end/convert_ppocr_label.py script to filter out results whose length is not 8 characters:
if len(txt) != 8:  # a license-plate string has exactly 8 characters
    continue

In addition, visualizing the boxes shows that many of them are vertically flipped and do not fully enclose the license-plate boundary, so the boxes need to be flipped back vertically and slightly enlarged. A schematic is shown below:

[figure: detection boxes before and after vertical flipping and slight expansion]
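
The tutorial does not list the exact code for this fix; a minimal sketch of the idea (re-ordering a vertically flipped 4-point box into the usual top-left-first order and enlarging it slightly; the helper name and the 5% expand ratio are illustrative assumptions, not PaddleOCR code):

import numpy as np

def flip_and_expand_box(box, expand_ratio=0.05):
    """Re-order a vertically flipped quadrilateral into TL, TR, BR, BL order
    and expand it slightly around its centre (illustrative, assumed values)."""
    box = np.asarray(box, dtype=np.float32)
    idx = np.argsort(box[:, 1])                  # sort corners top to bottom
    top, bottom = box[idx[:2]], box[idx[2:]]
    top = top[np.argsort(top[:, 0])]             # order each pair left to right
    bottom = bottom[np.argsort(bottom[:, 0])]
    ordered = np.array([top[0], top[1], bottom[1], bottom[0]])
    center = ordered.mean(axis=0)
    return (ordered - center) * (1.0 + expand_ratio) + center

# a flipped box (bottom corners listed first) gets re-ordered and grown by 5%
print(flip_and_expand_box([[530, 611], [159, 611], [159, 509], [530, 509]]))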

The end-to-end metrics of each scheme before and after these modifications are compared below:

| Model | base | A: recognition result filtering | B: use_dilation | C: flip_box | best |
| --- | --- | --- | --- | --- | --- |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model | 0.04% | 0.08% | 0.02% | 0.05% | 0.00%(A) |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + post-processing removing the extra `·` | 78.27% | 90.84% | 78.61% | 79.43% | 91.66%(A+B+C) |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model + fine-tune<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + fine-tune | 87.14% | 90.40% | 87.66% | 89.98% | 92.50%(A+B+C) |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model + fine-tune + quantization<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + fine-tune + quantization | 88.00% | 90.54% | 88.50% | 89.46% | 92.02%(A+B+C) |

From these results we can see that, without modifying the pretrained models, adapting only the post-processing to this scenario already lifts the end-to-end metric substantially, to 91.66%; fine-tuning on the CCPD dataset raises it further to 92.5%; after quantization-aware training the metric is 92.02%.

4.4 Deployment

Python inference with Paddle Inference

After the detection and recognition models have each been fine-tuned and exported as inference models, the following command runs end-to-end inference with Paddle Inference and visualizes the results:
python tools/infer/predict_system.py \
    --det_model_dir=output/CCPD/det/infer/ \
    --rec_model_dir=output/CCPD/rec/infer/ \
    --image_dir="/home/aistudio/data/CCPD2020/ccpd_green/test/04131106321839081-92_258-159&509_530&611-527&611_172&599_159&509_530&525-0_0_3_32_30_31_30_30-109-106.jpg" \
    --rec_image_shape=3,48,320

The inference result is as follows:

[figure: visualized detection and recognition result for the example license plate]

On-device deployment

For on-device deployment we use C++ inference based on Paddle Lite. Paddle Lite is PaddlePaddle's lightweight inference engine: it provides efficient inference for mobile and IoT devices, integrates a wide range of hardware across platforms, and offers a lightweight solution for on-device deployment and application. For details, see the PaddleOCR lite tutorial.

4.5 Experiment summary

We ran three schemes with the PP-OCRv3 Chinese/English ultra-lightweight pretrained models on the license-plate dataset: direct evaluation, fine-tuning, and fine-tuning + quantization, and measured speed following the PaddleOCR lite tutorial. The metrics compare as follows:

- Detection

| Scheme | hmean | Model size | Inference speed (lite) |
| --- | --- | --- | --- |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model, direct prediction | 76.12% | 2.5M | 233ms |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model, fine-tuned | 99.00% | 2.5M | 233ms |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model, fine-tuned + quantization | 98.91% | 1.0M | 189ms |

- Recognition

| Scheme | acc | Model size | Inference speed (lite) |
| --- | --- | --- | --- |
| PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model, direct prediction | 0.00% | 10.3M | 4.2ms |
| PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model, direct prediction + post-processing removing the extra `·` | 90.97% | 10.3M | 4.2ms |
| PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model, fine-tuned | 94.54% | 10.3M | 4.2ms |
| PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model, fine-tuned + quantization | 93.40% | 4.8M | 1.8ms |

- End-to-end metrics:

| Scheme | fmeasure | Model size | Inference speed (lite) |
| --- | --- | --- | --- |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model | 0.08% | 12.8M | 298ms |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + post-processing removing the extra `·` | 91.66% | 12.8M | 298ms |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model + fine-tune<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + fine-tune | 92.50% | 12.8M | 298ms |
| PP-OCRv3 Chinese/English ultra-lightweight detection pretrained model + fine-tune + quantization<br>PP-OCRv3 Chinese/English ultra-lightweight recognition pretrained model + fine-tune + quantization | 92.02% | 5.80M | 224ms |

Conclusions

Even without fine-tuning, the PP-OCRv3 detection model already reaches a certain accuracy on the license-plate dataset. Fine-tuning greatly improves detection, with the hmean reaching 99%. With quantization-aware training the detection accuracy is almost unchanged while the model size is compressed by 60%.

Without fine-tuning, the PP-OCRv3 recognition model scores 0 on the license-plate dataset, but analysis shows that it predicts most characters correctly and merely adds one extra special character; removing that character brings the accuracy to 90%. After fine-tuning, the recognition accuracy rises further, to 94.54%. With quantization-aware training the recognition model is compressed by 53%, at the cost of about 1% accuracy, likely because the amount of data is limited.

The end-to-end results show that, without modifying the pretrained models, adapting only the post-processing to this scenario already lifts the end-to-end metric substantially, to 91.66%; fine-tuning on the CCPD dataset raises it further to 92.5%; after quantization-aware training the metric drops slightly to 92.02%, while the model size is reduced by 54%.

diff --git "a/applications/\351\253\230\347\262\276\345\272\246\344\270\255\346\226\207\350\257\206\345\210\253\346\250\241\345\236\213.html" "b/applications/\351\253\230\347\262\276\345\272\246\344\270\255\346\226\207\350\257\206\345\210\253\346\250\241\345\236\213.html"
new file mode 100644
index 0000000000..f587f612ec

High-accuracy Chinese scene text recognition model SVTR

1. Introduction

PP-OCRv3 is Baidu's open-source ultra-lightweight scene text detection and recognition model library, and its ultra-lightweight Chinese scene recognition model SVTR_LCNet is based on the SVTR architecture. To keep inference fast, SVTR_LCNet replaces the Local Blocks of SVTR with LCNet and uses two layers of Global Blocks. For Chinese scenarios, PP-OCRv3 recognition mainly applies the following optimization strategies (see the detailed technical report):
- GTC: strategy in which an Attention head guides CTC training;
- TextConAug: data-augmentation strategy that mines textual context information;
- TextRotNet: self-supervised pre-trained model;
- UDML: unified deep mutual learning strategy;
- UIM: unlabeled data mining scheme.

Among these, UIM (the unlabeled data mining scheme) uses a high-accuracy SVTR Chinese model to run over and label the unlabeled data. That model is trained on the PP-OCRv3 recognition dataset; its accuracy is compared in the table below.

| Chinese recognition algorithm | Model | UIM | Accuracy |
| --- | --- | --- | --- |
| PP-OCRv3 | SVTR_LCNet | w/o | 78.40% |
| PP-OCRv3 | SVTR_LCNet | w | 79.40% |
| SVTR | SVTR-Tiny | - | 82.50% |
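
The exact mining script is not shown here, but the idea behind UIM can be sketched as a simple confidence-filtered pseudo-labelling loop (predict_fn, the 0.95 threshold and the file names below are illustrative assumptions, not PaddleOCR APIs):

from pathlib import Path

def mine_pseudo_labels(image_dir, predict_fn, min_score=0.95, out_file="pseudo_label.txt"):
    """UIM-style mining sketch: run a high-accuracy recognition model over
    unlabeled crops and keep only confident predictions as pseudo labels.
    predict_fn(path) -> (text, score) is a hypothetical wrapper around the SVTR model."""
    with open(out_file, "w", encoding="utf-8") as f:
        for img_path in sorted(Path(image_dir).glob("*.jpg")):
            text, score = predict_fn(str(img_path))
            if score >= min_score:                 # keep only confident predictions
                f.write(f"{img_path}\t{text}\n")   # PaddleOCR rec label format: path \t label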

AI Studio project link: High-accuracy Chinese scene text recognition model SVTR

2. Using the SVTR Chinese model

Environment setup

This task is carried out on AI Studio; the environment is as follows:
- OS: Linux
- PaddlePaddle: 2.3
- PaddleOCR: dygraph

Download the PaddleOCR code
git clone -b dygraph https://github.com/PaddlePaddle/PaddleOCR

Install the dependencies
pip install -r PaddleOCR/requirements.txt -i https://mirror.baidu.com/pypi/simple

Quick start
# extract the model file
tar xf svtr_ch_high_accuracy.tar

Predict Chinese text, taking the following image as an example:

[example image: doc/imgs_words/ch/word_1.jpg]

Prediction command:
# CPU prediction
python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=False

# GPU prediction
#python tools/infer_rec.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.infer_img=./doc/imgs_words/ch/word_1.jpg Global.use_gpu=True

The final printed result is:
- result: 韩国小馆 0.9853458404541016

Here 0.9853458404541016 is the prediction confidence.

Exporting the inference model and running inference

An inference model (a model saved with paddle.jit.save) is a frozen model that stores both the model structure and the parameters in files; it is mainly used for deployment. The models saved during training are checkpoints, which contain only the parameters and are mostly used to resume training. Compared with a checkpoints model, an inference model additionally stores the network structure, which makes it more performant and more convenient for deployment and inference acceleration, and suitable for integration into real systems.
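
As a rough illustration of the difference (a minimal sketch using a toy paddle.nn.Layer, not the actual SVTR network):

import paddle
from paddle.static import InputSpec

model = paddle.nn.Linear(10, 5)  # toy stand-in for a recognition network

# checkpoints model: parameters only, used to resume training
paddle.save(model.state_dict(), "output/toy/best_accuracy.pdparams")

# inference model: network structure + parameters, used for deployment
# (a real SVTR recognizer would use an input spec such as [None, 3, 32, 320])
paddle.jit.save(model, "inference/toy/inference",
                input_spec=[InputSpec(shape=[None, 10], dtype="float32")])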

Run the following command to convert the recognition model into an inference model:
python tools/export_model.py -c configs/rec/rec_svtrnet_ch.yml -o Global.pretrained_model=./svtr_ch_high_accuracy/best_accuracy Global.save_inference_dir=./inference/svtr_ch

After a successful conversion, the directory contains three files:
inference/svtr_ch/
    ├── inference.pdiparams         # parameter file of the recognition inference model
    ├── inference.pdiparams.info    # parameter info of the recognition inference model, can be ignored
    └── inference.pdmodel           # program file of the recognition inference model

Run prediction with the inference model as follows:
# CPU prediction
python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320'  --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=False

# GPU prediction
#python3 tools/infer/predict_rec.py --image_dir="./doc/imgs_words/ch/word_1.jpg" --rec_algorithm='SVTR' --rec_model_dir=./inference/svtr_ch/ --rec_image_shape='3, 32, 320'  --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt --use_gpu=True

Notes
- When using the SVTR algorithm, --rec_algorithm='SVTR' must be specified.
- If the model was trained with a custom dictionary, change --rec_char_dict_path=ppocr/utils/ppocr_keys_v1.txt to that custom dictionary.
- The --rec_image_shape='3, 32, 320' argument must not be removed.

[data-clipboard-text]",{text:r=>r.getAttribute("data-clipboard-text")||Za(R(r.getAttribute("data-clipboard-target")))}).on("success",r=>t.next(r))}).pipe(w(t=>{t.trigger.focus()}),m(()=>Ee("clipboard.copied"))).subscribe(e)}function Zn(e,t){return e.protocol=t.protocol,e.hostname=t.hostname,e}function es(e,t){let r=new Map;for(let o of P("url",e)){let n=R("loc",o),i=[Zn(new URL(n.textContent),t)];r.set(`${i[0]}`,i);for(let a of P("[rel=alternate]",o)){let s=a.getAttribute("href");s!=null&&i.push(Zn(new URL(s),t))}}return r}function ur(e){return fn(new URL("sitemap.xml",e)).pipe(m(t=>es(t,new URL(e))),de(()=>I(new Map)))}function ts(e,t){if(!(e.target instanceof Element))return S;let r=e.target.closest("a");if(r===null)return S;if(r.target||e.metaKey||e.ctrlKey)return S;let o=new URL(r.href);return o.search=o.hash="",t.has(`${o}`)?(e.preventDefault(),I(new URL(r.href))):S}function ei(e){let t=new Map;for(let r of P(":scope > *",e.head))t.set(r.outerHTML,r);return t}function ti(e){for(let t of P("[href], [src]",e))for(let r of["href","src"]){let o=t.getAttribute(r);if(o&&!/^(?:[a-z]+:)?\/\//i.test(o)){t[r]=t[r];break}}return I(e)}function rs(e){for(let o of["[data-md-component=announce]","[data-md-component=container]","[data-md-component=header-topic]","[data-md-component=outdated]","[data-md-component=logo]","[data-md-component=skip]",...B("navigation.tabs.sticky")?["[data-md-component=tabs]"]:[]]){let n=fe(o),i=fe(o,e);typeof n!="undefined"&&typeof i!="undefined"&&n.replaceWith(i)}let t=ei(document);for(let[o,n]of ei(e))t.has(o)?t.delete(o):document.head.appendChild(n);for(let o of t.values()){let n=o.getAttribute("name");n!=="theme-color"&&n!=="color-scheme"&&o.remove()}let r=Se("container");return Ue(P("script",r)).pipe(v(o=>{let n=e.createElement("script");if(o.src){for(let i of o.getAttributeNames())n.setAttribute(i,o.getAttribute(i));return o.replaceWith(n),new j(i=>{n.onload=()=>i.complete()})}else return n.textContent=o.textContent,o.replaceWith(n),S}),Z(),ie(document))}function ri({location$:e,viewport$:t,progress$:r}){let o=xe();if(location.protocol==="file:")return S;let n=ur(o.base);I(document).subscribe(ti);let i=h(document.body,"click").pipe(He(n),v(([p,c])=>ts(p,c)),pe()),a=h(window,"popstate").pipe(m(ye),pe());i.pipe(re(t)).subscribe(([p,{offset:c}])=>{history.replaceState(c,""),history.pushState(null,"",p)}),O(i,a).subscribe(e);let s=e.pipe(ee("pathname"),v(p=>mn(p,{progress$:r}).pipe(de(()=>(lt(p,!0),S)))),v(ti),v(rs),pe());return O(s.pipe(re(e,(p,c)=>c)),s.pipe(v(()=>e),ee("pathname"),v(()=>e),ee("hash")),e.pipe(K((p,c)=>p.pathname===c.pathname&&p.hash===c.hash),v(()=>i),w(()=>history.back()))).subscribe(p=>{var c,l;history.state!==null||!p.hash?window.scrollTo(0,(l=(c=history.state)==null?void 0:c.y)!=null?l:0):(history.scrollRestoration="auto",cn(p.hash),history.scrollRestoration="manual")}),e.subscribe(()=>{history.scrollRestoration="manual"}),h(window,"beforeunload").subscribe(()=>{history.scrollRestoration="auto"}),t.pipe(ee("offset"),_e(100)).subscribe(({offset:p})=>{history.replaceState(p,"")}),s}var oi=Lt(qr());function ni(e){let t=e.separator.split("|").map(n=>n.replace(/(\(\?[!=<][^)]+\))/g,"").length===0?"\uFFFD":n).join("|"),r=new RegExp(t,"img"),o=(n,i,a)=>`${i}${a}`;return n=>{n=n.replace(/[\s*+\-:~^]+/g," ").trim();let i=new RegExp(`(^|${e.separator}|)(${n.replace(/[|\\{}()[\]^$+*?.-]/g,"\\$&").replace(r,"|")})`,"img");return a=>(0,oi.default)(a).replace(i,o).replace(/<\/mark>(\s+)]*>/img,"$1")}}function jt(e){return e.type===1}function dr(e){return 
e.type===3}function ii(e,t){let r=gn(e);return O(I(location.protocol!=="file:"),ze("search")).pipe(Ae(o=>o),v(()=>t)).subscribe(({config:o,docs:n})=>r.next({type:0,data:{config:o,docs:n,options:{suggest:B("search.suggest")}}})),r}function ai({document$:e}){let t=xe(),r=je(new URL("../versions.json",t.base)).pipe(de(()=>S)),o=r.pipe(m(n=>{let[,i]=t.base.match(/([^/]+)\/?$/);return n.find(({version:a,aliases:s})=>a===i||s.includes(i))||n[0]}));r.pipe(m(n=>new Map(n.map(i=>[`${new URL(`../${i.version}/`,t.base)}`,i]))),v(n=>h(document.body,"click").pipe(b(i=>!i.metaKey&&!i.ctrlKey),re(o),v(([i,a])=>{if(i.target instanceof Element){let s=i.target.closest("a");if(s&&!s.target&&n.has(s.href)){let p=s.href;return!i.target.closest(".md-version")&&n.get(p)===a?S:(i.preventDefault(),I(p))}}return S}),v(i=>ur(new URL(i)).pipe(m(a=>{let p=ye().href.replace(t.base,i);return a.has(p.split("#")[0])?new URL(p):new URL(i)})))))).subscribe(n=>lt(n,!0)),z([r,o]).subscribe(([n,i])=>{R(".md-header__topic").appendChild(An(n,i))}),e.pipe(v(()=>o)).subscribe(n=>{var a;let i=__md_get("__outdated",sessionStorage);if(i===null){i=!0;let s=((a=t.version)==null?void 0:a.default)||"latest";Array.isArray(s)||(s=[s]);e:for(let p of s)for(let c of n.aliases.concat(n.version))if(new RegExp(p,"i").test(c)){i=!1;break e}__md_set("__outdated",i,sessionStorage)}if(i)for(let s of ae("outdated"))s.hidden=!1})}function is(e,{worker$:t}){let{searchParams:r}=ye();r.has("q")&&(Je("search",!0),e.value=r.get("q"),e.focus(),ze("search").pipe(Ae(i=>!i)).subscribe(()=>{let i=ye();i.searchParams.delete("q"),history.replaceState({},"",`${i}`)}));let o=et(e),n=O(t.pipe(Ae(jt)),h(e,"keyup"),o).pipe(m(()=>e.value),K());return z([n,o]).pipe(m(([i,a])=>({value:i,focus:a})),G(1))}function si(e,{worker$:t}){let r=new g,o=r.pipe(Z(),ie(!0));z([t.pipe(Ae(jt)),r],(i,a)=>a).pipe(ee("value")).subscribe(({value:i})=>t.next({type:2,data:i})),r.pipe(ee("focus")).subscribe(({focus:i})=>{i&&Je("search",i)}),h(e.form,"reset").pipe(U(o)).subscribe(()=>e.focus());let n=R("header [for=__search]");return h(n,"click").subscribe(()=>e.focus()),is(e,{worker$:t}).pipe(w(i=>r.next(i)),_(()=>r.complete()),m(i=>$({ref:e},i)),G(1))}function ci(e,{worker$:t,query$:r}){let o=new g,n=rn(e.parentElement).pipe(b(Boolean)),i=e.parentElement,a=R(":scope > :first-child",e),s=R(":scope > :last-child",e);ze("search").subscribe(l=>s.setAttribute("role",l?"list":"presentation")),o.pipe(re(r),Ur(t.pipe(Ae(jt)))).subscribe(([{items:l},{value:f}])=>{switch(l.length){case 0:a.textContent=f.length?Ee("search.result.none"):Ee("search.result.placeholder");break;case 1:a.textContent=Ee("search.result.one");break;default:let u=sr(l.length);a.textContent=Ee("search.result.other",u)}});let p=o.pipe(w(()=>s.innerHTML=""),v(({items:l})=>O(I(...l.slice(0,10)),I(...l.slice(10)).pipe(Be(4),Vr(n),v(([f])=>f)))),m(Mn),pe());return p.subscribe(l=>s.appendChild(l)),p.pipe(ne(l=>{let f=fe("details",l);return typeof f=="undefined"?S:h(f,"toggle").pipe(U(o),m(()=>f))})).subscribe(l=>{l.open===!1&&l.offsetTop<=i.scrollTop&&i.scrollTo({top:l.offsetTop})}),t.pipe(b(dr),m(({data:l})=>l)).pipe(w(l=>o.next(l)),_(()=>o.complete()),m(l=>$({ref:e},l)))}function as(e,{query$:t}){return t.pipe(m(({value:r})=>{let o=ye();return o.hash="",r=r.replace(/\s+/g,"+").replace(/&/g,"%26").replace(/=/g,"%3D"),o.search=`q=${r}`,{url:o}}))}function pi(e,t){let r=new g,o=r.pipe(Z(),ie(!0));return 
r.subscribe(({url:n})=>{e.setAttribute("data-clipboard-text",e.href),e.href=`${n}`}),h(e,"click").pipe(U(o)).subscribe(n=>n.preventDefault()),as(e,t).pipe(w(n=>r.next(n)),_(()=>r.complete()),m(n=>$({ref:e},n)))}function li(e,{worker$:t,keyboard$:r}){let o=new g,n=Se("search-query"),i=O(h(n,"keydown"),h(n,"focus")).pipe(ve(se),m(()=>n.value),K());return o.pipe(He(i),m(([{suggest:s},p])=>{let c=p.split(/([\s-]+)/);if(s!=null&&s.length&&c[c.length-1]){let l=s[s.length-1];l.startsWith(c[c.length-1])&&(c[c.length-1]=l)}else c.length=0;return c})).subscribe(s=>e.innerHTML=s.join("").replace(/\s/g," ")),r.pipe(b(({mode:s})=>s==="search")).subscribe(s=>{switch(s.type){case"ArrowRight":e.innerText.length&&n.selectionStart===n.value.length&&(n.value=e.innerText);break}}),t.pipe(b(dr),m(({data:s})=>s)).pipe(w(s=>o.next(s)),_(()=>o.complete()),m(()=>({ref:e})))}function mi(e,{index$:t,keyboard$:r}){let o=xe();try{let n=ii(o.search,t),i=Se("search-query",e),a=Se("search-result",e);h(e,"click").pipe(b(({target:p})=>p instanceof Element&&!!p.closest("a"))).subscribe(()=>Je("search",!1)),r.pipe(b(({mode:p})=>p==="search")).subscribe(p=>{let c=Ie();switch(p.type){case"Enter":if(c===i){let l=new Map;for(let f of P(":first-child [href]",a)){let u=f.firstElementChild;l.set(f,parseFloat(u.getAttribute("data-md-score")))}if(l.size){let[[f]]=[...l].sort(([,u],[,d])=>d-u);f.click()}p.claim()}break;case"Escape":case"Tab":Je("search",!1),i.blur();break;case"ArrowUp":case"ArrowDown":if(typeof c=="undefined")i.focus();else{let l=[i,...P(":not(details) > [href], summary, details[open] [href]",a)],f=Math.max(0,(Math.max(0,l.indexOf(c))+l.length+(p.type==="ArrowUp"?-1:1))%l.length);l[f].focus()}p.claim();break;default:i!==Ie()&&i.focus()}}),r.pipe(b(({mode:p})=>p==="global")).subscribe(p=>{switch(p.type){case"f":case"s":case"/":i.focus(),i.select(),p.claim();break}});let s=si(i,{worker$:n});return O(s,ci(a,{worker$:n,query$:s})).pipe(Re(...ae("search-share",e).map(p=>pi(p,{query$:s})),...ae("search-suggest",e).map(p=>li(p,{worker$:n,keyboard$:r}))))}catch(n){return e.hidden=!0,Ye}}function fi(e,{index$:t,location$:r}){return z([t,r.pipe(Q(ye()),b(o=>!!o.searchParams.get("h")))]).pipe(m(([o,n])=>ni(o.config)(n.searchParams.get("h"))),m(o=>{var a;let n=new Map,i=document.createNodeIterator(e,NodeFilter.SHOW_TEXT);for(let s=i.nextNode();s;s=i.nextNode())if((a=s.parentElement)!=null&&a.offsetHeight){let p=s.textContent,c=o(p);c.length>p.length&&n.set(s,c)}for(let[s,p]of n){let{childNodes:c}=x("span",null,p);s.replaceWith(...Array.from(c))}return{ref:e,nodes:n}}))}function ss(e,{viewport$:t,main$:r}){let o=e.closest(".md-grid"),n=o.offsetTop-o.parentElement.offsetTop;return z([r,t]).pipe(m(([{offset:i,height:a},{offset:{y:s}}])=>(a=a+Math.min(n,Math.max(0,s-i))-n,{height:a,locked:s>=i+n})),K((i,a)=>i.height===a.height&&i.locked===a.locked))}function Xr(e,o){var n=o,{header$:t}=n,r=ao(n,["header$"]);let i=R(".md-sidebar__scrollwrap",e),{y:a}=Ve(i);return C(()=>{let s=new g,p=s.pipe(Z(),ie(!0)),c=s.pipe(Le(0,me));return c.pipe(re(t)).subscribe({next([{height:l},{height:f}]){i.style.height=`${l-2*a}px`,e.style.top=`${f}px`},complete(){i.style.height="",e.style.top=""}}),c.pipe(Ae()).subscribe(()=>{for(let l of P(".md-nav__link--active[href]",e)){if(!l.clientHeight)continue;let f=l.closest(".md-sidebar__scrollwrap");if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:d}=ce(f);f.scrollTo({top:u-d/2})}}}),ue(P("label[tabindex]",e)).pipe(ne(l=>h(l,"click").pipe(ve(se),m(()=>l),U(p)))).subscribe(l=>{let 
f=R(`[id="${l.htmlFor}"]`);R(`[aria-labelledby="${l.id}"]`).setAttribute("aria-expanded",`${f.checked}`)}),ss(e,r).pipe(w(l=>s.next(l)),_(()=>s.complete()),m(l=>$({ref:e},l)))})}function ui(e,t){if(typeof t!="undefined"){let r=`https://api.github.com/repos/${e}/${t}`;return st(je(`${r}/releases/latest`).pipe(de(()=>S),m(o=>({version:o.tag_name})),De({})),je(r).pipe(de(()=>S),m(o=>({stars:o.stargazers_count,forks:o.forks_count})),De({}))).pipe(m(([o,n])=>$($({},o),n)))}else{let r=`https://api.github.com/users/${e}`;return je(r).pipe(m(o=>({repositories:o.public_repos})),De({}))}}function di(e,t){let r=`https://${e}/api/v4/projects/${encodeURIComponent(t)}`;return st(je(`${r}/releases/permalink/latest`).pipe(de(()=>S),m(({tag_name:o})=>({version:o})),De({})),je(r).pipe(de(()=>S),m(({star_count:o,forks_count:n})=>({stars:o,forks:n})),De({}))).pipe(m(([o,n])=>$($({},o),n)))}function hi(e){let t=e.match(/^.+github\.com\/([^/]+)\/?([^/]+)?/i);if(t){let[,r,o]=t;return ui(r,o)}if(t=e.match(/^.+?([^/]*gitlab[^/]+)\/(.+?)\/?$/i),t){let[,r,o]=t;return di(r,o)}return S}var cs;function ps(e){return cs||(cs=C(()=>{let t=__md_get("__source",sessionStorage);if(t)return I(t);if(ae("consent").length){let o=__md_get("__consent");if(!(o&&o.github))return S}return hi(e.href).pipe(w(o=>__md_set("__source",o,sessionStorage)))}).pipe(de(()=>S),b(t=>Object.keys(t).length>0),m(t=>({facts:t})),G(1)))}function bi(e){let t=R(":scope > :last-child",e);return C(()=>{let r=new g;return r.subscribe(({facts:o})=>{t.appendChild(Ln(o)),t.classList.add("md-source__repository--active")}),ps(e).pipe(w(o=>r.next(o)),_(()=>r.complete()),m(o=>$({ref:e},o)))})}function ls(e,{viewport$:t,header$:r}){return ge(document.body).pipe(v(()=>mr(e,{header$:r,viewport$:t})),m(({offset:{y:o}})=>({hidden:o>=10})),ee("hidden"))}function vi(e,t){return C(()=>{let r=new g;return r.subscribe({next({hidden:o}){e.hidden=o},complete(){e.hidden=!1}}),(B("navigation.tabs.sticky")?I({hidden:!1}):ls(e,t)).pipe(w(o=>r.next(o)),_(()=>r.complete()),m(o=>$({ref:e},o)))})}function ms(e,{viewport$:t,header$:r}){let o=new Map,n=P(".md-nav__link",e);for(let s of n){let p=decodeURIComponent(s.hash.substring(1)),c=fe(`[id="${p}"]`);typeof c!="undefined"&&o.set(s,c)}let i=r.pipe(ee("height"),m(({height:s})=>{let p=Se("main"),c=R(":scope > :first-child",p);return s+.8*(c.offsetTop-p.offsetTop)}),pe());return ge(document.body).pipe(ee("height"),v(s=>C(()=>{let p=[];return I([...o].reduce((c,[l,f])=>{for(;p.length&&o.get(p[p.length-1]).tagName>=f.tagName;)p.pop();let u=f.offsetTop;for(;!u&&f.parentElement;)f=f.parentElement,u=f.offsetTop;let d=f.offsetParent;for(;d;d=d.offsetParent)u+=d.offsetTop;return c.set([...p=[...p,l]].reverse(),u)},new Map))}).pipe(m(p=>new Map([...p].sort(([,c],[,l])=>c-l))),He(i),v(([p,c])=>t.pipe(Fr(([l,f],{offset:{y:u},size:d})=>{let y=u+d.height>=Math.floor(s.height);for(;f.length;){let[,M]=f[0];if(M-c=u&&!y)f=[l.pop(),...f];else break}return[l,f]},[[],[...p]]),K((l,f)=>l[0]===f[0]&&l[1]===f[1])))))).pipe(m(([s,p])=>({prev:s.map(([c])=>c),next:p.map(([c])=>c)})),Q({prev:[],next:[]}),Be(2,1),m(([s,p])=>s.prev.length{let i=new g,a=i.pipe(Z(),ie(!0));if(i.subscribe(({prev:s,next:p})=>{for(let[c]of p)c.classList.remove("md-nav__link--passed"),c.classList.remove("md-nav__link--active");for(let[c,[l]]of s.entries())l.classList.add("md-nav__link--passed"),l.classList.toggle("md-nav__link--active",c===s.length-1)}),B("toc.follow")){let 
s=O(t.pipe(_e(1),m(()=>{})),t.pipe(_e(250),m(()=>"smooth")));i.pipe(b(({prev:p})=>p.length>0),He(o.pipe(ve(se))),re(s)).subscribe(([[{prev:p}],c])=>{let[l]=p[p.length-1];if(l.offsetHeight){let f=cr(l);if(typeof f!="undefined"){let u=l.offsetTop-f.offsetTop,{height:d}=ce(f);f.scrollTo({top:u-d/2,behavior:c})}}})}return B("navigation.tracking")&&t.pipe(U(a),ee("offset"),_e(250),Ce(1),U(n.pipe(Ce(1))),ct({delay:250}),re(i)).subscribe(([,{prev:s}])=>{let p=ye(),c=s[s.length-1];if(c&&c.length){let[l]=c,{hash:f}=new URL(l.href);p.hash!==f&&(p.hash=f,history.replaceState({},"",`${p}`))}else p.hash="",history.replaceState({},"",`${p}`)}),ms(e,{viewport$:t,header$:r}).pipe(w(s=>i.next(s)),_(()=>i.complete()),m(s=>$({ref:e},s)))})}function fs(e,{viewport$:t,main$:r,target$:o}){let n=t.pipe(m(({offset:{y:a}})=>a),Be(2,1),m(([a,s])=>a>s&&s>0),K()),i=r.pipe(m(({active:a})=>a));return z([i,n]).pipe(m(([a,s])=>!(a&&s)),K(),U(o.pipe(Ce(1))),ie(!0),ct({delay:250}),m(a=>({hidden:a})))}function yi(e,{viewport$:t,header$:r,main$:o,target$:n}){let i=new g,a=i.pipe(Z(),ie(!0));return i.subscribe({next({hidden:s}){e.hidden=s,s?(e.setAttribute("tabindex","-1"),e.blur()):e.removeAttribute("tabindex")},complete(){e.style.top="",e.hidden=!0,e.removeAttribute("tabindex")}}),r.pipe(U(a),ee("height")).subscribe(({height:s})=>{e.style.top=`${s+16}px`}),h(e,"click").subscribe(s=>{s.preventDefault(),window.scrollTo({top:0})}),fs(e,{viewport$:t,main$:o,target$:n}).pipe(w(s=>i.next(s)),_(()=>i.complete()),m(s=>$({ref:e},s)))}function xi({document$:e,viewport$:t}){e.pipe(v(()=>P(".md-ellipsis")),ne(r=>tt(r).pipe(U(e.pipe(Ce(1))),b(o=>o),m(()=>r),Te(1))),b(r=>r.offsetWidth{let o=r.innerText,n=r.closest("a")||r;return n.title=o,B("content.tooltips")?mt(n,{viewport$:t}).pipe(U(e.pipe(Ce(1))),_(()=>n.removeAttribute("title"))):S})).subscribe(),B("content.tooltips")&&e.pipe(v(()=>P(".md-status")),ne(r=>mt(r,{viewport$:t}))).subscribe()}function Ei({document$:e,tablet$:t}){e.pipe(v(()=>P(".md-toggle--indeterminate")),w(r=>{r.indeterminate=!0,r.checked=!1}),ne(r=>h(r,"change").pipe(Dr(()=>r.classList.contains("md-toggle--indeterminate")),m(()=>r))),re(t)).subscribe(([r,o])=>{r.classList.remove("md-toggle--indeterminate"),o&&(r.checked=!1)})}function us(){return/(iPad|iPhone|iPod)/.test(navigator.userAgent)}function wi({document$:e}){e.pipe(v(()=>P("[data-md-scrollfix]")),w(t=>t.removeAttribute("data-md-scrollfix")),b(us),ne(t=>h(t,"touchstart").pipe(m(()=>t)))).subscribe(t=>{let r=t.scrollTop;r===0?t.scrollTop=1:r+t.offsetHeight===t.scrollHeight&&(t.scrollTop=r-1)})}function Ti({viewport$:e,tablet$:t}){z([ze("search"),t]).pipe(m(([r,o])=>r&&!o),v(r=>I(r).pipe(Ge(r?400:100))),re(e)).subscribe(([r,{offset:{y:o}}])=>{if(r)document.body.setAttribute("data-md-scrolllock",""),document.body.style.top=`-${o}px`;else{let n=-1*parseInt(document.body.style.top,10);document.body.removeAttribute("data-md-scrolllock"),document.body.style.top="",n&&window.scrollTo(0,n)}})}Object.entries||(Object.entries=function(e){let t=[];for(let r of Object.keys(e))t.push([r,e[r]]);return t});Object.values||(Object.values=function(e){let t=[];for(let r of Object.keys(e))t.push(e[r]);return t});typeof Element!="undefined"&&(Element.prototype.scrollTo||(Element.prototype.scrollTo=function(e,t){typeof e=="object"?(this.scrollLeft=e.left,this.scrollTop=e.top):(this.scrollLeft=e,this.scrollTop=t)}),Element.prototype.replaceWith||(Element.prototype.replaceWith=function(...e){let t=this.parentNode;if(t){e.length===0&&t.removeChild(this);for(let 
r=e.length-1;r>=0;r--){let o=e[r];typeof o=="string"?o=document.createTextNode(o):o.parentNode&&o.parentNode.removeChild(o),r?t.insertBefore(this.previousSibling,o):t.replaceChild(o,this)}}}));function ds(){return location.protocol==="file:"?Tt(`${new URL("search/search_index.js",Zr.base)}`).pipe(m(()=>__index),G(1)):je(new URL("search/search_index.json",Zr.base))}document.documentElement.classList.remove("no-js");document.documentElement.classList.add("js");var ot=Bo(),Wt=an(),Mt=pn(Wt),eo=nn(),Oe=vn(),hr=Pt("(min-width: 960px)"),Oi=Pt("(min-width: 1220px)"),Mi=ln(),Zr=xe(),Li=document.forms.namedItem("search")?ds():Ye,to=new g;Xn({alert$:to});var ro=new g;B("navigation.instant")&&ri({location$:Wt,viewport$:Oe,progress$:ro}).subscribe(ot);var Si;((Si=Zr.version)==null?void 0:Si.provider)==="mike"&&ai({document$:ot});O(Wt,Mt).pipe(Ge(125)).subscribe(()=>{Je("drawer",!1),Je("search",!1)});eo.pipe(b(({mode:e})=>e==="global")).subscribe(e=>{switch(e.type){case"p":case",":let t=fe("link[rel=prev]");typeof t!="undefined"&<(t);break;case"n":case".":let r=fe("link[rel=next]");typeof r!="undefined"&<(r);break;case"Enter":let o=Ie();o instanceof HTMLLabelElement&&o.click()}});xi({viewport$:Oe,document$:ot});Ei({document$:ot,tablet$:hr});wi({document$:ot});Ti({viewport$:Oe,tablet$:hr});var rt=Qn(Se("header"),{viewport$:Oe}),Ft=ot.pipe(m(()=>Se("main")),v(e=>Bn(e,{viewport$:Oe,header$:rt})),G(1)),hs=O(...ae("consent").map(e=>xn(e,{target$:Mt})),...ae("dialog").map(e=>zn(e,{alert$:to})),...ae("header").map(e=>Kn(e,{viewport$:Oe,header$:rt,main$:Ft})),...ae("palette").map(e=>Gn(e)),...ae("progress").map(e=>Jn(e,{progress$:ro})),...ae("search").map(e=>mi(e,{index$:Li,keyboard$:eo})),...ae("source").map(e=>bi(e))),bs=C(()=>O(...ae("announce").map(e=>yn(e)),...ae("content").map(e=>Nn(e,{viewport$:Oe,target$:Mt,print$:Mi})),...ae("content").map(e=>B("search.highlight")?fi(e,{index$:Li,location$:Wt}):S),...ae("header-title").map(e=>Yn(e,{viewport$:Oe,header$:rt})),...ae("sidebar").map(e=>e.getAttribute("data-md-type")==="navigation"?Nr(Oi,()=>Xr(e,{viewport$:Oe,header$:rt,main$:Ft})):Nr(hr,()=>Xr(e,{viewport$:Oe,header$:rt,main$:Ft}))),...ae("tabs").map(e=>vi(e,{viewport$:Oe,header$:rt})),...ae("toc").map(e=>gi(e,{viewport$:Oe,header$:rt,main$:Ft,target$:Mt})),...ae("top").map(e=>yi(e,{viewport$:Oe,header$:rt,main$:Ft,target$:Mt})))),_i=ot.pipe(v(()=>bs),Re(hs),G(1));_i.subscribe();window.document$=ot;window.location$=Wt;window.target$=Mt;window.keyboard$=eo;window.viewport$=Oe;window.tablet$=hr;window.screen$=Oi;window.print$=Mi;window.alert$=to;window.progress$=ro;window.component$=_i;})(); +//# sourceMappingURL=bundle.d6f25eb3.min.js.map + diff --git a/assets/javascripts/bundle.d6f25eb3.min.js.map b/assets/javascripts/bundle.d6f25eb3.min.js.map new file mode 100644 index 0000000000..e3806a548d --- /dev/null +++ b/assets/javascripts/bundle.d6f25eb3.min.js.map @@ -0,0 +1,7 @@ +{ + "version": 3, + "sources": ["node_modules/focus-visible/dist/focus-visible.js", "node_modules/escape-html/index.js", "node_modules/clipboard/dist/clipboard.js", "src/templates/assets/javascripts/bundle.ts", "node_modules/tslib/tslib.es6.mjs", "node_modules/rxjs/src/internal/util/isFunction.ts", "node_modules/rxjs/src/internal/util/createErrorClass.ts", "node_modules/rxjs/src/internal/util/UnsubscriptionError.ts", "node_modules/rxjs/src/internal/util/arrRemove.ts", "node_modules/rxjs/src/internal/Subscription.ts", "node_modules/rxjs/src/internal/config.ts", "node_modules/rxjs/src/internal/scheduler/timeoutProvider.ts", 
"node_modules/rxjs/src/internal/util/reportUnhandledError.ts", "node_modules/rxjs/src/internal/util/noop.ts", "node_modules/rxjs/src/internal/NotificationFactories.ts", "node_modules/rxjs/src/internal/util/errorContext.ts", "node_modules/rxjs/src/internal/Subscriber.ts", "node_modules/rxjs/src/internal/symbol/observable.ts", "node_modules/rxjs/src/internal/util/identity.ts", "node_modules/rxjs/src/internal/util/pipe.ts", "node_modules/rxjs/src/internal/Observable.ts", "node_modules/rxjs/src/internal/util/lift.ts", "node_modules/rxjs/src/internal/operators/OperatorSubscriber.ts", "node_modules/rxjs/src/internal/scheduler/animationFrameProvider.ts", "node_modules/rxjs/src/internal/util/ObjectUnsubscribedError.ts", "node_modules/rxjs/src/internal/Subject.ts", "node_modules/rxjs/src/internal/BehaviorSubject.ts", "node_modules/rxjs/src/internal/scheduler/dateTimestampProvider.ts", "node_modules/rxjs/src/internal/ReplaySubject.ts", "node_modules/rxjs/src/internal/scheduler/Action.ts", "node_modules/rxjs/src/internal/scheduler/intervalProvider.ts", "node_modules/rxjs/src/internal/scheduler/AsyncAction.ts", "node_modules/rxjs/src/internal/Scheduler.ts", "node_modules/rxjs/src/internal/scheduler/AsyncScheduler.ts", "node_modules/rxjs/src/internal/scheduler/async.ts", "node_modules/rxjs/src/internal/scheduler/QueueAction.ts", "node_modules/rxjs/src/internal/scheduler/QueueScheduler.ts", "node_modules/rxjs/src/internal/scheduler/queue.ts", "node_modules/rxjs/src/internal/scheduler/AnimationFrameAction.ts", "node_modules/rxjs/src/internal/scheduler/AnimationFrameScheduler.ts", "node_modules/rxjs/src/internal/scheduler/animationFrame.ts", "node_modules/rxjs/src/internal/observable/empty.ts", "node_modules/rxjs/src/internal/util/isScheduler.ts", "node_modules/rxjs/src/internal/util/args.ts", "node_modules/rxjs/src/internal/util/isArrayLike.ts", "node_modules/rxjs/src/internal/util/isPromise.ts", "node_modules/rxjs/src/internal/util/isInteropObservable.ts", "node_modules/rxjs/src/internal/util/isAsyncIterable.ts", "node_modules/rxjs/src/internal/util/throwUnobservableError.ts", "node_modules/rxjs/src/internal/symbol/iterator.ts", "node_modules/rxjs/src/internal/util/isIterable.ts", "node_modules/rxjs/src/internal/util/isReadableStreamLike.ts", "node_modules/rxjs/src/internal/observable/innerFrom.ts", "node_modules/rxjs/src/internal/util/executeSchedule.ts", "node_modules/rxjs/src/internal/operators/observeOn.ts", "node_modules/rxjs/src/internal/operators/subscribeOn.ts", "node_modules/rxjs/src/internal/scheduled/scheduleObservable.ts", "node_modules/rxjs/src/internal/scheduled/schedulePromise.ts", "node_modules/rxjs/src/internal/scheduled/scheduleArray.ts", "node_modules/rxjs/src/internal/scheduled/scheduleIterable.ts", "node_modules/rxjs/src/internal/scheduled/scheduleAsyncIterable.ts", "node_modules/rxjs/src/internal/scheduled/scheduleReadableStreamLike.ts", "node_modules/rxjs/src/internal/scheduled/scheduled.ts", "node_modules/rxjs/src/internal/observable/from.ts", "node_modules/rxjs/src/internal/observable/of.ts", "node_modules/rxjs/src/internal/observable/throwError.ts", "node_modules/rxjs/src/internal/util/EmptyError.ts", "node_modules/rxjs/src/internal/util/isDate.ts", "node_modules/rxjs/src/internal/operators/map.ts", "node_modules/rxjs/src/internal/util/mapOneOrManyArgs.ts", "node_modules/rxjs/src/internal/util/argsArgArrayOrObject.ts", "node_modules/rxjs/src/internal/util/createObject.ts", "node_modules/rxjs/src/internal/observable/combineLatest.ts", 
"node_modules/rxjs/src/internal/operators/mergeInternals.ts", "node_modules/rxjs/src/internal/operators/mergeMap.ts", "node_modules/rxjs/src/internal/operators/mergeAll.ts", "node_modules/rxjs/src/internal/operators/concatAll.ts", "node_modules/rxjs/src/internal/observable/concat.ts", "node_modules/rxjs/src/internal/observable/defer.ts", "node_modules/rxjs/src/internal/observable/fromEvent.ts", "node_modules/rxjs/src/internal/observable/fromEventPattern.ts", "node_modules/rxjs/src/internal/observable/timer.ts", "node_modules/rxjs/src/internal/observable/merge.ts", "node_modules/rxjs/src/internal/observable/never.ts", "node_modules/rxjs/src/internal/util/argsOrArgArray.ts", "node_modules/rxjs/src/internal/operators/filter.ts", "node_modules/rxjs/src/internal/observable/zip.ts", "node_modules/rxjs/src/internal/operators/audit.ts", "node_modules/rxjs/src/internal/operators/auditTime.ts", "node_modules/rxjs/src/internal/operators/bufferCount.ts", "node_modules/rxjs/src/internal/operators/catchError.ts", "node_modules/rxjs/src/internal/operators/scanInternals.ts", "node_modules/rxjs/src/internal/operators/combineLatest.ts", "node_modules/rxjs/src/internal/operators/combineLatestWith.ts", "node_modules/rxjs/src/internal/operators/debounce.ts", "node_modules/rxjs/src/internal/operators/debounceTime.ts", "node_modules/rxjs/src/internal/operators/defaultIfEmpty.ts", "node_modules/rxjs/src/internal/operators/take.ts", "node_modules/rxjs/src/internal/operators/ignoreElements.ts", "node_modules/rxjs/src/internal/operators/mapTo.ts", "node_modules/rxjs/src/internal/operators/delayWhen.ts", "node_modules/rxjs/src/internal/operators/delay.ts", "node_modules/rxjs/src/internal/operators/distinctUntilChanged.ts", "node_modules/rxjs/src/internal/operators/distinctUntilKeyChanged.ts", "node_modules/rxjs/src/internal/operators/throwIfEmpty.ts", "node_modules/rxjs/src/internal/operators/endWith.ts", "node_modules/rxjs/src/internal/operators/finalize.ts", "node_modules/rxjs/src/internal/operators/first.ts", "node_modules/rxjs/src/internal/operators/takeLast.ts", "node_modules/rxjs/src/internal/operators/merge.ts", "node_modules/rxjs/src/internal/operators/mergeWith.ts", "node_modules/rxjs/src/internal/operators/repeat.ts", "node_modules/rxjs/src/internal/operators/scan.ts", "node_modules/rxjs/src/internal/operators/share.ts", "node_modules/rxjs/src/internal/operators/shareReplay.ts", "node_modules/rxjs/src/internal/operators/skip.ts", "node_modules/rxjs/src/internal/operators/skipUntil.ts", "node_modules/rxjs/src/internal/operators/startWith.ts", "node_modules/rxjs/src/internal/operators/switchMap.ts", "node_modules/rxjs/src/internal/operators/takeUntil.ts", "node_modules/rxjs/src/internal/operators/takeWhile.ts", "node_modules/rxjs/src/internal/operators/tap.ts", "node_modules/rxjs/src/internal/operators/throttle.ts", "node_modules/rxjs/src/internal/operators/throttleTime.ts", "node_modules/rxjs/src/internal/operators/withLatestFrom.ts", "node_modules/rxjs/src/internal/operators/zip.ts", "node_modules/rxjs/src/internal/operators/zipWith.ts", "src/templates/assets/javascripts/browser/document/index.ts", "src/templates/assets/javascripts/browser/element/_/index.ts", "src/templates/assets/javascripts/browser/element/focus/index.ts", "src/templates/assets/javascripts/browser/element/hover/index.ts", "src/templates/assets/javascripts/utilities/h/index.ts", "src/templates/assets/javascripts/utilities/round/index.ts", "src/templates/assets/javascripts/browser/script/index.ts", 
"src/templates/assets/javascripts/browser/element/size/_/index.ts", "src/templates/assets/javascripts/browser/element/size/content/index.ts", "src/templates/assets/javascripts/browser/element/offset/_/index.ts", "src/templates/assets/javascripts/browser/element/offset/content/index.ts", "src/templates/assets/javascripts/browser/element/visibility/index.ts", "src/templates/assets/javascripts/browser/toggle/index.ts", "src/templates/assets/javascripts/browser/keyboard/index.ts", "src/templates/assets/javascripts/browser/location/_/index.ts", "src/templates/assets/javascripts/browser/location/hash/index.ts", "src/templates/assets/javascripts/browser/media/index.ts", "src/templates/assets/javascripts/browser/request/index.ts", "src/templates/assets/javascripts/browser/viewport/offset/index.ts", "src/templates/assets/javascripts/browser/viewport/size/index.ts", "src/templates/assets/javascripts/browser/viewport/_/index.ts", "src/templates/assets/javascripts/browser/viewport/at/index.ts", "src/templates/assets/javascripts/browser/worker/index.ts", "src/templates/assets/javascripts/_/index.ts", "src/templates/assets/javascripts/components/_/index.ts", "src/templates/assets/javascripts/components/announce/index.ts", "src/templates/assets/javascripts/components/consent/index.ts", "src/templates/assets/javascripts/templates/tooltip/index.tsx", "src/templates/assets/javascripts/templates/annotation/index.tsx", "src/templates/assets/javascripts/templates/clipboard/index.tsx", "src/templates/assets/javascripts/templates/search/index.tsx", "src/templates/assets/javascripts/templates/source/index.tsx", "src/templates/assets/javascripts/templates/tabbed/index.tsx", "src/templates/assets/javascripts/templates/table/index.tsx", "src/templates/assets/javascripts/templates/version/index.tsx", "src/templates/assets/javascripts/components/tooltip2/index.ts", "src/templates/assets/javascripts/components/content/annotation/_/index.ts", "src/templates/assets/javascripts/components/content/annotation/list/index.ts", "src/templates/assets/javascripts/components/content/annotation/block/index.ts", "src/templates/assets/javascripts/components/content/code/_/index.ts", "src/templates/assets/javascripts/components/content/details/index.ts", "src/templates/assets/javascripts/components/content/mermaid/index.css", "src/templates/assets/javascripts/components/content/mermaid/index.ts", "src/templates/assets/javascripts/components/content/table/index.ts", "src/templates/assets/javascripts/components/content/tabs/index.ts", "src/templates/assets/javascripts/components/content/_/index.ts", "src/templates/assets/javascripts/components/dialog/index.ts", "src/templates/assets/javascripts/components/tooltip/index.ts", "src/templates/assets/javascripts/components/header/_/index.ts", "src/templates/assets/javascripts/components/header/title/index.ts", "src/templates/assets/javascripts/components/main/index.ts", "src/templates/assets/javascripts/components/palette/index.ts", "src/templates/assets/javascripts/components/progress/index.ts", "src/templates/assets/javascripts/integrations/clipboard/index.ts", "src/templates/assets/javascripts/integrations/sitemap/index.ts", "src/templates/assets/javascripts/integrations/instant/index.ts", "src/templates/assets/javascripts/integrations/search/highlighter/index.ts", "src/templates/assets/javascripts/integrations/search/worker/message/index.ts", "src/templates/assets/javascripts/integrations/search/worker/_/index.ts", "src/templates/assets/javascripts/integrations/version/index.ts", 
"src/templates/assets/javascripts/components/search/query/index.ts", "src/templates/assets/javascripts/components/search/result/index.ts", "src/templates/assets/javascripts/components/search/share/index.ts", "src/templates/assets/javascripts/components/search/suggest/index.ts", "src/templates/assets/javascripts/components/search/_/index.ts", "src/templates/assets/javascripts/components/search/highlight/index.ts", "src/templates/assets/javascripts/components/sidebar/index.ts", "src/templates/assets/javascripts/components/source/facts/github/index.ts", "src/templates/assets/javascripts/components/source/facts/gitlab/index.ts", "src/templates/assets/javascripts/components/source/facts/_/index.ts", "src/templates/assets/javascripts/components/source/_/index.ts", "src/templates/assets/javascripts/components/tabs/index.ts", "src/templates/assets/javascripts/components/toc/index.ts", "src/templates/assets/javascripts/components/top/index.ts", "src/templates/assets/javascripts/patches/ellipsis/index.ts", "src/templates/assets/javascripts/patches/indeterminate/index.ts", "src/templates/assets/javascripts/patches/scrollfix/index.ts", "src/templates/assets/javascripts/patches/scrolllock/index.ts", "src/templates/assets/javascripts/polyfills/index.ts"], + "sourcesContent": ["(function (global, factory) {\n typeof exports === 'object' && typeof module !== 'undefined' ? factory() :\n typeof define === 'function' && define.amd ? define(factory) :\n (factory());\n}(this, (function () { 'use strict';\n\n /**\n * Applies the :focus-visible polyfill at the given scope.\n * A scope in this case is either the top-level Document or a Shadow Root.\n *\n * @param {(Document|ShadowRoot)} scope\n * @see https://github.com/WICG/focus-visible\n */\n function applyFocusVisiblePolyfill(scope) {\n var hadKeyboardEvent = true;\n var hadFocusVisibleRecently = false;\n var hadFocusVisibleRecentlyTimeout = null;\n\n var inputTypesAllowlist = {\n text: true,\n search: true,\n url: true,\n tel: true,\n email: true,\n password: true,\n number: true,\n date: true,\n month: true,\n week: true,\n time: true,\n datetime: true,\n 'datetime-local': true\n };\n\n /**\n * Helper function for legacy browsers and iframes which sometimes focus\n * elements like document, body, and non-interactive SVG.\n * @param {Element} el\n */\n function isValidFocusTarget(el) {\n if (\n el &&\n el !== document &&\n el.nodeName !== 'HTML' &&\n el.nodeName !== 'BODY' &&\n 'classList' in el &&\n 'contains' in el.classList\n ) {\n return true;\n }\n return false;\n }\n\n /**\n * Computes whether the given element should automatically trigger the\n * `focus-visible` class being added, i.e. 
whether it should always match\n * `:focus-visible` when focused.\n * @param {Element} el\n * @return {boolean}\n */\n function focusTriggersKeyboardModality(el) {\n var type = el.type;\n var tagName = el.tagName;\n\n if (tagName === 'INPUT' && inputTypesAllowlist[type] && !el.readOnly) {\n return true;\n }\n\n if (tagName === 'TEXTAREA' && !el.readOnly) {\n return true;\n }\n\n if (el.isContentEditable) {\n return true;\n }\n\n return false;\n }\n\n /**\n * Add the `focus-visible` class to the given element if it was not added by\n * the author.\n * @param {Element} el\n */\n function addFocusVisibleClass(el) {\n if (el.classList.contains('focus-visible')) {\n return;\n }\n el.classList.add('focus-visible');\n el.setAttribute('data-focus-visible-added', '');\n }\n\n /**\n * Remove the `focus-visible` class from the given element if it was not\n * originally added by the author.\n * @param {Element} el\n */\n function removeFocusVisibleClass(el) {\n if (!el.hasAttribute('data-focus-visible-added')) {\n return;\n }\n el.classList.remove('focus-visible');\n el.removeAttribute('data-focus-visible-added');\n }\n\n /**\n * If the most recent user interaction was via the keyboard;\n * and the key press did not include a meta, alt/option, or control key;\n * then the modality is keyboard. Otherwise, the modality is not keyboard.\n * Apply `focus-visible` to any current active element and keep track\n * of our keyboard modality state with `hadKeyboardEvent`.\n * @param {KeyboardEvent} e\n */\n function onKeyDown(e) {\n if (e.metaKey || e.altKey || e.ctrlKey) {\n return;\n }\n\n if (isValidFocusTarget(scope.activeElement)) {\n addFocusVisibleClass(scope.activeElement);\n }\n\n hadKeyboardEvent = true;\n }\n\n /**\n * If at any point a user clicks with a pointing device, ensure that we change\n * the modality away from keyboard.\n * This avoids the situation where a user presses a key on an already focused\n * element, and then clicks on a different element, focusing it with a\n * pointing device, while we still think we're in keyboard modality.\n * @param {Event} e\n */\n function onPointerDown(e) {\n hadKeyboardEvent = false;\n }\n\n /**\n * On `focus`, add the `focus-visible` class to the target if:\n * - the target received focus as a result of keyboard navigation, or\n * - the event target is an element that will likely require interaction\n * via the keyboard (e.g. 
a text box)\n * @param {Event} e\n */\n function onFocus(e) {\n // Prevent IE from focusing the document or HTML element.\n if (!isValidFocusTarget(e.target)) {\n return;\n }\n\n if (hadKeyboardEvent || focusTriggersKeyboardModality(e.target)) {\n addFocusVisibleClass(e.target);\n }\n }\n\n /**\n * On `blur`, remove the `focus-visible` class from the target.\n * @param {Event} e\n */\n function onBlur(e) {\n if (!isValidFocusTarget(e.target)) {\n return;\n }\n\n if (\n e.target.classList.contains('focus-visible') ||\n e.target.hasAttribute('data-focus-visible-added')\n ) {\n // To detect a tab/window switch, we look for a blur event followed\n // rapidly by a visibility change.\n // If we don't see a visibility change within 100ms, it's probably a\n // regular focus change.\n hadFocusVisibleRecently = true;\n window.clearTimeout(hadFocusVisibleRecentlyTimeout);\n hadFocusVisibleRecentlyTimeout = window.setTimeout(function() {\n hadFocusVisibleRecently = false;\n }, 100);\n removeFocusVisibleClass(e.target);\n }\n }\n\n /**\n * If the user changes tabs, keep track of whether or not the previously\n * focused element had .focus-visible.\n * @param {Event} e\n */\n function onVisibilityChange(e) {\n if (document.visibilityState === 'hidden') {\n // If the tab becomes active again, the browser will handle calling focus\n // on the element (Safari actually calls it twice).\n // If this tab change caused a blur on an element with focus-visible,\n // re-apply the class when the user switches back to the tab.\n if (hadFocusVisibleRecently) {\n hadKeyboardEvent = true;\n }\n addInitialPointerMoveListeners();\n }\n }\n\n /**\n * Add a group of listeners to detect usage of any pointing devices.\n * These listeners will be added when the polyfill first loads, and anytime\n * the window is blurred, so that they are active when the window regains\n * focus.\n */\n function addInitialPointerMoveListeners() {\n document.addEventListener('mousemove', onInitialPointerMove);\n document.addEventListener('mousedown', onInitialPointerMove);\n document.addEventListener('mouseup', onInitialPointerMove);\n document.addEventListener('pointermove', onInitialPointerMove);\n document.addEventListener('pointerdown', onInitialPointerMove);\n document.addEventListener('pointerup', onInitialPointerMove);\n document.addEventListener('touchmove', onInitialPointerMove);\n document.addEventListener('touchstart', onInitialPointerMove);\n document.addEventListener('touchend', onInitialPointerMove);\n }\n\n function removeInitialPointerMoveListeners() {\n document.removeEventListener('mousemove', onInitialPointerMove);\n document.removeEventListener('mousedown', onInitialPointerMove);\n document.removeEventListener('mouseup', onInitialPointerMove);\n document.removeEventListener('pointermove', onInitialPointerMove);\n document.removeEventListener('pointerdown', onInitialPointerMove);\n document.removeEventListener('pointerup', onInitialPointerMove);\n document.removeEventListener('touchmove', onInitialPointerMove);\n document.removeEventListener('touchstart', onInitialPointerMove);\n document.removeEventListener('touchend', onInitialPointerMove);\n }\n\n /**\n * When the polfyill first loads, assume the user is in keyboard modality.\n * If any event is received from a pointing device (e.g. 
mouse, pointer,\n * touch), turn off keyboard modality.\n * This accounts for situations where focus enters the page from the URL bar.\n * @param {Event} e\n */\n function onInitialPointerMove(e) {\n // Work around a Safari quirk that fires a mousemove on whenever the\n // window blurs, even if you're tabbing out of the page. \u00AF\\_(\u30C4)_/\u00AF\n if (e.target.nodeName && e.target.nodeName.toLowerCase() === 'html') {\n return;\n }\n\n hadKeyboardEvent = false;\n removeInitialPointerMoveListeners();\n }\n\n // For some kinds of state, we are interested in changes at the global scope\n // only. For example, global pointer input, global key presses and global\n // visibility change should affect the state at every scope:\n document.addEventListener('keydown', onKeyDown, true);\n document.addEventListener('mousedown', onPointerDown, true);\n document.addEventListener('pointerdown', onPointerDown, true);\n document.addEventListener('touchstart', onPointerDown, true);\n document.addEventListener('visibilitychange', onVisibilityChange, true);\n\n addInitialPointerMoveListeners();\n\n // For focus and blur, we specifically care about state changes in the local\n // scope. This is because focus / blur events that originate from within a\n // shadow root are not re-dispatched from the host element if it was already\n // the active element in its own scope:\n scope.addEventListener('focus', onFocus, true);\n scope.addEventListener('blur', onBlur, true);\n\n // We detect that a node is a ShadowRoot by ensuring that it is a\n // DocumentFragment and also has a host property. This check covers native\n // implementation and polyfill implementation transparently. If we only cared\n // about the native implementation, we could just check if the scope was\n // an instance of a ShadowRoot.\n if (scope.nodeType === Node.DOCUMENT_FRAGMENT_NODE && scope.host) {\n // Since a ShadowRoot is a special kind of DocumentFragment, it does not\n // have a root element to add a class to. So, we add this attribute to the\n // host element instead:\n scope.host.setAttribute('data-js-focus-visible', '');\n } else if (scope.nodeType === Node.DOCUMENT_NODE) {\n document.documentElement.classList.add('js-focus-visible');\n document.documentElement.setAttribute('data-js-focus-visible', '');\n }\n }\n\n // It is important to wrap all references to global window and document in\n // these checks to support server-side rendering use cases\n // @see https://github.com/WICG/focus-visible/issues/199\n if (typeof window !== 'undefined' && typeof document !== 'undefined') {\n // Make the polyfill helper globally available. 
This can be used as a signal\n // to interested libraries that wish to coordinate with the polyfill for e.g.,\n // applying the polyfill to a shadow root:\n window.applyFocusVisiblePolyfill = applyFocusVisiblePolyfill;\n\n // Notify interested libraries of the polyfill's presence, in case the\n // polyfill was loaded lazily:\n var event;\n\n try {\n event = new CustomEvent('focus-visible-polyfill-ready');\n } catch (error) {\n // IE11 does not support using CustomEvent as a constructor directly:\n event = document.createEvent('CustomEvent');\n event.initCustomEvent('focus-visible-polyfill-ready', false, false, {});\n }\n\n window.dispatchEvent(event);\n }\n\n if (typeof document !== 'undefined') {\n // Apply the polyfill to the global document, so that no JavaScript\n // coordination is required to use the polyfill in the top-level document:\n applyFocusVisiblePolyfill(document);\n }\n\n})));\n", "/*!\n * escape-html\n * Copyright(c) 2012-2013 TJ Holowaychuk\n * Copyright(c) 2015 Andreas Lubbe\n * Copyright(c) 2015 Tiancheng \"Timothy\" Gu\n * MIT Licensed\n */\n\n'use strict';\n\n/**\n * Module variables.\n * @private\n */\n\nvar matchHtmlRegExp = /[\"'&<>]/;\n\n/**\n * Module exports.\n * @public\n */\n\nmodule.exports = escapeHtml;\n\n/**\n * Escape special characters in the given string of html.\n *\n * @param {string} string The string to escape for inserting into HTML\n * @return {string}\n * @public\n */\n\nfunction escapeHtml(string) {\n var str = '' + string;\n var match = matchHtmlRegExp.exec(str);\n\n if (!match) {\n return str;\n }\n\n var escape;\n var html = '';\n var index = 0;\n var lastIndex = 0;\n\n for (index = match.index; index < str.length; index++) {\n switch (str.charCodeAt(index)) {\n case 34: // \"\n escape = '"';\n break;\n case 38: // &\n escape = '&';\n break;\n case 39: // '\n escape = ''';\n break;\n case 60: // <\n escape = '<';\n break;\n case 62: // >\n escape = '>';\n break;\n default:\n continue;\n }\n\n if (lastIndex !== index) {\n html += str.substring(lastIndex, index);\n }\n\n lastIndex = index + 1;\n html += escape;\n }\n\n return lastIndex !== index\n ? 
This type is exported for typings reasons.\n */\n constructor(destination?: Subscriber | Observer) {\n super();\n if (destination) {\n this.destination = destination;\n // Automatically chain subscriptions together here.\n // if destination is a Subscription, then it is a Subscriber.\n if (isSubscription(destination)) {\n destination.add(this);\n }\n } else {\n this.destination = EMPTY_OBSERVER;\n }\n }\n\n /**\n * The {@link Observer} callback to receive notifications of type `next` from\n * the Observable, with a value. The Observable may call this method 0 or more\n * times.\n * @param {T} [value] The `next` value.\n * @return {void}\n */\n next(value?: T): void {\n if (this.isStopped) {\n handleStoppedNotification(nextNotification(value), this);\n } else {\n this._next(value!);\n }\n }\n\n /**\n * The {@link Observer} callback to receive notifications of type `error` from\n * the Observable, with an attached `Error`. Notifies the Observer that\n * the Observable has experienced an error condition.\n * @param {any} [err] The `error` exception.\n * @return {void}\n */\n error(err?: any): void {\n if (this.isStopped) {\n handleStoppedNotification(errorNotification(err), this);\n } else {\n this.isStopped = true;\n this._error(err);\n }\n }\n\n /**\n * The {@link Observer} callback to receive a valueless notification of type\n * `complete` from the Observable. Notifies the Observer that the Observable\n * has finished sending push-based notifications.\n * @return {void}\n */\n complete(): void {\n if (this.isStopped) {\n handleStoppedNotification(COMPLETE_NOTIFICATION, this);\n } else {\n this.isStopped = true;\n this._complete();\n }\n }\n\n unsubscribe(): void {\n if (!this.closed) {\n this.isStopped = true;\n super.unsubscribe();\n this.destination = null!;\n }\n }\n\n protected _next(value: T): void {\n this.destination.next(value);\n }\n\n protected _error(err: any): void {\n try {\n this.destination.error(err);\n } finally {\n this.unsubscribe();\n }\n }\n\n protected _complete(): void {\n try {\n this.destination.complete();\n } finally {\n this.unsubscribe();\n }\n }\n}\n\n/**\n * This bind is captured here because we want to be able to have\n * compatibility with monoid libraries that tend to use a method named\n * `bind`. 
In particular, a library called Monio requires this.\n */\nconst _bind = Function.prototype.bind;\n\nfunction bind any>(fn: Fn, thisArg: any): Fn {\n return _bind.call(fn, thisArg);\n}\n\n/**\n * Internal optimization only, DO NOT EXPOSE.\n * @internal\n */\nclass ConsumerObserver implements Observer {\n constructor(private partialObserver: Partial>) {}\n\n next(value: T): void {\n const { partialObserver } = this;\n if (partialObserver.next) {\n try {\n partialObserver.next(value);\n } catch (error) {\n handleUnhandledError(error);\n }\n }\n }\n\n error(err: any): void {\n const { partialObserver } = this;\n if (partialObserver.error) {\n try {\n partialObserver.error(err);\n } catch (error) {\n handleUnhandledError(error);\n }\n } else {\n handleUnhandledError(err);\n }\n }\n\n complete(): void {\n const { partialObserver } = this;\n if (partialObserver.complete) {\n try {\n partialObserver.complete();\n } catch (error) {\n handleUnhandledError(error);\n }\n }\n }\n}\n\nexport class SafeSubscriber extends Subscriber {\n constructor(\n observerOrNext?: Partial> | ((value: T) => void) | null,\n error?: ((e?: any) => void) | null,\n complete?: (() => void) | null\n ) {\n super();\n\n let partialObserver: Partial>;\n if (isFunction(observerOrNext) || !observerOrNext) {\n // The first argument is a function, not an observer. The next\n // two arguments *could* be observers, or they could be empty.\n partialObserver = {\n next: (observerOrNext ?? undefined) as (((value: T) => void) | undefined),\n error: error ?? undefined,\n complete: complete ?? undefined,\n };\n } else {\n // The first argument is a partial observer.\n let context: any;\n if (this && config.useDeprecatedNextContext) {\n // This is a deprecated path that made `this.unsubscribe()` available in\n // next handler functions passed to subscribe. This only exists behind a flag\n // now, as it is *very* slow.\n context = Object.create(observerOrNext);\n context.unsubscribe = () => this.unsubscribe();\n partialObserver = {\n next: observerOrNext.next && bind(observerOrNext.next, context),\n error: observerOrNext.error && bind(observerOrNext.error, context),\n complete: observerOrNext.complete && bind(observerOrNext.complete, context),\n };\n } else {\n // The \"normal\" path. 
Just use the partial observer directly.\n partialObserver = observerOrNext;\n }\n }\n\n // Wrap the partial observer to ensure it's a full observer, and\n // make sure proper error handling is accounted for.\n this.destination = new ConsumerObserver(partialObserver);\n }\n}\n\nfunction handleUnhandledError(error: any) {\n if (config.useDeprecatedSynchronousErrorHandling) {\n captureError(error);\n } else {\n // Ideal path, we report this as an unhandled error,\n // which is thrown on a new call stack.\n reportUnhandledError(error);\n }\n}\n\n/**\n * An error handler used when no error handler was supplied\n * to the SafeSubscriber -- meaning no error handler was supplied\n * do the `subscribe` call on our observable.\n * @param err The error to handle\n */\nfunction defaultErrorHandler(err: any) {\n throw err;\n}\n\n/**\n * A handler for notifications that cannot be sent to a stopped subscriber.\n * @param notification The notification being sent\n * @param subscriber The stopped subscriber\n */\nfunction handleStoppedNotification(notification: ObservableNotification, subscriber: Subscriber) {\n const { onStoppedNotification } = config;\n onStoppedNotification && timeoutProvider.setTimeout(() => onStoppedNotification(notification, subscriber));\n}\n\n/**\n * The observer used as a stub for subscriptions where the user did not\n * pass any arguments to `subscribe`. Comes with the default error handling\n * behavior.\n */\nexport const EMPTY_OBSERVER: Readonly> & { closed: true } = {\n closed: true,\n next: noop,\n error: defaultErrorHandler,\n complete: noop,\n};\n", "/**\n * Symbol.observable or a string \"@@observable\". Used for interop\n *\n * @deprecated We will no longer be exporting this symbol in upcoming versions of RxJS.\n * Instead polyfill and use Symbol.observable directly *or* use https://www.npmjs.com/package/symbol-observable\n */\nexport const observable: string | symbol = (() => (typeof Symbol === 'function' && Symbol.observable) || '@@observable')();\n", "/**\n * This function takes one parameter and just returns it. Simply put,\n * this is like `(x: T): T => x`.\n *\n * ## Examples\n *\n * This is useful in some cases when using things like `mergeMap`\n *\n * ```ts\n * import { interval, take, map, range, mergeMap, identity } from 'rxjs';\n *\n * const source$ = interval(1000).pipe(take(5));\n *\n * const result$ = source$.pipe(\n * map(i => range(i)),\n * mergeMap(identity) // same as mergeMap(x => x)\n * );\n *\n * result$.subscribe({\n * next: console.log\n * });\n * ```\n *\n * Or when you want to selectively apply an operator\n *\n * ```ts\n * import { interval, take, identity } from 'rxjs';\n *\n * const shouldLimit = () => Math.random() < 0.5;\n *\n * const source$ = interval(1000);\n *\n * const result$ = source$.pipe(shouldLimit() ? 
take(5) : identity);\n *\n * result$.subscribe({\n * next: console.log\n * });\n * ```\n *\n * @param x Any value that is returned by this function\n * @returns The value passed as the first parameter to this function\n */\nexport function identity(x: T): T {\n return x;\n}\n", "import { identity } from './identity';\nimport { UnaryFunction } from '../types';\n\nexport function pipe(): typeof identity;\nexport function pipe(fn1: UnaryFunction): UnaryFunction;\nexport function pipe(fn1: UnaryFunction, fn2: UnaryFunction): UnaryFunction;\nexport function pipe(fn1: UnaryFunction, fn2: UnaryFunction, fn3: UnaryFunction): UnaryFunction;\nexport function pipe(\n fn1: UnaryFunction,\n fn2: UnaryFunction,\n fn3: UnaryFunction,\n fn4: UnaryFunction\n): UnaryFunction;\nexport function pipe(\n fn1: UnaryFunction,\n fn2: UnaryFunction,\n fn3: UnaryFunction,\n fn4: UnaryFunction,\n fn5: UnaryFunction\n): UnaryFunction;\nexport function pipe(\n fn1: UnaryFunction,\n fn2: UnaryFunction,\n fn3: UnaryFunction,\n fn4: UnaryFunction,\n fn5: UnaryFunction,\n fn6: UnaryFunction\n): UnaryFunction;\nexport function pipe(\n fn1: UnaryFunction,\n fn2: UnaryFunction,\n fn3: UnaryFunction,\n fn4: UnaryFunction,\n fn5: UnaryFunction,\n fn6: UnaryFunction,\n fn7: UnaryFunction\n): UnaryFunction;\nexport function pipe(\n fn1: UnaryFunction,\n fn2: UnaryFunction,\n fn3: UnaryFunction,\n fn4: UnaryFunction,\n fn5: UnaryFunction,\n fn6: UnaryFunction,\n fn7: UnaryFunction,\n fn8: UnaryFunction\n): UnaryFunction;\nexport function pipe(\n fn1: UnaryFunction,\n fn2: UnaryFunction,\n fn3: UnaryFunction,\n fn4: UnaryFunction,\n fn5: UnaryFunction,\n fn6: UnaryFunction,\n fn7: UnaryFunction,\n fn8: UnaryFunction,\n fn9: UnaryFunction\n): UnaryFunction;\nexport function pipe(\n fn1: UnaryFunction,\n fn2: UnaryFunction,\n fn3: UnaryFunction,\n fn4: UnaryFunction,\n fn5: UnaryFunction,\n fn6: UnaryFunction,\n fn7: UnaryFunction,\n fn8: UnaryFunction,\n fn9: UnaryFunction,\n ...fns: UnaryFunction[]\n): UnaryFunction;\n\n/**\n * pipe() can be called on one or more functions, each of which can take one argument (\"UnaryFunction\")\n * and uses it to return a value.\n * It returns a function that takes one argument, passes it to the first UnaryFunction, and then\n * passes the result to the next one, passes that result to the next one, and so on. \n */\nexport function pipe(...fns: Array>): UnaryFunction {\n return pipeFromArray(fns);\n}\n\n/** @internal */\nexport function pipeFromArray(fns: Array>): UnaryFunction {\n if (fns.length === 0) {\n return identity as UnaryFunction;\n }\n\n if (fns.length === 1) {\n return fns[0];\n }\n\n return function piped(input: T): R {\n return fns.reduce((prev: any, fn: UnaryFunction) => fn(prev), input as any);\n };\n}\n", "import { Operator } from './Operator';\nimport { SafeSubscriber, Subscriber } from './Subscriber';\nimport { isSubscription, Subscription } from './Subscription';\nimport { TeardownLogic, OperatorFunction, Subscribable, Observer } from './types';\nimport { observable as Symbol_observable } from './symbol/observable';\nimport { pipeFromArray } from './util/pipe';\nimport { config } from './config';\nimport { isFunction } from './util/isFunction';\nimport { errorContext } from './util/errorContext';\n\n/**\n * A representation of any set of values over any amount of time. This is the most basic building block\n * of RxJS.\n *\n * @class Observable\n */\nexport class Observable implements Subscribable {\n /**\n * @deprecated Internal implementation detail, do not use directly. 
Will be made internal in v8.\n */\n source: Observable | undefined;\n\n /**\n * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.\n */\n operator: Operator | undefined;\n\n /**\n * @constructor\n * @param {Function} subscribe the function that is called when the Observable is\n * initially subscribed to. This function is given a Subscriber, to which new values\n * can be `next`ed, or an `error` method can be called to raise an error, or\n * `complete` can be called to notify of a successful completion.\n */\n constructor(subscribe?: (this: Observable, subscriber: Subscriber) => TeardownLogic) {\n if (subscribe) {\n this._subscribe = subscribe;\n }\n }\n\n // HACK: Since TypeScript inherits static properties too, we have to\n // fight against TypeScript here so Subject can have a different static create signature\n /**\n * Creates a new Observable by calling the Observable constructor\n * @owner Observable\n * @method create\n * @param {Function} subscribe? the subscriber function to be passed to the Observable constructor\n * @return {Observable} a new observable\n * @nocollapse\n * @deprecated Use `new Observable()` instead. Will be removed in v8.\n */\n static create: (...args: any[]) => any = (subscribe?: (subscriber: Subscriber) => TeardownLogic) => {\n return new Observable(subscribe);\n };\n\n /**\n * Creates a new Observable, with this Observable instance as the source, and the passed\n * operator defined as the new observable's operator.\n * @method lift\n * @param operator the operator defining the operation to take on the observable\n * @return a new observable with the Operator applied\n * @deprecated Internal implementation detail, do not use directly. Will be made internal in v8.\n * If you have implemented an operator using `lift`, it is recommended that you create an\n * operator by simply returning `new Observable()` directly. See \"Creating new operators from\n * scratch\" section here: https://rxjs.dev/guide/operators\n */\n lift(operator?: Operator): Observable {\n const observable = new Observable();\n observable.source = this;\n observable.operator = operator;\n return observable;\n }\n\n subscribe(observerOrNext?: Partial> | ((value: T) => void)): Subscription;\n /** @deprecated Instead of passing separate callback arguments, use an observer argument. Signatures taking separate callback arguments will be removed in v8. Details: https://rxjs.dev/deprecations/subscribe-arguments */\n subscribe(next?: ((value: T) => void) | null, error?: ((error: any) => void) | null, complete?: (() => void) | null): Subscription;\n /**\n * Invokes an execution of an Observable and registers Observer handlers for notifications it will emit.\n *\n * Use it when you have all these Observables, but still nothing is happening.\n *\n * `subscribe` is not a regular operator, but a method that calls Observable's internal `subscribe` function. It\n * might be for example a function that you passed to Observable's constructor, but most of the time it is\n * a library implementation, which defines what will be emitted by an Observable, and when it be will emitted. This means\n * that calling `subscribe` is actually the moment when Observable starts its work, not when it is created, as it is often\n * the thought.\n *\n * Apart from starting the execution of an Observable, this method allows you to listen for values\n * that an Observable emits, as well as for when it completes or errors. 
You can achieve this in two\n * of the following ways.\n *\n * The first way is creating an object that implements {@link Observer} interface. It should have methods\n * defined by that interface, but note that it should be just a regular JavaScript object, which you can create\n * yourself in any way you want (ES6 class, classic function constructor, object literal etc.). In particular, do\n * not attempt to use any RxJS implementation details to create Observers - you don't need them. Remember also\n * that your object does not have to implement all methods. If you find yourself creating a method that doesn't\n * do anything, you can simply omit it. Note however, if the `error` method is not provided and an error happens,\n * it will be thrown asynchronously. Errors thrown asynchronously cannot be caught using `try`/`catch`. Instead,\n * use the {@link onUnhandledError} configuration option or use a runtime handler (like `window.onerror` or\n * `process.on('error)`) to be notified of unhandled errors. Because of this, it's recommended that you provide\n * an `error` method to avoid missing thrown errors.\n *\n * The second way is to give up on Observer object altogether and simply provide callback functions in place of its methods.\n * This means you can provide three functions as arguments to `subscribe`, where the first function is equivalent\n * of a `next` method, the second of an `error` method and the third of a `complete` method. Just as in case of an Observer,\n * if you do not need to listen for something, you can omit a function by passing `undefined` or `null`,\n * since `subscribe` recognizes these functions by where they were placed in function call. When it comes\n * to the `error` function, as with an Observer, if not provided, errors emitted by an Observable will be thrown asynchronously.\n *\n * You can, however, subscribe with no parameters at all. This may be the case where you're not interested in terminal events\n * and you also handled emissions internally by using operators (e.g. using `tap`).\n *\n * Whichever style of calling `subscribe` you use, in both cases it returns a Subscription object.\n * This object allows you to call `unsubscribe` on it, which in turn will stop the work that an Observable does and will clean\n * up all resources that an Observable used. Note that cancelling a subscription will not call `complete` callback\n * provided to `subscribe` function, which is reserved for a regular completion signal that comes from an Observable.\n *\n * Remember that callbacks provided to `subscribe` are not guaranteed to be called asynchronously.\n * It is an Observable itself that decides when these functions will be called. For example {@link of}\n * by default emits all its values synchronously. 
Always check documentation for how given Observable\n * will behave when subscribed and if its default behavior can be modified with a `scheduler`.\n *\n * #### Examples\n *\n * Subscribe with an {@link guide/observer Observer}\n *\n * ```ts\n * import { of } from 'rxjs';\n *\n * const sumObserver = {\n * sum: 0,\n * next(value) {\n * console.log('Adding: ' + value);\n * this.sum = this.sum + value;\n * },\n * error() {\n * // We actually could just remove this method,\n * // since we do not really care about errors right now.\n * },\n * complete() {\n * console.log('Sum equals: ' + this.sum);\n * }\n * };\n *\n * of(1, 2, 3) // Synchronously emits 1, 2, 3 and then completes.\n * .subscribe(sumObserver);\n *\n * // Logs:\n * // 'Adding: 1'\n * // 'Adding: 2'\n * // 'Adding: 3'\n * // 'Sum equals: 6'\n * ```\n *\n * Subscribe with functions ({@link deprecations/subscribe-arguments deprecated})\n *\n * ```ts\n * import { of } from 'rxjs'\n *\n * let sum = 0;\n *\n * of(1, 2, 3).subscribe(\n * value => {\n * console.log('Adding: ' + value);\n * sum = sum + value;\n * },\n * undefined,\n * () => console.log('Sum equals: ' + sum)\n * );\n *\n * // Logs:\n * // 'Adding: 1'\n * // 'Adding: 2'\n * // 'Adding: 3'\n * // 'Sum equals: 6'\n * ```\n *\n * Cancel a subscription\n *\n * ```ts\n * import { interval } from 'rxjs';\n *\n * const subscription = interval(1000).subscribe({\n * next(num) {\n * console.log(num)\n * },\n * complete() {\n * // Will not be called, even when cancelling subscription.\n * console.log('completed!');\n * }\n * });\n *\n * setTimeout(() => {\n * subscription.unsubscribe();\n * console.log('unsubscribed!');\n * }, 2500);\n *\n * // Logs:\n * // 0 after 1s\n * // 1 after 2s\n * // 'unsubscribed!' after 2.5s\n * ```\n *\n * @param {Observer|Function} observerOrNext (optional) Either an observer with methods to be called,\n * or the first of three possible handlers, which is the handler for each value emitted from the subscribed\n * Observable.\n * @param {Function} error (optional) A handler for a terminal event resulting from an error. If no error handler is provided,\n * the error will be thrown asynchronously as unhandled.\n * @param {Function} complete (optional) A handler for a terminal event resulting from successful completion.\n * @return {Subscription} a subscription reference to the registered handlers\n * @method subscribe\n */\n subscribe(\n observerOrNext?: Partial> | ((value: T) => void) | null,\n error?: ((error: any) => void) | null,\n complete?: (() => void) | null\n ): Subscription {\n const subscriber = isSubscriber(observerOrNext) ? observerOrNext : new SafeSubscriber(observerOrNext, error, complete);\n\n errorContext(() => {\n const { operator, source } = this;\n subscriber.add(\n operator\n ? // We're dealing with a subscription in the\n // operator chain to one of our lifted operators.\n operator.call(subscriber, source)\n : source\n ? // If `source` has a value, but `operator` does not, something that\n // had intimate knowledge of our API, like our `Subject`, must have\n // set it. 
We're going to just call `_subscribe` directly.\n this._subscribe(subscriber)\n : // In all other cases, we're likely wrapping a user-provided initializer\n // function, so we need to catch errors and handle them appropriately.\n this._trySubscribe(subscriber)\n );\n });\n\n return subscriber;\n }\n\n /** @internal */\n protected _trySubscribe(sink: Subscriber): TeardownLogic {\n try {\n return this._subscribe(sink);\n } catch (err) {\n // We don't need to return anything in this case,\n // because it's just going to try to `add()` to a subscription\n // above.\n sink.error(err);\n }\n }\n\n /**\n * Used as a NON-CANCELLABLE means of subscribing to an observable, for use with\n * APIs that expect promises, like `async/await`. You cannot unsubscribe from this.\n *\n * **WARNING**: Only use this with observables you *know* will complete. If the source\n * observable does not complete, you will end up with a promise that is hung up, and\n * potentially all of the state of an async function hanging out in memory. To avoid\n * this situation, look into adding something like {@link timeout}, {@link take},\n * {@link takeWhile}, or {@link takeUntil} amongst others.\n *\n * #### Example\n *\n * ```ts\n * import { interval, take } from 'rxjs';\n *\n * const source$ = interval(1000).pipe(take(4));\n *\n * async function getTotal() {\n * let total = 0;\n *\n * await source$.forEach(value => {\n * total += value;\n * console.log('observable -> ' + value);\n * });\n *\n * return total;\n * }\n *\n * getTotal().then(\n * total => console.log('Total: ' + total)\n * );\n *\n * // Expected:\n * // 'observable -> 0'\n * // 'observable -> 1'\n * // 'observable -> 2'\n * // 'observable -> 3'\n * // 'Total: 6'\n * ```\n *\n * @param next a handler for each value emitted by the observable\n * @return a promise that either resolves on observable completion or\n * rejects with the handled error\n */\n forEach(next: (value: T) => void): Promise;\n\n /**\n * @param next a handler for each value emitted by the observable\n * @param promiseCtor a constructor function used to instantiate the Promise\n * @return a promise that either resolves on observable completion or\n * rejects with the handled error\n * @deprecated Passing a Promise constructor will no longer be available\n * in upcoming versions of RxJS. This is because it adds weight to the library, for very\n * little benefit. If you need this functionality, it is recommended that you either\n * polyfill Promise, or you create an adapter to convert the returned native promise\n * to whatever promise implementation you wanted. 
Will be removed in v8.\n */\n forEach(next: (value: T) => void, promiseCtor: PromiseConstructorLike): Promise;\n\n forEach(next: (value: T) => void, promiseCtor?: PromiseConstructorLike): Promise {\n promiseCtor = getPromiseCtor(promiseCtor);\n\n return new promiseCtor((resolve, reject) => {\n const subscriber = new SafeSubscriber({\n next: (value) => {\n try {\n next(value);\n } catch (err) {\n reject(err);\n subscriber.unsubscribe();\n }\n },\n error: reject,\n complete: resolve,\n });\n this.subscribe(subscriber);\n }) as Promise;\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber): TeardownLogic {\n return this.source?.subscribe(subscriber);\n }\n\n /**\n * An interop point defined by the es7-observable spec https://github.com/zenparsing/es-observable\n * @method Symbol.observable\n * @return {Observable} this instance of the observable\n */\n [Symbol_observable]() {\n return this;\n }\n\n /* tslint:disable:max-line-length */\n pipe(): Observable;\n pipe(op1: OperatorFunction): Observable;\n pipe(op1: OperatorFunction, op2: OperatorFunction): Observable;\n pipe(op1: OperatorFunction, op2: OperatorFunction, op3: OperatorFunction): Observable;\n pipe(\n op1: OperatorFunction,\n op2: OperatorFunction,\n op3: OperatorFunction,\n op4: OperatorFunction\n ): Observable;\n pipe(\n op1: OperatorFunction,\n op2: OperatorFunction,\n op3: OperatorFunction,\n op4: OperatorFunction,\n op5: OperatorFunction\n ): Observable;\n pipe(\n op1: OperatorFunction,\n op2: OperatorFunction,\n op3: OperatorFunction,\n op4: OperatorFunction,\n op5: OperatorFunction,\n op6: OperatorFunction\n ): Observable;\n pipe(\n op1: OperatorFunction,\n op2: OperatorFunction,\n op3: OperatorFunction,\n op4: OperatorFunction,\n op5: OperatorFunction,\n op6: OperatorFunction,\n op7: OperatorFunction\n ): Observable;\n pipe(\n op1: OperatorFunction,\n op2: OperatorFunction,\n op3: OperatorFunction,\n op4: OperatorFunction,\n op5: OperatorFunction,\n op6: OperatorFunction,\n op7: OperatorFunction,\n op8: OperatorFunction\n ): Observable;\n pipe(\n op1: OperatorFunction,\n op2: OperatorFunction,\n op3: OperatorFunction,\n op4: OperatorFunction,\n op5: OperatorFunction,\n op6: OperatorFunction,\n op7: OperatorFunction,\n op8: OperatorFunction,\n op9: OperatorFunction\n ): Observable;\n pipe(\n op1: OperatorFunction,\n op2: OperatorFunction,\n op3: OperatorFunction,\n op4: OperatorFunction,\n op5: OperatorFunction,\n op6: OperatorFunction,\n op7: OperatorFunction,\n op8: OperatorFunction,\n op9: OperatorFunction,\n ...operations: OperatorFunction[]\n ): Observable;\n /* tslint:enable:max-line-length */\n\n /**\n * Used to stitch together functional operators into a chain.\n * @method pipe\n * @return {Observable} the Observable result of all of the operators having\n * been called in the order they were passed in.\n *\n * ## Example\n *\n * ```ts\n * import { interval, filter, map, scan } from 'rxjs';\n *\n * interval(1000)\n * .pipe(\n * filter(x => x % 2 === 0),\n * map(x => x + x),\n * scan((acc, x) => acc + x)\n * )\n * .subscribe(x => console.log(x));\n * ```\n */\n pipe(...operations: OperatorFunction[]): Observable {\n return pipeFromArray(operations)(this);\n }\n\n /* tslint:disable:max-line-length */\n /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise */\n toPromise(): Promise;\n /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. 
Details: https://rxjs.dev/deprecations/to-promise */\n toPromise(PromiseCtor: typeof Promise): Promise;\n /** @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise */\n toPromise(PromiseCtor: PromiseConstructorLike): Promise;\n /* tslint:enable:max-line-length */\n\n /**\n * Subscribe to this Observable and get a Promise resolving on\n * `complete` with the last emission (if any).\n *\n * **WARNING**: Only use this with observables you *know* will complete. If the source\n * observable does not complete, you will end up with a promise that is hung up, and\n * potentially all of the state of an async function hanging out in memory. To avoid\n * this situation, look into adding something like {@link timeout}, {@link take},\n * {@link takeWhile}, or {@link takeUntil} amongst others.\n *\n * @method toPromise\n * @param [promiseCtor] a constructor function used to instantiate\n * the Promise\n * @return A Promise that resolves with the last value emit, or\n * rejects on an error. If there were no emissions, Promise\n * resolves with undefined.\n * @deprecated Replaced with {@link firstValueFrom} and {@link lastValueFrom}. Will be removed in v8. Details: https://rxjs.dev/deprecations/to-promise\n */\n toPromise(promiseCtor?: PromiseConstructorLike): Promise {\n promiseCtor = getPromiseCtor(promiseCtor);\n\n return new promiseCtor((resolve, reject) => {\n let value: T | undefined;\n this.subscribe(\n (x: T) => (value = x),\n (err: any) => reject(err),\n () => resolve(value)\n );\n }) as Promise;\n }\n}\n\n/**\n * Decides between a passed promise constructor from consuming code,\n * A default configured promise constructor, and the native promise\n * constructor and returns it. If nothing can be found, it will throw\n * an error.\n * @param promiseCtor The optional promise constructor to passed by consuming code\n */\nfunction getPromiseCtor(promiseCtor: PromiseConstructorLike | undefined) {\n return promiseCtor ?? config.Promise ?? Promise;\n}\n\nfunction isObserver(value: any): value is Observer {\n return value && isFunction(value.next) && isFunction(value.error) && isFunction(value.complete);\n}\n\nfunction isSubscriber(value: any): value is Subscriber {\n return (value && value instanceof Subscriber) || (isObserver(value) && isSubscription(value));\n}\n", "import { Observable } from '../Observable';\nimport { Subscriber } from '../Subscriber';\nimport { OperatorFunction } from '../types';\nimport { isFunction } from './isFunction';\n\n/**\n * Used to determine if an object is an Observable with a lift function.\n */\nexport function hasLift(source: any): source is { lift: InstanceType['lift'] } {\n return isFunction(source?.lift);\n}\n\n/**\n * Creates an `OperatorFunction`. 
Used to define operators throughout the library in a concise way.\n * @param init The logic to connect the liftedSource to the subscriber at the moment of subscription.\n */\nexport function operate(\n init: (liftedSource: Observable, subscriber: Subscriber) => (() => void) | void\n): OperatorFunction {\n return (source: Observable) => {\n if (hasLift(source)) {\n return source.lift(function (this: Subscriber, liftedSource: Observable) {\n try {\n return init(liftedSource, this);\n } catch (err) {\n this.error(err);\n }\n });\n }\n throw new TypeError('Unable to lift unknown Observable type');\n };\n}\n", "import { Subscriber } from '../Subscriber';\n\n/**\n * Creates an instance of an `OperatorSubscriber`.\n * @param destination The downstream subscriber.\n * @param onNext Handles next values, only called if this subscriber is not stopped or closed. Any\n * error that occurs in this function is caught and sent to the `error` method of this subscriber.\n * @param onError Handles errors from the subscription, any errors that occur in this handler are caught\n * and send to the `destination` error handler.\n * @param onComplete Handles completion notification from the subscription. Any errors that occur in\n * this handler are sent to the `destination` error handler.\n * @param onFinalize Additional teardown logic here. This will only be called on teardown if the\n * subscriber itself is not already closed. This is called after all other teardown logic is executed.\n */\nexport function createOperatorSubscriber(\n destination: Subscriber,\n onNext?: (value: T) => void,\n onComplete?: () => void,\n onError?: (err: any) => void,\n onFinalize?: () => void\n): Subscriber {\n return new OperatorSubscriber(destination, onNext, onComplete, onError, onFinalize);\n}\n\n/**\n * A generic helper for allowing operators to be created with a Subscriber and\n * use closures to capture necessary state from the operator function itself.\n */\nexport class OperatorSubscriber extends Subscriber {\n /**\n * Creates an instance of an `OperatorSubscriber`.\n * @param destination The downstream subscriber.\n * @param onNext Handles next values, only called if this subscriber is not stopped or closed. Any\n * error that occurs in this function is caught and sent to the `error` method of this subscriber.\n * @param onError Handles errors from the subscription, any errors that occur in this handler are caught\n * and send to the `destination` error handler.\n * @param onComplete Handles completion notification from the subscription. Any errors that occur in\n * this handler are sent to the `destination` error handler.\n * @param onFinalize Additional finalization logic here. This will only be called on finalization if the\n * subscriber itself is not already closed. This is called after all other finalization logic is executed.\n * @param shouldUnsubscribe An optional check to see if an unsubscribe call should truly unsubscribe.\n * NOTE: This currently **ONLY** exists to support the strange behavior of {@link groupBy}, where unsubscription\n * to the resulting observable does not actually disconnect from the source if there are active subscriptions\n * to any grouped observable. 
(DO NOT EXPOSE OR USE EXTERNALLY!!!)\n */\n constructor(\n destination: Subscriber,\n onNext?: (value: T) => void,\n onComplete?: () => void,\n onError?: (err: any) => void,\n private onFinalize?: () => void,\n private shouldUnsubscribe?: () => boolean\n ) {\n // It's important - for performance reasons - that all of this class's\n // members are initialized and that they are always initialized in the same\n // order. This will ensure that all OperatorSubscriber instances have the\n // same hidden class in V8. This, in turn, will help keep the number of\n // hidden classes involved in property accesses within the base class as\n // low as possible. If the number of hidden classes involved exceeds four,\n // the property accesses will become megamorphic and performance penalties\n // will be incurred - i.e. inline caches won't be used.\n //\n // The reasons for ensuring all instances have the same hidden class are\n // further discussed in this blog post from Benedikt Meurer:\n // https://benediktmeurer.de/2018/03/23/impact-of-polymorphism-on-component-based-frameworks-like-react/\n super(destination);\n this._next = onNext\n ? function (this: OperatorSubscriber, value: T) {\n try {\n onNext(value);\n } catch (err) {\n destination.error(err);\n }\n }\n : super._next;\n this._error = onError\n ? function (this: OperatorSubscriber, err: any) {\n try {\n onError(err);\n } catch (err) {\n // Send any errors that occur down stream.\n destination.error(err);\n } finally {\n // Ensure finalization.\n this.unsubscribe();\n }\n }\n : super._error;\n this._complete = onComplete\n ? function (this: OperatorSubscriber) {\n try {\n onComplete();\n } catch (err) {\n // Send any errors that occur down stream.\n destination.error(err);\n } finally {\n // Ensure finalization.\n this.unsubscribe();\n }\n }\n : super._complete;\n }\n\n unsubscribe() {\n if (!this.shouldUnsubscribe || this.shouldUnsubscribe()) {\n const { closed } = this;\n super.unsubscribe();\n // Execute additional teardown if we have any and we didn't already do so.\n !closed && this.onFinalize?.();\n }\n }\n}\n", "import { Subscription } from '../Subscription';\n\ninterface AnimationFrameProvider {\n schedule(callback: FrameRequestCallback): Subscription;\n requestAnimationFrame: typeof requestAnimationFrame;\n cancelAnimationFrame: typeof cancelAnimationFrame;\n delegate:\n | {\n requestAnimationFrame: typeof requestAnimationFrame;\n cancelAnimationFrame: typeof cancelAnimationFrame;\n }\n | undefined;\n}\n\nexport const animationFrameProvider: AnimationFrameProvider = {\n // When accessing the delegate, use the variable rather than `this` so that\n // the functions can be called without being bound to the provider.\n schedule(callback) {\n let request = requestAnimationFrame;\n let cancel: typeof cancelAnimationFrame | undefined = cancelAnimationFrame;\n const { delegate } = animationFrameProvider;\n if (delegate) {\n request = delegate.requestAnimationFrame;\n cancel = delegate.cancelAnimationFrame;\n }\n const handle = request((timestamp) => {\n // Clear the cancel function. 
The request has been fulfilled, so\n // attempting to cancel the request upon unsubscription would be\n // pointless.\n cancel = undefined;\n callback(timestamp);\n });\n return new Subscription(() => cancel?.(handle));\n },\n requestAnimationFrame(...args) {\n const { delegate } = animationFrameProvider;\n return (delegate?.requestAnimationFrame || requestAnimationFrame)(...args);\n },\n cancelAnimationFrame(...args) {\n const { delegate } = animationFrameProvider;\n return (delegate?.cancelAnimationFrame || cancelAnimationFrame)(...args);\n },\n delegate: undefined,\n};\n", "import { createErrorClass } from './createErrorClass';\n\nexport interface ObjectUnsubscribedError extends Error {}\n\nexport interface ObjectUnsubscribedErrorCtor {\n /**\n * @deprecated Internal implementation detail. Do not construct error instances.\n * Cannot be tagged as internal: https://github.com/ReactiveX/rxjs/issues/6269\n */\n new (): ObjectUnsubscribedError;\n}\n\n/**\n * An error thrown when an action is invalid because the object has been\n * unsubscribed.\n *\n * @see {@link Subject}\n * @see {@link BehaviorSubject}\n *\n * @class ObjectUnsubscribedError\n */\nexport const ObjectUnsubscribedError: ObjectUnsubscribedErrorCtor = createErrorClass(\n (_super) =>\n function ObjectUnsubscribedErrorImpl(this: any) {\n _super(this);\n this.name = 'ObjectUnsubscribedError';\n this.message = 'object unsubscribed';\n }\n);\n", "import { Operator } from './Operator';\nimport { Observable } from './Observable';\nimport { Subscriber } from './Subscriber';\nimport { Subscription, EMPTY_SUBSCRIPTION } from './Subscription';\nimport { Observer, SubscriptionLike, TeardownLogic } from './types';\nimport { ObjectUnsubscribedError } from './util/ObjectUnsubscribedError';\nimport { arrRemove } from './util/arrRemove';\nimport { errorContext } from './util/errorContext';\n\n/**\n * A Subject is a special type of Observable that allows values to be\n * multicasted to many Observers. Subjects are like EventEmitters.\n *\n * Every Subject is an Observable and an Observer. You can subscribe to a\n * Subject, and you can call next to feed values as well as error and complete.\n */\nexport class Subject extends Observable implements SubscriptionLike {\n closed = false;\n\n private currentObservers: Observer[] | null = null;\n\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n observers: Observer[] = [];\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n isStopped = false;\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n hasError = false;\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n thrownError: any = null;\n\n /**\n * Creates a \"subject\" by basically gluing an observer to an observable.\n *\n * @nocollapse\n * @deprecated Recommended you do not use. Will be removed at some point in the future. Plans for replacement still under discussion.\n */\n static create: (...args: any[]) => any = (destination: Observer, source: Observable): AnonymousSubject => {\n return new AnonymousSubject(destination, source);\n };\n\n constructor() {\n // NOTE: This must be here to obscure Observable's constructor.\n super();\n }\n\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. 
*/\n lift(operator: Operator): Observable {\n const subject = new AnonymousSubject(this, this);\n subject.operator = operator as any;\n return subject as any;\n }\n\n /** @internal */\n protected _throwIfClosed() {\n if (this.closed) {\n throw new ObjectUnsubscribedError();\n }\n }\n\n next(value: T) {\n errorContext(() => {\n this._throwIfClosed();\n if (!this.isStopped) {\n if (!this.currentObservers) {\n this.currentObservers = Array.from(this.observers);\n }\n for (const observer of this.currentObservers) {\n observer.next(value);\n }\n }\n });\n }\n\n error(err: any) {\n errorContext(() => {\n this._throwIfClosed();\n if (!this.isStopped) {\n this.hasError = this.isStopped = true;\n this.thrownError = err;\n const { observers } = this;\n while (observers.length) {\n observers.shift()!.error(err);\n }\n }\n });\n }\n\n complete() {\n errorContext(() => {\n this._throwIfClosed();\n if (!this.isStopped) {\n this.isStopped = true;\n const { observers } = this;\n while (observers.length) {\n observers.shift()!.complete();\n }\n }\n });\n }\n\n unsubscribe() {\n this.isStopped = this.closed = true;\n this.observers = this.currentObservers = null!;\n }\n\n get observed() {\n return this.observers?.length > 0;\n }\n\n /** @internal */\n protected _trySubscribe(subscriber: Subscriber): TeardownLogic {\n this._throwIfClosed();\n return super._trySubscribe(subscriber);\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber): Subscription {\n this._throwIfClosed();\n this._checkFinalizedStatuses(subscriber);\n return this._innerSubscribe(subscriber);\n }\n\n /** @internal */\n protected _innerSubscribe(subscriber: Subscriber) {\n const { hasError, isStopped, observers } = this;\n if (hasError || isStopped) {\n return EMPTY_SUBSCRIPTION;\n }\n this.currentObservers = null;\n observers.push(subscriber);\n return new Subscription(() => {\n this.currentObservers = null;\n arrRemove(observers, subscriber);\n });\n }\n\n /** @internal */\n protected _checkFinalizedStatuses(subscriber: Subscriber) {\n const { hasError, thrownError, isStopped } = this;\n if (hasError) {\n subscriber.error(thrownError);\n } else if (isStopped) {\n subscriber.complete();\n }\n }\n\n /**\n * Creates a new Observable with this Subject as the source. You can do this\n * to create custom Observer-side logic of the Subject and conceal it from\n * code that uses the Observable.\n * @return {Observable} Observable that the Subject casts to\n */\n asObservable(): Observable {\n const observable: any = new Observable();\n observable.source = this;\n return observable;\n }\n}\n\n/**\n * @class AnonymousSubject\n */\nexport class AnonymousSubject extends Subject {\n constructor(\n /** @deprecated Internal implementation detail, do not use directly. Will be made internal in v8. */\n public destination?: Observer,\n source?: Observable\n ) {\n super();\n this.source = source;\n }\n\n next(value: T) {\n this.destination?.next?.(value);\n }\n\n error(err: any) {\n this.destination?.error?.(err);\n }\n\n complete() {\n this.destination?.complete?.();\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber): Subscription {\n return this.source?.subscribe(subscriber) ?? 
EMPTY_SUBSCRIPTION;\n }\n}\n", "import { Subject } from './Subject';\nimport { Subscriber } from './Subscriber';\nimport { Subscription } from './Subscription';\n\n/**\n * A variant of Subject that requires an initial value and emits its current\n * value whenever it is subscribed to.\n *\n * @class BehaviorSubject\n */\nexport class BehaviorSubject extends Subject {\n constructor(private _value: T) {\n super();\n }\n\n get value(): T {\n return this.getValue();\n }\n\n /** @internal */\n protected _subscribe(subscriber: Subscriber): Subscription {\n const subscription = super._subscribe(subscriber);\n !subscription.closed && subscriber.next(this._value);\n return subscription;\n }\n\n getValue(): T {\n const { hasError, thrownError, _value } = this;\n if (hasError) {\n throw thrownError;\n }\n this._throwIfClosed();\n return _value;\n }\n\n next(value: T): void {\n super.next((this._value = value));\n }\n}\n", "import { TimestampProvider } from '../types';\n\ninterface DateTimestampProvider extends TimestampProvider {\n delegate: TimestampProvider | undefined;\n}\n\nexport const dateTimestampProvider: DateTimestampProvider = {\n now() {\n // Use the variable rather than `this` so that the function can be called\n // without being bound to the provider.\n return (dateTimestampProvider.delegate || Date).now();\n },\n delegate: undefined,\n};\n", "import { Subject } from './Subject';\nimport { TimestampProvider } from './types';\nimport { Subscriber } from './Subscriber';\nimport { Subscription } from './Subscription';\nimport { dateTimestampProvider } from './scheduler/dateTimestampProvider';\n\n/**\n * A variant of {@link Subject} that \"replays\" old values to new subscribers by emitting them when they first subscribe.\n *\n * `ReplaySubject` has an internal buffer that will store a specified number of values that it has observed. Like `Subject`,\n * `ReplaySubject` \"observes\" values by having them passed to its `next` method. When it observes a value, it will store that\n * value for a time determined by the configuration of the `ReplaySubject`, as passed to its constructor.\n *\n * When a new subscriber subscribes to the `ReplaySubject` instance, it will synchronously emit all values in its buffer in\n * a First-In-First-Out (FIFO) manner. The `ReplaySubject` will also complete, if it has observed completion; and it will\n * error if it has observed an error.\n *\n * There are two main configuration items to be concerned with:\n *\n * 1. `bufferSize` - This will determine how many items are stored in the buffer, defaults to infinite.\n * 2. `windowTime` - The amount of time to hold a value in the buffer before removing it from the buffer.\n *\n * Both configurations may exist simultaneously. So if you would like to buffer a maximum of 3 values, as long as the values\n * are less than 2 seconds old, you could do so with a `new ReplaySubject(3, 2000)`.\n *\n * ### Differences with BehaviorSubject\n *\n * `BehaviorSubject` is similar to `new ReplaySubject(1)`, with a couple of exceptions:\n *\n * 1. `BehaviorSubject` comes \"primed\" with a single value upon construction.\n * 2. 
[Generated theme asset: bundled JavaScript source map (`sourcesContent`) embedding the original RxJS sources for ReplaySubject, Action, intervalProvider, AsyncAction, Scheduler, AsyncScheduler, asyncScheduler, QueueAction, QueueScheduler, queueScheduler, AnimationFrameAction, AnimationFrameScheduler, animationFrameScheduler, EMPTY, and related type-guard utilities (isScheduler, isArrayLike, isPromise, isInteropObservable, isAsyncIterable, isIterable, readableStreamLikeToAsyncGenerator, isReadableStreamLike). Build output, not hand-authored documentation content.]