When I use MinerU to parse this paper, parsing fails with an error: https://openaccess.thecvf.com/content/CVPR2023/papers/Huang_Towards_Accurate_Image_Coding_Improved_Autoregressive_Image_Generation_With_Dynamic_CVPR_2023_paper.pdf The error output is as follows:
2024-10-23 03:32:49.576 | INFO | magic_pdf.model.pdf_extract_kit:__call__:200 - formula nums: 17, mfr time: 0.43
2024-10-23 03:32:49.582 | INFO | magic_pdf.model.pdf_extract_kit:__call__:291 - ------------------table recognition processing begins-----------------
2024-10-23 03:32:51.554 | INFO | magic_pdf.model.pdf_extract_kit:__call__:300 - ------------table recognition processing ends within 1.9721670150756836s-----
2024-10-23 03:32:51.555 | INFO | magic_pdf.model.pdf_extract_kit:__call__:291 - ------------------table recognition processing begins-----------------
2024-10-23 03:32:53.566 | INFO | magic_pdf.model.pdf_extract_kit:__call__:300 - ------------table recognition processing ends within 2.0111782550811768s-----
2024-10-23 03:32:53.567 | INFO | magic_pdf.model.pdf_extract_kit:__call__:291 - ------------------table recognition processing begins-----------------
2024-10-23 03:32:55.585 | ERROR | app:pdf_parse_main:133 - index 0 is out of bounds for axis 0 with size 0
Traceback (most recent call last):
  File "/opt/mineru_venv/bin/uvicorn", line 8, in <module>
    sys.exit(main())
      └ <Command main>
      └ <built-in function exit>
      └ <module 'sys' (built-in)>
  File "/opt/mineru_venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
      └ {}
      └ ()
      └ <function BaseCommand.main at 0x7f20ea9e1b40>
      └ <Command main>
  File "/opt/mineru_venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
      └ <click.core.Context object at 0x7f20eb503fa0>
      └ <function Command.invoke at 0x7f20ea9e25f0>
      └ <Command main>
  File "/opt/mineru_venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
      └ {'host': '0.0.0.0', 'port': 8000, 'app': 'app:app', 'uds': None, 'fd': None, 'reload': False, 'reload_dirs': (), 'reload_incl...
      └ <click.core.Context object at 0x7f20eb503fa0>
      └ <function main at 0x7f20ea6b4c10>
      └ <Command main>
      └ <function Context.invoke at 0x7f20ea9e1360>
      └ <click.core.Context object at 0x7f20eb503fa0>
  File "/opt/mineru_venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
      └ {'host': '0.0.0.0', 'port': 8000, 'app': 'app:app', 'uds': None, 'fd': None, 'reload': False, 'reload_dirs': (), 'reload_incl...
      └ ()
  File "/opt/mineru_venv/lib/python3.10/site-packages/uvicorn/main.py", line 410, in main
    run(
      └ <function run at 0x7f20ea89edd0>
  File "/opt/mineru_venv/lib/python3.10/site-packages/uvicorn/main.py", line 577, in run
    server.run()
      └ <function Server.run at 0x7f20ea89e710>
      └ <uvicorn.server.Server object at 0x7f20ea69dfc0>
  File "/opt/mineru_venv/lib/python3.10/site-packages/uvicorn/server.py", line 65, in run
    return asyncio.run(self.serve(sockets=sockets))
      └ None
      └ <function Server.serve at 0x7f20ea89e7a0>
      └ <uvicorn.server.Server object at 0x7f20ea69dfc0>
      └ <function run at 0x7f20eb3670a0>
      └ <module 'asyncio' from '/usr/lib/python3.10/asyncio/__init__.py'>
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
      └ <coroutine object Server.serve at 0x7f20ea671a10>
      └ <function BaseEventLoop.run_until_complete at 0x7f20eaa049d0>
      └ <_UnixSelectorEventLoop running=True closed=False debug=False>
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
      └ <function BaseEventLoop.run_forever at 0x7f20eaa04940>
      └ <_UnixSelectorEventLoop running=True closed=False debug=False>
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
      └ <function BaseEventLoop._run_once at 0x7f20eaa06440>
      └ <_UnixSelectorEventLoop running=True closed=False debug=False>
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
      └ <function Handle._run at 0x7f20eab61e10>
      └ <Handle Task.task_wakeup(<Future finis...100\n%%EOF\n'>)>
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
      └ <member '_args' of 'Handle' objects>
      └ <Handle Task.task_wakeup(<Future finis...100\n%%EOF\n'>)>
      └ <member '_callback' of 'Handle' objects>
      └ <Handle Task.task_wakeup(<Future finis...100\n%%EOF\n'>)>
      └ <member '_context' of 'Handle' objects>
      └ <Handle Task.task_wakeup(<Future finis...100\n%%EOF\n'>)>
  File "/opt/mineru_venv/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
    result = await app( # type: ignore[func-returns-value]
      └ <uvicorn.middleware.proxy_headers.ProxyHeadersMiddleware object at 0x7f20ea69d0f0>
  File "/opt/mineru_venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
      └ <bound method RequestResponseCycle.send of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <fastapi.applications.FastAPI object at 0x7f20ea711f90>
      └ <uvicorn.middleware.proxy_headers.ProxyHeadersMiddleware object at 0x7f20ea69d0f0>
  File "/opt/mineru_venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
      └ <bound method RequestResponseCycle.send of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
    await self.middleware_stack(scope, receive, send)
      └ <bound method RequestResponseCycle.send of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <starlette.middleware.errors.ServerErrorMiddleware object at 0x7f1f9b99d720>
      └ <fastapi.applications.FastAPI object at 0x7f20ea711f90>
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
    await self.app(scope, receive, _send)
      └ <function ServerErrorMiddleware.__call__.<locals>._send at 0x7f20e7434e50>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <starlette.middleware.exceptions.ExceptionMiddleware object at 0x7f1f9b99d6f0>
      └ <starlette.middleware.errors.ServerErrorMiddleware object at 0x7f1f9b99d720>
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
      └ <function ServerErrorMiddleware.__call__.<locals>._send at 0x7f20e7434e50>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <starlette.requests.Request object at 0x7f1f9b99f5e0>
      └ <fastapi.routing.APIRouter object at 0x7f1f9b99d120>
      └ <starlette.middleware.exceptions.ExceptionMiddleware object at 0x7f1f9b99d6f0>
      └ <function wrap_app_handling_exceptions at 0x7f20e9887910>
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
      └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x7f1f9b9c8790>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <fastapi.routing.APIRouter object at 0x7f1f9b99d120>
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
    await self.middleware_stack(scope, receive, send)
      └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x7f1f9b9c8790>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <bound method Router.app of <fastapi.routing.APIRouter object at 0x7f1f9b99d120>>
      └ <fastapi.routing.APIRouter object at 0x7f1f9b99d120>
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
    await route.handle(scope, receive, send)
      └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x7f1f9b9c8790>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <function Route.handle at 0x7f20e98bcb80>
      └ APIRoute(path='/pdf_parse', name='pdf_parse_main', methods=['POST'])
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
      └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x7f1f9b9c8790>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <function request_response.<locals>.app at 0x7f1f9b9c89d0>
      └ APIRoute(path='/pdf_parse', name='pdf_parse_main', methods=['POST'])
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
      └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x7f1f9b9c8790>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <starlette.requests.Request object at 0x7f1f9b99ef20>
      └ <function request_response.<locals>.app.<locals>.app at 0x7f1dbc343b50>
      └ <function wrap_app_handling_exceptions at 0x7f20e9887910>
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
    await app(scope, receive, sender)
      └ <function wrap_app_handling_exceptions.<locals>.wrapped_app.<locals>.sender at 0x7f1f9b9af910>
      └ <bound method RequestResponseCycle.receive of <uvicorn.protocols.http.h11_impl.RequestResponseCycle object at 0x7f1dbc2d5d20>>
      └ {'type': 'http', 'asgi': {'version': '3.0', 'spec_version': '2.4'}, 'http_version': '1.1', 'server': ('172.17.0.2', 8000), 'c...
      └ <function request_response.<locals>.app.<locals>.app at 0x7f1dbc343b50>
  File "/opt/mineru_venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
    response = await f(request)
      └ <starlette.requests.Request object at 0x7f1f9b99ef20>
      └ <function get_request_handler.<locals>.app at 0x7f1f9b9c8940>
  File "/opt/mineru_venv/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
    raw_response = await run_endpoint_function(
      └ <function run_endpoint_function at 0x7f20e98be680>
  File "/opt/mineru_venv/lib/python3.10/site-packages/fastapi/routing.py", line 212, in run_endpoint_function
    return await dependant.call(**values)
      └ {'parse_method': 'auto', 'model_json_path': None, 'is_json_md_dump': True, 'output_dir': 'output', 'pdf_file': UploadFile(fil...
      └ <function pdf_parse_main at 0x7f1f9b9af7f0>
      └ Dependant(path_params=[], query_params=[ModelField(field_info=Query(auto), name='parse_method', mode='validation'), ModelFiel...
> File "/root/app.py", line 115, in pdf_parse_main
    pipe.pipe_analyze() # Parse
      └ <function UNIPipe.pipe_analyze at 0x7f1f9b9af130>
      └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7f1dbc276e30>
  File "/opt/mineru_venv/lib/python3.10/site-packages/magic_pdf/pipe/UNIPipe.py", line 29, in pipe_analyze
    self.model_list = doc_analyze(self.pdf_bytes, ocr=False)
      └ b'%PDF-1.6\n%\xbf\xf7\xa2\xfe\n1 0 obj\n<< /CP2 3 0 R /FICL:Enfocus 4 0 R /Metadata 5 0 R /Names 41 0 R /OpenAction 124 0 R /...
      └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7f1dbc276e30>
      └ <function doc_analyze at 0x7f2044cc3be0>
      └ []
      └ <magic_pdf.pipe.UNIPipe.UNIPipe object at 0x7f1dbc276e30>
  File "/opt/mineru_venv/lib/python3.10/site-packages/magic_pdf/model/doc_analyze_by_custom_model.py", line 119, in doc_analyze
    result = custom_model(img)
      └ array([[[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255...
      └ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x7f1f9ba48b80>
  File "/opt/mineru_venv/lib/python3.10/site-packages/magic_pdf/model/pdf_extract_kit.py", line 298, in __call__
    html_code = self.table_model.img2html(new_image)
      └ <PIL.Image.Image image mode=RGB size=648x168 at 0x7F1F9ABDECB0>
      └ <function ppTableModel.img2html at 0x7f1dbca89cf0>
      └ <magic_pdf.model.ppTableModel.ppTableModel object at 0x7f1dbc553ca0>
      └ <magic_pdf.model.pdf_extract_kit.CustomPEKModel object at 0x7f1f9ba48b80>
  File "/opt/mineru_venv/lib/python3.10/site-packages/magic_pdf/model/ppTableModel.py", line 40, in img2html
    pred_res, _ = self.table_sys(image)
      └ array([[[255, 255, 255], [255, 255, 255], [255, 255, 255], ..., [255, 255, 255], [255...
      └ <paddleocr.ppstructure.table.predict_table.TableSystem object at 0x7f1dbc2ad3f0>
      └ <magic_pdf.model.ppTableModel.ppTableModel object at 0x7f1dbc553ca0>
  File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/predict_table.py", line 100, in __call__
    pred_html = self.match(structure_res, dt_boxes, rec_res)
      └ [('VQGAN f= 16', 0.8823233246803284), ('500', 0.9898629784584045), ('450', 0.9850022196769714), ('4117', 0.8392481207847595),...
      └ array([[ 28, 5, 95, 21], [ 377, 1, 394, 16], ...
      └ (['<html>', '<body>', '<table>', '<thead>', '<tr>', '<eb></eb>', '<eb></eb>', '<eb></eb>', '<eb></eb>', '<eb></eb>', '<eb></e...
      └ <ppstructure.table.table_master_match.TableMasterMatcher object at 0x7f1dbc26d840>
      └ <paddleocr.ppstructure.table.predict_table.TableSystem object at 0x7f1dbc2ad3f0>
  File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 949, in __call__
    match_results = self.match()
      └ <function Matcher.match at 0x7f1dbca3f910>
      └ <ppstructure.table.table_master_match.TableMasterMatcher object at 0x7f1dbc26d840>
  File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 769, in match
    get_bboxes_list(end2end_result, structure_master_result)
      └ {'text': '<thead>,<tr>,<eb></eb>,<eb></eb>,<eb></eb>,<eb></eb>,<eb></eb>,<eb></eb>,<eb></eb>,<eb></eb>,</tr>,</thead>,<tbody>...
      └ [{'bbox': array([ 28, 5, 95, 21]), 'text': 'VQGAN f= 16'}, {'bbox': array([ 377, ...
      └ <function get_bboxes_list at 0x7f1dbca3ef80>
  File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 302, in get_bboxes_list
    xywh_bbox = xyxy2xywh(src_bboxes)
      └ array([], dtype=float64)
      └ <function xyxy2xywh at 0x7f1dbca3e950>
  File "/opt/mineru_venv/lib/python3.10/site-packages/paddleocr/ppstructure/table/table_master_match.py", line 71, in xyxy2xywh
    new_bboxes[0] = bboxes[0] + (bboxes[2] - bboxes[0]) / 2
      └ array([], dtype=float64)
      └ array([], dtype=float64)
      └ array([], dtype=float64)
      └ array([], dtype=float64)
IndexError: index 0 is out of bounds for axis 0 with size 0
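The last two frames of the traceback show `xyxy2xywh` being called with an empty array (`array([], dtype=float64)`) and failing on `bboxes[0]`. The sketch below illustrates the kind of guard that would avoid the crash. It is not PaddleOCR's code: the function name `xyxy_to_xywh` is hypothetical, it works on plain Python sequences rather than NumPy arrays, and only the center-x expression is taken from the traceback; the center-y, width and height expressions are assumed by analogy.

```python
from typing import Optional, Sequence, Tuple


def xyxy_to_xywh(bbox: Sequence[float]) -> Optional[Tuple[float, float, float, float]]:
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height).

    Returns None for a box with fewer than four coordinates instead of
    raising IndexError, so the caller can skip cells without a matched box.
    """
    if len(bbox) < 4:
        return None
    x1, y1, x2, y2 = bbox[:4]
    center_x = x1 + (x2 - x1) / 2  # same expression as the failing line
    center_y = y1 + (y2 - y1) / 2  # assumed by analogy
    return (center_x, center_y, x2 - x1, y2 - y1)


# First box from the traceback, then the empty case that triggered the error.
print(xyxy_to_xywh([28, 5, 95, 21]))
print(xyxy_to_xywh([]))
```

Whether skipping an empty box is the right behaviour for the table matcher is a separate question; the sketch only shows where the missing check sits.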
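Testing shows that the problem is in table parsing. Table parsing is currently a beta feature and we do not have the manpower to fix it right now. If you run into a table parsing error, we recommend temporarily disabling table recognition and parsing again.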
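For readers who want to apply this workaround: as far as I know, MinerU 0.8.x reads its settings from a `magic-pdf.json` file, usually in the user's home directory, and table recognition is switched by an `is_table_recog_enable` flag under `table-config`. Both key names and the file location are assumptions taken from the project's config template, so check them against the template shipped with your version. The helper name `disable_table_recognition` is hypothetical.

```python
import json
from pathlib import Path


def disable_table_recognition(config_path: Path) -> dict:
    """Turn off table recognition in a magic-pdf.json style config file.

    Assumes the flag lives at config["table-config"]["is_table_recog_enable"];
    all other settings are written back unchanged.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Create the section if it is missing, then flip only the one flag.
    table_config = config.setdefault("table-config", {})
    table_config["is_table_recog_enable"] = False
    config_path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False), encoding="utf-8"
    )
    return config
```

A typical call would be `disable_table_recognition(Path.home() / "magic-pdf.json")`, followed by a restart of the service so the new setting is picked up.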
OK, thank you for your reply.
Description of the bug
When I use MinerU to parse this paper, parsing fails with an error: https://openaccess.thecvf.com/content/CVPR2023/papers/Huang_Towards_Accurate_Image_Coding_Improved_Autoregressive_Image_Generation_With_Dynamic_CVPR_2023_paper.pdf
How to reproduce the bug
The Docker image above combines MinerU and FastAPI, so users can upload a document with curl and receive the JSON content once parsing finishes.
Save the paper as origin.pdf, then run:
curl -X 'POST' 'http://xx.xx.xx.xx:8888/pdf_parse?parse_method=auto&is_json_md_dump=true&output_dir=output' -H 'accept:application/json' -H 'Content-Type: multipart/form-data' -F '[email protected];type=application/pdf' > x.json
ERROR | app:pdf_parse_main:133 - index 0 is out of bounds for axis 0 with size 0
Operating system
Linux
Python version
3.10
Software version (magic-pdf --version)
0.8.x
Device mode
cuda