
Models downloaded and config set up, but parsing a PDF fails with an error #735

Open
fuxuelinwudi opened this issue Oct 14, 2024 · 14 comments
Labels
bug Something isn't working


@fuxuelinwudi

Description of the bug

[10/14 10:33:42 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/PDF-Extract-Kit/models/Layout/model_final.pth ...
[10/14 10:33:42 fvcore.common.checkpoint]: [Checkpointer] Loading from /data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/PDF-Extract-Kit/models/Layout/model_final.pth ...
2024-10-14 10:33:45.280 | INFO | magic_pdf.model.pdf_extract_kit:init:248 - DocAnalysis init done!
2024-10-14 10:33:45.280 | INFO | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:98 - model init cost: 25.69013738632202
2024-10-14 10:33:50.120 | INFO | magic_pdf.model.pdf_extract_kit:call:259 - layout detection cost: 3.53

0: 1888x1312 (no detections), 78.5ms
Speed: 16.2ms preprocess, 78.5ms inference, 0.6ms postprocess per image at shape (1, 3, 1888, 1312)
2024-10-14 10:33:50.704 | INFO | magic_pdf.model.pdf_extract_kit:call:289 - formula nums: 0, mfr time: 0.0
2024-10-14 10:33:50.905 | ERROR | magic_pdf.tools.cli:parse_doc:96 - (External) CUBLAS error(7).
[Hint: 'CUBLAS_STATUS_INVALID_VALUE'. An unsupported value or parameter was passed to the function (a negative vector size, for example). To correct: ensure that all the parameters being passed have valid values. ] (at ../paddle/phi/kernels/funcs/blas/blas_impl.cu.h:40)
[operator < fc > error]

How to reproduce the bug

My config:

{
  "bucket_info": {
    "bucket-name-1": [
      "ak",
      "sk",
      "endpoint"
    ],
    "bucket-name-2": [
      "ak",
      "sk",
      "endpoint"
    ]
  },
  "models-dir": "/data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/PDF-Extract-Kit/models",
  "layoutreader-model-dir": "/data/deploy/pre_release/fxl/gomate/all_server/comp_server/mineru_root/root_models/layoutreader",
  "device-mode": "cuda",
  "table-config": {
    "model": "TableMaster",
    "is_table_recog_enable": false,
    "max_time": 400
  }
}
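Before running a parse, it can help to sanity-check the config. The sketch below assumes the JSON above is saved as `magic-pdf.json` in the home directory (the usual MinerU location, but verify it for your version) and only checks the keys that appear in this config. `check_config` is a hypothetical helper, not part of magic-pdf, and whether magic-pdf strictly requires every key it checks is an assumption.

```python
import json
import os
from pathlib import Path

# Keys copied from the config in this issue; treating them all as required is an assumption.
REQUIRED_KEYS = ("models-dir", "layoutreader-model-dir", "device-mode")
DIR_KEYS = ("models-dir", "layoutreader-model-dir")


def check_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a magic-pdf style config dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in cfg:
            problems.append(f"missing key: {key}")
    for key in DIR_KEYS:
        path = cfg.get(key)
        # Only check directories that were actually configured.
        if path is not None and not os.path.isdir(path):
            problems.append(f"{key} does not exist: {path}")
    return problems


if __name__ == "__main__":
    # Assumed default location of the config file; adjust for your setup.
    config_path = Path.home() / "magic-pdf.json"
    cfg = json.loads(config_path.read_text(encoding="utf-8"))
    for problem in check_config(cfg) or ["config looks OK"]:
        print(problem)
```

A clean result here only rules out path and key mistakes; it says nothing about the CUDA mismatch discussed below.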

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.8.x

Device mode

cuda

fuxuelinwudi added the bug label on Oct 14, 2024
@fuxuelinwudi (Author)

My CUDA version is 12.1, and I ran:
python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
Could that be the problem?
But when I run:
python -m pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu121/
paddle fails to install.

@myhloli (Collaborator) commented Oct 14, 2024

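Judging from the error, paddle and CUDA are incompatible. On Linux, paddle ships with its own CUDA runtime, and you need the cu118 build to avoid conflicting with torch's CUDA 12.1. Neither of them uses your system's CUDA installation.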
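Because each framework brings its own CUDA runtime, the useful check is which CUDA version each installed wheel was built with, not which toolkit is on the system. A minimal sketch, assuming only the public attributes `paddle.device.is_compiled_with_cuda()`, `paddle.version.cuda()` and `torch.version.cuda`. The helper name `bundled_cuda_versions` is hypothetical.

```python
def bundled_cuda_versions() -> dict:
    """Report the CUDA version each framework wheel was built with (None if CPU-only or missing)."""
    versions = {"paddle": None, "torch": None}
    try:
        import paddle

        if paddle.device.is_compiled_with_cuda():
            versions["paddle"] = paddle.version.cuda()
    except ImportError:
        pass  # paddle not installed in this environment
    try:
        import torch

        versions["torch"] = torch.version.cuda  # None for CPU-only torch builds
    except ImportError:
        pass  # torch not installed in this environment
    return versions


if __name__ == "__main__":
    # paddle reporting 11.8 next to torch reporting 12.1 matches the setup described above.
    print(bundled_cuda_versions())
```

Paddle's own `paddle.utils.run_check()` self-test is another quick way to surface a broken GPU install before running magic-pdf.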

@fuxuelinwudi (Author)

What should I do?

@myhloli (Collaborator) commented Oct 14, 2024

You can uninstall both paddlepaddle-gpu and paddlepaddle, then reinstall paddlepaddle and run with the CPU version of paddle.
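The switch to CPU paddle amounts to two pip calls in the active environment. A sketch that builds them with `sys.executable -m pip` (so the same interpreter's environment is modified) and only prints them unless `--apply` is passed. The helper names and the `--apply` flag are hypothetical, and the CPU package is left unpinned; pin it to whatever version the MinerU install guide for your release specifies.

```python
import subprocess
import sys


def cpu_paddle_commands(python: str = sys.executable) -> list[list[str]]:
    """Build pip commands that replace any paddle build with the CPU package."""
    pip = [python, "-m", "pip"]
    return [
        # pip skips (with a warning) any package that is not installed.
        pip + ["uninstall", "-y", "paddlepaddle-gpu", "paddlepaddle"],
        pip + ["install", "paddlepaddle"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # By default the commands are only printed; pass --apply to run pip.
    run(cpu_paddle_commands(), dry_run="--apply" not in sys.argv)
```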

@fuxuelinwudi (Author)

That works, thanks. If I want to use the CUDA version, does the system CUDA on Linux need to be 11.8?

@myhloli (Collaborator) commented Oct 14, 2024

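You don't need to change the CUDA installed on Linux. On Linux, the CUDA runtimes for both torch and paddle are installed as pip dependencies inside the conda virtual environment; the system only needs the NVIDIA driver. If you follow the tutorial and still end up with an incompatibility, it is usually hard to debug, so we recommend falling back to CPU or switching to a different deployment environment.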
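Since the host only needs the driver, a quick way to confirm one is present is to query `nvidia-smi`. This sketch assumes `nvidia-smi` is on `PATH` whenever a driver is installed and uses its `--query-gpu=driver_version --format=csv,noheader` query; `nvidia_driver_version` is a hypothetical helper. Note that the driver must also be new enough for the CUDA runtimes bundled in the wheels; the "CUDA Version" shown in the plain `nvidia-smi` header is the highest runtime that driver supports.

```python
import shutil
import subprocess


def nvidia_driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # no driver tools on PATH
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    # One line per GPU; they share a single driver, so the first line is enough.
    return lines[0] if lines else None


if __name__ == "__main__":
    print(nvidia_driver_version() or "no NVIDIA driver detected")
```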

@fuxuelinwudi (Author)

OK.

@fuxuelinwudi (Author)

Which Python file does this shell command run?
magic-pdf -p small_ocr.pdf

I'd like to write my own Python script.

@fuxuelinwudi (Author)

I installed CUDA 11.8, then installed paddle-gpu, and got another error:

2024-10-14 11:36:23.335 | ERROR | magic_pdf.tools.cli:parse_doc:96 - Unable to avoid copy while creating an array as requested.
If using np.array(obj, copy=False) replace it with np.asarray(obj) to allow a copy when needed (no behavior change in NumPy 1.x).
For more details, see https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword

@myhloli (Collaborator) commented Oct 14, 2024


https://github.com/opendatalab/MinerU/blob/master/magic_pdf/tools/cli.py
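If the goal is only to drive the CLI from your own Python code, the least invasive option is to call the `magic-pdf` executable as a subprocess with the `-p` flag used above. Other flags and the in-process functions the CLI calls are defined in `cli.py` and are not reproduced here. `build_magic_pdf_cmd` and `parse_pdf` are hypothetical helper names.

```python
import os
import subprocess
import sys


def build_magic_pdf_cmd(pdf_path) -> list[str]:
    """Assemble the same call as `magic-pdf -p <pdf>` used in this thread."""
    return ["magic-pdf", "-p", os.fspath(pdf_path)]


def parse_pdf(pdf_path) -> int:
    """Run the magic-pdf CLI on one file and return its exit code."""
    # Raises FileNotFoundError if the magic-pdf entry point is not on PATH.
    completed = subprocess.run(build_magic_pdf_cmd(pdf_path))
    return completed.returncode


if __name__ == "__main__":
    sys.exit(parse_pdf(sys.argv[1] if len(sys.argv) > 1 else "small_ocr.pdf"))
```

Calling the functions in `cli.py` directly avoids reloading the models for every file, but those internals can change between releases, which is why this sketch stays at the command-line boundary.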

@myhloli (Collaborator) commented Oct 14, 2024


NumPy 2.0 and later is not supported; you need to downgrade to 1.x.
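A small guard that flags NumPy 2.x before a run, sketched under the maintainer's statement that this release needs NumPy 1.x. The helper names are hypothetical; the suggested fix is the standard pip specifier `numpy<2`.

```python
def numpy_major(version: str) -> int:
    """Return the major component of a version string such as '1.26.4' or '2.0.0rc1'."""
    return int(version.split(".", 1)[0])


def numpy_supported(version: str) -> bool:
    """Per the reply above, magic-pdf 0.8.x needs NumPy 1.x."""
    return numpy_major(version) < 2


if __name__ == "__main__":
    # Imported here so the helpers above can be used without NumPy installed.
    import numpy

    if numpy_supported(numpy.__version__):
        print(f"NumPy {numpy.__version__} is OK")
    else:
        print(f'NumPy {numpy.__version__} found; downgrade with: pip install "numpy<2"')
```

The error text itself points at the code-side fix (`np.asarray(obj)` instead of `np.array(obj, copy=False)`), but that has to happen in the library raising the error; as a user, pinning NumPy below 2 is the practical workaround.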

@fuxuelinwudi (Author)

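Looking at the recognized Markdown output, it seems multi-level headings aren't recognized? Every heading comes out as a level-1 heading.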

@myhloli (Collaborator) commented Oct 15, 2024

Multi-level heading recognition is not supported at the moment.

