MFR model is supported by MPS with torch==2.6.0 #1647

Open
luizlf opened this issue Jan 31, 2025 · 2 comments

Comments

@luizlf

luizlf commented Jan 31, 2025

I changed line 130 in 'pdf_extract_kit.py' from

device='cpu' if str(self.device).startswith("mps") else self.device,

so that the MFR model uses 'self.device' (here 'mps') directly, and it worked on a MacBook Pro (M1 Pro, 16 GB). PyTorch 2.6.0 now supports the operations the MFR model needs. Running:

from magic_pdf.model.doc_analyze_by_custom_model import doc_analyze

infer_result = ds.apply(doc_analyze,  # ds: dataset object created earlier (not shown)
                        ocr=False,
                        start_page_id=1,
                        end_page_id=3,
                        show_log=True,
                        lang="pt",
                        table_enable=False,
                        formula_enable=True)

produced the debug log below:

2025-01-31 09:36:04.830 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:77 - DocAnalysis init, this may take some times, layout_model: doclayout_yolo, apply_formula: True, apply_ocr: False, apply_table: False, table_model: rapid_table, lang: pt
2025-01-31 09:36:04.831 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:98 - using device: mps
2025-01-31 09:36:04.831 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:102 - using models_dir: /Users/lsantos/.cache/huggingface/hub/models--opendatalab--PDF-Extract-Kit-1.0/snapshots/60416a2cabad3f7b7284b43ce37a99864484fba2/models
CustomVisionEncoderDecoderModel init
VariableUnimerNetModel init
VariableUnimerNetPatchEmbeddings init
VariableUnimerNetModel init
VariableUnimerNetPatchEmbeddings init
CustomMBartForCausalLM init
CustomMBartDecoder init
[2025/01/31 09:36:16] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/Users/lsantos/.paddleocr/whl/det/en/en_PP-OCRv3_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.3, det_db_unclip_ratio=1.8, max_batch_size=10, use_dilation=True, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/Users/lsantos/.paddleocr/whl/rec/latin/latin_PP-OCRv3_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/paddleocr/ppocr/utils/dict/latin_dict.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=False, cls_model_dir='/Users/lsantos/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='pt', det=True, rec=True, type='ocr', ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
[2025/01/31 09:36:17] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/rapidocr_onnxruntime/models/ch_PP-OCRv4_det_infer.onnx', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.3, det_db_unclip_ratio=1.8, max_batch_size=10, use_dilation=True, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/rapidocr_onnxruntime/models/ch_PP-OCRv4_rec_infer.onnx', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/paddleocr/ppocr/utils/dict/latin_dict.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=False, cls_model_dir='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/rapidocr_onnxruntime/models/ch_ppocr_mobile_v2.0_cls_infer.onnx', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=True, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='pt', det=True, rec=True, type='ocr', ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
2025-01-31 09:36:17.779 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:179 - DocAnalysis init done!
2025-01-31 09:36:17.779 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:138 - model init cost: 12.960644721984863
2025-01-31 09:36:20.046 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:215 - layout detection time: 2.17
2025-01-31 09:36:22.437 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:221 - mfd time: 2.39
2025-01-31 09:36:31.469 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:228 - formula nums: 16, mfr time: 9.03
[2025/01/31 09:36:31] ppocr DEBUG: split text box by formula, new dt_boxes num : 6, elapsed : 9.107589721679688e-05
[2025/01/31 09:36:31] ppocr DEBUG: split text box by formula, new dt_boxes num : 6, elapsed : 0.0001590251922607422
[2025/01/31 09:36:31] ppocr DEBUG: split text box by formula, new dt_boxes num : 7, elapsed : 9.584426879882812e-05
[2025/01/31 09:36:32] ppocr DEBUG: split text box by formula, new dt_boxes num : 2, elapsed : 0.00011420249938964844
2025-01-31 09:36:32.315 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:262 - det time: 0.84 
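The per-stage timings in this log can be totaled to see where the time goes. Below is a minimal sketch in Python, assuming the log-line format shown above. The regexes and the helpers stage_times and summarize are hypothetical, not part of magic_pdf. In this run, MFR takes 9.03 s of the 14.43 s spent in the four logged stages (roughly 63%), or about 0.56 s per formula. This is a single run with no CPU baseline, so it does not by itself show whether MPS beats the CPU.

```python
import re

# Stage timing lines from the log above (timestamps and logger prefixes trimmed).
LOG_LINES = [
    "layout detection time: 2.17",
    "mfd time: 2.39",
    "formula nums: 16, mfr time: 9.03",
    "det time: 0.84",
]

# Hypothetical patterns, assuming the "<stage> time: <seconds>" format seen in this log.
TIMING = re.compile(r"([a-z ]+) time: ([\d.]+)")
FORMULAS = re.compile(r"formula nums: (\d+)")


def stage_times(lines):
    """Map each stage name (e.g. 'mfr') to its logged duration in seconds."""
    times = {}
    for line in lines:
        match = TIMING.search(line)
        if match:
            times[match.group(1).strip()] = float(match.group(2))
    return times


def summarize(lines):
    """Total of the logged stages, MFR's share of it, and MFR time per formula."""
    times = stage_times(lines)
    formulas = next(int(m.group(1)) for m in map(FORMULAS.search, lines) if m)
    total = sum(times.values())
    return {
        "formulas": formulas,
        "total_s": total,
        "mfr_share": times["mfr"] / total,
        "mfr_per_formula_s": times["mfr"] / formulas,
    }


if __name__ == "__main__":
    for key, value in summarize(LOG_LINES).items():
        print(key, round(value, 3))
```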
@myhloli
Collaborator

myhloli commented Jan 31, 2025

Are you sure MPS is faster than the CPU for MFR? When I tested, MPS fell back to CPU execution during the MFR phase, which made the overall run slower than pure CPU computation. However, that may be because I only tested with PyTorch 2.4 and 2.5. I will upgrade to PyTorch 2.6 and try again, hoping to see MPS work properly. Thank you for your feedback.
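Answering this question requires timing the same MFR workload on the CPU and on MPS under identical conditions. Below is a minimal, framework-agnostic sketch in Python. median_runtime and speedup are hypothetical helpers, not magic_pdf or PyTorch APIs. MPS queues work asynchronously, so the optional sync hook should block until queued GPU work has finished (torch.mps.synchronize() does this). Without it, the timer may stop before the work is done.

```python
import statistics
import time
from typing import Callable, Optional


def median_runtime(
    fn: Callable[[], object],
    *,
    warmup: int = 2,
    repeats: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Median wall-clock seconds of fn() over `repeats` timed runs.

    Warm-up runs are executed first and not timed, because the first calls
    on a device are typically slower. `sync` is called after every run so
    asynchronous device work is included in the measurement.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()
        if sync is not None:
            sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(cpu_seconds: float, mps_seconds: float) -> float:
    """Ratio > 1 means the MPS run was faster than the CPU run."""
    return cpu_seconds / mps_seconds
```

For example, you could compare median_runtime(lambda: run_mfr(batch_cpu)) with median_runtime(lambda: run_mfr(batch_mps), sync=torch.mps.synchronize). Here, run_mfr and the batches are placeholders for the actual MFR call and its inputs. Note that setting PYTORCH_ENABLE_MPS_FALLBACK=1 lets unsupported operators run on the CPU instead of raising an error. As a result, a run that nominally targets MPS may execute partly on the CPU.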

@myhloli
Collaborator

myhloli commented Feb 7, 2025

I tested on a Mac mini (M4, 16 GB) and found that torch 2.6.0 + MPS does not speed up MFR during parsing. If you have more experience accelerating MFR with MPS, feedback is welcome.
