MFR model is supported by MPS with torch==2.6.0 #1647

luizlf · 2025-01-31T16:48:01Z

I have changed line 130 in 'pdf_extract_kit.py'

MinerU/magic_pdf/model/pdf_extract_kit.py

Line 130 in 1e4d4b5

device='cpu' if str(self.device).startswith("mps") else self.device,

so that it passes self.device ('mps' on my machine) directly, and it worked (MacBook Pro, M1 Pro, 16GB). With torch==2.6.0, PyTorch now supports the operations the MFR model needs. Results of:

infer_result = ds.apply(doc_analyze, 
                        ocr=False, 
                        start_page_id=1, 
                        end_page_id=3, 
                        show_log=True, 
                        lang="pt", 
                        table_enable=False, formula_enable=True)

show the debug logs below:

2025-01-31 09:36:04.830 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:77 - DocAnalysis init, this may take some times, layout_model: doclayout_yolo, apply_formula: True, apply_ocr: False, apply_table: False, table_model: rapid_table, lang: pt
2025-01-31 09:36:04.831 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:98 - using device: mps
2025-01-31 09:36:04.831 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:102 - using models_dir: /Users/lsantos/.cache/huggingface/hub/models--opendatalab--PDF-Extract-Kit-1.0/snapshots/60416a2cabad3f7b7284b43ce37a99864484fba2/models
CustomVisionEncoderDecoderModel init
VariableUnimerNetModel init
VariableUnimerNetPatchEmbeddings init
VariableUnimerNetModel init
VariableUnimerNetPatchEmbeddings init
CustomMBartForCausalLM init
CustomMBartDecoder init
[2025/01/31 09:36:16] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/Users/lsantos/.paddleocr/whl/det/en/en_PP-OCRv3_det_infer', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.3, det_db_unclip_ratio=1.8, max_batch_size=10, use_dilation=True, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/Users/lsantos/.paddleocr/whl/rec/latin/latin_PP-OCRv3_rec_infer', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/paddleocr/ppocr/utils/dict/latin_dict.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=False, cls_model_dir='/Users/lsantos/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=False, 
output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='pt', det=True, rec=True, type='ocr', ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
[2025/01/31 09:36:17] ppocr DEBUG: Namespace(help='==SUPPRESS==', use_gpu=False, use_xpu=False, use_npu=False, ir_optim=True, use_tensorrt=False, min_subgraph_size=15, precision='fp32', gpu_mem=500, gpu_id=0, image_dir=None, page_num=0, det_algorithm='DB', det_model_dir='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/rapidocr_onnxruntime/models/ch_PP-OCRv4_det_infer.onnx', det_limit_side_len=960, det_limit_type='max', det_box_type='quad', det_db_thresh=0.3, det_db_box_thresh=0.3, det_db_unclip_ratio=1.8, max_batch_size=10, use_dilation=True, det_db_score_mode='fast', det_east_score_thresh=0.8, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_sast_score_thresh=0.5, det_sast_nms_thresh=0.2, det_pse_thresh=0, det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, scales=[8, 16, 32], alpha=1.0, beta=1.0, fourier_degree=5, rec_algorithm='SVTR_LCNet', rec_model_dir='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/rapidocr_onnxruntime/models/ch_PP-OCRv4_rec_infer.onnx', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_batch_num=6, max_text_length=25, rec_char_dict_path='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/paddleocr/ppocr/utils/dict/latin_dict.txt', use_space_char=True, vis_font_path='./doc/fonts/simfang.ttf', drop_score=0.5, e2e_algorithm='PGNet', e2e_model_dir=None, e2e_limit_side_len=768, e2e_limit_type='max', e2e_pgnet_score_thresh=0.5, e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_pgnet_valid_set='totaltext', e2e_pgnet_mode='fast', use_angle_cls=False, cls_model_dir='/Users/lsantos/.local/share/mamba/envs/mm-exams/lib/python3.10/site-packages/rapidocr_onnxruntime/models/ch_ppocr_mobile_v2.0_cls_infer.onnx', cls_image_shape='3, 48, 192', label_list=['0', '180'], cls_batch_num=6, cls_thresh=0.9, enable_mkldnn=False, cpu_threads=10, use_pdserving=False, warmup=False, sr_model_dir=None, sr_image_shape='3, 32, 128', sr_batch_num=1, 
draw_img_save_dir='./inference_results', save_crop_res=False, crop_res_save_dir='./output', use_mp=False, total_process_num=1, process_id=0, benchmark=False, save_log_path='./log_output/', show_log=True, use_onnx=True, output='./output', table_max_len=488, table_algorithm='TableAttn', table_model_dir=None, merge_no_span_structure=True, table_char_dict_path=None, layout_model_dir=None, layout_dict_path=None, layout_score_threshold=0.5, layout_nms_threshold=0.5, kie_algorithm='LayoutXLM', ser_model_dir=None, re_model_dir=None, use_visual_backbone=True, ser_dict_path='../train_data/XFUND/class_list_xfun.txt', ocr_order_method=None, mode='structure', image_orientation=False, layout=True, table=True, ocr=True, recovery=False, use_pdf2docx_api=False, invert=False, binarize=False, alphacolor=(255, 255, 255), lang='pt', det=True, rec=True, type='ocr', ocr_version='PP-OCRv4', structure_version='PP-StructureV2')
2025-01-31 09:36:17.779 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:179 - DocAnalysis init done!
2025-01-31 09:36:17.779 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:138 - model init cost: 12.960644721984863
2025-01-31 09:36:20.046 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:215 - layout detection time: 2.17
2025-01-31 09:36:22.437 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:221 - mfd time: 2.39
2025-01-31 09:36:31.469 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:228 - formula nums: 16, mfr time: 9.03
[2025/01/31 09:36:31] ppocr DEBUG: split text box by formula, new dt_boxes num : 6, elapsed : 9.107589721679688e-05
[2025/01/31 09:36:31] ppocr DEBUG: split text box by formula, new dt_boxes num : 6, elapsed : 0.0001590251922607422
[2025/01/31 09:36:31] ppocr DEBUG: split text box by formula, new dt_boxes num : 7, elapsed : 9.584426879882812e-05
[2025/01/31 09:36:32] ppocr DEBUG: split text box by formula, new dt_boxes num : 2, elapsed : 0.00011420249938964844
2025-01-31 09:36:32.315 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:262 - det time: 0.84
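
The per-stage timings in the log above can be collected so that runs on different devices are easier to compare. This is only a minimal sketch and not part of MinerU: the name `parse_stage_times` and the regular expression are assumptions based solely on the log lines shown here, and other versions may format these lines differently.

```python
import re

# Assumed pattern, derived from lines such as
#   "... - layout detection time: 2.17"
#   "... - formula nums: 16, mfr time: 9.03"
STAGE_RE = re.compile(
    r"- (?:formula nums: (?P<n>\d+), )?(?P<stage>[a-z ]+?) time: (?P<secs>\d+(?:\.\d+)?)"
)


def parse_stage_times(log_text: str) -> dict:
    """Return {stage name: seconds}, plus 'formula nums' when the line has it."""
    times = {}
    for line in log_text.splitlines():
        match = STAGE_RE.search(line)
        if match is None:
            continue  # not a stage-timing line
        times[match.group("stage")] = float(match.group("secs"))
        if match.group("n") is not None:
            times["formula nums"] = int(match.group("n"))
    return times


# Two lines copied from the log above.
SAMPLE = """\
2025-01-31 09:36:22.437 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:221 - mfd time: 2.39
2025-01-31 09:36:31.469 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:228 - formula nums: 16, mfr time: 9.03
"""

stage_times = parse_stage_times(SAMPLE)
# Seconds spent per formula in the MFR stage for this run.
seconds_per_formula = stage_times["mfr"] / stage_times["formula nums"]
```

Dividing the MFR time by the number of formulas gives a per-formula figure, which is a fairer basis for comparison than the raw stage time when the CPU and MPS runs cover different pages.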

myhloli · 2025-01-31T18:44:11Z

Are you sure that using MPS for MFR is faster than using CPU? When I tested, MPS fell back to CPU execution during the MFR phase, which made the overall time longer than pure CPU computation. However, it might be because I only tested with PyTorch 2.4 and 2.5. In any case, I will upgrade PyTorch to version 2.6 and try again, hoping to see MPS work properly. Thank you for your feedback.
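
One way to check the CPU versus MPS question is to time the same call on both devices under identical conditions. The helper below is a hedged sketch, not MinerU code: `mean_seconds` and its parameters are hypothetical names, and the `sync` hook exists only because GPU backends can queue work asynchronously, so the clock should be read after the device has finished.

```python
import time
from typing import Callable, Optional


def mean_seconds(
    fn: Callable[[], object],
    sync: Optional[Callable[[], None]] = None,
    warmup: int = 1,
    repeats: int = 3,
) -> float:
    """Average wall-clock seconds per call of fn.

    sync, if given, is called before each clock read so that queued
    asynchronous device work is included in the measurement.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are not timed (first-run setup cost)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeats
```

With PyTorch, `torch.mps.synchronize` could be passed as `sync` when the model runs on MPS, and `fn` would wrap one MFR inference call on a fixed batch. Note also that when the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` is set, operations that MPS does not implement run on the CPU instead, which may account for the fallback behaviour described above.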

myhloli · 2025-02-07T07:16:52Z

I tested on an M4 Mac mini with 16GB and found that MFR parsing is not faster with torch 2.6.0 + MPS. If you have more experience accelerating MFR with MPS, feedback is welcome.
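
Since the results differ between machines, the device rule from line 130 could be expressed as a small function with an opt-in switch, so that the CPU default stays in place while MPS remains easy to try. This is an illustrative sketch only: `pick_mfr_device` and `allow_mps` are hypothetical names and do not exist in MinerU.

```python
def pick_mfr_device(device: str, allow_mps: bool = False) -> str:
    """Choose the device for the MFR model.

    Mirrors the rule quoted from line 130 (MPS is replaced by CPU),
    with a hypothetical allow_mps flag to keep MPS when requested.
    """
    name = str(device)
    if name.startswith("mps") and not allow_mps:
        return "cpu"  # current behaviour: MFR avoids MPS
    return name  # any other device, or MPS when opted in


default_choice = pick_mfr_device("mps")
opt_in_choice = pick_mfr_device("mps", allow_mps=True)
```

With the flag left at its default the behaviour matches the existing line, so users on hardware where MPS brings no speedup would see no change.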
