Commit

refactor(pdf): adjust span filling threshold in block construction
Increased the threshold passed to fill_spans_in_blocks, which controls how spans are filled into blocks, from 0.3 to 0.5 to improve the accuracy of block formation. The stricter threshold refines how spans are grouped into blocks, which may improve the overall structure and readability of the parsed PDF content.
myhloli committed Oct 15, 2024
1 parent fdcb49d commit 7e301b8
Showing 1 changed file with 1 addition and 1 deletion: magic_pdf/pdf_parse_union_core_v2.py
@@ -360,7 +360,7 @@ def parse_page_core(pdf_docs, magic_model, page_id, pdf_bytes_md5, imageWriter,
 need_drop, drop_reason)
 
 '''Fill spans into blocks'''
-block_with_spans, spans = fill_spans_in_blocks(all_bboxes, spans, 0.3)
+block_with_spans, spans = fill_spans_in_blocks(all_bboxes, spans, 0.5)
 
 '''Apply fix operations to blocks'''
 fix_blocks = fix_block_spans(block_with_spans, img_blocks, table_blocks)
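The diff changes only the third argument of fill_spans_in_blocks. As a minimal sketch of what that value might control, the code below assumes it is the minimum fraction of a span's bounding-box area that must lie inside a block for the span to be assigned to that block. That interpretation and the helper names overlap_ratio_in_first and fill_spans_sketch are hypothetical, not taken from this commit.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical bbox layout: (x0, y0, x1, y1) in page coordinates.
BBox = Sequence[float]


def overlap_ratio_in_first(bbox1: BBox, bbox2: BBox) -> float:
    """Return the fraction of bbox1's area that lies inside bbox2."""
    x0, y0 = max(bbox1[0], bbox2[0]), max(bbox1[1], bbox2[1])
    x1, y1 = min(bbox1[2], bbox2[2]), min(bbox1[3], bbox2[3])
    if x1 <= x0 or y1 <= y0:
        return 0.0  # the boxes do not intersect
    area1 = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    if area1 <= 0:
        return 0.0  # degenerate span box
    return (x1 - x0) * (y1 - y0) / area1


def fill_spans_sketch(
    blocks: List[BBox], spans: List[Dict], threshold: float
) -> Tuple[List[Dict], List[Dict]]:
    """Hypothetical stand-in for fill_spans_in_blocks (assumed semantics).

    A span joins a block when more than `threshold` of its area overlaps
    that block; a span claimed by an earlier block is not reused.
    Returns (blocks with their spans, spans left unassigned).
    """
    remaining = list(spans)  # copy so the caller's list is not mutated
    filled = []
    for block_bbox in blocks:
        taken = [s for s in remaining
                 if overlap_ratio_in_first(s["bbox"], block_bbox) > threshold]
        taken_ids = {id(s) for s in taken}
        remaining = [s for s in remaining if id(s) not in taken_ids]
        filled.append({"bbox": tuple(block_bbox), "spans": taken})
    return filled, remaining


block = (0, 0, 100, 20)
spans = [
    {"text": "inside", "bbox": (0, 0, 50, 20)},        # 100% inside the block
    {"text": "straddling", "bbox": (80, 0, 130, 20)},  # 40% inside the block
]
for t in (0.3, 0.5):
    filled, leftover = fill_spans_sketch([block], spans, t)
    print(t, [s["text"] for s in filled[0]["spans"]],
          [s["text"] for s in leftover])
# 0.3 ['inside', 'straddling'] []
# 0.5 ['inside'] ['straddling']
```

Under that assumption, raising the threshold from 0.3 to 0.5 stops a span with only 40% of its area inside a block, like `straddling` above, from being pulled into that block; it is returned among the unassigned spans instead.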
