Bib detector fixes (#270)
Fixes bibentry detector issue with orphaned boxes

RE: allenai/scholar#35308

    Sometimes the detector predicts boxes that don't intersect
    any tokens on the page. Such boxes don't map to any text
    and otherwise cause a runtime error, so we now discard
    them.
cmwilhelm authored Jul 20, 2023
1 parent aaf121d commit cab36b6
Showing 3 changed files with 15 additions and 13 deletions.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = 'mmda'
-version = '0.9.6'
+version = '0.9.7'
description = 'MMDA - multimodal document analysis'
authors = [
{name = 'Allen Institute for Artificial Intelligence', email = '[email protected]'},
4 changes: 2 additions & 2 deletions src/ai2_internal/bibentry_detection_predictor/interface.py
@@ -116,8 +116,8 @@ def predict_one(self, inst: Instance) -> Prediction:
for sg in doc.bib_entries
]
prediction = Prediction(
-    # filter out span-less SpanGroups which occasionally occur
-    bib_entries=[sg for sg in no_span_box_span_groups if len(sg.spans) != 0],
+    # filter out span-less and box-less SpanGroups which occasionally occur
+    bib_entries=[sg for sg in no_span_box_span_groups if (sg.spans and sg.box_group.boxes)],
# retain the original model output
raw_bib_entry_boxes=[api.SpanGroup(spans=[], box_group=api.BoxGroup.from_mmda(bg), id=bg.id) for bg in original_box_groups]
)
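The new filter relies on Python truthiness: an empty `spans` list or an empty `box_group.boxes` list is falsy, so either one drops the group. The sketch below reproduces that check on hypothetical minimal `SpanGroup`/`BoxGroup` dataclasses standing in for the `api` types (only the fields the filter touches are modeled, and `keep_mappable` is an illustrative name); it is not the mmda API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BoxGroup:
    """Hypothetical stand-in for api.BoxGroup; only `boxes` is modeled."""
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)


@dataclass
class SpanGroup:
    """Hypothetical stand-in for api.SpanGroup."""
    id: int
    spans: List[Tuple[int, int]] = field(default_factory=list)
    box_group: BoxGroup = field(default_factory=BoxGroup)


def keep_mappable(span_groups: List[SpanGroup]) -> List[SpanGroup]:
    # Empty lists are falsy: drop groups that have no spans,
    # or whose box group ended up with no boxes at all.
    return [sg for sg in span_groups if sg.spans and sg.box_group.boxes]
```

Like the comprehension in the diff, this assumes `box_group` is never `None`; a group with `box_group=None` would raise `AttributeError` rather than being filtered out.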
22 changes: 12 additions & 10 deletions src/mmda/predictors/d2_predictors/bibentry_detection_predictor.py
@@ -43,17 +43,19 @@ def tighten_boxes(bib_box_group, page_tokens, page_width, page_height):
abs_box.l + abs_box.w,
abs_box.t + abs_box.h
)
-new_rect = union_blocks(page_tokens_as_layout.filter_by(rect, center=True))
-new_boxes.append(
-    Box(l=new_rect.x_1,
-        t=new_rect.y_1,
-        w=new_rect.width,
-        h=new_rect.height,
-        page=box.page).get_relative(
-        page_width=page_width,
-        page_height=page_height,
+intersecting_page_tokens = page_tokens_as_layout.filter_by(rect, center=True)
+if intersecting_page_tokens:
+    new_rect = union_blocks(intersecting_page_tokens)
+    new_boxes.append(
+        Box(l=new_rect.x_1,
+            t=new_rect.y_1,
+            w=new_rect.width,
+            h=new_rect.height,
+            page=box.page).get_relative(
+            page_width=page_width,
+            page_height=page_height,
+        )
     )
-)
new_box_group = BoxGroup(
boxes=new_boxes,
id=bib_box_group.id
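The guarded tightening step can be sketched without layoutparser. Everything here is a hypothetical stand-in: `Box` is a plain dataclass, `tighten_box` and `tighten_all` are illustrative names, page tokens are assumed to be in absolute page coordinates and to belong to the box's page, and the center-containment test approximates what `filter_by(rect, center=True)` does. The key point mirrors the diff: a predicted box whose rectangle contains no token centers is skipped instead of being passed to a union over an empty list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Box:
    """Hypothetical stand-in for an mmda Box: left, top, width, height, page."""
    l: float
    t: float
    w: float
    h: float
    page: int


def center_inside(token: Box, x1: float, y1: float, x2: float, y2: float) -> bool:
    # Keep a token only if its center point lies within the rectangle.
    cx = token.l + token.w / 2
    cy = token.t + token.h / 2
    return x1 <= cx <= x2 and y1 <= cy <= y2


def tighten_box(box: Box, page_tokens: List[Box],
                page_width: float, page_height: float) -> Optional[Box]:
    """Shrink a relative-coordinate box to the tokens it covers.

    Returns None for an orphaned box (no token centers inside it).
    page_tokens are assumed to be in absolute (pixel) coordinates.
    """
    # Relative prediction -> absolute page coordinates.
    x1, y1 = box.l * page_width, box.t * page_height
    x2, y2 = x1 + box.w * page_width, y1 + box.h * page_height
    hits = [tok for tok in page_tokens if center_inside(tok, x1, y1, x2, y2)]
    if not hits:
        return None  # min()/max() below would fail on an empty list
    # Union of the covered tokens' bounding boxes.
    ux1 = min(tok.l for tok in hits)
    uy1 = min(tok.t for tok in hits)
    ux2 = max(tok.l + tok.w for tok in hits)
    uy2 = max(tok.t + tok.h for tok in hits)
    # Absolute union -> relative coordinates again.
    return Box(l=ux1 / page_width, t=uy1 / page_height,
               w=(ux2 - ux1) / page_width, h=(uy2 - uy1) / page_height,
               page=box.page)


def tighten_all(boxes: List[Box], page_tokens: List[Box],
                page_width: float, page_height: float) -> List[Box]:
    # Tighten every box and discard the orphaned ones.
    tightened = (tighten_box(b, page_tokens, page_width, page_height) for b in boxes)
    return [b for b in tightened if b is not None]
```

Because orphaned boxes are now dropped, a bib entry whose boxes are all orphaned can end up with an empty box group, which is presumably why `interface.py` also filters out box-less `SpanGroup`s.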
