Document._annotate_box_group(self, box_groups, field_name) fails with IndexError: list index out of range #213

egork520 · 2023-03-17T17:22:21Z

See the Slack thread for the discussion.

Here is the link to the PDF that fails: `s3://ai2-s2-pdfs/e824/7449ba86efa714e39f8918b750654fc6284e.pdf`

Stack trace:

```
Input In [90], in generate_mmda_figure_table_pdf(sha, doc_dict, display_)
9 else:
10 recipe_doc = CoreRecipe()
---> 11 doc = recipe_doc.from_path(os.path.join(dir_name, name))
13 doc_dict[name] = doc
15 figure_table_pred = FigureTablePredictions(doc).predict()

File ~/Documents/codes/git/ai2/s2/mmda/src/mmda/recipes/core_recipe.py:54, in CoreRecipe.from_path(self, pdfpath)
52 blocks = self.effdet_publaynet_predictor.predict(document=doc)
53 equations = self.effdet_mfd_predictor.predict(document=doc)
---> 54 doc.annotate(blocks=blocks + equations)
56 logger.info("Predicting vila...")
57 vila_span_groups = self.vila_predictor.predict(document=doc)

File ~/Documents/codes/git/ai2/s2/mmda/src/mmda/types/document.py:96, in Document.annotate(self, is_overwrite, **kwargs)
91 span_groups = self._annotate_span_group(
92 span_groups=annotations, field_name=field_name
93 )
94 elif annotation_type == BoxGroup:
95 # TODO: not good. BoxGroups should be stored on their own, not auto-generating SpanGroups.
---> 96 span_groups = self._annotate_box_group(
97 box_groups=annotations, field_name=field_name
98 )
99 else:
100 raise NotImplementedError(
101 f"Unsupported annotation type {annotation_type} for {field_name}"
102 )

File ~/Documents/codes/git/ai2/s2/mmda/src/mmda/types/document.py:175, in Document._annotate_box_group(self, box_groups, field_name)
168 for box in box_group.boxes:
169
170 # Caching the page tokens to avoid duplicated search
171 if box.page not in all_page_tokens:
172 cur_page_tokens = all_page_tokens[box.page] = list(
173 itertools.chain.from_iterable(
174 span_group.spans
--> 175 for span_group in self.pages[box.page].tokens
176 )
177 )
178 else:
179 cur_page_tokens = all_page_tokens[box.page]

IndexError: list index out of range
```
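
The trace stops at `self.pages[box.page]`, which would be consistent with a predicted box carrying a page index beyond the pages the document actually has, though nothing in this issue confirms that as the root cause. Below is a minimal sketch of how one might check for that condition before calling `doc.annotate`. `FakeBox` and `find_out_of_range_boxes` are hypothetical names made up for illustration, not part of mmda, and the `pages` list is only a stand-in for `doc.pages`.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FakeBox:
    """Hypothetical stand-in for a box; only the page index matters here."""
    page: int


def find_out_of_range_boxes(boxes: Sequence[FakeBox], num_pages: int) -> List[FakeBox]:
    """Return the boxes whose page index is not a valid index into a list of num_pages pages."""
    return [box for box in boxes if not 0 <= box.page < num_pages]


# Stand-in for doc.pages: a document with two pages.
pages = ["page-0 tokens", "page-1 tokens"]

# Assumed predictor output: the last box points at a third page that does not exist.
boxes = [FakeBox(page=0), FakeBox(page=1), FakeBox(page=2)]

# Indexing pages[box.page] for any box in this list would raise IndexError.
bad = find_out_of_range_boxes(boxes, num_pages=len(pages))
print(bad)
```

If a check like this returned anything for the failing PDF, it would point at a mismatch between the page count seen by the predictors and the pages parsed into the document, rather than at the token caching logic itself.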

geli-gel · 2023-03-17T17:52:09Z

Duplicate of #206 (comment)

geli-gel marked this as a duplicate of #206 Mar 17, 2023

geli-gel closed this as completed Mar 17, 2023
