Document._annotate_box_group(self, box_groups, field_name) fails with IndexError: list index out of range #213

Closed
egork520 opened this issue Mar 17, 2023 · 1 comment

Comments

egork520 commented Mar 17, 2023

See the Slack thread for the discussion.

Here is the PDF that fails: `s3://ai2-s2-pdfs/e824/7449ba86efa714e39f8918b750654fc6284e.pdf` (copied to `./7449ba86efa714e39f8918b750654fc6284e.pdf`)

Stack trace:

```
Input In [90], in generate_mmda_figure_table_pdf(sha, doc_dict, display_)
9 else:
10 recipe_doc = CoreRecipe()
---> 11 doc = recipe_doc.from_path(os.path.join(dir_name, name))
13 doc_dict[name] = doc
15 figure_table_pred = FigureTablePredictions(doc).predict()

File ~/Documents/codes/git/ai2/s2/mmda/src/mmda/recipes/core_recipe.py:54, in CoreRecipe.from_path(self, pdfpath)
52 blocks = self.effdet_publaynet_predictor.predict(document=doc)
53 equations = self.effdet_mfd_predictor.predict(document=doc)
---> 54 doc.annotate(blocks=blocks + equations)
56 logger.info("Predicting vila...")
57 vila_span_groups = self.vila_predictor.predict(document=doc)

File ~/Documents/codes/git/ai2/s2/mmda/src/mmda/types/document.py:96, in Document.annotate(self, is_overwrite, **kwargs)
91 span_groups = self._annotate_span_group(
92 span_groups=annotations, field_name=field_name
93 )
94 elif annotation_type == BoxGroup:
95 # TODO: not good. BoxGroups should be stored on their own, not auto-generating SpanGroups.
---> 96 span_groups = self._annotate_box_group(
97 box_groups=annotations, field_name=field_name
98 )
99 else:
100 raise NotImplementedError(
101 f"Unsupported annotation type {annotation_type} for {field_name}"
102 )

File ~/Documents/codes/git/ai2/s2/mmda/src/mmda/types/document.py:175, in Document._annotate_box_group(self, box_groups, field_name)
168 for box in box_group.boxes:
169
170 # Caching the page tokens to avoid duplicated search
171 if box.page not in all_page_tokens:
172 cur_page_tokens = all_page_tokens[box.page] = list(
173 itertools.chain.from_iterable(
174 span_group.spans
--> 175 for span_group in self.pages[box.page].tokens
176 )
177 )
178 else:
179 cur_page_tokens = all_page_tokens[box.page]

IndexError: list index out of range
```
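
The failing frame indexes `self.pages[box.page]`, so the error suggests that at least one box carries a page index the document has no page for. Below is a minimal diagnostic sketch of that check, not mmda code: `Box` and `boxes_with_missing_pages` are hypothetical stand-ins, and the plain list stands in for `doc.pages`. The root cause in the PDF above is not established here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    # Hypothetical stand-in for a predicted box; only the page index matters here.
    page: int


def boxes_with_missing_pages(boxes: Sequence[Box], num_pages: int) -> List[Box]:
    """Return the boxes whose page index lies outside 0 .. num_pages - 1."""
    return [box for box in boxes if not 0 <= box.page < num_pages]


# Stand-in for doc.pages: a document with two pages.
pages = ["page 0", "page 1"]
# Stand-in for predicted boxes; the last one points at a page that does not exist.
boxes = [Box(page=0), Box(page=1), Box(page=2)]

bad = boxes_with_missing_pages(boxes, len(pages))
print(bad)
```

Running a check like this over the predicted boxes before calling `doc.annotate(...)` would show whether the predictor output and the parsed page list disagree on the page count.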

@geli-gel

Duplicate of #206 (comment)

geli-gel marked this as a duplicate of #206 on Mar 17, 2023