more robust table extraction #767
base: master
fix type check (a42df23 to c510ac5)
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master     #767   +/-   ##
=======================================
  Coverage   99.27%   99.27%
=======================================
  Files          21       21
  Lines        3576     3587   +11
=======================================
+ Hits         3550     3561   +11
  Misses         26       26
@unsleepy22 Thanks for the PR. Everything looks OK to me, but the code could be improved using the suggestions above.
Thanks for your comments. I've updated the PR accordingly; would you take another look?
trafilatura/utils.py (outdated)

def is_in_table_cell(elem: _Element) -> bool:
    '''Check whether an element is in a table cell'''
    return elem.xpath('//ancestor::cell')
@unsleepy22 The .xpath method returns a list of elements or [], so the bool return annotation is not correct. Since we don't need the list, there is no point in returning it; I suggest using return bool(elem.xpath(...)).
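The suggestion above can be illustrated with a minimal sketch. It assumes lxml is installed; the sample document and the relative XPath expression ancestor::cell are assumptions made for this illustration only, since the review comment elides the actual expression with "...".

```python
from lxml.etree import _Element, fromstring


def is_in_table_cell(elem: _Element) -> bool:
    '''Check whether an element is in a table cell'''
    # For node-set expressions, .xpath() returns a list (possibly empty).
    # bool() collapses it so the return value matches the annotation.
    # NOTE: "ancestor::cell" is an assumed expression for this sketch.
    return bool(elem.xpath("ancestor::cell"))


# Hypothetical sample document with one paragraph inside a cell and one outside.
root = fromstring(
    "<doc><table><row><cell><p>inside</p></cell></row></table>"
    "<p>outside</p></doc>"
)
inside = root.find(".//cell/p")
outside = root.find("p")

print(type(inside.xpath("ancestor::cell")))  # the raw result is a list
print(is_in_table_cell(inside))
print(is_in_table_cell(outside))
```

Wrapping the call in bool() keeps callers from depending on the list itself and lets the declared return type hold for both the empty and the non-empty result.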
Yes, you're right; the type check didn't catch this one.
Refine table extraction and add more test cases.