When parsing certain HTML files, the parser does not add sections whose parent is a table to the database.
I implemented a quick workaround by adding two lines of code after line 583 in parser.py, and with it the documents were read successfully. However, this may not be the most efficient solution (if it is a correct one at all) for building the corresponding tree structure of the section.
```html
<table><tbody><tr>
  <th class="ssTableHeader" valign="top" rowspan="2" id="PIr03">State</th>
  <th class="ssTableHeader" valign="top" id="PIr13">State of Texas</th>
  <td rowspan="2"></td>
  <td rowspan="2" valign="top" headers="PIr03 PIr13"></td>
  <td rowspan="2" valign="top" headers="PIr03 PIr13 PIc5"><b>Tamara Y S Keener</b><br>830-997-9542(W)
    <table><tbody><tr height="25"><td> </td></tr></tbody></table>
    Jay Weinheimer<br>997-2149(W)
  </td>
</tr></tbody></table>
```
The challenge here is that a table cell contains a nested table as well as its own text, both before and after the nested table.
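To make the failure mode concrete, here is a minimal sketch (not Fonduer's parser code) that walks the snippet above with Python's standard-library `html.parser.HTMLParser` and records each table cell's nesting depth and its own text. The class name `CellTextCollector` is hypothetical, and the sketch assumes every `<td>`/`<th>` has an explicit closing tag, as in this snippet.

```python
from html.parser import HTMLParser

# The snippet from this issue, with the attribute spacing restored.
SNIPPET = """
<table><tbody><tr>
<th class="ssTableHeader" valign="top" rowspan="2" id="PIr03">State</th>
<th class="ssTableHeader" valign="top" id="PIr13">State of Texas</th>
<td rowspan="2"></td>
<td rowspan="2" valign="top" headers="PIr03 PIr13"></td>
<td rowspan="2" valign="top" headers="PIr03 PIr13 PIc5"><b>Tamara Y S Keener</b><br>830-997-9542(W)
<table><tbody><tr height="25"><td> </td></tr></tbody></table>
Jay Weinheimer<br>997-2149(W)
</td></tr></tbody></table>
"""


class CellTextCollector(HTMLParser):
    """Hypothetical helper: record (table_depth, text) for every table cell.

    Assumes every <td>/<th> has an explicit closing tag. Text is attributed
    to the innermost open cell only, so text surrounding a nested table
    stays with the outer cell.
    """

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # 1 = outer table, 2 = table nested in a cell
        self.open_cells = []  # stack of [depth, text_fragments]
        self.cells = []       # finished cells, in the order they close

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag in ("td", "th"):
            self.open_cells.append([self.table_depth, []])

    def handle_endtag(self, tag):
        if tag == "table":
            self.table_depth -= 1
        elif tag in ("td", "th") and self.open_cells:
            depth, parts = self.open_cells.pop()
            self.cells.append((depth, " ".join(parts)))

    def handle_data(self, data):
        text = data.strip()  # drops whitespace-only runs, incl. non-breaking spaces
        if text and self.open_cells:
            self.open_cells[-1][1].append(text)


collector = CellTextCollector()
collector.feed(SNIPPET)
collector.close()
for depth, text in collector.cells:
    print(depth, repr(text))
```

The last cell printed is the outer cell at depth 1, holding `'Tamara Y S Keener 830-997-9542(W) Jay Weinheimer 997-2149(W)'`, while the nested table contributes its own (empty) cell at depth 2. Keeping a stack of open cells, rather than a single "current cell" variable, is what keeps the text after the nested table attached to the outer cell once the inner `</td>` closes.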
lukehsiao changed the title from "Parser is not adding documents to database because table is parent of paragraph" to "Parser does not handle nested tables which also have content" on Nov 18, 2018.
HTML file parsed: 20841.txt
Screenshot of the error: [image: screen shot 2018-11-17 at 9 45 43 pm]
Screenshot of the quick fix (the two lines added after line 583): [image: screen shot 2018-11-17 at 9 56 12 pm]