[BUG] Extracted text from table is reversed when text is styled with an underline #557
Comments
Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/
Hi @lavens, thank you for reporting this issue with the sample file and code. We were able to reproduce it and have created a ticket to look into it. We will post an update as soon as we figure this out.
Adding an update to this: we found that the problem is not exclusive to cells with underlined text. It also affects cells where a double newline exists between text. I've included an example PDF with two table rows. The first row contains a cell with a double newline, and the second row does not. The extracted text for the cell with a double newline in the first row is inverted, similar to the underlined text issue mentioned above.
Table with double newline cell.pdf
Hi @lavens, thank you for the update and the additional information. A fix for the previous case has been merged, and it fixes this case as well. We will post an update on this ticket as soon as it is released so that you can try it out.
Description
When I extract text from a PDF that contains a table, where the table content is formatted with underline, each new line of text within a cell is reversed. Once the underline formatting is removed, the text is extracted in order as expected.
Expected Behaviour
I am able to extract text from a table with the order of text preserved regardless of the formatting applied.
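To make that expectation concrete, here is a minimal, library-independent sketch of a check on a single cell's text. It assumes "reversed" means the lines of a cell come out in the opposite order; the function name `lines_reversed` and the sample strings are hypothetical and are not taken from the attached PDFs or from the library's API.

```python
def lines_reversed(expected: str, extracted: str) -> bool:
    """Return True if `extracted` has the same lines as `expected`, in reverse order."""
    # Compare non-empty lines only, ignoring surrounding whitespace.
    exp = [ln.strip() for ln in expected.splitlines() if ln.strip()]
    got = [ln.strip() for ln in extracted.splitlines() if ln.strip()]
    # A single line (or a palindromic list of lines) cannot show the symptom.
    return len(exp) > 1 and got != exp and got == exp[::-1]


# Made-up cell content, for illustration only.
expected_cell = "first line\nsecond line\nthird line"
extracted_cell = "third line\nsecond line\nfirst line"

if lines_reversed(expected_cell, extracted_cell):
    print("cell lines were extracted in reverse order")
```

Running the same check on the output from the file without underline formatting should report no reversal, which gives a quick regression test once a fix is released.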
Actual Behaviour
Steps to reproduce the behaviour:
Attachments
Without Underline Formatting.pdf
With Underline Formatting.pdf
Output with underline formatting:
Output without underline formatting: