It seems that the function Convert-PDFToText is not working quite correctly. I still have to test further, but for the moment (in my environment) it behaves like this:
Assuming that a PDF has multiple pages with text PageText1, PageText2, ..., PageTextN, after running the function I get a result where the text of each page is preceded by all the text from the previous pages, something like "PageText1PageText1PageText2PageText1PageText2PageText3" for a 3-page PDF.
It seems that (in my environment) I could fix it by explicitly creating a new TextExtractionStrategy for every call of GetTextFromPage. So, on line 1754,
[iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, $iTextExtractionStrategy)
was changed to
[iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, [iText.Kernel.Pdf.Canvas.Parser.Listener.LocationTextExtractionStrategy]::new())
After this fix, extraction worked as expected.
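For illustration, here is a minimal sketch of a per-page loop that creates a fresh strategy on every iteration. It is not the module's actual code: it assumes the iText 7 assemblies are already loaded in the session, and $PdfPath is a hypothetical placeholder. As far as I understand, LocationTextExtractionStrategy keeps the text chunks it has already received, so reusing a single instance across pages would explain the accumulated output described above.

```powershell
# Sketch only: assumes the iText 7 assemblies are already loaded in this session.
# $PdfPath is a hypothetical placeholder, not a name taken from the module.
$PdfPath = 'C:\Temp\Sample.pdf'

$Reader = [iText.Kernel.Pdf.PdfReader]::new($PdfPath)
$Document = [iText.Kernel.Pdf.PdfDocument]::new($Reader)
try {
    for ($Page = 1; $Page -le $Document.GetNumberOfPages(); $Page++) {
        $ExtractedPage = $Document.GetPage($Page)
        # A new strategy per page, so text collected from earlier pages is not carried over
        $Strategy = [iText.Kernel.Pdf.Canvas.Parser.Listener.LocationTextExtractionStrategy]::new()
        # Emits the text of this page only
        [iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor]::GetTextFromPage($ExtractedPage, $Strategy)
    }
} finally {
    # Closing the document also releases the underlying reader
    $Document.Close()
}
```

With a 3-page PDF, a loop like this should emit three separate strings, one per page, instead of the growing concatenation.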
Reported on LinkedIn; to be verified.