I also removed the post-processing step that fixed ligatures and removed footers, because plain-text extraction should aim to extract all of the text without imposing any structure.
I was wondering if you could add pdfalto to the benchmark: https://github.com/kermitt2/pdfalto