Remove page IDs when saving image to text or scanning to text using OCR #128
Comments
Hello @DraganRatkovich I may agree with removing the page numbering, but the page break character is semantically important, especially for OCR results. Anyhow, I'll make text exporting customizable. A dialog box will be shown when exporting to plain text or scanning to a text file. Best
@mush42 Yes, it would be nice if checkboxes appeared during the save process to choose whether to keep or remove page break symbols, etc.
Hello @mush42 |
@DraganRatkovich |
@mush42 Also, I didn't change the title, but please consider offering these options when saving any document in .txt format, such as from .pdf, .docx, etc., not only when saving an image or scanning to text using OCR.
Is your feature request related to a problem? Please describe.
When saving an image to a text file, or when choosing the Scan to Text File option and selecting a scanned book for text extraction using OCR, Bookworm adds "Page 1", "Page 2" identifiers to the text file. These are useless in this case, because they don't help in any way when pasting the text into a Word document to arrange the pages as in the original document. Word will easily do the rest of the work itself, and any additional font, paragraph style, or line spacing will be applied to the text if the user requires it. So writing "Page 1", "Page 2" and the extra page break character into a text file serves no purpose. No text format exporters do this, at least not the popular ones like MS Word and Adobe PDF.
Describe the solution you'd like
Simply extract pure text from a PDF file or image without adding the word "Page" and page numbers, or a page break symbol.
@mush42 It would be very useful if this were fixed soon, because saving a PDF or Word document as a text file would become many times more useful, and the text would be clean and smooth.
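Until such an option exists, the exported file could be cleaned up after the fact. The following is only a rough sketch, not Bookworm's code: `clean_exported_text` and its two flags are hypothetical names, and it assumes that each page label sits on its own line in the form "Page N" and that the page break character is an ASCII form feed (`\f`). The actual export format may differ.

```python
import re

# Assumption: a page label is a line containing only "Page <number>".
# The lookbehind also catches labels that directly follow a form feed.
PAGE_LABEL = re.compile(
    r"(?:^|(?<=\f))[ \t]*Page[ \t]+\d+[ \t]*(?:\r?\n|\Z)",
    flags=re.MULTILINE,
)

# Assumption: pages are separated by an ASCII form feed character.
PAGE_BREAK = "\f"


def clean_exported_text(text, remove_page_labels=True, remove_page_breaks=True):
    """Return text with page labels and/or page break characters removed.

    Hypothetical helper; the two flags mirror the checkboxes discussed
    in the comments above.
    """
    if remove_page_labels:
        text = PAGE_LABEL.sub("", text)
    if remove_page_breaks:
        text = text.replace(PAGE_BREAK, "")
    return text


if __name__ == "__main__":
    sample = "Page 1\nFirst page text.\n\fPage 2\nSecond page text.\n"
    print(clean_exported_text(sample))
```

Note that a line of real content consisting only of "Page 3" would also be removed, so this is best applied to files known to come from such an export.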