Remove page IDs when saving image to text or scanning to text using OCR #128

DraganRatkovich · 2022-03-12T18:15:12Z

Is your feature request related to a problem? Please describe.

When I save an image to a text file, or choose the Scan to Text File option and select a scanned book for OCR text extraction, Bookworm adds "Page 1", "Page 2" identifiers to the text file. These are useless in this case: they do not help when pasting the text into a Word document, because Word lays out the pages on its own, and the user can then apply whatever font, paragraph style, and line spacing they need. Writing "Page 1", "Page 2" and an extra page break character into a text file therefore serves no purpose. No popular text exporters, such as MS Word or Adobe PDF, do this.

Describe the solution you'd like

Simply extract pure text from a PDF file or image without adding the word "Page", the page numbers, or a page break symbol.
@mush42 It would be very useful if this were fixed soon, because the usefulness of saving a PDF or Word document as a text file would increase many times over, and the text would be clean and smooth.

mush42 · 2022-03-12T21:07:42Z

Hello @DraganRatkovich

I may agree with removing the page numbering, but the page break character is semantically important, especially for OCR results.

Anyhow, I'll make text exporting customizable. A dialog box will be shown when exporting to plain text or scanning to a text file.

Best
Musharraf

DraganRatkovich · 2022-03-13T09:57:22Z

@mush42 Yes, it would be nice if checkboxes appeared during the save process so the user can choose whether to remove or keep page break symbols, etc.
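
The options discussed above could look roughly like the following. This is only a minimal sketch, not Bookworm's actual code: `TextExportOptions`, `export_pages`, and both option names are hypothetical, and using the form feed character as the page break is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: the page break is written as the ASCII form feed character.
FORM_FEED = "\f"


@dataclass(frozen=True)
class TextExportOptions:
    """Hypothetical settings that the export dialog's checkboxes would fill in."""

    include_page_labels: bool = False  # write "Page N" before each page
    include_page_breaks: bool = True   # separate pages with a form feed


def export_pages(pages: Iterable[str], options: Optional[TextExportOptions] = None) -> str:
    """Join per-page text into one string according to the chosen options."""
    if options is None:
        options = TextExportOptions()
    chunks = []
    for number, text in enumerate(pages, start=1):
        body = text.strip()
        if options.include_page_labels:
            body = f"Page {number}\n{body}"
        chunks.append(body)
    # Without page breaks, pages are separated by a single blank line.
    separator = f"\n{FORM_FEED}\n" if options.include_page_breaks else "\n\n"
    return separator.join(chunks) + "\n"


if __name__ == "__main__":
    ocr_pages = ["Text recognized on the first page.", "Text recognized on the second page."]
    clean = TextExportOptions(include_page_labels=False, include_page_breaks=False)
    print(export_pages(ocr_pages, clean))
```

With both options off, the output is plain text that can be pasted straight into a word processor; leaving `include_page_breaks` on keeps the page boundaries that matter for OCR results.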

DraganRatkovich · 2022-04-05T18:52:51Z

Hello @mush42
Do you have any news on this issue?

mush42 · 2022-04-06T10:21:50Z

@DraganRatkovich
Yes, the fix is coming.

DraganRatkovich · 2022-04-06T12:54:20Z

@mush42 Also, I didn't change the title, but please also consider adding these options when saving any document in txt format, for example from .pdf or .docx, not only when saving an image or scanning to text using OCR.

DraganRatkovich changed the title ~~Remove page id words when saving any book to text file~~ Remove page IDs when saving image to text or scanning to text using OCR. Mar 12, 2022

DraganRatkovich changed the title ~~Remove page IDs when saving image to text or scanning to text using OCR.~~ Remove page IDs when saving image to text or scanning to text using OCR Mar 12, 2022

DraganRatkovich added the Improvement (Improving or fixing an existing feature) and enhancement (New feature or request) labels and removed the Improvement label on Jun 14, 2022
