Can OCR TextPage results be written to a page? #1453

However, if you meant: "How can I create a PDF page with an OCRed textlayer":

Extract the text using page.get_text("dict", textpage=tpocr). Then walk through the text spans and select those whose font is GlyphlessFont.
For each of these spans, call page.insert_text() as follows:

  • use span["origin"] as the insertion point
  • preferably choose "cour" (Courier) as the font, because GlyphlessFont also is (or seems to be) monospaced
  • compute the font size so that the inserted text comes out as wide as span["bbox"]
  • when done, save the document to a new file and check that the text inside images is now in fact selectable in your PDF viewer
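
The steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: tpocr is assumed to come from page.get_textpage_ocr(full=False), which needs a working Tesseract installation. The helper names glyphless_spans, fit_fontsize and add_ocr_text_layer are made up for illustration, and the width fit assumes horizontal text.

```python
def glyphless_spans(page_dict):
    """Yield the spans of a get_text("dict") result that use GlyphlessFont."""
    for block in page_dict.get("blocks", []):
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line.get("spans", []):
                if span.get("font") == "GlyphlessFont":
                    yield span


def fit_fontsize(bbox, unit_width):
    """Font size at which text that is unit_width wide at size 1 spans bbox."""
    x0, _, x1, _ = bbox
    return (x1 - x0) / unit_width


def add_ocr_text_layer(src_path, dst_path):
    """Re-insert OCRed spans as Courier text and save to a new file."""
    import fitz  # PyMuPDF; imported here so the helpers above need no install

    doc = fitz.open(src_path)
    for page in doc:
        # Assumption: this is how tpocr was created (OCR of images only).
        tpocr = page.get_textpage_ocr(full=False)
        page_dict = page.get_text("dict", textpage=tpocr)
        for span in glyphless_spans(page_dict):
            text = span["text"]
            # Width of the text in Courier at font size 1.
            unit_width = fitz.get_text_length(text, fontname="cour", fontsize=1)
            if unit_width <= 0:
                continue  # nothing measurable to insert
            page.insert_text(
                span["origin"],
                text,
                fontname="cour",
                fontsize=fit_fontsize(span["bbox"], unit_width),
            )
    doc.save(dst_path)
```

Calling add_ocr_text_layer("input.pdf", "output.pdf") (both file names are placeholders) would write the new file. The inserted text is visible by default; passing render_mode=3 to page.insert_text() should make it invisible while keeping it selectable, which may be preferable when the text sits on top of an image.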

Answer selected by caerulescens