Missing text after redaction #867

Problem 1:

Hard to tell without looking at that file (probably confidential anyway). But there are things like damaged PDFs ...
You could try cleaning the file or the page before processing, to reveal or remove any errors.

  • cleaning the file e.g. by mutool clean -gggsc file.pdf
  • cleaning the page inside the script: page.clean_contents(sanitize=True)
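
In case it helps, here is a minimal sketch of the second option applied to every page of a document. It assumes PyMuPDF is installed and importable as `fitz`. The helper names `clean_pages` and `clean_file` and the file paths are made up for illustration, and the save options are only a rough counterpart of the mutool flags above, not an exact equivalent.

```python
def clean_pages(doc):
    """Clean and sanitize the content stream of every page.

    `doc` is expected to be an opened PyMuPDF document (or any iterable
    of page objects that offer `clean_contents`).
    Returns the number of pages processed.
    """
    count = 0
    for page in doc:
        # Rewrites the page's content stream and drops invalid operators.
        page.clean_contents(sanitize=True)
        count += 1
    return count


def clean_file(src, dst):
    """Hypothetical helper: clean all pages of `src` and write `dst`."""
    import fitz  # PyMuPDF; imported here so clean_pages stays dependency-free

    with fitz.open(src) as doc:
        n = clean_pages(doc)
        # Assumption: garbage collection plus clean=True on save is
        # roughly comparable to running "mutool clean" on the file.
        doc.save(dst, garbage=3, clean=True)
    return n
```

Usage would be something like `clean_file("input.pdf", "cleaned.pdf")`, then run the search and redaction on the cleaned file.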

Problem 2:

MuPDF normally uses the full font-defined line height when computing the rectangles of search hits. If the PDF was created with smaller distances between lines, the rectangles of adjacent lines may overlap somewhat. MuPDF's redaction logic in turn removes every character overlapping the redaction rectangle, and the result of this is what you saw.

  • set a global …
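
A separate workaround, not necessarily the one suggested in this answer, is to shrink each hit rectangle vertically before redacting, so that it no longer touches the lines above and below. The sketch below assumes PyMuPDF. `shrink_vertically`, `redact_hits` and the `fraction` value are hypothetical names and numbers chosen for illustration, and the right amount of shrinking depends on the line spacing of the actual file.

```python
def shrink_vertically(rect, fraction=0.2):
    """Return (x0, y0, x1, y1) with its height reduced by `fraction`.

    The reduction is split evenly between top and bottom, so the
    rectangle stays centered on the text line.
    """
    x0, y0, x1, y1 = rect
    dy = (y1 - y0) * fraction / 2
    return (x0, y0 + dy, x1, y1 - dy)


def redact_hits(page, needle, fraction=0.2):
    """Redact every occurrence of `needle` on a PyMuPDF page.

    Returns the number of hits that were marked for redaction.
    """
    hits = page.search_for(needle)
    for hit in hits:
        # Assumption: add_redact_annot accepts a plain 4-tuple as a
        # rect-like argument; wrap it in fitz.Rect if your version differs.
        page.add_redact_annot(shrink_vertically(tuple(hit), fraction))
    if hits:
        page.apply_redactions()
    return len(hits)
```

If neighbouring text still disappears, try a larger `fraction`; if parts of the searched text survive, try a smaller one.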

@mani2106
@JorjMcKie
Answer selected by NoraishaYusuf