addRedactAnnot - With text #748

deepanshug · 2020-12-06T14:19:44Z


Hi, I am trying to use text instead of a black rectangle so that the redacted document retains readable context. I used the following call:
page.addRedactAnnot(area, text='sometext', fontname="Courier", fontsize=20, fill=(1, 1, 1), text_color=(0, 0, 0))

I have a PDF which I am redacting and below is the list of fonts in each page (3 pages):
[(11, 'n/a', 'Type1', 'Courier', 'F1', '')]
[(18, 'n/a', 'Type1', 'Courier', 'F1', '')]
[(25, 'n/a', 'Type1', 'Courier', 'F1', '')]

I am not concerned about which font is used; my only intention is to use text instead of a black box. E.g., if my document has an email address mentioned anywhere, then instead of redacting it with a black box, I wish to have it replaced by a text box with text ".

I would appreciate your help with this.

JorjMcKie (Maintainer) · 2020-12-06T15:09:29Z

Try something like this. I am searching for some word ("pixmap" in this example) and replacing all occurrences with the text "enigma".

import fitz

doc = fitz.open("file.pdf")
page = doc[0]
blue = (0, 0, 1)
rl = page.searchFor("pixmap")
for rect in rl:
    fontsize = rect.height / 1.3
    page.addRedactAnnot(
        rect,
        text="enigma",
        text_color=blue,
        fontsize=fontsize,
        align=fitz.TEXT_ALIGN_CENTER,
    )

page.apply_redactions()
doc.save("x.pdf")

[Screenshots: the page before and after applying the redactions]

JorjMcKie (Maintainer) · 2020-12-06T15:17:40Z

As you can see, the replacement is not perfectly positioned. This is because the search algorithm delivers rectangles with a height equal to the line height, and I did not bother to extract the exact insertion point of the new text.
With some effort, better results are achievable, but as a demo this might be sufficient.


deepanshug (Author) · 2020-12-14T04:28:28Z

Hi, first I would like to thank you for providing this library and your support. I tried what you suggested and saw it working. However, as you already mentioned, the positioning is off, so the replacement text did not fit well. I even tried reducing the area (as you had suggested earlier), but that did not help. Still, I am able to move ahead with my assignment using this library. Thanks again for your continuous support. Thanks, Deepanshu


JorjMcKie (Maintainer) · 2020-12-14T07:13:57Z

I am working on improved handling of this.
Under the hood, text insertion for redactions uses page.insertTextbox, and I am still making simplifying assumptions in that method regarding the primary text insertion point and the line height:
In page.insertTextbox, the line height is always set to fontsize * 1.2, and the insertion point is always fontsize away from the relevant textbox border.
Actually, these values should be font-dependent:
Among a font's properties are "ascender" and "descender" (a negative value). With these values, the correct line height of a font can be computed as fontsize * (ascender - descender), and the insertion point distance as fontsize * ascender.
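
To make the two formulas above concrete, here is a minimal sketch in plain Python. The helper names `line_metrics` and `simplified_metrics` are hypothetical (they are not part of PyMuPDF), and the ascender/descender numbers are illustrative values only, not read from a real font. In PyMuPDF, a `fitz.Font` object exposes `ascender` and `descender` attributes that could supply real values.

```python
def line_metrics(fontsize, ascender, descender):
    """Return (line_height, baseline_offset) for a font at a given size.

    ascender and descender are relative to a font size of 1;
    descender is expected to be negative.
    """
    line_height = fontsize * (ascender - descender)
    baseline_offset = fontsize * ascender
    return line_height, baseline_offset


def simplified_metrics(fontsize):
    """The fixed assumptions described above: 1.2 * fontsize and fontsize."""
    return fontsize * 1.2, fontsize


# Illustrative metrics only (hypothetical font, not measured values).
height, offset = line_metrics(12, ascender=1.0, descender=-0.25)
print("font-dependent:", height, offset)
print("simplified:    ", simplified_metrics(12))
```

For this hypothetical font the font-dependent line height is larger than the fixed 1.2 factor would give, which is the kind of mismatch that shifts the inserted text vertically.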

I will keep you posted about the progress.


deepanshug (Author) · 2020-12-14T07:15:55Z

Thanks Jorj. I will keep an eye on updates and release notes as well. Regards Deepanshu


JorjMcKie (Maintainer) · 2020-12-14T09:43:23Z

I should have thought of the following possibility earlier: the basic problem is determining the exact insertion point.
This, however, equals span["origin"], which we can easily determine.
The next thought is to create and immediately apply a redaction for each word occurrence. This empties the corresponding rectangle. Then insert the replacement word using the correct insertion point, after adjusting the fontsize of the new text for an exact fit:

import fitz

doc = fitz.open("file.pdf")
page = doc[0]
blue = (0, 0, 1)
rl = page.searchFor("pixmap")

for rect in rl:
    origin = None  # stays None if no text span is found inside rect
    for b in page.getText("dict", clip=rect)["blocks"]:
        for l in b.get("lines", []):  # image blocks have no "lines"
            for span in l["spans"]:
                fsize = span["size"]
                origin = fitz.Point(span["origin"])  # the insertion point
                flags = span["flags"]
                if flags & 2 ** 3:  # is this font monospaced?
                    font = "cour"  # use Courier for new text
                else:
                    font = "helv"  # else stick with Helvetica
    if origin is None:  # nothing to replace here
        continue
    page.addRedactAnnot(rect)  # redact the word
    page.apply_redactions()  # and immediately apply!
    # insert the new text separately - outside redaction

    # First determine length of new text to insert ... this only works
    # for fonts Times-Roman, Helvetica, Courier.
    # There are also ways for arbitrary fonts!
    tl = fitz.getTextlength("enigma", fontname=font, fontsize=fsize)
    # then adjust fontsize, so it fits exactly
    fsize = fsize * rect.width / tl
    page.insertText(origin, "enigma", fontname=font, fontsize=fsize, color=blue)
doc.save("x.pdf")

This results in the following:

[Screenshot: the page after replacing the word]
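
On the font-size adjustment in the snippet above: the rescaling works because the rendered width of a string grows linearly with the font size, so multiplying the size by (target width / current width) makes the text exactly as wide as the target. A minimal sketch of just that arithmetic, in plain Python with a hypothetical helper name `fit_fontsize` and made-up numbers:

```python
def fit_fontsize(fontsize, text_length, target_width):
    """Scale fontsize so text measuring text_length spans target_width.

    text_length must be the width of the text measured at `fontsize`.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return fontsize * target_width / text_length


# Made-up numbers: the text is 40 units wide at size 10,
# and the redacted rectangle is 30 units wide.
new_size = fit_fontsize(10, text_length=40, target_width=30)
print(new_size)
```

Note that this only fits the width. A much longer or shorter replacement word will end up visibly smaller or larger than the surrounding text.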
