addRedactAnnot - With text #748
Replies: 6 comments
-
Try something like this. It searches for a word ("pixmap" in this example) and replaces all occurrences with the text "enigma".

import fitz

doc = fitz.open("file.pdf")
page = doc[0]
blue = (0, 0, 1)
rl = page.searchFor("pixmap")  # one rectangle per occurrence
for rect in rl:
    fontsize = rect.height / 1.3  # rect height equals the line height
    page.addRedactAnnot(
        rect,
        text="enigma",
        text_color=blue,
        fontsize=fontsize,
        align=fitz.TEXT_ALIGN_CENTER,
    )
page.apply_redactions()  # remove the old text and write the replacement
doc.save("x.pdf")

After: [screenshot of the result not preserved]
-
As you can see, the replacement is not perfectly positioned. This goes back to the fact that the search algorithm delivers rectangles with a height equal to the line height, and I did not bother to extract the exact insertion point of the new text ...
-
Hi,
First, I would like to thank you for providing this library and support.
I tried what you suggested and saw it working.
However, as you already mentioned regarding positioning, the replacement text did not fit well.
I also tried reducing the area (as you had suggested), but that did not work.
Still, I am able to move ahead with my assignment using this library.
Thanks again for your continuous support.
Thanks
Deepanshu
On 06 December 2020 at 20:47, Jorj X. McKie wrote:
As you can see, the replacement is not perfectly positioned. This goes back to the fact that the search algorithm delivers rectangles with a height equal to the line height, and I did not bother to extract the exact insertion point of the new text ...
With some effort, better results are achievable, but as a demo this might be sufficient.
-
I am working on an improved handling of this. I will keep you posted about the progress.
-
Thanks Jorj.
I will keep an eye on updates and release notes as well.
Regards
Deepanshu
On 14 December 2020 at 12:44, Jorj X. McKie wrote:
I am working on an improved handling of this.
Under the hood, text insertion for redactions uses page.insertTextbox, and I am still making simplifying assumptions in that method about the primary text insertion point and the line height:
In page.insertTextbox, the line height is always set to fontsize * 1.2, and the insertion point is always fontsize away from the relevant textbox border.
Actually, these values should be font-dependent:
Among a font's properties are "ascender" and "descender" (a negative value). With these values, the correct line height of a font can be computed as fontsize * (ascender - descender) and the insertion point distance as fontsize * ascender.
I will keep you posted about the progress.
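The font-dependent formulas quoted above can be tried out in plain Python. This is only a sketch of the arithmetic, not the library's implementation: the helper names `line_metrics` and `fixed_metrics` are hypothetical, and the ascender/descender values are made up for illustration rather than measured from a real font. In PyMuPDF itself, the real values should be obtainable from a font object's ascender and descender properties.

```python
def line_metrics(fontsize, ascender, descender):
    """Return (line_height, baseline_offset) for a font at the given size.

    ascender and descender are relative to a font size of 1;
    descender is expected to be negative.
    """
    line_height = fontsize * (ascender - descender)
    baseline_offset = fontsize * ascender
    return line_height, baseline_offset


def fixed_metrics(fontsize):
    """The simplified assumptions: line height 1.2 * fontsize, offset = fontsize."""
    return fontsize * 1.2, fontsize


# Illustrative values only (assumption), not taken from any real font.
ASCENDER, DESCENDER = 0.75, -0.25

print("font-dependent:", line_metrics(12, ASCENDER, DESCENDER))
print("fixed factors: ", fixed_metrics(12))
```

With these made-up values the font-dependent line height is smaller than the fixed 1.2 factor, which is the kind of mismatch that shifts replacement text away from the original baseline.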
-
I should have thought of the following possibility earlier: the basic problem is determining the exact insertion point.

# continues from the earlier example: doc, page and blue are defined there
rl = page.searchFor("pixmap")
for rect in rl:
    # look up the text span inside the hit rectangle
    for b in page.getText("dict", clip=rect)["blocks"]:
        for l in b["lines"]:
            for span in l["spans"]:
                fsize = span["size"]
                origin = fitz.Point(span["origin"])  # the insertion point
                flags = span["flags"]
    if flags & 2 ** 3:  # is this font monospaced?
        font = "cour"  # use Courier for new text
    else:
        font = "helv"  # else stick with Helvetica
    page.addRedactAnnot(rect)  # redact the word
    page.apply_redactions()  # and immediately apply!
    # insert the new text separately - outside redaction
    # First determine length of new text to insert ... this only works
    # for fonts Times-Roman, Helvetica, Courier.
    # There are also ways for arbitrary fonts!
    tl = fitz.getTextlength("enigma", fontname=font, fontsize=fsize)
    # then adjust fontsize, so it fits exactly
    fsize = fsize * rect.width / tl
    page.insertText(origin, "enigma", fontname=font, fontsize=fsize, color=blue)
doc.save("x.pdf")
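The two small decisions in that loop, choosing a replacement font from the span flags and rescaling the font size so the new text fills the old width, can be isolated as plain functions. This is a minimal sketch under assumptions: the names `MONOSPACED_BIT`, `pick_base_font` and `fit_fontsize` are hypothetical helpers, not PyMuPDF API, and the rescaling relies on text width growing linearly with font size.

```python
MONOSPACED_BIT = 2 ** 3  # the flag bit tested in the snippet above


def pick_base_font(flags):
    """Choose Courier for monospaced spans, Helvetica otherwise."""
    return "cour" if flags & MONOSPACED_BIT else "helv"


def fit_fontsize(fontsize, measured_width, target_width):
    """Scale fontsize so text measured at `fontsize` spans `target_width`.

    Assumes the rendered width is proportional to the font size.
    """
    if measured_width <= 0:
        raise ValueError("measured_width must be positive")
    return fontsize * target_width / measured_width


# Example: text measuring 40 points wide at size 10 must fit into 30 points.
print(fit_fontsize(10, 40, 30))
print(pick_base_font(8), pick_base_font(4))
```

In the loop above, `measured_width` corresponds to `tl` and `target_width` to `rect.width`. Note that a long replacement in a narrow rectangle can end up with a very small font size, so a lower bound may be worth adding in practice.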
-
Hi, I am trying to use text instead of a black rectangle to retain readability and context. I have used it as below:
page.addRedactAnnot(area, text="sometext", fontname="Courier", fontsize=20, fill=(1, 1, 1), text_color=(0, 0, 0))
I have a PDF which I am redacting, and below is the list of fonts on each page (3 pages):
[(11, 'n/a', 'Type1', 'Courier', 'F1', '')]
[(18, 'n/a', 'Type1', 'Courier', 'F1', '')]
[(25, 'n/a', 'Type1', 'Courier', 'F1', '')]
I am not concerned about which font is used; my only intention is to use text instead of a black box. E.g. if my document has an email address mentioned anywhere, instead of redacting it with a black box, I wish to have it replaced with a textbox with the text ".
I would appreciate your help with this.