Invisible Text Removal #905
-
Hi, |
Replies: 8 comments
-
This is a complex undertaking, unfortunately. Because when we know the text color, we do not know, where in the page definition source (i.e. the |
-
Hi, is there any way of removing text whose font size is > 50? @JorjMcKie
-
Sometimes this text overlaps with normal text.
-
Replacing or removing text is really useful.
-
If you want, look at the source code behind |
-
Thanks a lot. Really brilliant lib.
-
Sorry, I didn't read your question closely enough: this is actually doable without using redactions, because fontsize is part of the text "object" inside the page definition. In principle - and after you did a
Each text object is wrapped in the string pair b"BT" / b"ET". The use of a font starts with a line ending with the string b"Tf". The number to its left (a float) is the font size (12, 50, 20 above).
page.clean_contents()  # normalize the contents source first
xref = page.get_contents()[0]  # xref of the /Contents
cont = bytearray(doc.xref_stream(xref))  # read the contents source, modifiable
contlines = cont.splitlines()
# locate a line having `.endswith(b"Tf")`. Assume line number = i
# extract fontsize
# if fontsize < 50 continue searching for Tf
# else look for next line that either ends with b"Tf" or is equal to b"ET". Line number = j
del contlines[i:j]  # remove those lines (slice i..j-1)
# when done with all removals:
doc.update_stream(xref, b"\n".join(contlines))  # update / write back contents
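The search loop described in the comments above could be filled in roughly as follows. This is only a sketch: `strip_large_text` and its `limit` parameter are hypothetical names, and it assumes the contents were cleaned first so that every operator sits on its own line. It also assumes the size given to `Tf` is the effective size; a scaling text matrix (`Tm`) is not taken into account.

```python
def strip_large_text(cont: bytes, limit: float = 50.0) -> bytes:
    """Drop text drawn with a font size >= limit from a cleaned contents stream.

    Hypothetical helper: works on raw bytes only and knows nothing about fonts
    beyond the number in front of the "Tf" operator.
    """
    kept = []
    skipping = False  # True while we are inside a run that uses a large font
    for line in cont.splitlines():
        stripped = line.strip()
        if stripped.endswith(b"Tf"):
            # A font selection looks like b"/F1 50 Tf": size is the token before Tf.
            parts = stripped.split()
            try:
                size = float(parts[-2])
            except (IndexError, ValueError):
                size = 0.0  # unparsable: treat as small and keep it
            skipping = size >= limit
        elif stripped == b"ET":
            skipping = False  # the text object ends here; always keep ET
        if not skipping:
            kept.append(line)
    return b"\n".join(kept)


sample = (
    b"BT\n/F1 12 Tf\n(small) Tj\n"
    b"/F2 50 Tf\n(big) Tj\n"
    b"/F1 20 Tf\n(medium) Tj\nET"
)
cleaned = strip_large_text(sample)
```

With a real page, the bytes would come from `doc.xref_stream(xref)` after `page.clean_contents()`, and the result would be written back with `doc.update_stream(xref, cleaned)`, as in the outline above.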
-
This is a related conversation which can help @Raks-coder
This is a complex undertaking, unfortunately. Even when we know the text color, we do not know where in the page definition source (i.e. the /Contents) this text is being written - which we must know in order to delete it. Another issue (solvable, however) is finding out whether the text background is white.
Of course we could use redaction annotations to remove text, but this incurs the risk of unintentionally removing other content.
I have also been asked to remove text that is covered by other objects like images: same issue here - I would be forced to check not only what comes first, image or text, but also whether the image is transparent (in which case the text is not hidden), etc.
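For the redaction route, the "we know the text color" part could look something like the sketch below. It assumes the input is the dictionary returned by `page.get_text("dict")`, where each text span carries an integer sRGB `color` and a `bbox`. The name `white_span_boxes` is hypothetical, and the function does not check the background, so white text on a dark background would be reported too.

```python
WHITE = 0xFFFFFF  # sRGB integer for pure white


def white_span_boxes(page_dict: dict) -> list:
    """Return the bounding boxes of non-empty text spans drawn in pure white.

    Hypothetical helper: only inspects the span color, not what lies behind it.
    """
    boxes = []
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key and are skipped automatically.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                if span.get("color") == WHITE and span.get("text", "").strip():
                    boxes.append(tuple(span["bbox"]))
    return boxes


# Hand-made stand-in for the output of page.get_text("dict")
example = {
    "blocks": [
        {"type": 1, "bbox": (0, 0, 10, 10)},  # an image block
        {"type": 0, "lines": [{"spans": [
            {"color": 0xFFFFFF, "bbox": (1, 2, 3, 4), "text": "hidden"},
            {"color": 0x000000, "bbox": (5, 6, 7, 8), "text": "shown"},
        ]}]},
    ]
}
boxes = white_span_boxes(example)
```

Each returned box could then be passed to `page.add_redact_annot(...)` and removed with `page.apply_redactions()`, with the caveat already mentioned: anything else overlapping those rectangles may be removed as well.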