PDF page rotation angle gives 90 degrees for visually straight readable page #1069
-
Hi, please find attached a PDF that replicates the scenario.
-
You are drawing the wrong conclusion! Just because you can read the text normally does not mean that the page is unrotated!
-
Using PyMuPDF, you can easily create such an "anomaly" yourself!
-
Let me stress the point again: it is not an "anomaly", but a very common thing.
>>> import fitz
>>> doc = fitz.open("rtext.pdf")
>>> page = doc[0]
>>> blocks = page.get_text("dict")["blocks"]
>>> for b in blocks:
...     for line in b.get("lines", []):  # image blocks have no "lines"
...         print(line["dir"])
...
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
(0.0, -1.0)
>>>
The tuples (0, -1) mean that the angle has a cosine of 0 and a sine of -1. Because PyMuPDF's y-axis points downward, a negative sine means "upwards". So, as expected, the text runs from bottom to top.
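To turn a `dir` tuple into a plain angle, you could use a small helper like the one below. It is a sketch, not part of PyMuPDF, and the name `dir_to_angle` is hypothetical. It assumes PyMuPDF's coordinate convention where the y-axis points downward, so the sine is negated to get the usual counter-clockwise angle.

```python
import math


def dir_to_angle(direction):
    """Convert a line["dir"] tuple (cosine, sine) into a counter-clockwise
    angle in whole degrees, in the range 0..359.

    The y-axis points downward in PyMuPDF, so the sine is negated:
    0 = left to right, 90 = bottom to top, 180 = right to left, 270 = top to bottom.
    """
    cos_a, sin_a = direction
    return round(math.degrees(math.atan2(-sin_a, cos_a))) % 360


print(dir_to_angle((0.0, -1.0)))  # 90: bottom to top, as in the output above
print(dir_to_angle((1.0, 0.0)))   # 0: ordinary left-to-right text
```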
As you already indicated, you need to take more than one piece of information into account: the page rotation is one thing, the text orientation is another.
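Combining both pieces of information takes only simple arithmetic. The sketch below makes two assumptions. First, the text angle is measured counter-clockwise in unrotated page coordinates, so the `(0, -1)` lines above correspond to 90 degrees. Second, the page rotation is the PDF `/Rotate` value, which viewers apply clockwise. The helper name `visual_text_angle` is hypothetical.

```python
def visual_text_angle(text_angle, page_rotation):
    """Return the counter-clockwise angle (degrees, 0..359) at which a line
    appears on screen.

    text_angle:    line angle in unrotated page coordinates, counter-clockwise
    page_rotation: the page rotation (0, 90, 180 or 270), applied clockwise
    """
    # A clockwise page rotation reduces the counter-clockwise text angle.
    return (text_angle - page_rotation) % 360


# Bottom-to-top text on a page rotated by 90 degrees reads normally.
print(visual_text_angle(90, 90))  # 0
# Upright text on the same page appears to run from top to bottom.
print(visual_text_angle(0, 90))   # 270
```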
For the second aspect, you need a text extraction variant that also returns this information:
page.get_text("dict")
Its output is a structure of nested dictionaries, in which you will find a dictionary for every text line. A line dictionary has the key "dir", which contains the requested information.
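You can walk this nesting with plain dictionary access. The sketch below uses a hypothetical helper named `line_directions`. It assumes the `get_text("dict")` layout of blocks → lines → spans, where text blocks have `"type": 0` and image blocks (`"type": 1`) carry no `"lines"` key. A hand-made dictionary stands in for real output, so the example runs without a PDF.

```python
def line_directions(page_dict):
    """Yield (text, dir) for every text line in a get_text("dict") result."""
    for block in page_dict["blocks"]:
        if block.get("type") != 0:  # skip image blocks, which have no "lines"
            continue
        for line in block["lines"]:
            # A line's text is spread over one or more spans.
            text = "".join(span["text"] for span in line["spans"])
            yield text, line["dir"]


# Minimal stand-in for page.get_text("dict"); real output has many more keys.
sample = {
    "blocks": [
        {"type": 0, "lines": [
            {"dir": (0.0, -1.0), "spans": [{"text": "Hello "}, {"text": "world"}]},
        ]},
        {"type": 1},  # an image block
    ]
}

for text, direction in line_directions(sample):
    print(repr(text), direction)  # 'Hello world' (0.0, -1.0)
```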
line["dir"]
is a 2-tuple of (cosine, sine) of the angle, which the line text has wi…