Difference in word coordinate information #552
Hi @yavuzKomecoglu, and thanks for your interest in this library. Your examples and questions are very interesting. Unfortunately, I don't have a conclusive answer. My best guess at what's happening is either:
My apologies that I can't be more helpful, but feel free to continue the discussion. Perhaps someone else here has deeper insights into what's happening.
Hello. In some Turkish newspaper PDFs, the extracted word coordinates all fall below a certain pixel position. For example, the news headlines in 2_eylul_15_11_2018.pdf / page-3 and 2_eylul_30_11_2018.pdf / page-7 start from the bottom of the page, while the coordinates are obtained correctly for the news in gaziantep_dogus_10_08_2021.pdf / page-4.
What exactly is the difference between these PDFs, and why is the coordinate information returned differently for them?
Thanks.
First, the words are extracted.
Then the title region is determined.
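The snippet for this step is not shown here, so the following is only a minimal sketch of one way a title region could be determined, not necessarily the method used in the question. It assumes each word is a dict with `x0`, `x1`, `top`, and `bottom` keys (the shape returned by pdfplumber's `page.extract_words()`), and it assumes, as a heuristic, that the title consists of the tallest words on the page. The function name `find_title_area` and the `height_ratio` parameter are hypothetical.

```python
def find_title_area(words, height_ratio=0.8):
    """Return (x0, top, x1, bottom) enclosing the tallest words, or None.

    Assumption: the headline is set in the largest type on the page, so any
    word at least `height_ratio` times as tall as the tallest word is
    treated as part of the title.
    """
    if not words:
        return None

    heights = [w["bottom"] - w["top"] for w in words]
    threshold = height_ratio * max(heights)
    tall_words = [w for w, h in zip(words, heights) if h >= threshold]

    # Union of the bounding boxes of all title candidates.
    return (
        min(w["x0"] for w in tall_words),
        min(w["top"] for w in tall_words),
        max(w["x1"] for w in tall_words),
        max(w["bottom"] for w in tall_words),
    )


# Made-up sample data in the same shape as pdfplumber word dicts.
sample_words = [
    {"text": "BIG", "x0": 50.0, "x1": 120.0, "top": 100.0, "bottom": 140.0},
    {"text": "NEWS", "x0": 130.0, "x1": 220.0, "top": 102.0, "bottom": 140.0},
    {"text": "body", "x0": 50.0, "x1": 80.0, "top": 150.0, "bottom": 160.0},
]
title_area = find_title_area(sample_words)
```

With real data, `sample_words` would be replaced by the list returned from `page.extract_words()`.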
Note: title_area holds absolute positions; when the title area is drawn with OpenCV, positions relative to the page are used.
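To make the absolute-versus-relative distinction concrete, here is a hedged sketch of the conversion. It assumes `title_area` is an `(x0, top, x1, bottom)` tuple in the same coordinate space as the page's bounding box (in pdfplumber, `page.bbox`), and that the page has been rendered to an image of known pixel size. The function name `to_pixel_box` is hypothetical. Subtracting the page origin matters only when the page's bounding box does not start at `(0, 0)`; whether that is the case for the PDFs in question is not established here.

```python
def to_pixel_box(area, page_bbox, image_size):
    """Map an absolute (x0, top, x1, bottom) box to image pixel coordinates.

    area       -- box in absolute page coordinates
    page_bbox  -- (x0, top, x1, bottom) of the page itself
    image_size -- (width, height) of the rendered page image in pixels
    """
    ax0, atop, ax1, abottom = area
    px0, ptop, px1, pbottom = page_bbox
    img_w, img_h = image_size

    # Scale factors from page units to pixels.
    sx = img_w / (px1 - px0)
    sy = img_h / (pbottom - ptop)

    # Shift by the page origin first, then scale.
    return (
        round((ax0 - px0) * sx),
        round((atop - ptop) * sy),
        round((ax1 - px0) * sx),
        round((abottom - ptop) * sy),
    )


# Made-up example: a page whose bounding box starts at top=100.
pixel_box = to_pixel_box(
    area=(50, 150, 250, 200),
    page_bbox=(0, 100, 600, 900),
    image_size=(1200, 1600),
)
```

The resulting corners could then be passed to `cv2.rectangle(img, (x0, y0), (x1, y1), color, thickness)` to draw the region.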
Test Newspapers