how to extract text from 2 column pdf file #890
bugm started this conversation in Ask for help with specific PDFs
-
Hi @bugm, and thanks for your interest in this library. Does this guidance here help?:
-
Hello guys, suppose I have a PDF page with a two-column layout.
I want to extract the text in an order like this:
4. Results and discussion
4.1. Experimental setup and simulation environment
To test the performance of the prototype, an experiment setup
for the biaxial-pendulum vibration energy harvester is established
and the schematic is shown in Fig. 6. The experimental setup
mainly includes a six-DOF platform, an inertial measurement unit
4.4. Loaded results from unidirectional excitation
The unidirectional excitation experiments under loaded condi
tions are carried out to examine the loaded performance of the
prototype. Fig. 9 shows the output voltage waveforms of the energy
harvester when the excitation frequency is set to 1 Hz, 1.5 Hz and
2 Hz and the excitation amplitude is set to 0.05 m and 0.07 m,
respectively. The peak values in Fig. 9 (a), (b), and (c) are 1.7 V,10.1 V
which means extracting the text of the left column from top to bottom, and then the text of the right column from top to bottom. I have tried the page.extract_text() method but did not find a way to get my desired result. I think I could parse the page myself by comparing the x coordinate of each char against the middle x coordinate of the page. Before doing that, I want to know whether there is any built-in function in pdfplumber, or any other easier way, to solve this.
Thanks a lot!
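For reference, the midline idea described above can be sketched with pdfplumber's page.crop() and page.extract_text(): crop the page into a left and a right half, extract each half separately, and join the results. This is only a minimal sketch, not a built-in column detector. It assumes the gutter sits at the horizontal middle of the page and that nothing (headings, figures, tables) spans both columns. The helper names column_bboxes, extract_two_column_text and extract_pdf, as well as the split_x parameter, are made up for illustration.

```python
def column_bboxes(x0, top, x1, bottom, split_x=None):
    """Split a page bounding box into (left_bbox, right_bbox).

    split_x is a hypothetical parameter: the x coordinate of the gutter.
    If omitted, the horizontal midpoint of the box is assumed.
    """
    if split_x is None:
        split_x = (x0 + x1) / 2
    return (x0, top, split_x, bottom), (split_x, top, x1, bottom)


def extract_two_column_text(page, split_x=None):
    """Return the left column's text followed by the right column's text.

    `page` is expected to behave like a pdfplumber Page: it needs a
    `bbox` attribute, a `crop(bbox)` method, and `extract_text()`.
    """
    left_bbox, right_bbox = column_bboxes(*page.bbox, split_x=split_x)
    parts = []
    for bbox in (left_bbox, right_bbox):
        # extract_text() may return None or "" for an empty region.
        text = page.crop(bbox).extract_text() or ""
        if text:
            parts.append(text)
    return "\n".join(parts)


def extract_pdf(path, split_x=None):
    """Apply the two-column extraction to every page of the PDF at `path`."""
    import pdfplumber  # imported here so the helpers above work without it

    with pdfplumber.open(path) as pdf:
        return [extract_two_column_text(page, split_x) for page in pdf.pages]
```

Calling extract_pdf("paper.pdf") (a placeholder file name) would return one string per page. If the gutter is not centered, passing an explicit split_x is one way to adjust. Pages with a full-width title or figure would need that region cropped out and handled separately.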