Very long titles when converting to markdown #158
Comments
The current logic already detects when multiple lines with the same header-level font size follow each other. However, it does not yet always remove all line breaks when joining the header text fragments.
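To illustrate the idea described above, here is a minimal sketch of joining consecutive header fragments. It is not pymupdf4llm's actual implementation: the function `join_header_fragments`, the `(text, font_size)` input shape, and the `header_levels` mapping are all hypothetical names chosen for this example.

```python
import re


def join_header_fragments(lines, header_levels):
    """Merge consecutive lines that share a header level into one header.

    lines: list of (text, font_size) tuples (hypothetical input shape).
    header_levels: dict mapping a font size to a markdown header level;
        font sizes missing from the dict are treated as body text.
    """
    out = []
    prev_level = None
    for text, size in lines:
        # Collapse embedded line breaks and repeated whitespace.
        clean = re.sub(r"\s+", " ", text).strip()
        if not clean:
            continue  # drop empty fragments without changing the state
        level = header_levels.get(size)
        if level is not None and level == prev_level:
            # Same header level as the previous line: continue that header.
            out[-1] += " " + clean
        elif level is not None:
            out.append("#" * level + " " + clean)
        else:
            out.append(clean)
        prev_level = level
    return out


example = [
    ("A very long section title that\n", 16.0),
    ("wraps onto a second line", 16.0),
    ("Body text.", 10.0),
]
result = join_header_fragments(example, {16.0: 2})
```

Note that this heuristic also merges two genuinely separate headers of the same level when they follow each other directly, so a real implementation would likely need additional layout information.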
Thanks for reporting this.
So it's already solved? Thanks for the answer and the quick response.
Isn't version 0.0.9 the latest version of pdf4llm? I thought it was, because the pdf4llm page said it was the latest. Sorry for the confusion.
Ah OK, I see. That other repo is just an alias of pymupdf4llm and is therefore automatically current. BTW, "fix developed" means that I have a fix locally. It is not yet published on PyPI.
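For anyone unsure which version they actually have installed, a small sketch using only the standard library can report it for both distribution names. The helper `installed_version` is a hypothetical name introduced for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


for name in ("pdf4llm", "pymupdf4llm"):
    print(name, installed_version(name) or "not installed")
```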
That was a good point of yours, though. I will make sure that the versions coincide in the future.
Okay, thank you very much for the help and for explaining "fix developed". I will wait for the fix.
Thanks to you for keeping the package "alive".
I have PDFs with titles that span two or more lines. When a PDF is converted to markdown, these titles are cut off (because the PDF breaks them across lines).
I attach the original pdf and the generated markdown file:
prueba_indices_enormes.pdf
prueba_indices_enormes_new_markdown.md
The content of the PDF is invented; the important thing is the result it gives for the section titles.
You can see that when a section title is very long and the PDF itself splits it across several lines, a small gap is inserted and the continuation line has no '#' to mark it as part of the section title.
Is this normal? Is it an error in the markdown conversion?
I'm using pdf4llm==0.0.9.