I have tested and compared the DeepDoc module with [marker-pdf](https://github.com/VikParuchuri/marker), which has 20k stars and focuses solely on parsing PDFs. I think your OCR quality is better, especially for charts and tables. You could consider splitting the module out as a standalone program and offering a paid online SaaS service for parsing PDFs into Markdown, similar to how [Jina Reader](https://jina.ai/reader/) converts HTML to Markdown.
A standalone service could also contribute to the RAGFlow agent modules, as users might sometimes need to transform PDFs into Markdown for further use with LLMs without necessarily storing them in the knowledge base.
It is already somewhat split.
The only issue is that (Western) European characters (umlauts, etc.) are not recognized.
KevinHuSh changed the title from "[Suggestion] Split DeepDoc to an independent program" to "[Feature Request] Split DeepDoc to an independent program" on Jan 26, 2025
KevinHuSh changed the title from "[Feature Request] Split DeepDoc to an independent program" to "[Feature Request]: Split DeepDoc to an independent program" on Jan 26, 2025