Feat: UnstructuredFileConverter meta field #242
Not sure I understand why the tests are failing on 3.9.
Maybe I should also add more test cases, because every existing test case has only one file with one metadata dict. Maybe a test case with two files and a [...]. EDIT: Done.
@lambda-science tests are failing because of mypy. Something in the code should be fixed. Thanks for your PR. I will take a look later or tomorrow...
No worries, I realized I could use my own fix before it is merged and released with a simple requirement.txt such as:
That's very cool! So it's not urgent at all 🌞
I have the feeling that something is off with metadata, because the auto-generated page_number is always the last page with [...]
Hey @lambda-science, thanks for the good work!
- I left a comment about metadata handling
- I would add a test where the path is a directory
…to feat/unstructured_meta_field
Hey... Sorry for the long wait.
I found some small opportunities for improvement, then we can merge this good PR!
…/converters/unstructured/converter.py Co-authored-by: Stefano Fiorucci <[email protected]>
Fix: #241
This pull request adds a new optional meta field for attaching custom metadata to Unstructured documents.
From:
To:
The implementation is inspired by the handling of metadata in the PyPDFToDocument component from the Haystack main repo.
Ping @anakin87
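For anyone following along, here is a rough sketch of the kind of metadata handling discussed in this thread: a single dict is applied to every file, while a list must supply exactly one dict per file. This is not the code from the PR. The helper name normalize_meta and its error messages are hypothetical, chosen only for illustration, and the sketch assumes meta is either None, a dict, or a list of dicts.

```python
from typing import Any, Dict, List, Optional, Union


def normalize_meta(
    meta: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]],
    sources_count: int,
) -> List[Dict[str, Any]]:
    """Return one metadata dict per source file (hypothetical helper)."""
    if meta is None:
        # No custom metadata: every file gets its own empty dict.
        return [{} for _ in range(sources_count)]
    if isinstance(meta, dict):
        # A single dict applies to all files; copy it so that the
        # resulting documents do not share one mutable object.
        return [dict(meta) for _ in range(sources_count)]
    if isinstance(meta, list):
        # A list must match the files one-to-one.
        if len(meta) != sources_count:
            raise ValueError(
                f"Got {len(meta)} metadata dicts for {sources_count} files."
            )
        return [dict(m) for m in meta]
    raise ValueError("meta must be None, a dict, or a list of dicts.")


# Two files, one shared metadata dict.
shared = normalize_meta({"source": "upload"}, sources_count=2)

# Two files, one metadata dict each.
per_file = normalize_meta([{"lang": "en"}, {"lang": "fr"}], sources_count=2)
```

The two-file calls mirror the extra test cases mentioned above. A one-to-one list cannot be matched reliably when a path is a directory that expands to an unknown number of files, which is one reason a directory test is worth having.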