patch unstructured embeddings gen example #520
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #520   +/-   ##
=======================================
  Coverage   87.43%   87.43%
=======================================
  Files          97       97
  Lines       10069    10069
  Branches     1374     1374
=======================================
  Hits         8804     8804
  Misses        908      908
  Partials      357      357

Flags with carried forward coverage won't be shown.
@@ -12,11 +12,11 @@
    group_broken_paragraphs,
    replace_unicode_quotes,
)
from unstructured.embed.huggingface import (
from unstructured.partition.pdf import partition_pdf
from unstructured_ingest.embed.huggingface import (
[F] From what I can see, `unstructured_ingest` is the new package for this. There is also a new API for ingesting data, but I haven't been able to grok it completely.
What I can tell you is that you can no longer instantiate the old `HuggingFaceEmbeddingEncoder` because it is missing abstract methods, and the new `embed_documents` expects `list[dict]` instead of `list[Element]`, so the two are incompatible without these small changes.
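For illustration, here is a minimal sketch of the kind of adaptation described above: convert each element to a dict before embedding, then copy the vectors back. `FakeElement` and `FakeDictEncoder` are hypothetical stand-ins, not the real `unstructured` or `unstructured_ingest` classes, and the `"embeddings"` key on the returned dicts is an assumption about the new encoder's output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeElement:
    """Hypothetical stand-in for an unstructured Element."""

    text: str
    embeddings: Optional[list] = None

    def to_dict(self) -> dict:
        # Only the text is needed for this sketch.
        return {"text": self.text}


class FakeDictEncoder:
    """Hypothetical stand-in for an encoder whose embed_documents takes list[dict]."""

    def embed_documents(self, elements: list) -> list:
        embedded = []
        for el in elements:
            doc = dict(el)
            # Toy "embedding": the text length as a one-dimensional vector.
            doc["embeddings"] = [float(len(doc["text"]))]
            embedded.append(doc)
        return embedded


def embed_elements(encoder, elements):
    """Convert elements to dicts, embed them, and copy the vectors back."""
    docs = encoder.embed_documents([el.to_dict() for el in elements])
    for el, doc in zip(elements, docs):
        el.embeddings = doc["embeddings"]
    return elements
```

The wrapper keeps the rest of the example working with element objects, so only the call site that talks to the encoder has to change.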
I raised Unstructured-IO/unstructured#3731 / Unstructured-IO/unstructured#3730 to fix the issue on their end properly.
It seems simpler to just cap `unstructured` at the last version before 0.16.0.
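In a requirements specifier such a cap would be written as `unstructured<0.16.0`. As a rough sketch of what that constraint means, the hypothetical helper below compares plain numeric version strings; it is not part of this PR and ignores pre-release or post-release suffixes.

```python
def parse_version(version: str) -> tuple:
    """Split a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_below_cap(installed: str, cap: str = "0.16.0") -> bool:
    """Return True when the installed version is strictly below the cap."""
    return parse_version(installed) < parse_version(cap)
```

For real dependency resolution the installer's own specifier handling should be relied on; this only illustrates the comparison.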
ee6ac41 to 6a79d57
cc @tibor-mach
LGTM, thanks for investigating this!
It is just a patch for the example/test. We should revert it once Unstructured-IO/unstructured#3730 is merged and released.