-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Support Cohere Embeddings for SemanticChunker and SDPMChunker #118 #130
base: development
Are you sure you want to change the base?
Conversation
Hey @Udayk02! Thanks for opening a PR 🚀 I am curious about point 4, you mentioned,
Could you describe the issue you faced with Thanks |
As I delved through |
src/chonkie/embeddings/cohere.py
Outdated
Args: | ||
model: name of the Cohere embedding model to use | ||
api_key: (optional) Cohere API key (if not provided, looks for COHERE_API_KEY environment variable) | ||
organization: (optional) client name for API requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @Udayk02,
Could you check the docstring once? You've mentioned organization
here, but the signature says client_name
.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a mistake as I was directly referencing to your implementation for OpenAI Embeddings. Will fix and rebase the branch. Thanks!!
Hey @Udayk02! Brilliant work on this PR 🚀 Just a really minor point but is perfect otherwise~ Thanks! 😊 |
Hi @bhavnicksm , thank you! 😊 Please state whether I have to change the tokenizer from Thanks! |
- cohere embeddings support - added to the registry for autoembeddings - added tests
Hey @Udayk02!
Yes, it's preferable to keep autotiktokenizer here~ I'll have a look at the PR you've raised and merge that in soon! We can update this PR once that's been merged. Thanks! 😊 |
key things to notice:
tokenizers
for the complete tokenizing via Cohere official urls as I faced an issue while loading the same through hf viaautotiktoken
.client
is initialized withapi_key
or not.check_api_key
is a deprecated function. along with that, there isn't a retry functionality provided by Cohere. handled it in chonkie end.