Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat: add Voyage embeddings #408

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

[WIP] feat: add Voyage embeddings #408

wants to merge 3 commits into from

Conversation

fzliu
Copy link

@fzliu fzliu commented Oct 17, 2024

Description

Adds embeddings from Voyage AI.

Type of change

  • New features (non-breaking change).
  • Bug fix (non-breaking change).
  • Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

  • I have performed a self-review of my code.
  • I have added thorough tests if it is a core feature.
  • There is a reference to the original bug report and related work.
  • I have commented on my code, particularly in hard-to-understand areas.
  • The feature is well documented.

@fzliu fzliu changed the title Add Voyage embeddings feat: add Voyage embeddings Oct 18, 2024
taprosoft
taprosoft previously approved these changes Oct 21, 2024
@taprosoft
Copy link
Collaborator

@fzliu thanks for your contribution. Please take a look at the CI reports and fix them. The rest looks okay to me.

@fzliu fzliu requested a review from taprosoft October 21, 2024 19:29
@fzliu fzliu changed the title feat: add Voyage embeddings [WIP] feat: add Voyage embeddings Oct 22, 2024
@ngduyanhece
Copy link

After analyzing the pull request, here are my findings:

Overall Feedback:

The addition of the Voyage AI embeddings is a significant enhancement to the project, providing new functionality for embedding models. The code is generally well-structured, but there are areas where improvements can be made, particularly regarding error handling and code clarity.

Score: 85/100

Labels: Enhancement, Tests

Code Suggestions:

  1. File: libs/kotaemon/kotaemon/embeddings/voyageai.py

    • Suggestion Content: Ensure that the api_key is validated before using it to create the client.
    • Relevant Line: + self._client = _import_voyageai().Client(api_key=self.api_key)
    • Existing Code:
      self._client = _import_voyageai().Client(api_key=self.api_key)
    • Improved Code:
      if not self.api_key:
          raise ValueError("API key must be provided for VoyageAIEmbeddings.")
      self._client = _import_voyageai().Client(api_key=self.api_key)
  2. File: libs/kotaemon/kotaemon/embeddings/voyageai.py

    • Suggestion Content: Handle potential exceptions when calling the embed method to prevent crashes due to API issues.
    • Relevant Line: + embeddings = self._client.embed(texts, model=self.model_name).embeddings
    • Existing Code:
      embeddings = self._client.embed(texts, model=self.model_name).embeddings
    • Improved Code:
      try:
          embeddings = self._client.embed(texts, model=self.model_name).embeddings
      except Exception as e:
          raise RuntimeError(f"Failed to retrieve embeddings: {e}")
  3. File: libs/kotaemon/tests/test_embedding_models.py

    • Suggestion Content: Ensure that the test for VoyageAIEmbeddings checks for the correct output structure.
    • Relevant Line: + assert_embedding_result(output)
    • Existing Code:
      assert_embedding_result(output)
    • Improved Code:
      assert isinstance(output, list) and all(isinstance(doc, DocumentWithEmbedding) for doc in output)

The addition of Voyage AI embeddings is a great enhancement! Here are some suggestions:

  1. Validate the api_key before using it to create the client in VoyageAIEmbeddings.
  2. Handle exceptions when calling the embed method to prevent crashes due to API issues.
  3. Ensure that the test for VoyageAIEmbeddings checks for the correct output structure.


def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._client = _import_voyageai().Client(api_key=self.api_key)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Validate the API key before using it in the client.

Existing Code:

self._client = _import_voyageai().Client(api_key=self.api_key)

Improved Code:

if not self.api_key:
    raise ValueError("API key must be provided for VoyageAIEmbeddings.")
self._client = _import_voyageai().Client(api_key=self.api_key)

Details:
Ensure that the api_key is validated before using it to create the client.

self, text: str | list[str] | Document | list[Document], *args, **kwargs
) -> list[DocumentWithEmbedding]:
texts = [t.content for t in self.prepare_input(text)]
embeddings = self._client.embed(texts, model=self.model_name).embeddings

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Handle exceptions when calling the embed method to prevent crashes.

Existing Code:

embeddings = self._client.embed(texts, model=self.model_name).embeddings

Improved Code:

try:
    embeddings = self._client.embed(texts, model=self.model_name).embeddings
except Exception as e:
    raise RuntimeError(f"Failed to retrieve embeddings: {e}")

Details:
Handle potential exceptions when calling the embed method to prevent crashes due to API issues.

def test_voyageai_embeddings():
model = VoyageAIEmbeddings()
output = model("Hello, world!")
assert_embedding_result(output)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Ensure the test checks for the correct output structure.

Existing Code:

assert_embedding_result(output)

Improved Code:

assert isinstance(output, list) and all(isinstance(doc, DocumentWithEmbedding) for doc in output)

Details:
Ensure that the test for VoyageAIEmbeddings checks for the correct output structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants