docs: review integrations.astra #498

Merged
merged 5 commits into from
Feb 29, 2024

Conversation

wochinge
Contributor

@wochinge wochinge requested a review from a team as a code owner February 28, 2024 16:57
@wochinge wochinge requested review from davidsbatista and removed request for a team February 28, 2024 16:57
@github-actions github-actions bot added integration:astra type:documentation Improvements or additions to documentation labels Feb 28, 2024
@wochinge
Contributor Author


title: Astra
excerpt: Astra integration for Haystack
category: placeholder-integrations-api
slug: integrations-astra
parentDoc:
order: 30
hidden: false

Module haystack_integrations.components.retrievers.astra.retriever

AstraEmbeddingRetriever

@component
class AstraEmbeddingRetriever()

A component for retrieving documents from an AstraDocumentStore.

Usage example:

from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.astra import AstraDocumentStore
from haystack_integrations.components.retrievers.astra import AstraEmbeddingRetriever

document_store = AstraDocumentStore(
    api_endpoint=api_endpoint,
    token=token,
    collection_name=collection_name,
    duplicates_policy=DuplicatePolicy.SKIP,
    embedding_dimension=384,
)

retriever = AstraEmbeddingRetriever(document_store=document_store)

AstraEmbeddingRetriever.__init__

def __init__(document_store: AstraDocumentStore,
             filters: Optional[Dict[str, Any]] = None,
             top_k: int = 10)

Arguments:

  • document_store: An instance of AstraDocumentStore.
  • filters: A dictionary with filters to narrow down the search space.
  • top_k: The maximum number of documents to retrieve.

AstraEmbeddingRetriever.run

@component.output_types(documents=List[Document])
def run(query_embedding: List[float],
        filters: Optional[Dict[str, Any]] = None,
        top_k: Optional[int] = None)

Retrieve documents from the AstraDocumentStore.

Arguments:

  • query_embedding: Floats representing the query embedding.
  • filters: Filters to narrow down the search space.
  • top_k: The maximum number of documents to retrieve.

Returns:

A dictionary with the following keys:

  • documents: A list of documents retrieved from the AstraDocumentStore.

AstraEmbeddingRetriever.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AstraEmbeddingRetriever.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AstraEmbeddingRetriever"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.
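
The to_dict / from_dict pair above is meant to round-trip a component through a plain dictionary. The following is a minimal sketch of that idea using only the standard library. `SketchRetriever` is a hypothetical stand-in, not the real AstraEmbeddingRetriever, and the "type" / "init_parameters" layout of the dictionary is an assumption made for illustration.

```python
from typing import Any, Dict, Optional


class SketchRetriever:
    """Hypothetical stand-in for a component; not the real AstraEmbeddingRetriever."""

    def __init__(self, filters: Optional[Dict[str, Any]] = None, top_k: int = 10):
        self.filters = filters or {}
        self.top_k = top_k

    def to_dict(self) -> Dict[str, Any]:
        # Store only the constructor arguments needed to rebuild the object.
        # The key names used here are an assumption for this sketch.
        return {
            "type": f"{type(self).__module__}.{type(self).__name__}",
            "init_parameters": {"filters": self.filters, "top_k": self.top_k},
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchRetriever":
        # Rebuild the object by passing the stored arguments back to __init__.
        return cls(**data["init_parameters"])


original = SketchRetriever(filters={"lang": "en"}, top_k=5)
restored = SketchRetriever.from_dict(original.to_dict())
```

A useful property to check in tests is that serializing the restored object yields the same dictionary as serializing the original.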

Module haystack_integrations.document_stores.astra.document_store

AstraDocumentStore

class AstraDocumentStore()

An AstraDocumentStore document store for Haystack.

Example Usage:

from haystack.document_stores.types import DuplicatePolicy
from haystack_integrations.document_stores.astra import AstraDocumentStore

document_store = AstraDocumentStore(
    api_endpoint=api_endpoint,
    token=token,
    collection_name=collection_name,
    duplicates_policy=DuplicatePolicy.SKIP,
    embedding_dimension=384,
)

AstraDocumentStore.__init__

def __init__(
        api_endpoint: Secret = Secret.from_env_var("ASTRA_DB_API_ENDPOINT"),
        token: Secret = Secret.from_env_var("ASTRA_DB_APPLICATION_TOKEN"),
        collection_name: str = "documents",
        embedding_dimension: int = 768,
        duplicates_policy: DuplicatePolicy = DuplicatePolicy.NONE,
        similarity: str = "cosine")

The connection to Astra DB is established and managed through the JSON API.

The required credentials (API endpoint and application token) can be generated
through the UI by clicking the Connect tab, and then selecting JSON API and
Generate Configuration.

Arguments:

  • api_endpoint: The Astra DB API endpoint.
  • token: The Astra DB application token.
  • collection_name: The current collection in the keyspace in the current Astra DB.
  • embedding_dimension: Dimension of embedding vector.
  • duplicates_policy: Handle duplicate documents based on DuplicatePolicy parameter options.
    Parameter options: (SKIP, OVERWRITE, FAIL, NONE)
      • DuplicatePolicy.NONE: Default policy. If a Document with the same ID already exists,
        it is skipped and not written.
      • DuplicatePolicy.SKIP: If a Document with the same ID already exists, it is skipped and not written.
      • DuplicatePolicy.OVERWRITE: If a Document with the same ID already exists, it is overwritten.
      • DuplicatePolicy.FAIL: If a Document with the same ID already exists, an error is raised.
  • similarity: The similarity function used to compare document vectors.

Raises:

  • ValueError: If the API endpoint or token is not set.
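
Since the defaults read the credentials from the ASTRA_DB_API_ENDPOINT and ASTRA_DB_APPLICATION_TOKEN environment variables, it can help to check that both are set before constructing the store. Below is a small sketch of such a pre-flight check using only the standard library. The helper `missing_astra_vars` is hypothetical and not part of the integration.

```python
import os
from typing import List, Mapping

# Variable names taken from the __init__ signature above.
REQUIRED_VARS = ("ASTRA_DB_API_ENDPOINT", "ASTRA_DB_APPLICATION_TOKEN")


def missing_astra_vars(environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"ASTRA_DB_API_ENDPOINT": "https://example.invalid"}
still_missing = missing_astra_vars(example_env)
```

Calling `missing_astra_vars()` without arguments inspects the real process environment. An empty list means both variables are present.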

AstraDocumentStore.from_dict

@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "AstraDocumentStore"

Deserializes the component from a dictionary.

Arguments:

  • data: Dictionary to deserialize from.

Returns:

Deserialized component.

AstraDocumentStore.to_dict

def to_dict() -> Dict[str, Any]

Serializes the component to a dictionary.

Returns:

Dictionary with serialized data.

AstraDocumentStore.write_documents

def write_documents(documents: List[Document],
                    policy: DuplicatePolicy = DuplicatePolicy.NONE)

Indexes documents for later queries.

Arguments:

  • documents: a list of Haystack Document objects.
  • policy: Handle duplicate documents based on DuplicatePolicy parameter options.
    Parameter options: (SKIP, OVERWRITE, FAIL, NONE)
      • DuplicatePolicy.NONE: Default policy. If a Document with the same ID already exists,
        it is skipped and not written.
      • DuplicatePolicy.SKIP: If a Document with the same ID already exists,
        it is skipped and not written.
      • DuplicatePolicy.OVERWRITE: If a Document with the same ID already exists, it is overwritten.
      • DuplicatePolicy.FAIL: If a Document with the same ID already exists, an error is raised.

Raises:

  • ValueError: If the documents are not of type Document or dict.
  • DuplicateDocumentError: If a document with the same ID already exists and policy is set to FAIL.
  • Exception: If the document ID is not a string or if id and _id are both present in the document.

Returns:

number of documents written.
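
The duplicate-handling rules described above can be illustrated with a small in-memory sketch. It models the store as a plain dictionary and is not the integration's implementation. `Policy`, `DuplicateError`, and `write` are hypothetical names that mirror DuplicatePolicy, DuplicateDocumentError, and write_documents for illustration only.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Policy(Enum):
    """Hypothetical mirror of the four DuplicatePolicy options."""
    NONE = "none"
    SKIP = "skip"
    OVERWRITE = "overwrite"
    FAIL = "fail"


class DuplicateError(Exception):
    """Hypothetical stand-in for DuplicateDocumentError."""


def write(store: Dict[str, str], docs: List[Tuple[str, str]],
          policy: Policy = Policy.NONE) -> int:
    """Write (id, content) pairs into store and return how many were written."""
    written = 0
    for doc_id, content in docs:
        if doc_id in store:
            if policy is Policy.FAIL:
                # In this sketch, documents written before the failure stay written.
                raise DuplicateError(f"ID {doc_id!r} already exists")
            if policy in (Policy.NONE, Policy.SKIP):
                # NONE is documented above as behaving like SKIP.
                continue
        # New ID, or an existing ID under OVERWRITE.
        store[doc_id] = content
        written += 1
    return written


store = {"a": "old"}
skipped_count = write(store, [("a", "new"), ("b", "x")], Policy.SKIP)
```

With Policy.SKIP the existing entry for "a" is left untouched and only "b" is counted as written.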

AstraDocumentStore.count_documents

def count_documents() -> int

Counts the number of documents in the document store.

Returns:

the number of documents in the document store.

AstraDocumentStore.filter_documents

def filter_documents(
        filters: Optional[Dict[str, Any]] = None) -> List[Document]

Returns at most 1000 documents that match the filter.

Arguments:

  • filters: Filters to apply. Defaults to None.

Raises:

  • AstraDocumentStoreFilterError: If the filter is invalid or not supported by this class.

Returns:

Matching documents.

AstraDocumentStore.get_documents_by_id

def get_documents_by_id(ids: List[str]) -> List[Document]

Gets documents by their ids.

Arguments:

  • ids: the IDs of the documents to retrieve.

Returns:

the matching documents.

AstraDocumentStore.get_document_by_id

def get_document_by_id(document_id: str) -> Document

Gets a document by its id.

Arguments:

  • document_id: the ID to filter by

Raises:

  • MissingDocumentError: if the document is not found

Returns:

the found document

AstraDocumentStore.search

def search(query_embedding: List[float],
           top_k: int,
           filters: Optional[Dict[str, Any]] = None) -> List[Document]

Performs a search using a query embedding.

Arguments:

  • query_embedding: The query embedding, as a list of floats.
  • top_k: The number of results to return.
  • filters: Filters to apply during search.

Returns:

Matching documents.
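
To make the roles of query_embedding, top_k, and the "cosine" similarity concrete, here is a brute-force sketch of a top-k cosine search over in-memory vectors. In the real integration the search is carried out by Astra DB, so this only illustrates the ranking idea. `cosine` and `search_sketch` are hypothetical helpers.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_sketch(vectors: Dict[str, List[float]],
                  query_embedding: List[float],
                  top_k: int) -> List[Tuple[str, float]]:
    """Return the top_k (id, score) pairs, most similar first."""
    scored = [(doc_id, cosine(vec, query_embedding)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


vectors = {"x": [1.0, 0.0], "y": [0.0, 1.0], "z": [1.0, 1.0]}
results = search_sketch(vectors, query_embedding=[1.0, 0.0], top_k=2)
```

Here "x" points in the same direction as the query and ranks first, "z" ranks second, and "y" is orthogonal to the query and is cut off by top_k.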

AstraDocumentStore.delete_documents

def delete_documents(document_ids: Optional[List[str]] = None,
                     delete_all: Optional[bool] = None) -> None

Deletes documents from the document store.

Arguments:

  • document_ids: IDs of the documents to delete.
  • delete_all: if True, delete all documents.

Raises:

  • MissingDocumentError: if no document was deleted but document IDs were provided.
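
The deletion rules above (delete by ID, delete everything, and an error when IDs were given but nothing was deleted) can be sketched against a plain dictionary. This is an illustration under stated assumptions, not the integration's code. `MissingError` and `delete_from` are hypothetical names, and doing nothing when neither argument is given is an assumption of this sketch.

```python
from typing import Dict, List, Optional


class MissingError(Exception):
    """Hypothetical stand-in for MissingDocumentError."""


def delete_from(store: Dict[str, str],
                document_ids: Optional[List[str]] = None,
                delete_all: Optional[bool] = None) -> None:
    """Delete entries from store by ID, or all entries if delete_all is True."""
    if delete_all:
        store.clear()
        return
    if document_ids:
        deleted = 0
        for doc_id in document_ids:
            # pop with a default avoids a KeyError for unknown IDs.
            if store.pop(doc_id, None) is not None:
                deleted += 1
        if deleted == 0:
            # IDs were provided but none of them matched a stored entry.
            raise MissingError("none of the given IDs were found")


store = {"a": "first", "b": "second", "c": "third"}
delete_from(store, document_ids=["a", "unknown"])
```

In this sketch a partially matching ID list succeeds as long as at least one entry was deleted, which follows the wording of the Raises entry above.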

Module haystack_integrations.document_stores.astra.errors

AstraDocumentStoreError

class AstraDocumentStoreError(DocumentStoreError)

Parent class for all AstraDocumentStore errors.

AstraDocumentStoreFilterError

class AstraDocumentStoreFilterError(FilterError)

Raised when an invalid filter is passed to AstraDocumentStore.

AstraDocumentStoreConfigError

class AstraDocumentStoreConfigError(AstraDocumentStoreError)

Raised when an invalid configuration is passed to AstraDocumentStore.

through the UI by clicking and the connect tab, and then selecting JSON API and
Generate Configuration.

:param api_endpoint: The Astra DB API endpoint.

for consistency I would start with lower case what comes after a :param <variable>:

@@ -45,7 +58,7 @@ def __init__(
):
"""
The connection to Astra DB is established and managed through the JSON API.
- The required credentials (api endpoint andapplication token) can be generated
+ The required credentials (api endpoint and application token) can be generated

👍🏽

@davidsbatista davidsbatista left a comment

just some small comments for consistency

@wochinge wochinge merged commit 65715d6 into main Feb 29, 2024
10 checks passed
@wochinge wochinge deleted the docs/review-integrations.astra branch February 29, 2024 08:45
Successfully merging this pull request may close these issues.

API Docs - integrations.astra Docstrings - integrations.astra
2 participants