[Question]: Connecting llamaindex to aoss opensearch on aws #14133
Comments
Hey @mike-wethos! 👋 I'm here to help you out with your question. Let's tackle this together! Here is an updated version of your code snippet with the correct service name:

import boto3
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.readers.s3 import S3Reader
from llama_index.vector_stores.opensearch import OpensearchVectorStore, OpensearchVectorClient
from opensearchpy import AWSV4SignerAuth
credentials = boto3.Session(
aws_access_key_id=env.aws_access_key_id,
aws_secret_access_key=env.aws_secret_access_key,
).get_credentials()
region = "us-east-2"
service = "es" # Correct service name for OpenSearch
auth = AWSV4SignerAuth(credentials, region, service)
endpoint = env.vector_store_host
idx = args.index_type
text_field = "content"
embedding_field = "embedding"
client = OpensearchVectorClient(
endpoint, idx, 768, embedding_field=embedding_field, text_field=text_field, http_auth=auth
)

If the issue persists after these checks, you may need to review the IAM policies attached to the AWS credentials and ensure they have the necessary permissions to interact with the OpenSearch service.
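A quick way to double-check the IAM side is to see which principal the credentials actually resolve to, since that exact ARN is what the collection's data access policy must grant. This is only a sketch and reuses the env object from the snippet above:

import boto3

# Resolve the configured credentials to a concrete principal ARN.
session = boto3.Session(
    aws_access_key_id=env.aws_access_key_id,
    aws_secret_access_key=env.aws_secret_access_key,
)
identity = session.client("sts").get_caller_identity()
print(identity["Arn"])  # this ARN needs to be granted in the AOSS data access policy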
@dosu
Here are some steps to address these issues:

By following these steps, you should be able to mitigate the timeout exceptions when storing documents in AWS AOSS OpenSearch [1][2][3].
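As a sketch of the kind of settings typically involved in timeout mitigation: OpensearchVectorClient passes extra keyword arguments through to the underlying opensearch-py client (the later snippets in this thread rely on this for http_auth and connection_class), so the request timeout and retry behaviour can be raised there. The values below are placeholders rather than recommendations, and endpoint, idx and auth are the variables from the snippet above:

from opensearchpy import RequestsHttpConnection
from llama_index.vector_stores.opensearch import OpensearchVectorClient

client = OpensearchVectorClient(
    endpoint, idx, 768, embedding_field="embedding", text_field="content",
    http_auth=auth,
    connection_class=RequestsHttpConnection,
    use_ssl=True,
    verify_certs=True,
    timeout=60,            # opensearch-py request timeout, in seconds
    max_retries=3,         # retry transient failures
    retry_on_timeout=True, # also retry when the failure was a timeout
)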
I am still getting the connection timeout.
To resolve the connection timeout, here is an example of how to correctly set the http_auth parameter:

from opensearchpy import AWSV4SignerAuth
import boto3
# Create the AWS credentials
credentials = boto3.Session().get_credentials()
region = 'your-aws-region'
# Create the AWSV4SignerAuth object
http_auth = AWSV4SignerAuth(credentials, region)
# Initialize the OpensearchVectorClient with the correct http_auth
client = OpensearchVectorClient(
endpoint="https://your-opensearch-endpoint",
index="your-index-name",
dim=1536,
embedding_field="embedding",
text_field="content",
http_auth=http_auth
)

Ensure that you replace your-aws-region, your-opensearch-endpoint, and your-index-name with your actual values.
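One assumption worth flagging beyond that snippet: AWSV4SignerAuth defaults to signing for the managed-domain service, so for a serverless (AOSS) collection the signer is normally built with the service name passed explicitly:

from opensearchpy import AWSV4SignerAuth
import boto3

credentials = boto3.Session().get_credentials()
region = "your-aws-region"

# "aoss" for OpenSearch Serverless collections; "es" for managed OpenSearch Service domains.
http_auth = AWSV4SignerAuth(credentials, region, "aoss")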
With this setup I am now getting a new error.
@dosu
It finally failed with
I have verified that I can successfully create an index using these credentials and host/port using opensearch-py. The issue seems to stem from the fact that LlamaIndex only supports sending the host/port as a string rather than the standard dict that opensearch-py uses in its examples.
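For comparison, a rough sketch of the two styles being contrasted here (the hostname is a placeholder, and auth is the SigV4 signer from the earlier snippets):

from opensearchpy import OpenSearch, RequestsHttpConnection

# opensearch-py examples: host and port as a structured dict
os_client = OpenSearch(
    hosts=[{"host": "your-collection-id.us-east-2.aoss.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# OpensearchVectorClient: the same information encoded in a single URL string
endpoint = "https://your-collection-id.us-east-2.aoss.amazonaws.com:443"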
Are you encoding the port in your host when you use the vector client?
To ensure compatibility, include the port in the endpoint string, for example endpoint = "https://your-opensearch-endpoint:443". Here is the updated code snippet with the port included in the endpoint string:

from opensearchpy import AWSV4SignerAuth, RequestsHttpConnection
import boto3
# AWS setup
credentials = boto3.Session(
aws_access_key_id=env.aws_access_key_id,
aws_secret_access_key=env.aws_secret_access_key,
).get_credentials()
region = "us-east-2"
service = "aoss"
auth = AWSV4SignerAuth(credentials, region, service)
endpoint = "https://your-opensearch-endpoint:443"
idx = args.index_type
text_field = "content"
embedding_field = "embedding"
client = OpensearchVectorClient(
endpoint, idx, 768, embedding_field=embedding_field, text_field=text_field, http_auth=auth,
connection_class=RequestsHttpConnection, use_ssl=True, verify_certs=True
)

This should help resolve the connection issues by ensuring the port is correctly included in the endpoint URL.
Yes, I have tried every variation of the URL with no luck. I think an ideal solution would be to update OpensearchVectorClient to better match the standard interface of opensearch-py.
Having similar issues. Any progress on solving this?
The following worked for me - sharing in case it's useful:

import boto3
from opensearchpy import AWSV4SignerAuth, RequestsHttpConnection, AWSV4SignerAsyncAuth, AsyncHttpConnection
from llama_index.vector_stores.opensearch import OpensearchVectorStore, OpensearchVectorClient
from llama_index.core import VectorStoreIndex, StorageContext
region = <your-aws-region>
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAsyncAuth(credentials, region)
endpoint = "https://<your-domain-name>:443"
idx = <your-index>
text_field = "content"
embedding_field = "embedding"
client = OpensearchVectorClient(
endpoint, idx, 1536, embedding_field=embedding_field, text_field=text_field,
http_auth=auth,
use_ssl=True,
verify_certs=True,
connection_class=AsyncHttpConnection
)
vector_store = OpensearchVectorStore(client)
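To actually write documents through that store, the usual follow-up (a sketch, assuming documents has already been loaded, for example with the S3Reader shown earlier in the thread) is:

storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Embeds the documents and persists them into the OpenSearch index via the client above.
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)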
Hmm, this works for standalone scripts, but now I'm running into event loop issues when trying to set up the client in a FastAPI service running in uvicorn. It seems to have trouble getting the loop in the LlamaIndex code, so it errors out because the loop is already running. Also, I can't just make one client at lifecycle startup; I need a way to create a client when a request comes in, since it could hit multiple indexes based on the request params.
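One workaround sometimes used for the already-running-loop problem is to patch the loop with nest_asyncio and build the client per request. This is only a rough sketch under assumptions, not a confirmed fix: the make_vector_store helper and /query route are hypothetical, the placeholders mirror the snippet above, and whether the loop patching is acceptable in your service is up to you.

import boto3
import nest_asyncio
from fastapi import FastAPI
from opensearchpy import AWSV4SignerAsyncAuth, AsyncHttpConnection
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.opensearch import OpensearchVectorStore, OpensearchVectorClient

nest_asyncio.apply()  # let LlamaIndex nest its event-loop calls inside the running uvicorn loop

region = "<your-aws-region>"
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAsyncAuth(credentials, region)
endpoint = "https://<your-domain-name>:443"

app = FastAPI()

def make_vector_store(index_name: str) -> OpensearchVectorStore:
    # Hypothetical helper: build a fresh client for whichever index the request targets.
    client = OpensearchVectorClient(
        endpoint, index_name, 1536,
        embedding_field="embedding", text_field="content",
        http_auth=auth, use_ssl=True, verify_certs=True,
        connection_class=AsyncHttpConnection,
    )
    return OpensearchVectorStore(client)

@app.get("/query")
async def query(index_name: str, q: str):
    # Hypothetical route: pick the target index from the request parameters.
    vector_store = make_vector_store(index_name)
    index = VectorStoreIndex.from_vector_store(vector_store)
    response = index.as_query_engine().query(q)
    return {"answer": str(response)}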
This worked for me:

from opensearchpy import AWSV4SignerAuth, RequestsHttpConnection
auth = AWSV4SignerAuth(credentials, "us-east-2", "es")
client = OpensearchVectorClient(
endpoint, idx, 1024, embedding_field="embedding", text_field="content", http_auth=auth, use_ssl=True, verify_certs=True, connection_class=RequestsHttpConnection
)
I've got the following error when I use RequestsHttpConnection:

Any ideas?
Question
Hello,
I am attempting to store documents in AWS AOSS OpenSearch and keep getting a 403 Forbidden when creating the OpensearchVectorClient, despite passing correct credentials. Is this a supported method?
Thank you for your time.