[Bugfix]/Use default values for SIMILARITY and NUM_LISTS for the Azure CosmosDB datastore provider-405 #406

Open: wants to merge 2 commits into `main`
`datastore/providers/azurecosmosdb_datastore.py`: 9 changes (8 additions, 1 deletion)
```diff
@@ -27,6 +27,8 @@
 AZCOSMOS_CONNSTR = os.environ.get("AZCOSMOS_CONNSTR")
 AZCOSMOS_DATABASE_NAME = os.environ.get("AZCOSMOS_DATABASE_NAME")
 AZCOSMOS_CONTAINER_NAME = os.environ.get("AZCOSMOS_CONTAINER_NAME")
+AZCOSMOS_SIMILARITY = os.environ.get("AZCOSMOS_SIMILARITY", "COS")
+AZCOSMOS_NUM_LISTS = int(os.environ.get("AZCOSMOS_NUM_LISTS", 100))
 assert AZCOSMOS_API is not None
 assert AZCOSMOS_CONNSTR is not None
 assert AZCOSMOS_DATABASE_NAME is not None
```
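`os.environ.get` returns a string whenever the variable is set, so the integer default `100` only applies when `AZCOSMOS_NUM_LISTS` is absent. A minimal sketch of reading both settings with an explicit conversion, assuming the defaults documented in `setup.md`; `read_index_settings` and its injectable `env` mapping are hypothetical and not part of this PR.

```python
import os
from typing import Mapping, Tuple

# Defaults as documented in setup.md for this provider.
DEFAULT_SIMILARITY = "COS"
DEFAULT_NUM_LISTS = 100


def read_index_settings(env: Mapping[str, str] = os.environ) -> Tuple[int, str]:
    """Return (num_lists, similarity), falling back to the documented defaults.

    Hypothetical helper for illustration; `env` can be any mapping so the
    function is easy to test without touching the real environment.
    """
    # Environment values are always strings, so convert num_lists explicitly.
    raw_num_lists = env.get("AZCOSMOS_NUM_LISTS", str(DEFAULT_NUM_LISTS))
    try:
        num_lists = int(raw_num_lists)
    except ValueError:
        raise ValueError(
            f"AZCOSMOS_NUM_LISTS must be an integer, got {raw_num_lists!r}."
        ) from None
    similarity = env.get("AZCOSMOS_SIMILARITY", DEFAULT_SIMILARITY)
    return num_lists, similarity


if __name__ == "__main__":
    print(read_index_settings({}))  # (100, 'COS')
    print(read_index_settings({"AZCOSMOS_NUM_LISTS": "250"}))  # (250, 'COS')
```

Without the `int(...)` call, `{"AZCOSMOS_NUM_LISTS": "250"}.get("AZCOSMOS_NUM_LISTS", 100)` yields the string `"250"`, even though `create` annotates `num_lists` as `int`.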
```diff
@@ -201,7 +203,7 @@ def __init__(self, cosmosStore: AzureCosmosDBStoreApi):
 
     """
     @staticmethod
-    async def create(num_lists, similarity) -> DataStore:
+    async def create(num_lists: int = AZCOSMOS_NUM_LISTS, similarity: str = AZCOSMOS_SIMILARITY) -> DataStore:
 
         # Create underlying data store based on the API definition.
         # Right now this only supports Mongo, but set up to support more.
```
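Because the new defaults of `create` are module-level constants, Python evaluates them once, when the `def` statement runs at import time. Environment changes made later in the same process (for example, by a test that patches `os.environ`) are not seen by calls that rely on the defaults. The sketch contrasts that with resolving the value at call time; `DEMO_NUM_LISTS`, `create_bound`, and `create_late` are hypothetical names used only for illustration.

```python
import os
from typing import Optional

# Hypothetical environment variable used only for this demonstration.
DEMO_VAR = "DEMO_NUM_LISTS"

# Read once at import time, like a module-level AZCOSMOS_NUM_LISTS constant.
NUM_LISTS_AT_IMPORT = int(os.environ.get(DEMO_VAR, "100"))


def create_bound(num_lists: int = NUM_LISTS_AT_IMPORT) -> int:
    # The default value was captured when this `def` statement executed.
    return num_lists


def create_late(num_lists: Optional[int] = None) -> int:
    # Resolve the environment on every call instead of at definition time.
    if num_lists is None:
        num_lists = int(os.environ.get(DEMO_VAR, "100"))
    return num_lists
```

Binding at import is simpler and matches how the other `AZCOSMOS_*` settings are read; the `None` sentinel is only worth it if callers expect to change the environment after the module is loaded.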
```diff
@@ -211,6 +213,11 @@ async def create(num_lists, similarity) -> DataStore:
             apiStore = MongoStoreApi(mongoClient)
         else:
             raise NotImplementedError
+        if similarity not in ["COS", "L2", "IP"]:
+            raise ValueError(
+                f"Similarity {similarity} is not supported. "
+                "Supported similarity metrics are COS, L2, and IP."
+            )
 
         await apiStore.ensure(num_lists, similarity)
         store = AzureCosmosDBDataStore(apiStore)
```
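The check rejects anything other than `COS`, `L2`, and `IP`, and it is case-sensitive, so `cos` is rejected as well. Adjacent string literals are concatenated with no separator, so `"supported." "Supported"` becomes `"supported.Supported"` unless the first literal ends with a space. A standalone sketch of the same validation; `validate_similarity` is a hypothetical helper, not a function in the provider.

```python
SUPPORTED_SIMILARITIES = ("COS", "L2", "IP")


def validate_similarity(similarity: str) -> str:
    """Return `similarity` unchanged, or raise ValueError if it is unsupported."""
    if similarity not in SUPPORTED_SIMILARITIES:
        # The trailing space in the first literal keeps the two sentences apart
        # after implicit concatenation.
        raise ValueError(
            f"Similarity {similarity} is not supported. "
            "Supported similarity metrics are COS, L2, and IP."
        )
    return similarity
```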
`docs/providers/azurecosmosdb/setup.md`: 2 changes (2 additions, 0 deletions)
```diff
@@ -15,6 +15,8 @@ Learn more about Azure Cosmos DB for MongoDB vCore [here](https://learn.microsof
 | `AZCOSMOS_CONNSTR` | Yes | The connection string to your account. | |
 | `AZCOSMOS_DATABASE_NAME` | Yes | The database where the data is stored/queried | |
 | `AZCOSMOS_CONTAINER_NAME` | Yes | The container where the data is stored/queried | |
+| `AZCOSMOS_SIMILARITY` | No | The similarity metric used by the vector database (allowed values are `COS`, `IP`, `L2`). Default value is `COS`. | |
+| `AZCOSMOS_NUM_LISTS` | No | The number of clusters that the inverted file (IVF) index uses to group the vector data. Default value is `100`. See [vector-search](https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search) for more information. | |
 
 ## Indexing
 On first insert, the datastore will create the collection and index if necessary on the field `embedding`. Currently hybrid search is not yet supported.
```
```diff
@@ -79,6 +79,10 @@ def queries() -> List[QueryWithEmbedding]:
 async def azurecosmosdb_datastore() -> DataStore:
     return await AzureCosmosDBDataStore.create(num_lists=num_lists, similarity=similarity)
 
+@pytest.mark.asyncio
+async def test_invalid_similarity() -> None:
+    with pytest.raises(ValueError):
+        await AzureCosmosDBDataStore.create(num_lists=num_lists, similarity="INVALID")
 
 @pytest.mark.asyncio
 async def test_upsert(
```
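`test_invalid_similarity` only asserts that `create` raises. In this PR the check runs before `apiStore.ensure(...)`, so an invalid metric never reaches index creation, and a recording fake can make that property testable without a live Cosmos DB account. A minimal sketch under that assumption, using a hypothetical `RecordingApiStore` in place of `MongoStoreApi` and a hypothetical `create_store` coroutine in place of `AzureCosmosDBDataStore.create`:

```python
import asyncio
from typing import List, Tuple

SUPPORTED_SIMILARITIES = ("COS", "L2", "IP")


class RecordingApiStore:
    """Hypothetical fake API store: records ensure() calls, performs no I/O."""

    def __init__(self) -> None:
        self.ensure_calls: List[Tuple[int, str]] = []

    async def ensure(self, num_lists: int, similarity: str) -> None:
        self.ensure_calls.append((num_lists, similarity))


async def create_store(
    api_store: RecordingApiStore, num_lists: int = 100, similarity: str = "COS"
) -> RecordingApiStore:
    # Validate before calling ensure() so a bad metric triggers no index work.
    if similarity not in SUPPORTED_SIMILARITIES:
        raise ValueError(f"Similarity {similarity} is not supported.")
    await api_store.ensure(num_lists, similarity)
    return api_store


def check_invalid_similarity_has_no_side_effects() -> None:
    store = RecordingApiStore()
    try:
        asyncio.run(create_store(store, similarity="INVALID"))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")
    # The fake proves ensure() was never reached.
    assert store.ensure_calls == []
```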