-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
.Net: Add PostgresVectorStore Memory connector. #9324
base: main
Are you sure you want to change the base?
.Net: Add PostgresVectorStore Memory connector. #9324
Conversation
Work in progress, some methods are not implemented yet.
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresConstants.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresVectorStoreRecordCollection.cs
Outdated
Show resolved
Hide resolved
dotnet/src/Connectors/Connectors.Memory.Postgres/PostgresVectorStoreRecordCollection.cs
Outdated
Show resolved
Hide resolved
…ctor-store-dotnet
new VectorStoreRecordDataProperty("code", typeof(int)), | ||
new VectorStoreRecordDataProperty("rating", typeof(float?)), | ||
new VectorStoreRecordDataProperty("description", typeof(string)), | ||
new VectorStoreRecordDataProperty("parking_is_included", typeof(bool)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth having a record definition where some of the properties have storage names that are different to the main property name, to make sure these are used correctly in the command. Same with the other tests here that have to build names into the command.
{ | ||
return new VectorStoreRecordDefinition | ||
{ | ||
Properties = new List<VectorStoreRecordProperty> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to add one or two enumerables to the property list as well to verify that they word as expected.
{ | ||
Properties = new List<VectorStoreRecordProperty> | ||
{ | ||
new VectorStoreRecordKeyProperty("HotelId", typeof(ulong)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this anywhere? From the constants list I don't believe we support ulong, so I would expect using this to fail.
[InlineData(typeof(long), 7L)] | ||
[InlineData(typeof(string), "key1")] | ||
[InlineData(typeof(Guid), null)] | ||
public async Task ItCanGetAndDeleteRecordAsync(Type idType, object? key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also possible to make test methods generic, e.g. public async Task ItCanGetAndDeleteRecordAsync<TKey>(Type idType, TKey? key)
, which means you can use TKey in the method and you don't have to use dynamic.
namespace SemanticKernel.IntegrationTests.Connectors.Memory.Postgres; | ||
|
||
[Collection("PostgresVectorStoreCollection")] | ||
public sealed class PostgresVectorStoreRecordCollectionTests(PostgresVectorStoreFixture fixture) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a test that reads some data that wasn't upserted by the Collection. E.g. if you create a table and upsert some records using SQL, and then read them using the Collection. The reason being that we want to ensure that the data in the DB is actually what we want it to be. E.g. if we do two complimentary things wrong, on upsert and read, so it looks fine when using the Collection, but the data isn't stored in the way we need it to in the DB, a test where we didn't write the data via the Collection should catch this.
…ctor-store-dotnet
…ctor-store-dotnet
@westey-m most recent comments addressed, thanks! |
…ctor-store-dotnet
Thanks for the work on this feature. We are currently using the legacy Memory Store but that was recently marked as legacy and the connector for Postgres is currently marked as in-development, but as i see here it is probably at-least ready for alpha testing. Is it available for testing already in any channel or is any ETA available for it? Again, sorry if that is not the right place to ask. Thanks! |
@Hanake0, no problem. It's been a bit busy lately with various deadlines, but I was meaning to get back to this again this week. With the start of the holiday period, giving an ETA will be difficult, but we'll certainly be progressing it as a priority. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of small comments, but in general looks good to me! Thanks for your contribution!
this._postgresContainerId = await VectorStoreInfra.SetupPostgresContainerAsync(this._dockerClient); | ||
|
||
// Delay until the Postgres server is ready. | ||
var connectionString = "Host=localhost;Port=5432;Username=postgres;Password=example;Database=postgres;"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if it would be better to get this connection string from .NET user secrets to avoid a pattern of keeping connection strings directly in a code, even if it's a test one. I think it's okay to keep it in the code if connection string is just a URL, but in this case, it also contains a password.
/// <summary> | ||
/// The connection string to the Postgres database hosted in the docker container. | ||
/// </summary> | ||
private const string ConnectionString = "Host=localhost;Port=5432;Username=postgres;Password=example;Database=postgres;"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, we should not encourage users to keep the connection strings in a codebase, even if it's a test settings. We can do it in similar way how it's implemented in Azure AI Search, as an example:
semantic-kernel/dotnet/samples/Concepts/Memory/VectorStore_VectorSearch_MultiStore_AzureAISearch.cs
Lines 25 to 28 in 370c89a
/// To set your secrets use: | |
/// <para> dotnet user-secrets set "AzureAISearch:Endpoint" "https://... .search.windows.net"</para> | |
/// <para> dotnet user-secrets set "AzureAISearch:ApiKey" "{Key from your Search service resource}"</para> | |
/// </summary> |
semantic-kernel/dotnet/samples/Concepts/Memory/VectorStore_VectorSearch_MultiStore_AzureAISearch.cs
Lines 45 to 47 in 370c89a
kernelBuilder.AddAzureAISearchVectorStore( | |
new Uri(TestConfiguration.AzureAISearch.Endpoint), | |
new AzureKeyCredential(TestConfiguration.AzureAISearch.ApiKey)); |
/// <remarks> | ||
/// This interface is used with the PostgresMemoryStore, which is being deprecated. | ||
/// Use the <see cref="IPostgresVectorStoreDbClient"/> interface with the PostgresVectorStore | ||
/// and related classes instead. | ||
/// </remarks> | ||
public interface IPostgresDbClient |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPostgresVectorStoreDbClient
is internal, but this interface and documentation is public. Instead of this XML documentation we can mark this interface as Obsolete
and recommend using new PostgresVectorStore
.
@@ -15,6 +15,11 @@ namespace Microsoft.SemanticKernel.Connectors.Postgres; | |||
/// <summary> | |||
/// An implementation of a client for Postgres. This class is used to managing postgres database operations. | |||
/// </summary> | |||
/// <remarks> | |||
/// This class is used with the PostgresMemoryStore, which is being deprecated. | |||
/// Use the <see cref="PostgresVectorStoreDbClient"/> class with the PostgresVectorStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above, PostgresVectorStoreDbClient
is internal. PostgresVectorStore
and/or PostgresVectorStoreRecordCollection
can be used as a new reference points.
this._propertyReader.VerifyDataProperties(PostgresConstants.SupportedDataTypes, PostgresConstants.SupportedEnumerableDataElementTypes); | ||
this._propertyReader.VerifyVectorProperties(PostgresConstants.SupportedVectorTypes); | ||
} | ||
public Dictionary<string, object?> MapFromDataToStorageModel(VectorStoreGenericDataModel<TKey> dataModel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
public Dictionary<string, object?> MapFromDataToStorageModel(VectorStoreGenericDataModel<TKey> dataModel) | |
public Dictionary<string, object?> MapFromDataToStorageModel(VectorStoreGenericDataModel<TKey> dataModel) |
/// function on each element of the original sequence. | ||
/// </returns> | ||
/// <exception cref="ArgumentNullException">Thrown when the source or selector is null.</exception> | ||
public static async IAsyncEnumerable<TResult> Select<TSource, TResult>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we can't name this method as SelectAsync
?
This PR adds a PostgresVectorStore and related classes to Microsoft.SemanticKernel.Connectors.Postgres.
Motivation and Context
As part of the move to having memory connectors implement the new Microsoft.Extensions.VectorData.IVectorStore architecture (see https://github.com/microsoft/semantic-kernel/blob/main/docs/decisions/0050-updated-vector-store-design.md), each memory connector needs to be updated with the new architecture. This PR tackles updating the existing Microsoft.SemanticKernel.Connectors.Postgres package to include this implementation. This will supercede the PostgresMemoryStore implementation.
Some high level comments about design:
Contribution Checklist