Fixes #4242: The Pinecone APOC implementation is misleading
vga91 committed Nov 29, 2024
1 parent b7d8a60 commit 28b8830
Showing 6 changed files with 88 additions and 82 deletions.
@@ -1,24 +1,32 @@

= Pinecone

[NOTE]
====
The procedures create/drop/handle an index, instead of a collection like the other vectordb procedures,
since in Pinecone a collection is a static and non-queryable copy of an index.
Anyway, the create / delete index procedures are named `.createCollection` and `.deleteCollection` to be consistent with the other.
====

Here is a list of all available Pinecone procedures:

[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.pinecone.info(hostOrKey, collection, $config) | Get information about the specified existing collection or throws a 404 error if it does not exist
| apoc.vectordb.pinecone.info(hostOrKey, index, $config) | Get information about the specified existing index or throws a 404 error if it does not exist
| apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $config) |
Creates an index, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/indexes`.
| apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $config) |
Deletes an index with the name specified in the 2nd parameter.
The default endpoint is `<hostOrKey param>/indexes/<collection param>`.
The default endpoint is `<hostOrKey param>/indexes/<index param>`.
| apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $config) |
Upserts, in the index with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', medatada: '<metadata>'}].
The default endpoint is `<hostOrKey param>/vectors/upsert`.
| apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $config) |
Delete the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/indexes/<collection param>`.
The default endpoint is `<hostOrKey param>/indexes/<index param>`.
| apoc.vectordb.pinecone.get(hostOrKey, index, ids, $config) |
Get the vectors with the specified `ids`.
The default endpoint is `<hostOrKey param>/vectors/fetch`.
@@ -35,15 +43,6 @@ Here is a list of all available Pinecone procedures:

where the 1st parameter can be a key defined by the apoc config `apoc.pinecone.<key>.host=myHost`.

[NOTE]
====
The procedures create/drop/handle an index, instead of a collection like the other vectordb procedures,
since in Pinecone a collection is a static and non-queryable copy of an index.
Anyway, the create / delete index procedures are named `.createCollection` and `.deleteCollection` to be consistent with the other.
====


The default `hostOrKey` is `"https://api.pinecone.io"`,
therefore in general can be null with the `createCollection` and `deleteCollection` procedures,
and equal to the host name, with the other ones, that is, the one indicated in the Pinecone dashboard:
@@ -55,10 +54,10 @@ image::pinecone-index.png[width=800]

The following example assume we want to create and manage an index called `test-index`.

.Get collection info (it leverages https://docs.pinecone.io/reference/api/control-plane/describe_collection[this API])
.Get index info (it leverages https://docs.pinecone.io/guides/indexes/view-index-information[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-collection', {<optional config>})
CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-index', {<optional config>})
----

.Example results
@@ -67,7 +66,7 @@ CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-collection', {<optional config
| value
| { "dimension": 3,
"environment": "us-east1-gcp",
"name": "tiny-collection",
"name": "tiny-index",
"size": 3126700,
"status": "Ready",
"vector_count": 99
@@ -262,7 +261,7 @@ It is possible to execute vector db procedures together with the xref::ml/rag.ad

[source,cypher]
----
CALL apoc.vectordb.pinecone.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
CALL apoc.vectordb.pinecone.getAndUpdate($host, $index, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
74 changes: 37 additions & 37 deletions extended/src/main/java/apoc/vectordb/Pinecone.java
@@ -43,12 +43,12 @@ public class Pinecone {
public URLAccessChecker urlAccessChecker;

@Procedure("apoc.vectordb.pinecone.info")
@Description("apoc.vectordb.pinecone.info(hostOrKey, collection, $configuration) - Get information about the specified existing collection or throws an error if it does not exist")
@Description("apoc.vectordb.pinecone.info(hostOrKey, index, $configuration) - Get information about the specified existing index or throws an error if it does not exist")
public Stream<MapResult> getInfo(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
String url = "%s/collections/%s";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
String url = "%s/indexes/%s";
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);

methodAndPayloadNull(config);

@@ -59,18 +59,18 @@ public Stream<MapResult> getInfo(@Name("hostOrKey") String hostOrKey,
}

@Procedure("apoc.vectordb.pinecone.createCollection")
@Description("apoc.vectordb.pinecone.createCollection(hostOrKey, collection, similarity, size, $configuration) - Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`")
@Description("apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $configuration) - Creates a index, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`")
public Stream<MapResult> createCollection(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("similarity") String similarity,
@Name("size") Long size,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
String url = "%s/indexes";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "POST");

Map<String, Object> additionalBodies = Map.of(
"name", collection,
"name", index,
"dimension", size,
"metric", similarity
);
@@ -81,14 +81,14 @@ public Stream<MapResult> createCollection(@Name("hostOrKey") String hostOrKey,
}

@Procedure("apoc.vectordb.pinecone.deleteCollection")
@Description("apoc.vectordb.pinecone.deleteCollection(hostOrKey, collection, $configuration) - Deletes a collection with the name specified in the 2nd parameter")
@Description("apoc.vectordb.pinecone.deleteCollection(hostOrKey, index, $configuration) - Deletes a index with the name specified in the 2nd parameter")
public Stream<MapResult> deleteCollection(
@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {

String url = "%s/indexes/%s";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "DELETE");

RestAPIConfig restAPIConfig = new RestAPIConfig(config);
@@ -98,16 +98,16 @@ public Stream<MapResult> deleteCollection(
}

@Procedure("apoc.vectordb.pinecone.upsert")
@Description("apoc.vectordb.pinecone.upsert(hostOrKey, collection, vectors, $configuration) - Upserts, in the collection with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', medatada: '<metadata>'}]")
@Description("apoc.vectordb.pinecone.upsert(hostOrKey, index, vectors, $configuration) - Upserts, in the index with the name specified in the 2nd parameter, the vectors [{id: 'id', vector: '<vectorDb>', medatada: '<metadata>'}]")
public Stream<MapResult> upsert(
@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("vectors") List<Map<String, Object>> vectors,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {

String url = "%s/vectors/upsert";

Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "POST");

vectors = vectors.stream()
@@ -126,15 +126,15 @@ public Stream<MapResult> upsert(
}

@Procedure("apoc.vectordb.pinecone.delete")
@Description("apoc.vectordb.pinecone.delete(hostOrKey, collection, ids, $configuration) - Delete the vectors with the specified `ids`")
@Description("apoc.vectordb.pinecone.delete(hostOrKey, index, ids, $configuration) - Delete the vectors with the specified `ids`")
public Stream<MapResult> delete(
@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("vectors") List<Object> ids,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {

String url = "%s/vectors/delete";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);
config.putIfAbsent(METHOD_KEY, "POST");

Map<String, Object> additionalBodies = Map.of("ids", ids);
@@ -145,29 +145,29 @@ public Stream<MapResult> delete(
}

@Procedure(value = "apoc.vectordb.pinecone.get")
@Description("apoc.vectordb.pinecone.get(hostOrKey, collection, ids, $configuration) - Get the vectors with the specified `ids`")
@Description("apoc.vectordb.pinecone.get(hostOrKey, index, ids, $configuration) - Get the vectors with the specified `ids`")
public Stream<VectorDbUtil.EmbeddingResult> get(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("ids") List<Object> ids,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
setReadOnlyMappingMode(configuration);
return getCommon(hostOrKey, collection, ids, configuration);
return getCommon(hostOrKey, index, ids, configuration);
}

@Procedure(value = "apoc.vectordb.pinecone.getAndUpdate", mode = Mode.WRITE)
@Description("apoc.vectordb.pinecone.getAndUpdate(hostOrKey, collection, ids, $configuration) - Get the vectors with the specified `ids`")
@Description("apoc.vectordb.pinecone.getAndUpdate(hostOrKey, index, ids, $configuration) - Get the vectors with the specified `ids`")
public Stream<VectorDbUtil.EmbeddingResult> getAndUpdate(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name("ids") List<Object> ids,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
return getCommon(hostOrKey, collection, ids, configuration);
return getCommon(hostOrKey, index, ids, configuration);
}

private Stream<VectorDbUtil.EmbeddingResult> getCommon(String hostOrKey, String collection, List<Object> ids, Map<String, Object> configuration) throws Exception {
private Stream<VectorDbUtil.EmbeddingResult> getCommon(String hostOrKey, String index, List<Object> ids, Map<String, Object> configuration) throws Exception {
String url = "%s/vectors/fetch";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);

VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromGet(config, procedureCallContext, ids, collection);
VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromGet(config, procedureCallContext, ids, index);

return getEmbeddingResultStream(conf, procedureCallContext, urlAccessChecker, tx,
v -> {
@@ -178,33 +178,33 @@ private Stream<VectorDbUtil.EmbeddingResult> getCommon(String hostOrKey, String
}

@Procedure(value = "apoc.vectordb.pinecone.query")
@Description("apoc.vectordb.pinecone.query(hostOrKey, collection, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the collection with the name specified in the 2nd parameter")
@Description("apoc.vectordb.pinecone.query(hostOrKey, index, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the index with the name specified in the 2nd parameter")
public Stream<VectorDbUtil.EmbeddingResult> query(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "vector", defaultValue = "[]") List<Double> vector,
@Name(value = "filter", defaultValue = "{}") Map<String, Object> filter,
@Name(value = "limit", defaultValue = "10") long limit,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
setReadOnlyMappingMode(configuration);
return queryCommon(hostOrKey, collection, vector, filter, limit, configuration);
return queryCommon(hostOrKey, index, vector, filter, limit, configuration);
}

@Procedure(value = "apoc.vectordb.pinecone.queryAndUpdate", mode = Mode.WRITE)
@Description("apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, collection, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the collection with the name specified in the 2nd parameter")
@Description("apoc.vectordb.pinecone.queryAndUpdate(hostOrKey, index, vector, filter, limit, $configuration) - Retrieve closest vectors the the defined `vector`, `limit` of results, in the index with the name specified in the 2nd parameter")
public Stream<VectorDbUtil.EmbeddingResult> queryAndUpdate(@Name("hostOrKey") String hostOrKey,
@Name("collection") String collection,
@Name("index") String index,
@Name(value = "vector", defaultValue = "[]") List<Double> vector,
@Name(value = "filter", defaultValue = "{}") Map<String, Object> filter,
@Name(value = "limit", defaultValue = "10") long limit,
@Name(value = "configuration", defaultValue = "{}") Map<String, Object> configuration) throws Exception {
return queryCommon(hostOrKey, collection, vector, filter, limit, configuration);
return queryCommon(hostOrKey, index, vector, filter, limit, configuration);
}

private Stream<VectorDbUtil.EmbeddingResult> queryCommon(String hostOrKey, String collection, List<Double> vector, Map<String, Object> filter, long limit, Map<String, Object> configuration) throws Exception {
private Stream<VectorDbUtil.EmbeddingResult> queryCommon(String hostOrKey, String index, List<Double> vector, Map<String, Object> filter, long limit, Map<String, Object> configuration) throws Exception {
String url = "%s/query";
Map<String, Object> config = getVectorDbInfo(hostOrKey, collection, configuration, url);
Map<String, Object> config = getVectorDbInfo(hostOrKey, index, configuration, url);

VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromQuery(config, procedureCallContext, vector, filter, limit, collection);
VectorEmbeddingConfig conf = DB_HANDLER.getEmbedding().fromQuery(config, procedureCallContext, vector, filter, limit, index);

return getEmbeddingResultStream(conf, procedureCallContext, urlAccessChecker, tx,
v -> {
@@ -215,7 +215,7 @@ private Stream<VectorDbUtil.EmbeddingResult> queryCommon(String hostOrKey, Strin
}

private Map<String, Object> getVectorDbInfo(
String hostOrKey, String collection, Map<String, Object> configuration, String templateUrl) {
return getCommonVectorDbInfo(hostOrKey, collection, configuration, templateUrl, DB_HANDLER);
String hostOrKey, String index, Map<String, Object> configuration, String templateUrl) {
return getCommonVectorDbInfo(hostOrKey, index, configuration, templateUrl, DB_HANDLER);
}
}
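
Note for readers of this diff (not part of the commit): the procedures above each pair a URL template with the `hostOrKey` and `index` arguments. The sketch below illustrates only that pairing, assuming plain `String.format` substitution and a fallback to the documented default host when `hostOrKey` is null. The class `PineconeEndpointSketch` and its `endpoint` helper are hypothetical names; the real resolution happens in `getCommonVectorDbInfo`, whose body is not shown here, and the lookup of an `apoc.pinecone.<key>.host` key is not modeled.

```java
import java.util.Map;

// Hypothetical illustration only; this class does not exist in APOC.
final class PineconeEndpointSketch {
    // Default host stated in the Pinecone docs page changed by this commit.
    static final String DEFAULT_HOST = "https://api.pinecone.io";

    // URL templates copied from the procedures in Pinecone.java above.
    static final Map<String, String> TEMPLATES = Map.of(
            "info", "%s/indexes/%s",
            "createCollection", "%s/indexes",
            "deleteCollection", "%s/indexes/%s",
            "upsert", "%s/vectors/upsert",
            "delete", "%s/vectors/delete",
            "get", "%s/vectors/fetch",
            "query", "%s/query");

    // Assumption: a null hostOrKey falls back to the default host, and the
    // template is filled positionally (host first, then index name).
    static String endpoint(String procedure, String hostOrKey, String index) {
        String template = TEMPLATES.get(procedure);
        if (template == null) {
            throw new IllegalArgumentException("Unknown procedure: " + procedure);
        }
        String host = (hostOrKey == null) ? DEFAULT_HOST : hostOrKey;
        // String.format ignores surplus arguments, so templates that take
        // only the host simply skip the index name.
        return String.format(template, host, index);
    }

    public static void main(String[] args) {
        System.out.println(endpoint("info", null, "test-index"));
        System.out.println(endpoint("delete", "https://my-index-host", "test-index"));
    }
}
```

In this sketch the `delete` entry follows the Java code (`%s/vectors/delete`), whereas the AsciiDoc table in the same commit lists `<hostOrKey param>/indexes/<index param>` as the default endpoint for `apoc.vectordb.pinecone.delete`.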
4 changes: 2 additions & 2 deletions extended/src/main/java/apoc/vectordb/PineconeHandler.java
@@ -53,7 +53,7 @@ static class PineconeEmbeddingHandler implements VectorEmbeddingHandler {
* that makes the request to respond 200 OK, but returns an empty result
*/
@Override
public <T> VectorEmbeddingConfig fromGet(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<T> ids, String collection) {
public <T> VectorEmbeddingConfig fromGet(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<T> ids, String index) {
List<String> fields = procedureCallContext.outputFields().toList();

config.put(BODY_KEY, null);
@@ -74,7 +74,7 @@ public <T> VectorEmbeddingConfig fromGet(Map<String, Object> config, ProcedureCa
}

@Override
public VectorEmbeddingConfig fromQuery(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<Double> vector, Object filter, long limit, String collection) {
public VectorEmbeddingConfig fromQuery(Map<String, Object> config, ProcedureCallContext procedureCallContext, List<Double> vector, Object filter, long limit, String index) {
List<String> fields = procedureCallContext.outputFields().toList();

Map<String, Object> additionalBodies = map("vector", vector,