From 82c0e4b267412c1737fcae7da2f92d710780ec1f Mon Sep 17 00:00:00 2001
From: Andreas Motl
Date: Wed, 13 Mar 2024 10:39:28 +0100
Subject: [PATCH] Feature / Vector Store: Implement page

---
 docs/_include/links.md           |   9 ++
 docs/conf.py                     |   1 +
 docs/domain/ml/index.md          |  10 +--
 docs/feature/index.md            |   1 +
 docs/feature/relational/index.md |   6 +-
 docs/feature/vector/index.md     | 149 +++++++++++++++++++++++++++++--
 6 files changed, 160 insertions(+), 16 deletions(-)

diff --git a/docs/_include/links.md b/docs/_include/links.md
index ba00a89e..765f8bc9 100644
--- a/docs/_include/links.md
+++ b/docs/_include/links.md
@@ -5,8 +5,17 @@
 [HoloViews]: https://www.holoviews.org/
 [Indexing, Columnar Storage, and Aggregations]: https://cratedb.com/product/features/indexing-columnar-storage-aggregations
 [JSON Database]: https://cratedb.com/solutions/json-database
+[LangChain and CrateDB: Code Examples]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain
+[langchain-similarity-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb
+[langchain-similarity-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
+[langchain-similarity-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
+[langchain-rag-sql-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb
+[langchain-rag-sql-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
+[langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
 [Multi-model Database]: https://cratedb.com/solutions/multi-model-database
 [Nested Data Structure]: https://cratedb.com/product/features/nested-data-structure
 [Relational Database]: https://cratedb.com/solutions/relational-database
 [timeseries-queries-and-visualization-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
 [timeseries-queries-and-visualization-github]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
+[Vector Database (Product)]: https://cratedb.com/solutions/vector-database
+[Vector Database]: https://en.wikipedia.org/wiki/Vector_database
diff --git a/docs/conf.py b/docs/conf.py
index c6b1579c..560631d6 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -51,6 +51,7 @@
 
 myst_substitutions.update({
     "nb_colab": "[![Notebook on Colab](https://img.shields.io/badge/Open-Notebook%20on%20Colab-blue?logo=Google%20Colab)]",
+    "nb_binder": "[![Notebook on Binder](https://img.shields.io/badge/Open-Notebook%20on%20Binder-lightblue?logo=binder)]",
     "nb_github": "[![Notebook on GitHub](https://img.shields.io/badge/Open-Notebook%20on%20GitHub-darkgreen?logo=GitHub)]",
     "readme_github": "[![README](https://img.shields.io/badge/Open-README-darkblue?logo=GitHub)]",
     "blog": "[![Blog](https://img.shields.io/badge/Open-Blog-darkblue?logo=Markdown)]",
diff --git a/docs/domain/ml/index.md b/docs/domain/ml/index.md
index 76c406be..a4052c9f 100644
--- a/docs/domain/ml/index.md
+++ b/docs/domain/ml/index.md
@@ -3,6 +3,11 @@
 
 # Machine Learning
 
+:::{include} /_include/links.md
+:::
+:::{include} /_include/styles.html
+:::
+
 Integrate CrateDB with machine learning frameworks and tools, for [MLOps] and
 [Vector database] operations.
 
@@ -321,16 +326,12 @@ tensorflow
 [LangChain: Analyzing structured data]: https://python.langchain.com/docs/use_cases/qa_structured/sql
 [LangChain: Chatbots]: https://python.langchain.com/docs/use_cases/chatbots
 [LangChain: Retrieval augmented generation]: https://python.langchain.com/docs/use_cases/question_answering/
-[LangChain and CrateDB: Code Examples]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain
 [langchain-conversational-history-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fconversational_memory.ipynb
 [langchain-conversational-history-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb
 [langchain-conversational-history-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb
 [langchain-document-loader-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fdocument_loader.ipynb
 [langchain-document-loader-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb
 [langchain-document-loader-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb
-[langchain-similarity-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb
-[langchain-similarity-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
-[langchain-similarity-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
 [Machine Learning and CrateDB: An introduction]: https://cratedb.com/blog/machine-learning-and-cratedb-part-one
 [Machine Learning and CrateDB: Getting Started With Jupyter]: https://cratedb.com/blog/machine-learning-cratedb-jupyter
 [Machine Learning and CrateDB: Experiment Design & Linear Regression]: https://cratedb.com/blog/machine-learning-and-cratedb-part-three-experiment-design-and-linear-regression
@@ -344,4 +345,3 @@ tensorflow
 [Time Series Modeling using Machine Learning]: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data
 [tracking-merlion-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb
 [tracking-merlion-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb
-[Vector database]: https://en.wikipedia.org/wiki/Vector_database
diff --git a/docs/feature/index.md b/docs/feature/index.md
index 6e5f1cc2..1bdcdac2 100644
--- a/docs/feature/index.md
+++ b/docs/feature/index.md
@@ -19,6 +19,7 @@ relational/index
 document/index
 search/index
 geospatial/index
+vector/index
 :::
 +++
 CrateDB combines the power of Lucene with the advantages of
diff --git a/docs/feature/relational/index.md b/docs/feature/relational/index.md
index a8b5805f..69ec3b04 100644
--- a/docs/feature/relational/index.md
+++ b/docs/feature/relational/index.md
@@ -5,6 +5,8 @@
 
 :::{include} /_include/links.md
 :::
+:::{include} /_include/styles.html
+:::
 
 :::::{grid}
 :padding: 0
@@ -209,7 +211,3 @@ SET optimizer_reorder_nested_loop_join = false;
 [manual-join-concept]: inv:crate-reference#concept-joins
 [manual-join-types]: inv:crate-reference#sql_joins
 [manual-joined-relation]: inv:crate-reference#sql-select-joined-relation
-
-
-```{include} /_include/styles.html
-```
diff --git a/docs/feature/vector/index.md b/docs/feature/vector/index.md
index fa2f7e4e..565d8aca 100644
--- a/docs/feature/vector/index.md
+++ b/docs/feature/vector/index.md
@@ -1,20 +1,155 @@
----
-orphan: true
----
-
 (hnsw)=
 (vector)=
 # Vector Store
 
-:::{todo} Implement.
+:::{include} /_include/links.md
+:::
+:::{include} /_include/styles.html
+:::
+
+:::::{grid}
+:padding: 0
+
+::::{grid-item}
+:class: rubric-slim
+:columns: 9
+
+
+:::{rubric} Overview
+:::
+CrateDB can be used as a [Vector Database] for storing and retrieving
+vector embeddings.
+
+:::{rubric} About
+:::
+CrateDB's `FLOAT_VECTOR` type and its `KNN_MATCH` function can be used
+for storing and retrieving embeddings, and for conducting HNSW similarity
+searches.
+::::
+
+
+
+::::{grid-item}
+:class: rubric-slim
+:columns: 3
+
+```{rubric} Reference Manual
+```
+- [](inv:crate-reference#type-float_vector)
+- [](inv:crate-reference#scalar_knn_match)
+
+```{rubric} Related
+```
+- {ref}`sql`
+- {ref}`machine-learning`
+- {ref}`query`
+
+{tags-primary}`SQL`
+{tags-primary}`Vector Store`
+{tags-primary}`Machine Learning`
+::::
+
+:::::
+
+
+:::{rubric} Tutorials
+:::
+
+::::{info-card}
+:::{grid-item} **Vector Support and KNN Search through SQL**
+:columns: 9
+
+The addition of vector support and KNN search makes CrateDB the optimal
+multi-model database for all types of data. Whether it is structured,
+semi-structured, or unstructured data, CrateDB stands as the all-in-one
+solution, capable of handling diverse data types with ease.
+
+In this feature-focused blog post, we will introduce how CrateDB can be
+used as a vector database and how the vector store is implemented.
+We will also explore the possibilities of the K-Nearest Neighbors (KNN)
+search, and demonstrate vector capabilities with easy-to-follow examples.
+
+{{ '{}[Vector support and KNN search in CrateDB]'.format(blog) }}
 :::
+:::{grid-item}
+:columns: 3
+{tags-primary}`Introduction` \
+{tags-secondary}`Vector Store` \
+{tags-secondary}`SQL`
+:::
+::::
+
+
+::::{info-card}
+:::{grid-item} **Retrieval Augmented Generation (RAG) with CrateDB and SQL**
+:columns: 9
+
+This notebook illustrates CrateDB's vector store using pure SQL, by means
+of an example that exercises a RAG workflow.
+
+It uses the white paper [Time series data in manufacturing] as input data,
+generates embeddings using an OpenAI embedding model, stores them in a table
+using `FLOAT_VECTOR(1536)`, and queries it using the `KNN_MATCH` function.
+
+{{ '{}[langchain-rag-sql-github]'.format(nb_github) }} {{ '{}[langchain-rag-sql-colab]'.format(nb_colab) }} {{ '{}[langchain-rag-sql-binder]'.format(nb_binder) }}
+:::
+:::{grid-item}
+:columns: 3
+{tags-primary}`Fundamentals` \
+{tags-secondary}`Vector Store` \
+{tags-secondary}`LangChain` \
+{tags-secondary}`pandas` \
+{tags-secondary}`SQL`
+:::
+::::
+
+
+:::{rubric} Videos
+:::
+
+::::{info-card}
+
+:::{grid-item} **How to Use Private Data in Generative AI?**
+:columns: auto auto 8 8
+
+In this video recorded at FOSDEM 2024, we explain how to leverage private data
+in generative AI and which end-to-end solution is needed to apply Retrieval
+Augmented Generation (RAG).
+
+- [How to Use Private Data in Generative AI?]
+:::
+
+:::{grid-item}
+:columns: auto auto 4 4
+
+
+
+
+{tags-primary}`Fundamentals` \
+{tags-secondary}`RAG`
+:::
+
+::::
+
 :::{seealso}
+**Features:**
+[](#querying) •
+[](#fulltext)
+
+**Domains:**
+[](#industrial) •
+[](#machine-learning) •
+[](#timeseries)
+
 **Product:**
-[Relational Database]
+[Relational Database] •
+[Vector Database][Vector Database (Product)]
 :::
-[Relational Database]: https://cratedb.com/solutions/relational-database
+[How to Use Private Data in Generative AI?]: https://youtu.be/icquKckM4o0?feature=shared
+[Time series data in manufacturing]: https://github.com/crate/cratedb-datasets/raw/main/machine-learning/fulltext/White%20paper%20-%20Time-series%20data%20in%20manufacturing.pdf
+[Vector support and KNN search in CrateDB]: https://cratedb.com/blog/unlocking-the-power-of-vector-support-and-knn-search-in-cratedb