Feature / Vector Store: Implement page
amotl committed Mar 13, 2024
1 parent 22ea29a commit 82c0e4b
Showing 6 changed files with 160 additions and 16 deletions.
9 changes: 9 additions & 0 deletions docs/_include/links.md
@@ -5,8 +5,17 @@
[HoloViews]: https://www.holoviews.org/
[Indexing, Columnar Storage, and Aggregations]: https://cratedb.com/product/features/indexing-columnar-storage-aggregations
[JSON Database]: https://cratedb.com/solutions/json-database
[LangChain and CrateDB: Code Examples]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain
[langchain-similarity-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb
[langchain-similarity-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[langchain-similarity-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[langchain-rag-sql-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fcratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[langchain-rag-sql-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/cratedb-vectorstore-rag-openai-sql.ipynb
[Multi-model Database]: https://cratedb.com/solutions/multi-model-database
[Nested Data Structure]: https://cratedb.com/product/features/nested-data-structure
[Relational Database]: https://cratedb.com/solutions/relational-database
[timeseries-queries-and-visualization-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
[timeseries-queries-and-visualization-github]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/timeseries-queries-and-visualization.ipynb
[Vector Database (Product)]: https://cratedb.com/solutions/vector-database
[Vector Database]: https://en.wikipedia.org/wiki/Vector_database
1 change: 1 addition & 0 deletions docs/conf.py
@@ -51,6 +51,7 @@

myst_substitutions.update({
"nb_colab": "[![Notebook on Colab](https://img.shields.io/badge/Open-Notebook%20on%20Colab-blue?logo=Google%20Colab)]",
"nb_binder": "[![Notebook on Binder](https://img.shields.io/badge/Open-Notebook%20on%20Binder-lightblue?logo=binder)]",
"nb_github": "[![Notebook on GitHub](https://img.shields.io/badge/Open-Notebook%20on%20GitHub-darkgreen?logo=GitHub)]",
"readme_github": "[![README](https://img.shields.io/badge/Open-README-darkblue?logo=GitHub)]",
"blog": "[![Blog](https://img.shields.io/badge/Open-Blog-darkblue?logo=Markdown)]",
10 changes: 5 additions & 5 deletions docs/domain/ml/index.md
@@ -3,6 +3,11 @@

# Machine Learning

:::{include} /_include/links.md
:::
:::{include} /_include/styles.html
:::

Integrate CrateDB with machine learning frameworks and
tools, for [MLOps] and [Vector database] operations.

@@ -321,16 +326,12 @@ tensorflow
[LangChain: Analyzing structured data]: https://python.langchain.com/docs/use_cases/qa_structured/sql
[LangChain: Chatbots]: https://python.langchain.com/docs/use_cases/chatbots
[LangChain: Retrieval augmented generation]: https://python.langchain.com/docs/use_cases/question_answering/
[LangChain and CrateDB: Code Examples]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/llm-langchain
[langchain-conversational-history-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fconversational_memory.ipynb
[langchain-conversational-history-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb
[langchain-conversational-history-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb
[langchain-document-loader-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fdocument_loader.ipynb
[langchain-document-loader-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb
[langchain-document-loader-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb
[langchain-similarity-binder]: https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb
[langchain-similarity-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[langchain-similarity-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb
[Machine Learning and CrateDB: An introduction]: https://cratedb.com/blog/machine-learning-and-cratedb-part-one
[Machine Learning and CrateDB: Getting Started With Jupyter]: https://cratedb.com/blog/machine-learning-cratedb-jupyter
[Machine Learning and CrateDB: Experiment Design & Linear Regression]: https://cratedb.com/blog/machine-learning-and-cratedb-part-three-experiment-design-and-linear-regression
@@ -344,4 +345,3 @@ tensorflow
[Time Series Modeling using Machine Learning]: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data
[tracking-merlion-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb
[tracking-merlion-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb
[Vector database]: https://en.wikipedia.org/wiki/Vector_database
1 change: 1 addition & 0 deletions docs/feature/index.md
@@ -19,6 +19,7 @@ relational/index
document/index
search/index
geospatial/index
vector/index
:::
+++
CrateDB combines the power of Lucene with the advantages of
6 changes: 2 additions & 4 deletions docs/feature/relational/index.md
@@ -5,6 +5,8 @@

:::{include} /_include/links.md
:::
:::{include} /_include/styles.html
:::

:::::{grid}
:padding: 0
@@ -209,7 +211,3 @@ SET optimizer_reorder_nested_loop_join = false;
[manual-join-concept]: inv:crate-reference#concept-joins
[manual-join-types]: inv:crate-reference#sql_joins
[manual-joined-relation]: inv:crate-reference#sql-select-joined-relation


```{include} /_include/styles.html
```
149 changes: 142 additions & 7 deletions docs/feature/vector/index.md
@@ -1,20 +1,155 @@
---
orphan: true
---

(hnsw)=
(vector)=

# Vector Store

:::{todo} Implement.
:::{include} /_include/links.md
:::
:::{include} /_include/styles.html
:::

:::::{grid}
:padding: 0

::::{grid-item}
:class: rubric-slim
:columns: 9


:::{rubric} Overview
:::
CrateDB can be used as a [Vector Database] for storing and retrieving
vector embeddings.

:::{rubric} About
:::
CrateDB's `FLOAT_VECTOR` type and its `KNN_MATCH` function can be used
for storing and retrieving embeddings, and for conducting HNSW similarity
searches.
::::



::::{grid-item}
:class: rubric-slim
:columns: 3

```{rubric} Reference Manual
```
- [](inv:crate-reference#type-float_vector)
- [](inv:crate-reference#scalar_knn_match)

```{rubric} Related
```
- {ref}`sql`
- {ref}`machine-learning`
- {ref}`query`

{tags-primary}`SQL`
{tags-primary}`Vector Store`
{tags-primary}`Machine Learning`
::::

:::::
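
As a minimal sketch of how the `FLOAT_VECTOR` type and the `KNN_MATCH`
function fit together, the following Python snippet only assembles SQL
statements; it does not connect to a database. The table name
`word_embeddings`, the column names `text` and `embedding`, and all helper
functions are hypothetical and chosen for illustration. The statement shapes
assume the forms `FLOAT_VECTOR(n)` and `KNN_MATCH(column, vector, k)`; the
reference manual remains the authoritative source for the syntax.

```python
from typing import Sequence


def vector_literal(vector: Sequence[float]) -> str:
    """Render a sequence of numbers as a SQL array literal."""
    return "[" + ", ".join(repr(float(value)) for value in vector) + "]"


def create_table_sql(table: str, dimensions: int) -> str:
    """Build a CREATE TABLE statement with a FLOAT_VECTOR column.

    The table layout (text + embedding) is an assumption for this sketch.
    """
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"(text TEXT PRIMARY KEY, embedding FLOAT_VECTOR({dimensions}))"
    )


def knn_query_sql(table: str, vector: Sequence[float], k: int) -> str:
    """Build a similarity search returning the k best matches by score."""
    # Identifiers are interpolated directly, so only pass trusted names here.
    return (
        f"SELECT text, _score FROM {table} "
        f"WHERE KNN_MATCH(embedding, {vector_literal(vector)}, {k}) "
        f"ORDER BY _score DESC LIMIT {k}"
    )


print(create_table_sql("word_embeddings", 4))
print(knn_query_sql("word_embeddings", [0.3, 0.6, 0.0, 0.9], 2))
```

The generated statements can be submitted with any CrateDB client. In real
applications, bound parameters are usually preferable to string
interpolation for the vector values.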


:::{rubric} Tutorials
:::

::::{info-card}
:::{grid-item} **Vector Support and KNN Search through SQL**
:columns: 9

The addition of vector support and KNN search makes CrateDB the optimal
multi-model database for all types of data. Whether it is structured,
semi-structured, or unstructured data, CrateDB stands as the all-in-one
solution, capable of handling diverse data types with ease.

In this feature-focused blog post, we will introduce how CrateDB can be
used as a vector database and how the vector store is implemented.
We will also explore the possibilities of the K-Nearest Neighbors (KNN)
search, and demonstrate vector capabilities with easy-to-follow examples.

{{ '{}[Vector support and KNN search in CrateDB]'.format(blog) }}
:::
:::{grid-item}
:columns: 3
{tags-primary}`Introduction` \
{tags-secondary}`Vector Store` \
{tags-secondary}`SQL`
:::
::::


::::{info-card}
:::{grid-item} **Retrieval Augmented Generation (RAG) with CrateDB and SQL**
:columns: 9

This notebook illustrates CrateDB's vector store using pure SQL, by way of
an example that exercises a RAG workflow.

It uses the white paper [Time series data in manufacturing] as input data,
generates embeddings using an OpenAI embedding model, stores them in a table
using `FLOAT_VECTOR(1536)`, and queries it using the `KNN_MATCH` function.

{{ '{}[langchain-rag-sql-github]'.format(nb_github) }} {{ '{}[langchain-rag-sql-colab]'.format(nb_colab) }} {{ '{}[langchain-rag-sql-binder]'.format(nb_binder) }}
:::
:::{grid-item}
:columns: 3
{tags-primary}`Fundamentals` \
{tags-secondary}`Vector Store` \
{tags-secondary}`LangChain` \
{tags-secondary}`pandas` \
{tags-secondary}`SQL`
:::
::::
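
The retrieval step of the RAG workflow described in the card above can be
sketched conceptually in plain Python. This is not the notebook's code: the
three-dimensional toy vectors stand in for real 1536-dimensional embeddings,
the in-memory dictionary stands in for the database table, and the exhaustive
Euclidean search stands in for the index-backed `KNN_MATCH` query. All names
are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(
    store: Dict[str, Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Return the k stored texts closest to the query vector (brute force)."""
    ranked = sorted(
        ((text, euclidean(vector, query)) for text, vector in store.items()),
        key=lambda item: item[1],
    )
    return ranked[:k]


def build_prompt(question: str, context: Sequence[str]) -> str:
    """Assemble an LLM prompt from the retrieved context snippets."""
    joined = "\n".join(f"- {snippet}" for snippet in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Toy "table": text chunks with made-up embeddings (assumption for the sketch).
store = {
    "Sensors emit time series data.": [0.9, 0.1, 0.0],
    "Invoices are due in 30 days.": [0.0, 0.2, 0.9],
    "Machines report vibration over time.": [0.8, 0.3, 0.1],
}

# Made-up embedding of the user's question.
query_vector = [1.0, 0.2, 0.0]

hits = nearest(store, query_vector, k=2)
prompt = build_prompt(
    "What data do machines produce?", [text for text, _ in hits]
)
print(prompt)
```

In an actual setup, the embeddings would come from an embedding model, and
the nearest-neighbor search would run inside the database rather than in
application code.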


:::{rubric} Videos
:::

::::{info-card}

:::{grid-item} **How to Use Private Data in Generative AI?**
:columns: auto auto 8 8

In this video, recorded at FOSDEM 2024, we explain how to leverage private
data in generative AI, and which end-to-end solution is needed for Retrieval
Augmented Generation (RAG).

- [How to Use Private Data in Generative AI?]
:::

:::{grid-item}
:columns: auto auto 4 4

<iframe width="240" src="https://www.youtube-nocookie.com/embed/icquKckM4o0?si=J0w5yG56Ld4fIXfm" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>
&nbsp;

{tags-primary}`Fundamentals` \
{tags-secondary}`RAG`
:::

::::



:::{seealso}
**Features:**
[](#querying)
[](#fulltext)

**Domains:**
[](#industrial)
[](#machine-learning)
[](#timeseries)

**Product:**
[Relational Database]
[Vector Database][Vector Database (Product)]
:::


[Relational Database]: https://cratedb.com/solutions/relational-database
[How to Use Private Data in Generative AI?]: https://youtu.be/icquKckM4o0?feature=shared
[Time series data in manufacturing]: https://github.com/crate/cratedb-datasets/raw/main/machine-learning/fulltext/White%20paper%20-%20Time-series%20data%20in%20manufacturing.pdf
[Vector support and KNN search in CrateDB]: https://cratedb.com/blog/unlocking-the-power-of-vector-support-and-knn-search-in-cratedb
