Machine Learning: Rework intro page
amotl committed Mar 8, 2024
1 parent 2ca9b64 commit b749193
Showing 1 changed file with 141 additions and 28 deletions.
169 changes: 141 additions & 28 deletions docs/domain/ml/index.md
@@ -3,52 +3,104 @@

# Machine Learning

Guidelines about integrating CrateDB with machine learning frameworks and tools.
Integrate CrateDB with machine learning frameworks and
tools, for [MLOps] and [Vector database] operations.

(langchain)=
## LangChain

Tutorials and Notebooks about using [LangChain] together with CrateDB.
:::::{grid}
:padding: 0

- [LangChain and CrateDB: Code Examples]
::::{grid-item}
:class: rubric-slimmer
:columns: 6

- CrateDB's `FLOAT_VECTOR` type and its `KNN_MATCH` function can be used for storing and
retrieving embeddings, and for conducting similarity searches.
:::{rubric} Machine Learning Operations
:::
Training a machine learning model, running it in production, and maintaining
it require a significant amount of data processing and bookkeeping.

[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb)
CrateDB, as a universal SQL database, supports this process through
adapters to best-of-breed software components for MLOps procedures.

- Database tables in CrateDB can be used as a source provider for LangChain documents.
MLOps is a paradigm that aims to deploy and maintain machine learning models
in production reliably and efficiently. It includes experiment tracking, and
follows the spirit of continuous development and DevOps.
::::

[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fdocument_loader.ipynb)
::::{grid-item}
:class: rubric-slimmer
:columns: 6

- CrateDB supports managing LangChain's conversation history.
:::{rubric} Vector Store
:::
CrateDB's `FLOAT_VECTOR` data type implements a vector store, and its
`KNN_MATCH` function runs a k-nearest neighbour (kNN) search to find vectors
that are similar to a query vector.

[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fconversational_memory.ipynb)
These feature vectors may be computed from raw data using machine learning
methods such as feature extraction algorithms, word embeddings, or deep
learning networks.

- What can you build with LangChain?
Vector databases can be used for similarity search, multi-modal search,
recommendation engines, large language models (LLMs), retrieval-augmented
generation (RAG), and other applications.
::::

:::::

- [LangChain: Retrieval augmented generation]
- [LangChain: Analyzing structured data]
- [LangChain: Chatbots]

## Anomaly Detection and Forecasting


(mlflow)=
## MLflow
### MLflow

Tutorials and Notebooks about using [MLflow] together with CrateDB.

- Blog series on "Running Time Series Models in Production using CrateDB"
- Part 1: [Introduction to Time Series Modeling using Machine Learning]
::::{info-card}
:::{grid-item} **Blog: Running Time Series Models in Production using CrateDB**
:columns: 9

Part 1: Introduction to [Time Series Modeling using Machine Learning]

The article will introduce you to the concept of time series modeling, and
discuss the main obstacles faced during its implementation in production.

It will introduce you to CrateDB, highlighting its key features and
benefits, why it stands out in managing time series data, and why it is
an especially good fit for supporting machine learning models in production.
:::
:::{grid-item}
:columns: 3
{tags-primary}`Fundamentals` \
{tags-secondary}`Time Series Modeling`
:::
::::

- [MLflow and CrateDB]: Guidelines and runnable code to get started with MLflow and
CrateDB, exercising time series anomaly detection and time series forecasting /
prediction using NumPy, Merlion, and Matplotlib.

[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb)
::::{info-card}
:::{grid-item} **Notebook: Create a Time Series Anomaly Detection Model**
:columns: 9

Guidelines and runnable code to get started with MLflow and
CrateDB, exercising time series anomaly detection and time series forecasting /
prediction using NumPy, Salesforce Merlion, and Matplotlib.

[![README](https://img.shields.io/badge/Open-README-darkblue?logo=GitHub)][MLflow and CrateDB]
[![Notebook on GitHub](https://img.shields.io/badge/Open-Notebook%20on%20GitHub-darkgreen?logo=GitHub)][tracking-merlion-github]
[![Notebook on Colab](https://img.shields.io/badge/Open-Notebook%20on%20Colab-blue?logo=Google%20Colab)][tracking-merlion-colab]
:::
:::{grid-item}
:columns: 3
{tags-primary}`Fundamentals` \
{tags-secondary}`Anomaly Detection`
:::
::::


(pycaret)=
## PyCaret
### PyCaret

Tutorials and Notebooks about using [PyCaret] together with CrateDB.

@@ -65,21 +117,78 @@ Tutorials and Notebooks about using [PyCaret] together with CrateDB.


(scikit-learn)=
## scikit-learn
### scikit-learn

Using [pandas] and [scikit-learn] to run a regression analysis within a [Jupyter Notebook].
Using [pandas] and [scikit-learn] to run a regression analysis within a
[Jupyter Notebook].

- [Machine Learning and CrateDB: An introduction]
- [Machine Learning and CrateDB: Getting Started With Jupyter]
- [Machine Learning and CrateDB: Experiment Design & Linear Regression]


(tensorflow)=
## TensorFlow
### TensorFlow

- {doc}`./tensorflow`


## LLMs / RAG

One of the most powerful applications enabled by LLMs is sophisticated
question-answering (Q&A) chatbots.
These are applications that can answer questions about specific sources
of information, using Retrieval Augmented Generation (RAG), a technique
for augmenting LLM knowledge with additional data.
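
To make the idea of augmenting an LLM with additional data more concrete, here
is a minimal sketch of the retrieval and prompt-building steps. It is an
illustration only: the `retrieve` and `build_prompt` helpers, the sample
documents, and the prompt wording are hypothetical, word overlap stands in for
the embedding-based similarity search a real application would use, and no LLM
is called.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, documents, k=1):
    """Return the k documents sharing the most words with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context):
    """Combine the retrieved context and the question into one prompt."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical knowledge base; a real one would live in database tables.
documents = [
    "CrateDB stores embeddings in FLOAT_VECTOR columns.",
    "MLflow tracks experiments and model runs.",
    "PyCaret automates machine learning workflows.",
]

question = "Which columns hold embeddings in CrateDB?"
context = retrieve(question, documents, k=1)
prompt = build_prompt(question, context)
print(prompt)
```

The resulting prompt would then be sent to an LLM of your choice. Keeping
retrieval separate from prompt construction makes it possible to swap the
toy ranking for a vector search without touching the rest of the flow.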


(langchain)=
### LangChain

Tutorials and Notebooks about using [LangChain] together with CrateDB.
LangChain has a number of components designed to help build Q&A applications,
and RAG applications more generally.
This feature uses CrateDB's [](#vector-store) implementation.

- [LangChain and CrateDB: Code Examples]

- CrateDB's `FLOAT_VECTOR` type and its `KNN_MATCH` function can be used for storing and
retrieving embeddings, and for conducting similarity searches.

[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/vector_search.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fvector_search.ipynb)

- Database tables in CrateDB can be used as a source provider for LangChain documents.

[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/document_loader.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fdocument_loader.ipynb)

- CrateDB supports managing LangChain's conversation history.

[![Open on GitHub](https://img.shields.io/badge/Open%20on-GitHub-lightgray?logo=GitHub)](https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/llm-langchain/conversational_memory.ipynb) [![Launch Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/crate/cratedb-examples/main?labpath=topic%2Fmachine-learning%2Fllm-langchain%2Fconversational_memory.ipynb)

- What can you build with LangChain?

- [LangChain: Retrieval augmented generation]
- [LangChain: Analyzing structured data]
- [LangChain: Chatbots]



(mlops)=
## ML Operations
Blubb.


(vector-store)=
## Vector Store
Blubb.
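
As a rough illustration of the k-nearest neighbour (kNN) search mentioned in
the introduction, the following sketch ranks a handful of vectors by their
Euclidean distance to a query vector. It is a conceptual brute-force example
in plain Python, not CrateDB's implementation: the `knn` helper, the document
identifiers, and the sample embeddings are all hypothetical. In CrateDB, the
search itself runs inside the database, using the `FLOAT_VECTOR` type together
with the `KNN_MATCH` function.

```python
import math


def knn(vectors, query, k):
    """Return the k (id, distance) pairs closest to `query`, nearest first."""
    scored = [(vec_id, math.dist(vec, query)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical 3-dimensional embeddings keyed by document id.
embeddings = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
    "doc-d": [0.0, 0.0, 1.0],
}

nearest = knn(embeddings, query=[1.0, 0.05, 0.0], k=2)
for vec_id, distance in nearest:
    print(vec_id, round(distance, 4))
```

Comparing the query against every stored vector is fine for a few rows, but
it scales linearly with the table size, which is why vector stores typically
rely on dedicated index structures instead.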


## More Information

CrateDB provides a rich data model including container, geospatial, and
vector data types, and capabilities for full-text search.



```{toctree}
:hidden:
@@ -89,7 +198,6 @@ tensorflow


[AutoML with PyCaret and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/automl
[Introduction to Time Series Modeling using Machine Learning]: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data
[Jupyter Notebook]: https://jupyter.org/
[LangChain]: https://python.langchain.com/
[LangChain: Analyzing structured data]: https://python.langchain.com/docs/use_cases/qa_structured/sql
@@ -101,6 +209,11 @@ tensorflow
[Machine Learning and CrateDB: Experiment Design & Linear Regression]: https://cratedb.com/blog/machine-learning-and-cratedb-part-three-experiment-design-and-linear-regression
[MLflow]: https://mlflow.org/
[MLflow and CrateDB]: https://github.com/crate/cratedb-examples/tree/main/topic/machine-learning/mlops-mlflow
[MLOps]: https://en.wikipedia.org/wiki/MLOps
[pandas]: https://pandas.pydata.org/
[PyCaret]: https://www.pycaret.org
[scikit-learn]: https://scikit-learn.org/
[Time Series Modeling using Machine Learning]: https://cratedb.com/blog/introduction-to-time-series-modeling-with-cratedb-machine-learning-time-series-data
[tracking-merlion-colab]: https://colab.research.google.com/github/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb
[tracking-merlion-github]: https://github.com/crate/cratedb-examples/blob/main/topic/machine-learning/mlops-mlflow/tracking_merlion.ipynb
[Vector database]: https://en.wikipedia.org/wiki/Vector_database
