Opensearch datastax blog post, new member bits #2946

Merged
24 changes: 24 additions & 0 deletions _community_members/zznate.md
@@ -0,0 +1,24 @@
---
name: Nate McCall
short_name: zznate
photo: '/assets/media/community/members/zznate.jpg'
title: 'OpenSearch Community Member: Nate McCall'
primary_title: Nate McCall
breadcrumbs:
icon: community
items:
- title: Community
url: /community/index.html
- title: Members
url: /community/members/index.html
- title: "Nate McCall's Profile"
url: '/community/members/zznate.html'
twitter: 'zznate'
github: zznate
job_title_and_company: 'Product Research and Development at DataStax'
personas:
- author
permalink: '/community/members/zznate.html'
redirect_from: '/authors/zznate/'
---
Nate is currently in product research and development at DataStax. He is a Vice President emeritus at The Apache Software Foundation and is a committer and PMC member on Apache Cassandra. In the off hours he can be found building high-end custom roller skates for customers all over the world at his shop Seaside Skates in Paraparaumu, Aotearoa New Zealand.
Member

Amazon sponsors OpenSearch to ensure the continuing existence => Amazon sponsors the OpenSearch project

that users could use, modify, and extend however they wish => could use -> can use

The OpenSearch community is full of active contributors, maintainers, and partners => In addition to AWS, the OpenSearch community ...

OpenSearch supports hybrid vector/text search with score normalization, and sparse vector search => OpenSearch supports semantic search with vector embeddings, multi-modal search, hybrid search with score normalization, and sparse vector search.

Contributor Author

Thanks for the review! The suggested changes have been made, and I've merged the latest.

Member

Thanks @zznate!

@@ -0,0 +1,48 @@
---
layout: post
title: "Announcing an OpenSearch and DataStax generative AI partnership"
authors:
- zznate
date: 2024-06-13
categories:
- community
- partners
meta_keywords: Generative AI, retrieval-augmented generation, DataStax HCDP, OpenSearch integrations
meta_description: Learn about the collaboration between open-source startup DataStax and the OpenSearch Project on integration efforts to support generative AI developers.
excerpt:
has_math: false
has_science_table: false
---

DataStax and the OpenSearch Project are announcing a series of integration efforts to support generative AI developers. Retrieval-augmented generation (RAG) is a key design pattern in generative AI. RAG applications work by assembling context from a variety of sources, which is then processed by a large language model (LLM) to provide an intelligent and relevant response. Serving these applications requires a mix of data retrieval and storage capabilities, and we, OpenSearch and DataStax, are committed to working together to serve the broad needs of generative AI developers.

To power the explosive growth within the generative AI space, we need to keep innovating on the tooling available to developers. These tools require access to a variety of enterprise data, and we want to be there to provide that access in whatever common format is required. Being able to retrieve data in the most flexible ways possible is a necessary catalyst for getting RAG and generative AI knowledge applications to production.

Amazon sponsors the OpenSearch Project to ensure the continuing existence of an open-source search engine that users can use, modify, and extend however they wish. In addition to AWS, the OpenSearch community is full of active contributors, maintainers, and partners. For generative AI specifically, OpenSearch offers the following benefits:

* **Ease of use**: OpenSearch provides easy-to-use indexing and search capabilities and has built-in features for text analysis, tokenization, and relevance scoring.
* **Optimized for text retrieval**: OpenSearch makes it easy to find and rank documents based on keyword queries
Collaborator

Suggested change
* **Optimized for text retrieval**: OpenSearch makes it easy to find and rank documents based on keyword queries
* **Optimized for text retrieval**: OpenSearch makes it easy to find and rank documents based on keyword queries.

* **Versatility**: OpenSearch can handle a wide variety of data types and formats
Collaborator

Suggested change
* **Versatility**: OpenSearch can handle a wide variety of data types and formats
* **Versatility**: OpenSearch can handle a wide variety of data types and formats.

* **AI/ML integration**: OpenSearch supports semantic search with vector embeddings, multi-modal search, hybrid search with score normalization, and sparse vector search
Collaborator

Suggested change
* **AI/ML integration**: OpenSearch supports semantic search with vector embeddings, multi-modal search, hybrid search with score normalization, and sparse vector search
* **AI/machine learning (ML) integration**: OpenSearch supports semantic search with vector embeddings, multi-modal search, hybrid search with score normalization, and sparse vector search.
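
As an illustration of what hybrid search with score normalization involves, the following is a minimal Python sketch, not OpenSearch's implementation. It assumes min-max normalization followed by a weighted arithmetic mean. The function names, document IDs, and scores are hypothetical and exist only for this example.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to the range [0, 1].

    If every score is identical, each document is assigned 1.0.
    """
    if not scores:
        return {}
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        return {doc_id: 1.0 for doc_id in scores}
    return {
        doc_id: (score - lowest) / (highest - lowest)
        for doc_id, score in scores.items()
    }


def hybrid_scores(
    keyword: Dict[str, float],
    vector: Dict[str, float],
    keyword_weight: float = 0.5,
) -> Dict[str, float]:
    """Combine keyword and vector scores after normalizing each set.

    A document missing from one result set contributes 0.0 for that set.
    """
    keyword_norm = min_max_normalize(keyword)
    vector_norm = min_max_normalize(vector)
    vector_weight = 1.0 - keyword_weight
    return {
        doc_id: keyword_weight * keyword_norm.get(doc_id, 0.0)
        + vector_weight * vector_norm.get(doc_id, 0.0)
        for doc_id in set(keyword_norm) | set(vector_norm)
    }


def rank(combined: Dict[str, float]) -> List[str]:
    """Return document IDs ordered from highest to lowest combined score."""
    return sorted(combined, key=lambda doc_id: combined[doc_id], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw scores: keyword scores are unbounded,
    # vector similarities fall in a narrow range.
    keyword_hits = {"a": 12.0, "b": 4.0, "c": 8.0}
    vector_hits = {"a": 0.2, "b": 0.9, "d": 0.55}
    print(rank(hybrid_scores(keyword_hits, vector_hits, keyword_weight=0.7)))
```

Normalizing before combining matters because keyword relevance scores and vector similarities live on different scales. Without it, whichever scale is larger would dominate the ranking.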


DataStax is a leading contributor to a range of open source projects, including [Langflow](https://langflow.org/), [Apache Cassandra](https://cassandra.apache.org/_/index.html), and [JVector](https://github.com/jbellis/jvector), which provides vector search through DiskANN and advanced GenAI techniques like COLBert. Generative AI developers seek this database and vector combination to provide:

Check failure on line 28 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: Langflow. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 28, column 82. Severity: ERROR.

Check failure on line 28 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: JVector. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 28, column 184. Severity: ERROR.

Check failure on line 28 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: COLBert. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 28, column 310. Severity: ERROR.
Collaborator

Suggested change
DataStax is a leading contributor to a range of open source projects, including [Langflow](https://langflow.org/), [Apache Cassandra](https://cassandra.apache.org/_/index.html), and [JVector](https://github.com/jbellis/jvector), which provides vector search through DiskANN and advanced GenAI techniques like COLBert. Generative AI developers seek this database and vector combination to provide:
DataStax is a leading contributor to a range of open-source projects, including [Langflow](https://langflow.org/), [Apache Cassandra](https://cassandra.apache.org/_/index.html), and [JVector](https://github.com/jbellis/jvector), which provides vector search through DiskANN and advanced generative AI techniques like COLBert. Generative AI developers use this database and vector combination to provide the following functionality:


* **Context assembly**: Langflow delivers a UI to discover ecosystem components and compose the workflows that back Generative AI applications

Check failure on line 30 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: Langflow. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 30, column 25. Severity: ERROR.
Collaborator

Suggested change
* **Context assembly**: Langflow delivers a UI to discover ecosystem components and compose the workflows that back Generative AI applications
* **Context assembly**: Langflow provides a UI for discovering ecosystem components and composing the workflows that back generative AI applications.

* **Similarity search**: JVector offers high-performance vector similarity search and can handle embedding-based queries which require low latency and high relevance

Check failure on line 31 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: JVector. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 31, column 26. Severity: ERROR.
Collaborator

Suggested change
* **Similarity search**: JVector offers high-performance vector similarity search and can handle embedding-based queries which require low latency and high relevance
* **Similarity search**: JVector offers high-performance vector similarity search and can handle embedding-based queries, which require low latency and high relevance.

* **Scalability**: Cassandra offers scalable persistence for structured and semi-structured data
Collaborator

Suggested change
* **Scalability**: Cassandra offers scalable persistence for structured and semi-structured data
* **Scalability**: Cassandra offers scalable persistence for structured and semi-structured data.


The combination of these technologies enable semantic and keyword searches as well as hybrid query processing. Context is assembled using:
Collaborator

Suggested change
The combination of these technologies enable semantic and keyword searches as well as hybrid query processing. Context is assembled using:
The combination of these technologies enables semantic and keyword searches as well as hybrid query processing. Context is assembled using:

* Keyword queries which are directed to OpenSearch to retrieve relevant documents
Collaborator

Suggested change
* Keyword queries which are directed to OpenSearch to retrieve relevant documents
* Keyword queries, which are directed to OpenSearch to retrieve relevant documents.

* Semantic queries use JVector and Cassandra to find the most relevant data points based on vector similarity

Check failure on line 36 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: JVector. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 36, column 24. Severity: ERROR.
Collaborator

Suggested change
* Semantic queries use JVector and Cassandra to find the most relevant data points based on vector similarity
* Semantic queries, which use JVector and Cassandra to find the most relevant data points based on vector similarity.

* Database queries which provide known personalization, profile, and transactional data
Collaborator

Suggested change
* Database queries which provide known personalization, profile, and transactional data
* Database queries, which provide known personalization, profile, and transactional data.
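
The three query types above can be sketched as a single context-assembly step. This is a minimal Python illustration under stated assumptions, not the integration's actual code: each retriever is a hypothetical stub standing in for a real OpenSearch, JVector/Cassandra, or database client, and `assemble_context` and `build_prompt` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Passage:
    source: str  # which retriever produced the text
    text: str


# A retriever takes a question and returns candidate text snippets.
Retriever = Callable[[str], List[str]]


def assemble_context(
    question: str,
    retrievers: Dict[str, Retriever],
    max_passages: int = 6,
) -> List[Passage]:
    """Query every retriever in order and merge results, dropping duplicates."""
    passages: List[Passage] = []
    seen = set()
    for source, retrieve in retrievers.items():
        for text in retrieve(question):
            if text in seen:
                continue  # the same snippet was already returned earlier
            seen.add(text)
            passages.append(Passage(source=source, text=text))
    return passages[:max_passages]


def build_prompt(question: str, passages: List[Passage]) -> str:
    """Format the assembled context and the question for an LLM."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical stubs; real code would call the respective services.
    retrievers: Dict[str, Retriever] = {
        "keyword": lambda q: ["Return policy: 30 days.", "Shipping takes 5 days."],
        "semantic": lambda q: ["Shipping takes 5 days.", "Refunds go to the original card."],
        "database": lambda q: ["Customer tier: premium."],
    }
    question = "How do refunds work?"
    print(build_prompt(question, assemble_context(question, retrievers)))
```

Tagging each passage with its source keeps the provenance visible in the prompt, which can help when debugging why a response used a particular piece of context.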


### **Moving Forward**
Collaborator

Suggested change
### **Moving Forward**
### **Moving forward**

DataStax will maintain a JVector integration for OpenSearch and offer OpenSearch both as part of its self-managed platform, Hyper Converged Data Platform (HCDP), and as an integration for its cloud service, Astra.

Check failure on line 40 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: JVector. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 40, column 26. Severity: ERROR.

Check failure on line 40 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: Astra. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 40, column 212. Severity: ERROR.

Enterprises have spent years investing in search infrastructure. With the inclusion of OpenSearch, DataStax can provide developers with the most flexible information retrieval possible using applications already familiar to many enterprises. OpenSearch bridges the gap between single-document Q&A and open-domain Q&A, essentially providing the ability to reason across multiple diverse documents and texts by combining keyword search in OpenSearch with the dense vector search of JVector in Astra and HCDP.

Check failure on line 42 in _posts/2024-06-06-opensearch-partnering-with-datastax-on-generative-ai.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: JVector. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Location: line 42, column 476. Severity: ERROR.

For generative AI, relevance is critical. Through this partnership, we will ensure that your enterprise data estate can serve as context for RAG and generative AI workflows, supplying as much data to that context as possible. For more information, see the [HCDP announcement](https://www.datastax.com/fr/blog/introducing-vector-search-for-self-managed-modern-architecture).




Binary file added assets/media/community/members/zznate.jpg