Opensearch datastax blog post, new member bits #2946
@@ -0,0 +1,24 @@
---
name: Nate McCall
short_name: zznate
photo: '/assets/media/community/members/zznate.jpg'
title: 'OpenSearch Community Member: Nate McCall'
primary_title: Nate McCall
breadcrumbs:
  icon: community
  items:
    - title: Community
      url: /community/index.html
    - title: Members
      url: /community/members/index.html
    - title: "Nate McCall's Profile"
      url: '/community/members/zznate.html'
twitter: 'zznate'
github: zznate
job_title_and_company: 'Product Research and Development at DataStax'
personas:
  - author
permalink: '/community/members/zznate.html'
redirect_from: '/authors/zznate/'
---
Nate currently works in product research and development at DataStax. He is a Vice President emeritus at The Apache Software Foundation and is a committer and PMC member on Apache Cassandra. In his off hours, he can be found building high-end custom roller skates for customers all over the world at his shop, Seaside Skates, in Paraparaumu, Aotearoa New Zealand.
@@ -0,0 +1,48 @@
---
layout: post
title: "Announcing an OpenSearch and DataStax generative AI partnership"
authors:
  - zznate
date: 2024-06-13
categories:
  - community
  - partners
meta_keywords: Generative AI, retrieval-augmented generation, DataStax HCDP, OpenSearch integrations
meta_description: Learn about the collaboration between open-source startup DataStax and the OpenSearch Project on integration efforts to support generative AI developers.
excerpt:
has_math: false
has_science_table: false
---
DataStax and the OpenSearch Project are announcing a series of integration efforts to support generative AI developers. Retrieval-augmented generation (RAG) is a key design pattern in generative AI. RAG applications work by assembling context from a variety of sources; a large language model (LLM) then processes that context to provide an intelligent and relevant response. Serving these applications requires a mix of data retrieval and storage capabilities, and OpenSearch and DataStax are committed to working together to serve the broad needs of generative AI developers.

To power the explosive growth within the generative AI space, we need to keep innovating on the tooling available to developers. These tools require access to a variety of enterprise data, and we want to provide that access in whatever common format is required. Retrieving data as flexibly as possible is a necessary catalyst for getting RAG and generative AI knowledge applications into production.

Amazon sponsors the OpenSearch Project to ensure the continuing existence of an open-source search engine that users can use, modify, and extend however they wish. In addition to AWS, the OpenSearch community is full of active contributors, maintainers, and partners. For generative AI specifically, OpenSearch offers the following benefits:

* **Ease of use**: OpenSearch provides easy-to-use indexing and search capabilities and has built-in features for text analysis, tokenization, and relevance scoring.
* **Optimized for text retrieval**: OpenSearch makes it easy to find and rank documents based on keyword queries.
* **Versatility**: OpenSearch can handle a wide variety of data types and formats.
* **AI/ML integration**: OpenSearch supports semantic search with vector embeddings, multimodal search, hybrid search with score normalization, and sparse vector search.
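
Keyword (BM25-style) scores and vector similarity scores live on different scales, which is why hybrid search normalizes them before combining them. The following Python sketch illustrates the idea under stated assumptions: it uses min-max normalization and a weighted arithmetic mean, which resemble the `min_max` and `arithmetic_mean` options of OpenSearch's normalization processor, but it is a standalone illustration, not OpenSearch code. The function names, document IDs, scores, and weights are all hypothetical.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one result set's raw scores into [0, 1].

    If every score is identical, all documents map to 1.0 to avoid dividing by zero.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_combine(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    keyword_weight: float = 0.3,  # hypothetical weights, chosen for illustration
    vector_weight: float = 0.7,
) -> Dict[str, float]:
    """Weighted arithmetic mean of the two normalized score sets.

    A document missing from one result set contributes 0.0 for that query.
    """
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    total_weight = keyword_weight + vector_weight
    return {
        doc_id: (keyword_weight * kw.get(doc_id, 0.0) + vector_weight * vec.get(doc_id, 0.0)) / total_weight
        for doc_id in kw.keys() | vec.keys()
    }


if __name__ == "__main__":
    # Hypothetical raw scores: BM25-style keyword scores and cosine similarities.
    keyword = {"doc-a": 12.4, "doc-b": 7.1, "doc-c": 3.0}
    vector = {"doc-b": 0.92, "doc-c": 0.88, "doc-d": 0.61}
    combined = hybrid_combine(keyword, vector)
    for doc_id, score in sorted(combined.items(), key=lambda item: item[1], reverse=True):
        print(f"{doc_id}: {score:.3f}")
```

With these inputs, `doc-b` ranks first because it scores well on both queries, even though `doc-a` has the highest raw keyword score. Treating a missing document as 0.0 is one simple convention among several.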

DataStax is a leading contributor to a range of open-source projects, including [Langflow](https://langflow.org/), [Apache Cassandra](https://cassandra.apache.org/_/index.html), and [JVector](https://github.com/jbellis/jvector), which provides vector search through DiskANN and advanced generative AI techniques such as ColBERT. Generative AI developers look to this combination of database and vector search for the following:

* **Context assembly**: Langflow provides a UI for discovering ecosystem components and composing the workflows that back generative AI applications.
* **Similarity search**: JVector offers high-performance vector similarity search and can handle embedding-based queries that require low latency and high relevance.
* **Scalability**: Cassandra offers scalable persistence for structured and semi-structured data.
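
JVector's indexes are approximate and graph-based (in the DiskANN family) so that similarity search stays fast at scale. To show what an embedding-based similarity query actually computes, here is an exact, brute-force Python sketch using cosine similarity. It is a stand-in for illustration only, not JVector's API or algorithm; the toy vectors and document IDs are hypothetical.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbor search: score every stored vector and keep the k best."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical 3-dimensional "embeddings"; real embedding models produce far more dimensions.
    corpus = {
        "doc-1": [0.9, 0.1, 0.0],
        "doc-2": [0.8, 0.3, 0.1],
        "doc-3": [0.0, 0.2, 0.95],
    }
    for doc_id, score in top_k_similar([1.0, 0.2, 0.0], corpus, k=2):
        print(f"{doc_id}: {score:.3f}")
```

Brute force scans every vector, so its cost grows linearly with corpus size; avoiding that full scan is the reason approximate indexes such as JVector's exist.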

The combination of these technologies enables semantic search, keyword search, and hybrid query processing. Context is assembled using the following:

* Keyword queries, which are directed to OpenSearch to retrieve relevant documents.
* Semantic queries, which use JVector and Cassandra to find the most relevant data points based on vector similarity.
* Database queries, which provide known personalization, profile, and transactional data.
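
The three retrieval paths can be merged into one context for an LLM. The following Python sketch assumes a simple policy: profile data first, then keyword hits, then semantic hits, skipping duplicate document IDs and stopping at a character budget. The `Snippet` type, the three retrieval callables, and the budget are hypothetical placeholders for calls to a database, OpenSearch, and JVector/Cassandra; none of this is a DataStax or OpenSearch API.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Snippet:
    """One retrieved piece of text (hypothetical type, for illustration)."""
    source: str  # which retrieval path produced it: "profile", "keyword", or "semantic"
    doc_id: str
    text: str


# Hypothetical retriever signature: takes a query (or user ID) and returns snippets.
Retriever = Callable[[str], List[Snippet]]


def assemble_context(
    question: str,
    user_id: str,
    keyword_search: Retriever,
    semantic_search: Retriever,
    profile_lookup: Retriever,
    max_chars: int = 2000,
) -> str:
    """Merge snippets from the three paths into one context string.

    Order: profile data, then keyword hits, then semantic hits. Duplicate
    doc IDs are skipped, and assembly stops before a line would push the
    total snippet length past max_chars (newlines are not counted).
    """
    candidates = profile_lookup(user_id) + keyword_search(question) + semantic_search(question)
    seen: Set[str] = set()
    lines: List[str] = []
    used = 0
    for snippet in candidates:
        if snippet.doc_id in seen:
            continue
        line = f"[{snippet.source}:{snippet.doc_id}] {snippet.text}"
        if used + len(line) > max_chars:
            break
        seen.add(snippet.doc_id)
        lines.append(line)
        used += len(line)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical stubs standing in for a database, OpenSearch, and JVector/Cassandra.
    def fake_profile(user_id: str) -> List[Snippet]:
        return [Snippet("profile", f"user-{user_id}", "Customer tier: gold.")]

    def fake_keyword(question: str) -> List[Snippet]:
        return [Snippet("keyword", "faq-12", "Refunds are processed within 5 business days.")]

    def fake_semantic(question: str) -> List[Snippet]:
        return [
            Snippet("semantic", "faq-12", "Refunds are processed within 5 business days."),
            Snippet("semantic", "policy-3", "Returns require proof of purchase."),
        ]

    print(assemble_context("How long do refunds take?", "42", fake_keyword, fake_semantic, fake_profile))
```

Putting known profile data first is one design choice among many; a production pipeline might instead rerank all candidates together, for example with the hybrid scoring shown for OpenSearch.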

### Moving forward

DataStax will maintain a JVector integration for OpenSearch and will offer OpenSearch both as part of HCDP (Hyper Converged Data Platform), its self-managed platform, and as an integration for its cloud service, Astra.

Enterprises have spent years investing in search infrastructure. With the inclusion of OpenSearch, DataStax can give developers highly flexible information retrieval using applications that many enterprises already know. OpenSearch bridges the gap between single-document question answering (Q&A) and open-domain Q&A: combining keyword search in OpenSearch with the dense vector search of JVector in Astra and HCDP provides the ability to reason across many diverse documents and texts.

Relevance is critical for generative AI. Through this partnership, we will ensure that your enterprise data estate can act as context for RAG and generative AI workflows, providing as much data as possible to the context. For more information, see the [HCDP announcement](https://www.datastax.com/fr/blog/introducing-vector-search-for-self-managed-modern-architecture).
Amazon sponsors OpenSearch to ensure the continuing existence => Amazon sponsors the OpenSearch project
that users could use, modify, and extend however they wish => could use -> can use
The OpenSearch community is full of active contributors, maintainers, and partners => In addition to AWS, the OpenSearch community ...
OpenSearch supports hybrid vector/text search with score normalization, and sparse vector search => OpenSearch supports semantic search with vector embeddings, multi-modal search, hybrid search with score normalization, and sparse vector search.
Thanks for the review! Suggested changes have been made and have merged latest.
Thanks @zznate!