opensearch-project · kolchfa-aws · Jun 13, 2024 · Jun 5, 2024 · Jun 5, 2024 · Jun 5, 2024
@@ -0,0 +1,24 @@
+---
+name: Nate McCall
+short_name: zznate
+photo: '/assets/media/community/members/zznate.jpg'
+title: 'OpenSearch Community Member: Nate McCall'
+primary_title: Nate McCall
+breadcrumbs:
+  icon: community
+  items:
+    - title: Community
+      url: /community/index.html
+    - title: Members
+      url: /community/members/index.html
+    - title: 'Nate McCall&apos;s Profile'
+      url: '/community/members/zznate.html'
+twitter: 'zznate'
+github: zznate
+job_title_and_company: 'Product Research and Development at DataStax'
+personas:
+  - author
+permalink: '/community/members/zznate.html'
+redirect_from: '/authors/zznate/'
+---
+Nate is currently in product research and development at DataStax. He is a Vice President emeritus at The Apache Software Foundation and is a committer and PMC member on Apache Cassandra. In his off hours, he can be found building high-end custom roller skates for customers all over the world at his shop, Seaside Skates, in Paraparaumu, Aotearoa New Zealand.
@@ -0,0 +1,48 @@
+---
+layout: post
+title:  "Announcing an OpenSearch and DataStax generative AI partnership"
+authors:
+  - zznate 
+date: 2024-06-13
+categories:
+  - community
+  - partners
+meta_keywords: Generative AI, retrieval-augmented generation, DataStax HCDP, OpenSearch integrations
+meta_description: Learn about the collaboration between open-source startup DataStax and the OpenSearch Project on integration efforts to support generative AI developers.
+excerpt: 
+has_math: false
+has_science_table: false
+---
+
+DataStax and the OpenSearch Project are announcing a series of integration efforts to support generative AI developers. Retrieval-augmented generation (RAG) is a key design pattern in generative AI. RAG applications work by assembling context from a variety of sources, which is then processed by a large language model (LLM) to provide an intelligent and relevant response. Serving these applications requires a mix of data retrieval and storage capabilities, and we, OpenSearch and DataStax, are committed to working together to serve the broad needs of generative AI developers.   
+
+To power the explosive growth within the generative AI space, we need to keep innovating on the tooling available to developers. These tools require access to a variety of enterprise data, and we want to be there to provide that access in whatever common format is required. Being able to retrieve data in the most flexible ways possible is a necessary catalyst for getting RAG and generative AI knowledge applications to production. 
+
+Amazon sponsors the OpenSearch Project to ensure the continuing existence of an open-source search engine that users can use, modify, and extend however they wish. In addition to AWS, the OpenSearch community is full of active contributors, maintainers, and partners. For generative AI specifically, OpenSearch offers the following benefits:
+
+* **Ease of use**: OpenSearch provides easy-to-use indexing and search capabilities and has built-in features for text analysis, tokenization, and relevance scoring.
+* **Optimized for text retrieval**: OpenSearch makes it easy to find and rank documents based on keyword queries.
+* **Versatility**: OpenSearch can handle a wide variety of data types and formats.
+* **AI/machine learning (ML) integration**: OpenSearch supports semantic search with vector embeddings, multi-modal search, hybrid search with score normalization, and sparse vector search.
+
+DataStax is a leading contributor to a range of open-source projects, including [Langflow](https://langflow.org/), [Apache Cassandra](https://cassandra.apache.org/_/index.html), and [JVector](https://github.com/jbellis/jvector), which provides vector search through DiskANN and advanced generative AI techniques like ColBERT. Generative AI developers use this database and vector combination to provide the following functionality: 
+
+* **Context assembly**: Langflow provides a UI for discovering ecosystem components and composing the workflows that back generative AI applications.
+* **Similarity search**: JVector offers high-performance vector similarity search and can handle embedding-based queries, which require low latency and high relevance.
+* **Scalability**: Cassandra offers scalable persistence for structured and semi-structured data.
+
+The combination of these technologies enables semantic and keyword searches as well as hybrid query processing. Context is assembled using the following:
+
+* Keyword queries, which are directed to OpenSearch to retrieve relevant documents.
+* Semantic queries, which use JVector and Cassandra to find the most relevant data points based on vector similarity.
+* Database queries, which provide known personalization, profile, and transactional data.
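
The three context sources listed above can be combined in many ways. The following is a minimal Python sketch of one possible approach, not the integration itself: it uses hard-coded stand-in results instead of real OpenSearch, JVector, or Cassandra calls. The names `Hit`, `min_max_normalize`, `assemble_context`, and `keyword_weight` are hypothetical, and min-max scaling is only one common way to make scores from different engines comparable before merging them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieved document with the score assigned by its source engine."""
    doc_id: str
    text: str
    score: float


def min_max_normalize(hits):
    """Rescale scores to [0, 1] so results from different engines are comparable."""
    if not hits:
        return []
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = hi - lo
    # If all scores are equal, treat every hit as a top hit.
    return [
        Hit(h.doc_id, h.text, 1.0 if span == 0 else (h.score - lo) / span)
        for h in hits
    ]


def assemble_context(keyword_hits, semantic_hits, profile_facts,
                     top_k=3, keyword_weight=0.5):
    """Merge keyword and semantic hits by weighted score, then prepend known facts."""
    combined = {}
    texts = {}
    weighted_sources = (
        (keyword_weight, keyword_hits),
        (1.0 - keyword_weight, semantic_hits),
    )
    for weight, hits in weighted_sources:
        for h in min_max_normalize(hits):
            combined[h.doc_id] = combined.get(h.doc_id, 0.0) + weight * h.score
            texts[h.doc_id] = h.text
    # Highest combined score first; ties are broken by document ID.
    ranked = sorted(combined, key=lambda d: (-combined[d], d))[:top_k]
    return list(profile_facts) + [texts[d] for d in ranked]


# Stand-in data (assumed for illustration): keyword hits with unbounded scores,
# semantic hits with similarity scores, and facts from a database lookup.
keyword_hits = [
    Hit("a", "Return policy: 30 days.", 12.0),
    Hit("b", "Shipping takes 5 days.", 4.0),
]
semantic_hits = [
    Hit("b", "Shipping takes 5 days.", 0.9),
    Hit("c", "Refunds go to the original card.", 0.7),
]
profile_facts = ["Customer tier: gold"]

context = assemble_context(
    keyword_hits, semantic_hits, profile_facts, top_k=2, keyword_weight=0.4
)
print(context)
```

In this sketch, the assembled list would be passed to an LLM as context. Raising or lowering `keyword_weight` shifts the balance between keyword relevance and vector similarity.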
+
+### **Moving forward**
+DataStax will maintain a JVector integration for OpenSearch and offer OpenSearch both as part of its self-managed platform, Hyper Converged Data Platform (HCDP), and as an integration for its cloud service, Astra. 
+
+Enterprises have spent years investing in search infrastructure. With the inclusion of OpenSearch, DataStax can provide developers with the most flexible information retrieval possible using applications already familiar to many enterprises. OpenSearch bridges the gap between single-document Q&A and open-domain Q&A: combining keyword search in OpenSearch with the dense vector search of JVector in Astra and HCDP makes it possible to reason across multiple diverse documents and texts. 
+
+For generative AI, relevance is critical. Through this partnership, we will ensure that your enterprise data estate can serve as context for RAG and generative AI workflows, making as much relevant data as possible available to them. For more information, see the [HCDP announcement](https://www.datastax.com/fr/blog/introducing-vector-search-for-self-managed-modern-architecture).
+