Amazon MemoryDB

Similarity Search with MemoryDB as VectorStore

Semantic search is a technique that uses machine learning to understand the meaning of a search query and deliver more relevant results. For example, when we search Stack Overflow, we want to find questions that are semantically similar to our own, so we can view the most relevant answer. Semantic search is also used on e-commerce websites. For example, if we want to buy a headset from Amazon, we can get general information from the product title, description, and other features.

However, we may have a question about the headset, and we might want to search the questions asked by others to find the answer. When we type our question, we want to find semantically similar questions (which have the same meaning, while using different words). This is an example where semantic search will help us return more relevant results.

Prerequisites

Before you proceed, make sure you have the following prerequisites in place:

An AWS Cloud9 development environment set up.
Sentence transformers from Hugging Face, used to generate the embeddings.
Python and pip installed in your Cloud9 environment.
Internet connectivity to download packages.

Installation

Clone this repository to your Cloud9 environment:

git clone https://github.com/aws-samples/amazon-memorydb-for-redis-samples
cd amazon-memorydb-for-redis-samples/tutorials/SimilaritySearch/

Install the required packages using pip:
```
pip3 install -r requirements.txt
```

Download the Amazon PQA JSON file for headsets:

aws s3 cp --no-sign-request s3://amazon-pqa/amazon_pqa_headsets.json ./amazon-pqa/amazon_pqa_headsets.json
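
To inspect the downloaded file before loading it into MemoryDB, a small reader like the following may help. It is a minimal sketch that assumes the file is in JSON Lines format (one JSON object per line) and that each record has a `question_text` field and an `answers` list whose entries have an `answer_text` field. These field names are assumptions about the dataset layout, and `load_pqa` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def load_pqa(path, limit=None):
    """Yield (question, first_answer) pairs from a JSON Lines file.

    Assumed record layout (not confirmed by this README):
    {"question_text": "...", "answers": [{"answer_text": "..."}, ...]}
    """
    with Path(path).open(encoding="utf-8") as handle:
        count = 0
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            answers = record.get("answers") or []
            # Use the first answer if there is one, else an empty string.
            answer = answers[0].get("answer_text", "") if answers else ""
            yield record.get("question_text", ""), answer
            count += 1
            if limit is not None and count >= limit:
                break
```

For example, `list(load_pqa("./amazon-pqa/amazon_pqa_headsets.json", limit=5))` would return the first five question and answer pairs if the layout matches the assumption above.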

Set the Amazon MemoryDB cluster endpoint:

export MEMORYDB_CLUSTER="Your cluster endpoint"
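
Python code can then pick up the endpoint from the environment. The sketch below shows one way to read it. The `memorydb_endpoint` helper is hypothetical, and the assumption that the value may be either `host` or `host:port` (with 6379 as the fallback port) is ours. The application in this repository may parse the variable differently.

```python
import os


def memorydb_endpoint(default_port=6379):
    """Return (host, port) parsed from the MEMORYDB_CLUSTER environment variable.

    Accepts "host" or "host:port"; the fallback port is an assumption.
    """
    raw = os.environ.get("MEMORYDB_CLUSTER", "").strip()
    if not raw:
        raise RuntimeError("MEMORYDB_CLUSTER is not set")
    host, sep, port = raw.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return raw, default_port
```

The returned pair can be passed to whichever Redis client you use. If your cluster has TLS enabled, the client will likely need its TLS option switched on as well.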

Run the application:

cd memorydb-vss
streamlit run 'mmapp.py' --server.port 8080

Create index and store embeddings in MemoryDB

In this step, the application first creates the index, then loads the PQA questions, answers, and embeddings into MemoryDB as hashes.
Once the index is created and the data is loaded, you can query the data store to find similar results.
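
The flow described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `memorydb-vss`: the index name `idx:pqa`, the key prefix `pqa:`, the field names, and the 384-dimension default are all hypothetical (the dimension must match whichever sentence transformer model you use). The `client` argument is assumed to be a redis-py style client exposing `execute_command` and `hset`.

```python
import struct


def to_float32_bytes(vector):
    """Pack a sequence of floats as little-endian FLOAT32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def create_index(client, index="idx:pqa", prefix="pqa:", dim=384):
    """Create an HNSW vector index over hashes whose keys start with `prefix`."""
    client.execute_command(
        "FT.CREATE", index, "ON", "HASH", "PREFIX", 1, prefix,
        "SCHEMA", "embedding", "VECTOR", "HNSW", 6,  # 6 = number of args that follow
        "TYPE", "FLOAT32", "DIM", dim, "DISTANCE_METRIC", "COSINE",
    )


def store_item(client, key, question, answer, embedding):
    """Store one question, answer, and embedding as a hash."""
    client.hset(key, mapping={
        "question": question,
        "answer": answer,
        "embedding": to_float32_bytes(embedding),
    })


def knn_search(client, query_embedding, index="idx:pqa", k=5):
    """Run a K-nearest-neighbor query against the vector field."""
    return client.execute_command(
        "FT.SEARCH", index, f"*=>[KNN {k} @embedding $vec]",
        "PARAMS", 2, "vec", to_float32_bytes(query_embedding),
        "DIALECT", 2,
    )
```

Only the `embedding` field is indexed in this sketch. The question and answer text are stored in the same hash so they can be read back for each matching key.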

Here's an example: suppose we want to ask if a particular headset is compatible with PC, and we ask the question "Does this work with my PC?". Suppose also that we have the following in our Q&A data set:
If we use text search, we would expect to return results for questions that match words such as "Windows", "10", and "Work", which will match some questions that aren't particularly relevant to our question. With semantic search, we would expect to find results that have a similar meaning, despite using different words. In this example, we will get more meaningful results.
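
To make "similar meaning" concrete: semantic search compares embedding vectors rather than words, commonly with cosine similarity. The sketch below ranks candidates by cosine similarity to a query vector. Both helpers are hypothetical illustrations. In practice the vectors would come from a sentence transformer model, and MemoryDB would perform the comparison inside the index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, candidates):
    """Return (label, score) pairs sorted from most to least similar.

    `candidates` maps a label (for example, a question) to its embedding.
    """
    scored = [(label, cosine_similarity(query_vec, vec))
              for label, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Two questions with no words in common can still score close to 1.0 if their embeddings point in nearly the same direction, which is why semantic search can surface them while keyword matching does not.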
