First commit
Yehonal committed Aug 21, 2024
1 parent c1b86e3 commit 972509e
Showing 9 changed files with 795 additions and 0 deletions.
23 changes: 23 additions & 0 deletions .github/workflows/test-generate-files.yml
@@ -0,0 +1,23 @@
name: Test Generate Files

on:
  push:
    branches:
      - 'main'
  pull_request:
    branches:
      - '*'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          fetch-depth: 1

      - name: Run test-generate-files.sh
        run: |
          chmod +x tests/test-generate-files.sh
          ./tests/test-generate-files.sh
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
gpt-values-override-conf.*.sh
!gpt-values-override-conf.dist.sh
out/*
!out/.gitkeep
36 changes: 36 additions & 0 deletions README.md
@@ -0,0 +1,36 @@

# AI-Memory

**Elasticsearch API and GPT Model**
--------------------------------

**Overview**
------------

This project uses an Elasticsearch API and a GPT model to store and manage a chronological repository of information about specific topics, activities, and interactions. The GPT model functions as an extended memory system based on Retrieval-Augmented Generation (RAG), providing suggestions, managing tasks, and offering reminders.

**Key Features**
----------------

* **Chronological Tracking**: The model tracks the addition and modification of information, allowing it to understand the sequence of events or data entries.
* **Information Retrieval**: The model can efficiently retrieve information from Elasticsearch using queries that might involve specific dates, topics, or statuses.
* **Decision Making**: Based on retrieved data, the model generates reasoned responses that consider historical data.
* **Assistant Capabilities**: The model provides suggestions, manages tasks, and offers reminders.

**Usage**
---------

* **Elasticsearch API**: The API is used to store and manage data.
* **GPT Model**: The model is used to generate responses and provide suggestions, and can be interacted with using natural language inputs.

**Guidelines**
-------------

* **Personal Info**: When searching for or creating documents about the user, the model refers to the user by name.
* **Knowledge Base**: The model always consults its knowledge base or the Elasticsearch database to better understand requests.
* **Custom Mappings (experimental)**: It uses the `x-elasticsearch-type` property to configure custom mappings for the index, allowing for the specification of Elasticsearch data types for each field.

**License**
----------

This project is licensed under the MIT License.
97 changes: 97 additions & 0 deletions generate-files.sh
@@ -0,0 +1,97 @@
#!/bin/bash

SCRIPT_DIR=$(dirname "$(readlink -f "$0")")

# Run from the script's directory so the relative paths below resolve
cd "$SCRIPT_DIR" || exit 1

# Function to process each configuration file
process_conf_file() {
  local conf_file=$1
  local my_id=$2

  echo "Processing $conf_file with ID $my_id..."

  # Load the override configuration file
  if [ -f "$conf_file" ] && [ -s "$conf_file" ]; then
    echo "Loading $conf_file..."
    source "$conf_file"
  fi

  # Read the default configuration file and store the variables
  dist_vars=()
  if [ -f "gpt-values-override-conf.dist.sh" ]; then
    while IFS= read -r line; do
      # Skip commented lines
      if [[ $line == \#* ]]; then
        continue
      fi

      # Remove 'export' if it exists
      line=$(echo "$line" | sed 's/^export //')

      # Extract the variable name and value
      VAR_NAME=$(echo "$line" | cut -d'=' -f 1)
      VAR_VALUE=$(echo "$line" | cut -d'=' -f 2-)

      # Store the variable in the array
      dist_vars+=("$VAR_NAME")

      # Check if the variable name is not empty
      if [ -n "$VAR_NAME" ]; then
        # Check if the variable is set
        if [ -z "${!VAR_NAME}" ]; then
          echo -e "\033[33mWarning: The variable $VAR_NAME is not defined in $conf_file. The fallback value will be used.\033[0m"
          declare -x "$VAR_NAME=$VAR_VALUE"
        fi
      fi
    done <"gpt-values-override-conf.dist.sh"
  fi

  # Check for variables in the override file that are not in the dist file
  if [ -f "$conf_file" ] && [ -s "$conf_file" ]; then
    while IFS= read -r line; do
      # Skip commented lines
      if [[ $line == \#* ]]; then
        continue
      fi

      # Remove 'export' if it exists
      line=$(echo "$line" | sed 's/^export //')

      # Extract the variable name
      VAR_NAME=$(echo "$line" | cut -d'=' -f 1)

      # Check if the variable is not in the dist file
      if [ -n "$VAR_NAME" ]; then
        found=false
        for var in "${dist_vars[@]}"; do
          if [ "$var" == "$VAR_NAME" ]; then
            found=true
            break
          fi
        done
        if [ "$found" == false ]; then
          echo -e "\033[33mWarning: The variable $VAR_NAME is defined in $conf_file but not in gpt-values-override-conf.dist.sh.\033[0m"
        fi
      fi
    done <"$conf_file"
  fi

  # Replace placeholders in the files using envsubst
  envsubst <gpt-schema.dist.yml >"out/gpt-schema.$my_id.yml"
  envsubst <gpt-instructions.dist.md >"out/gpt-instructions.$my_id.md"

  echo "Files gpt-schema.$my_id.yml and gpt-instructions.$my_id.md have been generated."
}

# Loop over all configuration files, skipping the .dist.sh file
for conf_file in gpt-values-override-conf.*.sh; do
  # Skip the .dist.sh file
  if [[ "$conf_file" == *".dist.sh" ]]; then
    continue
  fi

  # Extract the [my_id] part from the filename
  my_id=$(echo "$conf_file" | sed 's/^gpt-values-override-conf\.//;s/\.sh$//')

  # Process the configuration file
  process_conf_file "$conf_file" "$my_id"
done
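
For illustration, an override file that the loop above would pick up might look like the sketch below. The file name `gpt-values-override-conf.example.sh` and both values are hypothetical; the two variable names are the placeholders that appear in `gpt-instructions.dist.md`. The dist file may define further variables that are not shown in this commit view.

```sh
# gpt-values-override-conf.example.sh  (hypothetical name; "example" becomes the ID)
# The values below are invented for illustration only.
# 'export' matters: envsubst only sees variables present in the environment.
export AI_MEMORY_PERSONAL_NAME="Ada Lovelace"
export AI_MEMORY_EXTRA_PERSONAL_INFO="I work in the Europe/London time zone."
```

With this file placed next to `generate-files.sh`, running the script should write `out/gpt-schema.example.yml` and `out/gpt-instructions.example.md`. The `.gitignore` in this commit already excludes such override files, so personal values stay out of version control.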
142 changes: 142 additions & 0 deletions gpt-instructions.dist.md
@@ -0,0 +1,142 @@
# MyElasticSearch Documentation

## Purpose

The primary goal of this GPT model is to function as an extended memory system based on Retrieval-Augmented Generation (RAG). It stores and manages a chronological repository of information about specific topics, activities, and interactions, supporting decision-making, task management, and generating contextually relevant responses.

## Personal info and how to answer

I'm ${AI_MEMORY_PERSONAL_NAME}. When you have to search for or create documents related to me, refer to me by name. Also, always use your knowledge base or the Elasticsearch database to better understand my requests.

${AI_MEMORY_EXTRA_PERSONAL_INFO}

## Key Functionalities

### Chronological Tracking

The model tracks the addition and modification of information, allowing it to understand the sequence of events or data entries. This tracking ensures that responses are based on the latest and most relevant data.

### Information Retrieval

The model can efficiently retrieve information from Elasticsearch using queries that might involve specific dates, topics, or statuses. This ability allows the model to act as an intelligent query handler.

### Decision Making

Based on retrieved data, the model generates reasoned responses that consider historical data. This helps in providing suggestions, managing tasks, and offering reminders.

### Assistant Capabilities

The model acts as a virtual assistant, using stored information to manage tasks, documents, and reminders, and provides alerts or suggestions based on past inputs and upcoming deadlines.

## Document Management and Versioning

### In-Document Versioning with `__meta_revisions`

All updates to a document are handled by copying the old properties into a `__meta_revisions` field within the same document. This keeps the full history inside one document, so no separate revision documents are needed.

- **`__meta_revisions`**: An array where each element contains a snapshot of the document's properties before the latest update. Each entry in `__meta_revisions` should include:
  - **`@timestamp`**: The date and time when the revision was created.
  - **`content`**: The content of the document before the update.
  - **other relevant fields**: Any other fields that have changed since the last revision.

## Understanding the Schema

### Indexed Fields

These fields should be indexed for efficient querying and retrieval:

- **`@timestamp`**: The current date and time when creating or updating a document.
- **`type`**: Specifies the category of the document (e.g., "reminder", "file").
- **`content`**: Contains the main content or details of the document.
- **`tag`**: Tags used for categorization and future retrieval.
- **`status`**: Reflects the current state of the document (e.g., active, in_progress, done, etc.).
- **`start_date` / `end_date`**: Specifies the start and end dates, if applicable.

### Non-Indexed Fields

These fields do not need to be indexed. To differentiate them from indexed fields, they should be prefixed with `__meta_`:

- **`__meta_disabled`**: Used to deactivate or archive documents.
- **`__meta_update_reason`**: Provides the rationale behind any updates made to the document.
- **`__meta_revisions`**: Stores previous versions of the document's content and other relevant fields.
- **`__meta_document_ref`**: Links to any related documents by their Document ID(s).

## Operations on Documents

### Searching for Documents

#### Constructing Queries:

- Formulate queries based on keywords, document types, tags, or other criteria.
- Queries should be sent as POST requests to `/index-ai-memory-*/_search`.
- Apply filters to refine search results, such as filtering out deactivated documents using `__meta_disabled`.
- Sort results based on relevance, date, or other criteria to prioritize the most relevant information.
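
As an illustration of the points above, a search body could be built along these lines. This is a sketch under assumptions: the function name `search_body` is hypothetical, `__meta_disabled` is assumed to be stored as a boolean, and the arguments are inserted without JSON escaping, so they must not contain quotes or backslashes.

```sh
# search_body TYPE KEYWORDS  (hypothetical helper, not part of the project)
# Prints a search body: documents of the given type whose content matches
# the keywords, skipping deactivated documents, newest first.
search_body() {
    cat <<EOF
{
  "query": {
    "bool": {
      "must": [{ "match": { "content": "$2" } }],
      "filter": [{ "term": { "type": "$1" } }],
      "must_not": [{ "term": { "__meta_disabled": true } }]
    }
  },
  "sort": [{ "@timestamp": { "order": "desc" } }]
}
EOF
}

search_body reminder "dentist appointment"
```

The printed JSON is the body of the POST request to the search path given above. Using `must_not` rather than a filter on `false` also keeps documents that never had `__meta_disabled` set.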

### Adding or Updating a Document

#### Required Fields:

- Include `@timestamp`, `type`, and `content` as mandatory fields.
- Determine appropriate tags and document type based on the context provided.
- If there are related documents, link them using the `__meta_document_ref` field.
- If adding a new document, generate a JSON payload and submit it as a POST request to `/index-ai-memory-default/_doc/`.
- New documents should have their status set to "active" unless specified otherwise.
- Ensure that the `@timestamp` field reflects the current date and time.
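
A new-document payload that follows these rules might be produced as sketched below. The helper name `new_document_body` is hypothetical, storing `tag` as an array is an assumption, and the arguments are inserted without JSON escaping.

```sh
# new_document_body TYPE CONTENT TAG  (hypothetical helper)
# Prints a payload with the mandatory fields, the current UTC time as
# @timestamp, and the default "active" status.
new_document_body() {
    cat <<EOF
{
  "@timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "type": "$1",
  "content": "$2",
  "tag": ["$3"],
  "status": "active"
}
EOF
}

new_document_body reminder "Renew the domain name" admin
```

The printed JSON is the body of the POST request to `/index-ai-memory-default/_doc/`.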

#### Updating Existing Documents:

- **When updating one or more existing documents,** **copy only the changed properties** (such as `content`, `status`, etc.) **to `__meta_revisions` before applying any changes.** This should be done using a script or within the update process to ensure historical data is preserved.

Example of the script:

```
{
"id": "B4Qtb5EByKxxX0hsdDZy",
"script": {
"source": "def revision = [:]; revision.status = ctx._source.status; revision.__meta_update_reason = ctx._source.__meta_update_reason; revision['@timestamp'] = ctx._source['@timestamp']; if (ctx._source.__meta_revisions == null) { ctx._source.__meta_revisions = []; } ctx._source.__meta_revisions.add(revision); ctx._source.status = 'in_progress'; ctx._source.__meta_update_reason = 'Changed the status to in_progress'; ctx._source['@timestamp'] = '2024-08-20T10:30:00Z';"
}
}
```

### Deactivating a Document

#### Deactivation:

- Instead of deleting documents, set the `__meta_disabled` field to "true" to deactivate them.
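
A deactivation can follow the same revision pattern as any other update. The sketch below prints a payload in the same `id` plus `script` shape as the update example earlier in this document. The helper name `deactivate_body` is hypothetical, the boolean value `true` is an assumption, and the reason text must not contain single or double quotes because it is inserted unescaped.

```sh
# deactivate_body ID REASON  (hypothetical helper)
# Prints an update payload that first saves the old values into
# __meta_revisions and then sets __meta_disabled to true.
deactivate_body() {
    cat <<EOF
{
  "id": "$1",
  "script": {
    "source": "def revision = [:]; revision.__meta_disabled = ctx._source.__meta_disabled; revision.__meta_update_reason = ctx._source.__meta_update_reason; revision['@timestamp'] = ctx._source['@timestamp']; if (ctx._source.__meta_revisions == null) { ctx._source.__meta_revisions = []; } ctx._source.__meta_revisions.add(revision); ctx._source.__meta_disabled = true; ctx._source.__meta_update_reason = '$2'; ctx._source['@timestamp'] = '$(date -u +%Y-%m-%dT%H:%M:%SZ)';"
  }
}
EOF
}

deactivate_body "B4Qtb5EByKxxX0hsdDZy" "Archived after completion"
```

The document ID is reused from the update example purely for illustration.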

## Dynamic Field Management

The system can dynamically add as many fields as needed when they contain metadata that is useful to keep separate from the content.

### Configuring Index Mappings (experimental)

Use the `/index-ai-memory-default/_mapping` path to define or update custom mappings for your index. Mappings determine how fields are stored and indexed, which is crucial for efficient data retrieval.

#### When to Use:

- **New Index Setup**: Define mappings when creating a new index.
- **Updating Mappings**: Modify mappings as your data model evolves.
- **Optimizing Queries**: Improve search performance by fine-tuning field types and indexing strategies.

#### How to Use:

- **Define Mappings**: Create a JSON payload under `properties` with the desired field types.
- **Send Request**: Use a PUT request to apply mappings to the index.
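
As a rough illustration, a mapping payload for the indexed fields listed earlier could look like the sketch below. It uses plain Elasticsearch mapping syntax; whether this project's API schema expects `x-elasticsearch-type` in place of `type` is not shown here, so treat the field types and the helper name `mapping_body` as assumptions.

```sh
# mapping_body  (hypothetical helper)
# Prints a mapping payload that assigns a data type to each indexed field.
mapping_body() {
    cat <<'EOF'
{
  "properties": {
    "@timestamp": { "type": "date" },
    "type":       { "type": "keyword" },
    "content":    { "type": "text" },
    "tag":        { "type": "keyword" },
    "status":     { "type": "keyword" },
    "start_date": { "type": "date" },
    "end_date":   { "type": "date" }
  }
}
EOF
}

mapping_body
```

The printed JSON is the body of the PUT request to `/index-ai-memory-default/_mapping`. Elasticsearch generally accepts new fields through this call but rejects a change to the type of a field that already exists, so types are best chosen before data is indexed.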

##### Using `x-elasticsearch-type` for Custom Mappings

To configure custom mappings for your index, use the `x-elasticsearch-type` property to specify the Elasticsearch data type for each field. This allows you to define how each field should be indexed and stored.

###### When to Use:

- **Mapping New Fields**: Use `x-elasticsearch-type` when defining the fields in your schema to specify how Elasticsearch should handle them.
- **Customizing Data Types**: Use this property to ensure that fields are indexed correctly according to your application's needs.

###### Supported Types:

- **text**: Used for full-text search fields.
- **keyword**: Used for exact match search fields.
- **date**: Used for date and time fields.
- **boolean**: Used for true/false values.
- **object**: Used for nested objects.
- **other types**: Refer to Elasticsearch documentation for additional supported types.