First commit

Drassil · Aug 21, 2024 · 972509e
1 parent c1b86e3
commit 972509e
Showing 9 changed files with 795 additions and 0 deletions.
diff --git a/.github/workflows/test-generate-files.yml b/.github/workflows/test-generate-files.yml
@@ -0,0 +1,23 @@
+name: Test Generate Files
+
+on:
+  push:
+    branches:
+      - 'main'
+  pull_request:
+    branches:
+      - '*'
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 1
+
+      - name: Run test-generate-files.sh
+        run: |
+          chmod +x tests/test-generate-files.sh
+          ./tests/test-generate-files.sh
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,4 @@
+gpt-values-override-conf.*.sh
+!gpt-values-override-conf.dist.sh
+out/*
+!out/.gitkeep
diff --git a/README.md b/README.md
@@ -0,0 +1,36 @@
+
+# AI-Memory
+
+**Elasticsearch API and GPT Model**
+--------------------------------
+
+**Overview**
+------------
+
+This project uses an Elasticsearch API and a GPT model to store and manage a chronological repository of information about specific topics, activities, and interactions. The GPT model functions as an extended memory system, in the style of Retrieval-Augmented Generation (RAG), to provide suggestions, manage tasks, and offer reminders.
+
+**Key Features**
+----------------
+
+* **Chronological Tracking**: The model tracks the addition and modification of information, allowing it to understand the sequence of events or data entries.
+* **Information Retrieval**: The model can efficiently retrieve information from Elasticsearch using queries that might involve specific dates, topics, or statuses.
+* **Decision Making**: Based on retrieved data, the model generates reasoned responses that consider historical data.
+* **Assistant Capabilities**: The model provides suggestions, manages tasks, and offers reminders.
+
+**Usage**
+---------
+
+* **Elasticsearch API**: The API is used to store and manage data.
+* **GPT Model**: The model is used to generate responses and provide suggestions, and can be interacted with using natural language inputs.
+
+**Guidelines**
+-------------
+
+* **Personal Info**: When searching for or creating documents related to you, the model refers to you by your configured name.
+* **Knowledge Base**: The model always consults its knowledge base or the Elasticsearch database to better understand requests.
+* **Custom Mappings (experimental)**: It uses the `x-elasticsearch-type` property to configure custom mappings for the index, allowing for the specification of Elasticsearch data types for each field.
+
+**License**
+----------
+
+This project is licensed under the MIT License.
diff --git a/generate-files.sh b/generate-files.sh
@@ -0,0 +1,97 @@
+#!/bin/bash
+
+cd "$(dirname "$(readlink -f "$0")")" || exit 1 # relative paths below assume the script's directory
+
+# Function to process each configuration file
+process_conf_file() {
+    local conf_file=$1
+    local my_id=$2
+
+    echo "Processing $conf_file with ID $my_id..."
+
+    # Load the override configuration file
+    if [ -f "$conf_file" ] && [ -s "$conf_file" ]; then
+        echo "Loading $conf_file..."
+        source "$conf_file"
+    fi
+
+    # Read the default configuration file and store the variables
+    dist_vars=()
+    if [ -f "gpt-values-override-conf.dist.sh" ]; then
+        while IFS= read -r line; do
+            # Skip commented lines
+            if [[ $line == \#* ]]; then
+                continue
+            fi
+
+            # Remove 'export' if it exists
+            line=$(echo "$line" | sed 's/^export //')
+
+            # Extract the variable name and value
+            VAR_NAME=$(echo "$line" | cut -d'=' -f 1)
+            VAR_VALUE=$(echo "$line" | cut -d'=' -f 2-)
+
+            # Skip blank lines, which yield an empty variable name
+            if [ -n "$VAR_NAME" ]; then
+                # Remember the name for the override check below
+                dist_vars+=("$VAR_NAME")
+
+                # Fall back to the default when the variable is unset or empty
+                if [ -z "${!VAR_NAME}" ]; then
+                    echo -e "\033[33mWarning: The variable $VAR_NAME is not defined in $conf_file. The fallback value will be used.\033[0m"
+                    declare -x "$VAR_NAME=$VAR_VALUE"
+                fi
+            fi
+        done <"gpt-values-override-conf.dist.sh"
+    fi
+
+    # Check for variables in the override file that are not in the dist file
+    if [ -f "$conf_file" ] && [ -s "$conf_file" ]; then
+        while IFS= read -r line; do
+            # Skip commented lines
+            if [[ $line == \#* ]]; then
+                continue
+            fi
+
+            # Remove 'export' if it exists
+            line=$(echo "$line" | sed 's/^export //')
+
+            # Extract the variable name
+            VAR_NAME=$(echo "$line" | cut -d'=' -f 1)
+
+            # Check if the variable is not in the dist file
+            if [ -n "$VAR_NAME" ]; then
+                found=false
+                for var in "${dist_vars[@]}"; do
+                    if [ "$var" == "$VAR_NAME" ]; then
+                        found=true
+                        break
+                    fi
+                done
+                if [ "$found" == false ]; then
+                    echo -e "\033[33mWarning: The variable $VAR_NAME is defined in $conf_file but not in gpt-values-override-conf.dist.sh.\033[0m"
+                fi
+            fi
+        done <"$conf_file"
+    fi
+
+    # Replace placeholders in the files using envsubst
+    envsubst <gpt-schema.dist.yml >"out/gpt-schema.$my_id.yml"
+    envsubst <gpt-instructions.dist.md >"out/gpt-instructions.$my_id.md"
+
+    echo "Files gpt-schema.$my_id.yml and gpt-instructions.$my_id.md have been generated."
+}
+
+# Loop over all configuration files, skipping the .dist.sh file
+for conf_file in gpt-values-override-conf.*.sh; do
+    # Skip the .dist.sh file and the unexpanded pattern left when nothing matches
+    if [[ "$conf_file" == *".dist.sh" || ! -e "$conf_file" ]]; then
+        continue
+    fi
+
+    # Extract the [my_id] part from the filename
+    my_id=$(echo "$conf_file" | sed 's/^gpt-values-override-conf\.//;s/\.sh$//')
+
+    # Process the configuration file in a subshell so its values do not leak into the next one
+    (process_conf_file "$conf_file" "$my_id")
+done
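
Note on `generate-files.sh` (not part of the commit): the core of the script is "parse each `NAME=VALUE` line of the dist file, then export the default only when the override left the variable unset or empty". A minimal sketch of that logic using bash parameter expansion instead of `sed` and `cut` could look like the following. The helper names `parse_assignment` and `apply_default` are hypothetical and do not appear in the commit.

```bash
#!/bin/bash

# Hypothetical helper: split one "export NAME=VALUE" or "NAME=VALUE" line.
# Prints NAME on the first line and VALUE on the second.
# Returns 1 (and prints nothing) for blank lines and comment lines.
parse_assignment() {
    local line=$1
    if [[ -z $line || $line == \#* ]]; then
        return 1
    fi
    line=${line#export }                              # drop a leading "export "
    printf '%s\n%s\n' "${line%%=*}" "${line#*=}"      # split at the first "="
}

# Hypothetical helper: export NAME with a fallback value only when NAME is
# currently unset or empty, so an override that was sourced earlier wins.
apply_default() {
    local name=$1 fallback=$2
    # ${!name} is indirect expansion: the value of the variable named by $name.
    if [ -z "${!name}" ]; then
        export "$name=$fallback"
    fi
}
```

Because `envsubst` only sees exported variables, this sketch exports the fallback; override files would likewise need to `export` their values for the substitution to pick them up.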
diff --git a/gpt-instructions.dist.md b/gpt-instructions.dist.md
@@ -0,0 +1,142 @@
+# MyElasticSearch Documentation
+
+## Purpose
+
+The primary goal of this GPT model is to function as an extended memory system, in the style of Retrieval-Augmented Generation (RAG). It stores and manages a chronological repository of information about specific topics, activities, and interactions, supporting decision-making, task management, and generating contextually relevant responses.
+
+## Personal info and how to answer
+
+I'm ${AI_MEMORY_PERSONAL_NAME}. When you have to search for or create documents related to me, refer to my name. Also, always use your knowledge base or the Elasticsearch database to better understand my requests.
+
+${AI_MEMORY_EXTRA_PERSONAL_INFO}
+
+## Key Functionalities
+
+### Chronological Tracking
+
+The model tracks the addition and modification of information, allowing it to understand the sequence of events or data entries. This tracking ensures that responses are based on the latest and most relevant data.
+
+### Information Retrieval
+
+The model can efficiently retrieve information from Elasticsearch using queries that might involve specific dates, topics, or statuses. This ability allows the model to act as an intelligent query handler.
+
+### Decision Making
+
+Based on retrieved data, the model generates reasoned responses that consider historical data. This helps in providing suggestions, managing tasks, and offering reminders.
+
+### Assistant Capabilities
+
+The model acts as a virtual assistant, using stored information to manage tasks, documents, and reminders, and provides alerts or suggestions based on past inputs and upcoming deadlines.
+
+## Document Management and Versioning
+
+### In-Document Versioning with `__meta_revisions`
+
+All updates to a document are handled by copying the old properties into a `__meta_revisions` field within the same document. This keeps each document and its full history in one place.
+
+- **`__meta_revisions`**: An array where each element contains a snapshot of the document's properties before the latest update. Each entry in `__meta_revisions` should include:
+  - **@timestamp**: The date and time when the revision was created.
+  - **content**: The content of the document before the update.
+  - **other relevant fields**: Any other fields that have changed since the last revision.
+
+## Understanding the Schema
+
+### Indexed Fields
+
+These fields should be indexed for efficient querying and retrieval:
+
+- **`@timestamp`**: The current date and time when creating or updating a document.
+- **`type`**: Specifies the category of the document (e.g., "reminder", "file").
+- **`content`**: Contains the main content or details of the document.
+- **`tag`**: Tags used for categorization and future retrieval.
+- **`status`**: Reflects the current state of the document (e.g., active, in_progress, done, etc.).
+- **`start_date` / `end_date`**: Specify the start and end dates, if applicable.
+
+### Non-Indexed Fields
+
+These fields do not need to be indexed. To differentiate them from indexed fields, they should be prefixed with `__meta_`:
+
+- **`__meta_disabled`**: Used to deactivate or archive documents.
+- **`__meta_update_reason`**: Provides the rationale behind any updates made to the document.
+- **`__meta_revisions`**: Stores previous versions of the document's content and other relevant fields.
+- **`__meta_document_ref`**: Links to any related documents by their Document ID(s).
+
+## Operations on Documents
+
+### Searching for Documents
+
+#### Constructing Queries:
+
+- Formulate queries based on keywords, document types, tags, or other criteria.
+- Queries should be sent as POST requests to `/index-ai-memory-*/_search`.
+- Apply filters to refine search results, such as filtering out deactivated documents using `__meta_disabled`.
+- Sort results based on relevance, date, or other criteria to prioritize the most relevant information.
+
+### Adding or Updating a Document
+
+#### Required Fields:
+
+- Include `@timestamp`, `type`, and `content` as mandatory fields.
+- Determine appropriate tags and document type based on the context provided.
+- If there are related documents, link them using the `__meta_document_ref` field.
+- If adding a new document, generate a JSON payload and submit it as a POST request to `/index-ai-memory-default/_doc/`.
+- New documents should have their status set to "active" unless specified otherwise.
+- Ensure that the `@timestamp` field reflects the current date and time.
+
+#### Updating Existing Documents:
+
+- **When updating one or more existing documents,** **copy only the changed properties** (such as `content`, `status`, etc.) **to `__meta_revisions` before applying any changes.** This should be done using a script or within the update process to ensure historical data is preserved.
+
+Example of the script:
+
+```
+{
+  "id": "B4Qtb5EByKxxX0hsdDZy",
+  "script": {
+    "source": "def revision = [:]; revision.status = ctx._source.status; revision.__meta_update_reason = ctx._source.__meta_update_reason; revision['@timestamp'] = ctx._source['@timestamp']; if (ctx._source.__meta_revisions == null) { ctx._source.__meta_revisions = []; } ctx._source.__meta_revisions.add(revision); ctx._source.status = 'in_progress'; ctx._source.__meta_update_reason = 'Changed the status to in_progress'; ctx._source['@timestamp'] = '2024-08-20T10:30:00Z';"
+  }
+}
+```
+
+### Deactivating a Document
+
+#### Deactivation:
+
+- Instead of deleting documents, set the `__meta_disabled` field to "true" to deactivate them.
+
+## Dynamic Field Management
+
+The system can dynamically add as many fields as needed when they contain metadata that is useful to keep separate from the content.
+
+### Configuring Index Mappings (experimental)
+
+Use the `/index-ai-memory-default/_mapping` path to define or update custom mappings for your index. Mappings determine how fields are stored and indexed, which is crucial for efficient data retrieval.
+
+#### When to Use:
+
+- **New Index Setup**: Define mappings when creating a new index.
+- **Updating Mappings**: Modify mappings as your data model evolves.
+- **Optimizing Queries**: Improve search performance by fine-tuning field types and indexing strategies.
+
+#### How to Use:
+
+- **Define Mappings**: Create a JSON payload under `properties` with the desired field types.
+- **Send Request**: Use a PUT request to apply mappings to the index.
+
+##### Using `x-elasticsearch-type` for Custom Mappings
+
+To configure custom mappings for your index, use the `x-elasticsearch-type` property to specify the Elasticsearch data type for each field. This allows you to define how each field should be indexed and stored.
+
+###### When to Use:
+
+- **Mapping New Fields**: Use `x-elasticsearch-type` when defining the fields in your schema to specify how Elasticsearch should handle them.
+- **Customizing Data Types**: Use this property to ensure that fields are indexed correctly according to your application's needs.
+
+###### Supported Types:
+
+- **text**: Used for full-text search fields.
+- **keyword**: Used for exact match search fields.
+- **date**: Used for date and time fields.
+- **boolean**: Used for true/false values.
+- **object**: Used for JSON objects that contain their own sub-fields.
+- **other types**: Refer to Elasticsearch documentation for additional supported types.
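
Note on `gpt-instructions.dist.md` (not part of the commit): the search and add-document rules above can be illustrated with request bodies built in bash. This is a sketch under stated assumptions: the helper names `search_body` and `new_document_body` are hypothetical, `ES_URL` is a placeholder for wherever the API is reachable, `__meta_disabled` is assumed to be stored as a boolean, and the inputs are assumed to contain no double quotes or backslashes because no JSON escaping is done here.

```bash
#!/bin/bash

# Hypothetical helper: print a search body that matches KEYWORD in `content`,
# filters out deactivated documents, and sorts the newest documents first.
search_body() {
    local keyword=$1
    cat <<EOF
{
  "query": {
    "bool": {
      "must": [{ "match": { "content": "$keyword" } }],
      "must_not": [{ "term": { "__meta_disabled": true } }]
    }
  },
  "sort": [{ "@timestamp": { "order": "desc" } }]
}
EOF
}

# Hypothetical helper: print a new-document body with the mandatory fields
# (`@timestamp`, `type`, `content`) and the default "active" status.
new_document_body() {
    local type=$1 content=$2
    local now
    now=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # current UTC time in ISO 8601 form
    printf '{"@timestamp":"%s","type":"%s","content":"%s","status":"active"}\n' \
        "$now" "$type" "$content"
}

# Example usage (ES_URL is an assumed placeholder, not defined by the commit):
#   search_body "dentist" | curl -s -X POST "$ES_URL/index-ai-memory-*/_search" \
#       -H 'Content-Type: application/json' --data-binary @-
#   new_document_body "reminder" "Call the dentist" | curl -s -X POST \
#       "$ES_URL/index-ai-memory-default/_doc/" \
#       -H 'Content-Type: application/json' --data-binary @-
```

Sorting on `@timestamp` only behaves as a chronological sort if that field is mapped as a `date`, which is one reason the mapping section above matters.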