Skip to content

Intel® AI for Enterprise RAG converts enterprise data into actionable insights with excellent TCO. Utilizing Intel Gaudi AI accelerators and Intel Xeon processors ensuring streamlined deployment.

License

Notifications You must be signed in to change notification settings

opea-project/Enterprise-RAG

Intel® AI for Enterprise RAG

 

Intel® AI for Enterprise RAG simplifies transforming your enterprise data into actionable insights. Powered by Intel® Gaudi® AI accelerators and Intel® Xeon® processors, it integrates components from industry partners to offer a streamlined approach to deploying enterprise solutions. It scales seamlessly with proven orchestration frameworks, providing the flexibility and choice your enterprise needs.

Building on the strong foundation of OPEA, Intel® AI for Enterprise RAG extends this base with key features that enhance scalability, security, and user experience. These include integrated Service Mesh capabilities for seamless integration with modern service-based architectures, production-ready validation for pipeline reliability, and a feature-rich UI for RAG as a service to manage and monitor workflows easily. Additionally, Intel® and partner support provide access to a broad ecosystem of solutions, combined with integrated IAM with UI and application, ensuring secure and compliant operations. Programmable guardrails enable fine-grained control over pipeline behavior, allowing for customized security and compliance settings.

Note

The video provided above showcases the beta release of our project. As we transition to the 1.0 version, users can anticipate an improved UI design along with other minor enhancements. While many features remain consistent with the beta, the 1.0 release will offer a more refined user experience.

ChatQnA

The ChatQnA solution uses retrieval augmented generation (RAG) architecture, quickly becoming the industry standard for chatbot development. It combines the benefits of a knowledge base (via a vector store) and generative models to reduce hallucinations, maintain up-to-date information, and leverage domain-specific knowledge.

arch

For the complete microservices architecture, refer here

Table of Contents

Documentation

  • Deployment Guide explains how to install and configure Intel® AI for Enterprise RAG for your needs.

System Requirements

Category Details
Operating System Ubuntu 22.04/24.04
Hardware Platforms 4th Gen Intel® Xeon® Scalable processors
5th Gen Intel® Xeon® Scalable processors
6th Gen Intel® Xeon® Scalable processors
3rd Gen Intel® Xeon® Scalable processors and Intel® Gaudi® 2 AI Accelerator
4th Gen Intel® Xeon® Scalable processors and Intel® Gaudi® 2 AI Accelerator
6th Gen Intel® Xeon® Scalable processors and Intel® Gaudi® 3 AI Accelerator
Kubernetes Version 1.29.5
1.29.12
1.30.8
1.31.4
Gaudi Firmware Version 1.21.0
Python 3.10

Hardware Prerequisites for Deployment using Gaudi® AI Accelerator

To deploy the solution on a platform with Gaudi® AI Accelerator we need to have access to instance with minimal requirements:

  • logical cores: A minimum of 56 logical cores
  • RAM memory: A minimum of 250GB of RAM though this is highly dependent on database size
  • Disk Space: 500GB of disk space is generally recommended, though this is highly dependent on the model size and database size
  • Gaudi cards: 8
  • Latest Gaudi driver: To check your Gaudi version, run hl-smi. If Gaudi version doesn't match the required version, upgrade it by following this tutorial.

If you don't have a Gaudi® AI Accelerator, you can request these instances in Intel® Tiber™ AI Cloud to run Intel® AI for Enterprise RAG.

  • Visit ai.cloud.intel.com and register.
  • on Intel® Tiber™ AI Cloud the left pane select Preview > Preview catalog.
  • Select Gaudi® 2 Deep Learning Server, we recommend choosing a Bare Metal machine with 8 Gaudi devices to be able to meet hardware requirements.
  • request instance
  • As there is a limited number of machines with Gaudi accelerators, you need to be whitelisted to be able to access them. Therefore:
    • Send your email, Account ID, and information that you have requested an instance to [email protected] so we could add you to the whitelist.
  • on Intel® Tiber™ AI Cloud the left pane select Catalog > Hardware. Once you are added to the whitelist you should see Gaudi instances available.
  • Select the Machine image - for example: ubuntu-2204-gaudi2-1.20.1-* with Architecture: X86_64 (Baremetal only). Please note that minor version tags may change over time.
  • Upload your public key and launch the instance
  • Navigate to the Instances page and verify that the machine has reached its ready state, then click on "How to Connect via SSH" to configure your machine correctly for further installation.

Hardware Prerequisites for Deployment using Xeon only

To deploy the solution on a platform using 4th or 5th generation Intel® Xeon® processors, you will need:

  • access to any platform with Intel® Xeon® Scalable processors that meet bellow requirements:
  • logical cores: A minimum of 88 logical cores
  • RAM memory: A minimum of 250GB of RAM
  • Disk Space: 200GB of disk space is generally recommended, though this is highly dependent on the model size

Software Prerequisites

  • Operating System: Ubuntu 22.04/24.04
  • Hugging Face Model Access: Ensure you have the necessary access to download and use the chosen Hugging Face model. This default model used is Mixtral-8x7B for which access needs to be requested. Visit Mixtral-8x7B to apply for access.
  • **For multi-node clusters CSI driver with StorageClass supporting accessMode ReadWriteMany (RWX). NFS server with CSI driver that supports RWX can be installed in simplified-kubernetes-cluster-deployment section.

Additional Software Prerequisites when using Gaudi® AI Accelerator

  • Gaudi Software Stack: Verify that your setup uses a valid software stack for Gaudi accelerators, see Gaudi support matrix. Note that running LLM on a CPU is possible but will significantly reduce performance.

Software Images

Software images will be downloaded automatically during deployment from Docker Hub. If you prefer to use an internal registry, all required images are available under the following link: ERAG Images.

To configure the use of an internal registry, please specify it in the config.yaml file as described in the Application Deployment on a Custom Cluster section of the repository.

Pre-Installation

It is recommended to use python3-venv to manage python packages.

cd deployment
sudo apt-get install python3-venv
python3 -m venv erag-venv
source erag-venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yaml --upgrade

Note: To verify if venv is properly used, check with ansible --version for Python version. If not using the venv Python, you can add ansible_python_interpreter=erag-venv/bin/python3 to the inventory under [local] section, or for application playbook add -e ansible_python_interpreter=erag-venv/bin/python3.

Configuration File

To prepare the configuration file, create a copy of the sample:

cd deployment
cp -r inventory/sample inventory/test-cluster

Simplified Kubernetes Cluster Deployment

If you need to deploy a new Kubernetes cluster, follow the instructions below. If you already have a cluster prepared, you can skip to the Preparing a Custom Cluster and Installing Infrastructure Components section.

Show instructions for installing a new Kubernetes cluster
  1. Edit the inventory file:
    • Open inventory/test-cluster/inventory.ini.
    • Replace LOCAL_USER, REMOTE_USER, MACHINE_HOSTNAME, and MACHINE_IP with your actual values.

Example inventory.ini for a single-node cluster:

# Kubernetes Cluster Inventory
[local]
localhost ansible_connection=local ansible_user=LOCAL_USER

[all]
# Control plane nodes
MACHINE_HOSTNAME ansible_host=MACHINE_IP

# Define node groups
[kube_control_plane]
MACHINE_HOSTNAME

[kube_node]
MACHINE_HOSTNAME

[etcd:children]
kube_control_plane

[k8s_cluster:children]
kube_control_plane
kube_node

# Vars
[k8s_cluster:vars]
ansible_become=true
ansible_user=REMOTE_USER
ansible_connection=ssh

For more information on preparing an Ansible inventory, see the Ansible Inventory Documentation.

  1. Edit the configuration file:
    • Open inventory/test-cluster/config.yaml.
    • Fill in the required values for your environment.

Example config.yaml for deploying a new Kubernetes cluster:

# Uses Kubespray to deploy Kubernetes cluster and install required components
deploy_k8s: false

# Available options:
# - "local-path-provisioner": Use for single-node deployment. Local path provisioner works only with Kubespray deployment, so deploy_k8s needs to be true to install it
# - "nfs": Use for multi-node deployment or single node with velero backup functionality; can be installed on existing K8s cluster using infrastructure playbook
install_csi: "local-path-provisioner"

# please use variables local_registry and insecure_registry together
# set to true to install registry on K8s , set insecure registry to be able to push and pull images from registry on defined node and defined port.
local_registry: true
insecure_registry: "master-1:32000"

# Setup when install_csi is "nfs"
# Setup NFS server when working on a multi-node cluster which does not have a StorageClass with RWX capability.
# Setting nfs_server_enabled to true will install NFS server together with CSI driver
# and set nfs_csi_storage_class as the default one.
nfs_node_name: "master-1"             # K8s node name on which to set up NFS server
nfs_host_path: "/opt/nfs-data"       # Host path on the K8s node for NFS server
nfs_csi_driver_version: "4.11.0"
nfs_csi_storage_class: "nfs-csi"

huggingToken: FILL_HERE # Provide your Hugging Face token here

# Provide absolute path to kubeconfig (e.g. /home/ubuntu/.kube/config).
# If you are installing K8s cluster in simplified deployment, your kubeconfig will be created in path "<repository path>/deployment/inventory/test-cluster/artifacts/admin.conf"
kubeconfig: FILL_HERE

# Proxy settings are optional
httpProxy:
httpsProxy:
# If HTTP/HTTPS proxy is set, update the noProxy field as needed:
noProxy: #"localhost,.svc,.monitoring,.monitoring-traces"
# ...
pipelines:
  - namespace: chatqa
    samplePath: chatqa/reference-cpu.yaml # For HPU deployment, use chatqa/reference-hpu.yaml
    resourcesPath: chatqa/resources-reference-cpu.yaml # For HPU deployment, use chatqa/resources-reference-hpu.yaml
    type: chatqa

Note: The inventory provides the ability to install additional components that might be needed when preparing a Kubernetes (K8s) cluster.
Set gaudi_operator: true if you are working with Gaudi nodes and want to install gaudi software stack via operator.
Set install_csi: nfs if you are setting up a multi-node cluster and want to deploy an NFS server with a CSI plugin that creates a StorageClass with RWX (ReadWriteMany) capabilities.
Velero requires NFS to be included.

  1. (Optional) Validate hardware resources and config.yaml:

    ansible-playbook playbooks/validate.yaml --tags hardware,config -i inventory/test-cluster/inventory.ini -e @inventory/test-cluster/config.yaml

    Note: If this is gaudi_deployment add additional flag -e is_gaudi_platform=true

  2. Deploy the cluster:

    ansible-playbook -K playbooks/infrastructure.yaml --tags configure,install -i inventory/test-cluster/inventory.ini -e @inventory/test-cluster/config.yaml

Cluster Deletion

To remove the cluster, run:

ansible-playbook -K playbooks/infrastructure.yaml --tags delete -i inventory/test-cluster/inventory.ini -e @inventory/test-cluster/config.yaml
 

Preparing a Custom Cluster and Installing Infrastructure Components

If you are using your own custom Kubernetes cluster (not provisioned by the provided infrastructure playbooks), you may need to install additional infrastructure components before deploying the application. These include the NFS server for shared storage, the Gaudi operator (for Habana Gaudi AI accelerator support), Velero, or other supported services.

Show instructions for Preparing a Custom Cluster

To prepare your cluster:

  1. Edit the configuration file:

    • Open inventory/test-cluster/config.yaml.
    • Set deploy_k8s: false and update the other fields as needed for your environment (see the Configuration File section for details).
    • If you need NFS, set install_csi: nfs and configure the NFS-related variables (backing up with Velero requires NFS to be included). If you need Gaudi support, set gaudi_operator: true and specify the desired habana_driver_version.

    Example config.yaml for deploying on an existing cluster:

    # Uses Kubespray to deploy Kubernetes cluster and install required components
    deploy_k8s: false
    
    # Available options:
    # - "local-path-provisioner": Use for single-node deployment. Local path provisioner works only with Kubespray deployment, so deploy_k8s needs to be true to install it
    # - "nfs": Use for multi-node deployment; can be installed on existing K8s cluster using infrastructure playbook
    install_csi: "local-path-provisioner"
    
    # please use variables local_registry and insecure_registry together
    # set to true to install registry on K8s , set insecure registry to be able to push and pull images from registry on defined node and defined port.
    local_registry: true
    insecure_registry: "master-1:32000"
    
    # Setup when install_csi is "nfs"
    # Setup NFS server when working on a multi-node cluster which does not have a StorageClass with RWX capability.
    # Setting nfs_server_enabled to true will install NFS server together with CSI driver
    # and set nfs_csi_storage_class as the default one.
    nfs_node_name: "master-1"             # K8s node name on which to set up NFS server
    nfs_host_path: "/opt/nfs-data"       # Host path on the K8s node for NFS server
    nfs_csi_driver_version: "4.11.0"
    nfs_csi_storage_class: "nfs-csi"
    
    huggingToken: FILL_HERE # Provide your Hugging Face token here
    
    # Provide absolute path to kubeconfig (e.g. /home/ubuntu/.kube/config).
    # If you are installing K8s cluster in simplified deployment, your kubeconfig will be created in path "<repository path>/deployment/inventory/test-cluster/artifacts/admin.conf"
    kubeconfig: FILL_HERE
    
    httpProxy:
    httpsProxy:
    # If HTTP/HTTPS proxy is set, update the noProxy field with the following:
    noProxy: #"localhost,.svc,.monitoring,.monitoring-traces"
    
    ...
    
    gaudi_operator: false # set to true when Gaudi operator is to be installed
    habana_driver_version: "1.21.3-57" # habana operator from https://vault.habana.ai/ui/native/habana-ai-operator/driver/
    
    ...
    pipelines:
      - namespace: chatqa
        samplePath: chatqa/reference-cpu.yaml
        resourcesPath: chatqa/resources-reference-cpu.yaml
        modelConfigPath: chatqa/resources-model-cpu.yaml
        type: chatqa
    
    ...
  2. Validate hardware resources and config.yaml:

    ansible-playbook playbooks/validate.yaml --tags hardware,config -i inventory/test-cluster/inventory.ini -e @inventory/test-cluster/config.yaml

    Note: If this is a Gaudi deployment, add the flag -e is_gaudi_platform=true.

  3. Install infrastructure components (NFS, Gaudi operator, or others):

     ansible-playbook -K playbooks/infrastructure.yaml --tags post-install -i inventory/test-cluster/inventory.ini -e @inventory/test-cluster/config.yaml

    This will install and configure the NFS server, Gaudi operator, or velero as specified in your configuration.

    Note: You can enable several components in the same run if both are needed. Additional components may be supported via post-install in the future.

Once your cluster is prepared and the required infrastructure is installed, proceed with the application installation as described below.

 

Installation

ansible-playbook -u $USER -K playbooks/application.yaml --tags configure,install -e @inventory/test-cluster/config.yaml

Refer Deployment if you prefer to install with multiple options.

Updating the Application

After the application is installed, you can update its components (for example, change the LLM or embedding model) by editing your configuration file and running the install tag again. The deployment scripts will detect changes and update only the involved components, minimizing downtime and unnecessary redeployments.

To update the application:

  1. Edit inventory/test-cluster/config.yaml and adjust the relevant parameters (e.g., llm_model, embedding_model_name, or other settings).
  2. Run:
ansible-playbook -u $USER -K playbooks/application.yaml --tags install -e @inventory/test-cluster/config.yaml

This will apply the changes and update only the affected services.

Remove installation

cd deployment
ansible-playbook playbooks/application.yaml --tags uninstall -e @inventory/test-cluster/config.yaml

User Data Backup

Application supports taking backup and restoring user data, including ingested vector data, ingested documents, user accounts and credentials and chat history.

  • With backup enabled and configured in cluster, the backup can be taken with the following command:

    ansible-playbook -u $USER -K playbooks/backup.yaml --tags backup,monitor_backup -e @inventory/test-cluster/config.yaml

    See documentation on Full Backup of User Data to find more details on backup.

Note: Backup requires configuring cluster and application to support this feature, in particular the VMWare Velero tool is required.
Refer to Deployment guide's chapter Velero Prerequisites for details.

User Data Restore

  • With backup configured correctly the data restore process can be started with the following command:

    ansible-playbook -u $USER -K playbooks/backup.yaml --tags restore,monitor_restore -e @inventory/test-cluster/config.yaml

    See documentation on Full Restore of User Data to find more details on restore.

Support

Submit questions, feature requests, and bug reports on the GitHub Issues page.

License

Intel® AI for Enterprise RAG is licensed under the Apache License Version 2.0. Refer to the "LICENSE" file for the full license text and copyright notice.

This distribution includes third-party software governed by separate license terms. This third-party software, even if included with the distribution of the Intel software, may be governed by separate license terms, including without limitation, third-party license terms, other Intel software license terms, and open-source software license terms. These separate license terms govern your use of the third-party programs as set forth in the "THIRD-PARTY-PROGRAMS" file.

Security

The Security Policy outlines our guidelines and procedures for ensuring the highest level of security and trust for our users who consume Intel® AI for Enterprise RAG.

Intel’s Human Rights Principles

Intel is committed to respecting human rights and avoiding causing or contributing to adverse impacts on human rights. See Intel’s Global Human Rights Principles. Intel’s products and software are intended only to be used in applications that do not cause or contribute to adverse impacts on human rights.

Model Card Guidance

You, not Intel, are responsible for determining model suitability for your use case. For information regarding model limitations, safety considerations, biases, or other information consult the model cards (if any) for models you use, typically found in the repository where the model is available for download. Contact the model provider with questions. Intel does not provide model cards for third party models.

Contributing

If you want to contribute to the project, please refer to the guide in CONTRIBUTING.md file.

Trademark Information

Intel, the Intel logo, OpenVINO, the OpenVINO logo, Pentium, Xeon, and Gaudi are trademarks of Intel Corporation or its subsidiaries.

  • Other names and brands may be claimed as the property of others.

© Intel Corporation

About

Intel® AI for Enterprise RAG converts enterprise data into actionable insights with excellent TCO. Utilizing Intel Gaudi AI accelerators and Intel Xeon processors ensuring streamlined deployment.

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

No packages published

Contributors 35