Merge branch 'main' of https://github.com/katanemo/arch into cotran/hallu-fix
cotran2 committed Oct 15, 2024
2 parents 5049082 + 35c5e30 commit b8c6bd7
Showing 43 changed files with 865 additions and 644 deletions.
5 changes: 4 additions & 1 deletion .github/workflows/checks.yml
@@ -1,6 +1,9 @@
name: Checks

on: pull_request
on:
pull_request:
push:
branches: [main]

jobs:
test:
1 change: 1 addition & 0 deletions .gitignore
@@ -30,3 +30,4 @@ model_server/venv_model_server
model_server/build
model_server/dist
arch_logs/
dist/
17 changes: 8 additions & 9 deletions README.md
@@ -2,16 +2,18 @@
<img src="docs/source/_static/img/arch-logo.png" alt="Arch Gateway Logo" title="Arch Gateway Logo">
</p>

## Build fast, robust, and personalized GenAI apps (agents, assistants, etc.)
## Build fast, robust, and personalized AI agents.

Arch is an intelligent [Layer 7](https://www.cloudflare.com/learning/ddos/what-is-layer-7/) gateway designed for generative AI apps, AI agents, and co-pilots that work with prompts. Engineered with purpose-built LLMs, Arch handles the critical but undifferentiated tasks related to the handling and processing of prompts, including detecting and rejecting [jailbreak](https://github.com/verazuo/jailbreak_llms) attempts, intelligently calling "backend" APIs to fulfill the user's request represented in a prompt, routing to and offering disaster recovery between upstream LLMs, and managing the observability of prompts and LLM interactions in a centralized way.
Arch is an intelligent [Layer 7](https://www.cloudflare.com/learning/ddos/what-is-layer-7/) gateway designed to protect, observe, and personalize LLM applications (agents, assistants, co-pilots) with your APIs.

Engineered with purpose-built LLMs, Arch handles the critical but undifferentiated tasks related to the handling and processing of prompts, including detecting and rejecting [jailbreak](https://github.com/verazuo/jailbreak_llms) attempts, intelligently calling "backend" APIs to fulfill the user's request represented in a prompt, routing to and offering disaster recovery between upstream LLMs, and managing the observability of prompts and LLM interactions in a centralized way.

Arch is built on (and by the core contributors of) [Envoy Proxy](https://www.envoyproxy.io/) with the belief that:

>Prompts are nuanced and opaque user requests, which require the same capabilities as traditional HTTP requests including secure handling, intelligent routing, robust observability, and integration with backend (API) systems for personalization – all outside business logic.*
**Core Features**:
- Built on [Envoy](https://envoyproxy.io): Arch runs alongside application servers, and builds on top of Envoy's proven HTTP management and scalability features to handle ingress and egress traffic related to prompts and LLMs
- Built on [Envoy](https://envoyproxy.io): Arch runs alongside application servers, and builds on top of Envoy's proven HTTP management and scalability features to handle ingress and egress traffic related to prompts and LLMs.
- Function Calling for fast Agentic and RAG apps. Engineered with purpose-built [LLMs](https://huggingface.co/collections/katanemo/arch-function-66f209a693ea8df14317ad68) to handle fast, cost-effective, and accurate prompt-based tasks like function/API calling, and parameter extraction from prompts.
- Prompt [Guard](https://huggingface.co/collections/katanemo/arch-guard-6702bdc08b889e4bce8f446d): Arch centralizes prompt guardrails to prevent jailbreak attempts and ensure safe user interactions without writing a single line of code.
- Traffic Management: Arch manages LLM calls, offering smart retries, automatic cutover, and resilient upstream connections for continuous availability.
@@ -20,7 +22,7 @@ Arch is an intelligent [Layer 7](https://www.cloudflare.com/learning/ddos/what-i
**Jump to our [docs](https://docs.archgw.com)** to learn how you can use Arch to improve the speed, security and personalization of your GenAI apps.

## Contact
To get in touch with us, please join our [discord server](https://discord.gg/rbjqVbpa). We will be monitoring that actively and offering support there.
To get in touch with us, please join our [discord server](https://discord.gg/rSRQ9fv7). We will be monitoring that actively and offering support there.

## Demos
* [Function Calling](demos/function_calling/README.md) - Walk through of critical function calling capabilities
@@ -35,7 +37,7 @@ Follow this guide to learn how to quickly set up Arch and integrate it into your

Before you begin, ensure you have the following:

- `Docker` & `Python` verion 3.10 installed on your system
- `Docker` & `Python` installed on your system
- `API Keys` for LLM providers (if using external LLMs)

### Step 1: Install Arch
@@ -109,15 +111,12 @@ Make outbound calls via Arch
import openai

# Set the OpenAI API base URL to the Arch gateway endpoint
openai.api_base = "http://127.0.0.1:12000/"
openai.api_base = "http://127.0.0.1:51001/v1"

# No need to set openai.api_key since it's configured in Arch's gateway

# Use the OpenAI client as usual
# we set api_key to '--' because the openai client would fail to initiate the request without it. Just pass any
# dummy value here since the arch gateway will properly pass the access key before making the outbound call.
response = openai.Completion.create(
api_key="--",
model="text-davinci-003",
prompt="What is the capital of France?"
)
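
For reference, a minimal sketch of the same outbound call using the newer `openai>=1.0` client interface; the `base_url`, dummy key, and model name below are taken from the snippet above and are illustrative, not part of this commit:

```python
# A minimal sketch using the openai>=1.0 client, pointed at the Arch gateway.
# The base_url and model below are assumptions taken from the README example above.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:51001/v1",  # Arch gateway endpoint (assumed)
    api_key="--",  # dummy value; Arch injects the real provider key on the outbound call
)

response = client.completions.create(
    model="text-davinci-003",  # illustrative model name from the README example
    prompt="What is the capital of France?",
)
print(response.choices[0].text)
```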
2 changes: 2 additions & 0 deletions arch/docker-compose.yaml
@@ -12,3 +12,5 @@ services:
- ~/archgw_logs:/var/log/
env_file:
- stage.env
extra_hosts:
- "host.docker.internal:host-gateway"
13 changes: 11 additions & 2 deletions arch/tools/README.md
@@ -56,9 +56,18 @@ sh build_cli.sh
archgw build
```

## Step 5: start model server in the background
### Step 5: download models
This downloads the models ahead of time so model_server can load faster. This only needs to be done once.

```bash
archgw download-models
```
archgw up --services model_server

### Logs
The `archgw` command can also view logs from the gateway and model_server. Use the following command to view logs:

```bash
archgw logs --follow
```
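
Under the hood (per the `stream_gateway_logs` change to `cli/core.py` later in this diff), the CLI streams these logs by shelling out to docker compose; a minimal sketch of the equivalent direct invocation, assuming the gateway was started under the compose project name `arch`:

```python
# Minimal equivalent of `archgw logs --follow`, mirroring stream_gateway_logs in cli/core.py.
# Assumes the gateway services were started under the compose project name "arch".
import subprocess

subprocess.run(["docker", "compose", "-p", "arch", "logs", "-f"], check=True)
```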

## Uninstall Instructions: archgw CLI
1 change: 1 addition & 0 deletions arch/tools/cli/consts.py
@@ -0,0 +1 @@
KATANEMO_DOCKERHUB_REPO = "katanemo/archgw"
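
A constant like this is typically interpolated into image references by the CLI; a hypothetical sketch of one way it might be used to pull a tagged gateway image (the helper function and the `latest` tag are assumptions, not part of this commit):

```python
# Hypothetical helper showing one way KATANEMO_DOCKERHUB_REPO could be used.
# The function name and the "latest" tag are illustrative assumptions.
import subprocess

KATANEMO_DOCKERHUB_REPO = "katanemo/archgw"

def pull_gateway_image(tag: str = "latest") -> None:
    """Pull the Arch gateway image from Docker Hub."""
    subprocess.run(["docker", "pull", f"{KATANEMO_DOCKERHUB_REPO}:{tag}"], check=True)

pull_gateway_image()
```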
67 changes: 52 additions & 15 deletions arch/tools/cli/core.py
@@ -4,6 +4,37 @@
import pkg_resources
import select
from cli.utils import run_docker_compose_ps, print_service_status, check_services_state
from cli.utils import getLogger
import sys

log = getLogger(__name__)


def stream_gateway_logs(follow):
"""
Stream logs from the arch gateway service.
"""
compose_file = pkg_resources.resource_filename(
__name__, "../config/docker-compose.yaml"
)

log.info("Logs from arch gateway service.")

options = ["docker", "compose", "-p", "arch", "logs"]
if follow:
options.append("-f")
try:
# Run `docker-compose logs` to stream logs from the gateway service
subprocess.run(
options,
cwd=os.path.dirname(compose_file),
check=True,
stdout=sys.stdout,
stderr=sys.stderr,
)

except subprocess.CalledProcessError as e:
log.info(f"Failed to stream logs: {str(e)}")


def start_arch(arch_config_file, env, log_timeout=120):
@@ -14,7 +45,7 @@ def start_arch(arch_config_file, env, log_timeout=120):
path (str): The path where the prompt_confi.yml file is located.
log_timeout (int): Time in seconds to show logs before checking for healthy state.
"""

log.info("Starting arch gateway")
compose_file = pkg_resources.resource_filename(
__name__, "../config/docker-compose.yaml"
)
@@ -35,9 +66,10 @@ def start_arch(arch_config_file, env, log_timeout=120):
), # Ensure the Docker command runs in the correct path
env=env, # Pass the modified environment
check=True, # Raise an exception if the command fails
stderr=subprocess.PIPE,
stdout=subprocess.PIPE,
)
print(f"Arch docker-compose started in detached.")
print("Monitoring `docker-compose ps` logs...")
log.info(f"Arch docker-compose started in detached.")

start_time = time.time()
services_status = {}
@@ -51,14 +83,14 @@ def start_arch(arch_config_file, env, log_timeout=120):

# Check if timeout is reached
if elapsed_time > log_timeout:
print(f"Stopping log monitoring after {log_timeout} seconds.")
log.info(f"Stopping log monitoring after {log_timeout} seconds.")
break

current_services_status = run_docker_compose_ps(
compose_file=compose_file, env=env
)
if not current_services_status:
print(
log.info(
"Status for the services could not be detected. Something went wrong. Please run docker logs"
)
break
@@ -74,11 +106,11 @@ def start_arch(arch_config_file, env, log_timeout=120):
running_states = ["running", "up"]

if check_services_state(current_services_status, running_states):
print("Arch is up and running!")
log.info("Arch gateway is up and running!")
break

if check_services_state(current_services_status, unhealthy_states):
print(
log.info(
"One or more Arch services are unhealthy. Please run `docker logs` for more information"
)
print_service_status(
@@ -92,7 +124,7 @@ def start_arch(arch_config_file, env, log_timeout=120):
services_status[service_name]["State"]
!= current_services_status[service_name]["State"]
):
print(
log.info(
"One or more Arch services have changed state. Printing current state"
)
print_service_status(current_services_status)
@@ -101,7 +133,7 @@ def start_arch(arch_config_file, env, log_timeout=120):
services_status = current_services_status

except subprocess.CalledProcessError as e:
print(f"Failed to start Arch: {str(e)}")
log.info(f"Failed to start Arch: {str(e)}")


def stop_arch():
@@ -115,17 +147,21 @@ def stop_arch():
__name__, "../config/docker-compose.yaml"
)

log.info("Shutting down arch gateway service.")

try:
# Run `docker-compose down` to shut down all services
subprocess.run(
["docker", "compose", "-p", "arch", "down"],
cwd=os.path.dirname(compose_file),
check=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
print("Successfully shut down all services.")
log.info("Successfully shut down arch gateway service.")

except subprocess.CalledProcessError as e:
print(f"Failed to shut down services: {str(e)}")
log.info(f"Failed to shut down services: {str(e)}")


def start_arch_modelserver():
@@ -134,12 +170,13 @@ def start_arch_modelserver():
"""
try:
log.info("archgw_modelserver restart")
subprocess.run(
["archgw_modelserver", "restart"], check=True, start_new_session=True
)
print("Successfull run the archgw model_server")
log.info("Successfull ran model_server")
except subprocess.CalledProcessError as e:
print(f"Failed to start model_server. Please check archgw_modelserver logs")
log.info(f"Failed to start model_server. Please check archgw_modelserver logs")
sys.exit(1)


@@ -153,7 +190,7 @@ def stop_arch_modelserver():
["archgw_modelserver", "stop"],
check=True,
)
print("Successfull stopped the archgw model_server")
log.info("Successfull stopped the archgw model_server")
except subprocess.CalledProcessError as e:
print(f"Failed to start model_server. Please check archgw_modelserver logs")
log.info(f"Failed to start model_server. Please check archgw_modelserver logs")
sys.exit(1)