Securing Model Inference with F5 Distributed Cloud API Security


Detailed description

This QuickStart shows how to protect AI inference endpoints on Red Hat OpenShift AI using F5 Distributed Cloud (XC) Web App & API Protection (WAAP) and API Security. You'll deploy a KServe/vLLM model service in OpenShift AI, front it with an F5 XC HTTP Load Balancer, and enable API discovery, OpenAPI schema validation, rate limiting, bot defense, and sensitive-data controls, all without changing your ML workflow. OpenShift AI's single-model serving is KServe-based (recommended for LLMs), and KServe's HuggingFace/vLLM runtime exposes OpenAI-compatible endpoints, which we'll secure via F5 XC.
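For orientation, the traffic being protected is ordinary OpenAI-style HTTP. A minimal probe through the F5 XC HTTP Load Balancer might look like the sketch below, where <xc-lb-domain> is a placeholder for the domain you attach to the load balancer in Step 2:

# Hypothetical domain; replace with the domain configured on your XC HTTP Load Balancer.
curl -sS https://<xc-lb-domain>/v1/models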

Key Components

  • Red Hat OpenShift AI – Unified MLOps platform for developing and serving models at scale.
  • F5 Distributed Cloud API Security – Provides LLM-aware threat detection, schema validation, and sensitive data redaction.
  • Integration Blueprint – Demonstrates secure model inference across hybrid environments.

Architecture diagrams

RAG System Architecture

| Layer/Component | Technology | Purpose/Description |
| --- | --- | --- |
| Orchestration | OpenShift AI | Container orchestration and GPU acceleration |
| Framework | LLaMA Stack | Standardizes core building blocks and simplifies AI application development |
| UI Layer | Streamlit | User-friendly chatbot interface for chat-based interaction |
| LLM | Llama-3.2-3B-Instruct | Generates contextual responses based on retrieved documents |
| Embedding | all-MiniLM-L6-v2 | Converts text to vector embeddings |
| Vector DB | PostgreSQL + PGVector | Stores embeddings and enables semantic search |
| Retrieval | Vector Search | Retrieves relevant documents based on query similarity |
| Storage | S3 Bucket | Document source for enterprise content |

Requirements

Minimum hardware requirements

Minimum software requirements

  • OpenShift Client CLI - oc
  • OpenShift Cluster 4.18+
  • OpenShift AI
  • Helm CLI - helm
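A quick way to confirm the client tools are installed (standard version checks; output varies by environment):

oc version --client
helm version --short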

Required user permissions

  • Regular user permission for default deployment
  • Cluster admin required for advanced configurations
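If you want to check your permissions up front, oc auth can-i is useful (a sketch; the f5-ai-security project is created later by the deploy script):

# Cluster-scoped check (typically requires elevated rights to return "yes")
oc auth can-i create namespaces
# Namespace-scoped check once the project exists
oc auth can-i create deployments -n f5-ai-security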

Deploy

The instructions below will deploy this quickstart to your OpenShift environment.

Please see the local deployments section for additional deployment options.

Prerequisites

Supported Models

| Function | Model Name | Hardware | AWS |
| --- | --- | --- | --- |
| Embedding | all-MiniLM-L6-v2 | CPU/GPU/HPU | |
| Generation | meta-llama/Llama-3.2-3B-Instruct | L4/HPU | g6.2xlarge |
| Generation | meta-llama/Llama-3.1-8B-Instruct | L4/HPU | g6.2xlarge |
| Generation | meta-llama/Meta-Llama-3-70B-Instruct | A100 x2/HPU | p4d.24xlarge |
| Safety | meta-llama/Llama-Guard-3-8B | L4/HPU | g6.2xlarge |

Note: the 70B model is NOT required for initial testing of this example. The safety/shield model Llama-Guard-3-8B is also optional.
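To see whether your cluster has GPU nodes for the generation models, a label query like the one below can help (a sketch; the label assumes the NVIDIA GPU Operator's node labeling and may differ in your environment):

oc get nodes -l nvidia.com/gpu.present=true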

Deploying the Quickstart Baseline (Step 1)

The instructions below will deploy the core AI stack (pgvector, llm-service, llama-stack) to your OpenShift environment.

Installation Steps

1. Login to OpenShift

Log in to your OpenShift cluster using your token and API endpoint:

oc login --token=<your_sha256_token> --server=<cluster-api-endpoint>

Example: a reference deployment logged in to https://api.gpu-ai.bd.f5.com:6443 with its token and started in the project z-ji.
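To confirm the session after logging in (standard oc commands; output reflects your cluster):

oc whoami
oc whoami --show-server
oc project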


2. Clone the Repository

Clone the F5-API-Security repository:

git clone https://github.com/rh-ai-quickstart/F5-API-Security

This creates a local F5-API-Security directory.


3. Navigate to Deployment Directory

Change into the cloned repository and then into the deploy folder:

cd F5-API-Security
cd deploy

You should now be in ~/F5-API-Security/deploy.


4. Configure and Deploy

Execute the deployment script:

./deploy.sh

If the configuration file is missing, the script creates one and prompts you to edit it:

Values file not found. Copying from example...
Created /Users/<user>/F5-API-Security/deploy/f5-ai-security-values.yaml
Please edit this file to configure your deployment (API keys, model selection, etc.)

After editing f5-ai-security-values.yaml, re-run the script:

./deploy.sh

During installation, the script:

  • Updates Helm dependencies.
  • Downloads required charts (pgvector, llm-service, llama-stack).
  • Creates the OpenShift project f5-ai-security.
  • Installs the Helm chart with custom values.

A successful deployment will show:

NAME: f5-ai-security
LAST DEPLOYED: Thu Nov 6 12:27:49 2025
NAMESPACE: f5-ai-security
STATUS: deployed
REVISION: 1
TEST SUITE: None
Deployment complete!

Post-Deployment Verification (Optional)

Once deployed, you can verify that the workloads are healthy and the model endpoints respond correctly.
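Before hitting the endpoints, a quick pod- and route-level check confirms the chart's workloads came up (standard oc commands; names and counts will vary):

oc get pods -n f5-ai-security
oc get routes -n f5-ai-security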

Check Deployed Models (LlamaStack Endpoint)

curl -sS http://llamastack-f5-ai-security.apps.gpu-ai.bd.f5.com/v1/models

Expected output: Two models available — a large language model and an embedding model.
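To list just the model identifiers, you can filter the response with jq (a sketch assuming the models are wrapped in a data array; whether the field is identifier or id depends on the Llama Stack version):

curl -sS http://llamastack-f5-ai-security.apps.gpu-ai.bd.f5.com/v1/models \
  | jq -r '.data[] | (.identifier // .id)'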


Test Chat Completion (LlamaStack Endpoint)

curl -sS http://llamastack-f5-ai-security.apps.gpu-ai.bd.f5.com/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "remote-llm/RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
    "temperature": 0
  }' | jq

Example output:
"Hello, how can I assist you today?"


Test Chat Completion (Secured vLLM Endpoint)

curl -sS http://vllm-quantized.volt.thebizdevops.net/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
    "temperature": 0
  }' | jq

A successful response here confirms that the dedicated vLLM endpoint is reachable as well.
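Once the F5 XC policies from Steps 2 and 3 are in place, this endpoint is where the controls become visible. For example, if you configure a rate-limit policy in XC, a simple loop should eventually return HTTP 429 instead of 200 (a sketch; the threshold depends entirely on your policy):

# Print only the HTTP status code for each of 20 rapid requests.
for i in $(seq 1 20); do
  curl -sS -o /dev/null -w "%{http_code}\n" \
    http://vllm-quantized.volt.thebizdevops.net/v1/openai/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
done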


Summary

The deployment successfully sets up the F5-API-Security QuickStart environment on OpenShift, installs the Helm chart, and exposes model endpoints that can be verified using standard API calls.


Next Steps: Deploying and Securing (Steps 2 & 3)

With the core AI baseline deployed, proceed to the detailed guides for configuring the F5 Distributed Cloud components and running security use cases:

Step 2: Deploy F5 Distributed Cloud

Configure the F5 Distributed Cloud components and integrate the LLM endpoint.
➡️ Deployment and Configuration of F5 Distributed Cloud

Step 3: Configure and Run Use Cases for F5 Distributed Cloud

Run security testing to demonstrate how F5 API Security protects the deployed model inference services.
➡️ Security Use Cases and Testing


Delete

References

Technical details

Tags
