- Detailed description
- Requirements
- Deploy
- Deploying the Quickstart Baseline (Step 1)
- Next Steps: Deploying and Securing (Steps 2 & 3)
- Delete
- References
- Technical details
- Tags
This QuickStart shows how to protect AI inference endpoints on Red Hat OpenShift AI using F5 Distributed Cloud (XC) Web App & API Protection (WAAP) and API Security. You'll deploy a KServe/vLLM model service in OpenShift AI, front it with an F5 XC HTTP Load Balancer, and enforce API discovery, OpenAPI schema validation, rate limiting, bot defense, and sensitive-data controls, all without changing your ML workflow. OpenShift AI's single-model serving is KServe-based (recommended for LLMs), and KServe's HuggingFace/vLLM runtime exposes OpenAI-compatible endpoints, which we'll secure via F5 XC.
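For orientation, a single-model KServe deployment has roughly this shape. This is a minimal sketch only: the resource name, runtime arguments, and GPU request are illustrative assumptions, and the Helm charts deployed below create the real resources for you.

```bash
# Illustrative only: a minimal KServe InferenceService using the
# HuggingFace/vLLM serving runtime. Name and args are assumptions;
# this QuickStart's Helm charts generate the actual manifests.
oc apply -f - <<'EOF'
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-32-3b-instruct          # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface             # KServe HuggingFace/vLLM runtime
      args:
        - --model_id=meta-llama/Llama-3.2-3B-Instruct
      resources:
        limits:
          nvidia.com/gpu: "1"
EOF
```

The resulting endpoint speaks the OpenAI API, which is what the F5 XC schema validation in later steps builds on.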
Key Components
- Red Hat OpenShift AI – Unified MLOps platform for developing and serving models at scale.
- F5 Distributed Cloud API Security – Provides LLM-aware threat detection, schema validation, and sensitive data redaction.
- Integration Blueprint – Demonstrates secure model inference across hybrid environments.
| Layer/Component | Technology | Purpose/Description |
|---|---|---|
| Orchestration | OpenShift AI | Container orchestration and GPU acceleration |
| Framework | LLaMA Stack | Standardizes core building blocks and simplifies AI application development |
| UI Layer | Streamlit | User-friendly chatbot interface for chat-based interaction |
| LLM | Llama-3.2-3B-Instruct | Generates contextual responses based on retrieved documents |
| Embedding | all-MiniLM-L6-v2 | Converts text to vector embeddings |
| Vector DB | PostgreSQL + PGVector | Stores embeddings and enables semantic search |
| Retrieval | Vector Search | Retrieves relevant documents based on query similarity |
| Storage | S3 Bucket | Document source for enterprise content |
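The Vector DB and Retrieval layers above come down to nearest-neighbor queries over embeddings. Below is a self-contained PGVector sketch of that idea; the pod name, database credentials, and toy 3-dimensional vectors are illustrative assumptions (real all-MiniLM-L6-v2 embeddings are 384-dimensional).

```bash
# Hypothetical pod name and credentials; toy vectors for illustration only.
oc exec -i -n f5-ai-security deploy/pgvector -- psql -U postgres <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TEMP TABLE docs (id int, content text, embedding vector(3));
INSERT INTO docs VALUES
  (1, 'refund policy',  '[0.9, 0.1, 0.0]'),
  (2, 'shipping times', '[0.1, 0.9, 0.0]');
-- Nearest neighbor by cosine distance, as the Retrieval layer does
SELECT content FROM docs ORDER BY embedding <=> '[0.85, 0.2, 0.0]' LIMIT 1;
SQL
```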
- OpenShift Client CLI - oc
- OpenShift Cluster 4.18+
- OpenShift AI
- Helm CLI - helm
- Regular user permission for default deployment
- Cluster admin required for advanced configurations
The instructions below will deploy this quickstart to your OpenShift environment.
Please see the local deployments section for additional deployment options.
- huggingface-cli (optional)
- Hugging Face Token
- Access to Meta Llama model
- Access to Meta Llama Guard model
- Some of the example scripts use `jq`, a JSON parsing utility, which you can acquire via `brew install jq`
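If you want to pre-pull model weights, the optional huggingface-cli can do so once your token has been granted access to the Meta Llama models. A sketch, assuming the token is in `$HF_TOKEN`:

```bash
# Assumes $HF_TOKEN holds a Hugging Face token with Meta Llama access granted
huggingface-cli login --token "$HF_TOKEN"
huggingface-cli download meta-llama/Llama-3.2-3B-Instruct
```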
| Function | Model Name | Hardware | AWS Instance Type |
|---|---|---|---|
| Embedding | all-MiniLM-L6-v2 | CPU/GPU/HPU | |
| Generation | meta-llama/Llama-3.2-3B-Instruct | L4/HPU | g6.2xlarge |
| Generation | meta-llama/Llama-3.1-8B-Instruct | L4/HPU | g6.2xlarge |
| Generation | meta-llama/Meta-Llama-3-70B-Instruct | A100 x2/HPU | p4d.24xlarge |
| Safety | meta-llama/Llama-Guard-3-8B | L4/HPU | g6.2xlarge |
Note: the 70B model is NOT required for initial testing of this example. The safety/shield model Llama-Guard-3-8B is also optional.
The instructions below will deploy the core AI stack (pgvector, llm-service, llama-stack) to your OpenShift environment.
Log in to your OpenShift cluster using your token and API endpoint:
```bash
oc login --token=<your_sha256_token> --server=<cluster-api-endpoint>
```

Example: the observed deployment logged in to `https://api.gpu-ai.bd.f5.com:6443` using a specific token and used project `z-ji` initially.
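You can confirm the session and active project before proceeding:

```bash
oc whoami     # verify you are logged in as the expected user
oc project    # show the currently active project
```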
Clone the F5-API-Security repository:
```bash
git clone https://github.com/rh-ai-quickstart/F5-API-Security
```

The repository was cloned into the local directory.
Change into the cloned repository and then into the deploy folder:
```bash
cd F5-API-Security
cd deploy
```

The deployment process navigated to `~/F5-API-Security/deploy`.
Execute the deployment script:
```bash
./deploy.sh
```

If the configuration file is missing, the script creates one and prompts you to edit it:

```
Values file not found. Copying from example...
Created /Users/<user>/F5-API-Security/deploy/f5-ai-security-values.yaml
Please edit this file to configure your deployment (API keys, model selection, etc.)
```
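Open the generated values file in your editor of choice, for example:

```bash
# Run from the deploy directory entered above
${EDITOR:-vi} f5-ai-security-values.yaml
```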
After editing `f5-ai-security-values.yaml`, re-run the script:

```bash
./deploy.sh
```

During installation, the script:
- Updates Helm dependencies.
- Downloads the required charts (`pgvector`, `llm-service`, `llama-stack`).
- Creates the OpenShift project `f5-ai-security`.
- Installs the Helm chart with custom values.
A successful deployment will show:
```
NAME: f5-ai-security
LAST DEPLOYED: Thu Nov 6 12:27:49 2025
NAMESPACE: f5-ai-security
STATUS: deployed
REVISION: 1
TEST SUITE: None
Deployment complete!
```
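Before testing the endpoints, it is worth confirming that the release installed and the pods are healthy:

```bash
helm list -n f5-ai-security        # release should show STATUS: deployed
oc get pods -n f5-ai-security      # wait for pods to reach Running/Completed
oc get routes -n f5-ai-security    # routes expose the endpoints tested below
```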
Once deployed, you can verify that the model endpoints are running correctly using curl.
```bash
curl -sS http://llamastack-f5-ai-security.apps.gpu-ai.bd.f5.com/v1/models
```

Expected output: two models available, a large language model and an embedding model.
```bash
curl -sS http://llamastack-f5-ai-security.apps.gpu-ai.bd.f5.com/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "remote-llm/RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
    "temperature": 0
  }' | jq
```

Example output:

```
"Hello, how can I assist you today?"
```
```bash
curl -sS http://vllm-quantized.volt.thebizdevops.net/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "RedHatAI/Llama-3.2-1B-Instruct-quantized.w8a8",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
    "temperature": 0
  }' | jq
```

This test against the dedicated vLLM endpoint also returned a successful response.
The deployment successfully sets up the F5-API-Security QuickStart environment on OpenShift, installs the Helm chart, and exposes model endpoints that can be verified using standard API calls.
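F5 XC API discovery learns the API's structure from live traffic, so once the HTTP Load Balancer from Step 2 fronts these endpoints it helps to generate a steady stream of requests. A minimal sketch, using a hypothetical load balancer hostname:

```bash
# Hypothetical: replace with the domain configured on your F5 XC HTTP LB
LB_HOST="llm.example.com"
for i in $(seq 1 20); do
  curl -sS "https://${LB_HOST}/v1/models" -o /dev/null -w "%{http_code}\n"
done
```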
With the core AI baseline deployed, proceed to the detailed guides for configuring the F5 Distributed Cloud components and running security use cases:
Configure the F5 Distributed Cloud components and integrate the LLM endpoint.
➡️ Deployment and Configuration of F5 Distributed Cloud
Run security testing to demonstrate how F5 API Security protects the deployed model inference services.
➡️ Security Use Cases and Testing
