Add helm chart, add CI for lint/release helm chart, update README for helm installation (#3)

Signed-off-by: Hung-Han (Henry) Chen <[email protected]>
chenhunghan authored Aug 10, 2023
1 parent 4c42f9a commit 0330d65
Showing 8 changed files with 228 additions and 1 deletion.
45 changes: 45 additions & 0 deletions .github/workflows/helm-chart-lint-test.yaml
@@ -0,0 +1,45 @@
name: Lint and Test Charts

on: pull_request

jobs:
  lint-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Set up Helm
        uses: azure/setup-helm@v3
        with:
          version: v3.12.1

      - uses: actions/setup-python@v4
        with:
          python-version: '3.9'
          check-latest: true

      - name: Set up chart-testing
        uses: helm/[email protected]

      - name: Run chart-testing (list-changed)
        id: list-changed
        run: |
          changed=$(ct list-changed --target-branch ${{ github.event.repository.default_branch }})
          if [[ -n "$changed" ]]; then
            echo "changed=true" >> "$GITHUB_OUTPUT"
          fi

      - name: Run chart-testing (lint)
        if: steps.list-changed.outputs.changed == 'true'
        run: ct lint --target-branch ${{ github.event.repository.default_branch }} --validate-maintainers=false

      - name: Create kind cluster
        if: steps.list-changed.outputs.changed == 'true'
        uses: helm/[email protected]

      - name: Run chart-testing (install)
        if: steps.list-changed.outputs.changed == 'true'
        run: ct install --target-branch ${{ github.event.repository.default_branch }}
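The same lint step can be run locally before opening a pull request. A minimal sketch, assuming Docker is available; `quay.io/helmpack/chart-testing` is the image the chart-testing project publishes, and the tag here is an assumption:

```sh
# Lint the chart locally with the same tool the CI uses (image tag is an assumption)
docker run --rm -it -v "$(pwd):/repo" -w /repo \
  quay.io/helmpack/chart-testing:v3.8.0 \
  ct lint --charts charts/text-inference-batcher-nodejs --validate-maintainers=false
```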
32 changes: 32 additions & 0 deletions .github/workflows/helm-chart-release.yaml
@@ -0,0 +1,32 @@
name: Release Charts

on:
  push:
    branches:
      - main
    paths:
      - 'charts/text-inference-batcher-nodejs/**'
      - '.github/workflows/helm-chart-release.yaml'

jobs:
  release:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Configure Git
        run: |
          git config user.name "$GITHUB_ACTOR"
          git config user.email "[email protected]"

      - name: Run chart-releaser
        uses: helm/[email protected]
        with:
          charts_dir: charts
        env:
          CR_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
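Once this workflow has published a release, chart-releaser serves the chart index from GitHub Pages, and the chart can be consumed like any other Helm repository (the URL below is the one the README uses):

```sh
helm repo add text-inference-batcher https://chenhunghan.github.io/text-inference-batcher
helm repo update
helm search repo text-inference-batcher
```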
48 changes: 48 additions & 0 deletions README.md
@@ -2,6 +2,54 @@

`text-inference-batcher` is a high-performance router optimized for maximum throughput in text inference workloads.

## Quick Start

Quickly deploy three inference backends using [ialacol](https://github.com/chenhunghan/ialacol) in the `llm` namespace.

```sh
helm repo add ialacol https://chenhunghan.github.io/ialacol
helm repo update
# the classic Llama 2 13B
helm install llama-2 ialacol/ialacol \
  --set deployment.env.DEFAULT_MODEL_HG_REPO_ID="TheBloke/Llama-2-13B-chat-GGML" \
  --set deployment.env.DEFAULT_MODEL_FILE="llama-2-13b-chat.ggmlv3.q4_0.bin" \
  -n llm
# Orca Mini, a fine-tuned Llama 2 https://huggingface.co/psmathur/orca_mini_v3_13b
helm install orca-mini ialacol/ialacol \
  --set deployment.env.DEFAULT_MODEL_HG_REPO_ID="TheBloke/orca_mini_v3_13B-GGML" \
  --set deployment.env.DEFAULT_MODEL_FILE="orca_mini_v3_13b.ggmlv3.q4_0.bin" \
  -n llm
# just another fine-tuned variant
helm install stable-platypus2 ialacol/ialacol \
  --set deployment.env.DEFAULT_MODEL_HG_REPO_ID="TheBloke/Stable-Platypus2-13B-GGML" \
  --set deployment.env.DEFAULT_MODEL_FILE="stable-platypus2-13b.ggmlv3.q4_0.bin" \
  -n llm
```
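Before wiring in the batcher, it can help to confirm that all three backends are up:

```sh
# Each release should show a Running pod and a Service
kubectl get pods -n llm
kubectl get svc -n llm
```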

Add `text-inference-batcher`, pointing it at the three upstreams.

```sh
helm repo add text-inference-batcher https://chenhunghan.github.io/text-inference-batcher
helm repo update
helm install tib text-inference-batcher/text-inference-batcher-nodejs \
  --set deployment.env.UPSTREAMS="http://llama-2:8000,http://orca-mini:8000,http://stable-platypus2:8000" \
  -n llm
```

Port-forward the `text-inference-batcher` service for testing.

```sh
kubectl port-forward svc/tib 8000:8000 -n llm
```
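With the port-forward in place, the aggregated model list should include the models from every upstream, assuming `text-inference-batcher` exposes the standard OpenAI-compatible `/v1/models` endpoint:

```sh
# List all models visible through the batcher (assumes the OpenAI-compatible /v1/models route)
curl http://localhost:8000/v1/models
```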

A single gateway for all your inference backends:

```sh
openai -k "sk-" -b http://localhost:8000/v1 -vv api chat_completions.create -m llama-2-13b-chat.ggmlv3.q4_0.bin -g user "Hello world!"
openai -k "sk-" -b http://localhost:8000/v1 -vv api chat_completions.create -m orca_mini_v3_13b.ggmlv3.q4_0.bin -g user "Hello world!"
openai -k "sk-" -b http://localhost:8000/v1 -vv api chat_completions.create -m stable-platypus2-13b.ggmlv3.q4_0.bin -g user "Hello world!"
```
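The same requests can be made with plain `curl`, assuming the standard OpenAI chat-completions schema; the batcher routes each request to the upstream serving the requested model:

```sh
# Equivalent raw request for the first model (standard OpenAI chat schema assumed)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-" \
  -d '{
    "model": "llama-2-13b-chat.ggmlv3.q4_0.bin",
    "messages": [{"role": "user", "content": "Hello world!"}]
  }'
```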

## Features

- Max throughput by queuing, and continuous batching of incoming requests.
2 changes: 1 addition & 1 deletion apps/text-inference-batcher-nodejs/Dockerfile
@@ -43,4 +43,4 @@ RUN --mount=type=cache,target=/tmp/.npm \
--cache /tmp/.npm
ENV NODE_ENV production
EXPOSE 8000
CMD ["node", "dist/index.js"]
CMD ["node", "apps/text-inference-batcher-nodejs/dist/index.js"]
6 changes: 6 additions & 0 deletions charts/text-inference-batcher-nodejs/Chart.yaml
@@ -0,0 +1,6 @@
apiVersion: v2
appVersion: 0.0.1
description: A Helm chart for text-inference-batcher with node.js runtime
name: text-inference-batcher-nodejs
type: application
version: 0.0.1
43 changes: 43 additions & 0 deletions charts/text-inference-batcher-nodejs/templates/deployment.yaml
@@ -0,0 +1,43 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace | quote }}
  labels:
    app.kubernetes.io/instance: {{ .Chart.Name }}
    app.kubernetes.io/name: {{ .Release.Name }}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: {{ .Chart.Name }}
      app.kubernetes.io/name: {{ .Release.Name }}
  replicas: {{ .Values.replicas }}
  template:
    metadata:
      name: {{ .Release.Name }}
      labels:
        app.kubernetes.io/instance: {{ .Chart.Name }}
        app.kubernetes.io/name: {{ .Release.Name }}
    spec:
      containers:
        - name: {{ .Release.Name }}
          image: {{ .Values.deployment.image }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          env:
            - name: UPSTREAMS
              value: {{ (.Values.deployment.env).UPSTREAMS | quote }}
            - name: MAX_CONNECT_PER_UPSTREAM
              value: {{ (.Values.deployment.env).MAX_CONNECT_PER_UPSTREAM | quote }}
      tolerations:
      {{- if .Values.tolerations }}
{{ toYaml .Values.tolerations | indent 8 }}
      {{- end }}
      nodeSelector:
      {{- if .Values.nodeSelector }}
{{ toYaml .Values.nodeSelector | indent 8 }}
      {{- end }}
      affinity:
      {{- if .Values.affinity }}
{{ toYaml .Values.affinity | indent 8 }}
      {{- end }}
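A quick way to sanity-check the template above without a cluster is `helm template`, which renders the manifests locally (a sketch, assuming the repository is checked out):

```sh
# Render the chart locally and inspect the generated Deployment
helm template tib charts/text-inference-batcher-nodejs \
  --set deployment.env.UPSTREAMS="http://llama-2:8000,http://orca-mini:8000"
```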
15 changes: 15 additions & 0 deletions charts/text-inference-batcher-nodejs/templates/service.yaml
@@ -0,0 +1,15 @@
apiVersion: v1
kind: Service
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace | quote }}
spec:
  selector:
    app.kubernetes.io/instance: {{ .Chart.Name }}
    app.kubernetes.io/name: {{ .Release.Name }}
  type: "{{ .Values.service.type }}"
  ports:
    - protocol: TCP
      port: {{ .Values.service.port }}
      targetPort: 8000
      name: http
38 changes: 38 additions & 0 deletions charts/text-inference-batcher-nodejs/values.yaml
@@ -0,0 +1,38 @@
replicas: 1

deployment:
  image: ghcr.io/chenhunghan/text-inference-batcher-nodejs:latest
  env:
    # Upstream URLs separated by commas, e.g. "http://llama-2-7b-0:8000,http://llama-2-7b-1:8000,http://llama-2-13b-0:8000"
    UPSTREAMS: ""
    MAX_CONNECT_PER_UPSTREAM: 1

resources:
  {}
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

service:
  type: ClusterIP
  port: 8000
  annotations: {}
    # If using an AWS load balancer, you'll need to override the default 60s load balancer idle timeout
    # service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "1200"

nodeSelector: {}

tolerations: []
  # e.g.
  # - key: "computing"
  #   operator: "Exists"
  #   effect: "NoSchedule"

affinity: {}
  # e.g.
  # nodeAffinity:
  #   requiredDuringSchedulingIgnoredDuringExecution:
  #     nodeSelectorTerms:
  #       - matchExpressions:
  #           - key: computing-lb
  #             operator: In
  #             values:
  #               - "true"
