Add Groq/Llama-3-Groq-8B-Tool-Use (akash-network#609)
alexx855 authored Dec 3, 2024
1 parent 0a91fff commit bd2f96a
Showing 3 changed files with 75 additions and 0 deletions.
9 changes: 9 additions & 0 deletions Llama-3-Groq-8B-Tool-Use/README.md
@@ -0,0 +1,9 @@
# Groq/Llama-3-Groq-8B-Tool-Use

Meta developed and publicly released the Llama 3 family of large language models (LLMs), a collection of pretrained and fine-tuned generative text models ranging in scale from 8 billion to 70 billion parameters. Llama 3 is an auto-regressive language model that uses an optimized transformer architecture.

In this deployment, the [Groq/Llama-3-Groq-8B-Tool-Use](https://huggingface.co/Groq/Llama-3-Groq-8B-Tool-Use) model is used, a Llama 3 8B fine-tune specialized for tool use that generates a continuation of the incoming text. Access to this model is gated, so you must request and be granted access at https://huggingface.co/Groq/Llama-3-Groq-8B-Tool-Use before it can be downloaded.
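If you want to confirm that your token has been granted access before deploying, a minimal sketch with the `huggingface_hub` client (an assumption: the package is installed locally, and `hf_...` stands in for your real token):

```python
# Minimal sketch: verify the token can see the gated repository.
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # your Hugging Face access token
info = api.model_info("Groq/Llama-3-Groq-8B-Tool-Use")
print(info.id)  # prints the repo id if access is granted; raises an HTTP error otherwise
```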

## Deploying

Use [this SDL](deploy.yaml) to deploy the application on Akash. Enter your Hugging Face access token in the `HF_TOKEN=` environment variable, and adjust the parameters passed to the `vllm serve` command to match your hardware configuration (refer to the vLLM documentation for the available engine arguments). You can also enable additional debug flags through environment variables (consult the vLLM and PyTorch documentation for these as well). Once the lease is up, the service exposes vLLM's OpenAI-compatible API on port 8000, as sketched below.
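A minimal sketch of a chat request against the deployed endpoint, assuming the service is reachable at a hypothetical `PROVIDER_HOST` (replace with the host and port shown for your lease):

```python
# Minimal sketch: query the deployed vLLM OpenAI-compatible endpoint.
# PROVIDER_HOST is a hypothetical placeholder for your Akash lease URI.
import requests

BASE_URL = "http://PROVIDER_HOST:8000"

resp = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "Groq/Llama-3-Groq-8B-Tool-Use",
        "messages": [{"role": "user", "content": "What can you do with tools?"}],
        "max_tokens": 256,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Note that structured tool calling (the `tools` and `tool_choice` request fields) may require additional vLLM flags such as `--enable-auto-tool-choice` and a matching `--tool-call-parser`, which are not set in this SDL; check the vLLM documentation for your version.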
65 changes: 65 additions & 0 deletions Llama-3-Groq-8B-Tool-Use/deploy.yaml
@@ -0,0 +1,65 @@
---
version: "2.0"
services:
  vllm:
    image: vllm/vllm-openai:v0.6.2@sha256:55a88146a4da0b6e193431b5b1d3492dfd7bebdc16919df4d031273e85a6157c
    expose:
      - port: 8000
        as: 8000
        to:
          - global: true
    command:
      - bash
      - "-c"
    args: # see https://docs.vllm.ai/en/latest/models/engine_args.html for all available arguments
      - >-
        vllm serve Groq/Llama-3-Groq-8B-Tool-Use --gpu-memory-utilization 0.99
    env:
      - "HF_TOKEN=" # Hugging Face access token, required to download gated or private models
      #- NCCL_DEBUG=INFO # Uncomment to enable NCCL debugging
    params:
      storage:
        data:
          mount: /root/.cache # Mount the persistent volume over the cache directory so model files survive restarts
          readOnly: false
        shm:
          mount: /dev/shm # Shared memory backing for PyTorch/NCCL
profiles:
  compute:
    vllm:
      resources:
        cpu:
          units: 6
        memory:
          size: 16Gi
        storage:
          - size: 10Gi # Root filesystem
          - name: data
            size: 50Gi # Persistent model cache, sized for the 8B weights
            attributes:
              persistent: true
              class: beta3
          - name: shm
            size: 10Gi
            attributes:
              class: ram
              persistent: false
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia: # Uncomment a model below to pin a GPU type, or leave empty to accept any NVIDIA GPU
                # - model: a100
                # - model: h100
                # - model: rtx4090
  placement:
    dcloud:
      pricing:
        vllm:
          denom: uakt
          amount: 10000
deployment:
  vllm:
    dcloud:
      profile: vllm
      count: 1
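After deployment, the first start can take a while as the weights are downloaded into the persistent `data` volume. One way to confirm the server is ready is to list the served models; a minimal sketch, reusing the hypothetical `PROVIDER_HOST` placeholder from the README example:

```python
# Minimal sketch: confirm the vLLM server is up and serving the expected model.
import requests

resp = requests.get("http://PROVIDER_HOST:8000/v1/models", timeout=30)
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])  # expect "Groq/Llama-3-Groq-8B-Tool-Use"
```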
1 change: 1 addition & 0 deletions README.md
@@ -95,6 +95,7 @@ Also, follow [@akashnet\_](https://twitter.com/akashnet_) to stay in the loop wi
- [InvokeAI](invoke-ai-gpu)
- [Llama-2-70B](Llama-2-70B)
- [Llama-3-8B](Llama-3-8B)
- [Llama-3-Groq-8B-Tool-Use](Llama-3-Groq-8B-Tool-Use)
- [Llama-3-70B](Llama-3-70B)
- [Llama-3.1-8B](Llama-3.1-8B)
- [Llama-3.1-405B-AWQ-INT4](Llama-3.1-405B-AWQ-INT4)