Add Baseline for SGLang Benchmark Test (#602)
# Description

The SGLang Benchmark Test has been running for a while, but only benchmarks the shortfin server itself. In order to get a baseline metric and track long-term convergence in performance, we need to be able to track metrics of the SGLang server using the same benchmark method. This adds an `sglang_benchmark_test` to complement the `shortfin_benchmark_test`.

It also restructures `app_tests/benchmark_tests/llm` -> `app_tests/benchmark_tests/llm/sglang_benchmarks`. This keeps the benchmark tests organized and allows the folder to be extended with other types of benchmarks in the future.

# Why are we using docker to start the SGLang server?

Currently, the `pyproject.toml` file inside SGLang requires `vllm==0.6.3.dev13` to run on ROCm. I looked into potentially building vLLM from source for this test, but couldn't find a branch, tag, or release that matched that version. From their own comments inside `pyproject.toml`, it appears to be available only inside a ROCm base image:

```toml
# HIP (Heterogeneous-computing Interface for Portability) for AMD
# => base docker rocm/vllm-dev:20241022, not from public vllm whl
srt_hip = ["sglang[runtime_common]", "torch", "vllm==0.6.3.dev13"]
```

Their [instructions](https://sgl-project.github.io/start/install.html#method-3-using-docker) for installing and running SGLang on ROCm also appear to suggest the docker method:

## Instructions from their docs for running with ROCm

```
docker build --build-arg SGL_BRANCH=v0.3.5.post2 -t v0.3.5.post2-rocm620 -f Dockerfile.rocm .

alias drun='docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri --ipc=host \
    --shm-size 16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    -v $HOME/dockerx:/dockerx -v /data:/data'

drun -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    v0.3.5.post2-rocm620 \
    python3 -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --host 0.0.0.0 --port 30000
```

The workflow file handles starting the container and cleaning up once the workflow is done. I set the timeout for waiting for the server to start to `10 minutes` to give the SGLang server enough time to load the necessary model weights and start up; a sketch of that kind of readiness wait is shown below.
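To illustrate the startup wait, here is a minimal, hypothetical sketch of polling the server until it responds or the 10-minute timeout elapses. The `/health` endpoint path and the `wait_for_server` helper are assumptions for illustration, not the actual workflow code:

```python
import time

import requests


def wait_for_server(base_url: str, timeout_secs: int = 600) -> None:
    """Poll the server until it responds, or raise after `timeout_secs`.

    Hypothetical helper -- the real workflow's wait logic may differ.
    """
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        try:
            # Assumes the server exposes a /health endpoint on its HTTP port.
            response = requests.get(f"{base_url}/health", timeout=5)
            if response.status_code == 200:
                return
        except requests.exceptions.ConnectionError:
            # Server is still loading model weights; keep polling.
            pass
        time.sleep(10)
    raise TimeoutError(f"Server at {base_url} did not start within {timeout_secs}s")


wait_for_server("http://localhost:30000", timeout_secs=600)
```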
1 parent da4644e · commit 539be41 · 6 changed files with 301 additions and 75 deletions.