-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscellaneous cosmetic changes #166
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
/ready |
a77839b
to
c41825a
Compare
setup_cython.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just keep it until proven worthless, it does no harm on its own
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For documentation purposes, I observe the following error when building with python3 setup_cython.py build_ext --inplace
and running benchmark_latency.py
. Needs further investigation.
[rank0]: File "vllm/entrypoints/llm.py", line 697, in _run_engine
[rank0]: step_outputs = self.llm_engine.step()
[rank0]: File "vllm/engine/llm_engine.py", line 1585, in vllm.engine.llm_engine.LLMEngine.step
[rank0]: self._advance_to_next_step(
[rank0]: TypeError: Argument 'output' has incorrect type (expected list, got SamplerOutput)
Dockerfile.rocm
Outdated
@@ -1,17 +1,18 @@ | |||
# default base image | |||
ARG BASE_IMAGE="rocm/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging" | |||
ARG BASE_IMAGE="rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_staging" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use 22.04
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we should use 22.04? Besides it being newer which does not bring any performance improvements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we can...?
I am doing all my testing on 22 lately and don't see any issues. But mainly yeah, more newer more better, 20 will be eventually deprecated as was 18 (next April to be precise)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Because we can" is not an affirmative reason. "More newer more better" is dubious at best.
Deprecation is something we can consider but not for ROCm 6.2 unless in the unlikely event there's a patch for it released after next April which we use.
I'll draw your attention to PRs which may have missed your attention like vllm-project#6517 for one reason we should consider upgrades carefully. We have seen other limitations in Ubuntu 22.04 versions in the past, such as lack of prebuilt ASan libraries obtainable via apt
on ROCm 6.1. Absent a affirmative reason, disturbing the status quo is fixing what isn't broke.
Dockerfile.rocm
Outdated
# ----------------------- | ||
# RCCL build stages | ||
FROM base AS build_rccl | ||
ARG RCCL_BRANCH="73221b4" | ||
ARG RCCL_BRANCH="833435b" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it's past the point they introduced a bug with queues.
Let's make this an official release tag, probably dependent on rocm version
Spiritual continuation of #163 with some changes left out by 05e67ab and further Dockerfile improvements.This upgrades the ROCm vLLM container to ROCm 6.2 and updates dependencies including CK-based flash-attention, hipBLASLt, RCCL, and Triton. The first 3 are not installed by default to cut down on container build time. However, it should be installed by specifying--build-arg "BUILD_FA=1"
if users are looking for sliding window attention support.Changes above deferred to a later PR. Only contains small cosmetic improvements.