Add daily lib integration test #2601

Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2601
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job as of commit a8161ba with merge base 376d6d2. The following job was cancelled, please retry:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
pip install .
# to not interfere with pytorch version
git clone https://github.com/vllm-project/vllm.git
cd vllm
Is this installing vllm into the torchao directory? If yes, maybe we can change it to install side by side instead:
root
  /torchao
  /vllm
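A workflow step along these lines would keep the two checkouts separate (a minimal sketch, assuming the job runs from the torchao checkout; the step name is made up):

- name: Checkout vLLM side by side
  run: |
    # hypothetical sketch: clone vLLM as a sibling of the torchao checkout
    cd "${GITHUB_WORKSPACE}/.."
    git clone https://github.com/vllm-project/vllm.git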
oh OK, makes sense
include:
  - name: SM-89
    runs-on: linux.g6.4xlarge.experimental.nvidia.gpu
    torch-lib-spec: '--pre fbgemm-gpu-genai --index-url https://download.pytorch.org/whl/nightly/cu126'
Seems like this is testing nightly pytorch + latest torchao + latest vllm?
How about switching it to stable pytorch + latest torchao + stable vllm, so that the only thing moving often is torchao? Otherwise I feel it will be noisy.
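For reference, a stable-channel matrix entry might look like this (a sketch only; the entry name and exact index URL are assumptions, and fbgemm-gpu-genai is dropped from the spec since needing its nightly fixes is the stated reason for staying on the nightly channel):

- name: SM-89-stable
  runs-on: linux.g6.4xlarge.experimental.nvidia.gpu
  torch-lib-spec: '--index-url https://download.pytorch.org/whl/cu126'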
Let me switch it, or add the stable-version test, a bit later: since we are actively adding new things now, we need fixes from fbgemm-gpu-genai.
opened #2607 to track
Summary:
* We are separating out the integration tests since they tend to be noisier than the other tests
* Run them daily instead of on every PR to reduce the cost of running the tests

Test Plan: CI
Reviewers:
Subscribers:
Tasks:
Tags:
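On the daily scheduling mentioned in the summary: a schedule trigger would look roughly like this (a minimal sketch; the exact cron time and the manual trigger are assumptions, not taken from this PR):

on:
  schedule:
    - cron: '0 7 * * *'  # once a day at 07:00 UTC
  workflow_dispatch: {}  # keep a manual trigger for debugging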
Force-pushed from 609b808 to a8161ba.
git submodule update --init --recursive
python use_existing_torch.py
pip install -r requirements/build.txt
pip install --no-build-isolation -e .
Per my comment in #2463 (comment), please don't land this PR with pip install --no-build-isolation -e . building vLLM from source, as it kills our H100 cluster. I have created an issue on our end to have better isolation for this case: https://github.com/pytorch-labs/pytorch-gha-infra/issues/766.
Alternatively, we could consider using the vLLM Docker image to avoid building it altogether: #2610.
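A step using the prebuilt image could look roughly like this (a hypothetical sketch: the image tag, mount layout, and test path are all assumptions, not taken from this PR or #2610):

- name: Run integration tests in the prebuilt vLLM image
  run: |
    # run the tests inside the official vLLM image instead of building vLLM from source
    docker run --gpus all --rm --entrypoint bash \
      -v "${GITHUB_WORKSPACE}:/workspace/ao" \
      vllm/vllm-openai:latest \
      -c 'pip install /workspace/ao && pytest /workspace/ao/test/integration'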