[WIP] Add HPU support to vLLM v1 #487

Draft: kzawora-intel wants to merge 36 commits into base: habana_main

@kzawora-intel kzawora-intel commented Nov 12, 2024

Early prototype. Lots of basic stuff is completely broken.

  • Implemented v1 HPU attn backend, worker, model_runner and executor
  • VLLM_USE_V1=1 properly selects V1 HPU components
  • V1 HPU executor loads model properly
  • V1 HPU executor allocates KV cache properly
  • V1 HPU model runner is constructed properly and initializes bucketing
  • V1 HPU attention backend gets selected automatically
  • profile_run works on dummy data
  • V1 HPU model_runner prepares input tensors based on SchedulerOutputs (rather than SequenceGroupMetadata)
  • V1 HPU model_runner differentiates prefill and decode sequences
  • V1 HPU model_runner execute_model runs for prefill
  • V1 HPU model_runner execute_model runs for decode
  • V1 HPU model_runner handles mixed-batch scenarios
  • V1 HPU model_runner prefill returns correct results
  • V1 HPU model_runner decode returns correct results (w/ flat PA)
  • V1 HPU model_runner decode returns correct results (w/ contiguous PA)
  • V1 HPU model_runner prefill runs at BS>1
  • V1 standard greedy and random sampling work on HPU
  • Capturing and replaying HPU Graphs work
  • Llama3.1-8B runs on GSM-8k with SOTA accuracy
  • V1 HPU model_runner warmup works properly
  • V1 HPU automatic prefix caching works properly
  • Tensor parallelism works
  • torch.compile works
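
For readers unfamiliar with the prefill/decode split mentioned in the checklist, here is a minimal sketch of the idea. It is not the code from this PR: `ScheduledRequest`, its fields, and `split_prefill_decode` are hypothetical names made up for illustration, and the real model runner works from SchedulerOutputs rather than this simplified structure.

```python
from dataclasses import dataclass


@dataclass
class ScheduledRequest:
    # Hypothetical stand-in for per-request scheduler state; not a vLLM class.
    req_id: str
    num_prompt_tokens: int
    num_computed_tokens: int


def split_prefill_decode(requests):
    """Partition a mixed batch into prefill and decode requests.

    Assumption: a request still needs prefill while some of its prompt
    tokens have not been computed; otherwise it is decoding.
    """
    prefills, decodes = [], []
    for req in requests:
        if req.num_computed_tokens < req.num_prompt_tokens:
            prefills.append(req)
        else:
            decodes.append(req)
    return prefills, decodes


batch = [
    ScheduledRequest("a", num_prompt_tokens=16, num_computed_tokens=0),
    ScheduledRequest("b", num_prompt_tokens=8, num_computed_tokens=8),
    ScheduledRequest("c", num_prompt_tokens=32, num_computed_tokens=40),
]
prefills, decodes = split_prefill_decode(batch)
```

With a split like this, a mixed batch can be handled by running the two groups through separate prefill and decode paths.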

@kzawora-intel kzawora-intel marked this pull request as draft November 12, 2024 16:08
def test_stateless_process_group(worker):
    port1 = get_open_port()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", port1))

Check warning: Code scanning / CodeQL

Binding a socket to all network interfaces (Medium, test)

'' binds a socket to all interfaces.

Copilot Autofix AI about 2 months ago

To fix the problem, we need to bind the socket to a specific interface instead of all interfaces. In this case, we can bind the socket to the loopback interface (127.0.0.1), which is commonly used for local testing and does not expose the service to external networks.

  • Change the binding address from '' to '127.0.0.1' on line 127.
  • This change ensures that the socket only accepts connections from the local machine, mitigating the security risk.
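
As a standalone illustration of the loopback binding described above, the sketch below binds a TCP socket to 127.0.0.1 and reads back the address it is bound to. The helper name `bind_loopback` is hypothetical, and port 0 (let the OS pick a free port) is an assumption for the example; the test in this PR obtains its port from `get_open_port()` instead.

```python
import socket


def bind_loopback(port: int = 0) -> socket.socket:
    """Return a TCP socket bound to the loopback interface only.

    Port 0 asks the OS to choose any free port (illustrative default).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("127.0.0.1", port))
    return s


with bind_loopback() as s:
    # getsockname() reports the (host, port) pair the socket is bound to.
    host, port = s.getsockname()
```

Binding to `""` instead would be equivalent to `0.0.0.0`, meaning every IPv4 interface on the machine, which is what the warning flags.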
Suggested changeset 1
tests/distributed/test_utils.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/distributed/test_utils.py b/tests/distributed/test_utils.py
--- a/tests/distributed/test_utils.py
+++ b/tests/distributed/test_utils.py
@@ -126,3 +126,3 @@
     with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
-        s.bind(("", port1))
+        s.bind(("127.0.0.1", port1))
         port2 = get_open_port()
EOF
Copilot is powered by AI and may make mistakes. Always verify output.