Hello, I am currently hosting a Docker image with vLLM 0.6.4.post1 in a Kubernetes pod. When attempting to make a completions request, I receive the following trace:
```
Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 :: INFO 11-16 15:18:47 logger.py:37] Received request cmpl-ad9b68a083ad4bb09522daf6d65744c0-0: prompt: 'Hello, this is a test.', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=20, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None), prompt_token_ids: [9906, 11, 420, 374, 264, 1296, 13], lora_request: None, prompt_adapter_request: None.
Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 :: INFO 11-16 15:18:47 engine.py:267] Added request cmpl-ad9b68a083ad4bb09522daf6d65744c0-0.
Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 :: INFO 11-16 15:18:47 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241116-151847.pkl...
Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 :: INFO 11-16 15:18:47 model_runner_base.py:149] Completed writing input of failed execution to /tmp/err_execute_model_input_20241116-151847.pkl.
Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 :: CRITICAL 11-16 15:18:47 launcher.py:99] MQLLMEngine is already dead, terminating server process
Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 :: INFO: 127.0.0.1:59556 - "POST /v1/completions HTTP/1.1" 500 Internal Server Error
Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 :: ERROR 11-16 15:18:47 engine.py:135] TypeError("CompilationError.__init__() missing 1 required positional argument: 'node'")

Traceback (most recent call last):
  File "/miniforge3/lib/python3.10/site-packages/triton/language/core.py", line 35, in wrapper
    return fn(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/triton/language/core.py", line 1597, in load
    return semantic.load(pointer, mask, other, boundary_check, padding_option, cache_modifier, eviction_policy,
  File "/miniforge3/lib/python3.10/site-packages/triton/language/semantic.py", line 1037, in load
    return _load_legacy(ptr, mask, other, boundary_check, padding, cache, eviction, is_volatile, builder)
  File "/miniforge3/lib/python3.10/site-packages/triton/language/semantic.py", line 1005, in _load_legacy
    other = cast(other, elt_ty, builder)
  File "/miniforge3/lib/python3.10/site-packages/triton/language/semantic.py", line 759, in cast
    assert builder.options.allow_fp8e4nv, "fp8e4nv data type is not supported on CUDA arch < 89"
AssertionError: fp8e4nv data type is not supported on CUDA arch < 89

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/miniforge3/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
    return func(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1654, in execute_model
    hidden_or_intermediate_states = model_executable(
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 553, in forward
    model_output = self.model(input_ids, positions, kv_caches,
  File "/miniforge3/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 143, in __call__
    return self.forward(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 340, in forward
    hidden_states, residual = layer(positions, hidden_states,
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 259, in forward
    hidden_states = self.self_attn(positions=positions,
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 189, in forward
    attn_output = self.attn(q, k, v, kv_cache, attn_metadata)
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/attention/layer.py", line 99, in forward
    return self.impl.forward(query,
  File "/miniforge3/lib/python3.10/site-packages/vllm/attention/backends/xformers.py", line 566, in forward
    out = PagedAttention.forward_prefix(
  File "/miniforge3/lib/python3.10/site-packages/vllm/attention/ops/paged_attn.py", line 211, in forward_prefix
    context_attention_fwd(
  File "/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/attention/ops/prefix_prefill.py", line 811, in context_attention_fwd
    _fwd_kernel[grid](
  File "/miniforge3/lib/python3.10/site-packages/triton/runtime/jit.py", line 345, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/triton/runtime/jit.py", line 662, in run
    kernel = self.compile(
  File "/miniforge3/lib/python3.10/site-packages/triton/compiler/compiler.py", line 276, in compile
    module = src.make_ir(options, codegen_fns, context)
  File "/miniforge3/lib/python3.10/site-packages/triton/compiler/compiler.py", line 113, in make_ir
    return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns)
triton.compiler.errors.CompilationError: at 110:17:
                     cur_kv_head * stride_k_cache_h +
                     (offs_d[:, None] // x) * stride_k_cache_d +
                     ((start_n + offs_n[None, :]) % block_size) *
                     stride_k_cache_bl +
                     (offs_d[:, None] % x) * stride_k_cache_x)
            # [N,D]
            off_v = (
                bn[:, None] * stride_v_cache_bs +
                cur_kv_head * stride_v_cache_h +
                offs_d[None, :] * stride_v_cache_d +
                (start_n + offs_n[:, None]) % block_size * stride_v_cache_bl)
            k_load = tl.load(K_cache + off_k,
                 ^

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/miniforge3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 133, in start
    self.run_engine_loop()
  File "/miniforge3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 196, in run_engine_loop
    request_outputs = self.engine_step()
  File "/miniforge3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 214, in engine_step
    raise e
  File "/miniforge3/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 205, in engine_step
    return self.engine.step()
  File "/miniforge3/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1454, in step
    outputs = self.model_executor.execute_model(
  File "/miniforge3/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 125, in execute_model
    output = self.driver_worker.execute_model(execute_model_req)
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1329]
  File "/miniforge3/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 343, in execute_model
    output = self.model_runner.execute_model(
  File "/miniforge3/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/miniforge3/lib/python3.10/site-packages/vllm/worker/model_runner_base.py", line 152, in _wrapper
    raise type(err)(
TypeError: CompilationError.__init__() missing 1 required positional argument: 'node'

Download model after launching VLM server.
Default Model Args are: --max-model-len 128000 --quantization marlin --gpu-memory-utilization 0.97 --trust-remote-code --enforce-eager --kv-cache-dtype fp8
Number of GPU's consumed: 1
NVIDIA visible device string: 1
CUDA visible device string:
Starting vllm api server...
Model: hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4
LLM_LOG_LEVEL is set to
Running checks to wait for Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 server to start...
Could not resolve Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 server. Pausing 10s.
```
Curiously, this does not happen on our current production version, 0.6.0.
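For what it's worth, the launch args include `--kv-cache-dtype fp8`, and the innermost assertion is `fp8e4nv data type is not supported on CUDA arch < 89`, so my working assumption is that the 0.6.4 prefix-prefill Triton kernel emits fp8e4nv loads that require compute capability 8.9 or newer. A minimal sketch of the check I ran inside the pod (assuming the `CUDA arch` in the assertion corresponds to what `torch.cuda.get_device_capability` reports):

```python
# Hypothetical diagnostic run inside the pod: report GPU 0's compute
# capability. Assumption on my part: "CUDA arch < 89" in the Triton
# assertion means compute capability below 8.9, so fp8 KV-cache kernels
# cannot compile on this GPU.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: {major}.{minor}")
if (major, minor) < (8, 9):
    print("fp8e4nv is unsupported on this GPU; "
          "--kv-cache-dtype fp8 would trip the assertion in the trace.")
```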
Here is the code invoking it:
```python
import requests
import json
import sys


def test_vllm_pod(host="localhost", port=1453):
    """Basic test for vLLM pod connectivity."""
    url = f"http://{host}:{port}/v1/completions"

    # Minimal payload
    payload = {
        "prompt": "Say hello:",
        "max_tokens": 10,
        "temperature": 0,
        "model": "hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4"
    }

    print(f"Testing vLLM pod at {url}")
    print(f"Request payload:\n{json.dumps(payload, indent=2)}")

    try:
        response = requests.post(
            url,
            json=payload,
            headers={"Content-Type": "application/json"},
            timeout=30  # Increased timeout
        )
        print(f"\nResponse Status: {response.status_code}")
        print("Response Headers:")
        for k, v in response.headers.items():
            print(f"{k}: {v}")
        if response.status_code == 200:
            try:
                data = response.json()
                print(f"\nResponse Data:\n{json.dumps(data, indent=2)}")
            except json.JSONDecodeError:
                print(f"\nRaw Response Text:\n{response.text}")
        else:
            print(f"\nError Response:\n{response.text}")
    except requests.exceptions.ConnectionError:
        print("Connection failed - verify port-forward to pod is active")
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--host", default="0.0.0.0")
    parser.add_argument("--port", type=int, default=1453)
    args = parser.parse_args()
    test_vllm_pod(args.host, args.port)
```
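For context, I run this through an active port-forward to the pod (e.g. `kubectl port-forward <pod-name> 1453:1453`, pod name omitted here) and then invoke the script with `--host localhost --port 1453`; the POST above is what returns the 500 Internal Server Error and the trace.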