[CI][Bugfix] Refine ci tests and revert many-to-many migration commit to avoid ci tests failure #74

Merged 41 commits on Dec 10, 2024

Commits
8faeefc
[Bugfix] Address Request Status Changes During Migration Asynchronous…
KuilongCui Nov 18, 2024
2f9c264
update error log, kill bench process
KuilongCui Nov 21, 2024
9b37af8
remove hacker
KuilongCui Nov 21, 2024
9b2e765
fix
KuilongCui Nov 21, 2024
43df148
[CI] Fix and refine ci tests to avoid unexpected failure (#79)
s5u13b Nov 27, 2024
d9f0eb3
Fix lint
s5u13b Nov 27, 2024
a7a9f1a
Fix bakeup error_log
s5u13b Nov 27, 2024
1ec320d
Refine cleanup_ray_env
s5u13b Nov 27, 2024
9a101f1
More detailed pytest output
s5u13b Nov 27, 2024
12cc9d6
Temp try to fix e2e test
s5u13b Nov 27, 2024
fdceebc
Fix lint & Broader except
s5u13b Nov 27, 2024
f235202
Change subprocess.run check to False
s5u13b Nov 27, 2024
fdc898f
Minor
s5u13b Nov 27, 2024
5e80827
Change bash to sh in run test script
s5u13b Nov 27, 2024
fa6174a
Disable many-to-many migration temporarily
s5u13b Nov 27, 2024
d160224
Add pytest_runtest_makereport
s5u13b Nov 27, 2024
9db0381
Fix
s5u13b Nov 28, 2024
0d10de0
Refine subprocess.run and fixture
s5u13b Nov 28, 2024
d28afc0
Change subprocess.run check
s5u13b Nov 28, 2024
0e56a16
Fix
s5u13b Nov 28, 2024
9c5dc28
Refine -x
s5u13b Nov 28, 2024
37e976e
More strict migration test
s5u13b Nov 28, 2024
97535fa
Fix
s5u13b Nov 28, 2024
f0ba0e9
Minor
s5u13b Nov 29, 2024
967881f
Rename queue & Refine log
s5u13b Dec 5, 2024
1965489
Fix lint
s5u13b Dec 5, 2024
5c4af53
Fix
s5u13b Dec 5, 2024
0d91fd5
Change logger
s5u13b Dec 5, 2024
d59fae7
Add log for debugging ci
s5u13b Dec 5, 2024
b739dfb
Change request num from 1000 to 500 for ci debugging
s5u13b Dec 5, 2024
62266c8
Change from 500 to 300
s5u13b Dec 5, 2024
d615cf7
debugging ci
s5u13b Dec 5, 2024
491eb59
Fix
s5u13b Dec 5, 2024
c48af6c
Fix max-num-batched-tokens
s5u13b Dec 9, 2024
17e4c23
Fix exception handling of migration
s5u13b Dec 9, 2024
c17cec4
Revert "[Core] Support one-to-many and many-to-one migration (#63)"
s5u13b Dec 9, 2024
98517a7
Fix unit_test
s5u13b Dec 9, 2024
c81ea97
Fix e2e test
s5u13b Dec 9, 2024
109a7ef
Refine log
s5u13b Dec 9, 2024
77996ca
Fix lint
s5u13b Dec 9, 2024
3d388f9
Fix comments
s5u13b Dec 10, 2024
6 changes: 3 additions & 3 deletions .github/workflows/bench_test.yml
@@ -11,7 +11,7 @@ on:
jobs:
cancel_previous_workflows:
runs-on: ubuntu-latest
-timeout-minutes: 3
+timeout-minutes: 1
steps:
- uses: styfle/[email protected]
with:
@@ -20,15 +20,15 @@ jobs:
bench_tests:
needs: cancel_previous_workflows
runs-on: [self-hosted]
-timeout-minutes: 60
+timeout-minutes: 30
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Kill Running Containers
run: |
[[ -n $(docker ps -q) ]] && docker kill $(docker ps -q) || echo "No running containers to kill."
- name: Build And Test
-run: ./tools/bench_test.sh
+run: ./tools/run_test.sh bench_test
- name: Create comment from file
if: ${{ github.event_name != 'push' }}
uses: actions/github-script@v7
6 changes: 3 additions & 3 deletions .github/workflows/e2e_test.yml
@@ -11,7 +11,7 @@ on:
jobs:
cancel_previous_workflows:
runs-on: ubuntu-latest
-timeout-minutes: 3
+timeout-minutes: 1
steps:
- uses: styfle/[email protected]
with:
@@ -20,12 +20,12 @@ jobs:
e2e_tests:
needs: cancel_previous_workflows
runs-on: [self-hosted]
-timeout-minutes: 60
+timeout-minutes: 30
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Kill Running Containers
run: |
[[ -n $(docker ps -q) ]] && docker kill $(docker ps -q) || echo "No running containers to kill."
- name: Build And Test
-run: ./tools/e2e_test.sh
+run: ./tools/run_test.sh e2e_test
6 changes: 3 additions & 3 deletions .github/workflows/migration_test.yml
@@ -11,7 +11,7 @@ on:
jobs:
cancel_previous_workflows:
runs-on: ubuntu-latest
-timeout-minutes: 3
+timeout-minutes: 1
steps:
- uses: styfle/[email protected]
with:
@@ -20,15 +20,15 @@ jobs:
migration_tests:
needs: cancel_previous_workflows
runs-on: [self-hosted]
-timeout-minutes: 60
+timeout-minutes: 90
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Kill Running Containers
run: |
[[ -n $(docker ps -q) ]] && docker kill $(docker ps -q) || echo "No running containers to kill."
- name: Build And Test
-run: ./tools/migration_test.sh
+run: ./tools/run_test.sh migration_test
- name: Create comment from file
if: ${{ github.event_name != 'push' }}
uses: actions/github-script@v7
11 changes: 3 additions & 8 deletions .github/workflows/offline_inference.yml
@@ -11,7 +11,7 @@ on:
jobs:
cancel_previous_workflows:
runs-on: ubuntu-latest
-timeout-minutes: 3
+timeout-minutes: 1
steps:
- uses: styfle/[email protected]
with:
@@ -20,13 +20,8 @@ jobs:
offline_inference:
needs: cancel_previous_workflows
runs-on: [self-hosted]
-timeout-minutes: 10
+timeout-minutes: 5
steps:
- uses: actions/checkout@v4
- name: Run offline inference example
-run: |
-nvidia-docker run --rm -t --net host --ipc host \
--v ${PWD}:/workspace \
--w /workspace \
-registry.cn-beijing.aliyuncs.com/llumnix/llumnix-dev:20240909_action_678a439 \
-bash -c "pip install -e . > /dev/null && make offline_test"
+run: ./tools/run_test.sh offline_test
4 changes: 2 additions & 2 deletions .github/workflows/pylint.yml
@@ -11,7 +11,7 @@ on:
jobs:
cancel_previous_workflows:
runs-on: ubuntu-latest
-timeout-minutes: 3
+timeout-minutes: 1
steps:
- uses: styfle/[email protected]
with:
@@ -20,7 +20,7 @@ jobs:
pylint_test:
needs: cancel_previous_workflows
runs-on: [self-hosted]
-timeout-minutes: 10
+timeout-minutes: 5
steps:
- uses: actions/checkout@v4
- name: Analysing the code with pylint
6 changes: 3 additions & 3 deletions .github/workflows/unit_test.yml
@@ -11,7 +11,7 @@ on:
jobs:
cancel_previous_workflows:
runs-on: ubuntu-latest
-timeout-minutes: 3
+timeout-minutes: 1
steps:
- uses: styfle/[email protected]
with:
@@ -20,12 +20,12 @@ jobs:
unit_tests:
needs: cancel_previous_workflows
runs-on: [self-hosted]
-timeout-minutes: 60
+timeout-minutes: 30
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Kill Running Containers
run: |
[[ -n $(docker ps -q) ]] && docker kill $(docker ps -q) || echo "No running containers to kill."
- name: Build And Test
-run: ./tools/unit_test.sh
+run: ./tools/run_test.sh unit_test
2 changes: 1 addition & 1 deletion .github/workflows/whl_build.yml
@@ -11,7 +11,7 @@ on:
jobs:
whl_build:
runs-on: ubuntu-latest
-timeout-minutes: 10
+timeout-minutes: 1

steps:
- name: Checkout
12 changes: 6 additions & 6 deletions Makefile
@@ -31,9 +31,9 @@ lint: check_pylint_installed check_pytest_installed
test: check_pytest_installed
@pytest -v --ignore=third_party/ --ignore=tests/e2e_test --disable-warnings
@python examlpes/offline_inference.py
-@pytest -v ./tests/e2e_test/test_e2e.py
-@pytest -v ./tests/e2e_test/test_bench.py
-@pytest -v ./tests/e2e_test/test_migration.py
+@pytest -v -x -s --tb=long ./tests/e2e_test/test_e2e.py
+@pytest -v -x -s --tb=long ./tests/e2e_test/test_bench.py
+@pytest -v -x -s --tb=long ./tests/e2e_test/test_migration.py

.PHONY: unit_test
unit_test: check_pytest_installed
@@ -45,15 +45,15 @@ offline_test:

.PHONY: e2e_test
e2e_test:
-@pytest -v ./tests/e2e_test/test_e2e.py
+@pytest -v -x -s --tb=long ./tests/e2e_test/test_e2e.py

.PHONY: bench_test
bench_test:
-@pytest -v ./tests/e2e_test/test_bench.py
+@pytest -v -x -s --tb=long ./tests/e2e_test/test_bench.py

.PHONY: migration_test
migration_test:
-@pytest -v ./tests/e2e_test/test_migration.py
+@pytest -v -x -s --tb=long ./tests/e2e_test/test_migration.py

#################### pygloo install for gloo migration backend begin ####################
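
Note on the Makefile change: the e2e, bench, and migration targets now pass `-x -s --tb=long` to pytest. `-x` stops at the first failure, `-s` disables output capture so server logs show up in the CI log, and `--tb=long` prints full tracebacks. A minimal Python sketch of the same invocation follows. `build_pytest_cmd` and `run_target` are hypothetical helper names used only for illustration; they are not part of this PR.

```python
import subprocess
import sys
from typing import List

# Flags this PR adds to the e2e/bench/migration Makefile targets.
PYTEST_FLAGS = ["-v", "-x", "-s", "--tb=long"]


def build_pytest_cmd(test_path: str) -> List[str]:
    """Return the pytest command line for one test file (hypothetical helper)."""
    return [sys.executable, "-m", "pytest", *PYTEST_FLAGS, test_path]


def run_target(test_path: str) -> int:
    """Run pytest on test_path and return its exit code instead of raising."""
    # check=False: a failing run is reported through the return code,
    # not through a CalledProcessError exception.
    result = subprocess.run(build_pytest_cmd(test_path), check=False)
    return result.returncode
```

Calling `run_target("./tests/e2e_test/test_e2e.py")` would mirror `make e2e_test`, assuming pytest is installed in the active interpreter.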

3 changes: 1 addition & 2 deletions configs/base.yml
@@ -1,7 +1,7 @@
SERVER:
HOST: '127.0.0.1'
PORT: 1234
-QUEUE_TYPE: "rayqueue"
+REQUEST_OUTPUT_QUEUE_TYPE: "rayqueue"
RAY_CLUSTER_PORT: 6379
LAUNCH_RAY_CLUSTER: True

@@ -20,6 +20,5 @@ MANAGER:

MIGRATION_BACKEND: 'gloo'
MIGRATION_BUFFER_BLOCKS: 512
-MIGRATION_INTERNAL_BUFFER_NUM: 2

ENABLE_SCALING: False
9 changes: 2 additions & 7 deletions docs/Arguments.md
@@ -32,15 +32,14 @@ usage: -m llumnix.entrypoints.vllm.api_server [-h]
[--profiling-result-file-path PROFILING_RESULT_FILE_PATH]
[--gpu-type GPU_TYPE]
[--polling-interval POLLING_INTERVAL]
-[--migration-backend {gloo,rpc}]
+[--migration-backend {gloo,nccl,rpc}]
[--migration-buffer-blocks MIGRATION_BUFFER_BLOCKS]
[--migration-backend-init-timeout MIGRATION_BACKEND_INIT_TIMEOUT]
[--migration-num-layers MIGRATION_NUM_LAYERS]
[--last-stage-max-blocks LAST_STAGE_MAX_BLOCKS]
[--max-stages MAX_STAGES]
[--enable-pd-disagg]
[--num-dispatch-instances NUM_DISPATCH_INSTANCES]
-[--migration-internal-buffer-num MIGRATION_INTERNAL_BUFFER_NUM]
[--log-request-timestamps]

```
@@ -149,7 +148,7 @@ usage: -m llumnix.entrypoints.vllm.api_server [-h]
- Default: "rpc"

`--migration-buffer-blocks`
-- Number of cache blocks in each migration buffer.
+- Number of cache blocks in migration.
- Default: 512

`--migration-backend-init-timeout`
@@ -168,10 +167,6 @@ usage: -m llumnix.entrypoints.vllm.api_server [-h]
- Drop migration if the number of stages > max_stages.
- Default: 3

-`--migration-internal-buffer-num`
-- Number of the buffer in migration backend for sending and receiving
-- Default: 2
-
`--log-request-timestamps`
- Enable logging request timestamps.

18 changes: 5 additions & 13 deletions llumnix/arg_utils.py
@@ -47,7 +47,7 @@ def add_argument(self, *args, **kwargs):
class LlumnixEntrypointsArgs:
launch_ray_cluster: bool = None
ray_cluster_port: int = None
-queue_type: str = None
+request_output_queue_type: str = None
request_output_queue_port: int = None
disable_log_requests_server: bool = None
log_request_timestamps: bool = None
@@ -82,10 +82,10 @@ def add_cli_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
parser.add_argument("--ray-cluster-port",
type=int,
help='ray cluster port')
-parser.add_argument("--queue-type",
+parser.add_argument("--request-output-queue-type",
type=str,
choices=['rayqueue', 'zmq'],
-help='queue type for request output queue')
+help='request output queue type for request output queue')
parser.add_argument("--request-output-queue-port",
type=int,
help='port for zmq')
@@ -138,7 +138,6 @@ class EngineManagerArgs:
migration_num_layers: int = None
last_stage_max_blocks: int = None
max_stages: int = None
-migration_internal_buffer_num: int = None

enable_pd_disagg: bool = None

@@ -177,8 +176,7 @@ def create_migration_config(self) -> MigrationConfig:
self.migration_num_layers,
self.last_stage_max_blocks,
self.max_stages,
-self.migration_backend_init_timeout,
-self.migration_internal_buffer_num)
+self.migration_backend_init_timeout)
return migration_config

@classmethod
@@ -197,9 +195,6 @@ def check_args(cls, args: 'EngineManagerArgs', parser: argparse.ArgumentParser):
if hasattr(action, 'choices') and action.choices is not None and hasattr(args, action.dest):
assert getattr(args, action.dest) in action.choices, f"{action.dest} should be one of {action.choices}."

-assert args.migration_backend != 'nccl', 'NCCL has been temporarily deprecated due to its incompatibility with \
-concurrent migrations in Llumnix.'
-
assert args.migration_backend != 'gloo' or (args.migration_backend == 'gloo' \
and not args.disable_init_instance_by_manager and not args.disable_fixed_node_init_instance), \
("When using gloo as migration backend, "
@@ -316,16 +311,13 @@ def add_cli_args(
help='timeout(s) for initializing migration backend')
parser.add_argument('--migration-buffer-blocks',
type=int,
-help='number of cache blocks in each migration buffer')
+help='number of cache blocks in migration')
parser.add_argument('--migration-num-layers',
type=int,
help='number of kv-cache layers to transfer in each round during migration')
parser.add_argument('--last-stage-max-blocks',
type=int,
help='if the number pf remain blocks < last_stage_max_blocks, do last stage migration')
-parser.add_argument('--migration-internal-buffer-num',
-type=int,
-help='number of the buffer in migration backend for sending and receiving')
parser.add_argument('--max-stages',
type=int,
help='drop migration if the number of stages > max_stages')
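
Note on the flag rename in this file: `--queue-type` becomes `--request-output-queue-type`, with the same `choices`. The sketch below reproduces only the two queue-related flags from the diff in a standalone parser, to show how argparse exposes the new name and rejects the old one. `build_parser`, `parse`, and the `prog` name are hypothetical and exist only for this illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser holding only the two queue-related flags (illustrative subset)."""
    parser = argparse.ArgumentParser(prog="queue-args-sketch")
    parser.add_argument("--request-output-queue-type",
                        type=str,
                        choices=["rayqueue", "zmq"],
                        help="queue type for request output queue")
    parser.add_argument("--request-output-queue-port",
                        type=int,
                        help="port for zmq")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when argv is None) with the sketch parser."""
    return build_parser().parse_args(argv)


args = parse(["--request-output-queue-type", "zmq",
              "--request-output-queue-port", "1234"])
# argparse maps dashes in the flag to underscores in the attribute name.
print(args.request_output_queue_type, args.request_output_queue_port)
```

With this parser, passing the old `--queue-type` flag exits with a usage error, so launch scripts and configs that still use the old name need updating.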
23 changes: 0 additions & 23 deletions llumnix/backends/migration_backend_interface.py
@@ -13,9 +13,7 @@

from abc import ABC, abstractmethod
from typing import List
-import queue

-import torch

class MigrationBackendBase(ABC):
@abstractmethod
@@ -41,24 +39,3 @@ def do_send(self, dst_handle, blocks: List[int]):
@abstractmethod
def do_recv(self, src_handle, blocks: List[int]):
raise NotImplementedError
-
-class BufferMigrationBackend(MigrationBackendBase):
-def __init__(self, num_buffer, buffer_shape, buffer_dtype, buffer_device, pin_memory, *args, **kwargs):
-super().__init__(*args, **kwargs)
-
-self.num_buffer = num_buffer
-
-self.dummy_buffer = [
-torch.empty(size=buffer_shape, dtype=buffer_dtype, device=buffer_device, pin_memory=pin_memory)
-for _ in range(self.num_buffer)
-]
-
-self.avaiable_buffer_queue = queue.Queue()
-for i in range(self.num_buffer):
-self.avaiable_buffer_queue.put_nowait(i)
-
-def get_available_cache(self):
-return self.avaiable_buffer_queue.get()
-
-def put_back_cache(self, buffer_id):
-self.avaiable_buffer_queue.put_nowait(buffer_id)
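
Note on the removed class: `BufferMigrationBackend` pre-allocated `num_buffer` tensors and handed out their indices through a `queue.Queue`, so a caller blocks whenever every buffer is in use. The sketch below shows that pooling pattern in isolation. It assumes `bytearray` as a stand-in for the pinned torch tensors so that it runs without PyTorch. `BufferPool`, `acquire`, `release`, and `borrowed` are hypothetical names, not the reverted API.

```python
import queue
from contextlib import contextmanager
from typing import Iterator, List, Optional


class BufferPool:
    """Fixed pool of pre-allocated buffers handed out by index (hypothetical name)."""

    def __init__(self, num_buffer: int, buffer_size: int) -> None:
        # bytearray stands in for the torch tensors of the removed class.
        self.buffers: List[bytearray] = [bytearray(buffer_size) for _ in range(num_buffer)]
        self._available: "queue.Queue[int]" = queue.Queue()
        for i in range(num_buffer):
            self._available.put_nowait(i)

    def acquire(self, timeout: Optional[float] = None) -> int:
        """Block until a buffer index is free; raise queue.Empty if timeout expires."""
        return self._available.get(timeout=timeout)

    def release(self, buffer_id: int) -> None:
        """Return a buffer index to the pool."""
        self._available.put_nowait(buffer_id)

    @contextmanager
    def borrowed(self) -> Iterator[int]:
        """Acquire an index and always give it back, even if the body raises."""
        buffer_id = self.acquire()
        try:
            yield buffer_id
        finally:
            self.release(buffer_id)
```

The context manager is an addition of this sketch. The removed class exposed only `get_available_cache` and `put_back_cache`, which left it to each caller to return the index on every code path.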
6 changes: 3 additions & 3 deletions llumnix/backends/utils.py
@@ -19,16 +19,16 @@
from llumnix.backends.backend_interface import BackendInterface, BackendType
from llumnix.queue.queue_type import QueueType

-def init_backend_engine(instance_id: str, output_queue_type: QueueType,
+def init_backend_engine(instance_id: str, request_output_queue_type: QueueType,
backend_type: BackendType, *args, **kwargs) -> BackendInterface:
if backend_type == BackendType.VLLM:
# pylint: disable=import-outside-toplevel
from llumnix.backends.vllm.llm_engine import BackendVLLM
-backend_engine = BackendVLLM(instance_id, output_queue_type, *args, **kwargs)
+backend_engine = BackendVLLM(instance_id, request_output_queue_type, *args, **kwargs)
elif backend_type == BackendType.SIM_VLLM:
# pylint: disable=import-outside-toplevel
from llumnix.backends.vllm.simulator import BackendSimVLLM
-backend_engine = BackendSimVLLM(instance_id, output_queue_type, *args, **kwargs)
+backend_engine = BackendSimVLLM(instance_id, request_output_queue_type, *args, **kwargs)
else:
raise ValueError(f'Unsupported backend: {backend_type}')
return backend_engine
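
Note on `init_backend_engine`: the function dispatches on `BackendType` and imports each backend class only inside the matching branch. A rough sketch of that shape follows, with stub classes in place of `BackendVLLM` and `BackendSimVLLM`. The enum values, `StubBackend`, the loader functions, and `init_backend_engine_sketch` are assumptions made for illustration and do not reproduce the Llumnix implementation.

```python
from enum import Enum
from typing import Callable, Dict


class BackendType(Enum):
    VLLM = "vllm"          # enum values are assumptions for this sketch
    SIM_VLLM = "sim_vllm"


class StubBackend:
    """Stand-in for the real backend classes (hypothetical)."""

    def __init__(self, instance_id: str, request_output_queue_type: str, kind: str) -> None:
        self.instance_id = instance_id
        self.request_output_queue_type = request_output_queue_type
        self.kind = kind


def _load_vllm() -> Callable[..., StubBackend]:
    # The real code imports BackendVLLM at this point, inside the branch.
    return lambda *args, **kwargs: StubBackend(*args, kind="vllm", **kwargs)


def _load_sim_vllm() -> Callable[..., StubBackend]:
    # The real code imports BackendSimVLLM at this point, inside the branch.
    return lambda *args, **kwargs: StubBackend(*args, kind="sim_vllm", **kwargs)


_LOADERS: Dict[BackendType, Callable[[], Callable[..., StubBackend]]] = {
    BackendType.VLLM: _load_vllm,
    BackendType.SIM_VLLM: _load_sim_vllm,
}


def init_backend_engine_sketch(instance_id: str, request_output_queue_type: str,
                               backend_type: BackendType) -> StubBackend:
    """Pick a loader by backend type; only the selected loader runs."""
    loader = _LOADERS.get(backend_type)
    if loader is None:
        raise ValueError(f"Unsupported backend: {backend_type}")
    return loader()(instance_id, request_output_queue_type)
```

Deferring the import to the selected branch means a process that uses only one backend does not import the other.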