[Bladellm] Support dispatch feature for BladeLLM #86

Open
Wants to merge 3 commits into main

KuilongCui (Contributor)

No description provided.

KuilongCui changed the title from "[WIP] Dispatch only for bladellm" to "[Bladellm] support dispatch feature for Bladellm" on Dec 12, 2024
s5u13b changed the title from "[Bladellm] support dispatch feature for Bladellm" to "[Bladellm] Support dispatch feature for Bladellm" on Dec 13, 2024
configs/blade.yml (outdated, resolved)
@@ -32,8 +32,11 @@ usage: -m llumnix.entrypoints.vllm.api_server [-h]
[--profiling-result-file-path PROFILING_RESULT_FILE_PATH]
[--gpu-type GPU_TYPE]
[--polling-interval POLLING_INTERVAL]
-[--migration-backend {gloo,nccl,rpc}]
+[--migration-backend {gloo,nccl,rpc,grpc,kvtransfer}]
Contributor

We need to add explanations for the BladeLLM args, both in the help messages and in arguments.py.

Contributor Author

Got it.

Collaborator

Also, having both "rpc" and "grpc" is confusing. Maybe rename "rpc" to "rayrpc"?
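A minimal sketch of how the requested help text and the suggested rename could fit together in an argparse definition. The choice list mirrors the diff with "rpc" renamed to "rayrpc" (assuming the rename is adopted); the function name `build_parser`, the prog name and the help wording are hypothetical and are not Llumnix's actual arguments.py.

```python
import argparse

# Choices from the diff, with "rpc" renamed to "rayrpc" as the reviewer
# suggests (assumption: the rename is adopted).
MIGRATION_BACKENDS = ["gloo", "nccl", "rayrpc", "grpc", "kvtransfer"]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; not Llumnix's real argument definitions.
    parser = argparse.ArgumentParser(prog="llumnix-args-sketch")
    parser.add_argument(
        "--migration-backend",
        choices=MIGRATION_BACKENDS,
        default=None,
        help=(
            "Communication backend for migrating requests between instances "
            "(choices: %(choices)s). 'rayrpc' was previously named 'rpc'."
        ),
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--migration-backend", "rayrpc"])
    print(args.migration_backend)
```

Because `choices` is enforced, argparse would reject an old `--migration-backend rpc` with a usage error, so such a rename is a breaking change for existing launch scripts.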

llumnix/__init__.py (resolved)

assert args.migration_backend != 'gloo' or (args.migration_backend == 'gloo' \
and not args.disable_init_instance_by_manager and not args.disable_fixed_node_init_instance), \
("When using gloo as migration backend, "
"do not set --disable-init-instance-by-manager and --disable-fixed-node-init-instance.")

assert args.migration_backend not in ['kvtransfer'] or (args.migration_backend == 'kvtransfer' \
Contributor

We should add asserts for the args used only by vLLM or only by BladeLLM.

Contributor

TODO: clarify which features are vLLM-only or BladeLLM-only in the parser, arg_utils, etc.

Contributor Author

Got it.
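A hedged sketch of the engine-specific check requested in this thread. It assumes a hypothetical `args.engine` field and a hypothetical table of engine-only backends; the PR does not say which backends belong to which engine. It also folds in a simplification of the gloo assert quoted above: `b != 'gloo' or (b == 'gloo' and X)` is logically equivalent to `b != 'gloo' or X`, because the right operand is only reached when `b == 'gloo'` already holds.

```python
from argparse import Namespace
from typing import Dict

# Hypothetical: the engine each engine-only migration backend requires.
# The real vLLM/BladeLLM split is not stated in the PR.
ENGINE_ONLY_BACKENDS: Dict[str, str] = {"kvtransfer": "bladellm"}


def check_args(args: Namespace) -> None:
    backend = args.migration_backend
    # Simplified form of the PR's gloo assert: fail only when gloo is chosen
    # together with either of the two disable flags.
    if backend == "gloo" and (args.disable_init_instance_by_manager
                              or args.disable_fixed_node_init_instance):
        raise ValueError(
            "When using gloo as migration backend, do not set "
            "--disable-init-instance-by-manager and "
            "--disable-fixed-node-init-instance.")
    # Reject a backend that only the other engine supports.
    required = ENGINE_ONLY_BACKENDS.get(backend)
    if required is not None and required != args.engine:
        raise ValueError(
            f"--migration-backend {backend} requires engine {required!r}, "
            f"got {args.engine!r}.")
```

The sketch raises `ValueError` rather than using `assert` because asserts are stripped when Python runs with `-O`, which would silently disable argument validation.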

llumnix/backends/bladellm/llm_engine.py (outdated, resolved)
llumnix/llm_engine_manager.py (resolved)
llumnix/llumlet/llumlet.py (outdated, resolved)
llumnix/llumlet/llumlet.py (outdated, resolved)
llumnix/queue/utils.py (outdated, resolved)

logger = init_logger(__name__)

class AsyncPutQueueActor:
Contributor

Why put this actor here?

KuilongCui (Contributor Author), Dec 16, 2024

It's in backends/utils.py now.
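For readers unfamiliar with the pattern the name suggests, here is a hedged sketch of an asynchronous put-queue component, written with plain `asyncio` instead of Ray so it runs standalone. It assumes the purpose of `AsyncPutQueueActor` is to let callers hand off items without waiting for delivery; the class `AsyncPutQueueSketch` and its methods are hypothetical and do not describe the code in backends/utils.py.

```python
import asyncio
from typing import Any


class AsyncPutQueueSketch:
    """Hypothetical stand-in: accepts items immediately, delivers them in the background."""

    def __init__(self, dest: asyncio.Queue) -> None:
        # Must be constructed inside a running event loop (create_task needs one).
        self._dest = dest
        self._pending: asyncio.Queue = asyncio.Queue()
        self._task = asyncio.create_task(self._forward())

    def put_nowait(self, item: Any) -> None:
        # Returns at once; the caller never waits on the destination queue.
        self._pending.put_nowait(item)

    async def _forward(self) -> None:
        # Deliver items in FIFO order, one at a time.
        while True:
            item = await self._pending.get()
            try:
                await self._dest.put(item)
            finally:
                self._pending.task_done()

    async def flush(self) -> None:
        # Wait until every item accepted so far has been handed to dest.
        await self._pending.join()

    async def close(self) -> None:
        # Stop the background forwarding task.
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass
```

Keeping delivery in a single background task preserves ordering while the caller's loop (in Llumnix's case, presumably the engine step loop) is never blocked on a slow consumer.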

s5u13b changed the title from "[Bladellm] Support dispatch feature for Bladellm" to "[Bladellm] Support dispatch feature for BladeLLM" on Dec 13, 2024
refine

fix loguru error

fix kwarg error

fix ray autoscale error

fix

fix
AlibabaPAI deleted two comments from github-actions bot on Dec 16, 2024

latency (ms)  p25       p50       p75        p95        p99        mean
prefill       15613.53  81900.35  138709.87  172273.69  173170.95  78941.22
decode        51.23     55.58     64.63      100.66     128.00     63.03
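These rows report percentiles and the mean of per-request latencies. The benchmark's percentile method is not stated; a minimal sketch using linear interpolation between order statistics (the rule NumPy's `percentile` uses by default) is below. The helper names `percentile` and `latency_row` are hypothetical.

```python
from typing import Dict, List, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """q-th percentile (0-100) by linear interpolation, like NumPy's default."""
    if not sorted_vals:
        raise ValueError("need at least one sample")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)  # floor, since pos >= 0
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def latency_row(samples_ms: Sequence[float]) -> Dict[str, float]:
    """One table row: p25/p50/p75/p95/p99 and mean of per-request latencies."""
    vals: List[float] = sorted(samples_ms)
    row = {f"p{q}": percentile(vals, q) for q in (25, 50, 75, 95, 99)}
    row["mean"] = sum(vals) / len(vals)
    return row
```

A large gap between mean and p50, as in the prefill row, just means the distribution is skewed; other percentile rules (nearest-rank, for example) would shift the reported values slightly on small samples.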

migration_size  rpc_speed (GB/s)  gloo_speed (GB/s)
8.00 MB         1.04              1.01
16.00 MB        1.57              1.68
24.00 MB        1.78              2.02
32.00 MB        1.95              2.31
40.00 MB        2.06              2.52
48.00 MB        2.14              2.62
56.00 MB        2.17              2.84
64.00 MB        2.23              2.83
72.00 MB        2.30              3.00
80.00 MB        2.29              2.89
88.00 MB        2.32              3.18
96.00 MB        2.42              2.79
104.00 MB       2.41              3.16
112.00 MB       2.48              3.15
120.00 MB       2.44              3.26
128.00 MB       2.50              2.76
136.00 MB       2.44              2.42
144.00 MB       2.67              3.03
152.00 MB       2.53              2.10
160.00 MB       2.54              2.90
168.00 MB       2.51              1.64
176.00 MB       2.60              1.85
184.00 MB       2.57              1.54
192.00 MB       2.60              -
200.00 MB       2.65              2.75
208.00 MB       2.94              2.96
216.00 MB       2.69              2.96
224.00 MB       -                 2.75
232.00 MB       2.64              2.71
240.00 MB       -                 3.34
264.00 MB       2.65              -
272.00 MB       2.72              -
296.00 MB       2.79              -
336.00 MB       3.15              -
400.00 MB       -                 0.83
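Speed here is presumably migrated bytes divided by transfer time. The sketch below converts between the two, assuming binary units (1 MB = 2**20 bytes, 1 GB = 2**30 bytes), which the table does not state. Under that assumption, 8 MB at the measured rpc speed of 1.04 GB/s takes about 7.5 ms; speed rising with size at small sizes is consistent with a fixed per-transfer overhead, although the table alone does not prove that.

```python
MB = 2**20  # assumption: binary megabytes
GB = 2**30  # assumption: binary gigabytes


def migration_speed_gb_s(num_bytes: int, elapsed_s: float) -> float:
    """Throughput in GB/s for one migration of num_bytes taking elapsed_s seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_bytes / elapsed_s / GB


def transfer_time_s(num_bytes: int, speed_gb_s: float) -> float:
    """Inverse of migration_speed_gb_s: seconds to move num_bytes at speed_gb_s."""
    return num_bytes / (speed_gb_s * GB)


if __name__ == "__main__":
    t = transfer_time_s(8 * MB, 1.04)
    print(f"8 MB at 1.04 GB/s: {t * 1e3:.2f} ms")  # prints 7.51 ms
```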

llumnix/backends/bladellm/llm_engine.py (outdated, resolved)
llumnix/backends/bladellm/llm_engine.py (outdated, resolved)
llumnix/metrics/base_metrics.py (resolved)
llumnix/backends/bladellm/llm_engine.py (resolved)

latency (ms)  p25       p50       p75        p95        p99        mean
prefill       16681.14  71840.31  160162.11  190754.36  198458.71  84051.57
decode        49.17     53.46     59.73      97.64      134.30     58.82
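To compare this run with the earlier latency table in the thread, the relative change per statistic can be computed directly from the two tables. The numbers below are copied from those tables; the thread does not say whether both runs used the same workload and configuration, so the output is arithmetic only, not evidence that a change helped or hurt. The names `EARLIER`, `LATER` and `percent_change` are hypothetical.

```python
from typing import Dict, Tuple

STATS = ("p25", "p50", "p75", "p95", "p99", "mean")

# Latencies in ms, copied from the two benchmark comments (earlier, later).
EARLIER: Dict[str, Tuple[float, ...]] = {
    "prefill": (15613.53, 81900.35, 138709.87, 172273.69, 173170.95, 78941.22),
    "decode": (51.23, 55.58, 64.63, 100.66, 128.00, 63.03),
}
LATER: Dict[str, Tuple[float, ...]] = {
    "prefill": (16681.14, 71840.31, 160162.11, 190754.36, 198458.71, 84051.57),
    "decode": (49.17, 53.46, 59.73, 97.64, 134.30, 58.82),
}


def percent_change(before: Dict[str, Tuple[float, ...]],
                   after: Dict[str, Tuple[float, ...]]) -> Dict[str, Dict[str, float]]:
    """Percent change per phase and statistic; negative means lower latency."""
    return {
        phase: {s: (a - b) / b * 100.0
                for s, b, a in zip(STATS, before[phase], after[phase])}
        for phase in before
    }


if __name__ == "__main__":
    for phase, row in percent_change(EARLIER, LATER).items():
        print(phase, {s: round(v, 1) for s, v in row.items()})
```

With these numbers, p50 drops while p99 rises for both phases (prefill p50 -12.3%, p99 +14.6%; decode p50 -3.8%, p99 +4.9%), which is why a single summary statistic would be misleading here.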

migration_size  rayrpc_speed (GB/s)  gloo_speed (GB/s)
8.00 MB         1.04                 1.01
16.00 MB        1.53                 1.69
24.00 MB        1.80                 2.12
32.00 MB        1.98                 2.31
40.00 MB        2.05                 2.56
48.00 MB        2.14                 2.90
56.00 MB        2.15                 2.94
64.00 MB        2.19                 3.05
72.00 MB        2.27                 3.03
80.00 MB        2.35                 3.26
88.00 MB        2.34                 3.01
96.00 MB        2.39                 3.28
104.00 MB       2.38                 2.76
112.00 MB       2.42                 3.13
120.00 MB       2.42                 2.99
128.00 MB       2.55                 2.67
136.00 MB       2.50                 3.27
144.00 MB       2.55                 2.63
152.00 MB       2.59                 2.49
160.00 MB       2.51                 2.78
168.00 MB       2.68                 3.13
176.00 MB       2.54                 2.58
184.00 MB       2.49                 1.59
192.00 MB       2.49                 2.91
200.00 MB       2.67                 2.86
208.00 MB       2.57                 3.00
216.00 MB       2.58                 3.06
224.00 MB       2.81                 -
232.00 MB       2.77                 3.65
248.00 MB       -                    3.26
264.00 MB       2.71                 -
272.00 MB       2.80                 -
280.00 MB       2.77                 -
312.00 MB       -                    0.68
368.00 MB       3.26                 -
448.00 MB       3.13                 -
472.00 MB       3.24                 -
1.59 GB         3.88                 -

4 participants