[Bladellm] Support dispatch feature for BladeLLM #86

KuilongCui · 2024-12-11T12:10:10Z

No description provided.

configs/blade.yml

s5u13b · 2024-12-13T05:29:22Z

docs/Arguments.md

@@ -32,8 +32,11 @@ usage: -m llumnix.entrypoints.vllm.api_server [-h]
            [--profiling-result-file-path PROFILING_RESULT_FILE_PATH]
            [--gpu-type GPU_TYPE]
            [--polling-interval POLLING_INTERVAL]
-            [--migration-backend {gloo,nccl,rpc}]
+            [--migration-backend {gloo,nccl,rpc,grpc,kvtransfer}]


We need to add explanations for the BladeLLM args in the help strings and in arguments.py.

Also, having both "rpc" and "grpc" seems confusing. Maybe rename "rpc" to "rayrpc"?
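A minimal sketch of what the suggestion could look like, using only the standard `argparse` module. The backend names come from the diff above, with "rpc" renamed to "rayrpc". The help text, the `build_parser` helper, and the claim that `grpc` and `kvtransfer` are BladeLLM-only are assumptions made for illustration, not the project's actual code.

```python
import argparse

# Backend names from the diff, with "rpc" renamed to "rayrpc" as suggested.
MIGRATION_BACKENDS = ["gloo", "nccl", "rayrpc", "grpc", "kvtransfer"]


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that documents the migration backend choices."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument(
        "--migration-backend",
        type=str,
        choices=MIGRATION_BACKENDS,
        # Assumed wording: which backend belongs to which engine is a guess.
        help="communication backend used for migration; "
             "grpc and kvtransfer are assumed here to be BladeLLM-only",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--migration-backend", "rayrpc"])
    print(args.migration_backend)
```

With the rename in place, passing the old value `rpc` is rejected by `argparse` because it is no longer in `choices`.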

llumnix/__init__.py

s5u13b · 2024-12-13T05:31:28Z

llumnix/arg_utils.py


        assert args.migration_backend != 'gloo' or (args.migration_backend == 'gloo' \
            and not args.disable_init_instance_by_manager and not args.disable_fixed_node_init_instance), \
            ("When using gloo as migration backend, "
             "do not set --disable-init-instance-by-manager and --disable-fixed-node-init-instance.")

+        assert args.migration_backend not in ['kvtransfer'] or (args.migration_backend == 'kvtransfer' \


We should add asserts for the args that are used only by vLLM or only by BladeLLM.

TODO: clarify the vLLM-specific and BladeLLM-specific features in the parser, arg_utils, etc.
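One way such a check might be written is sketched below. The mapping from engine to migration backends, the `backend_type` attribute, and the `check_migration_backend` helper are hypothetical names introduced for illustration. The sketch also uses the fact that `A != x or (A == x and P)` is logically equivalent to `A != x or P`, so the gloo assert can be written in the shorter form.

```python
from types import SimpleNamespace

# Hypothetical mapping, for illustration only: which migration backends
# each engine backend is assumed to support.
ENGINE_MIGRATION_BACKENDS = {
    "vllm": {"gloo", "nccl", "rayrpc"},
    "bladellm": {"grpc", "kvtransfer"},
}


def check_migration_backend(args) -> None:
    """Assert that the migration backend fits the chosen engine backend."""
    allowed = ENGINE_MIGRATION_BACKENDS[args.backend_type]
    assert args.migration_backend in allowed, (
        f"migration backend {args.migration_backend!r} is not supported "
        f"by {args.backend_type!r}; choose one of {sorted(allowed)}"
    )
    # "If gloo is used, both flags must be unset", in implication form.
    assert args.migration_backend != "gloo" or (
        not args.disable_init_instance_by_manager
        and not args.disable_fixed_node_init_instance
    ), ("When using gloo as migration backend, do not set "
        "--disable-init-instance-by-manager and "
        "--disable-fixed-node-init-instance.")


if __name__ == "__main__":
    check_migration_backend(SimpleNamespace(
        backend_type="bladellm",
        migration_backend="kvtransfer",
        disable_init_instance_by_manager=False,
        disable_fixed_node_init_instance=False,
    ))
```

Keeping the engine-to-backend mapping in one table means the parser help, the asserts, and the docs can all be derived from the same source.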

llumnix/backends/bladellm/llm_engine.py

llumnix/llm_engine_manager.py

llumnix/llumlet/llumlet.py

llumnix/queue/utils.py

s5u13b · 2024-12-13T07:47:15Z

llumnix/queue/utils.py

+
+logger = init_logger(__name__)
+
+class AsyncPutQueueActor:


Why put this actor here?

Put it in backends/utils.py now.

llumnix/backends/backend_interface.py

llumnix/entrypoints/utils.py

refine; fix loguru error; fix kwarg error; fix ray autoscale error; fix; fix

github-actions · 2024-12-16T08:58:11Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	15613.53	81900.35	138709.87	172273.69	173170.95	78941.22

decode	p25	p50	p75	p95	p99	mean
latency(ms)	51.23	55.58	64.63	100.66	128.00	63.03

github-actions · 2024-12-16T09:02:47Z

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	232.00 MB	264.00 MB	272.00 MB	296.00 MB	336.00 MB
rpc_speed(GB/s)	1.04	1.57	1.78	1.95	2.06	2.14	2.17	2.23	2.30	2.29	2.32	2.42	2.41	2.48	2.44	2.50	2.44	2.67	2.53	2.54	2.51	2.60	2.57	2.60	2.65	2.94	2.69	2.64	2.65	2.72	2.79	3.15

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	240.00 MB	400.00 MB
gloo_speed(GB/s)	1.01	1.68	2.02	2.31	2.52	2.62	2.84	2.83	3.00	2.89	3.18	2.79	3.16	3.15	3.26	2.76	2.42	3.03	2.10	2.90	1.64	1.85	1.54	2.75	2.96	2.96	2.75	2.71	3.34	0.83
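To read the speed tables as latencies, the implied transfer time can be computed as size divided by speed. The sketch below does this for three sizes copied from the two tables above. It assumes 1 GB = 1024 MB, which the benchmark output does not state, and `transfer_time_ms` is a helper name introduced here.

```python
def transfer_time_ms(size_mb: float, speed_gb_per_s: float) -> float:
    """Implied transfer time in milliseconds, assuming 1 GB = 1024 MB."""
    return size_mb / 1024.0 / speed_gb_per_s * 1000.0


# (size in MB, rpc speed in GB/s, gloo speed in GB/s), copied from the tables.
SAMPLES = [
    (8.0, 1.04, 1.01),
    (64.0, 2.23, 2.83),
    (128.0, 2.50, 2.76),
]

if __name__ == "__main__":
    for size_mb, rpc_speed, gloo_speed in SAMPLES:
        rpc_ms = transfer_time_ms(size_mb, rpc_speed)
        gloo_ms = transfer_time_ms(size_mb, gloo_speed)
        print(f"{size_mb:6.1f} MB  rpc {rpc_ms:6.2f} ms  gloo {gloo_ms:6.2f} ms")
```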

zhypku · 2024-12-17T06:06:13Z

docs/Arguments.md


llumnix/backends/bladellm/llm_engine.py

llumnix/metrics/base_metrics.py

llumnix/backends/bladellm/llm_engine.py

github-actions · 2024-12-17T11:48:50Z

prefill	p25	p50	p75	p95	p99	mean
latency(ms)	16681.14	71840.31	160162.11	190754.36	198458.71	84051.57

decode	p25	p50	p75	p95	p99	mean
latency(ms)	49.17	53.46	59.73	97.64	134.30	58.82

github-actions · 2024-12-17T12:16:48Z

migration_size	1.59 GB	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	224.00 MB	232.00 MB	264.00 MB	272.00 MB	280.00 MB	368.00 MB	448.00 MB	472.00 MB
rayrpc_speed(GB/s)	3.88	1.04	1.53	1.80	1.98	2.05	2.14	2.15	2.19	2.27	2.35	2.34	2.39	2.38	2.42	2.42	2.55	2.50	2.55	2.59	2.51	2.68	2.54	2.49	2.49	2.67	2.57	2.58	2.81	2.77	2.71	2.80	2.77	3.26	3.13	3.24

migration_size	8.00 MB	16.00 MB	24.00 MB	32.00 MB	40.00 MB	48.00 MB	56.00 MB	64.00 MB	72.00 MB	80.00 MB	88.00 MB	96.00 MB	104.00 MB	112.00 MB	120.00 MB	128.00 MB	136.00 MB	144.00 MB	152.00 MB	160.00 MB	168.00 MB	176.00 MB	184.00 MB	192.00 MB	200.00 MB	208.00 MB	216.00 MB	232.00 MB	248.00 MB	312.00 MB
gloo_speed(GB/s)	1.01	1.69	2.12	2.31	2.56	2.90	2.94	3.05	3.03	3.26	3.01	3.28	2.76	3.13	2.99	2.67	3.27	2.63	2.49	2.78	3.13	2.58	1.59	2.91	2.86	3.00	3.06	3.65	3.26	0.68

KuilongCui changed the title ~~[WIP] Dispatch only for bladellm~~ [Bladellm] support dispatch feature for Bladellm Dec 12, 2024

s5u13b changed the title ~~[Bladellm] support dispatch feature for Bladellm~~ [Bladellm] Support dispatch feature for Bladellm Dec 13, 2024

s5u13b reviewed Dec 13, 2024


s5u13b changed the title ~~[Bladellm] Support dispatch feature for Bladellm~~ [Bladellm] Support dispatch feature for BladeLLM Dec 13, 2024

s5u13b reviewed Dec 13, 2024


llumnix/backends/backend_interface.py

s5u13b reviewed Dec 13, 2024


llumnix/entrypoints/utils.py (outdated)

KuilongCui force-pushed the dispatch_bladellm branch from a3ff024 to cdac2e3 Compare December 16, 2024 07:08

[WIP] dispatch only for bladellm

716cb02

refine; fix loguru error; fix kwarg error; fix ray autoscale error; fix; fix

KuilongCui force-pushed the dispatch_bladellm branch from 347378f to 716cb02 Compare December 16, 2024 08:04

fix

d21d8e8

AlibabaPAI deleted a comment from github-actions bot Dec 16, 2024

zhypku reviewed Dec 17, 2024


fix comment

febf19c

zhypku approved these changes Dec 18, 2024
