[Core] Support one-to-many and many-to-one migration #63

Merged
KuilongCui merged 5 commits into main from multi_migration
Nov 11, 2024

KuilongCui (Contributor) commented Nov 4, 2024

Note that this PR does not make a single migration decision issue one-to-many or many-to-one migrations. Each migration is still one-to-one, but a later migration can now choose the same src_actor or dst_actor as an earlier one, even if that earlier migration hasn't finished.

Detail:

  1. Support one-to-many and many-to-one migration
  2. Fix num_available_dispatch_instance bug in remove_instance
  3. Fix test_migration to reduce CI test time
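
The note above implies that one instance can take part in several in-flight one-to-one migrations at once. A minimal sketch of that bookkeeping follows. It assumes a hypothetical `MigrationTracker` class and made-up instance IDs that are not part of Llumnix; it only illustrates counting in-flight migrations per instance instead of keeping a single busy flag.

```python
from collections import Counter
from typing import List, Tuple


class MigrationTracker:
    """Hypothetical bookkeeping for overlapping one-to-one migrations."""

    def __init__(self) -> None:
        # Every in-flight migration is still a single (src, dst) pair.
        self._in_flight: List[Tuple[str, str]] = []
        # How many in-flight migrations use an instance as source / destination.
        self._as_src: Counter = Counter()
        self._as_dst: Counter = Counter()

    def start(self, src: str, dst: str) -> None:
        """Record a new migration; src or dst may already be busy."""
        if src == dst:
            raise ValueError("source and destination must differ")
        self._in_flight.append((src, dst))
        self._as_src[src] += 1
        self._as_dst[dst] += 1

    def finish(self, src: str, dst: str) -> None:
        """Mark one migration as done; raises ValueError if it was never started."""
        self._in_flight.remove((src, dst))
        self._as_src[src] -= 1
        self._as_dst[dst] -= 1

    def fan_out(self, instance: str) -> int:
        """Number of in-flight migrations leaving this instance (one-to-many)."""
        return self._as_src[instance]

    def fan_in(self, instance: str) -> int:
        """Number of in-flight migrations entering this instance (many-to-one)."""
        return self._as_dst[instance]


tracker = MigrationTracker()
tracker.start("instance_0", "instance_1")
tracker.start("instance_0", "instance_2")  # same source, first migration still running
tracker.start("instance_3", "instance_2")  # same destination as the previous one
print(tracker.fan_out("instance_0"), tracker.fan_in("instance_2"))
```

With counters instead of a boolean flag, finishing one migration does not mark an instance as idle while its other migrations are still running.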

github-actions bot commented Nov 4, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 224.00 MB 240.00 MB 280.00 MB 312.00 MB 336.00 MB 480.00 MB 536.00 MB 552.00 MB 560.00 MB 568.00 MB 584.00 MB 688.00 MB
rpc_speed(GB/s) 1.08 1.61 1.90 2.08 2.09 2.20 2.28 2.28 2.39 2.40 2.48 2.46 2.54 2.48 2.60 2.54 2.44 2.59 2.63 2.61 2.62 2.69 2.77 2.88 2.70 2.71 2.63 2.79 2.50 2.94 2.88 3.24 3.37 3.32 3.37 3.22 3.17 3.42
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 208.00 MB 296.00 MB 312.00 MB 320.00 MB 328.00 MB 336.00 MB 472.00 MB 552.00 MB 560.00 MB 696.00 MB
gloo_speed(GB/s) 1.05 1.67 2.09 2.44 2.45 2.81 2.89 3.05 3.17 3.00 3.07 3.45 2.97 3.10 3.15 2.82 2.83 2.88 3.14 2.83 2.89 1.56 1.65 3.04 2.75 3.04 3.04 3.63 3.51 3.61 3.42 3.39 2.85 3.40
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 208.00 MB 312.00 MB 320.00 MB 480.00 MB 536.00 MB 552.00 MB 560.00 MB 568.00 MB
nccl_speed(GB/s) 0.18 0.44 0.55 0.85 0.94 1.11 1.40 1.38 1.60 1.65 2.05 1.99 2.30 2.23 2.57 2.32 2.61 2.72 2.89 2.98 2.77 2.81 3.19 3.51 2.93 4.69 5.40 4.14 4.05 3.87 4.31

github-actions bot commented Nov 4, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 46016.28 107041.01 184378.06 242771.37 246294.59 115489.07
decode p25 p50 p75 p95 p99 mean
latency(ms) 54.11 57.82 69.24 121.25 289.39 70.23

github-actions bot commented Nov 5, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 34316.91 81721.04 196142.99 263357.31 271299.89 113519.08
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.95 58.24 69.90 110.24 237.88 68.31

Review threads:
.gitignore (outdated, resolved)
llumnix/arg_utils.py (outdated, resolved)
llumnix/backends/migration_backend_interface.py (outdated, resolved)
llumnix/backends/vllm/migration_backend.py (resolved)
llumnix/backends/migration_backend_interface.py (outdated, resolved)
llumnix/global_scheduler/migration_scheduler.py (outdated, resolved)
llumnix/llm_engine_manager.py (outdated, resolved)

github-actions bot commented Nov 5, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 25222.98 88295.68 240812.33 259406.41 277690.37 123898.04
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.27 58.51 68.31 115.90 193.21 66.26

github-actions bot commented Nov 5, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 240.00 MB 280.00 MB 312.00 MB 320.00 MB 328.00 MB 336.00 MB 344.00 MB 408.00 MB 472.00 MB 488.00 MB 536.00 MB 544.00 MB 560.00 MB 696.00 MB 960.00 MB
rpc_speed(GB/s) 1.09 1.59 1.90 2.03 2.17 2.26 2.34 2.32 2.41 2.44 2.53 2.54 2.49 2.56 2.65 2.60 2.58 2.64 2.60 2.65 2.63 2.69 2.76 2.73 2.77 2.69 2.76 2.63 2.93 2.83 2.77 3.00 2.96 2.24 2.89 3.12 3.06 3.26 3.20 3.18 3.29 3.43 3.54 3.69
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 208.00 MB 232.00 MB 280.00 MB 320.00 MB 328.00 MB 336.00 MB 344.00 MB 480.00 MB 536.00 MB 544.00 MB 552.00 MB 560.00 MB 568.00 MB 728.00 MB
gloo_speed(GB/s) 1.03 1.61 2.13 2.41 2.36 2.85 2.77 3.08 2.97 3.05 3.02 2.96 3.46 3.16 2.95 2.96 2.94 3.02 2.77 2.93 3.01 3.20 3.18 3.78 3.31 3.12 2.87 2.24 3.67 3.04 3.86 2.94 2.79 3.66 3.10 3.25 3.21 3.32

Review thread: configs/base.yml (outdated, resolved)

github-actions bot commented Nov 6, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 42395.84 102773.00 212875.15 242211.52 245569.34 116967.88
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.60 58.23 69.33 114.25 319.54 68.75

github-actions bot commented Nov 6, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 22253.82 91185.24 233017.97 302336.70 311978.43 121415.15
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.98 58.65 72.64 118.45 193.92 67.86

github-actions bot commented Nov 6, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 41498.15 115823.08 178960.86 232691.30 256674.47 114531.05
decode p25 p50 p75 p95 p99 mean
latency(ms) 54.33 59.40 72.23 120.82 291.08 70.91

github-actions bot commented Nov 7, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 44211.42 97544.90 177324.61 281217.73 286675.34 111007.09
decode p25 p50 p75 p95 p99 mean
latency(ms) 54.15 58.39 68.62 112.95 255.47 69.66

Review threads:
llumnix/arg_utils.py (outdated, resolved)
llumnix/arg_utils.py (outdated, resolved)
llumnix/llumlet/llumlet.py (outdated, resolved)

github-actions bot commented Nov 7, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 216.00 MB 312.00 MB 336.00 MB 480.00 MB 544.00 MB 552.00 MB 560.00 MB 568.00 MB 736.00 MB
rpc_speed(GB/s) 1.09 1.62 1.90 2.09 2.14 2.24 2.37 2.35 2.45 2.45 2.50 2.52 2.59 2.56 2.59 2.59 2.68 2.73 2.54 2.63 2.64 2.64 2.64 2.80 2.03 2.68 3.12 2.95 3.34 3.38 3.13 3.33 3.42 3.50
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 280.00 MB 312.00 MB 320.00 MB 336.00 MB 480.00 MB 536.00 MB 544.00 MB 656.00 MB 960.00 MB
gloo_speed(GB/s) 1.04 1.68 2.09 2.39 2.55 2.53 2.92 2.96 2.86 3.28 2.94 3.17 3.16 3.64 3.28 3.00 2.76 2.69 2.70 2.10 2.74 3.52 3.38 3.15 3.09 2.42 2.79 3.62 3.42 2.48 4.06 2.93 2.34 2.67 2.75 2.84 3.22

github-actions bot commented Nov 7, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 31304.32 121523.63 182468.53 246412.56 289963.72 118921.87
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.42 58.30 70.08 135.93 360.49 73.28

github-actions bot commented Nov 8, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 312.00 MB 320.00 MB 328.00 MB 472.00 MB 480.00 MB 544.00 MB 560.00 MB 688.00 MB 728.00 MB
rpc_speed(GB/s) 1.09 1.60 1.88 2.00 2.19 2.18 2.30 2.35 2.39 2.43 2.45 2.51 2.53 2.58 2.63 2.54 2.55 2.72 2.68 2.63 2.57 2.61 2.63 2.88 2.49 2.99 2.86 2.84 3.11 3.25 3.30 3.36 3.51 3.35
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 208.00 MB 216.00 MB 312.00 MB 320.00 MB 344.00 MB 472.00 MB 480.00 MB 488.00 MB
gloo_speed(GB/s) 1.01 1.62 2.05 2.45 2.56 2.78 3.02 3.07 3.04 3.10 3.18 2.97 3.08 2.85 3.58 2.92 2.84 2.50 2.92 2.95 2.53 2.83 2.73 3.26 2.88 3.78 2.81 2.76 2.70 2.79 2.98

github-actions bot commented Nov 8, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 33478.28 116307.25 202726.36 259863.19 278905.74 115095.14
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.10 58.49 70.18 116.37 202.28 75.22

github-actions bot commented Nov 8, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 35445.10 109219.68 196536.79 260082.01 267458.89 113915.83
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.83 57.81 67.02 114.39 349.11 69.71

github-actions bot commented Nov 8, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 280.00 MB 288.00 MB 296.00 MB 312.00 MB 320.00 MB 328.00 MB 344.00 MB 480.00 MB 536.00 MB 544.00 MB 552.00 MB 560.00 MB 568.00 MB
rpc_speed(GB/s) 1.05 1.58 1.88 2.02 2.14 2.22 2.25 2.34 2.41 2.42 2.36 2.52 2.54 2.54 2.54 2.59 2.54 2.58 2.66 2.56 2.59 2.69 2.55 2.48 2.53 2.86 2.75 2.97 2.69 2.70 3.09 2.86 3.31 3.30 3.17 3.33 3.27 3.02
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 224.00 MB 280.00 MB 312.00 MB 320.00 MB 480.00 MB 544.00 MB 560.00 MB
gloo_speed(GB/s) 1.02 1.73 1.92 2.37 2.43 2.74 2.87 3.00 2.95 3.15 3.32 3.40 3.25 3.05 3.55 2.83 3.15 2.87 2.33 2.95 3.09 2.97 3.00 2.74 3.11 1.31 1.47 2.61 2.87 2.95 3.08 3.06

github-actions bot commented Nov 8, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 280.00 MB 312.00 MB 320.00 MB 472.00 MB 480.00 MB 488.00 MB 536.00 MB 560.00 MB 568.00 MB
gloo_speed(GB/s) 1.00 1.64 2.09 2.35 2.67 2.73 3.00 3.08 3.03 2.56 3.51 3.27 3.08 2.79 2.88 2.77 2.02 2.95 2.90 2.79 2.89 2.67 3.08 2.48 2.76 2.22 1.64 2.86 2.68 2.92 2.98 2.82 3.40 2.87 3.10

github-actions bot commented Nov 8, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 9740.88 59549.11 106681.71 162774.29 172850.70 64477.53
decode p25 p50 p75 p95 p99 mean
latency(ms) 52.52 57.85 69.68 143.69 376.11 70.99

github-actions bot commented Nov 8, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 224.00 MB 312.00 MB 320.00 MB 328.00 MB 480.00 MB 536.00 MB 560.00 MB
gloo_speed(GB/s) 1.03 1.64 2.15 2.51 2.63 2.88 3.11 3.06 2.99 3.13 3.23 3.46 3.33 3.05 3.42 2.73 2.66 2.82 2.99 2.93 2.83 2.85 2.90 2.98 2.86 2.69 3.05 2.70 3.41 3.32 1.70 3.55 3.67

github-actions bot commented Nov 8, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 15811.31 58760.88 115898.81 137141.22 149100.16 65261.04
decode p25 p50 p75 p95 p99 mean
latency(ms) 54.02 59.68 68.91 102.61 224.52 66.77

github-actions bot commented Nov 8, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 288.00 MB 296.00 MB 312.00 MB 344.00 MB 472.00 MB 480.00 MB 552.00 MB 568.00 MB 608.00 MB
rpc_speed(GB/s) 1.12 1.61 1.91 2.06 2.22 2.26 2.33 2.33 2.49 2.52 2.45 2.57 2.61 2.67 2.70 2.68 2.65 2.74 2.73 2.81 2.58 2.75 2.77 2.68 2.86 2.66 3.00 2.86 2.85 3.07 3.07 3.28 3.41 3.20 3.24 3.65
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 224.00 MB 312.00 MB 328.00 MB 336.00 MB 480.00 MB 576.00 MB 696.00 MB 976.00 MB
gloo_speed(GB/s) 1.01 1.66 2.06 2.34 2.48 2.58 2.76 2.93 3.07 3.43 2.84 3.11 3.12 3.79 3.21 3.06 2.62 2.69 2.62 2.83 2.84 2.96 3.11 3.12 2.92 2.75 3.54 3.11 2.73 3.27 3.58

github-actions bot commented Nov 8, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 13138.04 59959.30 118347.69 146519.69 148090.58 64537.16
decode p25 p50 p75 p95 p99 mean
latency(ms) 52.61 56.48 64.49 97.38 199.44 62.75

github-actions bot commented Nov 8, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 304.00 MB 312.00 MB 320.00 MB 328.00 MB 336.00 MB 352.00 MB 472.00 MB 536.00 MB 560.00 MB 568.00 MB 656.00 MB
rpc_speed(GB/s) 1.07 1.64 1.87 2.03 2.14 2.19 2.30 2.29 2.45 2.40 2.47 2.47 2.59 2.50 2.61 2.61 2.62 2.65 2.63 2.79 2.62 2.65 2.70 2.74 2.68 2.76 2.83 2.79 2.91 2.74 2.66 3.04 2.88 3.40 3.31 3.20 3.24 3.25
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 208.00 MB 216.00 MB 232.00 MB 280.00 MB 312.00 MB 320.00 MB 336.00 MB 344.00 MB 480.00 MB 536.00 MB 560.00 MB 568.00 MB 696.00 MB
gloo_speed(GB/s) 1.05 1.65 2.03 2.31 2.59 2.91 2.83 2.80 3.05 2.99 3.15 3.17 3.65 3.53 3.22 2.64 2.78 2.95 2.64 3.12 3.19 3.03 1.88 3.16 2.80 2.82 2.87 2.49 3.58 2.96 3.65 2.12 1.29 3.09 3.16 3.03

github-actions bot commented Nov 8, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 18726.92 56607.12 114069.39 153658.14 162207.22 66084.93
decode p25 p50 p75 p95 p99 mean
latency(ms) 53.17 56.84 66.49 104.51 167.48 63.24

KuilongCui merged commit 844c836 into main on Nov 11, 2024
14 checks passed
KuilongCui deleted the multi_migration branch on November 11, 2024, 02:16
s5u13b added a commit that referenced this pull request Dec 9, 2024
5 participants