[CI][Bugfix] Refine ci tests and revert many-to-many migration commit to avoid ci tests failure #74

Merged
merged 41 commits into from
Dec 10, 2024

KuilongCui
Contributor

@KuilongCui KuilongCui commented Nov 18, 2024

  1. Fix and refine CI tests
  • Handle exceptions from all Ray operations.
  • Replace bash with sh so that test runs return the correct exit code.
  • Back up the error log if any test function fails.
  • Add the cleanup_ray_env and shutdown_llumnix_service fixtures and wait_for_llumnix_service_ready.
  • Change how the migration tests check for leaked cache blocks.
  • Change the timeout value of the test workflow.
  • Add the -s and --tb=long options to pytest to get detailed output.
  • Merge several test scripts into one.
  • Simplify redundant code.
  • Add wait_for_all_instances_finished.
  • Revert the one-to-many and many-to-one migration commit, because it causes a deadlock during migration.
  • Pass max-num-batched-tokens and change max-model-len from 2048 to 4096.
  2. Disable many-to-many migrations temporarily.
  3. Check request status changes after migration async operations.
  4. Change the logger timestamp from second-level to millisecond-level precision.
  5. Add more logs to improve overhead observability.
  6. Refine exception handling of migration.
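To illustrate the "check request status changes after migration async operations" item, here is a minimal sketch of the general pattern, not the code in this PR. All names (`RequestStatus`, `Request`, `send_blocks`, `migrate_one_stage`) are hypothetical and do not come from llumnix. The idea is that a request can finish or be aborted while the migration coroutine is suspended at an `await`, so the status is re-checked after the await returns.

```python
import asyncio
from enum import Enum, auto


class RequestStatus(Enum):
    # Hypothetical status values, for illustration only.
    RUNNING = auto()
    FINISHED = auto()
    ABORTED = auto()


class Request:
    # Hypothetical request object; not the llumnix class.
    def __init__(self, request_id: str) -> None:
        self.request_id = request_id
        self.status = RequestStatus.RUNNING


async def send_blocks(request: Request) -> None:
    # Stand-in for an asynchronous migration step such as a remote call.
    # sleep(0) yields control to the event loop once.
    await asyncio.sleep(0)


async def migrate_one_stage(request: Request) -> bool:
    """Return True only if the request is still running after the async step."""
    if request.status is not RequestStatus.RUNNING:
        return False
    await send_blocks(request)
    # Other coroutines may have run while we were suspended, so the
    # request may have finished or been aborted in the meantime.
    if request.status is not RequestStatus.RUNNING:
        return False
    return True
```

A caller would treat a `False` result as "stop this migration and clean up" rather than continuing with stale state.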

@KuilongCui KuilongCui changed the title [Bugfix] Address Request Status Changes During Migration Asynchronous… [Bugfix] check request status changes after migration async operations Nov 18, 2024
@AlibabaPAI AlibabaPAI deleted a comment from github-actions bot Nov 18, 2024
.github/workflows/offline_inference.yml (resolved)
llumnix/llumlet/request.py (resolved)
tests/e2e_test/test_bench.py (resolved)
tests/e2e_test/test_bench.py (resolved)

prefill p25 p50 p75 p95 p99 mean
latency(ms) 17986.75 66328.00 97943.30 144650.53 151141.77 63998.00
decode p25 p50 p75 p95 p99 mean
latency(ms) 52.19 56.97 67.60 112.07 170.54 63.80

prefill p25 p50 p75 p95 p99 mean
latency(ms) 10776.73 57304.32 118055.34 151870.96 155075.01 64456.89
decode p25 p50 p75 p95 p99 mean
latency(ms) 52.65 58.35 72.05 127.61 241.66 69.04

prefill p25 p50 p75 p95 p99 mean
latency(ms) 11977.11 38606.56 127338.45 174297.08 178332.09 66220.30
decode p25 p50 p75 p95 p99 mean
latency(ms) 50.30 55.37 63.81 112.23 231.88 63.57

@KuilongCui KuilongCui force-pushed the migration_bug branch 2 times, most recently from 3515d56 to 62cb1b1 Compare November 18, 2024 13:55

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 232.00 MB 272.00 MB 280.00 MB 352.00 MB 400.00 MB
rpc_speed(GB/s) 1.04 1.56 1.86 1.95 2.11 2.18 2.26 2.27 2.31 2.41 2.40 2.39 2.49 2.49 2.49 2.60 2.59 2.42 2.59 2.47 2.73 2.51 2.78 2.76 2.67 3.23 3.12
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 232.00 MB 264.00 MB 312.00 MB 480.00 MB
gloo_speed(GB/s) 1.03 1.70 2.19 2.37 2.64 2.96 2.86 3.17 3.04 3.14 3.19 2.86 3.75 3.32 3.11 2.62 2.59 1.93 2.84 2.92 2.89 3.11 2.88 3.16 3.23 2.75 2.23 1.66

migration_size 1.75 MB 2.62 MB 3.50 MB 4.38 MB 5.25 MB 6.12 MB 7.00 MB 7.88 MB 8.75 MB 9.62 MB 10.50 MB 11.38 MB 12.25 MB 13.12 MB 14.00 MB 14.88 MB 15.75 MB 16.62 MB 17.50 MB 896.00 KB
rpc_speed(GB/s) 0.25 0.31 0.41 0.46 0.52 0.50 0.57 0.62 0.65 0.74 0.76 0.78 0.69 0.80 0.70 0.73 0.71 0.67 0.61 0.14
migration_size 1.75 MB 2.62 MB 3.50 MB 4.38 MB 5.25 MB 6.12 MB 7.00 MB 7.88 MB 8.75 MB 9.62 MB 10.50 MB 11.38 MB 12.25 MB 14.00 MB 14.88 MB 15.75 MB 16.62 MB 17.50 MB 18.38 MB 896.00 KB
gloo_speed(GB/s) 0.24 0.38 0.45 0.55 0.69 0.72 0.75 0.92 0.81 0.72 0.87 0.94 0.85 1.15 0.52 0.41 0.13 0.30 0.36 0.12

prefill p25 p50 p75 p95 p99 mean
latency(ms) 82.47 119.31 240.52 774.63 1916.74 229.88
decode p25 p50 p75 p95 p99 mean
latency(ms) 85.47 98.24 112.33 129.41 153.29 95.89

migration_size 1.75 MB 2.62 MB 3.50 MB 4.38 MB 5.25 MB 6.12 MB 7.00 MB 7.88 MB 8.75 MB 9.62 MB 10.50 MB 11.38 MB 12.25 MB 14.00 MB 15.75 MB 16.62 MB 896.00 KB
rpc_speed(GB/s) 0.23 0.34 0.39 0.45 0.48 0.50 0.54 0.61 0.63 0.69 0.66 0.74 0.75 0.76 0.63 0.72 0.14
migration_size 1.75 MB 2.62 MB 3.50 MB 4.38 MB 5.25 MB 6.12 MB 7.00 MB 7.88 MB 8.75 MB 9.62 MB 10.50 MB 11.38 MB 12.25 MB 13.12 MB 14.00 MB 15.75 MB 16.62 MB 17.50 MB 18.38 MB 896.00 KB
gloo_speed(GB/s) 0.26 0.35 0.41 0.57 0.69 0.61 0.75 0.80 0.85 0.65 0.69 0.73 1.08 0.81 0.84 0.51 0.07 0.29 0.34 0.11

prefill p25 p50 p75 p95 p99 mean
latency(ms) 83.22 127.24 260.54 677.34 1250.72 218.20
decode p25 p50 p75 p95 p99 mean
latency(ms) 84.49 98.85 109.93 124.38 155.81 95.31

migration_size 1.75 MB 2.62 MB 3.50 MB 4.38 MB 5.25 MB 6.12 MB 7.00 MB 7.88 MB 8.75 MB 9.62 MB 10.50 MB 11.38 MB 12.25 MB 13.12 MB 14.00 MB 14.88 MB 15.75 MB 16.62 MB 17.50 MB 896.00 KB
rpc_speed(GB/s) 0.25 0.33 0.33 0.40 0.52 0.53 0.59 0.59 0.66 0.68 0.74 0.75 0.74 0.71 0.78 0.62 0.71 0.60 0.56 0.13
migration_size 1.75 MB 2.62 MB 3.50 MB 4.38 MB 5.25 MB 6.12 MB 7.00 MB 7.88 MB 8.75 MB 9.62 MB 10.50 MB 11.38 MB 12.25 MB 14.00 MB 14.88 MB 15.75 MB 16.62 MB 17.50 MB 18.38 MB 896.00 KB
gloo_speed(GB/s) 0.24 0.36 0.45 0.55 0.62 0.65 0.73 0.83 0.67 0.62 0.82 0.92 0.82 0.86 1.20 0.94 0.27 0.31 0.35 0.12

@s5u13b s5u13b changed the title [Bugfix] check request status changes after migration async operations [Bugfix] Check request status changes after migration async operations Nov 19, 2024
llumnix/llumlet/llumlet.py (resolved)
@s5u13b s5u13b changed the title [Bugfix] Check request status changes after migration async operations [Bugfix][CI] Check request status changes after migration async operations & Fix and refine ci tests to avoid unexpected failure Nov 27, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 7170.30 63068.43 116835.34 166615.96 169138.46 70253.52
decode p25 p50 p75 p95 p99 mean
latency(ms) 51.06 55.96 65.16 115.82 266.55 66.26

github-actions bot commented Dec 5, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 14589.95 69909.55 123467.19 155757.39 160390.33 72970.49
decode p25 p50 p75 p95 p99 mean
latency(ms) 52.55 56.02 64.72 96.26 128.20 62.83

github-actions bot commented Dec 5, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 2115.18 69342.23 122288.36 165846.73 166841.20 71345.67
decode p25 p50 p75 p95 p99 mean
latency(ms) 52.55 57.98 70.03 101.57 447.01 70.79

github-actions bot commented Dec 9, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 16755.68 69485.77 128053.58 170789.35 174394.57 75440.88
decode p25 p50 p75 p95 p99 mean
latency(ms) 50.77 56.29 66.55 111.95 390.98 70.20

@s5u13b s5u13b changed the title [CI][Bugfix] Refine ci tests and fix some migration bugs to avoid unexpected ci tests failure [CI][Bugfix] Refine ci tests and fix migration bugs to avoid unexpected ci tests failure Dec 9, 2024

github-actions bot commented Dec 9, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 248.00 MB 280.00 MB 408.00 MB 416.00 MB 440.00 MB
rpc_speed(GB/s) 1.00 1.54 1.78 1.97 2.08 2.17 2.17 2.25 2.36 2.36 2.42 2.49 2.51 2.52 2.58 2.56 2.56 2.58 2.64 2.72 2.71 2.57 2.72 2.86 2.85 2.82 2.79 2.88 2.73 3.01 3.16 2.84 2.94 3.30
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 352.00 MB 472.00 MB
rpc_speed(GB/s) 1.05 1.55 1.81 1.96 2.08 2.15 2.11 2.20 2.23 2.23 2.30 2.39 2.40 2.42 2.48 2.54 2.43 2.44 2.39 2.60 2.65 2.52 2.56 2.40 2.69 2.82 2.75 2.60 3.09 3.28
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 216.00 MB 232.00 MB 264.00 MB 328.00 MB
gloo_speed(GB/s) 1.02 1.65 2.10 2.31 2.54 2.74 2.88 2.99 3.15 2.95 3.25 3.03 3.09 2.57 3.12 2.64 2.36 2.89 2.82 2.82 1.96 1.67 2.83 2.91 1.15 1.96 3.53 3.04

@s5u13b s5u13b changed the title [CI][Bugfix] Refine ci tests and fix migration bugs to avoid unexpected ci tests failure [CI][Bugfix] Refine ci tests and revert many-to-many migration commit to avoid ci tests failure Dec 9, 2024

github-actions bot commented Dec 9, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 9331.08 68269.71 129344.97 174440.33 189574.85 72170.82
decode p25 p50 p75 p95 p99 mean
latency(ms) 51.28 56.58 69.75 112.76 184.33 66.62

github-actions bot commented Dec 9, 2024

prefill p25 p50 p75 p95 p99 mean
latency(ms) 14131.08 64613.06 119284.27 181053.12 205580.70 75029.03
decode p25 p50 p75 p95 p99 mean
latency(ms) 50.24 54.97 65.10 105.51 298.90 70.08

github-actions bot commented Dec 9, 2024

migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 248.00 MB 280.00 MB 408.00 MB 416.00 MB 440.00 MB
rpc_speed(GB/s) 1.00 1.54 1.78 1.97 2.08 2.17 2.17 2.25 2.36 2.36 2.42 2.49 2.51 2.52 2.58 2.56 2.56 2.58 2.64 2.72 2.71 2.57 2.72 2.86 2.85 2.82 2.79 2.88 2.73 3.01 3.16 2.84 2.94 3.30
migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 240.00 MB 264.00 MB 296.00 MB 456.00 MB
rpc_speed(GB/s) 3.65 1.03 1.54 1.71 1.93 1.98 2.11 2.07 2.20 2.20 2.22 2.30 2.34 2.40 2.39 2.33 2.39 2.39 2.42 2.41 2.44 2.40 2.53 2.51 2.40 2.67 2.47 2.38 2.57 2.19 2.61 2.66 3.22
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 232.00 MB 280.00 MB 312.00 MB 424.00 MB
gloo_speed(GB/s) 1.04 1.67 2.07 2.37 2.60 2.61 2.96 2.77 3.19 3.07 3.05 2.98 3.37 3.05 3.31 2.95 3.09 2.65 2.28 2.41 2.01 2.92 0.69 3.67 2.93 1.99 2.88 1.79 1.83 0.98

performance.txt (outdated, resolved)
docs/Arguments.md (outdated, resolved)
llumnix/backends/vllm/migration_backend.py (resolved)

prefill p25 p50 p75 p95 p99 mean
latency(ms) 19239.94 83361.77 123964.11 173168.39 199915.46 77281.02
decode p25 p50 p75 p95 p99 mean
latency(ms) 50.21 54.24 61.44 104.25 269.87 64.94

migration_size 1.59 GB 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 224.00 MB 264.00 MB 272.00 MB 280.00 MB 304.00 MB 320.00 MB 360.00 MB 384.00 MB 424.00 MB 560.00 MB
rpc_speed(GB/s) 3.77 1.04 1.57 1.81 1.98 2.05 2.15 2.12 2.21 2.29 2.23 2.31 2.33 2.39 2.48 2.43 2.50 2.53 2.59 2.47 2.50 2.47 2.51 2.63 2.38 2.59 2.66 2.62 2.88 2.82 3.09 2.81 3.10 2.99 3.14 3.44
migration_size 8.00 MB 16.00 MB 24.00 MB 32.00 MB 40.00 MB 48.00 MB 56.00 MB 64.00 MB 72.00 MB 80.00 MB 88.00 MB 96.00 MB 104.00 MB 112.00 MB 120.00 MB 128.00 MB 136.00 MB 144.00 MB 152.00 MB 160.00 MB 168.00 MB 176.00 MB 184.00 MB 192.00 MB 200.00 MB 208.00 MB 216.00 MB 224.00 MB 232.00 MB 272.00 MB 296.00 MB 312.00 MB 560.00 MB
gloo_speed(GB/s) 1.00 1.63 1.98 2.29 2.50 2.77 2.95 2.72 3.05 2.98 3.29 3.25 3.28 2.96 3.28 3.07 2.88 2.66 2.91 2.65 2.01 3.32 3.31 1.17 3.45 1.03 3.39 3.03 2.74 1.34 2.68 3.15 1.65

@s5u13b s5u13b merged commit 83fe07b into main Dec 10, 2024
14 checks passed
3 participants