Activity

update

leiwen83 pushed 1 commit to dev_2407_1_lora_3 • ec40f29…ebd80b2 • on Jul 14, 2024

update controller

leiwen83 created dev_2407_1_lora_3 • ec40f29 • on Jul 14, 2024

[ Misc ] Remove separate bias add (vllm-project#6353)

leiwen83 pushed 108 commits to main • af9ad46…6047187 • on Jul 12, 2024

[ Misc ] Refactor w8a8 to use process_weights_after_load (Simplify …

leiwen83 pushed 156 commits to main • e2b85cf…af9ad46 • on Jul 1, 2024

Fix w8a8 benchmark and add Llama-3-8B (vllm-project#5562)

leiwen83 pushed 95 commits to main • 5d7e3d0…e2b85cf • on Jun 17, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created prefix_caching_v2_fix • bcfcd08 • on Jun 9, 2024

[mis][ci/test] fix flaky test in test_sharded_state_loader.py (vllm-p…

leiwen83 pushed 18 commits to main • ccdc490…5d7e3d0 • on Jun 9, 2024

update

leiwen83 pushed 1 commit to spec_infer_prometheus_metric • 5306db8…396e29a • on Jun 7, 2024

add spec infer related into prometheus metrics.

Force push
leiwen83 force pushed to spec_infer_prometheus_metric • a6bf575…5306db8 • on Jun 7, 2024

[Core] Change LoRA embedding sharding to support loading methods (vll…

leiwen83 pushed 97 commits to main • f17a1a8…ccdc490 • on Jun 7, 2024

[Misc] Make Serving Benchmark More User-friendly (vllm-project#5044)

leiwen83 pushed 60 commits to main • 973617a…f17a1a8 • on May 26, 2024

add comment

leiwen83 pushed 1 commit to v2_prefix_fix_2 • 2965400…2b86523 • on May 24, 2024

reuse ServerRunner

leiwen83 pushed 1 commit to controller • 7495147…9b5a853 • on May 16, 2024

fix ruff

leiwen83 pushed 1 commit to controller • e3a3da8…7495147 • on May 16, 2024

fix ruff

leiwen83 pushed 1 commit to controller • 93258e1…e3a3da8 • on May 16, 2024

Add control panel allow manage multi vllm instances

leiwen83 created controller • 93258e1 • on May 16, 2024

[Speculative decoding][Re-take] Enable TP>1 speculative decoding (vll…

leiwen83 pushed 28 commits to main • 4e12131…973617a • on May 16, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created v2_prefix_fix_2 • 2965400 • on May 11, 2024

assert ref count inc after promotion

leiwen83 pushed 1 commit to v2_prefix_fix_1 • 98951fd…07ec35f • on May 11, 2024

fix ruff

leiwen83 pushed 1 commit to v2_prefix_fix_1 • 0800a17…98951fd • on May 11, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created v2_prefix_fix_1 • 0800a17 • on May 11, 2024

Deleted branch

leiwen83 deleted v2_prefix_fix • on May 11, 2024

fix spelling

leiwen83 pushed 1 commit to v2_prefix_fix • 580f38f…8f173fc • on May 11, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created v2_prefix_fix • 580f38f • on May 11, 2024

[Core][Test] fix function name typo in custom allreduce (vllm-project…

leiwen83 pushed 49 commits to main • 36fb68f…4e12131 • on May 11, 2024

Updated branch

Force push • Missing commit
leiwen83 force pushed to spec_infer_prometheus_metric • bf96a47…a6bf575 • on May 4, 2024

[Doc] Chunked Prefill Documentation (vllm-project#4580)

leiwen83 pushed 10 commits to main • 808632d…36fb68f • on May 4, 2024

Updated branch

Missing commit
leiwen83 pushed 0 commits to spec_infer_prometheus_metric • 620577f…bf96a47 • on May 4, 2024

logging proposer type

cadedaniel pushed 1 commit to ngram_fix • dba3b0d…dc2c645 • on May 3, 2024

fix

cadedaniel pushed 1 commit to ngram_fix • 4a6dbb5…dba3b0d • on May 3, 2024