Activity

Showing most recent first

update

leiwen83 pushed 1 commit to dev_2407_1_lora_3 • ec40f29…ebd80b2 •

on Jul 14, 2024

update controller

leiwen83 created dev_2407_1_lora_3 • ec40f29 •

on Jul 14, 2024

[ Misc ] Remove separate bias add (vllm-project#6353)

leiwen83 pushed 108 commits to main • af9ad46…6047187 •

on Jul 12, 2024

[ Misc ] Refactor w8a8 to use `process_weights_after_load` (Simplify …

leiwen83 pushed 156 commits to main • e2b85cf…af9ad46 •

on Jul 1, 2024

Fix w8a8 benchmark and add Llama-3-8B (vllm-project#5562)

leiwen83 pushed 95 commits to main • 5d7e3d0…e2b85cf •

on Jun 17, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created prefix_caching_v2_fix • bcfcd08 •

on Jun 9, 2024

[mis][ci/test] fix flaky test in test_sharded_state_loader.py (vllm-p…

leiwen83 pushed 18 commits to main • ccdc490…5d7e3d0 •

on Jun 9, 2024

update

leiwen83 pushed 1 commit to spec_infer_prometheus_metric • 5306db8…396e29a •

on Jun 7, 2024

add spec infer related into prometheus metrics.

Force push

leiwen83 force pushed to spec_infer_prometheus_metric • a6bf575…5306db8 •

on Jun 7, 2024

[Core] Change LoRA embedding sharding to support loading methods (vll…

leiwen83 pushed 97 commits to main • f17a1a8…ccdc490 •

on Jun 7, 2024

[Misc] Make Serving Benchmark More User-friendly (vllm-project#5044)

leiwen83 pushed 60 commits to main • 973617a…f17a1a8 •

on May 26, 2024

add comment

leiwen83 pushed 1 commit to v2_prefix_fix_2 • 2965400…2b86523 •

on May 24, 2024

reuse ServerRunner

leiwen83 pushed 1 commit to controller • 7495147…9b5a853 •

on May 16, 2024

fix ruff

leiwen83 pushed 1 commit to controller • e3a3da8…7495147 •

on May 16, 2024

fix ruff

leiwen83 pushed 1 commit to controller • 93258e1…e3a3da8 •

on May 16, 2024

Add control panel allow manage multi vllm instances

leiwen83 created controller • 93258e1 •

on May 16, 2024

[Speculative decoding][Re-take] Enable TP>1 speculative decoding (vll…

leiwen83 pushed 28 commits to main • 4e12131…973617a •

on May 16, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created v2_prefix_fix_2 • 2965400 •

on May 11, 2024

assert ref count inc after promotion

leiwen83 pushed 1 commit to v2_prefix_fix_1 • 98951fd…07ec35f •

on May 11, 2024

fix ruff

leiwen83 pushed 1 commit to v2_prefix_fix_1 • 0800a17…98951fd •

on May 11, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created v2_prefix_fix_1 • 0800a17 •

on May 11, 2024

Deleted branch

leiwen83 deleted v2_prefix_fix •

on May 11, 2024

fix spelling

leiwen83 pushed 1 commit to v2_prefix_fix • 580f38f…8f173fc •

on May 11, 2024

[Core][Bugfix]: fix prefix caching for blockv2

leiwen83 created v2_prefix_fix • 580f38f •

on May 11, 2024

[Core][Test] fix function name typo in custom allreduce (vllm-project…

leiwen83 pushed 49 commits to main • 36fb68f…4e12131 •

on May 11, 2024

Updated branch

Force push

Missing commit

leiwen83 force pushed to spec_infer_prometheus_metric • bf96a47…a6bf575 •

on May 4, 2024

[Doc] Chunked Prefill Documentation (vllm-project#4580)

leiwen83 pushed 10 commits to main • 808632d…36fb68f •

on May 4, 2024

Updated branch

Missing commit

leiwen83 pushed 0 commits to spec_infer_prometheus_metric • 620577f…bf96a47 •

on May 4, 2024

logging proposer type

cadedaniel pushed 1 commit to ngram_fix • dba3b0d…dc2c645 •

on May 3, 2024

fix

cadedaniel pushed 1 commit to ngram_fix • 4a6dbb5…dba3b0d •

on May 3, 2024