HabanaAI / vllm-fork Public

forked from vllm-project/vllm

Fork 77
Star 57

Pull requests: HabanaAI/vllm-fork

40 Open 773 Closed

Pull requests list

Cherry-pick of: "Delayed sampling tp fix #834"

#885 opened Mar 4, 2025 by kamil-kaczor

Added the logic to fix the warmup phase for spec decoding when enforce_eager is not used

#880 opened Feb 28, 2025 by pallavijaini0525

Add possibility to execute with LoRA adapter for lm_eval

#879 opened Feb 28, 2025 by mkrze • Draft

Delayed cherrypick

#877 opened Feb 28, 2025 by kamil-kaczor • Draft

Allow to use flex_attention instead of FSDPA in HPUAttentionImpl

#876 opened Feb 27, 2025 by m-a-nowak • Draft

[Gaudi][Model] Qwen2.5-vl

Label: New Model (Issue or PR to enable a new model)

#870 opened Feb 26, 2025 by malkomes

[CI] Add APC tests

#866 opened Feb 25, 2025 by kzawora-intel

Update Dockerfile.hpu

#864 opened Feb 25, 2025 by michalkuligowski • Draft

Update requirements-hpu.txt for open telemetry tracing support

#857 opened Feb 21, 2025 by louie-tsai

enable multi-modal embedding for TIGER-Lab/VLM2Vec-Full T+I on HPU

#854 opened Feb 20, 2025 by libinta

Port delayed sampling to habana_main

#849 opened Feb 20, 2025 by madamczykhabana • Draft

Draft: Another attempt at v1 HPU integration

#831 opened Feb 14, 2025 by kzawora-intel • Draft

19 of 23 tasks

Extend accuracy tests for models that we support

#824 opened Feb 13, 2025 by AnetaKaczynska

Resolve Speculative Decode RTE

#823 opened Feb 13, 2025 by tannervoas742

enable LoRA for embedding models

#821 opened Feb 12, 2025 by skaulintel

Update documentation to reflect current bucket defaults

#817 opened Feb 12, 2025 by nngokhale

support inc dynamic quant deepseek

#814 opened Feb 11, 2025 by changwangss • Draft

support inc dynamic quantization

#803 opened Feb 8, 2025 by changwangss • Draft

Support qwenvl model for HPU

Label: New Model (Issue or PR to enable a new model)

#793 opened Feb 7, 2025 by yingjie-han

[DEEPSEEK_V3/R1] includes features of fp8 dequant, MLA, Expert parallelism

#792 opened Feb 6, 2025 by xuechendi

Enable roberta embedding

#786 opened Feb 5, 2025 by yeonsily

[DO NOT MERGE][PoC] Mark dynamic shapes in torch.compile mode

#755 opened Jan 29, 2025 by kzawora-intel • Draft

Pipeline Parallelism implementation.

#731 opened Jan 23, 2025 by jmaksymczuk • Draft

Delayed sampling

#720 opened Jan 22, 2025 by mfylcek • Draft

make benchmark_throughput static support single image input

#718 opened Jan 22, 2025 by yma11
