Issues: triton-inference-server/server

#8021 Streaming support on Infer endpoint when DECOUPLED mode is true
Opened Feb 19, 2025 by adityarap

#8020 Inconsistent HF token requirements for cached gated models: Triton vs vLLM deployments
Opened Feb 19, 2025 by haka-qylis

"output tensor shape does not match size of output" when using python backend and providing a custom environment
#8019
opened Feb 19, 2025 by
Isuxiz
#8016 Performance Discrepancy Between NVIDIA Triton and Direct Faster-Whisper Inference
Opened Feb 18, 2025 by YuBeomGon

#8006 Python Backend support implicit state management for Sequence Inference
Opened Feb 12, 2025 by zhuichao001

#8004 k8s-onprem Chart doesn't work with OpenShift's default security posture
Opened Feb 11, 2025 by jharmison-redhat

#7996 The system looks the same, but errors occur on some machines for an unknown reason
Opened Feb 8, 2025 by coder-2014

#7995 [BUG] [GenAI-Perf] openai-fronted server with --endpoint-type completions
Labels: openai (OpenAI related)
Opened Feb 7, 2025 by jihyeonRyu

#7994 Batching
Labels: module: backends (Issues related to the backends); python (Python related, whether backend, in-process API, client, etc); question (Further information is requested)
Opened Feb 7, 2025 by riyajatar37003

#7992 build.py setting docker build args for secrets even when build-secret flag is not present
Labels: build
Opened Feb 6, 2025 by BenjaminBraunDev

libtriton_fil.so missing on Arm64 containers 24.12 and 25.01
Labels: module: backends (Issues related to the backends)

#7986 Performance issue - High queue times in perf_analyzer
Labels: performance (A possible performance tune-up); question (Further information is requested)
Opened Feb 4, 2025 by asaff1

#7984 Something like "model instance index" inside python backend
Labels: enhancement (New feature or request); module: backends (Issues related to the backends); python (Python related, whether backend, in-process API, client, etc)
Opened Feb 3, 2025 by vadimkantorov

#7981 Expected model dimensions when expected shape is not suitable to batch
Opened Jan 31, 2025 by codeofdutyAI

#7974 Pytorch backend: Model is run in no_grad mode even with INFERENCE_MODE=false
Opened Jan 28, 2025 by hakanardo

#7963 vLLM backend Hugging Face feature branch model loading
Labels: enhancement (New feature or request)
Opened Jan 23, 2025 by knitzschke