Issues: triton-inference-server/server

#8021 Streaming support on Infer endpoint when DECOUPLED mode is true
Opened Feb 19, 2025 by adityarap

#8020 Inconsistent HF token requirements for cached gated models: Triton vs vLLM deployments
Opened Feb 19, 2025 by haka-qylis

"output tensor shape does not match size of output" when using python backend and providing a custom environment
#8019
opened Feb 19, 2025 by
Isuxiz
#8016 Performance Discrepancy Between NVIDIA Triton and Direct Faster-Whisper Inference
Opened Feb 18, 2025 by YuBeomGon

#8006 Python Backend support implicit state management for Sequence Inference
Opened Feb 12, 2025 by zhuichao001

#8004 k8s-onprem Chart doesn't work with OpenShift's default security posture
Opened Feb 11, 2025 by jharmison-redhat

#7996 The system looks the same, but errors occur on some machines for an unknown reason
Opened Feb 8, 2025 by coder-2014

#7995 [BUG] [GenAI-Perf] openai-fronted server with --endpoint-type completions
Labels: openai (OpenAI related)
Opened Feb 7, 2025 by jihyeonRyu

#7994 Batching
Labels: module: backends (Issues related to the backends); python (Python related, whether backend, in-process API, client, etc); question (Further information is requested)
Opened Feb 7, 2025 by riyajatar37003

#7992 build.py setting docker build args for secrets even when build-secret flag is not present
Labels: build
Opened Feb 6, 2025 by BenjaminBraunDev

libtriton_fil.so missing on Arm64 containers 24.12 and 25.01
Labels: module: backends (Issues related to the backends)

#7986 Performance issue - High queue times in perf_analyzer
Labels: performance (A possible performance tune-up); question (Further information is requested)
Opened Feb 4, 2025 by asaff1

#7984 Something like "model instance index" inside python backend
Labels: enhancement (New feature or request); module: backends (Issues related to the backends); python (Python related, whether backend, in-process API, client, etc)
Opened Feb 3, 2025 by vadimkantorov

#7981 Expected model dimensions when expected shape is not suitable to batch
Opened Jan 31, 2025 by codeofdutyAI

#7974 Pytorch backend: Model is run in no_grad mode even with INFERENCE_MODE=false
Opened Jan 28, 2025 by hakanardo

#7963 vLLM backend Hugging Face feature branch model loading
Labels: enhancement (New feature or request)
Opened Jan 23, 2025 by knitzschke