View junliume's full-sized avatar

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 39,122 5,856 Updated Feb 24, 2025

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 974 60 Updated Feb 15, 2025

A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators

Python 75 8 Updated Jan 2, 2024

LLM inference in C/C++

C++ 75,147 10,859 Updated Feb 24, 2025

Official inference framework for 1-bit LLMs

C++ 12,755 894 Updated Feb 18, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,152 223 Updated Feb 24, 2025

The C++ Standard Library for your entire system.

C++ 15 3 Updated Jan 30, 2025

HIP version of dietGPU for the ROCm platform, featuring a GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compress…

C++ 7 2 Updated Dec 22, 2024

Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild

Zig 2,076 75 Updated Feb 21, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,508 1,112 Updated Feb 21, 2025

Dockerfiles for the various software layers defined in the ROCm software platform

Shell 451 70 Updated Feb 19, 2025

Microsoft Quantum Development Kit Samples

Jupyter Notebook 3,915 925 Updated Jan 12, 2024

Development repository for the Triton language and compiler

MLIR 14,579 1,813 Updated Feb 24, 2025

Tuna centric MIOpen client

C++ 4 4 Updated Feb 21, 2025

A thin wrapper around MIOpen and cuDNN

C++ 40 15 Updated Aug 9, 2023

Clang build analysis tool using -ftime-trace

C++ 1,044 68 Updated Jan 5, 2025

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,599 376 Updated Dec 4, 2024

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 350 151 Updated Feb 24, 2025

AMD's Machine Intelligence Library

Assembly 4 Updated Sep 27, 2024
Python 7 Updated Feb 12, 2025

Helps with dual booting. Ubuntu tray application to reboot into different OSes or UEFI/BIOS

Python 53 5 Updated Feb 16, 2025
C++ 4 Updated Jun 17, 2022

Benchmarking MIOpen

C++ 17 15 Updated Jan 14, 2019

AMD's Machine Intelligence Library

Assembly 1,118 242 Updated Feb 24, 2025
VBA 3 Updated Jan 24, 2023
MLIR 137 39 Updated Feb 24, 2025

Legacy ROCm Software Platform Documentation

113 93 Updated Jun 5, 2023