- Sunnyvale, CA
- https://www.linkedin.com/in/junliume/
- @junliume
Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
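For context, vLLM exposes a small offline Python API; a minimal sketch (the model checkpoint is chosen here purely for illustration):

```python
from vllm import LLM, SamplingParams

# Load a model and generate; the engine handles batching and paged KV-cache memory.
llm = LLM(model="facebook/opt-125m")  # illustrative model choice
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The ROCm software stack is"], params)
for out in outputs:
    print(out.outputs[0].text)
```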
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
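A rough sketch of how such a quantized kernel is typically used as a drop-in attention replacement; the `sageattn` entry point and its `tensor_layout` argument are assumptions based on the project's README, not verified here:

```python
import torch
from sageattention import sageattn  # assumed package/entry point

# q, k, v in (batch, heads, seq_len, head_dim) layout, as with FlashAttention-style kernels.
q = torch.randn(1, 32, 2048, 128, dtype=torch.float16, device="cuda")
k = torch.randn(1, 32, 2048, 128, dtype=torch.float16, device="cuda")
v = torch.randn(1, 32, 2048, 128, dtype=torch.float16, device="cuda")

# Drop-in replacement for scaled-dot-product attention; quantization happens inside the kernel.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)  # assumed signature
```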
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
FlashInfer: Kernel Library for LLM Serving
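A hedged sketch of a single-request decode call, the core op in LLM serving; the `single_decode_with_kv_cache` helper is an assumption about FlashInfer's Python surface:

```python
import torch
import flashinfer  # assumed Python binding

num_heads, head_dim, kv_len = 32, 128, 4096
# One query token attending over an existing KV cache (NHD layout assumed).
q = torch.randn(num_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_heads, head_dim, dtype=torch.float16, device="cuda")

out = flashinfer.single_decode_with_kv_cache(q, k, v)  # assumed API name
```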
HIP version of dietGPU for the ROCm platform, featuring a GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compress…
Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
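A minimal sketch of that Python API, assuming the high-level `LLM` entry point from recent releases (checkpoint name illustrative):

```python
from tensorrt_llm import LLM, SamplingParams  # assumed high-level API

# Builds (or loads) a TensorRT engine for the model, then runs optimized inference.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # illustrative checkpoint
params = SamplingParams(temperature=0.8, max_tokens=32)

for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)
```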
Dockerfiles for the various software layers defined in the ROCm software platform
Microsoft Quantum Development Kit Samples
Development repository for the Triton language and compiler
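For a flavor of the language, the canonical Triton vector-add kernel (standard tutorial material, shown here as a sketch):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
print(add(x, y))
```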
Clang build analysis tool using -ftime-trace
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
atamazov/MIOpen
Forked from ROCm/MIOpen. AMD's Machine Intelligence Library
Helps with dual booting: an Ubuntu tray application to reboot into different OSes or UEFI/BIOS
Legacy ROCm Software Platform Documentation