
Starred repositories


Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)

Python 23 1 Updated Mar 14, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 41,854 7,110 Updated Apr 1, 2025

Redis for LLMs

Python 703 81 Updated Apr 4, 2025

Tile primitives for speedy kernels

Cuda 2,223 133 Updated Apr 4, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,142 541 Updated Apr 3, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,403 817 Updated Mar 1, 2025

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 374 169 Updated Apr 4, 2025

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,286 556 Updated Apr 4, 2025
Python 185 24 Updated Oct 1, 2024

nvidia-modelopt is a unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for do…

Python 840 63 Updated Apr 3, 2025

[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".

Python 445 51 Updated Mar 28, 2025

[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.

Python 4,996 285 Updated Mar 11, 2025

Awesome LLM compression research papers and tools.

1,450 93 Updated Apr 4, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,976 193 Updated Apr 3, 2025

Primarily documents knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers.

HTML 6,638 751 Updated Oct 22, 2024

This repository primarily documents interview questions for large language model (LLM) algorithm engineers.

1,894 133 Updated Dec 26, 2024

Let your Claude think

TypeScript 14,886 1,732 Updated Mar 10, 2025
C++ 324 30 Updated Jan 20, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 155 9 Updated Oct 30, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 1,166 107 Updated Apr 3, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda 788 32 Updated Sep 21, 2024

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 974 146 Updated Apr 3, 2025

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

C++ 221 21 Updated Sep 30, 2024

[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)

Python 540 61 Updated Feb 29, 2024

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 9,493 657 Updated Mar 27, 2025

Code repo for the paper "SpinQuant LLM quantization with learned rotations"

Python 249 36 Updated Feb 14, 2025

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality

Python 3,784 290 Updated Aug 10, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 12,858 1,432 Updated Apr 4, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

671 33 Updated Mar 27, 2025

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 268 28 Updated Nov 22, 2024