- Principal AI Researcher at Together AI; creator and lead of TGL, the company's proprietary inference engine.
- 🧩 My role in SGLang has evolved from being one of the first core developers, to leading inference optimization efforts, to taking on a builder role that supports its next phase of growth. I have led major releases and technical blogs, such as Llama 3, DeepSeek V3, Large Scale EP, and GB200 NVL72.
- Co-author of the FlashInfer paper (MLSys 2025 Best Paper) and committer to FlashInfer. Previously, I was Lead Software Engineer at Baseten (co-authored the DeepSeek V3 and Qwen 3 launches) and led CTR GPU inference and vector retrieval system development at Meituan.
- Interviewed by The New York Times (Article 1, Article 2); featured speaker at AMD AI DevDay 2025 and PyTorch Conference 2025.
- Contact: [email protected] | Telegram | LinkedIn | Homepage
Pinned
- sgl-project/sglang (Public): SGLang is a fast serving framework for large language models and vision language models.
- flashinfer-ai/flashinfer (Public): FlashInfer: Kernel Library for LLM Serving