SiliconFlow Inc
- Shenzhen
- https://strint.github.io/
Stars
Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud that understands text, audio, vision, and video and performs real-time speech generation.
🤖 The next generation of Multi-Modal Multi-Agent platform. 👾 🦄 🔮
Count the MACs / FLOPs of your PyTorch model.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Cost-efficient and pluggable infrastructure components for GenAI inference
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
Ring attention implementation with flash attention
Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
A comprehensive toolkit for reliably locking, packing and deploying environments for ComfyUI workflows.
Tile primitives for speedy kernels
nanobind: tiny and efficient C++/Python bindings
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
An auxiliary project analyzing the characteristics of KV in DiT attention.
A debugging and profiling tool that can trace and visualize python code execution
Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
FlashInfer: Kernel Library for LLM Serving
⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (AAAI 2025 Oral)
OneDiff: An out-of-the-box acceleration library for diffusion models.
Efficient Triton Kernels for LLM Training
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Official inference repo for FLUX.1 models
SGLang is a fast serving framework for large language models and vision language models.