Qwen2.5-Omni is an end-to-end multimodal model by the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.

Jupyter Notebook · 1,732 stars · 120 forks · Updated Mar 30, 2025

🤖 The next-generation Multi-Modal Multi-Agent platform. 👾 🦄 🔮

Python · 92 stars · 4 forks · Updated Mar 20, 2025

Count the MACs / FLOPs of your PyTorch model.

Python · 4,974 stars · 531 forks · Updated Jul 8, 2024

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,113 stars · 536 forks · Updated Mar 28, 2025

DeepEP: an efficient expert-parallel communication library

Cuda · 7,332 stars · 685 forks · Updated Mar 28, 2025

Cost-efficient and pluggable infrastructure components for GenAI inference

Jupyter Notebook · 3,347 stars · 311 forks · Updated Mar 29, 2025

FlashMLA: Efficient MLA decoding kernels

C++ · 11,392 stars · 812 forks · Updated Mar 1, 2025

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

C++ · 1,687 stars · 64 forks · Updated Mar 10, 2025

Ring attention implementation with flash attention

Python · 720 stars · 61 forks · Updated Feb 24, 2025

Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python · 1,064 stars · 44 forks · Updated Feb 23, 2025

A comprehensive toolkit for reliably locking, packing and deploying environments for ComfyUI workflows.

Python · 127 stars · 14 forks · Updated Mar 10, 2025

Tile primitives for speedy kernels

Cuda · 2,197 stars · 133 forks · Updated Mar 30, 2025

Python · 5,934 stars · 971 forks · Updated Mar 30, 2025

nanobind: tiny and efficient C++/Python bindings

C++ · 2,677 stars · 221 forks · Updated Mar 28, 2025

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Cuda · 1,059 stars · 70 forks · Updated Mar 25, 2025

An auxiliary project analyzing the characteristics of KV in DiT attention.

Python · 28 stars · 1 fork · Updated Nov 29, 2024

Official repository for LTX-Video

Python · 3,226 stars · 284 forks · Updated Mar 5, 2025

A debugging and profiling tool that can trace and visualize Python code execution

Python · 6,288 stars · 428 forks · Updated Mar 25, 2025

Finetune Llama 3.3, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥

Python · 36,172 stars · 2,790 forks · Updated Mar 27, 2025

The best OSS video generation models

Python · 3,054 stars · 324 forks · Updated Jan 8, 2025

This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try Sync Labs.

Python · 11,639 stars · 2,448 forks · Updated Feb 10, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 2,527 stars · 262 forks · Updated Mar 30, 2025

⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation (AAAI 2025 Oral)

Python · 568 stars · 39 forks · Updated Mar 11, 2025

OneDiff: An out-of-the-box acceleration library for diffusion models.

Jupyter Notebook · 1,848 stars · 125 forks · Updated Jan 13, 2025

Efficient Triton Kernels for LLM Training

Python · 4,751 stars · 289 forks · Updated Mar 30, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 816 stars · 53 forks · Updated Mar 19, 2025

Run your GitHub Actions locally 🚀

Go · 58,826 stars · 1,489 forks · Updated Mar 29, 2025

Official inference repo for FLUX.1 models

Python · 21,101 stars · 1,491 forks · Updated Feb 6, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python · 12,646 stars · 1,392 forks · Updated Mar 30, 2025