Stars
An open-source toolbox for fast sampling of diffusion models. Official implementations of our works published in ICML, NeurIPS, CVPR.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement [ICLR 2025 Spotlight]
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
The data skeleton from "3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera" http://3dscenegraph.stanford.edu
Fully open reproduction of DeepSeek-R1
chongzhou96 / MaskCLIP
Forked from open-mmlab/mmsegmentation
Official PyTorch implementation of "Extract Free Dense Labels from CLIP" (ECCV 22 Oral)
🔨🔨🔨 (mmplot) Draws graphs of multiple metrics, such as accuracy and speed, for multiple deep learning models.
Integrate deep learning models for image classification | Backbone learning/comparison/magic modification project
Densely Captioned Images (DCI) dataset repository.
This repository contains the source code of our ICCV 2021 paper, Learning of Visual Relations: The Devil is in the Tails.
This is a PyTorch implementation of Mask R-CNN on the CLEVR dataset.
Implementation of the Paper Scene-Graph ViT
[ICCV 2023] HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation
[Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enlarged hidden dimension to build super frontier vision languag…
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
The official repository for the ICLR 2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"
Official code for the ACM MM 2024 paper "Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection"
[ICCV 2023] DETRs with Collaborative Hybrid Assignments Training
A PyTorch implementation of "Adaptive Image-to-video Scene Graph Generation via Knowledge Reasoning and Adversarial Learning"