CVPR 2024 Papers and Open-Source Projects Collection (Papers with Code)

CVPR 2024 decisions are now available on OpenReview!

Note 1: Everyone is welcome to open an issue to share CVPR 2024 papers and open-source projects!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

CVPR 2019

CVPR 2020

CVPR 2021

CVPR 2022

CVPR 2023

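Scan the QR code to join the CVer Academic Exchange Group (CVer学术交流群), the largest computer vision AI community on Knowledge Planet (知识星球)! It is updated daily and shares the latest learning materials on computer vision, AI painting, image processing, deep learning, autonomous driving, medical imaging, AIGC, and related areas as soon as they appear. Start learning!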

[CVPR 2024 Open-Source Paper Directory]

3DGS(Gaussian Splatting)
Avatars
Backbone
CLIP
MAE
Embodied AI
GAN
GNN
Multimodal Large Language Models(MLLM)
NAS
OCR
NeRF
DETR
Prompt
Diffusion Models
ReID(Re-Identification)
Long-Tail Distribution
Vision Transformer
Vision-Language
Self-supervised Learning
Data Augmentation
Object Detection
Visual Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image
Medical Image Segmentation
Video Object Segmentation
Video Instance Segmentation
Referring Image Segmentation
Image Matting
Image Editing
Low-level Vision
Super-Resolution
Denoising
Deblur
Autonomous Driving
3D Point Cloud
3D Object Detection
3D Semantic Segmentation
3D Object Tracking
3D Semantic Scene Completion
3D Registration
3D Human Pose Estimation
3D Human Mesh Estimation
Image Generation
Video Generation
Video Understanding
Action Detection
Text Detection
Knowledge Distillation
Model Pruning
Image Compression
Anomaly Detection
3D Reconstruction
Depth Estimation
Trajectory Prediction
Lane Detection
Image Captioning
Visual Question Answering
Sign Language Recognition
Video Prediction
Novel View Synthesis
Zero-Shot Learning
Stereo Matching
Feature Matching
Scene Graph Generation
Implicit Neural Representations
Image Quality Assessment
Video Quality Assessment
Datasets
New Tasks
Others

3DGS(Gaussian Splatting)

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Homepage: https://city-super.github.io/scaffold-gs/
Paper: https://arxiv.org/abs/2312.00109
Code: https://github.com/city-super/Scaffold-GS

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

Homepage: https://shunyuanzheng.github.io/GPS-Gaussian
Paper: https://arxiv.org/abs/2312.02155
Code: https://github.com/ShunyuanZheng/GPS-Gaussian

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Paper: https://arxiv.org/abs/2312.02134
Code: https://github.com/huliangxiao/GaussianAvatar

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Paper: https://arxiv.org/abs/2311.14521
Code: https://github.com/buaacyw/GaussianEditor

Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction

Homepage: https://ingra14m.github.io/Deformable-Gaussians/
Paper: https://arxiv.org/abs/2309.13101
Code: https://github.com/ingra14m/Deformable-3D-Gaussians

SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

Homepage: https://yihua7.github.io/SC-GS-web/
Paper: https://arxiv.org/abs/2312.14937
Code: https://github.com/yihua7/SC-GS

Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis

Homepage: https://oppo-us-research.github.io/SpacetimeGaussians-website/
Paper: https://arxiv.org/abs/2312.16812
Code: https://github.com/oppo-us-research/SpacetimeGaussians

Avatars

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Paper: https://arxiv.org/abs/2312.02134
Code: https://github.com/huliangxiao/GaussianAvatar

Backbone

RepViT: Revisiting Mobile CNN From ViT Perspective

Paper: https://arxiv.org/abs/2307.09283
Code: https://github.com/THU-MIG/RepViT

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Paper: https://arxiv.org/abs/2311.17132
Code: https://github.com/DaiShiResearch/TransNeXt

CLIP

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Paper: https://arxiv.org/abs/2312.03818
Code: https://github.com/SunzeY/AlphaCLIP

MAE

Embodied AI

EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Homepage: https://tai-wang.github.io/embodiedscan/
Paper: https://arxiv.org/abs/2312.16170
Code: https://github.com/OpenRobotLab/EmbodiedScan

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception

Homepage: https://iranqin.github.io/MP5.github.io/
Paper: https://arxiv.org/abs/2312.07472
Code: https://github.com/IranQin/MP5

LEMON: Learning 3D Human-Object Interaction Relation from 2D Images

Paper: https://arxiv.org/abs/2312.08963
Code: https://github.com/yyvhang/lemon_3d

GAN

OCR

An Empirical Study of Scaling Law for OCR

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Paper: https://arxiv.org/abs/2403.00303
Code: https://github.com/PriNing/ODM

NeRF

DETR

DETRs Beat YOLOs on Real-time Object Detection

Paper: https://arxiv.org/abs/2304.08069
Code: https://github.com/lyuwenyu/RT-DETR

Prompt

Multimodal Large Language Models(MLLM)

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Link-Context Learning for Multimodal LLMs

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Paper: https://arxiv.org/abs/2311.17911
Code: https://github.com/shikiw/OPERA

Making Large Multimodal Models Understand Arbitrary Visual Prompts

Homepage: https://vip-llava.github.io/
Paper: https://arxiv.org/abs/2312.00784

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

Paper: https://arxiv.org/abs/2310.00582
Code: https://github.com/SY-Xuan/Pink

NAS

ReID(Re-Identification)

Diffusion Models

InstanceDiffusion: Instance-level Control for Image Generation

Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
Paper: https://arxiv.org/abs/2402.03290
Code: https://github.com/frank-xwang/InstanceDiffusion

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

DeepCache: Accelerating Diffusion Models for Free

Paper: https://arxiv.org/abs/2312.00858
Code: https://github.com/horseee/DeepCache

Vision Transformer

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

Paper: https://arxiv.org/abs/2311.17132
Code: https://github.com/DaiShiResearch/TransNeXt

RepViT: Revisiting Mobile CNN From ViT Perspective

Paper: https://arxiv.org/abs/2307.09283
Code: https://github.com/THU-MIG/RepViT

Vision-Language

Object Detection

DETRs Beat YOLOs on Real-time Object Detection

Paper: https://arxiv.org/abs/2304.08069
Code: https://github.com/lyuwenyu/RT-DETR

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

Object Tracking

Semantic Segmentation

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Paper: https://arxiv.org/abs/2312.04265
Code: https://github.com/w1oves/Rein

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Paper: https://arxiv.org/abs/2311.15537
Code: https://github.com/xb534/SED

Medical Image

Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology

Paper: https://arxiv.org/abs/2402.17228
Code: https://github.com/DearCaat/RRT-MIL

VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis

Paper: https://arxiv.org/abs/2402.17300
Code: https://github.com/Luffy03/VoCo

ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images

Paper: https://arxiv.org/abs/2311.15264
Code: https://github.com/nicoboou/chada_vit

Medical Image Segmentation

Autonomous Driving

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Paper: https://arxiv.org/abs/2310.08370
Code: https://github.com/Nightmare-n/UniPAD

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Paper: https://arxiv.org/abs/2311.17663
Code: https://github.com/haomo-ai/Cam4DOcc

3D Point Cloud

3D Object Detection

PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection

Paper: https://arxiv.org/abs/2312.08371
Code: https://github.com/kuanchihhuang/PTT

UniMODE: Unified Monocular 3D Object Detection

Paper: https://arxiv.org/abs/2402.18573

3D Semantic Segmentation

Image Editing

Edit One for All: Interactive Batch Image Editing

Homepage: https://thaoshibe.github.io/edit-one-for-all
Paper: https://arxiv.org/abs/2401.10219
Code: https://github.com/thaoshibe/edit-one-for-all

Video Editing

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

Homepage: https://maskint.github.io
Paper: https://arxiv.org/abs/2312.12468

Low-level Vision

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

Super-Resolution

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Paper: https://arxiv.org/abs/2402.19387
Code: https://github.com/lbc12345/SeD

Denoising

Image Denoising

Image Generation

InstanceDiffusion: Instance-level Control for Image Generation

Homepage: https://people.eecs.berkeley.edu/~xdwang/projects/InstDiff/
Paper: https://arxiv.org/abs/2402.03290
Code: https://github.com/frank-xwang/InstanceDiffusion

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Homepage: https://eclipse-t2i.vercel.app/
Paper: https://arxiv.org/abs/2312.04655
Code: https://github.com/eclipse-t2i/eclipse-inference

Instruct-Imagen: Image Generation with Multi-modal Instruction

Paper: https://arxiv.org/abs/2401.01952

Residual Denoising Diffusion Models

Paper: https://arxiv.org/abs/2308.13712
Code: https://github.com/nachifur/RDDM

UniGS: Unified Representation for Image Generation and Segmentation

Paper: https://arxiv.org/abs/2312.01985

Video Generation

Vlogger: Make Your Dream A Vlog

Paper: https://arxiv.org/abs/2401.09414
Code: https://github.com/Vchitect/Vlogger

VBench: Comprehensive Benchmark Suite for Video Generative Models

Homepage: https://vchitect.github.io/VBench-project/
Paper: https://arxiv.org/abs/2311.17982
Code: https://github.com/Vchitect/VBench

Video Understanding

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark

Knowledge Distillation

Logit Standardization in Knowledge Distillation

Efficient Dataset Distillation via Minimax Diffusion

Paper: https://arxiv.org/abs/2311.15529
Code: https://github.com/vimar-gu/MinimaxDiffusion

Video Quality Assessment

KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos

Homepage: https://lixinustc.github.io/projects/KVQ/
Paper: https://arxiv.org/abs/2402.07220
Code: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024

Others

Object Recognition as Next Token Prediction

Paper: https://arxiv.org/abs/2312.02142
Code: https://github.com/kaiyuyue/nxtp

ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks

Paper: https://arxiv.org/abs/2306.14525
Homepage: https://parameternet.github.io/

Seamless Human Motion Composition with Blended Positional Encodings

Paper: https://arxiv.org/abs/2402.15509
Code: https://github.com/BarqueroGerman/FlowMDM

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update

Homepage: https://clova-tool.github.io/
Paper: https://arxiv.org/abs/2312.10908

MoMask: Generative Masked Modeling of 3D Human Motions

Paper: https://arxiv.org/abs/2312.00063
Code: https://github.com/EricGuo5513/momask-codes

Amodal Ground Truth and Completion in the Wild

Homepage: https://www.robots.ox.ac.uk/~vgg/research/amodal/
Paper: https://arxiv.org/abs/2312.17247
Code: https://github.com/Championchess/Amodal-Completion-in-the-Wild

Improved Visual Grounding through Self-Consistent Explanations

Paper: https://arxiv.org/abs/2312.04554
Code: https://github.com/uvavision/SelfEQ
