ICCV 2023 Papers and Open-Source Projects Collection (Papers with Code)!
2160 papers accepted!
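ICCV 2023 accepted paper IDs: https://t.co/A0mCH8gbOi
Note 1: Everyone is welcome to open an issue to share ICCV 2023 papers and open-source projects!
Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision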
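If you want to keep up with the latest high-quality CV papers, open-source projects, and learning materials, you are welcome to scan the QR code and join the CVer academic exchange group. Let's learn from each other and improve together!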
- Backbone
- CLIP
- MAE
- GAN
- GNN
- MLP
- NAS
- OCR
- NeRF
- DETR
- Prompt
- Diffusion Models
- Avatars
- ReID (Re-identification)
- Long-Tail
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image Classification
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Low-level Vision
- Super-Resolution
- Denoising
- Deblur
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- Medical Image
- Image Generation
- Video Generation
- Image Editing
- Video Editing
- Video Understanding
- Human Motion Generation
- Low-light Image Enhancement
- Scene Text Recognition
- Image Retrieval
- Image Fusion
- Trajectory Prediction
- Crowd Counting
- Video Quality Assessment
- Others
Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control
- Paper: https://arxiv.org/abs/2303.17606
- Code: https://github.com/songrise/AvatarCraft
Rethinking Mobile Block for Efficient Attention-based Models
PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation
IntrinsicNeRF: Learning Intrinsic Neural Radiance Fields for Editable Novel View Synthesis
- Homepage: https://zju3dv.github.io/intrinsic_nerf/
- Paper: https://arxiv.org/abs/2210.00647
- Code: https://github.com/zju3dv/IntrinsicNeRF
Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control
FlipNeRF: Flipped Reflection Rays for Few-shot Novel View Synthesis
- Homepage: https://shawn615.github.io/flipnerf/
- Paper: https://arxiv.org/abs/2306.17723
- Code: https://github.com/shawn615/FlipNeRF
Tri-MipRF: Tri-Mip Representation for Efficient Anti-Aliasing Neural Radiance Fields
PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
DIRE for Diffusion-Generated Image Detection
Read-only Prompt Optimization for Vision-Language Few-shot Learning
Introducing Language Guidance in Prompt-based Continual Learning
- Paper: https://arxiv.org/abs/2308.15827
- Code: None
Read-only Prompt Optimization for Vision-Language Few-shot Learning
FemtoDet: An Object Detection Baseline for Energy Versus Performance Tradeoffs
Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection
ASAG: Building Strong One-Decoder-Layer Sparse Detectors via Adaptive Sparse Anchor Generation
Cross-modal Orthogonal High-rank Augmentation for RGB-Event Transformer-trackers
- Paper: https://arxiv.org/abs/2307.04129
- Code: https://github.com/ZHU-Zhiyu/High-Rank_RGB-Event_Tracker
Segment Anything
- Homepage: https://segment-anything.com/
- Paper: https://arxiv.org/abs/2304.02643
- Code: https://github.com/facebookresearch/segment-anything
MARS: Model-agnostic Biased Object Removal without Additional Supervision for Weakly-Supervised Semantic Segmentation
FreeCOS: Self-Supervised Learning from Fractals and Unlabeled Images for Curvilinear Object Segmentation
Residual Pattern Learning for Pixel-wise Out-of-Distribution Detection in Semantic Segmentation
Disentangle then Parse: Night-time Semantic Segmentation with Illumination Disentanglement
Towards Robust Referring Video Object Segmentation with Cyclic Relational Consensus
DVIS: Decoupled Video Instance Segmentation Framework
BoMD: Bag of Multi-label Descriptors for Noisy Chest X-ray Classification
CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection
Self-supervised Learning to Bring Dual Reversed Rolling Shutter Images Alive
Spherical Space Feature Decomposition for Guided Depth Map Super-Resolution
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
- Homepage: https://ldkong.com/Robo3D
- Paper: https://arxiv.org/abs/2303.17597
- Code: https://github.com/ldkong1205/Robo3D
Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models
Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos
- Paper: https://arxiv.org/abs/2308.09247
- Code: None
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection
SparseFusion: Fusing Multi-Modal Sparse Representations for Multi-Sensor 3D Object Detection
StreamPETR: Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
- Paper: https://arxiv.org/abs/2304.09801
- Project: https://chongjiange.github.io/metabev.html
- Code: https://github.com/ChongjianGE/MetaBEV
Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling
SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection
Rethinking Range View Representation for LiDAR Segmentation
- Homepage: https://ldkong.com/RangeFormer
- Paper: https://arxiv.org/abs/2303.05367
- Code: None
MBPTrack: Improving 3D Point Cloud Tracking with Memory Networks and Box Priors
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Simulating Fluids in Real-World Still Images
- Homepage: https://slr-sfs.github.io/
- Paper: https://arxiv.org/abs/2204.11335
- Code: https://github.com/simon3dv/SLR-SFS
Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing
- Paper: https://arxiv.org/abs/2304.02051
- Code: https://github.com/aimagelab/multimodal-garment-designer
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing
- Project: https://fate-zero-edit.github.io/
- Paper: https://arxiv.org/abs/2303.09535
- Code: https://github.com/ChenyangQiQi/FateZero
BeLFusion: Latent Diffusion for Behavior-Driven Human Motion Prediction
Implicit Neural Representation for Cooperative Low-light Image Enhancement
Self-supervised Character-to-Character Distillation for Text Recognition
MRN: Multiplexed Routing Network for Incremental Multilingual Text Recognition
- Paper: https://arxiv.org/abs/2305.14758
- Code: https://github.com/simplify23/MRN
- Chinese-language explainer: https://zhuanlan.zhihu.com/p/643948935
Zero-Shot Composed Image Retrieval with Textual Inversion
DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion
EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting
Point-Query Quadtree for Crowd Counting, Localization, and More
Exploring Video Quality Assessment on User Generated Contents from Aesthetic and Technical Perspectives
MotionBERT: A Unified Perspective on Learning Human Motion Representations
- Homepage: https://motionbert.github.io/
- Paper: https://arxiv.org/abs/2210.06551
- Code: https://github.com/Walter0807/MotionBERT
Graph Matching with Bi-level Noisy Correspondence
- Paper: https://arxiv.org/pdf/2212.04085.pdf
- Code: https://github.com/Lin-Yijie/Graph-Matching-Networks/tree/main/COMMON
LDL: Line Distance Functions for Panoramic Localization
Active Neural Mapping
- Homepage: https://zikeyan.github.io/active-INR/index.html
- Paper: https://arxiv.org/abs/2308.16246
- Code: https://zikeyan.github.io/active-INR/index.html#
Reconstructing Groups of People with Hypergraph Relational Reasoning