CVPR 2025 decisions are now available on OpenReview! Acceptance rate: 22.1% = 2878 / 13008
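Note 1: Everyone is welcome to open an issue to share CVPR 2025 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
Scan the QR code to join the CVer academic discussion group for the latest work from CVPR 2025 and beyond. It is the largest computer vision and AI Knowledge Planet community, updated daily with the newest learning materials on computer vision, AIGC, diffusion models, multimodal learning, deep learning, autonomous driving, medical imaging, remote sensing, and more.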
- 3DGS(Gaussian Splatting)
- Avatars
- Backbone
- CLIP
- Mamba
- Embodied AI
- GAN
- GNN
- Multimodal Large Language Models (MLLM)
- Large Language Models (LLM)
- NAS
- OCR
- NeRF
- DETR
- Diffusion Models
- ReID (Re-Identification)
- Long-Tail Distribution
- Vision Transformer
- Vision-Language
- Self-supervised Learning
- Data Augmentation
- Object Detection
- Anomaly Detection
- Visual Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image
- Medical Image Segmentation
- Video Object Segmentation
- Video Instance Segmentation
- Referring Image Segmentation
- Image Matting
- Image Editing
- Low-level Vision
- Super-Resolution
- Denoising
- Deblur
- Autonomous Driving
- 3D Point Cloud
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Object Tracking
- 3D Semantic Scene Completion
- 3D Registration
- 3D Human Pose Estimation
- 3D Human Mesh Estimation
- Medical Image
- Image Generation
- Video Generation
- 3D Generation
- Video Understanding
- Action Detection
- Embodied AI
- Text Detection
- Knowledge Distillation
- Model Pruning
- Image Compression
- 3D Reconstruction
- Depth Estimation
- Trajectory Prediction
- Lane Detection
- Image Captioning
- Visual Question Answering
- Sign Language Recognition
- Video Prediction
- Novel View Synthesis
- Zero-Shot Learning
- Stereo Matching
- Feature Matching
- Low-light Image Enhancement
- Scene Graph Generation
- Style Transfer
- Implicit Neural Representations
- Image Quality Assessment
- Video Quality Assessment
- Datasets
- New Tasks
- Others
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
TinyFusion: Diffusion Transformers Learned Shallow
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Multiple Object Tracking as ID Prediction
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
- Project: https://ldkong.com/LiMoE
- Paper: https://arxiv.org/abs/2501.04004
- Code: https://github.com/Xiangxu-0103/LiMoE
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
AESOP: Auto-Encoded Supervision for Perceptual Image Super-Resolution
- Paper: https://arxiv.org/abs/2412.00124
- Code: https://github.com/2minkyulee/AESOP-Auto-Encoded-Supervision-for-Perceptual-Image-Super-Resolution
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
- Homepage: https://byteflow-ai.github.io/TokenFlow/
- Code: https://github.com/ByteFlow-AI/TokenFlow
- Paper: https://arxiv.org/abs/2412.03069
PAR: Parallelized Autoregressive Visual Generation
- Project: https://epiphqny.github.io/PAR-project/
- Paper: https://arxiv.org/abs/2412.15119
- Code: https://github.com/Epiphqny/PAR
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models
X-Dyna: Expressive Dynamic Human Image Animation
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Universal Actions for Enhanced Embodied Foundation Models
- Project: https://2toinf.github.io/UniAct/
- Paper: https://arxiv.org/abs/2501.10105
- Code: https://github.com/2toinf/UniAct
HVI: A New Color Space for Low-light Image Enhancement
- Paper: https://arxiv.org/abs/2502.20272
- Code: https://github.com/Fediory/HVI-CIDNet
- Demo: https://huggingface.co/spaces/Fediory/HVI-CIDNet_Low-light-Image-Enhancement_
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements