For a categorized collection of survey papers from past years, click here ↘️ CV-Surveys (under construction)
- Learning Anchor Transformations for 3D Garment Animation ⭐code
- Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion ⭐code
- CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition
- Hierarchical Fine-Grained Image Forgery Detection and Localization
⭐code - Detecting and Grounding Multi-Modal Media Manipulation
⭐code虚假信息检测 - Evading DeepFake Detectors via Adversarial Statistical Consistency
- Re-thinking Federated Active Learning based on Inter-class Diversity
- Box-Level Active Detection
- Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision ⭐code
- Self-Supervised 3D Scene Flow Estimation Guided by Superpoints
- Unsupervised Cumulative Domain Adaptation for Foggy Scene Optical Flow
- MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID ⭐code
- Large-scale Training Data Search for Object Re-identification ⭐code
- Adaptive Sparse Pairwise Loss for Object Re-Identification
- Defect localization
- Industrial anomaly detection
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
- Context-Based Trit-Plane Coding for Progressive Image Compression ⭐code
- Learned Image Compression with Mixed Transformer-CNN Architectures ⭐code
- LVQAC: Lattice Vector Quantization Coupled with Spatially Adaptive Companding for Efficient Learned Image Compression
- Optimization-Inspired Cross-Attention Transformer for Compressive Sensing ⭐code
- Video compression
- NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer 🏠project
- FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization 🏠project
- Local Implicit Ray Function for Generalizable Radiance Field Representation ⭐code
- FitMe: Deep Photorealistic 3D Morphable Model Avatars ⭐code
- Pointersect: Neural Rendering with Cloud-Ray Intersection
- StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields 🏠project
- Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention ⭐code
- WildLight: In-the-wild Inverse Rendering with a Flashlight ⭐code
- Grid-guided Neural Radiance Fields for Large Urban Scenes ⭐code
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images
- Inverse Rendering of Translucent Objects using Physical and Neural Renderers
- HandNeRF: Neural Radiance Fields for Animatable Interacting Hands
- ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field
- NeRF-DS: Neural Radiance Fields for Dynamic Specular Objects ⭐code
- JAWS: Just A Wild Shot for Cinematic Transfer in Neural Radiance Fields 🏠project
- FlexNeRF: Photorealistic Free-viewpoint Rendering of Moving Humans from Sparse Views ⭐code
- NeFII: Inverse Rendering for Reflectance Decomposition with Near-Field Indirect Illumination
- MonoHuman: Animatable Human Neural Field from Monocular Video ⭐code
- [PlenVDB: A Memory Efficient VDB-Based Radiance Fields for Fast Training and Rendering]
Achieves 30 frames per second for 1280x720 output on an iPhone 12.
- Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos ⭐code
- Multi-Space Neural Radiance Fields
- Conditional Generation of Audio from Video via Foley Analogies ⭐code
- Speaker detection
- Audio-visual speech recognition
- Audio-visual localization
- Audio source separation
- Sound synthesis
- Movie audio description
- Generating scene images from sound
- Audio-visual anomaly detection
- Frequency-Modulated Point Cloud Rendering with Easy Editing ⭐code
- Learning Neural Duplex Radiance Fields for Real-Time View Synthesis 🏠project
- ReLight My NeRF: A Dataset for Novel View Synthesis and Relighting of Real World Objects ⭐code
- Balanced Spherical Grid for Egocentric View Synthesis
- Progressively Optimized Local Radiance Fields for Robust View Synthesis ⭐code
- F$^{2}$-NeRF: Fast Neural Radiance Field Training with Free Camera Trajectories ⭐code
- Enhanced Stable View Synthesis
- Consistent View Synthesis with Pose-Guided Diffusion Models ⭐code
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs ⭐code
- Painting 3D Nature in 2D: View Synthesis of Natural Scenes From a Single Semantic Mask 🏠project
- NoPe-NeRF: Optimising Neural Radiance Field With No Pose Prior 🏠[project](https://nope-nerf.active.vision)
- xFBD: Focused Building Damage Dataset and Analysis (building damage dataset)
- Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo 🌻dataset
- Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
- HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling 🌻dataset
- CUDA: Convolution-based Unlearnable Datasets 🌻dataset
- MVImgNet: A Large-scale Dataset of Multi-view Images 🌻dataset
- V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception (Vehicle-to-Vehicle (V2V) perception)
- Polynomial Implicit Neural Representations For Large Diverse Datasets 🌻dataset
- MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset 🌻dataset
- RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset 🌻dataset
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
- Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts ⭐code
- ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data ⭐code
- CelebV-Text: A Large-Scale Facial Text-Video Dataset (facial text-to-video generation)
- Towards Artistic Image Aesthetics Assessment: a Large-scale Dataset and a New Method (artistic image aesthetics assessment)
- GeoNet: Benchmarking Unsupervised Adaptation across Geographies ⭐code
- PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout ⭐code
- CIMI4D: A Large Multimodal Climbing Motion Dataset under Human-scene Interactions (climbing motion dataset)
- Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
- AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection 🏠project (public short-video shot boundary detection dataset)
- V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting ⭐code
- WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models
- Sign language recognition
- Sign language retrieval
- EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning ⭐code
- DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback
- DyLiN: Making Light Field Networks Dynamic ⭐code
- Learning Rotation-Equivariant Features for Visual Correspondence
- Diversity-Measurable Anomaly Detection
- SimpleNet: A Simple Network for Image Anomaly Detection and Localization ⭐code
- WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
- DG
- Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
- Improved Test-Time Adaptation for Domain Generalization ⭐code
- Modality-Agnostic Debiasing for Single Domain Generalization
- Neuron Structure Modeling for Generalizable Remote Physiological Measurement ⭐code
- Sharpness-Aware Gradient Matching for Domain Generalization ⭐code
- Improving Generalization with Domain Convex Game
- Generalist: Decoupling Natural and Robust Generalization ⭐code
- ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization ⭐code
- Deep Frequency Filtering for Domain Generalization
- Progressive Random Convolutions for Single Domain Generalization
- Meta-causal Learning for Single Domain Generalization
- DA
- Guiding Pseudo-labels with Uncertainty Estimation for Test-Time Adaptation ⭐code
- DATE: Domain Adaptive Product Seeker for E-commerce ⭐code
- Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective
- Upcycling Models under Domain and Category Shift ⭐code
- C-SFDA: A Curriculum Learning Aided Self-Training Framework for Efficient Source Free Domain Adaptation
- A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation ⭐code
- TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation ⭐code
- [OSAN: A One-Stage Alignment Network to Unify Multimodal Alignment and Unsupervised Domain Adaptation] (domain adaptation; paper not yet public)
- Feature Alignment and Uniformity for Test Time Adaptation
- Prototype-based Embedding Network for Scene Graph Generation ⭐code
- Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
- DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
- Ensemble-based Blackbox Attacks on Dense Prediction ⭐code
- Probabilistic Prompt Learning for Dense Prediction
- Dense detection
- Make Landscape Flatter in Differentially Private Federated Learning
- The Resource Problem of Using Linear Layer Leakage Attack in Federated Learning
- Rethinking Federated Learning With Domain Shift: A Prototype View
- Class-incremental learning
- Dense Network Expansion for Class Incremental Learning
- Class-Incremental Exemplar Compression for Class-Incremental Learning ⭐code
- Learning with Fantasy: Semantic-Aware Virtual Contrastive Constraint for Few-Shot Class-Incremental Learning ⭐code
- On the Stability-Plasticity Dilemma of Class-Incremental Learning
- Feature Separation and Recalibration for Adversarial Robustness ⭐code
- CFA: Class-wise Calibrated Fair Adversarial Training ⭐code
- Black-box
- Adversarial examples
- Backdoor attacks
- Adversarial attacks
- Adversarial Attack with Raindrops
- Transferable Adversarial Attacks on Vision Transformers with Token Gradient Regularization
- StyLess: Boosting the Transferability of Adversarial Examples
- Re-thinking Model Inversion Attacks Against Deep Neural Networks ⭐code
- Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning ⭐code
- Jedi: Entropy-based Localization and Removal of Adversarial Patches
- Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning ⭐code
- Computationally Budgeted Continual Learning: What Does Matter? ⭐code
- Preserving Linear Separability in Continual Learning by Backward Feature Projection
- Regularizing Second-Order Influences for Continual Learning ⭐code
- [Rethinking Gradient Projection Continual Learning: Stability / Plasticity Feature Space Decoupling] (continual learning; paper not yet public)
- Exploring Data Geometry for Continual Learning
- PCR: Proxy-based Contrastive Replay for Online Class-Incremental Continual Learning
- Meta-Learning with a Geometry-Adaptive Preconditioner ⭐code (meta-learning)
- Meta Omnium: A Benchmark for General-Purpose Learning-to-Learn
- Twin Contrastive Learning with Noisy Labels ⭐code
- Revisiting Multimodal Representation in Contrastive Learning: From Patch and Token Embeddings to Finite Discrete Tokens
- Rethinking Optical Flow from Geometric Matching Consistent Perspective ⭐code
- DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling
- AnyFlow: Arbitrary Scale Optical Flow with Implicit Neural Representation
- TransFlow: Transformer as Flow Learner
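- Scene text detection
- Table structure recognition
- Font generation
- Handwritten text generation
- Vector font synthesis
- Graphic document generation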
- Detecting Human-Object Contact in Images 🏠project
- Category Query Learning for Human-Object Interaction Classification
- Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
- Relational Context Learning for Human-Object Interaction Detection
- HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models ⭐code
- ViPLO: Vision Transformer based Pose-Conditioned Self-Loop Graph for Human-Object Interaction Detection ⭐code
- Visibility Aware Human-Object Interaction Tracking from Single RGB Camera 🏠project
- Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
- Two-hand interaction
- Hand-object interaction
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks ⭐code
- DeAR: Debiasing Vision-Language Models with Additive Residuals
- Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
- Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding ⭐code
- VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
- MAGVLT: Masked Generative Vision-and-Language Transformer
- Visual-Language Prompt Tuning with Knowledge-guided Context Optimization
- Top-Down Visual Attention from Analysis by Synthesis 🏠project
- Accelerating Vision-Language Pretraining with Free Language Modeling ⭐code
- Multi-Modal Representation Learning with Text-Driven Soft Masks
- Fine-tuned CLIP models are efficient video learners ⭐code
- MaPLe: Multi-modal Prompt Learning ⭐code
- Learning to Name Classes for Vision and Language Models
- Clover: Towards A Unified Video-Language Alignment and Fusion Model
The Clover video-text pre-training model achieves the best zero-shot and fine-tuning performance on three text-to-video retrieval tasks (DiDeMo, MSRVTT, and LSMDC), and also sets a new state of the art on 8 mainstream video question answering benchmarks.
- VLN
- Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding 🏠project
- Lana: A Language-Capable Navigator for Instruction Following and Generation ⭐code
- KERM: Knowledge Enhanced Reasoning for Vision-and-Language Navigation ⭐code
- Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
- SimVQA: Exploring Simulated Environments for Visual Question Answering 🏠project
- MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
- Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering ⭐code
- MD-VQA: Multi-Dimensional Quality Assessment for UGC Live Videos
- Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
- Robotics
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations
- DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects 🏠project
- Learning Human-to-Robot Handovers from Point Clouds ⭐code
- Neural Volumetric Memory for Visual Locomotion Control ⭐code
- Affordances from Human Videos as a Versatile Representation for Robotics ⭐code
- NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models (robotics)
- Robotic grasping
- Visual Navigation
- Virtual try-on
- Visual Localization
- CAP-VSTNet: Content Affinity Preserved Versatile Style Transfer
- StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer ⭐code
- Modernizing Old Photos Using Multiple References via Photorealistic Style Transfer ⭐code
- Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer
- Object pose estimation
- Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation
- HS-Pose: Hybrid Scope Feature Extraction for Category-level Object Pose Estimation
- TTA-COPE: Test-Time Adaptation for Category-Level Object Pose Estimation 🏠project
- IMP: Iterative Matching and Pose Estimation with Adaptive Pooling
- 6D
- Animal pose estimation
- Equiangular Basis Vectors ⭐code
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
- Boosting Verified Training for Robust Image Classifications via Abstraction ⭐code
- Semantic Prompt for Few-Shot Image Recognition
- Regularization of polynomial networks for image recognition ⭐code
- Active Finetuning: Exploiting Annotation Budget in the Pretraining-Finetuning Paradigm ⭐code
- Dynamic Conceptional Contrastive Learning for Generalized Category Discovery ⭐code
- Learning Bottleneck Concepts in Image Classification ⭐code
- Learning Partial Correlation based Deep Visual Representation for Image Classification
- Few-shot classification
- Fine-grained
- Long-tailed classification
- Long-tailed visual recognition
- SuperDisco: Super-Class Discovery Improves Visual Recognition for the Long-Tail
- Long-tailed Visual Recognition via Gaussian Clouded Logit Adjustment ⭐code
- Global and Local Mixture Consistency Cumulative Learning for Long-tailed Visual Recognitions ⭐code
- Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
- Multi-label classification
- Multi-label recognition
- Multi-view classification
- OPE-SR: Orthogonal Position Encoding for Designing a Parameter-free Upsampling Module in Arbitrary-scale Image Super-Resolution
- CABM: Content-Aware Bit Mapping for Single Image Super-Resolution Network with Large Input
- Super-Resolution Neural Operator ⭐code
- Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution
- Better "CMOS" Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution
- Human Guided Ground-truth Generation for Realistic Image Super-resolution ⭐code
- Implicit Diffusion Models for Continuous Super-Resolution
- Omni Aggregation Networks for Lightweight Image Super-Resolution
- Text image super-resolution
- Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval ⭐code
- Revisiting Self-Similarity: Structural Embedding for Image Retrieval ⭐[code](https://github.com/sungonce/SENet)
- Sketch-based image retrieval
- Text-video retrieval
- Video-text
- Multimodal retrieval
- Freestyle Layout-to-Image Synthesis ⭐code
- Few-shot Semantic Image Synthesis with Class Affinity Transfer (image synthesis)
- Zero-shot Generative Model Adaptation via Image-specific Prompt Learning ⭐code
- NoisyTwins: Class-Consistent and Diverse Image Generation Through StyleGANs 🏠project
- TopNet: Transformer-based Object Placement Network for Image Compositing
- Sketch-based generation
- Image-to-video synthesis
- Poster generation
- Text-to-image synthesis
- prompting
- Generation
- LayoutDM: Discrete Diffusion Model for Controllable Layout Generation 🏠project
- Controllable Mesh Generation Through Sparse Latent Point Diffusion Models 🏠project
- Exploring Incompatible Knowledge Transfer in Few-shot Image Generation
- Picture That Sketch: Photorealistic Image Generation From Abstract Sketches 🏠project
- DiffCollage: Parallel Generation of Large Content with Diffusion Models 🏠project
- Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization ⭐code
- LayoutDiffusion: Controllable Diffusion Model for Layout-to-image Generation
- Video generation
- Autonomous driving
- Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving
- ReasonNet: End-to-End Driving with Temporal and Global Reasoning
- LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation ⭐code
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction ⭐code
- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving
- MSeg3D: Multi-Modal 3D Semantic Segmentation for Autonomous Driving ⭐code
- Trajectory prediction
- IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
- Leapfrog Diffusion Model for Stochastic Trajectory Prediction ⭐code
- Uncovering the Missing Pattern: Unified Framework Towards Trajectory Imputation and Prediction ⭐code
- FEND: A Future Enhanced Distribution-Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
- Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
- Place Recognition
- Person retrieval
- Visible-infrared person re-identification (VIReID)
- Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification ⭐code
- Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
- PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification (visible-infrared person re-identification, VI-ReID)
- Pedestrian detection
- Crowd counting
- Towards Trustable Skin Cancer Diagnosis via Rewriting Model’s Decision
- Hierarchical discriminative learning improves visual representations of biomedical microscopy 🏠project
- Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
- Image Quality-aware Diagnosis via Meta-knowledge Co-embedding
- METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens (medical diagnosis)
- 3D medical imaging
- Image registration
- Image classification
- Report generation
- Medical image segmentation
- Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
- SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
- Fair Federated Medical Image Segmentation via Client Contribution Estimation
- Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation ⭐code
- Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
- Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
- Medical image analysis
- Tumor segmentation
- Medical imaging report generation
- Slide analysis
- Cell detection, tracking, and counting
- Unsupervised learning
- Self-supervised
- Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
- Correlational Image Modeling for Self-Supervised Visual Pre-Training
- Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
👍CVPR 2023 | Digging into the value of unlabeled data: SOLIDER, a self-supervised learning framework for human-centric vision
- Mixed Autoencoder for Self-supervised Visual Representation Learning
- Siamese DETR ⭐code
- Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
- Semi-supervised
- Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves 🏠project
- AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images ⭐code
- Generic-to-Specific Distillation of Masked Autoencoders ⭐code
- BiFormer: Vision Transformer with Bi-Level Routing Attention ⭐code
- Making Vision Transformers Efficient from A Token Sparsification View
- Dual-path Adaptation from Image to Video Transformers ⭐code
- Spherical Transformer for LiDAR-based 3D Recognition ⭐code
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models ⭐code
- Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers
- Learning Expressive Prompting With Residuals for Vision Transformers
- SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer 🏠project
- Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention (Transformer)
- Token Boosting for Robust Self-Supervised Visual Transformer Pre-training (Transformer)
- Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention ⭐code
- RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer ⭐code
- DropKey
👍CVPR 2023 | Two lines of code efficiently mitigate overfitting in vision Transformers: Meitu and UCAS jointly propose the regularization method DropKey
- Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers ⭐code
- EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention ⭐code
- TrojViT: Trojan Insertion in Vision Transformers
- Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
- Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos ⭐code
- StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
- VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
- Implicit View-Time Interpolation of Stereo Videos using Multi-Plane Disparities and Non-Uniform Coordinates 🏠project
- Video moment retrieval
- Video highlight detection
- Video frame interpolation
- Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation ⭐code
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation ⭐code
- Joint Video Multi-Frame Interpolation and Deblurring under Unknown Exposure Time ⭐code
- BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
- Video synthesis
- Video prediction
- Video understanding
- Selective Structured State-Spaces for Long-Form Video Understanding
- How you feelin'? Learning Emotions and Mental States in Movie Scenes ⭐code
- System-status-aware Adaptive Network for Online Streaming Video Understanding
- Streaming Video Model ⭐code
- Procedure-Aware Pretraining for Instructional Video Understanding ⭐code
- Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations ⭐code
- Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring ⭐code
- [NewsNet: A Novel Benchmark for Hierarchical Temporal Segmentation] (video understanding; paper not yet public)
- Video classification
- Video captioning
- Video summarization
- Video recognition
- Video Deflickering
- Temporal sentence grounding (TSG)
- Improving GAN Training via Feature Space Shrinkage ⭐code
- CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing
- Spider GAN: Leveraging Friendly Neighbors to Accelerate GAN Training
- NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs ⭐code
- Graph Transformer GANs for Graph-Constrained House Generation
- Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models ⭐code
- Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis ⭐code
- VIVE3D: Viewpoint-Independent Video Editing using 3D-Aware GANs ⭐code
- Discriminator-Cooperated Feature Map Distillation for GAN Compression ⭐code
- Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field ⭐code
- Image-text synthesis
- DSI2I: Dense Style for Unpaired Image-to-Image Translation
- Fix the Noise: Disentangling Source Feature for Controllable Domain Translation ⭐code
- 3D-Aware Multi-Class Image-to-Image Translation with NeRFs
- LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data 🏠project
- Image translation
- Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition
- BlendFields: Few-Shot Example-Driven Facial Modeling ⭐code
- Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition
👍CVPR 2023 | The long road of face recognition: Tsinghua, Peking University, and others propose AT3D, an attack method against face recognition systems
- Collaborative Diffusion for Multi-Modal Face Generation and Editing ⭐code
- Recognizability Embedding Enhancement for Very Low-Resolution Face Recognition and Quality Estimation
- DiffusionRig: Learning Personalized Priors for Facial Appearance Editing ⭐code
- [Probabilistic Knowledge Distillation of Face Ensembles] (face; paper not yet public)
- DCFace: Synthetic Face Generation with Dual Condition Diffusion Model ⭐code
- 3D face
- Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images
- Learning a 3D Morphable Face Reflectance Model from Low-cost Data 🏠project
- NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images ⭐code
- FaceLit: Neural 3D Relightable Faces
- [Learning Neural Proto-face Field for Disentangled 3D Face Modeling In the Wild] (face; paper not yet public)
- Face reconstruction
- Face restoration
- Face alignment
- Face anonymization
- Naked-eye age recognition
- Emotion recognition
- Portrait lighting
- Face anti-spoofing (liveness detection)
- Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment
- Instance-Aware Domain Generalization for Face Anti-Spoofing ⭐code
- Talking head
- OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering ⭐code
- Implicit Neural Head Synthesis via Controllable Local Deformation Fields
- Identity-Preserving Talking Face Generation with Landmark and Appearance Priors ⭐code
- Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert
- High-Fidelity and Freely Controllable Talking Head Video Generation 🏠project
- [High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning] (talking face generation; paper not yet public)
- GANHead: Towards Generative Animatable Neural Head Avatars ⭐code
- One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
- Face segmentation
- Blink detection
- 3D head avatar generation
- Facial expression recognition
- [Rethinking the Learning Paradigm for Dynamic Facial Expression Recognition] (face; paper not yet public)
- Micro-expression recognition
- Face synthesis
- PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision 🏠project
- Patch-based 3D Natural Scene Generation from a Single Example 🏠project
- Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning ⭐code
- Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching ⭐code
- SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field ⭐code
- 3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process ⭐code
- DynamicStereo: Consistent Dynamic Depth from Stereo Videos 🏠project
- 3D Concept Learning and Reasoning from Multi-View Images 🏠project
- PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360$^{\circ}$ ⭐code
- Persistent Nature: A Generative Model of Unbounded 3D Worlds 🏠project
- TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision
- Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization
- Robust Outlier Rejection for 3D Registration With Variational Bayes ⭐code
- On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks ⭐code
- SUDS: Scalable Urban Dynamic Scenes 🏠project
- Understanding and Improving Features Learned in Deep Functional Maps ⭐code
- TMO: Textured Mesh Acquisition of Objects with a Mobile Device by using Differentiable Rendering ⭐code
- Generalizable Local Feature Pre-training for Deformable Shape Analysis ⭐code
- CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects 🏠project
- CCuantuMM: Cycle-Consistent Quantum-Hybrid Matching of Multiple Shapes 🏠project
- HOLODIFFUSION: Training a 3D Diffusion Model using 2D Images ⭐code
- Multi-View Azimuth Stereo via Tangent Space Consistency ⭐code
- 3D Line Mapping Revisited ⭐code
- NeRF-Supervised Deep Stereo ⭐code
- Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
- Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
- 3D reconstruction
- Neural Lens Modeling
⭐code - Learning Articulated Shape with Keypoint Pseudo-labels from Web Images
⭐code - 3D Shape Reconstruction of Semi-Transparent Worms
- Power Bundle Adjustment for Large-Scale 3D Reconstruction
- PET-NeuS: Positional Encoding Tri-Planes for Neural Surfaces
⭐code - AutoRecon: Automated 3D Object Discovery and Reconstruction
⭐code - 3D Registration with Maximal Cliques
- 3D shape reconstruction of semi-transparent worms
- VisFusion: Visibility-aware Online 3D Scene Reconstruction from Videos
⭐code - NeUDF: Leaning Neural Unsigned Distance Fields with Volume Rendering
🏠project - ShapeClipper: Scalable 3D Shape Learning from Single-View Images via Geometric and CLIP-based Consistency
🏠project - Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects
⭐code - PAniC-3D: Stylized Single-view 3D Reconstruction from Portraits of Anime Characters
⭐code - Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices
🏠project - Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container
⭐code - SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates
⭐code - MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
🏠project - Scalable, Detailed and Mask-Free Universal Photometric Stereo
⭐code - NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
- Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
- Neural Lens Modeling
- Depth estimation
- Fully Self-Supervised Depth Estimation from Defocus Clue
⭐code - iDisc: Internal Discretization for Monocular Depth Estimation
🏠project - HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions
🏠project - Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth Estimation in Dynamic Scenes
⭐code - Temporally Consistent Online Depth Estimation Using Point-Based Fusion
🏠project - DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
⭐code - Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
👍CVPR2023 | Lite-Mono: a lightweight, efficient self-supervised depth estimation framework
- Depth completion
- Indoor scene reconstruction
- Scene reconstruction
- 3D shape classification
- Hand gestures
- A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
⭐code (3D interacting hand pose estimation) - Neural Voting Field for Camera-Space 3D Hand Pose Estimation
- AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
⭐code - Audio-driven co-speech gesture generation
- 3D gesture synthesis
- Hand reconstruction
- ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
⭐code - Overcoming the Trade-off Between Accuracy and Plausibility in 3D Hand Shape Reconstruction
- gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction
⭐code - MeMaHand: Exploiting Mesh-Mano Interaction for Single Image Two-Hand Reconstruction
- POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
- 3D hand recovery
- Human body
- DistilPose: Tokenized Pose Regression with Heatmap Distillation
- PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
⭐code - Human Pose as Compositional Tokens
⭐code - Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
- Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation
- HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation
⭐code - Human Pose Estimation in Extremely Low-Light Conditions
- 3D HPE
- PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
- NIKI: Neural Inverse Kinematics with Invertible Neural Networks for 3D Human Pose and Shape Estimation
⭐code - GFPose: Learning 3D Human Pose Prior With Gradient Fields
🏠project - PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
⭐code - 3D Human Pose Estimation via Intuitive Physics
- 4D HPE
- Mesh recovery
- 3D human mesh estimation
- 3D human mesh reconstruction
- 3D human reconstruction
- Multi-person pose prediction
- Human parsing
- Prompt-Guided Zero-Shot Anomaly Action Recognition using Pretrained Deep Skeleton Features
- Learning Action Changes by Measuring Verb-Adverb Textual Relationships
⭐code - STMixer: A One-Stage Sparse Action Detector
- AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation
- Search-Map-Search: A Frame Selection Paradigm for Action Recognition
- On the Benefits of 3D Pose and Tracking for Human Action Recognition
⭐code - MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
⭐code - Skeleton-based action recognition
- Learning Discriminative Representations for Skeleton Based Action Recognition
- Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition
🏠project - 3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition
- HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions
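- Keypoint-based action recognition
- Temporal action recognition
- Open-set action recognition
- MoCap-based action recognition
- Few-shot action recognition
- Semi-supervised action recognition
- Temporal action localization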
- Neural Intrinsic Embedding for Non-rigid Point Cloud Matching
- SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds
- GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
- SCPNet: Semantic Scene Completion on Point Cloud
- NeuralEditor: Editing Neural Radiance Fields via Manipulating Point Clouds
⭐code - Rotation-Invariant Transformer for Point Cloud Matching
- Recognizing Rigid Patterns of Unlabeled Point Clouds by Complete and Continuous Isometry Invariants with no False Negatives and no False Positives
🏠project - PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos
- VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud
⭐code - Unsupervised Inference of Signed Distance Functions from Single Sparse Point Clouds without Learning Priors
⭐code - Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions
⭐code - Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis
- Spatiotemporal Self-supervised Learning for Point Clouds in the Wild
⭐code - NerVE: Neural Volumetric Edges for Parametric Curve Extraction from Point Cloud
⭐code - IterativePFN: True Iterative Point Cloud Filtering
⭐code - Fast Point Cloud Generation With Straight Flows
- 3D point clouds
- Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
⭐code - MarS3D: A Plug-and-Play Motion-Aware Model for Semantic Segmentation on Multi-Scan 3D Point Clouds
⭐code - NeuralPCI: Spatio-temporal Neural Field for 3D Point Cloud Multi-frame Non-linear Interpolation
⭐code - Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
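- Point cloud instance segmentation
- Point cloud classification
- Point cloud completion
- Point cloud registration
- Point cloud understanding
- Point cloud reconstruction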
- Propagate And Calibrate: Real-time Passive Non-line-of-sight Tracking
⭐code - Joint Visual Grounding and Tracking with Natural Language Specification
⭐code - Generalized Relation Modeling for Transformer Tracking
⭐code - SeqTrack: Sequence to Sequence Learning for Visual Object Tracking
- Tracking through Containers and Occluders in the Wild
🏠project - DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks
⭐code - CXTrack: Improving 3D Point Cloud Tracking With Contextual Information
- Multi-object tracking
- Multi-modal tracking
- Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR
⭐code - Multiclass Confidence and Localization Calibration for Object Detection
⭐code - Mobile User Interface Element Detection Via Adaptively Prompt Tuning
- DynamicDet: A Unified Dynamic Architecture for Object Detection
⭐code - ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection
⭐code - Curricular Object Manipulation in LiDAR-based Object Detection
⭐code - STDLens: Model Hijacking-resilient Federated Learning for Object Detection
⭐code - What Can Human Sketches Do for Object Detection?
⭐code - Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects
⭐code - Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection
⭐code - Learned Two-Plane Perspective Prior based Image Resampling for Efficient Object Detection
- T-SEA: Transfer-based Self-Ensemble Attack on Object Detection
👍CVPR 2023 | Peking University proposes T-SEA: a self-ensemble strategy for stronger black-box attack transferability - Knowledge Combination to Learn Rotated Detection Without Rotated Annotation
- Learning to Name Classes for Vision and Language Models
- Universal Instance Perception as Object Discovery and Retrieval
⭐code - Continual Detection Transformer for Incremental Object Detection (object detection)
- Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
⭐code (object detection) - Open-vocabulary object detection
- Aligning Bag of Regions for Open-Vocabulary Object Detection
⭐code - Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
- Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
⭐code - CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching
- DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
- Open-world object detection
- Object localization
- Open-world detection
- 3D OD
- Virtual Sparse Convolution for Multimodal 3D Object Detection
⭐code - LinK: Linear Kernel for LiDAR-based 3D Perception
⭐code - PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds
- PVT-SSD: Single-Stage 3D Object Detector with Point-Voxel Transformer
⭐code - 3D Video Object Detection with Learnable Object-Centric Global Optimization
⭐code - Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection
⭐code - X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection
⭐code - Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving
- Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
⭐code - Viewpoint Equivariance for Multi-View 3D Object Detection
⭐code - Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving
⭐code - Collaboration Helps Camera Overtake LiDAR in 3D Detection
⭐code - OcTr: Octree-based Transformer for 3D Object Detection
- MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
⭐code - MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
- MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training
⭐code - NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
- VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
⭐code - Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
⭐code - LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
⭐code - PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
⭐code - CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
⭐code - Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection
⭐code - Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
- End-to-end object detection
- Semi-supervised object detection
- Weakly supervised object detection
- Few-shot object detection
- NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging
- Generating Features with Increased Crop-related Diversity for Few-Shot Object Detection
- Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection
- DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
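- Domain-adaptive object detection
- Salient object detection
- Infrared object detection
- Camouflaged object detection
- Dense object detection
- Collaborative object detection
- Object discovery
- Small object detection
- Video captioning
- Image captioning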
- Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models
⭐code - Tunable Convolutions with Parametric Multi-Loss Optimization
- Shadow removal
- Image restoration
- Efficient and Explicit Modelling of Image Hierarchies for Image Restoration
⭐code - Contrastive Semi-Supervised Learning for Underwater Image Restoration via Reliable Bank
⭐code - Burstormer: Burst Image Restoration and Enhancement Transformer
- Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera
⭐code - Generative Diffusion Prior for Unified Image Restoration and Enhancement
- Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration
- Image illumination
- Image quality assessment
- Dehazing
- Deraining
- Denoising
- Masked Image Training for Generalizable Deep Image Denoising
- Spatially Adaptive Self-Supervised Learning for Real-World Image Denoising
⭐code - Real-time Controllable Denoising for Image and Video
- LG-BPN: Local and Global Blind-Patch Network for Self-Supervised Real-World Denoising
⭐code - Efficient View Synthesis and 3D-based Multi-Frame Denoising with Multiplane Feature Representations
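- Learning with Noisy labels via Self-supervised Adversarial Noisy Masking (denoising)
- Learning from Noisy Labels with Decoupled Meta Label Purifier (denoising)
- Deblurring
- Deghosting
- Reflection and flare removal
- Image rescaling
- Burst restoration and enhancement
- Image enhancement
- Image harmonization
- Image exposure correction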
- Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
- PanelNet: Understanding 360 Indoor Environment via Panel Representation
- Ultra-High Resolution Segmentation with Ultra-Rich Context: A Novel Benchmark
⭐code - AutoFocusFormer: Image Segmentation off the Grid
- MP-Former: Mask-Piloted Transformer for Image Segmentation
⭐code - Explicit Visual Prompting for Low-Level Structure Segmentations
⭐code - Focused and Collaborative Feedback Integration for Interactive Image Segmentation
⭐code - FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
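On five datasets across three downstream video segmentation tasks (VIS, VOS, and MOTS), plugging InstMove into existing SOTA models brings a further improvement of 1 to 5 points. - Zero-shot Referring Image Segmentation with Global-Local Context Features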
⭐code - Meta Compositional Referring Expression Segmentation
- MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation (segmentation)
- 3D segmentation
- Panoptic segmentation
- Instance segmentation
- DynaMask: Dynamic Mask Selection for Instance Segmentation
⭐code - DoNet: Deep De-overlapping Network for Cytology Instance Segmentation
⭐code - FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
⭐code - SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
⭐code - Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations
⭐code - Weakly supervised instance segmentation
- Semantic segmentation
- IFSeg: Image-free Semantic Segmentation via Vision-Language Model
⭐code - Continual Semantic Segmentation With Automatic Memory Sample Selection
- Dynamically Instance-Guided Adaptation: A Backward-Free Approach for Test-Time Domain Adaptive Semantic Segmentation
⭐code
- Federated Incremental Semantic Segmentation
⭐code - Delivering Arbitrary-Modal Semantic Segmentation
⭐code - Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation
- A Simple Framework for Text-Supervised Semantic Segmentation
Clearly outperforms previous state-of-the-art methods on the PASCAL VOC 2012, PASCAL Context, and COCO datasets. - Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Generative Semantic Segmentation
⭐code - Reliability in Semantic Segmentation: Are We on the Right Track?
⭐code - Both Style and Distortion Matter: Dual-Path Unsupervised Domain Adaptation for Panoramic Semantic Segmentation
- Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
⭐code - Instant Domain Augmentation for LiDAR Semantic Segmentation
🏠project - Leveraging Hidden Positives for Unsupervised Semantic Segmentation
⭐code - Delving into Shape-aware Zero-shot Semantic Segmentation
⭐code - Domain-adaptive semantic segmentation
- Domain-generalized semantic segmentation
- HGFormer: Hierarchical Grouping Transformer for Domain Generalized Semantic Segmentation
⭐[code](https://github.com/dingjiansw101/HGFormer)
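- Semi-supervised semantic segmentation
- Weakly supervised semantic segmentation
- Point cloud semantic segmentation
- Zero-shot semantic segmentation
- 3D semantic segmentation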
- Interactive segmentation
- Few-shot segmentation
- InstMove: Instance Motion for Object-centric Video Segmentation
⭐code - MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
- Boosting Video Object Segmentation via Space-time Correspondence Learning
⭐code - Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping (VOS)
- Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
⭐code - Two-shot Video Object Segmentation
- Scene understanding
- FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
- Self-supervised Pre-training with Masked Shape Prediction for 3D Scene Understanding
- Language-driven Open-Vocabulary 3D Scene Understanding
🏠project - CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP
- Image matting
- Multi Domain Learning for Motion Magnification
⭐code - Two-View Geometry Scoring Without Correspondences
🏠project - Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information
⭐code - ScanDMM: A Deep Markov Model of Scanpath Prediction for 360° Images
⭐code - Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation
- Analyzing Physical Impacts Using Transient Surface Wave Imaging
- Adaptive Global Decay Process for Event Cameras
⭐code - Leveraging Inter-Rater Agreement for Classification in the Presence of Noisy Labels
- Towards Better Gradient Consistency for Neural Signed Distance Functions via Level Set Alignment
⭐code - Swept-Angle Synthetic Wavelength Interferometry
- Shape, Pose, and Appearance From a Single Image via Bootstrapped Radiance Field Inversion
🏠project - Unlearnable Clusters: Towards Label-Agnostic Unlearnable Examples
⭐code - 3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification
- EcoTTA: Memory-Efficient Continual Test-Time Adaptation via Self-Distilled Regularization
- Text-Guided Unsupervised Latent Transformation for Multi-Attribute Image Manipulation
- Minimizing the Accumulated Trajectory Error To Improve Dataset Distillation
⭐code - DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-Aware Scene Synthesis
🏠project - Virtual Occlusions Through Implicit Depth
- StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator
⭐code - Putting People in Their Place: Affordance-Aware Human Insertion into Scenes
⭐code - Inverting the Imaging Process by Learning an Implicit Camera Model
⭐code - Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations
⭐code - GeoLayoutLM: Geometric Pre-training for Visual Information Extraction
⭐code - Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning
- Noisy Correspondence Learning with Meta Similarity Correction
- Efficient Multimodal Fusion via Interactive Prompting
- Representing Volumetric Videos as Dynamic MLP Maps
⭐code - Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
- Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
- DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
- Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
- A Meta-Learning Approach to Predicting Performance and Data Requirements
- Multimodal Prompting with Missing Modalities for Visual Recognition
⭐code - Masked Images Are Counterfactual Samples for Robust Fine-tuning
- UniHCP: A Unified Model for Human-Centric Perceptions
⭐code - DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
⭐code - Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
- Progressive Open Space Expansion for Open-Set Model Attribution
⭐code - TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
⭐code - HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
⭐code - 3D Cinemagraphy from a Single Image
🏠project - Masked Image Modeling with Local Multi-Scale Reconstruction
⭐code - Revisiting Rotation Averaging: Uncertainties and Robust Losses
⭐code - Unifying Layout Generation with a Decoupled Diffusion Model
- Adversarial Counterfactual Visual Explanations
⭐code - Trainable Projected Gradient Method for Robust Fine-tuning
⭐code - Partial Network Cloning
⭐code - Extracting Class Activation Maps from Non-Discriminative Features as well
⭐code - TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization
⭐code - Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices
⭐code - Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark
⭐code - PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
⭐code - Boundary Unlearning
🏠project - ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals
- VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions
- BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency
- Learning a Depth Covariance Function
⭐code - A Bag-of-Prototypes Representation for Dataset-Level Applications
- CrOC: Cross-View Online Clustering for Dense Visual Representation Learning
⭐code - Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels
⭐code - Marching-Primitives: Shape Abstraction from Signed Distance Function
⭐code - Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
- SIEDOB: Semantic Image Editing by Disentangling Object and Background
- Robust Test-Time Adaptation in Dynamic Scenarios
⭐code - Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck
⭐code - IDGI: A Framework to Eliminate Explanation Noise from Integrated Gradients
- Compacting Binary Neural Networks by Sparse Kernel Selection
- PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
⭐code - Multi-Granularity Archaeological Dating of Chinese Bronze Dings Based on a Knowledge-Guided Relation Graph
⭐code - Quantum Multi-Model Fitting
⭐code - Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
- Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
⭐code - PMatch: Paired Masked Image Modeling for Dense Geometric Matching
⭐code - ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
⭐code - Single Image Depth Prediction Made Better: A Multivariate Gaussian Take
- Why is the winner the best?
- Disorder-invariant Implicit Neural Representation
⭐code - HypLiLoc: Towards Effective LiDAR Pose Regression with Hyperbolic Fusion
⭐code - Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints
🏠project - SMPConv: Self-moving Point Representations for Continuous Convolution
⭐code - VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
⭐code - Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
- Large-capacity and Flexible Video Steganography via Invertible Neural Network
⭐code - SketchXAI: A First Look at Explainability for Human Sketches
⭐code - Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce (multimodal pretraining)
- Adaptive Assignment for Geometry Aware Local Feature Matching
⭐code (feature matching) - Hard Patches Mining for Masked Image Modeling
👍CVPR 2023 | HPM: mining hard samples in masked image modeling for solid performance gains! - Learning Geometry-aware Representations by Sketching
- MMANet: Margin-aware Distillation and Modality-aware Regularization for Incomplete Multimodal Learning
⭐code (multimodal) - DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training
⭐code - Investigating the Nature of 3D Generalization in Deep Neural Networks
⭐code - EC^2: Emergent Communication for Embodied Control
- Generalizing Dataset Distillation via Deep Generative Prior
🏠project - Learning Locally Editable Virtual Humans
🏠project - Class-Balancing Diffusion Models
- SFD2: Semantic-guided Feature Detection and Description
⭐code - Deep Graph Reprogramming
- LayoutDM: Transformer-based Diffusion Model for Layout Generation
- ANetQA: A Large-scale Benchmark for Fine-grained Compositional Reasoning over Untrimmed Videos