A curated list of recent robot learning papers that incorporate diffusion models for manipulation, navigation, planning, etc.
- Benchmarks
- Diffusion Policy
- Diffusion Generation Models in Robot Learning
- Robot Learning Utilizing Diffusion Model Properties
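As background for the entries below, most diffusion policies share the same core mechanism: sample an action sequence by iteratively denoising Gaussian noise, conditioned on an observation. The following is a minimal illustrative sketch of that DDPM-style sampling loop, not any specific paper's implementation; the noise predictor here is a hypothetical closed-form stand-in for a trained, observation-conditioned network.

```python
import numpy as np

T = 50                                  # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)      # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(actions, obs, t):
    """Hypothetical noise predictor; a real policy uses a trained network.

    Here we pretend the clean action sequence is 0.5 * obs, and derive the
    noise analytically from the forward-process relation
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    """
    target = 0.5 * obs
    return (actions - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1 - alpha_bars[t])

def sample_actions(obs, horizon=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(horizon)    # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = predict_noise(x, obs, t)
        # DDPM posterior-mean update
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                       # no noise at the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(horizon)
    return x

obs = np.ones(8)
actions = sample_actions(obs)           # converges to the "clean" action 0.5 * obs
```

With the analytic noise predictor the loop recovers the target action sequence exactly; in practice the predictor is a conditional U-Net or transformer trained on demonstration data.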
## Benchmarks

- Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations (RSS 2018)
- Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning (CoRL 2020)
- Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets (RSS 2022)
- DexMV: Imitation Learning for Dexterous Manipulation from Human Videos (ECCV 2022)
- Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning (NeurIPS 2022 Datasets and Benchmarks Track)
- DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects (CVPR 2023)
- BridgeData V2: A Dataset for Robot Learning at Scale (CoRL 2023)
- CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks (RA-L 2022)
- RLBench: The Robot Learning Benchmark & Learning Environment (RA-L 2020)
- LIBERO: Benchmarking Knowledge Transfer in Lifelong Robot Learning (NeurIPS 2023 Datasets and Benchmarks Track)
- Visual Pusher
- Panda Arm
- DexDeform: Dexterous Deformable Object Manipulation with Human Demonstrations and Differentiable Physics (to be checked)
## Diffusion Policy

- Imitating Human Behaviour with Diffusion Models (ICLR 2023)
- SE(3)-DiffusionFields: Learning Cost Functions for Joint Grasp and Motion Optimization through Diffusion (ICRA 2023)
- Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (RSS 2023)
- Goal-Conditioned Imitation Learning Using Score-Based Diffusion Policies (RSS 2023)
- Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition (CoRL 2023)
- ChainedDiffuser: Unifying Trajectory Diffusion and Keypose Prediction for Robotic Manipulation (CoRL 2023)
- Generative Skill Chaining: Long-Horizon Skill Planning with Diffusion Models (CoRL 2023)
- PlayFusion: Skill Acquisition via Diffusion from Language-Annotated Play (CoRL 2023)
- Memory-Consistent Neural Networks for Imitation Learning (ICLR 2024)
- EDMP: Ensemble-of-Costs-Guided Diffusion for Motion Planning (ICRA 2024)
- Diffskill: Improving Reinforcement Learning Through Diffusion-Based Skill Denoiser for Robotic Manipulation (Knowledge-Based Systems 2024)
- Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation (RSS 2024)
- ManiCM: Real-Time 3D Diffusion Policy via Consistency Model for Robotic Manipulation (3D, consistency model, efficiency)
- Track2Act: Predicting Point Tracks from Internet Videos Enables Generalizable Robot Manipulation (ECCV 2024)
- Differentiable Robot Rendering (CoRL 2024)
- Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning (CoRL 2024)
- Equivariant Diffusion Policy (CoRL 2024)
- GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy (CoRL 2024)
- EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning (CoRL 2024)
- 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations (CoRL 2024)
- 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations (RSS 2024)
- RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective (IROS 2024)
- Vision-Language-Affordance-Based Robot Manipulation with Flow Matching (Sep 2024)
- Flow Matching Imitation Learning for Multi-Support Manipulation (flow matching)
- FlowPolicy: Enabling Fast and Robust 3D Flow-Based Policy via Consistency Flow Matching for Robot Manipulation
- ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching
- Fast and Robust Visuomotor Riemannian Flow Matching Policy
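The flow-matching entries above replace iterative denoising with a learned velocity field that is integrated from noise to actions as an ODE. Below is a minimal sketch assuming a straight-line (rectified) probability path, with the analytic optimal-transport velocity `target - noise` standing in for a trained, observation-conditioned network.

```python
import numpy as np

def velocity(x, t, noise, target):
    """Analytic velocity for the straight-line path x_t = (1-t)*noise + t*target.

    A real flow-matching policy regresses this quantity with a neural network
    conditioned on the observation; here it is a closed-form stand-in.
    """
    return target - noise

def generate(noise, target, steps=10):
    """Euler integration of the velocity field from t=0 (noise) to t=1 (action)."""
    x = noise.copy()
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * velocity(x, t, noise, target)
    return x

noise = np.zeros(4)                           # starting sample (would be Gaussian)
target = np.array([1.0, -1.0, 0.5, 2.0])      # hypothetical "clean" action
out = generate(noise, target)
```

Because the straight-line path has constant velocity, a handful of Euler steps recovers the target exactly; this is why flow-matching policies typically need far fewer integration steps than DDPM-style sampling.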
- Planning-Guided Diffusion Policy Learning for Generalizable Contact-Rich Bimanual Manipulation (Sep 2024)
- Diffusion-VLA: Scaling Robot Foundation Models via Unified Diffusion and Autoregression (Dec 2024)
- RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation
- Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand (combining OpenVLA and diffusion policy)
- XSkill: Cross Embodiment Skill Discovery (CoRL 2023)
- SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution (CVPR 2024)
- Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation (CVPR 2024)
- Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning (Dec 2024)
- RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation (Dec 2024) (uses diffusion transformer)
- Is Conditional Generative Modeling All You Need for Decision-Making? (Decision Diffuser)
- Waypoint-Based Imitation Learning for Robotic Manipulation (CoRL 2023)
- RoLD: Robot Latent Diffusion for Multi-Task Policy Modeling
- UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-Body Controllers (CoRL 2024)
- Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions (mobile manipulation)
- M2Diffuser: Diffusion-Based Trajectory Optimization for Mobile Manipulation in 3D Scenes (mobile manipulation)
- Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation (mobile manipulation)
- Adaptive Online Replanning with Diffusion Models (NeurIPS 2023)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior (highly theoretical)
- Cold Diffusion on the Replay Buffer: Learning to Plan from Known Good States
- Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation
- ViViDex: Learning Vision-Based Dexterous Manipulation from Human Videos
- SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting (sim2real)
- Movement Primitive Diffusion: Learning Gentle Robotic Manipulation of Deformable Objects (deformable)
- SculptDiff: Learning Robotic Clay Sculpting from Humans with Goal Conditioned Diffusion Policy (3D deformable objects)
- RoPotter: Toward Robotic Pottery and Deformable Object Manipulation with Structural Priors (3D deformable, pottery)
- Garment Diffusion Models for Robot-Assisted Dressing
- Diffusion Co-Policy for Synergistic Human-Robot Collaborative Tasks
- PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations
- Learning Playing Piano with Bionic-Constrained Diffusion Policy for Anthropomorphic Hand
- Adaptive Compliance Policy: Learning Approximate Compliance for Diffusion Guided Control (considers compliance/forces during manipulation)
- ForceMimic: Force-Centric Imitation Learning with Force-Motion Capture System for Contact-Rich Manipulation (force-centric)
- Learning Diffusion Policies from Demonstrations for Compliant Contact-Rich Manipulation (considers compliance/forces)
- VITaL Pretraining: Visuo-Tactile Pretraining for Tactile and Non-Tactile Manipulation Policies (force, tactile)
- TacDiffusion: Force-Domain Diffusion Policy for Precise Tactile Manipulation (force, tactile)
- 3D-ViTac: Learning Fine-Grained Manipulation with Visuo-Tactile Sensing (includes tactile)
- Admittance Visuomotor Policy Learning for General-Purpose Contact-Rich Manipulations (force, contact-rich)
- FoAR: Force-Aware Reactive Policy for Contact-Rich Robotic Manipulation (force, contact)
- Canonical Representation and Force-Based Pretraining of 3D Tactile for Dexterous Visuo-Tactile Policy Learning (tactile, force)
- CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation (single-task generalization)
- GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy (single-task generalizability)
- DISCO: Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting (zero-shot language-guided diffusion policy)
- Learning Generalizable 3D Manipulation With 10 Demonstrations (3D, few-shot generalizability)
- JUICER: Data-Efficient Imitation Learning for Robotic Assembly (learns from a small number of human demonstrations)
- Diffusion-PbD: Generalizable Robot Programming by Demonstration with Diffusion Features (learns from a single demonstration)
- ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy (learns from few demonstrations)
- Imitation Learning with Limited Actions via Diffusion Planners and Deep Koopman Controllers (learns from few demonstrations)
- Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation (3D, object-centric, single-task generalization)
- Affordance-Centric Policy Learning: Sample Efficient and Generalisable Robot Policy Learning Using Affordance-Centric Task Frames (object-centric)
- Motion Before Action: Diffusing Object Motion as Manipulation Condition (object motion)
- SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation (tracks object pose)
- G3Flow: Generative 3D Semantic Flow for Pose-Aware and Generalizable Object Manipulation (object-centric)
- C3DM: Constrained-Context Conditional Diffusion Models for Imitation Learning (tackles spurious correlations)
- From Imitation to Refinement – Residual RL for Precise Assembly (distribution shifts)
- Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress (out-of-distribution scenarios, failure detection)
- Subgoal Diffuser: Coarse-to-Fine Subgoal Generation to Guide Model Predictive Control for Robot Manipulation (long horizon, subgoals)
- MaIL: Improving Imitation Learning with Mamba (Mamba)
- Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models (Mamba)
- Crossway Diffusion: Improving Diffusion-Based Visuomotor Policy via Self-Supervised Learning
- The Ingredients for Robotic Diffusion Transformers
- Diffusion Transformer Policy
- Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation
- Legibility Diffuser: Offline Imitation for Intent Expressive Motion (legible motion)
- Maximizing Alignment with Minimal Feedback: Efficiently Learning Rewards for Visuomotor Robot Policy Alignment (human preference alignment)
- Concealed Backdoor Attack on Diffusion Models for Smart Devices with Non-Standard Gaussian Distribution Noise (backdoor attack)
- Instant Policy: In-Context Imitation Learning via Graph Diffusion (graph diffusion)
- FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation (ambiguity)
- Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-Based Policies (adversarial attack)
- Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization (demonstration selection and filtering)
- Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals (learns from multimodal goal specifications with few language annotations)
- Learning Diverse Robot Striking Motions with Diffusion Models and Kinematically Constrained Gradient Guidance (agile tasks)
- Cutting Sequence Diffuser: Sim-to-Real Transferable Planning for Object Shaping by Grinding (belt grinding)
- Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance (diffusion subgoal generation)
- Hierarchical Diffusion Policy: Manipulation Trajectory Generation via Contact Guidance (diffusion policy with contact guidance)
- Implicit Contact Diffuser: Sequential Contact Reasoning with Latent Point Cloud Diffusion (contact)
- Diff-Control: A Stateful Diffusion-Based Policy for Imitation Learning (ControlNet)
- Composable Part-Based Manipulation (CoRL 2023)
- Free from Bellman Completeness: Trajectory Stitching via Model-Based Return-Conditioned Supervised Learning
- One-Shot Imitation under Mismatched Execution
- Language Control Diffusion: Efficiently Scaling through Space, Time, and Tasks
- SDP: Spiking Diffusion Policy for Robotic Manipulation with Learnable Channel-Wise Membrane Thresholds
- Diffusion-Based Learning of Contact Plans for Agile Locomotion
- DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation (to be checked)
- Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control (out-of-distribution issue)
- The Role of Domain Randomization in Training Diffusion Policies for Whole-Body Humanoid Control
- DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-Level Pose Estimation (pose estimation)
- Learning Score-Based Grasping Primitive for Human-Assisting Dexterous Grasping (NeurIPS 2023)
- ReorientDiff: Diffusion Model Based Reorientation for Object Manipulation (ICRA 2024)
- DexDiffuser: Generating Dexterous Grasps with Diffusion Models (Feb 2024)
- Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy (Mar 2024)
- Learning Visuotactile Skills with Two Multifingered Hands
- DexGrasp-Diffusion: Diffusion-Based Unified Functional Grasp Synthesis Method for Multi-Dexterous Robotic Hands
- Grasp Diffusion Network: Learning Grasp Generators from Partial Point Clouds with Diffusion Models in SO(3) × R³
- DAP: Diffusion-Based Affordance Prediction for Multi-Modality Storage (multi-modality storage tasks)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration
- DiPPeR: Diffusion-Based 2D Path Planner Applied on Legged Robots
- LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance
- DTG: Diffusion-Based Trajectory Generation for Mapless Global Navigation
- DARE: Diffusion Policy for Autonomous Robot Exploration
- FloNa: Floor Plan Guided Embodied Visual Navigation
- Learning Wheelchair Tennis Navigation from Broadcast Videos with Domain Knowledge Transfer and Diffusion Motion Planning
- FlowNav: Learning Efficient Navigation Policies via Conditional Flow Matching (flow matching)
- Diffusion Meets Options: Hierarchical Generative Skill Composition for Temporally-Extended Tasks (replanning)
- LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-Based Planning
- SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
- DiffusionSeeder: Seeding Motion Optimization with Diffusion for Rapid Motion Planning (motion planning)
- Potential Based Diffusion Motion Planning
- RAIL: Reachability-Aided Imitation Learning for Safe Policy Execution (collision-free, hard constraints)
- Sampling Constrained Trajectories Using Composable Diffusion Models (trajectory optimization with constraints)
- DiMSam: Diffusion Models as Samplers for Task and Motion Planning under Partial Observability (with constraints)
- Motion Planning Diffusion: Learning and Adapting Robot Motion Planning with Diffusion Models
- DroneDiffusion: Robust Quadrotor Dynamics Learning with Diffusion Models (drones)
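Several of the planners above (e.g. the ensemble-of-costs, potential-based, and safety-constrained entries) share one trick: steer the denoising process with the gradient of a cost at every step. Below is a minimal sketch of such cost-guided sampling; both the denoiser (an analytic stand-in for a trained network) and the quadratic go-to-goal cost are hypothetical placeholders.

```python
import numpy as np

T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

GOAL = np.array([1.0, 1.0])             # hypothetical 2D goal for the plan

def cost_grad(traj):
    """Gradient of a placeholder cost 0.5 * ||waypoint - GOAL||^2 per waypoint."""
    return traj - GOAL

def predict_noise(x, t):
    """Analytic stand-in denoiser whose clean sample is the origin."""
    return x / np.sqrt(1.0 - alpha_bars[t])

def guided_sample(horizon=4, guide_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((horizon, 2))          # noisy trajectory of waypoints
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        # plain DDPM denoising update ...
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # ... plus a guidance step down the cost gradient
        x = x - guide_scale * cost_grad(x)
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal((horizon, 2))
    return x

plan = guided_sample()
```

The final plan is a compromise between what the denoiser considers likely (here, the origin) and what the cost prefers (the goal); `guide_scale` trades off the two, which is the same knob the guided planners above expose.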
## Diffusion Generation Models in Robot Learning

- DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics (RA-L 2023)
- UniPi: Learning Universal Policies via Text-Guided Video Generation (NeurIPS 2023)
- AVDC: Learning to Act from Actionless Videos through Dense Correspondences (ICLR 2024)
- UniSim: Learning Interactive Real-World Simulators (ICLR 2024)
- HiP: Compositional Foundation Models for Hierarchical Planning (NeurIPS 2023)
- DMD: Diffusion Meets DAgger: Supercharging Eye-in-hand Imitation Learning (Feb 2024)
- VLP: Video Language Planning (ICLR 2024)
- Dreamitate: Real-World Visuomotor Policy Learning via Video Generation (CoRL 2024)
- ARDuP: Active Region Video Diffusion for Universal Policies (Jun 2024)
- This&That: Language-Gesture Controlled Video Generation for Robot Planning (Jul 2024)
- RoboDreamer: Learning Compositional World Models for Robot Imagination (ICML 2024)
- CLOVER: Closed-Loop Visuomotor Control with Generative Expectation for Robotic Manipulation (NeurIPS 2024)
- SOAR: Autonomous Improvement of Instruction Following Skills via Foundation Models (CoRL 2024)
- Cacti: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning (CoRL 2022 Workshop PRL)
- GenAug: Retargeting Behaviors to Unseen Situations via Generative Augmentation
- GR-MG: Leveraging Partially-Annotated Data via Multi-Modal Goal-Conditioned Policy (generates goal images)
- Generative Image as Action Models
- Scaling Robot Learning with Semantically Imagined Experience
- Large-Scale Actionless Video Pre-Training via Discrete Diffusion for Efficient Policy Learning
- Diffusion Model Is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
- Learning Visual Parkour from Generated Images
- GHIL-Glue: Hierarchical Control with Filtered Subgoal Images
- VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation (inverse dynamics, distillation from video to action generation)
- Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies (3D generation, goal-conditioned)
- Embodiment-Agnostic Action Planning via Object-Part Scene Flow (generates object-part flow)
- Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks (generates image subgoals)
- IRASim: Learning Interactive Real-Robot Action Simulators (arXiv Jun 2024)
- Structured World Models from Human Videos (RSS 2023)
- HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator (ICIP 2022)
- DayDreamer: World Models for Physical Robot Learning (CoRL 2022)
- MimicGen: A Data Generation System for Scalable Robot Learning Using Human Demonstrations
## Robot Learning Utilizing Diffusion Model Properties

- PoCo: Policy Composition from and for Heterogeneous Robot Learning (to be checked)
- Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion (to be checked)
- Pre-Trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
- Imitation Learning from Purified Demonstrations (uses the forward and reverse diffusion process to purify imperfect demonstrations)
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition (a diffusion-model-based assistive agent learns to assist humans in collecting data in a shared-control manner)
- P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies (uses DIFT for point correspondence)