(Source: Make-A-Video, SimDA, PYoCo, SVD, Video LDM and Tune-A-Video)

  • [News] We are planning to update the survey soon to encompass the latest work. If you have any suggestions, please feel free to contact us.
  • [News] The Chinese translation is available on Zhihu. Special thanks to Dai-Wenxun for this.

Open-source Toolboxes and Foundation Models

Methods Task Github
Open-Sora-Plan T2V Generation Star
Open-Sora T2V Generation Star
Morph Studio T2V Generation -
Genie T2V Generation -
Sora T2V Generation & Editing -
VideoPoet T2V Generation & Editing -
Stable Video Diffusion T2V Generation Star
NeverEnds T2V Generation -
Pika T2V Generation -
EMU-Video T2V Generation -
GEN-2 T2V Generation & Editing -
ModelScope T2V Generation Star
ZeroScope T2V Generation -
T2V Synthesis Colab T2V Generation Star
VideoCraft T2V Generation & Editing Star
Diffusers (T2V synthesis) T2V Generation -
AnimateDiff Personalized T2V Generation Star
Text2Video-Zero T2V Generation Star
HotShot-XL T2V Generation Star
Genmo T2V Generation -
Fliki T2V Generation -

Video Generation

Data

Caption-level

Title arXiv Github Website Pub. & Date
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation arXiv Star Website Jun., 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers arXiv Star Website CVPR, 2024
CelebV-Text: A Large-Scale Facial Text-Video Dataset arXiv Star - CVPR, 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation arXiv Star - Jul., 2023
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation arXiv - - May, 2023
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions arXiv - - Nov, 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval arXiv - - ICCV, 2021
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language arXiv - - CVPR, 2016

Category-level

Title arXiv Github Website Pub. & Date
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild arXiv - - Dec., 2012
First Order Motion Model for Image Animation arXiv - - NeurIPS, 2019
Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks arXiv - - CVPR, 2018

Metrics and Benchmarks

Title arXiv Github Website Pub. & Date
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation arXiv Star Website Jun., 2024
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models arXiv Star - ICLR, 2024
Subjective-Aligned Dataset and Metric for Text-to-Video Quality Assessment arXiv - - Mar, 2024
Towards A Better Metric for Text-to-Video Generation arXiv - Website Jan, 2024
AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI arXiv - - Jan, 2024
VBench: Comprehensive Benchmark Suite for Video Generative Models arXiv Star Website Nov, 2023
FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation arXiv - - NeurIPS, 2023
CVPR 2023 Text Guided Video Editing Competition arXiv - - Oct., 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models arXiv Star Website Oct., 2023
Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset arXiv - - Sep., 2023

Text-to-Video Generation

Training-based

Title arXiv Github Website Pub. & Date
Grid Diffusion Models for Text-to-Video Generation arXiv Star Website CVPR, 2024
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators arXiv Star Website Apr., 2024
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework arXiv - - Mar., 2024
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis arXiv - - Mar., 2024
Genie: Generative Interactive Environments arXiv - Website Feb., 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis arXiv - Website Feb., 2024
Lumiere: A Space-Time Diffusion Model for Video Generation arXiv - Website Jan, 2024
UniVG: Towards Unified-Modal Video Generation arXiv - Website Jan, 2024
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models arXiv Star Website Jan, 2024
360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model arXiv - Website Jan, 2024
MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation arXiv - Website Jan, 2024
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM arXiv - Website Jan, 2024
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos arXiv Star Website Dec, 2023
InstructVideo: Instructing Video Diffusion Models with Human Feedback arXiv Star Website Dec, 2023
VideoLCM: Video Latent Consistency Model arXiv - - Dec, 2023
Photorealistic Video Generation with Diffusion Models arXiv - Website Dec, 2023
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation arXiv Star Website Dec, 2023
Delving Deep into Diffusion Transformers for Image and Video Generation arXiv - Website Dec, 2023
StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter arXiv Star Website Nov, 2023
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation arXiv - Website Nov, 2023
ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models arXiv Star Website Nov, 2023
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets arXiv Star Website Nov, 2023
FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline arXiv Star Website Nov, 2023
MoVideo: Motion-Aware Video Generation with Diffusion Models arXiv - Website Nov, 2023
Make Pixels Dance: High-Dynamic Video Generation arXiv - Website Nov, 2023
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning arXiv - Website Nov, 2023
Optimal Noise pursuit for Augmenting Text-to-Video Generation arXiv - - Nov, 2023
VideoDreamer: Customized Multi-Subject Text-to-Video Generation with Disen-Mix Finetuning arXiv - Website Nov, 2023
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation arXiv Star Website Oct, 2023
SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction arXiv Star Website Oct, 2023
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors arXiv Star Website Oct., 2023
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation arXiv Star Website Oct., 2023
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model arXiv Star Website Oct, 2023
MotionDirector: Motion Customization of Text-to-Video Diffusion Models arXiv Star Website Oct, 2023
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning arXiv Star Website Sep., 2023
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation arXiv Star Website Sep., 2023
LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models arXiv Star Website Sep., 2023
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation arXiv Star Website Sep., 2023
VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation arXiv - Website Sep., 2023
MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text arXiv - - Jul., 2023
Text2Performer: Text-Driven Human Video Generation arXiv Star Website Apr., 2023
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning arXiv Star Website Jul., 2023
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models arXiv - Website Aug., 2023
SimDA: Simple Diffusion Adapter for Efficient Video Generation arXiv Star Website CVPR, 2024
Dual-Stream Diffusion Net for Text-to-Video Generation arXiv - - Aug., 2023
ModelScope Text-to-Video Technical Report arXiv Star Website Aug., 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation arXiv Star - Jul., 2023
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation arXiv - - May, 2023
Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models arXiv - Website May, 2023
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models arXiv - Website -
Latent-Shift: Latent Diffusion with Temporal Shift arXiv - Website -
Probabilistic Adaptation of Text-to-Video Models arXiv - Website Jun., 2023
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation arXiv - Website Mar., 2023
ED-T2V: An Efficient Training Framework for Diffusion-based Text-to-Video Generation - - - IJCNN, 2023
MagicVideo: Efficient Video Generation With Latent Diffusion Models arXiv - Website -
Phenaki: Variable Length Video Generation From Open Domain Textual Description arXiv - Website -
Imagen Video: High Definition Video Generation With Diffusion Models arXiv - Website -
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation arXiv Star Website -
MAGVIT: Masked Generative Video Transformer arXiv - Website Dec., 2022
Make-A-Video: Text-to-Video Generation without Text-Video Data arXiv - Website -
Latent Video Diffusion Models for High-Fidelity Video Generation With Arbitrary Lengths arXiv Star Website Nov., 2022
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers arXiv Star - May, 2022
Video Diffusion Models arXiv - Website -

Training-free

Title arXiv Github Website Pub. & Date
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models arXiv Star Website Mar, 2024
TrailBlazer: Trajectory Control for Diffusion-Based Video Generation arXiv Star Website Jan, 2024
FreeInit: Bridging Initialization Gap in Video Diffusion Models arXiv Star Website Dec, 2023
MTVG: Multi-text Video Generation with Text-to-Video Models arXiv - Website Dec, 2023
F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis arXiv - - Nov, 2023
AdaDiff: Adaptive Step Selection for Fast Diffusion arXiv - - Nov, 2023
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax arXiv Star Website Nov, 2023
🏀GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning arXiv Star Website Nov, 2023
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling arXiv Star Website Oct, 2023
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation arXiv Star Website Oct, 2023
LLM-grounded Video Diffusion Models arXiv Star Website Oct, 2023
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator arXiv Star - NeurIPS, 2023
DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis arXiv Star Website Aug, 2023
Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation arXiv Star - May, 2023
Text2Video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators arXiv Star Website Mar., 2023
PEEKABOO: Interactive Video Generation via Masked-Diffusion 🫣 arXiv Star Website CVPR, 2024

Video Generation with other conditions

Pose-guided Video Generation

Title arXiv Github Website Pub. & Date
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance arXiv Star Website Mar., 2024
Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions arXiv - - Mar., 2024
Do You Guys Want to Dance: Zero-Shot Compositional Human Dance Generation with Multiple Persons arXiv - - Jan., 2024
DreaMoving: A Human Dance Video Generation Framework based on Diffusion Models arXiv - Website Dec., 2023
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model arXiv Star Website Nov., 2023
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation arXiv Star Website Nov., 2023
MagicDance: Realistic Human Dance Video Generation with Motions & Facial Expressions Transfer arXiv Star Website Nov., 2023
DisCo: Disentangled Control for Referring Human Dance Generation in Real World arXiv Star Website Jul., 2023
Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model arXiv - - Aug., 2023
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion arXiv Star Website Apr., 2023
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos arXiv Star Website Apr., 2023

Motion-guided Video Generation

Title arXiv Github Website Pub. & Date
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance arXiv Star Website Mar., 2024
Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling arXiv - - Jan., 2024
Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation arXiv - - Jan., 2024
Customizing Motion in Text-to-Video Diffusion Models arXiv - Website Dec., 2023
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models arXiv Star Website CVPR 2024
AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance arXiv Star Website Nov., 2023
Motion-Conditioned Diffusion Model for Controllable Video Synthesis arXiv - Website Apr., 2023
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory arXiv - - Aug., 2023

Sound-guided Video Generation

Title arXiv Github Website Pub. & Date
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation arXiv Star Website Jun., 2024
Context-aware Talking Face Video Generation arXiv - - Feb., 2024
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions arXiv Star Website Feb., 2024
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion arXiv - - ICCV, 2023
Generative Disco: Text-to-Video Generation for Music Visualization arXiv - - Apr., 2023
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion arXiv - - CVPRW, 2023

Image-guided Video Generation

Title arXiv Github Website Pub. & Date
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models arXiv Star Website CVPR 2024
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation arXiv - Website Mar., 2024
AtomoVideo: High Fidelity Image-to-Video Generation arXiv - Website Mar., 2024
Animated Stickers: Bringing Stickers to Life with Video Diffusion arXiv - - Feb., 2024
CONSISTI2V: Enhancing Visual Consistency for Image-to-Video Generation arXiv - Website Feb., 2024
I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models arXiv - - Dec., 2023
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models arXiv - Website Dec., 2023
DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance arXiv - Website Nov., 2023
LivePhoto: Real Image Animation with Text-guided Motion Control arXiv Star Website Nov., 2023
VideoBooth: Diffusion-based Video Generation with Image Prompts arXiv Star Website Nov., 2023
Decouple Content and Motion for Conditional Image-to-Video Generation arXiv - - Nov, 2023
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models arXiv - - Nov, 2023
Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image arXiv - - MM, 2023
Generative Image Dynamics arXiv - Website Sep., 2023
LaMD: Latent Motion Diffusion for Video Generation arXiv - - Apr., 2023
Conditional Image-to-Video Generation with Latent Flow Diffusion Models arXiv Star - CVPR 2023
NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis arXiv Star Website NeurIPS, 2022

Brain-guided Video Generation

Title arXiv Github Website Pub. & Date
NeuroCine: Decoding Vivid Video Sequences from Human Brain Activities arXiv - - Feb., 2024
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity arXiv Star Website NeurIPS, 2023

Depth-guided Video Generation

Title arXiv Github Website Pub. & Date
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation arXiv Star Website Jul., 2023
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance arXiv Star Website Jun., 2023

Multi-modal guided Video Generation

Title arXiv Github Website Pub. & Date
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control arXiv - - Mar., 2024
Magic-Me: Identity-Specific Video Customized Diffusion arXiv - Website Feb., 2024
InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions arXiv - Website Feb., 2024
Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion arXiv - Website Feb., 2024
Boximator: Generating Rich and Controllable Motions for Video Synthesis arXiv - Website Feb., 2024
AnimateLCM: Accelerating the Animation of Personalized Diffusion Models and Adapters with Decoupled Consistency Learning arXiv - - Jan., 2024
ActAnywhere: Subject-Aware Video Background Generation arXiv - Website Jan., 2024
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects arXiv - - Jan., 2024
MoonShot: Towards Controllable Video Generation and Editing with Multimodal Conditions arXiv Star Website Jan., 2024
PEEKABOO: Interactive Video Generation via Masked-Diffusion arXiv - Website Dec., 2023
CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling arXiv - - Dec., 2023
Fine-grained Controllable Video Generation via Object Appearance and Context arXiv - Website Nov., 2023
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation arXiv - Website Nov., 2023
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving arXiv - Website Nov., 2023
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models arXiv - Website Nov., 2023
VideoComposer: Compositional Video Synthesis with Motion Controllability arXiv Star Website Jun., 2023
NExT-GPT: Any-to-Any Multimodal LLM arXiv - - Sep, 2023
MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images arXiv - Website Jun, 2023
Any-to-Any Generation via Composable Diffusion arXiv Star Website May, 2023
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation arXiv Star - CVPR, 2023

Unconditional Video Generation

U-Net based

Title arXiv Github Website Pub. & Date
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation arXiv Star Website Feb., 2024
Video Probabilistic Diffusion Models in Projected Latent Space arXiv Star Website CVPR 2023
VIDM: Video Implicit Diffusion Models arXiv Star Website AAAI 2023
GD-VDM: Generated Depth for better Diffusion-based Video Generation arXiv Star - Jun., 2023
LEO: Generative Latent Image Animator for Human Video Synthesis arXiv Star Website May., 2023

Transformer based

Title arXiv Github Website Pub. & Date
Latte: Latent Diffusion Transformer for Video Generation arXiv Star Website Jan., 2024
VDT: An Empirical Study on Video Diffusion with Transformers arXiv Star - May, 2023
Long Video Generation with Time-Agnostic VQGAN and Time-Sensitive Transformer arXiv Star Website ECCV, 2022

Video Completion

Video Enhancement and Restoration

Title arXiv Github Website Pub. & Date
Towards Language-Driven Video Inpainting via Multimodal Large Language Models arXiv Star Website Jan., 2024
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution - - - WACVW, 2023
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution arXiv Star Website Dec., 2023
AVID: Any-Length Video Inpainting with Diffusion Model arXiv Star Website Dec., 2023
Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution arXiv Star - CVPR 2023
LDMVFI: Video Frame Interpolation with Latent Diffusion Models arXiv - - Mar., 2023
CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming arXiv - - Nov., 2022
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos arXiv - - May., 2023

Video Prediction

Title arXiv Github Website Pub. & Date
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction arXiv Star Website Jun, 2024
STDiff: Spatio-temporal Diffusion for Continuous Stochastic Video Prediction arXiv Star - Dec, 2023
Video Diffusion Models with Local-Global Context Guidance arXiv Star - IJCAI, 2023
Seer: Language Instructed Video Prediction with Latent Diffusion Models arXiv - Website Mar., 2023
MaskViT: Masked Visual Pre-Training for Video Prediction arXiv Star Website Jun, 2022
Diffusion Models for Video Prediction and Infilling arXiv Star Website TMLR, 2022
MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation arXiv Star Website NeurIPS, 2022
Diffusion Probabilistic Modeling for Video Generation arXiv Star - Mar., 2022
Flexible Diffusion Modeling of Long Videos arXiv Star Website May, 2022
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models arXiv Star Website May, 2023

Video Editing

General Editing Model

Title arXiv Github Website Pub. & Date
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing arXiv Star Website Jun, 2024
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation arXiv - - Mar., 2024
FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing arXiv - - Mar., 2024
DreamMotion: Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing arXiv - Website Mar, 2024
Video Editing via Factorized Diffusion Distillation arXiv - - Mar, 2024
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis arXiv Star Website Dec, 2023
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers arXiv - Website Dec, 2023
Neutral Editing Framework for Diffusion-based Video Editing arXiv - Website Dec, 2023
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence arXiv - Website Nov, 2023
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models arXiv Star Website Nov, 2023
Motion-Conditioned Image Animation for Video Editing arXiv - Website Nov, 2023
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation arXiv - - Sep, 2023
MagicEdit: High-Fidelity and Temporally Coherent Video Editing arXiv - - Aug, 2023
Edit Temporal-Consistent Videos with Image Diffusion Model arXiv - - Aug, 2023
Structure and Content-Guided Video Synthesis With Diffusion Models arXiv - Website ICCV, 2023
Dreamix: Video Diffusion Models Are General Video Editors arXiv - Website Feb, 2023

Training-free Editing Model

Title arXiv Github Website Pub. & Date
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing arXiv Star Website Jun, 2024
EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing arXiv Star Website Mar., 2024
UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing arXiv - Website Feb, 2024
Object-Centric Diffusion for Efficient Video Editing arXiv - - Jan, 2024
RealCraft: Attention Control as A Solution for Zero-shot Long Video Editing arXiv - - Dec, 2023
VidToMe: Video Token Merging for Zero-Shot Video Editing arXiv Star Website Dec, 2023
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing arXiv Star Website Dec, 2023
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators arXiv Star - Dec, 2023
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models arXiv Star Website Dec, 2023
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models arXiv - Website Nov., 2023
Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion arXiv - - Nov., 2023
FastBlend: a Powerful Model-Free Toolkit Making Video Stylization Easier arXiv Star - Oct., 2023
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation arXiv - - Nov., 2023
Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models arXiv - - Oct., 2023
LOVECon: Text-driven Training-Free Long Video Editing with ControlNet arXiv Star - Oct., 2023
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing arXiv - Website Oct., 2023
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models arXiv Star Website ICLR, 2024
MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance arXiv - - Aug., 2023
EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints arXiv - - Aug., 2023
ControlVideo: Training-free Controllable Text-to-Video Generation arXiv Star - May, 2023
TokenFlow: Consistent Diffusion Features for Consistent Video Editing arXiv Star Website Jul., 2023
VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing arXiv - Website Jun., 2023
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation arXiv - Website Jun., 2023
Zero-Shot Video Editing Using Off-the-Shelf Image Diffusion Models arXiv Star Website Mar., 2023
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing arXiv Star Website Mar., 2023
Pix2Video: Video Editing Using Image Diffusion arXiv - Website Mar., 2023
InFusion: Inject and Attention Fusion for Multi Concept Zero Shot Text based Video Editing arXiv - Website Aug., 2023
Gen-L-Video: Multi-Text to Long Video Generation via Temporal Co-Denoising arXiv Star Website May, 2023

One-shot Editing Model

Title arXiv Github Website Pub. & Date
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models arXiv - Website Feb., 2024
MotionCrafter: One-Shot Motion Customization of Diffusion Models arXiv Star - Dec., 2023
DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing arXiv - Website Dec., 2023
MotionEditor: Editing Video Motion via Content-Aware Diffusion arXiv Star Website CVPR, 2024
Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning arXiv - Website Nov., 2023
Cut-and-Paste: Subject-Driven Video Editing with Attention Control arXiv - - Nov, 2023
StableVideo: Text-driven Consistency-aware Diffusion Video Editing arXiv Star Website ICCV, 2023
Shape-aware Text-driven Layered Video Editing arXiv - - CVPR, 2023
SAVE: Spectral-Shift-Aware Adaptation of Image Diffusion Models for Text-guided Video Editing arXiv Star - May, 2023
Towards Consistent Video Editing with Text-to-Image Diffusion Models arXiv - - Mar., 2023
Edit-A-Video: Single Video Editing with Object-Aware Consistency arXiv - Website Mar., 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation arXiv Star Website ICCV, 2023
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing arXiv Star Website May, 2023
Video-P2P: Video Editing with Cross-attention Control arXiv Star Website Mar., 2023
SinFusion: Training Diffusion Models on a Single Image or Video arXiv Star Website Nov., 2022

Instruction-guided Video Editing

Title arXiv Github Website Pub. & Date
VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing arXiv Star Website Jun, 2024
EffiVED: Efficient Video Editing via Text-instruction Diffusion Models arXiv - - Mar, 2024
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis arXiv - Website Dec, 2023
Neural Video Fields Editing arXiv Star Website Dec, 2023
VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models arXiv Star Website Nov, 2023
Consistent Video-to-Video Transfer Using Synthetic Dataset arXiv - - Nov., 2023
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions arXiv - - May, 2023
Collaborative Score Distillation for Consistent Visual Synthesis arXiv - - Jul., 2023

Motion-guided Video Editing

Title arXiv Github Website Pub. & Date
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation arXiv Star Website Nov, 2023
Drag-A-Video: Non-rigid Video Editing with Point-based Interaction arXiv - Website Nov, 2023
DragVideo: Interactive Drag-style Video Editing arXiv Star - Nov, 2023
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet arXiv - Website Jul., 2023

Sound-guided Video Editing

Title arXiv Github Website Pub. & Date
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model arXiv - - May., 2023
Soundini: Sound-Guided Diffusion for Natural Video Editing arXiv Star Website Apr., 2023

Multi-modal Control Editing Model

Title arXiv Github Website Pub. & Date
AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks - Star Website Dec, 2023
Motionshop: An application of replacing the characters in video with 3D avatars - Star Website Dec, 2023
Anything in Any Scene: Photorealistic Video Object Insertion arXiv Star Website Jan, 2024
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion arXiv Star Website Dec, 2023
MagicStick: Controllable Video Editing via Control Handle Transformations arXiv Star Website Nov, 2023
SAVE: Protagonist Diversification with Structure Agnostic Video Editing arXiv - Website Nov, 2023
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation arXiv - - May, 2023
CCEdit: Creative and Controllable Video Editing via Diffusion Models arXiv - - Sep, 2023
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts arXiv Star Website May, 2023

Domain-specific Editing Model

Title arXiv Github Website Pub. & Date
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation arXiv - Website Jan., 2024
Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models arXiv - Website Jan., 2024
Training-Free Semantic Video Composition via Pre-trained Diffusion Model arXiv - - Jan, 2024
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models arXiv - Website CVPR, 2024
Multimodal-driven Talking Face Generation via a Unified Diffusion-based Generator arXiv - - May, 2023
DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis arXiv - - Aug, 2023
Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer arXiv Star - May, 2023
Instruct-Video2Avatar: Video-to-Avatar Generation with Instructions arXiv Star - Jun, 2023
Video Colorization with Pre-trained Text-to-Image Diffusion Models arXiv Star Website Jun, 2023
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding arXiv Star Website CVPR 2023

Non-diffusion Editing Model

Title arXiv Github Website Pub. & Date
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing arXiv - Website Oct., 2023
INVE: Interactive Neural Video Editing arXiv - Website Jul., 2023
Shape-Aware Text-Driven Layered Video Editing arXiv - Website Jan., 2023

Video Understanding

Title arXiv Github Website Pub. & Date
EchoReel: Enhancing Action Generation of Existing Video Diffusion Models arXiv - - Mar., 2024
VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model arXiv - - Mar., 2024
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion arXiv - - Mar., 2024
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models arXiv - - Mar., 2024
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation arXiv - - Mar., 2024
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction arXiv - - Mar., 2024
Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval arXiv - - Jan., 2024
Diffusion Reward: Learning Rewards via Conditional Video Diffusion arXiv Star Website Dec., 2023
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models arXiv - Website Nov., 2023
Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models arXiv Star - Nov., 2023
Flow-Guided Diffusion for Video Inpainting arXiv Star - Nov., 2023
Breathing Life Into Sketches Using Text-to-Video Priors arXiv - - Nov., 2023
Infusion: Internal Diffusion for Video Inpainting arXiv - - Nov., 2023
DiffusionVMR: Diffusion Model for Video Moment Retrieval arXiv - - Aug., 2023
DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation arXiv - - Aug., 2023
CoTracker: It is Better to Track Together arXiv Star Website Aug., 2023
Unsupervised Video Anomaly Detection with Diffusion Models Conditioned on Compact Motion Representations arXiv - - ICIAP, 2023
Exploring Diffusion Models for Unsupervised Video Anomaly Detection arXiv - - Apr., 2023
Multimodal Motion Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection arXiv - - ICCV, 2023
Diffusion Action Segmentation arXiv - - Mar., 2023
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion arXiv Star Website Mar., 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model arXiv Star - ICCV, 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real arXiv Star Website Jul., 2023
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition arXiv Star Website Feb., 2023
Refined Semantic Enhancement Towards Frequency Diffusion for Video Captioning arXiv - - Nov., 2022
A Generalist Framework for Panoptic Segmentation of Images and Videos arXiv Star Website Oct., 2022
DAVIS: High-Quality Audio-Visual Separation with Generative Diffusion Models arXiv - - Jul., 2023
CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming arXiv - - Nov., 2022
Spatial-temporal Transformer-guided Diffusion based Data Augmentation for Efficient Skeleton-based Action Recognition arXiv - - Jul., 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos arXiv Star - CVPR 2023

Contact

If you have any suggestions or find our work helpful, feel free to contact us:

Homepage: Zhen Xing

Email: [email protected]

If you find our survey useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@article{vdmsurvey,
  title={A Survey on Video Diffusion Models},
  author={Zhen Xing and Qijun Feng and Haoran Chen and Qi Dai and Han Hu and Hang Xu and Zuxuan Wu and Yu-Gang Jiang}, 
  journal={arXiv preprint arXiv:2310.10647},
  year={2023}
}