AnimateDiff Video to Video #6328

Merged on Jan 24, 2024
59 commits
df1b6c4 begin animatediff img2video and video2video (a-r-r-o-w, Dec 25, 2023)
4be3068 revert animatediff to original implementation (a-r-r-o-w, Dec 26, 2023)
06b427f add img2video as pipeline (a-r-r-o-w, Dec 26, 2023)
aaf9194 Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Dec 26, 2023)
2bc77c6 update (a-r-r-o-w, Dec 27, 2023)
d0b3893 add vid2vid pipeline (a-r-r-o-w, Dec 27, 2023)
466d92a update imports (a-r-r-o-w, Dec 31, 2023)
fc815c8 update (a-r-r-o-w, Dec 31, 2023)
315daad remove copied from line for check_inputs (a-r-r-o-w, Dec 31, 2023)
cc55f3d update (a-r-r-o-w, Dec 31, 2023)
d7a85be update examples (a-r-r-o-w, Dec 31, 2023)
7dd73ac Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Dec 31, 2023)
a831a5e add multi-batch support (a-r-r-o-w, Jan 3, 2024)
b5b5a3a fix __init__.py files (a-r-r-o-w, Jan 3, 2024)
8bb0855 move img2vid to community (a-r-r-o-w, Jan 3, 2024)
26d3145 update community readme and examples (a-r-r-o-w, Jan 3, 2024)
4038d9a Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 3, 2024)
3196a79 fix (a-r-r-o-w, Jan 3, 2024)
5a4f2ee make fix-copies (a-r-r-o-w, Jan 3, 2024)
7fad71a add vid2vid batch params (a-r-r-o-w, Jan 4, 2024)
71e8770 apply suggestions from review (a-r-r-o-w, Jan 4, 2024)
068e9d7 add test for animatediff vid2vid (a-r-r-o-w, Jan 4, 2024)
da4c308 torch.stack -> torch.cat (a-r-r-o-w, Jan 4, 2024)
be2bb21 make style (a-r-r-o-w, Jan 4, 2024)
43b4410 docs for vid2vid (a-r-r-o-w, Jan 4, 2024)
4ce5bae update (a-r-r-o-w, Jan 4, 2024)
f895be8 fix prepare_latents (a-r-r-o-w, Jan 4, 2024)
2cb3267 Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 4, 2024)
2b0533d fix docs (a-r-r-o-w, Jan 4, 2024)
cf2b1b3 remove img2vid (a-r-r-o-w, Jan 9, 2024)
f6f4079 update README to :main (a-r-r-o-w, Jan 9, 2024)
193edcd Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 9, 2024)
b1c5db9 remove slow test (a-r-r-o-w, Jan 10, 2024)
3fc8623 refactor pipeline output (a-r-r-o-w, Jan 10, 2024)
817b44e update docs (a-r-r-o-w, Jan 10, 2024)
caa423e Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 10, 2024)
6543079 update docs (a-r-r-o-w, Jan 11, 2024)
df602b3 merge community readme from :main (a-r-r-o-w, Jan 11, 2024)
897dfd2 Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 11, 2024)
5042b7d final fix i promise (a-r-r-o-w, Jan 11, 2024)
9a2d8ba add support for url in animatediff example (a-r-r-o-w, Jan 12, 2024)
78fa5a8 update example (a-r-r-o-w, Jan 12, 2024)
967fb6c Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 12, 2024)
fe871f7 update callbacks to latest implementation (a-r-r-o-w, Jan 17, 2024)
600e414 Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video… (a-r-r-o-w, Jan 17, 2024)
254ea67 Update src/diffusers/pipelines/animatediff/pipeline_animatediff_video… (a-r-r-o-w, Jan 17, 2024)
c4bf30c Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 17, 2024)
ed37cae fix merge (a-r-r-o-w, Jan 17, 2024)
6b84aef Apply suggestions from code review (patrickvonplaten, Jan 19, 2024)
1c645ed remove callback and callback_steps as suggested in review (a-r-r-o-w, Jan 19, 2024)
54b21c0 Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 19, 2024)
fdbb68f Update tests/pipelines/animatediff/test_animatediff_video2video.py (a-r-r-o-w, Jan 19, 2024)
39a7628 Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 23, 2024)
5674a71 fix import error caused due to unet refactor in #6630 (a-r-r-o-w, Jan 23, 2024)
032c24f fix numpy import error after tensor2vid refactor in #6626 (a-r-r-o-w, Jan 23, 2024)
41ac862 make fix-copies (a-r-r-o-w, Jan 23, 2024)
c3a70eb fix numpy error (a-r-r-o-w, Jan 23, 2024)
8b820a0 fix progress bar test (a-r-r-o-w, Jan 23, 2024)
872dee6 Merge branch 'main' into animatediff-img2video (a-r-r-o-w, Jan 23, 2024)
111 changes: 111 additions & 0 deletions docs/source/en/api/pipelines/animatediff.md
@@ -25,13 +25,16 @@ The abstract of the paper is the following:
| Pipeline | Tasks | Demo
|---|---|:---:|
| [AnimateDiffPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff.py) | *Text-to-Video Generation with AnimateDiff* |
| [AnimateDiffVideoToVideoPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/animatediff/pipeline_animatediff_video2video.py) | *Video-to-Video Generation with AnimateDiff* |

## Available checkpoints

Motion Adapter checkpoints can be found under [guoyww](https://huggingface.co/guoyww/). These checkpoints are meant to work with any model based on Stable Diffusion 1.4/1.5.

## Usage example

### AnimateDiffPipeline

AnimateDiff works with a MotionAdapter checkpoint and a Stable Diffusion model checkpoint. The MotionAdapter is a collection of Motion Modules responsible for adding coherent motion across image frames. These modules are applied after the ResNet and attention blocks in the Stable Diffusion UNet.

The following example demonstrates how to use a *MotionAdapter* checkpoint with Diffusers for inference based on Stable Diffusion 1.4/1.5.
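
That full example is collapsed in this diff view; here is a minimal sketch along the same lines, reusing the MotionAdapter checkpoint and the DDIM scheduler settings from the video-to-video example below (the prompt and output filename are illustrative):

```python
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif

# Load the motion adapter and an SD 1.5-based finetuned model
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)

# DDIM with linear betas and linspace timesteps, matching the vid2vid example below
pipe.scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

# Generate 16 frames from a text prompt
output = pipe(
    prompt="a panda surfing, ocean waves, high quality",
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
export_to_gif(output.frames[0], "text2video.gif")
```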
@@ -98,6 +101,114 @@ AnimateDiff tends to work better with finetuned Stable Diffusion models. If you

</Tip>

### AnimateDiffVideoToVideoPipeline

AnimateDiff can also start from an initial video and generate a visually similar one, or apply style, character, background, and other edits to it, letting you explore creative possibilities seamlessly.

```python
import imageio
import requests
import torch
from diffusers import AnimateDiffVideoToVideoPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif
from io import BytesIO
from PIL import Image

# Load the motion adapter
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
# load SD 1.5 based finetuned model
model_id = "SG161222/Realistic_Vision_V5.1_noVAE"
pipe = AnimateDiffVideoToVideoPipeline.from_pretrained(model_id, motion_adapter=adapter, torch_dtype=torch.float16)
scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipe.scheduler = scheduler

# enable memory savings
pipe.enable_vae_slicing()
pipe.enable_model_cpu_offload()

# helper function to load videos
def load_video(file_path: str):
    images = []

    if file_path.startswith(('http://', 'https://')):
        # If the file_path is a URL
        response = requests.get(file_path)
        response.raise_for_status()
        content = BytesIO(response.content)
        vid = imageio.get_reader(content)
    else:
        # Assuming it's a local file path
        vid = imageio.get_reader(file_path)

    for frame in vid:
        pil_image = Image.fromarray(frame)
        images.append(pil_image)

    return images

video = load_video("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif")

output = pipe(
    video=video,
    prompt="panda playing a guitar, on a boat, in the ocean, high quality",
    negative_prompt="bad quality, worse quality",
    guidance_scale=7.5,
    num_inference_steps=25,
    strength=0.5,
    generator=torch.Generator("cpu").manual_seed(42),
)
frames = output.frames[0]
export_to_gif(frames, "animation.gif")
```
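
The `strength` parameter behaves as in the image-to-image pipelines: it controls how much noise is added to the input video before denoising, so values near 0 keep the output close to the source video while values near 1 let the prompt dominate. The `0.5` above preserves the source motion and composition while leaving room to restyle the content.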

Here are some sample outputs:

<table>
<tr>
<th align=center>Source Video</th>
<th align=center>Output Video</th>
</tr>
<tr>
<td align=center>
raccoon playing a guitar
<br />
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-1.gif"
alt="racoon playing a guitar"
style="width: 300px;" />
</td>
<td align=center>
panda playing a guitar
<br/>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-output-1.gif"
alt="panda playing a guitar"
style="width: 300px;" />
</td>
</tr>
<tr>
<td align=center>
closeup of margot robbie, fireworks in the background, high quality
<br />
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-input-2.gif"
alt="closeup of margot robbie, fireworks in the background, high quality"
style="width: 300px;" />
</td>
<td align=center>
closeup of tony stark, robert downey jr, fireworks
<br/>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/animatediff-vid2vid-output-2.gif"
alt="closeup of tony stark, robert downey jr, fireworks"
style="width: 300px;" />
</td>
</tr>
</table>

## Using Motion LoRAs

Motion LoRAs are a collection of LoRAs that work with the `guoyww/animatediff-motion-adapter-v1-5-2` checkpoint. These LoRAs are responsible for adding specific types of motion to the animations.
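
As a minimal sketch of how such a LoRA is applied (assuming a `pipe` built as in the examples above; `guoyww/animatediff-motion-lora-zoom-out` is one of the published Motion LoRA checkpoints):

```python
# Load a Motion LoRA on top of the motion adapter; `pipe` is an
# AnimateDiffPipeline configured as in the text-to-video example above.
pipe.load_lora_weights(
    "guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out"
)

# The LoRA biases the generated motion, here toward a zoom-out camera move
output = pipe(
    prompt="a panda playing a guitar, high quality",
    negative_prompt="bad quality, worse quality",
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=25,
    generator=torch.Generator("cpu").manual_seed(42),
)
```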
2 changes: 2 additions & 0 deletions src/diffusers/__init__.py
@@ -207,6 +207,7 @@
"AmusedInpaintPipeline",
"AmusedPipeline",
"AnimateDiffPipeline",
"AnimateDiffVideoToVideoPipeline",
"AudioLDM2Pipeline",
"AudioLDM2ProjectionModel",
"AudioLDM2UNet2DConditionModel",
@@ -567,6 +568,7 @@
AmusedInpaintPipeline,
AmusedPipeline,
AnimateDiffPipeline,
AnimateDiffVideoToVideoPipeline,
AudioLDM2Pipeline,
AudioLDM2ProjectionModel,
AudioLDM2UNet2DConditionModel,
7 changes: 5 additions & 2 deletions src/diffusers/pipelines/__init__.py
@@ -109,7 +109,10 @@
]
)
_import_structure["amused"] = ["AmusedImg2ImgPipeline", "AmusedInpaintPipeline", "AmusedPipeline"]
_import_structure["animatediff"] = ["AnimateDiffPipeline"]
_import_structure["animatediff"] = [
"AnimateDiffPipeline",
"AnimateDiffVideoToVideoPipeline",
]
_import_structure["audioldm"] = ["AudioLDMPipeline"]
_import_structure["audioldm2"] = [
"AudioLDM2Pipeline",
@@ -341,7 +344,7 @@
from ..utils.dummy_torch_and_transformers_objects import *
else:
from .amused import AmusedImg2ImgPipeline, AmusedInpaintPipeline, AmusedPipeline
from .animatediff import AnimateDiffPipeline
from .animatediff import AnimateDiffPipeline, AnimateDiffVideoToVideoPipeline
from .audioldm import AudioLDMPipeline
from .audioldm2 import (
AudioLDM2Pipeline,
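With the registrations above in place, the new pipeline becomes importable from the package root like any other pipeline. A quick sanity check, assuming diffusers is installed from this branch:

```python
# The `_import_structure` entry above exposes the pipeline lazily at the
# top level of the `diffusers` package.
from diffusers import AnimateDiffVideoToVideoPipeline

print(AnimateDiffVideoToVideoPipeline.__name__)
```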
9 changes: 6 additions & 3 deletions src/diffusers/pipelines/animatediff/__init__.py
@@ -11,7 +11,7 @@


_dummy_objects = {}
_import_structure = {}
_import_structure = {"pipeline_output": ["AnimateDiffPipelineOutput"]}

try:
if not (is_transformers_available() and is_torch_available()):
@@ -21,7 +21,8 @@

_dummy_objects.update(get_objects_from_module(dummy_torch_and_transformers_objects))
else:
_import_structure["pipeline_animatediff"] = ["AnimateDiffPipeline", "AnimateDiffPipelineOutput"]
_import_structure["pipeline_animatediff"] = ["AnimateDiffPipeline"]
_import_structure["pipeline_animatediff_video2video"] = ["AnimateDiffVideoToVideoPipeline"]

if TYPE_CHECKING or DIFFUSERS_SLOW_IMPORT:
try:
@@ -31,7 +32,9 @@
from ...utils.dummy_torch_and_transformers_objects import *

else:
from .pipeline_animatediff import AnimateDiffPipeline, AnimateDiffPipelineOutput
from .pipeline_animatediff import AnimateDiffPipeline
from .pipeline_animatediff_video2video import AnimateDiffVideoToVideoPipeline
from .pipeline_output import AnimateDiffPipelineOutput

else:
import sys
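As in the other pipeline packages, the `_import_structure` mapping presumably feeds a `_LazyModule` that replaces `sys.modules[__name__]` in the collapsed tail of this file (hence the `import sys` above), so pipeline modules are only imported on first access, while the `TYPE_CHECKING`/`DIFFUSERS_SLOW_IMPORT` branch keeps real imports available to static type checkers.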
9 changes: 1 addition & 8 deletions src/diffusers/pipelines/animatediff/pipeline_animatediff.py
@@ -14,10 +14,8 @@

import inspect
import math
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple, Union

import numpy as np
import torch
import torch.fft as fft
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer, CLIPVisionModelWithProjection
@@ -37,7 +35,6 @@
)
from ...utils import (
USE_PEFT_BACKEND,
BaseOutput,
deprecate,
logging,
replace_example_docstring,
@@ -46,6 +43,7 @@
)
from ...utils.torch_utils import randn_tensor
from ..pipeline_utils import DiffusionPipeline
from .pipeline_output import AnimateDiffPipelineOutput


logger = logging.get_logger(__name__) # pylint: disable=invalid-name
@@ -147,11 +145,6 @@ def _freq_mix_3d(x: torch.Tensor, noise: torch.Tensor, LPF: torch.Tensor) -> torch.Tensor:
return x_mixed


@dataclass
class AnimateDiffPipelineOutput(BaseOutput):
frames: Union[torch.Tensor, np.ndarray]


class AnimateDiffPipeline(DiffusionPipeline, TextualInversionLoaderMixin, IPAdapterMixin, LoraLoaderMixin):
r"""
Pipeline for text-to-video generation.
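The diff for the new `pipeline_output.py` module itself is not among the loaded files; a sketch of its likely contents, assuming a straight relocation of the dataclass removed above:

```python
# src/diffusers/pipelines/animatediff/pipeline_output.py (sketch; assumes the
# dataclass was moved verbatim from pipeline_animatediff.py)
from dataclasses import dataclass
from typing import Union

import numpy as np
import torch

from ...utils import BaseOutput


@dataclass
class AnimateDiffPipelineOutput(BaseOutput):
    frames: Union[torch.Tensor, np.ndarray]
```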