Unofficial implementation of “DiffEdit: Diffusion-based semantic image editing with mask guidance” with Stable Diffusion. For better sampling efficiency, we use DPM-Solver as the sampling method.
Paper: https://arxiv.org/abs/2210.11427
A suitable conda environment named `ldm` can be created and activated with:

```
conda env create -f environment.yaml
conda activate ldm
```
You can also update an existing latent diffusion environment by running:

```
conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
```
To try the method, just run `diffedit.ipynb` in Jupyter Notebook.
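For example, from the repository root:

```
jupyter notebook diffedit.ipynb
```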
Key parameters:

```python
encode_ratio: float = 0.6
# encode_ratio controls how much noise is added to the input image before
# editing: if the ratio is near 0, the original image is likely to be
# returned almost unchanged; if it is near 1.0, it may cause problems
# (the result can drift far from the input).
```
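For intuition, here is a minimal sketch of how such a ratio typically maps to a noising timestep. `noise_to_ratio` is a hypothetical helper, not this repo's code, and `scheduler` is assumed to be a diffusers-style scheduler with an `add_noise` method:

```python
import torch

def noise_to_ratio(x0: torch.Tensor, encode_ratio: float, scheduler) -> torch.Tensor:
    # Pick the forward-diffusion timestep corresponding to the requested ratio.
    t = int(encode_ratio * (scheduler.config.num_train_timesteps - 1))
    noise = torch.randn_like(x0)
    # Standard forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise
    return scheduler.add_noise(x0, noise, torch.tensor([t]))
```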
```python
clamp_rate: float = 4
# The difference-map values are clamped to map.mean() * clamp_rate, scaled
# into [0, 1], and then binarized with a threshold of 0.5. So a value greater
# than map.mean() * clamp_rate * 0.5 is encoded as 1, anything less as 0.
# The larger the clamp rate, the fewer pixels are encoded as 1; the smaller
# it is, the more pixels are encoded as 1.
```
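Spelled out as code, the thresholding above amounts to the following sketch (a paraphrase of the described logic, not necessarily the repo's exact implementation):

```python
import torch

def binarize_mask(diff_map: torch.Tensor, clamp_rate: float = 4.0) -> torch.Tensor:
    # Clamp to mean * clamp_rate, rescale into [0, 1], then split at 0.5.
    ceiling = diff_map.mean().item() * clamp_rate
    scaled = diff_map.clamp(0.0, ceiling) / ceiling
    return (scaled > 0.5).float()
```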
```python
ddim_steps: int = 15
# With DPM-Solver, the number of sampling steps does not need to be large.
# You are also encouraged to try DPM-Solver's other parameters
# (e.g., order, predict_x0); see the sketch below.
```
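For reference, a rough sketch of the upstream DPM-Solver API (https://github.com/LuChengTHU/dpm-solver) that these parameters come from; `unet`, `betas`, and `x_T` are assumed to come from the loaded diffusion model, and the exact wiring in this repo may differ:

```python
from dpm_solver_pytorch import NoiseScheduleVP, model_wrapper, DPM_Solver

# betas: the discrete beta schedule of the pretrained diffusion model.
noise_schedule = NoiseScheduleVP(schedule="discrete", betas=betas)

# Wrap the epsilon-prediction UNet so DPM-Solver can query it in continuous time.
model_fn = model_wrapper(unet, noise_schedule, model_type="noise")

# predict_x0=True selects the data-prediction (DPM-Solver++) variant;
# order=2 is usually a good trade-off at around 15 steps.
solver = DPM_Solver(model_fn, noise_schedule, predict_x0=True)
sample = solver.sample(x_T, steps=15, order=2, method="multistep")
```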
| A bowl of fruits (input) | generated mask | A bowl of pears (edited) |
|---|---|---|