Organised README.md #3

Open · wants to merge 1 commit into base: master
13 changes: 12 additions & 1 deletion README.md
@@ -2,7 +2,18 @@

![teaser](assets/teaser.jpg)

Many video editing tasks such as rotoscoping or object removal require the propagation of context across frames. While transformers and other attention-based approaches that aggregate features globally have demonstrated great success at propagating object masks from keyframes to the whole video, they struggle to propagate high-frequency details such as textures faithfully. We hypothesize that this is due to an inherent bias of global attention towards low-frequency features. To overcome this limitation, we present a two-stream approach, where high-frequency features interact locally and low-frequency features interact globally. The global interaction stream remains robust in difficult situations such as large camera motions, where explicit alignment fails. The local interaction stream propagates high-frequency details through deformable feature aggregation and, informed by the global interaction stream, learns to detect and correct errors of the deformation field. We evaluate our two-stream approach for inpainting tasks, where experiments show that it improves both the propagation of features within a single frame as required for image inpainting, as well as their propagation from keyframes to target frames. Applied to video inpainting, our approach leads to 44% and 26% improvements in FID and LPIPS scores.
### How Guided Inpainting is different

Video editing tasks such as rotoscoping and object removal require context propagation across frames. Transformers and other attention-based approaches that aggregate features globally work well for propagating object masks from keyframes to the whole video. However, they struggle to propagate high-frequency details such as textures faithfully. We hypothesize this is due to an inherent bias of global attention toward low-frequency features. To overcome this limitation, Guided Inpainting uses a two-stream approach in which high-frequency features interact locally and low-frequency features interact globally.

### How Guided Inpainting works

Guided Inpainting uses a two-stream approach in which high-frequency features interact locally and low-frequency features interact globally. The global interaction stream remains robust in difficult situations such as large camera motions, where explicit alignment fails. The local interaction stream propagates high-frequency details through deformable feature aggregation and, informed by the global interaction stream, learns to detect and correct errors of the deformation field.
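
For a concrete picture of the idea, below is a minimal PyTorch sketch of such a two-stream block. It is illustrative only: the class and parameter names (`TwoStreamBlock`, `offset_head`, `fuse`, etc.) are placeholders and do not mirror this repository's actual modules, and the deformable aggregation is simplified to a single predicted offset field sampled with `grid_sample`.

```python
# Minimal, illustrative sketch of a two-stream propagation block (PyTorch).
# Names and shapes are assumptions for this README, not the repository's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoStreamBlock(nn.Module):
    """Global attention for low-frequency context plus local deformable
    aggregation for high-frequency detail, guided by the global stream."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # Global interaction stream: attention over all keyframe tokens
        # (robust to large camera motion, no explicit alignment needed).
        self.global_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Local interaction stream: predict a 2D offset field from the
        # globally aggregated features, then warp keyframe features with it.
        self.offset_head = nn.Conv2d(dim, 2, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, target_feat: torch.Tensor, key_feat: torch.Tensor) -> torch.Tensor:
        # target_feat, key_feat: (B, C, H, W) features of the target frame and a keyframe.
        b, c, h, w = target_feat.shape
        tokens_t = target_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens_k = key_feat.flatten(2).transpose(1, 2)

        # Global stream: low-frequency context aggregated from the keyframe.
        global_out, _ = self.global_attn(tokens_t, tokens_k, tokens_k)
        global_out = global_out.transpose(1, 2).reshape(b, c, h, w)

        # Local stream: deformable sampling of high-frequency detail.
        offsets = self.offset_head(global_out)  # (B, 2, H, W), pixel-space (dx, dy)
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=target_feat.device),
            torch.linspace(-1, 1, w, device=target_feat.device),
            indexing="ij",
        )
        base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        norm_off = torch.stack(
            (offsets[:, 0] / max(w - 1, 1) * 2, offsets[:, 1] / max(h - 1, 1) * 2), dim=-1
        )
        local_out = F.grid_sample(key_feat, base_grid + norm_off, align_corners=True)

        # Fuse: the global stream informs and corrects the locally warped details.
        return self.fuse(torch.cat((global_out, local_out), dim=1))
```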

### Evaluation

We evaluate our two-stream approach for inpainting tasks, where experiments show that it improves both the propagation of features within a single frame (required for image inpainting) and their propagation from keyframes to target frames. Applied to video inpainting, our approach leads to 44% and 26% improvements in FID and LPIPS scores, respectively.
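
As a pointer for evaluating with the same metrics, the snippet below shows one common way to compute FID and LPIPS using `torchmetrics`. This is an assumption for illustration only; the exact metric implementations and evaluation protocol behind the reported numbers are those of the paper and repository, not this snippet.

```python
# Illustrative FID / LPIPS computation with torchmetrics (an assumed choice of
# metric library, not necessarily what was used for the reported numbers).
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.lpip import LearnedPerceptualImagePatchSimilarity

fid = FrechetInceptionDistance(feature=64)
lpips = LearnedPerceptualImagePatchSimilarity(net_type="alex")

# Placeholder tensors standing in for ground-truth and inpainted video frames.
# FID expects uint8 images of shape (N, 3, H, W).
real_frames = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_frames = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

fid.update(real_frames, real=True)
fid.update(fake_frames, real=False)
print("FID:", fid.compute().item())

# LPIPS expects float images in [-1, 1].
real_f = real_frames.float() / 127.5 - 1.0
fake_f = fake_frames.float() / 127.5 - 1.0
print("LPIPS:", lpips(fake_f, real_f).item())
```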


[***Towards Unified Keyframe Propagation Models***](https://arxiv.org/abs/2205.09731)<br/>
Patrick Esser, Peter Michael, Soumyadip Sengupta