Am I understanding correctly that the control image given to ControlNet Tile is just the downscaled version of the original image?
Let me explain how I understand it:
In the training phase, if the original image is 512x512, you would downscale it to 256x256 (or something like that), upscale it back to 512x512, and then use the result as the control image. This downscale/upscale round trip makes the control image blurrier, and how blurry depends on the resampler you use. If you want the control image to look like a bunch of tiles, then the resampler is probably Nearest Neighbor based, right?
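To make the round trip I'm describing concrete, here is a minimal pure-Python sketch on a tiny 4x4 "image" stored as nested lists. It only illustrates my guess at the preprocessing, not the confirmed ControlNet Tile training recipe, and the function names (`downscale_nearest`, `upscale_nearest`, `make_control_image`) are hypothetical helpers I made up for this post.

```python
def downscale_nearest(img, factor):
    """Nearest-neighbor downscale: keep every `factor`-th pixel in each direction."""
    return [row[::factor] for row in img[::factor]]


def upscale_nearest(img, factor):
    """Nearest-neighbor upscale: repeat each pixel `factor` times in each direction."""
    out = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        # Append `factor` independent copies of the widened row.
        out.extend(list(wide) for _ in range(factor))
    return out


def make_control_image(img, factor=2):
    """Hypothetical preprocessing: downscale, then upscale back to the original size."""
    return upscale_nearest(downscale_nearest(img, factor), factor)


# A 4x4 stand-in for the 512x512 original (values are just pixel labels).
original = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

control = make_control_image(original, factor=2)
for row in control:
    print(row)
# [1, 1, 3, 3]
# [1, 1, 3, 3]
# [9, 9, 11, 11]
# [9, 9, 11, 11]
```

The output has the same size as the original, but each 2x2 block is filled with a single value, which is the "bunch of tiles" look I mean. A bilinear or bicubic resampler would produce a smooth blur instead of hard-edged blocks.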
At inference, to guarantee that it still works, you probably need to apply the same downscaling and upscaling process. But maybe the model generalizes well enough that you can feed the unprocessed 512x512 image directly and it will generate a sharper version of that image. It would be as if the model has learned to make an image sharper given any control image (even one that is already sharp).
But I doubt it can work without the downscaling/upscaling process, because the model has never seen such a sharp control image in the training set. Does this work in practice?
Please correct me if I'm wrong on any points.
Thanks!