Why does the Koniq10k dataloader resize to (224, 224) and then apply a transform with a random crop? #34
Comments
We select a vision transformer as our feature extractor, which means the input images should be resized to a fixed size (224×224).
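For context on why a plain ViT wants a fixed input size: its learned positional embeddings are sized for one specific patch-token count, so changing the resolution changes the token count and breaks the embedding table. A minimal sketch of that arithmetic (the function name is hypothetical, not from MANIQA):

```python
def num_vit_tokens(image_size: int, patch_size: int = 16) -> int:
    """Number of patch tokens a plain ViT produces (excluding the CLS token)."""
    assert image_size % patch_size == 0, "input must be divisible by the patch size"
    n = image_size // patch_size
    return n * n

# A 224x224 input with 16x16 patches yields 14x14 = 196 tokens; a 384x384
# input would yield 24x24 = 576 tokens, so the 196-entry positional embedding
# table could not be used without interpolating it.
```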
Hi @Stephen0808! Also, it means all Koniq10k images, which are initially full resolution, are resized to (224, 224). We lose the information from the full-resolution image quality. (Usually IQA transformers try to avoid resizing and leverage the transformer architecture to accept different input sizes.) What do you think?
As mentioned in your question, we crop several images (224×224) for inference and average the scores to get the final score.
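The multi-crop inference described above can be sketched roughly as follows. This is a hedged sketch, not MANIQA's actual inference code: `predict_multicrop` and its parameters are hypothetical names, and `model` stands in for any callable that maps one crop to a scalar quality score.

```python
import numpy as np

def predict_multicrop(model, image, num_crops=20, crop_size=224, rng=None):
    """Average a quality model's score over random crops of a full-resolution image.

    image: array of shape (H, W, C); model: callable mapping one crop to a scalar.
    (Hypothetical sketch; the real pipeline works on torch tensors.)
    """
    rng = rng or np.random.default_rng(0)
    h, w = image.shape[:2]
    scores = []
    for _ in range(num_crops):
        # sample a random top-left corner so the crop stays inside the image
        top = rng.integers(0, h - crop_size + 1)
        left = rng.integers(0, w - crop_size + 1)
        crop = image[top:top + crop_size, left:left + crop_size]
        scores.append(float(model(crop)))
    # final score is the mean over all random crops
    return sum(scores) / num_crops
```

Averaging over crops lets the fixed-size ViT see several regions of the full-resolution image, which partly answers the concern about losing information to resizing.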
I mean, the question is: why not use the same process for training and inference?
In both the inference and training phases, we used cropped images.
OK, maybe I misunderstood something.
So I was wondering why, instead of step 1 (resizing), you don't take several crops and then send those crops to the ViT.
Hey!
I see in this line:
MANIQA/train_maniqa.py
Line 248 in b286649
Then you apply a transform function that contains a random crop to size (224, 224).
Unless I'm missing something, why does the original image have to be resized first?
Thanks!
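The resize-then-crop pipeline being questioned above can be sketched like this. This is an illustrative stand-in, not the code from `train_maniqa.py`: the function names are hypothetical, and nearest-neighbor resizing stands in for whatever interpolation the real transform uses.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbor resize to (size, size); stand-in for the real resize op."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows][:, cols]

def train_transform(image, resize_to=224, crop_size=224, rng=None):
    """Resize the full-resolution image, then take a random crop.

    Note: when resize_to == crop_size, the 'random crop' covers the whole
    resized image, which is exactly what the question above is pointing at.
    """
    rng = rng or np.random.default_rng(0)
    resized = resize_nearest(image, resize_to)
    top = rng.integers(0, resize_to - crop_size + 1)
    left = rng.integers(0, resize_to - crop_size + 1)
    return resized[top:top + crop_size, left:left + crop_size]
```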