Datasets

We provide links to download our preprocessed datasets.

Vision
- Dataset for ImageNet-1k
- Dataset for ImageNet-21k
- Dataset for MSCOCO
- Dataset for COCO-Stuff
- Dataset for ADE20K
- Dataset for Kinetics-400

Audio
- Dataset for ESC-50
- Dataset for AudioCaps
- Dataset for Clotho
- Dataset for MACS
- Dataset for AVQA
- Dataset for VGGSound
- Dataset for FSD50K

Vision-Language
- Dataset for MSCOCO
- Dataset for Flickr30k
- Dataset for NLVR2
- Dataset for RefCOCO
- Dataset for RefCOCO+
- Dataset for RefCOCOg
- Dataset for VQAv2