For the Wukong dataset, we filtered the first 50 million images using Chinese-CLIP (ViT-B-16 model) and kept only the samples with a visual-textual similarity score greater than 0.475. You will therefore need to pair each caption with its corresponding image by matching on the caption text.
For CC3M, we will try to restore the original caption-image correspondence.
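
For the Wukong case, the filtering and pairing steps described above could be sketched roughly as follows. This is only an illustration, not the code we actually ran: the record fields (`url`, `caption`, `score`) and both function names are hypothetical, and it assumes a similarity score has already been computed for each sample and that captions match exactly after trimming whitespace.

```python
from collections import defaultdict

# Threshold quoted in the reply above.
SIMILARITY_THRESHOLD = 0.475


def filter_by_similarity(samples, threshold=SIMILARITY_THRESHOLD):
    """Keep samples whose precomputed score is strictly greater than threshold.

    Each sample is assumed (hypothetically) to be a dict with a "score" key.
    """
    return [s for s in samples if s["score"] > threshold]


def pair_captions_with_images(captions, source_records):
    """Map each caption to the image URLs whose source caption matches it.

    source_records is assumed to be an iterable of dicts with "url" and
    "caption" keys, e.g. rows read from the original dataset files.
    A caption can map to several URLs if the source contains duplicates,
    and to an empty list if no match is found.
    """
    index = defaultdict(list)
    for record in source_records:
        index[record["caption"].strip()].append(record["url"])
    return {caption: index.get(caption.strip(), []) for caption in captions}
```

Because the same caption can appear on more than one image, the lookup returns a list of candidate URLs rather than a single one; ambiguous matches would still need to be resolved by other means.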