how to link the image id in DATA JSON to the image in IMAGE URLS for WuKong #11

Open
ghost opened this issue Jul 18, 2023 · 3 comments

Comments

@ghost

ghost commented Jul 18, 2023

No description provided.

@ghost ghost changed the title how to link the image id how to link the image id in DATA JSON to the image in IMAGE URLS for WuKong Jul 18, 2023
@ghost
Author

ghost commented Jul 18, 2023

Hi, I have downloaded the Wukong data from the URL provided in https://github.com/phellonchen/X-LLM/blob/main/README_DATA.md. The order of samples in the CSV files is not consistent with the image IDs/names in the JSON file, so how can I link the original image URLs to the filtered image names?
@MingLunHan @phellonchen

@rumusan

rumusan commented Jul 18, 2023

Same question for CC3M.

@phellonchen
Owner

For the Wukong dataset, we filtered the first 50 million images using Chinese-CLIP (ViT-B-16 model) and kept only samples with a visual-textual similarity score greater than 0.475. So you will need to pair each filtered sample with its original image URL by matching on the caption text.

For CC3M, we will try to restore their original correspondence.
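
A minimal sketch of the caption-based pairing suggested above for Wukong, not the maintainers' actual script. The CSV column names (`url`, `caption`) and the JSON keys (`image_id`, `caption`) are assumptions for illustration; check them against the files you downloaded. The demo uses short English placeholder captions in place of the real Chinese ones.

```python
import csv
import io
from collections import defaultdict


def build_caption_index(csv_file, url_col="url", caption_col="caption"):
    """Map each caption (whitespace-stripped) to every URL that carries it.

    Column names are assumed, not taken from the Wukong release.
    """
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        index[row[caption_col].strip()].append(row[url_col])
    return index


def link_samples(samples, index, id_key="image_id", caption_key="caption"):
    """Split filtered samples into matched, ambiguous, and missing.

    matched:   image id -> single URL (caption is unique in the CSV)
    ambiguous: image id -> list of candidate URLs (caption is duplicated)
    missing:   image ids whose caption was not found in the CSV
    """
    matched, ambiguous, missing = {}, {}, []
    for sample in samples:
        urls = index.get(sample[caption_key].strip(), [])
        if len(urls) == 1:
            matched[sample[id_key]] = urls[0]
        elif urls:
            ambiguous[sample[id_key]] = list(urls)
        else:
            missing.append(sample[id_key])
    return matched, ambiguous, missing


# In-memory stand-ins for one CSV shard and the filtered JSON entries.
demo_csv = io.StringIO(
    "url,caption\n"
    "http://example.com/a.jpg,a cat\n"
    "http://example.com/b.jpg,a dog\n"
    "http://example.com/c.jpg,a dog\n"
)
demo_samples = [
    {"image_id": "000001", "caption": "a cat"},
    {"image_id": "000002", "caption": "a dog"},
    {"image_id": "000003", "caption": "a hill"},
]

demo_index = build_caption_index(demo_csv)
matched, ambiguous, missing = link_samples(demo_samples, demo_index)
```

For real data, you would open each CSV shard with `encoding="utf-8"` and load the samples with `json.load`. Captions that appear more than once in the CSV cannot be resolved by text alone, so the sketch reports them separately instead of guessing.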
