the dataset source #8
Comments
Thanks for your interest. The current open-source dataset was collected from YouTube independently by our team; the other part of our training data, which cannot be released publicly, is sourced from Open-Sora Plan. The CelebV-Text dataset seems to have the following problems, so we did not use it:
The dataset uploaded in the preprint version appears to be incomplete. While the dataset contains 54,239 videos, masks are only available for 31,947 of them. @SHYuanBest
We only used 31,947 of them.
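For readers reproducing this filtering step, here is a minimal sketch of keeping only the videos that have a mask. The function names and the idea of matching videos to masks by a shared ID string are assumptions for illustration; the thread does not describe the actual file layout.

```python
from typing import Iterable, List


def videos_with_masks(video_ids: Iterable[str], mask_ids: Iterable[str]) -> List[str]:
    """Keep only the videos that have a mask, preserving the input order."""
    available = set(mask_ids)
    return [vid for vid in video_ids if vid in available]


def mask_coverage(num_videos: int, num_masks: int) -> float:
    """Fraction of videos for which a mask is available."""
    return num_masks / num_videos


# Counts quoted in this thread: 54,239 videos, 31,947 with masks.
coverage = mask_coverage(54_239, 31_947)
print(f"{coverage:.1%} of the videos have masks")
```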
Thank you for your reply. Could you clarify whether you use a face mask or a head mask to obscure unrelated areas during training?
We use the face mask first, and fall back to the head mask if the face mask is missing.
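The fallback rule described in this reply could be sketched roughly as follows. `select_mask` is a hypothetical helper, and representing a missing mask as `None` is an assumption; the project's actual training code may handle this differently.

```python
from typing import Optional, TypeVar

M = TypeVar("M")  # any mask representation, e.g. an array or a file path


def select_mask(face_mask: Optional[M], head_mask: Optional[M]) -> Optional[M]:
    """Prefer the face mask; use the head mask only when the face mask is missing."""
    if face_mask is not None:
        return face_mask
    # Assumption: a missing mask is represented as None.
    return head_mask
```

If both masks are missing, this sketch returns `None`, leaving it to the caller to decide whether to skip the sample.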
Thank you again for your reply. I am still unsure about one part of the process: do you first concatenate the VAE features of the keypoint maps and the VAE features of the facial images along the frame dimension, and then concatenate these with the noise video along the channel dimension to reach a total of 32 channels?
Yes.
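To make the two concatenation steps concrete, here is a small shape-only sketch in plain Python. The `(batch, frames, channels, height, width)` layout, the 16 channels per latent, and the frame counts are all assumptions chosen so that the total comes to the 32 channels confirmed above; the helper names are hypothetical and this is not the project's code.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]

# Assumed latent layout: (batch, frames, channels, height, width).
FRAME_AXIS, CHANNEL_AXIS = 1, 2


def concat_shape(shapes: Sequence[Shape], axis: int) -> Shape:
    """Shape obtained by concatenating arrays with the given shapes along `axis`."""
    first = shapes[0]
    for shape in shapes[1:]:
        if len(shape) != len(first):
            raise ValueError("all inputs must have the same number of dimensions")
        for i, (a, b) in enumerate(zip(first, shape)):
            if i != axis and a != b:
                raise ValueError(f"size mismatch on axis {i}: {a} vs {b}")
    out = list(first)
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)


def conditioned_input_shape(keypoint_latent: Shape, face_latent: Shape, noisy_video: Shape) -> Shape:
    """Step 1: join the condition latents along frames. Step 2: join with the noise along channels."""
    condition = concat_shape([keypoint_latent, face_latent], FRAME_AXIS)
    return concat_shape([noisy_video, condition], CHANNEL_AXIS)


# Hypothetical sizes: 3 keypoint frames + 10 face frames match 13 noisy frames.
result = conditioned_input_shape(
    keypoint_latent=(1, 3, 16, 8, 8),
    face_latent=(1, 10, 16, 8, 8),
    noisy_video=(1, 13, 16, 8, 8),
)
print(result)
```

The channel-wise step only works when the combined condition latents have the same number of frames as the noisy video, which is why the sketch raises an error on a mismatch.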
Thanks.
Hi. Could you provide further clarification regarding the preference for "half-body or full-body images" as mentioned in the Hugging Face Space documentation? From my understanding, the system uses FaceXLib to detect and crop faces, which should handle face images, half-body images, and full-body images alike. However, your statement emphasizes half-body and full-body images without explicitly mentioning face images. Could you explain how this preference affects the processing of face images, and why half-body and full-body shots are emphasized?
If the input is only a cropped face, FaceXLib is likely to fail to detect it. Using half-body or full-body images significantly reduces this likelihood.
Got it. Thank you very much for your prompt response.
Oh, I missed some important details. The 31,947 are
Thank you for your contribution to open-source datasets. I would like to confirm whether these video resources were collected independently by your team, or whether they are based on public datasets such as CelebV-Text that you have further processed.