
How to train SphereFace2 on >1M identities, e.g. WebFace42M? #14

Open
yiminglin-ai opened this issue Jan 31, 2023 · 1 comment
yiminglin-ai commented Jan 31, 2023

Hi @wy1iu @ydwen
Thank you for open-sourcing this repo.
Section 2.4 of the SphereFace2 paper says:

the gradient computations in SphereFace2 are class-independent and can be performed locally within one GPU. Thus no communication cost is needed.

But when you distribute the classifiers $W_i$ across different GPUs, what happens if a GPU receives a batch containing only negative features? The lines after `one_hot` cannot be executed.

Have you tried training on WebFace42M? What is the performance of SphereFace2?

Happy to discuss the implementation details and contribute to this repo :)

ydwen (Owner) commented Mar 9, 2023

It is not a problem for SF2 to receive a batch of only negative features; the final loss is averaged across all GPUs.

"The lines after `one_hot` cannot be executed."
In this case, we construct an all-zero matrix as the labels.
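A minimal NumPy sketch of this idea (function names are hypothetical, and the paper's margins, scale, and class weighting are omitted for brevity): each GPU holds the classes `[class_start, class_end)`, labels outside that shard simply produce zero rows in the local one-hot matrix, and the binary (one-vs-rest) formulation of SphereFace2 keeps the loss well-defined even when the matrix is all zeros.

```python
import numpy as np

def shard_one_hot(labels, class_start, class_end):
    """One-hot targets for the classes [class_start, class_end) held locally.
    Labels outside the shard yield all-zero rows, so an all-negative batch
    produces exactly the zero matrix mentioned above."""
    num_local = class_end - class_start
    one_hot = np.zeros((len(labels), num_local), dtype=np.float32)
    in_shard = (labels >= class_start) & (labels < class_end)
    one_hot[np.where(in_shard)[0], labels[in_shard] - class_start] = 1.0
    return one_hot

def sf2_binary_loss(logits, one_hot):
    """Simplified SphereFace2-style loss: each local class is an independent
    binary problem, so no cross-GPU softmax normalization is needed.
    targets t in {+1, -1}; logaddexp(0, -t*z) = log(1 + exp(-t*z))."""
    t = 2.0 * one_hot - 1.0
    return np.mean(np.logaddexp(0.0, -t * logits))

# A shard holding classes [0, 4) that receives only negatives (labels 5, 7):
labels = np.array([5, 7])
one_hot = shard_one_hot(labels, 0, 4)   # all-zero matrix, shape (2, 4)
loss = sf2_binary_loss(np.zeros((2, 4), dtype=np.float32), one_hot)
```

In a real model-parallel setup the per-shard losses would then be summed or averaged across GPUs with an all-reduce; no per-sample communication is needed because every term is local to one shard.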

I haven't tried WebFace42M yet, since it requires too much computational resource.
I am looking forward to seeing how SF2 works on this dataset.
