Add Parallel Cross Entropy #2017

Contributor

@zhenglongjiepheonix zhenglongjiepheonix commented Sep 6, 2024

What does this PR do?

  • add parallel cross-entropy implementation
  • add parallel axis propagation rules for cross-entropy
  • add parallel cross-entropy replacement logic in pass
  • modify tests to trigger the critical path
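
For readers unfamiliar with the technique, here is a minimal single-process sketch of cross-entropy over logits that are sharded along the vocabulary axis. It is not the code added in this PR: the function names are hypothetical, and the all-reduce collectives that a real tensor-parallel implementation would issue are simulated with plain Python reductions over a list of shards.

```python
import math


def sharded_cross_entropy(logit_shards, target):
    """Cross-entropy for one token whose vocabulary logits are split across shards.

    logit_shards: list of lists; shard r holds the logits of a contiguous vocab range.
    target: global vocabulary index of the correct class.
    Every "all-reduce" below is simulated by a Python reduction over the shards.
    """
    # Step 1: each shard takes its local max; all-reduce(MAX) yields the global max.
    global_max = max(max(shard) for shard in logit_shards)

    # Step 2: each shard sums exp(logit - global_max); all-reduce(SUM) yields the denominator.
    sum_exp = sum(
        sum(math.exp(x - global_max) for x in shard) for shard in logit_shards
    )

    # Step 3: only the shard owning the target contributes its logit; the others add 0.
    target_logit = 0.0
    offset = 0
    for shard in logit_shards:
        if offset <= target < offset + len(shard):
            target_logit += shard[target - offset]
        offset += len(shard)

    # loss = log(sum_j exp(z_j)) - z_target, computed in a numerically stable way.
    return math.log(sum_exp) + global_max - target_logit


def reference_cross_entropy(logits, target):
    """Unsharded cross-entropy, used only to check the sharded version."""
    m = max(logits)
    return math.log(sum(math.exp(x - m) for x in logits)) + m - logits[target]


logits = [2.0, -1.0, 0.5, 3.0, 0.0, 1.5]
shards = [logits[0:3], logits[3:6]]  # two simulated tensor-parallel ranks
sharded = sharded_cross_entropy(shards, target=3)
full = reference_cross_entropy(logits, 3)
```

The point of the sketch is that only three scalars per token (max, sum of exponentials, target logit) need to cross shard boundaries, so the full logits never have to be gathered on one device.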

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zhenglongjiepheonix zhenglongjiepheonix changed the title [WIP]Add Parallel Cross Entropy Add Parallel Cross Entropy Sep 6, 2024
Member

@michaelbenayoun michaelbenayoun left a comment

LGTM!
Is it for the default backend?

@zhenglongjiepheonix
Contributor Author

zhenglongjiepheonix commented Sep 6, 2024

LGTM! Is it for the default backend?

The Nanotron backend will also use it, because there is a label_mask input in Nanotron, which is not compatible with the current transformers implementation, whereas the current sharded cross-entropy is compatible with the transformers implementation. It does not matter anyway, because the cross-entropy layer does not contain anything backend-specific; it does not even contain a single parameter.

@zhenglongjiepheonix
Contributor Author

If everything looks good, can we merge this? I don't have write access to the main branch now :(

@michaelbenayoun michaelbenayoun merged commit bf1befd into huggingface:main Sep 18, 2024
42 of 46 checks passed