Feature/sequence bucketing #610

Merged


ArturoLlorente
Collaborator

Implemented sequence bucketing as a collate function (`collator_sequence_bucketing`) used by the `DataLoader`. The function sorts the graphs in each batch by sequence length (number of pulses) and splits them into mini-batches, and the forward pass is then performed separately on each mini-batch.

The split points between mini-batches can be customized. By default, the sorted batch is split into two mini-batches covering 0-80% and 80-100% of its events.
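Below is a minimal, library-agnostic sketch of the sort-then-split step. It assumes each sample exposes its length through a caller-supplied `length_fn`. The factory name `make_bucketing_collate` and its parameters are hypothetical illustrations and do not reproduce the signature of `collator_sequence_bucketing` in GraphNeT. With PyTorch Geometric, one would presumably pass `length_fn=lambda g: g.num_nodes` and `to_batch=Batch.from_data_list`. The training step then has to run the forward pass on each mini-batch in the returned list.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def make_bucketing_collate(
    length_fn: Callable[[T], int],
    batch_splits: Sequence[float] = (0.8,),
    to_batch: Callable[[List[T]], object] = list,
) -> Callable[[List[T]], List[object]]:
    """Hypothetical helper: build a collate function that sorts a batch by
    sequence length and splits it at fractional boundaries (default 0-80%, 80-100%).
    """
    bounds = [0.0, *batch_splits, 1.0]
    # Boundaries must be non-decreasing; since 0.0 and 1.0 are included,
    # this also forces every split to lie in [0, 1].
    if any(b0 > b1 for b0, b1 in zip(bounds, bounds[1:])):
        raise ValueError("batch_splits must be increasing values in [0, 1]")

    def collate(samples: List[T]) -> List[object]:
        # Sort shortest-first so each mini-batch holds events of similar length.
        ordered = sorted(samples, key=length_fn)
        n = len(ordered)
        mini_batches = []
        for lo, hi in zip(bounds, bounds[1:]):
            chunk = ordered[int(lo * n):int(hi * n)]
            if chunk:  # skip empty buckets, e.g. for very small batches
                mini_batches.append(to_batch(chunk))
        return mini_batches

    return collate


# Toy "events": lists of pulses with hypothetical lengths.
events = [[0] * k for k in (5, 120, 7, 3, 60, 9, 4, 8, 6, 200)]
collate = make_bucketing_collate(length_fn=len)
print([[len(e) for e in b] for b in collate(events)])
# [[3, 4, 5, 6, 7, 8, 9, 60], [120, 200]]
```

If the returned function is passed as `collate_fn` to `torch.utils.data.DataLoader`, each iteration yields a list of mini-batches instead of a single batch. Splitting on fractions of the sorted batch, rather than on fixed length thresholds, keeps the number of mini-batches per step constant. The trade-off is that the bucket boundaries vary from batch to batch.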

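This method is effective with Transformers, whose computational cost grows with the square of the sequence length. Because each mini-batch is padded only to its own longest sequence, the many short events are no longer padded to the length of the few longest ones. The improvement is larger when the maximum number of pulses per event is higher.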
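The following is a back-of-the-envelope check of the quadratic-cost argument. It uses a rough proxy of `B * L_max**2` attention-score entries per padded batch together with hypothetical pulse counts. The proxy ignores masking, the non-attention layers and the overhead of the extra forward passes. It therefore illustrates the trend and does not predict a measured speedup.

```python
from typing import Sequence


def padded_attention_cost(lengths: Sequence[int]) -> int:
    # Rough proxy for dense self-attention on a padded batch:
    # B sequences padded to L_max give about B * L_max**2 score entries.
    return len(lengths) * max(lengths) ** 2 if lengths else 0


def bucketed_cost(lengths: Sequence[int], split: float = 0.8) -> int:
    # Sort, split at the given fraction, and pad each mini-batch separately.
    ordered = sorted(lengths)
    k = int(split * len(ordered))
    return padded_attention_cost(ordered[:k]) + padded_attention_cost(ordered[k:])


lengths = [5, 120, 7, 3, 60, 9, 4, 8, 6, 200]  # hypothetical pulse counts
print(padded_attention_cost(lengths))  # 400000
print(bucketed_cost(lengths))          # 108800
```

In this example, the two longest events set the padding length for the whole batch. Isolating them cuts the proxy cost by roughly a factor of 3.7, and the gap widens as the longest events grow relative to the rest.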

@RasmusOrsoe
Collaborator

@ArturoLlorente The IceTray disk space error is now fixed in `main`. Please update the branch and check whether this fixes the error.

Collaborator

RasmusOrsoe left a comment

Looks great @ArturoLlorente

ArturoLlorente merged commit 02902fa into graphnet-team:main Oct 13, 2023
RasmusOrsoe pushed a commit to RasmusOrsoe/graphnet that referenced this pull request Oct 25, 2023
…ence_bucketing

Feature/sequence bucketing