Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GeometrySplit functor isn't threadsafe #1607

Merged
merged 14 commits into from
Sep 16, 2024
Merged

GeometrySplit functor isn't threadsafe #1607

merged 14 commits into from
Sep 16, 2024

Conversation

simonge
Copy link
Contributor

@simonge simonge commented Sep 3, 2024

Briefly, what does this PR introduce?

Fixes issues with the initialization of the GeometrySplit functor across threads.

Currently only the std::once_flag is a shared pointer and none of the other variables which are being initialized. This meant that when copied to other threads the is_init flag was set but the decoder and geometry divisions had not been so the loop over the detector ids was always empty. This resulted in an apparent variable decrease in the efficiency of the Low-Q2 tagger depending on how many threads were asked for.

Unit tests for all of the meta factories and functors should be added as an issue to try head this off in the future but outwith this fix.

What kind of change does this PR introduce?

  • Bug fix (issue #__)
  • New feature (issue #__)
  • Documentation update
  • Other: __

Please check if this PR fulfills the following:

  • Tests for the changes have been added
  • Documentation has been added / updated
  • Changes have been communicated to collaborators

Does this PR introduce breaking changes? What changes might users need to make to their code?

No

Does this PR change default behavior?

All of the Low-Q2 Tagger hits will be converted into clusters rather than just those run on the first thread to call the functor.

@simonge simonge requested a review from wdconinc September 3, 2024 17:29
Copy link
Member

@veprbl veprbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info might be a bit too loud

src/factories/meta/CollectionCollector_factory.h Outdated Show resolved Hide resolved
src/factories/meta/SubDivideCollection_factory.h Outdated Show resolved Hide resolved
@simonge
Copy link
Contributor Author

simonge commented Sep 3, 2024

There was an alternative solution, overwriting the copy constructors for the class so that each copy has a new std::once_flag and does it's own initialization on the first call. Making everything needing initialized shared pointers feels cleaner.

@simonge
Copy link
Contributor Author

simonge commented Sep 3, 2024

Are any of the CI tests run on multiple threads which would be possible to compare the output between a single and multi-threaded run?

@simonge
Copy link
Contributor Author

simonge commented Sep 3, 2024

Are any of the CI tests run on multiple threads which would be possible to compare the output between a single and multi-threaded run?

Looks like this was being partially approached in #975. For cases like this having at least one job still run on a single core to compare the output would be useful.

Checking the default flags it appears nthreads should be 1 if not specified, however the behaviour change seen here was only identified by explicitly adding the -Pnthreads=1 flag. Doesn't sound like that should have made a difference, any idea why it did?

@wdconinc
Copy link
Contributor

wdconinc commented Sep 9, 2024

Checking the default flags it appears nthreads should be 1 if not specified, however the behaviour change seen here was only identified by explicitly adding the -Pnthreads=1 flag. Doesn't sound like that should have made a difference, any idea why it did?

@nathanwbrei may be the best person to address that.

@simonge
Copy link
Contributor Author

simonge commented Sep 9, 2024

Please could this be reviewed independently of the issues around running/testing with different numbers of threads, I can flag that as separate issue.

@veprbl
Copy link
Member

veprbl commented Sep 10, 2024

Do we understand why these are copied in each thread in the first place?

@simonge
Copy link
Contributor Author

simonge commented Sep 11, 2024

Do we understand why these are copied in each thread in the first place?

I'm not familiar enough with JANA2 resource management to comment but would be keen to hear an explanation.

Copy link

@simonge
Copy link
Contributor Author

simonge commented Sep 16, 2024

If possible could this be reviewed and merged before tagging for September, the Low-Q2 Tagger reconstruction is (variably) broken without it. The questions raised around the threads and JANA will still need addressing but can be answered later.

Copy link
Member

@veprbl veprbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea, but not the details. Fine to merge IMO.

@simonge simonge added this pull request to the merge queue Sep 16, 2024
Merged via the queue into main with commit a894644 Sep 16, 2024
86 of 87 checks passed
@simonge simonge deleted the Debug-meta branch September 16, 2024 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants