Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Multiple Primary MC-Cluster Associations #1396

Merged
merged 53 commits into from
Sep 17, 2024

Conversation

ruse-traveler
Copy link
Contributor

@ruse-traveler ruse-traveler commented Apr 23, 2024

Briefly, what does this PR introduce?

This PR adjusts the logic used in establishing the truth-cluster associations in CalorimeterClusterRecoCoG. Previously, the truth in the truth-cluster associations was defined to be the MCParticle of the 1st contribution stored in the list of contributions to the SimCalorimeterHit corresponding to the highest energy hit in the cluster. This PR changes the logic such that now

  1. Multiple MC-Cluster associations are created,
  2. Associations link back to primary (generator status == 1) MC particles, and
  3. Each association carries a weight of E_{contributed} / E_{cluster}.

This PR addresses #1475 and partially addresses #898. The generator status == 1 condition should be revisited in a subsequent PR: this definition of primary may miss cases, particularly for longer lived neutral resonances such as the K^{0}_{L}, which would be useful information to retain.

What kind of change does this PR introduce?a

Please check if this PR fulfills the following:

  • Tests for the changes have been added
  • Documentation has been added / updated
  • Changes have been communicated to collaborators

Does this PR introduce breaking changes? What changes might users need to make to their code?

While this PR doesn't introduce breaking changes, it does introduce substantial changes that users might need to account for in downstream analysis code (see below).

Does this PR change default behavior?

Yes: MC-Cluster associations (edm4eic::MCRecoClusterParticleAssociations) will now contain multiple associations per clusters, and will link back to primary (status == 1) MC particles.

@github-actions github-actions bot added the topic: calorimetry relates to calorimetry label Apr 23, 2024
@ruse-traveler
Copy link
Contributor Author

@wdconinc I remember you pointing out that this could be an issue for systems with large numbers of contributions (specifically the LFHCal) in the chat today. I'll check to make sure this doesn't cause a drastic slowdown...

@veprbl
Copy link
Member

veprbl commented Apr 24, 2024

I don't think this is correct. If you have a low-energy electron and a high-energy pion in ECal, your pion can leave a MIP response, while electron will actually contribute most of the energy in the cluster. The code tries to answer the question: which particle is measured by the cluster?

@ruse-traveler
Copy link
Contributor Author

I don't think this is correct. If you have a low-energy electron and a high-energy pion in ECal, your pion can leave a MIP response, while electron will actually contribute most of the energy in the cluster. The code tries to answer the question: which particle is measured by the cluster?

Ah! I see what you're saying. I was thinking about this in a very "HCal" sort of way... So maybe rather than selecting the highest energy particle, I select the one associated with the largest contribution? That would dovetail nicely with what Wouter suggested above!

@veprbl
Copy link
Member

veprbl commented Apr 26, 2024

I think the current implication is that the highest energy contribution is at the first index. We should, of course, check that. My personal suspicion is that we might need to calculate total primary particles contributions including their downstream contributions in the shower, and then we pick the maximal one. Basically, I think we need to traverse the tree to the nodes with status=1.

@veprbl
Copy link
Member

veprbl commented Apr 26, 2024

And yes, it kind of made sense for HCals, I agree.

@ruse-traveler
Copy link
Contributor Author

I think the current implication is that the highest energy contribution is at the first index. We should, of course, check that. My personal suspicion is that we might need to calculate total primary particles contributions including their downstream contributions in the shower, and then we pick the maximal one. Basically, I think we need to traverse the tree to the nodes with status=1.

That lines up with what I've been seeing: I don't think I've ever seen a 2nd particle be the highest energy (granted this is for single particle events and we're only considering the highest energy hit in a cluster). But I agree: it would be good to confirm this...

On the status=1, I'm curious: what about situations where a shower particle from an ECal (or possibly from a support structure or the beampipe) creates a cluster in an HCal? My concern would be that the HCal isn't necessarily measuring the primary in this case...

@veprbl
Copy link
Member

veprbl commented May 1, 2024

According to capybara, this doesn't do much for any calorimeter. Maybe, B0 is slightly modified, but we don't even know why.

On the status=1, I'm curious: what about situations where a shower particle from an ECal (or possibly from a support structure or the beampipe) creates a cluster in an HCal? My concern would be that the HCal isn't necessarily measuring the primary in this case...

It depends on what the purpose is. We don't build HCals to look good on beam tests, we need to do physics with those. Matching with primaries is a way to study how well we are doing that. Allegedly we don't store MCParticles outside of the tracking volume, so what would it even associate with.

@ruse-traveler
Copy link
Contributor Author

Allegedly we don't store MCParticles outside of the tracking volume, so what would it even associate with.

AH! If that's the case then agreed: we should just stick with primaries.

@ruse-traveler
Copy link
Contributor Author

To see if these changes introduce a dramatic reduction in performance, I tried out these changes on 1000 18x275, Q^2>100 NC DIS events and compared against main run on the same events. The average throughput doesn't seem to be too different (0.201 for this PR vs. 0.196 for main)...
main_nEvt1K.log
pr1396_withChanges_nEvt1K.log

[BTW I ran this on SDCC]

@veprbl
Copy link
Member

veprbl commented May 6, 2024

This is interesting. Looking at capybara reports, I can't quite see a big change for HCals. Has anyone looked at changes for $E/p$ before and after this?

@ruse-traveler
Copy link
Contributor Author

This is interesting. Looking at capybara reports, I can't quite see a big change for HCals. Has anyone looked at changes for E/p before and after this?

I did not! But I can quickly check that...

@ruse-traveler
Copy link
Contributor Author

Hi all, just wanted to check in on this: does the recent round of commits resolve the open issues, or is there more to discuss? Thanks!

@veprbl
Copy link
Member

veprbl commented Sep 13, 2024

@ruse-traveler I'd like to spend some more time checking the actual change. Have you done any benchmarking on your end?

@ruse-traveler
Copy link
Contributor Author

Same situation as #1406: mostly just spot-checks. Like with the that one, I can run some more detailed checks...

@veprbl
Copy link
Member

veprbl commented Sep 17, 2024

I'm experiencing some issues with this while processing DIS CC (simulated from EPIC/EVGEN/DIS/CC/10x100/minQ2=1000/pythia8CCDIS_10x100_minQ2=1000_beamEffects_xAngle=-0.025_hiDiv_1.hepmc3.tree.root).

reco> [EEMC:EcalEndcapNClusters] [debug] Associated cluster #5 to MC Particle #0 (pid = 2212, status = 4, energy = 99.94178715303674) with weight (0.9867191517808613)
reco> [EEMC:EcalEndcapNClusters] [debug] Associated cluster #5 to MC Particle #39 (pid = -211, status = 1, energy = 0.3006170675754705) with weight (0.01328087107687527)
diff --git a/src/algorithms/calorimetry/CalorimeterClusterRecoCoG.cc b/src/algorithms/calorimetry/CalorimeterClusterRecoCoG.cc
index ff75e3aa..2cffdd50 100644
--- a/src/algorithms/calorimetry/CalorimeterClusterRecoCoG.cc
+++ b/src/algorithms/calorimetry/CalorimeterClusterRecoCoG.cc
@@ -419,6 +419,12 @@ edm4hep::MCParticle CalorimeterClusterRecoCoG::get_primary(const edm4hep::CaloHi
   while (primary.parents_size() > 0) {
     primary = primary.getParents(0);
     if (primary.getGeneratorStatus() == 1) break;
+    trace("descending MC Particle #{} (pid = {}, status = {}, energy = {})",
+      primary.getObjectID().index,
+      primary.getPDG(),
+      primary.getGeneratorStatus(),
+      primary.getEnergy()
+    );
   }
   return primary;
 }

yields outputs like

reco> [EEMC:EcalEndcapNClusters] [trace] descending MC Particle #57 (pid = 310, status = 2, energy = 1.527582375052151)
reco> [EEMC:EcalEndcapNClusters] [trace] descending MC Particle #40 (pid = 311, status = 2, energy = 1.5275850283377552)
reco> [EEMC:EcalEndcapNClusters] [trace] descending MC Particle #6 (pid = 2101, status = 71, energy = 39.28357352176002)
reco> [EEMC:EcalEndcapNClusters] [trace] descending MC Particle #2 (pid = 2101, status = 63, energy = 39.28357352176002)
reco> [EEMC:EcalEndcapNClusters] [trace] descending MC Particle #0 (pid = 2212, status = 4, energy = 99.92043455278755)
reco> [EEMC:EcalEndcapNClusters] [trace] Identified primary: id = 0, pid = 2212, total energy = 99.92043455278755, contributed = 0.033173649972013664

Not sure if this is only effect for beam remnants, or status codes are overwritten somehow else (and then it's a bug somewhere). Workaround for this is to stop at the generated event boundary.

Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
11.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

Copy link
Member

@veprbl veprbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested in a private benchmark. Doesn't seem to break it anymore.

@veprbl veprbl enabled auto-merge September 17, 2024 22:00
@veprbl veprbl added this pull request to the merge queue Sep 17, 2024
@veprbl
Copy link
Member

veprbl commented Sep 17, 2024

Another fun plot:
image
(red: main, blue: this commit).

Merged via the queue into main with commit 237af36 Sep 17, 2024
85 of 87 checks passed
@veprbl veprbl deleted the use-highest-energy-contributor-for-cluster-association branch September 17, 2024 22:50
@ruse-traveler
Copy link
Contributor Author

Another fun plot: image (red: main, blue: this commit).

Nice!! Looks very good! Thanks for catching that bug with the event boundary! 🙌
(Also apologies for the delayed response!)

github-merge-queue bot pushed a commit that referenced this pull request Oct 29, 2024
### Briefly, what does this PR introduce?

This PR applies the #1396 treatment to the `ImagingClusterReco`
algorithm. As a reminder:
- Previously, the truth in the truth-cluster associations was defined to
be the MCParticle of the 1st contribution stored in the list of
contributions to the SimCalorimeterHit corresponding to the highest
energy hit in the cluster.
- The logic here is
    1. Multiple MC-Cluster associations are created,
2. Associations link back to primary (generator status == 1) MC
particles, and
3. Each association carries a weight of E_{contributed} / E_{cluster}.

### What kind of change does this PR introduce?
- [x] Bug fix (issues #898, #1475)
- [ ] New feature (issue #__)
- [ ] Documentation update
- [ ] Other: __

### Please check if this PR fulfills the following:
- [ ] Tests for the changes have been added
- [ ] Documentation has been added / updated
- [x] Changes have been communicated to collaborators

### Does this PR introduce breaking changes? What changes might users
need to make to their code?

It doesn't introduce breaking changes, but it does substantially alter
cluster-MC associations in such a way that users may have account for in
downstream analysis code.

### Does this PR change default behavior?

**Yes:** MC-Cluster associations
(edm4eic::MCRecoClusterParticleAssociations) will now contain multiple
associations per clusters, and will link back to primary (status == 1)
MC particles.

---------

Co-authored-by: Dmitry Kalinkin <[email protected]>
Co-authored-by: Chao Peng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: backward topic: barrel topic: calorimetry relates to calorimetry topic: far-backward Reconstruction related to far backward detectors topic: far-forward Far forward reconstruction topic: forward
Projects
None yet
3 participants