JANA based event merger #1592

Open
simonge opened this issue Aug 21, 2024 · 12 comments

Comments

@simonge
Contributor

simonge commented Aug 21, 2024

In order to properly understand the physics performance of the ePIC detector, we need to include hits from background sources in our reconstruction workflow. While the HEPMC_Merger exists, requiring mixed events to be passed through the simulation is far from ideal. Instead, merging events between the simulation and digitization stages allows the same simulated events to be reused in many different studies, such as investigating how reconstruction is affected by luminosity.

Requirements of the merger

  • Source events from multiple files
  • Simultaneously access multiple events from the same file
  • Distribute interaction events through time based on bunch number
  • Distribute beam background events through time based on bunch number, bunch structure and position along the beamline.
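A minimal sketch of the timeline placement described in the last two requirements, under assumptions that are not in the issue: a fixed bunch spacing, a repeating fill pattern standing in for the bunch structure, a Poisson-distributed number of events per filled crossing, and a simple z/c shift for background vertices along the beamline. All names (`TimelineEntry`, `eventTimeOffset`, `buildTimeline`) are hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical names throughout; this is not an existing EICrecon or JANA API.
struct TimelineEntry {
  int bunch;            // index of the bunch crossing within the timeframe
  double timeOffsetNs;  // offset to add to every time field of the injected event
};

constexpr double kSpeedOfLightMmPerNs = 299.792458;

// Crossing time of `bunch`, plus an optional z/c shift for a background vertex at zMm
// along the beamline; `direction` (+1 or -1) is the beam's direction of travel along z.
double eventTimeOffset(int bunch, double bunchSpacingNs, double zMm = 0.0, int direction = 1) {
  assert(direction == 1 || direction == -1);
  return bunch * bunchSpacingNs + direction * zMm / kSpeedOfLightMmPerNs;
}

// For each crossing, skip empty bunches (fillPattern repeats cyclically) and otherwise
// draw a Poisson-distributed number of events with mean meanPerBunch.
std::vector<TimelineEntry> buildTimeline(int nBunches, double bunchSpacingNs, double meanPerBunch,
                                         const std::vector<bool>& fillPattern, unsigned seed) {
  assert(meanPerBunch > 0.0 && !fillPattern.empty());
  std::mt19937 rng(seed);
  std::poisson_distribution<int> nEvents(meanPerBunch);
  std::vector<TimelineEntry> timeline;
  for (int bunch = 0; bunch < nBunches; ++bunch) {
    if (!fillPattern[bunch % fillPattern.size()]) continue;  // empty bunch: no collisions
    const int n = nEvents(rng);
    for (int k = 0; k < n; ++k) {
      timeline.push_back({bunch, eventTimeOffset(bunch, bunchSpacingNs)});
    }
  }
  return timeline;
}
```

Each source (signal, each background type) would get its own timeline with its own rate, so the same simulated files can be reused at different luminosities by changing only `meanPerBunch`.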

Main issues with any approach

  • Keeping track of associations between MCParticles and hits
  • Collections with the same names being read from multiple files

This feels like it should be possible with the event folder, folding lots of physics events into a timeframe but from what I've seen that isn't quite right.

Potential approach (full of pitfalls)

  • Load each source file tree independently
  • In a new tree/frame create new renamed collections for each source which contains a reference to the simulation hit, time offset and event number
  • Pass all of these new collections into the appropriate detector digitization method
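One way to picture the per-source collections in this approach is a hypothetical sketch in which each entry holds only a reference to the original hit, the event's time offset and its event number, and the digitizer applies the offset when it reads the hit. `SimHit`, `SourcedHitRef` and the `<collection>_<tag>` naming are illustrative stand-ins, not EDM4hep or PODIO types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SimHit {          // stand-in for a simulated tracker hit
  double timeNs;
  int cellID;
};

struct SourcedHitRef {   // one entry of a per-source collection
  std::size_t hitIndex;  // reference to the original hit in its source event
  double timeOffsetNs;   // offset assigned to the source event on the timeline
  long eventNumber;      // event number within the source file
};

struct SourcedHitCollection {
  std::string name;      // hypothetical "<collection>_<source tag>" naming
  std::vector<SourcedHitRef> refs;
};

// Build the reference collection for one event taken from one source.
SourcedHitCollection makeSourcedCollection(const std::string& collection, const std::string& sourceTag,
                                           const std::vector<SimHit>& hits,
                                           long eventNumber, double timeOffsetNs) {
  SourcedHitCollection out{collection + "_" + sourceTag, {}};
  out.refs.reserve(hits.size());
  for (std::size_t i = 0; i < hits.size(); ++i) {
    out.refs.push_back({i, timeOffsetNs, eventNumber});
  }
  return out;
}

// A digitizer consuming the references resolves each hit and applies the offset itself.
double effectiveTime(const SourcedHitRef& ref, const std::vector<SimHit>& sourceHits) {
  assert(ref.hitIndex < sourceHits.size());
  return sourceHits[ref.hitIndex].timeNs + ref.timeOffsetNs;
}
```

The original hits are never modified, which is what lets the same simulated sample be reused; the cost is that every consumer must know how to resolve the references.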
@veprbl
Member

veprbl commented Aug 21, 2024

> This feels like it should be possible with the event folder, folding lots of physics events into a timeframe but from what I've seen that isn't quite right.

Can you expand on that?

> In a new tree/frame create new renamed collections for each source which contains a reference to the simulation hit, time offset and event number

There is no need to "rename" collections. We just need to add functionality to the PODIO source so that collections can be given modified tags, rather than simply the original names of the collections in the frames.

@simonge
Contributor Author

simonge commented Aug 21, 2024

An alternative, more end-user-friendly approach is probably to copy everything and do manual bookkeeping of the associations, so that it can all be kept in the same file.

@veprbl
Member

veprbl commented Aug 21, 2024

DDG4 doesn't produce any associations. We only need to copy hits from different collections to a single one, plus modify their timestamps.
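For hits alone, the copy-and-shift operation can be sketched as follows. `SimTrackerHit` here is a plain struct standing in for the EDM4hep type, and `SourceEvent` is a hypothetical pairing of one event's hits with its assigned time offset; relations to MCParticles are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SimTrackerHit {        // illustrative stand-in, not edm4hep::SimTrackerHit
  unsigned long long cellID;
  double time;                // ns
  double eDep;                // GeV
};

struct SourceEvent {          // hypothetical: one input event and its timeline offset
  std::vector<SimTrackerHit> hits;
  double timeOffsetNs;
};

// Copy the hits of all input events into a single collection, shifting each hit's time
// by the offset of the event it came from.
std::vector<SimTrackerHit> mergeHits(const std::vector<SourceEvent>& events) {
  std::size_t total = 0;
  for (const auto& ev : events) total += ev.hits.size();
  std::vector<SimTrackerHit> merged;
  merged.reserve(total);
  for (const auto& ev : events) {
    for (SimTrackerHit hit : ev.hits) {  // copy, then shift the time
      hit.time += ev.timeOffsetNs;
      merged.push_back(hit);
    }
  }
  assert(merged.size() == total);
  return merged;
}
```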

@simonge
Contributor Author

simonge commented Aug 21, 2024

> DDG4 doesn't produce any associations. We only need to copy hits from different collections to a single one, plus modify their timestamps.

Sorry, I meant relations rather than associations: the OneToOneRelations from SimTrackerHits/CaloHitContributions to MCParticles, and the OneToManyRelations between SimCalorimeterHits and CaloHitContributions.
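A simplified model of these relations, assuming each relation is stored as an index into another collection of the same event (podio actually stores object IDs; these structs are stand-ins, not the EDM4hep classes). Walking from a calorimeter hit to the particles that contributed to it shows why copied contributions and particles must keep pointing at each other consistently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct MCParticle { int pdg; double time; };
struct SimTrackerHit { double time; std::size_t particle; };                        // one-to-one -> MCParticle
struct CaloHitContribution { double time; double energy; std::size_t particle; };   // one-to-one -> MCParticle
struct SimCalorimeterHit { double energy; std::vector<std::size_t> contributions; }; // one-to-many -> contributions

struct EventData {
  std::vector<MCParticle> particles;
  std::vector<SimTrackerHit> trackerHits;
  std::vector<CaloHitContribution> contributions;
  std::vector<SimCalorimeterHit> caloHits;
};

// Follow the relations from one calorimeter hit back to the particles that contributed to it.
std::set<std::size_t> particlesForCaloHit(const EventData& ev, std::size_t caloHit) {
  assert(caloHit < ev.caloHits.size());
  std::set<std::size_t> out;
  for (std::size_t c : ev.caloHits[caloHit].contributions) {
    out.insert(ev.contributions.at(c).particle);
  }
  return out;
}
```

If contributions or particles were copied without rewriting these indices, the same walk would land on unrelated objects.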

@simonge
Contributor Author

simonge commented Aug 21, 2024

> > This feels like it should be possible with the event folder, folding lots of physics events into a timeframe but from what I've seen that isn't quite right.
>
> Can you expand on that?

Despite the many presentations on the concept from @nathanwbrei, I haven't been able to follow conceptually what happens to the data when moving between the timeframe/event/sub-event levels. All of the examples start at a higher level and break it into smaller ones, which means they can be processed as a conventional event read, with tags saying which event/sub-event each piece belongs to within a larger collection. I would be more than happy to be shown that my understanding is wrong and have it corrected.

@veprbl
Member

veprbl commented Aug 21, 2024

(Replying to the last comment)

I might be wrong, but the division into timeframe, event and sub-event, while currently hardcoded (I believe some constants are defined), does not imply an actual hierarchy in the number of JEvents being passed. What I mean is that you could "fold" from timeframe to event and produce more events than timeframes, or "unfold" from timeframe to event and produce fewer events than timeframes. You can also do anything in between. Also, I remember @nathanwbrei mentioning that levels could, in principle, be created by users with arbitrary names (we could imagine "signal", "bg1", "bg2" instead), but for now we could (mis)appropriate the existing levels for our needs.

@simonge
Contributor Author

simonge commented Aug 23, 2024

My original impression was that folding/unfolding actually changes the number of JEvents, rather than tagging subsets of the event. Keeping track of relations/associations between levels still sounds tricky.

My understanding is that an object in a podio collection can only ever belong to one collection; if you want to include it in another collection, you need to either use a subset collection or copy it. Creating a subset collection which points to a different JEvent (or to several different JEvents) wouldn't work (as far as I can see, at least, the objectID would not be valid when writing out), while copying it will break any associations to it.

Here you'd want the object to be owned/shared by collections in JEvents on different levels. That way, when saving to an output file with many frames/trees representing different levels, both levels can contain the same objects with their relations intact.

This might, of course, be what's already happening, in which case, fantastic: I'm slowly catching up.
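The subset-collection problem can be illustrated with a hypothetical model loosely based on the idea that an object is identified by a (collection ID, index) pair that is only meaningful within one frame. `ObjectID`, `Frame` and `resolve` here are simplified stand-ins, not podio's classes: the same ID either fails to resolve or silently resolves to the wrong object when looked up in another frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical simplified model, not podio's actual classes.
struct ObjectID {
  std::uint32_t collectionID;  // only unique within one frame
  int index;                   // position inside that collection
};

// A frame owns its collections; here each collection just holds hit times.
struct Frame {
  std::map<std::uint32_t, std::vector<double>> collections;
};

// A subset collection owns nothing: it only stores IDs of objects owned elsewhere.
using SubsetCollection = std::vector<ObjectID>;

// Resolve an ID against a given frame; the same ID means different things in different frames.
std::optional<double> resolve(const Frame& frame, const ObjectID& id) {
  const auto it = frame.collections.find(id.collectionID);
  if (it == frame.collections.end()) return std::nullopt;
  if (id.index < 0 || static_cast<std::size_t>(id.index) >= it->second.size()) return std::nullopt;
  return it->second[static_cast<std::size_t>(id.index)];
}
```

In this model, keeping a subset entry valid across frames would require carrying the frame identity alongside the ID, which is one way of stating why rewriting IDs or copying seems unavoidable.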

@veprbl
Member

veprbl commented Aug 23, 2024

I don't think we can avoid rewriting collection IDs and indices. Objects will have to be copied in any case, as we need to modify the time fields in most of them.
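A sketch of what rewriting indices means when copying one event after another into a single set of collections, assuming relations are stored as plain indices (a stand-in for podio object IDs). Every index stored in the appended event is shifted by the number of particles already present, and every time field is shifted by the event's offset. The structs and `appendEvent` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative stand-ins: relations are stored as indices into the same event's collections.
struct MCParticle {
  double time;
  std::vector<std::size_t> parents;
  std::vector<std::size_t> daughters;
};
struct SimTrackerHit {
  double time;
  std::size_t particle;  // index into Event::particles
};
struct Event {
  std::vector<MCParticle> particles;
  std::vector<SimTrackerHit> hits;
};

// Append a copy of `src` to `dst`: shift every time field and rewrite every stored index
// by the number of particles already in `dst`, so the relations stay valid after merging.
void appendEvent(Event& dst, const Event& src, double timeOffsetNs) {
  const std::size_t particleBase = dst.particles.size();
  for (MCParticle p : src.particles) {
    p.time += timeOffsetNs;
    for (auto& i : p.parents) i += particleBase;
    for (auto& i : p.daughters) i += particleBase;
    dst.particles.push_back(std::move(p));
  }
  for (SimTrackerHit h : src.hits) {
    assert(h.particle < src.particles.size());
    h.time += timeOffsetNs;
    h.particle += particleBase;
    dst.hits.push_back(h);
  }
}
```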

@simonge
Contributor Author

simonge commented Aug 23, 2024

The alternative would be having an event data type with a time offset (along with some tag which says what the event source was) and adding a bunch of association collections between the hits and particles to that.

Will something like that not be needed to trace back through the event levels anyway?
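A hypothetical sketch of such an event data type plus an association table that lets a merged particle be traced back to its source event and original index. `EventSource`, `ParticleAssociation` and `findOrigin` are illustrative names, not existing EDM4eic types; the linear search is chosen for brevity, not performance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct EventSource {          // hypothetical data type proposed in the comment
  std::string tag;            // which sample the event came from, e.g. "signal", "bg1"
  long eventNumber;           // event number in the source file
  double timeOffsetNs;        // where the event was placed on the timeline
};

struct ParticleAssociation {  // merged particle -> original particle in its source event
  std::size_t mergedIndex;
  std::size_t sourceIndex;    // index into the EventSource collection
  std::size_t originalIndex;  // index of the particle in the original event
};

// Trace a merged particle back to its source event; returns nullptr if it has no association.
const ParticleAssociation* findOrigin(const std::vector<ParticleAssociation>& assocs,
                                      std::size_t mergedIndex) {
  for (const auto& a : assocs) {
    if (a.mergedIndex == mergedIndex) return &a;
  }
  return nullptr;
}
```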

@simonge
Contributor Author

simonge commented Aug 26, 2024

Current thoughts on a process which, as far as I know, would work but needs access to some additional JANA features:

For each event source j

  • For each sample, create a timeline into which N_i events will be injected.
  • Use the JANA EventGroup to collect N_i events.
  • For each event i:
  1. Create a new edm4eic::EventSource with a time offset and an identifier; this should have either a one-to-many or a one-to-one relation with the MCParticles.
  2. Copy the MCParticles, offsetting the times by the timeline time and creating an association with the original MCParticle.
  3. Loop over the associations updating the parent/daughter fields of the new MCParticles.
  4. Copy the SimTrackerHits, offsetting the time and updating the relation based on the association.
  5. Copy the CaloHitContributions, offsetting the time and updating the relation based on the association. Create an association to the original CaloHitContribution.
  6. Copy the SimCalorimeterHits, updating their relation based on the contribution associations.
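Steps 1 to 6 can be sketched for a single event as follows, assuming index-based relations in plain structs rather than the EDM4hep/EDM4eic classes and JANA machinery (`EventSource` here is a hypothetical stand-in for the proposed `edm4eic::EventSource`). The vectors `newParticle` and `newContribution` play the role of the associations between originals and copies; collecting the N_i events with a JANA EventGroup is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct MCParticle { double time; std::vector<std::size_t> parents, daughters; };
struct SimTrackerHit { double time; std::size_t particle; };
struct CaloHitContribution { double time; std::size_t particle; };
struct SimCalorimeterHit { double energy; std::vector<std::size_t> contributions; };
struct EventSource { std::string tag; long eventNumber; double timeOffsetNs; };

struct Event {
  std::vector<MCParticle> particles;
  std::vector<SimTrackerHit> trackerHits;
  std::vector<CaloHitContribution> contributions;
  std::vector<SimCalorimeterHit> caloHits;
};

struct Merged : Event {
  std::vector<EventSource> sources;
  std::vector<std::size_t> particleSource;      // particle -> EventSource (one-to-many)
  std::vector<std::size_t> particleOrigin;      // copied particle -> original index
  std::vector<std::size_t> contributionOrigin;  // copied contribution -> original index
};

void injectEvent(Merged& out, const Event& in, const std::string& tag, long eventNumber, double t0) {
  // 1. Record where this event came from and where it sits on the timeline.
  const std::size_t sourceIndex = out.sources.size();
  out.sources.push_back({tag, eventNumber, t0});
  // 2. Copy MCParticles with shifted times; newParticle[i] is the copy of original particle i.
  std::vector<std::size_t> newParticle(in.particles.size());
  for (std::size_t i = 0; i < in.particles.size(); ++i) {
    newParticle[i] = out.particles.size();
    MCParticle p = in.particles[i];
    p.time += t0;
    out.particles.push_back(p);
    out.particleSource.push_back(sourceIndex);
    out.particleOrigin.push_back(i);
  }
  // 3. Rewrite parent/daughter links of the copies using the association.
  for (std::size_t i = 0; i < in.particles.size(); ++i) {
    MCParticle& p = out.particles[newParticle[i]];
    for (auto& j : p.parents) j = newParticle.at(j);
    for (auto& j : p.daughters) j = newParticle.at(j);
  }
  // 4. Copy SimTrackerHits, shifting time and re-pointing the particle relation.
  for (SimTrackerHit h : in.trackerHits) {
    h.time += t0;
    h.particle = newParticle.at(h.particle);
    out.trackerHits.push_back(h);
  }
  // 5. Copy CaloHitContributions the same way, remembering copy -> original.
  std::vector<std::size_t> newContribution(in.contributions.size());
  for (std::size_t i = 0; i < in.contributions.size(); ++i) {
    newContribution[i] = out.contributions.size();
    CaloHitContribution c = in.contributions[i];
    c.time += t0;
    c.particle = newParticle.at(c.particle);
    out.contributions.push_back(c);
    out.contributionOrigin.push_back(i);
  }
  // 6. Copy SimCalorimeterHits (no time field in this model), re-pointing their contributions.
  for (SimCalorimeterHit hit : in.caloHits) {
    for (auto& c : hit.contributions) c = newContribution.at(c);
    out.caloHits.push_back(hit);
  }
  assert(out.particles.size() == out.particleOrigin.size());
}
```

Injecting the same input event twice with different offsets, as in a pile-up study, yields two independent copies whose relations point only within their own copy.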

Bringing the event sources together is a bit simpler and could probably be handled in a number of different ways, depending on whether it is attached to digitization, to reconstruction, or runs stand-alone. Probably the easiest would be a stand-alone merger (or at least a separate JANA plugin, if that's how they're meant to work), where each source has four collections with a source tag, and the originally named collections are just subset collections merging those.

Merging of the metadata/checking for conflicts is another kettle of fish.
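The merging step could look roughly like this sketch, in which each source owns its tagged collections and the originally named collection is only a non-owning view over all of them. The `<name>_<tag>` key scheme, `HitRef` and `buildMergedView` are assumptions for illustration, and only one of the four collections is shown; metadata merging is not addressed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct SimTrackerHit { double time; int cellID; };  // illustrative stand-in

// Per-source owning collections, keyed by "<name>_<tag>" (naming scheme is an assumption).
using OwningCollections = std::map<std::string, std::vector<SimTrackerHit>>;

struct HitRef { std::string collection; std::size_t index; };  // non-owning, like a subset entry

// Build the subset view that downstream code would see under the original collection name.
std::vector<HitRef> buildMergedView(const OwningCollections& store, const std::string& name,
                                    const std::vector<std::string>& tags) {
  std::vector<HitRef> view;
  for (const auto& tag : tags) {
    const std::string key = name + "_" + tag;
    const auto it = store.find(key);
    if (it == store.end()) continue;  // a source may not provide this collection
    for (std::size_t i = 0; i < it->second.size(); ++i) view.push_back({key, i});
  }
  return view;
}

// Resolve a view entry back to the hit owned by its source collection.
const SimTrackerHit& resolve(const OwningCollections& store, const HitRef& ref) {
  const auto& coll = store.at(ref.collection);
  assert(ref.index < coll.size());
  return coll[ref.index];
}
```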

@simonge
Contributor Author

simonge commented Oct 2, 2024

@nathanwbrei, would you be able to comment on this and point me to how to get started?

@nathanwbrei
Contributor

@simonge Let's schedule a call!

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants