Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing support for variable binning with event classes (species) #835

Draft
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

marialiubarska
Copy link
Contributor

@marialiubarska marialiubarska commented Nov 10, 2024

!!!DO NOT MERGE UNTIL APPROVED BY REQUESTED REVIEWERS!!!

Summary of the changes:

  1. Introduced a new binning class, VarMultiDimBinning, which holds a list of EventSpecies
  2. each EventSpecie is defined by name, binning and selection string (can also be None)
  3. Added support for VarMultiDimBinning into Map, so can support arithmetic operations, hashing, eq/neq, metric calculation
  4. Limited support in Container, no translation from/to VarMultiDimBinning shape (planning to add translation VarMultiDimBinning <-> 'events', translation to/from MultidimBinning isn't necessary imho, but this is up for debate)
  5. Modified utils.hist stage by adding handling of VarMultiDimBinning. This is where sample is divided based on selection. For now I used if/else statements, but it leads to some code duplication, so I'm open for suggestion (maybe we just make two hist stages, with and without VarMultiDimBinning support?)

Main idea is that analysis functionality should be minimally affected (I think only get_detailed_metric_info should really be affected), this is the reason for introducing this at binning level. However I haven't tested all functionality yet

I have tested running Pipeline with identical 4d binning, one using MultiDimBinning and one with VarMultiDimBinning containing a single EventSpecie with the same 4d binning and selection=None. Resulting histograms are identical. You can see results of the timed test below:
image

Main TODOs so far:

  1. test with a full analysis to make sure nothing breaks
  2. in hist stage add a check to make sure selected event species don't overlap - we don't want double counting
  3. add example notebook and missing docstrings
  4. move selection ,ask calculations to setup_function

Let me know if you have any suggestions!

@JanWeldert
Copy link
Contributor

JanWeldert commented Nov 19, 2024

Thanks for implementing this. I will give it a more detailed look later, but have one first comment. I think we need a translation from each binning mode (events, multi dim, and var binning) to each other because we want to be able to run the stage before the hist stage in event as well as binned mode. We can always translate to event mode first and then change the binning mode though

@thehrh
Copy link
Contributor

thehrh commented Dec 4, 2024

In general, I'm a bit worried about adding another "events" class (here EventSpecie, whose main reason for existence seems to be to wrap a MultiDimBinning instance), cf core.events and core.events_pi. For example, EventsPi hold metadata beyond a simple selection string.

Does one actually need an EventSpecie class? Why not directly store a list of MultiDimBinnings, metadata, event names etc. in VarMultiDimBinning, instead of taking the detour via EventSpecie?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants