P_T reordering for JES/R and HEM modules #57

Open
lcorcodilos opened this issue Apr 2, 2021 · 0 comments
Labels
enhancement New feature or request

When applying any new JEC or variations on the jet momentum, the jets may no longer be ordered in pt. Any HEM issue module will have the same effect. It's currently the responsibility of the user to account for this (if at all).

Accounting for the re-ordering for the HEM issue is not so difficult because of the one-time nature of the module. However, doing the variations on the jet pt is frankly a big headache because each variation in the pt is a new re-ordering (meaning a new collection needs to be created for each variation).

Without TIMBER

There are a few ways around this that do not involve TIMBER automation. Their implications vary depending on the analysis.

  1. Just ignore the pt being out of order. If the analysis doesn't make any decisions on pt ordering, there's no need to do anything. For this reason, re-ordering should always be optional and not forced (from the TIMBER development perspective).
  2. Process the variations in pt one at a time with either (a) or (b) below. Will be less efficient but that may not matter if one is running on condor and the number of jobs is already low.
  3. Process the variations in parallel, tracking each variation of ordering manually but building the dataframe actions concurrently. This is more computationally efficient but requires lots of tracking and is prone to error (unless there's a good TIMBER idea... see below).

For (2) and (3), there are two sub-options.

a. If the analysis is only using 1 or 2 jets, one can simply search for the highest pt values in the new pt vector, extract the indices as new column variables, and use these variables everywhere in place of 0 and 1.

b. Use ReorderCollection() [1] to re-order the entire collection and use the new collection for access to variables (with 0 or 1)

Option (b) is more computationally expensive and doesn't do much to improve the user interface. Option (a) is a bit more error prone since you're dealing with indices, and an indexing issue can be difficult to debug (or hard to identify in the first place).
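As a rough illustration of the two sub-options, here is a minimal pure-Python sketch for a single event. The jet values and the helper name pt_order are invented for the example and are not part of TIMBER; in a real analysis this logic would live in a column definition rather than in Python lists.

```python
# Hypothetical jet values for a single event (invented for illustration).
pt_varied = [495.0, 505.0, 215.0]   # pt after some variation; no longer pt ordered
mass = [82.0, 171.0, 15.0]          # another variable of the same collection


def pt_order(pt):
    """Return the indices that sort pt from highest to lowest (ties keep input order)."""
    return sorted(range(len(pt)), key=lambda i: -pt[i])


order = pt_order(pt_varied)

# (a) Keep only the indices of the two highest-pt jets and use them in place of 0 and 1.
lead_idx, sublead_idx = order[0], order[1]
lead_mass = mass[lead_idx]

# (b) Re-order every variable of the collection, then index with 0 and 1 as usual.
pt_sorted = [pt_varied[i] for i in order]
mass_sorted = [mass[i] for i in order]
```

Both routes return the same leading-jet mass here. The difference is that (a) carries two integers around, while (b) copies every variable of the collection.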

With TIMBER

Indexing > new collections

Learning from the Without TIMBER section, the optimized option seems to be to develop a new set of indices stored in a separate branch and to direct the user to use these if they want the re-ordered collection. For example, FatJet_pt[0] would become CalibratedFatJet_pt[JES_index[0]] to get the new leading jet (where JES_index is the pt-ordered list of indices for the original collection).

This is lightweight enough that users could choose not to use these values if they don't care about pt ordering (and then there would be no computational penalty, since the column would never be used).
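A minimal sketch of the index-branch idea, assuming one index list per pt variation. The branch names (JES_index_nom and so on), the helper pt_ordered_indices, and all numbers are hypothetical; the point is only that one small list of integers per variation is enough to read any variable of the original collection in the new order.

```python
def pt_ordered_indices(pt):
    """Indices of the original collection, ordered from highest to lowest pt."""
    return sorted(range(len(pt)), key=lambda i: -pt[i])


# Hypothetical calibrated pt values for one event, one list per variation.
calibrated_pt = {
    "nom": [500.0, 490.0, 200.0],
    "up": [510.0, 515.0, 204.0],    # the second jet overtakes the first
    "down": [490.0, 470.0, 196.0],
}
# A variable of the original collection that the variations do not change.
fatjet_msoftdrop = [85.0, 172.0, 12.0]

# One lightweight index "branch" per variation instead of a full new collection.
index_branches = {
    f"JES_index_{name}": pt_ordered_indices(pt) for name, pt in calibrated_pt.items()
}

# Leading-jet mass per variation, read through the index branch.
leading_mass = {
    name: fatjet_msoftdrop[index_branches[f"JES_index_{name}"][0]]
    for name in calibrated_pt
}
```

If an analysis never reads an index list, nothing else depends on it, which matches the point above that unused index columns carry no penalty.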

Simultaneous branch action solutions

As an example, we have something like this...

                  base
                   |
                   1
                   |
                   2
                /  |   \
               /   |    \
              /    |     \
pt          nom   up    down  
variation    |     |      |
             3     3      3
identical    |     |      |
actions      4     4      4
3 & 4

where 1, 2, 3, and 4 are some actions on the dataframe and nom, up, down are the three branches of the processing that change the pt.

Option 1

One solution is to have an AnalyzerGroup() class which parallelizes actions on separate branches of the processing tree. The methods of the AnalyzerGroup are the same as the analyzer but just loop over all analyzer objects in the group to perform the action.

Pros: A new class keeps the logic separable.
Cons: All methods would have to be hard coded or a new generic proxy method would need to be written (making subsequent actions look less like actions on a single analyzer object)
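The "generic proxy method" mentioned in the cons could look roughly like the sketch below. ToyAnalyzer is a stand-in written for this example and only records the actions it receives; it is not the TIMBER analyzer, and the AnalyzerGroup shown here is one possible shape for the proposed class, not an existing implementation.

```python
class ToyAnalyzer:
    """Stand-in for an analyzer object; it only records the actions it receives."""

    def __init__(self, name):
        self.name = name
        self.actions = []

    def Cut(self, name, expression):
        self.actions.append(("Cut", name, expression))
        return self

    def Define(self, name, expression):
        self.actions.append(("Define", name, expression))
        return self


class AnalyzerGroup:
    """Forward any public method call to every analyzer in the group."""

    def __init__(self, analyzers):
        self._analyzers = list(analyzers)

    def __getattr__(self, method_name):
        # __getattr__ runs only when normal lookup fails, i.e. for forwarded methods.
        # Private names are never forwarded, which avoids recursion on _analyzers.
        if method_name.startswith("_"):
            raise AttributeError(method_name)

        def forward(*args, **kwargs):
            return [getattr(a, method_name)(*args, **kwargs) for a in self._analyzers]

        return forward


group = AnalyzerGroup(ToyAnalyzer(v) for v in ("nom", "up", "down"))
group.Define("lead_pt", "CalibratedFatJet_pt[0]")
group.Cut("pt_cut", "lead_pt > 400")
```

The proxy avoids hard-coding every method, but each call now returns a list of results rather than a single object, which is the "looks less like a single analyzer" cost noted above.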

Option 2

Modify analyzer() to always track multiple dataframes (the base case being the one dataframe to start). Then analyzer.DataFrame would point to a list of RDataFrames (via Nodes), not a single RDataFrame, and every method acting on a Node would actually loop over all Nodes being tracked. There are some potential complications:

  • The Nodes need to be tracked via a dictionary with unique keys (maybe NodeGroup class?). In fact, you'd most likely need subkeys pointing to information about the branch. Something like
allCurrentNodes = {
    "key1": {
        "node": Node(...),
        "CalibratedFatJet_*[*]": "CalibratedFatJet_*[key1_idx[*]]", # pattern for index substitution for cool idea below
        ...
    }
}
  • Snapshots would have to be saved to separate TTrees carefully.
  • It would be reasonable to assume that subsequent splittings could happen and these will also need to be tracked. It's not clear if these should be nested but that would require nesting analyzer objects which would be a more complicated task.
  • One should be able to remove nodes from the active group being tracked
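The bookkeeping described in these bullets could be sketched as a small container like the one below. This NodeGroup is hypothetical (the issue only floats the name), plain strings stand in for Node objects, and nested splittings are deliberately left out.

```python
class NodeGroup:
    """Track active branches by unique key; each entry holds a node plus branch info."""

    def __init__(self):
        self._entries = {}

    def add(self, key, node, substitutions=None):
        if key in self._entries:
            raise KeyError(f"duplicate node key: {key}")
        self._entries[key] = {"node": node, "substitutions": dict(substitutions or {})}

    def remove(self, key):
        """Stop tracking a branch and return its entry."""
        return self._entries.pop(key)

    def keys(self):
        return list(self._entries)

    def node(self, key):
        return self._entries[key]["node"]

    def apply(self, action):
        """Replace every tracked node by action(key, node)."""
        for key, entry in self._entries.items():
            entry["node"] = action(key, entry["node"])


# Strings stand in for Node objects so the flow of actions is easy to read.
group = NodeGroup()
for variation in ("nom", "up", "down"):
    group.add(variation, f"base>{variation}")

group.apply(lambda key, node: node + ">3")   # action 3 on all three branches
dropped = group.remove("down")               # "down" is no longer active
group.apply(lambda key, node: node + ">4")   # action 4 on the remaining branches
```

The substitutions sub-dictionary is carried along untouched here; it is where per-branch patterns such as the index substitution would be stored.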

Pros: No duplicating of functions/methods and probably less code overall needing to be added/changed. Subdictionaries would be powerful for string substitution. Everything shows up nicely in one PrintNodeTree()!
Cons: Lots of string parsing and substitution which is always error prone and can be hard to debug when the print out is lengthy.

Cool idea: Store the "list" of dataframes/Nodes as a dictionary/NodeGroup and use the subkeys to denote suffix of ordering indexes. Then do automatic find/replace on action strings with key and value pairs in the subdictionary so that one could do

a.Cut("...","CalibratedFatJet_pt[0] > 400")

and get back

CalibratedFatJet_pt[key1_idx[0]] > 400
...
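A minimal sketch of the find/replace step using Python's standard re module. The function name reindex and its arguments are assumptions made for this example. It handles only simple, non-nested indices such as [0] or [i]; a real implementation would need to decide how to treat nested brackets.

```python
import re


def reindex(expression, collection, index_name):
    """Rewrite collection_var[i] as collection_var[index_name[i]] in an action string."""
    # Group 1: "<collection>_<variable>"; group 2: an index containing no brackets.
    pattern = re.compile(r"\b(" + re.escape(collection) + r"_\w+)\[([^\[\]]+)\]")
    return pattern.sub(
        lambda m: f"{m.group(1)}[{index_name}[{m.group(2)}]]", expression
    )


cut = "CalibratedFatJet_pt[0] > 400 && FatJet_eta[0] < 2.4"
rewritten = reindex(cut, "CalibratedFatJet", "key1_idx")
```

Because an already rewritten index contains brackets, it no longer matches the pattern, so applying the substitution twice leaves the string unchanged. Other collections in the same string (FatJet_eta above) are not touched.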
@lcorcodilos lcorcodilos changed the title JES/R and HEM module pt re-ordering Pt re-ordering for JES/R and HEM modules Apr 2, 2021
@lcorcodilos lcorcodilos changed the title Pt re-ordering for JES/R and HEM modules Pt reordering for JES/R and HEM modules Apr 2, 2021
@lcorcodilos lcorcodilos changed the title Pt reordering for JES/R and HEM modules P_T reordering for JES/R and HEM modules Apr 2, 2021
@lcorcodilos lcorcodilos added the enhancement New feature or request label Apr 13, 2021
@lcorcodilos lcorcodilos added this to the Beta 2.0 milestone Apr 19, 2021
@lcorcodilos lcorcodilos removed this from the Beta 2.0 milestone Jun 15, 2021