Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration of correction-lib within CorrectedJets/METFactory #1134

Open
wants to merge 6 commits into
base: backports-v0.7.x
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
103 changes: 88 additions & 15 deletions coffea/jetmet_tools/CorrectedJetsFactory.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import warnings
from functools import partial
import operator

import correctionlib as clib

_stack_parts = ["jec", "junc", "jer", "jersf"]
_MIN_JET_ENERGY = numpy.array(1e-2, dtype=numpy.float32)
Expand Down Expand Up @@ -98,8 +98,21 @@ def getfunction(layout, depth):
return smearfact


def get_corr_inputs(jets, corr_obj, name_map):
"""
Helper function for getting values of input variables
given a dictionary and a correction object.
Source:
https://gitlab.cern.ch/cms-nanoAOD/jsonpog-integration/-/blob/master/examples/jercExample.py?ref_type=heads
"""
input_values = [
awkward.flatten(jets[name_map[inp.name]]) for inp in corr_obj.inputs
]
return input_values


class CorrectedJetsFactory(object):
def __init__(self, name_map, jec_stack):
def __init__(self, name_map, jec_stack, jec_names, jec_year):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not alter the signature of this function (especially with new non-defaulted arguments). We want to maintain backwards compatibility for the time being.

Could you try implementing this such that the code detects correction-lib inputs coming in from the JEC stack, and then follow a different code path in the case of correctionlib inputs?

Copy link
Author

@anpicci anpicci Aug 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgray thank you for pointing out this. Would it work if jec_stack is a list or a dict if one wants to use correction-lib? IMU, the JECStack class (as well as the auxiliary ones) is not needed when using correction-lib, as the latter already has functionalities to get the right numbers from the formulas that are part of the JME corrections/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JEC stack also defines the mapping from names in your data to the arguments of the functions to evaluate. Sure, you can have the jec_stack be something much simpler (like a list) if correctionlib does a bunch of the bookkeeping for you.

Though I think you'll need to do some arranging yourself for the JES uncertainties.

# from PhysicsTools/PatUtils/interface/SmearedJetProducerT.h#L283
self.forceStochastic = False

Expand All @@ -118,6 +131,9 @@ def __init__(self, name_map, jec_stack):
)
name_map["ptRaw"] = name_map["JetMass"] + "_raw"

# In principle, we can get rid of all this stuff when adopting correction_lib, since this is handled
# with the get_corr_input function
"""
total_signature = set()
for part in _stack_parts:
attr = getattr(jec_stack, part)
Expand All @@ -131,7 +147,7 @@ def __init__(self, name_map, jec_stack):
+ " Cannot evaluate jet corrections!"
+ " Please supply mappings for these variables!"
)

"""
if "ptGenJet" not in name_map:
warnings.warn(
'Input JaggedCandidateArray must have "ptGenJet" in order to apply hybrid JER smearing method. Stochastic smearing will be applied.'
Expand All @@ -140,7 +156,11 @@ def __init__(self, name_map, jec_stack):

self.real_sig = [v for k, v in name_map.items()]
self.name_map = name_map
self.jec_stack = jec_stack
self.jec_stack = jec_stack ## to be removed after the porting is completed
self.jec_names = jec_names
self.jec_year = jec_year
## the last element of jec_names has to be a boolean, specifying if the use wants to save the different correction levels
self.separated = self.jec_names.pop()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a useful debug feature, but from the JME point of view, providing just the name of the JEC tag for the required step should suffice. I don't foresee a need for analyses to store the result of each step, but I'm open to hearing about any exceptions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@garvitaa I would suggest keeping it to provide users more flexibility, without the need for them to put the hands on the coffee scripts


def uncertainties(self):
out = ["JER"] if self.jec_stack.jer is not None else []
Expand Down Expand Up @@ -171,6 +191,29 @@ def build(self, jets, lazy_cache):
out_dict = dict(in_dict)

# take care of nominal JEC (no JER if available)
## General setup to use correction-lib
clib_json = self.jec_year
json_path = f"/cvmfs/cms.cern.ch/rsync/cms-nanoAOD/jsonpog-integration/POG/JME/{clib_json}/jet_jerc.json.gz"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not put hard dependencies on cvmfs paths in coffea.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be configurable.

I'd also be careful of building new objects every time you call uncertainties, this can become very slow.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the json file name would change depending on if it's AK8 or AK4. So, I would also recommend against hard-coding this and instead keeping this in the jec_stack.

cset = clib.CorrectionSet.from_file(json_path)

if self.separated:
corrections_list = []

total_correction = None
# total correction is calculated as product of the levels in jec_names.
# The user can either pass the list of levels, or the compound correction directly
# in the first case, user can also choose to save the separate correction levels
for key in self.jec_names:
sf = cset[key]
inputs = get_corr_inputs(jets=jets, corr_obj=sf, name_map=self.name_map)
correction = sf.evaluate(*inputs).astype(dtype=numpy.float32)
if total_correction is None:
total_correction = numpy.ones_like(correction, dtype=numpy.float32)
total_correction *= correction
if self.separated:
corrections_list.append([correction, key])
total_correction = awkward.Array(total_correction)

out_dict[self.name_map["JetPt"] + "_orig"] = out_dict[self.name_map["JetPt"]]
out_dict[self.name_map["JetMass"] + "_orig"] = out_dict[
self.name_map["JetMass"]
Expand All @@ -182,17 +225,8 @@ def build(self, jets, lazy_cache):
jec_name_map = dict(self.name_map)
jec_name_map["JetPt"] = jec_name_map["ptRaw"]
jec_name_map["JetMass"] = jec_name_map["massRaw"]
if self.jec_stack.jec is not None:
jec_args = {
k: out_dict[jec_name_map[k]] for k in self.jec_stack.jec.signature
}
out_dict["jet_energy_correction"] = self.jec_stack.jec.getCorrection(
**jec_args, form=scalar_form, lazy_cache=lazy_cache
)
else:
out_dict["jet_energy_correction"] = awkward.without_parameters(
awkward.ones_like(out_dict[self.name_map["JetPt"]])
)

out_dict["jet_energy_correction"] = total_correction

# finally the lazy binding to the JEC
init_pt = partial(
Expand All @@ -219,6 +253,45 @@ def build(self, jets, lazy_cache):
out_dict[self.name_map["JetPt"] + "_jec"] = out_dict[self.name_map["JetPt"]]
out_dict[self.name_map["JetMass"] + "_jec"] = out_dict[self.name_map["JetMass"]]

# if the user wants to save separately the level corrections, here that is done
if self.separated:
for correction_lvl, lvl_tag in corrections_list:
jec_lvl_tag = "_jec_" + lvl_tag

out_dict[f"jet_energy_correction_{lvl_tag}"] = total_correction
init_pt_lvl = partial(
awkward.virtual,
operator.mul,
args=(
out_dict[f"jet_energy_correction_{lvl_tag}"],
out_dict[self.name_map["ptRaw"]],
),
cache=lazy_cache,
)
init_mass_lvl = partial(
awkward.virtual,
operator.mul,
args=(
out_dict[f"jet_energy_correction_{lvl_tag}"],
out_dict[self.name_map["massRaw"]],
),
cache=lazy_cache,
)

out_dict[self.name_map["JetPt"] + f"_{lvl_tag}"] = init_pt_lvl(
length=len(out), form=scalar_form
)
out_dict[self.name_map["JetMass"] + f"_{lvl_tag}"] = init_mass_lvl(
length=len(out), form=scalar_form
)

out_dict[self.name_map["JetPt"] + jec_lvl_tag] = out_dict[
self.name_map["JetPt"] + f"_{lvl_tag}"
]
out_dict[self.name_map["JetMass"] + jec_lvl_tag] = out_dict[
self.name_map["JetMass"] + f"_{lvl_tag}"
]

# in jer we need to have a stash for the intermediate JEC products
has_jer = False
if self.jec_stack.jer is not None and self.jec_stack.jersf is not None:
Expand Down