Releases: lcorcodilos/TIMBER
Beta version 2.0
Change log
This is perhaps a bigger update than I was intending but I think justifies the release of Beta 2.0. It incorporates PRs #54, #58, #59, #61, #62, #64, and #68. Changes are listed in reverse chronological order in each section.
General
- Add option to "Take" with SubCollection() method.
- Remove SubCollection() being made during CalibrateVars().
- Various robustness and consistency changes.
- Reduce jdl_template default CPU and memory requests for condor
- Remove TIMBER collection structs from Prefire_weight for speed
- Drop `createAllCollections` option from `analyzer`
- Add option to save Runs branch in `analyzer.Snapshot()` (default is to save it)
- In `PrintNodeTree()`, drop subcollection definitions by default.
- Deduce extension for image saved by CompareShapes
- Add `item_meta` to `Group` class for lazy template histograms to be possible
- Change Tpt weight module to drop alpha variation since it's only a normalization effect. Switch the `beta` class method to `eval` and drop `corr` (`eval` now does the nominal and variations).
- Change `__` prefix on private variables to `_` for consistency.
- Create CollectionOrganizer and implement it. Does not create any user-facing changes but provides infrastructure for future features.
- Add hardware::Open option (default to "READ") with inTIMBER option for internal and external paths.
- Add hardware::LoadHisto to load histogram into memory with inTIMBER option for internal and external paths.
- Make Correction/ModuleWorker constructor arguments more logical - pass correctly typed variable instead of a string of that variable.
- Add MakeWeightCols() correlation option to correlate uncertainties that are tied together but had to be calculated separately (ex. two top jet tag SFs being applied).
- Remove repeated clang parsing when cloning ModuleWorker/Correction
- Change lhaid type from str to int
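The correlation option for `MakeWeightCols()` can be pictured with a standalone sketch (hypothetical Python, not TIMBER's actual implementation; the function and variation names are invented): two scale factors that share a systematic source should be varied together, while independent ones each get their own up/down columns.

```python
# Illustration of correlated vs. uncorrelated uncertainty variations for
# two scale factors (e.g. two top jet tag SFs applied separately).
def weight_variations(sf_list, correlated):
    """sf_list: list of (nominal, up, down) tuples -> {variation: total weight}."""
    total_nom = 1.0
    for n, u, d in sf_list:
        total_nom *= n
    out = {"nominal": total_nom}
    if correlated:
        # Correlated: both SFs move up (or down) at the same time.
        up = down = 1.0
        for n, u, d in sf_list:
            up *= u
            down *= d
        out["up"], out["down"] = up, down
    else:
        # Uncorrelated: vary one SF while the others stay nominal.
        for i, (n, u, d) in enumerate(sf_list):
            for tag, val in (("up", u), ("down", d)):
                w = 1.0
                for j, (nj, uj, dj) in enumerate(sf_list):
                    w *= val if i == j else nj
                out["sf%d_%s" % (i, tag)] = w
    return out

corr_w = weight_variations([(1.0, 1.1, 0.9), (1.0, 1.2, 0.8)], correlated=True)
```

With `correlated=True`, the two SFs produce a single up/down pair instead of four independent variation columns.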
New Features
- Nodes now have unique hashes which keep them unique in the analyzer so that Nodes of the same name can be tracked. This is useful in the case where the processing has forked and you'd like to keep node naming consistent across processing branches.
- HistGroup.Add() has made the `name` argument optional; if not specified, it will instead derive it from the hist (via `TObject.GetName()`). However, this will initiate the RDataFrame loop!
- Change genEventCount calculation to genEventSumw (for simulation).
- Argument `extraNominal` added to `MakeWeightCols()` which will scale all weight values (ex. xsec*lumi/genEventSumw).
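The effect of `extraNominal` can be pictured with a toy calculation (a hypothetical standalone sketch; the function name and numbers are made up, not TIMBER's code):

```python
# Sketch: every weight column is scaled by a common normalization factor
# such as xsec*lumi/genEventSumw in addition to the per-event corrections.
def normalized_weight(corrections, xsec, lumi, gen_event_sumw):
    extra_nominal = xsec * lumi / gen_event_sumw  # common scale for all columns
    w = extra_nominal
    for c in corrections:
        w *= c
    return w

# e.g. a 10 pb process at 137/fb with two per-event corrections applied
w = normalized_weight([1.05, 0.98], xsec=10.0, lumi=137000.0, gen_event_sumw=1e6)
```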
New Additions
- `MemoryFile` class to store a string in memory to mimic a file one would `open()`.
- `DictToMarkdownTable()` method to convert a python dictionary to a markdown table (uses `MemoryFile`).
- TIMBER/Tools/AutoPU.py added to automate (in pieces or as a whole) the process of making a pileup weight and applying it.
- `Common.GenerateHash()` added for Node hashes.
- `analyzer.GetColumnNames()` returns a list of all column names that currently exist.
- `hardware::MultiHadamardProduct` for non-nested vectors
- Update GenMatching tools to be better optimized and to take advantage of the new AoS feature
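A dictionary-to-markdown conversion in the spirit of `DictToMarkdownTable()` might look like the following (a minimal sketch with an invented signature; TIMBER's actual argument names and table shape may differ):

```python
# Sketch: render a flat python dict as a two-column markdown table.
def dict_to_markdown_table(d, key_header="key", val_header="value"):
    lines = ["| %s | %s |" % (key_header, val_header),
             "|---|---|"]
    for k, v in d.items():
        lines.append("| %s | %s |" % (k, v))
    return "\n".join(lines)

table = dict_to_markdown_table({"ttbar": 831.76, "QCD": 1370.0}, "sample", "xsec")
print(table)
```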
CMS Algos
- Update luminosity golden jsons.
- Add DeepAK8 CSV reader and top tagging SF module (note that there have been crashes in some instances for this module that are currently being studied).
- HEM and Prefire correction modules added.
- Add JME data tarball information in readme
- Add W and top tagging scale factor modules (only tau21 and tau32+subjet btag supported, respectively)
Pileup
- Add WeightCalculatorFromHistogram (from NanoAOD-tools)
- Add C++ pileup module with "auto" mode to grab npvx distribution from memory
- Add pileup data files and information on where they are from (+script to get them)
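The reweighting idea behind these pileup tools can be sketched in a few lines (toy numbers and a hypothetical function, not the TIMBER module): the weight in each vertex-multiplicity bin is the ratio of the normalized data and simulation distributions.

```python
# Sketch: per-bin pileup weight = (normalized data) / (normalized MC).
def pileup_weights(data_hist, mc_hist):
    data_total = float(sum(data_hist))
    mc_total = float(sum(mc_hist))
    return [(d / data_total) / (m / mc_total) if m > 0 else 0.0
            for d, m in zip(data_hist, mc_hist)]

data_npv = [10, 40, 30, 20]   # toy data npv distribution
mc_npv   = [25, 25, 25, 25]   # toy flat MC npv distribution
weights = pileup_weights(data_npv, mc_npv)  # one weight per npv bin
```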
Bug fixes
- Do not try to get Runs branch if it doesn't exist.
- Fix bug when making new collections using CollectionOrg.AddBranch().
- Cleanup plotting in Plot.py to be more consistent and documentation-ready.
- setup.sh had back-ticks that caused unintended executions, and `return` is more suitable than `exit`.
- Return index from Pythonic::InList rather than a bool
- If ModuleWorker looks in a TIMBER .cc for a function (`eval` typically) and can't find it, look for it in the equivalent .h (since that's where templates live)
Beta version 1.4
Change log
The main addition is the new code to handle JME calibrations and uncertainties. This will be covered at the end since it is the most lengthy. First, some more general changes (some of which are from the JME related work).
Collections as arrays of structs (AoS)
First, a "collection" is used here to describe a group of physics objects stored in NanoAOD format. For example, "Electron" is a collection and has attributes "pt", "eta", etc., all of which are stored in branches of the NanoAOD Events tree named `Electron_pt`, `Electron_eta`, etc.
If the user would like to access a collection, they can simply use `<CollectionName>s`, which is built dynamically in the background as an AoS. Keeping the electron example, this means that `Electrons[0]` will return the leading electron in the event with attributes `pt`, `eta`, etc. that can be accessed as `Electrons[0].pt`, `Electrons[0].eta`, etc.
This turns the physics objects into OOP objects, which has various niceties for more complicated algorithms that need to be written in C++ but require an extensive set of arguments. For example, a generator particle matching algorithm would need several attributes from `GenPart` and from the reco object (say, an AK8 jet), which would make the method definition long and its use prone to error (ex. you need to keep all of the arguments in the right order). With the TIMBER collections, you'd only need two arguments - `GenParts` and `FatJets[i]`, where `i` is the index of the jet you want to use in the matching.
The C++ is dynamically written here.
There are several important notes to make about this feature:

- The object (ex. `FatJets`) doesn't exist until TIMBER detects it in a `Cut` or `Define`, AND the underlying struct (`FatJetStruct`) is also not defined until then. So if you'd like to write a C++ method to take one of these collections as input, you'll need to use a C++ template. Please have a look here for an example of this.
- This feature has penalties! Because there is more being compiled and built, memory usage increases (though not beyond values that are reasonable). It also increases processing time substantially. The difference between calculating the mean of `FatJet_pt[0]` and `FatJets[0].pt` is nearly a factor of 6 (for just this action). Thus, the collections should be reserved for use as input to C++ modules that benefit from having fewer positional arguments. There is a more efficient way to handle the construction of the collections (see issue #36) but this is where I leave it for now.
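The SoA-to-AoS transformation described above can be pictured in pure Python (the real version is dynamically generated C++ structs; the class and branch names here are illustrative):

```python
# NanoAOD stores parallel arrays (FatJet_pt, FatJet_eta, ...); the AoS view
# zips them into one object per jet so a C++ module only needs the object.
class Jet:
    def __init__(self, pt, eta):
        self.pt = pt
        self.eta = eta

def build_collection(pt_branch, eta_branch):
    return [Jet(pt, eta) for pt, eta in zip(pt_branch, eta_branch)]

FatJets = build_collection([450.0, 300.0], [0.5, -1.2])
leading_pt = FatJets[0].pt  # instead of FatJet_pt[0]
```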
General
- Created libtimber, which is the shared library of all modules in TIMBER/Framework. This is compiled by the new Makefile during setup.sh and is loaded (if not already) by CompileCpp (ie. using it doesn't require the analyzer() class!). The JME modules do not work standalone, so these are NOT included in the library when outside of CMSSW.
- Closes Issue #37.
- The analyzer silence attribute can be set to silence the print out from Define and Cut calls.
- Added dedicated method `ReorderCollection()`. This is meant to be used when JECs affect the pt of jets and a re-ordering is needed. Note that this is not done by AutoJME.
- Added `ModuleWorker` class to handle all of the functionality shared by `Correction` and the new `Calibration` (both of which now inherit from `ModuleWorker`).
  - Closes Issue #21.
- To `common.h`, added `TempDir` (handles temporary directory storage) and `ReadTarFile` (opens and streams tarball contents - took a day to get working!). This was necessary because the JECs come in tarballs and untarring 800+ files and holding them in TIMBER is undesirable.
  - This required adding libarchive as a dependency. Added to `setup.sh` the recipe (if it doesn't exist) to download and build it inside of `TIMBER/bin`.
- Added `Node.GetBaseNode()` so that the top most parent can be accessed from a Node (ie. outside the analyzer).
- Kept RunChain as attribute of `analyzer` so that contents can be easily accessed.
- Moved `TIMBER/Framework/ExternalTools/` to `TIMBER/Framework/ext/`
- Organized `TIMBER/Framework/src` and `TIMBER/Framework/include` so that declarations and implementations are split (roughly - there are still some outstanding where they make sense).
- Added "Hadamard product" algorithms to `hardware`.
- Add tcsh setup script
- Add GetWeightName to get column name for certain weight
- Add SaveRunChain to save out the Run TTree (with option to merge it with an existing file like a snapshot of Events)
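The "Hadamard product" here is just an element-wise product, as used to combine per-event weight vectors. A standalone sketch (hypothetical Python stand-in for the `hardware` C++ function):

```python
# Element-wise (Hadamard) product of any number of equal-length vectors,
# e.g. multiplying several per-event weight columns together.
def hadamard(*vectors):
    out = list(vectors[0])
    for v in vectors[1:]:
        out = [a * b for a, b in zip(out, v)]
    return out

combined = hadamard([1.0, 2.0, 3.0], [0.5, 0.5, 2.0])  # -> [0.5, 1.0, 6.0]
```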
JME module work
Modified from PR #38
- Write `JetRecalibrator` class which handles the interfacing to the CMSSW-based tools.
- Write `JMEpaths` class which handles the interfacing to the JME txt files.
- Write `JES_weight` class which is the user-facing module to access the corrections (including recalibrations) and uncertainties.
- Write `JMS_weight` class which just accesses a hard-coded table of values.
- (Re)Write `JetSmearer` class which has the algorithms to evaluate the weights to smear jet energy (pt) and jet mass.
- Write `JER_weight` class which uses `JetSmearer` to calculate the per-jet weights to smear the pt distribution.
- Write `JMR_weight` class which uses `JetSmearer` to calculate the per-jet weights to smear the mass distribution.
NOTE 1: The four `J*_weight` classes all have `eval()` functions which return a vector with the length of the number of jets. Each entry in this vector is another vector, {nominal, up, down}, where "up" and "down" are absolute, not relative, weights to apply to the pt and/or mass.
NOTE 2: The JMS_weight is included for completeness but it's terribly inefficient because no calculation is done - it just creates a vector of length nFatJet which stores the same values over and over again. There's almost certainly a better way to do this but that can be put on the to-do. For now, the uniformity between modules is important.
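Applying an `eval()`-style output of per-jet {nominal, up, down} triplets might look like this (a hypothetical Python sketch with invented numbers; TIMBER does this in C++/RDataFrame):

```python
# One {nominal, up, down} weight triplet per jet; the chosen variation is
# multiplied into each jet's pt (the weights are absolute, not relative).
def apply_variation(jet_pts, weights, variation):
    idx = {"nominal": 0, "up": 1, "down": 2}[variation]
    return [pt * w[idx] for pt, w in zip(jet_pts, weights)]

pts = [400.0, 250.0]
jes = [[1.00, 1.02, 0.98],   # triplet for jet 0
       [1.00, 1.03, 0.97]]   # triplet for jet 1
pts_up = apply_variation(pts, jes, "up")
```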
- Added `CalibrateVar()` method which will actually do the multiplication of a variable by the calibration weight (looping through uncertainty variations as well).
- Added `Calibration` class which doesn't do much different from `ModuleWorker` at the moment (it's just not `Correction`).
- Added JME related data files to TIMBER/data/JES and TIMBER/data/JER
Validation
Validation of the new modules was done using the new TIMBER bench. See PR #38 for validation details.
Beta version 1.3
Change log
Collection of changes made from mid-November through December 2020. Highlights are the weight column calculation fix and improved C++ argument matching.
Setup/install
None
Analyzer
- Add `ObjectFromCollection()` method that creates a subcollection but for just one object in the originating collection.
- Fix the correction/weight collection so that only parent nodes are considered in the weight calculation for a node tree. In other words, if the node/processing tree has split, weights calculated in branch A should not affect those in branch B but they should share any weights calculated before the branches diverged.
  - If one has separate branches, each one needs to have the `MakeWeightCols()` method called. By default it is called on `ActiveNode` but it can take other nodes as input. With this in mind, the method also now takes a name to name a group of weights so that duplicate nodes are not created on the separate branches.
  - Changes for this happened in `__checkCorrections()` (traversing up the tree), the `Node` class (add back parent attribute), and `MakeWeightCols()` (the naming).
- Improved C++ argument matching when building a correction.
  - Will now check against active node columns and not just the base node.
  - `Correction.MakeCall()` changed to take a dict as input instead of a list. Keys are the C++ method argument names (as written in the C++ file) and values are the names of the RDataFrame columns that you'd like to use as function arguments. If there are arguments in the C++ method that are not in the dict, TIMBER will automatically try to determine if each matches a column name and will use that when building the call to the C++ method.
- Added `Range()` method to `analyzer` and `Node` classes to select a subset of data to analyze. Docs include a warning to not use this with `ROOT.EnableImplicitMT()`.
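The argument-matching logic described for `Correction.MakeCall()` can be sketched as a toy function (hypothetical standalone Python; the real matching is done via clang parsing of the C++ source):

```python
# Toy version of the matching: C++ argument names not given explicitly in
# the user's dict are matched against known RDataFrame column names.
def build_call(func_args, user_map, columns):
    call = []
    for arg in func_args:
        if arg in user_map:
            call.append(user_map[arg])      # user-specified column
        elif arg in columns:
            call.append(arg)                # auto-matched to an existing column
        else:
            raise KeyError("no column found for argument '%s'" % arg)
    return call

cols = {"FatJet_pt", "FatJet_eta", "nFatJet"}
args = build_call(["pt", "nFatJet"], {"pt": "FatJet_pt"}, cols)
# -> ["FatJet_pt", "nFatJet"]
```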
Tools
- Add s/sqrt(b) plotting to Tools/Plot.py
- Add function `GetStandardFlags()` to return a list of standard MET filter flags. Used as the default `flagList` for `GetFlagString()`.
- Change `cut` and `not` option defaults in TrigTester.py.
- Consolidated Cutflow* functions and added "initial" count to be included when producing the cutflow.
Modules
- Staging JetSmearer.h, JetRecalib.h, fatJetUncertainties.cc, and JetMETinfo.h (includes commented-out bit in common.h)
- Change `Trigger_weight.cc` default plateau to -1
Pythonic.h
- Create `Pythonic` namespace.
- Add header guards.
- Add `IsDir()` and `Execute()` functions.
- Updated naming so all functions are capitalized
- Add EffLoader.cc module #24
Data
None
Testing
- Fix tests so `AddCorrections()` changes work.
- Fix test_Common.py so it works. Add in actual tests for Cutflow* functions.
- Add test for `Range()`
More documentation
- Changed error for multiple nodes of the same name to a warning.
- Add transparent logo.
- Example 1 (`examples/ex1.py`) now includes an example of using `Range()` and explains to not use `ROOT.EnableImplicitMT()`.
Issues that were addressed
- Fix file reading from afs
- Return ActiveNode with SubCollection method.
- When providing a `dict` to `Node.SetChildren()`, the code was checking if the keys of the `dict` were of type `Node`. Fixed to check the `dict` values.
- In C++ modules, switch `int` to `size_t` in `for` loops.
- Fix "Library compiling doesn't play nice with periods in TIMBERPATH"
- Fix TIMBER/data/README.md
- Implement `corr` Correction type deduction
- Fix `CompareShapes()` so that it works with empty bkgs, signals, and colors correctly.
- Allow for default arguments when doing C++ clang parsing and automatic calls to correction methods.
Beta version 1.2
Change log
Combination of #17, #18, and #19.
Setup/install
- Added `boost` dependency information to the main README.md (needed for `LumiFilter.h`)
Modules
- Added GenMatching.h which can be used to reconstruct the entire generator particle decay tree from the mother indexes stored in the NanoAOD. This is useful for traversing the entire decay chain with relative ease. Example added in "How to use GenMatching.h".
- Added LumiFilter.h which can be used in conjunction with the newly added golden JSONs to filter data based on the JSONs.
- Added HistLoader.h which can be used to load in a histogram once before processing, with access to the histogram via the class methods while looping over the RDataFrame entries. The `eval` method returns based on the input axis value and `eval_bybin` returns based on the provided bin number.
- Added TopPt_weight.cc which calculates the top pT correction from the TOP group based on the data/POWHEG+Pythia8 fit. The nominal correction is calculated with the `corr()` method and variations of the constants in the exponential form can be calculated using the `alpha()` and `beta()` methods.
- Added Trigger_weight.cc which uses `HistLoader.h` to load a trigger efficiency histogram. The `eval()` method returns the efficiency for that event (based on the input variable, of course) and calculates the uncertainty as one-half the trigger inefficiency.
- Rename "analyzer" namespace to "hardware" in `common.h`. Done for clarity in the documentation to avoid confusion with the `Analyzer` python namespace (aka Analyzer.py).
- Change `hardware::invariantMass()` argument to be a vector of Lorentz vectors. The invariant mass of all provided vectors is calculated.
- Moved `Framework/src/Collection.cc` to `Framework/include/Collection.h`
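The golden-JSON filtering that LumiFilter.h performs reduces to a simple lookup, sketched here in Python with a toy JSON (hypothetical code, not the C++ module; the run number and ranges are invented):

```python
# A golden JSON maps run numbers to certified [first, last] lumi-section
# ranges; an event is kept only if its (run, lumi) falls in one of them.
golden = {"297050": [[1, 10], [25, 30]]}  # toy stand-in for a golden JSON

def pass_golden(run, lumi, json_dict):
    for lo, hi in json_dict.get(str(run), []):
        if lo <= lumi <= hi:
            return True
    return False

keep = pass_golden(297050, 27, golden)  # lumi 27 is in the [25, 30] range
```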
Testing
- Added a draft of test_modules.py which features an example for TopPt_weight.cc, but it is currently commented out because the test file does not have the `GenPart` collection or the `FatJet` collection (and is also not a ttbar set).
- Added make_test_file.py to make a small testing histogram.
- Added small testing histogram generated by `make_test_file.py`.
Data
- Added golden JSONs for 2017 and 2018 and added info to the README ledger. It seems 2016 does not have a golden JSON anymore (?)
Analyzer
- Add `corr` type for the `Correction()` class. It represents a correction with no uncertainty. The clang parsing CANNOT currently derive it automatically from the C++ script but it can be assigned as the `corrtype` via the argument to the `Correction()` constructor.
- Optimized `MakeTemplateHistos()` to book histogram actions before looping. They previously looped over the dataframe one after the other. This provides a significant speed up.
More documentation
- Added page on how to use GenMatching.h in a custom C++ module with the example of finding how many prongs are merged in a top jet.
- Added docs to `Pythonic.h`
- Added docs to `common.h`
- Added docs to `PDFweight_uncert.cc`
- Added docs to `SJBtag_SF.cc`
- Consolidate the READMEs for sections so the webpage makes more sense.
- Switch to MathJax for formula rendering
Small bug fixes
- More robust python version checking for ASCII encoding in `OpenJSON()`.
- Fix `PrintNodeTree()` for cross-system compatibility. The `networkx` package is used to create the graph, which can be drawn with a number of tools. TIMBER was using `pygraphviz` which, to be installed, needs the development library of `graphviz`. While it's easy to get this on Ubuntu or macOS, it is not available on either the LPC or LXPLUS servers and we can't install it without a bit of a headache. Thus, `pydot` is now used with `networkx` instead since it does not have the same build dependencies. However, the version of `graphviz` on the system (aka `dot`) cannot always write out to modern image formats like PNG. The solution is for TIMBER to attempt to save the requested format and, if it isn't possible, save out the .dot file for later conversion. Instructions on how to convert the .dot to something else locally were added to the FAQ section of the docs.
Beta version 1.1
NOTE: These are copied excerpts from #16
Benchmarks
- Benchmarks 1-9 added in `benchmarks/ex*.py`. Some internal comments included about what was done. The CMS Open Data sample included in the examples/ folder does not have electrons so it was not used for benchmarks 7 or 8 (these need the tester to use their own private file, which this repo does not provide).
- Filled out more of the general testing with pytest.
New to analyzer
- `Close()`: Implemented to safely delete an analyzer instance.
- `__str__`: Implemented to provide an informational printout when `print(<analyzer>)` is called.
- Can specify Node type as a `Cut` and `Define` argument if you have a specific type you'd like to track.
- `SubCollection()`: Creates a named sub-collection based on some discriminant where the sub-collection has all of the same branches as the parent but only includes vector entries that passed the discriminant.

  NOTE: `myColl_var1` is an `RVec` and so `myColl_var1 > 5` returns a vector the same size as `myColl_var1` but filled with bools for each entry. These bools determine which entries of the `RVec`s of the sub-collection branches are made.
```python
a = analyzer(...)
# Say there is a collection "myColl" with branches "myColl_var1", "myColl_var2", "myColl_var3"
a.SubCollection("mySubColl","myColl","myColl_var1 > 5")
# Now there is a new collection "mySubColl" with branches "mySubColl_var1", "mySubColl_var2", "mySubColl_var3"
# which only have values where myColl_var1 > 5
```
- `MergeCollections()`: Creates a new collection which is a merge of all provided collections. The new collection has the variables that are common between the collections being merged.
- `CommonVars()`: Finds the common variables between a set of collections (provided as a list of names).
- `PrintNodeTree()`: Added optional argument `toSkip=[]` which skips plotting any nodes of the types specified by `toSkip`. Note that the function checks for the type in `toSkip` as a substring of the type of the Node. So if you provide `toSkip=["Define"]`, all nodes of type "MergeDefine" and "SubCollDefine" will also be dropped.
  - Also switched to using `networkx` (which uses `pygraphviz`).
- `MakeHistsWithBinning()`: Batch creates histograms at the current `ActiveNode` based on the input `histDict` which is formatted as `{[<column name>]: <binning tuple>}`. The dimensions of the returned histograms are determined from the size of `[<column name>]`.
  - `[<column name>]` is a list of column names that you'd like to plot against each other in [x,y,z] order.
  - `binning_tuple` is the set of arguments that would normally be passed to `TH1`.
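The `histDict` convention might look like the following (a sketch with invented column names; since python lists are unhashable, tuples stand in for the `[<column name>]` keys here, and TIMBER's exact key format may differ):

```python
# Keys list the columns to plot against each other ([x], [x, y], ...);
# values are the usual TH1-style binning tuples. The histogram dimension
# is deduced from the number of columns in the key.
hist_dict = {
    ("lead_pt",): (40, 0, 2000),                            # 1D
    ("lead_pt", "sublead_pt"): (40, 0, 2000, 40, 0, 2000),  # 2D
}

def hist_dimension(columns):
    return len(columns)

dims = [hist_dimension(cols) for cols in hist_dict]
```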
New to Node
- Add "types" to Nodes to denote what was done to produce the Node. Currently used for controlling nodes present in `PrintNodeTree()` output. Current possible types are "Define", "Cut", "MergeDefine", "SubCollDefine", "Correction".
- `Close()`: Implemented to safely delete a Node instance.
- `__str__`: Implemented to provide an informational printout when `print(<node>)` is called.
New to HistGroup
- `Merge()`: Adds together all of the histograms in the group and returns the output histogram.
New to C++ Code
`common.h`

- `transverseMass()` to get the transverse mass of MET + one object. Could be more generalized.
- 2nd constructor for `TLvector()` that takes `RVec`s as arguments rather than floats (returns back an `RVec` of `PtEtaPhiMVector`s)
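For reference, the massless-object transverse mass that a `transverseMass()`-style function computes is mT = sqrt(2 * pt1 * pt2 * (1 - cos(dphi))). A quick sketch (hypothetical Python mirror of the C++; the real function may also handle masses):

```python
import math

# Transverse mass of MET plus one object, assuming both are massless:
# mT^2 = 2 * pt_MET * pt_obj * (1 - cos(dphi)).
def transverse_mass(met_pt, met_phi, obj_pt, obj_phi):
    return math.sqrt(2.0 * met_pt * obj_pt * (1.0 - math.cos(met_phi - obj_phi)))

# Back-to-back object and MET (dphi = pi): mT = 2*sqrt(pt1*pt2)
mt = transverse_mass(100.0, 0.0, 100.0, math.pi)  # -> 200.0
```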
Small bug fixes
- Fix `Common.py` TIMBER imports
- Make `fileName` attribute public (used for the new `__str__` method for printing the `analyzer` object)
- Add `BaseNode` to `AllNodes` for tracking
- Force `BaseNode` to zero children on initialization to avoid memory issues
- Fix `Group` addition
Beta version 1.0
With the most recent changes, I believe we've exited any sort of alpha and so the project will now start doing tags and releases to track development and allow users to check if they have the latest version of TIMBER (or grab a development version).