Introduction or How to on creating form mapping #1227
-
@tomeichlersmith I'm currently on a work-away trip, so I don't have much time to answer this right now. Our API for this might change soon, as coffea is the only consumer (that I know of) and we have some necessary upgrades to the upstream dask-awkward for optimisation. I'll ping @lgray who is intimately familiar with Coffea's use of remapping to hopefully provide some pointers.
-
@tomeichlersmith please refer to coffea's nanoevents sub-package, which is a fairly mature (and, as far as I know, the only) implementation of non-trivial form remapping. It is able to achieve all the goals you mention while using dask-awkward. We're considering turning it into its own package in time, but it's not quite broad enough to justify that right now. There is also very little reason why we couldn't include LDMX data as a schema within coffea. There's already ATLAS, multiple CMS tree flavors, and a DUNE prototype.
It functions by essentially zipping forms (not data!) together based on patterns in the input data, while keeping the references to the original, less structured, data around as form keys. It also includes an embedded domain-specific language for dynamically creating cross references and such (MC matches, cross cleaning, etc). This means that low-level data is only touched when you access the associated columns in the high-level form that's presented to the user.
BaseSchema and NanoAODSchema are probably the most instructive examples to look at. I imagine there are a number of other things in coffea, correctionlib, and related packages that would also serve your analysis needs well. Happy to chat about this further.
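To make the "zipping forms, not data" idea concrete, here is a minimal sketch that uses plain Python dicts shaped like awkward's JSON form layout. It is not coffea's implementation: the branch names, the `Collection_field` naming pattern, and the helpers `numpy_form`, `jagged_form`, `zip_forms_by_prefix`, and `form_keys` are all assumptions made up for illustration. No array data is read anywhere; only the description of the layout is regrouped, and every original `form_key` survives so the low-level branch can still be located later.

```python
# Illustrative only: dicts that mimic awkward's JSON form layout.
# All helper names and branch names below are hypothetical.

def numpy_form(primitive, key):
    """Form of a flat numeric branch; form_key remembers the source branch."""
    return {"class": "NumpyArray", "primitive": primitive, "form_key": key}


def jagged_form(primitive, key):
    """Form of a variable-length (per-event list) numeric branch."""
    return {
        "class": "ListOffsetArray",
        "offsets": "i64",
        "content": numpy_form(primitive, key),
        "form_key": key,
    }


# A flat form, as a TTree with NanoAOD-like branch names might present it.
flat_form = {
    "class": "RecordArray",
    "fields": ["run", "Electron_pt", "Electron_eta", "Muon_pt"],
    "contents": [
        numpy_form("int64", "run"),
        jagged_form("float32", "Electron_pt"),
        jagged_form("float32", "Electron_eta"),
        jagged_form("float32", "Muon_pt"),
    ],
}


def zip_forms_by_prefix(flat):
    """Group jagged branches named '<Collection>_<field>' into one list of records."""
    out = {}  # insertion-ordered: top-level field name -> form
    for name, content in zip(flat["fields"], flat["contents"]):
        collection, sep, field = name.partition("_")
        if sep and content["class"] == "ListOffsetArray":
            zipped = out.setdefault(
                collection,
                {
                    "class": "ListOffsetArray",
                    "offsets": content["offsets"],
                    # Offsets are taken from the first branch seen for this collection.
                    "form_key": content["form_key"],
                    "content": {
                        "class": "RecordArray",
                        "fields": [],
                        "contents": [],
                        # A record name is what awkward uses to look up behaviors.
                        "parameters": {"__record__": collection},
                    },
                },
            )
            zipped["content"]["fields"].append(field)
            zipped["content"]["contents"].append(content["content"])
        else:
            out[name] = content  # leave unmatched branches at the top level
    return {"class": "RecordArray", "fields": list(out), "contents": list(out.values())}


def form_keys(form):
    """Collect every form_key reachable from a form."""
    keys = set()
    if "form_key" in form:
        keys.add(form["form_key"])
    for child in form.get("contents", []):
        keys |= form_keys(child)
    if "content" in form:
        keys |= form_keys(form["content"])
    return keys


nested_form = zip_forms_by_prefix(flat_form)
print(nested_form["fields"])
print(sorted(form_keys(nested_form)))
```

The regrouped form presents `Electron` and `Muon` as lists of records, while the set of form keys is unchanged from the flat form. That unchanged set is what would let a lazy backend load only the branches behind the columns that are actually accessed.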
-
Howdy uproot/awkward experts! This is pretty intricately tied with `awkward`, but since the feature I want to utilize is the `form_mapping` argument of `uproot.dask`, I am opening this discussion here.
Basically, I would like to imbue a structure onto my data using awkward forms. In the past, I've been using piles of `ak.zip` to construct the form and re-name branches. This has worked well, but does not naturally propagate into `uproot.dask`. I guess my direct question is how does one develop a form mapping? I have tried writing a dummy form-mapping `lambda f: f` but this does not work and probably points to my own ignorance of the code. Are there example form mappings available? Do I need to inherit from a specific class to inherit the default behavior off which I can build?
Goals
In order of priority.
`vector` behaviors (I was doing this with `with_name` of `ak.zip` before) and potentially other custom `ak` mixins.
Alternatives to `form_mapping`
`uproot.iterate` with `multiprocessing` over many files.
`step_size` in `uproot.iterate`. It also tends to load unnecessary branches since I don't automatically trim the list of necessary branches to load.
`ak.zip` after `uproot.dask`
`ak.zip` would then trigger them to be "necessary" and thus loaded from the perspective of dask.
Example
I have a filepath to a large ROOT file stored in `data_7800`. I can then show the forms I would like to map between:
The two JSON files are attached.
dask.json
desired.json