Writing a histogram with sumw2 information to a file in the getting started page #853

Open
alexander-held opened this issue Mar 9, 2023 · 3 comments
Labels
docs Improvements or additions to documentation

Comments

@alexander-held
Member

Based on an interaction I just had with a user: the getting started: writing objects to a file page does include various examples of how to write histograms to a root file. It even includes weighted ones with hist, but only shows how to handle a histogram that is being filled on the spot. It is not obvious from there how a user would take an existing piece of histogram information (in the form of counts per bin + uncertainty) and turn that into a TH1 in a file.

The hist documentation quick start also does not cover that. The pattern

hist[...] = np.stack([yields, stdev**2], axis=-1)

is mentioned in the boost-histogram docs (via scikit-hep/boost-histogram#421) but a user is unlikely to get this far I imagine.

Perhaps it makes sense to mention this pattern somewhere more prominently. I am not sure where the best spot would be, but wanted to raise this to see if others had ideas (and if you agree that this would be useful to do).

@alexander-held alexander-held added the docs Improvements or additions to documentation label Mar 9, 2023
@agoose77
Collaborator

agoose77 commented Mar 9, 2023

I think we've heard before that our documentation on how to write histograms is not as clear / comprehensive as it could be. On uproot's end, I think we need to be clearer that for histograms with weights, third-party libraries are needed, and perhaps be slightly more explicit about the histogramdd form.

The other part of this improvement is probably to explain how to add weights in hist. I'll open another issue there based upon this one?

@jpivarski
Member

Maybe this Uproot Getting Started Guide could be expanded by a few sentences, with more pointers to the hist documentation. (NumPy histograms can't have uncertainties, so if you're arriving with pre-filled bin values and uncertainties, then you'll need boost-histogram or hist.)

However, an introduction to the task of histogramming with ROOT-file serialization would cross package boundaries significantly. If the Uproot Getting Started Guide had a lot of details about this, it would

  • get too long: users interested in Uproot topics other than histograms would have more difficulty finding what they need
  • obscure which parts are Uproot and which parts are boost-histogram/hist or other packages
  • easily get out of date, since hist authors will think of updating the hist documentation when they change an interface, but might not even realize there are Uproot docs talking about it.

I can think of technical solutions to the first (break down "Getting Started" into many pages) and third (set up automated testing of the documentation, with good alerting to the authors), but I think a better solution is to rely more on centralized tutorials that cut across package boundaries. That is, instead of an $n^2$ maintenance problem of $n$ package authors documenting the overlaps of all $n$ packages, students interested in a cross-cutting task like histogramming should be directed to a tutorial on histogramming, which uses any packages that are helpful for that task.

At our last group meeting, @klieret described efforts on developing a centralized portal for these tutorials, which used to be developed in disparate places but is now being pulled together into one place. I wonder: how is that coming?

The easiest documentation for package authors to write is the one thing that has to be on the package-specific site: API references. These task-oriented tutorials are much harder when you're in package-author mode, and most of the ones I've seen are pretty thin. (Uproot's has only one Getting Started Guide and Awkward's has a lot of blank pages.) Rather than swim against that tide by making the package authors do it [1], let's try to point users toward the centralized tutorial-portal.

Footnotes

  1. This isn't passing off on responsibility. The same people can write good tutorials when they take off their package-author hats and know that what they're writing isn't going to go on their package's documentation site. I wrote a Scikit-HEP tutorial for the HSF/Carpentries because the existing tutorial focused too much on Uproot.

@klieret
Contributor

klieret commented Mar 9, 2023

Hi @jpivarski: I submitted a GSoC project for that. Honestly completely overwhelmed by the number of applicants, but there are some great ones, so I think this will work out great :)

The main idea so far was to develop something to replace the current HSF training center with something that looks more like learn.astropy (but is simply configured with a yaml file, like the current one). This would give us a lot of space and flexibility to list everything we have in the community.

However, just the other day I was thinking about whether we should also have something very similar to learn.astropy (i.e., a training center generated from a collection of stand-alone notebooks that are each listed individually) just for various scikit-hep howto-guides (!= tutorial) that cover more ground.
Based on the fact that the current best applicants were basically coding a rough version of my first idea within 48h from scratch, I think that both might be in the scope of the project (though I designated it as 'short' = 175h).

Alternatively, we could simply add a 'scikit-hep howtos' tab (or another button) to the training center (similarly configured with yaml) and list all the howto guides there. I think this might work just as well (and is considerably less work for sure).

Let me know what you think! (though this might be a discussion that touches more projects and might be better at another place)
