Add machine-readable MC results and uncertainties to calculate interpolation error #270
Comments
Not sure if this applies to mg5, but the typical workflow of MC generators is to collect histograms on the fly, without dedicated error propagation (e.g. by just calling …).
Madgraph5_aMC@NLO and all other MCs should provide MC uncertainties for each histogram and for each bin separately.
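For concreteness, a minimal sketch of where such per-bin uncertainties come from: besides each bin's weight sum one also accumulates the sum of squared weights, from which the sample variance of the bin estimate follows. The `Histogram` class below is hypothetical and not any generator's actual API.

```python
import numpy as np

class Histogram:
    """Toy histogram tracking exactly what a per-bin MC uncertainty needs."""

    def __init__(self, edges):
        self.edges = np.asarray(edges)
        self.sumw = np.zeros(len(self.edges) - 1)   # sum of weights per bin
        self.sumw2 = np.zeros(len(self.edges) - 1)  # sum of squared weights per bin
        self.n_events = 0

    def fill(self, observable, weight):
        self.n_events += 1  # count every generated phase-space point
        index = np.searchsorted(self.edges, observable, side="right") - 1
        if 0 <= index < len(self.sumw):
            self.sumw[index] += weight
            self.sumw2[index] += weight**2

    def result(self):
        # per-bin estimate sum(w)/N and its MC uncertainty from the sample
        # variance of the weights: sigma^2 = (E[w^2] - E[w]^2) / N
        mean = self.sumw / self.n_events
        sigma = np.sqrt((self.sumw2 / self.n_events - mean**2) / self.n_events)
        return mean, sigma
```

With this bookkeeping the per-bin errors come essentially for free, which is the point about MC integration made in the next comment.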
This would underestimate the uncertainties, because for individual bins they're always larger than for all bins added together. But I suppose the problem that you see is the difference between MC integration and classical integration: in the former you get them for free for each bin, while for classical integration you can't get them easily, I suppose. However, for classical integration they should also be much smaller, so they don't matter much. More important would be the interpolation error. The idea would be to be able to run

…

which uses the PDF the MC used and convolves the grid with this PDF to quantify the interpolation error. This is useful, because we noticed that after the evolution of some very specific interpolation grids the predictions deviated quite a bit w.r.t. the original MC values (in that case the error was dominated by the evolution, or rather by some unfortunate choices of specific evolution parameters).
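To illustrate what such a check could look like, here is a sketch that reads the PDF label from the metadata, convolves the grid with that PDF and compares against the stored MC numbers. The `pineappl` Python calls are modeled on the names of recent versions (`Grid.read`, `key_values`, `convolute_with_one`) and may differ in detail; `parse_results` is purely hypothetical, since a machine-readable `results` format is exactly what this issue asks for.

```python
import lhapdf    # LHAPDF Python bindings
import pineappl  # API names may differ between pineappl versions

def parse_results(table):
    """Hypothetical parser returning per-bin MC central values and
    uncertainties; today the `results` metadata is free text."""
    raise NotImplementedError

grid = pineappl.grid.Grid.read("ATLAS_DY_7TEV_CF.pineappl.lz4")
meta = grid.key_values()

# convolve the grid with the same PDF the MC ran with
pdf = lhapdf.mkPDF(meta["results_pdf"])
interpolated = grid.convolute_with_one(2212, pdf.xfxQ2, pdf.alphasQ2)

mc_central, mc_sigma = parse_results(meta["results"])

for index, (fit, ref, sigma) in enumerate(zip(interpolated, mc_central, mc_sigma)):
    pull = (fit - ref) / sigma  # interpolation error in units of the MC uncertainty
    print(f"bin {index}: rel. diff = {fit / ref - 1:+.2e}, pull = {pull:+.2f}")
```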
Yes, that's definitely a point that we have to take into account. The logs themselves we'll never get, but very often MCs give the scale-varied results, from which we can reconstruct the logs.

We'd have to use a separate field, I agree.
Maybe I'm too naive (and if so please tell me), but let me judge from the three examples I know (and with my naive point of view I consider them representative): the dyaa example in this repo, LeProHQ (my thesis) and MaunaKea (my top++ patch). All three work in the same way: 1) I compute the matrix elements, 2) I parametrize the phase space, 3) I add some histogramming, e.g. … Specifically, after 3) I don't care about bins any longer, but let PineAPPL do its job. And if I wanted to provide an MC error on a specific bin, I would basically need to reimplement PineAPPL, as I would need to keep track of bins myself and of all the events and their associated weights that go into each of them. I.e. such a feature requires some dedicated effort, and if you want such a thing it would/could fall into PineAPPL core functionality.
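To make step 3) concrete, here is a schematic version of such an event loop, modeled on the pineappl Python API (the constructor and `fill` signatures changed between versions, so the names are indicative only); `sample_phase_space` is a made-up stand-in for steps 1) and 2).

```python
import numpy as np
import pineappl  # signatures modeled on the Python API, may differ by version

rng = np.random.default_rng()

def sample_phase_space():
    # made-up stand-in for the matrix element and phase-space parametrization
    x1, x2 = rng.random(2)
    return x1, x2, 90.0**2, 100.0 * rng.random(), rng.random()

# one luminosity entry (u ubar) and a single perturbative order - illustrative only
lumi = [pineappl.lumi.LumiEntry([(2, -2, 1.0)])]
orders = [pineappl.grid.Order(0, 2, 0, 0)]
bin_limits = np.linspace(0.0, 100.0, 6)
grid = pineappl.grid.Grid.create(lumi, orders, bin_limits, pineappl.subgrid.SubgridParams())

# step 3): hand every event to PineAPPL and stop caring about bins
for _ in range(100_000):
    x1, x2, q2, observable, weight = sample_phase_space()
    grid.fill(x1, x2, q2, 0, observable, 0, weight)
```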
I would say, if ever, classical is easier. And just to stress your first point: indeed it would be underestimated, because often you have bins in kinematically unfavoured regions (say large pt) which are significantly more scarcely populated (and thus have a larger error). This is true already in the plain MC implemented in the local dyaa example, and even more so in the MC I usually use, which is an adaptive integration (not Vegas, but built on top of that - precisely because I need the associated Vegas weight to fill the histograms, and that library is the only one I found which supports that) - and I want an adaptive algorithm to sample the more important regions and thus get a more reliable result. In a classical (= quadrature) integration you would typically do one integration per bin, and thus you can use its error estimate as the bin error. Note that this does not hold for yadism, where we do a quad for every (interpolation) point, and it is not immediately clear to me how to collapse those into a single number.
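For comparison, a sketch of the "one quadrature per bin" strategy, where the error estimate returned by an adaptive quadrature directly serves as the bin uncertainty; the integrand is a made-up stand-in for a differential cross section.

```python
from scipy.integrate import quad

def differential_cross_section(pt):
    # made-up stand-in for d(sigma)/d(pt)
    return pt * (100.0 - pt) ** 3

edges = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
for low, high in zip(edges[:-1], edges[1:]):
    # one adaptive quadrature per bin; its error estimate is the bin error
    value, error = quad(differential_cross_section, low, high)
    print(f"[{low:5.1f}, {high:5.1f}): {value:.6e} +- {error:.1e}")
```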
I can see your point, but I don't think there is a simple solution ... actually, the simplest (but most expensive) solution I can see is to run the MC with different numbers of points and thus guess the MC error. In any case you will need a PDF to compare numbers.
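A toy version of that expensive strategy, exploiting that the plain-MC error falls like 1/sqrt(N): quadrupling the number of points halves the error, so the spread between two runs is of the order of the error of the coarser one. Everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_estimate(n):
    # plain MC estimate of the toy integral int_0^1 sqrt(x) dx = 2/3
    return np.mean(np.sqrt(rng.random(n)))

coarse, fine = mc_estimate(10_000), mc_estimate(40_000)
print(f"coarse = {coarse:.6f}, fine = {fine:.6f}")
print(f"estimated MC error of the coarse run ~ {abs(coarse - fine):.1e}")
```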
Just a few comments.
If you want to read them automatically, define a format, and avoid using an arbitrary KVDB.
In both cases there are different strategies for estimating the uncertainty, and they are equally valid: the sample variance for MC, and the approximation error for quadrature, which you can estimate with a higher-order quadrature rule.
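As a sketch of the quadrature-side strategy: evaluate a low-order and a higher-order rule on the same points and take their difference as an estimate of the approximation error of the lower-order one (toy integrand below).

```python
import numpy as np
from scipy.integrate import simpson, trapezoid

x = np.linspace(0.0, np.pi, 21)
y = np.sin(x)  # toy integrand; the exact integral is 2

low = trapezoid(y, x=x)
high = simpson(y, x=x)
print(f"trapezoid = {low:.8f}, Simpson = {high:.8f}")
print(f"estimated trapezoid error ~ {abs(high - low):.1e}")
```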
It matters, quite as much as in the MC. The problem is with weird behaviors, which of course are more likely in high dimensionality (so it is correlated with MC vs. quad, but only indirectly). For well-behaved (enough) functions, the estimates should be rather accurate in both cases. The main problems are systematics in the integration: in MC, possibly insufficient coverage of various regions; in quad, the oscillatory behavior of the integrands, which is terrible for Mellin inversion, and potentially also in the case of special functions.
Well, in the case of yadism you get an error for each individual point. Thus, you have a function evaluated on some points, and you cannot choose them. If you want to integrate that function you can essentially use a quadrature integration as well, but it can't be adaptive, since the points are predetermined (so the errors would have a different meaning, but if you think of a 0-th order quadrature it is still a meaningful error - at most the problem is propagating it through higher-order methods, together with the additional uncertainty coming from the integration itself).
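A sketch of that last point: with predetermined points the quadrature weights are fixed, so the per-point errors can be propagated through the rule (assuming they are independent); the grid, values and uncertainties below are all made up.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 11)   # predetermined (interpolation) points
f = np.sqrt(x)                  # central values at those points
sigma = 1e-4 * np.ones_like(x)  # made-up per-point uncertainties

# trapezoidal weights: interior points get (x[i+1] - x[i-1]) / 2,
# endpoints get half of their one-sided spacing
w = np.gradient(x)
w[0] *= 0.5
w[-1] *= 0.5

integral = np.sum(w * f)
# propagate independent per-point errors: sigma_I^2 = sum((w_i * sigma_i)^2)
integral_sigma = np.sqrt(np.sum((w * sigma) ** 2))
print(f"{integral:.6f} +- {integral_sigma:.1e}")
```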
Well, if I understood correctly, that should be simple enough: you have the full observable from the MC, and the observable computed through PineAPPL with the same PDF (storing the label). However, I don't see the point of …

Comment on evolution (for posterity): this is not necessarily a reliable test for evolution.
Yet another source of uncertainty is the extrapolation uncertainty from Monte Carlo integrators using slicing methods.
Just to say that this is an error orthogonal to what we were mainly discussing here before 🙃 so I'd say: …
In pinefarm we add metadata to the generated PineAPPL grid. Unfortunately the metadata changed a bit over time and thus isn't uniform, and is also a bit hard to read automatically. An example of the `results` metadata for `ATLAS_DY_7TEV_CF` is:

…

This can be improved (a sketch of one possible machine-readable layout follows the list):

- `PineAPPL`: we can always calculate this number by convoluting the grid with the PDF given in the metadata `results_pdf`.
- `MC` stores the sum of all contributions generated by the Monte Carlo integrator. If possible, we should store the MC predictions of each order separately, since we may want to test the interpolation error.
- Instead of `sigma`, which shows the total MC uncertainty, we should show the Monte Carlo uncertainties of each order; this would enable us to use them in `pineappl uncert`, which could then calculate PDF, scale-variation and MC uncertainties and combinations of them.
- `min` and `max` store the extrema of a 7- or 9-point scale variation, but we never document which one it is. We could unify this by treating the log-grids similarly to all the other orders.
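As promised above, a purely hypothetical sketch of what a machine-readable `results` payload could look like; none of the field names or order labels are fixed, they only illustrate the per-order central values and uncertainties discussed in the list.

```python
import json

results = {
    "pdf": "some_pdf_set",  # placeholder: the PDF the MC ran with
    "bins": [
        {
            # hypothetical per-order MC central values and uncertainties
            "orders": {
                "a2": {"central": 1.234e2, "sigma": 5.0e-2},
                "a2as1": {"central": 2.345e1, "sigma": 2.0e-2},
            },
        },
        # ... one entry per bin
    ],
}

# stored as a single metadata field that tools could parse back
grid_metadata_value = json.dumps(results)
```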