Add machine-readable MC results and uncertainties to calculate interpolation error #270
Comments
Not sure if this applies to mg5, but the typical workflow of MC generators is to collect histograms on the fly, without dedicated error propagation (e.g. by just calling …).
Madgraph5_aMC@NLO and all other MCs should provide MC uncertainties for each histogram and for each bin separately.
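For concreteness, a minimal sketch of where such per-bin uncertainties come from: besides each bin's weight sum one also accumulates the sum of squared weights, from which the sample variance of the bin estimate follows. The `Histogram` class below is hypothetical and not any generator's actual API.

```python
import numpy as np

class Histogram:
    """Toy histogram tracking exactly what a per-bin MC uncertainty needs."""

    def __init__(self, edges):
        self.edges = np.asarray(edges)
        self.sumw = np.zeros(len(self.edges) - 1)   # sum of weights per bin
        self.sumw2 = np.zeros(len(self.edges) - 1)  # sum of squared weights per bin
        self.n_events = 0

    def fill(self, observable, weight):
        self.n_events += 1  # count every generated phase-space point
        index = np.searchsorted(self.edges, observable, side="right") - 1
        if 0 <= index < len(self.sumw):
            self.sumw[index] += weight
            self.sumw2[index] += weight**2

    def result(self):
        # per-bin estimate sum(w)/N and its MC uncertainty from the sample
        # variance of the weights: sigma^2 = (E[w^2] - E[w]^2) / N
        mean = self.sumw / self.n_events
        sigma = np.sqrt((self.sumw2 / self.n_events - mean**2) / self.n_events)
        return mean, sigma
```

With this bookkeeping the per-bin errors come essentially for free, which is the point about MC integration made in the next comment.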
This would underestimate the uncertainties, because for individual bins they're always larger than for all bins added together. But I suppose the problem that you see is the difference between MC integration and classical integration: in the former you get them for free for each bin, while for classical integration you can't get them easily, I suppose. However, for classical integration they should also be much smaller, so they don't matter much. More important would be the interpolation error. The idea would be to be able to run

…

which uses the PDF the MC used and convolves the grid with this PDF to quantify the interpolation error. This is useful, because we noticed that after the evolution of some very specific interpolation grids the predictions deviated quite a bit w.r.t. the original MC values (in that case the error was dominated by the evolution, or rather by some unfortunate choices of specific evolution parameters).
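To illustrate what such a check could look like, here is a sketch that reads the PDF label from the metadata, convolves the grid with that PDF and compares against the stored MC numbers. The `pineappl` Python calls are modeled on the names of recent versions (`Grid.read`, `key_values`, `convolute_with_one`) and may differ in detail; `parse_results` is purely hypothetical, since a machine-readable `results` format is exactly what this issue asks for.

```python
import lhapdf    # LHAPDF Python bindings
import pineappl  # API names may differ between pineappl versions

def parse_results(table):
    """Hypothetical parser returning per-bin MC central values and
    uncertainties; today the `results` metadata is free text."""
    raise NotImplementedError

grid = pineappl.grid.Grid.read("ATLAS_DY_7TEV_CF.pineappl.lz4")
meta = grid.key_values()

# convolve the grid with the same PDF the MC ran with
pdf = lhapdf.mkPDF(meta["results_pdf"])
interpolated = grid.convolute_with_one(2212, pdf.xfxQ2, pdf.alphasQ2)

mc_central, mc_sigma = parse_results(meta["results"])

for index, (fit, ref, sigma) in enumerate(zip(interpolated, mc_central, mc_sigma)):
    pull = (fit - ref) / sigma  # interpolation error in units of the MC uncertainty
    print(f"bin {index}: rel. diff = {fit / ref - 1:+.2e}, pull = {pull:+.2f}")
```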
Yes, that's definitely a point that we have to take into account. The logs themselves we'll never get, but very often MCs give the scale-varied results, from which we can reconstruct the logs.

We'd have to use a separate field, I agree.
Maybe I'm too naive (and if so please tell me), but let me judge from the three examples I know (and with my naive point of view I consider them representative): the dyaa example in this repo, LeProHQ (my thesis) and MaunaKea (my top++ patch). All three work in the same way: 1) I compute the matrix elements, 2) I parametrize the phase space, 3) I add some histogramming, e.g. … Specifically, after 3) I don't care about bins any longer, but let PineAPPL do its job. And if I wanted to provide an MC error on a specific bin, I would basically need to reimplement PineAPPL, as I would need to keep track of bins myself and of all the events and their associated weights that go into each of them. I.e. such a feature requires some dedicated effort, and if you want such a thing it would/could fall into PineAPPL core functionality.
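To make step 3) concrete, here is a schematic version of such an event loop, modeled on the pineappl Python API (the constructor and `fill` signatures changed between versions, so the names are indicative only); `sample_phase_space` is a made-up stand-in for steps 1) and 2).

```python
import numpy as np
import pineappl  # signatures modeled on the Python API, may differ by version

rng = np.random.default_rng()

def sample_phase_space():
    # made-up stand-in for the matrix element and phase-space parametrization
    x1, x2 = rng.random(2)
    return x1, x2, 90.0**2, 100.0 * rng.random(), rng.random()

# one luminosity entry (u ubar) and a single perturbative order - illustrative only
lumi = [pineappl.lumi.LumiEntry([(2, -2, 1.0)])]
orders = [pineappl.grid.Order(0, 2, 0, 0)]
bin_limits = np.linspace(0.0, 100.0, 6)
grid = pineappl.grid.Grid.create(lumi, orders, bin_limits, pineappl.subgrid.SubgridParams())

# step 3): hand every event to PineAPPL and stop caring about bins
for _ in range(100_000):
    x1, x2, q2, observable, weight = sample_phase_space()
    grid.fill(x1, x2, q2, 0, observable, 0, weight)
```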
I would say, if ever, classical is easier. And just to stress your first point: indeed it would be underestimated, because often you have bins in kinematically unfavoured regions (say large pt) which are significantly more scarcely populated (and thus have a larger error). This is true already in the plain MC implemented in the local dyaa example, and even more so in the MC I usually use, which is an adaptive integration (not Vegas, but built on top of that - precisely because I need the associated Vegas weight to fill the histograms, and that library is the only one I found which supports that) - and I want an adaptive algorithm to sample the more important regions and thus get a more reliable result. In a classical (= quadrature) integration you would typically do one integration per bin, and thus you can use its error estimate as the bin error. Note that this does not hold for yadism, where we do a quad for every (interpolation) point, and it is not immediately clear to me how to collapse those into a single number.
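For comparison, a sketch of the "one quadrature per bin" strategy, where the error estimate returned by an adaptive quadrature directly serves as the bin uncertainty; the integrand is a made-up stand-in for a differential cross section.

```python
from scipy.integrate import quad

def differential_cross_section(pt):
    # made-up stand-in for d(sigma)/d(pt)
    return pt * (100.0 - pt) ** 3

edges = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
for low, high in zip(edges[:-1], edges[1:]):
    # one adaptive quadrature per bin; its error estimate is the bin error
    value, error = quad(differential_cross_section, low, high)
    print(f"[{low:5.1f}, {high:5.1f}): {value:.6e} +- {error:.1e}")
```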
I can see your point, but I don't think there is a simple solution ... actually, the simplest (but most expensive) solution I can see is to run the MC with different numbers of points and thus guess the MC error. In any case you will need a PDF to compare numbers.
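A toy version of that expensive strategy, exploiting that the plain-MC error falls like 1/sqrt(N): quadrupling the number of points halves the error, so the spread between two runs is of the order of the error of the coarser one. Everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_estimate(n):
    # plain MC estimate of the toy integral int_0^1 sqrt(x) dx = 2/3
    return np.mean(np.sqrt(rng.random(n)))

coarse, fine = mc_estimate(10_000), mc_estimate(40_000)
print(f"coarse = {coarse:.6f}, fine = {fine:.6f}")
print(f"estimated MC error of the coarse run ~ {abs(coarse - fine):.1e}")
```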
Just a few comments.
If you want to read them automatically, define a format, and avoid using an arbitrary KVDB.
In both cases there are different strategies for estimating the uncertainty, and they are equally valid: the sample variance for MC, and the approximation error for quadrature, which you can estimate with a higher-order quadrature rule.
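As a sketch of the quadrature-side strategy: evaluate a low-order and a higher-order rule on the same points and take their difference as an estimate of the approximation error of the lower-order one (toy integrand below).

```python
import numpy as np
from scipy.integrate import simpson, trapezoid

x = np.linspace(0.0, np.pi, 21)
y = np.sin(x)  # toy integrand; the exact integral is 2

low = trapezoid(y, x=x)
high = simpson(y, x=x)
print(f"trapezoid = {low:.8f}, Simpson = {high:.8f}")
print(f"estimated trapezoid error ~ {abs(high - low):.1e}")
```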
It matters, quite as much as in the MC. The problem is with weird behaviors, which of course are more likely in high dimensionality (so it is correlated with MC vs. quad, but only indirectly). For well-behaved (enough) functions, the estimates should be rather accurate in both cases. The main problems are systematics in the integration: in MC, possibly insufficient coverage of various regions; in quad, the oscillatory behavior of the integrands, which is terrible for Mellin inversion, and potentially also in the case of special functions.
Well, in the case of yadism you get an error for each individual point. Thus, you have a function evaluated on some points, and you cannot choose them. If you want to integrate that function you can essentially use a quadrature integration as well, but it can't be adaptive, since the points are predetermined (so the errors would have a different meaning, but if you think of a 0-th order quadrature it is still a meaningful error - at most the problem is propagating it through higher-order methods, together with the additional uncertainty coming from the integration itself).
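A sketch of that last point: with predetermined points the quadrature weights are fixed, so the per-point errors can be propagated through the rule (assuming they are independent); the grid, values and uncertainties below are all made up.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 11)   # predetermined (interpolation) points
f = np.sqrt(x)                  # central values at those points
sigma = 1e-4 * np.ones_like(x)  # made-up per-point uncertainties

# trapezoidal weights: interior points get (x[i+1] - x[i-1]) / 2,
# endpoints get half of their one-sided spacing
w = np.gradient(x)
w[0] *= 0.5
w[-1] *= 0.5

integral = np.sum(w * f)
# propagate independent per-point errors: sigma_I^2 = sum((w_i * sigma_i)^2)
integral_sigma = np.sqrt(np.sum((w * sigma) ** 2))
print(f"{integral:.6f} +- {integral_sigma:.1e}")
```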
Well, if I understood correctly, that should be simple enough: you have the full observable from the MC, and the observable computed through PineAPPL with the same PDF (storing the label). However, I don't see the point of …

Comment on evolution (for posterity): this is not necessarily a reliable test for evolution.
Yet another source of uncertainty is the extrapolation uncertainty from Monte Carlo integrators using slicing methods.
Just to say that this is an error orthogonal to what we were mainly discussing here before 🙃 so I'd say: …
In pinefarm we add metadata to the generated PineAPPL grid. Unfortunately the metadata changed a bit over time and thus isn't uniform, and is also a bit hard to read automatically. An example of the `results` metadata for `ATLAS_DY_7TEV_CF` is:

…

This can be improved (a sketch of one possible machine-readable layout follows the list):

- `PineAPPL`: we can always calculate this number by convoluting the grid with the PDF given in the metadata `results_pdf`.
- `MC` stores the sum of all contributions generated by the Monte Carlo integrator. If possible, we should store the MC predictions of each order separately, since we may want to test the interpolation error.
- Instead of `sigma`, which shows the total MC uncertainty, we should show the Monte Carlo uncertainties of each order; this would enable us to use them in `pineappl uncert`, which could then calculate PDF, scale-variation and MC uncertainties and combinations of them.
- `min` and `max` store the extrema of a 7- or 9-point scale variation, but we never document which one it is. We could unify this by treating the log-grids similarly to all the other orders.
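As promised above, a purely hypothetical sketch of what a machine-readable `results` payload could look like; none of the field names or order labels are fixed, they only illustrate the per-order central values and uncertainties discussed in the list.

```python
import json

results = {
    "pdf": "some_pdf_set",  # placeholder: the PDF the MC ran with
    "bins": [
        {
            # hypothetical per-order MC central values and uncertainties
            "orders": {
                "a2": {"central": 1.234e2, "sigma": 5.0e-2},
                "a2as1": {"central": 2.345e1, "sigma": 2.0e-2},
            },
        },
        # ... one entry per bin
    ],
}

# stored as a single metadata field that tools could parse back
grid_metadata_value = json.dumps(results)
```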