Application of POIs using formulas #850

rafaellopesdesa · 2020-05-02T05:50:27Z

Hi,

It seems that, right now, the modifiers are limited to addition of multiplication, with the POIs as normfactors being defined by simple modification POI*pdf. That's ok for most models, but some models require more complex likelihoods. The case that comes to my mind (need?) is EFT analyses which have an interference term and a squared term and, therefore, would need something like sqrt(POI)inter + POIsquare.

I don't know exactly what is the best way to implement that. I think that a simple way would be if one could define multiple POIs and if one could define the value of a modifier as a function of other modifiers. Even simple functions like polynomials would already encompass most of the case while keeping all operations very fast and efficient.

Let me know if there is anything I can do to help.

Thanks for considering.

lukasheinrich · 2020-05-09T08:47:31Z

hi @rafaellopesdesa

apologies for the slow response.

the way to implement this is to add a new modifier. We could call it a formulafactor, with the rule being that it can only reuse existing parameters and does not introduce new ones

{
"name": "some_name",
"type": "formulafactor", 
"data": {
  "expression" : "sqrt(x)*y", dependencies: [{"x": "mu", "y": mu2}]
}
},
{
"name": "another_name",
"type": "formulafactor", 
"data": {
  "expression" : "x**2", dependencies: [{"x": "some_name"}]
}
}

I think in the constrained language of numexpr https://github.com/pydata/numexpr and the with this constraint that no new parameters are introduced this would be quite doable.

cc @alexander-held who has also raised this issue

rafaellopesdesa · 2020-05-14T13:23:48Z

Hi @lukasheinrich ,

Yes, that would be great for more complex likelihood models.

I will give it a try this weekend and, if I manage to implement your idea, will come back here to report.

Thanks again!

lukasheinrich · 2020-05-14T13:34:22Z

I would suggest you orient yourself along the lines of the normfactor code

https://scikit-hep.org/pyhf/_modules/pyhf/modifiers/normfactor.html#normfactor

the output shape of a modifier is

(n_modifiers, n_global_samples, n_alphas, n_global_bin)

here's the meaning of the output shape

different "formulas" to be applied
the affect of that formula for each sample (if the sample doesn't declare/is unaffected by the formulafactor, it should just be multiplied by 1)
the batch size of parameters.. this is probably just 1
the global bin number (across all channels)

so if you need to compute the affect of 3 formulas and in total have 10 sample (which might or might not be affected by the formula(s) and want to evaluate the formula for 5 parameter settings and the total number of bins is 20 you would have

an output tensor of

(3,10,5,20)

lukasheinrich · 2020-05-27T08:39:02Z

Hi, I will likely take a stab at it since I need this for my analysis. (unless someone is already working on it?)

alexander-held · 2020-09-23T09:05:45Z

Hi, just to add an example of what other libraries do: uproot4 can interpret cuts / selection requirements as python (https://github.com/scikit-hep/uproot4/blob/master/uproot4/language/python.py).

I imagine that in practical applications, a very limited amount of operators will be enough for most fits: addition, subtraction, multiplication, division. Taking a power is convenient, but I guess non-integer powers are rarely needed, so multiplication is fine. Square roots are convenient too, but could be worked around by redefining normfactor->sqrt(normfactor) and then taking the power instead.

rafaellopesdesa · 2021-05-07T18:25:24Z

Ok, I am coming back here because I am starting a new project that could use this and this time I really want to give pyhf a try.

I would say that powers of the normfactor would already solve 99% of the cases I have in mind. The main application I have in my is cases with interference (like theta^2 * S + theta * I + B) [think EFT, for instance]

Maybe we could just have an option in normfactor that says the power which which the normfactor acts on the sample.

Would that be a simple way to implement it?

balunas · 2021-08-06T11:08:32Z

To bring this issue up again, I see that @lukasheinrich added some "custom modifier" feature but I can't locate any documentation on how to use this to define my own modifier. Is this something users can currently interface with? We'd like to use something like this in ATLAS di-Higgs analyses, where our POI can be a coupling which has rather complicated interference effects and can't be described by any existing modifier. To give a concrete example, we want to implement something like

yield = B + f(k)*S1 + g(k)*S2 + h(k)*S3

where k is our POI, f, g, and h are functions we define, and S1, S2, S3 are different "basis" signal distributions. Is something like this already supported in some way, or is this part of the open items remaining on this issue?

mswiatlo · 2021-09-21T22:09:36Z

Hi @lukasheinrich , @alexander-held ,

I wanted to bring this up again as it's starting to become a limiting factor in our analysis, and it would be really nice to avoid having to convert everything back to Roofit. Is there any documentation/testing/etc. that still needs to be done here? I also have a student who could potentially contribute to this (development, testing, etc.) if that's the limiting factor. Please let us know-- it would be really great to get this last feature that would enable a pure pyhf fit structure for our analysis!

Best wishes,
Max

lukasheinrich · 2021-09-21T22:26:00Z

hi @mswiatlo - let's prioritize this then.

is this the same type of formula that @balunas needs? and if so what do f(k),g(k),h(k) look like?

Importantly - is each bin modification computed only as a function of the parameters and the sample yields in that bin or are cross-bin relations important?

mswiatlo · 2021-09-21T23:33:51Z

Yes, we're working on the same analysis :)

balunas · 2021-09-22T15:43:37Z

Hi @lukasheinrich -

The ideal scenario would be if we could implement f(k) etc. ourselves in python, but in practice we're dealing with polynomials here at the moment. That is, f(k) = a + bk + ck^2 + ... There's also a k-dependent overall normalization (which gets looked up from the theory calculation). In principle that can be handled with a separate normfactor which we fix ourselves, but it would be even better if that could be included in a user-defined f(k).

There are no cross-bin relations that come in, the only dependence is on the POI.

Looking further ahead, we'd also love to use this sort of functionality with multiple POIs. I know general support for that is still in the works, but the dream would be to have something like

yield = B + f(k1, k2, k3)*S1 + g(k1, k2, k3)*S2 + ...

lukasheinrich · 2021-09-22T15:48:37Z

I think if the general structure is a multiplicative modifier per-sample that only depends on the parameters

sample_yield = base_sample_yield * f(POIs, NPs) is definitely workable. The summing over samples would then happen as in the other modifiers

this would not cover yet e.g. custom functions that depend on the bin location or other bins e.g. f(mass) for a mass bump or so, but seems like a good, quite general middle ground?

alexander-held · 2021-09-22T16:04:36Z

yield = B + f(k1, k2, k3)*S1 + g(k1, k2, k3)*S2 + ...

As long as f and g are not too complicated (I assume everything that can be put in a TFormula may work), I guess this is exactly of the form that ROOT allows via AddPreprocessFunction. k1, k2 etc. are going to be normalization factors not acting on any sample, but they are free parameters in the fit. f, g are expressions that depend on those normalization factors, but are not a free parameter in the fit themselves (but they are attached to the samples they should scale). For example f could be something like

k1 + k2*3.5 + k3*k2*7.0

without any problem (in the existing ROOT implementation). I don't think that any sort of look-up works in that ROOT system from within that formula but could very well be wrong here. A look-up could also be approximated by some fit and then implemented via an appropriate polynomial or similar. I am not sure whether ROOT allows these expressions to depend on (constrained) nuisance parameters, I have only used them with normalization factors.

In case it may be helpful, here's an (ATLAS-internal) documentation page describing the use of this in TRExFiter: https://trexfitter-docs.web.cern.ch/trexfitter-docs/model_building/expression/.

lukasheinrich · 2021-09-22T16:48:15Z

thanks @alexander-held - so I guess that's compatible wiith my previous reply? can you confirm?

alexander-held · 2021-09-22T16:51:46Z

@lukasheinrich I think so, yes. The ROOT version may be slightly less general as you have f(POIs, NPs) and I am not sure that you can do a dependency on NPs in ROOT (nor have I seen a use case for that yet). This should work for almost all cases I have come across so far.

lukasheinrich · 2021-10-12T19:09:51Z

(not sure why github is not auto-linking this thread here: #1627)

matthewfeickert · 2021-10-12T19:13:48Z

(not sure why github is not auto-linking this thread here: #1627)

It's pretty annoying, but GitHub won't show a visual cross-references between Issues and Discussions. No idea why. 😞

lukasheinrich · 2022-09-07T22:33:45Z

hi all,

we're preparing our 0.7.0 release and with this we are making some progress on this front

check out the code here

https://gist.github.com/lukasheinrich/a7f23b71cf048727ca9012f5f6ea940a

@alexander-held has been testing this and an initial look make it seem like this would cover quite a bit of ground. It'd be interesting to get your opinion @balunas @rafaellopesdesa

kratsg · 2022-09-08T15:17:10Z

This is migrated into a draft PR #1991. See tests/test_experimental.py for an example usage (basically what is currently in the cell above right now).

amartone21 · 2023-08-07T16:08:28Z

Hi,

I'm a master's student and for my thesis I'm working on the HWW off-shell analysis. In my case, I'd need to have modifiers which are function of the POI, something like mu - sqrt(mu). Can this be done via the above implementation?

alexander-held · 2023-08-07T17:05:49Z

Yes this works, an example of that is also discussed in this issue: scikit-hep/cabinetry#382.

kratsg added the question Further information is requested label May 2, 2020

matthewfeickert added the user request Request coming form a pyhf user label May 9, 2020

lukasheinrich mentioned this issue Nov 19, 2020

feat: custom modifiers #1188

Closed

lhenkelm mentioned this issue Sep 23, 2021

Expose base_uri in utils.validate #1609

Closed

1 task

matthewfeickert mentioned this issue Oct 15, 2021

feat: Add setup for custom modifiers #1625

Merged

1 task

kratsg mentioned this issue Sep 8, 2022

feat: Experimental implementation of custom modifiers #1991

Draft

4 tasks

alexander-held mentioned this issue Sep 23, 2022

docs: Add v0.7.0 release notes #1705

Merged

4 tasks

alexander-held mentioned this issue Nov 14, 2022

Attaching modifiers twice to a sample #2059

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Application of POIs using formulas #850

Application of POIs using formulas #850

rafaellopesdesa commented May 2, 2020

lukasheinrich commented May 9, 2020 •

edited

Loading

rafaellopesdesa commented May 14, 2020

lukasheinrich commented May 14, 2020

lukasheinrich commented May 27, 2020 •

edited

Loading

alexander-held commented Sep 23, 2020

rafaellopesdesa commented May 7, 2021

balunas commented Aug 6, 2021

mswiatlo commented Sep 21, 2021

lukasheinrich commented Sep 21, 2021 •

edited

Loading

mswiatlo commented Sep 21, 2021

balunas commented Sep 22, 2021

lukasheinrich commented Sep 22, 2021 •

edited

Loading

alexander-held commented Sep 22, 2021 •

edited

Loading

lukasheinrich commented Sep 22, 2021

alexander-held commented Sep 22, 2021 •

edited

Loading

lukasheinrich commented Oct 12, 2021

matthewfeickert commented Oct 12, 2021

lukasheinrich commented Sep 7, 2022

kratsg commented Sep 8, 2022

amartone21 commented Aug 7, 2023 •

edited

Loading

alexander-held commented Aug 7, 2023

Application of POIs using formulas #850

Application of POIs using formulas #850

Comments

rafaellopesdesa commented May 2, 2020

lukasheinrich commented May 9, 2020 • edited Loading

rafaellopesdesa commented May 14, 2020

lukasheinrich commented May 14, 2020

lukasheinrich commented May 27, 2020 • edited Loading

alexander-held commented Sep 23, 2020

rafaellopesdesa commented May 7, 2021

balunas commented Aug 6, 2021

mswiatlo commented Sep 21, 2021

lukasheinrich commented Sep 21, 2021 • edited Loading

mswiatlo commented Sep 21, 2021

balunas commented Sep 22, 2021

lukasheinrich commented Sep 22, 2021 • edited Loading

alexander-held commented Sep 22, 2021 • edited Loading

lukasheinrich commented Sep 22, 2021

alexander-held commented Sep 22, 2021 • edited Loading

lukasheinrich commented Oct 12, 2021

matthewfeickert commented Oct 12, 2021

lukasheinrich commented Sep 7, 2022

kratsg commented Sep 8, 2022

amartone21 commented Aug 7, 2023 • edited Loading

alexander-held commented Aug 7, 2023

lukasheinrich commented May 9, 2020 •

edited

Loading

lukasheinrich commented May 27, 2020 •

edited

Loading

lukasheinrich commented Sep 21, 2021 •

edited

Loading

lukasheinrich commented Sep 22, 2021 •

edited

Loading

alexander-held commented Sep 22, 2021 •

edited

Loading

alexander-held commented Sep 22, 2021 •

edited

Loading

amartone21 commented Aug 7, 2023 •

edited

Loading