Dear developers,
I'm trying to use uBoost as an alternative to the usual BDTs: a classifier to which one can add more variables while maintaining uniformity of the response over a specific variable (here, the invariant mass).
I'm using the methods uBoostClassifier.fit() and uBoostClassifier.predict_proba(), which are intuitively named and well described in the documentation, but the results are far from what I expected. The background probability is distributed around 0.5, and the signal is flat (the signal is flat over the response, not the response flat over the invariant mass).
When I dig a little deeper into the source code of hep_ml/uboost.py, I find that predict_proba() (lines 532 to 540 at commit 442a321) computes these probabilities in a strange way. It sums the outputs of uBoostBDT._uboost_predict_score() and then transforms the sum into a probability using the sigmoid score_to_proba() from hep_ml/commonutils.py. Moreover, _uboost_predict_score() (lines 363 to 366) does not return just the raw score: it already converts it with sigmoid_function() from hep_ml/commonutils.py.
As a result, the values being summed are all positive (they are outputs of a sigmoid), so when the sum is fed to the second sigmoid, the resulting probability can never be smaller than 0.5.
For the background, the individual outputs of the first sigmoids are around zero, so the second sigmoid gives about expit(0) = 0.5. For the signal, the individual outputs are close to one, so the (averaged) score is about one and the second sigmoid gives about expit(1) ≈ 0.73.
Both values are far from the 0 and 1 one would expect, since these are supposed to be probabilities of being signal. This behavior looks incorrect to me, but I'm not sure at which step it should be fixed.
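To make the effect concrete, here is a small NumPy sketch (not the actual hep_ml code; the per-estimator score values are made up) showing how feeding sigmoid outputs into a second sigmoid compresses the probabilities into roughly [0.5, 0.73]:

```python
import numpy as np

def expit(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

n_estimators = 50

# Hypothetical per-estimator outputs AFTER the first sigmoid:
bg = expit(np.full(n_estimators, -10.0))   # background: each output ~0
sig = expit(np.full(n_estimators, 10.0))   # signal: each output ~1

# The first sigmoid makes every score non-negative, so a second
# sigmoid of the averaged score can never go below 0.5.
p_bg = expit(bg.mean())    # ~expit(0) = 0.5
p_sig = expit(sig.mean())  # ~expit(1) ≈ 0.73
print(p_bg, p_sig)
```

Even a perfectly separated signal never gets a probability above ~0.73, which matches the distributions described above.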
The summation should be replaced with simple averaging; then result[:, 1] = x and result[:, 0] = 1 - x. (A PR that fixes this is welcome.)
Other than the probability calibration, this won't change the result (e.g. the flatness).
Comment: the sigmoid in uboost.py#365 was added to utilize the predictions of the individual uBoostBDTs more efficiently.
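A minimal sketch of the averaging fix suggested above (the function name and the array layout are hypothetical, not the actual hep_ml API):

```python
import numpy as np

def predict_proba_averaged(per_estimator_scores):
    """Average the per-estimator sigmoid outputs (each already in
    [0, 1]) instead of passing their sum through a second sigmoid.

    per_estimator_scores: hypothetical (n_estimators, n_samples) array.
    Returns an (n_samples, 2) array of [P(background), P(signal)].
    """
    x = np.mean(per_estimator_scores, axis=0)  # averaged score in [0, 1]
    result = np.zeros((x.shape[0], 2))
    result[:, 1] = x       # probability to be signal
    result[:, 0] = 1 - x   # probability to be background
    return result

# Toy usage: 3 estimators, 2 samples (background-like, signal-like).
scores = np.array([[0.02, 0.98],
                   [0.05, 0.95],
                   [0.08, 0.97]])
proba = predict_proba_averaged(scores)
```

With averaging, background-like events get probabilities near 0 and signal-like events near 1, while the ordering of events (and hence the flatness property) is unchanged.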