Exporting/saving/reusing the reweighting formula #33
This is a frequent question (or family of questions) from physicists who want to apply reweighting to one more data sample. Below I give solutions for different situations.

Working from the same script

A frequently applicable solution, which physicists for some reason tend to ignore (ROOT influence?), is to read the new file inside the same script/notebook and apply the reweighter there. You can store the weights column using the recipe from this issue.

When you need to store the formula

Possible reasons:
You can use pickle. It works as follows:

```python
import pickle  # on Python 2, `import cPickle as pickle` is the faster equivalent

# saving the formula (pickle files must be opened in binary mode)
with open('reweighter.pkl', 'wb') as f:
    pickle.dump(reweighter, f)

# loading the formula
with open('reweighter.pkl', 'rb') as f:
    reweighter = pickle.load(f)
```

Exporting to TMVA (needed when you have to build it inside some production script / experiment)

When applying the formula, the reweighter is not much different from a simple gradient boosting / random forest (see how
There are tools that convert sklearn's trees to TMVA format: koza4ok and sklearn-pmml. Warning: I haven't tried either of them, since I don't use TMVA, so I expect many caveats along the way. If someone has tried exporting to TMVA and succeeded, let me know.
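To show what "applying the formula" boils down to, here is a minimal pure-Python sketch. It assumes (my reading of hep_ml's `GBReweighter`; please check `reweight.py` before relying on it) that the new weight is the original weight times `exp` of the boosted sum, where each tree's output is scaled by the learning rate. The two stumps and the `[pt, eta]` feature layout are hypothetical stand-ins for fitted regression trees.

```python
import math
from typing import Callable, Sequence

# A "tree" here is any function mapping one event (a sequence of features) to a float.
Tree = Callable[[Sequence[float]], float]


def boosted_score(event: Sequence[float], trees: Sequence[Tree], learning_rate: float) -> float:
    """Plain gradient-boosting sum: each tree output scaled by the learning rate."""
    return sum(learning_rate * tree(event) for tree in trees)


def predict_weight(event: Sequence[float], original_weight: float,
                   trees: Sequence[Tree], learning_rate: float) -> float:
    """Assumed reweighting rule: new weight = original weight * exp(boosted score)."""
    return original_weight * math.exp(boosted_score(event, trees, learning_rate))


# Hypothetical depth-1 trees ("stumps"); event layout assumed to be [pt, eta].
def stump_pt(event: Sequence[float]) -> float:
    return 0.5 if event[0] <= 10.0 else -0.5


def stump_eta(event: Sequence[float]) -> float:
    return 0.2 if abs(event[1]) <= 1.5 else -0.1


trees = [stump_pt, stump_eta]
w = predict_weight([5.0, 0.3], original_weight=1.0, trees=trees, learning_rate=0.1)
```

Any exported model (TMVA or otherwise) would need to reproduce both the tree ensemble and this final `exp`; a tool that only converts the trees would leave that last step to you.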
Hi Alex, thanks a lot for the quick feedback! Regards,
I have a question about converting from
However, I am not sure the last line gives the correct output. In hep_ml/hep_ml/gradientboosting.py, lines 136 to 144 (at commit 41e97d5):
At the end,
Should I export the array from
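Hi @kpedro88

For conversion, almost surely you'll need to do the following (not tested, may need corrections):

```python
import copy

# estimators: list of (tree, leaf_values) pairs, e.g. reweighter.gb.estimators (assumed)
converted_trees = []
for tree, leaf_values in estimators:
    new_tree = copy.deepcopy(tree)
    # regression trees store one value per node, shape (node_count, 1, 1)
    assert new_tree.tree_.value.shape == (len(leaf_values), 1, 1)
    new_tree.tree_.value[:, 0, 0] = leaf_values  # overwrite node values in place
    converted_trees.append(new_tree)  # save the new tree to the ensemble
```

Don't forget to verify that you get the same predictions before and after conversion.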
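The "same predictions before and after conversion" check can be sketched in pure Python. `ArrayTree` below is a hypothetical stand-in that mimics the arrays sklearn keeps in `tree_` (`children_left`, `children_right`, `feature`, `threshold`, `value`, with leaves marked by `children_left == -1`), flattening `value` to one float per node. It also assumes that, as in hep_ml's boosting code as far as I can tell, each tree's contribution is multiplied by the learning rate; this sketch folds that factor into the new leaf values, which is one option among several. With real sklearn trees you would compare `learning_rate * leaf_values[tree.apply(X)]` against the converted trees' `predict(X)` instead.

```python
import copy
from dataclasses import dataclass
from typing import List, Sequence

TREE_LEAF = -1  # same convention as sklearn: a leaf has children_left == -1


@dataclass
class ArrayTree:
    """Hypothetical stand-in for sklearn's tree_ arrays (value: one float per node)."""
    children_left: List[int]
    children_right: List[int]
    feature: List[int]
    threshold: List[float]
    value: List[float]

    def apply(self, event: Sequence[float]) -> int:
        """Index of the leaf the event falls into (go left when x <= threshold)."""
        node = 0
        while self.children_left[node] != TREE_LEAF:
            if event[self.feature[node]] <= self.threshold[node]:
                node = self.children_left[node]
            else:
                node = self.children_right[node]
        return node

    def predict(self, event: Sequence[float]) -> float:
        return self.value[self.apply(event)]


def convert(estimators, learning_rate):
    """Bake each (tree, leaf_values) pair into a standalone tree, folding in the learning rate."""
    converted = []
    for tree, leaf_values in estimators:
        new_tree = copy.deepcopy(tree)  # leave the original tree untouched
        assert len(new_tree.value) == len(leaf_values)
        new_tree.value = [learning_rate * v for v in leaf_values]
        converted.append(new_tree)
    return converted


def same_predictions(estimators, converted, learning_rate, events, tol=1e-12):
    """True if boosting-style predictions and the converted ensemble agree on every event."""
    for event in events:
        before = sum(learning_rate * leaf_values[tree.apply(event)]
                     for tree, leaf_values in estimators)
        after = sum(tree.predict(event) for tree in converted)
        if abs(before - after) > tol:
            return False
    return True


# Toy ensemble: root splits feature 0 at 1.0; nodes 1 and 2 are leaves (sklearn uses -2 there).
base = ArrayTree([1, -1, -1], [2, -1, -1], [0, -2, -2], [1.0, -2.0, -2.0], [0.0, 0.0, 0.0])
estimators = [(base, [0.0, 0.7, -0.3])]
converted = convert(estimators, learning_rate=0.1)
ok = same_predictions(estimators, converted, 0.1, [[0.5], [1.0], [3.0]])
```

Checking on a handful of events, including ones exactly on a threshold, is cheap and catches the most likely mistakes: a missing learning-rate factor or leaf values written into the wrong nodes.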
Sometimes one would like to use a control sample (e.g. because it is more abundant) to determine MC weights that are then applied to other (e.g. rarer) samples.
For this reason it would be very useful if hep_ml.reweight could export the "reweighting formula" in some format, e.g. ROOT, so that it can also be reused from other programming languages.
Thanks