Add xgboost model #255
base: eurovision-main
Conversation
Great PR! Everything but the graphviz plot (see comment) ran smoothly and made sense (though I didn't ponder over every detail).
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"graph = xgb.to_graphviz(model_basic, num_trees=1, rankdir='LR')\n", |
This is very nice, but I had to `brew install graphviz` to make this work (which I think is fine, but might be worth a comment).
Just to note: I also did `pip install graphviz` beforehand, but that wasn't sufficient, it seems (maybe we should add it to the requirements anyway).
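For reference, a minimal self-contained sketch of the dependency issue (not the notebook's code; the toy data and model below are made up purely to demonstrate it): the Python `graphviz` package only provides the bindings, and rendering still needs the system Graphviz binaries.

```python
# Minimal sketch, not the notebook's code: a tiny booster on random data,
# just to show where the system Graphviz dependency bites.
import numpy as np
import xgboost as xgb

X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)
model = xgb.XGBClassifier(n_estimators=5, max_depth=3).fit(X, y)

# Needs the `graphviz` Python package AND the Graphviz binaries
# (`brew install graphviz` on macOS, `apt-get install graphviz` on Debian/Ubuntu).
# Without the binaries, rendering fails with graphviz.ExecutableNotFound.
graph = xgb.to_graphviz(model, num_trees=1, rankdir='LR')
graph.render('tree_1', format='png')
```

So adding `graphviz` to the requirements would cover the pip side, but the system package is probably still worth a note in the notebook or README.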
"source": [ | ||
"model_ranked, test_data, train_data = xgboost_rank_model(df_xgboost.loc[df_xgboost['points'] > 0], seed=7, test_size=0.33)\n", | ||
"out = ranked_model_predictions(model_ranked, test_data)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"model_ranked_all, test_data_all, train_data_all = xgboost_rank_model(df_xgboost, seed=7, test_size=0.33)\n", | ||
"out_all = ranked_model_predictions(model_ranked_all, test_data_all)" | ||
] | ||
}, |
This is a very good comparison. It's interesting that accuracy drops that much when 0's are excluded (though I think we did talk about this at some point). What I don't quite understand is why the accuracy of the basic XGBoost model is still 20% higher than that of the ranked model including 0's.
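To pin that comparison down, something along these lines could report the two accuracies side by side. This is a hypothetical sketch: it assumes `out` and `out_all` expose the actual and predicted point categories as 'points' and 'prediction' columns, which may not match the real output of `ranked_model_predictions`.

```python
# Hypothetical sketch: compare accuracy with and without the 0-point rows,
# assuming 'points' (actual) and 'prediction' columns exist in both outputs.
from sklearn.metrics import accuracy_score

acc_excl_zeros = accuracy_score(out['points'], out['prediction'])          # points > 0 only
acc_incl_zeros = accuracy_score(out_all['points'], out_all['prediction'])  # all rows
print(f"accuracy excluding 0's: {acc_excl_zeros:.3f}")
print(f"accuracy including 0's: {acc_incl_zeros:.3f}")
```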
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"violins(out)" |
Those are super helpful for understanding what's going on. It seems like there is definitely some predictive quality to the model, though it's not very precise (except for the top score categories, maybe).
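For anyone reading along, here is a rough sketch of the kind of plot `violins(out)` produces. This is an assumption about the helper (a seaborn violin plot of predictions per actual score category, with hypothetical column names), not its actual implementation.

```python
# Hypothetical re-creation of the violin plot, assuming `out` has 'points'
# (actual score category) and 'prediction' columns; the repo's helper may differ.
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(10, 5))
sns.violinplot(data=out, x='points', y='prediction', ax=ax)
ax.set_xlabel('actual points')
ax.set_ylabel('predicted points')
plt.show()
```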
Summary
Adds the xgboost model fitting and predictions
What should a reviewer concentrate their feedback on?
Acknowledging contributors