
Add xgboost model #255

Open · wants to merge 1 commit into base: eurovision-main
Conversation

KatrionaGoldmann (Collaborator)
Summary

Adds the xgboost model fitting and predictions.

What should a reviewer concentrate their feedback on?

  • Updating the requirements file
  • Does everything look OK?

Acknowledging contributors


@mastoffel (Collaborator) left a comment

Great PR! Everything but the graphviz plot (see comment) ran smoothly and made sense (though I didn't ponder over every detail).

"metadata": {},
"outputs": [],
"source": [
"graph = xgb.to_graphviz(model_basic, num_trees=1, rankdir='LR')\n",
Collaborator:

This is very nice, but I had to `brew install graphviz` to make it work (which I think is fine, but might be worth a comment).

Collaborator:

Just to note: I also did `pip install graphviz` beforehand, but that wasn't sufficient, it seems (though maybe we should add it to the requirements anyway).
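The reason the pip install alone isn't enough: `xgb.to_graphviz` needs both the Python `graphviz` package and the system Graphviz binaries; the pip package only wraps the `dot` executable, it does not bundle it. A small stdlib-only check like this could flag the missing binary up front (hypothetical helper, not from the PR):

```python
import shutil

def graphviz_available() -> bool:
    """True if the system Graphviz 'dot' executable is on PATH.

    The pip 'graphviz' package shells out to 'dot'; if it's missing,
    install the binaries separately (e.g. 'brew install graphviz' on macOS).
    """
    return shutil.which("dot") is not None

if not graphviz_available():
    print("Graphviz binaries not found; plotting with xgb.to_graphviz will fail")
```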

Comment on lines +3165 to +3179
"source": [
"model_ranked, test_data, train_data = xgboost_rank_model(df_xgboost.loc[df_xgboost['points'] > 0], seed=7, test_size=0.33)\n",
"out = ranked_model_predictions(model_ranked, test_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_ranked_all, test_data_all, train_data_all = xgboost_rank_model(df_xgboost, seed=7, test_size=0.33)\n",
"out_all = ranked_model_predictions(model_ranked_all, test_data_all)"
]
},
Collaborator:
This is a very good comparison. Interesting that accuracy drops that much when 0s are excluded (though I think we did talk about this at some point). Also, what I don't quite understand is why the accuracy of the basic XGBoost model is still 20% higher than that of the ranked model including 0s.
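One thing worth keeping in mind when reading that gap: raw accuracy only counts exact category matches, so a ranking model can get the relative ordering nearly right while still scoring low. A stdlib-only sketch of the metric (hypothetical helper, not the PR's code):

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    assert len(predicted) == len(actual)
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Ordering is nearly right (only 10 and 8 swapped), yet raw accuracy is 0.5.
print(accuracy([12, 10, 8, 0], [12, 8, 10, 0]))  # → 0.5
```

A rank-aware metric (e.g. Spearman correlation of predicted vs actual ordering) would reward the near-correct ordering that exact-match accuracy misses.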

"metadata": {},
"outputs": [],
"source": [
"violins(out)"
Collaborator:

Those are super helpful for understanding what's going on. It seems like there is definitely some predictive quality to the model, though it's not very precise (except maybe for the top score categories).
