
Add xgboost model #255

Open · wants to merge 1 commit into base: eurovision-main
Conversation

KatrionaGoldmann (Collaborator)
Summary

Adds the xgboost model fitting and predictions.

What should a reviewer concentrate their feedback on?

  • Updating the requirements file
  • Does everything look OK?

Acknowledging contributors


@mastoffel (Collaborator) left a comment

Great PR! Everything but the graphviz plot (see comment) ran smoothly and made sense (though I didn't ponder over every detail).

"metadata": {},
"outputs": [],
"source": [
"graph = xgb.to_graphviz(model_basic, num_trees=1, rankdir='LR')\n",
Collaborator:

This is very nice, but I had to `brew install graphviz` to make it work (which I think is fine, but might be worth a comment).

Collaborator:

Just to note: I also did `pip install graphviz` beforehand, but that wasn't sufficient, it seems (though maybe we should add it to the requirements anyway).
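The reason the pip install alone isn't enough: `xgb.to_graphviz` needs both the Python `graphviz` package and the system Graphviz binaries; the pip package only wraps the `dot` executable, it does not bundle it. A small stdlib-only check like this could flag the missing binary up front (hypothetical helper, not from the PR):

```python
import shutil

def graphviz_available() -> bool:
    """True if the system Graphviz 'dot' executable is on PATH.

    The pip 'graphviz' package shells out to 'dot'; if it's missing,
    install the binaries separately (e.g. 'brew install graphviz' on macOS).
    """
    return shutil.which("dot") is not None

if not graphviz_available():
    print("Graphviz binaries not found; plotting with xgb.to_graphviz will fail")
```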

Comment on lines +3165 to +3179
"source": [
"model_ranked, test_data, train_data = xgboost_rank_model(df_xgboost.loc[df_xgboost['points'] > 0], seed=7, test_size=0.33)\n",
"out = ranked_model_predictions(model_ranked, test_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_ranked_all, test_data_all, train_data_all = xgboost_rank_model(df_xgboost, seed=7, test_size=0.33)\n",
"out_all = ranked_model_predictions(model_ranked_all, test_data_all)"
]
},
Collaborator:
This is a very good comparison. Interesting that accuracy drops that much when 0s are excluded (though I think we did talk about this at some point). Also, what I don't quite understand is why the accuracy of the basic XGBoost model is still 20% higher than that of the ranked model including 0s.
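One thing worth keeping in mind when reading that gap: raw accuracy only counts exact category matches, so a ranking model can get the relative ordering nearly right while still scoring low. A stdlib-only sketch of the metric (hypothetical helper, not the PR's code):

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual label."""
    assert len(predicted) == len(actual)
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Ordering is nearly right (only 10 and 8 swapped), yet raw accuracy is 0.5.
print(accuracy([12, 10, 8, 0], [12, 8, 10, 0]))  # → 0.5
```

A rank-aware metric (e.g. Spearman correlation of predicted vs actual ordering) would reward the near-correct ordering that exact-match accuracy misses.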

"metadata": {},
"outputs": [],
"source": [
"violins(out)"
Collaborator:

Those are super helpful for understanding what's going on. It seems like there is definitely some predictive quality to the model, though it's not very precise (except maybe for the top score categories).
