another classifier in classification chapter? #232
I'll make this a v2 enhancement, with the goal of adding logistic regression. But in reality this is somewhere between "v2 enhancement" and "blue sky enhancement".
Just a bit of brainstorming on this as a followup: if we do this, I think it might make the most sense to have a new chapter, Classification III, that covers logistic regression. That way we can mimic the structure of Regression II, which is the equivalent chapter for regression. If we tried to do LogReg in Classification I or II, we wouldn't be able to do that (because there would be concepts that haven't been introduced yet). This would also involve editing Reg II to avoid repeating ourselves when we get to Lin Reg.
I worry a bit about introducing logistic regression before linear regression... and we use knn classification as a gateway to knn regression, so we're kind of tied to classification and then regression. Maybe all this wouldn't be a problem if we placed Classification II after Regression II? So it's kind of like a classification sandwich. Alternatively, we could choose some other algorithm for Classification II. Decision trees could be good: they're the basis for some of the most popular and best-performing ML models right now. Or we could choose SVMs?
Thanks for brainstorming :)
I'm definitely on board with adding more interesting classification models like decision trees / forests / SVMs / NNs. If I had to pick more classification models to add, I'd go with LogReg first (because it's almost as simple as linear regression, very popular, and a nice counterpart to the uninterpretable knn stuff) and then decision trees/forests, because they have a really nice algorithmic / intuitive description of how they classify things. SVMs and NNs are harder to introduce at the level of this textbook -- especially SVMs... -- but I don't think it's impossible.
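To make the "almost as simple as linear regression" point concrete, here is a minimal sketch of fitting and inspecting a logistic regression classifier. This is purely illustrative (scikit-learn on synthetic data, not the textbook's own R/tidymodels code): fitting is one call, and unlike k-nn the model exposes interpretable coefficients.

```python
# Hypothetical illustration only -- not from the textbook.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-feature binary classification data.
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fitting is a single call, much like linear regression.
clf = LogisticRegression().fit(X_train, y_train)

# Unlike k-nn, the fitted model has directly interpretable parameters:
# one coefficient per feature plus an intercept.
print("coefficients:", clf.coef_)
print("intercept:", clf.intercept_)
print("test accuracy:", clf.score(X_test, y_test))
```

The interpretability contrast with k-nn is the pedagogical hook here: each coefficient describes how a feature shifts the log-odds of the positive class.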
Hmmm, I don't think that will work -- it would cause a huge rewrite of at least 3 chapters -- since Reg 1 & 2 rely on knowing about cross-validation / tuning / etc. from Cls 2.
For me the purpose of Cls 2 is mostly to introduce evaluation / tuning. I wouldn't want to introduce a new classifier at the same time, just to avoid overloading people. That would also involve fairly heavy editing on an already polished chapter.
I don't think it's super important to jump directly from knn classification to knn regression. We already space them out by Cls 2, which is all about tuning/eval. If we had a new "Classification 3", at the beginning of "Regression 1" we would just keep the same introduction to regression problems, and make very minor modifications to the text to say that we're going to introduce regression with a k-nn-based model, just like we did in classification. I'm still fairly convinced that the most natural place to introduce new classifiers is in a new "Classification 3" chapter. It also makes it natural to later on consider adding other regression models in "Regression 3", but we could merge those into Reg 2 as well.
Just documenting one point from an in-person chat with Tiffany: we probably want to avoid introducing new classifiers in the actual DSCI100 course itself, to avoid conflict with other existing classes (CPSC330 notably). But adding to the textbook can be independent of that. One other potential issue with a "Cls 3" chapter: introducing logistic regression before linear regression will be a bit awkward. Maybe best to stick with decision trees/forests? Probably will punt this edit for now and return to it later.
Posting Reviewer E's comment from #106 here about the classification chapter: