Updated slides #478

Open
qualiaMachine opened this issue Jun 4, 2024 · 3 comments
Comments

@qualiaMachine
Collaborator

I have some updated slides that I used to teach this lesson last week: https://docs.google.com/presentation/d/1uT4uvfWrpvrrQEFp84PGfAQ2r9Ylqx8tbwiVFuGfEao/edit?usp=sharing

Please feel free to use/repurpose anything in there.

I felt it was important to comment on the double descent phenomenon during the discussion of "how much data is needed?", especially in the age of increasingly large language models. Double descent is not currently mentioned in the lesson. I may make a pull request on the topic if I can find the time... it's something we may want to add to an earlier episode.

@svenvanderburg
Collaborator

Thanks for sharing @qualiaMachine! We'll think about what to do with the slides, since it doesn't make much sense for everyone to develop their own. I didn't actually know about double descent, thanks for teaching me! Is it something you come across frequently in practice?

@qualiaMachine
Collaborator Author

Glad I could share! Double descent hadn't really been discussed much until a few years ago, and older textbooks still need to be updated, since deep neural networks violate the classic bias-variance tradeoff! I have personally never observed it, but I have worked with fairly small datasets relative to other deep learning applications. Evidently double descent is observed more often with larger datasets, roughly 10,000 observations or more, which I never encountered in my research applications :(. Many other learners may be in a similar boat, but I still think it's worthwhile to point out.

I usually talk about it in the context of large language models, which, despite having billions or trillions of weights, can still avoid overfitting. It's also worth mentioning when early stopping is introduced. In general I would recommend sticking with early stopping, but those with large datasets may want to explore "overparameterized" models to see if they can get past the initial overfitting phase.
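
In case it helps make the early-stopping point concrete, here is a minimal sketch (not from the lesson or the slides; the data, model widths, and epoch counts are made-up placeholders) contrasting the usual early-stopping setup in Keras with an overparameterized run trained past the first overfitting phase:

```python
# A rough sketch, not from the lesson or the slides: the dataset here is random
# placeholder data, and the widths/epoch counts are arbitrary. It just contrasts
# the default early-stopping workflow with an "overparameterized" run that keeps
# training past the first overfitting phase.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data standing in for a real (large) dataset.
X_train, y_train = np.random.rand(1000, 20), np.random.rand(1000, 1)
X_val, y_val = np.random.rand(200, 20), np.random.rand(200, 1)

def build_model(width):
    """Small fully connected regression model; `width` controls capacity."""
    return keras.Sequential([
        keras.Input(shape=(20,)),
        layers.Dense(width, activation="relu"),
        layers.Dense(width, activation="relu"),
        layers.Dense(1),
    ])

# Default recommendation: stop when validation loss stops improving.
model = build_model(width=64)
model.compile(optimizer="adam", loss="mse")
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True
)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=500, callbacks=[early_stop], verbose=0)

# With a large dataset, one could instead drop early stopping, increase capacity,
# and keep training to see whether validation loss comes back down after the
# initial overfitting phase (double descent).
big_model = build_model(width=1024)
big_model.compile(optimizer="adam", loss="mse")
history = big_model.fit(X_train, y_train, validation_data=(X_val, y_val),
                        epochs=500, verbose=0)
# Inspect history.history["val_loss"] across epochs to look for a second descent.
```

Whether the validation loss actually dips a second time will of course depend on the dataset and model; with random placeholder data like this it won't.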

The book I recommended has a chapter on it if you're curious to learn more: https://udlbook.github.io/udlbook/.

Here are a couple other references that are worth checking out:

@svenvanderburg
Collaborator

Cool, thanks for the clear explanations! 🙏
