Readings will be added to the later weeks of the course in response to student interest and new developments in the field.
Readings are due on the day indicated.
Problem sets will be distributed via CMS no later than the Friday indicated. They are due as indicated on CMS (often, but not always, two weeks from distribution).
Occasional brief responses are due as assigned on Tuesdays before 4:00pm. For details, see the Discussions section of Canvas.
- JM = Jurafsky and Martin, Speech and Language Processing, 3rd ed. (online)
Week | Monday | Wednesday | Friday |
---|---|---|---|
1 (8/21) | Introduction | Tokenization. | Problem set 0: Setup and shakedown |
2 (8/28) | Dictionary methods and vector space models. | | PS 1: Tokens, vectors, and regression |
3 (9/4) | No class. Labor Day. | Regression. | |
4 (9/11) | Clustering. | | PS 2: Clustering and classification |
5 (9/18) | Classification. | | |
6 (9/25) | Feature importance and hypothesis testing. | | |
7 (10/2) | Topic models | No section meetings. Fall break. PS 3: Features and comparisons. | |
8 (10/9) | No class. Fall break. | NLP and feature expansion. | |
9 (10/16) | Static word embeddings. | Nelson, "Leveraging the Alignment between Machine Learning and Intersectionality" (Canvas) | PS 4: Entities and static embeddings |
10 (10/23) | BERT and contextual embeddings. | | |
11 (10/30) | Large language models and generative AI. | | |
12 (11/6) | Catch-up: Using BERT for classification | No class. Prof. Wilkens out of town. | PS 5: Contextual embeddings and LLMs |
13 (11/13) | Catch-up: The BERT architecture | Text generation with LLMs | |
14 (11/20) | LLM applications | No class. Thanksgiving. | No section meetings. Thanksgiving. |
15 (11/27) | Multilingual NLP | Working with user-generated content. | Review and exam preparation |
16 (12/4) | Summary discussion and conclusions. | ----- | ----- |