NLP for Social Sciences

This repository supplements the NLP for Social Sciences course taught in the fall semester of 2024 at Université Lumière Lyon 2.

Lecture/TD materials are stored in the ./n* folders. You can run all Jupyter Notebooks locally or in Google Colab.

Course Plan

  1. Introduction to Natural Language Processing. Challenges of text processing (word ambiguity, idioms, slang, spelling errors). Existing applications of NLP (translation, trend analysis, summarization, virtual assistants). Text preprocessing steps. Lemmatization vs stemming. (CM1) Link
  2. Vector representation of words. Embeddings obtained with one-hot encoding. Distributional hypothesis. Word-word co-occurrence and PMI matrices. Word-document matrices for tf-idf. Overview of word2vec models. (CM2) Link
  3. Summary of approaches to vector representation. Negative sampling. Word2Vec: skip-gram vs CBOW. Linear operations with vectors, including addition and subtraction. Impact of large/small context window size on embedding results. Problem statement for text classification. Overview of feature extraction approaches: count-based vs neural. Overview of text classification with Naive Bayes. (CM3) Link
  4. Overview of feature extraction approaches: count-based vs neural. Text classification with Naive Bayes. Laplace (add-one) smoothing. Text classification with Logistic Regression. Training by maximizing likelihood. Naive Bayes vs Logistic Regression. Text classification with SVM. Overview of classification with neural networks. A variety of word embeddings. Data augmentation for text. (CM4) Link
  5. Neural Networks. Fully-connected neural networks. Transformer models. Encoders and decoders. Attention Mechanism. (CM5) Link
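
As a rough illustration of item 4 (Naive Bayes with Laplace add-one smoothing), here is a minimal sketch in plain Python. It is not taken from the course notebooks: the function names `train_naive_bayes` and `predict` and the toy documents in `TOY_DOCS` are made up for this example, and tokens that never appeared in training are simply ignored at prediction time, which is one common convention among several.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """Train on a list of (tokens, label) pairs.

    Returns (log_priors, log_likelihoods, vocab).
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # Class priors estimated from label frequencies.
    log_priors = {c: math.log(n / len(docs)) for c, n in label_counts.items()}

    log_likelihoods = {}
    for c in label_counts:
        total = sum(word_counts[c].values())
        # Add-one smoothing: P(w | c) = (count(w, c) + 1) / (total(c) + |V|)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_priors, log_likelihoods, vocab

def predict(tokens, log_priors, log_likelihoods, vocab):
    """Return the label with the highest log posterior score."""
    scores = {}
    for c in log_priors:
        # Assumption: out-of-vocabulary tokens are skipped.
        scores[c] = log_priors[c] + sum(
            log_likelihoods[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)

# Hypothetical toy data, invented for this sketch.
TOY_DOCS = [
    ("great fun film".split(), "pos"),
    ("boring slow film".split(), "neg"),
    ("fun and great".split(), "pos"),
    ("slow and boring".split(), "neg"),
]

model = train_naive_bayes(TOY_DOCS)
label = predict("great fun".split(), *model)
```

Without the `+ 1` in the numerator, a word unseen in one class would get probability zero and its logarithm would be undefined, which is the problem smoothing is meant to avoid.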

TD 1

To run a notebook in Google Colab, click its badge in the table below:

| # | Topic | Colab |
|---|-------|-------|
| 1 | Preliminaries of gradient descent | Open In Colab |
| 2 | Word embeddings | Open In Colab |
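
For readers who want a preview of the first topic, the following is a minimal sketch of gradient descent on a one-dimensional function. It is an assumption about the general idea, not the content of the notebook: the function `gradient_descent`, the example objective f(x) = (x - 3)^2, and the chosen learning rate and step count are all illustrative.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

def grad_f(x):
    """Gradient of the illustrative objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)

# Start far from the minimum; the iterates approach x = 3.
x_min = gradient_descent(grad_f, x0=0.0)
```

With this objective each step shrinks the distance to the minimum by a factor of 0.8, so a smaller learning rate converges more slowly, while a rate above 1.0 makes the iterates diverge.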

TD 2

| # | Topic | Colab |
|---|-------|-------|
| 1 | Supervised text classification | Open In Colab |
| 0 | Text pre-processing | Open In Colab |

TD 3

Aurora-embeddings Open In Colab

TD 4

| # | Topic | Colab |
|---|-------|-------|
| 1 | Mistral | Open In Colab |
| 2 | OLLaMA | Open In Colab |

Useful links for the course:

  1. Speech and Language Processing by Jurafsky and Martin: https://web.stanford.edu/~jurafsky/slp3/ (in English)
  2. Official NLP course by Hugging Face: https://huggingface.co/learn/nlp-course/fr (in French)