diff --git a/homeworks/hw3_bert/hw_attention_and_bert.ipynb b/homeworks/hw3_bert/hw_attention_and_bert.ipynb new file mode 100644 index 0000000..43da786 --- /dev/null +++ b/homeworks/hw3_bert/hw_attention_and_bert.ipynb @@ -0,0 +1,596 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "d755d457", + "metadata": {}, + "source": [ + "\n", + "## Grading Criteria\n", + "\n", + "**Maximum Score: 10 points**\n", + "\n", + "1. **Step #1: Implementation of Attention Mechanisms (4 points)** \n", + " - 2 points for implementing the `additive` attention mechanism correctly.\n", + " - 2 points for implementing the `multiplicative` attention mechanism correctly.\n", + "\n", + "2. **Step #2: BERT-based Text Classification Task (4 points)** \n", + " - 2 points for setting up the BERT model correctly for classification.\n", + " - 2 points for evaluating the model performance accurately.\n", + "\n", + "3. **Code Quality and Comments (2 points)** \n", + " - 1 point for code clarity and logical structuring of functions and classes.\n", + " - 1 point for detailed comments explaining each part of the code.\n", + "\n", + "**Total: 10 points** \n", + "Each section will be reviewed to ensure that all requirements are met and that the code is efficient and well-documented. Pay attention to using proper variable names and providing comments that describe the purpose of each function and major code sections.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "jNtLJlW4v5VF" + }, + "source": [ + "## Attention & BERT\n", + "\n", + "For this homework assignment, your goal is to delve into the Attention mechanism (implementing several of its variants) and revisit the text classification task, this time solving it with BERT.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import random\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "from sklearn.metrics import roc_auc_score, roc_curve, accuracy_score\n", + "\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.nn.functional as F\n", + "\n", + "import matplotlib.pyplot as plt\n", + "from IPython.display import clear_output \n", + "# Display inline matplotlib plots\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step #1. Implementation of Attention\n", + "\n", + "In this task, please implement the Attention mechanism, specifically several methods for calculating attention scores. While this mechanism is already implemented in popular frameworks, you'll implement it using `numpy` to gain a better understanding.\n", + "\n", + "Your task in this part: implement `additive` and `multiplicative` variants of Attention. For your convenience (and as an example), the `dot product` attention (based on scalar product) is already implemented.\n", + "\n", + "Detailed descriptions of these types of Attention are available in the lecture slides." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "decoder_hidden_state = np.array([7, 11, 4]).astype(float)[:, None]\n", + "\n", + "plt.figure(figsize=(2, 5))\n", + "plt.pcolormesh(decoder_hidden_state)\n", + "plt.colorbar()\n", + "plt.title(\"Decoder state\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dot product attention (example implementation)\n", + "\n", + "Let's consider a single encoder state – a vector with dimensions `(n_hidden, 1)`, where `n_hidden = 3`:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "single_encoder_hidden_state = np.array([1, 5, 11]).astype(float)[:, None]\n", + "\n", + "plt.figure(figsize=(2, 5))\n", + "plt.pcolormesh(single_encoder_hidden_state)\n", + "plt.colorbar()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The attention score between these encoder and decoder states is simply calculated as a dot product:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "np.dot(decoder_hidden_state.T, single_encoder_hidden_state)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the general case, there are, of course, multiple encoder states. Attention scores are computed with each encoder state:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "encoder_hidden_states = (\n", + " np.array([[1, 5, 11], [7, 4, 1], [8, 12, 2], [-9, 0, 1]]).astype(float).T\n", + ")\n", + "\n", + "encoder_hidden_states" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then, to calculate the dot products between a single decoder state and all encoder states, we can use the following function (which is essentially just matrix multiplication and type conversion):\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def dot_product_attention_score(decoder_hidden_state, encoder_hidden_states):\n", + " \"\"\"\n", + " decoder_hidden_state: np.array of shape (n_features, 1)\n", + " encoder_hidden_states: np.array of shape (n_features, n_states)\n", + "\n", + " return: np.array of shape (1, n_states)\n", + " Array with dot product attention scores\n", + " \"\"\"\n", + " attention_scores = np.dot(decoder_hidden_state.T, encoder_hidden_states)\n", + " return attention_scores" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "dot_product_attention_score(decoder_hidden_state, encoder_hidden_states)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To calculate the \"weights,\" we need Softmax:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def softmax(vector):\n", + " \"\"\"\n", + " vector: np.array of shape (n, m)\n", + "\n", + " return: np.array of shape (n, m)\n", + " Matrix where softmax is computed for every row independently\n", + " \"\"\"\n", + " nice_vector = vector - vector.max()\n", + " exp_vector = np.exp(nice_vector)\n", + " exp_denominator = np.sum(exp_vector, axis=1)[:, np.newaxis]\n", + " softmax_ = exp_vector / exp_denominator\n", + " return softmax_" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "weights_vector = 
softmax(\n", + " dot_product_attention_score(decoder_hidden_state, encoder_hidden_states)\n", + ")\n", + "\n", + "weights_vector" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we'll use these weights and compute the final vector, as described for dot product attention.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "attention_vector = weights_vector.dot(encoder_hidden_states.T).T\n", + "print(attention_vector)\n", + "\n", + "plt.figure(figsize=(2, 5))\n", + "plt.pcolormesh(attention_vector, cmap=\"spring\")\n", + "plt.colorbar()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This vector accumulates information from all encoder states, weighted based on proximity to the given decoder state. Let's implement all the above transformations in a single function:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def dot_product_attention(decoder_hidden_state, encoder_hidden_states):\n", + " \"\"\"\n", + " decoder_hidden_state: np.array of shape (n_features, 1)\n", + " encoder_hidden_states: np.array of shape (n_features, n_states)\n", + "\n", + " return: np.array of shape (n_features, 1)\n", + " Final attention vector\n", + " \"\"\"\n", + " softmax_vector = softmax(\n", + " dot_product_attention_score(decoder_hidden_state, encoder_hidden_states)\n", + " )\n", + " attention_vector = softmax_vector.dot(encoder_hidden_states.T).T\n", + " return attention_vector" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert (\n", + " attention_vector\n", + " == dot_product_attention(decoder_hidden_state, encoder_hidden_states)\n", + ").all()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Multiplicative attention. Your current task: implement multiplicative attention.\n", + "\n", + "$$\n", + "e_i = \\mathbf{s}^TW_{mult}\\mathbf{h}_i\n", + "$$\n", + "\n", + "The weight matrix `W_mult` is given below. 
It should be noted that multiplicative attention allows working with encoder and decoder states of different dimensions, so the encoder states will be updated:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "encoder_hidden_states_complex = (\n", + " np.array([[1, 5, 11, 4, -4], [7, 4, 1, 2, 2], [8, 12, 2, 11, 5], [-9, 0, 1, 8, 12]])\n", + " .astype(float)\n", + " .T\n", + ")\n", + "\n", + "W_mult = np.array(\n", + " [\n", + " [-0.78, -0.97, -1.09, -1.79, 0.24],\n", + " [0.04, -0.27, -0.98, -0.49, 0.52],\n", + " [1.08, 0.91, -0.99, 2.04, -0.15],\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Implement the attention calculation according to the formulas and create the final function `multiplicative_attention`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def multiplicative_attention(decoder_hidden_state, encoder_hidden_states, W_mult):\n", + " \"\"\"\n", + " decoder_hidden_state: np.array of shape (n_features_dec, 1)\n", + " encoder_hidden_states: np.array of shape (n_features_enc, n_states)\n", + " W_mult: np.array of shape (n_features_dec, n_features_enc)\n", + "\n", + " return: np.array of shape (n_features_enc, 1)\n", + " Final attention vector\n", + " \"\"\"\n", + " # your code here\n", + " return attention_vector" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Additive attention. Now you need to implement additive attention.\n", + "\n", + "$$\n", + "e_i = \\mathbf{v}^T \\text{tanh} (W_{add-enc} \\mathbf{h}_i + W_{add-dec} \\mathbf{s})\n", + "$$\n", + "\n", + "The weight matrices `W_add_enc` and `W_add_dec` are provided below, as well as the weight vector `v_add`. For activation calculation, you can use `np.tanh`." 
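+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Both the multiplicative and the additive scores plug into the same recipe as dot-product attention: compute a score against every encoder state, apply softmax, and take the weighted sum of the encoder states. Below is a small sanity-check sketch on toy random matrices (the shapes and weights here are illustrative assumptions, not the assignment's); it reuses the `softmax` function defined above.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sanity-check sketch of both score variants on toy random data.\n",
+    "# The shapes and weight matrices below are illustrative, not the graded ones.\n",
+    "n_dec, n_enc, n_int, n_states = 3, 5, 7, 4\n",
+    "rng = np.random.default_rng(0)\n",
+    "\n",
+    "s = rng.normal(size=(n_dec, 1))         # decoder state\n",
+    "H = rng.normal(size=(n_enc, n_states))  # encoder states stacked as columns\n",
+    "W_m = rng.normal(size=(n_dec, n_enc))   # multiplicative weight matrix\n",
+    "W_e = rng.normal(size=(n_int, n_enc))   # additive: encoder projection\n",
+    "W_d = rng.normal(size=(n_int, n_dec))   # additive: decoder projection\n",
+    "v = rng.normal(size=(n_int, 1))         # additive: scoring vector\n",
+    "\n",
+    "mult_scores = s.T @ W_m @ H                    # shape (1, n_states)\n",
+    "add_scores = v.T @ np.tanh(W_e @ H + W_d @ s)  # shape (1, n_states)\n",
+    "\n",
+    "# the rest of the pipeline is identical to the dot-product case\n",
+    "for scores in (mult_scores, add_scores):\n",
+    "    weights = softmax(scores)  # (1, n_states)\n",
+    "    context = H @ weights.T    # (n_enc, 1): weighted sum of encoder states\n",
+    "    print(context.ravel())"
+   ]
+  },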
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "v_add = np.array([[-0.35, -0.58, 0.07, 1.39, -0.79, -1.78, -0.35]]).T\n", + "\n", + "W_add_enc = np.array(\n", + " [\n", + " [-1.34, -0.1, -0.38, 0.12, -0.34],\n", + " [-1.0, 1.28, 0.49, -0.41, -0.32],\n", + " [-0.39, -1.38, 1.26, 1.21, 0.15],\n", + " [-0.18, 0.04, 1.36, -1.18, -0.53],\n", + " [-0.23, 0.96, 1.02, 0.39, -1.26],\n", + " [-1.27, 0.89, -0.85, -0.01, -1.19],\n", + " [0.46, -0.12, -0.86, -0.93, -0.4],\n", + " ]\n", + ")\n", + "\n", + "W_add_dec = np.array(\n", + " [\n", + " [-1.62, -0.02, -0.39],\n", + " [0.43, 0.61, -0.23],\n", + " [-1.5, -0.43, -0.91],\n", + " [-0.14, 0.03, 0.05],\n", + " [0.85, 0.51, 0.63],\n", + " [0.39, -0.42, 1.34],\n", + " [-0.47, -0.31, -1.34],\n", + " ]\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# your code here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Implement the attention calculation according to the formulas and create the final function `additive_attention`:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def additive_attention(\n", + " decoder_hidden_state, encoder_hidden_states, v_add, W_add_enc, W_add_dec\n", + "):\n", + " \"\"\"\n", + " decoder_hidden_state: np.array of shape (n_features_dec, 1)\n", + " encoder_hidden_states: np.array of shape (n_features_enc, n_states)\n", + " v_add: np.array of shape (n_features_int, 1)\n", + " W_add_enc: np.array of shape (n_features_int, n_features_enc)\n", + " W_add_dec: np.array of shape (n_features_int, n_features_dec)\n", + "\n", + " return: np.array of shape (n_features_enc, 1)\n", + " Final attention vector\n", + " \"\"\"\n", + " # your code here\n", + " return attention_vector" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Submit the `multiplicative_attention` and `additive_attention` functions in the contest. Don’t forget to import `numpy`!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Step #2. Text classification using a pretrained language model.\n", + "\n", + "We work with the SST-2 dataset. Split the dataset into train and test sets." 
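+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "One possible baseline, shown here only as a sketch: encode each text with a frozen pretrained encoder, take the first ([CLS]) token representation, and fit a simple classifier on top. It assumes the HuggingFace `transformers` package is installed; the `distilbert-base-uncased` checkpoint and the toy texts are illustrative choices, not requirements. The same pipeline applies to `texts_train` / `texts_test` once they are loaded in the cells below, and fine-tuning the whole encoder usually reaches higher accuracy than this frozen-feature baseline.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Sketch of a frozen-encoder baseline (assumes the `transformers` package;\n",
+    "# the checkpoint name and the toy data below are illustrative only).\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from transformers import AutoModel, AutoTokenizer\n",
+    "\n",
+    "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+    "tokenizer = AutoTokenizer.from_pretrained(\"distilbert-base-uncased\")\n",
+    "encoder = AutoModel.from_pretrained(\"distilbert-base-uncased\").to(device).eval()\n",
+    "\n",
+    "\n",
+    "@torch.no_grad()\n",
+    "def encode_texts(texts, batch_size=64):\n",
+    "    \"\"\"Return the first-token ([CLS]) embedding of every text as a numpy array.\"\"\"\n",
+    "    features = []\n",
+    "    for i in range(0, len(texts), batch_size):\n",
+    "        batch = tokenizer(\n",
+    "            list(texts[i : i + batch_size]),\n",
+    "            padding=True,\n",
+    "            truncation=True,\n",
+    "            max_length=128,\n",
+    "            return_tensors=\"pt\",\n",
+    "        ).to(device)\n",
+    "        hidden = encoder(**batch).last_hidden_state  # (batch, seq_len, hidden)\n",
+    "        features.append(hidden[:, 0, :].cpu().numpy())\n",
+    "    return np.concatenate(features)\n",
+    "\n",
+    "\n",
+    "# toy demonstration data, NOT the assignment's dataset\n",
+    "toy_texts = [\"a warm and funny film\", \"a dull, lifeless mess\", \"simply wonderful\", \"painfully boring\"]\n",
+    "toy_labels = np.array([1, 0, 1, 0])\n",
+    "\n",
+    "X = encode_texts(toy_texts)\n",
+    "clf = LogisticRegression(max_iter=1000).fit(X, toy_labels)\n",
+    "print(clf.predict_proba(X))  # predict_proba yields the probabilities needed for out_dict"
+   ]
+  },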
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# do not change the code in the block below\n", + "# __________start of block__________\n", + "\n", + "!wget https://raw.githubusercontent.com/girafe-ai/ml-course/msu_branch/homeworks/hw08_attention/holdout_texts08.npy\n", + "# __________end of block__________" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# do not change the code in the block below\n", + "# __________start of block__________\n", + "df = pd.read_csv(\n", + " \"https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv\",\n", + " delimiter=\"\\t\",\n", + " header=None,\n", + ")\n", + "texts_train = df[0].values[:5000]\n", + "y_train = df[1].values[:5000]\n", + "texts_test = df[0].values[5000:]\n", + "y_test = df[1].values[5000:]\n", + "texts_holdout = np.load(\"holdout_texts08.npy\", allow_pickle=True)\n", + "# __________end of block__________" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The rest of the code is up to you to write.\n", + "To successfully achieve the maximum score, you need to reach at least 84.5% accuracy on the test part of the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# your beautiful experiments here" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Submitting the Assignment in the Contest\n", + "\n", + "Save the probabilities of belonging to class 0 and class 1, respectively, in the dictionary `out_dict`:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "out_dict = {\n", + " 'train': None,  # replace with an np.array of size (5000, 2) with probas\n", + " 'test': None,  # replace with an np.array of size (1920, 2) with probas\n", + " 'holdout': None,  # replace with an np.array of size (500, 2) with probas\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Several `assert`s to check your solution:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "assert isinstance(out_dict[\"train\"], np.ndarray), \"Dict values should be numpy arrays\"\n", + "assert out_dict[\"train\"].shape == (\n", + " 5000,\n", + " 2,\n", + "), \"The predicted probas shape does not match the train set size\"\n", + "assert np.allclose(\n", + " out_dict[\"train\"].sum(axis=1), 1.0\n", + "), \"Probas do not sum up to 1 for some of the objects\"\n", + "\n", + "assert isinstance(out_dict[\"test\"], np.ndarray), \"Dict values should be numpy arrays\"\n", + "assert out_dict[\"test\"].shape == (\n", + " 1920,\n", + " 2,\n", + "), \"The predicted probas shape does not match the test set size\"\n", + "assert np.allclose(\n", + " out_dict[\"test\"].sum(axis=1), 1.0\n", + "), \"Probas do not sum up to 1 for some of the objects\"\n", + "\n", + "assert isinstance(out_dict[\"holdout\"], np.ndarray), \"Dict values should be numpy arrays\"\n", + "assert out_dict[\"holdout\"].shape == (\n", + " 500,\n", + " 2,\n", + "), \"The predicted probas shape does not match the holdout set size\"\n", + "assert np.allclose(\n", + " out_dict[\"holdout\"].sum(axis=1), 1.0\n", + "), \"Probas do not sum up to 1 for some of the objects\"" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "NLP_hw01_texts.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Py3 Research", +
"language": "python", + "name": "py3_research" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.7" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}