"
+ "# Lab 6: Finetuning To Follow Instructions"
]
},
{
"cell_type": "markdown",
- "id": "c2520ec3-722f-4f44-bdd1-885b13e7afbf",
- "metadata": {
- "id": "c2520ec3-722f-4f44-bdd1-885b13e7afbf"
- },
+ "id": "2626e3f5",
+ "metadata": {},
"source": [
- "# Chapter 7: Finetuning To Follow Instructions"
+ "This lab covers:\n",
+ "- Preparing a dataset with instructions (instruction + response pairs)\n",
+ "- Finetuning the LLM to follow instructions\n",
+ "- Evaluation of the LLM\n",
+ "\n",
+ "\n",
+ ""
]
},
{
@@ -48,11 +40,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "matplotlib version: 3.7.1\n",
- "tiktoken version: 0.7.0\n",
- "torch version: 2.4.0\n",
- "tqdm version: 4.66.4\n",
- "tensorflow version: 2.15.0\n"
+ "matplotlib version: 3.8.2\n",
+ "tiktoken version: 0.8.0\n",
+ "torch version: 2.2.0\n",
+ "tqdm version: 4.66.2\n",
+ "tensorflow version: 2.18.0\n"
]
}
],
@@ -77,7 +69,7 @@
"id": "264fca98-2f9a-4193-b435-2abfa3b4142f"
},
"source": [
- ""
+ ""
]
},
{
@@ -87,7 +79,7 @@
"id": "8bbc68e9-75b3-41f1-ac2c-e071c3cd0813"
},
"source": [
- "## 7.1 Introduction to instruction finetuning"
+ "## 1 Introduction to instruction finetuning"
]
},
{
@@ -97,9 +89,9 @@
"id": "53dba24a-6805-496c-9a7f-c75e2d3527ab"
},
"source": [
- "- In chapter 5, we saw that pretraining an LLM involves a training procedure where it learns to generate one word at a time\n",
+ "- In lab 4, we saw that pretraining an LLM involves a training procedure where it learns to generate one word at a time\n",
"- Hence, a pretrained LLM is good at text completion, but it is not good at following instructions\n",
- "- In this chapter, we teach the LLM to follow instructions better"
+ "- In this lab, we teach the LLM to follow instructions better"
]
},
{
@@ -109,19 +101,7 @@
"id": "18dc0535-0904-44ed-beaf-9b678292ef35"
},
"source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "id": "b4698b23-12e0-4bd7-a140-ccb3dd71d4e8",
- "metadata": {
- "id": "b4698b23-12e0-4bd7-a140-ccb3dd71d4e8"
- },
- "source": [
- "- The topics covered in this chapter are summarized in the figure below\n",
- "\n",
- ""
+ ""
]
},
{
@@ -131,7 +111,7 @@
"id": "5384f0cf-ef3c-4436-a5fa-59bd25649f86"
},
"source": [
- "## 7.2 Preparing a dataset for supervised instruction finetuning"
+ "## 2 Preparing a dataset for supervised instruction finetuning"
]
},
{
@@ -141,7 +121,7 @@
"id": "f8b34ff8-619f-4e89-bd03-ce513269760d"
},
"source": [
- "- We will work with an instruction dataset I prepared for this chapter"
+ "- We will work with an instruction dataset that was prepared in advance"
]
},
{
@@ -285,7 +265,7 @@
"id": "dffa4f70-44d4-4be4-89a9-2159f4885b10"
},
"source": [
- ""
+ ""
]
},
{
@@ -295,7 +275,7 @@
"id": "dd79a74e-befb-491c-be49-f777a6a5b6a6"
},
"source": [
- "- In this chapter, we use Alpaca-style prompt formatting, which was the original prompt template for instruction finetuning\n",
+ "- In this lab, we use Alpaca-style prompt formatting, which was the original prompt template for instruction finetuning\n",
"- Below, we format the input that we will pass as input to the LLM"
]
},
@@ -472,7 +452,7 @@
"id": "fcaaf606-f913-4445-8301-632ae10d387d"
},
"source": [
- "## 7.3 Organizing data into training batches"
+ "## 3 Organizing data into training batches"
]
},
{
@@ -482,7 +462,7 @@
"id": "233f63bd-9755-4d07-8884-5e2e5345cf27"
},
"source": [
- ""
+ ""
]
},
{
@@ -494,7 +474,7 @@
"source": [
"- We tackle this dataset batching in several steps, as summarized in the figure below\n",
"\n",
- ""
+ ""
]
},
{
@@ -504,9 +484,9 @@
"id": "b9af423f-aad9-4b3c-bea5-153021c04862"
},
"source": [
- "- First, we implement an `InstructionDataset` class that pre-tokenizes all inputs in the dataset, similar to the `SpamDataset` in chapter 6\n",
+ "- First, we implement an `InstructionDataset` class that pre-tokenizes all inputs in the dataset, similar to the `SpamDataset` in lab 5\n",
"\n",
- ""
+ ""
]
},
{
@@ -550,8 +530,8 @@
"id": "384f0e69-4b22-41c0-a25d-f077527eddd1"
},
"source": [
- "- Similar to chapter 6, we want to collect multiple training examples in a batch to accelerate training; this requires padding all inputs to a similar length\n",
- "- Also similar to the previous chapter, we use the `<|endoftext|>` token as a padding token"
+ "- Similar to lab 5, we want to collect multiple training examples in a batch to accelerate training; this requires padding all inputs to a similar length\n",
+ "- Also similar to the previous lab, we use the `<|endoftext|>` token as a padding token"
]
},
{
@@ -588,7 +568,7 @@
"id": "9e5bd7bc-f347-4cf8-a0c2-94cb8799e427"
},
"source": [
- "- In chapter 6, we padded all examples in a dataset to the same length\n",
+ "- In lab 5, we padded all examples in a dataset to the same length\n",
" - Here, we take a more sophisticated approach and develop a custom \"collate\" function that we can pass to the data loader\n",
" - This custom collate function pads the training examples in each batch to have the same length (but different batches can have different lengths)"
]
@@ -600,7 +580,7 @@
"id": "65c4d943-4aa8-4a44-874e-05bc6831fbd3"
},
"source": [
- ""
+ ""
]
},
{
@@ -688,7 +668,7 @@
"id": "c46832ab-39b7-45f8-b330-ac9adfa10d1b"
},
"source": [
- ""
+ ""
]
},
{
@@ -709,7 +689,7 @@
"id": "0386b6fe-3455-4e70-becd-a5a4681ba2ef"
},
"source": [
- ""
+ ""
]
},
{
@@ -792,7 +772,7 @@
"source": [
"- Next, we introduce an `ignore_index` value to replace all padding token IDs with a new value; the purpose of this `ignore_index` is that we can ignore padding values in the loss function (more on that later)\n",
"\n",
- "\n",
+ "\n",
"\n",
"- Concretely, this means that we replace the token IDs corresponding to `50256` with `-100` as illustrated below"
]
@@ -804,7 +784,7 @@
"id": "bd4bed33-956e-4b3f-a09c-586d8203109a"
},
"source": [
- ""
+ ""
]
},
{
@@ -911,7 +891,7 @@
},
"source": [
"- Let's see what this replacement by -100 accomplishes\n",
- "- For illustration purposes, let's assume we have a small classification task with 2 class labels, 0 and 1, similar to chapter 6\n",
+ "- For illustration purposes, let's assume we have a small classification task with 2 class labels, 0 and 1, similar to lab 5\n",
"- If we have the following logits values (outputs of the last layer of the model), we calculate the following loss"
]
},
@@ -1058,7 +1038,7 @@
"id": "fab8f0ed-80e8-4fd9-bf84-e5d0e0bc0a39"
},
"source": [
- ""
+ ""
]
},
{
@@ -1068,7 +1048,7 @@
"id": "bccaf048-ec95-498c-9155-d5b3ccba6c96"
},
"source": [
- "## 7.4 Creating data loaders for an instruction dataset"
+ "## 4 Creating data loaders for an instruction dataset"
]
},
{
@@ -1088,7 +1068,7 @@
"id": "9fffe390-b226-4d5c-983f-9f4da773cb82"
},
"source": [
- ""
+ ""
]
},
{
@@ -1104,7 +1084,7 @@
},
{
"cell_type": "code",
- "execution_count": 21,
+ "execution_count": 36,
"id": "etpqqWh8phKc",
"metadata": {
"colab": {
@@ -1118,7 +1098,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "Device: cuda\n"
+ "Device: cpu\n"
]
}
],
@@ -1130,11 +1110,11 @@
"# which is much faster than on an Apple CPU (as measured on an M3 MacBook Air).\n",
"# However, the resulting loss values may be slightly different.\n",
"\n",
- "#if torch.cuda.is_available():\n",
+ "# if torch.cuda.is_available():\n",
"# device = torch.device(\"cuda\")\n",
- "#elif torch.backends.mps.is_available():\n",
+ "# elif torch.backends.mps.is_available():\n",
"# device = torch.device(\"mps\")\n",
- "#else:\n",
+ "# else:\n",
"# device = torch.device(\"cpu\")\n",
"\n",
"print(\"Device:\", device)"
@@ -1142,7 +1122,7 @@
},
{
"cell_type": "code",
- "execution_count": 22,
+ "execution_count": 37,
"id": "4e47fb30-c2c6-4e6d-a64c-76cc65be4a2c",
"metadata": {
"id": "4e47fb30-c2c6-4e6d-a64c-76cc65be4a2c"
@@ -1165,12 +1145,12 @@
"id": "8ff42c29-8b81-45e5-ae8d-b97cd1cf447a"
},
"source": [
- "- Next, we instantiate the data loaders similar to previous chapters, except that we now provide our own collate function for the batching process"
+ "- Next, we instantiate the data loaders similar to previous labs, except that we now provide our own collate function for the batching process"
]
},
{
"cell_type": "code",
- "execution_count": 23,
+ "execution_count": 38,
"id": "BtWkgir6Hlpe",
"metadata": {
"id": "BtWkgir6Hlpe"
@@ -1198,7 +1178,7 @@
},
{
"cell_type": "code",
- "execution_count": 24,
+ "execution_count": 39,
"id": "1d097dc8-ad34-4f05-b435-e4147965f532",
"metadata": {
"id": "1d097dc8-ad34-4f05-b435-e4147965f532"
@@ -1238,7 +1218,7 @@
},
{
"cell_type": "code",
- "execution_count": 25,
+ "execution_count": 40,
"id": "GGs1AI3vHpnX",
"metadata": {
"colab": {
@@ -1391,7 +1371,7 @@
},
{
"cell_type": "code",
- "execution_count": 26,
+ "execution_count": 41,
"id": "21b8fd02-014f-4481-9b71-5bfee8f9dfcd",
"metadata": {
"colab": {
@@ -1411,8 +1391,7 @@
" 985, 576, 13, 198, 198, 21017, 23412, 25, 198, 464,\n",
" 5156, 318, 845, 13779, 13, 198, 198, 21017, 18261, 25,\n",
" 198, 464, 5156, 318, 355, 13779, 355, 257, 4936, 13,\n",
- " 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256],\n",
- " device='cuda:0')\n"
+ " 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256, 50256])\n"
]
}
],
@@ -1432,7 +1411,7 @@
},
{
"cell_type": "code",
- "execution_count": 27,
+ "execution_count": 42,
"id": "51649ab4-1a7e-4a9e-92c5-950a24fde211",
"metadata": {
"colab": {
@@ -1452,8 +1431,7 @@
" 576, 13, 198, 198, 21017, 23412, 25, 198, 464, 5156,\n",
" 318, 845, 13779, 13, 198, 198, 21017, 18261, 25, 198,\n",
" 464, 5156, 318, 355, 13779, 355, 257, 4936, 13, 50256,\n",
- " -100, -100, -100, -100, -100, -100, -100, -100, -100],\n",
- " device='cuda:0')\n"
+ " -100, -100, -100, -100, -100, -100, -100, -100, -100])\n"
]
}
],
@@ -1468,7 +1446,7 @@
"id": "d6aad445-8f19-4238-b9bf-db80767fb91a"
},
"source": [
- "## 7.5 Loading a pretrained LLM"
+ "## 5 Loading a pretrained LLM"
]
},
{
@@ -1478,7 +1456,7 @@
"id": "5a5c07d1-4fc9-4846-94cf-b11a085a667b"
},
"source": [
- "- In this section, we load a pretrained GPT model using the same code that we used in section 5.5 of chapter 5 and section 6.4 in chapter 6"
+ "- In this section, we load a pretrained GPT model"
]
},
{
@@ -1488,7 +1466,7 @@
"id": "8d1b438f-88af-413f-96a9-f059c6c55fc4"
},
"source": [
- ""
+ ""
]
},
{
@@ -1498,12 +1476,12 @@
"id": "8c68eda7-e02e-4caa-846b-ca6dbd396ca2"
},
"source": [
- "- However, instead of loading the smallest 124 million parameter model, we load the medium version with 355 million parameters since the 124 million model is too small for achieving qualitatively reasonable results via instruction finetuning"
+ "- We load the medium version with 355 million parameters to achieve qualitatively reasonable results via instruction finetuning"
]
},
{
"cell_type": "code",
- "execution_count": 28,
+ "execution_count": null,
"id": "0d249d67-5eba-414e-9bd2-972ebf01329d",
"metadata": {
"colab": {
@@ -1512,31 +1490,10 @@
"id": "0d249d67-5eba-414e-9bd2-972ebf01329d",
"outputId": "3f08f5e1-ca7c-406d-e2ae-1b5fcafad3f2"
},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "2024-07-25 02:22:49.969483: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
- "2024-07-25 02:22:50.023103: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
- "2024-07-25 02:22:50.023136: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
- "2024-07-25 02:22:50.024611: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
- "2024-07-25 02:22:50.033304: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
- "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
- "2024-07-25 02:22:51.282247: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
- "checkpoint: 100%|██████████| 77.0/77.0 [00:00<00:00, 169kiB/s]\n",
- "encoder.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 2.43MiB/s]\n",
- "hparams.json: 100%|██████████| 91.0/91.0 [00:00<00:00, 168kiB/s]\n",
- "model.ckpt.data-00000-of-00001: 100%|██████████| 1.42G/1.42G [00:56<00:00, 25.0MiB/s]\n",
- "model.ckpt.index: 100%|██████████| 10.4k/10.4k [00:00<00:00, 16.5MiB/s]\n",
- "model.ckpt.meta: 100%|██████████| 927k/927k [00:00<00:00, 1.96MiB/s]\n",
- "vocab.bpe: 100%|██████████| 456k/456k [00:00<00:00, 1.53MiB/s]\n"
- ]
- }
- ],
+ "outputs": [],
"source": [
- "from gpt_download import download_and_load_gpt2\n",
- "from previous_chapters import GPTModel, load_weights_into_gpt\n",
+ "from helper_functions import download_and_load_gpt2, load_weights_into_gpt\n",
+ "from previous_labs import GPTModel\n",
"\n",
"\n",
"BASE_CONFIG = {\n",
@@ -1580,7 +1537,7 @@
},
{
"cell_type": "code",
- "execution_count": 29,
+ "execution_count": 44,
"id": "7bd32b7c-5b44-4d25-a09f-46836802ca74",
"metadata": {
"colab": {
@@ -1610,14 +1567,14 @@
},
{
"cell_type": "code",
- "execution_count": 30,
+ "execution_count": 45,
"id": "2e3e68e0-2627-4c65-b4e7-1e0667e4f6fa",
"metadata": {
"id": "2e3e68e0-2627-4c65-b4e7-1e0667e4f6fa"
},
"outputs": [],
"source": [
- "from previous_chapters import (\n",
+ "from previous_labs import (\n",
" generate,\n",
" text_to_token_ids,\n",
" token_ids_to_text\n",
@@ -1646,7 +1603,7 @@
},
{
"cell_type": "code",
- "execution_count": 31,
+ "execution_count": 46,
"id": "ba4a55bf-a245-48d8-beda-2838a58fb5ba",
"metadata": {
"colab": {
@@ -1694,7 +1651,7 @@
"id": "70d27b9d-a942-4cf5-b797-848c5f01e723"
},
"source": [
- "## 7.6 Finetuning the LLM on instruction data"
+ "## 6 Finetuning the LLM on instruction data"
]
},
{
@@ -1706,21 +1663,21 @@
"source": [
"- In this section, we finetune the model\n",
"\n",
- "\n",
+ "\n",
"\n",
- "- Note that we can reuse all the loss calculation and training functions that we used in previous chapters"
+ "- Note that we can reuse all the loss calculation and training functions that we used in previous labs"
]
},
{
"cell_type": "code",
- "execution_count": 32,
+ "execution_count": 47,
"id": "65444865-df87-4d98-9faf-875e1c4be860",
"metadata": {
"id": "65444865-df87-4d98-9faf-875e1c4be860"
},
"outputs": [],
"source": [
- "from previous_chapters import (\n",
+ "from previous_labs import (\n",
" calc_loss_loader,\n",
" train_model_simple\n",
")"
@@ -1733,12 +1690,12 @@
"id": "00083059-aa41-4d37-8a17-1c72d1b1ca00"
},
"source": [
- "- Let's calculate the initial training and validation set loss before we start training (as in previous chapters, the goal is to minimize the loss)"
+ "- Let's calculate the initial training and validation set loss before we start training (as in previous labs, the goal is to minimize the loss)"
]
},
{
"cell_type": "code",
- "execution_count": 33,
+ "execution_count": 48,
"id": "d99fc6f8-63b2-43da-adbb-a7b6b92c8dd5",
"metadata": {
"colab": {
@@ -1752,8 +1709,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
- "Training loss: 3.82590970993042\n",
- "Validation loss: 3.761933755874634\n"
+ "Training loss: 3.8258948802947996\n",
+ "Validation loss: 3.7619192123413088\n"
]
}
],
@@ -1777,7 +1734,7 @@
"id": "12a6da8f-15b3-42b0-a136-619b7a35c3e9"
},
"source": [
- "- Note that the training is a bit more expensive than in previous chapters since we are using a larger model (355 million instead of 124 million parameters)\n",
+ "- Note that the training is a bit more expensive than in previous labs since we are using a larger model (355 million parameters)\n",
"- The runtimes for various devices are shown for reference below (running this notebook on a compatible GPU device requires no changes to the code)"
]
},
@@ -1801,14 +1758,12 @@
"| gpt2-small (124M) | GPU (L4) | 0.69 minutes |\n",
"| gpt2-small (124M) | GPU (A100) | 0.39 minutes |\n",
"\n",
- "\n",
- "\n",
- "- I ran this notebook using the `\"gpt2-medium (355M)\"` model"
+ "\n"
]
},
{
"cell_type": "code",
- "execution_count": 34,
+ "execution_count": 49,
"id": "78bcf83a-1fff-4540-97c1-765c4016d5e3",
"metadata": {
"colab": {
@@ -1823,55 +1778,55 @@
"output_type": "stream",
"text": [
"Ep 1 (Step 000000): Train loss 2.637, Val loss 2.626\n",
- "Ep 1 (Step 000005): Train loss 1.174, Val loss 1.102\n",
- "Ep 1 (Step 000010): Train loss 0.872, Val loss 0.944\n",
+ "Ep 1 (Step 000005): Train loss 1.174, Val loss 1.103\n",
+ "Ep 1 (Step 000010): Train loss 0.872, Val loss 0.945\n",
"Ep 1 (Step 000015): Train loss 0.857, Val loss 0.906\n",
"Ep 1 (Step 000020): Train loss 0.776, Val loss 0.881\n",
"Ep 1 (Step 000025): Train loss 0.754, Val loss 0.859\n",
"Ep 1 (Step 000030): Train loss 0.799, Val loss 0.836\n",
"Ep 1 (Step 000035): Train loss 0.714, Val loss 0.808\n",
"Ep 1 (Step 000040): Train loss 0.672, Val loss 0.806\n",
- "Ep 1 (Step 000045): Train loss 0.633, Val loss 0.789\n",
- "Ep 1 (Step 000050): Train loss 0.663, Val loss 0.783\n",
- "Ep 1 (Step 000055): Train loss 0.760, Val loss 0.763\n",
+ "Ep 1 (Step 000045): Train loss 0.633, Val loss 0.790\n",
+ "Ep 1 (Step 000050): Train loss 0.662, Val loss 0.783\n",
+ "Ep 1 (Step 000055): Train loss 0.760, Val loss 0.764\n",
"Ep 1 (Step 000060): Train loss 0.719, Val loss 0.743\n",
- "Ep 1 (Step 000065): Train loss 0.653, Val loss 0.735\n",
+ "Ep 1 (Step 000065): Train loss 0.652, Val loss 0.735\n",
"Ep 1 (Step 000070): Train loss 0.532, Val loss 0.729\n",
- "Ep 1 (Step 000075): Train loss 0.569, Val loss 0.728\n",
+ "Ep 1 (Step 000075): Train loss 0.569, Val loss 0.729\n",
"Ep 1 (Step 000080): Train loss 0.605, Val loss 0.725\n",
"Ep 1 (Step 000085): Train loss 0.509, Val loss 0.709\n",
"Ep 1 (Step 000090): Train loss 0.562, Val loss 0.691\n",
"Ep 1 (Step 000095): Train loss 0.500, Val loss 0.681\n",
- "Ep 1 (Step 000100): Train loss 0.503, Val loss 0.677\n",
+ "Ep 1 (Step 000100): Train loss 0.502, Val loss 0.677\n",
"Ep 1 (Step 000105): Train loss 0.564, Val loss 0.670\n",
- "Ep 1 (Step 000110): Train loss 0.555, Val loss 0.666\n",
+ "Ep 1 (Step 000110): Train loss 0.555, Val loss 0.667\n",
"Ep 1 (Step 000115): Train loss 0.508, Val loss 0.664\n",
"Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' ### Response: The meal is prepared every day by the chef.<|endoftext|>The following is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive:\n",
"Ep 2 (Step 000120): Train loss 0.435, Val loss 0.672\n",
- "Ep 2 (Step 000125): Train loss 0.451, Val loss 0.687\n",
- "Ep 2 (Step 000130): Train loss 0.447, Val loss 0.683\n",
- "Ep 2 (Step 000135): Train loss 0.405, Val loss 0.682\n",
- "Ep 2 (Step 000140): Train loss 0.409, Val loss 0.681\n",
- "Ep 2 (Step 000145): Train loss 0.369, Val loss 0.680\n",
- "Ep 2 (Step 000150): Train loss 0.382, Val loss 0.675\n",
- "Ep 2 (Step 000155): Train loss 0.413, Val loss 0.675\n",
- "Ep 2 (Step 000160): Train loss 0.415, Val loss 0.683\n",
+ "Ep 2 (Step 000125): Train loss 0.451, Val loss 0.686\n",
+ "Ep 2 (Step 000130): Train loss 0.447, Val loss 0.682\n",
+ "Ep 2 (Step 000135): Train loss 0.404, Val loss 0.682\n",
+ "Ep 2 (Step 000140): Train loss 0.410, Val loss 0.681\n",
+ "Ep 2 (Step 000145): Train loss 0.369, Val loss 0.681\n",
+ "Ep 2 (Step 000150): Train loss 0.381, Val loss 0.676\n",
+ "Ep 2 (Step 000155): Train loss 0.412, Val loss 0.676\n",
+ "Ep 2 (Step 000160): Train loss 0.415, Val loss 0.684\n",
"Ep 2 (Step 000165): Train loss 0.379, Val loss 0.686\n",
- "Ep 2 (Step 000170): Train loss 0.323, Val loss 0.681\n",
- "Ep 2 (Step 000175): Train loss 0.337, Val loss 0.669\n",
- "Ep 2 (Step 000180): Train loss 0.392, Val loss 0.656\n",
- "Ep 2 (Step 000185): Train loss 0.415, Val loss 0.657\n",
- "Ep 2 (Step 000190): Train loss 0.340, Val loss 0.648\n",
- "Ep 2 (Step 000195): Train loss 0.330, Val loss 0.634\n",
- "Ep 2 (Step 000200): Train loss 0.310, Val loss 0.634\n",
- "Ep 2 (Step 000205): Train loss 0.352, Val loss 0.630\n",
- "Ep 2 (Step 000210): Train loss 0.367, Val loss 0.630\n",
- "Ep 2 (Step 000215): Train loss 0.394, Val loss 0.635\n",
- "Ep 2 (Step 000220): Train loss 0.299, Val loss 0.648\n",
- "Ep 2 (Step 000225): Train loss 0.346, Val loss 0.661\n",
- "Ep 2 (Step 000230): Train loss 0.292, Val loss 0.659\n",
+ "Ep 2 (Step 000170): Train loss 0.323, Val loss 0.682\n",
+ "Ep 2 (Step 000175): Train loss 0.337, Val loss 0.670\n",
+ "Ep 2 (Step 000180): Train loss 0.393, Val loss 0.658\n",
+ "Ep 2 (Step 000185): Train loss 0.416, Val loss 0.659\n",
+ "Ep 2 (Step 000190): Train loss 0.340, Val loss 0.650\n",
+ "Ep 2 (Step 000195): Train loss 0.330, Val loss 0.637\n",
+ "Ep 2 (Step 000200): Train loss 0.310, Val loss 0.637\n",
+ "Ep 2 (Step 000205): Train loss 0.352, Val loss 0.632\n",
+ "Ep 2 (Step 000210): Train loss 0.367, Val loss 0.631\n",
+ "Ep 2 (Step 000215): Train loss 0.396, Val loss 0.635\n",
+ "Ep 2 (Step 000220): Train loss 0.301, Val loss 0.649\n",
+ "Ep 2 (Step 000225): Train loss 0.349, Val loss 0.662\n",
+ "Ep 2 (Step 000230): Train loss 0.294, Val loss 0.658\n",
"Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: Convert the active sentence to passive: 'The chef cooks the meal every day.' ### Response: The meal is cooked every day by the chef.<|endoftext|>The following is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: What is the capital of the United Kingdom\n",
- "Training completed in 1.84 minutes.\n"
+ "Training completed in 80.16 minutes.\n"
]
}
],
@@ -1911,7 +1866,7 @@
},
{
"cell_type": "code",
- "execution_count": 35,
+ "execution_count": 50,
"id": "4acd368b-1403-4807-a218-9102e35bfdbb",
"metadata": {
"colab": {
@@ -1924,7 +1879,7 @@
"outputs": [
{
"data": {
- "image/png": "\n",
+ "image/png": "",
"text/plain": [
"