diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb
index 78427505020..81d8b3fd5b3 100644
--- a/site/en/guide/data_performance.ipynb
+++ b/site/en/guide/data_performance.ipynb
@@ -274,6 +274,8 @@
"source": [
"### Prefetching\n",
"\n",
+ "\n",
+ "\n",
"Prefetching overlaps the preprocessing and model execution of a training step.\n",
"While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n",
"Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.\n",
@@ -321,6 +323,8 @@
"source": [
"### Parallelizing data extraction\n",
"\n",
+ "\n",
+ "\n",
"In a real-world setting, the input data may be stored remotely (for example, on Google Cloud Storage or HDFS).\n",
"A dataset pipeline that works well when reading data locally might become bottlenecked on I/O when reading data remotely because of the following differences between local and remote storage:\n",
"\n",
@@ -420,6 +424,8 @@
"source": [
"### Parallelizing data transformation\n",
"\n",
+ "\n",
+ "\n",
"When preparing data, input elements may need to be pre-processed.\n",
"To this end, the `tf.data` API offers the `tf.data.Dataset.map` transformation, which applies a user-defined function to each element of the input dataset.\n",
"Because input elements are independent of one another, the pre-processing can be parallelized across multiple CPU cores.\n",
@@ -527,6 +533,8 @@
"source": [
"### Caching\n",
"\n",
+ "\n",
+ "\n",
"The `tf.data.Dataset.cache` transformation can cache a dataset, either in memory or on local storage.\n",
"This will save some operations (like file opening and data reading) from being executed during each epoch."
]
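
A minimal sketch of the caching behavior described above; the toy pipeline is illustrative only:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(5).map(lambda x: x * 2)
# Cache in memory: upstream work (file reads, the map above) runs
# only during the first pass; later epochs read from the cache.
dataset = dataset.cache()

for _ in range(2):
  print(list(dataset.as_numpy_iterator()))
```
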
@@ -572,6 +580,8 @@
"source": [
"### Vectorizing mapping\n",
"\n",
+ "\n",
+ "\n",
"Invoking a user-defined function passed into the `map` transformation has overhead related to scheduling and executing the user-defined function.\n",
"Vectorize the user-defined function (that is, have it operate over a batch of inputs at once) and apply the `batch` transformation _before_ the `map` transformation.\n",
"\n",
@@ -687,6 +697,8 @@
"source": [
"### Reducing memory footprint\n",
"\n",
+ "\n",
+ "\n",
"A number of transformations, including `interleave`, `prefetch`, and `shuffle`, maintain an internal buffer of elements. If the user-defined function passed into the `map` transformation changes the size of the elements, then the ordering of the map transformation and the transformations that buffer elements affects the memory usage. In general, choose the order that results in lower memory footprint, unless different ordering is desirable for performance.\n",
"\n",
"#### Caching partial computations\n",
@@ -713,12 +725,12 @@
"Here is a summary of the best practices for designing performant TensorFlow\n",
"input pipelines:\n",
"\n",
- "* [Use the `prefetch` transformation](#Pipelining) to overlap the work of a producer and consumer\n",
- "* [Parallelize the data reading transformation](#Parallelizing-data-extraction) using the `interleave` transformation\n",
- "* [Parallelize the `map` transformation](#Parallelizing-data-transformation) by setting the `num_parallel_calls` argument\n",
- "* [Use the `cache` transformation](#Caching) to cache data in memory during the first epoch\n",
- "* [Vectorize user-defined functions](#Map-and-batch) passed in to the `map` transformation\n",
- "* [Reduce memory usage](#Reducing-memory-footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations"
+ "* [Use the `prefetch` transformation](#prefetching) to overlap the work of a producer and consumer\n",
+ "* [Parallelize the data reading transformation](#parallelizing_data_extraction) using the `interleave` transformation\n",
+ "* [Parallelize the `map` transformation](#parallelizing_data_transformation) by setting the `num_parallel_calls` argument\n",
+ "* [Use the `cache` transformation](#caching) to cache data in memory during the first epoch\n",
+ "* [Vectorize user-defined functions](#vectorizing_mapping) passed in to the `map` transformation\n",
+ "* [Reduce memory usage](#reducing_memory_footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations"
]
},
{
@@ -1153,7 +1165,6 @@
"colab": {
"collapsed_sections": [],
"name": "data_performance.ipynb",
- "provenance": [],
"toc_visible": true
},
"kernelspec": {