From 9ac9d517c59e04f0ad053bc05fe9a97b05853e9b Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Thu, 1 Jun 2023 12:52:21 +0530 Subject: [PATCH 01/11] Updated few links URLs in data_performance.ipynb I have updated the links for routing directly to the mentioned section which was not working earlier. Also changed the code format of 2 words. --- site/en/guide/data_performance.ipynb | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index 78427505020..c24c77a15c5 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -275,7 +275,7 @@ "### Prefetching\n", "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", - "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", + "While the model is executing training `steps`, the input pipeline is reading the data for `steps+1`.\n", "Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.\n", "\n", "The `tf.data` API provides the `tf.data.Dataset.prefetch` transformation.\n", @@ -713,12 +713,12 @@ "Here is a summary of the best practices for designing performant TensorFlow\n", "input pipelines:\n", "\n", - "* [Use the `prefetch` transformation](#Pipelining) to overlap the work of a producer and consumer\n", - "* [Parallelize the data reading transformation](#Parallelizing-data-extraction) using the `interleave` transformation\n", - "* [Parallelize the `map` transformation](#Parallelizing-data-transformation) by setting the `num_parallel_calls` argument\n", - "* [Use the `cache` transformation](#Caching) to cache data in memory during the first epoch\n", - "* [Vectorize user-defined functions](#Map-and-batch) passed in to the `map` transformation\n", - "* [Reduce memory usage](#Reducing-memory-footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations" + "* [Use the `prefetch` transformation](#prefetching) to overlap the work of a producer and consumer\n", + "* [Parallelize the data reading transformation](#parallelizing_data_extraction) using the `interleave` transformation\n", + "* [Parallelize the `map` transformation](#parallelizing_data_transformation) by setting the `num_parallel_calls` argument\n", + "* [Use the `cache` transformation](#caching) to cache data in memory during the first epoch\n", + "* [Vectorize user-defined functions](#vectorizing_mapping) passed in to the `map` transformation\n", + "* [Reduce memory usage](#reducing_memory_footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations" ] }, { From bd1a00c47a4e25f5ffc2932ef56d63f3c9724197 Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Thu, 1 Jun 2023 14:15:26 +0530 Subject: [PATCH 02/11] Update data_performance.ipynb Reverted the changes as mentioned. 
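For readers following the prefetching text that these first two patches adjust, a minimal sketch of the pattern that section describes may be useful. Everything below — the range dataset, the `preprocess` function, and the batch size — is an illustrative placeholder, not code taken from the notebook:

```python
import tensorflow as tf

# Stand-in dataset and per-element preprocessing; the notebook's own
# ArtificialDataset and benchmark loop are more elaborate.
dataset = tf.data.Dataset.range(1000)

def preprocess(x):
    return tf.cast(x, tf.float32) * 2.0  # simulated per-element work

# While the model consumes training step `s`, prefetch lets the input
# pipeline prepare the data for step `s+1` in the background.
pipeline = (
    dataset
    .map(preprocess)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # let tf.data tune the buffer size
)

for batch in pipeline:
    pass  # a real training step would run here
```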
--- site/en/guide/data_performance.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index c24c77a15c5..3a07014ae1f 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -275,7 +275,7 @@ "### Prefetching\n", "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", - "While the model is executing training `steps`, the input pipeline is reading the data for `steps+1`.\n", + "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", "Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.\n", "\n", "The `tf.data` API provides the `tf.data.Dataset.prefetch` transformation.\n", From d105024044da4ad603322a1d949c6b569cb8b202 Mon Sep 17 00:00:00 2001 From: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> Date: Tue, 18 Jul 2023 20:37:36 +0000 Subject: [PATCH 03/11] Apply nbfmt to data_performance.ipynb --- site/en/guide/data_performance.ipynb | 1 - 1 file changed, 1 deletion(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index 3a07014ae1f..e9b5ddb59bb 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -1153,7 +1153,6 @@ "colab": { "collapsed_sections": [], "name": "data_performance.ipynb", - "provenance": [], "toc_visible": true }, "kernelspec": { From de3d730d7609936dc8b6332a01553d1b96cb3719 Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Tue, 26 Sep 2023 13:23:01 +0530 Subject: [PATCH 04/11] Update data_performance.ipynb Updated the file as mentioned by MarkDaoust. --- site/en/guide/data_performance.ipynb | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index e9b5ddb59bb..a4db1a18712 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -273,6 +273,8 @@ }, "source": [ "### Prefetching\n", + + "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", @@ -320,6 +322,7 @@ }, "source": [ "### Parallelizing data extraction\n", + "\n", "In a real-world setting, the input data may be stored remotely (for example, on Google Cloud Storage or HDFS).\n", "A dataset pipeline that works well when reading data locally might become bottlenecked on I/O when reading data remotely because of the following differences between local and remote storage:\n", @@ -419,6 +422,7 @@ }, "source": [ "### Parallelizing data transformation\n", + "\n", "When preparing data, input elements may need to be pre-processed.\n", "To this end, the `tf.data` API offers the `tf.data.Dataset.map` transformation, which applies a user-defined function to each element of the input dataset.\n", @@ -526,6 +530,7 @@ }, "source": [ "### Caching\n", + "\n", "The `tf.data.Dataset.cache` transformation can cache a dataset, either in memory or on local storage.\n", "This will save some operations (like file opening and data reading) from being executed during each epoch." 
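The hunks above touch the sections on parallel extraction, parallel transformation, and caching. As a rough sketch of the patterns those sections describe — where the integer "files" and the `parse` function stand in for real file names and a real record parser (in practice this would be `tf.data.TFRecordDataset` over shard paths):

```python
import tensorflow as tf

# Stand-in for a list of remote files: each "file" is itself a small dataset.
files = tf.data.Dataset.range(4)

def make_file_dataset(i):
    return tf.data.Dataset.range(i * 100, i * 100 + 10)

def parse(record):
    return tf.cast(record, tf.float32) / 255.0  # placeholder decoding step

dataset = (
    files
    # Parallelize data extraction: read several "files" concurrently.
    .interleave(make_file_dataset, num_parallel_calls=tf.data.AUTOTUNE)
    # Parallelize data transformation across CPU cores.
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    # Cache parsed elements so later epochs skip extraction and parsing.
    .cache()
)
```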
@@ -571,6 +576,7 @@ }, "source": [ "### Vectorizing mapping\n", + "\n", "Invoking a user-defined function passed into the `map` transformation has overhead related to scheduling and executing the user-defined function.\n", "Vectorize the user-defined function (that is, have it operate over a batch of inputs at once) and apply the `batch` transformation _before_ the `map` transformation.\n", @@ -686,6 +692,7 @@ }, "source": [ "### Reducing memory footprint\n", + "\n", "A number of transformations, including `interleave`, `prefetch`, and `shuffle`, maintain an internal buffer of elements. If the user-defined function passed into the `map` transformation changes the size of the elements, then the ordering of the map transformation and the transformations that buffer elements affects the memory usage. In general, choose the order that results in lower memory footprint, unless different ordering is desirable for performance.\n", "\n", From 27fe0525f7f01cef8be83a1e623ab85c65aafce2 Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Tue, 26 Sep 2023 13:46:23 +0530 Subject: [PATCH 05/11] Update data_performance.ipynb --- site/en/guide/data_performance.ipynb | 1 - 1 file changed, 1 deletion(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index a4db1a18712..720101a4962 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -274,7 +274,6 @@ "source": [ "### Prefetching\n", - "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", From 43c6ecf9c2b05ea347b2b750675d23756c322d2e Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Wed, 27 Sep 2023 10:35:30 +0530 Subject: [PATCH 06/11] Update data_performance.ipynb --- site/en/guide/data_performance.ipynb | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index 720101a4962..3e610a6b97a 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -273,8 +273,7 @@ }, "source": [ "### Prefetching\n", - - "\n", + "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", "Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.\n", @@ -321,8 +320,7 @@ }, "source": [ "### Parallelizing data extraction\n", - - "\n", + "\n", "In a real-world setting, the input data may be stored remotely (for example, on Google Cloud Storage or HDFS).\n", "A dataset pipeline that works well when reading data locally might become bottlenecked on I/O when reading data remotely because of the following differences between local and remote storage:\n", "\n", @@ -421,8 +419,7 @@ }, "source": [ "### Parallelizing data transformation\n", - - "\n", + "\n", "When preparing data, input elements may need to be pre-processed.\n", "To this end, the `tf.data` API offers the `tf.data.Dataset.map` transformation, which applies a user-defined function to each element of the input dataset.\n", "Because input elements are independent of one another, the pre-processing can be parallelized across multiple CPU cores.\n", @@ 
-529,8 +526,7 @@ }, "source": [ "### Caching\n", - - "\n", + "\n", "The `tf.data.Dataset.cache` transformation can cache a dataset, either in memory or on local storage.\n", "This will save some operations (like file opening and data reading) from being executed during each epoch." ] @@ -575,8 +571,7 @@ }, "source": [ "### Vectorizing mapping\n", - - "\n", + "\n", "Invoking a user-defined function passed into the `map` transformation has overhead related to scheduling and executing the user-defined function.\n", "Vectorize the user-defined function (that is, have it operate over a batch of inputs at once) and apply the `batch` transformation _before_ the `map` transformation.\n", "\n", @@ -691,8 +686,7 @@ }, "source": [ "### Reducing memory footprint\n", - - "\n", + "\n", "A number of transformations, including `interleave`, `prefetch`, and `shuffle`, maintain an internal buffer of elements. If the user-defined function passed into the `map` transformation changes the size of the elements, then the ordering of the map transformation and the transformations that buffer elements affects the memory usage. In general, choose the order that results in lower memory footprint, unless different ordering is desirable for performance.\n", "\n", "#### Caching partial computations\n", From 9d449cbd4d1e849fbaa6a3f010915d4713288eee Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Wed, 27 Sep 2023 11:02:34 +0530 Subject: [PATCH 07/11] Update data_performance.ipynb --- site/en/guide/data_performance.ipynb | 30 ++++++++++++++-------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index 3e610a6b97a..afe2c2c572d 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -135,7 +135,7 @@ "id": "3bU5gsSI-jKF" }, "source": [ - "### The dataset\n", + " The dataset\n", "\n", "Start with defining a class inheriting from `tf.data.Dataset` called `ArtificialDataset`.\n", "This dataset:\n", @@ -187,7 +187,7 @@ "id": "FGK1Y4jn-jKM" }, "source": [ - "### The training loop\n", + " The training loop\n", "\n", "Next, write a dummy training loop that measures how long it takes to iterate over a dataset.\n", "Training time is simulated." @@ -227,7 +227,7 @@ "id": "Xi8t26y7-jKV" }, "source": [ - "### The naive approach\n", + " The naive approach\n", "\n", "Start with a naive pipeline using no tricks, iterating over the dataset as-is." 
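The "The dataset" cell whose heading this hunk edits describes a class inheriting from `tf.data.Dataset` that simulates slow I/O. A rough sketch of that idea, with placeholder sleep times and output shapes that may differ from the notebook's exact version:

```python
import time
import tensorflow as tf

class ArtificialDataset(tf.data.Dataset):
    # Simulates a slow file open followed by slow per-record reads.
    def _generator(num_samples):
        time.sleep(0.03)  # simulate opening a file
        for sample_idx in range(num_samples):
            time.sleep(0.015)  # simulate reading one record from the file
            yield (sample_idx,)

    def __new__(cls, num_samples=3):
        return tf.data.Dataset.from_generator(
            cls._generator,
            output_signature=tf.TensorSpec(shape=(1,), dtype=tf.int64),
            args=(num_samples,),
        )
```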
] @@ -273,7 +273,7 @@ }, "source": [ "### Prefetching\n", - "\n", + "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", "Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.\n", @@ -320,7 +320,7 @@ }, "source": [ "### Parallelizing data extraction\n", - "\n", + "\n", "In a real-world setting, the input data may be stored remotely (for example, on Google Cloud Storage or HDFS).\n", "A dataset pipeline that works well when reading data locally might become bottlenecked on I/O when reading data remotely because of the following differences between local and remote storage:\n", "\n", @@ -419,7 +419,7 @@ }, "source": [ "### Parallelizing data transformation\n", - "\n", + "\n", "When preparing data, input elements may need to be pre-processed.\n", "To this end, the `tf.data` API offers the `tf.data.Dataset.map` transformation, which applies a user-defined function to each element of the input dataset.\n", "Because input elements are independent of one another, the pre-processing can be parallelized across multiple CPU cores.\n", @@ -526,7 +526,7 @@ }, "source": [ "### Caching\n", - "\n", + "\n", "The `tf.data.Dataset.cache` transformation can cache a dataset, either in memory or on local storage.\n", "This will save some operations (like file opening and data reading) from being executed during each epoch." ] @@ -571,7 +571,7 @@ }, "source": [ "### Vectorizing mapping\n", - "\n", + "\n", "Invoking a user-defined function passed into the `map` transformation has overhead related to scheduling and executing the user-defined function.\n", "Vectorize the user-defined function (that is, have it operate over a batch of inputs at once) and apply the `batch` transformation _before_ the `map` transformation.\n", "\n", @@ -686,7 +686,7 @@ }, "source": [ "### Reducing memory footprint\n", - "\n", + "\n", "A number of transformations, including `interleave`, `prefetch`, and `shuffle`, maintain an internal buffer of elements. If the user-defined function passed into the `map` transformation changes the size of the elements, then the ordering of the map transformation and the transformations that buffer elements affects the memory usage. 
In general, choose the order that results in lower memory footprint, unless different ordering is desirable for performance.\n", "\n", "#### Caching partial computations\n", @@ -713,12 +713,12 @@ "Here is a summary of the best practices for designing performant TensorFlow\n", "input pipelines:\n", "\n", - "* [Use the `prefetch` transformation](#prefetching) to overlap the work of a producer and consumer\n", - "* [Parallelize the data reading transformation](#parallelizing_data_extraction) using the `interleave` transformation\n", - "* [Parallelize the `map` transformation](#parallelizing_data_transformation) by setting the `num_parallel_calls` argument\n", - "* [Use the `cache` transformation](#caching) to cache data in memory during the first epoch\n", - "* [Vectorize user-defined functions](#vectorizing_mapping) passed in to the `map` transformation\n", - "* [Reduce memory usage](#reducing_memory_footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations" + "* [\Use the `prefetch` transformation\](#prefetching) to overlap the work of a producer and consumer\n", + "* [\Parallelize the data reading transformation\](#parallelizing_data_extraction) using the `interleave` transformation\n", + "* [\Parallelize the `map` transformation\](#parallelizing_data_transformation) by setting the `num_parallel_calls` argument\n", + "* [\Use the `cache` transformation\](#caching) to cache data in memory during the first epoch\n", + "* [\Vectorize user-defined functions\](#vectorizing_mapping) passed in to the `map` transformation\n", + "* [\Reduce memory usage\](#reducing_memory_footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations" ] }, { From 46e3b35a1b9190431d2e5d2b70ae11eb5b95fd3c Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Wed, 27 Sep 2023 11:05:48 +0530 Subject: [PATCH 08/11] Update data_performance.ipynb --- site/en/guide/data_performance.ipynb | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index afe2c2c572d..1a5ad0be3c2 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -713,12 +713,12 @@ "Here is a summary of the best practices for designing performant TensorFlow\n", "input pipelines:\n", "\n", - "* [\Use the `prefetch` transformation\](#prefetching) to overlap the work of a producer and consumer\n", - "* [\Parallelize the data reading transformation\](#parallelizing_data_extraction) using the `interleave` transformation\n", - "* [\Parallelize the `map` transformation\](#parallelizing_data_transformation) by setting the `num_parallel_calls` argument\n", - "* [\Use the `cache` transformation\](#caching) to cache data in memory during the first epoch\n", - "* [\Vectorize user-defined functions\](#vectorizing_mapping) passed in to the `map` transformation\n", - "* [\Reduce memory usage\](#reducing_memory_footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations" + "* [Use the `prefetch` transformation](#prefetching) to overlap the work of a producer and consumer\n", + "* [Parallelize the data reading transformation](#parallelizing_data_extraction) using the `interleave` transformation\n", + "* [Parallelize the `map` transformation](#parallelizing_data_transformation) by setting the `num_parallel_calls` argument\n", + "* [Use the `cache` transformation](#caching) to cache data in memory during the first 
epoch\n", + "* [Vectorize user-defined functions](#vectorizing_mapping) passed in to the `map` transformation\n", + "* [Reduce memory usage](#reducing_memory_footprint) when applying the `interleave`, `prefetch`, and `shuffle` transformations" ] }, { From 3410fd3164b0cb2acbe37ff458387f224bba7943 Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Wed, 27 Sep 2023 11:17:12 +0530 Subject: [PATCH 09/11] Update data_performance.ipynb --- site/en/guide/data_performance.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index 1a5ad0be3c2..bffdf21524e 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -273,7 +273,7 @@ }, "source": [ "### Prefetching\n", - "\n", + "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", "Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.\n", @@ -320,7 +320,7 @@ }, "source": [ "### Parallelizing data extraction\n", - "\n", + "\n", "In a real-world setting, the input data may be stored remotely (for example, on Google Cloud Storage or HDFS).\n", "A dataset pipeline that works well when reading data locally might become bottlenecked on I/O when reading data remotely because of the following differences between local and remote storage:\n", "\n", From 89bee847f82ce23c719933ba4396efe6eb94b022 Mon Sep 17 00:00:00 2001 From: RenuPatelGoogle <89264621+RenuPatelGoogle@users.noreply.github.com> Date: Wed, 27 Sep 2023 12:02:25 +0530 Subject: [PATCH 10/11] Update data_performance.ipynb Final copy of change --- site/en/guide/data_performance.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index bffdf21524e..9476e7bab3f 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -135,7 +135,7 @@ "id": "3bU5gsSI-jKF" }, "source": [ - " The dataset\n", + "### The dataset\n", "\n", "Start with defining a class inheriting from `tf.data.Dataset` called `ArtificialDataset`.\n", "This dataset:\n", @@ -187,7 +187,7 @@ "id": "FGK1Y4jn-jKM" }, "source": [ - " The training loop\n", + "### The training loop\n", "\n", "Next, write a dummy training loop that measures how long it takes to iterate over a dataset.\n", "Training time is simulated." @@ -227,7 +227,7 @@ "id": "Xi8t26y7-jKV" }, "source": [ - " The naive approach\n", + "### The naive approach\n", "\n", "Start with a naive pipeline using no tricks, iterating over the dataset as-is." 
] From e4210b940833a4e4e7c5d35a19ca46cafc27c7c5 Mon Sep 17 00:00:00 2001 From: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com> Date: Wed, 27 Sep 2023 07:02:29 +0000 Subject: [PATCH 11/11] Lint Better performance with the tf.data API guide --- site/en/guide/data_performance.ipynb | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/site/en/guide/data_performance.ipynb b/site/en/guide/data_performance.ipynb index 9476e7bab3f..81d8b3fd5b3 100644 --- a/site/en/guide/data_performance.ipynb +++ b/site/en/guide/data_performance.ipynb @@ -273,7 +273,9 @@ }, "source": [ "### Prefetching\n", + "\n", "\n", + "\n", "Prefetching overlaps the preprocessing and model execution of a training step.\n", "While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.\n", "Doing so reduces the step time to the maximum (as opposed to the sum) of the training and the time it takes to extract the data.\n", @@ -320,7 +322,9 @@ }, "source": [ "### Parallelizing data extraction\n", + "\n", "\n", + "\n", "In a real-world setting, the input data may be stored remotely (for example, on Google Cloud Storage or HDFS).\n", "A dataset pipeline that works well when reading data locally might become bottlenecked on I/O when reading data remotely because of the following differences between local and remote storage:\n", "\n", @@ -419,7 +423,9 @@ }, "source": [ "### Parallelizing data transformation\n", + "\n", "\n", + "\n", "When preparing data, input elements may need to be pre-processed.\n", "To this end, the `tf.data` API offers the `tf.data.Dataset.map` transformation, which applies a user-defined function to each element of the input dataset.\n", "Because input elements are independent of one another, the pre-processing can be parallelized across multiple CPU cores.\n", @@ -526,7 +532,9 @@ }, "source": [ "### Caching\n", + "\n", "\n", + "\n", "The `tf.data.Dataset.cache` transformation can cache a dataset, either in memory or on local storage.\n", "This will save some operations (like file opening and data reading) from being executed during each epoch." ] @@ -571,7 +579,9 @@ }, "source": [ "### Vectorizing mapping\n", + "\n", "\n", + "\n", "Invoking a user-defined function passed into the `map` transformation has overhead related to scheduling and executing the user-defined function.\n", "Vectorize the user-defined function (that is, have it operate over a batch of inputs at once) and apply the `batch` transformation _before_ the `map` transformation.\n", "\n", @@ -686,7 +696,9 @@ }, "source": [ "### Reducing memory footprint\n", + "\n", "\n", + "\n", "A number of transformations, including `interleave`, `prefetch`, and `shuffle`, maintain an internal buffer of elements. If the user-defined function passed into the `map` transformation changes the size of the elements, then the ordering of the map transformation and the transformations that buffer elements affects the memory usage. In general, choose the order that results in lower memory footprint, unless different ordering is desirable for performance.\n", "\n", "#### Caching partial computations\n",
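As a companion to the "Vectorizing mapping" cell linted in this last patch, a small sketch of the batch-before-map ordering it recommends; the `increment` function and batch size are illustrative, not taken from the notebook:

```python
import tensorflow as tf

def increment(x):
    # Element-wise op that also works on a whole batch at once.
    return x + 1

dataset = tf.data.Dataset.range(10000)

# Scalar mapping: the user-defined function is dispatched once per element.
scalar_mapped = dataset.map(increment).batch(256)

# Vectorized mapping: batch first, then map once per batch, so the
# scheduling overhead is amortized over 256 elements per call.
vectorized = dataset.batch(256).map(increment)
```

Batching first means the function-call overhead the cell mentions is paid once per batch rather than once per element, which is why the guide's summary recommends vectorizing user-defined functions passed to `map`.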