From 01a73ab0f92c9527b22e0e215f9b8210726b549e Mon Sep 17 00:00:00 2001
From: Rishi Chandra
Date: Wed, 11 Dec 2024 20:17:10 +0000
Subject: [PATCH] typos

---
 examples/ML+DL-Examples/Optuna-Spark/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/ML+DL-Examples/Optuna-Spark/README.md b/examples/ML+DL-Examples/Optuna-Spark/README.md
index 6b790fc8..737d6bef 100644
--- a/examples/ML+DL-Examples/Optuna-Spark/README.md
+++ b/examples/ML+DL-Examples/Optuna-Spark/README.md
@@ -228,7 +228,7 @@ Application parallelism with JoblibSpark:
 ###### Data I/O:
 Since each worker requires the full dataset to perform hyperparameter tuning, there are two strategies to get the data into worker memory:
  - **Worker I/O**: *each worker reads the dataset* from the filepath once the task has begun. In practice, this requires the dataset to be written to a distributed file system accessible to all workers prior to tuning. The `optuna-joblibspark` notebook demonstrates this.
- - **Spark I/O**: Spark reads the dataset and **creates a copy of the dataset for each worker**, then maps the tuning task onto each copy. In practice, this enables the code to be chained to other Dataframe operations (e.g. ETL stages) without the intermediate step of writing to DBFS, at the cost of some overhead during duplication. The `optuna-dataframe` notebook demonstrates this.
+ - **Spark I/O**: Spark reads the dataset and *creates a copy of the dataset for each worker*, then maps the tuning task onto each copy. In practice, this enables the code to be chained to other Dataframe operations (e.g. ETL stages) without the intermediate step of writing to DBFS, at the cost of some overhead during duplication. The `optuna-dataframe` notebook demonstrates this.
  - To achieve this, we coalesce the input Dataframe to a single partition, and recursively self-union until we have the desired number of copies (number of workers).
 Thus each partition will contain a duplicate of the entire dataset, and the Optuna task can be mapped directly onto the partitions.
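
For reviewers, the coalesce-and-self-union step described in the context lines could look roughly like the sketch below. This is an illustration under stated assumptions, not the code from the `optuna-dataframe` notebook: the function name `duplicate_per_worker` and the parameter `num_copies` are hypothetical, and the function only assumes that `df` exposes `coalesce` and `union` with the semantics of a Spark DataFrame (where a union is expected to concatenate the partitions of its inputs).

```python
def duplicate_per_worker(df, num_copies):
    """Return a DataFrame-like object with `num_copies` partitions,
    each holding a full copy of `df`.

    Hypothetical helper: `df` is assumed to behave like a Spark DataFrame,
    i.e. it has `coalesce(n)` and `union(other)` methods.
    """
    if num_copies < 1:
        raise ValueError("num_copies must be at least 1")

    # Put the whole dataset into a single partition.
    single = df.coalesce(1)
    result = single

    # floor(log2(num_copies)) self-unions; each one doubles the copy count.
    doublings = num_copies.bit_length() - 1
    for _ in range(doublings):
        result = result.union(result)

    # Top up one copy at a time when num_copies is not a power of two.
    for _ in range(num_copies - (1 << doublings)):
        result = result.union(single)

    return result
```

Doubling keeps the number of `union` calls close to the base-2 logarithm of the worker count rather than growing linearly with it. With PySpark, `df` would be the input DataFrame and `num_copies` the number of workers.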