Commit
Added new page under Data types and IO section for tensorflow_types in flyte documentation (#5807)

* Trying to add a new page under Data types and IO section for tensorflow types
  Signed-off-by: sumana sree <[email protected]>
* Updated tensorflow_type.md file
  Signed-off-by: sumana sree <[email protected]>
* updated file
  Signed-off-by: sumana sree <[email protected]>
* corrected lines reference according to doccumentation.
  Signed-off-by: sumana sree <[email protected]>
* changed lines of reference
  Signed-off-by: Sumana Sree Angajala <[email protected]>
* Updated reference links of the example code snippets.
  Signed-off-by: Sumana Sree Angajala <[email protected]>
* fixed errors
  Signed-off-by: Sumana Sree Angajala <[email protected]>
* Apply suggestions from code review
  Co-authored-by: Nikki Everett <[email protected]>
  Signed-off-by: Sumana Sree Angajala <[email protected]>

---------

Signed-off-by: sumana sree <[email protected]>
Signed-off-by: Sumana Sree Angajala <[email protected]>
Co-authored-by: Nikki Everett <[email protected]>
1 parent a3ef15f, commit dcdd472
Showing 2 changed files with 84 additions and 0 deletions.
@@ -148,4 +148,5 @@ accessing_attributes
pytorch_type
enum_type
pickle_type
tensorflow_type
```
@@ -0,0 +1,83 @@
(tensorflow_type)=

# TensorFlow types

```{eval-rst}
.. tags:: MachineLearning, Basic
```

This page describes the TensorFlow types available in Flyte, which let you pass TensorFlow models and TFRecord datasets between tasks in Flyte workflows.

## Import necessary libraries and modules
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
:caption: data_types_and_io/tensorflow_type.py
:lines: 2-14
```

## TensorFlow model
Flyte supports the TensorFlow SavedModel format for serializing and deserializing `tf.keras.Model` instances. The `TensorFlowModelTransformer` handles these conversions.

### Transformer
- **Name:** TensorFlow Model
- **Class:** `TensorFlowModelTransformer`
- **Python Type:** `tf.keras.Model`
- **Blob Format:** `TensorFlowModel`
- **Dimensionality:** `MULTIPART`

### Usage
The `TensorFlowModelTransformer` allows you to save a TensorFlow model to a remote location and retrieve it later in your Flyte workflows.

```{note}
To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
```
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
:caption: data_types_and_io/tensorflow_type.py
:lines: 16-33
```

## TFRecord files
Flyte supports TFRecord files through the `TFRecordFile` type, which can handle serialized TensorFlow records. The `TensorFlowRecordFileTransformer` manages the conversion of TFRecord files to and from Flyte literals.

### Transformer
- **Name:** TensorFlow Record File
- **Class:** `TensorFlowRecordFileTransformer`
- **Python Type:** `TFRecordFile`
- **Blob Format:** `TensorFlowRecord`
- **Dimensionality:** `SINGLE`

### Usage
The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord files, making it easy to read and write data in TensorFlow's TFRecord format.

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
:caption: data_types_and_io/tensorflow_type.py
:lines: 35-45
```

## TFRecord directories
Flyte supports directories containing multiple TFRecord files through the `TFRecordsDirectory` type. The `TensorFlowRecordsDirTransformer` manages the conversion of TFRecord directories to and from Flyte literals.

### Transformer
- **Name:** TensorFlow Record Directory
- **Class:** `TensorFlowRecordsDirTransformer`
- **Python Type:** `TFRecordsDirectory`
- **Blob Format:** `TensorFlowRecord`
- **Dimensionality:** `MULTIPART`

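The transformer summaries on this page share two fields, blob format and dimensionality. The sketch below restates them as plain data to make the pattern visible: `SINGLE` marks a value stored as one file, and `MULTIPART` marks a value stored as a directory of files. `Dimensionality`, `BlobSummary`, and `SUMMARIES` are hypothetical stand-ins written for this illustration, not flytekit classes, and the `TFRecordFile` Python type is taken from the TFRecord files section.

```python
from dataclasses import dataclass
from enum import Enum


class Dimensionality(Enum):
    """Hypothetical stand-in for the two dimensionality values named on this page."""

    SINGLE = "SINGLE"        # one file
    MULTIPART = "MULTIPART"  # a directory of files


@dataclass(frozen=True)
class BlobSummary:
    """Hypothetical record holding one transformer summary from this page."""

    transformer: str
    python_type: str
    blob_format: str
    dimensionality: Dimensionality


SUMMARIES = [
    BlobSummary("TensorFlowModelTransformer", "tf.keras.Model", "TensorFlowModel", Dimensionality.MULTIPART),
    BlobSummary("TensorFlowRecordFileTransformer", "TFRecordFile", "TensorFlowRecord", Dimensionality.SINGLE),
    BlobSummary("TensorFlowRecordsDirTransformer", "TFRecordsDirectory", "TensorFlowRecord", Dimensionality.MULTIPART),
]

# Transformers whose values are stored as directories rather than single files.
directory_backed = [s.transformer for s in SUMMARIES if s.dimensionality is Dimensionality.MULTIPART]
print(directory_backed)
```

Both TFRecord types share the `TensorFlowRecord` blob format, so the dimensionality is what distinguishes a single record file from a directory of record files.
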
### Usage
The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFRecord files, which is useful for handling large datasets that are split across multiple files.

#### Example
```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
:caption: data_types_and_io/tensorflow_type.py
:lines: 47-56
```

## Configuration class: `TFRecordDatasetConfig`
The `TFRecordDatasetConfig` class is a data structure used to configure the parameters for creating a `tf.data.TFRecordDataset`, which allows for efficient reading of TFRecord files. This class uses the `DataClassJsonMixin` for easy JSON serialization.

### Attributes
- **compression_type**: (Optional) Specifies the compression method used for the TFRecord files. Possible values include an empty string (no compression), "ZLIB", or "GZIP".
- **buffer_size**: (Optional) Defines the size of the read buffer in bytes. If not set, defaults will be used based on the local or remote file system.
- **num_parallel_reads**: (Optional) Determines the number of files to read in parallel. A value greater than one outputs records in an interleaved order.
- **name**: (Optional) Assigns a name to the operation for easier identification in the pipeline.

These settings let you tune how TFRecord data is read, which matters most when working with large datasets or when specific performance tuning is required.
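
As a rough illustration of how the four attributes line up with the keyword arguments of `tf.data.TFRecordDataset`, the sketch below uses a plain standard-library dataclass. `RecordDatasetConfigSketch` and `to_dataset_kwargs` are hypothetical names made up for this example, not flytekit's `TFRecordDatasetConfig`, and the compression check is an assumption added for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional

# Compression values named in the attribute list above.
ALLOWED_COMPRESSION = ("", "ZLIB", "GZIP")


@dataclass
class RecordDatasetConfigSketch:
    """Hypothetical stand-in mirroring the four documented attributes."""

    compression_type: Optional[str] = None
    buffer_size: Optional[int] = None
    num_parallel_reads: Optional[int] = None
    name: Optional[str] = None

    def to_dataset_kwargs(self) -> Dict[str, Any]:
        """Return only the options that were set, keyed by attribute name."""
        if self.compression_type is not None and self.compression_type not in ALLOWED_COMPRESSION:
            raise ValueError(f"unsupported compression_type: {self.compression_type!r}")
        return {key: value for key, value in asdict(self).items() if value is not None}


# Read GZIP-compressed records from three files in parallel; leave the rest unset.
config = RecordDatasetConfigSketch(compression_type="GZIP", num_parallel_reads=3)
kwargs = config.to_dataset_kwargs()
print(kwargs)
```

With TensorFlow installed, the resulting dictionary could be passed as `tf.data.TFRecordDataset(filenames, **kwargs)`, which leaves any unset option at TensorFlow's default.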