From d614a02b99aa042ddb688574f6d900f0a3c69d81 Mon Sep 17 00:00:00 2001
From: sumana sree
Date: Fri, 4 Oct 2024 17:34:14 +0530
Subject: [PATCH 1/8] Trying to add a new page under Data types and IO section for tensorflow types

Signed-off-by: sumana sree
---
 docs/user_guide/data_types_and_io/index.md | 1 +
 .../data_types_and_io/tensorflow_type.md | 83 +++++++++++++++++++
 2 files changed, 84 insertions(+)
 create mode 100644 docs/user_guide/data_types_and_io/tensorflow_type.md

diff --git a/docs/user_guide/data_types_and_io/index.md b/docs/user_guide/data_types_and_io/index.md
index d03df92804..3280054696 100644
--- a/docs/user_guide/data_types_and_io/index.md
+++ b/docs/user_guide/data_types_and_io/index.md
@@ -148,4 +148,5 @@ accessing_attributes
 pytorch_type
 enum_type
 pickle_type
+tensorflow_type
 ```
diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
new file mode 100644
index 0000000000..e6fe5c561a
--- /dev/null
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -0,0 +1,83 @@
+(tensorflow_type)=
+
+# TensorFlow types
+
+```{eval-rst}
+.. tags:: MachineLearning, Basic
+```
+
+This document outlines the TensorFlow types available in Flyte, which facilitate the integration of TensorFlow models and datasets in Flyte workflows.
+
+## Tensorflow Model
+Flyte supports the TensorFlow SavedModel format for serializing and deserializing `tf.keras.Model` instances. The `TensorFlowModelTransformer` is responsible for handling these transformations.
+
+### Transformer
+- **Name:** TensorFlow Model
+- **Class:** `TensorFlowModelTransformer`
+- **Python Type:** `tf.keras.Model`
+- **Blob Format:** `TensorFlowModel`
+- **Dimensionality:** `MULTIPART`
+
+### Usage
+The `TensorFlowModelTransformer` allows you to save a TensorFlow model to a remote location and retrieve it later in your Flyte workflows.
+
+```{note}
+To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
+```
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+:caption: data_types_and_io/tensorflow_type.py
+:lines: 2-24
+```
+
+## TFRecord Files
+Flyte supports TFRecord files through the TFRecordFile type, which can handle serialized TensorFlow records. The TensorFlowRecordFileTransformer manages the conversion of TFRecord files to and from Flyte literals.
+
+### Transformer
+- **Name:** TensorFlow Record File
+- **Class:** `TensorFlowRecordFileTransformer`
+- **Blob Format:** `TensorFlowRecord`
+- **Dimensionality:** `SINGLE`
+
+### Usage
+The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord files, making it easy to read and write data in TensorFlow's TFRecord format.
+
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+:caption: data_types_and_io/tensorflow_type.py
+:lines: 27-43
+```
+
+## TFRecord Directories
+Flyte supports directories containing multiple TFRecord files through the `TFRecordsDirectory type`. The `TensorFlowRecordsDirTransformer` manages the conversion of TFRecord directories to and from Flyte literals.
+
+### Transformer
+- **Name:** TensorFlow Record Directory
+- **Class:** `TensorFlowRecordsDirTransformer`
+- **Python Type:** `TFRecordsDirectory`
+- **Blob Format:** `TensorFlowRecord`
+- **Dimensionality:** `MULTIPART`
+
+### Usage
+The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFRecord files, which is useful for handling large datasets that are split across multiple files.
+
+#### Example
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+:caption: data_types_and_io/tensorflow_type.py
+:lines: 46-65
+```
+
+## Configuration Class: `TFRecordDatasetConfig`
+The `TFRecordDatasetConfig` class is a data structure used to configure the parameters for creating a `tf.data.TFRecordDataset`, which allows for efficient reading of TFRecord files. This class uses the `DataClassJsonMixin` for easy JSON serialization.
+
+### Attributes:
+- **compression_type**: (Optional) Specifies the compression method used for the TFRecord files. Possible values include an empty string (no compression), "ZLIB", or "GZIP".
+- **buffer_size**: (Optional) Defines the size of the read buffer in bytes. If not set, defaults will be used based on the local or remote file system.
+- **num_parallel_reads**: (Optional) Determines the number of files to read in parallel. A value greater than one outputs records in an interleaved order.
+- **name**: (Optional) Assigns a name to the operation for easier identification in the pipeline.
+
+This configuration is crucial for optimizing the reading process of TFRecord datasets, especially when dealing with large datasets or when specific performance tuning is required.
+
+#### Example
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+:caption: data_types_and_io/tensorflow_type.py
+:lines: 67-86
+```

From ea934e45f86cbc3fbb3c9f236bf5555968666839 Mon Sep 17 00:00:00 2001
From: sumana sree
Date: Sat, 5 Oct 2024 09:44:23 +0530
Subject: [PATCH 2/8] Updated tensorflow_type.md file

Signed-off-by: sumana sree
---
 .../data_types_and_io/tensorflow_type.md | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
index e6fe5c561a..924740c437 100644
--- a/docs/user_guide/data_types_and_io/tensorflow_type.md
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -8,6 +8,12 @@
 
 This document outlines the TensorFlow types available in Flyte, which facilitate the integration of TensorFlow models and datasets in Flyte workflows.
 
+### Import necessary libraries and modules
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+:caption: data_types_and_io/tensorflow_type.py
+:lines: 1-7
+```
+
 ## Tensorflow Model
 Flyte supports the TensorFlow SavedModel format for serializing and deserializing `tf.keras.Model` instances. The `TensorFlowModelTransformer` is responsible for handling these transformations.
 
@@ -26,7 +32,7 @@ To clone and run the example code on this page, see the [Flytesnacks repo][flyte
 ```
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 2-24
+:lines: 9-28
 ```
 
 ## TFRecord Files
@@ -43,7 +49,7 @@ The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord f
 
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 27-43
+:lines: 31-44
 ```
 
 ## TFRecord Directories
@@ -62,7 +68,7 @@ The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFR
 #### Example
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 46-65
+:lines: 47-61
 ```
 
 ## Configuration Class: `TFRecordDatasetConfig`
@@ -79,5 +85,5 @@ This configuration is crucial for optimizing the reading process of TFRecord dat
 #### Example
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 67-86
+:lines: 64-78
 ```

From d571d50fdbd0995f9406162be81e5acf73805747 Mon Sep 17 00:00:00 2001
From: sumana sree
Date: Sun, 6 Oct 2024 10:50:39 +0530
Subject: [PATCH 3/8] updated file

Signed-off-by: sumana sree
---
 docs/user_guide/data_types_and_io/tensorflow_type.md | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
index 924740c437..1b53639009 100644
--- a/docs/user_guide/data_types_and_io/tensorflow_type.md
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -81,9 +81,3 @@ The `TFRecordDatasetConfig` class is a data structure used to configure the para
 - **name**: (Optional) Assigns a name to the operation for easier identification in the pipeline.
 
 This configuration is crucial for optimizing the reading process of TFRecord datasets, especially when dealing with large datasets or when specific performance tuning is required.
-
-#### Example
-```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
-:caption: data_types_and_io/tensorflow_type.py
-:lines: 64-78
-```

From 9c1fa828750976869bdc5d2a285e0d8366eb96cd Mon Sep 17 00:00:00 2001
From: sumana sree
Date: Sun, 13 Oct 2024 12:41:57 +0530
Subject: [PATCH 4/8] corrected lines reference according to documentation.

Signed-off-by: sumana sree
---
 docs/user_guide/data_types_and_io/tensorflow_type.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
index 1b53639009..e8342f25b8 100644
--- a/docs/user_guide/data_types_and_io/tensorflow_type.md
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -11,7 +11,7 @@ This document outlines the TensorFlow types available in Flyte, which facilitate
 ### Import necessary libraries and modules
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 1-7
+:lines: 2-7
 ```
 
 ## Tensorflow Model
@@ -32,7 +32,7 @@ To clone and run the example code on this page, see the [Flytesnacks repo][flyte
 ```
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 9-28
+:lines: 10-29
 ```
 
 ## TFRecord Files
@@ -49,7 +49,7 @@ The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord f
 
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 31-44
+:lines: 32-43
 ```
 
 ## TFRecord Directories
@@ -68,7 +68,7 @@ The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFR
 #### Example
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 47-61
+:lines: 46-57
 ```
 
 ## Configuration Class: `TFRecordDatasetConfig`

From 61345e99a4f7ca2c6b8b07511bf7a99da64c1a51 Mon Sep 17 00:00:00 2001
From: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
Date: Thu, 17 Oct 2024 14:53:29 +0530
Subject: [PATCH 5/8] changed lines of reference

Signed-off-by: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
---
 docs/user_guide/data_types_and_io/tensorflow_type.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
index e8342f25b8..40c150c0ec 100644
--- a/docs/user_guide/data_types_and_io/tensorflow_type.md
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -11,7 +11,7 @@ This document outlines the TensorFlow types available in Flyte, which facilitate
 ### Import necessary libraries and modules
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 2-7
+:lines: 2-14
 ```
 
 ## Tensorflow Model
@@ -32,7 +32,7 @@ To clone and run the example code on this page, see the [Flytesnacks repo][flyte
 ```
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 10-29
+:lines: 16-33
 ```
 
 ## TFRecord Files
@@ -49,7 +49,7 @@ The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord f
 
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 32-43
+:lines: 35-45
 ```
 
 ## TFRecord Directories
@@ -68,7 +68,7 @@ The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFR
 #### Example
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 46-57
+:lines: 47-57
 ```
 
 ## Configuration Class: `TFRecordDatasetConfig`

From 0e70fa8969960dd8946cb25282e9136b24be1b44 Mon Sep 17 00:00:00 2001
From: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
Date: Thu, 17 Oct 2024 21:32:26 +0530
Subject: [PATCH 6/8] Updated reference links of the example code snippets.

Signed-off-by: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
---
 docs/user_guide/data_types_and_io/tensorflow_type.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
index 40c150c0ec..0a64a3afa4 100644
--- a/docs/user_guide/data_types_and_io/tensorflow_type.md
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -9,7 +9,7 @@
 This document outlines the TensorFlow types available in Flyte, which facilitate the integration of TensorFlow models and datasets in Flyte workflows.
 
 ### Import necessary libraries and modules
-```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
 :lines: 2-14
 ```
@@ -30,7 +30,7 @@ The `TensorFlowModelTransformer` allows you to save a TensorFlow model to a remo
 ```{note}
 To clone and run the example code on this page, see the [Flytesnacks repo][flytesnacks].
 ```
-```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
 :lines: 16-33
 ```
@@ -47,7 +47,7 @@ Flyte supports TFRecord files through the TFRecordFile type, which can handle se
 ### Usage
 The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord files, making it easy to read and write data in TensorFlow's TFRecord format.
 
-```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
 :lines: 35-45
 ```
@@ -66,7 +66,7 @@ Flyte supports directories containing multiple TFRecord files through the `TFRec
 The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFRecord files, which is useful for handling large datasets that are split across multiple files.
 
 #### Example
-```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
+```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
 :lines: 47-57
 ```

From 453c9b9d52e6de96b9afdaa0ab08787ae3448349 Mon Sep 17 00:00:00 2001
From: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
Date: Thu, 17 Oct 2024 23:44:52 +0530
Subject: [PATCH 7/8] fixed errors

Signed-off-by: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
---
 docs/user_guide/data_types_and_io/tensorflow_type.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
index 0a64a3afa4..4cac97bfd2 100644
--- a/docs/user_guide/data_types_and_io/tensorflow_type.md
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -68,7 +68,7 @@ The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFR
 #### Example
 ```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/refs/heads/master/examples/data_types_and_io/data_types_and_io/tensorflow_type.py
 :caption: data_types_and_io/tensorflow_type.py
-:lines: 47-57
+:lines: 47-56
 ```
 
 ## Configuration Class: `TFRecordDatasetConfig`

From f76486d9edc445bea17dd2603d9af948c940f16d Mon Sep 17 00:00:00 2001
From: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
Date: Fri, 18 Oct 2024 22:55:30 +0530
Subject: [PATCH 8/8] Apply suggestions from code review

Co-authored-by: Nikki Everett
Signed-off-by: Sumana Sree Angajala <110307215+sumana-2705@users.noreply.github.com>
---
 docs/user_guide/data_types_and_io/tensorflow_type.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/user_guide/data_types_and_io/tensorflow_type.md b/docs/user_guide/data_types_and_io/tensorflow_type.md
index 4cac97bfd2..a68ce5ecaf 100644
--- a/docs/user_guide/data_types_and_io/tensorflow_type.md
+++ b/docs/user_guide/data_types_and_io/tensorflow_type.md
@@ -14,7 +14,7 @@ This document outlines the TensorFlow types available in Flyte, which facilitate
 :lines: 2-14
 ```
 
-## Tensorflow Model
+## Tensorflow model
 Flyte supports the TensorFlow SavedModel format for serializing and deserializing `tf.keras.Model` instances. The `TensorFlowModelTransformer` is responsible for handling these transformations.
 
 ### Transformer
@@ -35,7 +35,7 @@ To clone and run the example code on this page, see the [Flytesnacks repo][flyte
 :lines: 16-33
 ```
 
-## TFRecord Files
+## TFRecord files
 Flyte supports TFRecord files through the TFRecordFile type, which can handle serialized TensorFlow records. The TensorFlowRecordFileTransformer manages the conversion of TFRecord files to and from Flyte literals.
 
 ### Transformer
@@ -52,7 +52,7 @@ The `TensorFlowRecordFileTransformer` enables you to work with single TFRecord f
 :lines: 35-45
 ```
 
-## TFRecord Directories
+## TFRecord directories
 Flyte supports directories containing multiple TFRecord files through the `TFRecordsDirectory type`. The `TensorFlowRecordsDirTransformer` manages the conversion of TFRecord directories to and from Flyte literals.
 
 ### Transformer
@@ -71,10 +71,10 @@ The `TensorFlowRecordsDirTransformer` allows you to work with directories of TFR
 :lines: 47-56
 ```
 
-## Configuration Class: `TFRecordDatasetConfig`
+## Configuration class: `TFRecordDatasetConfig`
 The `TFRecordDatasetConfig` class is a data structure used to configure the parameters for creating a `tf.data.TFRecordDataset`, which allows for efficient reading of TFRecord files. This class uses the `DataClassJsonMixin` for easy JSON serialization.
 
-### Attributes:
+### Attributes
 - **compression_type**: (Optional) Specifies the compression method used for the TFRecord files. Possible values include an empty string (no compression), "ZLIB", or "GZIP".
 - **buffer_size**: (Optional) Defines the size of the read buffer in bytes. If not set, defaults will be used based on the local or remote file system.
 - **num_parallel_reads**: (Optional) Determines the number of files to read in parallel. A value greater than one outputs records in an interleaved order.
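
`TFRecordDatasetConfig`, as documented in the page added by this series, has four attributes whose names match the keyword arguments of `tf.data.TFRecordDataset(filenames, compression_type=None, buffer_size=None, num_parallel_reads=None, name=None)`. The sketch below assumes the config is forwarded to that constructor and that unset fields fall back to TensorFlow's defaults. `TFRecordDatasetConfigSketch` and `to_dataset_kwargs` are hypothetical names used only for illustration. They are not flytekit APIs, and the code runs without TensorFlow or flytekit installed.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


# Hypothetical stand-in for flytekit's TFRecordDatasetConfig (illustration only).
# Field names mirror the attributes documented in tensorflow_type.md.
@dataclass
class TFRecordDatasetConfigSketch:
    compression_type: Optional[str] = None    # "" (no compression), "ZLIB", or "GZIP"
    buffer_size: Optional[int] = None         # read buffer size in bytes
    num_parallel_reads: Optional[int] = None  # values > 1 interleave records from several files
    name: Optional[str] = None                # optional name for the tf.data operation

    def __post_init__(self) -> None:
        # Reject compression types outside the documented set.
        if self.compression_type not in (None, "", "ZLIB", "GZIP"):
            raise ValueError(f"unsupported compression_type: {self.compression_type!r}")

    def to_dataset_kwargs(self) -> Dict[str, Any]:
        # Drop unset (None) fields so TensorFlow applies its own defaults.
        return {key: value for key, value in asdict(self).items() if value is not None}


if __name__ == "__main__":
    cfg = TFRecordDatasetConfigSketch(compression_type="GZIP", num_parallel_reads=3)
    print(cfg.to_dataset_kwargs())  # {'compression_type': 'GZIP', 'num_parallel_reads': 3}
    # With TensorFlow installed, the kwargs could be forwarded like this (not run here):
    #   dataset = tf.data.TFRecordDataset(filenames, **cfg.to_dataset_kwargs())
```

An explicit empty string for `compression_type` is forwarded unchanged, while `None` omits the argument entirely so TensorFlow's own default applies.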