Commit

docs: Align code lines of StructuredDataset with Flytesnacks example
Signed-off-by: JiaWei Jiang <[email protected]>
JiangJiaWei1103 committed Oct 20, 2024
1 parent bdaf79f commit 80cb84a
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions docs/user_guide/data_types_and_io/structureddataset.md
@@ -39,7 +39,7 @@ To begin, import the dependencies for the example:

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 1-18
+:lines: 1-19
```

Define a task that returns a Pandas DataFrame.
@@ -68,7 +68,7 @@ First, initialize column types you want to extract from the `StructuredDataset`.

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 30-31
+:lines: 31-32
```

Define a task that opens a structured dataset by calling `all()`.
@@ -78,7 +78,7 @@ For instance, you can use ``pa.Table`` to convert the Pandas DataFrame to a PyAr

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 41-51
+:lines: 42-52
```

The code may result in runtime failures if the columns do not match.
@@ -91,7 +91,7 @@ and enable the CSV serialization by annotating the structured dataset with the C

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 57-71
+:lines: 58-72
```

## Storage driver and location
@@ -230,14 +230,14 @@ and the byte format, which in this case is `PARQUET`.

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 127-129
+:lines: 128-130
```

You can now use `numpy.ndarray` to deserialize the parquet file to NumPy and serialize a task's output (NumPy array) to a parquet file.

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 134-149
+:lines: 135-148
```

:::{note}
@@ -248,7 +248,7 @@ You can run the code locally as follows:

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 153-157
+:lines: 152-156
```

### The nested typed columns
@@ -261,7 +261,7 @@ Nested field StructuredDataset should be run when flytekit version > 1.11.0.

```{rli} https://raw.githubusercontent.com/flyteorg/flytesnacks/69dbe4840031a85d79d9ded25f80397c6834752d/examples/data_types_and_io/data_types_and_io/structured_dataset.py
:caption: data_types_and_io/structured_dataset.py
-:lines: 159-270
+:lines: 158-285
```

[flytesnacks]: https://github.com/flyteorg/flytesnacks/tree/master/examples/data_types_and_io/
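
Reviewer note: every change in this commit shifts a `:lines:` range so that the rendered snippet matches the pinned Flytesnacks file. The sketch below illustrates how such a range selects lines, assuming the `rli` directive follows the same convention as Sphinx's `literalinclude` (1-based, inclusive ranges, comma-separated parts, an open end such as `10-`). `select_lines` is a hypothetical helper written for this note; it is not part of Sphinx, Flytekit, or Flytesnacks.

```python
def select_lines(text: str, spec: str) -> list[str]:
    """Pick lines from `text` using a 1-based, inclusive spec such as "1-19" or "3,5-6".

    Assumption: this mirrors the `:lines:` convention of Sphinx's literalinclude.
    """
    lines = text.splitlines()
    picked: list[str] = []
    for part in spec.split(","):
        start, sep, end = part.strip().partition("-")
        first = int(start)
        if not sep:
            last = first          # single line, e.g. "3"
        elif end:
            last = int(end)       # closed range, e.g. "5-6"
        else:
            last = len(lines)     # open range, e.g. "10-"
        # Convert the 1-based inclusive range to a 0-based half-open slice.
        picked.extend(lines[first - 1:last])
    return picked


# A stand-in for the remote source file: 20 numbered lines.
sample = "\n".join(f"line {n}" for n in range(1, 21))

old_range = select_lines(sample, "1-18")  # range before this commit
new_range = select_lines(sample, "1-19")  # range after this commit
```

With this convention, widening `1-18` to `1-19` adds exactly one trailing line to the rendered snippet, which is the kind of off-by-one drift the commit corrects.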
