From 56e56a783e4b79e96d09c20918a23bb3e11976d1 Mon Sep 17 00:00:00 2001
From: Merel Theisen <49397448+merelcht@users.noreply.github.com>
Date: Thu, 22 Aug 2024 18:27:53 +0100
Subject: [PATCH] Add extra clarification about `fs_args` (#4112)

* Add extra clarification about fs args

---------

Signed-off-by: Merel Theisen
---
 docs/source/data/data_catalog.md | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/source/data/data_catalog.md b/docs/source/data/data_catalog.md
index 1d597f1128..568e66ee4f 100644
--- a/docs/source/data/data_catalog.md
+++ b/docs/source/data/data_catalog.md
@@ -99,10 +99,10 @@ The following protocols are available:
 
 This section explains the additional settings available within `catalog.yml`.
 
-### Load and save arguments
-The Kedro Data Catalog also accepts two different groups of `*_args` parameters that serve different purposes:
+### Load, save and filesystem arguments
+The Kedro Data Catalog also accepts different groups of `*_args` parameters that serve different purposes:
 
-* **`load_args` and `save_args`**: Configures how a third-party library loads/saves data from/to a file. In the spaceflights example above, `load_args`, is passed to the excel file read method (`pd.read_excel`) as a [keyword argument](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html). Although not specified here, the equivalent output is `save_args` and the value would be passed to [`pd.DataFrame.to_excel` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html).
+* **`load_args` and `save_args`**: Configure how a third-party library loads/saves data from/to a file. In the spaceflights example above, `load_args`, is passed to the excel file read method (`pd.read_excel`) as a [keyword argument](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html). Although not specified here, the equivalent output is `save_args` and the value would be passed to [`pd.DataFrame.to_excel` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html).
 
 For example, to load or save a CSV on a local file system, using specified load/save arguments:
 
@@ -143,6 +143,22 @@ test_dataset:
       encoding: "utf-8"
 ```
 
+If you want to save a file in append mode instead of overwrite you can use the `open_args_save` `mode` parameter:
+
+```yaml
+test_dataset:
+  type: ...
+  fs_args:
+    open_args_save:
+      mode: "a"
+```
+
+```{note}
+Default load, save and filesystem arguments are defined inside the specific dataset implementations as `DEFAULT_LOAD_ARGS`, `DEFAULT_SAVE_ARGS`, and `DEFAULT_FS_ARGS` respectively.
+You can check those in {py:mod}`the dataset API documentation `.
+```
+
+
 ### Dataset access credentials
 
 The Data Catalog also works with the `credentials.yml` file in `conf/local/`, allowing you to specify usernames and passwords required to load certain datasets.
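
As a reviewer's aside on the `open_args_save` example added in this patch: the sketch below illustrates the difference between the default overwrite behaviour and `mode: "a"`. It is not Kedro code. It assumes that a dataset forwards the contents of `open_args_save` as keyword arguments to a file `open` call, and it uses Python's built-in `open` as a stand-in for the filesystem layer. The names `save_text`, `demo`, and the `fs_args` dictionary are hypothetical and exist only for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for the `fs_args` block of a catalog entry.
fs_args = {"open_args_save": {"mode": "a"}}


def save_text(path, data, open_args_save=None):
    """Write `data` to `path`, overwriting unless a different mode is supplied."""
    # Assumed default: overwrite ("w"); user-supplied arguments take precedence.
    open_args = {"mode": "w", **(open_args_save or {})}
    with open(path, **open_args) as f:
        f.write(data)


def demo():
    """Save twice: first with the default mode, then with the append-mode arguments."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.txt")
        save_text(path, "first\n")  # default mode overwrites
        save_text(path, "second\n", fs_args["open_args_save"])  # appends
        with open(path) as f:
            return f.read()


print(demo())
```

With the append-mode arguments, the second save adds to the file rather than replacing it; calling `save_text` twice without them would leave only the last write.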