From fc4702e13935e5a761527359316653ac0af46a69 Mon Sep 17 00:00:00 2001 From: Nok Date: Fri, 1 Nov 2024 12:47:39 +0000 Subject: [PATCH 1/8] release.md Signed-off-by: Nok --- RELEASE.md | 1 + 1 file changed, 1 insertion(+) diff --git a/RELEASE.md b/RELEASE.md index ac3c4d95d9..8cacab5772 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -12,6 +12,7 @@ ## Documentation changes * Updated CLI autocompletion docs with new Click syntax. * Standardised `.parquet` suffix in docs and tests. +* Added a new minial Kedro project creation guide. ## Community contributions * [Hyewon Choi](https://github.com/hyew0nChoi) From 4f12a432a5b1b158c7728389bb7b6ec187b58ffa Mon Sep 17 00:00:00 2001 From: Nok Date: Fri, 1 Nov 2024 13:20:52 +0000 Subject: [PATCH 2/8] placeholder Signed-off-by: Nok --- docs/source/get_started/minimal_kedro_project.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 docs/source/get_started/minimal_kedro_project.md diff --git a/docs/source/get_started/minimal_kedro_project.md b/docs/source/get_started/minimal_kedro_project.md new file mode 100644 index 0000000000..c4f2f3643a --- /dev/null +++ b/docs/source/get_started/minimal_kedro_project.md @@ -0,0 +1 @@ +# What is a Minimal Kedro Project From 4588380203427f7adfc7353f8bae7f5745883f08 Mon Sep 17 00:00:00 2001 From: Nok Date: Mon, 11 Nov 2024 13:52:20 +0000 Subject: [PATCH 3/8] add examples Signed-off-by: Nok --- .../get_started/minimal_kedro_project.md | 178 +++++++++++++++++- 1 file changed, 177 insertions(+), 1 deletion(-) diff --git a/docs/source/get_started/minimal_kedro_project.md b/docs/source/get_started/minimal_kedro_project.md index c4f2f3643a..cebc8b8db9 100644 --- a/docs/source/get_started/minimal_kedro_project.md +++ b/docs/source/get_started/minimal_kedro_project.md @@ -1 +1,177 @@ -# What is a Minimal Kedro Project +# Create a Minimal Kedro Project +The goal of this documentation is to explain what makes a minimal Kedro project. 
This guide will start from a blank project and introduce the necessary components. In reality, you are more likely to use a [project template](./new_project.md), or start from an existing Python project. You will able to able to adapt the concept and accomdate your specific need. + +## Essential Components of a Kedro Project + +Kedro is an Python framework designed for creating reproducible data science code. A typical Kedro project consists of two parts, the **mandatory structure** and the **opionated project structure**. + +### 1. **Recommended Structure** +Kedro projects follow a specific directory structure that promotes best practices for collaboration and maintenance. The default structure includes: + +| Directory/File | Description | +|-----------------------|-----------------------------------------------------------------------------| +| `conf/` | Contains configuration files such as `catalog.yml` and `parameters.yml`. | +| `data/` | Local project data, typically not committed to version control. | +| `docs/` | Project documentation files. | +| `notebooks/` | Jupyter notebooks for experimentation and prototyping. | +| `src/` | Source code for the project, including pipelines and nodes. | +| `README.md` | Project overview and instructions. | +| `pyproject.toml` | Metadata about the project, including dependencies. | +| `.gitignore` | Specifies files and directories to be ignored by Git. | + +### 2. **Mandatory Files** +There are 3 files that you must have to be considered as a Kedro project, i.e. able to run `kedro run` on it. +- **`pyprojec.toml`**: Defines the python project +- **`settings.py`**: Defines project settings, including library component registration. +- **`pipeline_registry.py`**: Registers the project's pipelines. 
+ +If you want to see some examples, you can either create a project with `kedro new` or check out the [project template on GitHub](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) + + +#### `pyproject.toml` +The `pyproject.toml` file is a crucial component of a Kedro project, serving as the standard way to store build metadata and tool settings for Python projects. It is essential for defining the project's configuration and ensuring proper integration with various tools and libraries. + +Particularly, Kedro requires `[tool.kedro]` section in `pyproject.toml`, this describes the [project metadata](../kedro_project_setup/settings.md) in the project. + +Typically, it looks similar to this: +```toml +[tool.kedro] +package_name = "package_name" +project_name = "project_name" +kedro_init_version = "kedro_version" +tools = "" +example_pipeline = "False" +source_dir = "src" +``` + +This informs Kedro where to look for the source code, `settings.py` and `pipeline_registry.py` are. + +#### `settings.py` +The `settings.py` file is an important configuration file in a Kedro project that allows you to define various settings and hooks for your project. Here’s a breakdown of its purpose and functionality: +- Project Settings: This file is where you can configure project-wide settings, such as defining the logging level, setting environment variables, or specifying paths for data and outputs. +- Hooks Registration: You can register custom hooks in settings.py, which are functions that can be executed at specific points in the Kedro pipeline lifecycle (e.g., before or after a node runs). This is useful for adding additional functionality, such as logging or monitoring. +- Integration with Plugins: If you are using Kedro plugins, settings.py can also be utilized to configure them appropriately. + +Even if you do not have any settings, an empty `settings.py` is still required. Typically, they are stored at `src//settings.py`. 
+ +#### `pipeline_registry.py` +The `pipeline_registry.py` file is essential for managing the pipelines within your Kedro project. It provides a centralized way to register and access all pipelines defined in the project. Here are its key features: +- Pipeline Registration: The file must contain a top-level function called `register_pipelines()` that returns a mapping from pipeline names to Pipeline objects. This function is crucial because it enables the Kedro CLI and other tools to discover and run the defined pipelines. +- Autodiscovery of Pipelines: Since Kedro 0.18.3, you can use the [`find_pipeline`](../nodes_and_pipelines/pipeline_registry.md#pipeline-autodiscovery) function to automatically discover pipelines defined in your project without manually updating the registry each time you create a new pipeline. + +## Creating a Minimal Kedro Project Step-by-Step +The following section will guide you to create a minimal Kedro project, where you can successfully run `kedro run` with just 3 files. + +To create a minimal Kedro project, follow these steps: + +### Step 1: Install Kedro +First, ensure you have Python installed on your machine, then install Kedro using pip: + +```bash +pip install kedro +``` + +### Step 2: Create a New Kedro Project +Create a new working directory: +```bash +mkdir minikedro +``` + +Change into your newly created project directory: + +```bash +cd minikiedro +``` + +### Step 3: Create `pyproject.toml` +Create a new file called `pyproject.toml` at the project directory: + +```toml +[tool.kedro] +package_name = "minikedro" +project_name = "minikedro" +kedro_init_version = "0.19.9" +source_dir = "." +``` + +At this point, your working directory should look like this: +```bash +. +├── pyproject.toml +``` + + +```{note} +Note we define `source_dir = "."`, usually we keep our source code inside a directory called `src`. 
For this example, we try to keep the structure minimal so we keep the source code in the root directory +``` + +### Step 4: Create `settings.py` and `pipeline_registry.py` +First, create a folder called `minikedro`, the name of the folder should be the same as the `package_name` defined in `pyproject.toml`. + +```bash +mkdir minikedro +``` + +Then, create two empty files `settings.py` and `pipeline_registry.py` inside the folder. +```bash +touch minikedro/settings.py minikedro/pipeline_registry.py +``` + +Now your working directory should look like this: +```bash +. +├── minikedro +│ ├── pipeline_registry.py +│ └── settings.py +└── pyproject.toml +``` + +Run this in ther terminal: +```bash +kedro run +``` + +You should see an error because we have an empty `pipeline_registry.py`. +```bash +AttributeError: module 'minikedro.pipeline_registry' has no attribute 'register_pipelines' +``` + +### Step 5: Create a Simple Pipeline +Now, copy the following code into `pipeline_registry.py` so that we have some pipeline to run. + +```python +from kedro.pipeline import pipeline, node + +def foo(): + return "dummy" + +def register_pipelines(): + return {"__default__": pipeline([node(foo, None, "dummy_output")])} +``` + +If you try to run the pipeline again with `kedro run`, you will see a new error: +```bash +MissingConfigException: Given configuration path either does not exist or is not a valid directory: /workspace/kedro/minikedro/conf/base +``` + +### Step 6: Define the Project Settings +The error happened by default the Kedro Framework expects a configuration folder called `conf` and two separate environment named `base` and `local`. + +To fix this, add these two lines into `settings.py`: +```python +CONF_SOURCE = "." +CONFIG_LOADER_ARGS = {"base_env": ".", "default_run_env": "."} +``` + +This override the defaults so that Kedro knows that it should not look for configurations in `conf`. 
This is explained in details in [How to change the setting for a configuration source folder](../configuration/configuration_basics.md#how-to-change-the-setting-for-a-configuration-source-folder) and [Advance Configuration without a full Kedro project]((../configuration/advanced_configuration.md#advanced-configuration-without-a-full-kedro-project) + +Run the pipeline again: +```bash +kedro run +``` + +You should see that the pipeline run is successful! + +## Conclusion + +Kedro provides a structured approach to developing data pipelines with clear separation of concerns through its components and directory structure. By following the steps outlined above, you can set up a minimal Kedro project that serves as a foundation for more complex data processing workflows. This guide explains the essential concepts of Kedro Project. If you already have a Python project and want to embeded a Kedro project within it, these concepts can help you to adjust easily. From 815d361a8b8e9b6b56b7506c96ae3bdf34810364 Mon Sep 17 00:00:00 2001 From: Nok Date: Mon, 11 Nov 2024 13:57:16 +0000 Subject: [PATCH 4/8] language Signed-off-by: Nok --- .../get_started/minimal_kedro_project.md | 37 ++++++++++--------- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/docs/source/get_started/minimal_kedro_project.md b/docs/source/get_started/minimal_kedro_project.md index cebc8b8db9..42b87ba282 100644 --- a/docs/source/get_started/minimal_kedro_project.md +++ b/docs/source/get_started/minimal_kedro_project.md @@ -60,31 +60,30 @@ The `pipeline_registry.py` file is essential for managing the pipelines within y - Autodiscovery of Pipelines: Since Kedro 0.18.3, you can use the [`find_pipeline`](../nodes_and_pipelines/pipeline_registry.md#pipeline-autodiscovery) function to automatically discover pipelines defined in your project without manually updating the registry each time you create a new pipeline. 
## Creating a Minimal Kedro Project Step-by-Step -The following section will guide you to create a minimal Kedro project, where you can successfully run `kedro run` with just 3 files. - -To create a minimal Kedro project, follow these steps: +This guide will walk you through the process of creating a minimal Kedro project, allowing you to successfully run `kedro run` with just three files. ### Step 1: Install Kedro -First, ensure you have Python installed on your machine, then install Kedro using pip: + +First, ensure that Python is installed on your machine. Then, install Kedro using pip: ```bash pip install kedro ``` ### Step 2: Create a New Kedro Project -Create a new working directory: +Create a new directory for your project: ```bash mkdir minikedro ``` -Change into your newly created project directory: +Navigate into your newly created project directory: ```bash cd minikiedro ``` ### Step 3: Create `pyproject.toml` -Create a new file called `pyproject.toml` at the project directory: +Create a new file named `pyproject.toml` in the project directory with the following content: ```toml [tool.kedro] @@ -106,13 +105,13 @@ Note we define `source_dir = "."`, usually we keep our source code inside a dire ``` ### Step 4: Create `settings.py` and `pipeline_registry.py` -First, create a folder called `minikedro`, the name of the folder should be the same as the `package_name` defined in `pyproject.toml`. +Next, create a folder named minikedro, which should match the package_name defined in pyproject.toml: ```bash mkdir minikedro ``` +Inside this folder, create two empty files: settings.py and pipeline_registry.py: -Then, create two empty files `settings.py` and `pipeline_registry.py` inside the folder. 
```bash touch minikedro/settings.py minikedro/pipeline_registry.py ``` @@ -126,18 +125,18 @@ Now your working directory should look like this: └── pyproject.toml ``` -Run this in ther terminal: +Try running the following command in the terminal: ```bash kedro run ``` -You should see an error because we have an empty `pipeline_registry.py`. +You will encounter an error indicating that pipeline_registry.py is empty: ```bash AttributeError: module 'minikedro.pipeline_registry' has no attribute 'register_pipelines' ``` ### Step 5: Create a Simple Pipeline -Now, copy the following code into `pipeline_registry.py` so that we have some pipeline to run. +To resolve this issue, add the following code to pipeline_registry.py, which defines a simple pipeline to run: ```python from kedro.pipeline import pipeline, node @@ -149,13 +148,13 @@ def register_pipelines(): return {"__default__": pipeline([node(foo, None, "dummy_output")])} ``` -If you try to run the pipeline again with `kedro run`, you will see a new error: +If you attempt to run the pipeline again with kedro run, you will see another error: ```bash MissingConfigException: Given configuration path either does not exist or is not a valid directory: /workspace/kedro/minikedro/conf/base ``` ### Step 6: Define the Project Settings -The error happened by default the Kedro Framework expects a configuration folder called `conf` and two separate environment named `base` and `local`. +This error occurs because Kedro expects a configuration folder named `conf`, along with two environments called `base` and `local`. To fix this, add the following lines to `settings.py`: To fix this, add these two lines into `settings.py`: ```python @@ -163,15 +162,17 @@ CONF_SOURCE = "." CONFIG_LOADER_ARGS = {"base_env": ".", "default_run_env": "."} ``` -This override the defaults so that Kedro knows that it should not look for configurations in `conf`. 
This is explained in details in [How to change the setting for a configuration source folder](../configuration/configuration_basics.md#how-to-change-the-setting-for-a-configuration-source-folder) and [Advance Configuration without a full Kedro project]((../configuration/advanced_configuration.md#advanced-configuration-without-a-full-kedro-project) +These lines override the default settings so that Kedro knows to look for configurations in the current directory instead of the expected conf folder. For more details, refer to [How to change the setting for a configuration source folder](../configuration/configuration_basics.md#how-to-change-the-setting-for-a-configuration-source-folder) and [Advance Configuration without a full Kedro project]((../configuration/advanced_configuration.md#advanced-configuration-without-a-full-kedro-project) -Run the pipeline again: +Now, run the pipeline again: ```bash kedro run ``` -You should see that the pipeline run is successful! + + +You should see that the pipeline runs successfully! ## Conclusion -Kedro provides a structured approach to developing data pipelines with clear separation of concerns through its components and directory structure. By following the steps outlined above, you can set up a minimal Kedro project that serves as a foundation for more complex data processing workflows. This guide explains the essential concepts of Kedro Project. If you already have a Python project and want to embeded a Kedro project within it, these concepts can help you to adjust easily. +Kedro provides a structured approach to developing data pipelines with clear separation of concerns through its components and directory structure. By following the steps outlined above, you can set up a minimal Kedro project that serves as a foundation for more complex data processing workflows. This guide explains essential concepts of Kedro projects. 
If you already have a Python project and want to integrate Kedro into it, these concepts will help you adjust and fit your own needs. From 7e1f646e602c45e1e7d436515b9da8a17fb5f5ca Mon Sep 17 00:00:00 2001 From: Nok Date: Tue, 12 Nov 2024 12:07:06 +0000 Subject: [PATCH 5/8] format and changes with review comments Signed-off-by: Nok --- RELEASE.md | 2 +- docs/source/get_started/minimal_kedro_project.md | 16 ++++++++-------- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/RELEASE.md b/RELEASE.md index 9b1c37326d..e2c491ac92 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -14,7 +14,7 @@ ## Documentation changes * Updated CLI autocompletion docs with new Click syntax. * Standardised `.parquet` suffix in docs and tests. -* Added a new minial Kedro project creation guide. +* Added a new minimal Kedro project creation guide. ## Community contributions * [Hyewon Choi](https://github.com/hyew0nChoi) diff --git a/docs/source/get_started/minimal_kedro_project.md b/docs/source/get_started/minimal_kedro_project.md index 42b87ba282..aa96639271 100644 --- a/docs/source/get_started/minimal_kedro_project.md +++ b/docs/source/get_started/minimal_kedro_project.md @@ -1,9 +1,9 @@ # Create a Minimal Kedro Project -The goal of this documentation is to explain what makes a minimal Kedro project. This guide will start from a blank project and introduce the necessary components. In reality, you are more likely to use a [project template](./new_project.md), or start from an existing Python project. You will able to able to adapt the concept and accomdate your specific need. +This documentation aims to explain the essential components of a minimal Kedro project. The guide begins with a blank project and gradually introduces the necessary elements. While most users typically start with a [project template]((./new_project.md)) or adapt an existing Python project, this guide will help you understand the core concepts and how to customise them to suit your specific needs. 
## Essential Components of a Kedro Project -Kedro is an Python framework designed for creating reproducible data science code. A typical Kedro project consists of two parts, the **mandatory structure** and the **opionated project structure**. +Kedro is a Python framework designed for creating reproducible data science code. A typical Kedro project consists of two parts, the **mandatory structure** and the **opinionated** project structure**. ### 1. **Recommended Structure** Kedro projects follow a specific directory structure that promotes best practices for collaboration and maintenance. The default structure includes: @@ -25,7 +25,7 @@ There are 3 files that you must have to be considered as a Kedro project, i.e. a - **`settings.py`**: Defines project settings, including library component registration. - **`pipeline_registry.py`**: Registers the project's pipelines. -If you want to see some examples, you can either create a project with `kedro new` or check out the [project template on GitHub](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) +If you want to see some examples of these files, you can either create a project with `kedro new` or check out the [project template on GitHub](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas) #### `pyproject.toml` @@ -110,7 +110,7 @@ Next, create a folder named minikedro, which should match the package_name defin ```bash mkdir minikedro ``` -Inside this folder, create two empty files: settings.py and pipeline_registry.py: +Inside this folder, create two empty files: `settings.py` and `pipeline_registry.py`: ```bash touch minikedro/settings.py minikedro/pipeline_registry.py @@ -136,7 +136,7 @@ AttributeError: module 'minikedro.pipeline_registry' has no attribute 'register_ ``` ### Step 5: Create a Simple Pipeline -To resolve this issue, add the following code to pipeline_registry.py, which defines a simple pipeline to run: +To resolve this issue, add the following code to 
`pipeline_registry.py`, which defines a simple pipeline to run: ```python from kedro.pipeline import pipeline, node @@ -148,13 +148,13 @@ def register_pipelines(): return {"__default__": pipeline([node(foo, None, "dummy_output")])} ``` -If you attempt to run the pipeline again with kedro run, you will see another error: +If you attempt to run the pipeline again with `kedro run`, you will see another error: ```bash MissingConfigException: Given configuration path either does not exist or is not a valid directory: /workspace/kedro/minikedro/conf/base ``` ### Step 6: Define the Project Settings -This error occurs because Kedro expects a configuration folder named `conf`, along with two environments called `base` and `local`. To fix this, add the following lines to `settings.py`: +This error occurs because Kedro expects a configuration folder named `conf`, along with two environments called `base` and `local`. To fix this, add these two lines into `settings.py`: ```python @@ -162,7 +162,7 @@ CONF_SOURCE = "." CONFIG_LOADER_ARGS = {"base_env": ".", "default_run_env": "."} ``` -These lines override the default settings so that Kedro knows to look for configurations in the current directory instead of the expected conf folder. For more details, refer to [How to change the setting for a configuration source folder](../configuration/configuration_basics.md#how-to-change-the-setting-for-a-configuration-source-folder) and [Advance Configuration without a full Kedro project]((../configuration/advanced_configuration.md#advanced-configuration-without-a-full-kedro-project) +These lines override the default settings so that Kedro knows to look for configurations in the current directory instead of the expected conf folder. 
For more details, refer to [How to change the setting for a configuration source folder](../configuration/configuration_basics.md#how-to-change-the-setting-for-a-configuration-source-folder) and [Advance Configuration without a full Kedro project](../configuration/advanced_configuration.md#advanced-configuration-without-a-full-kedro-project) Now, run the pipeline again: ```bash From 9637ee4ff5ad5b3af462816d308f3e90cbc7c698 Mon Sep 17 00:00:00 2001 From: Nok Date: Tue, 12 Nov 2024 12:22:00 +0000 Subject: [PATCH 6/8] add docs to index temp Signed-off-by: Nok --- docs/source/get_started/index.md | 1 + docs/source/get_started/minimal_kedro_project.md | 2 -- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/source/get_started/index.md b/docs/source/get_started/index.md index 59e5ae38e5..613c0e17f2 100644 --- a/docs/source/get_started/index.md +++ b/docs/source/get_started/index.md @@ -8,4 +8,5 @@ This section explains the first steps to set up and explore Kedro: install new_project kedro_concepts +minimal_kedro_project ``` diff --git a/docs/source/get_started/minimal_kedro_project.md b/docs/source/get_started/minimal_kedro_project.md index aa96639271..67e32ac634 100644 --- a/docs/source/get_started/minimal_kedro_project.md +++ b/docs/source/get_started/minimal_kedro_project.md @@ -169,8 +169,6 @@ Now, run the pipeline again: kedro run ``` - - You should see that the pipeline runs successfully! 
## Conclusion From 8f40c704a9d62904acb0542329db2be805dc89aa Mon Sep 17 00:00:00 2001 From: Nok Lam Chan Date: Fri, 15 Nov 2024 20:05:27 +0800 Subject: [PATCH 7/8] Update docs/source/get_started/minimal_kedro_project.md Co-authored-by: Merel Theisen <49397448+merelcht@users.noreply.github.com> Signed-off-by: Nok Lam Chan --- docs/source/get_started/minimal_kedro_project.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/get_started/minimal_kedro_project.md b/docs/source/get_started/minimal_kedro_project.md index 67e32ac634..b17647cda9 100644 --- a/docs/source/get_started/minimal_kedro_project.md +++ b/docs/source/get_started/minimal_kedro_project.md @@ -20,7 +20,7 @@ Kedro projects follow a specific directory structure that promotes best practice | `.gitignore` | Specifies files and directories to be ignored by Git. | ### 2. **Mandatory Files** -There are 3 files that you must have to be considered as a Kedro project, i.e. able to run `kedro run` on it. +For a project to be recognised as a Kedro project and support running `kedro run`, it must contain three essential files: - **`pyprojec.toml`**: Defines the python project - **`settings.py`**: Defines project settings, including library component registration. - **`pipeline_registry.py`**: Registers the project's pipelines. 
From d55899b31cc363a59ddf2487a357989608ba2103 Mon Sep 17 00:00:00 2001 From: Nok Date: Tue, 19 Nov 2024 06:43:47 +0000 Subject: [PATCH 8/8] changes to address comment, typos Signed-off-by: Nok --- docs/source/get_started/minimal_kedro_project.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/source/get_started/minimal_kedro_project.md b/docs/source/get_started/minimal_kedro_project.md index b17647cda9..856d28d93a 100644 --- a/docs/source/get_started/minimal_kedro_project.md +++ b/docs/source/get_started/minimal_kedro_project.md @@ -1,9 +1,9 @@ # Create a Minimal Kedro Project -This documentation aims to explain the essential components of a minimal Kedro project. The guide begins with a blank project and gradually introduces the necessary elements. While most users typically start with a [project template]((./new_project.md)) or adapt an existing Python project, this guide will help you understand the core concepts and how to customise them to suit your specific needs. +This documentation aims to explain the essential components of a minimal Kedro project. While most users typically start with a [project template](./new_project.md) or adapt an existing Python project, this guide begins with a blank project and gradually introduces the necessary elements. This will help you understand the core concepts and how to customise them to suit your specific needs. ## Essential Components of a Kedro Project -Kedro is a Python framework designed for creating reproducible data science code. A typical Kedro project consists of two parts, the **mandatory structure** and the **opinionated** project structure**. +Kedro is a Python framework designed for creating reproducible data science code. A typical Kedro project consists of two parts, the **mandatory structure** and the **opinionated project structure**. ### 1. 
**Recommended Structure** Kedro projects follow a specific directory structure that promotes best practices for collaboration and maintenance. The default structure includes: @@ -21,7 +21,7 @@ Kedro projects follow a specific directory structure that promotes best practice ### 2. **Mandatory Files** For a project to be recognised as a Kedro project and support running `kedro run`, it must contain three essential files: -- **`pyprojec.toml`**: Defines the python project +- **`pyproject.toml`**: Defines the python project - **`settings.py`**: Defines project settings, including library component registration. - **`pipeline_registry.py`**: Registers the project's pipelines. @@ -29,7 +29,7 @@ If you want to see some examples of these files, you can either create a project #### `pyproject.toml` -The `pyproject.toml` file is a crucial component of a Kedro project, serving as the standard way to store build metadata and tool settings for Python projects. It is essential for defining the project's configuration and ensuring proper integration with various tools and libraries. +The `pyproject.toml` file is a crucial component of a Kedro project that serve as the standard way to store build metadata and tool settings for Python projects. It is essential for defining the project's configuration and ensuring proper integration with various tools and libraries. Particularly, Kedro requires `[tool.kedro]` section in `pyproject.toml`, this describes the [project metadata](../kedro_project_setup/settings.md) in the project. @@ -49,8 +49,8 @@ This informs Kedro where to look for the source code, `settings.py` and `pipelin #### `settings.py` The `settings.py` file is an important configuration file in a Kedro project that allows you to define various settings and hooks for your project. 
Here’s a breakdown of its purpose and functionality: - Project Settings: This file is where you can configure project-wide settings, such as defining the logging level, setting environment variables, or specifying paths for data and outputs. -- Hooks Registration: You can register custom hooks in settings.py, which are functions that can be executed at specific points in the Kedro pipeline lifecycle (e.g., before or after a node runs). This is useful for adding additional functionality, such as logging or monitoring. -- Integration with Plugins: If you are using Kedro plugins, settings.py can also be utilized to configure them appropriately. +- Hooks Registration: You can register custom hooks in `settings.py`, which are functions that can be executed at specific points in the Kedro pipeline lifecycle (e.g., before or after a node runs). This is useful for adding additional functionality, such as logging or monitoring. +- Integration with Plugins: If you are using Kedro plugins, `settings.py` can also be utilized to configure them appropriately. Even if you do not have any settings, an empty `settings.py` is still required. Typically, they are stored at `src//settings.py`. @@ -130,7 +130,7 @@ Try running the following command in the terminal: kedro run ``` -You will encounter an error indicating that pipeline_registry.py is empty: +You will encounter an error indicating that `pipeline_registry.py` is empty: ```bash AttributeError: module 'minikedro.pipeline_registry' has no attribute 'register_pipelines' ``` @@ -162,7 +162,7 @@ CONF_SOURCE = "." CONFIG_LOADER_ARGS = {"base_env": ".", "default_run_env": "."} ``` -These lines override the default settings so that Kedro knows to look for configurations in the current directory instead of the expected conf folder. 
For more details, refer to [How to change the setting for a configuration source folder](../configuration/configuration_basics.md#how-to-change-the-setting-for-a-configuration-source-folder) and [Advance Configuration without a full Kedro project](../configuration/advanced_configuration.md#advanced-configuration-without-a-full-kedro-project) +These lines override the default settings so that Kedro knows to look for configurations in the current directory instead of the expected `conf` folder. For more details, refer to [How to change the setting for a configuration source folder](../configuration/configuration_basics.md#how-to-change-the-setting-for-a-configuration-source-folder) and [Advance Configuration without a full Kedro project](../configuration/advanced_configuration.md#advanced-configuration-without-a-full-kedro-project) Now, run the pipeline again: ```bash