Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create nested instead of flat dictionaries with OmegaConfigLoader #4077

Closed
puneeter opened this issue Aug 9, 2024 · 2 comments
Closed

Create nested instead of flat dictionaries with OmegaConfigLoader #4077

puneeter opened this issue Aug 9, 2024 · 2 comments
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@puneeter
Copy link
Contributor

puneeter commented Aug 9, 2024

Description

I am using . dot separated namespacing in my parameters files. I suppose OmegaConfigLoader creates the flat dictionary instead of creating the nested namespace based dictionary. Example:

# parameters.yaml
namespace1.sub1.key1: value1
namespace1.sub2.key2: value2
# resolved config by kedro OmegaConfigLoader:
{"namespace1.sub1.key1": "value1", "namespace1.sub2.key2": "value2"}

# expected by OmegaConf (for `select` method to work correctly):
{"namespace1": {"sub1": {"key1": "value1"}, "sub2": {"key2": "value2"}}}

Accessing namespace params becomes a bit difficult when you have a pipeline, sub-pipeline structure:

conf
├── base
│   ├── pipeline1
│   │   ├── subpipeline1_1
│   │   │   ├── catalog.yaml
│   │   │   └── parameters.yaml
│   │   └── subpipeline1_2
│   │       ├── catalog.yaml
│   │       └── parameters.yaml
│   └── pipeline2
│       ├── subpipeline2_1
│       │   ├── catalog.yaml
│       │   └── parameters.yaml
│       └── subpipeline2_2
│           ├── catalog.yaml
│           └── parameters.yaml
└── local
    └── credentials.yaml

Parameters:

# pipeline1/subpipeline1_1/parameters.yaml
pipeline1.subpipeline1_1.myparam: myval

# pipeline1/subpipeline1_2/parameters.yaml
pipeline1.subpipeline1_2.myparam: myval

Accessing the params:pipeline1 is impossible with current OmegaConfigLoader implementation. I can't even use the following structure in kedro because then I get an error (probably that the param - params:pipeline1 already exists):

# pipeline1/subpipeline1_1/parameters.yaml
pipeline1:
  subpipeline1_1:
    myparam: myval

# pipeline1/subpipeline1_2/parameters.yaml
pipeline1:
  subpipeline1_2:
    myparam: myval

OmegaConf by default expects a nested dictionary based on namespace but Kedro provides it a flattened dictionary in the above case. A simple conversion from flat -> nested dictionary would do the job.

Context

With this change we:

  • Handle the namespaces in a much better way.
  • Refer to the combined . separated namespace params as well.

Possible Implementation

A simple conversion from flat -> nested dictionary while doing OmegaConf.create() in OmegaConfigLoader would do the job.

Link to discussion: https://kedro-org.slack.com/archives/C03RKP2LW64/p1723193517960259

Linked issue on Omegaconf: omry/omegaconf#1188 (comment)

@puneeter puneeter added the Issue: Feature Request New feature or improvement to existing feature label Aug 9, 2024
@puneeter
Copy link
Contributor Author

As per my discussion with @noklam , there are two ways to solve this:

  1. Convert the . namespaced keys to nested dicts.
  2. Handle the namespaces at user's parameters side. For this kedro needs to handle the Exception which occurs when:
# conf/base/namespace1/sub-namespace1.yaml
namespace1:
  sub-namespace1:
    key1: value1

# conf/base/namespace1/sub-namespace2.yaml
namespace1:
  sub-namespace2:
    key2: value2
    key3: value3

regarding namespace1 is duplicated.
@noklam recommended to go with option 2. Happy to do the PR, please let me know

@noklam
Copy link
Contributor

noklam commented Aug 14, 2024

Full conversation: https://linen-slack.kedro.org/t/22897367/hey-team-i-am-using-dot-separated-namespacing-in-my-paramete#c1d0c9e0-cc4a-428c-a609-246c9e463702

TL;DR I think the two things mentioned above are separate issues. 1. require change in the internal data structure of the config, 2. is an smaller change, which changes how Kedro validate config (It's a valid config so I think it's a bug fix).

There are some value of doing 1, but we should keep that as a separate conversation. Having a nested namespace can potentially simplify config with less nested layer, particular for sub-pipeline(modular pipeline). The downside is that introduce difference between how OmegaConf parse config.

@puneeter If 2. is good enough for what you need now, feel free to go ahead!

@kedro-org kedro-org locked and limited conversation to collaborators Dec 2, 2024
@astrojuanlu astrojuanlu converted this issue into discussion #4362 Dec 2, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
Issue: Feature Request New feature or improvement to existing feature
Projects
Status: Done
Development

No branches or pull requests

2 participants