Loading behavior of optional fields is inconsistent when using nested pydantic models #895

Closed
codingcyclist opened this issue Jan 15, 2024 · 4 comments

@codingcyclist
Collaborator

dlt version

0.4.2

Describe the problem

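import typing as t

from pydantic import BaseModel
from dlt.common.libs.pydantic import DltConfig

class Child(BaseModel):
    any_string: str
    optional_child_attribute: t.Optional[str] = None

class Parent(BaseModel):
    child: Child
    optional_parent_attribute: t.Optional[str] = None

    # do not turn the nested model into a single complex column
    dlt_config: t.ClassVar[DltConfig] = {"skip_complex_types": True}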

Context:

  • I'm using Parent as a nested pydantic model to enforce a schema on a resource.
  • I'm setting skip_complex_types on the parent model to True and max_table_nesting on the resource to 1, so that my child model is neither coerced into a single complex column nor split into several child tables, but flattened into several columns (i.e. child__{child-column-name}).
  • Both the parent and the child model have optional fields with well-defined, primitive data types.

Current behavior:
The optional child model field doesn't appear as a child__optional_child_attribute column in the destination until the first record arrives in which this field is not None. The optional parent attribute, however, appears as an optional_parent_attribute column from the very first load of the resource.

The problem here seems to be that the table schema has no knowledge of the data types of the child model's attributes, because the schema is generated before the flattening logic is applied.

Expected behavior

Optional child model fields should always get loaded into the destination if they have a primitive data type

# example source data
{
    "child": {"any_string": "hello_world"},
}

# desired format when loaded into destination
{
    "optional_parent_attribute": None,
    "child__any_string": "hello_world",
    "child__optional_child_attribute": None
}
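
To make the expected behavior concrete, here is a minimal, dependency-free sketch of what I mean. It is not dlt's code: flatten_row and with_all_columns are hypothetical helpers, and the "__" separator is only assumed from the child__{child-column-name} pattern above. The point is that once the flattened column names are known up front, missing optional values can be loaded as None.

```python
import typing as t

SEPARATOR = "__"  # assumption: matches the child__{column} naming used above


def flatten_row(row: t.Dict[str, t.Any], prefix: t.Tuple[str, ...] = ()) -> t.Dict[str, t.Any]:
    """Flatten nested dicts into a single level, joining keys with SEPARATOR."""
    flat: t.Dict[str, t.Any] = {}
    for key, value in row.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            flat.update(flatten_row(value, path))
        else:
            flat[SEPARATOR.join(path)] = value
    return flat


def with_all_columns(row: t.Dict[str, t.Any], column_names: t.List[str]) -> t.Dict[str, t.Any]:
    """Flatten the row and fill every known column that has no value with None."""
    flat = flatten_row(row)
    return {name: flat.get(name) for name in column_names}


# column names that a schema derived from Parent/Child would have to contain
known_columns = [
    "optional_parent_attribute",
    "child__any_string",
    "child__optional_child_attribute",
]

source = {"child": {"any_string": "hello_world"}}
loaded = with_all_columns(source, known_columns)
```

With the example source data above, loaded has exactly the three keys of the desired format, with both optional attributes set to None.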

Steps to reproduce

See above

Operating system

macOS

Runtime environment

Local

Python version

3.11

dlt data source

Custom (i.e. AWS EventBridge)

dlt destination

Snowflake

Other deployment details

No response

Additional information

No response

@rudolfix
Collaborator

@sultaniman looking at #901, we can't really fully fix how we map Pydantic models into our schema. It may require some conceptual change. That is a good outcome of the research.

Still, we should fix this ticket, and we can be (a little) hacky. We'll ignore recursing into lists for now (those generate dlt tables), but we can support nested Pydantic models (that are not part of a list). When we see a Pydantic model like the one in @codingcyclist's example, we should recursively create a dlt schema from it and then flatten this model into the parent model.

We can use make_path of the NamingConvention protocol to do that. You can take the implementation from snake_case.py or create a global instance to make paths. This is of course a hack, but it will work for all our naming conventions as of now.
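
A rough sketch of the idea, so we are on the same page. This is not the actual implementation: stdlib dataclasses stand in for the Pydantic models so it runs without dependencies, make_path here is a hypothetical stand-in for the naming convention's method (assuming a "__" separator), and the column dicts are simplified. Lists are deliberately not handled.

```python
import dataclasses
import typing as t

PATH_SEPARATOR = "__"  # assumption: separator used when joining nested names


def make_path(*parts: str) -> str:
    """Hypothetical stand-in for NamingConvention.make_path."""
    return PATH_SEPARATOR.join(parts)


def unwrap_optional(hint: t.Any) -> t.Tuple[t.Any, bool]:
    """Return (inner_type, True) for Optional[X], otherwise (hint, False)."""
    if t.get_origin(hint) is t.Union:
        args = t.get_args(hint)
        non_none = [a for a in args if a is not type(None)]
        if len(args) == 2 and len(non_none) == 1:
            return non_none[0], True
    return hint, False


def flatten_columns(model: type, prefix: t.Tuple[str, ...] = ()) -> t.Dict[str, t.Dict[str, t.Any]]:
    """Recursively collect columns of a model, flattening nested models into the parent."""
    columns: t.Dict[str, t.Dict[str, t.Any]] = {}
    for name, hint in t.get_type_hints(model).items():
        inner, nullable = unwrap_optional(hint)
        path = prefix + (name,)
        if dataclasses.is_dataclass(inner):
            # nested model: recurse and merge its columns under the parent's path
            columns.update(flatten_columns(inner, path))
        else:
            columns[make_path(*path)] = {"type": inner, "nullable": nullable}
    return columns


@dataclasses.dataclass
class Child:
    any_string: str
    optional_child_attribute: t.Optional[str] = None


@dataclasses.dataclass
class Parent:
    child: Child
    optional_parent_attribute: t.Optional[str] = None


columns = flatten_columns(Parent)
```

Here columns contains child__any_string, child__optional_child_attribute and optional_parent_attribute, so the optional child column is known before any data with a non-None value arrives.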

Later we'll fix it end to end. My current take is that we should let people define models for tables and create relational schemas only in normalize. But that's for later.

@sh-rp sh-rp moved this from Planned to In Progress in dlt core library Jan 29, 2024
@sultaniman
Contributor

@rudolfix @sh-rp I adjusted the code to reflect the comments: it now handles a single nested pydantic model and uses the naming convention as well.

@codingcyclist
Collaborator Author

Amazing, I will take a look at the PR later tonight.

@sultaniman
Contributor

@codingcyclist the implementation has gone through all related reviews and has been merged now.

@github-project-automation github-project-automation bot moved this from In Progress to Done in dlt core library Feb 7, 2024
@rudolfix rudolfix moved this from Done to In Progress in dlt core library Feb 19, 2024
@rudolfix rudolfix moved this from In Progress to Done in dlt core library Feb 21, 2024
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

3 participants