
Ensure arrow field's nullable flag matches the schema column #1429

Merged: 1 commit merged into devel from arrow-normalize-non-nullable-columns on Jun 3, 2024


steinitzu (Collaborator)

Description

Updates the nullability of fields in the Arrow schema as part of normalize_py_arrow_item, so that each field's nullable flag matches the corresponding column in the dlt schema.
Fixes an issue that affects at least the sql_database source when the pandas backend is used with the BigQuery destination. Any resource that yields DataFrames while primary-key or other non-nullable columns are defined has the same problem, e.g.:

import dlt
import pandas as pd


@dlt.resource(primary_key="col1")
def some_data():
    yield pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

# Fails due to schema mismatch
info = dlt.pipeline(full_refresh=True, destination="bigquery").run(some_data)
print(info)

Fails with:

Provided Schema does not match Table serene-anagram-189621:dlt_ipython_dataset_20240531072948.some_data. 
Field col1 has changed mode from REQUIRED to NULLABLE

Related Issues

Somewhat related to dlt-hub/verified-sources#430; this is part of the reason BigQuery fails there.


netlify bot commented May 31, 2024

Deploy Preview for dlt-hub-docs canceled.

🔨 Latest commit: e3d46e9
🔍 Latest deploy log: https://app.netlify.com/sites/dlt-hub-docs/deploys/665a29f068cf450008966e61

@steinitzu force-pushed the arrow-normalize-non-nullable-columns branch from 1c43861 to e3d46e9 on May 31, 2024 19:50
@rudolfix (Collaborator) left a comment


LGTM! Is this PR still a draft?

@steinitzu marked this pull request as ready for review on June 2, 2024 20:28
@steinitzu (Collaborator, Author)

Quoting @rudolfix: "LGTM! Is this PR still a draft?"

@rudolfix No, I just didn't have time to wait for all the tests to run.
You can merge if it looks good to you 🙂

@rudolfix merged commit 701d503 into devel on Jun 3, 2024
49 of 50 checks passed
@rudolfix deleted the arrow-normalize-non-nullable-columns branch on June 3, 2024 14:19