load_cdf return an empty dataframe when a version is out of range #3035

Closed
pblocz opened this issue Nov 28, 2024 · 2 comments · Fixed by #3040
Labels
enhancement New feature or request

Comments

@pblocz
Contributor

pblocz commented Nov 28, 2024

Description

In Spark, Delta tables have an option for handling out-of-range versions or timestamps when reading the change data feed: https://docs.delta.io/latest/delta-change-data-feed.html#read-changes-in-streaming-queries
[image]

Right now the behaviour of load_cdf is inconsistent. If you provide an out-of-range version, you get an error:
[image]

But with an out-of-range timestamp, you get an empty dataset:
[image]

It would be useful for incremental pipelines to have a way to manage this behaviour and make it consistent.
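
Until the behaviour is consistent, an incremental pipeline could guard the call itself. The following is only a minimal sketch of that idea, not the library's implementation: `load_changes_or_empty` and `fake_load_cdf` are hypothetical names, and the loader is passed in as a plain callable so the example runs without a Delta table. With a real table, the assumption is that the table's current version and a call to `load_cdf(starting_version=...)` would be wired in instead.

```python
from typing import Callable, List

# Hypothetical stand-in for a table's change log: version -> changes.
FAKE_LOG = {0: ["insert a"], 1: ["update a"], 2: ["delete a"]}


def fake_load_cdf(starting_version: int) -> List[str]:
    """Hypothetical loader that mimics the reported behaviour:
    it raises when the starting version is past the latest commit."""
    latest = max(FAKE_LOG)
    if starting_version > latest:
        raise ValueError(f"version {starting_version} is out of range")
    return [
        change
        for version in sorted(FAKE_LOG)
        if version >= starting_version
        for change in FAKE_LOG[version]
    ]


def load_changes_or_empty(
    load_cdf: Callable[[int], List[str]],
    latest_version: int,
    starting_version: int,
) -> List[str]:
    """Return the changes from starting_version onward, or an empty
    list when starting_version is beyond the latest table version."""
    if starting_version > latest_version:
        # Out of range: treat it as "no new changes" instead of failing,
        # which matches what the timestamp path already does.
        return []
    return load_cdf(starting_version)


if __name__ == "__main__":
    # The pipeline last processed version 2, so it asks for version 3.
    print(load_changes_or_empty(fake_load_cdf, latest_version=2, starting_version=3))
    # A request inside the range still returns the recorded changes.
    print(load_changes_or_empty(fake_load_cdf, latest_version=2, starting_version=1))
```

Checking the range before calling keeps the "nothing new yet" case separate from genuine failures, which a blanket `try`/`except` around the load would hide.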

@pblocz pblocz added the enhancement New feature or request label Nov 28, 2024
@ion-elgreco
Collaborator

If you know some Rust, it's probably a simple fix.

@pblocz
Contributor Author

pblocz commented Nov 29, 2024

> If you know some Rust, it's probably a simple fix.

@ion-elgreco I have never used Rust, and it has been a while since I have done anything that needs to be compiled, but I can give it a go.
