[Feature]Update _fivetran_deleted Filtering Strategy #152

Open
fivetran-catfritz opened this issue Dec 6, 2024 · 0 comments
Labels
type:enhancement New functionality or enhancement

Comments

@fivetran-catfritz
Contributor

fivetran-catfritz commented Dec 6, 2024

Background

Currently, we filter out rows flagged with _fivetran_deleted at the staging layer. Because those rows never flow downstream, incremental models cannot see that a record was deleted and so cannot handle the deletion properly.

For example, when a transaction is deleted in the source, the deletion does not propagate to the final models because the rows flagged with _fivetran_deleted are removed early in the pipeline.
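
To make the failure mode concrete, here is a minimal Python sketch that simulates the behavior. It is not the package's code: the function names, the `transaction_id` key, and the sample rows are all hypothetical, and the merge is a simplified stand-in for an incremental model that inserts or updates by key but never deletes.

```python
# Hypothetical simulation; names and data are invented for illustration.

def staging_filtered(source_rows):
    """Staging layer as described above: drop rows flagged as deleted."""
    return [r for r in source_rows if not r["_fivetran_deleted"]]

def incremental_upsert(target, new_rows, key="transaction_id"):
    """Simplified incremental merge: insert or update by key, never delete."""
    merged = {r[key]: r for r in target}
    for r in new_rows:
        merged[r[key]] = r
    return sorted(merged.values(), key=lambda r: r[key])

# Run 1: two live transactions land in the final model.
source_run_1 = [
    {"transaction_id": 1, "amount": 100, "_fivetran_deleted": False},
    {"transaction_id": 2, "amount": 250, "_fivetran_deleted": False},
]
final_model = incremental_upsert([], staging_filtered(source_run_1))

# Run 2: transaction 2 is deleted in the source and is now flagged.
source_run_2 = [
    {"transaction_id": 1, "amount": 100, "_fivetran_deleted": False},
    {"transaction_id": 2, "amount": 250, "_fivetran_deleted": True},
]
final_model = incremental_upsert(final_model, staging_filtered(source_run_2))

# The flagged row never reached the merge, so the old copy of
# transaction 2 is still in the final model.
stale_ids = [r["transaction_id"] for r in final_model]
print(stale_ids)  # [1, 2]
```

In this sketch only a full refresh, which rebuilds the final model from the current source, would drop transaction 2.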

Proposed Solution

Update the _fivetran_deleted filtering strategy to defer the removal of these rows to downstream transformations. This change would allow incremental models to process deletions correctly while preserving the ability to exclude deleted rows in the final outputs.

To do

  • Requires updating all models downstream of staging to ensure _fivetran_deleted rows are handled appropriately.
  • Needs validation to ensure no unintended side effects, such as retaining deleted rows in final outputs.

Steps to Implement

  1. Remove the _fivetran_deleted filtering logic from staging models.
  2. Update downstream models to explicitly filter _fivetran_deleted rows where necessary.
  3. Write tests to confirm that deleted rows are processed correctly in incremental and full-refresh scenarios.
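
The three steps can be sketched in Python as a rough simulation. Everything here is an assumption made for illustration (the function names, the `transaction_id` key, the sample rows, and the simplified upsert standing in for an incremental model); the real change would be made in the dbt models themselves.

```python
# Hypothetical simulation; names and data are invented for illustration.

def staging_unfiltered(source_rows):
    """Step 1: staging passes every row through, including flagged ones."""
    return list(source_rows)

def incremental_upsert(target, new_rows, key="transaction_id"):
    """Simplified incremental merge: insert or update by key."""
    merged = {r[key]: r for r in target}
    for r in new_rows:
        merged[r[key]] = r
    return sorted(merged.values(), key=lambda r: r[key])

def final_output(model_rows):
    """Step 2: the downstream model filters deleted rows explicitly."""
    return [r for r in model_rows if not r["_fivetran_deleted"]]

def run_incremental(batches):
    """Apply each source snapshot in turn, as scheduled incremental runs would."""
    target = []
    for batch in batches:
        target = incremental_upsert(target, staging_unfiltered(batch))
    return final_output(target)

def run_full_refresh(latest_source):
    """Rebuild from scratch using only the latest source snapshot."""
    return final_output(incremental_upsert([], staging_unfiltered(latest_source)))

run_1 = [
    {"transaction_id": 1, "amount": 100, "_fivetran_deleted": False},
    {"transaction_id": 2, "amount": 250, "_fivetran_deleted": False},
]
run_2 = [
    {"transaction_id": 1, "amount": 100, "_fivetran_deleted": False},
    {"transaction_id": 2, "amount": 250, "_fivetran_deleted": True},
]

# Step 3: incremental and full-refresh runs should agree, and the
# deleted transaction should be absent from both.
incremental_result = run_incremental([run_1, run_2])
full_refresh_result = run_full_refresh(run_2)
assert incremental_result == full_refresh_result
assert [r["transaction_id"] for r in incremental_result] == [1]
```

Because the flagged row now reaches the merge, it overwrites the earlier copy of the transaction, and the explicit downstream filter keeps it out of the final output.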

Additional Context

This change is proposed as an alternative way to address incremental data quality issues, particularly for users who cannot schedule full refreshes.

Open Questions

  • What performance implications might this change introduce in larger datasets?

fivetran-catfritz added the type:enhancement (New functionality or enhancement) label Dec 6, 2024