
Equality test calculation doesn't make sense #836

Closed
foundinblank opened this issue Sep 14, 2023 · 4 comments
Labels
bug, Stale, triage

Comments

@foundinblank
Contributor

Describe the bug

This is to discuss the logic behind the existing calculation in the equality test and propose a more straightforward calculation.

Currently the calculation is:

count(*) + coalesce(abs(
sum(case when which_diff = 'a_minus_b' then 1 else 0 end) -
sum(case when which_diff = 'b_minus_a' then 1 else 0 end)
), 0)

This has the effect of reporting unusual numbers in the console whenever the a_minus_b and b_minus_a row counts differ.

Example:

  1. Failure table contains 1,000 rows total
  2. Failure table contains 250 rows of a_minus_b
  3. Failure table contains 750 rows of b_minus_a
  4. My expectation is the output in the console would report [FAIL 1000] because the failure table has 1,000 rows
  5. However, the actual output is [FAIL 1500] because
    1. count(*) = 1,000
    2. abs(250 - 750) = 500
    3. total = 1,000 + 500 = 1,500
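The arithmetic above can be reproduced end to end. Below is a minimal sketch using an in-memory SQLite table as a stand-in for the failure table (the table name `failures` and the setup are hypothetical, not the actual dbt-generated relation):

```python
import sqlite3

# Hypothetical stand-in for the equality test's failure table.
conn = sqlite3.connect(":memory:")
conn.execute("create table failures (which_diff text)")
conn.executemany(
    "insert into failures values (?)",
    [("a_minus_b",)] * 250 + [("b_minus_a",)] * 750,
)

# The calculation currently used by the equality test.
(reported,) = conn.execute("""
    select count(*) + coalesce(abs(
        sum(case when which_diff = 'a_minus_b' then 1 else 0 end) -
        sum(case when which_diff = 'b_minus_a' then 1 else 0 end)
    ), 0)
    from failures
""").fetchone()

# The proposed simpler calculation: just count the failing rows.
(actual_rows,) = conn.execute("select count(*) from failures").fetchone()

print(reported, actual_rows)  # 1500 1000
```

With 250 a_minus_b rows and 750 b_minus_a rows, the existing expression reports 1,500 while the table contains only 1,000 rows, which is the mismatch described above.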

As a dbt user, if I see 1,500 failures, I expect to find 1,500 failing rows, and it's very confusing when I investigate and find 1,000 failing rows instead.

This calculation seems to have been part of the equality test since it was first committed in 2017 by @jthandy. Would anyone know why the calculation is set up this way, and whether we would be fine with just a count(*) instead? Happy to make the PR for it after some discussion! If there is a good reason for keeping the calculation as-is, I'm happy to write additional comments in the macro file for other users who may be confused too 😄

Are you interested in contributing the fix?

Yes!

@foundinblank foundinblank added the bug and triage labels Sep 14, 2023

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 13, 2024
@foundinblank
Contributor Author

Let's keep this open

@github-actions github-actions bot removed the Stale label Mar 14, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Sep 10, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Sep 18, 2024