Feature: Filter out records with invalid relations #572

Closed
4 tasks done
csun-cpointe opened this issue Feb 7, 2025 · 3 comments · Fixed by #581
csun-cpointe commented Feb 7, 2025

Description

If a reference record has an invalid field, we would like to filter out the base record from the dataset. All other base records with valid reference data should be preserved.

DOD

Acceptance criteria required to realize the requested feature

  • Validate reference record fields
  • Validate that the required reference records are present
  • Remove the base record when a reference record is invalid, or when a required reference record is missing
  • Preserve base records whose reference data is valid

BDD Scenarios:

  Scenario: A 1-1 or M-1 relation data record that has invalid data is removed

  Scenario: A 1-M relation data record that has invalid data is removed

  Scenario: A required 1-1 or M-1 relation data record that is not set is removed

  Scenario: A non-required 1-1 or M-1 relation data record that is not set is preserved

  Scenario: A required 1-M relation data record that is empty is removed

  Scenario: A required 1-M relation data record that is not set is removed

  Scenario: A non-required 1-M relation data record that is not set is preserved
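
The scenarios above amount to a small decision table. As a rough illustration only, here is a plain-Python sketch of that table (not Spark code and not the project's implementation). The `Relation` class, the helper names, and the sample fields `address` and `phones` are all hypothetical. The scenarios do not say what happens to a non-required 1-M relation that is empty, so the sketch assumes it is preserved.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Relation:
    """Hypothetical description of one relation on a base record."""
    name: str                         # key of the relation on the base record
    many: bool                        # True for 1-M, False for 1-1 or M-1
    required: bool
    is_valid: Callable[[dict], bool]  # validates a single reference record


def relation_ok(record: dict, rel: Relation) -> bool:
    """Return True when this relation does not disqualify the base record."""
    value = record.get(rel.name)
    if value is None:
        # Relation not set: only a problem when it is required.
        return not rel.required
    if rel.many:
        if len(value) == 0:
            # Empty 1-M relation: removed when required.
            # Assumption: preserved when not required.
            return not rel.required
        return all(rel.is_valid(ref) for ref in value)
    return rel.is_valid(value)


def keep_record(record: dict, relations: Sequence[Relation]) -> bool:
    """A base record is kept only if every relation passes."""
    return all(relation_ok(record, rel) for rel in relations)


def has_five_digit_zip(ref: dict) -> bool:
    zip_code = ref.get("zip")
    return isinstance(zip_code, str) and len(zip_code) == 5 and zip_code.isdigit()


def has_number(ref: dict) -> bool:
    return bool(ref.get("number"))


RELATIONS = [
    Relation("address", many=False, required=True, is_valid=has_five_digit_zip),
    Relation("phones", many=True, required=False, is_valid=has_number),
]

records = [
    {"id": 1, "address": {"zip": "12345"}, "phones": [{"number": "555-0100"}]},
    {"id": 2, "address": {"zip": "bad"}, "phones": None},             # invalid 1-1
    {"id": 3, "address": None, "phones": None},                       # required 1-1 not set
    {"id": 4, "address": {"zip": "54321"}, "phones": None},           # optional 1-M not set
    {"id": 5, "address": {"zip": "54321"}, "phones": [{"number": ""}]},  # invalid 1-M
]

kept_ids = [r["id"] for r in records if keep_record(r, RELATIONS)]
```

In a Spark implementation the same predicate would have to be expressed as a column expression or UDF so that it stays a transformation; this sketch only pins down which records are expected to survive.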

Test Strategy/Script

How will this feature be verified?
The above BDD scenarios should fully test the added capability, so a passing build should suffice for verification.

References/Additional Context


  • After validation, the dataset schema should remain the same
  • Use only Spark transformations, not actions, when validating the dataset
  • Investigate which implementation performs better, e.g. a UDF versus a custom transformation
  • We will not implement notification for invalid records: without actually processing the data, we cannot detect which records are invalid, and processing the data contradicts the goal of using only Spark transformations
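
The reasoning in the last bullet can be illustrated with a plain-Python analogy (not Spark code): a generator-based filter is lazy in much the same way a Spark transformation is, so the list of rejected records stays empty until something consumes the result, which plays the role of an action here. The function `lazy_filter` and the sample records are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def lazy_filter(
    records: Iterable[dict],
    is_valid: Callable[[dict], bool],
    rejected: List[dict],
) -> Iterator[dict]:
    """Generator: no record is inspected until the result is consumed."""
    for record in records:
        if is_valid(record):
            yield record
        else:
            rejected.append(record)


rejected: List[dict] = []
pipeline = lazy_filter(
    [{"id": 1, "ok": True}, {"id": 2, "ok": False}],
    lambda r: r["ok"],
    rejected,
)

# Defining the pipeline did no work, so nothing has been rejected yet.
rejected_before = list(rejected)

# Consuming the pipeline (the analogue of a Spark action) is what
# finally reveals which records were invalid.
kept = list(pipeline)
```

Reporting invalid records at validation time would therefore require forcing evaluation, which is exactly what the transformation-only constraint rules out.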
csun-cpointe added the enhancement label Feb 7, 2025
csun-cpointe changed the title from "Feature: Spark-based record relation validation revision" to "Feature: Filter out the records with invalid relations" Feb 10, 2025
nartieri changed the title from "Feature: Filter out the records with invalid relations" to "Feature: Filter out records with invalid relations" Feb 10, 2025
csun-cpointe added this to the 1.11.0 milestone Feb 10, 2025
csun-cpointe self-assigned this Feb 10, 2025
csun-cpointe modified the milestones: 1.11.0, 1.12.0 Feb 10, 2025
csun-cpointe commented

DoD completed with @ewilkins-csi and @nartieri

csun-cpointe commented

OTS completed with @carter-cundiff

@csun-cpointe csun-cpointe linked a pull request Feb 19, 2025 that will close this issue
carter-cundiff commented

Testing passed:
[Image]
