[ADR] Source of truth data testing #4763

Open
1 of 9 tasks
phildominguez-gsa opened this issue Mar 10, 2025 · 1 comment
Labels: adr (First step towards an architecture decision record)

Comments


phildominguez-gsa commented Mar 10, 2025

Areas of impact

  • Compliance
  • Content
  • CX
  • Design
  • Engineering
  • Policy
  • Product
  • Process
  • UX

Related documents/links

Source of truth parent ticket: #4740

API data comparison (for post-submission comparison) is discussed further in #4767.

Context

The single source of truth implementation is a complex change that touches many areas of the app's codebase. When we switch the app over to a single source of truth, we need full confidence that it will continue to operate normally and that no data is lost.

Decision

We will create tests that compare submission data in the currently used SAC model against the new source of truth model. We will implement both real-time and out-of-band testing:

  • Real-time: Checks on individual submissions that run as part of the audit creation, submission, and dissemination processes.
  • Out-of-band: A Django command that compares existing submissions in bulk. A date range can be provided to limit the scope of the testing, which will allow us to set up a GitHub cron job that tests the previous day's submissions each day.
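As a rough illustration of the date-range handling for the out-of-band command, here is a minimal sketch in plain Python. It uses argparse directly so that it runs without a Django project; in the app the same logic would presumably live in a management command. The flag names (--start-date, --end-date), the function names, and the "default to yesterday" behavior are assumptions for illustration, not something this ADR has settled.

```python
import argparse
from datetime import date, datetime, timedelta


def parse_day(value: str) -> date:
    """Parse a YYYY-MM-DD string into a date."""
    return datetime.strptime(value, "%Y-%m-%d").date()


def resolve_date_range(start=None, end=None, today=None):
    """Return an inclusive (start, end) range of submission dates to compare.

    Assumed behavior (hypothetical):
      - no dates given: the previous day only, which suits a daily cron run
      - one date given: that single day
      - both given: the range as provided
    """
    today = today or date.today()
    if start is None and end is None:
        start = end = today - timedelta(days=1)
    elif start is None:
        start = end
    elif end is None:
        end = start
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser; the flag names are illustrative only."""
    parser = argparse.ArgumentParser(
        description="Compare SAC and source of truth data in bulk."
    )
    parser.add_argument("--start-date", type=parse_day, default=None)
    parser.add_argument("--end-date", type=parse_day, default=None)
    return parser


def parse_range(argv, today=None):
    """Parse command-line style arguments into an inclusive date range."""
    args = build_parser().parse_args(argv)
    return resolve_date_range(args.start_date, args.end_date, today=today)
```

A daily scheduled run would call this with no arguments and get the previous day, while a backfill could pass something like `parse_range(["--start-date", "2025-03-01", "--end-date", "2025-03-07"])`.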

Data comparison strategy

  • Real-time
    • Using the models (Audit/SAC) for real-time insertion/update checking is intended to ensure that the data being captured is represented in both the legacy SAC dataset and the new Audit dataset.
    • For example, when api/views.py is called to run access_and_submission_check, it creates two objects: one SAC object and one Audit object. We want to ensure that, at the end of the access checks/changes, the resulting record is present in both tables with the same permissions.
    • As a call is made to either model to add or update a field, we will use something like Lodash's intersection (or an equivalent set-style comparison, since Lodash is a JavaScript library) to ensure that the Audit object always contains at least the same information as the SAC object.
    • This "check function" can use direct string and list comparison to confirm that, for example, SAC.auditee_ein is the same as Audit.auditee_ein whenever it is called, inserted, or modified.
    • If any of these checks fail, we can log the result to New Relic or another logging destination, with the report_id and the mismatched field name, so that it can be investigated.
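The real-time check described in the bullets above could look roughly like the following sketch. It is not the project's implementation: plain dicts stand in for the field values of the SAC and Audit objects, and the names compare_submission, values_match, and Mismatch are hypothetical. Mismatches go to Python's standard logging module; forwarding those logs to New Relic is not shown.

```python
import logging
from dataclasses import dataclass
from typing import Any

logger = logging.getLogger("sot_comparison")  # logger name is illustrative

_MISSING = object()  # sentinel for "field not present on the Audit side"


@dataclass
class Mismatch:
    report_id: str
    field_name: str
    sac_value: Any
    audit_value: Any


def values_match(sac_value, audit_value) -> bool:
    """Return True if the Audit value holds at least what the SAC value holds."""
    if isinstance(sac_value, list) and isinstance(audit_value, list):
        # Audit may carry extra items, but must contain every SAC item.
        return all(item in audit_value for item in sac_value)
    # Strings and other scalars are compared directly.
    return sac_value == audit_value


def compare_submission(report_id: str, sac: dict, audit: dict) -> list:
    """Compare every SAC field against Audit and log each mismatch."""
    mismatches = []
    for name, sac_value in sac.items():
        audit_value = audit.get(name, _MISSING)
        if audit_value is _MISSING or not values_match(sac_value, audit_value):
            shown = None if audit_value is _MISSING else audit_value
            mismatches.append(Mismatch(report_id, name, sac_value, shown))
            # Log only the identifiers needed to investigate, not the values.
            logger.warning(
                "SOT mismatch report_id=%s field=%s", report_id, name
            )
    return mismatches
```

The check is deliberately one-directional: extra fields or list items on the Audit side are not flagged, which matches the stated goal that Audit contain at least what SAC contains. A stricter equality check would be a small change to values_match.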

Consequences

phildominguez-gsa added the adr label (First step towards an architecture decision record) on Mar 10, 2025
github-project-automation bot moved this to Triage in FAC on Mar 10, 2025
phildominguez-gsa moved this from Triage to In Progress in FAC on Mar 10, 2025

jadudm commented Mar 11, 2025

This is good.

From the discussion, it sounded like the model-based approach was where you wanted to be initially. (If I'm misinterpreting, please say so.) The API-driven approach is blocked until the API for the SOT tables is done.

I'm going to create a new ADR placeholder for API comparison, and attach it to the API work. (I'm going to copy-paste out the API portion of this one into a new draft ADR.) That isn't to say we shouldn't do it/you won't do it/etc., but that might keep this one simpler, so it can focus on how we're going to do the live comparison work. (Edit: #4767 is the API data comparison breakout.)

If I've misread the conversation, we can easily move things back around.

👍 splitting this makes sense, and let's focus on the internal/model-based comparison
🤔 let's keep this all in one place, and discuss further
