[ADR] Source of truth data testing #4763

phildominguez-gsa · 2025-03-10T15:07:48Z

Areas of impact

Context

The single source of truth (SOT) implementation is a complex change that touches many areas of the app's codebase. When we switch the app over to it, we need confidence that the app continues to function normally and that no data is lost.

Decision

We will create tests that compare submission data in the currently used SAC model against the new source of truth model. We will implement both real-time and out-of-band testing:

Real-time: Checks performed on individual submissions as part of the audit creation, submission, and dissemination processes.
Out-of-band: This will be a Django command that compares existing submissions in bulk. A date range can be provided to limit the scope of the testing, which will allow us to set up a GitHub cron job to run a daily test on the previous day's submissions.
- We will also create a command to compare data via API requests, as opposed to just querying the data models (see [ADR] Data equivalence testing via SAC and SOT APIs #4767).
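
As a rough illustration of the out-of-band comparison, here is a minimal sketch in plain Python. It assumes each dataset can be loaded as a mapping from report_id to a dict of field values; the `submitted_date` field name and both function names are hypothetical and not taken from the codebase. In the real implementation this logic would presumably sit inside the Django command and query the SAC and Audit models directly.

```python
from datetime import date, timedelta


def previous_day_range(today):
    """Return an inclusive (start, end) range covering only the day before `today`."""
    yesterday = today - timedelta(days=1)
    return yesterday, yesterday


def compare_in_range(sac_rows, audit_rows, start, end):
    """Compare SAC records against Audit records for a date range.

    Both arguments map report_id -> dict of field values. Each SAC dict is
    assumed to carry a `submitted_date` (hypothetical field name).
    Returns a list of (report_id, field_name) tuples describing mismatches.
    """
    mismatches = []
    for report_id, sac in sac_rows.items():
        # Skip submissions outside the requested date range.
        if not (start <= sac["submitted_date"] <= end):
            continue
        audit = audit_rows.get(report_id)
        if audit is None:
            mismatches.append((report_id, "<missing in Audit>"))
            continue
        # Every SAC field must have an equal value on the Audit side.
        for field, value in sac.items():
            if audit.get(field) != value:
                mismatches.append((report_id, field))
    return mismatches
```

A daily cron run could call `previous_day_range(date.today())` and pass the result to `compare_in_range`, so only the previous day's submissions are checked.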

Data comparison strategy

Real-time
- The intent of using the models (Audit/SAC) for real-time insertion/update checking will be to ensure that the data being captured is represented in both the legacy SAC dataset and the new Audit dataset.
- For example, when api/views.py is called to run access_and_submission_check, it creates two objects: one SAC object and one Audit object. At the end of the access checks/changes, we want to ensure that the record is present in both tables with the same permissions.
- As a call is made to either model to add or update a field, we will use something like Lodash's intersection (in Python, a set intersection or subset check) to ensure that the values in Audit always contain at least the same information as the SAC object.
- This "check function" can incorporate direct string and list comparison to make sure something like SAC.auditee_ein is the same as Audit.auditee_ein when it's called, inserted, or modified.
- If any of these checks fail, we can log the result to New Relic or another logging destination, along with the report_id and the mismatched field name, so that the mismatch can be investigated.
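
The "check function" described above might look something like the following sketch. It assumes the SAC and Audit field values have already been pulled into plain dicts; the function names and the logger name are hypothetical. It uses Python's standard `logging` module as a stand-in for whichever destination (New Relic or otherwise) ends up receiving the logs.

```python
import logging

logger = logging.getLogger("sot_comparison")  # hypothetical logger name


def find_mismatched_fields(sac_data, audit_data):
    """Return the SAC field names whose data is not fully present in Audit.

    Strings and other scalars must be equal. For lists, every SAC item must
    also appear in the Audit list (Audit may contain more).
    """
    mismatched = []
    for field, sac_value in sac_data.items():
        if field not in audit_data:
            mismatched.append(field)
            continue
        audit_value = audit_data[field]
        if isinstance(sac_value, list) and isinstance(audit_value, list):
            # Subset check: Audit must hold at least what SAC holds.
            if any(item not in audit_value for item in sac_value):
                mismatched.append(field)
        elif sac_value != audit_value:
            mismatched.append(field)
    return mismatched


def check_and_log(report_id, sac_data, audit_data):
    """Log one warning per mismatched field; return True if everything matches."""
    mismatched = find_mismatched_fields(sac_data, audit_data)
    for field in mismatched:
        logger.warning("SOT mismatch report_id=%s field=%s", report_id, field)
    return not mismatched
```

Because the check only walks the SAC fields, extra fields on the Audit side do not count as failures, which matches the "at least the same information" requirement.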

Consequences


jadudm · 2025-03-11T11:58:39Z

This is good.

It sounded like, from discussion, that the model-based approach was where you wanted to be initially. (If I'm misinterpreting, please say so.) And, the API-driven approach is blocked until the API for the SOT tables is done.

I'm going to create a new ADR placeholder for API comparison, and attach it to the API work. (I'm going to copy-paste out the API portion of this one into a new draft ADR.) That isn't to say we shouldn't do it/you won't do it/etc., but that might keep this one simpler, so it can focus on how we're going to do the live comparison work. (Edit: #4767 is the API data comparison breakout.)

If I've misread the conversation, we can easily move things back around.

👍 splitting this makes sense, and let's focus on the internal/model-based comparison
🤔 let's keep this all in one place, and discuss further

phildominguez-gsa added the adr First step towards an architecture decision record label Mar 10, 2025

github-project-automation bot added this to FAC Mar 10, 2025

github-project-automation bot moved this to Triage in FAC Mar 10, 2025

phildominguez-gsa assigned phildominguez-gsa and anagradova Mar 10, 2025

phildominguez-gsa moved this from Triage to In Progress in FAC Mar 10, 2025

jadudm mentioned this issue Mar 11, 2025

[ADR] Data equivalence testing via SAC and SOT APIs #4767

Open

9 tasks
