Problem
Currently, both the internal tables and the dissemination tables have no indicator of their composition. For example, consider the audit for the City of Sandwich.
There are 60 fields in that record. Once published, it never changes. However, neither we (the FAC) nor users of the FAC have any way of knowing that nothing has changed.
A traditional solution
A traditional solution would be to hash the data in some deterministic, repeatable way and publish that hash. (For example: sort the keys alphabetically, then hash a string concatenation of all of the keys and values.) We would then add this hash to our data:
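A minimal sketch in Python of what that could look like (field names here are illustrative, not the FAC's actual schema):

```python
import hashlib

def hash_record(record: dict) -> str:
    """Deterministically hash a record: sort the keys alphabetically,
    join each key with its value, and SHA-256 the result."""
    canonical = "|".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# A published record would then carry its own content hash.
record = {"auditee_name": "City of Sandwich", "audit_year": 2023}
record["content_hash"] = hash_record(record)
```

Any deterministic serialization would work equally well (e.g., `json.dumps(record, sort_keys=True)`); what matters is that the same record always produces the same digest.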
How did we discover this problem?
We realized that we have no way of knowing if our data at rest remains constant over time. Users of the FAC, similarly, have no idea if data they fetch on one day is the same as data fetched on the next. This is a long-standing problem for users of the FAC (agencies, oversight): their lived experience was that data would change, but they would not know how or why.
As we look at migrating data from our existing internal tables to new designs, we need a way of knowing that the data has not changed from one representation to the next. One way to achieve this is to hash the data for both representations in equivalent ways, and compare these values as part of the migration process.
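As a sketch of that comparison, assuming both representations can be reduced to the same canonical dictionary form, keyed by a shared report ID, and reusing the hypothetical `hash_record` above:

```python
def verify_migration(old_records: dict, new_records: dict) -> list[str]:
    """Return report IDs whose content hash differs between the old and
    new representations, or which are missing from the new one."""
    mismatched = []
    for report_id, old in old_records.items():
        new = new_records.get(report_id)
        if new is None or hash_record(old) != hash_record(new):
            mismatched.append(report_id)
    return mismatched
```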
As we engage in curation work, hashing would help us identify records that should have been updated (and were), and records that should not have changed (and were not). Or, stated as a problem: when we curate data, we currently have no way of asserting conclusively that records that should not have changed did not change.
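A curation check could then be phrased as an assertion over those hashes; again a sketch, with hypothetical names, assuming before/after snapshots share the same keys:

```python
def unexpected_changes(before: dict, after: dict, expected: set) -> list[str]:
    """Report IDs whose content hash changed even though the curation
    was not expected to touch them."""
    return [
        report_id
        for report_id, old in before.items()
        if report_id not in expected
        and hash_record(old) != hash_record(after[report_id])
    ]
```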
PDFs are not currently hashed either, which means we have no way of asserting that audit report files have not changed over time.
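The same idea extends to the report files themselves: hash the bytes at publication, store the digest, and re-verify later. A minimal sketch:

```python
import hashlib

def hash_report_file(path: str) -> str:
    """SHA-256 of a file's bytes, read in chunks so large reports
    need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```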
Job Story(s)
When I [situation], I want to [motivation] so I can [outcome/benefit].
What are we planning to do about it?
We believe hashing the data is the appropriate approach. If there is another solution or path that solves the problems above, we should consider it.
What are we not planning to do about it?
We need confidence in our migration and data over time. This should happen as part of the "source of truth" work, as it sets us up for resubmission and curation work.
How will we measure success?
When we have a way of repeatably hashing all data in our current model and in the new model, and can confirm that data migrated from one to the other hashes identically (perhaps verified through the API?), we will know we have succeeded.
Security Considerations
Required per CM-4.
This does not introduce security concerns; if anything, it allays them. While our data is encrypted at rest, it is not hashed, so we currently have no way of knowing when or whether data has changed. This work addresses that concern.
Process checklist
- Has a clear story statement
- Can reasonably be done in a few days (otherwise, split this up!)
- If there's UI...
  - Screen reader: listen to the experience with a screen reader extension; ensure the information is presented in order.
  - Keyboard navigation: run through the acceptance criteria with keyboard tabs; ensure it works.
  - Text scaling: adjust the viewport to 1280 pixels wide and zoom to 200%; ensure everything renders as expected. Document 400% zoom issues with USWDS if appropriate.