-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ADR - content hashing caching strategy for business rules. #650
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
.. _16-content-checksums-for-tracked-models | ||
|
||
16 Content checksums for Tracked Models | ||
======================================= | ||
|
||
|
||
Status | ||
------ | ||
|
||
Proposed | ||
|
||
|
||
Context | ||
------- | ||
|
||
After running the business rules it is useful to store the results of checks against models, and have a mechanism to check if the result is still valid to avoid re-running the check in future. | ||
|
||
This ADR adds a mechanims to checksum the user-editable data to check if to versions of a Tracked Model are equivilent. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm uncertain whether there are fields that can be mutated but are not user editable. If there is a difference, and editable fields are a subset of mutable fields, then would mutable be a preferable term here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. --> mechanism |
||
|
||
In the simplest sense this means hashing fields that exclude the PK. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's meant by "fields that exclude the PK" ? |
||
|
||
|
||
TAP already has a mechanism to get the user editable fields, the attribute `.copyable_fields`, which provides a starting point. | ||
|
||
Unlike pythons default hashing, it is nessacary to distinguish between two different types that produce the same hash - so the string that gets passed the hashing function ("hashable_string"), will include the module and class name. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to give an example where two different types produce the same hash? It's probably trivial, but I have Monday brain There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. --> necessary There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is it worth adding a link to docs on python's default hashing? |
||
|
||
|
||
The hash generated in this ADR will not currently be attached to TrackedModel - thought it may be desirable in future. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Attached == cached? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Saved to it's own field I guess. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. --> though |
||
By limiting the scope to business rule checks this gives time for the code to bed in, and makes it easier to change things if there are issues. | ||
|
||
|
||
Decision | ||
-------- | ||
|
||
The fields to be hashed are contained in `copyable_fields` but since this doesn't communicate the intent used here, this will be proxied as `mutable_fields`, which is explicitly about providing which fields can be hashed to check equivilence. | ||
|
||
|
||
TrackedModel will gain an API named get_content_hash() that returns a sha256 hash. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suppose it only matters that it's a sufficiently unique hash. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it possible to be more specfic than "API"? Would it be a class_method? As I read it currently, it sounds a bit like the hash will be attached to the tracked model, which contradicts what's written on L28. I know that's not the intent, but just be good to clarify There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point, I guess I meant a method - e.g. .content_hash(), vs something we save forever in a field. |
||
|
||
get_content_hash() - | ||
|
||
Iterates the fields in `mutable_fields` and building a dict keyed by field name, the value contains the location __module__ and __name__ of the field. | ||
|
||
If the field is not TrackedModel then the value includes the sha256 of their content, for foreign keys that are TrackedModels the value is obtained by calling get_content_hash() | ||
|
||
|
||
Consequences | ||
------------ | ||
|
||
This provides a mechanism to store the results of business rule checks, only becomming invalid when the TrackedModels they reference change. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will part of this work involve creating a list of rules to which this kind of caching will be applicable? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, though it doesn't need to be exhaustive - as it can be extended if needed later. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. --> becoming |
||
|
||
When a workbasket is published, the validity of all other unpublished workbaskets must be verified - with this system, most of the business rule checks would remain valid and the only the checksums need to be verified. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
May be clearer to say "... only the content hash needs to be verified."? |
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1168,3 +1168,5 @@ measure_qs)\n | |
MeasureComponent | ||
Components | ||
TP2000-452 | ||
Tracked Model | ||
sha256 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked about what can invalidate a check after it has completed. We know that after a check has been run, then newly published tariff data can cause some types of check to become invalid - an ME32 check against a Measure, for instance. So hashing doesn't help avoid the need to run a check again in those circumstances.
It would be worth pointing out that caveat here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, this is really important - I've updated the PR description to cover this, and will update the ADR to reflect this (date validity checkers will be their own adr)