Components that consume external data as a network must generate UUIDs for nodes #415
Comments
We could probably use the Node.js built-ins for hashing: https://nodejs.org/api/crypto.html#crypto_class_hash The rest of this sounds good to me!
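For reference, a minimal sketch of what hashing with the Node.js built-ins could look like. The helper name `hashString`, the sample input, and the default `sha256` algorithm are assumptions made for illustration; nothing in this thread settles on an algorithm.

```javascript
const crypto = require('crypto');

// Hypothetical helper: hash a string with Node's built-in crypto module.
// The default algorithm here is an assumption, not a project decision.
function hashString(input, algorithm = 'sha256') {
  return crypto.createHash(algorithm).update(input, 'utf8').digest('hex');
}

// Illustrative input only; real node data would come from the protocol's external data.
const nodeSource = '{"name":"Clinic A","type":"clinic"}';
console.log(hashString(nodeSource));        // 64 hex characters for sha256
console.log(hashString(nodeSource, 'md5')); // 32 hex characters for md5
```

Any algorithm name supported by the local OpenSSL build can be passed as the second argument.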
To be clear, it sounds like @jthrilly is talking about hashing some stringified version of a JavaScript object, rather than the source data directly (correct?). So there does need to be a well-defined serialization step before using crypto/hash. I see pros & cons to that; just want to make sure we're on the same page.
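To make the "well-defined serialization step" concrete, here is one possible sketch: serialize with object keys sorted, then hash the result. The names `stableStringify` and `hashObject` are hypothetical, the choice of `sha1` is arbitrary, and the code assumes JSON-compatible values only (no `undefined`, functions, or cyclic references).

```javascript
const crypto = require('crypto');

// Serialize a JSON-compatible value with object keys sorted, so that
// property insertion order does not change the output.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  // Primitives (string, number, boolean, null) use the standard JSON form.
  return JSON.stringify(value);
}

// Hash the stable serialization rather than the object itself.
function hashObject(obj) {
  return crypto.createHash('sha1').update(stableStringify(obj), 'utf8').digest('hex');
}

// Same content, different key order: the hashes match.
console.log(hashObject({ a: 1, b: 2 }) === hashObject({ b: 2, a: 1 }));
```

Without the sorting step, plain `JSON.stringify` would produce different strings for these two objects, and therefore different hashes.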
Can you comment more on "Any component using external data"? Would every interface, for example, be responsible for generating this? Or could the loader (#413) just take care of this centrally?
Can it be as simple as defining a primary key?
Yes, I am talking about hashing the serialized object, as the
Currently the only instances where we need a UID to be added to external data are within the name generator interfaces. Other consumers of external data don't need this step, though conceptually you could generalize the UID injection to be effectively a map operation that happens after the data is loaded but before it is passed to the interface. If we implemented it that way, it could be part of the loading method. What do you think?
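As a rough sketch of the "map operation after load" idea: the functions `contentId` and `withUids` and the `uid` property name below are hypothetical, and the code assumes flat node records (no nested objects) so that sorting top-level entries is enough to make the serialization order-independent.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a deterministic ID from a flat node record.
// Entries are sorted by key so that property order does not affect the ID.
function contentId(node) {
  const sortedEntries = Object.entries(node).sort(([a], [b]) => {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
  });
  return crypto
    .createHash('sha1')
    .update(JSON.stringify(sortedEntries), 'utf8')
    .digest('hex');
}

// The map step: runs after the data is loaded and before it reaches an interface.
// Returns new objects and leaves the loaded records untouched.
function withUids(nodes, uidKey = 'uid') {
  return nodes.map((node) => ({ ...node, [uidKey]: contentId(node) }));
}

const loaded = [
  { name: 'Clinic A', type: 'clinic' },
  { name: 'Clinic B', type: 'clinic' },
];
console.log(withUids(loaded));
```

Because the step is a pure function of the loaded data, it could live in the loader or in an individual interface without changing the resulting IDs.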
I think this would certainly fit the bill. It can be implemented separately, potentially in a later release.
I was thinking we'd use
Additionally
Just for completeness, the pros & cons I saw with hashing objects vs source data... The primary downside with using an external lib is related to determinism. We'd have to be careful about any version update to that lib, since a minor change to serialization could render persisted local data unrecognized as 'equal'. (Technically, there can also be information loss once the original data is parsed, e.g., Hashing the source (string) data, rather than a reconstructed version of it, avoids that. One upside to custom serialization (via
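One way to see the difference between hashing source text and hashing a reconstructed object is sketched below. The numeric-formatting case is an illustrative assumption chosen for this sketch and is not necessarily the example the comment above had in mind; the `sha1` helper name is also hypothetical.

```javascript
const crypto = require('crypto');

// Small helper for this sketch only.
const sha1 = (text) => crypto.createHash('sha1').update(text, 'utf8').digest('hex');

// Two source strings that differ only in how a number is written.
const sourceA = '{"score": 1.0}';
const sourceB = '{"score": 1}';

// Hashing the source text keeps the distinction.
const sourceHashesMatch = sha1(sourceA) === sha1(sourceB);

// Parsing discards the original formatting, so the re-serialized forms are identical.
const reserializedA = JSON.stringify(JSON.parse(sourceA));
const reserializedB = JSON.stringify(JSON.parse(sourceB));
const parsedHashesMatch = sha1(reserializedA) === sha1(reserializedB);

console.log(sourceHashesMatch); // false
console.log(parsedHashesMatch); // true
```

Whether that loss matters depends on whether two such records should count as the same node.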
Good overview Bryan!
I think realistically we are looking for within-version consistency. We won't auto-update with a new version of NC that changes this external lib. When we offer an update, it will be on the assumption that in-progress interview data will be lost. This will be up to researchers to manage.
And following up from #443:
Using the above library, caching 1000 distinct objects, each ~450 bytes (based on the clinic example in the dev protocol), default settings:
As above, but with md5:
It probably makes sense to cache the values; the mechanism will depend on some details in #413.
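A minimal sketch of one caching option, assuming node objects are not mutated after they are first hashed. The `cachedHash` function, the `WeakMap` keyed by object reference, and the `computeCount` counter are all hypothetical; the real mechanism would depend on how #413 exposes the loaded data.

```javascript
const crypto = require('crypto');

// Cache keyed by object reference; entries are released when the node is garbage collected.
const hashCache = new WeakMap();

// Counter used only to show how often the hash is actually computed.
let computeCount = 0;

function cachedHash(node) {
  if (hashCache.has(node)) {
    return hashCache.get(node);
  }
  computeCount += 1;
  const digest = crypto
    .createHash('md5')
    .update(JSON.stringify(node), 'utf8')
    .digest('hex');
  hashCache.set(node, digest);
  return digest;
}

const clinic = { name: 'Clinic A', type: 'clinic' };
cachedHash(clinic);
cachedHash(clinic);
console.log(computeCount); // 1: the second call is served from the cache
```

A reference-keyed cache only helps while the same object instances are reused; if nodes are re-created on each load, a cache keyed on the source string would be needed instead.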
Only makes sense to implement this once #413 has happened.
Concerning UUID issues (#397):