-
-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[back-37] jellyfish data migration #665
[back-37] jellyfish data migration #665
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A number of comments...
) | ||
|
||
func main() { | ||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll send you what I have for the interrupt/context code later this afternoon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I looked into integrating cancelable contexts into the existing Runner interface and it is going to be a wider change than makes sense for this code. Furthermore, to "hack" it in to this code is going to duplicate much of runner.go
(error channel, signal channel, select, etc.) so perhaps it is best we just defer the work. Worst case it won't have a completely clean exit when interrupted, but perhaps that isn't such a big deal. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok yeah based on our discussion it doesn't make sense sense for us to do this now. As you said we can restart the process as our query will just find all of those Jellyfish records that do not yet have a _depulicator.hash
set.
migrations/back_37/back_37.go
Outdated
|
||
func createDatumHash(bsonData bson.M) (string, error) { | ||
identityFields := []string{} | ||
if userID, err := getValidatedString(bsonData, "_userId"); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Maybe, for the sake of consistency, userID
-> datumUserID
(and deviceID
, deliveryType
, subType
, units,
value` below).
migrations/back_37/back_37.go
Outdated
valueRaw, ok := bsonData["value"] | ||
if !ok { | ||
return "", errors.New("value is missing") | ||
} | ||
val, ok := valueRaw.(float64) | ||
if !ok { | ||
return "", errors.New("value is not of expected type") | ||
} | ||
|
||
if len(fmt.Sprintf("%v", valueRaw)) > 7 { | ||
// NOTE: we need to ensure the same precision for the | ||
// converted value as it is used to calculate the hash | ||
mgdlVal := val*18.01559 + 0.5 | ||
mgdL := glucose.MgdL | ||
val = *glucose.NormalizeValueForUnits(&mgdlVal, &mgdL) | ||
} | ||
strVal := strconv.FormatFloat(val, 'f', -1, 64) | ||
identityFields = append(identityFields, strVal) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Consider collapsing into one block.
valueRaw, ok := bsonData["value"] | |
if !ok { | |
return "", errors.New("value is missing") | |
} | |
val, ok := valueRaw.(float64) | |
if !ok { | |
return "", errors.New("value is not of expected type") | |
} | |
if len(fmt.Sprintf("%v", valueRaw)) > 7 { | |
// NOTE: we need to ensure the same precision for the | |
// converted value as it is used to calculate the hash | |
mgdlVal := val*18.01559 + 0.5 | |
mgdL := glucose.MgdL | |
val = *glucose.NormalizeValueForUnits(&mgdlVal, &mgdL) | |
} | |
strVal := strconv.FormatFloat(val, 'f', -1, 64) | |
identityFields = append(identityFields, strVal) | |
if valRaw, ok := bsonData["value"]; !ok { | |
return "", errors.New("value is missing") | |
} else if val, ok := valRaw.(float64); !ok { | |
return "", errors.New("value is not of expected type") | |
} else { | |
if len(fmt.Sprintf("%v", valueRaw)) > 7 { | |
// NOTE: we need to ensure the same precision for the | |
// converted value as it is used to calculate the hash | |
mgdlVal := val*18.01559 + 0.5 | |
mgdL := glucose.MgdL | |
val = *glucose.NormalizeValueForUnits(&mgdlVal, &mgdL) | |
} | |
identityFields = append(identityFields, strconv.FormatFloat(val, 'f', -1, 64)) | |
} |
I don't think the check for length > 7 is going to work. What happens if the value is "12.34567". It's length is 8 (so > 7), but it shouldn't be doing any truncating. I suppose though, it would just be unnecessary work??? So, maybe not a big deal?
Is the + 0.5
necessary? If we have a large amount of decimal places, then I'm sure they came from simply dividing a mg/dL value by 18.01559. If we add 0.5 then this may throw off the calculations. What does jellyfish (or the Uploader code that uses jellyfish) do with mg/dL values before converting to mmol/L? I suppose in one sense we need to "undo" any conversion jellyfish did and then reapply the platform conversion, yes?
We should probably be paranoid and check the units before we multiply by 18.01559. If it is already mg/dL then this would be bad.
We should use the 18.01559 constant from glucose.go
, specifically MmolLToMgdLConversionFactor
. That way there is a single instance of this value in all of the platform code.
On a bigger picture, I'm wondering if we should just truncate what we have directly rather that convert to mg/dL and back to mmol/L.
Finally, looking at the code from platform
in glucose.go
in NormalizeValueForUnits
it only does this "truncation of more that 6 digits past the decimal" if the incoming units are mg/dL. It leaves incoming mmol/L values as-is. I think what this means is that we'll need to ensure the new Uploader code that replaces jellyfish with platform will either 1) send mg/dL values, or 2) send mmol/L values and truncate to 6 digits in the Uploader before uploading. So, if the Uploader will upload mmol/L values NOT truncated to 6 digits for some devices, then this code here will need to change. My concern here is that we'd end up with a different deduplicate hash from the one after the Uploader is sending to platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok - I have reworked that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what this means is that we'll need to ensure the new Uploader code that replaces jellyfish with platform will either 1) send mg/dL values, or 2) send mmol/L values and truncate to 6 digits in the Uploader before uploading. So, if the Uploader will upload mmol/L values NOT truncated to 6 digits for some devices, then this code here will need to change. My concern here is that we'd end up with a different deduplicate hash from the one after the Uploader is sending to platform.
So wouldn't we just do the same as other devices the uploader ingests data from? From my understanding
- it takes the units and value as whatever the devices gives us and then the platform normalize's that to save as
mmol/L
units and converts the value (if it wasn't mmol/L). - the platform truncation would occur that same as any other device that is uploaded
- the hash is all good as both our JF hash and migration hash use a truncated value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, all of the above is correct.
I think the problem is in the second bullet, "the platform truncation would occur that same as any other device that is uploaded". Specifically, from glucose.go
in platform
:
func NormalizeValueForUnits(value *float64, units *string) *float64 {
if value != nil && units != nil {
switch *units {
case MgdL, Mgdl:
intValue := int(*value/MmolLToMgdLConversionFactor*MmolLToMgdLPrecisionFactor + 0.5)
floatValue := float64(intValue) / MmolLToMgdLPrecisionFactor
return &floatValue
}
}
return value
}
Note that the truncation only occurs when converting from mg/dL to mmol/L and does not occur if the units uploaded is already mmol/L. Thus, for this migration to be consistent with any new uploader code that moves from Jellyfish to Platform uploads, the new Uploader code has to conform to either 1) send mg/dL values (and above normalization code will truncate), or 2) send mmol/L values already truncated to 6 digits using the same formula.
For example, with this migration code, all mmol/L values will be truncated. This will be the same iff the new Uploader code sends mg/dL or does the truncation itself. However, if the new Uploader code sends non-truncated mmol/L values, the platform
will not do any truncation (see NormalizeValueForUnits
above) and therefore the deduplicator hash generated by this migration code (which has truncated all values) will not match the deduplicator hash generated when the new Uploader code uploads non-truncated mmol/L values.
Does that make sense?
migrations/back_37/back_37.go
Outdated
hashUpdatedCount++ | ||
} | ||
if err := jellyfishDocCursor.Err(); err != nil { | ||
logger.WithError(err).Error("error while fetching data. Please re-run to complete the migration.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Capitalize first letter of log message.
I updated a couple of comments. I think this is mostly looking good, but may change when we figure out the downstream requirements (e.g. interleaved Jellyfish and Platform uploads). |
NOTE: not to merge until we have done test runs in various environments
_depulicator.hash
_id
of typeobjectId
_id
andmodifiedTime
remain the same as when we started