Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[back-37] jellyfish data migration #665

Merged
merged 5 commits into from
Nov 8, 2023

Conversation

jh-bate
Copy link
Contributor

@jh-bate jh-bate commented Oct 3, 2023

NOTE: not to merge until we have done test runs in various environments

  • process only jellyfish data that doesn't have a _depulicator.hash
  • jellyfish data is denoted by the lack of a _id of type objectId
  • only set the hash, do not modify the datum in any other way
  • only update the record id the _id and modifiedTime remain the same as when we started

@jh-bate jh-bate requested a review from darinkrauss October 4, 2023 02:32
@jh-bate jh-bate changed the title taketwo [back-37] jellyfish data migration Oct 4, 2023
Copy link
Contributor

@darinkrauss darinkrauss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of comments...

)

func main() {
ctx := context.Background()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll send you what I have for the interrupt/context code later this afternoon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I looked into integrating cancelable contexts into the existing Runner interface and it is going to be a wider change than makes sense for this code. Furthermore, to "hack" it in to this code is going to duplicate much of runner.go (error channel, signal channel, select, etc.) so perhaps it is best we just defer the work. Worst case it won't have a completely clean exit when interrupted, but perhaps that isn't such a big deal. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok yeah based on our discussion it doesn't make sense sense for us to do this now. As you said we can restart the process as our query will just find all of those Jellyfish records that do not yet have a _depulicator.hash set.


func createDatumHash(bsonData bson.M) (string, error) {
identityFields := []string{}
if userID, err := getValidatedString(bsonData, "_userId"); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Maybe, for the sake of consistency, userID -> datumUserID (and deviceID, deliveryType, subType, units, value` below).

migrations/back_37/back_37.go Outdated Show resolved Hide resolved
Comment on lines 101 to 118
valueRaw, ok := bsonData["value"]
if !ok {
return "", errors.New("value is missing")
}
val, ok := valueRaw.(float64)
if !ok {
return "", errors.New("value is not of expected type")
}

if len(fmt.Sprintf("%v", valueRaw)) > 7 {
// NOTE: we need to ensure the same precision for the
// converted value as it is used to calculate the hash
mgdlVal := val*18.01559 + 0.5
mgdL := glucose.MgdL
val = *glucose.NormalizeValueForUnits(&mgdlVal, &mgdL)
}
strVal := strconv.FormatFloat(val, 'f', -1, 64)
identityFields = append(identityFields, strVal)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Consider collapsing into one block.

Suggested change
valueRaw, ok := bsonData["value"]
if !ok {
return "", errors.New("value is missing")
}
val, ok := valueRaw.(float64)
if !ok {
return "", errors.New("value is not of expected type")
}
if len(fmt.Sprintf("%v", valueRaw)) > 7 {
// NOTE: we need to ensure the same precision for the
// converted value as it is used to calculate the hash
mgdlVal := val*18.01559 + 0.5
mgdL := glucose.MgdL
val = *glucose.NormalizeValueForUnits(&mgdlVal, &mgdL)
}
strVal := strconv.FormatFloat(val, 'f', -1, 64)
identityFields = append(identityFields, strVal)
if valRaw, ok := bsonData["value"]; !ok {
return "", errors.New("value is missing")
} else if val, ok := valRaw.(float64); !ok {
return "", errors.New("value is not of expected type")
} else {
if len(fmt.Sprintf("%v", valueRaw)) > 7 {
// NOTE: we need to ensure the same precision for the
// converted value as it is used to calculate the hash
mgdlVal := val*18.01559 + 0.5
mgdL := glucose.MgdL
val = *glucose.NormalizeValueForUnits(&mgdlVal, &mgdL)
}
identityFields = append(identityFields, strconv.FormatFloat(val, 'f', -1, 64))
}

I don't think the check for length > 7 is going to work. What happens if the value is "12.34567". It's length is 8 (so > 7), but it shouldn't be doing any truncating. I suppose though, it would just be unnecessary work??? So, maybe not a big deal?

Is the + 0.5 necessary? If we have a large amount of decimal places, then I'm sure they came from simply dividing a mg/dL value by 18.01559. If we add 0.5 then this may throw off the calculations. What does jellyfish (or the Uploader code that uses jellyfish) do with mg/dL values before converting to mmol/L? I suppose in one sense we need to "undo" any conversion jellyfish did and then reapply the platform conversion, yes?

We should probably be paranoid and check the units before we multiply by 18.01559. If it is already mg/dL then this would be bad.

We should use the 18.01559 constant from glucose.go, specifically MmolLToMgdLConversionFactor. That way there is a single instance of this value in all of the platform code.

On a bigger picture, I'm wondering if we should just truncate what we have directly rather that convert to mg/dL and back to mmol/L.

Finally, looking at the code from platform in glucose.go in NormalizeValueForUnits it only does this "truncation of more that 6 digits past the decimal" if the incoming units are mg/dL. It leaves incoming mmol/L values as-is. I think what this means is that we'll need to ensure the new Uploader code that replaces jellyfish with platform will either 1) send mg/dL values, or 2) send mmol/L values and truncate to 6 digits in the Uploader before uploading. So, if the Uploader will upload mmol/L values NOT truncated to 6 digits for some devices, then this code here will need to change. My concern here is that we'd end up with a different deduplicate hash from the one after the Uploader is sending to platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok - I have reworked that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what this means is that we'll need to ensure the new Uploader code that replaces jellyfish with platform will either 1) send mg/dL values, or 2) send mmol/L values and truncate to 6 digits in the Uploader before uploading. So, if the Uploader will upload mmol/L values NOT truncated to 6 digits for some devices, then this code here will need to change. My concern here is that we'd end up with a different deduplicate hash from the one after the Uploader is sending to platform.

So wouldn't we just do the same as other devices the uploader ingests data from? From my understanding

  • it takes the units and value as whatever the devices gives us and then the platform normalize's that to save as mmol/L units and converts the value (if it wasn't mmol/L).
  • the platform truncation would occur that same as any other device that is uploaded
  • the hash is all good as both our JF hash and migration hash use a truncated value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all of the above is correct.

I think the problem is in the second bullet, "the platform truncation would occur that same as any other device that is uploaded". Specifically, from glucose.go in platform:

func NormalizeValueForUnits(value *float64, units *string) *float64 {
	if value != nil && units != nil {
		switch *units {
		case MgdL, Mgdl:
			intValue := int(*value/MmolLToMgdLConversionFactor*MmolLToMgdLPrecisionFactor + 0.5)
			floatValue := float64(intValue) / MmolLToMgdLPrecisionFactor
			return &floatValue
		}
	}
	return value
}

Note that the truncation only occurs when converting from mg/dL to mmol/L and does not occur if the units uploaded is already mmol/L. Thus, for this migration to be consistent with any new uploader code that moves from Jellyfish to Platform uploads, the new Uploader code has to conform to either 1) send mg/dL values (and above normalization code will truncate), or 2) send mmol/L values already truncated to 6 digits using the same formula.

For example, with this migration code, all mmol/L values will be truncated. This will be the same iff the new Uploader code sends mg/dL or does the truncation itself. However, if the new Uploader code sends non-truncated mmol/L values, the platform will not do any truncation (see NormalizeValueForUnits above) and therefore the deduplicator hash generated by this migration code (which has truncated all values) will not match the deduplicator hash generated when the new Uploader code uploads non-truncated mmol/L values.

Does that make sense?

migrations/back_37/back_37.go Outdated Show resolved Hide resolved
migrations/back_37/back_37.go Show resolved Hide resolved
migrations/back_37/back_37.go Show resolved Hide resolved
hashUpdatedCount++
}
if err := jellyfishDocCursor.Err(); err != nil {
logger.WithError(err).Error("error while fetching data. Please re-run to complete the migration.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] Capitalize first letter of log message.

migrations/back_37/back_37.go Outdated Show resolved Hide resolved
@jh-bate jh-bate requested a review from darinkrauss October 5, 2023 04:12
@darinkrauss
Copy link
Contributor

I updated a couple of comments. I think this is mostly looking good, but may change when we figure out the downstream requirements (e.g. interleaved Jellyfish and Platform uploads).

@jh-bate jh-bate changed the base branch from master to jellyfish-migration-updates_0 November 8, 2023 21:58
@jh-bate jh-bate marked this pull request as ready for review November 8, 2023 21:59
@jh-bate jh-bate merged commit e8e07ff into jellyfish-migration-updates_0 Nov 8, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants