Improve handling of missing users #337

Closed
bmtcril opened this issue Aug 16, 2023 · 5 comments · Fixed by #338

Comments

@bmtcril
Contributor

bmtcril commented Aug 16, 2023

Currently, if a user can't be found to match a username or ID, ERB generates a random, single-use actor UUID for the event. I think it would be better for our purposes to drop the event: it cannot be mapped to a real user for educational purposes, and it muddies the data for analytics.
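A minimal sketch of the behavior change being proposed, contrasting the current random fallback with dropping the event. This is not ERB's actual code: `KNOWN_USERS` (a stand-in for the Django User table), the helper names, and the event/statement shapes are all hypothetical.

```python
import uuid
from typing import Optional

# Hypothetical stand-in for the Django User table: username -> anonymous actor UUID.
KNOWN_USERS = {"alice": "9b2f0c5e-1d6a-4c1e-8f1e-2a7d3b4c5d6e"}


def current_actor_id(username: str) -> str:
    """Current behavior as described: fall back to a random, single-use UUID."""
    return KNOWN_USERS.get(username) or str(uuid.uuid4())


def proposed_actor_id(username: str) -> Optional[str]:
    """Proposed behavior: return None when the user can't be found."""
    return KNOWN_USERS.get(username)


def transform(event: dict) -> Optional[dict]:
    """Hypothetical transformer step: drop (return None) events with no real user."""
    actor = proposed_actor_id(event["username"])
    if actor is None:
        return None  # cannot be mapped to a real learner, so skip the event
    return {"actor": actor, "verb": event["name"]}
```

With the current fallback, two events from the same unknown user get two unrelated actor IDs, which is exactly what confuses analytics; with the proposed change, both events are simply dropped.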

The only cases I can think of where this might happen are:

1- Tracking log replay on a system that doesn't have the correct user database to look up. We should probably not allow this, since it creates a new actor ID for every single event.
2- Retired users, where we should respect their right to be forgotten and not load data for them anyway

@ziafazal @pomegranited @Ian2012 what do you all think about this? Am I missing anything?

@Ian2012
Contributor

Ian2012 commented Aug 16, 2023

There are stage environments where it's essential to load real-world data without losing analytics capabilities, especially for the Aspects beta.

I'm not sure, but what if we generate an idempotent UUID from a configurable secret (ERB_SECRET_UUID/ERB_SECRET_KEY), using that secret as the seed for the generated UUID?

This way we keep anonymity, the event ID stays consistent, and analytics capabilities are preserved.
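One way to get the "idempotent UUID from a secret" idea is a name-based UUID5, using the secret as the namespace instead of seeding a random generator. A minimal sketch, assuming `ERB_SECRET_UUID` is an environment variable holding a valid UUID string (the setting name comes from the proposal; the fallback value is a placeholder for local testing only).

```python
import os
import uuid

# Hypothetical setting from the proposal; the default is a placeholder, not a real secret.
ERB_SECRET_UUID = os.environ.get("ERB_SECRET_UUID", "3f1c2a9e-5b7d-4e8a-9c0f-1a2b3c4d5e6f")


def deterministic_actor_id(identifier: str, secret: str = ERB_SECRET_UUID) -> str:
    """Map an identifier to a stable, anonymized UUID.

    uuid5 hashes (namespace, name) with SHA-1, so the same identifier and the
    same secret namespace always yield the same UUID, while a different
    secret yields unrelated UUIDs.
    """
    namespace = uuid.UUID(secret)  # raises ValueError if the secret is not a valid UUID
    return str(uuid.uuid5(namespace, identifier))
```

Because the namespace is kept secret, someone who only sees the output cannot simply recompute the UUID for a known user ID, unlike uuid5 over a public namespace.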

@bmtcril
Contributor Author

bmtcril commented Aug 16, 2023

One problem is that without the Django User model we will generate different UUIDs for a user's ID and their username, since we can't link the two.
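To illustrate the linking problem: hashing whatever key an event happens to carry gives one UUID for the username and another for the user ID. A sketch, where `USERNAME_TO_ID` is a hypothetical stand-in for the lookup the Django User model provides, and the namespace is a placeholder secret.

```python
import uuid

NAMESPACE = uuid.UUID("3f1c2a9e-5b7d-4e8a-9c0f-1a2b3c4d5e6f")  # placeholder secret namespace

# Hypothetical stand-in for the User model lookup: username -> user id.
USERNAME_TO_ID = {"alice": 42}


def actor_id_without_lookup(key: str) -> str:
    """Hash whatever key the event carries (a username OR a user id)."""
    return str(uuid.uuid5(NAMESPACE, key))


def actor_id_with_lookup(key: str) -> str:
    """Normalize a known username to its user id first, so both keys hash identically."""
    user_id = USERNAME_TO_ID.get(key)
    canonical = str(user_id) if user_id is not None else key
    return str(uuid.uuid5(NAMESPACE, canonical))
```

On a system without the user database, only the first function is possible, so "alice" and "42" end up as two different actors.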

@ziafazal
Contributor

Yes, skipping the event when there is no valid user who performed it makes sense; it would improve data integrity. However, we would not be able to transform any events that aren't linked to a user, such as system- or device-generated events. Looking at this document, I was not able to find any events that aren't linked to a user.

@bmtcril
Contributor Author

bmtcril commented Aug 16, 2023

@ziafazal I think if we wanted non-user actors we would need to implement them separately anyway, right? We wouldn't want "machine actors" such as management commands or anonymous API calls (if we implemented those) to share a namespace with normal users, right?

@ziafazal
Contributor

ziafazal commented Aug 17, 2023

@bmtcril yes, for events without a user actor we'll have to adopt a different approach, which might be creating UUID5-based identifiers for those actors in a specific namespace.
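A sketch of separate UUID5 namespaces for user actors and machine actors, so identifiers from the two groups can never collide by construction. The namespace URLs under example.org and the `management_command:` naming scheme are hypothetical placeholders, not anything ERB defines.

```python
import uuid

# Hypothetical namespaces: deriving each from NAMESPACE_URL with a distinct name
# keeps user actors and machine actors in separate UUID spaces.
USER_ACTOR_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/actors/user")
MACHINE_ACTOR_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/actors/machine")


def user_actor_id(user_id: int) -> uuid.UUID:
    """Stable actor UUID for a real user."""
    return uuid.uuid5(USER_ACTOR_NAMESPACE, str(user_id))


def machine_actor_id(name: str) -> uuid.UUID:
    """Stable actor UUID for a non-user actor, e.g. "management_command:some_command"."""
    return uuid.uuid5(MACHINE_ACTOR_NAMESPACE, name)
```

Even if a machine actor's name happens to equal a user ID string, the two resolve to different UUIDs because the namespaces differ.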
