Improve handling of missing users #337
Comments
There are stage environments where it's essential to load real-world data without losing analytics capabilities, especially for the Aspects beta. I'm not sure, but what if we generate an idempotent UUID based on a configurable secret? This way we keep the anonymity, the event ID will be consistent, and the analytics capabilities will be OK.
One problem is that without the Django User model we will generate different UUIDs for the user id and the username, since we can't link them.
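A minimal sketch of the secret-based idea and of the caveat just mentioned, assuming an HMAC over the raw identifier. The setting name `ANONYMIZATION_SECRET` and the function `stable_actor_uuid` are hypothetical and are not part of ERB.

```python
import hashlib
import hmac
import uuid

# Hypothetical configurable secret; this is not an existing ERB setting.
ANONYMIZATION_SECRET = b"change-me"


def stable_actor_uuid(identifier: str, secret: bytes = ANONYMIZATION_SECRET) -> uuid.UUID:
    """Derive the same UUID every time for the same (secret, identifier) pair."""
    digest = hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).digest()
    # Keep the first 16 bytes and stamp version/variant bits so the result
    # is a well-formed UUID. The version label here is only cosmetic.
    return uuid.UUID(bytes=digest[:16], version=5)


# Same input, same output: replaying a log does not create new actors.
same = stable_actor_uuid("alice") == stable_actor_uuid("alice")

# The caveat: without a user table, a username and a numeric id that belong
# to the same person cannot be linked, so they map to two different actors.
linked = stable_actor_uuid("alice") == stable_actor_uuid("42")
```

With this approach, `same` is true and `linked` is false, so events keyed by username and events keyed by user id would still land on separate actors unless something else ties them together.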
Yes, skipping the event when no valid user who performed it is available makes sense. It would improve the integrity of the data. However, we would not be able to transform any events where a user might not be linked to the event, such as system- or device-generated events. Looking at this document, I was not able to find any events where a user is not linked to the event.
@ziafazal I think if we wanted to have non-user actors we would need to implement them separately anyway, right? We wouldn't want "machine actors" like management commands or anonymous API calls (if we implemented those) to share a namespace with normal users.
@bmtcril yes, for non-user actor events we'll have to adopt a different approach, which might be creating UUID5-based identifiers for those actors with a specific namespace.
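A rough sketch of what namespaced UUID5 identifiers for machine actors could look like, using only the standard library. The namespace URLs and the function names are made up for illustration; nothing here reflects an existing ERB API.

```python
import uuid

# Hypothetical namespaces, one per kind of actor, so the two can never collide.
USER_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/actors/user")
MACHINE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/actors/machine")


def machine_actor_uuid(name: str) -> uuid.UUID:
    """Deterministic identifier for a non-user actor such as a management command."""
    return uuid.uuid5(MACHINE_NAMESPACE, name)


def user_actor_uuid(name: str) -> uuid.UUID:
    """Deterministic identifier for a regular user, in a separate namespace."""
    return uuid.uuid5(USER_NAMESPACE, name)


command_actor = machine_actor_uuid("manage.py:transform_tracking_logs")
```

Because the namespace is part of the hash input, a machine actor and a user that happen to share a name still receive different identifiers.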
Currently, if a user can't be found to match a username or id, ERB will generate a random single-use actor UUID for the event. I think that for our purposes it would be better to drop the event, since it cannot be mapped to a real user for purposes of education and it confuses the data for purposes of analytics.
The only cases I can think of where this might happen are:
1- Tracking log replay on a system that doesn't have the correct database of users to look up, which we should probably not allow since it will create a new actor id for every single event
2- Retired users, where we should respect their right to be forgotten and not load data for them anyway
@ziafazal @pomegranited @Ian2012 what do you all think about this? Am I missing anything?
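To make the proposal concrete, here is a small self-contained sketch of "drop instead of inventing a random actor". The dictionary stands in for the real user lookup, and `resolve_actor`, `transform_event`, and the event field names are hypothetical; ERB's actual code paths are not shown.

```python
import uuid
from typing import Dict, Optional


def resolve_actor(username: str, known_users: Dict[str, uuid.UUID]) -> Optional[uuid.UUID]:
    """Return the stable actor id for a known user, or None when no user matches."""
    return known_users.get(username)


def transform_event(event: dict, known_users: Dict[str, uuid.UUID]) -> Optional[dict]:
    """Transform a tracking event, or return None to signal it should be dropped."""
    actor = resolve_actor(event.get("username", ""), known_users)
    if actor is None:
        # Proposed behavior: skip the event rather than mint a single-use UUID.
        return None
    return {"actor": str(actor), "verb": event["event_type"]}


# Stand-in user table with one known user.
users = {"alice": uuid.UUID("11111111-2222-3333-4444-555555555555")}

kept = transform_event({"username": "alice", "event_type": "problem_check"}, users)
dropped = transform_event({"username": "retired_user", "event_type": "problem_check"}, users)
```

Here `kept` carries Alice's stable actor id, while `dropped` is `None`, which covers both the replay-without-users case and the retired-user case listed above.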