Dummy data should be time-invariant #2102

Open
evansd opened this issue Sep 9, 2024 · 0 comments

Dummy data generation is currently deterministic in one sense, because we use a fixed PRNG seed. However, we generate dates as offsets from today's date, so while the pattern of dates is deterministic, the actual dates produced change every day:

def generate_patient_facts(self, patient_id):
    # Seed the random generator using the patient_id so we always generate the same
    # data for the same patient
    self.rnd.seed(f"{self.random_seed}:{patient_id}")
    # TODO: We could obviously generate more realistic age distributions than this
    date_of_birth = self.today - timedelta(days=self.rnd.randrange(0, 120 * 365))
    age_days = self.rnd.randrange(105 * 365)
    date_of_death = date_of_birth + timedelta(days=age_days)
    self.date_of_birth = date_of_birth
    self.date_of_death = date_of_death if date_of_death < self.today else None
    self.events_start = self.date_of_birth
    self.events_end = min(self.today, date_of_death)

This was done as the quickest way of generating reasonable-looking dates, but we've always known it's inadequate:

# TODO: I dislike using today's date as part of the data generation because it
# makes the results non-deterministic. However until we're able to infer a
# suitable time range by inspecting the query, this will have to do.
self.today = today if today is not None else date.today()
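
To make the drift concrete, here is a minimal standalone sketch. It is not the project's code: `dob_for` is a hypothetical helper that only mirrors the offset-from-today pattern quoted above, and the seed string is made up. With the same seed the random offset is identical, so the generated date moves by exactly as much as "today" does.

```python
import random
from datetime import date, timedelta


def dob_for(patient_id, today, random_seed="example-seed"):
    # Hypothetical helper: seed per patient, then subtract a random offset
    # from `today`, as in the snippet quoted in this issue.
    rnd = random.Random(f"{random_seed}:{patient_id}")
    return today - timedelta(days=rnd.randrange(0, 120 * 365))


day_one = date(2024, 9, 9)
day_two = day_one + timedelta(days=1)

# Same patient and seed, so the offset is the same on both days, but the
# absolute date of birth shifts along with the reference date.
shift = dob_for(1, day_two) - dob_for(1, day_one)
```

Pinning `today` to a constant would make the output reproducible, though it would not by itself make the dates relevant to a given dataset definition.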

Ideally, we'd inspect all the various date filters used in a given dataset definition and then, based on that, come up with a suitable date range for generating dummy events.
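
As a rough sketch of that idea, assume some earlier step has already collected the date literals used in the dataset definition's filters (that extraction step is not shown and is the hard part). The names `infer_event_range`, `FALLBACK_REFERENCE_DATE` and `padding_days` are all hypothetical and chosen for illustration only.

```python
from datetime import date, timedelta

# Hypothetical constant used when the query contains no date filters.
# Any fixed value works, as long as it is not date.today().
FALLBACK_REFERENCE_DATE = date(2020, 1, 1)


def infer_event_range(filter_dates, padding_days=365,
                      fallback=FALLBACK_REFERENCE_DATE):
    """Return a (start, end) range for generating dummy events.

    The range covers every date in `filter_dates`, widened by
    `padding_days` on each side so that some events fall outside the
    filters as well as inside them.
    """
    pad = timedelta(days=padding_days)
    dates = list(filter_dates)
    if not dates:
        # Nothing to infer from: use a fixed window ending at the fallback.
        return fallback - pad, fallback
    return min(dates) - pad, max(dates) + pad


# Example: a definition that filters on two dates.
start, end = infer_event_range([date(2020, 3, 1), date(2021, 3, 1)])
```

Because the result depends only on the query and on constants, the same dataset definition would yield the same range on any day it is run.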
