Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix capitalisation in example data #2143

Merged
merged 2 commits into from
Oct 8, 2024
Merged

Fix capitalisation in example data #2143

merged 2 commits into from
Oct 8, 2024

Conversation

alarthast
Copy link
Contributor

Fixes #2106.

  • Incorrect capitalisation in the example practice_registrations data leads to a ValueError being raised when used
  • Capitalisation is now fixed and a new unit test is added to check that all example data files can be loaded correctly.



@pytest.mark.parametrize("filename,ql_table", zip(filenames, ql_tables))
def test_read_rows(filename, ql_table):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment about what this test is doing? It's not immediately obvious that CSVRowsReader is validating it against the table's column specs

Copy link
Contributor

@evansd evansd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Alice. I think ideally we'd split the data fixes out from the tests in separate commits as they're distinct changes.

There's a wider question here as to exactly what tables we ought to be including in the example data. We don't necessarily have to address that immediately, but we're going to need to resolve it at some point so it might make sense to think about this now.

Comment on lines 32 to 42
table_nodes = get_table_nodes(ql_table._qm_node)
[table] = table_nodes # There should only be one table
column_specs = get_column_specs_from_schema(table.schema)

CSVRowsReader(
Path(f"ehrql/example-data/{filename}"),
column_specs=column_specs,
allow_missing_columns=True,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test ends up duplicating some of the logic from LocalFileQueryEngine where it might be better to use that logic directly. It would also avoid the awkwardness of having to manually specific the filenames.

You could do this with something like:

LocalFileQueryEngine("path/to/example-data").populate_database([ql_table._qm_node])

Which will throw an error if there's anything wrong with the data for that tabe.

column_specs = get_column_specs_from_schema(table.schema)

CSVRowsReader(
Path(f"ehrql/example-data/{filename}"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to get the path from the module, rather than make assumptions about the current directory. So something like:

Path(ehrql.__file__).parent / "example-data"

Comment on lines 21 to 26
tpp.addresses,
tpp.clinical_events,
tpp.medications,
core.ons_deaths,
core.patients,
tpp.practice_registrations,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious where this particular list of tables comes from. That's not a reflection on your code! I just don't think we've been systematic in deciding what tables we're providing example data for. A reasonable approach would be: everything in core and every table used in the tutorial.

But whatever we decide, we should be dynamically constructing the list of tables here otherwise someone could add a new core table, or add a table to the tutorial, and nothing would tell them that they had failed to add it to the example data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion in call with Dave: the core tables are now read from core.__all__ while the tpp tables are in a hard coded list.

If a core table is added without adding example data, the test will throw a FileValidationError.
Without updating the hard-coded TPP_TABLES list, no errors will be thrown if the tutorial uses a tpp-only table and there is no corresponding example data. This is already the case for tpp.apcs.

Track in new issue #2146 .

Copy link

cloudflare-workers-and-pages bot commented Oct 8, 2024

Deploying databuilder-docs with  Cloudflare Pages  Cloudflare Pages

Latest commit: 0913b68
Status: ✅  Deploy successful!
Preview URL: https://d9afac41.databuilder.pages.dev
Branch Preview URL: https://fix-example-data.databuilder.pages.dev

View logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent capitalisation of "the" causing problems with example data
3 participants