Fix capitalisation in example data #2143

alarthast · 2024-10-07T12:48:10Z

Incorrect capitalisation in the example practice_registrations data causes a ValueError to be raised when the data is used.
Capitalisation is now fixed and a new unit test is added to check that all example data files can be loaded correctly.

rebkwok · 2024-10-08T08:42:24Z

tests/unit/test_example_data.py

+
+
+@pytest.mark.parametrize("filename,ql_table", zip(filenames, ql_tables))
+def test_read_rows(filename, ql_table):


Could you add a comment about what this test is doing? It's not immediately obvious that CSVRowsReader is validating the file against the table's column specs.
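For readers unfamiliar with this kind of check, here is a minimal stdlib-only sketch of what "validating a CSV against column specs" can look like. It is an illustration of the general idea, not ehrql's implementation: `SimpleColumnSpec`, `validate_rows`, the `status` column and its categories are all hypothetical names invented for this example.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleColumnSpec:
    # Hypothetical stand-in; this is NOT ehrql's own column spec class
    type_: type
    categories: tuple = ()


def validate_rows(csv_text, column_specs):
    """Raise ValueError on the first value that does not match its spec."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name, spec in column_specs.items():
            raw = row.get(name)
            if raw is None or raw == "":
                continue  # missing columns and empty values are allowed here
            value = spec.type_(raw)  # raises ValueError if not parseable
            if spec.categories and value not in spec.categories:
                raise ValueError(
                    f"line {line_no}, column {name!r}: "
                    f"{value!r} not in {spec.categories}"
                )


specs = {
    "patient_id": SimpleColumnSpec(int),
    "status": SimpleColumnSpec(str, categories=("Active", "Inactive")),
}
good = "patient_id,status\n1,Active\n2,Inactive\n"
bad = "patient_id,status\n1,active\n"  # wrong capitalisation of a category
```

Under these assumptions, `validate_rows(good, specs)` returns without error, while `validate_rows(bad, specs)` raises a ValueError because `active` does not match any allowed category exactly.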

evansd

Thanks Alice. I think ideally we'd split the data fixes and the tests into separate commits, as they're distinct changes.

There's a wider question here as to exactly what tables we ought to be including in the example data. We don't necessarily have to address that immediately, but we're going to need to resolve it at some point so it might make sense to think about this now.

evansd · 2024-10-08T08:43:34Z

tests/unit/test_example_data.py

+    table_nodes = get_table_nodes(ql_table._qm_node)
+    [table] = table_nodes  # There should only be one table
+    column_specs = get_column_specs_from_schema(table.schema)
+
+    CSVRowsReader(
+        Path(f"ehrql/example-data/{filename}"),
+        column_specs=column_specs,
+        allow_missing_columns=True,
+    )


This test ends up duplicating some of the logic from LocalFileQueryEngine, where it might be better to use that logic directly. That would also avoid the awkwardness of having to manually specify the filenames.

You could do this with something like:

LocalFileQueryEngine("path/to/example-data").populate_database([ql_table._qm_node])

This will throw an error if there's anything wrong with the data for that table.

evansd · 2024-10-08T08:45:31Z

tests/unit/test_example_data.py

+    column_specs = get_column_specs_from_schema(table.schema)
+
+    CSVRowsReader(
+        Path(f"ehrql/example-data/{filename}"),


It might be better to get the path from the module, rather than make assumptions about the current directory. So something like:

Path(ehrql.__file__).parent / "example-data"
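A minimal sketch of the same pattern using only the standard library, so it can be run anywhere. The `json` module stands in for the package that ships the data, and `package_data_dir` is a hypothetical helper name; the point is only that the resulting path does not depend on the current working directory.

```python
import json  # stand-in for the package that ships the data files
from pathlib import Path


def package_data_dir(module, dirname):
    """Return the directory `dirname` located next to `module`'s source file."""
    return Path(module.__file__).resolve().parent / dirname


# The stdlib json package has no such directory; this only shows how the
# path is built, independently of where the tests are run from.
data_dir = package_data_dir(json, "example-data")
```

Because the path is anchored on the module's own location, running the tests from the repository root, from a subdirectory, or against an installed copy of the package should all resolve to the same place.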

evansd · 2024-10-08T08:49:14Z

tests/unit/test_example_data.py

+    tpp.addresses,
+    tpp.clinical_events,
+    tpp.medications,
+    core.ons_deaths,
+    core.patients,
+    tpp.practice_registrations,


It's not obvious where this particular list of tables comes from. That's not a reflection on your code! I just don't think we've been systematic in deciding what tables we're providing example data for. A reasonable approach would be: everything in core and every table used in the tutorial.

But whatever we decide, we should be dynamically constructing the list of tables here; otherwise someone could add a new core table, or add a table to the tutorial, and nothing would tell them that they had failed to add it to the example data.

After discussion in a call with Dave: the core tables are now read from core.__all__, while the tpp tables are in a hard-coded list.

If a core table is added without adding example data, the test will throw a FileValidationError.
Without updating the hard-coded TPP_TABLES list, no errors will be thrown if the tutorial uses a tpp-only table and there is no corresponding example data. This is already the case for tpp.apcs.

Tracked in new issue #2146.
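As a rough illustration of why building the table list from `__all__` catches omissions, here is a stdlib-only sketch. `fake_core` is a hypothetical stand-in module, `missing_example_files` is an invented helper, and the one-CSV-per-table naming (`<name>.csv`) is an assumption made for the example rather than something confirmed in this thread.

```python
import tempfile
import types
from pathlib import Path

# Hypothetical stand-in for a tables module that exports its tables via __all__
fake_core = types.ModuleType("fake_core")
fake_core.__all__ = ["patients", "ons_deaths", "practice_registrations"]


def missing_example_files(module, data_dir):
    """Return names in module.__all__ with no matching <name>.csv in data_dir."""
    return [
        name
        for name in module.__all__
        if not (Path(data_dir) / f"{name}.csv").exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Provide example data for only two of the three exported tables
    for name in ("patients", "ons_deaths"):
        (Path(tmp) / f"{name}.csv").write_text("patient_id\n1\n")
    missing = missing_example_files(fake_core, tmp)
```

Here `missing` ends up as `["practice_registrations"]`: a table added to `__all__` without a data file is reported automatically, which a hard-coded list cannot do for tables nobody remembered to add to it.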

cloudflare-workers-and-pages · 2024-10-08T08:56:13Z

Deploying databuilder-docs with Cloudflare Pages

Latest commit:	`0913b68`
Status:	✅ Deploy successful!
Preview URL:	https://d9afac41.databuilder.pages.dev
Branch Preview URL:	https://fix-example-data.databuilder.pages.dev


alarthast linked an issue Oct 7, 2024 that may be closed by this pull request

Inconsistent capitalisation of "the" causing problems with example data #2106

Closed

rebkwok approved these changes Oct 8, 2024


rebkwok reviewed Oct 8, 2024


evansd reviewed Oct 8, 2024


alarthast force-pushed the fix-example-data branch from ded72ce to c68f7df Compare October 8, 2024 08:55

github-actions bot deployed to databuilder-docs (Preview) October 8, 2024 08:55 View deployment

fix capitalisation in example data

0676366

alarthast force-pushed the fix-example-data branch from c68f7df to 0676366 Compare October 8, 2024 13:24

github-actions bot deployed to databuilder-docs (Preview) October 8, 2024 13:24 View deployment

Add unit test for example data validation

0913b68

github-actions bot deployed to databuilder-docs (Preview) October 8, 2024 14:25 View deployment

alarthast mentioned this pull request Oct 8, 2024

Define criteria for tables/columns to be provided in example-table CSV files #2146

Open

alarthast merged commit 9d917e2 into main Oct 8, 2024
8 checks passed

alarthast deleted the fix-example-data branch October 8, 2024 17:06
