Fix capitalisation in example data #2143
tests/unit/test_example_data.py
Outdated
@pytest.mark.parametrize("filename,ql_table", zip(filenames, ql_tables))
def test_read_rows(filename, ql_table):
Could you add a comment about what this test is doing? It's not immediately obvious that CSVRowsReader is validating each file against the table's column specs.
Thanks Alice. I think ideally we'd split the data fixes out from the tests into separate commits, as they're distinct changes.
There's a wider question here as to exactly what tables we ought to be including in the example data. We don't necessarily have to address that immediately, but we're going to need to resolve it at some point so it might make sense to think about this now.
tests/unit/test_example_data.py
Outdated
table_nodes = get_table_nodes(ql_table._qm_node)
[table] = table_nodes  # There should only be one table
column_specs = get_column_specs_from_schema(table.schema)

CSVRowsReader(
    Path(f"ehrql/example-data/{filename}"),
    column_specs=column_specs,
    allow_missing_columns=True,
)
This test ends up duplicating some of the logic from LocalFileQueryEngine, where it might be better to use that logic directly. It would also avoid the awkwardness of having to manually specify the filenames.
You could do this with something like:
LocalFileQueryEngine("path/to/example-data").populate_database([ql_table._qm_node])
Which will throw an error if there's anything wrong with the data for that table.
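To make concrete what "validating a file against a table's column specs" involves, here is a minimal stdlib-only sketch. It does not use ehrql's CSVRowsReader or LocalFileQueryEngine, and it makes no claim about how they are implemented; SketchColumnSpec, SketchValidationError, parse_value and validate_csv are all hypothetical names invented for this illustration.

```python
import csv
import datetime
import io
from dataclasses import dataclass


@dataclass
class SketchColumnSpec:
    # Hypothetical stand-in, not ehrql's own column spec class
    type_: type
    nullable: bool = True


class SketchValidationError(Exception):
    """Hypothetical error raised when a file does not match its specs."""


def parse_value(value, spec):
    # In this sketch an empty string means "null"
    if value == "":
        if not spec.nullable:
            raise ValueError("unexpected empty value")
        return None
    if spec.type_ is datetime.date:
        return datetime.date.fromisoformat(value)
    return spec.type_(value)


def validate_csv(fileobj, column_specs, allow_missing_columns=True):
    """Parse every row, raising on the first value that breaks its spec."""
    reader = csv.DictReader(fileobj)
    headers = set(reader.fieldnames or [])
    missing = set(column_specs) - headers
    if missing and not allow_missing_columns:
        raise SketchValidationError(f"missing columns: {sorted(missing)}")
    rows = []
    # Line 1 is the header, so data rows start at line 2
    for line_no, row in enumerate(reader, start=2):
        parsed = {}
        for name, spec in column_specs.items():
            if name in missing:
                parsed[name] = None
                continue
            try:
                # Short rows yield None from DictReader; treat that as empty
                parsed[name] = parse_value(row[name] or "", spec)
            except ValueError as exc:
                raise SketchValidationError(
                    f"line {line_no}, column {name!r}: {exc}"
                ) from exc
        rows.append(parsed)
    return rows
```

A test built this way only needs to call the validator and let any exception fail the test, which is the same shape as the populate_database suggestion above.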
tests/unit/test_example_data.py
Outdated
column_specs = get_column_specs_from_schema(table.schema)

CSVRowsReader(
    Path(f"ehrql/example-data/{filename}"),
It might be better to get the path from the module, rather than making assumptions about the current directory. So something like:
Path(ehrql.__file__).parent / "example-data"
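A small sketch of why this matters, using the standard library's json package as a stand-in for ehrql so the snippet runs anywhere. The helper names package_data_dir and cwd_relative_dir are hypothetical, and no "example-data" directory needs to exist for the path arithmetic to work.

```python
import json  # stand-in package; the suggestion above is about ehrql itself
import os
import tempfile
from pathlib import Path


def package_data_dir(module, subdir):
    """Return a directory located alongside the given module's source file."""
    return Path(module.__file__).resolve().parent / subdir


def cwd_relative_dir(subdir):
    """The fragile alternative: resolves against the current working directory."""
    return Path(subdir).resolve()


def paths_after_chdir(module, subdir):
    """Compute both paths from inside a temporary working directory."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            return package_data_dir(module, subdir), cwd_relative_dir(subdir)
        finally:
            # Leave the temporary directory before it is cleaned up
            os.chdir(original_cwd)
```

The module-relative path is the same wherever the tests are launched from, whereas the relative path follows the working directory, so a test using it passes only when run from the repository root.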
tests/unit/test_example_data.py
Outdated
tpp.addresses,
tpp.clinical_events,
tpp.medications,
core.ons_deaths,
core.patients,
tpp.practice_registrations,
It's not obvious where this particular list of tables comes from. That's not a reflection on your code! I just don't think we've been systematic in deciding what tables we're providing example data for. A reasonable approach would be: everything in core and every table used in the tutorial.
But whatever we decide, we should be dynamically constructing the list of tables here; otherwise someone could add a new core table, or add a table to the tutorial, and nothing would tell them that they had failed to add it to the example data.
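One way to build the list dynamically is to read it from a module's `__all__`, sketched below. The fake_core module is a hypothetical stand-in for a tables module (real table objects are not plain strings), the helper names are invented for this example, and the `<table name>.csv` file naming is an assumption rather than something confirmed here.

```python
import types


def tables_from_module(module):
    """Collect every object a module exports via __all__, keyed by name."""
    return {name: getattr(module, name) for name in module.__all__}


def missing_example_files(table_names, available_files):
    """Return the table names with no matching <name>.csv (assumed naming)."""
    return sorted(
        name for name in table_names if f"{name}.csv" not in available_files
    )


# Hypothetical stand-in for a tables module such as core
fake_core = types.ModuleType("fake_core")
fake_core.patients = "patients table"
fake_core.ons_deaths = "ons_deaths table"
fake_core.__all__ = ["patients", "ons_deaths"]
```

Because the list is derived from the module at test time, adding a table to `__all__` without adding a matching file makes the check report it immediately, with no hand-maintained list to forget.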
After discussion in a call with Dave: the core tables are now read from core.__all__ while the tpp tables are in a hard-coded list.
If a core table is added without adding example data, the test will throw a FileValidationError.
Without updating the hard-coded TPP_TABLES list, no errors will be thrown if the tutorial uses a tpp-only table and there is no corresponding example data. This is already the case for tpp.apcs.
Tracked in new issue #2146.
Fixes #2106.
practice_registrations data leads to a ValueError being raised when used