Inconsistent capitalisation of "the" causing problems with example data #2106

Closed
inglesp opened this issue Sep 13, 2024 · 2 comments · Fixed by #2143

@inglesp
Contributor

inglesp commented Sep 13, 2024

The example data uses "Yorkshire and the Humber", while our table validation expects "Yorkshire and The Humber".

This causes problems in the sandbox:

(.venv) inglesp@malbogies:~/work/ebmdatalab/ehrql$ python -m ehrql sandbox ehrql/example-data
Python 3.11.10 (main, Sep  7 2024, 18:35:41) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from ehrql.tables.tpp import practice_registrations
>>> practice_registrations
Traceback (most recent call last):
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 145, in parser
    return convertor(value)
           ^^^^^^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 168, in wrapper
    raise ValueError(f"{value!r} not in valid categories: {category_str}")
ValueError: 'Yorkshire and the Humber' not in valid categories: 'North East', 'North West', 'Yorkshire and The Humber', 'East Midlands', 'West Midlands', 'East', 'London', 'South East', 'South West'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 83, in __iter__
    yield row_parser(row)
          ^^^^^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 110, in row_parser
    return tuple(parser(row) for parser in parsers)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 110, in <genexpr>
    return tuple(parser(row) for parser in parsers)
                 ^^^^^^^^^^^
  File "/home/inglesp/work/ebmdatalab/ehrql/ehrql/file_formats/csv.py", line 147, in parser
    raise ValueError(f"column {name!r}: {e}")
ValueError: column 'practice_nuts1_region_name': 'Yorkshire and the Humber' not in valid categories: 'North East', 'North West', 'Yorkshire and The Humber', 'East Midlands', 'West Midlands', 'East', 'London', 'South East', 'South West'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
ehrql.file_formats.base.FileValidationError: row 8: column 'practice_nuts1_region_name': 'Yorkshire and the Humber' not in valid categories: 'North East', 'North West', 'Yorkshire and The Humber', 'East Midlands', 'West Midlands', 'East', 'London', 'South East', 'South West'
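The traceback shows three layers of wrapping. A value-level check rejects anything not in the category list, a column-level wrapper prefixes the column name, and a row-level wrapper prefixes the row number. The check is case-sensitive, so the lower-case "the" is enough to fail. Below is a minimal sketch of that layering, assuming plain dict rows. The names `categorical`, `column_parser`, `parse_rows` and `REGIONS` are hypothetical, and the code is not ehrql's actual implementation:

```python
# Hypothetical sketch: mirrors the error messages in the traceback above,
# but is NOT ehrql's actual implementation.

REGIONS = [
    "North East", "North West", "Yorkshire and The Humber", "East Midlands",
    "West Midlands", "East", "London", "South East", "South West",
]


class FileValidationError(Exception):
    """Stand-in for ehrql.file_formats.base.FileValidationError."""


def categorical(categories):
    """Build a convertor that accepts only exact, case-sensitive matches."""
    category_str = ", ".join(repr(c) for c in categories)

    def convert(value):
        # Plain string comparison: "the" and "The" are different values
        if value not in categories:
            raise ValueError(f"{value!r} not in valid categories: {category_str}")
        return value

    return convert


def column_parser(name, convertor):
    """Prefix any conversion error with the offending column's name."""
    def parse(row):
        try:
            return convertor(row[name])
        except ValueError as e:
            raise ValueError(f"column {name!r}: {e}") from e

    return parse


def parse_rows(rows, parsers):
    """Parse dict rows, prefixing errors with a row number counted from 1."""
    for row_number, row in enumerate(rows, start=1):
        try:
            yield tuple(parser(row) for parser in parsers)
        except ValueError as e:
            raise FileValidationError(f"row {row_number}: {e}") from e


parsers = [column_parser("practice_nuts1_region_name", categorical(REGIONS))]
rows = [
    {"practice_nuts1_region_name": "London"},
    {"practice_nuts1_region_name": "Yorkshire and the Humber"},
]
try:
    list(parse_rows(rows, parsers))
except FileValidationError as e:
    # prints "row 2: column 'practice_nuts1_region_name': 'Yorkshire and the Humber' not in valid categories: 'North East', ..."
    print(e)
```

How ehrql numbers rows (for example, whether the header counts) is not shown here. The sketch simply counts data rows from 1.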
@inglesp
Contributor Author

inglesp commented Sep 13, 2024

We should:

  • Double-check that "The" (capitalised) appears in the real data
  • Fix the example data (or the validation if that's incorrect)
  • Add a test that all example data can be loaded
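For the third point, here is a minimal sketch of such a test using only the standard library. It makes several assumptions:

- The example data are CSV files with a header row, directly under `ehrql/example-data` (the directory passed to the sandbox above).
- The category lists come from a hypothetical, hard-coded `CATEGORICAL_COLUMNS` mapping.
- Empty cells count as missing values rather than errors.

A real test would more naturally load each table through ehrql itself, as the sandbox does, so that the category lists are not duplicated.

```python
import csv
from pathlib import Path

# Hypothetical mapping from column name to allowed values. In ehrql this
# information lives in the table definitions; it is hard-coded here only to
# keep the sketch self-contained.
CATEGORICAL_COLUMNS = {
    "practice_nuts1_region_name": {
        "North East", "North West", "Yorkshire and The Humber",
        "East Midlands", "West Midlands", "East", "London",
        "South East", "South West",
    },
}


def find_invalid_values(data_dir):
    """Return (file name, data row number, column, value) for every bad value."""
    problems = []
    # Assumes every example-data file is a *.csv directly inside data_dir
    for path in sorted(Path(data_dir).glob("*.csv")):
        with path.open(newline="") as f:
            for row_number, row in enumerate(csv.DictReader(f), start=1):
                for column, allowed in CATEGORICAL_COLUMNS.items():
                    value = row.get(column)
                    # Assumption: empty cells are missing values, not errors
                    if value and value not in allowed:
                        problems.append((path.name, row_number, column, value))
    return problems


def test_example_data_is_valid():
    # Hypothetical location, relative to the repository root
    assert find_invalid_values("ehrql/example-data") == []
```

The test collects every bad value rather than stopping at the first, so one run reports all the rows that need fixing.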

@evansd
Contributor

evansd commented Sep 13, 2024

This feels fairly high priority, given that we encourage use of the sandbox for new learners.

3 participants