Explicit warning / error if patients.which_exist_in_file tries to read in an empty dataset #867

Open
wjchulme opened this issue Sep 10, 2022 · 3 comments
Labels
enhancement (New feature or request), not urgent (not a priority)

@wjchulme
Contributor

I tried to run a study definition but hit this error:

`feather does not support serializing <class 'pandas.core.indexes.base.Index'> for the index; you can .reset_index() to make the index into column(s)`

This doesn't happen when running locally, only on the server.

I assume this is something that needs to be fixed in the cohort-extractor code rather than something odd going on with my code, but I don't really know how to probe further.

@wjchulme added the bug (Something isn't working) and blocked (Needs attention ASAP) labels Sep 10, 2022
@inglesp
Contributor

inglesp commented Sep 10, 2022

This is consistent with an attempt to serialize an empty dataframe, and I can reproduce it by removing all non-header rows from this test data.

Are you sure that your study definition is returning a population?

@evansd
Contributor

evansd commented Sep 12, 2022

@inglesp's diagnosis is correct:
[screenshot]

The population is empty because output/match/cumulative_matchedcontrols1.csv.gz is empty.

I haven't debugged further than that.

@wjchulme
Contributor Author

Thanks, both. If I'd known that this error was raised when saving an empty dataset, I wouldn't have opened this issue. I'll spare you the details, but I've found the problem: a bug arising from trying to retain within-patient consistency in dummy data across multiple extracts.

Might be worth an explicit error if patients.which_exist_in_file is reading in an empty dataset?
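A minimal sketch of the kind of check suggested above, assuming the input is a CSV file (optionally gzipped, like `output/match/cumulative_matchedcontrols1.csv.gz`) with a `patient_id` column. `read_patient_ids` and `EmptyPatientFileError` are hypothetical names used for illustration only; they are not part of cohort-extractor's API, and this is not how `patients.which_exist_in_file` is actually implemented.

```python
import csv
import gzip
from pathlib import Path


class EmptyPatientFileError(ValueError):
    """Hypothetical error: the patient file has no data rows."""


def read_patient_ids(path, id_column="patient_id"):
    """Read patient IDs from a .csv or .csv.gz file, failing loudly if it is empty.

    Hypothetical helper: the column name and error type are assumptions,
    not cohort-extractor's real behaviour.
    """
    path = Path(path)
    # Open gzipped files transparently, in text mode so csv can parse them.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None:
            # Zero-byte file: not even a header row.
            raise EmptyPatientFileError(f"{path} is empty (no header row)")
        if id_column not in reader.fieldnames:
            raise ValueError(f"{path} has no '{id_column}' column")
        ids = [int(row[id_column]) for row in reader]
    if not ids:
        # Header-only file: raise here rather than letting an empty
        # population surface later as an obscure serialization error.
        raise EmptyPatientFileError(
            f"{path} contains a header but no patients; "
            "the resulting population would be empty"
        )
    return ids
```

Failing at read time points the user directly at the empty input file, rather than at the downstream feather error, which only reveals the problem indirectly. A warning instead of an exception would be the softer alternative if an empty input can ever be legitimate.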

@wjchulme changed the title from study definition to feather error: "does not support serializing.." to Explicit warning / error if patients.which_exist_in_file tries to read in an empty dataset Sep 12, 2022
@wjchulme added the enhancement (New feature or request) and not urgent (not a priority) labels and removed the bug (Something isn't working) and blocked (Needs attention ASAP) labels Sep 12, 2022