Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
io/read_metadata: set
low_memory=False
This suppresses the `DtypeWarnings` messages from pandas when it infers different dtypes for a column in the metadata. We do not need pandas to internally parse files in chunks since we already surface the `chunksize` parameter to control memory usage. This change was motivated by internal discussion on Slack about how these warning messages overwhelm the logs of the ncov builds and make debugging a pain.¹ I have seen surprising memory usage in the past with `low_memory=False` within ncov-ingest². However that was due to the unexpected interaction with the `usecols` parameter, where the entire file was read before being subset to the columns provided. In the future, we may want to explicitly set the dtype to `string` for all columns in the metadata as suggested by @tsibley in a separate PR.³ However, that will require wider changes throughout Augur where uses of the metadata may be expecting the inferred dtypes (such as in augur export⁴). ¹ https://bedfordlab.slack.com/archives/C0K3GS3J8/p1686671582331959?thread_ts=1685568402.393599&cid=C0K3GS3J8 ² nextstrain/ncov-ingest@7bde90a ³ #1235 (comment) ⁴ https://github.com/nextstrain/augur/blob/b61e3e7e969ff1b82fce5f2e2f388a10e6f3c305/augur/export_v2.py#L239-L245
- Loading branch information