io/read_metadata: set `low_memory=False` · nextstrain/augur@73508e8

io/read_metadata: set low_memory=False

This suppresses the `DtypeWarning` messages that pandas emits when it
infers different dtypes for a column in the metadata. We do not need
pandas to internally parse files in chunks since we already expose the
`chunksize` parameter to control memory usage. This change was motivated
by internal discussion on Slack about how these warning messages
overwhelm the logs of the ncov builds and make debugging a pain.¹
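
As a rough illustration of the point above, the following minimal sketch passes `low_memory=False` together with `chunksize` to `pandas.read_csv`. The helper name `read_metadata_chunks`, the tab separator, and the sample data are assumptions made for this example; this is not Augur's actual `read_metadata` implementation.

```python
import io

import pandas as pd


def read_metadata_chunks(handle, chunk_size=2):
    """Hypothetical helper: read a TSV in caller-controlled chunks."""
    kwargs = {
        "sep": "\t",                # assumed delimiter for this example
        "engine": "c",
        "skipinitialspace": True,
        "na_filter": False,
        # Disable pandas' internal chunked parsing, which is what
        # produces DtypeWarning for columns with mixed inferred types.
        "low_memory": False,
        # Memory use is bounded by the caller's chunk size instead.
        "chunksize": chunk_size,
    }
    return pd.read_csv(handle, **kwargs)


# Made-up metadata: the "clade" column mixes numeric and text values.
tsv = "strain\tclade\nA\t1\nB\t2\nC\tx\n"
chunks = list(read_metadata_chunks(io.StringIO(tsv)))
chunk_lengths = [len(chunk) for chunk in chunks]
```

Each chunk is still parsed independently, so the inferred dtype of a column may differ from one chunk to the next; `low_memory=False` only affects how pandas parses within a single read.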

I have seen surprising memory usage in the past with `low_memory=False`
within ncov-ingest². However, that was due to an unexpected interaction
with the `usecols` parameter, where the entire file was read before
being subset to the columns provided.

In the future, we may want to explicitly set the dtype to `string` for
all columns in the metadata as suggested by @tsibley in a separate PR.³
However, that will require wider changes throughout Augur where uses of
the metadata may be expecting the inferred dtypes (such as in
augur export⁴).
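
To make the trade-off in the paragraph above concrete, here is a small sketch comparing inferred dtypes with `dtype="string"` in `pandas.read_csv`. The column names and values are invented for illustration, and the snippet only shows pandas behavior; it does not reflect how Augur reads or consumes metadata.

```python
import io

import pandas as pd

# Made-up metadata with a numeric-looking column.
tsv = "strain\tnum_date\nA\t2020.5\nB\t2021\n"

# Default behavior: pandas infers a float dtype for "num_date".
inferred = pd.read_csv(io.StringIO(tsv), sep="\t", na_filter=False)

# Explicit string dtype for all columns: values are kept as written.
as_strings = pd.read_csv(
    io.StringIO(tsv), sep="\t", na_filter=False, dtype="string"
)

inferred_kind = inferred["num_date"].dtype.kind
string_values = list(as_strings["num_date"])
```

Code that relies on the inferred numeric type (for example, arithmetic on `num_date`) would need an explicit conversion once every column is read as a string, which is the kind of wider change the message refers to.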

¹ https://bedfordlab.slack.com/archives/C0K3GS3J8/p1686671582331959?thread_ts=1685568402.393599&cid=C0K3GS3J8
² nextstrain/ncov-ingest@7bde90a
³ #1235 (comment)
⁴ https://github.com/nextstrain/augur/blob/b61e3e7e969ff1b82fce5f2e2f388a10e6f3c305/augur/export_v2.py#L239-L245

joverlee521 committed Jun 13, 2023

1 parent 7139595 commit 73508e8

augur/io/metadata.py

             "engine": "c",
             "skipinitialspace": True,
             "na_filter": False,
+            "low_memory": False,
         }
         if chunk_size:
