pipeline for Quidel flu test #181
base: main
Conversation
After switching to James's new mapping file, 31 zip codes still have no mapping information. Only 7,583 tests out of 7,519,726 are related to those zip codes through 2020-08-03.
@jsharpna helped check those zip codes. They are not valid zip codes according to https://tools.usps.com/zip-code-lookup.htm?citybyzipcode. Will ask Quidel about them.
Will email Quidel with all problems: bad zips, non-unique regions per device. Fixing some of these requires merging with or otherwise depending on #137, but that package doesn't include the home-state mappings for HRRs and MSAs that are used to fill in for insufficient sample size. Hold off on finishing this until we can get the home-state mappings into the geo package.
overall_total.drop(labels="FluA", axis="columns", inplace=True)

# Compute numUniqueDevices
numUniqueDevices = df.groupby(
snake case var names
possibly auto-fixable by linter
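A minimal sketch of the snake_case rename, using a toy frame (the column names here are illustrative, not taken from the pipeline):

```python
import pandas as pd

# Toy frame standing in for the Quidel test records; column names are assumed.
df = pd.DataFrame({
    "timestamp": ["2020-07-01", "2020-07-01", "2020-07-02"],
    "device_id": ["A1", "A2", "A1"],
})

# snake_case version of numUniqueDevices: distinct devices reporting per day
num_unique_devices = df.groupby("timestamp")["device_id"].nunique()
```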
def raw_tests_per_device(devices, tests, min_obs):
    '''
double quotes
@@ -0,0 +1,39 @@
# -*- coding: utf-8 -*-
"""Function to export the dataset in the format expected of the API.
Super nitpicky, but standardizing docstrings and doing general linting can be nice for organization and readability if you want to go one step further.
I've mainly used flake8, but it looks like pylint is common in this repo. I imagine they're comparable.
run through black, probably
zipcode = int(float(zipcode))
zipcode5.append(zipcode)
df['zip'] = zipcode5
# print('Fixing %.2f %% of the data' % (fixnum * 100 / len(zipcode5)))
is this debugging? do the fixnum lines need to exist still?
This is used for checking only. For now I still want it to be there, since Quidel might change their raw data.
👍
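If the check needs to stay, one option (a sketch, not the pipeline's code) is to route it through logging, so the message only appears when DEBUG is enabled:

```python
import logging

logger = logging.getLogger(__name__)

def fix_rate(fixnum, total):
    """Share of rows whose zip code needed fixing, in percent."""
    return fixnum * 100 / total if total else 0.0

# Same message as the commented-out print, but only emitted at DEBUG level.
logger.debug("Fixing %.2f %% of the data", fix_rate(7, 1000))
```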
zipcode5 = []
fixnum = 0
for zipcode in df['ZipCode'].values:
    if isinstance(zipcode, str) and '-' in zipcode:
Do mixed types get read into the DF, which is why this if/else exists? If so, is it worth reading everything in as str? If not, and the else isn't for NaNs, I'm unsure why the isinstance check exists.
Also, I think there might be a way to do this quicker with zfill, like str(zipcode).split("-")[0].zfill(5), though I'm not sure without knowing exactly what the raw input looks like.
Yes. Both ints and "XXXXX-XXXX" strings exist for "ZipCode" in the raw data from Quidel. The reason I don't read it in as str is that we won't report the data at the zip code level; zip codes are only used for geo mapping. It is easier to read them as int and then merge the data with map_df, which also stores zip codes as int.
👍
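For reference, the idea above can also be done vectorized; a sketch assuming the raw column mixes ints, ZIP+4 strings, and float-like strings (the zfill padding is dropped since the final dtype is int anyway):

```python
import pandas as pd

# Toy mix of the raw ZipCode shapes described in the thread (assumed).
df = pd.DataFrame({"ZipCode": [2572, "33952-1234", "12345.0"]})

# Cast everything to str, drop any "-XXXX" suffix, strip a trailing ".0"
# from float-like strings, then back to int to match map_df's int zips.
df["zip"] = (
    df["ZipCode"].astype(str)
    .str.split("-").str[0]
    .str.split(".").str[0]
    .astype(int)
)
```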
else:
    pooled_positives = tpooled_positives
    pooled_tests = tpooled_tests
## STEP 2: CALCULATE AS THOUGH THEY'RE RAW
I assume this is STEP 2 because the geo pooling had a STEP 1 in it, but it's a bit confusing that STEP 1 is somewhere else.
Co-authored-by: chinandrew <[email protected]>
        zipcode5.append(int(zipcode.split('-')[0]))
        fixnum += 1
    else:
        zipcode = int(float(zipcode))
Suggested change: zipcode = int(float(zipcode)) → zipcode = int(zipcode)
pretty sure this works
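Whether the simpler form works depends on what actually reaches this branch; a quick check of the three cases:

```python
# A genuine float: int() alone is enough.
assert int(12345.0) == 12345

# A float-LIKE string: int() alone raises, so int(float(...)) is needed.
try:
    int("12345.0")
    raise AssertionError("expected ValueError")
except ValueError:
    pass

# int(float(...)) handles both cases.
assert int(float("12345.0")) == 12345
```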
@amartyabasu, waiting on your review
I'll have it completed today.
EXPORT_DAY_RANGE = 40  # Number of dates to report

GEO_RESOLUTIONS = [
    # "county",
Is the county based aggregation not done because of small sample size?
Yes. There are few counties available with sample sizes larger than 50.
quidel_flutest/params.json.template
Outdated
"account": "[email protected]",
"password": "",
"sender": "",
"mode":"",
- "mode":"" has an extra comma at the end.
- I ran the pipeline with pull_start_date: "2020-07-01" and export_start_date: "2020-06-01". The daily CSVs got generated from 20200711 onward. Does that mean there was no data from 2020-07-01 to 2020-07-10?
- According to the implementation, should export_start_date always precede pull_start_date to account for the backfills?
- The 'flu_ag_smoothed_tests_per_device' signal does not report standard errors.
- Remember we only report a geo_id with sample_size larger than 50. There will be data from 2020-07-01 to 2020-07-10, but those days might not have a single geo_id with sample size larger than 50.
- Yes, export_start_date should always precede pull_start_date.
- Yes. I'm not sure of the definition of se for that signal.
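For proportion-style signals, the usual choice is the binomial standard error; tests_per_device is a ratio rather than a proportion, which may be why no se is defined for it. A reference sketch (not the pipeline's code):

```python
import math

def binomial_se(p, n):
    """Standard error of a sample proportion p (in [0, 1]) over n tests."""
    return math.sqrt(p * (1 - p) / n)

# e.g. 20% positivity over 50 tests
se = binomial_se(0.2, 50)
```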
Linter test:
Pytest:
res_group = res_group.merge(parent_group, how="left",
                            on="timestamp", suffixes=('', '_parent'))
res_group = res_group.drop(columns=[res_key, "state_id", "state_id" + '_parent'])
except:
In my opinion, a simpler if/else block would work better than the try/except block for the case when parent_group does not exist.
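A sketch of that suggestion, assuming parent_group is None when the parent geo has no data (names follow the snippet above; the real code may differ):

```python
import pandas as pd

res_group = pd.DataFrame({"timestamp": ["2020-07-01"], "positives": [3]})
parent_group = None  # stand-in for "no home-state data available"

# Explicit check instead of a bare `except:`, which would also swallow
# unrelated failures (KeyError, MergeError, ...).
if parent_group is not None:
    res_group = res_group.merge(parent_group, how="left",
                                on="timestamp", suffixes=("", "_parent"))
```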
…in params.json.template
How did you conduct this linter test where you got that info?
I simply ran pylint over the delphi_quidel_flutest module.
Weird, I didn't see those results. Could you try
I got 10/10 on my computer with:
As we're still receiving the source data for this, we are interested in starting to report it! (Although it will be internal-only, like the Quidel covid data.)
A fair amount of the logic in here has since been moved to delphi_utils (export_csv and geo_map). Other stylistic choices are out of date. It's unclear right now how much of the logic is the same between this and the quidel covid indicator. If they are similar, we could just copy the covid code over and modify names/connection info, rather than updating all of this code.
quidel_covidtest has age breakdowns of signals. Are those available for flu tests too? If those are easy to add, we should add them.
MIN_OBS = 50  # minimum number of observations in order to compute a proportion
POOL_DAYS = 7
should be in constants
@@ -0,0 +1 @@
*.csv
copy gitignore from quidel_covidtest. this is probably missing files/dirs
python -m venv env
source env/bin/activate
pip install ../_delphi_utils_python/.
pip install .
update with make commands
from . import geo_maps
from . import data_tools
from . import generate_sensor
from . import export
from . import pull
from . import run
not current style. also some of the functionality here has been moved to delphi_utils
remove. now in geomapper
remove. now in geomapper
we need to test xlsx files? Do we get some input data in that format?
quidel_flutest/.pylintrc
Outdated
does this file appear in other (newer) indicators?
```
p = 100 * X / N
```
If N < 50, we borrow 50 - N fake samples from its home state to shrink the estimate toward the state's mean, which means:
IIRC, we only do this in other indicators if 30 < N < 50. Check whether that also applies here.
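A sketch of the borrowing scheme as described above (illustrative only; check the actual indicator code for the 30 < N < 50 condition):

```python
def shrunk_positivity(x, n, state_p, min_obs=50):
    """Percent positive, borrowing (min_obs - n) pseudo-tests at the home
    state's rate state_p (in percent) when the local sample is too small."""
    if n >= min_obs:
        return 100 * x / n
    borrowed = min_obs - n  # fake samples lent by the home state
    return (100 * x + borrowed * state_p) / (n + borrowed)

# 10 positives out of 40 tests, home state at 20% positivity:
# (100*10 + 10*20) / 50 = 24.0, pulled toward the state mean from a raw 25.0
```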
Some decisions to make:
A mapping problem at the 5-digit zip code level:
This problem is not severe in the COVID test data: fewer than 10 zip codes are not included in 02_20_uszips.csv, and only a very small proportion of the data is related to those weird zip codes.
However, in the flu test data there are ~90 such zip codes. It is hard to manually check each one and fill in its mapping and population information. We may need to update our mapping file.
These zip codes are listed here:
{603, 622, 627, 674, 676, 683, 717, 726, 728, 732, 733, 736, 738, 754, 780, 792, 795, 907, 912, 919, 953, 957, 959, 2572, 2781, 15705, 20174, 27412, 27460, 28793, 28823, 29019, 29484, 29486, 29871, 30597, 30997, 32163, 32214, 32306, 32313,
32611, 32761, 33551, 33574, 33652, 35642, 37232, 47782, 48483, 48670, 48824, 48902, 50410, 60944, 68179, 72053,
75033, 75072, 75222, 75322, 75429, 75546, 75606, 76094, 76803, 76909, 76992, 76993, 77370, 77399, 78086, 78776,
79430, 80630, 84129, 85378, 86123, 86746, 89557, 91315, 92094, 92152, 92521, 92697, 93077,
95929, 99094, 99623}
Only 133,000 tests out of 7,519,726 are related to those zip codes through 2020-08-03.
(Remember to remove wip_ and change the pull_start_date to be earlier than 2020-05-08; it will take about half an hour to read all of the historical data.)