pipeline for Quidel flu test #181
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
[DESIGN] | ||
|
||
min-public-methods=1 | ||
|
||
|
||
[MESSAGES CONTROL] | ||
|
||
disable=R0801, C0330, E1101, E0611, C0114, C0116, C0103, R0913, R0914, W0702 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
# Quidel Flu Test | ||
|
||
### Background | ||
|
||
Since 2014-08, we have received flu test data from Quidel. The data contains a number of features for every test, including localization at the 5-digit ZIP code level, a TestDate and StorageDate, patient age, and several identifiers that uniquely identify the device on which the test was performed (SofiaSerNum), the individual test (FluTestNum), and the result (ResultID). Multiple tests are stored on each device, and we suspect that the device identifiers could be useful for normalizing the results over time. | ||
|
||
### Signal names | ||
|
||
- flu_ag_raw_pct_positive: percent of tests returning positive that day | ||
- flu_ag_smoothed_pct_positive: same as above, but for the moving average of the most recent 7 days | ||
- flu_ag_raw_tests_per_device: average number of tests per active testing device (= device that performed at least one test this period) that day | ||
- flu_ag_smoothed_tests_per_device: same as above, but for the moving average of the most recent 7 days | ||
|
||
### Estimating percent positive test proportion | ||
|
||
Let n be the total number of flu tests taken over a given time period in a given location. A test result can be negative, positive, or invalid. Let x be the number of those tests with positive results. We are interested in estimating the percentage of positive tests, which is defined as: | ||
|
||
``` | ||
p = 100 * x / n | ||
``` | ||
|
||
We estimate p across 3 temporal-spatial aggregation schemes: | ||
|
||
- daily, at the MSA (metropolitan statistical area) level; | ||
- daily, at the HRR (hospital referral region) level; | ||
- daily, at the state level. | ||
|
||
We are able to make these aggregations accurately because each test is reported with its 5-digit ZIP code. We do not report estimates for individual counties, as typically each county has too few tests to make the estimated value statistically meaningful. | ||
|
||
**MSA and HRR levels**: In a given MSA or HRR, suppose N flu tests are taken in a certain time period and X of them return positive results. If N >= 50, we simply use: | ||
|
||
``` | ||
p = 100 * X / N | ||
``` | ||
|
||
If N < 50, we borrow 50 - N fake samples from the home state to shrink the estimate toward the state's mean, which gives: | ||
|
||
``` | ||
p = 100 * [ N /50 * X/N + (50 - N)/50 * Xs / Ns ] | ||
``` | ||
|
||
where Ns and Xs are, respectively, the number of flu tests and the number of flu tests with positive results taken in the home state over the same time period. | ||
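The MSA/HRR rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the module's actual implementation: the function name `pct_positive` and the `min_obs` parameter are hypothetical, and the sketch assumes `N > 0` and `Ns > 0`.

```python
def pct_positive(X, N, Xs, Ns, min_obs=50):
    """Percent positive for an MSA/HRR, shrunk toward the state mean when N < min_obs.

    X, N   : positive tests and total tests in the MSA/HRR
    Xs, Ns : positive tests and total tests in the home state
    min_obs: hypothetical name for the threshold (50 in the text above)
    """
    if N <= 0 or Ns <= 0:
        raise ValueError("N and Ns must be positive")
    if N >= min_obs:
        # Enough local data: use the plain local proportion.
        return 100 * X / N
    # Borrow (min_obs - N) fake samples at the state's positivity rate.
    local_weight = N / min_obs
    return 100 * (local_weight * X / N + (1 - local_weight) * Xs / Ns)
```

For example, with 5 positives out of 20 local tests and a state rate of 100 out of 1000, the local rate of 25% is pulled toward the state's 10%, giving roughly 16%.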
|
||
**State level**: States with sample sizes smaller than a certain threshold are discarded (the threshold is temporarily set to 50). For the remaining states, which have large enough sample sizes, | ||
|
||
``` | ||
p = 100 * X / N | ||
``` | ||
|
||
The estimated standard error is simply: | ||
|
||
``` | ||
se = 100 * sqrt{ p/100 * (1-p/100)/N } | ||
``` | ||
|
||
where we assume that, at each time point, the number of positive tests follows a binomial distribution. | ||
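As a small sketch of the standard error formula above (the function name `binomial_se` is hypothetical, and `p` is assumed to be a percentage between 0 and 100):

```python
import math


def binomial_se(p, N):
    """Standard error, in percentage points, of a percent-positive estimate p from N tests."""
    if N <= 0:
        raise ValueError("N must be positive")
    frac = p / 100  # convert the percentage to a proportion
    return 100 * math.sqrt(frac * (1 - frac) / N)
```

With `p = 50` and `N = 100`, this gives a standard error of about 5 percentage points.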
|
||
### Estimating adjusted total test numbers | ||
|
||
As above, let N be the total number of flu tests taken over a given time period in a given location, and let D be the number of unique devices used for those tests. We are interested in estimating the adjusted total test number (tests per device), which is defined as: | ||
|
||
``` | ||
q = N / D | ||
``` | ||
|
||
We will estimate q across the same 3 temporal-spatial aggregation schemes as before. | ||
|
||
**MSA and HRR levels**: In a given MSA or HRR, suppose N flu tests are taken in a certain time period and D is the number of unique devices used. If N >= 50, we simply use: | ||
|
||
``` | ||
q = N / D | ||
``` | ||
|
||
If N < 50, we borrow 50 - N fake samples from the home state to shrink the estimate toward the state's mean, which gives: | ||
|
||
``` | ||
q = N/50 * N/D + (50 - N)/50 * Ns/Ds | ||
``` | ||
|
||
where Ns, Ds are the number of total flu tests taken and the number of unique devices used respectively in its home state in the same time period. | ||
|
||
**State level**: If a state has fewer than 50 tests in the given time period, no estimate is reported. For the remaining states, which have large enough sample sizes, | ||
|
||
``` | ||
q = N / D | ||
``` | ||
|
||
### Temporal and Spatial Pooling | ||
|
||
We conduct temporal and spatial pooling for the smoothed signal. The spatial pooling is described in the previous sections, where we shrink the estimates toward the state's mean if the total number of tests is smaller than 50 for a certain location on a certain day. Additionally, as with the Quidel Flu Test signal, we consider smoothed estimates formed by pooling data over time. That is, each day, for each location, we first pool all data available in that location over the last 7 days, and we then recompute everything described in the last two subsections. Pooling the data in this way makes estimates available in more geographic areas. | ||
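The temporal pooling step can be sketched as below. This is an assumption-laden illustration rather than the module's code: the helper `pool_last_days` and the dictionary layout are hypothetical, and the window is assumed to include the current day.

```python
from datetime import date, timedelta


def pool_last_days(daily_counts, day, window=7):
    """Sum (positives, tests) over the `window` days ending on `day`, inclusive.

    daily_counts maps a date to a (positives, tests) tuple for one location;
    days with no entry contribute zero.
    """
    total_pos = 0
    total_tests = 0
    for offset in range(window):
        pos, tests = daily_counts.get(day - timedelta(days=offset), (0, 0))
        total_pos += pos
        total_tests += tests
    return total_pos, total_tests


# Eight days of data with 1 positive out of 10 tests per day.
counts = {date(2020, 8, d): (1, 10) for d in range(1, 9)}
pooled = pool_last_days(counts, date(2020, 8, 8))
```

The pooled counts would then be passed through the same estimation and shrinkage formulas as the raw daily counts.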
|
||
### Exceptions | ||
|
||
There are 89 special ZIP codes that appear in the Quidel flu raw data but are temporarily excluded from our reports, because we do not have enough mapping information to assign them to parent geographic regions. | ||
|
||
For data through 08-05-2020, 133,000 tests out of 7,519,726 tests were reported for those zip codes. | ||
|
||
The zip codes are: 603, 622, 627, 674, 676, 683, 717, 726, 728, 732, 733, 736, 738, 754, 780, 792, 795, 907, 912, 919, 953, 957, 959, 2572, 2781, 15705, 20174, 27412, 27460, 28793, 28823, 29019, 29484, 29486, 29871, 30597, 30997, 32163, 32214, 32306, 32313, | ||
32611, 32761, 33551, 33574, 33652, 35642, 37232, 47782, 48483, 48670, 48824, 48902, 50410, 60944, 68179, 72053, | ||
75033, 75072, 75222, 75322, 75429, 75546, 75606, 76094, 76803, 76909, 76992, 76993, 77370, 77399, 78086, 78776, | ||
79430, 80630, 84129, 85378, 86123, 86746, 89557, 91315, 92094, 92152, 92521, 92697, 93077, | ||
95929, 99094, 99623 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
.PHONY: venv lint test clean | ||
|
||
dir = $(shell find ./delphi_* -name __init__.py | grep -o 'delphi_[_[:alnum:]]*' | head -1) | ||
venv: | ||
python3.8 -m venv env | ||
|
||
install: venv | ||
. env/bin/activate; \ | ||
pip install wheel ; \ | ||
pip install -e ../_delphi_utils_python ;\ | ||
pip install -e . | ||
|
||
install-ci: venv | ||
. env/bin/activate; \ | ||
pip install wheel ; \ | ||
pip install ../_delphi_utils_python ;\ | ||
pip install . | ||
|
||
lint: | ||
. env/bin/activate; pylint $(dir) --rcfile=../pyproject.toml | ||
. env/bin/activate; pydocstyle $(dir) | ||
|
||
format: | ||
. env/bin/activate; darker $(dir) | ||
|
||
test: | ||
. env/bin/activate ;\ | ||
(cd tests && ../env/bin/pytest --cov=$(dir) --cov-report=term-missing) | ||
|
||
clean: | ||
rm -rf env | ||
rm -f params.json |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
# Quidel Flu Test Indicators | ||
|
||
## Running the Indicator | ||
|
||
The indicator is run by directly executing the Python module contained in this | ||
directory. The safest way to do this is to create a virtual environment, | ||
install the common DELPHI tools, and then install the module and its | ||
dependencies. To do this, run the following code from this directory: | ||
|
||
``` | ||
python -m venv env | ||
source env/bin/activate | ||
pip install ../_delphi_utils_python/. | ||
pip install . | ||
``` | ||
Review comment on lines +11 to +14: update with
|
||
All of the user-changeable parameters are stored in `params.json`. A template is | ||
included as `params.json.template`. At a minimum, you will need to include a | ||
password for the datadrop email account and the email address of the data sender. | ||
Note that setting `export_end_date` to an empty string will export data through | ||
today (GMT) minus **to be determined** days. Setting `pull_end_date` to an empty string will pull data | ||
through today (GMT). | ||
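To illustrate the empty-string behavior of `pull_end_date`, here is a minimal sketch. The helper `resolve_pull_end_date` is hypothetical, and the `YYYY-MM-DD` date format is an assumption, since the format used in `params.json` is not stated here.

```python
from datetime import date, datetime, timezone


def resolve_pull_end_date(params, today=None):
    """Return the pull end date, falling back to today (UTC) when the setting is empty.

    `params` is the dictionary loaded from params.json; `today` can be passed
    explicitly to make the result reproducible in tests.
    """
    raw = params.get("pull_end_date", "")
    if raw == "":
        return today or datetime.now(timezone.utc).date()
    # Assumed format; adjust to whatever params.json.template actually uses.
    return datetime.strptime(raw, "%Y-%m-%d").date()
```

The same pattern would apply to `export_end_date`, with the additional lag subtracted once that value is decided.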
|
||
To execute the module and produce the output datasets (by default, in | ||
`receiving`), run the following: | ||
|
||
``` | ||
env/bin/python -m delphi_quidel_flutest | ||
``` | ||
|
||
Once you are finished with the code, you can deactivate the virtual environment | ||
and (optionally) remove the environment itself. | ||
|
||
``` | ||
deactivate | ||
rm -r env | ||
``` | ||
|
||
## Testing the code | ||
|
||
To do a static test of the code style, it is recommended to run **pylint** on | ||
the module. To do this, run the following from the main module directory: | ||
|
||
``` | ||
env/bin/pylint delphi_quidel_flutest | ||
``` | ||
|
||
The most aggressive checks are turned off; only relatively important issues | ||
should be raised and they should be manually checked (or better, fixed). | ||
|
||
Unit tests are also included in the module. To execute these, run the following | ||
command from this directory: | ||
|
||
``` | ||
(cd tests && ../env/bin/pytest --cov=delphi_quidel_flutest --cov-report=term-missing) | ||
``` | ||
|
||
The output will show the number of unit tests that passed and failed, along | ||
with the percentage of code covered by the tests. None of the tests should | ||
fail, and the number of code lines not covered by unit tests should be small and | ||
should not include critical sub-routines. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
## Code Review (Python) | ||
|
||
A code review of this module should include a careful look at the code and the | ||
output. To assist in the process, but certainly not in place of it, please | ||
check the following items. | ||
|
||
**Documentation** | ||
|
||
- [ ] the README.md file template is filled out and currently accurate; it is | ||
possible to load and test the code using only the instructions given | ||
- [ ] minimal docstrings (one line describing what the function does) are | ||
included for all functions; full docstrings describing the inputs and expected | ||
outputs should be given for non-trivial functions | ||
|
||
**Structure** | ||
|
||
- [ ] code should use 4 spaces for indentation; other style decisions are | ||
flexible, but be consistent within a module | ||
- [ ] any required metadata files are checked into the repository and placed | ||
within the directory `static` | ||
- [ ] any intermediate files that are created and stored by the module should | ||
be placed in the directory `cache` | ||
- [ ] final expected output files to be uploaded to the API are placed in the | ||
`receiving` directory; output files should not be committed to the repository | ||
- [ ] all options and API keys are passed through the file `params.json` | ||
- [ ] template parameter file (`params.json.template`) is checked into the | ||
code; no personal (e.g., usernames) or private (e.g., API keys) information is | ||
included in this template file | ||
|
||
**Testing** | ||
|
||
- [ ] module can be installed in a new virtual environment | ||
- [ ] pylint with the default `.pylint` settings run over the module produces | ||
minimal warnings; warnings that do exist have been confirmed as false positives | ||
- [ ] reasonably high level of unit test coverage covering all of the main logic | ||
of the code (e.g., missing coverage for raised errors that do not currently seem | ||
possible to reach are okay; missing coverage for options that will be needed are | ||
not) | ||
- [ ] all unit tests run without errors |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
*.csv | ||
Review comment: copy gitignore from quidel_covidtest. this is probably missing files/dirs |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
# -*- coding: utf-8 -*- | ||
"""Module to pull and clean indicators from the Quidel Flu Test. | ||
|
||
This file defines the functions that are made public by the module. As the | ||
module is intended to be executed through the main method, these are primarily | ||
for testing. | ||
""" | ||
|
||
from __future__ import absolute_import | ||
|
||
from . import geo_maps | ||
from . import data_tools | ||
from . import generate_sensor | ||
from . import export | ||
from . import pull | ||
from . import run | ||
Review comment on lines +11 to +16: not current style. also some of the functionality here has been moved to delphi_utils |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,11 @@ | ||||||
# -*- coding: utf-8 -*- | ||||||
"""Call the function run_module when executed. | ||||||
|
||||||
This file indicates that calling the module (`python -m MODULE_NAME`) will | ||||||
Review comment: suggested change
Review comment: Same issue in a couple other places |
||||||
call the function `run_module` found within the run.py file. There should be | ||||||
no need to change this template. | ||||||
""" | ||||||
|
||||||
from .run import run_module # pragma: no cover | ||||||
|
||||||
run_module() # pragma: no cover |
Review comment: IIRC, we only do this in other indicators if N < 50 AND N > 30. Check if this also applies here