USP Drug Classification data dictionary + tidying #33

cduvallet · 2017-02-07T02:03:31Z

Continuing on issue #14, finalize the USP Drug Classification data dictionary, etc. Raw and tidy data are on data.world.

This data may or may not be useful - it has non-Medicare Part D medications and their respective classes/categories. The classes and categories are pretty self-explanatory (e.g. Antidepressants, Antiparkinson Agents, Sleep Disorder Agents) and can likely easily be tied to usage (depending on how we decide to define usage...).

Some follow-up tasks, if we decide to use this data:

* Figure out if it has Medicare Part D drugs, or only outpatient/non-Medicare drugs
* See if we have spending data on these drugs, either in the Medicare data or elsewhere
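A rough sketch of how the first follow-up task could be checked, assuming both datasets expose a plain list of drug names. The helper names `normalize` and `overlap` are hypothetical, and matching on case-insensitive exact names is an assumption; real data may need brand/generic mapping on top of this.

```python
def normalize(name):
    """Lowercase and collapse whitespace so names compare loosely."""
    return " ".join(name.lower().split())


def overlap(usp_names, part_d_names):
    """Split USP drug names into those found / not found in the Part D list."""
    part_d = {normalize(n) for n in part_d_names}
    found = sorted(n for n in usp_names if normalize(n) in part_d)
    missing = sorted(n for n in usp_names if normalize(n) not in part_d)
    return found, missing


# Placeholder names only; these are not taken from either dataset.
found, missing = overlap(["Drug A", "Drug C"], ["DRUG A", "Drug B"])
print(len(found), "matched;", len(missing), "unmatched")
```

If most USP names land in `missing`, that would suggest the classification covers mainly non-Part D drugs, or that the names need more cleaning before they can be compared.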

* Pulling drug use classes out of the CMS PUF files for categorizing the Plan D data
* Fixed directory creating, refactored df, shortened loop

* Fix gitignore to ignore XLSX/ZIP and move CMS data to its own dir
* Add drug names back into annual spending data, for ease of use
* Forgot to add notebook in last commit
* Fix exploration notebook after earlier changes
* Add .DS_Store files to .gitignore
* Remove Medicare drug spending dataset (migrated to data.world)
* Remove data in favor of using data.world
  - (External) Move all data files to data.world repository (https://data.world/data4democracy/drug-spending)
  - Remove data/ directory
  - Correct notebook code to work with data.world as a source
* Wrote a helper function that gets data from a URL, wrote function that downloads Part D data based on notebook
* Added docstring to function
* Added more functions that load data wrangle it
* Added argument parser and squashed some bugs
* Removed dependence on openpyxl, since Pandas does the trick
* Notebook runs
* Minor change to command line arg and addition to help string
* Move comments into Markdown cells and add CSV output
* Added functionality to decide between input/output data formats; supports cvs and feather at the moment

Markdown version of goals statement - first draft.

source: https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Part-B-Drugs/McrPartBDrugAvgSalesPrice/2016ASPFiles.html

Cleaning drug manufacturer data sourced from CMS.

jenniferthompson · 2017-02-07T03:58:14Z

@cduvallet! The data summary and data dictionary are SO helpful! I've asked Matt or Daniela to review it because I'm not a Python user, but whether or not the data relates to what we're doing immediately, having all this documented so well is fantastic. Thank you!

dhuppenkothen · 2017-02-07T13:54:55Z

I can review this today, unless @mattgawarecki is on it already.

dhuppenkothen · 2017-02-07T13:58:54Z

I can also check tonight if these classifications work for the Part D data that I've been playing around with.

dhuppenkothen

Nice work! This'll be really useful!

dhuppenkothen · 2017-02-07T15:07:10Z

data/usp_drug_classification_tidying_script.py

+import pandas as pd
+
+if __name__ == "__main__":
+


Is this supposed to be just a script to be executable from the command line? Or should this supply a function that can be run from within a larger Python program as well? If it's the former, you don't really need the if __name__ == "__main__" line.

I would actually suggest moving the code below into a function tidy_kegg_data() or something, and then have it execute that here. This would allow someone to run this from within a larger programme, if necessary.
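A minimal sketch of the layout suggested above, using only the standard library. The body of `tidy_kegg_data` is a hypothetical stand-in, since the script's actual parsing logic is not shown in this diff; only the function-plus-guard structure is the point.

```python
import sys


def tidy_kegg_data(lines):
    """Placeholder tidying step: strip whitespace and drop blank lines.

    Hypothetical stand-in for the real parsing logic in the script.
    """
    return [line.strip() for line in lines if line.strip()]


def main(fname):
    # Read the raw file and hand its lines to the importable function.
    with open(fname) as f:
        return tidy_kegg_data(f)


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    # Assumed usage: python usp_drug_classification_tidying_script.py br08302.keg
    for path in sys.argv[1:]:
        for row in main(path):
            print(row)
```

With this layout, a larger program can import `tidy_kegg_data` from the module without triggering any file reads, while command-line use still works.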

dhuppenkothen · 2017-02-07T15:09:05Z

data/usp_drug_classification_tidying_script.py

+
+if __name__ == "__main__":
+
+    fname = 'br08302.keg'


Is this data on data.world? It might be worth not assuming that the data exists on the local system, or at least check whether it exists on the local system.
There's a function in scripts/read_data.py that might make that easier (you might need to git pull upstream/master).
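One way to avoid assuming the file is already on disk, sketched with the standard library only. `ensure_local_copy` is a hypothetical helper, not the function in scripts/read_data.py, and the download URL is left as a parameter because the dataset's link is not given in this thread.

```python
import os
import urllib.request


def ensure_local_copy(fname, url, fetch=urllib.request.urlretrieve):
    """Return fname, downloading it from url first if it is not on disk.

    `fetch` is injectable so the download step can be swapped out or tested.
    """
    if not os.path.exists(fname):
        fetch(url, fname)
    return fname
```

The script could then call `ensure_local_copy('br08302.keg', url)` before reading the file, so an existing local copy is reused and a missing one is fetched.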

cduvallet · 2017-02-08

Yes, both great points that I meant to address but forgot! Will have time tomorrow to fix. Thanks for pointing it out! :)

cduvallet · 2017-02-08T03:06:21Z

@dhuppenkothen I made the changes you recommended, it's much nicer now. I wasn't sure of the best way to interface with read_data.py (so I just re-wrote the download data wrapper...)

Also, it seems that there are currently two ways we're keeping track of, downloading, and tidying data:

1. The scripts/read_data.py script has individual functions, one per dataset, that download and tidy the data, and
2. The data/ folder has individual data dictionaries and corresponding tidying scripts, one for each individual dataset.

From what I understood from @mattgawarecki, I think we're going with option 2? But let me know if not, and I can incorporate this into the read_data.py script.

jenniferthompson · 2017-02-08T04:39:15Z

@cduvallet I'll let @dhuppenkothen speak to read_data.py, but just wanted to jump in and say we had a long discussion today about repo organization, and I just submitted a PR to reflect the updated file structure. Once we get that finalized we'll clean up all the documentation, but the idea will be to have a dictionary (md) in /datadictionaries, and tidying scripts in (in your case) /python/datawrangling/[subfolders if you need it]. Not sure if that answers all your questions, but hopefully helps! Thanks so much for bearing with us while we get more streamlined - it'll help tremendously in the long run.

added direct link to the datasets of interest

* Reorganization FTW
* Reorganization FTW, part 2
* Add .gitignore
* Add READMEs to each subdirectory. Rename data dictionary template (now TEMPLATE) and remove suffix from manufacturer_datadict.md.
* Add link to data.world Python client
* Update main README to reflect new file structure
* Fix link to datadictionaries
* Really fix it this time
* Fix the other datadictionaries links to overview and template
* More streamlining and edits to README

jenniferthompson · 2017-02-12T20:11:21Z

Hey @cduvallet and @dhuppenkothen! Just checking in on the status of this PR. No rush intended on my end, just wanted to make sure there isn't anything blocking either of you that we need to take care of administratively.

cduvallet · 2017-02-13T15:04:11Z

@jenniferthompson Nope, I was just traveling this weekend so haven't gotten around to finalizing this. Will update if I need anything from y'all! :)

cduvallet · 2017-02-14T01:53:33Z

Okay, I think we should be ready to merge! @jenniferthompson double-check and let me know if anything needs to change?

jenniferthompson · 2017-02-14T16:19:12Z

@cduvallet The data-dictionaries branch looks great! Would you mind pushing that to your master branch so it'll show up on master here? I think that should do it!

@dhuppenkothen did you have any further suggestions on the Python code?

cduvallet · 2017-02-15T04:53:29Z

@jenniferthompson I think I did it! Should be ready to merge if @dhuppenkothen doesn't have other comments.

dhuppenkothen · 2017-02-15T14:18:06Z

Looks good to me!

mattgawarecki · 2017-02-15T15:01:12Z

Oops. I'll get this into master instead of data-dictionaries.

dhuppenkothen and others added 14 commits February 3, 2017 16:22

Data Wrangling for Drug Use Visualizations (Data4Democracy#24)

ac875b1

* Pulling drug use classes out of the CMS PUF files for categorizing the Plan D data * Fixed directory creating, refactored df, shortened loop

Small bug fix to remove hard-coded directory paths (Data4Democracy#28)

a471e83

Markdown version of goals statement - first draft.

0228bbb

add USP drug classification tidying and data dictionary

03bfc17

Added anchor for citations and superscripted refs

b5f2cda

Merge pull request Data4Democracy#29 from jenniferthompson/master

3ac353d

Markdown version of goals statement - first draft.

Cleaning drug manufacturer data

1cbf1e2

source: https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Part-B-Drugs/McrPartBDrugAvgSalesPrice/2016ASPFiles.html

move USP Drug Classification to

c6e4f2b

clarify usp drug classification data dict info

dcd95fa

small changes

53563a4

update links

dc023fd

small changes

8fc78cf

Merge pull request Data4Democracy#31 from skirmer/master

c105869

Cleaning drug manufacturer data sourced from CMS.

jenniferthompson requested review from mattgawarecki and dhuppenkothen February 7, 2017 03:56

dhuppenkothen reviewed Feb 7, 2017

mattgawarecki approved these changes Feb 7, 2017

download from data.world, clean up functions, add comments

8a15e47

Merge pull request Data4Democracy#35 from Data4Democracy/data-diction…

b675cdf

…aries Merge data-dictionaries branch in preparation for restructuring

Selah Lynch and others added 5 commits February 8, 2017 11:45

added direct link to the datasets of interest

468a753

Merge pull request Data4Democracy#39 from selahlynch/patch-1

aa1e34d

added direct link to the datasets of interest

Add @skirmer to maintainers!

cc445b2

Correct directory name for datadictionaries

22db3e7

jenniferthompson added 2 commits February 9, 2017 15:51

Add link to objectives doc

02b95af

Update datadictionaries/README.md to reflect updated repo structure

62c92b9

cduvallet added 2 commits February 13, 2017 18:08

Merge branch 'master' into data-dictionaries

c7fdbe1

update files to reflect repo structure changes

e63ff4d

mattgawarecki merged commit 57a218f into Data4Democracy:data-dictionaries Feb 15, 2017
