Consolidate a list of data sources for drugs and their uses #14

Closed
mattgawarecki opened this issue Jan 25, 2017 · 22 comments

Comments

@mattgawarecki
Contributor

Task

Create a list of data sources linking drug names to their uses and/or other similar drugs. Post the end result in this issue so we'll have a record we can look back on.

How this will help

Issue #6 also relates to drug uses; specifically, it seeks to link the drugs in our Medicare Part D data set to their respective purposes. While we started #6 with a good set of data to work from, there's a growing list of places we can look to get more information. With the right set(s) of eyes, we might be able to cover the CMS drug list more comprehensively. Before we can do that, though, we need to actually list out all the sources we're aware of.

@mattgawarecki
Contributor Author

mattgawarecki commented Jan 25, 2017

Listing a few resources that people have already mentioned in Slack or elsewhere:

@peter0083

Adding to the list:
Merck Manual Professional
Instructions: select the drug you want to look up and you will see "Use: Labeled Indications" in the left panel.

@cduvallet
Contributor

cduvallet commented Feb 5, 2017

FYI - I'm waiting for access to the data.world account to finalize some changes before opening a pull request, but I've finished tidying the KEGG USP Drug Classification data.

Looks like it has info on non-Medicare drugs including drug categories and classes. For example, Naproxen's drug category is Analgesics and its class is Nonsteroidal Anti-inflammatory Drugs.

I'm not sure yet if this data also includes Part D Medicare drugs, or if that's in another dataset and this just has more common drugs - I think the Medicare drugs will be in the USP MMG dataset (which unfortunately doesn't seem to be on KEGG...). More info on this here.

@mattgawarecki
Contributor Author

@cduvallet I just added you to the data.world organization. Thanks for taking this on!

@TBusen

TBusen commented Feb 7, 2017

I've requested a UMLS license for the full RxNorm data

@cduvallet
Contributor

@peter0083 - are you working on the Merck Manual? If not, I'll take a stab at scraping it - seems quite useful!

@peter0083

@cduvallet - actually I haven't been able to work on Merck Manual yet. Please go ahead! 😃

@cduvallet
Contributor

Updates after some poking around the Merck Manual and Medline Plus (I've seen a few other people working on RxNorm so I've left it alone for now...)

  • The Merck Manual seems really useful. If you go to Drug Information, Drugs by Name, Generic, and Brand there is a table of generic and brand name drugs. When you click on a drug, a box pops up with a lot of information, including different types of dosage and both on-label and off-label usage. I'm confident that we could get at least the on-label usage - which would be really helpful to link drugs to diseases/conditions! However, this information only appears in the pop-up box after you click on the drug, so it doesn't seem straightforward to parse. Perhaps this can become a separate issue?

  • Medline Plus has information on drugs as well as herbs/supplements.

    • When you browse the drugs, there is a list of brand names with a "see " at the end that takes you to the generic or general drug class page. If we still need to link generics with brand names, this could be a place for that. But I think other people already have other data sources for this linking?
    • The individual drug pages have a lot of information for the consumer, including a "why is this drug prescribed?" section which explains what the drug is for. This could probably be parsed to map drugs to diseases/conditions, but it's not super straightforward because it's all in free-text paragraph form. Not sure if it's worth the effort.
    • The herbs and supplements page has links to the individual herb/supplement pages with more info. I don't think this information is worth pursuing. It doesn't seem straightforward to parse, and the science is still dubious on most of these - so that would make it difficult to trust data scraped from here.

Moving forward, I think that:

  1. Someone with more experience scraping data from interactive websites should take a look at the Merck Manual and determine if it's possible to extract the drug usage information from there.
  2. I can keep poking around for new data sources and do a similar first-pass for anything I find (like I did here).
  3. We should have some way to keep track of the kind of data we're looking for - either in the form of a "wish list" by the people who are starting preliminary analyses, or perhaps with "research questions" decided upon by the group, or something else. Just to make it easier to identify which data is worth pursuing and which isn't.

@TBusen

TBusen commented Feb 15, 2017

I took a stab at scraping this. I was able to get the base table, which is a cross-reference (xref) of generic name to brand name. Even getting this required downloading the source HTML and then parsing the file. The script is currently on my fork.

Their site doesn't allow direct URL access. From what I can gather, you must use the menu system; even pasting the direct URL into a browser doesn't work. The same goes for the pop-up windows that have all that additional information, which appear to be called by JavaScript functions. I looked at their consumer site and it's the same story. I'll continue to work on this.
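
For anyone following along, here is a minimal sketch of the "download the source HTML, then parse the file" step. It is not the script on the fork. It assumes the saved page holds an ordinary HTML table whose data rows put the generic name in the first `<td>` and the brand name in the second; the real page layout may differ. `TableRowParser`, `parse_generic_brand_table`, and the sample markup (including the brand placeholder) are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None


def parse_generic_brand_table(html_text):
    """Return (generic, brand) pairs from rows with at least two <td> cells.

    Assumption: column 1 is the generic name, column 2 is the brand name.
    """
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return [(row[0], row[1]) for row in parser.rows if len(row) >= 2]


# Hypothetical sample markup; the brand name is a placeholder, not real data.
SAMPLE_HTML = """
<table>
  <tr><th>Generic</th><th>Brand</th></tr>
  <tr><td>naproxen</td><td> EXAMPLE-BRAND
      A </td></tr>
  <tr><td>row with a single cell</td></tr>
</table>
"""

pairs = parse_generic_brand_table(SAMPLE_HTML)
```

To run it on a saved page, read the file into a string (for example with `open(path, encoding="utf-8").read()`) and pass that string to `parse_generic_brand_table`. This only covers the static table; the JavaScript-driven pop-ups would need a different approach.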

@jenniferthompson
Contributor

This is heroic work, you guys - thank you! I definitely agree that the Merck data could be helpful in general if we can get it without too much pain. @TBusen thanks for working on it - keep us posted; if it's painful we can talk about whether it's worth it!

@cduvallet to your very valid questions - the main thing we're after at this moment is a way to "translate" drugs into pharmaceutical and therapeutic classes. The CMS data already has both brand and generic drug names, but if we want to look at medications for diabetes/anxiety/etc, we need a way to group them. The USP data you put together might do this well if it covers the Medicare Part D drugs; perhaps we can add a new issue for someone to try combining them and see what we get. You've got the deepest understanding of that data - how does that sound to you?
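
A rough sketch of what that combining step could look like, assuming the USP data has been tidied into rows with `drug`, `category`, and `class` fields and that the CMS generic names are available as a plain list. Those field names, `normalize`, and `classify_drugs` are assumptions for illustration, not the actual column names or code in the repository. The naproxen row is the example quoted earlier in this thread; the second drug name is a placeholder.

```python
def normalize(name):
    """Lower-case and collapse whitespace so names compare consistently."""
    return " ".join(name.lower().split())


def classify_drugs(generic_names, usp_rows):
    """Attach a USP (category, class) to each generic name where one exists.

    Returns (matched, unmatched): a dict of name -> (category, class) and a
    sorted list of the names that have no USP entry.
    """
    lookup = {
        normalize(row["drug"]): (row["category"], row["class"])
        for row in usp_rows
    }
    matched, unmatched = {}, set()
    for name in generic_names:
        key = normalize(name)
        if key in lookup:
            matched[key] = lookup[key]
        else:
            unmatched.add(key)
    return matched, sorted(unmatched)


# Assumed shape of the tidied USP data (field names are hypothetical).
usp_rows = [
    {
        "drug": "Naproxen",
        "category": "Analgesics",
        "class": "Nonsteroidal Anti-inflammatory Drugs",
    },
]

# Stand-in for the CMS generic-name column; the second entry is a placeholder.
cms_generic_names = ["NAPROXEN", "  naproxen ", "PLACEHOLDER DRUG"]

matched, unmatched = classify_drugs(cms_generic_names, usp_rows)

# Share of distinct CMS names that received a USP category and class.
coverage = len(matched) / (len(matched) + len(unmatched))
```

The `unmatched` list is the useful part for this issue: it shows which Part D drugs the USP data does not cover, and so where another source would be needed. Exact matching on normalized names is the simplest possible rule; combination products and salt forms would likely need something more forgiving.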

@cduvallet
Contributor

Yes, happy to take a look this weekend or early next week! (I'm at a conference all day tomorrow and Friday). Do you want to make a new issue for this and assign it to me?

Also, since @TBusen seems to be chugging away at the Merck Manual scraping, perhaps that can become its own issue too?

Also, FWIW I think it'll likely be worth it - the Merck Manual has actual disease and condition names, rather than broad therapeutic or biological classes... ;)

@mattgawarecki
Contributor Author

Created issue #50 as a dedicated ticket for the Merck Manual effort. If anyone needs help with scraping, feel free to contact me directly -- I've done a little bit of work in this area and may be of use.

@davidlibland
Contributor

Epocrates https://online.epocrates.com/drugs also has a comprehensive list of drugs with their uses... But has anyone looked into whether scraping Merck or Epocrates violates the user agreements, esp. if the results are posted publicly on data.world?

@mattgawarecki
Contributor Author

mattgawarecki commented Feb 20, 2017

@davidlibland Thanks for reminding us about that. I've got an email out to Merck now, but I specifically left out the idea of us publishing their data on data.world, because let's be real -- that's never gonna happen. 😢 But, they may let us use their data privately, which works almost as well! 👍

@TBusen

TBusen commented Feb 21, 2017

Another good source of data: http://www.pdr.net

Probably a long shot due to their TOS, but parking it here for completeness.
http://www.pdr.net/terms-of-use/

@davidlibland
Contributor

davidlibland commented Feb 21, 2017 via email

@mattgawarecki
Contributor Author

@davidlibland Looks like Google even has a listing of which sources they use and how they gather that data: https://support.google.com/websearch/answer/2364942?p=medical_conditions&visit_id=1-636232883305572138-808022200&rd=1

Hope that helps as you look into it!

@mattgawarecki
Contributor Author

@TBusen @jenniferthompson @davidlibland

I got a reply back from Merck just now regarding our use of Merck Manuals. As much as I hate to say it, it would appear we can't use their data in our work. 😢

Dear Mr. Gawarecki:

Thank you for your contacting the Merck Manuals. We have reviewed your request below and feel that your project would be better served by obtaining a more thorough list of drugs and their various uses from another source.

We appreciate your interest in the Manuals and wish you the best with your project.

Kind regards,
Sheryl Olinsky Borg

@jenniferthompson
Contributor

Oh, that's a bummer. Grateful for a quick response from them.

@peter0083

man...... I thought Merck would be more generous with us because we were non-profit. 😞

@skirmer
Member

skirmer commented Feb 22, 2017

Aw, well, maybe write back and ask her what other source she recommends?

@mattgawarecki
Contributor Author

I think it's safe to assume the discussions in this ticket have led to a number of new tickets around linking drug names and uses. Of course, the Merck Manual didn't pan out the way we'd hoped, but other data and matching efforts have fared much better.

Closing this issue so we can focus on using the data we've found.
