Pull together the normalization information in a dataset ⛙ #3

Open
1 task
sdruskat opened this issue Mar 31, 2021 · 2 comments
Labels
required Something that needs to be done to make the hack successful

Comments

@sdruskat
Collaborator

sdruskat commented Mar 31, 2021

What do we have?

The issue

We need some sort of dataset to count mentions according to #2.

What do we really need?

  • A dataset to count the mentions on

There are several ways this could look:

  • A list of all normalized mentions with info on which paper they appeared in
  • An enriched version of CORD-19 that annotates each software mention in each paper with its normalized form, i.e.,
    the information that Software1 and Software2 are both mentioned in a paper even if Software1 actually appeared as "software one" or "SW 1", and perhaps the count of each mention per paper
  • A new dataset which reuses information from CORD-19 but presents it in a cleaned-up fashion, and possibly some other format
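
To make the first two options more concrete, here is a minimal sketch of counting normalized mentions per paper. Everything in it is an assumption for illustration: the paper IDs, the raw mention strings, the `NORMALIZED` lookup table, and the `count_mentions_per_paper` helper are hypothetical and do not reflect the actual CORD-19 schema or any existing normalization output.

```python
from collections import Counter

# Hypothetical lookup from lower-cased raw mention text to a normalized name.
NORMALIZED = {
    "software one": "Software1",
    "sw 1": "Software1",
    "software1": "Software1",
    "software2": "Software2",
}

# Hypothetical raw mentions as (paper_id, mention_text) pairs.
raw_mentions = [
    ("paper-a", "software one"),
    ("paper-a", "SW 1"),
    ("paper-a", "Software2"),
    ("paper-b", "Software1"),
]


def count_mentions_per_paper(mentions, lookup):
    """Count mentions per (paper_id, normalized_name) pair."""
    counts = Counter()
    for paper_id, text in mentions:
        key = text.strip().lower()
        # Fall back to the raw text if no normalized form is known.
        name = lookup.get(key, text.strip())
        counts[(paper_id, name)] += 1
    return counts


counts = count_mentions_per_paper(raw_mentions, NORMALIZED)
for (paper_id, name), n in sorted(counts.items()):
    print(paper_id, name, n)
```

The resulting `(paper_id, normalized_name) -> count` mapping could be written out as a flat list (first option) or merged back into the per-paper records (second option).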

How can we achieve this?

  • Ideas welcome (Jupyter Notebook perhaps?)
@olexandr-konovalov
Collaborator

+1 for Jupyter. That gives us a fully automated and reproducible analysis which downloads the CSV file (or uses a refined dataset kept in the repository) and can be re-run on Binder: https://github.com/rse-standrewscs/python-binder-template

@olexandr-konovalov
Collaborator

Still, some code should live in .py files, which are easier to keep under version control, test, etc.

Obligatory reading is https://doi.org/10.1371/journal.pcbi.1007007

There is also a tool for diffing and merging Jupyter notebooks: https://nbdime.readthedocs.io/
