We need some sort of dataset to count mentions according to #2.
What do we really need?
A dataset on which to count the mentions.
There are several ways this could look:
A list of all normalized mentions with info on which paper they appeared in
An enriched version of CORD-19 with annotations of the normalized mention per software mention per paper, i.e., the information that `Software1` and `Software2` are both mentioned in this paper, even if `Software1` was actually mentioned as `software one` or `SW 1`, and perhaps the count of each mention per paper
A new dataset which reuses information from CORD-19 but presents it in a cleaned-up fashion, and possibly some other format
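To make the options above more concrete, here is a minimal sketch of the normalize-then-count idea in Python. Everything in it is an assumption for illustration: the alias table `ALIASES`, the helper names `normalize` and `count_mentions`, and the `(paper_id, raw_mention)` input shape are hypothetical and do not reflect how CORD-19 or any mention extractor actually structures its data.

```python
from collections import Counter, defaultdict

# Hypothetical alias table: raw mention (lowercased) -> normalized name.
ALIASES = {
    "software1": "Software1",
    "software one": "Software1",
    "sw 1": "Software1",
    "software2": "Software2",
}


def normalize(mention):
    """Return the normalized name, or the stripped mention if it is unknown."""
    # Lowercase and collapse whitespace so "  SW   1 " matches "sw 1".
    key = " ".join(mention.lower().split())
    return ALIASES.get(key, mention.strip())


def count_mentions(raw_mentions):
    """Count normalized mentions per paper.

    raw_mentions: iterable of (paper_id, raw_mention) pairs (assumed shape).
    Returns a dict mapping paper_id to a Counter of normalized names.
    """
    per_paper = defaultdict(Counter)
    for paper_id, mention in raw_mentions:
        per_paper[paper_id][normalize(mention)] += 1
    return dict(per_paper)


# Made-up example data, not taken from CORD-19.
raw = [
    ("paper-a", "software one"),
    ("paper-a", "SW 1"),
    ("paper-a", "Software2"),
    ("paper-b", "Software1"),
]
counts = count_mentions(raw)
```

The nested result covers the second option (which software appears in which paper, plus a count); flattening it into `(paper_id, normalized_name, count)` rows would give the list described in the first option.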
How can we achieve this?
Ideas welcome (Jupyter Notebook perhaps?)
+1 for Jupyter. We can have a fully automated and reproducible analysis that downloads the CSV file (or uses a refined dataset in the repository) and can be re-run on Binder: https://github.com/rse-standrewscs/python-binder-template
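As a rough sketch of what one notebook cell might do once a CSV is available, the snippet below counts in how many distinct papers each software name appears. The column names `paper_id` and `software` and the function name `papers_per_software` are assumptions made for this example, not the actual layout of any published file; an in-memory sample stands in for the downloaded CSV so the cell runs without network access.

```python
import csv
import io
from collections import Counter


def papers_per_software(csv_file):
    """Count in how many distinct papers each software name appears.

    Assumes hypothetical columns `paper_id` and `software`.
    """
    seen = set()
    totals = Counter()
    for row in csv.DictReader(csv_file):
        pair = (row["paper_id"], row["software"])
        # Count each (paper, software) pair once, however often it repeats.
        if pair not in seen:
            seen.add(pair)
            totals[row["software"]] += 1
    return totals


# Made-up sample standing in for the real CSV file.
sample = io.StringIO(
    "paper_id,software\n"
    "paper-a,Software1\n"
    "paper-a,Software1\n"
    "paper-a,Software2\n"
    "paper-b,Software1\n"
)
totals = papers_per_software(sample)
```

In a real notebook the `io.StringIO` sample would be replaced by an open file handle for the downloaded or checked-in CSV.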