The scripts in this directory were used to produce derived data and plots.
aggregates all data mined from GitHub into four datasets described in the wiki. Crucially, the data is reshaped into a time-indexed format for three of those
visualises the relationship between how a repository is cited and the difference between its creation date and the publication
creates a dataset with all repositories mined from ePrints for which we manually determined the citation type. The resulting dataset contains data from ePrints as well as a label indicating whether the software was cited as created
creates one plot containing visualisations and data about all repositories. The dataset can be filtered for a subset of repositories with the --filter
creates one plot for one repository, focussing on timelined data. The code to produce these uses the raw data rather than the aggregated data produced
as this script was written
. Both scripts use the same data manipulation methods - directly plotting data produced
should result in similar graphs.
github.ipynb was used for exploratory data analysis. The most interesting visualisations were later transferred
produces visualisations illustrating the relationship between publications and GitHub links found in them.
The schemas for any produced datasets are included in the wiki.
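
As a rough illustration of what reshaping into a time-indexed format can look like, here is a minimal sketch. It is not the code used by the scripts in this directory: the function name `to_time_indexed`, the `(repository, timestamp)` input shape, the weekly granularity and the sample repositories are all assumptions made for the example. The actual schemas are the ones documented in the wiki.

```python
from collections import Counter
from datetime import date, datetime


def to_time_indexed(events):
    """Count events per (repository, week start date).

    `events` is an iterable of (repository, ISO timestamp) pairs.
    This input shape is hypothetical, not the real mined data format.
    """
    counts = Counter()
    for repo, timestamp in events:
        day = datetime.fromisoformat(timestamp).date()
        # Snap each date back to the Monday of its week (assumed granularity).
        week_start = date.fromordinal(day.toordinal() - day.weekday())
        counts[(repo, week_start)] += 1
    # Sort by repository, then by week, so the result reads as a timeline.
    return dict(sorted(counts.items()))


# Made-up sample events for two hypothetical repositories.
events = [
    ("org/repo-a", "2021-03-01T10:00:00"),
    ("org/repo-a", "2021-03-03T12:30:00"),
    ("org/repo-a", "2021-03-09T09:15:00"),
    ("org/repo-b", "2021-03-02T08:00:00"),
]
weekly = to_time_indexed(events)
for (repo, week_start), n in weekly.items():
    print(repo, week_start.isoformat(), n)
```

Keying each row by repository and period means a per-repository timeline and an all-repository overview can both be drawn from the same table.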