Olly Butters edited this page Jun 4, 2020 · 9 revisions

The file tree looks a little like:

- cache*
-- <short_name>
--- processed
---- cleaned   <- Gold standard clean data
---- merged    <- Intermediate files
--- raw        <- Raw copies of cached data
---- doi
---- pubmed
---- scopus
---- zotero
--- geodata    <- Derived geocoded organisations
- config       <- Config files
- data*        <- Output CSV files etc.
-- <short_name>
- html*        <- Output HTML files
-- <short_name>
- logs*
- source        <- All the source code
-- add          <- Add extra metadata (geocode, citations etc)
-- analyse      <- Do some stats
-- bibliography <- Create bibliographic files
-- clean        <- Clean up the metadata
-- config       <- Parse the config file
-- get          <- Get the metadata
-- networks     <- Generate author network graph
-- plots        <- Generate some plots of the exported data
-- setup        <- Build the file tree (deleting caches as appropriate)
-- web_pages    <- Make the static html files

Folders marked with a * are generated when the program runs and can be safely deleted, although deleting the cache will make the next run take much longer because the raw data has to be fetched again.
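
As a rough illustration of the layout above, the generated folders could be created and removed along these lines. This is a minimal sketch and not the project's actual setup code: the function names `generated_dirs`, `build_tree` and `remove_generated`, and the `keep_cache` parameter, are hypothetical and only mirror the tree shown on this page.

```python
from pathlib import Path
from typing import List
import shutil

# Raw cache sources, as listed in the tree on this page.
RAW_SOURCES = ["doi", "pubmed", "scopus", "zotero"]


def generated_dirs(root: Path, short_name: str) -> List[Path]:
    """Return the generated (starred) leaf folders for one short name.

    Hypothetical helper: the folder names come from the tree above,
    the function itself is an assumption.
    """
    cache = root / "cache" / short_name
    dirs = [
        cache / "processed" / "cleaned",  # gold standard clean data
        cache / "processed" / "merged",   # intermediate files
        cache / "geodata",                # derived geocoded organisations
        root / "data" / short_name,       # output CSV files
        root / "html" / short_name,       # output HTML files
        root / "logs",
    ]
    dirs += [cache / "raw" / source for source in RAW_SOURCES]
    return dirs


def build_tree(root: Path, short_name: str) -> None:
    """Create every generated folder, leaving existing ones untouched."""
    for directory in generated_dirs(root, short_name):
        directory.mkdir(parents=True, exist_ok=True)


def remove_generated(root: Path, keep_cache: bool = True) -> None:
    """Delete generated folders; the cache is kept unless asked otherwise."""
    names = ["data", "html", "logs"]
    if not keep_cache:
        names.append("cache")  # the next run will be much slower
    for name in names:
        shutil.rmtree(root / name, ignore_errors=True)
```

Keeping the cache by default reflects the note above: `data`, `html` and `logs` are cheap to regenerate, whereas the cached raw data is not.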