COMHIS Project Template

This is a generic template for a COMHIS project. Clone and rewrite to start a project with the template.

The aim here is to create a model that enables somewhat painless internal reproduction of a project and to ease communication about a project's structure.

Most of the features here are recommendations and can be adapted as needed.

Repository name

A proposed unified naming scheme for publication-related repositories consists of the document type, the date, and the project name, joined by underscores (document type_date_project name). An example would be: article_2019_hume_history_text_reuse. The date should follow the format YYYYMMDD, with month and day optional, and would usually refer to the projected or actual end date of the project. The whole date element is also optional and should be included only if relevant, e.g. article_hume_history_text_reuse would be equally valid, as would other variations, such as including the name of the publication: article_2019_JEPS_Finnish_national_public_sphere.
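The scheme above can be sketched as a small helper. This is only an illustration: the function name `build_repo_name` and the strictness of the date check (exactly 4, 6, or 8 digits) are assumptions, not part of the template.

```python
import re

# Date element: YYYY, optionally followed by MM, optionally followed by DD.
DATE_PATTERN = re.compile(r"\d{4}(\d{2}(\d{2})?)?")


def build_repo_name(document_type, project_name, date=None):
    """Join the name elements with underscores, skipping the optional date."""
    if date is not None and not DATE_PATTERN.fullmatch(date):
        raise ValueError(f"date must be YYYY, YYYYMM or YYYYMMDD, got {date!r}")
    parts = [document_type]
    if date is not None:
        parts.append(date)
    # Whitespace in the project name is replaced by underscores.
    parts.append("_".join(project_name.split()))
    return "_".join(parts)


print(build_repo_name("article", "hume history text reuse", date="2019"))
# prints: article_2019_hume_history_text_reuse
print(build_repo_name("article", "hume history text reuse"))
# prints: article_hume_history_text_reuse
```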

Practices

Project overview documentation:
- Should reside in the project root in a README.md (this file).
- Should list people involved and their roles in the project.
Naming files and folders:
- Use all lowercase (except for established standards such as README.md and the .R filename extension).
- Separate words in file and directory names with an underscore (_), e.g. south_sea_bubble.R instead of south-sea-bubble.R or SouthSeaBubble.R.
Structure:
- Follow the directory structure laid out below.
- Include README.md in each directory documenting the contents of that directory.
  - This is especially important in data and final code directories.
- If feasible, to avoid confusion, use only a single .gitattributes and a single .gitignore file, both residing in the project root.
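The file and folder naming rules above could be checked automatically, for instance as below. This is a minimal sketch: the function names, the exception set, and the exact regular expression are assumptions chosen for illustration, and a real project may want more exceptions.

```python
import re
from pathlib import Path

# Names exempt from the lowercase rule; this set is an assumption.
ALLOWED_EXCEPTIONS = {"README.md"}
# Lowercase words separated by underscores, with optional extensions.
# Extensions may contain uppercase letters so that ".R" is accepted,
# and a leading dot is allowed for files such as .gitignore.
NAME_PATTERN = re.compile(r"\.?[a-z0-9]+(_[a-z0-9]+)*(\.[A-Za-z0-9]+)*")


def follows_convention(name):
    """Return True if a file or directory name follows the naming rules."""
    return name in ALLOWED_EXCEPTIONS or NAME_PATTERN.fullmatch(name) is not None


def find_violations(project_root):
    """Return paths under project_root whose names break the rules, skipping .git."""
    root = Path(project_root)
    return sorted(
        path
        for path in root.rglob("*")
        if ".git" not in path.relative_to(root).parts
        and not follows_convention(path.name)
    )
```

For example, `follows_convention("south_sea_bubble.R")` returns `True`, while `follows_convention("south-sea-bubble.R")` and `follows_convention("SouthSeaBubble.R")` both return `False`.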

Directory structure

The project repository structure is a variation of the formats laid out in a few data science project organization articles (see the end of this README). The code and output directories include work/ and final/ subdirectories. The work/ subdirectory is optional, but helps to keep development material separate from the polished and clean end products that should reside in the final/ directory. If the project only has "final" products, that directory level can of course be omitted too.

project_name/
├── README.md              # project overview
├── documentation/         # project documentation
├── data/
│   ├── raw/               # immutable raw input data
│   ├── work/              # intermediate data
│   └── final/             # processed data for final analysis tasks
├── code/
│   ├── templates/         # code templates shared across projects. Visualization styles etc.
│   ├── work/
│   │   ├── person1/       # a directory for each person or task
│   │   ├── person2/        
│   │   └── task1/         
│   └── final/
│       ├── task1/         # A directory for each analysis task
│       └── another_task/  # if needed.
└── output/
    ├── figures/
    │   ├── work/
    │   └── final/
    └── publications/
        ├── work/
        └── final/
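The layout above could be created with a short script such as the following. It is a sketch under assumptions: the `scaffold` function and the `DIRECTORIES` list are hypothetical names, the per-person and per-task directories are left out because they are project specific, and the placeholder README.md contents are only a suggestion.

```python
from pathlib import Path

# Leaf directories of the layout; intermediate parents are created implicitly.
DIRECTORIES = [
    "documentation",
    "data/raw",
    "data/work",
    "data/final",
    "code/templates",
    "code/work",
    "code/final",
    "output/figures/work",
    "output/figures/final",
    "output/publications/work",
    "output/publications/final",
]


def scaffold(project_root):
    """Create the directory skeleton with a placeholder README.md in each directory."""
    root = Path(project_root)
    created = {root}
    for relative in DIRECTORIES:
        target = root / relative
        target.mkdir(parents=True, exist_ok=True)
        # Record the directory and every intermediate parent below the root.
        while target != root:
            created.add(target)
            target = target.parent
    for directory in created:
        readme = directory / "README.md"
        # Existing README.md files are left untouched.
        if not readme.exists():
            readme.write_text(f"# {directory.resolve().name}\n", encoding="utf-8")
    return sorted(created)
```

Calling `scaffold("project_name")` creates the tree relative to the current working directory and can be re-run safely, since existing directories and README.md files are kept as they are.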

Logic

[documentation/]: Project meta documentation. Links to all relevant planning papers, interim notes, Google Drive folders, etc.
[data/]: Input data. Either a whole dataset or, if that is impractical, a link pointing to the data source (likely another repository). The [raw/] subdirectory should hold immutable original input data and/or references to the repositories from which it can be retrieved. [final/] holds data that has been processed into an analysis-ready format and should include a README.md pointing to the code used to produce the data. [work/] is a development directory for work-in-progress datasets, exchanging data between coding tasks, etc. Ideally, all datasets should be producible by scripts from the raw data.
[code/]: Data processing code. Finished code used for publication should be moved to the [final/] subdirectory. Organization of the development directory [work/] can vary, and the breakdown by person or task is just a suggestion. All directories, but especially [final/], should include a README.md clearly documenting what each script does.
- [code/templates/] has templates for code shared across projects, such as visualization styles.
[output/]: Both figures and publication texts/files. Divided into work and final subdirectories.
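Since each directory is expected to carry its own README.md, a small check can list the directories that still lack one. The sketch below is illustrative only: the function name `directories_missing_readme` and the decision to skip hidden directories such as .git are assumptions.

```python
from pathlib import Path


def directories_missing_readme(project_root):
    """Return directories under project_root (inclusive) that lack a README.md."""
    root = Path(project_root)
    candidates = [root] + [path for path in root.rglob("*") if path.is_dir()]
    missing = []
    for directory in candidates:
        # Skip hidden directories such as .git and anything inside them.
        if any(part.startswith(".") for part in directory.relative_to(root).parts):
            continue
        if not (directory / "README.md").is_file():
            missing.append(directory)
    return sorted(missing)
```

An empty list means every visible directory is documented; otherwise the returned paths show where a README.md still needs to be written.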

Articles on data science project git repo organization
