This README is a template of what an import's README might look like.
Author: <github_handle>
The [CSV | XLSX | TXT | etc.] file is available for download from <download_url>.
Explain as much as you need so that others can understand what the dataset's variables are without digging into the data and documentation. Here's an example skeleton you may find helpful:
This dataset is broken up into 3 major families of variables:
- XXX: <description_of_XXX>
- YYY: <description_of_YYY>
- ZZZ: <description_of_ZZZ>
XXX is further broken down into:
- a: <description_of_a>
- b: <description_of_b>
YYY is further broken down into:
- aa: <description_of_aa>
- bb: <description_of_bb>
- cc: <description_of_cc>
ZZZ is further broken down into:
- aaa: <description_of_aaa>
- bbb: <description_of_bbb>
Some example notes/caveats:
- This dataset considers Honolulu County a city.
- This dataset is best-effort and occasionally contains decreasing values for cumulative count statistics.
- This dataset's documentation warns users not to compare across years due to <some_statistical_reason>.
An example license summary:
This dataset is made available for all commercial and non-commercial use under the FooBar Agreement.
The license is available online at <license_url>.
- Documentation: <documentation_url>
- Data Visualization UI: <some_other_url>
- <file_name_with_hyperlink>: <file_description>
- <file_name_with_hyperlink>: <file_description>
- <file_name_with_hyperlink>: <file_description_if_not_obvious_from_name>
- <file_name_with_hyperlink>: <file_description_if_not_obvious_from_name>
- <file_name_with_hyperlink>: <file_description_if_not_obvious_from_name>
- <file_name_with_hyperlink>: <file_description_if_not_obvious_from_name>
- <file_name_with_hyperlink>: <file_description_if_not_obvious_from_name>
- <file_name_with_hyperlink>: <file_description_if_not_obvious_from_name>
- <file_name_with_hyperlink>: <file_description>
- <file_name_with_hyperlink>: <file_description>
Include any import-related notes here. Here's an example:
Starting 2020-06-25, two new columns were added:
- abc
- def
These two variables have not been integrated into the import yet.
For this section, imagine walking someone through the procedure of regenerating all artifacts in sequential order.
Include any steps, checks, and scripts you used to validate the source data. If you perform checks inside the processing script, simply make a note here. Here is an example of what someone might have done while getting familiar with and validating the source data:
Manual validation:
- Examined the raw CSV and did not identify any ill-formed values.
- Plotted a few columns with matplotlib for visual inspection by running `python3 plot_samples.py`; a minimal sketch of such a script is shown below.
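This sketch is illustrative only and assumes the raw file loads cleanly with pandas; `raw_download.csv` and the column names are placeholders, not actual artifacts from the source:

```python
# plot_samples.py -- illustrative sketch only; raw_download.csv and the
# column names below are placeholders for the actual dataset.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("raw_download.csv")
for column in ["Value", "CumulativeCount"]:   # columns chosen for inspection
    df[column].plot(title=column)             # quick line plot of the column
    plt.savefig(f"{column}_sample.png")       # save one figure per column
    plt.clf()                                 # reset the figure for the next plot
```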
Automated validation:
- In the processing script (next section), there is an assert to check that all expected columns exist in the CSV.
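As an illustration, such a check inside the processing script might look like the following; `EXPECTED_COLUMNS` and the input filename are placeholders for the real schema:

```python
# Illustrative column check inside the processing script; EXPECTED_COLUMNS and
# raw_download.csv are placeholders for the real schema and input file.
import pandas as pd

EXPECTED_COLUMNS = {"Place", "Date", "Value"}

df = pd.read_csv("raw_download.csv")
missing = EXPECTED_COLUMNS - set(df.columns)
assert not missing, f"Raw CSV is missing expected columns: {sorted(missing)}"
```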
Include any commands for running your scripts. This is especially relevant if your code relies on command line options. Also note that you may have kept the data download and cleaning in separate scripts. Here's an example:
`statvar_filename.mcf` was handwritten.
To generate `template_filename.tmcf` and `data_filename.csv`, run `python3 process_csv.py`.
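As a rough, hedged sketch of what a cleaning script like `process_csv.py` could contain (the input filename, cleaning step, and template MCF body are all placeholders; the real TMCF depends entirely on the dataset's schema):

```python
# process_csv.py -- illustrative sketch only; filenames, column names, and the
# template MCF below are placeholders, not the real import artifacts.
import pandas as pd

# A generic StatVarObservation-style template; the actual node properties
# depend on the dataset being imported.
TMCF = """\
Node: E:data_filename->E0
typeOf: dcs:StatVarObservation
variableMeasured: C:data_filename->StatVar
observationAbout: C:data_filename->Place
observationDate: C:data_filename->Date
value: C:data_filename->Value
"""


def main():
    df = pd.read_csv("raw_download.csv")        # raw download (placeholder name)
    df = df.dropna(subset=["Place", "Date"])    # example cleaning step
    df.to_csv("data_filename.csv", index=False)
    with open("template_filename.tmcf", "w") as f:
        f.write(TMCF)


if __name__ == "__main__":
    main()
```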
To run the test file `process_csv_test.py`, run `python3 -m unittest process_csv_test`.
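One common pattern, sketched below under the assumption that a small expected ("golden") output file is checked into the repository, is to compare the generated CSV against that expected file; the paths are placeholders:

```python
# process_csv_test.py -- illustrative sketch; the expected-output path is a
# placeholder and assumes a small golden file is checked into the repo.
import unittest

import pandas as pd


class ProcessCsvTest(unittest.TestCase):
    def test_cleaned_csv_matches_expected(self):
        generated = pd.read_csv("data_filename.csv")
        expected = pd.read_csv("test_data/expected_data_filename.csv")
        pd.testing.assert_frame_equal(generated, expected)


if __name__ == "__main__":
    unittest.main()
```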
While writing script tests helps ensure that processing outputs are as expected, also describe any steps, checks, and scripts you used to validate the resulting artifacts. Here is an example of what someone might have done to validate the artifacts:
- Wrote and ran csv_template_mcf_compatibility_checker.py to validate that the resulting CSV and Template MCF artifacts are compatible.
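As an illustration only, such a checker might confirm that every CSV column referenced by the template MCF exists in the CSV header. The sketch below assumes TMCF column references use the `C:<table>-><column>` form; the script body is an assumption, not the actual checker:

```python
# csv_template_mcf_compatibility_checker.py -- illustrative sketch, not the
# actual script; assumes TMCF column references look like C:<table>-><column>.
import csv
import re
import sys


def check_compatibility(csv_path, tmcf_path):
    with open(csv_path, newline="") as f:
        csv_columns = set(next(csv.reader(f)))   # header row of the cleaned CSV

    with open(tmcf_path) as f:
        referenced = set(re.findall(r"C:\w+->(\w+)", f.read()))

    missing = referenced - csv_columns
    if missing:
        raise ValueError(f"TMCF references columns missing from the CSV: {sorted(missing)}")
    print("CSV and template MCF are compatible.")


if __name__ == "__main__":
    check_compatibility(sys.argv[1], sys.argv[2])
```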