Create a Data Dictionary to Describe Data, Known Issues, and Caveats #212

Open
benghancock opened this issue Jun 4, 2021 · 1 comment
Labels
documentation: Improvements or additions to documentation

Comments

@benghancock
Collaborator

To give downstream users more information about data types, representations, and known issues, I think it would be useful to build a data dictionary. This could be a plain text file, with a section for each category of data we collect and a sub-section for each field.

The entries could look something like this:

"cases"
~~~~~~~
Description:
  The number of COVID-19 cases recorded in the county for the given date

Fields:
  "date" : date
    The date of the observation

  "cases" : integer
    The number of positive COVID-19 cases observed on the date. Figures may
    be preliminary and are subject to change.

    Notes:
    [ ... notes go here ...]
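
If the entries end up being generated rather than hand-written, the layout above could be produced by a small helper. The following is only a sketch: `render_entry` and its argument shapes are hypothetical names chosen for illustration, not anything that exists in the repo, and the "Notes:" block is left out for brevity.

```python
from textwrap import fill, indent


def render_entry(name, description, fields):
    """Render one data dictionary section in the plain-text layout above.

    `fields` is a list of (field_name, type_name, description) tuples.
    All names here are hypothetical; nothing in the repo defines them.
    """
    title = f'"{name}"'
    lines = [
        title,
        "~" * len(title),  # underline matches the quoted title's width
        "Description:",
        indent(fill(description, width=76), "  "),
        "",
        "Fields:",
    ]
    for field_name, type_name, text in fields:
        lines.append(f'  "{field_name}" : {type_name}')
        lines.append(indent(fill(text, width=74), "    "))
        lines.append("")  # blank line between fields
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    print(render_entry(
        "cases",
        "The number of COVID-19 cases recorded in the county for the given date",
        [("date", "date", "The date of the observation")],
    ))
```

Writing the entries by hand works just as well; a helper like this only matters if consistent ASCII styling across many sections becomes tedious to maintain.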

Each of the data scrapers contains metadata, docstrings, comments, etc., that could be valuable to help the public understand the data more clearly. The goal would be to put this information all in one place, in an easy-to-digest way. Keeping the file in plain text makes it portable, and ensures that it's human readable (with some lightweight ASCII styling).
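
As a starting point for pulling that information together, the docstrings could be harvested with the standard library's `inspect` module. This is a minimal sketch, assuming the scrapers are importable Python modules; `collect_docstrings` is a hypothetical helper, and no actual scraper module names are assumed.

```python
import inspect


def collect_docstrings(module):
    """Map each function or class defined in `module` to its cleaned docstring.

    Hypothetical helper: pass it any imported scraper module.
    """
    docs = {}
    for name, obj in inspect.getmembers(module):
        if not (inspect.isfunction(obj) or inspect.isclass(obj)):
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue  # skip names that were imported from elsewhere
        doc = inspect.getdoc(obj)  # strips indentation from the docstring
        if doc:
            docs[name] = doc
    return docs
```

The output would still need human editing before it goes into the dictionary, since comments and metadata outside docstrings are not captured this way.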

To make collaboration easier while the doc is being drafted, I suggest keeping it as a file in the base directory of the repo, on a separate branch. That way, collaborators can make changes locally and push them up, and others can comment and contribute. Keeping it as a file in the repo has the added benefit of keeping the dictionary in sync with the rest of the code base: changes to the code that affect how the data is represented should be accompanied by updates to the data dictionary.
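
One way to enforce that sync would be a small check, run in tests or CI, that flags fields present in scraper output but missing from the dictionary. The sketch below assumes field lines keep the `"name" : type` shape from the example entry; `documented_fields` and `undocumented_fields` are hypothetical names, and the sample record is made up for illustration.

```python
import re

# Matches indented field lines such as:   "cases" : integer
FIELD_LINE = re.compile(r'^\s+"(?P<name>[^"]+)"\s*:\s*\S+')


def documented_fields(dictionary_text):
    """Return the set of field names declared in the dictionary text."""
    names = set()
    for line in dictionary_text.splitlines():
        match = FIELD_LINE.match(line)
        if match:
            names.add(match.group("name"))
    return names


def undocumented_fields(record, dictionary_text):
    """Return sorted keys of `record` that the dictionary does not describe."""
    return sorted(set(record) - documented_fields(dictionary_text))


if __name__ == "__main__":
    sample = 'Fields:\n  "date" : date\n  "cases" : integer\n'
    # Illustrative record only; real scraper output may be shaped differently.
    print(undocumented_fields({"date": "2021-06-04", "cases": 3, "deaths": 0}, sample))
```

A check like this only catches missing field names; it cannot tell whether a description has gone stale, so review of the dictionary alongside code changes would still be needed.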

I'm happy to get this rolling and would appreciate any help and/or feedback!

benghancock added the documentation label Jun 4, 2021
@benghancock
Collaborator Author

I've started this as DICTIONARY.md in the base of the repo, on branch 212-create-data-dictionary.
