Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a CLI, add some light unit testing, and change config from Python to Yaml/Json #2

Merged
merged 7 commits into from
Oct 13, 2023

Conversation

mikix
Copy link
Contributor

@mikix mikix commented Oct 12, 2023

OK on my first pass at interacting with the chart-review code, I wrote down some possible improvements here: #1

This is the PR to solve them. Specifically:

  • Add a wrapping CLI so that researchers don't need to touch Python if they don't want to.
    • This CLI currently only offers one subcommand: accuracy which does the 3-way accuracy calculation from paper.py in the covid folder. 🤷 it seemed like a reasonable place to start, but we should talk about what top-level operations make sense.
  • Add some very brief initial unit tests (just one method right now)
  • Biggest change is converting the config file format from Python to Yaml/Json
    • Researchers will touch this, so static config is easier to understand/explain and there's less room to shoot yourself in the foot
    • It's awkward to import arbitrary Python from a random project dir safely/wisely
    • I'm generally preferring Yaml where possible because you can have comments in it, but config.json will work too.
  • Bumped minimum python to 3.10 -- just for convenience of using some nicer type hinting it has. (Cumulus ETL requires 3.10, so this doesn't seem crazy to me)

I've split this PR up into different commits, which should hopefully make it easier to read. But there's still a fair bit.

Also deletes a duplicate method and refreshes pyproject.toml
CLI:
- New `chart-review` script gets installed along with Python module.
- One sub-command right now: `accuracy` which calculates accuracy
  matrixes across labels for two reviewers and a base third

Config:
- Switch away from Python config files and towards yaml/json files.
- I've added yaml versions of the two studies in the repo, as examples.
This will make it easier for someone just using the python to call it,
if they want to.
@@ -0,0 +1,41 @@
"""Methods for high-level accuracy calculations."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is basically a generic version of the calculation in paper.py

Chart Review operates on a project folder that holds your config & data.
1. Make a new folder.
2. Export your Label Studio annotations and put that in the folder as `labelstudio-export.json`.
3. Add a `config.yaml` file (or `config.json`) that looks something like this (read more on this format below):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea and I strongly prefer config.json over yaml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah fair - but note:

  • Json is technically a subset of yaml. (That is, a yaml parser can also read json)
  • So what I've done here is use a yaml parser and look for both config.yaml and config.json -- it will read either one
  • The reason I personally prefer yaml for config files is that you can have comments, which are often very useful for explaining why a config is the way it is (and also json can be annoyingly fussy about stuff like trailing commas, but that's less important than the comments thing)

So the way I made this PR, either yaml or json works - whichever the researcher in question is more comfy with.

How do you feel about that? (Or do you feel like standardizing on a specific syntax is worth disallowing yaml?)

@mikix mikix merged commit 8aab0c2 into main Oct 13, 2023
1 check passed
@mikix mikix deleted the mikix/cli branch October 13, 2023 19:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants