Making `labelstudio-export.json` filename configurable #36

Dtphelan1 · 2024-06-06T19:33:29Z

Motivation

In the llm-symptom-study experiments I'm running (and I imagine many experimental setups) I have two sets of relevant labels for our experiment – those used in tuning, and those used in testing our final models. My solution for easily switching between those two chart-review evaluation steps is to have two export files – labelstudio-export.json.tuning and labelstudio-export.json.test – and manually copy over the relevant file into labelstudio-export.json before running chart-review in the project directory.

It's not the end of the world, but it would be nice to be able to specify – either as a runtime command or as an option in the config – the name of the relevent labelstudio-export file, defaulting to labelstudio-export.json. Happy to discuss more, but wanted to write this struggle down as I was having it 😄

chart-review/chart_review/cohort.py

Line 28 in 41fcc7d

saved = common.read_json(self.config.path("labelstudio-export.json"))

The text was updated successfully, but these errors were encountered:

mikix · 2024-06-07T12:20:23Z

Interesting, and reasonable.

Currently, we have two different "relocation" features:

--project-dir: basically a "change working dir" option. Many things are looked up relative to the project dir, like the default config file and external annotations.
--config: an alternate config file, designed for use cases like different label setups but still working with the same data in the same project dir.

So thinking out loud, how would a dynamically named export file best fit in...

You say you have two different sets of labels, but just for clarity when you say labels there, you are talking about two different set of "applied labels" (annotations / reviewed charts) yeah? And you're still using the same label setup for both.

An option in the config for this would slot into the existing --config workflow. But would that be annoying? How many of these exports do you have, and is that the only difference between your runs? (i.e. would you have the exact same config file, just with one line different?) -- That might be fine, just trying to get a sense of scope.

Dtphelan1 · 2024-06-07T12:49:22Z

You say you have two different sets of labels, but just for clarity when you say labels there, you are talking about two different set of "applied labels" (annotations / reviewed charts) yeah? And you're still using the same label setup for both.

Correct. Being more specific

One set of "labels" in the category sense – both labelstudio-export.json files ultimately speak the same language in terms of target labels. But there are;
Two sets of applied labels, with varying annotators and note ids because;
Two distinct sets of notes, avoiding overlap in training/tuning sets with our ultimate test set. Which to be clear also means;
Two configs are used as well, which happen to have the same 'labels' section within them, but different annotators and different note-ranges.

Right now I provide the --config option to specify which of the two config files I'm using, which is why my intuition was to have some similar CLI option for specifying what the labelstudio-export.json is named. Constraining myself to what's available today, maybe the actual guidance might be to have two subdirectories wherever I'm running this calculation, one for each dataset/labelstudio-export/config, and specify the directory as my runtime CLI option instead?

I can work with that on my end, but from a UX perspective I'm clarifying that when I saw "oh I can specify a directory altogether if I'm running this command globally or something, and oh I can specify the config file as needed", that I thought "surely I can do the same for the labelstudio-export.json file?" Also just thinking out-loud.

mikix · 2024-06-07T14:06:53Z

Hmm. I'm sympathetic to the idea that "magic" filename locations is not great - so labelstudio-export.json should probably be configurable somewhere.

I might prefer a config file field (like export-file:) for that then? To keep the number of ways your whole config can pivot down to just the --config flag. Especially since in your case, they need different configs anyway. The stuff in the config is closely tied to the contents of the export file, so a config pointing at its own export file makes some sense to my brain.

BUT... for your case, it does sound like there's little enough overlap between everything and you really are more of a "two different --project-dir flags" kind of setup, with whole separate folders.

Dtphelan1 · 2024-06-07T14:32:08Z

BUT... for your case, it does sound like there's little enough overlap between everything and you really are more of a "two different --project-dir flags" kind of setup, with whole separate folders.

I agree 😄

I might prefer a config file field (like export-file:) for that then. To keep the number of ways your whole config can pivot down to just the --config flag.

I think that sounds reasonable! It is a config-file after all, it makes sense that more granular configuration would live there. I'm not too opinionated on naming, so whatever makes the most sense to you!

Thanks Mike 👍

mikix added the enhancement New feature or request label Jun 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Making `labelstudio-export.json` filename configurable #36

Making `labelstudio-export.json` filename configurable #36

Dtphelan1 commented Jun 6, 2024 •

edited

Loading

mikix commented Jun 7, 2024

Dtphelan1 commented Jun 7, 2024

mikix commented Jun 7, 2024

Dtphelan1 commented Jun 7, 2024

Making labelstudio-export.json filename configurable #36

Making labelstudio-export.json filename configurable #36

Comments

Dtphelan1 commented Jun 6, 2024 • edited Loading

Motivation

mikix commented Jun 7, 2024

Dtphelan1 commented Jun 7, 2024

mikix commented Jun 7, 2024

Dtphelan1 commented Jun 7, 2024

Making `labelstudio-export.json` filename configurable #36

Making `labelstudio-export.json` filename configurable #36

Dtphelan1 commented Jun 6, 2024 •

edited

Loading