Making labelstudio-export.json filename configurable #36

Open
Dtphelan1 opened this issue Jun 6, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@Dtphelan1

Dtphelan1 commented Jun 6, 2024

Motivation

In the llm-symptom-study experiments I'm running (and I imagine many experimental setups) I have two sets of relevant labels for our experiment – those used in tuning, and those used in testing our final models. My solution for easily switching between those two chart-review evaluation steps is to have two export files – labelstudio-export.json.tuning and labelstudio-export.json.test – and manually copy over the relevant file into labelstudio-export.json before running chart-review in the project directory.

It's not the end of the world, but it would be nice to be able to specify – either as a runtime command or as an option in the config – the name of the relevant labelstudio-export file, defaulting to labelstudio-export.json. Happy to discuss more, but wanted to write this struggle down as I was having it 😄

saved = common.read_json(self.config.path("labelstudio-export.json"))
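Until the filename is configurable, the manual copy step described above could be scripted. The following is only a minimal sketch of that workaround: the `activate_export` helper and its `variant` argument are hypothetical names, not part of chart-review, and it assumes the exports are stored as labelstudio-export.json.tuning and labelstudio-export.json.test next to the file chart-review reads.

```python
import shutil
from pathlib import Path

# The fixed filename that chart-review reads, per the line quoted above.
EXPORT_NAME = "labelstudio-export.json"


def activate_export(project_dir: str, variant: str) -> Path:
    """Copy labelstudio-export.json.<variant> over labelstudio-export.json.

    Hypothetical helper: `variant` is a suffix such as "tuning" or "test".
    """
    root = Path(project_dir)
    source = root / f"{EXPORT_NAME}.{variant}"
    if not source.is_file():
        raise FileNotFoundError(f"No export for variant {variant!r}: {source}")
    target = root / EXPORT_NAME
    # Overwrites whichever export was active before.
    shutil.copyfile(source, target)
    return target
```

Calling `activate_export(".", "test")` before running chart-review would reproduce the manual copy, with the same drawback that the active export is overwritten each time.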

@mikix
Contributor

mikix commented Jun 7, 2024

Interesting, and reasonable.

Currently, we have two different "relocation" features:

  • --project-dir: basically a "change working dir" option. Many things are looked up relative to the project dir, like the default config file and external annotations.
  • --config: an alternate config file, designed for use cases like different label setups but still working with the same data in the same project dir.
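As a rough illustration of how these two options relate, here is a minimal sketch; it is not chart-review's actual code. The `build_parser` and `resolve_paths` helpers are hypothetical, the default config name `config.yaml` is an assumption, and so is the choice to use a `--config` path exactly as given rather than resolving it against the project dir.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors only the two flags discussed above.
    parser = argparse.ArgumentParser(prog="sketch")
    parser.add_argument("--project-dir", default=".")
    parser.add_argument("--config", default=None)
    return parser


def resolve_paths(argv: list[str]) -> dict[str, Path]:
    args = build_parser().parse_args(argv)
    project_dir = Path(args.project_dir)
    # Assumption: without --config, the config is looked up in the project dir.
    config = Path(args.config) if args.config else project_dir / "config.yaml"
    # The export filename is fixed, so only the project dir can relocate it.
    export = project_dir / "labelstudio-export.json"
    return {"config": config, "export": export}
```

In this sketch, `--config` can change which config is loaded, but nothing short of `--project-dir` changes which export file is read, which is the gap this issue describes.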

So thinking out loud, how would a dynamically named export file best fit in...

You say you have two different sets of labels, but just for clarity, when you say labels there, you are talking about two different sets of "applied labels" (annotations / reviewed charts), yeah? And you're still using the same label setup for both.

An option in the config for this would slot into the existing --config workflow. But would that be annoying? How many of these exports do you have, and is that the only difference between your runs? (i.e. would you have the exact same config file, just with one line different?) -- That might be fine, just trying to get a sense of scope.

@Dtphelan1
Author

> You say you have two different sets of labels, but just for clarity when you say labels there, you are talking about two different sets of "applied labels" (annotations / reviewed charts) yeah? And you're still using the same label setup for both.

Correct. To be more specific:

  • One set of "labels" in the category sense – both labelstudio-export.json files ultimately speak the same language in terms of target labels. But there are:
  • Two sets of applied labels, with varying annotators and note IDs, because there are:
  • Two distinct sets of notes, avoiding overlap between the training/tuning sets and our ultimate test set. Which, to be clear, also means:
  • Two configs are used as well, which happen to have the same 'labels' section within them, but different annotators and different note ranges.

Right now I provide the --config option to specify which of the two config files I'm using, which is why my intuition was to have some similar CLI option for specifying what the labelstudio-export.json is named. Constraining myself to what's available today, maybe the actual guidance might be to have two subdirectories wherever I'm running this calculation, one for each dataset/labelstudio-export/config, and specify the directory as my runtime CLI option instead?

I can work with that on my end. From a UX perspective, though, I want to clarify my thinking: when I saw that I could specify a whole directory (say, if I'm running this command globally) and that I could specify the config file as needed, I thought, "surely I can do the same for the labelstudio-export.json file?" Also just thinking out loud.

@mikix
Contributor

mikix commented Jun 7, 2024

Hmm. I'm sympathetic to the idea that "magic" filename locations are not great - so labelstudio-export.json should probably be configurable somewhere.

I might prefer a config file field (like export-file:) for that then? To keep the number of ways your whole config can pivot down to just the --config flag. Especially since in your case, they need different configs anyway. The stuff in the config is closely tied to the contents of the export file, so a config pointing at its own export file makes some sense to my brain.
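To make the config-field idea concrete, here is a minimal sketch of how such a lookup might behave. It is an illustration under assumptions, not chart-review's implementation: the `export_path` helper is hypothetical, the `export-file` key is only the name floated above, and the config is assumed to have already been parsed into a plain dict.

```python
from pathlib import Path

# Current fixed filename, kept as the fallback so existing projects keep working.
DEFAULT_EXPORT = "labelstudio-export.json"


def export_path(config: dict, project_dir: str = ".") -> Path:
    """Return the export file a parsed config points at.

    Hypothetical helper: uses the proposed `export-file` key when it is set,
    and otherwise falls back to the default filename.
    """
    name = config.get("export-file") or DEFAULT_EXPORT
    # Assumption: like other project files, the export is looked up
    # relative to the project dir.
    return Path(project_dir) / name
```

With this shape, each config carries its own export filename, so switching datasets stays a single `--config` choice.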

BUT... for your case, it does sound like there's little enough overlap between everything and you really are more of a "two different --project-dir flags" kind of setup, with whole separate folders.

@Dtphelan1
Author

> BUT... for your case, it does sound like there's little enough overlap between everything and you really are more of a "two different --project-dir flags" kind of setup, with whole separate folders.

I agree 😄

> I might prefer a config file field (like export-file:) for that then. To keep the number of ways your whole config can pivot down to just the --config flag.

I think that sounds reasonable! It is a config file after all, so it makes sense that more granular configuration would live there. I'm not too opinionated on naming, so whatever makes the most sense to you!

Thanks Mike 👍

@mikix mikix added the enhancement New feature or request label Jun 7, 2024
2 participants