
Allow multiple models #60

Closed
marcverhagen opened this issue Jan 25, 2024 · 10 comments
Labels
▶️F Migrate to next phase ✨N New feature or request

Comments

@marcverhagen
Contributor

Because

The SWT app now ships with a fixed set of frame types that it recognizes. It is very possible that a user wants to choose a different set of types (as is happening right now with a request to add bars in #53).

Would it make sense to ship the app with multiple models and have a parameter for selecting a model?
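For illustration, such a parameter could be declared in the app's metadata roughly like this. This is a minimal sketch assuming clams-python's AppMetadata.add_parameter API; the parameter name and model names are hypothetical:

```python
# Sketch only: declaring a model-selection runtime parameter,
# assuming clams-python's AppMetadata.add_parameter API.
from clams.appmetadata import AppMetadata

def add_model_param(metadata: AppMetadata) -> None:
    metadata.add_parameter(
        name="model",
        description="Which bundled CV model to use for frame classification.",
        type="string",
        choices=["default", "with-bars"],  # hypothetical model names
        default="default",
    )
```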

Done when

No response

Additional context

No response

@marcverhagen marcverhagen added ▶️F Migrate to next phase ✨N New feature or request labels Jan 25, 2024
@clams-bot clams-bot added this to apps Jan 25, 2024
@github-project-automation github-project-automation bot moved this to Todo in apps Jan 25, 2024
@keighrim
Member

Unless we figure out a clever way to separate app releases from model releases, I don't think this is worth trying. Empirically, we already experimented with a "model picker" runtime parameter when we were working on the CLIP-based trainable app, but it was a hack job doing the best with what we had (and still have) without adding a new model-specific release scheme, and we found that it added too much complication and was never well received by potential users.

Given that, I'd actually like to suggest we go the opposite way and make the app metadata even more specific by explicitly enumerating frameType property values (based on the coupled CV model) of TimeFrame output objects. Note that ATM giving a list of property values to I/O metadata is not possible, but support will come when clamsproject/clams-python#189 is resolved.
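For concreteness, the output section of the app metadata could then pin the label set down roughly like this; a sketch of the JSON shape only (written here as a Python dict), with label names illustrative, drawn from the SWT discussion and #53:

```python
# Sketch of an app-metadata "output" entry with frameType values
# enumerated explicitly; label names are illustrative.
timeframe_output = {
    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
    "properties": {
        # the labels the coupled CV model can emit
        "frameType": ["bars", "slate", "chyron", "credits"],
    },
}
```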

@marcverhagen
Contributor Author

Maybe it is worth thinking about whether some version of discriminators as used by LAPPS would be relevant here.

@keighrim keighrim mentioned this issue Feb 5, 2024
@keighrim
Member

keighrim commented Feb 5, 2024

some version of discriminators as used by LAPPS

As far as I remember, the purpose of the discriminators in the lappsgrid vocab (except for some namespace subtypes that were reserved for JSON-LD contexts) was mainly to allow LAPPS apps to do non-LIF I/O. But as we discussed very early in the project (clamsproject/mmif#6), CLAMS pipelines do not allow hybrid I/O, for interoperability's sake, so I guess what you meant was a very different use case from the LIF/discriminator usage. Could you elaborate on this use case?


Note that ATM giving a list of property values to I/O metadata is not possible, but support will come when clamsproject/clams-python#189 is resolved.

This actually wasn't true, and since the linked issue was closed as "wont-fix", the frameType specification of output labels was added in a pending PR instead (#63 , 61aba225c1502af1ebf6053fc1b86d6d45a2db82).

@marcverhagen
Contributor Author

Elaborating...

Discriminators in LAPPS were indeed partially about defining media types, including non-LIF formats like UIMA and GATE. But there was more to them: they also included all elements from the LAPPS vocabulary, as well as definitions of tag sets for POS (https://vocab.lappsgrid.org/ns/tagset/pos), NER, and dependencies. The latter use is what I was thinking of. Discriminators are really abbreviations for all kinds of information, and they provide a URL as a landing spot where that information can be spelled out.

So for a multi-model app, the specifications may be able to refer to one or more discriminators which spell out the details of the models. Whether discriminators are the right approach for this is not clear.

@keighrim
Member

keighrim commented Feb 7, 2024

If I understood this correctly, under this proposal,

  1. app developers are either
    1. responsible for sending a PR with the discriminator and the label-set definition to the CLAMS vocab maintainers whenever they develop a new model with a new set of labels, and have to wait until the PR is merged and published before they can release a new version of the app without an invalid discriminator URI for the label set
    2. responsible for hosting their own vocab-like public URI registry and publishing their own label sets there, and don't have to wait for the label set to become part of the official CLAMS vocab before releasing their apps
    3. just sticking to label sets predefined in the CLAMS vocabulary (if we have some in the future) when they design a new model
  2. app users are responsible for grabbing one of those model/labelset URIs (probably enumerated in the app metadata on the app directory) and passing the URI as part of the URL query string when they curl (or equivalent), which implies properly escaping the embedded URI (see the sketch below).

Is this correct?
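On item 2, the escaping itself is mechanical; a minimal Python sketch, where the parameter name labelset is made up for illustration:

```python
from urllib.parse import urlencode

# Hypothetical: pass a label-set discriminator URI as a query
# parameter; urlencode percent-escapes the embedded URI.
labelset_uri = "https://vocab.lappsgrid.org/ns/tagset/pos"
query = urlencode({"labelset": labelset_uri})
print(f"http://localhost:5000/?{query}")
# http://localhost:5000/?labelset=https%3A%2F%2Fvocab.lappsgrid.org%2Fns%2Ftagset%2Fpos
```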

@marcverhagen
Contributor Author

I am not yet proposing this, just floating the idea. It will take some work to introduce discriminators, and there does not seem to be any hurry here.

Items 1.i through 1.iii seem correct and in line with how annotation types are handled. For 2, we could have a parameter that picks the model, whose possible values are an enumeration of the relevant discriminators, which could indeed be put somewhere in the metadata.

@keighrim
Member

keighrim commented Mar 4, 2024

Regarding the label sets of different models, in my previous comment, I was mixing two different problems.

  • specifying "labelset" in the app metadata and output MMIF
  • variable "labelset" values based on model selection

The first problem is (mostly) solved by #41 and clamsproject/mmif#218.

The second problem is probably more relevant to this issue, and after considering the type-matching problem (clamsproject/clams-python#194), I realized it's much harder to solve than simply adding a runtime parameter for model selection.

From a comment in the linked clams-python issue:

Now, while we can infer just from the TimeFrame::labelset spec in the app metadata of the SWT that it generates MMIF that's suitable for this OCR app, it might not be so obvious to a type-theoretical workflow engine if some intelligent type-coercion system is missing.

But if we allow users of this app (or generalized classifier apps) to select from models with different labelsets, then the app developer CAN'T specify a fixed labelset value in the app metadata output section; in turn, the "inference" in the comment above cannot work, making a type-matching system impossible to implement.

@marcverhagen
Contributor Author

Yeah, there is a problem there. In the current develop branch the added models have post-bin categories like other_opening that are not defined in the metadata.

One way to keep things consistent would be to require that the labelset in the metadata is the union of all labelsets in the models. This is not too bad a restriction, and it usefully discourages a model from just coming up with some wacko post-binning. Adding a new model with new categories would then require updating the metadata.
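A simple consistency check along these lines (model names and label sets below are made up; a real check would read them from the bundled model configs):

```python
# Sketch: the declared metadata labelset must equal the union of
# the per-model labelsets; names below are illustrative.
model_labelsets = {
    "default": {"bars", "slate", "chyron", "credits"},
    "with-openings": {"bars", "slate", "chyron", "credits", "other_opening"},
}

def validate_labelset(metadata_labelset: set) -> None:
    union = set().union(*model_labelsets.values())
    # a model sneaking in an undeclared post-bin category fails here
    assert metadata_labelset == union, (
        f"metadata labelset {metadata_labelset} != model union {union}")
```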

@keighrim
Member

keighrim commented Mar 4, 2024

Adding a new model with new categories would require updating the metadata.

This has always been the case, so it shouldn't be a problem. However, since we are now talking about separating out the stitcher part and (in the process of doing so) parameterizing the "post-binning", I'm not even sure we can use the labelset property in the app metadata at all.

@keighrim
Member

Multiple built-in models and model selection via a runtime parameter were added in v4.1, and the labelset property became configurable via a runtime parameter in v5.0.

In conclusion:

  • multiple models are allowed (with a default model).
  • a model can be selected at runtime.
  • the labelset for TimePoint is hard-coded in each model, but currently all models use the same set (specified in the app metadata).
  • the labelset for TimeFrame is fully configurable, hence not specified in the app metadata.
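For example, invoking the app with a model selected at runtime could look like the sketch below; the parameter and model names are guesses for illustration, and the authoritative names are listed in the app metadata:

```python
# Illustration only: POST a MMIF file to a locally running app,
# selecting a model via a query parameter (names are guesses).
import requests

with open("input.mmif") as f:
    mmif_in = f.read()

resp = requests.post(
    "http://localhost:5000/",
    params={"model": "default"},  # hypothetical model name
    data=mmif_in,
)
print(resp.text)  # MMIF with TimeFrame/TimePoint annotations
```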

Closing the issue as complete.

@github-project-automation github-project-automation bot moved this from Todo to Done in apps Jun 17, 2024