
Allow multiple models #60

Closed
marcverhagen opened this issue Jan 25, 2024 · 10 comments
Labels
▶️F Migrate to next phase ✨N New feature or request

Comments

@marcverhagen
Contributor

Because

The SWT app now ships with a fixed set of frame types that it recognizes. It is very possible that a user wants to choose a different set of types (as is happening right now with a request to add bars in #53).

Would it make sense to ship the app with multiple models and have a parameter for selecting a model?
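For illustration, such a parameter could be declared in the app's metadata roughly like this. This is a minimal sketch assuming clams-python's AppMetadata.add_parameter API; the parameter name and model names are hypothetical:

```python
# Sketch only: declaring a model-selection runtime parameter,
# assuming clams-python's AppMetadata.add_parameter API.
from clams.appmetadata import AppMetadata

def add_model_param(metadata: AppMetadata) -> None:
    metadata.add_parameter(
        name="model",
        description="Which bundled CV model to use for frame classification.",
        type="string",
        choices=["default", "with-bars"],  # hypothetical model names
        default="default",
    )
```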

Done when

No response

Additional context

No response

@marcverhagen marcverhagen added ▶️F Migrate to next phase ✨N New feature or request labels Jan 25, 2024
@clams-bot clams-bot added this to apps Jan 25, 2024
@github-project-automation github-project-automation bot moved this to Todo in apps Jan 25, 2024
@keighrim
Member

Unless we figure out a clever way to separate app releases from model releases, I don't think this is worth trying. Empirically, we already experimented with a "model picker" runtime parameter when we were working on the CLIP-based trainable app, but it was a hack job doing the best with what we had (and still have) without adding a new model-specific release scheme, and we found that it added too much complication and was never well received by potential users.

Given that, I'd actually like to suggest we go the opposite way and make the app metadata even more specific by explicitly enumerating frameType property values (based on the coupled CV model) of TimeFrame output objects. Note that ATM giving a list of property values to I/O metadata is not possible, but support will come when clamsproject/clams-python#189 is resolved.
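For concreteness, the output section of the app metadata could then pin the label set down roughly like this; a sketch of the JSON shape only (written here as a Python dict), with label names illustrative, drawn from the SWT discussion and #53:

```python
# Sketch of an app-metadata "output" entry with frameType values
# enumerated explicitly; label names are illustrative.
timeframe_output = {
    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
    "properties": {
        # the labels the coupled CV model can emit
        "frameType": ["bars", "slate", "chyron", "credits"],
    },
}
```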

@marcverhagen
Contributor Author

Maybe it is worth thinking about whether some version of discriminators as used by LAPPS would be relevant here.

@keighrim keighrim mentioned this issue Feb 5, 2024
@keighrim
Member

keighrim commented Feb 5, 2024

some version of discriminators as used by LAPPS

As far as I remember, the purpose of the discriminators in the lappsgrid vocab (except for some namespace subtypes that were reserved for JSON-LD contexts) was mainly to allow LAPPS apps to do non-LIF I/O. But as we discussed very early in the project (clamsproject/mmif#6), CLAMS pipelines do not allow hybrid I/O, for interoperability's sake, so I guess what you meant was a very different use case from the LIF/discriminator usage. Could you elaborate on this use case?


Note that ATM giving a list of property values to I/O metadata is not possible, but support will come when clamsproject/clams-python#189 is resolved.

This actually wasn't true, and since the linked issue was closed as "wont-fix", the frameType specification of output labels was added in a pending PR instead (#63 , 61aba225c1502af1ebf6053fc1b86d6d45a2db82).

@marcverhagen
Contributor Author

Elaborating...

Discriminators in LAPPS were indeed partially about defining media types, including non-LIF formats like UIMA and GATE. But there was more to them: they also included all elements from the LAPPS vocabulary, as well as definitions of tag sets for POS (https://vocab.lappsgrid.org/ns/tagset/pos), NER, and dependencies. The latter use is what I was thinking of. Discriminators are really abbreviations for all kinds of information, and they provide a URL as a landing spot where that information can be spelled out.

So for a multi-model app, the specifications may be able to refer to one or more discriminators which spell out the details of the models. Whether discriminators are the right approach for this is not clear.

@keighrim
Member

keighrim commented Feb 7, 2024

If I understood this correctly, under this proposal,

  1. app developers are either
    1. responsible for sending a PR with the discriminator and the label-set definition to the CLAMS vocab maintainers whenever they develop a new model with a new set of labels, and have to wait until the PR is merged and published before they can release a new version of the app without an invalid discriminator URI for the label set
    2. responsible for hosting their own vocab-like public URI registry and publishing their own label sets there, and don't have to wait for the label set to become part of the official CLAMS vocab before releasing their apps
    3. just sticking to label sets predefined in the CLAMS vocabulary (if we have some in the future) when they design a new model
  2. app users are responsible for grabbing one of those model/labelset URIs (probably enumerated in the app metadata on the app directory) and passing the URI as part of the URL query string when they curl (or equivalent), which implies properly escaping the embedded URI (see the sketch below).

Is this correct?
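On item 2, the escaping itself is mechanical; a minimal Python sketch, where the parameter name labelset is made up for illustration:

```python
from urllib.parse import urlencode

# Hypothetical: pass a label-set discriminator URI as a query
# parameter; urlencode percent-escapes the embedded URI.
labelset_uri = "https://vocab.lappsgrid.org/ns/tagset/pos"
query = urlencode({"labelset": labelset_uri})
print(f"http://localhost:5000/?{query}")
# http://localhost:5000/?labelset=https%3A%2F%2Fvocab.lappsgrid.org%2Fns%2Ftagset%2Fpos
```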

@marcverhagen
Contributor Author

I am not yet proposing this, just floating the idea. It will take some work to introduce discriminators, and there does not seem to be any hurry here.

Items 1.i through 1.iii seem correct and in line with how annotation types are handled. For 2, we could have a parameter that picks the model, whose possible values are an enumeration of the relevant discriminators, which could indeed be put somewhere in the metadata.

@keighrim
Member

keighrim commented Mar 4, 2024

Regarding the label sets of different models, in my previous comment, I was mixing two different problems.

  • specifying "labelset" in the app metadata and output MMIF
  • variable "labelset" values based on model selection

The first problem is (mostly) solved by #41 and clamsproject/mmif#218.

The second problem is probably more relevant to this issue, and after considering the type-matching problem (clamsproject/clams-python#194), I realized it's much harder to solve than simply adding a runtime parameter for model selection.

From a comment in the linked clams-python issue:

Now, while we can infer just from the TimeFrame::labelset spec in the app metadata of the SWT that it generates MMIF that's suitable for this OCR app, it might not be so obvious to a type-theoretical workflow engine if some intelligent type-coercion system is missing.

But if we allow users of this app (or generalized classifier apps) to select from models with different labelsets, then the app developer CAN'T specify a fixed labelset value in the app metadata output section; in turn, the "inference" in the comment above cannot work, making a type-matching system impossible to implement.

@marcverhagen
Contributor Author

Yeah, there is a problem there. In the current develop branch the added models have post-bin categories like other_opening that are not defined in the metadata.

One way to keep things consistent would be to require that the labelset in the metadata is the union of all labelsets in the models. This is not too bad a restriction, and it usefully discourages a model from just coming up with some wacko post-binning. Adding a new model with new categories would then require updating the metadata.
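A simple consistency check along these lines (model names and label sets below are made up; a real check would read them from the bundled model configs):

```python
# Sketch: the declared metadata labelset must equal the union of
# the per-model labelsets; names below are illustrative.
model_labelsets = {
    "default": {"bars", "slate", "chyron", "credits"},
    "with-openings": {"bars", "slate", "chyron", "credits", "other_opening"},
}

def validate_labelset(metadata_labelset: set) -> None:
    union = set().union(*model_labelsets.values())
    # a model sneaking in an undeclared post-bin category fails here
    assert metadata_labelset == union, (
        f"metadata labelset {metadata_labelset} != model union {union}")
```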

@keighrim
Member

keighrim commented Mar 4, 2024

Adding a new model with new categories would require updating the metadata.

This has always been the case, so it shouldn't be a problem. However, since we are now talking about separating out the stitcher part and (in the process of doing so) parameterizing the "post-binning", I'm not even sure we can use the labelset property in the app metadata at all.

@keighrim
Member

Multiple built-in models and model selection via a runtime parameter were added in v4.1, and the labelset property became configurable via a runtime parameter in v5.0.

In conclusion:

  • multiple models are allowed (with a default model).
  • a model can be selected at runtime.
  • the labelset for TimePoint is hard-coded in each model, but currently all models use the same set (specified in the app metadata).
  • the labelset for TimeFrame is fully configurable, hence not specified in the app metadata.
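For example, invoking the app with a model selected at runtime could look like the sketch below; the parameter and model names are guesses for illustration, and the authoritative names are listed in the app metadata:

```python
# Illustration only: POST a MMIF file to a locally running app,
# selecting a model via a query parameter (names are guesses).
import requests

with open("input.mmif") as f:
    mmif_in = f.read()

resp = requests.post(
    "http://localhost:5000/",
    params={"model": "default"},  # hypothetical model name
    data=mmif_in,
)
print(resp.text)  # MMIF with TimeFrame/TimePoint annotations
```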

Closing the issue as complete.

@github-project-automation github-project-automation bot moved this from Todo to Done in apps Jun 17, 2024