JAMS beyond music? #24

Open
bmcfee opened this issue Feb 7, 2015 · 8 comments

Comments

@bmcfee
Contributor

bmcfee commented Feb 7, 2015

Just opening up a separate thread here (rather than the already bloated #13): is it worth considering designing JAMS to be extensible into domains outside of music/time-series annotation?

I think the general architecture is flexible enough to make this possible with roughly zero overhead, and it might be a good idea.

From what I can tell, all that we'd have to do is restructure the schema a little so that "*Observation" is slightly more generic. We currently define two (arguably redundant) observation types that both encode tuples of (time, duration, value, confidence). It wouldn't be hard to extend this to multiple observation forms. For images with bounding-box annotations, we would have (x, x_extent, y, y_extent, value, confidence). For video, we would have (x, x_extent, y, y_extent, t, duration, value, confidence), etc.

Within the schema, nothing would really change, except that we rename "DenseObservation" to "DenseTimeObservation" (and analogously for Sparse), and then some time down the road, allow other observation schemas to be added.
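As a rough illustration of the observation forms described above, the tuples could be sketched like this. The class names `TimeObservation`, `ImageObservation`, and `VideoObservation` and the helper `extent_fields` are hypothetical and not part of the JAMS schema; only the field lists come from the proposal.

```python
from collections import namedtuple

# Current form: an interval in time (field list taken from the proposal).
TimeObservation = namedtuple(
    "TimeObservation", ["time", "duration", "value", "confidence"])

# Hypothetical image form: a bounding box instead of a time interval.
ImageObservation = namedtuple(
    "ImageObservation",
    ["x", "x_extent", "y", "y_extent", "value", "confidence"])

# Hypothetical video form: a bounding box plus a time interval.
VideoObservation = namedtuple(
    "VideoObservation",
    ["x", "x_extent", "y", "y_extent", "t", "duration", "value", "confidence"])


def extent_fields(obs_type):
    """Return the fields that localize an observation (everything but the payload)."""
    return [f for f in obs_type._fields if f not in ("value", "confidence")]


beat = TimeObservation(time=1.5, duration=0.0, value=1, confidence=None)
box = ImageObservation(x=10, x_extent=40, y=20, y_extent=30,
                       value="cat", confidence=0.9)
```

All three forms share the same (value, confidence) payload; only the fields that encode the extent differ, which is why the change to the schema could stay small.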

I don't think we need to tackle this for the immediate (next) release, except insofar as we can design to support it in the future in a backwards-compatible way.

Opinions?

@urinieto
Contributor

Yes, I thought of the "image" application of JAMS as well. We should do a bit of research to know what people in image/video processing use to annotate their datasets. But I like the idea of extending this, at least for a future release after the first "official" one.

@justinsalamon
Contributor

I think a lower-hanging fruit would be non-music audio datasets (e.g. environmental sounds). I'm probably biased, but I feel this is an area where the need for annotated datasets is growing rapidly and would require minimal (or zero?) additional work to accommodate, right? Oh, and there's speech too...

@ejhumphrey
Collaborator

Forgive overlap with other issues that escape my memory, but it seems lyrics fall into this conversation too, no?

On 11 Feb 2015 11:29, "Justin Salamon" wrote:

I think a lower-hanging fruit would be non-music audio datasets (e.g. environmental sounds). I'm probably biased, but I feel this is an area where the need for annotated datasets is growing rapidly and would require minimal (or zero?) additional work to accommodate, right? Oh, and there's speech too...

@bmcfee
Contributor Author

bmcfee commented Feb 11, 2015

I think a lower-hanging fruit would be non-music audio datasets (e.g. environmental sounds).

It depends on what the annotations look like, but I would expect most of this data to look like tag_* annotations, just like we already support. The point I was getting at originally is not so much the domain of the data, but the way in which the extent of an annotation is encoded.

Forgive overlap with other issues that escape my memory, but it seems lyrics fall into this conversation too, no?

We already support lyrics with the current schema. Afaik, nothing needs to change?

@bmcfee
Contributor Author

bmcfee commented Jul 14, 2017

I was thinking about this today while talking to some folks working on speech / general audio. One of the issues there is that our metadata schema might not be appropriate for non-music annotations.

I think this issue could actually be merged with #98 / a schema refactor that promotes all jams classes to top-level definitions.

The reasoning here is that if we move FileMetadata up a level, we can then have that as a base class that's inherited by things like MusicMetadata, SpeechMetadata, etc. The JAMS schema would then allow an annotation to have metadata belonging to any of those particular formats. This is a pretty minimal change, and would be backward-compatible, and open JAMS up to a broader class of applications.
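A minimal sketch of that inheritance idea, assuming Python dataclasses. How fields are split between the base class and the subclasses, the `speaker` and `language` fields, and the `describe` helper with its `metadata_type` key are all assumptions made for illustration, not the actual JAMS definitions.

```python
from dataclasses import dataclass, asdict


@dataclass
class FileMetadata:
    # Domain-independent fields (assumed split, for illustration only).
    title: str = ""
    duration: float = 0.0


@dataclass
class MusicMetadata(FileMetadata):
    # Music-specific fields (assumed).
    artist: str = ""
    release: str = ""


@dataclass
class SpeechMetadata(FileMetadata):
    # Speech-specific fields (hypothetical).
    speaker: str = ""
    language: str = ""


def describe(meta: FileMetadata) -> dict:
    """Serialize any metadata subclass and record which format it uses."""
    record = asdict(meta)
    record["metadata_type"] = type(meta).__name__
    return record


talk = SpeechMetadata(title="Interview", duration=312.0, speaker="A", language="en")
record = describe(talk)
```

Code that only needs the shared fields can accept any `FileMetadata`, so existing music-only consumers would keep working.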

Similarly, we could abstract the Observation type into things like Observation1D and Observation2D, which would have temporal (time, duration) and spatial (x, x_extent, y, y_extent) localization fields, respectively. This again would broaden the utility of JAMS beyond music/audio, and make it applicable for things like images and video, without much effort on our end.

What do folks think? @ejhumphrey @justinsalamon ? EDIT tagging @stevemclaugh

@bmcfee
Contributor Author

bmcfee commented Jul 17, 2017

Thinking about this more: a complication here would be that dynamic reconstruction of the corresponding jams class for alternate metadata schemas could get tricky.

We get around this (oneOf types) in annotation (dense vs sparse observation) by using the same internal data store for both types (so it doesn't matter when loading), and having an extra field in the namespace definition that determines which class to use (when saving). I'd like to avoid generalizing this kind of hack to bigger class definitions; maybe there's a way to probe the schema validator to know which part of the schema it's catching when the string input is validated on load?
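The dense/sparse workaround described above can be sketched roughly as follows. This is not the JAMS implementation: the `NAMESPACES` table, its `dense` flag, and the `load`/`save` helpers are simplified stand-ins used to show the pattern.

```python
# Hypothetical namespace definitions: a "dense" flag picks the save layout.
NAMESPACES = {
    "beat": {"dense": False},
    "pitch_hz": {"dense": True},
}

FIELDS = ["time", "duration", "value", "confidence"]


def load(data):
    """Normalize either layout into one internal store: a list of row dicts."""
    if isinstance(data, dict):
        # Dense layout: one dict of parallel lists.
        columns = [data[k] for k in FIELDS]
        return [dict(zip(FIELDS, row)) for row in zip(*columns)]
    # Sparse layout: already a list of records.
    return [dict(row) for row in data]


def save(rows, namespace):
    """Pick the output layout from the namespace definition, not from the data."""
    if NAMESPACES[namespace]["dense"]:
        return {k: [row[k] for row in rows] for k in FIELDS}
    return [dict(row) for row in rows]


dense = {
    "time": [0.0, 0.5],
    "duration": [0.5, 0.5],
    "value": [440.0, 442.0],
    "confidence": [1.0, 0.9],
}
rows = load(dense)
```

Loading never needs to know which schema branch matched, because both layouts end up in the same store. That shortcut is what becomes hard to reuse once the alternatives are whole classes with different fields.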

@bmcfee
Contributor Author

bmcfee commented Jul 17, 2017

The above might be resolved if we specify a type mapping for all schema objects: https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-with-additional-types
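One possible reading of "type mapping", sketched with the standard library only. This does not use the jsonschema feature linked above. It assumes each serialized object carries a hypothetical `type` tag, and that a registry maps that tag to the class to rebuild on load; the `register` decorator, `TYPE_REGISTRY`, and `reconstruct` are made-up names.

```python
TYPE_REGISTRY = {}


def register(cls):
    """Class decorator: map the class name to the class for later lookup."""
    TYPE_REGISTRY[cls.__name__] = cls
    return cls


@register
class MusicMetadata:
    def __init__(self, title="", artist=""):
        self.title = title
        self.artist = artist


@register
class SpeechMetadata:
    def __init__(self, title="", speaker=""):
        self.title = title
        self.speaker = speaker


def reconstruct(record):
    """Rebuild the right class from a dict, using its (hypothetical) type tag."""
    fields = dict(record)  # copy so the caller's dict is left untouched
    cls = TYPE_REGISTRY[fields.pop("type")]
    return cls(**fields)


obj = reconstruct({"type": "SpeechMetadata", "title": "Interview", "speaker": "A"})
```

With an explicit tag, the loader would not have to ask the validator which oneOf branch matched.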

@ejhumphrey
Collaborator

I very much agree about generalization, and have wondered this since my days hacking away at OMR (which was almost jamsy). I wonder if something like CrowdFlower would be interested in collab'ing for their image annotator...
