on possible (minor) tweaks to Annotation.to_samples #198
I'm leaning toward leaving it as is. Tuples, being immutable, can be a little unwieldy for a lot of the things we want to use the value outputs for (e.g., slicing down to a fixed vocab).
I like this in theory, but as you say, the API for it seems awkward, especially when you consider that it should be consistent across all namespaces. You could do it in two steps: a flag to control backfill, and a separate `fill_value` parameter to hold the data itself.
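A minimal sketch of the two-step idea, written as a standalone post-processing function rather than a change to `to_samples` itself. The name `backfill_samples` and its `backfill`/`fill_value` parameters are hypothetical and not part of the JAMS API. Keeping the flag separate means `fill_value=None` can still mean "fill with `None`", while the default (`backfill=False`) reproduces the current output.

```python
from copy import deepcopy


def backfill_samples(samples, backfill=False, fill_value=None):
    """Hypothetical helper (not part of JAMS): optionally fill empty slots.

    samples:    list of lists, shaped like the `values` output of to_samples
    backfill:   if False, return the samples unchanged (current behavior)
    fill_value: value placed in each empty slot when backfilling; None is
                a legitimate fill value here, not a "no fill" marker
    """
    if not backfill:
        return samples
    # deepcopy so a mutable fill_value is not shared between slots
    return [list(s) if s else [deepcopy(fill_value)] for s in samples]


values = [["drums"], [], ["drums", "bass"], []]
print(backfill_samples(values))                 # [['drums'], [], ['drums', 'bass'], []]
print(backfill_samples(values, backfill=True))  # [['drums'], [None], ['drums', 'bass'], [None]]
```

With `fill_value=[]`, each empty slot becomes `[[]]`, which matches the semantics raised in the original question.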
Yeah, flag + `fill_value` seems a little unwieldy? It's not terrible to do these things on the user side. I'm happy to punt for now, and if this ends up becoming a more common use case / pattern, we can figure it out then.
No, but you raise a valid point about the semantics of annotation sampling. It's presently written from the perspective of positive-only annotations, and null/empty labels are only generated by sampling if there's an observation to that effect. This is the most conservative form of sampling, and it's not incorrect per se, but it's also not exactly what you want when integrating with sklearn (or whatever), where every input should have an output.
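To make the "positive-only" semantics concrete, here is a simplified sketch of interval sampling. It is not the JAMS source: observations are assumed to be plain `(time, duration, value)` tuples, and a sample time `t` is assumed to be covered when `time <= t <= time + duration`. Overlapping observations append several values to the same slot, and times with no covering observation keep an empty list.

```python
from typing import Any, List, Sequence, Tuple

# Assumed observation layout for this sketch: (time, duration, value)
Observation = Tuple[float, float, Any]


def sample_intervals(observations: Sequence[Observation],
                     times: Sequence[float]) -> List[List[Any]]:
    """Return one list of values per sample time (simplified sketch)."""
    # Every slot starts empty; it only gets a value if an observation covers it
    values: List[List[Any]] = [[] for _ in times]
    for start, duration, value in observations:
        end = start + duration
        for i, t in enumerate(times):
            # Overlapping observations can both cover t, hence a list per slot
            if start <= t <= end:
                values[i].append(value)
    return values


obs = [(0.0, 2.0, "drums"), (1.0, 1.0, "bass")]
print(sample_intervals(obs, [0.5, 1.5, 3.0]))
# [['drums'], ['drums', 'bass'], []]
```

The trailing `[]` is the case at issue: a downstream estimator expecting one label per input has nothing to consume for that time point.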
Resurfacing this one to see if anyone's perspective has changed. Should we try to implement a fill parameter, or leave it as is? It might be possible to check the fill value against the namespace schema at runtime, but that might get ugly if we unify all the namespace schemas into one master schema going forward. Alternatively, we could just not validate fill values.
My 2 cents: given that the current implementation returns a list of lists, instead of … And then, I would simply not validate these custom empty values and let the user take care of it if needed.
I guess that's valid. If a user supplies a bad fill value, that's on them. So to recap, here's the current logic:
The reason for all the list hackery is that observations can overlap, so the … The proposed change would allow a user to change this by providing a list of default values to initialize with instead of the empty list. In writing this up, I see two problems with this idea that had escaped my attention before:
I'm beginning to think this is not worth implementing. It's easy enough for a user to post-process the values array as follows:

```python
values = ann.to_samples(...)
for v in values:
    if not v:
        v.extend(default_values)
# and repeat for confidences
```

and then get on with their life. I think I prefer this solution over trying to implement something general-purpose that leads to awkward and confusing API decisions.
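The same post-processing can be wrapped in a small helper so it is applied identically to values and confidences. `fill_empty` is a hypothetical name, and the `"none"` tag and `None` confidence below are placeholder defaults. Unlike the in-place loop, this version returns new lists and leaves the originals untouched.

```python
def fill_empty(samples, default):
    """Hypothetical helper: return a copy where every empty slot holds `default`.

    `default` is a list of fallback entries (e.g. a null tag), mirroring the
    `v.extend(default_values)` loop; non-empty slots are copied as-is.
    """
    return [list(s) if s else list(default) for s in samples]


# Made-up sampled output: two time points, the second unlabeled
values = [["vocals"], []]
confidences = [[0.9], []]

values = fill_empty(values, ["none"])
confidences = fill_empty(confidences, [None])
print(values)       # [['vocals'], ['none']]
print(confidences)  # [[0.9], [None]]
```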
Holy moly, I can't thank past us (but mostly @bmcfee and @justinsalamon) enough for making #109 happen; it just bailed me out of an annoying issue I'm having with open-tag JAMS and what to do with null events (separate issue, will cease tangent).
In working with this super useful feature, I've got a couple of ideas / questions for the crowd. I'm not super confident these are worth the effort yet, but that's why this is a question and not a PR.
1. Should the sampled values be a `tuple` instead of a `list`? This would aid in the reuse of other objects (e.g. `set`, `collections.Counter`) which need a hashable type for reduction. This wouldn't necessarily be super useful for continuous / float values, but for categorical ones (e.g. tags, like in my case) it would be way helpful. The two counterarguments I see are: I understand from looking at the source how a mutable list makes this so much easier to implement; and, if this isn't a standard case, it's easy enough to lambda-map the results into a hashable type afterwards and then do this.
2. Add a `fill_value` field to the method's interface? In my case, only positive intervals are labeled, so I get back empty lists wherever no interval is labeled. It'd be great to backfill the null class at sample time, and at first blush this seems like an easy feature ... the only issue I see is: what default parameter would give the current result? It can't be `None`, because one could truly want `None` as the backfill value, e.g. `[[None], [None], ... ]`. It shouldn't be `[]`, because semantically one would expect `[[[]], [[]], ... ]`. Any ideas on this one?

If nothing else, I skimmed the issue that originally spawned this feature, didn't see a discussion of either (1) or (2) above, and figured they'd be worth adding to the conversation. Tagging this as wontfix is a potentially fine outcome.
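For the first question (tuples vs. lists), the user-side conversion is a one-liner. A sketch, assuming `values` is the list of lists that `to_samples` returns for a tag namespace (the data below is made up): mapping each slot to a `tuple` makes it hashable, so it can go into a `set` or be counted with `collections.Counter`.

```python
from collections import Counter

# Made-up sampled tag values: one list of tags per sample time
values = [["guitar"], ["guitar", "vocals"], [], ["guitar"]]

# Lists are unhashable; tuples can be used as set members or Counter keys
hashable = [tuple(v) for v in values]

label_sets = set(hashable)        # distinct label combinations
combo_counts = Counter(hashable)  # how often each combination occurs
tag_counts = Counter(tag for v in values for tag in v)  # per-tag counts

print(combo_counts[("guitar",)])  # 2
print(tag_counts["guitar"])       # 3
```

If the order of overlapping observations should not matter, `tuple(sorted(v))` gives a canonical key, at the cost of requiring mutually comparable values.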