Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tracking.Partition.create() returns bogus first initial time value #21

Open
j9ac9k opened this issue Nov 18, 2019 · 6 comments
Open

tracking.Partition.create() returns bogus first initial time value #21

j9ac9k opened this issue Nov 18, 2019 · 6 comments
Assignees

Comments

@j9ac9k
Copy link
Collaborator

j9ac9k commented Nov 18, 2019

  • signalworks version: 0.2.1
  • Python version: 3.7.5
  • Operating System: macOS 10.15.1

Description

When creating a partition object wtih a non-zero initial value, an incorrectly time-marked partition object.

What I Did

>>> data = [(1.155, 1.515, 'call')]
>>> partition = tracking.Partition.create(data, fs=48000)
>>> partition.time
array([    0, 72720])
>>> partition.value
array(['call'], dtype='<U16')
@j9ac9k j9ac9k self-assigned this Nov 18, 2019
@j9ac9k
Copy link
Collaborator Author

j9ac9k commented Nov 18, 2019

Problem is in these lines of code

https://github.com/lxkain/signalworks/blob/master/signalworks/tracking/partition.py#L246-L248

        if time[0] != 0:
            logger.warning("moving first label boundary to zero")
            time[0] = 0

Correct behavior here is probably to insert an empty partition to the first label.

@lxkain
Copy link
Collaborator

lxkain commented Nov 19, 2019

First of all, I think that the create() should be merged with the read() method - they are both essentially doing the same thing.

Second, I personally think there should be an error if the first time is not zero. Most likely the user is confusing track.Label with track.Partition.

@tuanad121 is the author of create() - what do you think? What was your use-case for creating this in the first place?

@j9ac9k
Copy link
Collaborator Author

j9ac9k commented Nov 19, 2019

Forgot to close this issue when I merged my PR.

Our need came from wanting to create a partition track, but our source data was not encompassed by a single file, but was bundled in part of the file.

We effectively have a python data structure of

data: List[Tuple[float, float, str, float]] = [(startTime, endTime, label, score)]

The problem is that our data-source, does not always have the first entry in the list have a start time of 0, and silently moving the first start time to 0 (what the behavior was in signalworks =< 0.2.1) caused an unintentional behavior in our case.

Also in our case, a Partition track is definitely what we want (as opposed to a label), but it's just that I could not guarantee first entry would start at 0. I certainly could have extra checks in our application to create an empty segment if my first start time is not 0, but silently making the first time be 0 seemed very-much like non-ideal behavior, so I decided to make the change upstream.

In terms of Partition.create vs. Partition.read, I'm good with trying to merge them, I always thought of .read() as reading in a file, converting the serialized data into List[Tuple[float, float, str, <whatever>)], and then calling the create method... I'm certainly open to another suggestion.

I'll leave this issue open to keep the conversation going, but with signalworks 0.2.2 I already made the change.

@lxkain
Copy link
Collaborator

lxkain commented Nov 22, 2019

I think your changes are fine, but how about calling it Partition.from_list() instead of Partition.create()? Seems more descriptive.

@j9ac9k
Copy link
Collaborator Author

j9ac9k commented Nov 25, 2019

I'm good with from_list() instead of create().

Alternatively we could make the data be part of the __init__ method? That way if you do Partition.read and pass along a file, it returns the result of Partition.__init__() with the List[Tuple[]] as an input argument.

@lxkain
Copy link
Collaborator

lxkain commented Nov 25, 2019

So __init__ would either take time and value or data in that particular format? That would make it different from all the other initializer methods. I'd prefer to stick with from_list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants