Add audformat.Scheme.replace_labels() #62
I think your solution should be the right way to proceed. There is one thing missing, your
db = audformat.testing.create_db(minimal=True)
scheme = audformat.Scheme(labels=['a', 'b'])
db.schemes['scheme'] = scheme
audformat.testing.add_table(db, 'table1', 'filewise')
print(scheme)
print(db['table1']['scheme'].get())
db_new = audformat.testing.create_db(minimal=True)
scheme_new = audformat.Scheme(labels=['b', 'c'])
db_new.schemes['scheme'] = scheme_new
audformat.testing.add_table(db_new, 'table2', 'filewise')
print(scheme_new)
print(db_new['table2']['scheme'].get())
Merging those two databases works, but does not produce the desired result:
scheme.update_labels(scheme_new.labels)
print(scheme)
Here we would have needed the following result:
Because otherwise we get:
db.update(db_new, overwrite=True)
print(db['table1']['scheme'].get())
|
The same holds of course if you want to merge the scheme for the same table, I just used two tables as it is easier with |
Ok, so by default we should have a union of the labels, right? Do we actually have to support the use-case to overwrite the labels at all then? |
I think it might not be necessary, but could be helpful. In your example above the goal would most likely be to remove the |
Ok, should we add a |
I'm not sure if we can cover everything with one argument. scheme1 = audformat.Scheme(labels={0: 'a', 1: 'b'})
scheme2 = audformat.Scheme(labels={1: 'c', 2: 'd'}) If we now use
or
|
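To make the concern concrete, here is a minimal plain-Python sketch (no audformat involved) of what can happen when two dictionary label mappings share a key. The variable names are illustrative, and the two merge orders are only an assumption about how a single argument might be interpreted.

```python
# Plain dictionaries standing in for the labels of scheme1 and scheme2.
labels1 = {0: 'a', 1: 'b'}
labels2 = {1: 'c', 2: 'd'}

# Keys that exist in both mappings but point to different values.
conflicts = sorted(
    key for key in labels1.keys() & labels2.keys()
    if labels1[key] != labels2[key]
)

# The two possible merge orders disagree on the conflicting key.
keep_new = {**labels1, **labels2}  # key 1 ends up as 'c'
keep_old = {**labels2, **labels1}  # key 1 ends up as 'b'

print(conflicts)
print(keep_new)
print(keep_old)
```

Whichever order is chosen, one meaning of key 1 is lost, so existing values in one of the tables would be reinterpreted.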
The safe solution would maybe be:
as it doesn't seem harmful to have additional entries in the scheme. |
Ok, so with |
I guess you mean:
I agree. There might be a way to decide if we should drop a label: But I'm still not sure if this will result in the desired behavior all the time. I guess it is better to not drop any label for now, and require manual action for that by the user. db.schemes['my-scheme'].labels.pop(0) I guess not as it does not update the tables? So, we might need to add another method that allows a user to remove a label then. |
No. And there are a lot of bad things a user can do at the moment, e.g. setting |
I think that is getting too complicated. If we want to give the user the option to remove labels, let's introduce |
Yes, I agree and would vote for this. We should do it in a follow-up merge request then. |
You are right, we should make this more safe in another pull request as well. I created #63 to track this issue. |
Actually, I wonder if we should simply stick to the current implementation but rename |
Ok, renamed. Please continue to review. |
Having #63 in mind I'm not completely sure about db.schemes['my-scheme'].replace_labels(my_labels) you do db.schemes['my-scheme'] = my_labels But I would propose we just continue with your current solution, but don't make a new release until we added the update method and figured out how to handle #63 |
I would argue that |
I just realized that one disadvantage of the |
OK, then we just stay with |
Is there some workaround, or does that mean that |
I guess the following will work:
joined_labels = set(scheme.labels + new_scheme.labels)  # in reality it's a bit more complicated since labels can also be a dict
scheme.replace_labels(joined_labels)
new_scheme.replace_labels(joined_labels)
db.update(new_db)
I guess we should add an option to db.update(new_db, join_scheme_labels=True) |
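The "bit more complicated" part could look roughly like the sketch below. `join_labels` is a hypothetical helper, not part of audformat, and raising an error on conflicting dictionary keys is an assumption rather than a decision made in this thread.

```python
def join_labels(labels, new_labels):
    """Return the union of two label collections (hypothetical helper)."""
    if isinstance(labels, dict) != isinstance(new_labels, dict):
        raise ValueError('Cannot join a dict of labels with a list of labels.')
    if isinstance(labels, dict):
        joined = dict(labels)
        for key, value in new_labels.items():
            # Assumption: the same key with a different value is an error.
            if key in joined and joined[key] != value:
                raise ValueError(
                    f'Label {key!r} is defined differently in both schemes.'
                )
            joined[key] = value
        return joined
    # dict.fromkeys() removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(list(labels) + list(new_labels)))


print(join_labels(['a', 'b'], ['b', 'c']))
print(join_labels({0: 'a', 1: 'b'}, {1: 'b', 2: 'd'}))
```

Returning a list instead of a set keeps the label order stable, which makes the resulting scheme easier to compare across databases.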
Agree |
To summarize: Which means this one is finished and I can take a final look, correct? |
yep! |
Co-authored-by: Hagen Wierstorf <[email protected]>
Relates to #61
Currently, Database.update() fails when two different schemes with the same ID are detected. This can be problematic if we have to extend a scheme with new values, e.g. we want to add data from new speakers. This adds Scheme.replace_labels(), which replaces the labels of a scheme. It will look for all columns that reference the scheme and change the dtype accordingly. When labels are removed, values are set to NaN.
Example
Initial database has labels 'a' and 'b'.
New database has labels 'b' and 'c'.
Update fails because schemes do not match.
Now we replace the labels in the original database with 'b' and 'c'; this will set 'a' to NaN.
Now we can update the database.
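The effect on existing values can be modelled in plain Python as follows. This is only a sketch of the behaviour described above, not audformat's implementation; `replace_labels_model` and the sample column are made up for illustration.

```python
import math


def replace_labels_model(values, new_labels):
    """Model the described effect on a column (not audformat code)."""
    allowed = set(new_labels)
    # Values whose label is no longer part of the scheme become NaN.
    return [value if value in allowed else math.nan for value in values]


column = ['a', 'b', 'a', 'b']  # values labelled with the initial scheme
replaced = replace_labels_model(column, ['b', 'c'])
print(replaced)
```

After this step the column only contains values allowed by the new labels, which is why the subsequent update no longer sees a scheme mismatch.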