
Add padding_value attribute to features #1020

Open · wants to merge 12 commits into master

Conversation

@desh2608 (Collaborator) commented Apr 6, 2023

At the moment, when we are mixing features, we expect the user to specify a padding value. It would be better if the feature extractor itself came with a default value that should be used for padding.
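For illustration, a minimal sketch of the idea (the class names mirror lhotse's, but the constants and bodies here are assumptions rather than the PR's actual code):

```python
import math

# Stand-ins for lhotse's constants, assuming an energy floor of 1e-10.
EPSILON = 1e-10
LOG_EPSILON = math.log(EPSILON)  # about -23.03


class FeatureExtractor:
    """Simplified stand-in for lhotse's base feature extractor."""


class Fbank(FeatureExtractor):
    @property
    def padding_value(self) -> float:
        # Log-mel features live in log space, so the extractor itself
        # knows a sensible default pad: the log of the energy floor.
        return LOG_EPSILON


# A mixer can now default to extractor.padding_value instead of
# requiring the caller to specify one.
print(Fbank().padding_value)  # -23.025850929940457
```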

@pzelasko (Collaborator) commented Apr 6, 2023

That's smart!

As a side note, I was also considering whether we should support more nuanced padding strategies, e.g. padding with the mean example/batch value (so that it doesn't influence things such as layer norm), but I'm not sure there's much benefit to it.
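For illustration, a sketch of what such mean-based padding could look like (pad_with_mean is a hypothetical helper, not something proposed in this PR):

```python
import numpy as np


def pad_with_mean(feats: np.ndarray, num_frames: int) -> np.ndarray:
    """Pad a (T, F) feature matrix to num_frames frames with the
    per-bin mean, so the padded region barely shifts statistics
    used by layer norm or instance MVN."""
    missing = num_frames - feats.shape[0]
    if missing <= 0:
        return feats
    mean = feats.mean(axis=0, keepdims=True)  # shape (1, F)
    return np.concatenate([feats, np.repeat(mean, missing, axis=0)])
```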

```diff
-        self.padding_value = padding_value
+        self.padding_value = (
+            feature_extractor.padding_value
+            if hasattr(feature_extractor, "padding_value")
+            else padding_value
+        )
```
Collaborator

I think since this behavior is undocumented in this PR's current shape, it may lead to some surprises and make it hard for implementers of other feature extractors to discover how to handle this. What if we implemented this property on the base feature extractor class and made it return None there, and checked for None here instead? Then we can add a proper docstring in the feature extractor / documentation.
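A sketch of this suggestion (the classes are simplified stand-ins; only the None check is the point):

```python
from typing import Optional


class FeatureExtractor:
    @property
    def padding_value(self) -> Optional[float]:
        """Default padding value for this extractor's feature domain.

        Documented once here; extractors with no opinion inherit None
        and the mixer falls back to its own default.
        """
        return None


class FeatureMixer:
    def __init__(self, feature_extractor: FeatureExtractor, padding_value: float):
        # The hasattr() check becomes an explicit, discoverable None check.
        self.padding_value = (
            feature_extractor.padding_value
            if feature_extractor.padding_value is not None
            else padding_value
        )
```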

Collaborator Author

Yeah, that makes sense. I will update it.

@pzelasko added this to the v1.14 milestone Apr 6, 2023
```diff
@@ -193,6 +193,10 @@ def __init__(self, config: Optional[KaldifeatFbankConfig] = None) -> None:
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.mel_opts.num_bins

+    @property
+    def padding_value(self) -> float:
+        return -1000.0 if self.config.use_log_fbank else EPSILON
```
Collaborator Author

@csukuangfj could you check if this looks correct?

```diff
@@ -90,6 +93,10 @@ def frame_shift(self) -> Seconds:
     def feature_dim(self, sampling_rate: int) -> int:
         ...

+    @property
+    def padding_value(self) -> float:
+        ...
```
Contributor

The type hint for the return value does not match the actual return value.

```diff
@@ -193,6 +193,10 @@ def __init__(self, config: Optional[KaldifeatFbankConfig] = None) -> None:
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.mel_opts.num_bins

+    @property
+    def padding_value(self) -> float:
+        return -1000.0 if self.config.use_log_fbank else EPSILON
```
Contributor

Should -1000 be replaced with

```python
LOG_EPSILON = math.log(EPSILON)
```

Collaborator Author

Yes, ideally it should, but I was basing this on the default value used in the FeatureMixer.

```diff
@@ -41,7 +41,8 @@ def __init__(
     :param frame_shift: Required to correctly compute offset and padding during the mix.
     :param padding_value: The value used to pad the shorter features during the mix.
         This value is adequate only for log space features. For non-log space features,
-        e.g. energies, use either 0 or a small positive value like 1e-5.
+        e.g. energies, use either 0 or a small positive value like 1e-5. This value will be
+        ignore if the ``feature_extractor`` has a ``padding_value`` attribute.
```
Contributor

Suggested change

```diff
-        ignore if the ``feature_extractor`` has a ``padding_value`` attribute.
+        ignored if the ``feature_extractor`` has a ``padding_value`` attribute.
```

Contributor

Should this user-provided argument have a higher priority?

Collaborator Author

Perhaps the default value for this option can be set to None. If it is None, we use the value from the feature extractor; otherwise, we use the user-provided value. But I'm not sure what should be done if they are both None. Raise an error? @pzelasko what do you think?
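A sketch of that resolution order (resolve_padding_value is a hypothetical helper, just for illustration):

```python
from typing import Optional


def resolve_padding_value(
    user_value: Optional[float],
    extractor_value: Optional[float],
) -> float:
    # The user-provided value takes priority, then the extractor's
    # default; if both are None, fail loudly.
    if user_value is not None:
        return user_value
    if extractor_value is not None:
        return extractor_value
    raise ValueError(
        "No padding value was given and the feature extractor does not "
        "define a default one."
    )
```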

Collaborator

I like your solution. Raising an error in this case is acceptable if you can add the default padding value attributes to the remaining feature extractors.

@desh2608 (Collaborator Author) commented Apr 7, 2023

I have propagated the padding value related changes to all the affected parts of the codebase. There are a few subtle things causing some of the test cases to fail. Basically, when we add a padding cut to a mixed cut such that the resulting duration increases, the padding gets added twice: the original cut is first extended to the new duration with the padding value, and then the feats of the mixed-in padding cut are added on top.

One way around this is to pass an additional is_padding option to add_to_mix() so the padding value is only applied once.

@desh2608 (Collaborator Author) commented Apr 7, 2023

> One way around this is to pass an additional is_padding option to add_to_mix() so the padding value is only applied once.

This would still cause inconsistencies though. Consider the following case:

```python
# c is a 10s MonoCut, for example
m1 = c.mix(c)
m2 = m1.mix(c)
m1_padded = m1.pad(duration=15)
m2_padded = m2.pad(duration=15)
```

If we keep adding tracks to the mixture, the feature value of the padding region will keep increasing, whereas ideally we only want padding to happen once.
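A quick numeric illustration (assuming log-domain mixing, i.e. energies are summed and re-logged):

```python
import numpy as np


def mix(a, b):
    # Log-domain mixing sums energies: mix(a, b) = log(exp(a) + exp(b)).
    return np.log(np.exp(a) + np.exp(b))


p = np.log(1e-10)      # a LOG_EPSILON-style padding value, ~ -23.03
m1 = mix(p, p)         # two padded tracks   -> p + log(2) ~ -22.33
m2 = mix(m1, p)        # three padded tracks -> p + log(3) ~ -21.93
print(m1 - p, m2 - p)  # 0.6931..., 1.0986..., growing with every track
```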

@desh2608 (Collaborator Author) commented Apr 7, 2023

This was not caught in the test case before because the default padding value used was -1000, and np.log(EPSILON + np.exp(-1000) + np.exp(-1000) + ...) is still almost equal to LOG_EPSILON.

@desh2608 (Collaborator Author) commented Apr 7, 2023

> This was not caught in the test case before because the default padding value used was -1000, and np.log(EPSILON + np.exp(-1000) + np.exp(-1000) + ...) is still almost equal to LOG_EPSILON.

I suppose this means that the easy solution is to use a very large negative number (like -1000) instead of LOG_EPSILON (which is about -23) as the padding value for log-scale features.
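A small check of why -1000 hides the accumulation while LOG_EPSILON does not (float64 arithmetic):

```python
import numpy as np

EPSILON = 1e-10
LOG_EPSILON = np.log(EPSILON)  # about -23.03

# exp(-1000) underflows to exactly 0.0 in float64, so the energy floor
# EPSILON swallows any number of -1000 pads without a trace:
print(np.exp(-1000.0))                                      # 0.0
print(np.log(EPSILON + np.exp(-1000.0) + np.exp(-1000.0)))  # == LOG_EPSILON

# With LOG_EPSILON-valued pads, the drift is plainly visible:
print(np.log(EPSILON + EPSILON + EPSILON) - LOG_EPSILON)    # log(3) ~ 1.0986
```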

@pzelasko (Collaborator) commented Apr 7, 2023

Hmm, now I remember that's why I hardcoded -1000 in the feature mixer in the first place. I am a bit afraid that changing -23 (which is log(energy_floor) for the default energy floor in our feature extractors) to -1000 could somehow negatively affect the models as a "too out of distribution" value, especially if somebody uses layer norm or instance MVN. I agree the most elegant solution would be to pad once and have the feature mixer resolve that in a "smart" way to avoid the addition of energy floors, but maybe this is not really an issue? WDYT?

```diff
@@ -54,6 +54,10 @@ def _feature_fn(self, *args, **kwargs):
     def feature_dim(self, sampling_rate: int) -> int:
         return self.config.num_mel_bins

+    @property
+    def padding_value(self) -> float:
+        return -1000.0
```
Contributor

Is there a reason to choose -1000.0 instead of LOG_EPSILON or just 0?

Collaborator Author

Check the discussion in the main thread.
