why use the reverberated speech signal as the training target #16

Open
flytair opened this issue Oct 4, 2023 · 4 comments
flytair commented Oct 4, 2023

Hi,
This is a great project, thanks for your effort!
When I looked at the code, I found that the training target signal is the reverberated speech (https://github.com/Audio-WestlakeU/NBSS/blob/af66db92bb9d6f72f7100d613d3df38c40b10b09/data_loaders/ss_semi_online_dataset.py#L294C27-L294C27).
I wonder why clean speech is not used as the training target, since that would not only separate the speakers but also remove reverberation and even noise.


quancs commented Oct 6, 2023

Please check sms_wsj_plus.py, which is the latest dataset for joint speech separation, denoising, and dereverberation. The code you referred to is old and not used in SpatialNet.


flytair commented Oct 19, 2023

Thanks for your response!
I have two more questions regarding the sms_wsj_plus dataset, in which the speech signals of the dataset itself are treated as the babble noise source:
https://github.com/Audio-WestlakeU/NBSS/blob/e988a6ec845b6153910bbd106059a50b0b2c4a09/data_loaders/sms_wsj_plus.py#L95C9-L95C115
self.noises = list(set(original_sources)) # take the speech signal in this dataset as babble noise source

  1. As the babble noise is speech and the targets of the network are also speech, how can the model know which sources are the targets, the babble or the other speech?
  2. As the babble noise is a directional source, does it need to be convolved with the RIRs?

Thanks!


quancs commented Oct 20, 2023

@flytair

  1. As the babble noise is speech and the targets of the network are also speech, how can the model know which sources are the targets, the babble or the other speech?

The babble noise is diffuse, while the target speech signals are directional; that is the key clue the model uses to learn to distinguish them.

  2. As the babble noise is a directional source, does it need to be convolved with the RIRs?

The babble noise is diffuse, not directional, so it does not need to be convolved with RIRs. We use the method implemented in https://github.com/Audio-WestlakeU/NBSS/blob/main/data_loaders/utils/diffuse_noise.py to make it diffuse.
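For illustration, here is a minimal sketch of the mixing idea described in this answer: target speakers are directional (dry speech convolved with per-speaker multichannel RIRs), while the babble is spread over the array as a spatially diffuse field and added at a chosen SNR. This is not the repository's actual pipeline (see sms_wsj_plus.py and diffuse_noise.py for that); the helper `make_diffuse` and the SNR scaling convention are hypothetical stand-ins.

```python
# Sketch only: directional targets + diffuse babble noise.
import numpy as np
from scipy.signal import fftconvolve

def mix_scene(dry_sources, rirs, babble, make_diffuse, snr_db):
    """dry_sources: list of (T,) dry speech signals (the targets)
    rirs:          list of (n_mics, L) room impulse responses, one per source
    babble:        (T,) single-channel babble/noise signal
    make_diffuse:  hypothetical callable turning a (T,) signal into an
                   (n_mics, T) spatially diffuse field (stand-in for the
                   routine in data_loaders/utils/diffuse_noise.py)
    snr_db:        desired SNR of the directional mixture w.r.t. the noise
    """
    n_mics = rirs[0].shape[0]
    T = len(dry_sources[0])

    # Directional target images: convolve each dry source with its RIRs.
    images = []
    for src, rir in zip(dry_sources, rirs):
        img = np.stack([fftconvolve(src, rir[m])[:T] for m in range(n_mics)])
        images.append(img)
    mixture = np.sum(images, axis=0)  # (n_mics, T)

    # Diffuse babble: no RIR convolution; the diffuse-field generator spreads
    # the single-channel babble over the microphone array instead.
    noise = make_diffuse(babble)[:, :T]  # (n_mics, T)

    # Scale the noise to reach the requested SNR.
    snr = 10 ** (snr_db / 10)
    noise = noise * np.sqrt(np.mean(mixture**2) / (snr * np.mean(noise**2) + 1e-12))

    return mixture + noise, images  # noisy mixture and per-speaker targets
```

The point of the sketch is the asymmetry: the targets carry direction-dependent inter-channel cues from the RIRs, whereas the diffuse babble does not, which is what the model can exploit to tell them apart.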


flytair commented Oct 26, 2023

Thanks for your response!
Do you think it is reasonable to use WHAM! noise as the babble noise in the sms_wsj_plus dataset?
