Various configuration questions #15

Closed
skeggse opened this issue Mar 19, 2018 · 6 comments

@skeggse

skeggse commented Mar 19, 2018

I realize you're working on a tutorial for configuring ODAS, but I'm attempting to understand it in the meantime. I'm wondering what the separated and postfiltered streams represent. From listening to them, it sounds like the separated streams should include a single channel for each tracked source, and the postfiltered stream might contain the noise corresponding to each source. Is that an apt description?

I'm noticing that there's often significant overlap between two of the channels - that they're tracking what seems to be the same source. Would that be resolved by tuning the settings via something like ODAS Studio? I built a configuration for the PS3 Eye microphone array (like @Efreeto was looking to do in introlab/manyears#2; happy to open a pull-request to add the configuration 😄), so it's likely not well-tuned.

What's the relationship between hopSize and frameSize? The example general configuration on the wiki sets hopSize to 128, and frameSize to 256; I'd assumed hopSize was the chunk size for audio processing, and frameSize was the number of samples in a frame/chunk, but that'd be backwards for 16-bit audio.

@FrancoisGrondin
Member

Thank you for your questions. There is in fact a tutorial being written, but I'll try to answer your specific questions the best I can.

The separated streams correspond to the separation obtained by linearly demixing multiple channels into a single channel, using methods such as delay-and-sum beamforming or geometric sound source separation. These streams contain the target speech with little distortion, plus some interference and noise in the background. The postfiltering step reduces the gain in the frequency bands where noise or interference sources are dominant. This improves the SNR, but also introduces some distortion.
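To make the two stages concrete, here is a minimal sketch of delay-and-sum beamforming followed by a Wiener-style per-band postfilter gain. This is not ODAS's implementation: the integer sample delays, the known per-band target and noise powers, and the helper names `delay_and_sum` and `wiener_gains` are all illustrative assumptions.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Hypothetical helper: align each channel by an integer delay, then average.

    channels[m][n] is sample n of microphone m; delays[m] is how many samples
    later the target reaches microphone m (assumed known, e.g. from localization).
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    # Only keep output samples for which every shifted channel has data.
    n_out = min(len(ch) - d for ch, d in zip(channels, delays))
    out = []
    for n in range(n_out):
        # Shift each channel so the target wavefront lines up, then average:
        # the target adds coherently, uncorrelated noise partly cancels.
        out.append(sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels))
    return out


def wiener_gains(target_power: Sequence[float], noise_power: Sequence[float],
                 floor: float = 0.1) -> List[float]:
    """Hypothetical helper: per-band gain G = SNR / (1 + SNR), clamped to a floor.

    Bands where noise dominates (low SNR) are attenuated. This raises the
    output SNR but also distorts the target in those bands.
    """
    gains = []
    for s, n in zip(target_power, noise_power):
        if n <= 0:
            gains.append(1.0)  # no noise estimate: pass the band through
            continue
        snr = s / n
        gains.append(max(floor, snr / (1.0 + snr)))
    return gains
```

With a band SNR of 9 the gain is 0.9 (almost untouched), at SNR 1 it drops to 0.5, and a noise-dominated band is clamped to the floor; that per-band attenuation is where the extra distortion in the postfiltered stream comes from.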

From my understanding you are using a linear array on a PS3 Eye, is this correct? If so, tracking should be performed on a 2D arc, and not a 3D sphere. I will have to code this, and we could release code to support linear array with ODAS. Would this be convenient for you?
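A short sketch of why a linear array restricts tracking to an arc, assuming far-field (plane-wave) propagation: the time differences of arrival (TDOAs) depend only on the component of the source direction along the array axis, so every direction on a cone around that axis looks identical. The microphone positions below are hypothetical (2 cm spacing on the x axis), not the PS3 Eye's measured geometry, and this is not how ODAS implements linear-array support.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate at room temperature

# Hypothetical 4-mic linear array along the x axis (metres).
MICS = [(-0.03, 0.0, 0.0), (-0.01, 0.0, 0.0), (0.01, 0.0, 0.0), (0.03, 0.0, 0.0)]


def tdoa(mic_a: Sequence[float], mic_b: Sequence[float], direction: Sequence[float]) -> float:
    """Far-field TDOA t_b - t_a (seconds) for a unit vector pointing at the source."""
    baseline = [b - a for a, b in zip(mic_a, mic_b)]
    # A mic further along `direction` hears the plane wave earlier.
    return -sum(c * u for c, u in zip(baseline, direction)) / SPEED_OF_SOUND


def tdoas(mics: Sequence[Sequence[float]], direction: Sequence[float]) -> List[float]:
    """TDOAs of every mic relative to the first one."""
    return [tdoa(mics[0], m, direction) for m in mics[1:]]


def angle_from_axis(direction: Sequence[float]) -> float:
    """Angle in degrees between the source direction and the array (x) axis."""
    norm = math.sqrt(sum(c * c for c in direction))
    return math.degrees(math.acos(direction[0] / norm))


C60, S60 = math.cos(math.radians(60)), math.sin(math.radians(60))
# Two different 3D directions, both 60 degrees from the array axis.
IN_PLANE = (C60, S60, 0.0)
ABOVE = (C60, 0.0, S60)
```

`tdoas(MICS, IN_PLANE)` and `tdoas(MICS, ABOVE)` come out identical, so the only recoverable quantity is `angle_from_axis`, a single angle between 0 and 180 degrees. Searching a half-circle (2D arc) instead of a sphere avoids spending effort on directions the array cannot tell apart.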

The parameters hopSize and frameSize stand for the distance in samples between successive frames, and the frame size in samples. For instance, if you set hopSize = 100 and frameSize = 256, frame 0 contains samples [0,255], frame 1 contains samples [100,355], frame 2 contains samples [200, 455], and so on.
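The framing rule above can be sketched as follows. This is a minimal illustration, not ODAS code: the helper names are hypothetical, and dropping an incomplete final frame is an assumption about edge handling.

```python
from typing import List, Sequence, Tuple


def frame_bounds(num_samples: int, hop_size: int, frame_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: inclusive [start, end] sample indices of each full frame.

    Frame k starts at k * hop_size and spans frame_size samples, so frames
    overlap whenever hop_size < frame_size.
    """
    if hop_size <= 0 or frame_size <= 0:
        raise ValueError("hop_size and frame_size must be positive")
    bounds = []
    start = 0
    while start + frame_size <= num_samples:  # assumption: skip a partial last frame
        bounds.append((start, start + frame_size - 1))
        start += hop_size
    return bounds


def split_frames(samples: Sequence[float], hop_size: int, frame_size: int) -> List[List[float]]:
    """Slice a single-channel signal into (possibly overlapping) frames."""
    return [list(samples[s:e + 1]) for s, e in frame_bounds(len(samples), hop_size, frame_size)]
```

With 456 samples, `frame_bounds(456, 100, 256)` yields `[(0, 255), (100, 355), (200, 455)]`, matching the example above.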

Does this answer your questions?

Thank you,

@skeggse
Author

skeggse commented Mar 20, 2018

Your description of the separated streams and the postfiltering helps a lot, but doesn't entirely explain the behavior I'm seeing. The postfiltered streams appear to contain substantially more noise and distortion than the separated streams. I suppose this could be because it's a linear array?

> I will have to code this, and we could release code to support linear array with ODAS. Would this be convenient for you?

That would be remarkably convenient - let me know if I can help in any way. I'll open an issue for this.

> The parameters hopSize and frameSize stand for the distance in samples between successive frames, and the frame size in samples. For instance, if you set hopSize = 100 and frameSize = 256, frame 0 contains samples [0,255], frame 1 contains samples [100,355], frame 2 contains samples [200, 455], and so on.

Ah, so neither of these refers to a size in bytes; both are counts of samples, and each frame (normally) overlaps with one or more previous frames.

Super helpful, thanks!
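Since both parameters count samples, the byte and time sizes follow from the channel count, sample width, and sample rate. A small sketch of that arithmetic for the wiki's hopSize = 128 and frameSize = 256, assuming 4 interleaved channels (the PS3 Eye has four microphones), 16-bit samples, and a hypothetical 16000 Hz sample rate; `frame_stats` is an illustrative helper, not part of ODAS.

```python
from typing import Dict


def frame_stats(hop_size: int, frame_size: int, channels: int,
                bytes_per_sample: int, sample_rate: int) -> Dict[str, float]:
    """Hypothetical helper: byte and time sizes implied by sample-based settings."""
    overlap = max(0, frame_size - hop_size)  # hop > frame would leave gaps, not overlap
    return {
        # New raw bytes per step, assuming interleaved multichannel input.
        "bytes_per_hop": hop_size * channels * bytes_per_sample,
        "bytes_per_frame": frame_size * channels * bytes_per_sample,
        "overlap_samples": overlap,
        "overlap_fraction": overlap / frame_size,
        "hop_ms": 1000.0 * hop_size / sample_rate,
        "frame_ms": 1000.0 * frame_size / sample_rate,
    }


stats = frame_stats(hop_size=128, frame_size=256, channels=4,
                    bytes_per_sample=2, sample_rate=16000)
```

Under those assumptions each hop brings in 1024 new bytes (8 ms of audio), each frame spans 2048 bytes (16 ms), and consecutive frames overlap by 50%.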

@FrancoisGrondin
Member

> Your description of the separated streams and the postfiltering helps a lot, but doesn't entirely explain the behavior I'm seeing. The postfiltered streams appear to contain substantially more noise and distortion than the separated streams. I suppose this could be because it's a linear array?

Yes, probably. It is hard to tell without the data. Can you provide me with the recordings in raw format and your cfg file?

@skeggse
Author

skeggse commented Mar 20, 2018

Audio samples, config file

If there are specific source configurations or noise conditions I can add that would be more helpful, let me know.

@FrancoisGrondin
Member

Ok, I'll try to have a look asap, and get back to you with answers.

FrancoisGrondin self-assigned this on Mar 22, 2018
@FrancoisGrondin
Member

Please have a look at issue #18 and see if the new code to handle linear arrays solves this issue at the same time.

This issue was closed.