U-TasNet-Beam

Abstract

PyTorch implementation of "Adaptation of robots to the real environment by simultaneous execution of dereverberation, denoising and speaker separation using neural beamformer".

Dependencies

This code was tested on Python 3.8.1 with PyTorch 1.10.0, torchvision 0.11.1, and torchaudio 0.10.0. Optionally, install espnet and espnet-model-zoo if you need them.

$ pip3 install -r requirements.txt
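
If you prefer to pin the tested versions yourself rather than rely solely on requirements.txt, the following matches the versions listed above (a suggestion, not the repository's canonical setup):

$ pip3 install torch==1.10.0 torchvision==0.11.1 torchaudio==0.10.0
$ pip3 install espnet espnet-model-zoo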

Prepare Dataset

  1. Download Noisy Speech Database

    Get the Noisy Speech Database at https://datashare.ed.ac.uk/handle/10283/2791.

    Please download the following.

    • clean_testset_wav.zip
    • clean_trainset_28spk_wav.zip
    • noisy_testset_wav.zip
    • noisy_trainset_28spk_wav.zip
    • testset_txt.zip
  2. Spatialize audio by convolving RIR (Room Impulse Response)

    First, unzip the zip files to the desired folder.

    $ unzip '*.zip'
    

    Second, open Jupyter Notebook and run make_train_val_datasets_MCCUNet.ipynb to make the training and validation dataset for MCCU-Net. Likewise, run make_train_val_datasets_MCConvTasNet.ipynb to make the training and validation dataset for MCConvTasNet.

    $ jupyter notebook
    

    Finally, run make_test_datasets.ipynb to make a test dataset for performance evaluation of U-TasNet-Beam. A conceptual sketch of the RIR convolution used for spatialization is shown below.
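
    The notebooks handle the actual dataset layout, but conceptually spatialization convolves each single-channel recording with a multi-channel room impulse response. A minimal sketch, assuming soundfile and scipy are available; the file names and the 8-channel RIR file are placeholders, not the notebooks' actual inputs:

    # Spatialize a mono recording by convolving it with an 8-channel RIR.
    # File names and the RIR source are illustrative assumptions.
    import numpy as np
    import soundfile as sf
    from scipy.signal import fftconvolve

    speech, sr = sf.read("clean_trainset_28spk_wav/p226_001.wav")  # mono speech
    rir, _ = sf.read("rir_8ch.wav")  # shape (rir_len, 8): one impulse response per mic

    # Convolve the dry signal with each channel's impulse response and truncate.
    multichannel = np.stack(
        [fftconvolve(speech, rir[:, ch])[: len(speech)] for ch in range(rir.shape[1])],
        axis=1,
    )  # shape (len(speech), 8)

    sf.write("spatialized_8ch.wav", multichannel, sr)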

Training Multi-channel Complex U-Net

$ python3 training_MCComplexUnet.py

Training Multi-channel Conv-TasNet

$ python3 training_MCConvTasNet.py

Inference and evaluation

  1. Download pretrained model for speaker recognition system

    This method utilizes a speaker recognition system (d-vector embeddings).

    Get the pretrained model for the speaker recognition system at this GDrive link.

    This model was trained on the VoxCeleb2 dataset, with utterances randomly cropped to lengths of 70 to 90 frames. Tests were run with a window of 80 frames and a hop of 40 frames, and showed an equal error rate of about 1%. The test data were taken from the first 8 speakers of the VoxCeleb1 test set, with 10 utterances randomly selected per speaker.

    Update: Evaluation on the VoxCeleb1 selected pairs showed an EER of 7.4%.

  2. Run

    $ python3 inference.py
    

    Options (an example invocation is shown below)

    • -sr : sampling rate (Default 16000)
    • -bl : batch size of mask estimator and beamformer input (Default 48000)
    • -c : number of audio channels (Default 8)
    • -dmt : denoising and dereverberation model type
    • -ssmt : speaker separation model type
    • -bt : beamformer type

    To evaluate performance on multiple audio files at once, use evaluate_neural_beamformer.ipynb.
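
    For reference, an invocation that spells out the default settings looks like the following; add -dmt, -ssmt, and -bt with the values defined in inference.py to select specific models and beamformer types:

    $ python3 inference.py -sr 16000 -bl 48000 -c 8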

Online demo

  1. Prepare the microphone array

    You can use the TAMAGO-03 microphone array, which has 8 microphones.

  2. Run

    Open two terminals and run the following commands, one in each terminal (Mac or Linux). Make sure the Julius server URL in RealTimeDemo.py and speech_extracter_interface.py is set to the correct address.

    • Server

      $ python3 asr_server_julius.py
      
    • Client

      $ python3 RealTimeDemo.py -em -d 0 -mg 20
      

      Option

      • -em : whether the model extracts the audio or not

      • -d : input device (numeric ID or substring); you can check the ID by running the following commands

        $ python3 
        >>> import sounddevice
        >>> sounddevice.query_devices()
        
      • -mg : increase microphone gain

    If the g++ compiler is available on Linux, open three terminals and run the following commands, one in each terminal (the input stream is faster this way).

    • ASR server

      $ python3 asr_server_julius.py
      
    • Speech extracter interface (server & client)

      $ python3 speech_extracter_interface.py -em -mg 20
      
    • Input stream client

      $ g++ mic_record_to_speech_extracter.cpp -lasound -lm -o mic_record_to_speech_extracter
      $ ./mic_record_to_speech_extracter plughw:2,0
      

      If you run arecord -l and the following is displayed, specify the argument as plughw:[card number],[device number]

       card 2: TAMAGO03 [TAMAGO-03], device 0: USB Audio [USB Audio]
       Subdevices: 1/1
       Subdevice #0: subdevice #0
      

Author

Daichi Nagano at nakazawa lab

E-mail: [email protected]
