TensorFlow implementation of "Speaker-Independent Speech Separation with Deep Attractor Network"
Link to original paper
STILL WORK IN PROGRESS, EXPECT BUGS
numpy / scipy
tensorflow >= 1.2
matplotlib (optional, for visualization)
h5py / fuel (optional, for certain datasets)
Currently, the TIMIT and WSJ0 datasets are implemented. You can use the "toy" dataset for debugging; it is just white noise.
- TIMIT dataset
Follow app/datasets/TIMIT/readme for dataset preparation.
- WSJ0 dataset
Follow app/datasets/WSJ0/readme for dataset preparation.
After setting up a dataset, you may want to change DATASET_TYPE in the hyperparameters.
Hyperparameters control the batch size, learning rate, dataset type, and so on.
- The recommended way: using a JSON file
There's a default.json file in the root directory. You can make your own and change some of the values. For example, you can create a JSON file with:
{
    "DATASET_TYPE": "timit",
    "LR": 1e-2,
    "BATCH_SIZE": 8
}
Save it as my_setup.json. Now you can run the script with:
python main.py -c my_setup.json
- The direct way: using command line arguments
Some commonly used hyperparameters can be overridden by CLI args.
For example, to set learning rate:
python main.py -lr=1e-2
Here's an incomplete list of them:
# set learning rate, overrides LR
-lr
--learn-rate
# set dataset to use, overrides DATASET_TYPE
-ds
--dataset
# set batch size, overrides BATCH_SIZE
-bs
--batch-size
# set
Note: If you get an out-of-memory (OOM) error from TensorFlow, try using a lower BATCH_SIZE.
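One way to act on this note is to derive a smaller-batch config from an existing one. The sketch below is not part of this repo: the helper name `halve_batch_size` and the file name `my_setup_small.json` are hypothetical, and it only assumes the config is a flat JSON object with a `BATCH_SIZE` key.

```python
import json
import os
import tempfile


def halve_batch_size(src_path, dst_path):
    """Copy a JSON config, halving BATCH_SIZE (never below 1)."""
    with open(src_path) as f:
        cfg = json.load(f)
    cfg["BATCH_SIZE"] = max(1, cfg["BATCH_SIZE"] // 2)
    with open(dst_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg


# Demo in a throwaway directory so no real config file is touched.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "my_setup.json")
dst = os.path.join(tmp_dir, "my_setup_small.json")
with open(src, "w") as f:
    json.dump({"DATASET_TYPE": "timit", "LR": 1e-2, "BATCH_SIZE": 8}, f)

small_cfg = halve_batch_size(src, dst)
print(small_cfg["BATCH_SIZE"])
```

The resulting file can then be passed with `python main.py -c my_setup_small.json`.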
Note: If you change FFT_SIZE, FFT_STRIDE, FFT_WND, or SMP_RATE, you should run dataset preprocessing again.
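To see why, note that the shape of a spectrogram depends on these values. The following is a rough sketch using the usual unpadded STFT framing arithmetic; the repo's actual framing and padding may differ, `spectrogram_shape` is a hypothetical helper, and the 8 kHz rate is only an illustrative assumption.

```python
def spectrogram_shape(n_samples, fft_size, fft_stride):
    """Return (n_frames, n_bins) for an unpadded STFT of a real signal."""
    n_bins = fft_size // 2 + 1  # one-sided spectrum of a real signal
    if n_samples < fft_size:
        return (0, n_bins)
    n_frames = 1 + (n_samples - fft_size) // fft_stride
    return (n_frames, n_bins)


one_second = 8000  # assumed sample rate of 8 kHz, for illustration only
shape_a = spectrogram_shape(one_second, fft_size=256, fft_stride=64)
shape_b = spectrogram_shape(one_second, fft_size=512, fft_stride=128)
print(shape_a, shape_b)
```

Because the two settings give differently shaped arrays for the same audio, data preprocessed under one setting cannot be reused under the other.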
Note: If you change the model architecture, previously saved model parameters may not be compatible.
Under the root directory of this repo:
- train a model for 10 epochs and see accuracy, using the TIMIT dataset
python main.py -ds='timit'
- train a model using your own hyperparameters
python main.py -c my_setup.json
- train a model for 100 epochs and save it
python main.py -ne=100 -o='params.ckpt'
- continue from the last saved model, train 100 more epochs, and save back
python main.py -ne=100 -i='params.ckpt' -o='params.ckpt'
- test the trained model on test set
python main.py -i='params.ckpt' -m=test
- draw a sample from test set, then separate it:
$ python main.py -i='params.ckpt' -m=demo
$ ls *.wav
demo.wav demo_separated_1.wav demo_separated_2.wav
- separate a given WAV file:
$ python main.py -i='params.ckpt' -m=demo -if=file.wav
$ ls *.wav
file.wav file_separated_1.wav file_separated_2.wav
- launch tensorboard and see graphs
tensorboard --logdir=./logs/
- for more CLI arguments, do
python main.py --help
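The train / test / demo sequence above can also be scripted. This sketch only builds the argument lists, using the flags shown in this section; `build_commands` is a hypothetical helper, and the actual invocation is left commented out.

```python
import sys


def build_commands(ckpt="params.ckpt", epochs=100, dataset=None):
    """Build argv lists for train, test and demo runs of main.py."""
    train = [sys.executable, "main.py", "-ne=%d" % epochs, "-o=%s" % ckpt]
    if dataset is not None:
        train.append("-ds=%s" % dataset)
    test = [sys.executable, "main.py", "-i=%s" % ckpt, "-m=test"]
    demo = [sys.executable, "main.py", "-i=%s" % ckpt, "-m=demo"]
    return [train, test, demo]


commands = build_commands(dataset="timit")
for argv in commands:
    print(" ".join(argv[1:]))
    # To actually run each step from the repo root:
    # import subprocess; subprocess.run(argv, check=True)
```

Passing arguments as a list avoids shell quoting issues, so the quotes used in the shell examples are not needed here.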
- Make a file app/datasets/my_dataset.py.
- Make a subclass of app.datasets.dataset.Dataset:
@hparams.register_dataset('my_dataset')
class MyDataset(Dataset):
...
You can use app/datasets/timit.py as a reference.
- In app/datasets/__init__.py, add:
import app.datasets.my_dataset
- To use your dataset, set DATASET_TYPE to "my_dataset" in the JSON config file.
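For intuition, a name-based registry of this kind is often built as below. This is a standalone sketch, not the code in this repo: `_DATASETS`, `get_dataset` and the `Dataset` stub are assumptions for illustration, and only the decorator name `register_dataset` is taken from the example above.

```python
_DATASETS = {}  # hypothetical registry: name -> dataset class


def register_dataset(name):
    """Class decorator that records a dataset class under `name`."""
    def decorator(cls):
        if name in _DATASETS:
            raise ValueError("dataset %r already registered" % name)
        _DATASETS[name] = cls
        return cls
    return decorator


def get_dataset(name):
    """Look up a dataset class by the value of DATASET_TYPE."""
    try:
        return _DATASETS[name]
    except KeyError:
        raise ValueError("unknown DATASET_TYPE: %r" % name)


class Dataset(object):
    """Stand-in for app.datasets.dataset.Dataset."""


@register_dataset("my_dataset")
class MyDataset(Dataset):
    pass


selected = get_dataset("my_dataset")
print(selected.__name__)
```

A decorator only runs when the module defining the class is imported, which is why the import in app/datasets/__init__.py is needed before the name can be used.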
You can subclass Estimator, Encoder, or Separator to tweak the model.
- Encoder is for getting embeddings from log-magnitude spectra.
- Estimator is for estimating attractor points from embeddings.
- Separator uses the mixture spectra, mixture embeddings and attractors to get separated spectra.
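As a rough illustration of how these three pieces relate, here is a toy version of a training-time computation: attractors are taken as the assignment-weighted mean of the embeddings, and masks come from a softmax over embedding-attractor dot products, which is one common choice. It is written in plain Python with tiny hand-made values; the function names are hypothetical, the embeddings are given directly rather than produced by an encoder, and the repo's own TensorFlow code may differ.

```python
import math


def estimate_attractors(embeddings, assignments):
    """Attractor c = weighted mean of the embeddings assigned to source c."""
    dim = len(embeddings[0])
    attractors = []
    for weights in assignments:  # one weight list per source
        total = sum(weights)
        attractors.append([
            sum(w * e[k] for w, e in zip(weights, embeddings)) / total
            for k in range(dim)
        ])
    return attractors


def separate(mixture, embeddings, attractors):
    """Mask each bin by a softmax over its similarity to each attractor."""
    separated = [[] for _ in attractors]
    for mag, emb in zip(mixture, embeddings):
        scores = [sum(a * e for a, e in zip(att, emb)) for att in attractors]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        norm = sum(exps)
        for c, x in enumerate(exps):
            separated[c].append(mag * x / norm)
    return separated


# Four time-frequency bins with 2-D embeddings; the first two belong to source 0.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
assignments = [[1, 1, 0, 0], [0, 0, 1, 1]]
mixture = [2.0, 1.0, 3.0, 1.0]

attractors = estimate_attractors(embeddings, assignments)
separated = separate(mixture, embeddings, attractors)
print(attractors)
```

Since the masks for each bin sum to one, the separated magnitudes add back up to the mixture magnitude in this toy setup.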
You can set the encoder type by setting ENCODER_TYPE in hyperparameters.
You can set the estimator type by setting TRAIN_ESTIMATOR_METHOD and INFER_ESTIMATOR_METHOD in hyperparameters.
You can set the separator type by setting SEPARATOR_TYPE in hyperparameters.
Make sure to use the @register_* decorator for your class.
See the code in app/modules.py for details. The existing sub-modules are defined there.
To change the overall model architecture, modify Model.build() in main.py.
- Only the favored "anchor" method for estimating attractor locations during inference is implemented. During training, it is also possible to use the ground truth to give attractor locations.
- The TIMIT dataset is small, so we use the same set for test and validation.
- We use the WSJ0 si_tr_s / si_dt_05 / si_et_05 subsets as the training / validation / test sets respectively. The speakers are randomly chosen and mixed at runtime. This setup is slightly different from the original paper.
- Only single-GPU training is implemented.
- Doesn't work on Windows.
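The runtime mixing mentioned in the WSJ0 note can be pictured with the following sketch. It is a toy built on assumptions, not this repo's data pipeline: `mix_random_pair` and the in-memory `utterances` dict are hypothetical, signals are plain lists of floats, and no gain or SNR scaling is applied.

```python
import random


def mix_random_pair(utterances, rng):
    """Pick two distinct speakers and sum one utterance from each."""
    spk_a, spk_b = rng.sample(sorted(utterances), 2)
    sig_a = rng.choice(utterances[spk_a])
    sig_b = rng.choice(utterances[spk_b])
    length = min(len(sig_a), len(sig_b))  # truncate to the shorter signal
    sources = [sig_a[:length], sig_b[:length]]
    mixture = [a + b for a, b in zip(*sources)]
    return (spk_a, spk_b), sources, mixture


# Tiny fake corpus: speaker id -> list of utterances (lists of samples).
utterances = {
    "spk1": [[0.1, 0.2, 0.3, 0.4]],
    "spk2": [[0.5, -0.5, 0.5]],
    "spk3": [[-0.1, -0.2, -0.3, -0.4, -0.5]],
}
speakers, sources, mixture = mix_random_pair(utterances, random.Random(0))
print(speakers, len(mixture))
```

Drawing pairs this way means each epoch can see new speaker combinations instead of a fixed set of pre-mixed files.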