
Deep Learning model for Speech Separation / Speech Enhancement

This repository is for designing and training Deep Learning (DL) models for Speech Separation/Speech Enhancement, originally for the 2023 Clarity Challenge. It is used within the 2023 Clarity Challenge framework and can also train and evaluate DL models on its own. Note that the naming conventions follow the denoiser repository.

Pipeline

  • Dataset
    • VoiceBankDEMAND
    • ClarityChallenge2023
  • Dataloader
  • Solver
  • Model
  • Inference
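The stages above can be illustrated with a framework-free sketch of how data flows from one stage to the next. It is not this repository's code: `PairDataset`, `dataloader`, `GainModel`, `Solver`, and `inference` are hypothetical stand-ins, the "model" is a single learnable gain trained by gradient descent on mean squared error, and waveforms are plain Python lists instead of tensors.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical sample type: (noisy waveform, clean waveform) as lists of floats.
Pair = Tuple[List[float], List[float]]


class PairDataset:
    """Dataset stage: indexable collection of (noisy, clean) pairs."""

    def __init__(self, pairs: Sequence[Pair]) -> None:
        self.pairs = list(pairs)

    def __len__(self) -> int:
        return len(self.pairs)

    def __getitem__(self, idx: int) -> Pair:
        return self.pairs[idx]


def dataloader(dataset: PairDataset, batch_size: int, seed: int = 0) -> Iterator[List[Pair]]:
    """Dataloader stage: yield shuffled mini-batches (reproducible via seed)."""
    order = list(range(len(dataset)))
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [dataset[i] for i in order[start:start + batch_size]]


class GainModel:
    """Model stage: toy enhancer that multiplies the noisy input by one learnable gain."""

    def __init__(self, gain: float = 1.0) -> None:
        self.gain = gain

    def __call__(self, noisy: List[float]) -> List[float]:
        return [self.gain * x for x in noisy]


class Solver:
    """Solver stage: fits the model with mini-batch gradient descent on MSE."""

    def __init__(self, model: GainModel, lr: float = 0.1) -> None:
        self.model = model
        self.lr = lr

    def train(self, dataset: PairDataset, epochs: int, batch_size: int) -> None:
        for epoch in range(epochs):
            for batch in dataloader(dataset, batch_size, seed=epoch):
                grad, count = 0.0, 0
                for noisy, clean in batch:
                    for x, y in zip(noisy, clean):
                        # d/dgain of (gain * x - y)^2 is 2 * (gain * x - y) * x
                        grad += 2.0 * (self.model.gain * x - y) * x
                        count += 1
                if count:
                    # Step against the mean gradient of the batch.
                    self.model.gain -= self.lr * grad / count


def inference(model: GainModel, noisy: List[float]) -> List[float]:
    """Inference stage: apply the trained model to unseen input."""
    return model(noisy)
```

For example, training on pairs whose clean signal is half of the noisy one drives `gain` toward 0.5; a real model replaces `GainModel` with a neural network, and the other stages keep the same roles.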

Model Design

The DL models target single-channel Speech Separation, with a single source or multiple speakers. Each model is configured for the properties of its dataset, and these are tested in the main block of model/model_name.py. The models come from implementations in other repositories, and some of them were modified for their target parameters. The models used are listed in the Model list below.
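A self-test in the main block of a model file could look roughly like the following sketch. It assumes a model maps a single-channel mixture to one estimate per source and checks only the output shape; `toy_separator`, `check_io`, and the 16 kHz sample rate are hypothetical and not taken from this repository.

```python
from typing import Callable, List

# Hypothetical model signature: (mixture, n_sources) -> list of per-source estimates.
Separator = Callable[[List[float], int], List[List[float]]]


def toy_separator(mixture: List[float], n_sources: int) -> List[List[float]]:
    """Stand-in for a real separation model: splits the mixture evenly."""
    return [[x / n_sources for x in mixture] for _ in range(n_sources)]


def check_io(model: Separator, sample_rate: int, seconds: float, n_sources: int) -> None:
    """Raise ValueError if the model's output does not match the dataset properties."""
    n_samples = int(sample_rate * seconds)
    mixture = [0.0] * n_samples  # single-channel input
    estimates = model(mixture, n_sources)
    if len(estimates) != n_sources:
        raise ValueError(f"expected {n_sources} estimates, got {len(estimates)}")
    for i, est in enumerate(estimates):
        if len(est) != n_samples:
            raise ValueError(f"estimate {i} has {len(est)} samples, expected {n_samples}")


if __name__ == "__main__":
    # Hypothetical dataset properties: 16 kHz, 1-second segments, 2 speakers.
    check_io(toy_separator, sample_rate=16000, seconds=1.0, n_sources=2)
    print("shape check passed")
```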

Model list

Library

  • If you want to analyze the amplified signal, refer to https://github.com/ooshyun/ClarityChallenge2023

  • The following packages are pinned for an RTX 3090:

    • pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
    • pyyaml==6.0
    • julius==0.2.7
    • librosa==0.9.2
    • tqdm==4.64.1
    • matplotlib==3.6.3
    • tensorboard==2.11.2
    • torchmetrics==0.5.1
    • pesq==0.0.4
    • pypesq==1.2.4
    • pystoi==0.3.3
    • museval==0.4.0
    • pynvml==11.4.1
    • typing
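  • It was tested on macOS and on Linux with an RTX 3090. On macOS, only the PyTorch version changes.

  • The two setups differ in PyTorch version (torch 1.13.1 vs. torch 1.7.1), which affects torch.concat and torch.cat: torch.concat is an alias of torch.cat that torch 1.7.1 does not provide, so torch.cat is the call that works on both.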
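A minimal sketch, assuming Python 3.8+ for `importlib.metadata`, that reports which of the pinned packages above are missing or installed at a different version, plus a helper that picks `torch.concat` when it exists and falls back to `torch.cat` otherwise (as on torch 1.7.1). `PINS`, `check_pins`, and `resolve_concat` are illustrative names, not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Pins copied from the list above (RTX 3090 setup); adjust "torch" for macOS.
PINS = {
    "torch": "1.7.1+cu110",
    "torchvision": "0.8.2+cu110",
    "torchaudio": "0.7.2",
    "PyYAML": "6.0",
    "julius": "0.2.7",
    "librosa": "0.9.2",
    "tqdm": "4.64.1",
    "matplotlib": "3.6.3",
    "tensorboard": "2.11.2",
    "torchmetrics": "0.5.1",
    "pesq": "0.0.4",
    "pypesq": "1.2.4",
    "pystoi": "0.3.3",
    "museval": "0.4.0",
    "pynvml": "11.4.1",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if missing} for every mismatch."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as "+cu110" when comparing versions.
        if installed is None or installed.split("+")[0] != wanted.split("+")[0]:
            mismatches[name] = installed
    return mismatches


def resolve_concat(torch_module):
    """Return torch.concat if this torch build has it, else torch.cat (e.g. torch 1.7.1)."""
    return getattr(torch_module, "concat", torch_module.cat)


if __name__ == "__main__":
    for pkg, found in check_pins(PINS).items():
        print(f"{pkg}: expected {PINS[pkg]}, found {found}")
```

Calling `resolve_concat(torch)` once and reusing the returned function keeps version checks out of the model code; calling `torch.cat` directly is equally valid, since it exists in both versions.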

Result

  • Currently, the model denoises the wav file shown below; training is still ongoing.

  • Waveform

Denoised Waveform

  • Spectrogram

Denoised Spectrogram