This project demonstrates a Recurrent Neural Network (RNN) based method for Speech Enhancement on GAP9.
The main loop of the application continuously samples data from the microphone at 16 kHz, applies the RNN filter and reconstructs the cleaned signal via overlap-and-add.
As depicted in the figure below, the noisy signal is windowed (frame size of 25 ms with a hop length of 6.25 ms and Hanning windowing) and the STFT is computed.
The RNN is fed with the magnitude of the STFT components and returns a suppression mask. After weighting, the inverse STFT returns a cleaned audio clip.
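For reference, the sketch below (plain Python with numpy/scipy, not the GAP9 code) mirrors this pipeline; `rnn_mask()` is a hypothetical placeholder for the network, and the exact window handling on GAP9 may differ.

```python
# Minimal sketch of the masking pipeline described above.
# Frame/hop sizes match the text (25 ms / 6.25 ms at 16 kHz); rnn_mask()
# stands in for the on-device RNN and simply returns an all-ones mask here.
import numpy as np
from scipy.signal import stft, istft

FS = 16000
FRAME = int(0.025 * FS)     # 25 ms   -> 400 samples
HOP = int(0.00625 * FS)     # 6.25 ms -> 100 samples

def rnn_mask(magnitude):
    # Placeholder: one suppression gain in [0, 1] per STFT bin.
    return np.ones_like(magnitude)

def denoise(noisy, fs=FS):
    # STFT with Hann window, 25 ms frames, 6.25 ms hop (75% overlap).
    _, _, spec = stft(noisy, fs=fs, window='hann',
                      nperseg=FRAME, noverlap=FRAME - HOP)
    mask = rnn_mask(np.abs(spec))      # the RNN only sees magnitudes
    cleaned_spec = spec * mask         # apply the suppression mask
    _, cleaned = istft(cleaned_spec, fs=fs, window='hann',
                       nperseg=FRAME, noverlap=FRAME - HOP)  # overlap-and-add
    return cleaned
```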
The demo runs on the GAP9 Audio EVK, using the microphone of the GAPmod board.
cmake -B build
cmake --build build --target run
It can also run on GVSoC. Please read the GVSoC - gvcontrol section to understand how this works.
cmake -B build
cmake --build build --target menuconfig # Select GVSoC in the menu. In gvsoc_option, enable "GVSOC proxy mode."
cmake --build build --target run
./gvcontrol --port 30000 # in another terminal
Optionally, the application can run on GVSOC (or board) to denoise a custom audio file (.wav).
cmake -B build
cmake --build build --target menuconfig # Select the DenoiseWav option in the DENOISER APP -> Application mode menu
cmake --build build --target run
The output wav file will be written to test_gap.wav inside the project folder.
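To quickly sanity-check the result, the generated file can be inspected with a few lines of Python (assuming scipy is available; only the output file name comes from the text above):

```python
# Load and inspect the denoised output produced by the DenoiseWav mode.
from scipy.io import wavfile

rate, data = wavfile.read("test_gap.wav")   # written to the project root, as noted above
print(f"sample rate: {rate} Hz, duration: {len(data) / rate:.2f} s")
```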
- denoiser.c is the main file, including the application code.
- model/ includes the necessary files to feed GAPflow for NN model code generation: the onnx denoiser files.
- nntool_scripts/ includes the nntool recipes to quantize the LSTM or GRU models. You can refer to the quantization section for more details.
- samples/ contains the audio samples for testing and quantization calibration.
- model/STFTModel.c is the AT generator model for the STFT and iSTFT functions. These files are manually configured. The baseline implementation uses the FP32 datatype.
- Graph.src is the configuration file for the Audio IO. It is used only for the board target.
- test_accuracy/ includes the python scripts for the model accuracy tests. You can refer to the Python Utilities section for more details.
The Post-Training Quantization of the RNN model is handled by GAPflow. Both LSTM and GRU models can be quantized using one of the following options:
- FP16: quantize both activations and weights to float16 format. This does not require any calibration samples.
- INT8: quantize both activations and weights to int8 format. A calibration step is required to quantize the activation functions; the samples included within samples/quant/ are used for this purpose. This option is currently not suggested because of the non-negligible accuracy degradation.
- FP16MIXED: only the RNN layers are quantized to 8 bits, while the rest is kept in FP16. This option achieves the best trade-off between accuracy degradation and inference speed.
- NE16: currently not supported.
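As a rough illustration of why INT8 typically degrades accuracy more than FP16, the snippet below compares the round-trip error of a float16 cast against a symmetric int8 quantizer on a random weight tensor; this is only a sketch and does not reproduce GAPflow's actual quantizers.

```python
# Compare the round-trip error of float16 casting vs. symmetric int8
# quantization on a random weight tensor (illustration only).
import numpy as np

w = np.random.randn(256, 256).astype(np.float32) * 0.1

# FP16: a simple cast, no calibration needed.
w_fp16 = w.astype(np.float16).astype(np.float32)

# INT8: symmetric per-tensor quantization, scale derived from the max value
# (in GAPflow the activation scales come from the calibration samples).
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -128, 127) * scale

print("fp16 mean abs error:", np.abs(w - w_fp16).mean())
print("int8 mean abs error:", np.abs(w - w_int8).mean())
```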
In addition to individual settings, some application modes are made available to simplify the application configuration. This is done by setting the Application Mode in the make menuconfig DENOISER APP menu.
The code runs inference using the denoiser_dns.onnx model with FP16MIXED quantization. More accurate results, at a higher energy cost, can be obtained with FP16 quantization by changing the nntool_script_demo.
- Demo is meant to run on the board target: audio data comes from the microphone and the output is sent to the jack output of the audio add-on board. This mode can also run on GVSoC; please refer to the GVSoC section.
- DenoiseWav is meant to run on GVSoC or on the board target: audio data comes from the WAV_FILE file. The cleaned audio can be retrieved from the test_gap.wav file in the root folder.
The test_accuracy/test_GAP.py file provides the routines for testing the NN inference model using the NNtool API. The script can be used to run tests on entire datasets (--mode test) or to denoise individual audio files (--mode sample). Some examples are provided below.
python test_accuracy/test_GAP.py --mode sample --pad_input 300 --sample_rate 16000 --wav_input /<path_to_audio_file>/<file_name>.wav
python test_accuracy/test_GAP.py --mode sample --pad_input 300 --sample_rate 16000 --wav_input samples/dataset/noisy/p232_050.wav --quant fp16mixed
The output is saved in a file called test_gap.wav in the root of the repository.
python test_accuracy/test_GAP.py --mode test --pad_input 300 --noisy_dataset_path ./<path_to_noisy_audio_dataset>/ --clean_dataset_path ./<path_to_clean_audio_dataset>/
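As a rough sketch of what a dataset-level comparison involves, the snippet below scores denoised outputs against clean references with a simple SNR measure; the directory layout and the metric are assumptions for illustration, and test_GAP.py may report different metrics.

```python
# Illustrative dataset loop: compare denoised outputs against the clean
# references with a simple SNR measure. Paths and metric are assumptions.
import os
import numpy as np
from scipy.io import wavfile

def snr_db(reference, estimate):
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))

clean_dir, denoised_dir = "clean/", "denoised/"
scores = []
for name in sorted(os.listdir(clean_dir)):
    _, clean = wavfile.read(os.path.join(clean_dir, name))
    _, est = wavfile.read(os.path.join(denoised_dir, name))
    n = min(len(clean), len(est))          # outputs may be padded/truncated
    scores.append(snr_db(clean[:n].astype(np.float32), est[:n].astype(np.float32)))
print("mean SNR [dB]:", np.mean(scores))
```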
To run the Demo mode on GVSoC you can use the gvcontrol script, which sends/reads data to/from the I2S interface of the GAP9 GVSoC.
You can choose the input noisy wav file you want to process. The execution can be long (up to 5 minutes for 3 seconds of simulation).
Since GAP is waiting for PDM data, the PCM/PDM conversion module of GVSoC is used. To learn more about this, please refer to the following example in the SDK: basic/interface/sai/pcm2pdm_pdm2pcm.