Add generic melspectrogram #404

Open: wants to merge 22 commits into base: develop
21 changes: 21 additions & 0 deletions Jenkinsfile
@@ -37,6 +37,7 @@ pipeline {
stages {
stage('Get view') {
steps {
// Note this also creates the venv
xcorePrepareSandbox("${VIEW}", "${REPO}")
dir("${REPO}") {
viewEnv {
@@ -134,6 +135,24 @@ pipeline {
}
}
}
stage('MEL_SPEC test_wav_mel') {
steps {
dir("${REPO}/test/lib_melspectrogram") {
viewEnv { // Loads the xmos tools
// Note we do things differently here. Due to module version clashes we
// need a local venv to satisfy the requirements of librosa.
// This avoids clashes with the main requirements of fwk_voice, which get tricky
// due to all of the other requirements for py_aec, vnr, etc.
createVenv("requirements_melspectrogram.txt")
withVenv("${WORKSPACE}/${REPO}/test/lib_melspectrogram") {
sh "pip install -r requirements_melspectrogram.txt"
sh "pytest -s --junitxml=pytest_result.xml"
sh 'tree'
}
}
}
}
}
stage('Reset XTAGs'){
steps{
dir("${REPO}") {
@@ -576,6 +595,8 @@ pipeline {
// IC artefacts
archiveArtifacts artifacts: "${REPO}/test/lib_ic/test_ic_profile/ic_prof.log", fingerprint: true
archiveArtifacts artifacts: "${REPO}/test/lib_ic/test_ic_spec/ic_spec_summary.txt", fingerprint: true
// MEL_SPEC artefacts
archiveArtifacts artifacts: "${REPO}/test/lib_melspectrogram/**/*.png", fingerprint: true
// NS artefacts
archiveArtifacts artifacts: "${REPO}/test/lib_ns/test_ns_profile/ns_prof.log", fingerprint: true
// VNR artifacts
1 change: 1 addition & 0 deletions doc/user_guide/audio_processing/index.rst
@@ -19,3 +19,4 @@ Audio Features
../../../modules/lib_adec/doc/index
../../../modules/lib_ic/doc/index
../../../modules/lib_vnr/doc/index
../../../modules/lib_melspectrogram/doc/index
2 changes: 2 additions & 0 deletions modules/CMakeLists.txt
@@ -18,3 +18,5 @@ add_subdirectory( lib_agc )

add_subdirectory( lib_adec )

add_subdirectory( lib_melspectrogram )

28 changes: 28 additions & 0 deletions modules/lib_melspectrogram/CMakeLists.txt
@@ -0,0 +1,28 @@
file( GLOB_RECURSE LIB_MELSPECTROGRAM_SOURCES src/*.c )

## Create library target
add_library(fwk_voice_module_lib_melspectrogram STATIC)

target_sources(fwk_voice_module_lib_melspectrogram
PRIVATE
${LIB_MELSPECTROGRAM_SOURCES}
)

target_include_directories(fwk_voice_module_lib_melspectrogram
PUBLIC
api
)

target_compile_options(fwk_voice_module_lib_melspectrogram
PRIVATE
-Os
-g
)

target_link_libraries(fwk_voice_module_lib_melspectrogram
PUBLIC
lib_xcore_math
)

## Create an alias
add_library(fwk_voice::melspectrogram ALIAS fwk_voice_module_lib_melspectrogram)
106 changes: 106 additions & 0 deletions modules/lib_melspectrogram/api/mel_spec_settings.h
@@ -0,0 +1,106 @@
// Copyright 2023 XMOS LIMITED.
// This Software is subject to the terms of the XMOS Public Licence: Version 1.

#ifndef _MEL_SPEC_SETTINGS_H
#define _MEL_SPEC_SETTINGS_H

/* Values here may be changed, with caution:

- The SCALE and ZERO_POINT values control the quantisation process and
may be freely changed.
- The MIN_LOG10_IN and DB_DYNAMIC_RANGE values both pertain to the dB
conversion process; values below MIN_LOG10_IN will be set to MIN_LOG10_IN
before log10 is taken. Output values less than max(output) - DB_DYNAMIC_RANGE
will be set to max(output) - DB_DYNAMIC_RANGE. These options, and their
values, are set to mimic the default behaviour of librosa.power_to_db in
Python.
- PRE_DB_OFFSET sets a scalar offset that is added to every element of the
matrix before dB conversion. Note that this offset is applied before the
MIN_LOG10_IN clamp.
- TOP_DIM_SZ, LOW_DIM_SZ, and FRAME_DIM_SZ control the size of the expected
output array. The x_melspectrogram function expects an output buffer of
int8_t[TOP_DIM_SZ][N_MEL][FRAME_DIM_SZ][LOW_DIM_SZ], which is memset to 0 at
the start of execution. Only index 0 of the top and low dims is populated,
so out[0][:][:][0] always contains data and every other element
(out[1:][:][:][:] and out[:][:][:][1:]) is 0.
- TRIM_START and TRIM_END control "trimming" the output frame count to fit
into the main output buffer. Trimmed frames are written to the out_trim_top
and out_trim_end buffers.
NB: It is essential that TRIM_START + TRIM_END + FRAME_DIM_SZ == N_FRAMES.
- HOP controls the hop size used. This parameter may be freely changed.
It should normally be <= N_FFT; otherwise some input samples fall between
frames and are left unprocessed.
- N_FRAMES sets a fixed number of output frames. This number may be
freely changed. It is used in controlling the length of the overall
spectrogram, i.e. how many hops are taken.
- N_SAMPLES controls the expected size of the input vector, and may be freely
changed.
- N_MEL controls the number of Mel bands the output will be formed of.
If this value is changed, a new set of Mel filterbanks will need to be
generated using the provided gen_mel_filters.py and fed into the call to
x_mel_spec. This value is also used in the expected size of the output.
- N_FFT controls the width of the FFT window used. If this value is
changed, new Mel filterbanks will need to be generated. New Hanning window
coefficients will also need to be generated using the provided
gen_hann_windows.py and fed into the call to x_mel_spec. This value should
always be a power of 2.
- CENTRE and PAD_MODE control the location of frames in the input stream. If
CENTRE is false, the first sample of frame t will be at y[t * hop]. In this
case, if N_SAMPLES is less than (N_FRAMES * HOP) + N_FFT, the input will be
0-padded to reach this length. This means that the final frames of the
output may be half-padded or may be purely 0 data. If CENTRE
is true, then the centre sample of frame t will be at y[t * hop]. In this
case, additional samples are added at the start and end of the input stream
to "centre" it. How these samples are added may be configured using members
of the enum mel_spec_pad_mode_t: MEL_SPEC_PAD_MODE_CONSTANT zero pads the
input stream, while _REFLECT and _SYMMETRIC mirror the array about its first
and last elements (_REFLECT does not repeat the edge element, _SYMMETRIC
does), as below:
CONSTANT: [1, 2, 3, 4, 5] -> [0, 0, 0, 0, 0, |1, 2, 3, 4, 5|, 0, 0, 0, 0, 0]
REFLECT: [1, 2, 3, 4, 5] -> [4, 5, 4, 3, 2, |1, 2, 3, 4, 5|, 4, 3, 2, 1, 2]
SYMMETRIC: [1, 2, 3, 4, 5] -> [5, 4, 3, 2, 1, |1, 2, 3, 4, 5|, 5, 4, 3, 2, 1]

*/

#define MEL_SPEC_COMMON_TOP_DIM_SZ 1
#define MEL_SPEC_COMMON_LOW_DIM_SZ 4
#define MEL_SPEC_COMMON_MIN_LOG10_IN 1e-10
#define MEL_SPEC_COMMON_DB_DYNAMIC_RANGE 80
#define MEL_SPEC_COMMON_CENTRE true
#define MEL_SPEC_COMMON_PAD_MODE MEL_SPEC_PAD_MODE_REFLECT

#define MEL_SPEC_SMALL_N_FFT 512
#define MEL_SPEC_SMALL_N_SAMPLES 6400
#define MEL_SPEC_SMALL_N_MEL 64
#define MEL_SPEC_SMALL_N_FRAMES 26
#define MEL_SPEC_SMALL_HOP 256
#define MEL_SPEC_SMALL_SCALE 3.325621485710144e-1
#define MEL_SPEC_SMALL_ZERO_POINT 12
#define MEL_SPEC_SMALL_PRE_DB_OFFSET 0
#define MEL_SPEC_SMALL_TRIM_START 0
#define MEL_SPEC_SMALL_TRIM_END 0
#define MEL_SPEC_SMALL_FRAME_DIM_SZ MEL_SPEC_SMALL_N_FRAMES
#define MEL_SPEC_SMALL_TOP_DIM_SZ MEL_SPEC_COMMON_TOP_DIM_SZ
#define MEL_SPEC_SMALL_LOW_DIM_SZ MEL_SPEC_COMMON_LOW_DIM_SZ
#define MEL_SPEC_SMALL_MIN_LOG10_IN MEL_SPEC_COMMON_MIN_LOG10_IN
#define MEL_SPEC_SMALL_DB_DYNAMIC_RANGE MEL_SPEC_COMMON_DB_DYNAMIC_RANGE
#define MEL_SPEC_SMALL_CENTRE MEL_SPEC_COMMON_CENTRE
#define MEL_SPEC_SMALL_PAD_MODE MEL_SPEC_COMMON_PAD_MODE

#define MEL_SPEC_LARGE_N_FFT 1024
#define MEL_SPEC_LARGE_N_SAMPLES 84800
#define MEL_SPEC_LARGE_N_MEL 128
#define MEL_SPEC_LARGE_N_FRAMES 166
#define MEL_SPEC_LARGE_HOP 512
#define MEL_SPEC_LARGE_SCALE 3.921553026884794e-3
#define MEL_SPEC_LARGE_ZERO_POINT 128
#define MEL_SPEC_LARGE_PRE_DB_OFFSET 1e-8
#define MEL_SPEC_LARGE_TRIM_START 4
#define MEL_SPEC_LARGE_TRIM_END 4
#define MEL_SPEC_LARGE_FRAME_DIM_SZ 158
#define MEL_SPEC_LARGE_TOP_DIM_SZ MEL_SPEC_COMMON_TOP_DIM_SZ
#define MEL_SPEC_LARGE_LOW_DIM_SZ MEL_SPEC_COMMON_LOW_DIM_SZ
#define MEL_SPEC_LARGE_MIN_LOG10_IN MEL_SPEC_COMMON_MIN_LOG10_IN
#define MEL_SPEC_LARGE_DB_DYNAMIC_RANGE MEL_SPEC_COMMON_DB_DYNAMIC_RANGE
#define MEL_SPEC_LARGE_CENTRE MEL_SPEC_COMMON_CENTRE
#define MEL_SPEC_LARGE_PAD_MODE MEL_SPEC_COMMON_PAD_MODE

#endif // _MEL_SPEC_SETTINGS_H
114 changes: 114 additions & 0 deletions modules/lib_melspectrogram/api/melspectrogram_api.h
@@ -0,0 +1,114 @@
// Copyright 2023 XMOS LIMITED.
// This Software is subject to the terms of the XMOS Public Licence: Version 1.

#ifndef _MEL_SPECTROGRAM_H
#define _MEL_SPECTROGRAM_H

#include <stdint.h>
#include <stdbool.h>
#include "xmath/xmath.h"
#include "mel_spec_settings.h"
#ifdef __XC__
#define _Bool int
#endif

typedef enum
{
MEL_SPEC_SMALL,
MEL_SPEC_LARGE
} mel_spec_option_t;

/* @brief Generates a Mel spectrogram from an input vector.
*
* This function supports only two modes of operation, termed "small" and
* "large".
*
* In "small" operation, the function expects @p input to be of the form
* int16_t[6400] and @p output to be of the form int8_t[1][64][26][4]. It will
* process the input vector with a 512-width STFT with a 256-sample hop and a
* 512-width Hanning window, and then create a 64-band, 26-frame Mel
* spectrogram from this data. It will optionally convert the output to dB,
* first setting all values below 1e-10 to 1e-10; after this conversion, it will
* also set all values that are less than the maximum value in the output minus
* 80 to max(output) - 80. It will then optionally subtract the mean of the
* output from every output value.
* It will then optionally quantise using the formula
* int8_t output[] = 0.3325621485710144 * (output + 12) for all
* output, before finally arranging data in the output buffer. The first and
* last dimension indices will be assumed to be 0.
* Therefore, output[0][*][*][0] will always contain data. The rest of the array
* will be 0.
* When using "small" operation, the parameters @p out_trim_top and
* @p out_trim_end **MUST** be NULL. Passing non-NULL pointers to these
* parameters in this mode leads to untested behaviour, unless the _TRIM_START
* and _TRIM_END macros are also redefined in mel_spec_settings.h.
* Set this by passing MEL_SPEC_SMALL to @p mel_spec_option.
*
* In "large" operation, the function expects @p input to be of the form
* int16_t[84800] and @p output to be of the form int8_t[1][128][158][4]. It will
* process the input vector with a 1024-width STFT with a 512-sample hop and a
* 1024-width Hanning window, and then create a 128-band, 166-frame Mel
* spectrogram from this data; note that this is 8 frames more than the 158
* frames held in @p output. It will optionally convert the output to dB: it
* will first add 1e-8 to all values, then set all values below 1e-10 to 1e-10;
* after this conversion, it will also set all values that are less than the
* maximum value in the output minus 80 to max(output) - 80.
* It will then optionally subtract the mean of the output from every output
* value. It will then optionally quantise using the formula
* int8_t output[] = 0.003921553026884794 * (output + 128) for all output,
* before finally arranging data in the output buffer. The first and
* last dimension indices will be assumed to be 0.
* Therefore, output[0][*][*][0] will always contain data. The rest of the array
* will be 0.
* As noted above, the Mel spectrogram produced in this process is 166 frames
* wide, but the output vector is only 158 frames wide. The remaining 8 frames
* (4 from the start and 4 from the end) will be placed in @p out_trim_top and
* @p out_trim_end with the same processing as above.
* Set this by passing MEL_SPEC_LARGE to @p mel_spec_option.
*
* The above hard-coded values can be altered by editing mel_spec_settings.h.
*
* This function is not thread-safe.
* @p output is edited in place, and is not safe to access during operation.
* @p input is accessed in place, and is not safe to access during operation.
*
* Performance statistics:
* Small | Large
* -------------------------------------------------------
* Memory usage inc. IO buffers, bytes: ~ 59628 | ~ 294796
* Time to return, milliseconds: ~ 24.13 | ~ 306.75
*
* @param[out] output - Output. Takes a pointer to int8_t[1][n_mels][n_frames][4],
* where n_mels is either 64 or 128 and
* n_frames is either 26 or 158.
* @param[out] out_trim_top - If trim settings are applied, the first TRIM_START
* frames of output are placed into this buffer. May
* be either int8_t[1][n_mels][TRIM_START][4] or NULL
* @param[out] out_trim_end - If trim settings are applied, the last TRIM_END
* frames of output are placed into this buffer. May
* be either int8_t[1][n_mels][TRIM_END][4] or NULL
* @param[in] input - Input vector. Takes a pointer to int16_t[n_samples], where
* n_samples is either 6400 or 84800.
* @param[in] mel_spec_option - Chooses between "small" operation with
* MEL_SPEC_SMALL or "large" operation with
* MEL_SPEC_LARGE.
* @param[in] quantise - Chooses whether to quantise the output using the scale
* and zero point
* @param[in] convert_to_db - Chooses whether to convert to dB before mean
* normalisation and quantisation
* @param[in] subtract_mean - Chooses whether to subtract the mean of the output
* tensor before quantisation
*/
void x_melspectrogram(int8_t *output,
int8_t *out_trim_top,
int8_t *out_trim_end,
int16_t *input,
mel_spec_option_t mel_spec_option,
bool quantise,
bool convert_to_db,
bool subtract_mean);

#ifdef __XC__
#undef _Bool
#endif

#endif // _MEL_SPECTROGRAM_H
Empty file.