
Audio: Multiband DRC: Use optimized 4th order IIR filter version #9808

Open: wants to merge 3 commits into main from the crossover_iir_df1_4th branch

@singalsu (Collaborator) commented Feb 3, 2025

The simpler, hard-coded 4th order IIR version saves MCPS in the band-split filter bank and in the emphasis/de-emphasis IIR filters.
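For context, a 4th order IIR here means two direct form I biquads in series. The sketch below writes the feedback terms with a plus sign, as the comments in the HiFi snippet later in this thread do (`+ a2 * y2 + a1 * y1`), and lumps each stage's output gain and shift into a single factor g_k; both are simplifying assumptions for illustration:

```math
\begin{aligned}
y_k[n] &= b_{0,k}\,x_k[n] + b_{1,k}\,x_k[n-1] + b_{2,k}\,x_k[n-2] + a_{1,k}\,y_k[n-1] + a_{2,k}\,y_k[n-2], \qquad k = 1, 2,\\
x_2[n] &= g_1\,y_1[n], \qquad \mathrm{out}[n] = g_2\,y_2[n],\\
H(z) &= \prod_{k=1}^{2} g_k\,\frac{b_{0,k} + b_{1,k}\,z^{-1} + b_{2,k}\,z^{-2}}{1 - a_{1,k}\,z^{-1} - a_{2,k}\,z^{-2}}
\end{aligned}
```

Here x_1[n] is the filter input. Each stage needs only its own two past inputs and two past outputs, so with exactly two stages the whole filter state is a fixed, small set of delay values.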

@singalsu (Collaborator, Author) commented Feb 3, 2025

WIP - I'll see if I can further improve the HiFi4 and HiFi5 IIR versions.

The 4th order filter with two biquads in series is commonly used in
crossover and multiband DRC components. Omitting the outer loop for
parallel biquads and the check for null coefficients, and using a
fixed loop count of two, makes the critical code faster.

Signed-off-by: Seppo Ingalsuo <[email protected]>
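The control-flow difference described in the commit message above can be sketched in portable C. This is a minimal floating-point illustration, not the SOF implementation: the names `biquad_coef`, `biquad_state`, `biquad_step`, `iir_generic` and `iir_4th` are hypothetical, floats are used only for readability (SOF works on fixed-point Q-format data and uses intrinsics on HiFi targets), and the coefficient order follows the load order of the HiFi snippet later in this thread, with the feedback coefficients assumed to be stored sign-inverted.

```c
#include <assert.h>

/* Hypothetical per-biquad coefficients, in the order the HiFi snippet in
 * this thread loads them: a2, a1, b2, b1, b0. Feedback terms are added,
 * so a1 and a2 are assumed to be stored sign-inverted. */
struct biquad_coef {
	float a2, a1, b2, b1, b0;
};

/* Per-biquad delay line: two past outputs and two past inputs. */
struct biquad_state {
	float y2, y1, x2, x1;
};

/* One direct form I biquad step. */
static float biquad_step(const struct biquad_coef *c,
			 struct biquad_state *s, float x)
{
	float y = c->b0 * x + c->b1 * s->x1 + c->b2 * s->x2 +
		  c->a1 * s->y1 + c->a2 * s->y2;

	s->x2 = s->x1;
	s->x1 = x;
	s->y2 = s->y1;
	s->y1 = y;
	return y;
}

/* Generic style: parallel branches of in_series biquads whose outputs are
 * summed, plus a bypass when no coefficients are configured. */
static float iir_generic(const struct biquad_coef *coef,
			 struct biquad_state *state,
			 int biquads, int in_series, float x)
{
	float out = 0.0f;
	int i, j;

	if (!coef || biquads == 0)
		return x; /* bypass */

	assert(in_series > 0 && biquads % in_series == 0);
	for (j = 0; j < biquads; j += in_series) { /* parallel branches */
		float in = x;

		for (i = 0; i < in_series; i++) /* biquads in series */
			in = biquad_step(&coef[j + i], &state[j + i], in);
		out += in;
	}
	return out;
}

/* Hard-coded 4th order: no outer loop, no bypass check, and a fixed loop
 * count of two that the compiler can fully unroll. */
static float iir_4th(const struct biquad_coef *coef,
		     struct biquad_state *state, float x)
{
	int i;

	for (i = 0; i < 2; i++)
		x = biquad_step(&coef[i], &state[i], x);
	return x;
}
```

With two biquads configured in series (`biquads == 2`, `in_series == 2`) both functions compute the same output; the 4th order version just drops the control flow that this configuration never exercises.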

This patch changes the crossover component to use the optimized
4th order IIR function. The LR4 (Linkwitz-Riley 4th order)
filter bank is hard-coded to 4th order, so this change does
not add restrictions.

The filter bank is used by the multiband DRC component. In a
three-band configuration on a HiFi5 platform the saving is
5.2 MCPS, from 90.36 MCPS to 85.17 MCPS.

Signed-off-by: Seppo Ingalsuo <[email protected]>

This patch changes the emphasis and de-emphasis IIR filters in the
multiband DRC component to use the optimized 4th order IIR code.

The crossover patch already covered the band-split filter bank.
This change saves an additional 1.7 MCPS in a HiFi5 build of the
component, from 85.17 MCPS to 83.44 MCPS.

The change does not restrict the configuration: the existing filters
are hard-coded to 4th order (SOF_EMP_DEEMP_BIQUADS).

Signed-off-by: Seppo Ingalsuo <[email protected]>
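For reference, the MCPS figures quoted in the commit messages above work out as follows. Each per-patch percentage is relative to the load before that patch, and the total is relative to the original 90.36 MCPS:

```math
\begin{aligned}
\text{crossover (band-split filter bank):}\quad & 90.36 - 85.17 = 5.19~\text{MCPS} \;(\approx 5.7\,\%)\\
\text{emphasis/de-emphasis filters:}\quad & 85.17 - 83.44 = 1.73~\text{MCPS} \;(\approx 2.0\,\%)\\
\text{total:}\quad & 90.36 - 83.44 = 6.92~\text{MCPS} \;(\approx 7.7\,\%)
\end{aligned}
```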

@singalsu force-pushed the crossover_iir_df1_4th branch from 558048d to c07023d on February 5, 2025 10:54
@singalsu (Collaborator, Author) commented Feb 5, 2025

> WIP - I'll see if I can further improve the HiFi4 and HiFi5 IIR versions.

Just for the record, I tried the following change to the inner loop, but in the practical application it is a bit slower than the proposed version (1.35 MCPS slower, although a separate test program indicated 296 cycles vs. the earlier 310 cycles):

```c
		/* Load data */
		AE_LA32X2X2_IP(delay_y2y1, delay_x2x1, data_r_align, delay_r);

		/* Load coefficients */
		AE_LA32X2X2_IP(coef_a2a1, coef_b2b1, coef_align, coefp);
		AE_LA32X2X2_IP(coef_b0shift, gain, coef_align, coefp);

		acc = AE_MULF32RA_HH(AE_MOVAD32_H(coef_b0shift), in);  /* acc = b0 * in */
		AE_MULAAFD32RA_HH_LL(acc, coef_a2a1, delay_y2y1); /* + a2 * y2 + a1 * y1 */
		AE_MULAAFD32RA_HH_LL(acc, coef_b2b1, delay_x2x1); /* + b2 * x2 + b1 * x1 */
		AE_PKSR32(delay_y2y1, acc, 1);		     /* y2 = y1, y1 = acc(q1.31) */
		delay_x2x1 = AE_SEL32_LL(delay_x2x1, in);   /* x2 = x1, x1 = in */

		/* Store data */
		AE_SA32X2X2_IP(delay_y2y1, delay_x2x1, data_w_align, delay_w);

		/* Apply gain */
		acc = AE_MULF32R_LL(AE_MOVAD32_H(gain), delay_y2y1);	/* acc = gain * y1 */
		acc = AE_SLAI64S(acc, 17);		/* Convert to Q17.47 */

		/* Apply biquad output shift right parameter and then
		 * round and saturate to 32 bits Q1.31.
		 */
		acc = AE_SRAA64(acc, AE_MOVAD32_L(coef_b0shift));
		in = AE_ROUND32F48SSYM(acc);
```

So the current version, with a 7-word coefficient set per biquad, remains the proposal. Padding the set to 8 words for two 128-bit loads did not improve performance, and in addition it needed a separate new function to copy and align the coefficients from the existing format.
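The two coefficient layouts compared in the comment above can be sketched as C data structures. This is a hypothetical illustration, not SOF code: the 7-word set follows the load order of the HiFi snippet (a2, a1, b2, b1, b0, shift, gain), while the padded variant adds one unused word so that each set is 32 bytes and 16-byte aligned, i.e. exactly two 128-bit loads. The names `biquad_coef_padded` and `coef_pad_copy()` are made up; the helper stands in for the extra copy-and-align step that the padded format would require.

```c
#include <stdint.h>

/* Words per biquad in the existing packed format:
 * a2, a1, b2, b1, b0, shift, gain. */
enum { COEF_WORDS = 7 };

/* Hypothetical padded layout: 8 x 32-bit words = 32 bytes per biquad,
 * 16-byte aligned so each set is exactly two 128-bit loads. */
struct biquad_coef_padded {
	_Alignas(16) int32_t a2;
	int32_t a1, b2, b1, b0, shift, gain;
	int32_t pad; /* unused, keeps each set at 32 bytes */
};

_Static_assert(sizeof(struct biquad_coef_padded) == 32,
	       "padded coefficient set must be two 128-bit words");

/* Hypothetical helper: copy packed 7-word sets into the padded layout. */
static void coef_pad_copy(struct biquad_coef_padded *dst,
			  const int32_t *src, int biquads)
{
	int i;

	for (i = 0; i < biquads; i++) {
		const int32_t *c = &src[i * COEF_WORDS];

		dst[i].a2 = c[0];
		dst[i].a1 = c[1];
		dst[i].b2 = c[2];
		dst[i].b1 = c[3];
		dst[i].b0 = c[4];
		dst[i].shift = c[5];
		dst[i].gain = c[6];
		dst[i].pad = 0;
	}
}
```

The trade-off is visible in the sizes: the packed format uses 28 bytes per biquad and needs unaligned loads, while the padded one uses 32 bytes and an extra conversion pass. Per the measurements above, the aligned loads did not pay for themselves.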

@singalsu marked this pull request as ready for review on February 5, 2025 12:01