Audio: Multiband DRC: Use optimized 4th order IIR filter version #9808

singalsu · 2025-02-03T18:13:38Z

The simpler, hard-coded 4th order IIR version saves MCPS in the band-split filter bank and in the emphasis/de-emphasis IIR filters.

singalsu · 2025-02-03T18:14:26Z

WIP - I'll see if I can further improve the HiFi4 and HiFi5 IIR versions.

The 4th order filter with two biquads in series is commonly used in the crossover and multiband DRC components. The critical code becomes faster by omitting the outer loop for parallel biquads and the check for null coefficients, and by using a fixed loop count of two.
Signed-off-by: Seppo Ingalsuo <[email protected]>
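A portable C sketch of the structure this commit describes: two DF1 biquads in series with a fixed loop count of two, no outer loop over parallel biquad sets and no null-coefficient check. The 7-word coefficient layout {a2, a1, b2, b1, b0, shift, gain}, the Q2.30 / Q2.14 / Q1.31 formats, the negated storage of a1 and a2, and the names `iir_df1_4th()` and `biquad_df1()` are assumptions for illustration, not the SOF implementation.

```c
#include <stdint.h>

#define BQ_COEFS 7  /* assumed layout: {a2, a1, b2, b1, b0, shift, gain} */
#define BQ_DELAYS 4 /* x1, x2, y1, y2 */

/* Saturate a 64-bit value to the int32_t range. */
static int32_t sat_int32(int64_t x)
{
	if (x > INT32_MAX)
		return INT32_MAX;
	if (x < INT32_MIN)
		return INT32_MIN;
	return (int32_t)x;
}

/* Shift right by 'shift' (>= 1) bits with round-half-up.
 * Assumes arithmetic right shift of negative values, as on common
 * compilers (implementation-defined in C). */
static int64_t shift_rnd(int64_t x, int shift)
{
	return ((x >> (shift - 1)) + 1) >> 1;
}

/* One DF1 biquad step; d = {x1, x2, y1, y2} for this section.
 * a1 and a2 are assumed stored negated, so all terms are added. */
static int32_t biquad_df1(int32_t in, const int32_t *c, int32_t *d)
{
	int64_t acc;
	int32_t y;

	/* Q2.30 x Q1.31 -> Q3.61 products. The sum is not guarded against
	 * int64_t overflow; a production version needs headroom analysis. */
	acc = (int64_t)c[4] * in;    /* b0 * x  */
	acc += (int64_t)c[3] * d[0]; /* b1 * x1 */
	acc += (int64_t)c[2] * d[1]; /* b2 * x2 */
	acc += (int64_t)c[1] * d[2]; /* a1 * y1 */
	acc += (int64_t)c[0] * d[3]; /* a2 * y2 */

	/* Q3.61 -> Q1.31 with rounding and saturation */
	y = sat_int32(shift_rnd(acc, 30));

	/* Update the delay line: x2 = x1, x1 = x, y2 = y1, y1 = y */
	d[1] = d[0];
	d[0] = in;
	d[3] = d[2];
	d[2] = y;

	/* Gain: Q2.14 x Q1.31 -> Q3.45, then to Q1.31 together with the
	 * section output shift, rounded and saturated. */
	acc = (int64_t)c[6] * y;
	return sat_int32(shift_rnd(acc, 14 + c[5]));
}

/* Hypothetical 4th order version: exactly two sections in series, no
 * outer loop over parallel biquad sets, no null-coefficient check. */
int32_t iir_df1_4th(int32_t in, const int32_t *coef, int32_t *delay)
{
	int32_t out = in;
	int i;

	for (i = 0; i < 2; i++)
		out = biquad_df1(out, &coef[i * BQ_COEFS], &delay[i * BQ_DELAYS]);
	return out;
}
```

Because the section count is a compile-time constant, a compiler can fully unroll the loop; a generic version would instead loop over a runtime number of sections and skip sections whose coefficients are null.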

This patch changes the crossover component to use the optimized 4th order IIR function. The LR4 (Linkwitz-Riley 4th order) filter bank is hard-coded to 4th order, so this change does not add restrictions. The filter bank is used by the multiband DRC component. In a three-band configuration on a HiFi5 platform the saving is 5.2 MCPS, from 90.36 MCPS to 85.17 MCPS.
Signed-off-by: Seppo Ingalsuo <[email protected]>

This patch changes the emphasis and de-emphasis IIR filters in the multiband DRC component to use the optimized 4th order IIR code. The crossover patch already covered the band-split filter bank. This change saves an additional 1.7 MCPS in a HiFi5 build of the component, from 85.17 MCPS to 83.44 MCPS. The change does not restrict the configuration because the existing filters are hard-coded to 4th order (SOF_EMP_DEEMP_BIQUADS).
Signed-off-by: Seppo Ingalsuo <[email protected]>
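Combining the figures from the two commit messages, the cumulative saving on the HiFi5 build works out as below. This is plain arithmetic on the reported numbers, assuming both measurements used the same three-band configuration:

```math
\begin{aligned}
\Delta_{\text{crossover}} &= 90.36 - 85.17 = 5.19\ \text{MCPS} \quad (5.19 / 90.36 \approx 5.7\%)\\
\Delta_{\text{emp/deemp}} &= 85.17 - 83.44 = 1.73\ \text{MCPS} \quad (1.73 / 85.17 \approx 2.0\%)\\
\Delta_{\text{total}} &= 90.36 - 83.44 = 6.92\ \text{MCPS} \quad (6.92 / 90.36 \approx 7.7\%)
\end{aligned}
```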

singalsu · 2025-02-05T12:00:49Z

> WIP - I'll see if I can further improve the HiFi4 and HiFi5 IIR versions.

Just for the record, I tried the following change for the inner loop, but in the practical application it is a bit slower (by 1.35 MCPS) than this proposed version, although a separate test code indicated 296 cycles vs. the earlier 310 cycles:

		/* Load data */
		AE_LA32X2X2_IP(delay_y2y1, delay_x2x1, data_r_align, delay_r);

		/* Load coefficients */
		AE_LA32X2X2_IP(coef_a2a1, coef_b2b1, coef_align, coefp);
		AE_LA32X2X2_IP(coef_b0shift, gain, coef_align, coefp);

		acc = AE_MULF32RA_HH(AE_MOVAD32_H(coef_b0shift), in);  /* acc = b0 * in */
		AE_MULAAFD32RA_HH_LL(acc, coef_a2a1, delay_y2y1); /* + a2 * y2 + a1 * y1 */
		AE_MULAAFD32RA_HH_LL(acc, coef_b2b1, delay_x2x1); /* + b2 * x2 + b1 * x1 */
		AE_PKSR32(delay_y2y1, acc, 1);		     /* y2 = y1, y1 = acc(q1.31) */
		delay_x2x1 = AE_SEL32_LL(delay_x2x1, in);   /* x2 = x1, x1 = in */

		/* Store data */
		AE_SA32X2X2_IP(delay_y2y1, delay_x2x1, data_w_align, delay_w);

		/* Apply gain */
		acc = AE_MULF32R_LL(AE_MOVAD32_H(gain), delay_y2y1);	/* acc = gain * y1 */
		acc = AE_SLAI64S(acc, 17);		/* Convert to Q17.47 */

		/* Apply biquad output shift right parameter and then
		 * round and saturate to 32 bits Q1.31.
		 */
		acc = AE_SRAA64(acc, AE_MOVAD32_L(coef_b0shift));
		in = AE_ROUND32F48SSYM(acc);

So the current version, with a 7-word coefficient set, remains the proposal. Padding the set to 8 words for two 128-bit loads didn't improve performance. In addition, it needed a separate new function to copy and align the coefficients from the existing format.
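For reference, the extra copy step mentioned above could look like the hypothetical helper below. It repacks the existing 7-word sets {a2, a1, b2, b1, b0, shift, gain} into 8-word slots with a zero pad word, so that each biquad's coefficients can be fetched with two 128-bit loads, as the AE_LA32X2X2_IP pair in the snippet does. The function name `iir_coef_pad8()`, the zero padding and the 16-byte alignment check are assumptions, not code from this PR.

```c
#include <stdint.h>
#include <string.h>

#define COEF_WORDS 7        /* existing set: {a2, a1, b2, b1, b0, shift, gain} */
#define COEF_WORDS_PADDED 8 /* 8 x 32 bits = two 128-bit loads */

/* Hypothetical helper: repack n biquad coefficient sets into 8-word
 * slots with a zero pad word. Returns -1 if dst is not 16-byte aligned,
 * 0 on success. */
static int iir_coef_pad8(int32_t *dst, const int32_t *src, int n)
{
	int i;

	if ((uintptr_t)dst % 16 != 0)
		return -1;

	for (i = 0; i < n; i++) {
		memcpy(&dst[i * COEF_WORDS_PADDED], &src[i * COEF_WORDS],
		       COEF_WORDS * sizeof(int32_t));
		dst[i * COEF_WORDS_PADDED + COEF_WORDS] = 0; /* pad word */
	}
	return 0;
}
```

As noted above, this repacking step and the larger coefficient footprint bought no measurable speedup, so the 7-word format was kept.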

singalsu added 3 commits February 5, 2025 12:53

singalsu force-pushed the crossover_iir_df1_4th branch from 558048d to c07023d February 5, 2025 10:54

singalsu marked this pull request as ready for review February 5, 2025 12:01

singalsu requested review from a team, lgirdwood, plbossart, mmaka1, lbetlej, dbaluta and kv2019i as code owners February 5, 2025 12:01
