The bfly functions do multiple loads and stores of Fout
e.g.
C_ADDTO(*Fout,scratch[3]);
Code size and performance are improved by loading Fout once at the top of the loop and storing it once at the bottom, so the intermediate operations work on registers.
The C_ADDTO macro seems like it would work fine on Intel, but the register-based version is smaller and faster on x86 as well.
So I'd suggest removing that macro and using C_ADD instead, which simplifies the implementation and improves the generated code.
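As a rough illustration of the pattern being suggested (not the actual bfly4 code), the sketch below contrasts accumulating through the pointer with accumulating in a local. The `cpx` type, the macro bodies, and the function names `sum_in_place` and `sum_via_local` are simplified stand-ins assumed for this example; the library's own definitions may differ.

```c
#include <stddef.h>

/* Simplified stand-ins for the library's complex type and macros
   (assumed shapes, for illustration only). */
typedef struct { float r; float i; } cpx;

#define C_ADD(res, a, b) \
    do { (res).r = (a).r + (b).r; (res).i = (a).i + (b).i; } while (0)
#define C_ADDTO(res, a) \
    do { (res).r += (a).r; (res).i += (a).i; } while (0)

/* Before: every C_ADDTO reads and writes *Fout through the pointer. */
static void sum_in_place(cpx *Fout, const cpx *scratch, size_t n)
{
    for (size_t k = 0; k < n; ++k)
        C_ADDTO(*Fout, scratch[k]);
}

/* After: load *Fout once, accumulate in a local, store once. */
static void sum_via_local(cpx *Fout, const cpx *scratch, size_t n)
{
    cpx acc = *Fout;                 /* single load at the top */
    for (size_t k = 0; k < n; ++k)
        C_ADD(acc, acc, scratch[k]); /* the local can live in registers */
    *Fout = acc;                     /* single store at the bottom */
}
```

Both functions compute the same sum; the second only changes where the intermediate value lives. Whether a given compiler already performs this hoisting on its own likely depends on what it can prove about aliasing between `Fout` and the other pointers, so the benefit should be checked in the generated assembly.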
Yes, this is a small change with no impact on quality. I only did bfly4; it should also be done for bfly2 and the others. Then we can remove C_ADDTO and use C_ADD.
On ARMv7, the bfly4 function main loop:

Was: 170 instructions, 46 loads, 36 stores
Now: 140 instructions, 35 loads, 19 stores
This is bfly4 with the change: