diff --git a/mkdocs/docs/manual/errors.md b/mkdocs/docs/manual/errors.md deleted file mode 100644 index 3c25ec9e39..0000000000 --- a/mkdocs/docs/manual/errors.md +++ /dev/null @@ -1,409 +0,0 @@ -# Error messages - -Error messages are typically displayed in the form of compiler errors. They occur when the code cannot be successfully compiled, and typically indicate issues such as syntax errors or semantic errors. They can occur at different stages in the compilation process, possibly with the file and line number where the error occurred (when this information can be retrieved), as well as a brief description of the error. - -The compiler is organized in several stages: - -- starting from the DSP source code, the parser builds an internal memory representation of the source program (typically known as an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree)) made here of primitives in the *Box language*. A first class of error messages are the *syntax error* messages, triggered for instance by a missing `;` character at the end of a line. -- the next step is to evaluate the definition of the `process` program entry point. This step is basically a λ-calculus interpreter with strict evaluation. The result is a "flat" block-diagram where everything has been expanded. The resulting block is type annotated to discover its number of inputs and outputs. -- the "flat" block-diagram is then translated into the *Signal language*, where signals are conceptually infinite streams of samples or control values. The box language actually implements the Faust [Block Diagram Algebra](https://hal.science/hal-02159011v1), and not following the connection rules will trigger a second class of error messages, the *box connection errors*. Other errors can be produced at this stage when parameters for some primitives are not of the correct type. -- the pattern matching meta-language makes it possible to algorithmically create and manipulate block-diagram expressions. 
So a third class of *pattern matching coding errors* can occur at this level. -- signal expressions are optimized, type annotated (to associate an integer or real type with each signal, but also to discover when signals are to be computed: at init time, control-rate or sample-rate) to produce a so-called *normal-form*. A fourth class of *parameter range errors* or *typing errors* can occur at this level, like using delays with a non-bounded size, etc. -- signal expressions are then converted to FIR (Faust Imperative Representation), a representation for state based computation (memory access, arithmetic computations, control flow, etc.), to be converted into the final target language (like C/C++, LLVM IR, Rust, WebAssembly, etc.). A fifth class of *backend errors* can occur at this level, like unsupported compilation options for a given backend, etc. - -Note that the current error message system is still far from perfect, typically when the origin of the error in the DSP source cannot be properly traced. In this case the file and line number where the error occurred are not displayed, but an internal description of the expression (as a Box or a Signal) is printed. - -## Syntax errors - -These errors happen when the language syntax is not respected. Here are some examples. - -The following program: - -``` -box1 = 1 -box2 = 2; -process = box1,box2; -``` - -will produce the following error message: - -``` -errors.dsp : 2 : ERROR : syntax error, unexpected IDENT -``` - -It means that an unexpected identifier has been found at line 2 of the file errors.dsp. Usually, this error is due to the absence of the semicolon `;` at the end of the previous line. - - -The following program: - -``` -t1 = _~(+(1); -process = t1 / 2147483647; -``` - -will produce the following error message: - -``` -errors.dsp : 1 : ERROR : syntax error, unexpected ENDDEF - -``` - -The parser finds the end of the definition (`;`) while searching for a closing right parenthesis. 
- -The following program: - -``` -process = ffunction; -``` - -will produce the following error message: - -``` -errors.dsp : 1 : ERROR : syntax error, unexpected ENDDEF, expecting LPAR -``` - -The parser was expecting a left parenthesis. It identified a keyword of the language that requires arguments. - -The following program: - -``` -process = +)1); -``` - -will produce the following error message: - -``` -errors.dsp : 1 : ERROR : syntax error, unexpected RPAR -``` - -The wrong parenthesis has been used. - -The following program: - -``` -process = <:+; -``` - -will produce the following error message: - -``` -errors.dsp : 1 : ERROR : syntax error, unexpected SPLIT -``` - -The `<:` split operator is not correctly used, and should have been written `process = _<:+;`. - -The following program: - -``` -process = foo; -``` - - -will produce the following error message: - -``` -errors.dsp : 1 : ERROR : undefined symbol : foo -``` - -This happens when an undefined name is used. - - -## Box connection errors - -[Diagram expressions](../manual/syntax.md#diagram-expressions) express how block expressions can be combined to create new ones. The connection rules are precisely defined for each of them and have to be followed for the program to be correct. Remember the [operator priority](../manual/syntax.md#diagram-composition-operations) when writing more complex expressions. - -### The five connections rules - -A second category of error messages is returned when block expressions are not correctly connected. - -#### Parallel connection - -Combining two blocks `A` and `B` in parallel can never produce a box connection error since the 2 blocks are placed one on top of the other, without connections. The inputs of the resulting block-diagram are the inputs of `A` and `B`. The outputs of the resulting block-diagram are the outputs of `A` and `B`. 
- -#### Sequential connection error - -Combining two blocks `A` and `B` in sequence will produce a box connection error if `outputs(A) != inputs(B)`. So for instance the following program: - -``` -A = _,_; -B = _,_,_; -process = A : B; -``` - -will produce the following error message: - -``` -ERROR : sequential composition A:B -The number of outputs [2] of A must be equal to the number of inputs [3] of B - -Here A = _,_; -has 2 outputs - -while B = _,_,_; -has 3 inputs -``` - -#### Split connection error - -Combining two blocks `A` and `B` with the split composition will produce a box connection error if the number of inputs of `B` is not a multiple of the number of outputs of `A`. So for instance the following program: - -``` -A = _,_; -B = _,_,_; -process = A <: B; -``` -will produce the following error message: - -``` -ERROR : split composition A<:B -The number of outputs [2] of A must be a divisor of the number of inputs [3] of B - -Here A = _,_; -has 2 outputs - -while B = _,_,_; -has 3 inputs -``` - -#### Merge connection error - -Combining two blocks `A` and `B` with the merge composition will produce a box connection error if the number of outputs of `A` is not a multiple of the number of inputs of `B`. 
So for instance the following program: - -``` -A = _,_; -B = _,_,_; -process = A :> B; -``` - -will produce the following error message: - -``` -ERROR : merge composition A:>B -The number of outputs [2] of A must be a multiple of the number of inputs [3] of B - -Here A = _,_; -has 2 outputs - -while B = _,_,_; -has 3 inputs -``` - -#### Recursive connection error - -Combining two blocks `A` and `B` with the recursive composition will produce a box connection error if the number of outputs of `A` is less than the number of inputs of `B`, or the number of outputs of `B` is more than the number of inputs of `A` (that is, when the connection rule $$\mathrm{outputs}(A) \geq \mathrm{inputs}(B) \ \text{and} \ \mathrm{inputs}(A) \geq \mathrm{outputs}(B)$$ is not respected). So for instance the following program: - -``` -A = _,_; -B = _,_,_; -process = A ~ B; -``` - -will produce the following error message: - -``` -ERROR : recursive composition A~B -The number of outputs [2] of A must be at least the number of inputs [3] of B. The number of inputs [2] of A must be at least the number of outputs [3] of B. - -Here A = _,_; -has 2 inputs and 2 outputs - -while B = _,_,_; -has 3 inputs and 3 outputs -``` - -#### Route connection errors - -More complex routing between blocks can also be described using the [route](../manual/syntax.md#route-primitive) primitive. 
Two different errors can be produced in case of incorrect coding. The first one, for instance with: - -``` -process = route(+,8.7,(0,0),(0,1)); -``` -will produce the following error message: - -``` -ERROR : invalid route expression, first two parameters should be blocks producing a value, third parameter a list of input/output pairs : route(+,8.7f,0,0,0,1) -``` - -And the second one when the parameters are not actually numbers: - -``` -process = route(9,8.7f,0,0,0,button("foo")); -``` -will produce the following error message: - -``` -ERROR : invalid route expression, parameters should be numbers : route(9,8.7f,0,0,0,button("foo")) -``` - -### Iterative constructions - -[Iterations](../manual/syntax.md#iterations) are analogous to `for(...)` loops in other languages and provide a convenient way to automate some complex block-diagram constructions. All `par`, `seq`, `sum`, `prod` expressions have the same form: they take an identifier as first parameter, a number of iterations given as an integer constant numerical expression as second parameter, then an arbitrary block-diagram as third parameter. - -The example code: - -``` -process = par(+, 2, 8); -``` - -will produce the following syntax error, since the first parameter is not an identifier: - -``` -filename.dsp : xx : ERROR : syntax error, unexpected ADD, expecting IDENT -``` - -The example code: - -``` -process = par(i, +, 8); -``` - -will produce the following error, since the second parameter is not a constant numerical expression: - -``` -filename.dsp : 1 : ERROR : not a constant expression of type : (0->1) : + -``` - -## Pattern matching errors - -The pattern matching mechanism makes it possible to algorithmically create and manipulate block-diagram expressions. Pattern matching coding errors can occur at this level. 
- -### Multiple symbol definition error - -This error happens when a symbol is defined several times in the DSP file: - -``` -ERROR : [file foo.dsp : N] : multiple definitions of symbol 'foo' -``` - -Since computations are done at compile time and the pattern matching language is Turing complete, even infinite loops can be produced at compile time and should be detected by the compiler. - -### Loop detection error - -The following (somewhat *extreme*) code: - -``` -foo(x) = foo(x); -process = foo; -``` - -will produce the following error: - -``` -ERROR : stack overflow in eval -``` - -and similar kinds of infinite loop errors can be produced with more complex code. - -[TO COMPLETE] - -## Signal related errors - -Signal expressions are produced from box expressions, are type annotated and finally reduced to a normal-form. Some primitives expect their parameters to follow some constraints, like being in a specific range or being bounded for instance. The domain of mathematical functions is checked and disallowed operations are reported. - -### Automatic type promotion - -Some primitives (like [route](../manual/syntax.md#route-primitive), [rdtable](../manual/syntax.md#rdtable-primitive), [rwtable](../manual/syntax.md#rwtable-primitive)...) expect arguments of integer type. Such arguments are automatically promoted: the equivalent of `int(exp)` is added internally, so the cast does not need to be written in the source code. 
- -### Parameter range errors - -#### Soundfile usage error - -The soundfile primitive assumes the part number to stay in the [0..255] interval, so for instance the following code: - -``` -process = _,0 : soundfile("foo.wav", 2); -``` -will produce the following error: - -``` -ERROR : out of range soundfile part number (interval(-1,1,-24) instead of interval(0,255)) in expression : length(soundfile("foo.wav"),IN[0]) -``` - -#### Delay primitive error - -The delay `@` primitive assumes that the delay signal value is bounded, so the following expression: - -``` -import("stdfaust.lib"); -process = @(ba.time); -``` - -will produce the following error: - -``` -ERROR : can't compute the min and max values of : proj0(letrec(W0 = (proj0(W0)'+1)))@0+-1 - used in delay expression : IN[0]@(proj0(letrec(W0 = (proj0(W0)'+1)))@0+-1) - (probably a recursive signal) -``` - -[TO COMPLETE] - -### Table construction errors - -The [rdtable](../manual/syntax.md#rdtable-primitive) primitive can be used to read through a read-only (pre-defined at initialisation time) table. The [rwtable](../manual/syntax.md#rwtable-primitive) primitive can be used to implement a read/write table. Both have a size computed at compile time. - - -The `rdtable` primitive assumes that the table content is produced by a processor with no input and one output, known at compile time. So the following expression: - -``` -process = rdtable(9, +, 4); -``` - -will produce the following error, since `+` is not of the correct type: - -``` -ERROR : checkInit failed for type RSEVN interval(-2,2,-24) -``` - -The same kind of error will happen when read and write indexes are incorrectly defined in a `rwtable` primitive. - -## Mathematical functions out of domain errors - -Error messages are produced when mathematical functions are used outside of their domain and the problematic computation is done at compile time. 
If the out of domain computation may be done at runtime, then a warning can be produced using the `-me` option (see [Warning messages](#warning-messages) section). - -### Modulo primitive error - -The modulo `%` assumes that the denominator is not 0, thus the following code: - -``` -process = _%0; -``` - -will produce the following error: - -``` -ERROR : % by 0 in IN[0] % 0 -``` - -The same kind of error will be produced for the `acos`, `asin`, `fmod`, `log10`, `log`, `remainder` and `sqrt` functions. - - -## FIR and backends related errors - -Some primitives of the language are not available in some backends. - -The example code: -``` -fun = ffunction(float fun(float), , ""); -process = fun; -``` - - compiled with the wast/wasm backends using: `faust -lang wast errors.dsp` will produce the following error: - - ``` -ERROR : calling foreign function 'fun' is not allowed in this compilation mode - ``` - - and the same kind of error would also happen with foreign variables or constants. - - [TO COMPLETE] - -## Compiler option errors - -Not all compiler options can be used with all backends. Moreover, some compiler options cannot be combined. These will typically trigger errors, before any compilation actually begins. - -[TO COMPLETE] - -# Warning messages - -Warning messages do not stop the compilation process, but provide useful information on potentially problematic code. The messages can be printed using the `-wall` compilation option. Mathematical out-of-domain warning messages are displayed when both `-wall` and `-me` options are used. diff --git a/mkdocs/docs/manual/faq.md b/mkdocs/docs/manual/faq.md deleted file mode 100644 index f4682f5033..0000000000 --- a/mkdocs/docs/manual/faq.md +++ /dev/null @@ -1,241 +0,0 @@ -# Frequently Asked Questions - - -## When to use `int` or `float` cast ? - -The [Signal Processor Semantic](../manual/introduction.md#signal-processor-semantic) section explains what a Faust program describes. 
In particular Faust considers two types of signals: *integer signals* and *floating point signals*. Mathematical operations either occur in the domain of integer numbers, or in the domain of floating point numbers, depending on their types, as explained [here](../manual/syntax.md#numbers). Using explicit [int cast](../manual/syntax.md#int-primitive) or [float cast](../manual/syntax.md#float-primitive) may be needed to force a given computation to be done in the correct number domain. - -Some language primitives (like [par](../manual/syntax.md#parallel-composition), [seq](../manual/syntax.md#sequential-composition), [route](../manual/syntax.md#route-primitive), [soundfile](../manual/syntax.md#soundfile-primitive), etc.) assume that their parameters are [Constant Numerical Expressions](../manual/syntax.md#constant-numerical-expressions) of the integer type. In this case the compiler automatically does *type promotion* and there is no need to use an `int` cast to have the argument be of the integer type (note that an unneeded cast will simply be ignored and will not add unneeded computation in the generated code). - -User interface items produce *floating point signals*. Depending on their use later in the computed expression, an explicit `int` cast may also be needed to force a given computation to be done in the correct number domain. - -## Does select2 behaves as a standard C/C++ like if ? - -The short answer is **no**, `select2` doesn't behave like the `if-then-else` of a traditional programming language, nor does `ba.if` of the standard library. To understand why, think of `select2` as the tuner of a radio: it selects what you listen to, but does not prevent the various radio stations from broadcasting. Actually, `select2` could be easily redefined in Faust as: - -``` -select2(i, x, y) = (1-i) * x + i * y; -``` - -### Strict vs Lazy semantics - -In computer science terminology, `select2(i,x,y)` has so-called *strict* semantics. 
This means that its three arguments `i`, `x`, `y` are always evaluated before select2 itself is executed, in other words, even if `x` or `y` is not selected. The standard C/C++ `if-then-else` has *lazy* semantics. The *condition* is always executed, but depending of the value of the *condition*, only the *then* or the *else* branch is executed. - -The *strict* semantics of `select2` means that you cannot use it to prevent a division by 0 in an expression, or the square root of a negative number, etc... For example, the following code will not prevent a division by 0 error: - -``` -select2(x == 0, 1/x, 10000); -``` - -You cannot use `ba.if` either because it is implemented using `select2` and has the same strict semantics. Therefore the following code will not prevent a division by 0 error: - -``` -ba.if(x == 0, 10000, 1/x); -``` - -### But things are a little bit more complex... - -Concerning the way `select2` is compiled by the Faust compiler, the strict semantic is always preserved. In particular, the type system flags problematic expressions and the stateful parts are always placed outside the if. For instance the DSP code: - -``` -process = button("choose"), (*(3) : +~_), (*(7):+~_) : select2; -``` - -is compiled as the following C++ code, where `fRec0[0]` and `fRec1[0]` contains the computation of each branch: - -```c++ -for (int i = 0; (i < count); i = (i + 1)) { - fRec0[0] = (fRec0[1] + (3.0f * float(input0[i]))); - fRec1[0] = (fRec1[1] + (7.0f * float(input1[i]))); - output0[i] = FAUSTFLOAT((iSlow0 ? fRec1[0] : fRec0[0])); - fRec0[1] = fRec0[0]; - fRec1[1] = fRec1[0]; -} -``` - -For code optimization strategies, the generated code is not *fully* strict on `select2`. When Faust produces C++ code, the C++ compiler can decide to *avoid the execution of the stateless part of the signal that is not selected* (and not needed elsewhere). 
This doesn't change the semantics of the output signal, but it changes the strictness of the code if a division by 0 would have been executed in the stateless part. When stateless expressions are used, they are by default generated using a *non-strict* conditional expression. - -For instance the following DSP code: - -``` -process = select2((+(1)~_)%10, sin:cos:sin:cos, cos:sin:cos:sin); -``` - -is compiled in C/C++ as: - -```c++ -for (int i0 = 0; i0 < count; i0 = i0 + 1) { - iRec0[0] = iRec0[1] + 1; - output0[i0] = FAUSTFLOAT(((iRec0[0] % 10) - ? std::sin(std::cos(std::sin(std::cos(float(input1[i0]))))) - : std::cos(std::sin(std::cos(std::sin(float(input0[i0]))))))); - iRec0[1] = iRec0[0]; -} -``` - -where only one of the *then* or *else* branch will be effectively computed, thus saving CPU. - -If computing both branches is really desired, the `-sts (--strict-select)` option can be used to force their computation by putting them in local variables, as shown in the following *generated with `-sts`* code version of the same DSP code: - -```c++ -for (int i0 = 0; i0 < count; i0 = i0 + 1) { - iRec0[0] = iRec0[1] + 1; - float fThen0 = std::cos(std::sin(std::cos(std::sin(float(input0[i0]))))); - float fElse0 = std::sin(std::cos(std::sin(std::cos(float(input1[i0]))))); - output0[i0] = FAUSTFLOAT(((iRec0[0] % 10) ? fElse0 : fThen0)); - iRec0[1] = iRec0[0]; -} -``` - -to therefore preserve the strict semantic, even if a non-strict `(cond) ? then : else` form is used to produce the result of the `select2` expression. - -This can be helpful for [debugging purposes](../manual/debugging.md#debugging-at-runtime) like testing if there is no division by 0, or producing `INF` or `NaN` values. The [interp-tracer](https://github.com/grame-cncm/faust/tree/master-dev/tools/benchmark#interp-tracer) can be used for that by adding the `-sts` option. - -So again remember that `select2` cannot be used to **avoid computing something**. 
For computations that need to avoid some values or ranges (like doing `val/0` that would return `INF`, or `log` of a negative value that would return `NaN`), the solution is to use `min` and `max` to force the arguments to be in the correct domain of values. For example, to avoid division by 0, you can write `1/max(ma.EPSILON, x)`. - -Note that `select2` is also typically used to compute `rdtable/rwtable` access indexes. In this case computing an array *out-of-bound* index, if it is not used later on, is not a problem. - -## What properties does the Faust compiler and generated code have ? [WIP] - -### Compiler - -The compiler itself is [Turing complete](https://en.wikipedia.org/wiki/Turing_completeness) because it contains a pattern matching meta-programming model. Thus a Faust DSP program can loop at compile time. For instance the following: - -``` -foo = foo; -process = foo; -``` - -will loop and hopefully end with the message: *ERROR : after 400 evaluation steps, the compiler has detected an endless evaluation cycle of 2 steps* because the compiler contains an infinite loop detection heuristic. - -### Generated code - -The generated code computes each sample in a *finite number* of operations, thus a DSP program that would loop infinitely cannot be written. It means the generated code is not Turing complete. This is of course a limitation because certain classes of algorithms cannot be expressed (**TODO**: Newton approximation used in diode VA model). On the other hand, it gives a strong guarantee on the upper bound of CPU cost, which is quite valuable when deploying a program in a real-time audio context. - -### Memory footprint - -The DSP memory footprint is perfectly known at compile time, so the generated code always consumes a finite amount of memory. 
Moreover the standard deployment model is to allocate the DSP at load time, init it with a given sample-rate, then execute the DSP code, by repeatedly calling the `compute` function to process audio buffers. - -### CPU footprint - -Since the generated code computes each sample in a *finite number* of operations, the CPU use has an upper bound, which is a very helpful property when deploying a program in a real-time audio context. Read the [Does select2 behaves as a standard C/C++ like if ?](#does-select2-behaves-as-a-standard-cc-like-if) section for some subtle issues concerning the `select2` primitive. - - -## Pattern matching and lists - -Strictly speaking, there are no lists in Faust. For example the expression `()` or `NIL` in Lisp, which indicates an empty list, does not exist in Faust. Similarly, the distinction in Lisp between the number `3` and the list with only one element `(3)` does not exist in Faust. - -However, list operations can be simulated (in part) using the parallel binary composition operation `,` and pattern matching. The parallel composition operation is right-associative. This means that the expression `(1,2,3,4)` is just a simplified form of the fully parenthesized expression `(1,(2,(3,4)))`. The same is true for `(1,2,(3,4))` which is also a simplified form of the same fully parenthesized expression `(1,(2,(3,4)))`. - -You can think of pattern-matching as always being done on fully parenthesized expressions. Therefore no Faust function can ever distinguish `(1,2,3,4)` from `(1,2,(3,4))`, because they represent the same fully parenthesized expression `(1,(2,(3,4)))`. - -This is why `ba.count( ((1,2), (3,4), (5,6)) )` is not 3 but 4, and also why `ba.count( ((1,2), ((3,4),5,6)) )` is not 2 but 4. - -Explanation: in both cases the fully parenthesized expression is `( (1,2),((3,4),(5,6)) )`. 
The definition of `ba.count ` being: - -``` -count((x,y)) = 1 + count(y); // rule R1 -count(x) = 1; // rule R2 -``` -we have: - -``` -ba.count( ((1,2),((3,4),(5,6))) ) --R1-> 1 + ba.count( ((3,4),(5,6)) ) --R1-> 1 + 1 + ba.count( (5,6) ) --R1-> 1 + 1 + 1 + ba.count( 6 ) --R2-> 1 + 1 + 1 + 1 -``` -Please note that pattern matching is not limited to parallel composition, the other composition operators `(<: : :> ~)` can be used too. - -## What is the situation about Faust compiler licence and the deployed code? - -*Q: Does the Faust license (GPL) apply somehow to the code exports that it produces as well? Or can the license of the exported code be freely chosen such that one could develop commercial software (e.g. VST plug-ins) using Faust?* - -A: You can freely use Faust to develop commercial software. The GPL license of the compiler *doesn't* apply to the code generated by the compiler. - -The license of the code generated by the Faust compiler depends only on the licenses of the input files. You should therefore check the licenses of the Faust libraries used and the architecture files. On the whole, when used unmodified, Faust libraries and architecture files are compatible with commercial, non-open source use. - -## Surprising effects of vgroup/hgroup on how controls and parameters work - -User interface widget primitives like `button`, `vslider/hslider`, `vbargraph/hbargraph` allow for an abstract description of a user interface from within the Faust code. They can be grouped in a hierarchical manner using `vgroup/hgroup/tgroup` primitives. Each widget then has an associated path name obtained by concatenating the labels of all its surrounding groups with its own label. - -Widgets that have the **same path** in the hierarchical structure will correspond to a same controller and will appear once in the GUI. 
For instance the following DSP code does not contain any explicit grouping mechanism: - -``` -import("stdfaust.lib"); - -freq1 = hslider("Freq1", 500, 200, 2000, 0.01); -freq2 = hslider("Freq2", 500, 200, 2000, 0.01); - -process = os.osc(freq1) + os.square(freq2), os.osc(freq1) + os.triangle(freq2); -``` - -
*Shared freq1 and freq2 controllers*
- -So even if the `freq1` and `freq2` controllers are used as parameters at four different places, `freq1`, used in both `os.osc(freq1)` expressions, will have the same path (like `/foo/Freq1`), be associated with a unique controller, and will finally appear once in the GUI. The same mechanism applies to `freq2`. - -Now if some grouping mechanism is used to better control the UI rendering, as in the following DSP code: - -``` -import("stdfaust.lib"); - -freq1 = hslider("Freq1", 500, 200, 2000, 0.01); -freq2 = hslider("Freq2", 500, 200, 2000, 0.01); - -process = hgroup("Voice1", os.osc(freq1) + os.square(freq2)), hgroup("Voice2", os.osc(freq1) + os.triangle(freq2)); -``` - -The `freq1` and `freq2` controllers now don't have the same path in each group (like `/foo/Voice1/Freq1` and `/foo/Voice1/Freq2` in the first group, and `/foo/Voice2/Freq1` and `/foo/Voice2/Freq2` in the second group), and so four separate controllers and UI items are finally created. - - -
*Four freq1 and freq2 controllers*
- -Using the relative pathname as explained in [Labels as Pathnames](../manual/syntax.md#labels-as-pathnames) possibly allows us to move `freq1` one step higher in the hierarchical structure, thus having again a unique path (like `/foo/Freq1`) and controller: - -``` -import("stdfaust.lib"); - -freq1 = hslider("../Freq1", 500, 200, 2000, 0.01); -freq2 = hslider("Freq2", 500, 200, 2000, 0.01); - -process = hgroup("Voice1", os.osc(freq1) + os.square(freq2)), hgroup("Voice2", os.osc(freq1) + os.triangle(freq2)); -``` - - -
*freq1 moved one step higher in the hierarchical structure*
- -Note that the name for a given `hgroup`, `vgroup`, or `tgroup` can be used more than once, and they will be merged. This can be useful when you want to define different names for different widget signals, but still want to group them. For example, this pattern can be used to separate a synth's UI design from the implementation of the synth's DSP: - -``` -import ("stdfaust.lib"); - -synth(foo, bar, baz) = os.osc(foo+bar+baz); - -synth_ui = synth(foo, bar, baz) -with { - ui(x) = hgroup("Synth", x); - leftcol(x) = ui(vgroup("[0]foobar", x)); - foo = leftcol(hslider("[0]foo", 100, 20, 1000, 1)); - bar = leftcol(hslider("[1]bar", 100, 20, 1000, 1)); - baz = ui(vslider("[1]baz", 100, 20, 1000, 1)); -}; - -process = synth_ui; -``` - - -
*naming and grouping*
- -## What are the rules used for partial application ? - -Assuming `F` is not an abstraction and has `n+m` inputs and `A` has `n` outputs, then we have the rewriting rule `F(A) ==> A,bus(m):F (with bus(1) = _ and bus(n+1) = _,bus(n))` - -There is an exception when `F` is a binary operation like `+,-,/,*`. In this case, the rewriting rule is `/(3) ==> _,3:/`. In other words, when we apply only one argument, it is the second one. - - -## Control rate versus audio rate - -Question: *I have a question about sample rate / control rate issues. I have a Faust code that takes channel pressure messages from my keyboard as input, therefore at control rate, and outputs an expression signal at sample rate. The first part of the code can run at control rate, but I want to force it to run at sample rate (otherwise unwanted behavior will appear). Is there a simple way of forcing my pressure signal to be at sample rate (without making a smooth which may also result in unwanted behavior)*. - -Answer: the [ba.kr2ar](https://faustlibraries.grame.fr/libs/basics/#bakr2ar) function can be used for that purpose. diff --git a/mkdocs/docs/manual/optimizing.md b/mkdocs/docs/manual/optimizing.md deleted file mode 100644 index 8660413f16..0000000000 --- a/mkdocs/docs/manual/optimizing.md +++ /dev/null @@ -1,665 +0,0 @@ -# Optimizing the Code - - -## Optimizing the DSP Code - -Faust is a Domain Specific Language helping the programmer to write very high-level and concise DSP code, while letting the compiler do the hard work of producing the best and most efficient implementation of the specification. 
When processing the DSP source, the compiler typing system is able to discover how the described computations are effectively separated into four main categories: - -- computations done *at compilation/specialisation time*: this is the place for algorithmic signal processor definitions, heavily based on the lambda-calculus constituent of the language, together with its pattern-matching capabilities -- computations done *at init time*: for instance all the code that depends on the actual sample-rate, or the filling of some internal tables (coded with the `rdtable` or `rwtable` language primitives) -- computations done *at control rate*: typically all code that reads the current values of controllers (buttons, sliders, nentries) and updates the DSP internal state which depends on them -- computations done *at sample rate*: all remaining code that processes and produces the samples - -One can think of these four categories as *different computation rates*. The programmer can possibly split their DSP algorithm to distribute the needed computation in the most appropriate domain (a *slower rate* domain being better than a *faster rate* domain) and possibly rewrite some parts of the DSP algorithm from one domain to a slower rate one to finally obtain the most efficient code. - -### Computations Done *at Compilation/Specialisation Time* - -#### Using Pattern Matching - -**TODO**: explain how pattern-matching can be used to algorithmically describe signal processors, explain the principle of defining a new DSL inside the Faust DSL (with [fds.lib](https://faustlibraries.grame.fr/libs/fds/), [physmodels.lib](https://faustlibraries.grame.fr/libs/physmodels/), [wdmodels.lib](https://faustlibraries.grame.fr/libs/wavedigitalfilters/) as examples). - -#### Specializing the DSP Code - -The Faust compiler can possibly do a lot of optimizations at compile time. The DSP code can for instance be compiled for a fixed sample rate, thus doing at compile time all computation that depends on it. 
Since the Faust compiler will look for libraries starting from the local folder, a simple way to do this is to locally copy the `libraries/platform.lib` file (which contains the `SR` definition), and change its definition to a fixed value like 48000 Hz. Then the DSP code has to be recompiled for the specialisation to take effect. Note that `libraries/platform.lib` also contains the definition of the `tablesize` constant which is used in various places to allocate tables for oscillators. Thus decreasing this value can save memory, for instance when compiling for embedded devices. This is the technique used in some Faust services scripts which add the `-I /usr/local/share/faust/embedded/` parameter to the Faust command line to use a special version of the platform.lib file. - -### Computations Done *at Init time* - -If not specialized with a constant value at compilation time, all computations that use the sample rate (which is accessed with `ma.SR` in the DSP source code and given as a parameter to the DSP `init` function) will be done at init time, and possibly again each time the same DSP is initialized with another sample rate. - -#### Using rdtable or rwtable - -**TODO**: explain how computations can be done at init time and how to use `rdtable` or `rwtable` to store pre-computed values. - -Several [tabulation functions](https://faustlibraries.grame.fr/libs/basics/#function-tabulation) can possibly be used. - -### Computations Done *at Control Rate* - -#### Parameter Smoothing - -Control parameters are sampled once per block and their values are considered constant during the block. The internal state depending on them is updated at the beginning of the `compute` method, before the sample rate DSP loop.
- -In the following DSP code, the `vol` slider value is directly applied on the input audio signal: - -``` -import("stdfaust.lib"); -vol = hslider("Volume", 0.5, 0, 1, 0.01); -process = *(vol); -``` - -In the generated C++ code for `compute`, the `vol` slider value is sampled before the actual DSP loop, by reading the `fHslider0` field kept in a local `fSlow0` variable: - -```c++ -virtual void compute(int count, FAUSTFLOAT** inputs, FAUSTFLOAT** outputs) { - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* output0 = outputs[0]; - float fSlow0 = float(fHslider0); - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - output0[i0] = FAUSTFLOAT(fSlow0 * float(input0[i0])); - } -} -``` - -If the control parameter needs to be smoothed (like to avoid clicks or too abrupt changes), with the `control : si.smoo` kind of code, the computation rate moves from *control rate* to *sample rate*. If the previous DSP code is now changed with: - -``` -import("stdfaust.lib"); -vol = hslider("Volume", 0.5, 0, 1, 0.01) : si.smoo; -process = *(vol); -``` - -The `vol` slider is sampled before the actual DSP loop and multiplied by the filter `fConst0` constant computed at init time, and finally used in the DSP loop in the smoothing filter: - -```c++ -virtual void compute(int count, FAUSTFLOAT** inputs, FAUSTFLOAT** outputs) { - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* output0 = outputs[0]; - float fSlow0 = fConst0 * float(fHslider0); - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - fRec0[0] = fSlow0 + fConst1 * fRec0[1]; - output0[i0] = FAUSTFLOAT(float(input0[i0]) * fRec0[0]); - fRec0[1] = fRec0[0]; - } -} -``` - -So the CPU usage will obviously be higher, and the need for parameter smoothing should be carefully studied. - -Another point to consider is the *order of computation* when smoothing control. 
In the following DSP code, the dB slider value is *first* converted to a linear value, *then* smoothed: - -``` -import("stdfaust.lib"); -smoother_vol = hslider("Volume", -6.0, -120.0, .0, 0.01) : ba.db2linear : si.smoo; -process = *(smoother_vol); -``` - -And the generated C++ code for `compute` has the costly `pow` math function used in `ba.db2linear` evaluated at control rate, so once before the DSP loop: - -```c++ -virtual void compute(int count, FAUSTFLOAT** inputs, FAUSTFLOAT** outputs) { - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* output0 = outputs[0]; - float fSlow0 = fConst0 * std::pow(1e+01f, 0.05f * float(fHslider0)); - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - fRec0[0] = fSlow0 + fConst1 * fRec0[1]; - output0[i0] = FAUSTFLOAT(float(input0[i0]) * fRec0[0]); - fRec0[1] = fRec0[0]; - } -} -``` - -But if the order between `ba.db2linear` and `si.smoo` is reversed like in the following code: - -``` -import("stdfaust.lib"); -smoother_vol = hslider("Volume", -6.0, -120.0, .0, 0.01) : si.smoo : ba.db2linear; -process = *(smoother_vol); -``` - -The generated C++ code for `compute` now has the `pow` math function used in `ba.db2linear` evaluated at sample rate in the DSP loop, which is obviously much more costly: - -```c++ -virtual void compute(int count, FAUSTFLOAT** inputs, FAUSTFLOAT** outputs) { - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* output0 = outputs[0]; - float fSlow0 = fConst0 * float(fHslider0); - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - fRec0[0] = fSlow0 + fConst1 * fRec0[1]; - output0[i0] = FAUSTFLOAT(float(input0[i0]) * std::pow(1e+01f, 0.05f * fRec0[0])); - fRec0[1] = fRec0[0]; - } -} -``` - -So to obtain the best performance in the generated code, all costly computations should be done on the control value (as much as possible, since this may not always be the desirable behaviour), with `si.smoo` (or any function that moves the computation from control rate to sample rate) applied as the last operation.
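
The cost difference between the two orderings can be made concrete with a small hand-written C++ sketch. This is not code produced by the Faust compiler: the `GainSketch` struct, its field names and the `pole` coefficient are hypothetical stand-ins for the generated `mydsp` class, `fRec0` and the `fConst` constants. The sketch only counts how many times `std::pow` is called per block in each ordering.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a generated DSP class; all names are illustrative.
struct GainSketch {
    float slider_db = -6.0f;  // control value, read once per block
    float pole = 0.999f;      // assumed one-pole smoothing coefficient
    float state = 0.0f;       // smoother state (plays the role of fRec0)
    long pow_calls = 0;       // number of std::pow evaluations so far

    float db2linear(float db) {
        ++pow_calls;
        return std::pow(10.0f, 0.05f * db);
    }

    // Convert first, smooth last: std::pow runs once per block.
    void compute_convert_then_smooth(int count, const float* in, float* out) {
        float slow = (1.0f - pole) * db2linear(slider_db);
        for (int i = 0; i < count; ++i) {
            state = slow + pole * state;
            out[i] = in[i] * state;
        }
    }

    // Smooth first, convert last: std::pow runs once per sample.
    void compute_smooth_then_convert(int count, const float* in, float* out) {
        float slow = (1.0f - pole) * slider_db;
        for (int i = 0; i < count; ++i) {
            state = slow + pole * state;
            out[i] = in[i] * db2linear(state);
        }
    }
};
```

With a block of 64 samples, the first method evaluates `std::pow` once and the second 64 times. The two orderings do not produce identical signals, since one smooths the linear gain and the other smooths the dB value, which is why moving the conversion before the smoother is not always acceptable.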
- -### Computations Done *at Sample Rate* - -#### Possibly deactivating table range check with -ct option - -The [-ct](../manual/debugging.md#the-ct-option) option is activated by default (starting at Faust version 2.53.4), but can possibly be removed (using `-ct 0`) to speed up the code. Read [Debugging rdtable and rwtable primitives](../tutorials/debugging.md#debugging-rdtable-and-rwtable-primitives) for a complete description. - -#### Using Function Tabulation - -The use of `rdtable` kind of compilation done at init time can be simplified using the [ba.tabulate](https://faustlibraries.grame.fr/libs/basics/#batabulate) or [ba.tabulate_chebychev](https://faustlibraries.grame.fr/libs/basics/#batabulate_chebychev) functions to *tabulate* a given unary function `fun` on a given range. A table is created and filled with precomputed values, and can be used to compute `fun(x)` in a more efficient way (at the cost of additional static memory needed for the table). - -#### Using Fast Math Functions - -When costly math functions still appear in the sample rate code, the `-fm` [compilation option](../manual/options.md) can possibly be used to replace the standard versions provided by the underlying OS (like `std::cos`, `std::tan`... in C++ for instance) with user defined ones (hopefully faster, but possibly less precise). - -### Delay lines implementation and DSP memory size - -The Faust compiler automatically allocates buffers for the delay lines. At each sample calculation, the delayed signal is written to a specific location (the *write* position) and read from another location (the *read* position), the *distance in samples* between the read and write indexes representing the delay itself. 
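
As a rough illustration of the write and read positions, here is a minimal hand-written C++ delay line. It is only a sketch under assumed names (`DelayLineSketch`, `process`) and uses a plain modulo wrap for clarity; it is not the code the Faust compiler generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative fixed delay line; names and layout are assumptions.
class DelayLineSketch {
public:
    explicit DelayLineSketch(std::size_t delay)
        : buffer_(delay + 1, 0.0f), delay_(delay), write_(0) {}

    // Writes one input sample and returns the sample written `delay` calls ago.
    float process(float x) {
        const std::size_t size = buffer_.size();
        buffer_[write_] = x;                                        // write position
        const std::size_t read = (write_ + size - delay_) % size;   // read position
        const float y = buffer_[read];
        write_ = (write_ + 1) % size;                               // advance and wrap
        return y;
    }

private:
    std::vector<float> buffer_;
    std::size_t delay_;
    std::size_t write_;
};
```

The distance between `write_` and `read` stays equal to the delay at every call, so a line built with a delay of 3 returns zeros for the first three calls and then replays the input three samples late.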
- -There are two possible strategies for implementing delay lines: either the read and write indices remain the same and the delay line memory is moved after each sample calculation, or the read and write indices move themselves along the delay line (with two possible *wrapping index* methods). These multiple methods allow arbitration between memory consumption and the CPU cost of using the delay line. - -Two compiler options `-mcd ` (`--max-copy-delay`) and `-dlt ` (`--delay-line-threshold`) allow you to play with both strategies and even combine them. - -For very short delay lines of up to two samples, the first strategy is implemented by manually shifting the buffer. Then a shift loop is generated for delay from 2 up to `-mcd ` samples. - -For delay values bigger than `-mcd ` samples, the second strategy is implemented by: - -- either using arrays of power-of-two sizes accessed using mask based index computation, for delays smaller than the `-dlt ` value. -- or using a wrapping index moved by an if based method where the increasing index is compared to the delay-line size, and wrapped to zero when reaching it. This method is used for delay values bigger than `-dlt `. - -In this strategy the first method is faster but consumes more memory (since a delay line of a given size will be extended to the next power-of-two size), and the second method is slower but consumes less memory. - -Note that by default the `-mcd 16` and `-dlt ` values are used. Here is a scheme explaining the mechanism: - -``` -[ shift buffer |-mcd | wrapping power-of-two buffer |-dlt | if based wrapping buffer ] -``` - -Here is an example of a Faust program with 10 delay lines in parallel, each delaying a separate input, with three ways of compiling it (using the default `-scalar` mode): - - -
- -
-
- - -When compiling with `faust -mcd 20`, since 20 is larger than the size of the largest delay line, all of them are compiled with the *shifted memory* strategy: - -```c++ -... -// The DSP memory layout -float fVec0[11]; -float fVec1[10]; -float fVec2[9]; -float fVec3[8]; -float fVec4[7]; -float fVec5[6]; -float fVec6[5]; -float fVec7[4]; -float fVec8[2]; -float fVec9[3]; -int fSampleRate; -... -virtual void compute(int count, - FAUSTFLOAT** RESTRICT inputs, - FAUSTFLOAT** RESTRICT outputs) -{ - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* input1 = inputs[1]; - FAUSTFLOAT* input2 = inputs[2]; - FAUSTFLOAT* input3 = inputs[3]; - FAUSTFLOAT* input4 = inputs[4]; - FAUSTFLOAT* input5 = inputs[5]; - FAUSTFLOAT* input6 = inputs[6]; - FAUSTFLOAT* input7 = inputs[7]; - FAUSTFLOAT* input8 = inputs[8]; - FAUSTFLOAT* input9 = inputs[9]; - FAUSTFLOAT* output0 = outputs[0]; - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - fVec0[0] = float(input9[i0]); - fVec1[0] = float(input8[i0]); - fVec2[0] = float(input7[i0]); - fVec3[0] = float(input6[i0]); - fVec4[0] = float(input5[i0]); - fVec5[0] = float(input4[i0]); - fVec6[0] = float(input3[i0]); - fVec7[0] = float(input2[i0]); - fVec8[0] = float(input0[i0]); - fVec9[0] = float(input1[i0]); - output0[i0] = FAUSTFLOAT(fVec0[10] + fVec1[9] + fVec2[8] + fVec3[7] + fVec4[6] - + fVec5[5] + fVec6[4] + fVec7[3] + fVec8[1] + fVec9[2]); - for (int j0 = 10; j0 > 0; j0 = j0 - 1) { - fVec0[j0] = fVec0[j0 - 1]; - } - for (int j1 = 9; j1 > 0; j1 = j1 - 1) { - fVec1[j1] = fVec1[j1 - 1]; - } - for (int j2 = 8; j2 > 0; j2 = j2 - 1) { - fVec2[j2] = fVec2[j2 - 1]; - } - for (int j3 = 7; j3 > 0; j3 = j3 - 1) { - fVec3[j3] = fVec3[j3 - 1]; - } - for (int j4 = 6; j4 > 0; j4 = j4 - 1) { - fVec4[j4] = fVec4[j4 - 1]; - } - for (int j5 = 5; j5 > 0; j5 = j5 - 1) { - fVec5[j5] = fVec5[j5 - 1]; - } - for (int j6 = 4; j6 > 0; j6 = j6 - 1) { - fVec6[j6] = fVec6[j6 - 1]; - } - for (int j7 = 3; j7 > 0; j7 = j7 - 1) { - fVec7[j7] = fVec7[j7 - 1]; - } - fVec8[1] = 
fVec8[0]; - fVec9[2] = fVec9[1]; - fVec9[1] = fVec9[0]; - - } -} -... -``` - -In this code example, the *very short delay lines of up to two samples by manually shifting the buffer* method can be seen in those lines: - -```c++ -... -// Delay line of 1 sample -fVec8[1] = fVec8[0]; -// Delay line of 2 samples -fVec9[2] = fVec9[1]; -fVec9[1] = fVec9[0]; -... -``` - -and the *shift loop is generated for delay from 2 up to `-mcd ` samples* method can be seen in those lines: - -```c++ -... -output0[i0] = FAUSTFLOAT(fVec0[10] + fVec1[9] + fVec2[8] + fVec3[7] + fVec4[6] - + fVec5[5] + fVec6[4] + fVec7[3] + fVec8[1] + fVec9[2]); -// Shift delay line of 10 samples -for (int j0 = 10; j0 > 0; j0 = j0 - 1) { - fVec0[j0] = fVec0[j0 - 1]; -} -// Shift delay line of 9 samples -for (int j1 = 9; j1 > 0; j1 = j1 - 1) { - fVec1[j1] = fVec1[j1 - 1]; -} -... -``` - -When compiled with `faust -mcd 0`, all delay lines use the *wrapping index* second strategy with power-of-two size (since `-dlt ` is used by default): - -```c++ -... -// The DSP memory layout -int IOTA0; -float fVec0[16]; -float fVec1[16]; -float fVec2[16]; -float fVec3[8]; -float fVec4[8]; -float fVec5[8]; -float fVec6[8]; -float fVec7[4]; -float fVec8[2]; -float fVec9[4]; -int fSampleRate; -... 
-virtual void compute(int count, - FAUSTFLOAT** RESTRICT inputs, - FAUSTFLOAT** RESTRICT outputs) -{ - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* input1 = inputs[1]; - FAUSTFLOAT* input2 = inputs[2]; - FAUSTFLOAT* input3 = inputs[3]; - FAUSTFLOAT* input4 = inputs[4]; - FAUSTFLOAT* input5 = inputs[5]; - FAUSTFLOAT* input6 = inputs[6]; - FAUSTFLOAT* input7 = inputs[7]; - FAUSTFLOAT* input8 = inputs[8]; - FAUSTFLOAT* input9 = inputs[9]; - FAUSTFLOAT* output0 = outputs[0]; - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - fVec0[IOTA0 & 15] = float(input9[i0]); - fVec1[IOTA0 & 15] = float(input8[i0]); - fVec2[IOTA0 & 15] = float(input7[i0]); - fVec3[IOTA0 & 7] = float(input6[i0]); - fVec4[IOTA0 & 7] = float(input5[i0]); - fVec5[IOTA0 & 7] = float(input4[i0]); - fVec6[IOTA0 & 7] = float(input3[i0]); - fVec7[IOTA0 & 3] = float(input2[i0]); - fVec8[IOTA0 & 1] = float(input0[i0]); - fVec9[IOTA0 & 3] = float(input1[i0]); - output0[i0] = FAUSTFLOAT(fVec0[(IOTA0 - 10) & 15] + fVec1[(IOTA0 - 9) & 15] - + fVec2[(IOTA0 - 8) & 15] + fVec3[(IOTA0 - 7) & 7] + fVec4[(IOTA0 - 6) & 7] - + fVec5[(IOTA0 - 5) & 7] + fVec6[(IOTA0 - 4) & 7] + fVec7[(IOTA0 - 3) & 3] - + fVec8[(IOTA0 - 1) & 1] + fVec9[(IOTA0 - 2) & 3]); - IOTA0 = IOTA0 + 1; - } -} -... -``` - -In this code example, several delay lines of various power-of-two size (2, 4, 8, 16) are generated. A unique continuously incremented `IOTA0` variable is shared between all delay lines. The *wrapping index* code is generated with this `(IOTA0 - 5) & 7` kind of code, with a power-of-two - 1 mask (so 8 - 1 = 7 here). - -When compiled with `faust -mcd 4 -dlt 7`, a mixture of the three generation strategies is used: - -```c++ -// The DSP memory layout -... -int fVec0_widx; -float fVec0[11]; -int fVec1_widx; -float fVec1[10]; -int fVec2_widx; -float fVec2[9]; -int fVec3_widx; -float fVec3[8]; -int IOTA0; -float fVec4[8]; -float fVec5[8]; -float fVec6[8]; -float fVec7[4]; -float fVec8[2]; -float fVec9[3]; -int fSampleRate; -... 
-virtual void compute(int count, - FAUSTFLOAT** RESTRICT inputs, - FAUSTFLOAT** RESTRICT outputs) -{ - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* input1 = inputs[1]; - FAUSTFLOAT* input2 = inputs[2]; - FAUSTFLOAT* input3 = inputs[3]; - FAUSTFLOAT* input4 = inputs[4]; - FAUSTFLOAT* input5 = inputs[5]; - FAUSTFLOAT* input6 = inputs[6]; - FAUSTFLOAT* input7 = inputs[7]; - FAUSTFLOAT* input8 = inputs[8]; - FAUSTFLOAT* input9 = inputs[9]; - FAUSTFLOAT* output0 = outputs[0]; - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - int fVec0_widx_tmp = fVec0_widx; - fVec0[fVec0_widx_tmp] = float(input9[i0]); - int fVec0_ridx_tmp0 = fVec0_widx - 10; - int fVec1_widx_tmp = fVec1_widx; - fVec1[fVec1_widx_tmp] = float(input8[i0]); - int fVec1_ridx_tmp0 = fVec1_widx - 9; - int fVec2_widx_tmp = fVec2_widx; - fVec2[fVec2_widx_tmp] = float(input7[i0]); - int fVec2_ridx_tmp0 = fVec2_widx - 8; - int fVec3_widx_tmp = fVec3_widx; - fVec3[fVec3_widx_tmp] = float(input6[i0]); - int fVec3_ridx_tmp0 = fVec3_widx - 7; - fVec4[IOTA0 & 7] = float(input5[i0]); - fVec5[IOTA0 & 7] = float(input4[i0]); - fVec6[IOTA0 & 7] = float(input3[i0]); - fVec7[0] = float(input2[i0]); - fVec8[0] = float(input0[i0]); - fVec9[0] = float(input1[i0]); - output0[i0] = FAUSTFLOAT(fVec0[((fVec0_ridx_tmp0 < 0) ? fVec0_ridx_tmp0 + 11 : fVec0_ridx_tmp0)] - + fVec1[((fVec1_ridx_tmp0 < 0) ? fVec1_ridx_tmp0 + 10 : fVec1_ridx_tmp0)] - + fVec2[((fVec2_ridx_tmp0 < 0) ? fVec2_ridx_tmp0 + 9 : fVec2_ridx_tmp0)] - + fVec3[((fVec3_ridx_tmp0 < 0) ? fVec3_ridx_tmp0 + 8 : fVec3_ridx_tmp0)] - + fVec4[(IOTA0 - 6) & 7] + fVec5[(IOTA0 - 5) & 7] + fVec6[(IOTA0 - 4) & 7] + fVec7[3] + fVec8[1] + fVec9[2]); - fVec0_widx_tmp = fVec0_widx_tmp + 1; - fVec0_widx_tmp = ((fVec0_widx_tmp == 11) ? 0 : fVec0_widx_tmp); - fVec0_widx = fVec0_widx_tmp; - fVec1_widx_tmp = fVec1_widx_tmp + 1; - fVec1_widx_tmp = ((fVec1_widx_tmp == 10) ? 
0 : fVec1_widx_tmp); - fVec1_widx = fVec1_widx_tmp; - fVec2_widx_tmp = fVec2_widx_tmp + 1; - fVec2_widx_tmp = ((fVec2_widx_tmp == 9) ? 0 : fVec2_widx_tmp); - fVec2_widx = fVec2_widx_tmp; - fVec3_widx_tmp = fVec3_widx_tmp + 1; - fVec3_widx_tmp = ((fVec3_widx_tmp == 8) ? 0 : fVec3_widx_tmp); - fVec3_widx = fVec3_widx_tmp; - IOTA0 = IOTA0 + 1; - for (int j0 = 3; j0 > 0; j0 = j0 - 1) { - fVec7[j0] = fVec7[j0 - 1]; - } - fVec8[1] = fVec8[0]; - fVec9[2] = fVec9[1]; - fVec9[1] = fVec9[0]; - } -} -... -``` - -In this code example, the *wrapping index moved by an if based method* can be recognized by the use of variables like `fVec0_ridx_tmp0` and `fVec0_widx_tmp`. - -Choosing values that use less memory can be particularly important in the context of embedded devices. Testing different combinations of the `-mcd` and `-dlt` options can help optimize CPU utilisation. To summarize: - -- choosing a big `n` value for `-mcd n` will consume less memory, but the shift loop will start to be time consuming with big delay values. This model may sometimes be better suited if the C++ or LLVM compiler correctly auto-vectorizes the code and generates more efficient SIMD code. -- choosing `-mcd 0` will activate the *wrapping index* second strategy for all delay lines in the DSP code, then playing with `-dlt ` allows you to arbitrate between the *faster but consuming more memory* method and the *slower but consume less memory* method. -- choosing a combination of `-mcd n1` and `-dlt ` can possibly be the model to choose when a lot of delay lines with different sizes are used in the DSP code. - -Using the benchmark tools [faustbench](#faustbench) and [faustbench-llvm](#faustbench-llvm) allows you to refine the choice of compilation options. - -#### Recursive signals - -In the C++ generated code, the delay lines appear as `fVecXX` arrays.
When recursion is used in the DSP, a one sample delay is automatically added in the recursive path, and a very short delay line is allocated (appearing as `fRecX` arrays in the generated code). Here is the code of a recursively defined integrator: - - -
- -
-
- - -And the generated C++ code with the `iRec0` buffer: - -```c++ -... -// The DSP memory layout -int iRec0[2]; -... -virtual void compute(int count, - FAUSTFLOAT** RESTRICT inputs, - FAUSTFLOAT** RESTRICT outputs) -{ - FAUSTFLOAT* output0 = outputs[0]; - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - iRec0[0] = iRec0[1] + 1; - output0[i0] = FAUSTFLOAT(iRec0[0]); - iRec0[1] = iRec0[0]; - } -} -... -``` - -#### Delay lines in recursive signals - -Here is an example of a Faust program with 10 recursive blocks in parallel, each using a delay line of increasing value: - - -
- -
-
- - -Since a recursive signal uses a one sample delay in its loop, a buffer is allocated to handle the delay. When a delay is used in addition to the recursive signal, a *single buffer* is allocated to combine the two delay sources. The generated code using `faust -mcd 0` for instance is now: - -```c++ -... -// The DSP memory layout -int IOTA0; -float fRec0[4]; -float fRec1[4]; -float fRec2[8]; -float fRec3[8]; -float fRec4[8]; -float fRec5[8]; -float fRec6[16]; -float fRec7[16]; -float fRec8[16]; -float fRec9[16]; -int fSampleRate; -... -virtual void compute(int count, - FAUSTFLOAT** RESTRICT inputs, - FAUSTFLOAT** RESTRICT outputs) -{ - FAUSTFLOAT* input0 = inputs[0]; - FAUSTFLOAT* input1 = inputs[1]; - FAUSTFLOAT* input2 = inputs[2]; - FAUSTFLOAT* input3 = inputs[3]; - FAUSTFLOAT* input4 = inputs[4]; - FAUSTFLOAT* input5 = inputs[5]; - FAUSTFLOAT* input6 = inputs[6]; - FAUSTFLOAT* input7 = inputs[7]; - FAUSTFLOAT* input8 = inputs[8]; - FAUSTFLOAT* input9 = inputs[9]; - FAUSTFLOAT* output0 = outputs[0]; - for (int i0 = 0; i0 < count; i0 = i0 + 1) { - fRec0[IOTA0 & 3] = float(input0[i0]) + fRec0[(IOTA0 - 2) & 3]; - fRec1[IOTA0 & 3] = float(input1[i0]) + fRec1[(IOTA0 - 3) & 3]; - fRec2[IOTA0 & 7] = float(input2[i0]) + fRec2[(IOTA0 - 4) & 7]; - fRec3[IOTA0 & 7] = float(input3[i0]) + fRec3[(IOTA0 - 5) & 7]; - fRec4[IOTA0 & 7] = float(input4[i0]) + fRec4[(IOTA0 - 6) & 7]; - fRec5[IOTA0 & 7] = float(input5[i0]) + fRec5[(IOTA0 - 7) & 7]; - fRec6[IOTA0 & 15] = float(input6[i0]) + fRec6[(IOTA0 - 8) & 15]; - fRec7[IOTA0 & 15] = float(input7[i0]) + fRec7[(IOTA0 - 9) & 15]; - fRec8[IOTA0 & 15] = float(input8[i0]) + fRec8[(IOTA0 - 10) & 15]; - fRec9[IOTA0 & 15] = float(input9[i0]) + fRec9[(IOTA0 - 11) & 15]; - output0[i0] = FAUSTFLOAT(fRec0[IOTA0 & 3] + fRec1[IOTA0 & 3] + fRec2[IOTA0 & 7] + fRec3[IOTA0 & 7] + fRec4[IOTA0 & 7] + fRec5[IOTA0 & 7] + fRec6[IOTA0 & 15] + fRec7[IOTA0 & 15] + fRec8[IOTA0 & 15] + fRec9[IOTA0 & 15]); - IOTA0 = IOTA0 + 1; - } -} -... 
-``` - -with buffers named `fRecX` instead of the `fVecX` of the previous example. The `-mcd ` and `-dlt ` options can be used for the same purpose. - -### Managing DSP Memory Layout - -On audio boards where the memory is separated into several blocks (like SRAM, SDRAM…) with different access times, it becomes important to refine the DSP memory model so that the DSP structure will not be allocated in a single block of memory, but possibly distributed over all available blocks. The idea is then to allocate the parts of the DSP that are often accessed in fast memory and the other ones in slow memory. This can be controlled using the `-mem` compilation option and an [adapted architecture file](../manual/architectures.md#custom-memory-manager). - -## Optimizing the C++ or LLVM Code - -From a given DSP program, the Faust compiler tries to generate the most efficient implementation. Optimizations can be done at DSP writing time, or later on when the target language is generated (like C++ or LLVM IR). -The generated code can have different *shapes* depending on compilation options, and can run faster or slower. Several programs and tools are available to help Faust programmers test their programs (for possible numerical or precision issues), optimize them by discovering the best set of options for a given DSP code, and finally compile them into native code for the target CPUs. - -By default the Faust compiler produces a big scalar loop in the generated `mydsp::compute` method. Compiler options allow you to generate other code *shapes*, like for instance separated simpler loops connected with buffers in the so-called vectorized mode (obtained using the `-vec` option). The assumption is that auto-vectorizer passes in modern compilers will be able to better generate efficient SIMD code for them. In this mode, the size of the internal buffer can be changed using the `-vs value` option. Moreover the computation graph can be organized in deep-first order using `-dfs`.
- -Delay line implementation can be controlled with the `-mcd size` and `-dlt size` options, to choose for example between *power-of-two sizes and mask based wrapping* (faster but consuming more memory) or *if based wrapping* (slower but consuming less memory). - -Many other compilation choices are fully controllable with options. Note that the C/C++ and LLVM backends offer the greatest number of compilation options. Here are just a few of them: - -- `-clang` option: when compiled with clang/clang++, adds specific #pragma for auto-vectorization. -- `-nvi` option: when compiled with the C++ backend, does not add the 'virtual' keyword. **This option can be especially useful in the context of embedded devices** -- `-mapp` option: simpler/faster versions of 'floor/ceil/fmod/remainder' functions (experimental) - -Manually testing each of them and their combinations is out of reach. So several tools have been developed to automate that process and help search the configuration space to discover the best set of compilation options (be sure to run `make benchmark && sudo make devinstall` in the Faust toplevel folder to install the benchmark tools): - -### faustbench - -The **faustbench** tool uses the C++ backend to generate a set of C++ files produced with different Faust compiler options. All files are then compiled into a single binary that will measure the DSP CPU usage of all versions of the compiled DSP. The tool is supposed to be launched in a terminal, but it can be used to generate an iOS project, ready to be launched and tested in Xcode. A more complete documentation is available on [this page](https://github.com/grame-cncm/faust/tree/master-dev/tools/benchmark#faustbench). - -### faustbench-llvm - -The **faustbench-llvm** tool uses the `libfaust` library and its LLVM backend to dynamically compile DSP objects produced with different Faust compiler options, and then measure their DSP CPU usage.
Additional Faust compiler options can be given besides the ones that will be automatically explored by the tool. A more complete documentation is available on [this page](https://github.com/grame-cncm/faust/tree/master-dev/tools/benchmark#faustbench-llvm). - -### faust2bench - -The **faust2bench** tool allows you to benchmark a given DSP program: - -``` -faust2bench -h -Usage: faust2bench [Faust options] -Compiles Faust programs to a benchmark executable -``` - -So something like `faust2bench -vec -lv 0 -vs 4 foo.dsp` is used to produce an executable, then launching `./foo` gives: - -``` -./foo -./foo : 303.599 MBytes/sec (DSP CPU % : 0.224807 at 44100 Hz) -``` - -The `-inj` option allows you to inject and benchmark an external C++ class *adapted* to behave as a `dsp` class, like in the following `adapted.cpp` example. The inherited `compute` method is rewritten to call the external C++ code, such as `limiterStereo.SetPreGain`, to update the controllers, and the `limiterStereo.Process` method which computes the DSP: - -```c++ -#include "faust/dsp/dsp.h" -#include "Limiter.hpp" - -struct mydsp : public dsp { - - Limiter limiterStereo; - - void init(int sample_rate) - { - limiterStereo.SetSR(sample_rate); - } - - int getNumInputs() { return 2; } - int getNumOutputs() { return 2; } - - int getSampleRate() { return 44100; } - - void instanceInit(int sample_rate) - {} - - void instanceConstants(int sample_rate) - {} - void instanceResetUserInterface() - {} - void instanceClear() - {} - - void buildUserInterface(UI* ui_interface) - {} - - dsp* clone() - { - return new mydsp(); - } - void metadata(Meta* m) - {} - - void compute(int count, FAUSTFLOAT** inputs, FAUSTFLOAT** outputs) - { - limiterStereo.SetPreGain(0.5); - limiterStereo.SetAttTime(0.5); - limiterStereo.SetHoldTime(0.5); - limiterStereo.SetRelTime(0.5); - limiterStereo.SetThreshold(0.5); - - limiterStereo.Process(inputs, outputs, count); - } - -}; -``` - -Using `faust2bench -inj adapted.cpp
dummy.dsp` creates the executable to be tested with `./dummy` (remember that `dummy.dsp` is a program that is not actually used in `-inj` mode). - -### dynamic-faust - -The **dynamic-faust** tool uses the dynamic compilation chain (based on the LLVM backend), and compiles a Faust DSP source to an LLVM IR (.ll), bitcode (.bc), machine code (.mc) or object code (.o) output file. This is an alternative to the C++ compilation chain, since DSP code can be compiled to object code (.o), then used and linked in a regular C++ project. A more complete documentation is available on [this page](https://github.com/grame-cncm/faust/tree/master-dev/tools/benchmark#dynamic-faust). - -### Optimizing with any faust2xx tool - -All `faust2xx` tools compile in scalar mode by default, but can take any combination of optimal options (like `-vec -fun -vs 32 -dfs -mcd 32` for instance) that the previously described tools can find automatically. So by chaining the use of **faustbench** or **faustbench-llvm** to discover the best compilation options for a given DSP, then using them in the desired **faust2xx** tool, a CPU optimized standalone or plugin can be obtained. - -Note that some **faust2xx** tools like [`faust2max6`](https://github.com/grame-cncm/faust/tree/master-dev/architecture/max-msp) or `faust2caqt` can internally call the `faustbench-llvm` tool to discover and later on use the best possible compilation options. - -## Compiling for Multiple CPUs - -On modern CPUs, compiling native code dedicated to the target processor is critical to obtain the best possible performance. When using the C++ backend, the same C++ file can be compiled with `gcc` or `clang` for each possible target CPU using the appropriate `-march=cpu` option. When using the LLVM backend, the same LLVM IR code can be compiled into CPU specific machine code using the [dynamic-faust](../manual/optimizing.md#dynamic-faust) tool.
This step will typically be done using the best compilation options automatically found with the [faustbench](../manual/optimizing.md#faustbench) or [faustbench-llvm](../manual/optimizing.md#faustbench-llvm) tools. A specialized tool has been developed to combine all the possible options. - -### faust2object - -The `faust2object` tool uses either the standard C++ compiler or the LLVM dynamic compilation chain (the [dynamic-faust](../manual/optimizing.md#dynamic-faust) tool) to compile a Faust DSP to object code files (.o) and wrapper C++ header files for different CPUs. The DSP name is used in the generated C++ and object code files, which makes it possible to generate distinct versions of the code that can finally be linked together in a single binary. A more complete documentation is available on [this page](https://github.com/grame-cncm/faust/tree/master-dev/tools/benchmark#faust2object).