Digital ASIC FPGA IP examples
- DAFIP
- Overview
- IIR implementation
- Gray and binary number conversion
- QAM modulation
- LUT-based NCO
- Polyphase filter
IP | Implementation | Verification | Documentation | Category |
---|---|---|---|---|
IIR | ✅ | ✅ | ✅ | DSP |
AsyncFIFO | ✅ | ❌ | ❌ | CDC |
SyncFIFO | ✅ | ❌ | ❌ | Common |
RoundRobinArbiter | ✅ | ❌ | ❌ | Common |
FrequencyDivider | ✅ | ❌ | ❌ | Common |
Multi-based divider | ✅ | ❌ | ✅ | DSP |
FSMTemplate | ✅ | ❌ | ❌ | Common |
GrayBinConversion | ✅ | ✅ | ✅ | Common |
FIR | ✅ | ✅ | ✅ | DSP |
QAMMOD | ✅ | ✅ | ✅ | DSP |
NCO | ✅ | ✅ | ✅ | DSP |
Polyphase_fir | ✅ | ✅ | ✅ | DSP |
This IIR implementation is an example of a first-order IIR filter with a testbench that checks the simulation results. Key features of the provided IIR are:
- The difference equation is
$$y[n] = ax[n] + (1-a)y[n-1]$$
where $a = 2^{-A}$. In the example, A is 3, and the division is realized by an arithmetic shift.
- The data type of the input and output is ap_fixed<16,2>.
- The theoretical quantization error falls in the range (-0.000031, 0.000076). The quantization error in random simulation falls in the range (-0.000019, 0.000066). See the section "Calculate theoretical quantization error" for the theoretical proof.
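The shift-based update described above can be sketched in plain C++. This is a minimal illustration, not the repo's HLS code: the type name IirFirstOrderSketch is hypothetical, values are raw integers scaled by 2^14 (mirroring the 14 fraction bits of ap_fixed<16,2>), and rounding half up before the shift is an assumption about the rounding behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of y[n] = a*x[n] + (1-a)*y[n-1] with a = 2^-A,
// rewritten as y[n] = y[n-1] + (x[n] - y[n-1]) * 2^-A so that the
// multiplication becomes an arithmetic right shift.
struct IirFirstOrderSketch {
    int shift;           // A, the shift distance (a = 2^-A)
    std::int32_t y_reg;  // y[n-1] as a raw fixed-point integer

    explicit IirFirstOrderSketch(int a_shift) : shift(a_shift), y_reg(0) {
        assert(shift > 0 && shift < 16);
    }

    std::int32_t step(std::int32_t x) {
        const std::int32_t diff = x - y_reg;
        // Add half an LSB of the shifted result to round to nearest (assumption).
        // Right-shifting a negative value is arithmetic on mainstream compilers.
        const std::int32_t half = std::int32_t(1) << (shift - 1);
        y_reg += (diff + half) >> shift;
        return y_reg;
    }
};
```

Driving the sketch with a constant input of 8192 (0.5 in this scaling) and A = 3 moves the output by one eighth of the remaining difference on each step until the difference drops below the rounding threshold.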
Testbench stimuli and reference output are obtained from the Matlab scripts ./matlab/get_matlab_reference_testdata_rand.m and ./matlab/get_matlab_reference_testdata_sin.m.
The same IIR filter is realized in Matlab with the 1-D filter function.
Test data is stored in the binary file ./testData/testdata.bin and later read by the C testbench.
Run the Tcl file to create the Vitis HLS project and run C simulation and RTL simulation:
cd ./vitis_hls
vitis_hls -f run_hls.tcl
The C testbench reads the test data generated in the previous step, drives the IIR with the stimuli, and compares the DUT output against the Matlab reference output within the quantization error bounds. The DUT output is also stored in ./vitis_hls/proj/solution/csim/build/dut_output.bin for later plotting.
./matlab/check_sim_result.m reads testdata.bin and dut_output.bin and compares the results with the theoretical quantization error bound curves. The upper and lower bounds are determined by two IIR parameters: A, the shift distance, and fwidth, the fraction width of the data type.
The figure below shows that simulation error is inside of the boundary.
Note that the actual simulation error is expected to stay well inside the theoretical bounds. The bounds can only be reached if the maximum quantization error is introduced every time data is quantized to a lower precision. This explains why, in the figure below, the actual quantization error occupies only ~70% of the error bound.
The quantization error can be treated as a unit step input with varying amplitude. The variable diff uses the AP_RND rounding mode, which is equivalent to introducing an error after the shift operator in the structure, shown as the quantization error in the figure below.
IIR Structure with quantization error
The data type of the diff variable is ap_fixed<17,2>, and one of the operands of the preceding shift operation is ap_fixed<18,2>. Therefore, when the result is converted to diff, HLS rounds the value to the nearest representable value of ap_fixed<17,2>, which bounds the quantization error added at this point. Combined with the original difference equation, this error acts as a unit step function with varying amplitude, from which the error at y_reg_ can be derived.
This repo provides an IP example of gray-code to binary-code and binary-code to gray-code conversion. Binary-to-gray conversion is simple, and a basic implementation of gray-to-binary conversion is straightforward as well. Under strict timing constraints, however, optimizing gray-to-binary conversion is more involved.
See https://en.wikipedia.org/wiki/Gray_code for background on Gray code.
Binary-to-gray conversion leaves nothing to optimize: the XOR operations of all bit pairs can be performed in parallel. The HLS code is as simple as:
template <class DATA_T>
DATA_T bin2gray(DATA_T bin) {
  // Each gray bit is the XOR of two adjacent binary bits.
  DATA_T gray = bin ^ (bin >> 1);
  return gray;
}
The steps to run the binary-to-gray implementation are:
cd <repo-path>/GrayBin/vitis_hls
vitis_hls -f run_hls_bin2gray
vitis_hls -p bin2gray
Gray-code to binary-code conversion is not difficult. Given an N-bit input (gray), each bit of the binary output, bin[n], is the XOR of bin[n+1] and gray[n], and bin[MSB] is the same as gray[MSB]. The simplest implementation of gray-to-binary conversion is therefore an XOR chain, similar to a ripple-carry adder: the output of one XOR is the input of the next. The critical path is always the longest path of the chain, which produces bin[LSB]. Run the following steps to see the implementation of this non-optimized gray-to-binary conversion.
cd <repo-path>/GrayBin/vitis_hls
vitis_hls -f run_hls_gray2bin
vitis_hls -p gray2bin
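The XOR chain described above can be sketched in plain C++ as follows. The function name gray2bin_chain and the explicit width parameter are hypothetical and are not taken from the repo's HLS source.

```cpp
#include <cstdint>

// Hypothetical sketch of the XOR chain: bin[MSB] = gray[MSB] and
// bin[n] = bin[n+1] ^ gray[n]. Each output bit depends on the bit above it,
// so the dependency chain grows linearly with the width.
inline std::uint32_t gray2bin_chain(std::uint32_t gray, int width) {
    std::uint32_t bin = 0;
    std::uint32_t prev = 0;  // bin[n+1]; zero above the MSB
    for (int n = width - 1; n >= 0; --n) {
        const std::uint32_t bit = ((gray >> n) & 1u) ^ prev;
        bin |= bit << n;
        prev = bit;
    }
    return bin;
}
```

For example, the 3-bit gray code 110 converts to binary 100, since bin[2] = 1, bin[1] = 1 ^ 1 = 0, and bin[0] = 0 ^ 0 = 0.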
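Because the XOR operation is commutative and associative, many different bits can be XOR-ed in parallel. The implementation in function gray2bin_paral applies the idea of parallel prefix algorithms. The delay of an N-bit gray-to-binary conversion is then roughly $\lceil \log_2 N \rceil$ XOR delays, instead of the N-1 XOR delays of the chain. The synthesis reports of the two implementations are: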
+----------------+-------+---------+--------+---------+----------+----+----+
| Modules | | Latency | Latency| | | | |
| & Loops | Slack | (cycles)| (ns) | Interval| Pipelined| FF | LUT|
+----------------+-------+---------+--------+---------+----------+----+----+
|+ gray2bin_hls | -0.95| 1| 1.952| 1| yes| 34| 64|
+----------------+-------+---------+--------+---------+----------+----+----+
+----------------------+------+---------+--------+---------+----------+---+-----+
| Modules | | Latency | Latency| | | | |
| & Loops | Slack| (cycles)| (ns) | Interval| Pipelined| FF| LUT|
+----------------------+------+---------+--------+---------+----------+---+-----+
|+ gray2bin_paral_hls | 0.16| 1| 1.000| 1| yes| 58| 162|
+----------------------+------+---------+--------+---------+----------+---+-----+
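The parallel-prefix idea can be illustrated with the well-known shift-and-XOR form for a 32-bit word. This is a sketch under the assumption of a fixed 32-bit width; the name gray2bin_prefix_sketch is hypothetical and the code is not the repo's gray2bin_paral.

```cpp
#include <cstdint>

// Hypothetical sketch of a parallel-prefix gray-to-binary conversion.
// After the stage with shift s, each bit holds the XOR of up to 2*s
// gray bits at and above its position, so 5 stages cover 32 bits.
inline std::uint32_t gray2bin_prefix_sketch(std::uint32_t gray) {
    std::uint32_t bin = gray;
    bin ^= bin >> 1;
    bin ^= bin >> 2;
    bin ^= bin >> 4;
    bin ^= bin >> 8;
    bin ^= bin >> 16;
    return bin;
}
```

The number of stages grows with the logarithm of the width rather than with the width itself, at the cost of more XOR gates, which is consistent with the higher LUT count in the second report above.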
The HLS design in the QamMod directory demonstrates 4-, 16-, 64-, and 256-QAM modulation examples with binary symbol order. Their implementations are all derived from QamMod.h, so it is easy to extend them to higher-order modulation formats such as 1K-QAM.
For non-normalized QAM modulation, the symbol spacing is 2; the symbol order of 16-QAM is shown in the figure below as an example. Gray-code ordering is easily achieved by using the bin2gray converter.
Building on the non-normalized QAM modulation, fixed-point numbers are used to represent the constellation, and the average power is normalized to 1. For comparison with the figure above, the normalized 16-QAM constellation is plotted below.
Normalized QAM16 example
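The normalization step can be sketched in double precision as follows. This is an illustration only: the function names are hypothetical, the mapping from symbol index to constellation point is an assumption, and the repo's QamMod.h uses fixed-point types instead of double.

```cpp
#include <cmath>
#include <complex>
#include <vector>

// Hypothetical sketch: square QAM grid with symbol spacing 2, optionally
// scaled so that the average power over all points is 1.
inline std::vector<std::complex<double>> qam_constellation_sketch(int bits_per_axis,
                                                                  bool normalize) {
    const int levels = 1 << bits_per_axis;  // points per axis
    // The average power of the non-normalized grid is 2 * (levels^2 - 1) / 3.
    const double avg_power = 2.0 * (levels * levels - 1) / 3.0;
    const double scale = normalize ? 1.0 / std::sqrt(avg_power) : 1.0;
    std::vector<std::complex<double>> points;
    for (int i = 0; i < levels; ++i) {
        for (int q = 0; q < levels; ++q) {
            // Axis levels are -(levels-1), ..., -1, +1, ..., +(levels-1).
            const double re = 2.0 * i - (levels - 1);
            const double im = 2.0 * q - (levels - 1);
            points.emplace_back(re * scale, im * scale);
        }
    }
    return points;
}

// Mean of |p|^2 over all constellation points.
inline double average_power(const std::vector<std::complex<double>>& pts) {
    double sum = 0.0;
    for (const auto& p : pts) sum += std::norm(p);
    return sum / pts.size();
}
```

For 16-QAM (2 bits per axis) the non-normalized grid has an average power of 10, so the scale factor in this sketch is 1/sqrt(10).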
FPGA resource consumption for 4-, 16-, 64-, and 256-QAM is listed below, with the constellation represented by std::complex<ap_fixed<20, 2>>. The port-level and block-level interface protocols are removed, so the designs are treated as combinational logic. Their speed on device xcvu9p-flga2104-2-i under a 1 ns constraint is shown in the table as well.
4- QAM | 16- QAM | 64- QAM | 256- QAM |
---|---|---|---|
LUT: 2; Others: 0 | LUT: 4; Others: 0 | LUT: 12; Others: 0 | LUT: 34; Others: 0 |
Max path delay: 0.038ns (logic 0.038ns (100.000%) route 0.000ns (0.000%)) | Max path delay: 0.038ns (logic 0.038ns (100.000%) route 0.000ns (0.000%)) | Max path delay: 0.249ns (logic 0.229ns (91.968%) route 0.020ns (8.032%)) | Max path delay: 0.593ns (logic 0.328ns (55.312%) route 0.265ns (44.688%)) |
Demodulation is the reverse process of QAM modulation. It is implemented in QamMod.h.
All the normalized QAM modulation examples use std::complex<ap_fixed<20, 2>> to represent constellations, which is not identical to the ideal double data type. The difference is measured as the mean squared error (MSE) between the HLS output and the ideal output.
As an example, the steps to check the MSE of 256-QAM are as follows.
cd QamMod/vitis_hls
vitis_hls -f run_hls_Qam256_normalized.tcl
cd ../matlab && matlab
>> CheckQamResult(256, 1)
For normolized256QAM
average power of HLS QAM output is 0.999947576929117
average power of ideal QAM output is 1.000000000000000
quantization mse -91.630037 db
The following figure shows decreasing MSE when more fractional bits are used to represent constellations.
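A minimal sketch of such an MSE measurement is shown below. It assumes the MSE is the mean of the squared magnitude of the difference between the two outputs, expressed in dB, and that quantization rounds to nearest; both are assumptions, and the helper names are hypothetical rather than taken from CheckQamResult.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Round a value to frac_bits fractional bits (round to nearest, an assumption).
inline double quantize_sketch(double v, int frac_bits) {
    const double scale = std::ldexp(1.0, frac_bits);  // 2^frac_bits
    return std::round(v * scale) / scale;
}

// Assumed definition: 10*log10 of the mean of |dut[k] - ref[k]|^2.
inline double mse_db_sketch(const std::vector<std::complex<double>>& dut,
                            const std::vector<std::complex<double>>& ref) {
    double sum = 0.0;
    for (std::size_t k = 0; k < ref.size(); ++k) sum += std::norm(dut[k] - ref[k]);
    return 10.0 * std::log10(sum / ref.size());
}
```

Quantizing the same reference points with more fractional bits lowers the value returned by mse_db_sketch, which is the trend the figure describes.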
# generate test data and coefficients from Matlab
cd FIR/matlab
matlab -nodesktop -nojvm -r "run('getTestData.m'); exit"
# run vitis model, vitis project is located in the directory named proj
cd ../vitis_hls
vitis_hls -f run_hls.tcl
# run matlab script to check the quality of results
cd ../matlab
matlab -nodesktop -r "run('checkDutResult.m');"
# type exit to quit Matlab CLI
# generate test data and coefficients from Matlab
cd FIR/matlab
matlab -nodesktop -nojvm -r "run('getTestData_cplx.m'); exit"
# run vitis model, vitis project is located in the directory named proj_cplx
cd ../vitis_hls
vitis_hls -f run_hls_fir_cplx.tcl
# run matlab script to check the quality of results
cd ../matlab
matlab -nodesktop -r "run('checkDutResult_cplx.m');"
# type exit to quit Matlab CLI
Two Matlab scripts are used to generate stimuli, run the HLS module, and check the quality of the DUT output: check_dut_output_double.m runs the double-precision representation of the NCO, and check_dut_output_fixed.m runs the fixed-point representation. The NCO implementation uses a LUT containing 512 sinusoid values.
For the double-precision model, the SNR is derived from the phase resolution of the LUT: the sine or cosine output is read at a quantized phase, and the resulting phase error sets the noise power.
In this example, N is 9, matching the 512-entry LUT. The theoretical SNR is 61.0494 dB, and the simulated SNR is 61.0548 dB.
Double precision model simulation
If the NCO output is quantized to fixed-point data (18 bits in the provided example), the SNR decreases to 55.034 dB.
Fixed-point precision model simulation
Most of the noise is caused by the phase error introduced when the phase is quantized. This phase error is easy to obtain, so the output error can be compensated with the product of the phase error and the derivative at the selected phase.
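The compensation idea can be sketched as a first-order correction, sin(p) ≈ sin(p_q) + (p - p_q) cos(p_q), where p_q is the quantized phase. The class below is a hypothetical double-precision illustration: it assumes a full-period table addressed by truncating the phase, which may differ from the LUT layout of the repo's NCO.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch, not the repo's NCO: sine/cosine LUT with 2^addr_bits
// entries covering one full period (an assumption).
class LutNcoSketch {
public:
    explicit LutNcoSketch(int addr_bits)
        : size_(1 << addr_bits), step_(2.0 * 3.14159265358979323846 / size_) {
        for (int i = 0; i < size_; ++i) {
            sin_lut_.push_back(std::sin(i * step_));
            cos_lut_.push_back(std::cos(i * step_));
        }
    }

    // Plain LUT read: the phase is truncated to the table resolution.
    // Assumes 0 <= phase < 2*pi.
    double sine(double phase) const {
        return sin_lut_[static_cast<int>(phase / step_) % size_];
    }

    // First-order correction: add the phase error times the derivative of
    // sine (the cosine) at the selected table entry.
    double sine_compensated(double phase) const {
        const int raw = static_cast<int>(phase / step_);
        const double residual = phase - raw * step_;
        const int idx = raw % size_;
        return sin_lut_[idx] + residual * cos_lut_[idx];
    }

private:
    int size_;
    double step_;
    std::vector<double> sin_lut_;
    std::vector<double> cos_lut_;
};
```

With a 512-entry table, the uncorrected error in this sketch is on the order of the phase step, while the corrected error is on the order of the phase step squared.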
The figure above shows the downsampling system, where H(z) is a low-pass filter. From a hardware implementation perspective, however, this structure is inefficient: H(z) runs at the original sample rate, but the downsampling module then discards all but one of every M filter outputs, where M is the downsampling factor.
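The Z-transform of the filter impulse response h[n] is
$$H(z) = \sum_{n} h[n] z^{-n}$$
For a downsampling factor M, this can be rewritten as
$$H(z) = \sum_{k=0}^{M-1} z^{-k} E_k(z^M)$$
Note that $E_k(z) = \sum_{n} h[nM+k] z^{-n}$ is the k-th polyphase component of H(z).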
Figure below shows downsampling system, where H(z) is replaced by its polyphase form.
As shown in the following section, the above structure is equivalent to the structure below, where the signals are downsampled first.
Matlab script demo_polyphase_filter.m shows the above two systems have the same output results.
In the figure below, y1[n] is equal to y2[n].
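The equivalence of the two structures can also be checked with a short C++ sketch that uses integer samples so the comparison is exact. The function names are hypothetical and this is not a port of demo_polyphase_filter.m; branch p of the polyphase form reads x[mM - p] and applies the sub-filter h[jM + p].

```cpp
#include <vector>

using Samples = std::vector<long long>;

// x[n], with zero outside the stored range.
inline long long sample_at(const Samples& x, long long n) {
    return (n >= 0 && n < static_cast<long long>(x.size())) ? x[n] : 0;
}

// Reference structure: filter at the full rate, keep every M-th output.
inline Samples filter_then_downsample(const Samples& h, const Samples& x, int M) {
    Samples y;
    const long long len = static_cast<long long>(x.size());
    const long long taps = static_cast<long long>(h.size());
    for (long long n = 0; n < len; n += M) {
        long long acc = 0;
        for (long long k = 0; k < taps; ++k) acc += h[k] * sample_at(x, n - k);
        y.push_back(acc);
    }
    return y;
}

// Polyphase structure: every multiply-accumulate runs at the low rate.
inline Samples polyphase_downsample(const Samples& h, const Samples& x, int M) {
    Samples y;
    const long long len = static_cast<long long>(x.size());
    const long long taps = static_cast<long long>(h.size());
    for (long long m = 0; m * M < len; ++m) {
        long long acc = 0;
        for (long long p = 0; p < M; ++p) {              // branch index
            for (long long j = 0; j * M + p < taps; ++j) {  // sub-filter tap
                acc += h[j * M + p] * sample_at(x, (m - j) * M - p);
            }
        }
        y.push_back(acc);
    }
    return y;
}
```

Both functions produce the same output sequence, but the polyphase form never computes the filter outputs that the downsampler would discard.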
In frequency domain, the Fourier transform of X'[n] is