
%%%%%%%%%%%%%%%%%%%
\chapter*{Abstract}
%%%%%%%%%%%%%%%%%%%
Software implementations of error correction and signal processing applications on general purpose processors have gained interest in recent times, mainly due to the latest technological developments in general purpose computing. Compared to FPGA or ASIC based implementations, software implementations have the inherent advantage of not being tied to one specific hardware architecture. They also require much less development and maintenance effort than hardware implementations. For the device manufacturer, a software implementation provides platform flexibility in addition to reducing the cost of the product. \newline

In this thesis, we study the feasibility of developing a complete polar forward error correction (FEC) chain of the fifth-generation cellular mobile communication standard \cite{3gpp.38.212} in software, specifically on general purpose processors. The thesis work attempts to meet stringent latency requirements through software, algorithmic and platform-specific optimizations. Many algorithms in the FEC chain are optimized for hardware implementations, and implementing them directly in software results in poor performance. To obtain the best latency on general purpose processors, these algorithms are modified or reformulated to suit the processor architecture and a software implementation. Initially, both the encoding and the decoding FEC chains are implemented naively, without any optimization. Code profiling is then performed on this naive implementation to identify the significant latency contributors. We split the algorithms of these components into primitive operations. These primitive operations are either optimized in software or mapped to specialized functional units of a general purpose processor to achieve the best performance. The specialized units include vector processing units (SSE, AVX and AVX2) and cache-prefetching units. \newline

We concentrate on the polar encoding and decoding FEC chains, which are used to transmit and receive control information. The major latency contributors in the encoding FEC chain are the cyclic redundancy check (CRC) calculation, the polar code construction itself and polar encoding. In the decoding FEC chain, the subblock deinterleaver, the polar decoder, parity bit extraction and the CRC calculation constitute the major bottlenecks. The algorithms of these components are reformulated to avoid or reduce latency-contributing operations and to suit software requirements, and they are implemented using efficient \emph{vector processing instruction sets}. Algorithms are modified to reduce complexity, and lookup tables are used to avoid complex computations. Other optimizations include function unrolling, avoiding superfluous copy operations, compiler hints for better instruction scheduling and block-wise copying. At the end of both the encoding and the decoding chapters, latency comparisons between the naive and the optimized implementations are presented. The decoding chapter additionally compares the latency of the decoder developed in this work with that of a state-of-the-art decoder.