Skip to content

Existing Algorithms

Aswini S edited this page Jan 9, 2015 · 4 revisions

(i) Addressing and Analyzing the Existing algorithms and limitations:-

1. LZ77:-

It is a sliding window technique where the dictionary consists of a set of fixed length phrases from a window into the previously processed text. This algorithm uses "lookahead buffer" and "search buffer". The lookahead buffer is filled with the first symbols of the data that has to be encoded, whereas the search buffer is filled with pre-defined symbols of the input alphabet. We dig the code starting from the search buffer and then to the lookahead buffer. Since LZ77 faced the drawbacks in strings replacement, So LZSS had to grab the place of LZ77.

2. Lempel Ziv Storer Szymanski (LZSS):-

LZSS (Lempel Ziv Storer Szymanski) is termed as "a dictionary encoding technique that attempts to replace a string of symbols with reference to a dictionary location of the same string''.

Comparison of LZSS over LZ77:-

 In LZ77, the phrases in the text window were stored as a single contiguous block of text. LZSS stores text in contiguous manner, but it creates an additional data structure that improves on the organization of the phrases.  At each phrases passes out to lookahead buffer and into encoded portion of the text windows in LZ77. The same followed in LZSS, but it adds the phrase to a tree structure which is a binary search tree. The savings are created by using the tree makes the compression efficient as well as encourages experiments with longer window sizes.  LZ77 output tokens consist of a phrase offset, a match length and a character that followed the phrase. LZSS instead allows pointers and characters to be freely inter-mixed.  Under LZ77, the encoder output a dummy match position with a length of zero for every symbol. LZSS instead uses a single bit as a prefix to every output to indicate whether it is an offset/length pair or a single symbol for the output.

There were also some drawbacks noticed in LZSS algorithm:-
 LZSS compresses poorly when it starts since it does not have any data in the text window for matches.  LZSS faces trouble with preloading, where it has no idea what type of data will come in the input stream.  The compression speed of LZSS may suffer as the window grows due to the extra work required to navigate and maintain a larger binary tree.  The compression of smaller files in LZSS will suffer due to increased time required to build a full dictionary.  Finally, the LZSS faces inefficiency in handling the duplicate strings. It’s possible to free up the space used by these duplicate strings in the text window allowing for expansion of the dictionary. However, the side effect is the decompression program has to keep track of the duplicate strings, which will result in significant cutback in expansion speed.

3. LZ78:-

Instead of fixed length phrases from a window into the text, it just builds up phrases into one symbol at a time by adding a new symbol to an existing phrase when a match occurs.

There were two drawbacks analyzed in LZ78 algorithm:

 While parsing the input string, the related information crossing the phrase boundaries is lost. In many situations, there would be significant number of patterns crossing phrase boundaries and these patterns would probably affect the next symbol in that sequence. The convergence rate of the LZ78 to the optimal predictability is slow.

4. Lempel Ziv Markov chain Algorithm (LZMA):-

The Lempel Ziv Markov chain Algorithm (LZMA) is used in the performance of lossless data compression, where lossless data compression is termed as the reduction of size restoring the original one without the loss of data. The algorithm is similar and more efficient than the existing LZ77 algorithm. The idea of LZMA comes from the dictionary based schemes (LZ77/LZ78) and markov models where it uses range encoder. LZMA was derived from LZ77 and LZ78, which is combination of dictionary based scheme and Markov chains. LZMA uses range coding. In other algorithms, Huffman encoding is used. So, what is difference between the range encoding and Huffman encoding? In range encoding, all symbols in a message gets encoded into one number. But in Huffman encoding, each symbol is assigned with a bit pattern and then, all the bit patterns are linked together. Hence, higher compression is seen in Range Encoding.

Markov chain is termed as the mathematical system which undergoes transitions from one state to another on a finite state space. The Markov chain can be represented in probabilistic state diagrams where the transition between the finite states are being labelled with the probabilities of occurrences. Where Markov chain is used in LZMA? It is used in parsing of source data, Encoding and in Finite State Compressor (Finite state machine). In what ways the Markov chains related with LZMA?? 1. Variable-to-Variable length codes are used in which the number of source symbols are encoded and number of bits encoded as per the codeword. If the future behavior of a process depends only on the present state, but not on the past, then the process is termed as the markov process. Let X (t) = number of bytes transmitted in time t, so that the sequence [X (t), t € [0, ∞]] forms a pure web transmission process, which helps in determining the web performance. The Markov chain is not that efficient when compared to Markov Chain Monte Carlo (MCMC) method and Hidden Markov method. Now, this needs to be improved.

The range encoding is termed as one of the entropy coding method where it produces a space efficient stream of bits for representing a stream of symbols and their probabilities. The range decoding reverses the encoding process. The range coding is also better than the arithmetic coding since it performs fast arithmetic coding. Range coding performs faster arithmetic coding. It renormalizes one byte at a time, rather than one bit at a time. Thus, it runs 2x faster. It uses bytes as encoding digits instead of bits. LZMA has issues in both statistical coding as well as dictionary coding where the performance is minimal.

(ii)Comparison Summary
1. LZ77
--- Fixed Length phrases –> Lookahead Buffer and Search Buffer --- Limitations – Fails in string replacement.

2. LZSS
--- LZSS – String replacement.
--- Limitations – No strings matching and Fails in Preloading.

3. LZ78
--- Both String matching and String replacement.
--- Limitations – Parsing issues and Optimal predictability is low.

4. LZMA
--- Parsing and Encoding of source data, finite state compression using markov chain.
--- Limitations – Limited speed in compression & attains low compression than Gzip which follows DEFLATE algorithm.

Analyzing the issues:-

Through analyzing the issues in the exisiting algorithms, it s found that there is no best algorithm to solve any issue beyond text and images. Likewise, in LZMA, it successfully compresses text, but lacks effective compression power in the areas of images. The Deflate and LZW successfully does good compression (though it lacks at times) in the areas of text and images, but not yet beyond it. There is no effective algorithms proposed for overall text, images, audio and video.