Generate Huffman codebooks! Huffman codes are the optimal way to compress individual symbols into a binary sequence that can be unambiguously decoded without inter-symbol separators (it is "prefix-free").
Provided an iterable of 2-tuples in (symbol, weight)
format, generate a Huffman codebook, returned as a dictionary in {symbol: code, ...}
format.
>>> huffman.codebook([('A', 2), ('B', 4), ('C', 1), ('D', 1)]) {'A': '10', 'B': '0', 'C': '110', 'D': '111'}
If you have an iterable of symbols, the collections.Counter
is a handy way to tally them up.
>>> huffman.codebook(collections.Counter('man the stand banana man').items()) {' ': '111', 'a': '10', 'b': '0100', 'd': '0110', 'e': '11010', 'h': '0101', 'm': '1100', 'n': '00', 's': '11011', 't': '0111'}