libfly  6.2.2
C++20 utility library for Linux, macOS, and Windows
fly::coders::HuffmanEncoder Class Reference

#include <huffman_encoder.hpp>

Inheritance diagram for fly::coders::HuffmanEncoder:
Collaboration diagram for fly::coders::HuffmanEncoder:

Public Member Functions

 HuffmanEncoder (const std::shared_ptr< CoderConfig > &config) noexcept
 
- Public Member Functions inherited from fly::coders::Encoder
virtual ~Encoder ()=default
 
virtual bool encode_string (const std::string &decoded, std::string &encoded)
 
virtual bool encode_file (const std::filesystem::path &decoded, const std::filesystem::path &encoded)
 

Protected Member Functions

bool encode_binary (std::istream &decoded, fly::BitStreamWriter &encoded) override
 
- Protected Member Functions inherited from fly::coders::BinaryEncoder
bool encode_internal (std::istream &decoded, std::ostream &encoded) final
 

Detailed Description

Implementation of the Encoder interface for Huffman coding. Forms length-limted, canonical Huffman codes to encode symbols.

Author
Timothy Flynn (trfly.nosp@m.nn89.nosp@m.@pm.m.nosp@m.e)
Version
July 7, 2019

Constructor & Destructor Documentation

◆ HuffmanEncoder()

fly::coders::HuffmanEncoder::HuffmanEncoder ( const std::shared_ptr< CoderConfig > &  config)
explicitnoexcept

Constructor.

Parameters
configReference to coder configuration.

Member Function Documentation

◆ encode_binary()

bool fly::coders::HuffmanEncoder::encode_binary ( std::istream &  decoded,
fly::BitStreamWriter encoded 
)
overrideprotectedvirtual

Huffman encode a stream.

If the input stream is large, in order to limit memory usage, the stream is encoded in chunks. Each chunk is treated as its own input stream, and the encoding sequence is repeated for each chunk.

The first bytes of the output stream are reserved as a header. Currently, the header contains: the incurred BitStream header, the version of the Huffman coder used to encode the stream, the maximum chunk length used to to split large streams (in kilobytes), and the maximum allowed Huffman code length:

|      8 bits      |  8 bits |      16 bits      |      8 bits     |
--------------------------------------------------------------------
| BitStream header | Version | Chunk length (KB) | Max code length |

The sequence to encode a stream is:

1. Create a Huffman tree from the input stream.
2. Generate standard Huffman codes from the Huffman tree.
3. Length-limit the standard Huffman codes.
4. Convert the length-limited codes to canonical Huffman codes.
5. Encode the canonical codes.
6. Encode the input stream using the canonical codes.

This sequence involves iterating over the entire input stream twice (to create the Huffman tree and to encode the stream).

The coder does not assume the Huffman codes are retained between calls. Thus, the codes are encoded before the input stream (step 5) so that they may be learned during decoding.

Length-limiting is performed on the generated Huffman codes to improve decoder performance. Worst-case, a Huffman code could have the same length as the maximum number of symbols. Limiting the length of Huffman codes awards a significant decoder performance improvement, while only incurring a small cost in compression ratio.

Canonical form is used for its property of generally being describable in fewer bits than standard form. When in canonical form, the Huffman codes are sorted by code length. With this sorting, the count of the number of symbols for each code length is computed: (N0, N1, N2, ..., Nn), where N<n> is the number of symbols of code length <n>. Call the length of this list NN.

The encoding of canonical Huffman codes then becomes:

 NN,N0,N1,N2,...,Nn,S0,S1,S2,...,Sn

Where S<n> is all symbols of code length <n>.

Encoding the input stream (step 6) consists of reading each symbol from the input stream and outputting that symbol's canonical Huffman code.

Parameters
decodedStream holding the contents to encode.
encodedStream to store the encoded contents.
Returns
True if the input stream was successfully encoded.

Implements fly::coders::BinaryEncoder.


The documentation for this class was generated from the following files: