Saving a Huffman Tree compactly in C++
Let's say I've encoded a Huffman tree in the compressed file. I have this example file output:
001a1c01e01b1d
I'm having an issue saving this string to the file bit by bit. I know C++ can only write to a file one byte at a time, and I'm having trouble storing this string in bytes. Is it possible to convert the first three bits to a char without the program padding the byte? If it pads the byte, the traversal codes for my tree (and the codes themselves) get messed up. If I chop this up one byte at a time, what happens if the tree isn't a multiple of 8 bits long? And what happens if the compressed file's bit length isn't a multiple of 8?
Hopefully I've been clear enough.
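To make the problem concrete, here is a minimal sketch. It assumes the example string is a preorder walk in which `0` marks an internal node and `1` is followed by a leaf's symbol, stored as 8 bits most-significant-bit first. That is a reading of the example, not something the question states, and the function names are hypothetical. The example comes out to 9 flag bits plus 5 × 8 symbol bits = 49 bits, which is exactly the not-a-multiple-of-8 case being asked about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Expands the textual tree encoding into individual bits.
// Assumed format: '0' = internal node (1 bit), '1' = leaf flag followed
// by the leaf's symbol, which is emitted as 8 bits, MSB first.
std::vector<bool> treeStringToBits(const std::string& s) {
    std::vector<bool> bits;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '0') {
            bits.push_back(false);                 // internal node
        } else {
            assert(s[i] == '1' && i + 1 < s.size());
            bits.push_back(true);                  // leaf flag
            auto sym = static_cast<unsigned char>(s[++i]);
            for (int b = 7; b >= 0; --b)           // leaf symbol, 8 bits
                bits.push_back(((sym >> b) & 1u) != 0);
        }
    }
    return bits;
}

// Packs bits into bytes, MSB first. If bits.size() is not a multiple of 8,
// the last byte is zero-filled, and nothing here records how many of those
// trailing bits are filler. That missing information is the problem.
std::vector<std::uint8_t> packBits(const std::vector<bool>& bits) {
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i])
            bytes[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
    return bytes;
}
```

With this layout, `packBits(treeStringToBits("001a1c01e01b1d"))` yields 7 bytes, the last of which holds only 1 meaningful bit.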
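The standard solution to this problem is padding, and there are many possible padding schemes. They pad the data up to a whole number of bytes (i.e., a multiple of 8 bits) and additionally record either the length of the message in bits or the number of padding bits (from which the message length in bits can be determined by subtraction). The latter results in a more compact encoding, since the padding count is always 0-7 and fits in 3 bits, whereas a bit length needs enough bits to cover the largest possible file.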
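The arithmetic behind both options, as a small sketch (the function names are made up for illustration): the padding needed to reach a byte boundary, and recovering the message length from the padded size by subtraction.

```cpp
#include <cassert>
#include <cstdint>

// Number of zero bits needed to round `messageBits` up to a whole byte.
constexpr unsigned paddingBits(std::uint64_t messageBits) {
    return static_cast<unsigned>((8 - messageBits % 8) % 8);
}

// Recovers the message length in bits from the padded size in bytes
// and the stored padding count.
constexpr std::uint64_t messageBitsFrom(std::uint64_t paddedBytes, unsigned pad) {
    return paddedBytes * 8 - pad;
}

// The pad count is always in 0..7, so 3 bits are enough to store it.
static_assert(paddingBits(48) == 0 && paddingBits(49) == 7, "pad range");
```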
Most simply, you can append the number of "unused" bits in the last byte as one additional byte.
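A minimal sketch of that simplest scheme, assuming bits are packed most-significant-bit first and held in a `std::vector<bool>` (both are choices made here for illustration, not part of the answer; the function names are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs bits MSB first, then appends one extra byte holding the number
// of unused (zero) bits at the end of the last data byte.
std::vector<std::uint8_t> encodeWithCountByte(const std::vector<bool>& bits) {
    std::vector<std::uint8_t> out((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i])
            out[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
    out.push_back(static_cast<std::uint8_t>((8 - bits.size() % 8) % 8));
    return out;
}

// Inverse: reads the count byte, then drops it and the unused bits.
std::vector<bool> decodeWithCountByte(const std::vector<std::uint8_t>& in) {
    assert(!in.empty() && in.back() < 8 && (in.size() - 1) * 8 >= in.back());
    std::size_t nbits = (in.size() - 1) * 8 - in.back();
    std::vector<bool> bits(nbits);
    for (std::size_t i = 0; i < nbits; ++i)
        bits[i] = ((in[i / 8] >> (7 - i % 8)) & 1u) != 0;
    return bits;
}
```

The cost is a full extra byte per file, even though the count never needs more than 3 bits.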
One level up: start by assuming that the number of padding bits fits in 3 bits, and define the last 3 bits of the encoded file to hold the number of padding bits. If the message uses no more than 5 bits of its last byte, the padding count fits nicely in that same byte. If an extra byte is needed to hold it, the maximum gap is 5 + 2 = 7 bits: the 5 unused high bits of the new byte, plus at most 2 free bits at the end of the message's last byte (with 3 or more free bits, the 3-bit count would have fit there instead). Since 0-7 is representable in 3 bits, this works. It does not work with 2 bits: the maximum gap would still be 6 + 1 = 7 bits, but 2 bits can only represent 0-3.
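The 3-bit trailer variant could be sketched as follows, again packing MSB first. The zero padding sits between the message and the count, and the count occupies the low 3 bits of the final byte. This layout and the function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Layout: [message bits][pad zero bits][3-bit pad count], total a multiple of 8.
std::vector<std::uint8_t> encodeWithTrailer(const std::vector<bool>& bits) {
    std::size_t pad = (8 - (bits.size() + 3) % 8) % 8;  // always 0..7
    std::size_t total = bits.size() + pad + 3;          // multiple of 8, >= 8
    std::vector<std::uint8_t> out(total / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i)
        if (bits[i])
            out[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
    out.back() |= static_cast<std::uint8_t>(pad);       // low 3 bits of last byte
    return out;
}

// Inverse: the count in the last 3 bits says how many zero bits precede it.
std::vector<bool> decodeWithTrailer(const std::vector<std::uint8_t>& in) {
    assert(!in.empty());
    std::size_t pad = in.back() & 7u;
    assert(in.size() * 8 >= 3 + pad);
    std::size_t nbits = in.size() * 8 - 3 - pad;
    std::vector<bool> bits(nbits);
    for (std::size_t i = 0; i < nbits; ++i)
        bits[i] = ((in[i / 8] >> (7 - i % 8)) & 1u) != 0;
    return bits;
}
```

For example, a 3-bit message needs `pad = 2` and fits in a single byte, while a 6-bit message needs `pad = 7` (2 free bits in its own byte plus 5 in a new byte), which is the worst case described above.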
By the way, one of the main advantages of placing the padding information at the end of the file (rather than in a header at the beginning) is that the compression functions can operate on a stream without having to know its length in advance. Decompression can be stream-based as well, with careful handling of EOF signals.
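One way to handle EOF when decoding the 3-bit trailer format from a stream: the padding and the count together span at most 7 + 3 = 10 bits, so only the last two bytes are ambiguous. The decoder can therefore hold back two bytes and emit everything older as soon as a newer byte arrives. This is a sketch under those assumptions; `decodeTrailerStream`, its callback interface, and the `decodeTrailerString` wrapper are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Calls emit(bool) once per message bit. Returns false on malformed input.
template <class Emit>
bool decodeTrailerStream(std::istream& in, Emit emit) {
    using Traits = std::char_traits<char>;
    int held[2];            // the last two bytes seen; their role is unknown until EOF
    std::size_t nheld = 0;
    for (int c = in.get(); c != Traits::eof(); c = in.get()) {
        if (nheld == 2) {   // a newer byte arrived, so the oldest is pure message data
            for (int b = 7; b >= 0; --b) emit(((held[0] >> b) & 1) != 0);
            held[0] = held[1];
            nheld = 1;
        }
        held[nheld++] = c;
    }
    if (nheld == 0) return false;                      // empty stream: no trailer
    int pad = held[nheld - 1] & 7;                     // count in the final 3 bits
    int tailBits = static_cast<int>(nheld) * 8 - 3 - pad;
    if (tailBits < 0) return false;
    for (int i = 0; i < tailBits; ++i)                 // message bits in the held bytes
        emit(((held[i / 8] >> (7 - i % 8)) & 1) != 0);
    return true;
}

// Convenience wrapper: decodes an in-memory buffer; returns an empty
// vector if the input is malformed.
std::vector<bool> decodeTrailerString(const std::string& data) {
    std::istringstream ss(data);
    std::vector<bool> bits;
    if (!decodeTrailerStream(ss, [&bits](bool b) { bits.push_back(b); }))
        bits.clear();
    return bits;
}
```

`decodeTrailerStream` accepts any `std::istream`, e.g. a `std::ifstream` opened with `std::ios::binary`, and the callback could feed bits straight into a Huffman tree walk instead of buffering them.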