- Overview
- Tree Manipulation
- Example Tree Manipulation
- Encoding Procedure
- Decoding Procedure
- Implementations
- References and More Info

Huffman coding suffers from the fact that the decompressor must have some knowledge of the probabilities of the characters in the compressed file. Not only can this add somewhat to the bits needed to encode the file, but, if this crucial piece of knowledge is unavailable, compressing the file requires two passes: one pass to find the frequency of each character and construct the Huffman tree, and a second pass to actually compress the file. Expanding on the Huffman algorithm, Faller and Gallager, and later Knuth and Vitter, developed a way to perform the Huffman algorithm as a one-pass procedure. (Sayood 55)

While the methods of each of these eminent gentlemen differ
slightly, the discrepancies do not affect the basic **"adaptive
Huffman"** algorithm. The differences between the two main methods,
FGK and Vitter, will be explored in greater depth later in this document.

The fact that the file is encoded dynamically has significant implications for the effectiveness of the tree as an encoder and decoder. Because a static Huffman tree is constructed from the character counts of the file as a whole, it works effectively at a whole-file level. Adaptive Huffman coding also works at a whole-file level, but it is far more effective than static Huffman coding at a local level because the tree is constantly evolving. Say, for example, a file starts out with a run of a character that is never repeated again in the file. In static Huffman coding, that character will sit low in the tree because of its low overall count, thus taking many bits to encode. In adaptive Huffman coding, the character will be inserted at the highest leaf possible, before eventually getting pushed down the tree by higher-frequency characters.

At the heart of the adaptive Huffman algorithm is the sibling
property, which states that "each node (except the root) has a sibling
and the nodes can be listed in order of non-increasing weight with
each node adjacent to its siblings" (Data
Compression Library). To maintain this property, we will
keep track of each node's **order** and its **weight**. The
order is simply a numbering system for the weights that increases
left to right and from the bottom of the tree to the top, and the
weight keeps track of the number of times the value contained in the
node has occurred so far in the file; nodes that do not contain values
(like the root and other internal nodes) have a weight equal to
the sum of their two children's weights. Nodes with higher orders will also have
equal or higher weights. Below is a declaration of the struct Tree used for the
adaptive Huffman algorithm, as well as an example tree.

```cpp
struct Tree {
    int val;      // 8-bit character contained in tree node
    int weight;   // number of times val has occurred in file so far
    int order;    // ordering system to track weights
    Tree* left;
    Tree* right;
    Tree* parent;

    Tree(int value, int wei, int num, Tree* l, Tree* r, Tree* p)
        : val(value), weight(wei), order(num), left(l), right(r), parent(p) {}

    Tree()
        : val(0), weight(0), order(0), left(0), right(0), parent(0) {}
};
```

Although you do not yet know how the tree was set up, it should be apparent that the weights follow a numbering structure similar to a heap and that the order is determined in reverse in-order. The ordering starts at 512 because there are 256 characters that can be represented with 8 bits and a maximum of 255 internal nodes, with one additional node granted for the NYT node, which will be dealt with later. For this assignment, the numbering is only important in that the orders must remain positive. More advanced versions of the program may use mathematical principles to bound how many bits are needed and thereby decrease the number of bits needed to recognize a character.
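As a concrete starting point, a freshly initialized tree under this numbering scheme is just the NYT node serving as root, with order 512. This is a minimal sketch: the helper name `makeInitialTree` and the use of `val == -1` to mark the NYT node are my own choices, not part of the original text, and the `Tree` struct from above is restated so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>

// Restatement of the Tree struct from this document, so the sketch
// is self-contained. val of -1 marks the NYT node (no real character).
struct Tree {
    int val;
    int weight;
    int order;
    Tree* left;
    Tree* right;
    Tree* parent;
    Tree(int value, int wei, int num, Tree* l, Tree* r, Tree* p)
        : val(value), weight(wei), order(num), left(l), right(r), parent(p) {}
};

// A fresh adaptive Huffman tree is just the NYT node acting as root.
// Order starts at 512, matching the count in the text: 256 characters,
// up to 255 internal nodes, plus one extra node granted for NYT.
Tree* makeInitialTree() {
    return new Tree(-1, 0, 512, nullptr, nullptr, nullptr);
}
```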

This algorithm is called adaptive Huffman coding because the tree is adaptive: it is created simultaneously with either the compressed or uncompressed file as it reads in the other. The tree is dynamic and changes after every character is read. Both the compressor and decompressor use the same method to construct the tree and can therefore decode what the other has written.

The tree is manipulated as the file is read to maintain
the following properties:

- Each node has a sibling
- Nodes with higher weights have higher orders
- On each level, the node farthest to the right will have the highest order although there might be other nodes with equal weight
- Leaf nodes contain character values, except the Not Yet Transmitted (NYT) node, which is the node at which all new characters are added
- Internal nodes contain weights equal to the sum of their children's weights
- All nodes of the same weight will be in consecutive order.
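To make two of these properties concrete, here is a small sanity-check sketch. The function name `checkWeights` is my own, not from the text; it assumes the `Tree` struct declared earlier (restated for self-containment) and verifies that every internal node has both children (so every non-root node has a sibling) and a weight equal to the sum of its children's weights.

```cpp
#include <cassert>
#include <cstddef>

// Restated fields of the Tree struct from this document.
struct Tree {
    int val, weight, order;
    Tree *left, *right, *parent;
};

// Returns true if, in the subtree rooted at n, every internal node has
// exactly two children and its weight equals the sum of its children's
// weights; a lone child would mean the sibling property is broken.
bool checkWeights(const Tree* n) {
    if (n == nullptr) return true;
    if (n->left == nullptr && n->right == nullptr) return true;   // leaf
    if (n->left == nullptr || n->right == nullptr) return false;  // missing sibling
    if (n->weight != n->left->weight + n->right->weight) return false;
    return checkWeights(n->left) && checkWeights(n->right);
}
```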

Now to get into the nitty-gritty of the actual manipulation. Every tree contains a root and an NYT node, where the NYT node is the node with the lowest order in the tree. When a character is read in from a file, the tree is first checked to see whether it already contains that character. If it does not, the NYT node spawns two new nodes: the node to its right is a new node containing the character, and the new left node is the new NYT node. If the character is already in the tree, you simply update the weight of that particular tree node. In some cases, when the node is not the highest-ordered node in its weight class, you will need to swap this node so that it fulfills the property that nodes with higher weights have higher orders. To do this, before you update the node's weight, search the tree for all nodes of equal weight and swap the soon-to-be-updated node with the highest-ordered node of equal weight. Finally, update the weight.
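A minimal sketch of the spawn step for a never-before-seen character might look like the following. The helper name `spawnFromNYT` and the `val == -1` convention for the NYT node are my own; the sketch restates the `Tree` struct declared earlier so it stands alone.

```cpp
#include <cassert>
#include <cstddef>

// Restated Tree struct so the sketch compiles on its own;
// val == -1 marks the NYT node, which holds no character.
struct Tree {
    int val, weight, order;
    Tree *left, *right, *parent;
    Tree(int v, int w, int o, Tree* l, Tree* r, Tree* p)
        : val(v), weight(w), order(o), left(l), right(r), parent(p) {}
};

// The old NYT node becomes an internal node: its right child is a new
// leaf holding the unseen character c, and its left child becomes the
// new NYT node. Orders step down from the old NYT's order, so the NYT
// keeps the lowest order in the tree. Weights start at 0 here; the
// weight update that follows each insertion increments them.
Tree* spawnFromNYT(Tree* nyt, int c) {
    nyt->right = new Tree(c,  0, nyt->order - 1, nullptr, nullptr, nyt);
    nyt->left  = new Tree(-1, 0, nyt->order - 2, nullptr, nullptr, nyt);
    return nyt->left;  // the caller tracks this as the new NYT
}
```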

In both cases of inserting values, the weight of a leaf changes, and this change affects all nodes above it. Therefore, after you insert a node, you must check each parent above it, following the same procedure you used when updating already-seen values. Check whether the node in question is the highest-ordered node in its weight class prior to updating. If not, swap it with the node that is the highest-ordered, making sure to reassign only the pointers to the two nodes being swapped.

**NOTE:** There are several important checks that need to be
in place prior to any swapping being done:

- The root should never be swapped with anything
- Remember that you are moving up the tree so things above are not updated. Therefore, be sure never to swap a node with its parent.
- Although the pointers must be swapped in the tree, be sure to reset the orders to fit the new arrangement. The orders of the two swapped nodes should not be swapped (or, if they are, should be re-swapped). Order is not a measure related to the value in a node; it is related to that node's position in the tree.
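The swap-then-increment loop might be sketched as follows. This is a hedged sketch, not the text's prescribed implementation: the names (`highestInWeightClass`, `swapNodes`, `updateWeights`) and the flat node list used to scan weight classes are my own choices, and the guards implement the three checks just listed.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Restated fields of the Tree struct from this document.
struct Tree {
    int val, weight, order;
    Tree *left, *right, *parent;
};

// Find the highest-ordered node whose weight equals n's weight.
// `nodes` is assumed to hold every node currently in the tree.
Tree* highestInWeightClass(const std::vector<Tree*>& nodes, Tree* n) {
    Tree* best = n;
    for (Tree* m : nodes)
        if (m->weight == n->weight && m->order > best->order)
            best = m;
    return best;
}

// Swap two nodes' positions by re-pointing only their parents' child
// pointers, then swap the orders back so that order tracks tree
// position rather than node identity (the third check above).
void swapNodes(Tree* a, Tree* b) {
    Tree*& slotA = (a->parent->left == a) ? a->parent->left : a->parent->right;
    Tree*& slotB = (b->parent->left == b) ? b->parent->left : b->parent->right;
    std::swap(slotA, slotB);
    std::swap(a->parent, b->parent);
    std::swap(a->order, b->order);
}

// Walk from the updated leaf to the root. Before incrementing each
// node's weight, swap it with the highest-ordered node of equal
// weight, unless that swap would involve the root (first check) or
// the node's own parent (second check).
void updateWeights(const std::vector<Tree*>& nodes, Tree* leaf) {
    for (Tree* n = leaf; n != nullptr; n = n->parent) {
        Tree* top = highestInWeightClass(nodes, n);
        if (top != n && n->parent != nullptr && top != n->parent)
            swapNodes(n, top);
        n->weight++;
    }
}
```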

Below is a flowchart detailing the tree manipulation process as carried out by both the encoder and decoder:

Figure 2: Tree Manipulation Procedure

Be sure to notice the key verbs here: insert new value, give birth to new nodes, update weight, check if max in weight class, swap, isRoot, move to parent. Not all of these will be functions, but these actions will form the basis of a tree manipulation class for encoding and decoding.

Chart taken in near entirety from Sayood, 57

For help relating this chart to real-world experience, take a look at this example, which depicts the tree manipulations done when adding the string "aardv." Or, for a quick refresher, you can check out this animated version of the process.

Once you have the functions of your tree manipulation working correctly, it is relatively easy to complete the encoding and decoding parts of adaptive Huffman coding. To encode, you simply read through the file to be compressed one character at a time. If you have seen the character before, you write to the output file the root-to-leaf path, with 1 demarcating a move right and 0 marking a move left, the same as you would in static Huffman coding. If the character is new, write out the root-to-leaf path of the NYT node to alert the decoder that a new character follows, then write out the new character itself, using nine bits in anticipation of the PSEUDO_EOF. Finally, update the tree by calling the appropriate insert function with the new value. Read through the entire file in this manner, and when you are done, manually write out the final root-to-NYT path followed by the PSEUDO_EOF character. This procedure is outlined below.

Sayood, 59
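The two write paths can be sketched as follows. This is a simplified illustration, not the prescribed implementation: the function names and the use of a string of '0'/'1' characters in place of a real bit writer are my own assumptions, and the `Tree` struct fields are restated for self-containment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Restated fields of the Tree struct from this document.
struct Tree {
    int val;
    Tree *left, *right, *parent;
};

// Root-to-leaf path for a node already in the tree: record 1 for a
// right child and 0 for a left child while climbing to the root,
// then reverse to get the path in root-to-leaf order.
std::string pathTo(const Tree* node) {
    std::string bits;
    for (const Tree* n = node; n->parent != nullptr; n = n->parent)
        bits.push_back(n == n->parent->right ? '1' : '0');
    std::reverse(bits.begin(), bits.end());
    return bits;
}

// Nine-bit literal for a brand-new character, written right after the
// NYT path; nine bits leave room for the PSEUDO_EOF mentioned above.
std::string literal9(int c) {
    std::string bits(9, '0');
    for (int i = 8; i >= 0; --i)
        bits[8 - i] = ((c >> i) & 1) ? '1' : '0';
    return bits;
}
```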

The decoding procedure is very similar to the encoding procedure and
should be easy to figure out based on the information in the
previous section. To uncompress the compressed file, read it in
one bit at a time, traversing the tree as it exists up to that point.
Eventually, you will come to a leaf. If that leaf has a character
value, write out the eight-bit translation of the character to the
uncompressed file and then update the count of that character in
the tree, making sure all necessary changes are made to the tree
as a whole. If the leaf is the NYT node, read in the next nine
bits and write out the eight-bit translation of that character.
Then insert the new character in the tree. **It is extremely
important to remember that the compressor and decompressor,
although reading in characters in different manners, construct
trees in exactly the same way. At any given point in a file,
in either operation, the trees would be identical if compared.
Do not confuse this with saying that the compressor and
decompressor run simultaneously. They don't. But they do
construct the same trees after reading the same information. You
might even say that they use the same "Adaptive Huffman Tree
class" (but that might be just a tad presumptuous).**

Below is a diagram of the decoding procedure:

Sayood, 62
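The bit-by-bit descent at the core of decoding can be sketched as below: starting at the root, follow one bit at a time (1 right, 0 left) until a leaf is reached, and let the caller check whether that leaf holds a character or is the NYT node. The function name `decodeOne` and the string-of-bits input are my own simplifications of a real bit reader.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Restated fields of the Tree struct from this document.
struct Tree {
    int val;
    Tree *left, *right, *parent;
};

// Follow bits starting at `pos` down from the root until a leaf is
// reached. Internal nodes in this tree always have two children, so
// checking one child pointer suffices. `pos` advances past the
// consumed bits, ready for the next call.
const Tree* decodeOne(const Tree* root, const std::string& bits, std::size_t& pos) {
    const Tree* n = root;
    while (n->left != nullptr)
        n = (bits[pos++] == '1') ? n->right : n->left;
    return n;
}
```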

There are several methods for implementing the adaptive Huffman algorithm. The FGK algorithm works reasonably well and is easy to implement, but it is far inferior to the Vitter algorithm at minimizing the height of the tree and thereby shortening the time taken to find a root-to-leaf path. The Vitter algorithm keeps the height of the tree to an absolute minimum, but it is hard to implement so that the program runs at a reasonable speed. Below is a short outline of both methods; keep in mind that the method described in the above sections follows the Vitter pattern.

- Insert element in the tree
- Increment weight of node, keep track of path and weight
- Check weight against right siblings, swap if necessary
- Move to parent
- Check weight against parent's children, swap if necessary
- Keep track of path from leaf to root so far, keep track of new node's weight
- Continue up tree keeping track of path, swapping weights with right siblings if lower

Jonathan Low Last modified: Wed Jan 2 17:45:33 EST 2008