Huffman Coding: A Nifty CS2 Assignment

What's Nifty about this Assignment?

A favorite assignment in our CS2/Data Structures course is the implementation of a pair of programs for data compressions using Huffman coding. This is a moderately sized project that combines several of the different data structures we have studied during the semester. We normally allow ten days to two weeks for all our assignments. The completed programs can compress and uncompress any file --- one of the tests is running the compression program on itself. Implementing the programs requires care in debugging and testing since the output is a sequence of bits that isn't readable using text editors. Most students build test and debugging methods as part of the program or eventually wish that they had.

Instructors of the course like the assignment because the implementation requires the use of several data structures: vectors to count character frequencies, maps/tables that map characters to a coding pair (bit sequence and number of bits), binary trees or tries for determining the coding pairs during compression and for determining characters during uncompression, priority queues for building the coding tree/trie. We typically assign the program near the end of the course when we have covered all these topics. There is a great deal of flexibility in terms of what students are expected to implement. Although we supply only a library for reading and writing n bits-at-a-time, other libraries could be supplied as well. Methods for using and modifying the project are provided.

Students like the the assignment because they build a demonstrably useful program from scratch. There is room for distinction in the program as well since students are free to develop different methods for storing information at the beginning of a compressed file that allows the file to be uncompressed. For example, the simplest method stores character frequencies, but this is expensive in storage, e.g., 32 bits per character frequency. Alternatively, the tree/trie can be stored using a pre-order traversal which leads to substantial savings.

Owen L. Astrachan
Last modified: Fri Dec 2 12:19:14 EST 2016