Huffman Codes β How Do They Work?
This is how I’ve spent my time at work the past couple of weeks, roughly:
Task | Fraction of time |
---|---|
Reading | 21 % |
Writing tests | 17 % |
Meetings | 13 % |
Debugging | 10 % |
Code review | 8 % |
Toolmaking | 7 % |
Lunch | 6 % |
Slack | 5 % |
Standup | 5 % |
Programming | 3 % |
Documenting | 2 % |
Corporate espionage through post-it notes
Tomorrow, you (a corporate spy!) will be in the office across the street and I want to signal to you what I’ve been doing throughout the day. I will do so by putting up post-it notes on a window. I have three colours of post-its: π₯, π¨ and π©.1 If a lot of people have problem with their fonts not rendering these unicode characters as coloured squares, let me know.
We have just three colours, so we cannot assign one colour to each task β that would only let us communicate the three most common tasks. What if we use two colours? Still not quite there, since that only lets us communicate \(3^2 = 9\) different tasks. NaΓ―vely, we might think we need three colours per task, resulting in the following codebook2 This is practically counting in ternary, except using coloured post-its instead of the digits 0,1,2.:
Codeword | Task |
---|---|
π© π© π© | Reading |
π© π© π¨ | Writing tests |
π© π© π₯ | Meetings |
π© π¨ π© | Debugging |
π© π¨ π¨ | Code review |
π© π¨ π₯ | Toolmaking |
π© π₯ π© | Lunch |
π© π₯ π¨ | Slack |
π© π₯ π₯ | Standup |
π¨ π© π© | Programming |
π¨ π© π¨ | Documenting |
Now if you look over at my window and you see this:
π© π© π₯ π© π© π₯ π© π© π¨ π© π¨ π¨ π© π© π© |
you know that I first spent quite some time in meetings (2Γ π© π© π₯), then worked on tests (π© π© π¨), reviewed code (π© π¨ π¨) and finally did some reading (π© π© π©). But that was a long sequence of post-its! If my boss walks past, he will think it looks very suspicious and tear them down. Using this codebook, communicating \(n\) tasks will always take \(3n\) post-it notes.
How can we approach creating a codebook that results in the shortest coded message on average, to avoid rousing suspicion?
Huffman codes are short on average
The key idea is that for a rare task, like programming or documentation, we can afford to use a long codeword. In fact, if it lets us use a short codeword for something common like reading, we should be fine with needing more than three post-it notes to encode something rare like documentation.
The Huffman code is an algorithm we can follow that generates an optimal codebook. Here’s how to do it.
First, list all tasks in order of frequency. Check β we already did.
Task | Fraction of time |
---|---|
Reading | 21 % |
Writing tests | 17 % |
Meetings | 13 % |
Debugging | 10 % |
Code review | 8 % |
Toolmaking | 7 % |
Lunch | 6 % |
Slack | 5 % |
Standup | 5 % |
Programming | 3 % |
Documenting | 2 % |
Assign post-it colours to the three least common tasks (to keep things simple, always assign colours in the same order):
Task | Fraction of time | Code |
---|---|---|
Reading | 21 % | |
Writing tests | 17 % | |
Meetings | 13 % | |
Debugging | 10 % | |
Code review | 8 % | |
Toolmaking | 7 % | |
Lunch | 6 % | |
Slack | 5 % | |
Standup | 5 % | π© |
Programming | 3 % | π¨ |
Documenting | 2 % | π₯ |
Now, and this is going to be a little tricky to follow, but mentally group these three together into a supertask, that has frequency 5Β +Β 3Β +Β 2Β =Β 10Β %. Move this supertask up to its correct place in the frequency table.
Task | Fraction of time | Code |
---|---|---|
Reading | 21 % | |
Writing tests | 17 % | |
Meetings | 13 % | |
(1) Standup | 5 % | π© |
(1) Programming | 3 % | π¨ |
(1) Documenting | 2 % | π₯ |
Debugging | 10 % | |
Code review | 8 % | |
Toolmaking | 7 % | |
Lunch | 6 % | |
Slack | 5 % |
Assign again colours to the three least common tasks.
Task | Fraction of time | Code |
---|---|---|
Reading | 21 % | |
Writing tests | 17 % | |
Meetings | 13 % | |
(1) Standup | 5 % | π© |
(1) Programming | 3 % | π¨ |
(1) Documenting | 2 % | π₯ |
Debugging | 10 % | |
Code review | 8 % | |
Toolmaking | 7 % | π© |
Lunch | 6 % | π¨ |
Slack | 5 % | π₯ |
Mentally group them together the same way, and the group has a combined frequency of 7+6+5=18Β %, so it goes higher up in the table.
Task | Fraction of time | Code |
---|---|---|
Reading | 21 % | |
(2) Toolmaking | 7 % | π© |
(2) Lunch | 6 % | π¨ |
(2) Slack | 5 % | π₯ |
Writing tests | 17 % | |
Meetings | 13 % | |
(1) Standup | 5 % | π© |
(1) Programming | 3 % | π¨ |
(1) Documenting | 2 % | π₯ |
Debugging | 10 % | |
Code review | 8 % |
Assign post-its to the three least common β but treat the entire group (1) as a single entity this time. It means all three items in group (1) gets prefixed with one colour.
Task | Fraction of time | Code |
---|---|---|
Reading | 21 % | |
(2) Toolmaking | 7 % | π© |
(2) Lunch | 6 % | π¨ |
(2) Slack | 5 % | π₯ |
Writing tests | 17 % | |
Meetings | 13 % | |
(1) Standup | 5 % | π© π© |
(1) Programming | 3 % | π© π¨ |
(1) Documenting | 2 % | π© π₯ |
Debugging | 10 % | π¨ |
Code review | 8 % | π₯ |
Then group these together β treating the group (1) as a unit again. Note that the entire group (1) has a probability of 10Β %, so the sum for group (3) is 10+10+8Β %, which launches it into the top of the table.
Task | Fraction of time | Code |
---|---|---|
(3) (1) Standup | 5 % | π© π© |
(3) (1) Programming | 3 % | π© π¨ |
(3) (1) Documenting | 2 % | π© π₯ |
(3) Debugging | 10 % | π¨ |
(3) Code review | 8 % | π₯ |
Reading | 21 % | |
(2) Toolmaking | 7 % | π© |
(2) Lunch | 6 % | π¨ |
(2) Slack | 5 % | π₯ |
Writing tests | 17 % | |
Meetings | 13 % |
Repeat again.
Task | Fraction of time | Code |
---|---|---|
(4) (2) Toolmaking | 7 % | π© π© |
(4) (2) Lunch | 6 % | π© π¨ |
(4) (2) Slack | 5 % | π© π₯ |
(4) Writing tests | 17 % | π¨ |
(4) Meetings | 13 % | π₯ |
(3) (1) Standup | 5 % | π© π© |
(3) (1) Programming | 3 % | π© π¨ |
(3) (1) Documenting | 2 % | π© π₯ |
(3) Debugging | 10 % | π¨ |
(3) Code review | 8 % | π₯ |
Reading | 21 % |
At this point, we only have three groups remaining: (4), (3), and reading. We prefix with one colour each again.
Task | Fraction of time | Code |
---|---|---|
(5) (4) (2) Toolmaking | 7 % | π© π© π© |
(5) (4) (2) Lunch | 6 % | π© π© π¨ |
(5) (4) (2) Slack | 5 % | π© π© π₯ |
(5) (4) Writing tests | 17 % | π© π¨ |
(5) (4) Meetings | 13 % | π© π₯ |
(5) (3) (1) Standup | 5 % | π¨ π© π© |
(5) (3) (1) Programming | 3 % | π¨ π© π¨ |
(5) (3) (1) Documenting | 2 % | π¨ π© π₯ |
(5) (3) Debugging | 10 % | π¨ π¨ |
(5) (3) Code review | 8 % | π¨ π₯ |
(5) Reading | 21 % | π₯ |
That’s it! This is our Huffman code. With this new, more efficient codebook, the message from before becomes
π© π₯ π© π₯ π© π¨ π¨ π₯ π₯ |
This is indeed the same message: meetings (2Γ π© π₯), tests (π© π¨), review (π¨ π₯) and finally reading (π₯). And we got the message across using just 9 post-it notes, compared to the 15 from before.
Here are two, perhaps obvious features of the Huffman code3 There is also one thing that confused me at first, but after brief thought I realised how satisfying it is. There is only one task that gets assigned a single post-it note, namely reading (π₯). Instinctively, I would think the code would get shorter if also the next most common task, writing tests, would get a single post-it note. But! If both reading and writing tests got a single-post-it codeword, then we would not be able to have three other two-post-it codewords (writing tests (π© π¨), meetings (π© π₯), debugging (π¨ π¨), and code review (π¨ π₯)), and with the distribution given, that would result in a worse average message length..
- Short codewords are reserved for common tasks. And indeed, it can assign longer-than-necessary codes to very rare tasks.
- Codewords are immediately decodable β once you reach the end of a codeword, you’ll know it. There is no ambiguity and you don’t need to read ahead to understand a previous codeword.
Sometimes it doesn’t work out so nicely
The extremely sharp-eyed reader will have noticed that the percentages in the table don’t quite add up to 100Β %. This is because in the real data, I have another entry for “others” at 3 %.
Task | Fraction of time |
---|---|
Reading | 21 % |
Writing tests | 17 % |
Meetings | 13 % |
Debugging | 10 % |
Code review | 8 % |
Toolmaking | 7 % |
Lunch | 6 % |
Slack | 5 % |
Standup | 5 % |
Programming | 3 % |
Miscellaneous | 3 % |
Documenting | 2 % |
If we tried the above process with this table, we would find at the end that we have two groups left and three colours of post-its to hand out. The solution is to pad out the table with dummy tasks that happen 0Β % of the time. Here’s how it would start:
Task | Fraction of time | Code |
---|---|---|
Reading | 21 % | |
Writing tests | 17 % | |
Meetings | 13 % | |
Debugging | 10 % | |
Code review | 8 % | |
Toolmaking | 7 % | |
Lunch | 6 % | |
Slack | 5 % | |
Standup | 5 % | |
(1) Miscellaneous | 3 % | π© |
(1) Documenting | 2 % | π¨ |
(1) Dummy | 0 % | π₯ |
Programming | 3 % |
And then we continue, treating the group (1) as a unit:
Task | Fraction of time | Code |
---|---|---|
Reading | 21 % | |
Writing tests | 17 % | |
Meetings | 13 % | |
(2) Standup | 5 % | π© |
(2) (1) Miscellaneous | 3 % | π¨ π© |
(2) (1) Documenting | 2 % | π¨ π¨ |
(2) (1) Dummy | 0 % | π¨ π₯ |
(2) Programming | 3 % | π₯ |
Debugging | 10 % | |
Code review | 8 % | |
Toolmaking | 7 % | |
Lunch | 6 % | |
Slack | 5 % |
And after a few more steps:
Task | Fraction of time | Code |
---|---|---|
(5) (3) Toolmaking | 7 % | π© π© |
(5) (3) Lunch | 6 % | π© π¨ |
(5) (3) Slack | 5 % | π© π₯ |
(5) Writing tests | 17 % | π¨ |
(5) Meetings | 13 % | π₯ |
(4) (2) Standup | 5 % | π© π© |
(4) (2) (1) Miscellaneous | 3 % | π© π¨ π© |
(4) (2) (1) Documenting | 2 % | π© π¨ π¨ |
(4) (2) (1) Dummy | 0 % | π© π¨ π₯ |
(4) (2) Programming | 3 % | π© π₯ |
(4) Debugging | 10 % | π¨ |
(4) Code review | 8 % | π₯ |
Reading | 21 % |
We are now ready for the last assignment:
Task | Fraction of time | Code |
---|---|---|
(5) (3) Toolmaking | 7 % | π© π© π© |
(5) (3) Lunch | 6 % | π© π© π¨ |
(5) (3) Slack | 5 % | π© π© π₯ |
(5) Writing tests | 17 % | π© π¨ |
(5) Meetings | 13 % | π© π₯ |
(4) (2) Standup | 5 % | π¨ π© π© |
(4) (2) (1) Miscellaneous | 3 % | π¨ π© π¨ π© |
(4) (2) (1) Documenting | 2 % | π¨ π© π¨ π¨ |
(4) (2) (1) Dummy | 0 % | π¨ π© π¨ π₯ |
(4) (2) Programming | 3 % | π¨ π© π₯ |
(4) Debugging | 10 % | π¨ π¨ |
(4) Code review | 8 % | π¨ π₯ |
Reading | 21 % | π₯ |
We still have that dummy task, which we can drop from the codebook. Now that we had 12 tasks to represent, the Huffman code introduced some four-post-it codewords in order to be able to keep short codewords for common tasks.
Huffman also handles skewed distributions
Imagine now that we had a much more skewed distribution, something like what might have happened very early in my career:
Task | Fraction of time |
---|---|
Reading | 40 % |
Programming | 31 % |
Lunch | 6 % |
Standup | 5 % |
Debugging | 4 % |
Slack | 3 % |
Miscellaneous | 3 % |
Meetings | 2 % |
Toolmaking | 2 % |
Documenting | 2 % |
Code review | 1 % |
Writing tests | 1 % |
For this distribution, we would expect the Huffman code to generate short codewords for reading and programming, and longer codewords for the rest. Let’s find out what happens!
Task | Fraction of time | Code |
---|---|---|
Reading | 40 % | π© |
Programming | 31 % | π¨ |
(5) (4) (2) Toolmaking | 2 % | π₯ π© π© π© |
(5) (4) (2) Documenting | 2 % | π₯ π© π© π¨ |
(5) (4) (2) (1) Code review | 1 % | π₯ π© π© π₯ π© |
(5) (4) (2) (1) Writing tests | 1 % | π₯ π© π© π₯ π¨ |
(5) (4) (2) (1) Dummy | 0 % | π₯ π© π© π₯ π₯ |
(5) (4) Standup | 5 % | π₯ π© π¨ |
(5) (4) Debugging | 4 % | π₯ π© π₯ |
(5) (3) Slack | 3 % | π₯ π¨ π© |
(5) (3) Miscellaneous | 3 % | π₯ π¨ π¨ |
(5) (3) Meetings | 2 % | π₯ π¨ π₯ |
(5) Lunch | 6 % | π₯ π₯ |
Indeed! At this point, the value of reserving short codewords (single post-it notes) for the two most common tasks was high enough that the Huffman code even created some five-post-it codewords for the least common tasks.4 Lunch apparently took enough of my time even as a junior that it deserved a two-post-it codeword.
What does a typical day look like with this codebook? Maybe something like
π© π¨ π¨ π© π₯ π₯ π₯ π© π₯ π¨ π¨ π¨ π© |
Despite the large variation in codeword length in this code, we still have that immediate decodability5 There is no codeword that starts with π₯ π© π₯ and then goes on to something else, so when we see that sequence, we know the codeword ends there and corresponds to debugging. which is convenient:
π© (reading) π¨ (programming) π¨ (programming) π© (reading) |
π₯ π₯ (lunch) π₯ π© π₯ (debugging) π¨ (programming) |
π¨ (programming) π¨ (programming) π© (reading) |
I don’t want to be a corporate post-it spy
I don’t either. The real reason I wrote this article is that I couldn’t figure out how to convert the Huffman coding algorithm into flashcards, so I wanted to write it down somewhere for future me.
If you have an idea on how to flashcard things like these, let me know!