The Most Mario Colours Revisited
I was reading Louie Mantia’s article on which colours are the most Mario, and I got a little upset because clearly the most Mario initial colour is blue, I say, having grown up when Super Mario 64 was all the rage. (You should probably read Mantia’s article to understand what’s going on.)
I realised that not only year of release but also sales numbers affect which colours are perceived as the most Mario. Super Mario World for the SNES has sold a ridiculous number of copies, so I’m sure there are a lot of people who think of the initial colour as green.
Fortunately for us, there is a wiki online where people have gone to the effort of collecting sales numbers for all Mario titles, so I present to you the most Mario colours by number of copies sold! (Sorry for giving you a screenshot of a spreadsheet. Feel free to make a better table, anyone with more time on their hands than I have. Of course I used conditional formatting – I’m not a savage.)
Whereas Mantia found three sequences of colours that were equally popular, we see with sales numbers that there is one sequence that stands out as most popular: that used by Super Mario 64. So I was right all along!
The sequence Mantia concludes is the most Mario (red-green-yellow-blue-green) has an interesting feature: it gets most of its legitimacy from a single game, Super Mario 3D Land. That game accounts for 70 % of the sales of the red-green-yellow-blue-green sequence. (If we don’t care about the colour of the last character, and can thus also include the Mario Party Superstars sequence and Super Mario Odyssey as part of this sequence, then I can accept it as the most Mario sequence. But really, ignore the last character? Not on my watch.)
Looking at the colour distributions for individual letters, the results are not significantly changed from Mantia’s observations, with one exception: a leading blue M becomes much more common.
The most popular colour for each letter, as a fraction of the total copies sold for that letter, is:
M | A | R | I | O |
---|---|---|---|---|
Red (53 %) | Green (70 %) | Yellow (72 %) | Blue (41 %) | Green (72 %) |
This numerically illustrates what we already intuited from the first table: there are three letters whose colours are fairly well-established, and then two letters that are toss-ups. Another way to look at the same thing is to measure the entropy of the colour choice for each letter. The entropy is a measure of how well determined an uncertain value is: lower entropy means we are more sure of the right answer. (It might sound invalid to compute an entropy from a fraction, but we are really interpreting the fraction as a probability. If someone has hidden one copy of a Mario game under a cloth and we need to predict the colour of the last letter, we should be 72 % sure it’s green.)
M | A | R | I | O |
---|---|---|---|---|
1.91 bits | 1.42 bits | 1.16 bits | 1.94 bits | 1.37 bits |
Here, we see that the middle R is by far the most well-determined letter. It’s either yellow or blue and it cannot be much else.
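For anyone who wants to reproduce per-letter entropies like the ones above, here is a minimal sketch. The function name `entropy_bits` and the example split are my own assumptions for illustration; the numbers are made up and are not the sales data behind this article.

```python
from math import log2


def entropy_bits(copies_by_colour):
    """Shannon entropy, in bits, of a colour distribution weighted by copies sold."""
    total = sum(copies_by_colour.values())
    # Treat each colour's share of copies as a probability; skip empty colours.
    shares = [n / total for n in copies_by_colour.values() if n > 0]
    return -sum(p * log2(p) for p in shares)


# Hypothetical split for a single letter (NOT the article's data).
example_letter = {"green": 72, "yellow": 14, "pink": 9, "red": 5}

print(f"{entropy_bits(example_letter):.2f} bits")
```

Two sanity checks on the scale: a letter that is always the same colour has an entropy of 0 bits, and a letter split evenly between four colours has exactly 2 bits.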
That the middle R is the most well-determined letter might be surprising, given how strongly prevalent green is for the A. The reason is the immense popularity of the SNES Super Mario World titles. If we remove them from the dataset, the entropies change:
M | A | R | I | O |
---|---|---|---|---|
1.66 bits | 1.05 bits | 1.25 bits | 1.99 bits | 1.53 bits |
This decreases the entropy of the first two letters, because Super Mario World conflicts with all other popular Mario titles in the choice of colours for the first two letters. Interestingly, it also increases the entropy of the other three letters, because in those Super Mario World corroborates the most popular sequence.
Anyway, one might think based on the letter entropies that the entropy of the total colour sequence would be nearly 8 bits, because
\[1.91 + 1.42 + 1.16 + 1.94 + 1.37 = 7.8\]
However, if we measure the entropy of the colour sequence we get a significantly lower number: 3.6 bits. This is because there are dependencies between the colours. If the first M is blue, it is always followed by a green A, for example.
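The gap between the sum of the letter entropies and the entropy of the whole sequence can be seen in a toy example. The sketch below uses two made-up two-letter sequences (hypothetical data, not the real sales table) in which the second letter is fully determined by the first; the helper names `entropy_bits` and `letter_counts` are mine.

```python
from collections import Counter
from math import log2


def entropy_bits(counts):
    """Shannon entropy in bits of a weighted distribution."""
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values() if n > 0)


def letter_counts(sales, position):
    """Copies sold per colour for the letter at the given position."""
    counts = Counter()
    for sequence, copies in sales.items():
        counts[sequence[position]] += copies
    return counts


# Hypothetical toy data: a blue first letter is always followed by green.
sales = {
    ("blue", "green"): 50,
    ("red", "yellow"): 50,
}

sum_of_letters = sum(entropy_bits(letter_counts(sales, i)) for i in range(2))
whole_sequence = entropy_bits(Counter(sales))

print(sum_of_letters, whole_sequence)  # prints: 2.0 1.0
```

Each letter on its own looks like a coin flip (1 bit), but once the first letter is known the second carries no new information, so the sequence as a whole is worth only 1 bit.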
We can use these dependencies to build a decision tree which can be used to figure out what colour sequence is used in a Mario title, revealing as few letters as possible. I threw one together for the non-sports Mario titles. (I didn’t even know this many Mario sports titles existed. Who makes them? Who plays them?!)
This is another argument in favour of the Super Mario 64 sequence: note that if we ask the highest-probability question at each stage (i.e. we start with “Is the fourth letter red?”, then “Is the first letter blue?”, etc.) and the answer is yes to all three of those questions individually, then we have isolated the Super Mario 64 sequence.
To be clear, this decision tree is not optimally constructed (I winged it by trying mentally to optimise the expected information of the questions), but even so, the maximum number of letters we have to ask about is three, and that happens only in 32 % of unlucky cases. Most of the time (65 % to be specific) we only have to ask about two letters to know the full colour sequence.
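As a rough worked figure, assume the remaining 3 % of copies (100 − 65 − 32) are identified after a single letter. The percentages above do not say this explicitly, and the remainder could partly be rounding, so treat it as an assumption. Under it, the expected number of letters we have to ask about is approximately

\[0.03 \times 1 + 0.65 \times 2 + 0.32 \times 3 = 2.29\]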
An optimal way to do this (optimal meaning fewest bits needed for the average guess) is to construct Huffman codes for the colour sequences. Huffman codes are bitstrings designed such that (a) a shorter bitstring is assigned to a more commonly occurring colour sequence, and (b) a bitstring can never be both complete and the prefix of another bitstring. (The Huffman code book below looks less efficient than the decision tree above, but that is because the Huffman codes reveal only one bit of information for each question, whereas the questions in the decision tree reveal just over two bits each. Also, the decision tree was built without sports titles, whereas the Huffman codes are for all titles in the first table of this article.)
What’s interesting about this is that the Huffman coding process did not generate any two-bit strings – the top few sequences are all equally popular to the point where the most compact form to represent them is with three bits each. Indeed, these three-bit sequences make up 70 % of all Mario copies sold, and we could reasonably think of any of them as the most Mario colour sequence. But I also want to emphasise that the very meaningful bitstring zero-zero-zero happened to be assigned to the Super Mario 64 sequence. Just saying.
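For readers who have not met Huffman coding before, here is a minimal sketch of the standard construction using a priority queue. It is not the tool I used for the code book; the function name `huffman_codes` and the sales shares are assumptions made up for illustration.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return a {name: bitstring} mapping built with the standard Huffman procedure."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(w, next(tiebreak), {name: ""}) for name, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least common subtrees.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {name: "0" + code for name, code in left.items()}
        merged.update({name: "1" + code for name, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical shares of copies sold per colour sequence (NOT the real data).
shares = {"seq_a": 18, "seq_b": 18, "seq_c": 17, "seq_d": 17,
          "seq_e": 10, "seq_f": 8, "seq_g": 7, "seq_h": 5}

for name, code in sorted(huffman_codes(shares).items(), key=lambda kv: len(kv[1])):
    print(name, code)
```

Which sequence ends up with which exact bitstring depends on how ties are broken, so the all-zeros code landing on a particular sequence is partly an accident of the construction.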
It would be very interesting to start slicing on year of release. Ideally as an interactive widget with a date slider which everyone can drag to their own peak Mario period, and see which are the most Mario colours to them. Alas, this is enough silliness for now.
Brief post-scriptum before the silliness stops. We should remark on the large number of alternatives that are available for Mario colour sequences. Even if we request that each letter gets its own colour, with seven colours to choose from, and five letters to assign them to, there are
\[\frac{7!}{2!} = 2520\]
different combinations possible. If we allow reuse of one colour (which seems to have been the policy so far) there are 5880 combinations possible. To imagine that of these, we have so far only seen 19. The future is bright for Mario title colour sequences.
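These counts are easy to check by brute force. In the sketch below, the first figure is the number of ways to give five letters five distinct colours out of seven. For the second, the exact reuse rule is not spelled out above, so I am assuming one reading that reproduces 5880: the first four letters get distinct colours and the last letter may be any of the seven (as in red-green-yellow-blue-green). Other readings of "reuse of one colour" give different totals.

```python
from itertools import product
from math import factorial, perm

COLOURS = 7
LETTERS = 5

# Every letter gets its own colour: 7!/2! ordered arrangements.
all_distinct = perm(COLOURS, LETTERS)
assert all_distinct == factorial(7) // factorial(2)

# Assumed reading of the reuse rule: first four letters distinct,
# last letter free to be any colour, including a repeat.
first_four_distinct = sum(
    1
    for seq in product(range(COLOURS), repeat=LETTERS)
    if len(set(seq[:4])) == 4
)

print(all_distinct, first_four_distinct)  # prints: 2520 5880
```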
Update a couple of days later: Mantia himself responded in a discussion on this article and pointed out that sales numbers can get complicated. For example, there is a successor to Super Mario 3D World called Bowser’s Fury. That game itself does not use the coloured polygonal letters in its box art (or whatever the modern equivalent of box art is), but when it got ported to the Nintendo Switch, it started being sold bundled with Super Mario 3D World, which did have the colourful letters on its box art. The bundle is usually presented with a diptych of the box arts from both games. This means the over 13 million copies sold of the Switch port may count as sales for Super Mario 3D World.
In combination with me using misreported sales for Mario + Rabbids: Sparks of Hope, this changes the standings for most recognised colour sequence. Given those corrections, the most recognised colour sequence would be R-G-B-Y-P. The popularity of that sequence then establishes pink as a colour that has sold more than 30 million copies! Congratulations, pink.
It also changes the letter entropies by:
M | A | R | I | O |
---|---|---|---|---|
-0.08 bits | -0.08 bits | -0.06 bits | -0.06 bits | +0.11 bits |
This indicates that the Super Mario 3D World sequence strengthens the standing of R-G-B-Y as the most Mario first four colours, but makes the last colour less certain. (It used to be a sure green, now it’s a probably-green-but-possibly-pink.)
The overall entropy of the colour sequence decreases from 3.59 bits to 3.52 bits, again corroborating the idea that the Super Mario 3D World sequence more firmly establishes the most Mario colours as something other than what I would have them be.
But I’m fine.