Troubles In Solving Uno
A few months ago I started trying to write a solver-like program for the card game of Uno. A solver is a program that, given any game state, can tell the user what the optimal next play is. There were three reasons I wanted this for Uno:
- My son was really into the game at the time1 He still likes it, but he is not as obsessed by it as he was before., and I wanted to impress him and my wife by learning to play well.
- I have struggled before with writing card game ai, so I made a more serious attempt at learning that.
- It was a decent way to practise modularisation because I didn’t actually know all the rules under which my son played at the time I started writing the game logic.
In terms of the second and third goal, I had great success. But! I never managed to make a good solver program, and I never got any good at Uno. This is the story of how I learned 10,000 ways that don’t work.
Uno is a card-shedding game
Very briefly, when a game of Uno starts, all players hold 7 cards in their hand and one starting card is face-up on the table. The players take turn to discard from their own hand onto the face-up card on the table, as long as they hold a matching card in their hand. If they cannot, they have to draw a new card. The first player to have an empty hand wins. There are some special cards that have effects such as reversing the turn order, forcing an opponent to draw extra cards, or blocking the next opponent from playing one turn.
Uno is a bad perfect-information game
Normally, Uno is played with hidden cards. My son (at age 4) plays with open cards, so I figured I’d make my program treat Uno as a perfect information game.2 Note that while the game has perfect information, it also has uncertainty: the outcome of drawing new cards is non-deterministic. This was fortunate because it’s much easier to write an ai for such a game, but it also became the downfall for the solver because apparently, once the ai is decent, it becomes really good at denying its opponent a win when playing with open cards.
It knows exactly which cards to play to block the opponent from playing3 And in some situations it even moves further from winning itself, because it can do so while increasing the overall uncertainty of who the winner is going to be. Essentially, once it’s fairly sure it will lose, it focuses on just introducing chaos to the game to even out everyone’s chances – because that also increases its own chances., and this means a large fraction of the games it plays, it runs into a stalemate. (The cards in the deck run out and nobody can play anything.)
This makes it more time consuming to evaluate ai against each other. If 80 % of games are stalemates, then one needs to simulate 5× as many games to get the same quality data on wins and losses. Since I wrote the engine with the goal of modularity rather than performance, a simulated game could take a few seconds even on good hardware.
The simplest, dumbest AI won
The first baseline ai was one which enumerated all possible actions it could perform, then for as long as it had time left in its budget, went through each action, simulated the outcome of it, and played that outcome out until the end using uniformly random moves. Then when its time budget was out, it picked the action that had the greatest win rate among its random playouts.
The Uno branching factor is fairly low. There’s often only one or two legal actions a player can perform. So the baseline ai spent a lot of time re-treading its steps through the first few nodes of the game tree. I figured maybe that could be avoided by distributing \(n\) playouts evenly across the valid actions, and then subdividing them into the valid children of each action, and so on. This worked, but I couldn’t get the allocation logic tight enough to actually save any time over the baseline ai, which meant the baseline ai could run more playouts within its time budget, and thus was stronger anyway.
Then I read up on the basics of Monte Carlo tree search (mcts), which is a little like the allocating ai just described, excpt mcts dynamically allocates more playouts to the more promising actions (instead of uniformly across all branches). Again the same problem: the bookkeeping logic for constructing the game tree cost more than it was worth. Using that part of the time budget to cram in more playouts instead (which is what the baseline ai does) comes out on top.
Rather unsatisfying!
Non-deterministic actions foil tree search
One thing I discovered when testing the mcts based ai was that it started to matter that drawing cards was a non-deterministic action. The game tree can’t just store the card drawing as an edge, because the state it ends up in can vary from time to time. It’s also not ideal to parametrise the edge by the outcome, because then the branching factor really does explode.
What I ended up doing was hanging a veil over the card drawing action, so that the mcts could not go beyond it. Any time it selected a card drawing node for evaluation, it was forced to just run random playouts from that node on, without expanding the game tree below it.
From what I understand, the techniques that can be used to solve this sort of situation can also be used to allow mcts to handle hidden information. In fact, the non-determinism of drawing a card can be viewed as the game environment playing the deck with deterministic, but hidden cards.
Next stop: hidden information and performance
I learned a little about playing Uno, but not all that much. I suspect if I want to go further, I will need to make the ai capable of handling imperfect information, i.e. they need to play with hidden cards.4 I have started skimming what I think is a PhD thesis from a Daniel Whitehouse which covers this topic in some detail. Depending on how fun it sounds, I might tackle it some day. That’s the way the game is supposed to be played anyway, and it might reduce the incidence of those annoying stalemates.
Also now that I know the rules and I have proved to myself I can write a modular engine for Uno, I can probably also spend some time rewriting it with a fixed feature set, but running more quickly.