Improving a Gut Feeling Forecast

kqr, published 2024-01-10

This article is a brief overview of how to improve a forecast based on our gut feeling of the probabilities involved.¹ [¹ I originally wrote this as a prelude to another future article, but it works better standing on its own.]

As a concrete example, I will use a question in the Global Pulse Metaculus forecasting tournament which asks whether there will be an active UN peacekeeping mission in Gaza on May 31 this year. At the time I started writing this, that was six months into the future, so when you read this, try to pretend it’s early December 2023, and please be kind to me about mistakes that are obvious in hindsight but would have been difficult to see back then.² [² On the other hand, if I made any mistakes that should have been obvious back then too, lash into me!]

Gut feeling probabilities are unreliable

My first reaction to the question is that a positive resolution (i.e. active peacekeeping on May 31) feels unlikely, but to forecast we need an idea of exactly how unlikely. My gut tells me maybe something like 35 %, but this evaluation builds on two unreliable things:

How strongly we feel about the question, or possibly about a similar question that is easier to have feelings about. In Kahneman and Tversky’s terminology, this is System 1 quietly substituting a less effortful task for a difficult one.³ [³ Although Thinking, Fast and Slow is a thick book, I recommend reading it at some point in your life.]
How well we’re able to intensity match this feeling to the probability scale. I have no evidence for this, but for all its strengths, I don’t think the probability scale is particularly amenable to intensity matching. Specifically, the difference between 91 % and 92 % is very large compared to the difference between 51 % and 52 % – yet my System 1, at least, considers those differences similar in size.⁴ [⁴ Wait, does this account for part of the overconfidence we often see when people assign probabilities to events? I don’t think I’ve seen that mentioned anywhere, but I’m forgetful. It would be neat if we could treat some overconfidence by switching to a different propensity scale!]
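
One way to put numbers on the 91 % versus 51 % point is the log-odds scale, ln(p / (1 − p)). That choice of scale is an assumption here; nothing above commits to it. A minimal sketch, where the function names are illustrative:

```python
import math


def log_odds(p: float) -> float:
    """Natural-log odds ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))


def log_odds_gap(p_low: float, p_high: float) -> float:
    """Distance between two probabilities measured on the log-odds scale."""
    return log_odds(p_high) - log_odds(p_low)


# The two one-percentage-point steps from the text.
gap_middle = log_odds_gap(0.51, 0.52)
gap_tail = log_odds_gap(0.91, 0.92)

print(f"51 % -> 52 %: {gap_middle:.3f}")
print(f"91 % -> 92 %: {gap_tail:.3f}")
print(f"ratio: {gap_tail / gap_middle:.1f}")
```

On this scale the step from 91 % to 92 % comes out a little over three times as large as the step from 51 % to 52 %, even though both are one percentage point on the probability scale.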

These difficulties mean we should not rely on the gut feeling too much. When possible, looking at the problem statistically is often a better method.

The base rate is always a good forecast

To take a statistical perspective on a forecast like this, we start by answering a different question:

How often in the past has a similar situation resulted in a peacekeeping mission six months later?

Note that this new question is, in one sense, an objective question about facts of the past, which makes it much easier to answer. In another sense, it’s still highly subjective, because it depends on our judgment of what counts as a similar situation. Part of being a good forecaster is having a good sense of which details matter and which don’t when looking for similar situations.

Once we’ve found similar situations, the answer to the new question is known as the base rate of the forecast for the Gaza situation. The base rate is always a good forecast.

Hold on, I forgot to emphasise that sentence, so here it is again:

The base rate is always a good forecast.⁵ [⁵ Sometimes we can improve on the base rate with data specific to the forecast we’re making, but this is usually about small adjustments rather than big shifts. That’s why starting with the base rate is important: it anchors the final forecast on a decently useful number, and the forecaster, when applying small corrections, can’t make too big a mistake.]
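
Once the similar situations are found, computing the base rate itself is just counting: what fraction of them resolved positively? A minimal sketch, where `base_rate` is a hypothetical helper and the reference cases are placeholders, not real historical data (finding real ones is the hard part):

```python
def base_rate(outcomes: list[bool]) -> float:
    """Fraction of similar past situations that resolved positively."""
    if not outcomes:
        raise ValueError("need at least one reference case")
    return sum(outcomes) / len(outcomes)


# PLACEHOLDER reference class, not real data.
# True means "a peacekeeping mission was active six months later".
reference_cases = [True, False, False, True, False, False, False, False]

print(f"base rate: {base_rate(reference_cases):.0%}")
```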

Sometimes the base rate is difficult to find

Here’s a challenge: what’s the base rate in the current Gaza situation? Which similar situations would you look at? We need more than a handful for a statistical look at them to be meaningful – can you name six or more similar situations?

I’m not very good at history, so I can’t. I don’t even know how to start finding that information! I don’t want to look at current peacekeeping missions because that results in a biased sample: it misses all the situations that were similar but didn’t result in peacekeeping missions at all. I think the UN Security Council publishes meeting notes, so one could trawl through the archive of those, find the cases in which peacekeeping was discussed, and then look up what happened in each of them, but that sounds like a lot of work. I’m also not sure the UN Security Council has had a chance to review the Gaza situation yet, because there’s currently no peace to keep, so maybe their meeting notes would also be a biased sample.

What we really want is to search for times when world leaders have signalled that they are interested in pursuing the idea of peacekeeping, and then check whether that resulted in peacekeeping six months later. It all gets rather complicated, and at least for me it would take way too much research to find out.

Thus, we’re back to gut feeling.

Improving gut feeling by breaking down into components

Fortunately, the gut feeling forecast can be improved with very little additional research. We can use the same principles as in Fermi estimation and break the situation down into components. One way to get started is to ask:

What needs to be true for there to be a peacekeeping mission in Gaza?

We can also get some hints from the Wikipedia page on UN peacekeeping:

UN peacekeepers operate in post-conflict areas, i.e. the IDF must withdraw from Gaza as the peacekeepers are deployed.
They uphold peace agreements, meaning there would need to be an agreement between Israel and whatever governing body is relevant in Gaza after IDF withdrawal.
The parties to a peace agreement are often the ones to ask the UN for help upholding the peace. The governing body whose territory the peacekeeping mission is on has veto power.
The UN can delegate peacekeeping missions to other coalitions of nations, in which case the UN is not running the peacekeeping mission but someone else is.⁶ [⁶ This has happened e.g. in Africa with ECOWAS running the mission. It seems NATO has also run peacekeeping missions on the UN’s behalf.]
Some time will pass before the UN peacekeeping forces are actually deployed, since the UN has no standing resources of its own but has to get them provisioned from member countries.

We can gut-feel probabilities for these items instead.

The IDF withdraws before May 31: 70 %⁷ [⁷ This forecast is based on the Lindy effect: as I write this in early December, the war has been going on for two months, so we should expect its median remaining lifespan to be another two months, meaning it’s more likely than not to be over six months from now.]
Conditional on withdrawal, a peace agreement is signed: 70 %
Conditional on a peace agreement, the UN Security Council determines peacekeeping is appropriate: 85 %⁸ [⁸ The idea of peacekeeping in Gaza has apparently already been brought up in the Security Council, but was vetoed by the US, seemingly because they wanted to signal support for the IDF operation. Once the IDF has withdrawn, that reason goes out the window, which raises the probability of peacekeeping.]
Conditional on peacekeeping being deemed appropriate, the governing body of Gaza does not veto: 60 %
Conditional on peacekeeping being allowed, the UN does not delegate: 75 %
Conditional on UN implementation, the mission becomes active before May 31: 60 %
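
The Lindy reasoning behind the first item can be made explicit. One common way to formalise the Lindy effect (an assumption here; the footnote doesn’t commit to a model) is a Pareto survival curve with tail index 1, under which something that has lasted a months is still going after t more months with probability a / (a + t). That makes the median remaining lifetime equal to the current age, matching “another two months”. A minimal sketch, with a hypothetical helper name:

```python
def lindy_survival(age: float, horizon: float) -> float:
    """P(still ongoing after `horizon` more time | it has lasted `age` so far).

    ASSUMPTION: Pareto survival with tail index 1, S(x) proportional to 1/x.
    Under this model the median remaining lifetime equals the current age.
    """
    return age / (age + horizon)


war_age_months = 2   # early December 2023: about two months in
horizon_months = 6   # until May 31

p_still_ongoing = lindy_survival(war_age_months, horizon_months)
p_over = 1 - p_still_ongoing
print(f"P(war over within {horizon_months} months) = {p_over:.0%}")
```

This model gives 75 %, in the same neighbourhood as the 70 % gut feeling. The gap is a reminder that the Pareto curve is only one way to formalise Lindy, and that the war ending and the IDF withdrawing are not quite the same event.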

This paints a fairly dismal picture. So many things need to fall into place for a peacekeeping mission to be active on May 31 that, taken together, the probability comes out at around 11 %. We can adjust that a little upwards because there may be alternative routes to a peacekeeping mission,⁹ [⁹ E.g. the UN steps in with a peace enforcement mission that transitions quickly into peacekeeping.] but this seems to be the main one, so we should be careful about adjusting too much. Let’s say 15 % or something.
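
The recombination is the chain rule of probability: since each estimate is conditional on the steps before it, the probability that all of them happen is the product of the six numbers. A minimal sketch (the step labels are shorthand for the list items, not from any source):

```python
import math
import operator
from itertools import accumulate

# Gut-feel estimates from the list; each is conditional on all previous steps.
components = {
    "IDF withdraws before May 31": 0.70,
    "peace agreement signed": 0.70,
    "Security Council finds peacekeeping appropriate": 0.85,
    "Gaza's governing body does not veto": 0.60,
    "UN does not delegate": 0.75,
    "mission active before May 31": 0.60,
}

# Chain rule: P(A1 and ... and An) = P(A1) * P(A2 | A1) * ... * P(An | A1..An-1)
p_main_route = math.prod(components.values())

# Running product, to see where the probability drains away.
running = accumulate(components.values(), operator.mul)
for step, p in zip(components, running):
    print(f"{p:6.1%}  after: {step}")

print(f"P(all steps happen) = {p_main_route:.3f}")
```

The running product shows that roughly half the probability is already gone after the first two steps.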

We have improved our previous gut feeling by breaking it down into components, applying our gut feeling to each one, and then recombining them arithmetically. This works on the same basis as Fermi estimation: hopefully, the errors in the estimates of individual components cancel out.
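
To see what “the errors cancel out” can mean, here is a small simulation under a strong assumption: each component estimate is off by an independent multiplicative error exp(e), with e drawn from a normal distribution with standard deviation sigma. The product’s log-error is then the sum of the individual e values, and its spread grows like √n·sigma rather than the n·sigma we would get if every error pushed in the same direction. If the errors are correlated (say, one bias leaks into every component), this cancellation shrinks or disappears. The function name and parameters are illustrative:

```python
import math
import random
import statistics


def simulate_log_error(n_components: int, sigma: float, trials: int, seed: int = 1) -> float:
    """Std. dev. of the log-error of a product of n independently mis-estimated factors.

    ASSUMPTION: each factor is off by exp(e), with e ~ Normal(0, sigma),
    independently of the others. The product's log-error is the sum of the e's.
    """
    rng = random.Random(seed)
    totals = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_components))
        for _ in range(trials)
    ]
    return statistics.pstdev(totals)


n, sigma = 6, 0.3
observed = simulate_log_error(n, sigma, trials=20_000)
print(f"independent errors: {observed:.3f}")
print(f"sqrt(n) * sigma:    {math.sqrt(n) * sigma:.3f}")
print(f"all errors aligned: {n * sigma:.3f}")
```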

However, there is an alternative to both gut feeling and statistical reasoning, and it’s an actor–motivation framework I’m currently trying to understand better because I used it with some success in the previous Metaculus Quarterly Cup. Stay tuned for a future article!

Entropic Thoughts