The Compound Poisson Process
There is a question on Metaculus on whether armed conflicts between India and Pakistan will lead to at least 100 deaths before 2050. I initially interpreted that question as asking about the total number of deaths until 2050, but a closer reading of the resolution criteria makes another interpretation possible. This article only makes sense under the first interpretation, so bear with me. See the end of the article for a similarly themed analysis using the correct interpretation.
At the time of writing, the question has a community prediction of 65 %. That seemed low to me, because the Indo–Pakistani wars are listed among the fairly active conflicts on the Wikipedia page of ongoing armed conflicts. I wanted to know how much more serious the situation is than the community thinks.
Brief content warning: I will be talking fairly dispassionately about the lost lives of real humans, who loved, and had family members and dreams and ambitions. Rest assured my purpose here is not to reduce to numbers these rich personalities, but rather to
- Better understand the severity of an ongoing area of conflict I otherwise hear very little about; and
- Learn and teach statistical tools that can be used to improve the lives of many more humans, wherever my readership takes these tools.
Quick Data Collection
One of my first sources for an overview of topics is Wikipedia. (Do note that I’m not an expert on either India or Pakistan; I’m happy to receive corrections.) If its information on the Indo-Pakistani wars and conflicts is to be trusted, it seems like the past 15 years have been somewhat peaceful compared to the time before that, so for a conservative estimate of the future intensity of the conflict, that period can probably be used.
Wikipedia lists combatants killed for border skirmishes and strikes over these past 15 years, and to keep with the theme of conservative estimation, this table repeats those numbers chronologically, using the lowest estimate on Wikipedia for each event, and rounded down to the nearest multiple of five.
20 | 15 | 10 | 0 | 120 | 25 | 5 | 15 |
We could just resample from this data by imputing some zeroes to cover years where there has been no conflict, but I was interested in applying some recently learned theory to do the forecasting in my head.
Simple Poisson Process
A simple Poisson process is one that increments a counter by one every time an event occurs. Events occur independently of one another, at an average rate called the intensity, \(\lambda\).
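To make the definition concrete, here is a minimal simulation sketch in Python. It assumes the standard construction in which the gaps between events are independent and exponentially distributed with mean \(1/\lambda\). The function name `count_events` and the values \(\lambda = 2\), \(t = 10\) are made up for illustration and have nothing to do with the conflict data.

```python
import random
import statistics

def count_events(rate, horizon, rng):
    """Count the events of a simple Poisson process in [0, horizon].

    Gaps between consecutive events are drawn as independent
    exponentials with mean 1/rate.
    """
    elapsed = rng.expovariate(rate)
    count = 0
    while elapsed <= horizon:
        count += 1
        elapsed += rng.expovariate(rate)
    return count

# Illustrative values only, not taken from the conflict data.
rng = random.Random(1)
counts = [count_events(rate=2.0, horizon=10.0, rng=rng) for _ in range(5000)]

# Both the sample mean and the sample variance should land
# close to rate * horizon = 20.
print(statistics.mean(counts), statistics.variance(counts))
```

That the mean and the variance of the count agree is the property that makes mental arithmetic with Poisson processes so convenient.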
The sum of all deaths in the above skirmishes and strikes is 210, and if we take the year to be our unit of time, that is 14 dead per year, i.e. the intensity of such a Poisson process would be \(\lambda=14\). The Poisson distribution is nice in that the standard deviation \(\sigma = \sqrt{\lambda}\), so with the data above, we have \(\sigma = \sqrt{14} \approx 3.7\). (Mental logarithm time: \(\sqrt{x} = 10^{\frac{\log{x}}{2}}\).)
If we look at a timescale of \(t\) units of time (years in our case), a Poisson process gives us expectation \(\lambda t\), with standard deviation \(\sqrt{\lambda t}\). Since there are 26 years until 2050, this model forecasts the total number of deaths as
Expectation | 364 |
Standard deviation | 19 |
With these numbers, at least 100 deaths is virtually certain.
Poisson Process with Non-Unit Jumps
The above is a very rough approximation, and an inappropriate model: it treats every single death as an independent event. In reality, deaths are clustered together in skirmishes and strikes. There have been 8 such events in the past 15 years according to Wikipedia, and the mean number of deaths in them has been about 26.
This is still a Poisson process, but not quite as simple. In this case, we don’t want an event to increment the counter by one, but by 26. In other words, the intensity is now \(\lambda = 8/15\), but each step is of size \(x=26\).
For this, the \(t\) year expectation is \(x \lambda t\), and the standard deviation is \(x \sqrt{\lambda t}\). We get
Expectation | 361 |
Standard deviation | 97 |
Note that the expectation is roughly the same, but the standard deviation is much larger. This makes sense, because we switched from looking at each combat death as an independent event (of which 14 would happen per year) to instead thinking of the skirmishes and strikes as independent events (of which about 0.5 happen each year). This increased the standard deviation of our estimation by about 5×, because the random element is now both more uncertain, and has a greater effect on the result.
That said, 100 deaths is still about 2.7 standard deviations below the mean (for a thin-tailed distribution), so all but guaranteed.
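The effect of clustering can be checked with a few lines of Python. This is a sketch using made-up round numbers rather than the conflict data. The helper `fixed_jump_forecast` is a hypothetical name, and it only evaluates the two formulas \(x \lambda t\) and \(x \sqrt{\lambda t}\) from above.

```python
import math

def fixed_jump_forecast(rate, jump, horizon):
    """Mean and standard deviation of a Poisson process in which every
    event adds `jump` to the counter: x*lambda*t and x*sqrt(lambda*t)."""
    expected_events = rate * horizon
    return jump * expected_events, jump * math.sqrt(expected_events)

# Made-up numbers: 12 unit-sized events per year versus 3 events of
# size 4 per year, both over a 3-year horizon.
unit_mean, unit_sd = fixed_jump_forecast(rate=12, jump=1, horizon=3)
clustered_mean, clustered_sd = fixed_jump_forecast(rate=3, jump=4, horizon=3)

print(unit_mean, unit_sd)            # 36 6.0
print(clustered_mean, clustered_sd)  # 36 12.0
```

With the expected total held fixed, grouping the increments into jumps of size \(x\) multiplies the standard deviation by \(\sqrt{x}\), which here is \(\sqrt{4} = 2\).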
Compound Poisson Process
If we go back and look at the data again, repeated here for convenience, we may notice one number that stands out:
20 | 15 | 10 | 0 | 120 | 25 | 5 | 15 |
The 2016–2018 border skirmishes were particularly serious, with a conservative estimate of 120 dead. Maybe there are really two types of events here: brief and drawn out. Intuitively, we might draw the boundary between them at 100 deaths.
If that is the case, then we really have two parallel Poisson processes:
- Brief events: \(\lambda=0.47\) and \(x=13\).
- Drawn-out events: \(\lambda=0.07\) and \(x=120\).
The interleaving of multiple Poisson processes like these is known as a compound Poisson process. We can think of it as a single Poisson process with an intensity that is the sum of the intensities of the component processes, meaning in this case the total event rate is \(0.47+0.07=0.54\). Any time an event happens in this compound process, the step size \(x\) is a random variable that can take any of the values \(x_i\) of the component processes, and does so with probability in proportion to the intensity of that process, i.e. \(\lambda_i / \lambda\). But we can also just treat them as two parallel independent processes and add together their results at the end.
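The single-process view of the compound process can be sketched as a small simulation. This is a minimal illustration and not how the numbers in this article were produced. The names `sample_poisson` and `sample_total` are hypothetical, the rates and step sizes are the rounded ones quoted above, and the 26-year horizon is the time left until 2050.

```python
import random

# (rate per year, step size) for brief and drawn-out events, as quoted above.
COMPONENTS = [(0.47, 13), (0.07, 120)]

def sample_poisson(mean, rng):
    """Draw a Poisson count by counting unit-rate exponential gaps in [0, mean]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def sample_total(components, horizon, rng):
    """One draw of the total over `horizon` years, using the merged view:
    events arrive at the summed rate, and each event picks its step size
    with probability proportional to the rate of its component."""
    rates = [rate for rate, _ in components]
    steps = [step for _, step in components]
    n_events = sample_poisson(sum(rates) * horizon, rng)
    return sum(rng.choices(steps, weights=rates, k=n_events))

rng = random.Random(2050)
totals = [sample_total(COMPONENTS, 26, rng) for _ in range(20000)]

# The average should sit near the sum of x * lambda * t over both components.
print(sum(totals) / len(totals))
```

Sampling the two component processes separately and adding their totals would give the same distribution. The merged view just needs a single event counter.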
If we extrapolate these to 2050, we get the following. (Recall that we don’t sum the standard deviations directly; we sum the variances.)
- | Brief | Drawn-out | Both |
Expectation | 159 | 218 | 377 |
Standard deviation | 45 | 162 | 168 |
Under this model, 100 deaths is about 1.65 standard deviations below the mean, which sloppily translates to about a 95 % probability of being exceeded. Since we used conservative estimates throughout, the real probability is likely higher. This was the forecast I submitted before I realised I had misread the resolution criteria.
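For anyone who would rather not do this in their head, the table and the 1.65 figure can be reproduced with a short Python sketch. It uses the rounded rates and step sizes from the text, and it takes "sloppily translates" to mean a normal approximation, which is my reading rather than something stated outright. The helper name `component_moments` is hypothetical.

```python
import math
from statistics import NormalDist

def component_moments(rate, step, horizon):
    """Mean and variance of one fixed-step Poisson component."""
    expected_events = rate * horizon
    return step * expected_events, step ** 2 * expected_events

# Rounded (rate, step size) pairs from the text; 26 years until 2050.
components = [(0.47, 13), (0.07, 120)]
moments = [component_moments(rate, step, 26) for rate, step in components]

total_mean = sum(mean for mean, _ in moments)
total_sd = math.sqrt(sum(var for _, var in moments))  # variances add, SDs do not

# Normal approximation to the chance that the total reaches at least 100.
z = (total_mean - 100) / total_sd
p_exceed = NormalDist().cdf(z)

print(round(total_mean), round(total_sd), round(z, 2), round(p_exceed, 2))
# 377 168 1.65 0.95
```

The normal approximation is crude here, since fewer than two drawn-out events are expected over the whole horizon (\(0.07 \times 26 \approx 1.8\)), so the figure is best read as a rough guide.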
We also learned that this is a serious conflict, more serious than the community prediction suggested at the time.
Read The Fine Resolution Criteria
One of my biggest sources of forecasting errors on Metaculus so far has been sloppy reading of resolution criteria. In this case, it seems that the resolution criteria might be talking about a single skirmish or strike with more than 100 deaths.
For this, we can apply the continuous-time version of Laplace’s rule of succession, which is also based on Poisson theory. This heuristic says that if something has happened \(h\) times in a span of \(t_b\) years, then the probability that it will happen again in the next \(t\) years is
\[1 - \left(1 + \frac{t}{t_b}\right)^{-h - 1}\]
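Or, in our case, with one such event (\(h = 1\)) in the past \(t_b = 15\) years and \(t = 26\) years left until 2050,
\[1 - \left(1 + \frac{26}{15}\right)^{-2} \approx 0.87\]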
In other words, 87 % probability. Still a more serious conflict than the community prediction made it out to be.
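The heuristic is a one-liner in code. The sketch below assumes \(h = 1\), which is what the exponent \(-2\) in the equation above implies (a single event with at least 100 deaths in the 15-year window). The function name `prob_at_least_once` is made up for the example.

```python
def prob_at_least_once(happened, observed_years, future_years):
    """Continuous-time rule of succession: 1 - (1 + t/t_b)^(-h - 1)."""
    return 1 - (1 + future_years / observed_years) ** (-happened - 1)

# One qualifying event in the past 15 years, 26 years left until 2050.
print(round(prob_at_least_once(1, 15, 26), 2))  # 0.87
```

Changing `happened` or the two time spans shows how sensitive the forecast is to how the historical window is chosen.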