Feynman vs. Computer
I read Burghelea’s article on the Feynman trick for integration. Well, I’m not good enough at analysis to follow along, but I tried reading it anyway because it’s fascinating.
For people who do not have experience with analysis, integration is counting the total size of very many, very small piles of things. Analytical integration, i.e. the process by which we can get an exact result, can be very difficult. It often takes knowledge of special tricks, strong pattern recognition, and plenty of trial and error. Fortunately, in all cases in my career when I’ve needed the value of an integral, an approximate answer has been good enough.
In practical terms, this means we could spend a lot of time learning integration tricks and practising them, and then take half an hour out of our day to apply them to an integral in front of us … or, hear me out, we could write four lines of JavaScript that arrive at a relatively accurate answer in less than a second.
The approximating power of random numbers
If integration is summing many small piles, we have to figure out how big the piles are. Their height is usually given by a mathematical function, and our first example will be the same as in the Feynman trick article.
\[f(x) = \frac{x - 1}{\ln{x}}\]
This is to be integrated from zero to one, i.e. we want to know the size of the shaded area in the plot below. You can think of each column of shaded pixels as one pile, and we sum the size of all of them to get the total area (of course, this being an SVG image, there are no actual columns of pixels). Alternatively, the more we zoom in, the thinner the columns become – but the more of them there are. This is why we need integration: it’s dealing with the limit case of infinitely many, infinitely thin columns.
We could imagine drawing six random numbers between zero and one, and plotting piles of the corresponding height at those locations. Since there are six piles, their width is one sixth of the width of the area we are integrating.
Even though some of these piles overlap by chance, and even though there are some random gaps between them, the sum of their areas (0.66) comes very close to the actual shaded area determined analytically (0.69). If we draw more piles, we have to make them correspondingly thinner, but the agreement between their sum and the total size of the area improves.
These are 100× as many piles, and they’re 1/100th as thick to compensate. Their total area is 0.70 – very close to 0.69. If we draw even more piles, we’ll get even closer.
This illustrates a neat correspondence between integrals and expected values. In the simple case, we can frame it mathematically as
\[\int_a^b f(x) \mathrm{d}x = (b - a) \, E(f(X))\]
In words, this says that integrating the function \(f\) between \(a\) and \(b\) is the same as taking the expected value of \(f(X)\) at uniformly distributed random points \(X\) between \(a\) and \(b\), scaled by the width of the interval. In our first example that width is one, so the scaling factor disappears.
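The factor \(b - a\) appears because a uniformly distributed \(X\) on \((a, b)\) has density \(\frac{1}{b - a}\):

\[E(f(X)) = \int_a^b f(x) \frac{1}{b - a} \mathrm{d}x = \frac{1}{b - a} \int_a^b f(x) \mathrm{d}x\]

This is also why the JavaScript function below multiplies the average of the sampled function values by the width of the interval.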
Teaching the computer to do it
Here’s a JavaScript function that estimates the value of an integral in the most primitive way possible.
I = (B, lo, hi, f) => {
// Generate B random values uniformly between lo and hi.
let xs = Array.from({length: B}, _ => lo + (hi - lo) * Math.random());
// Compute the value of f at each location.
let ys = xs.map(f);
// Return the total area of all the piles: average height times interval width.
return (hi-lo)*ys.reduce((r, y) => r + y, 0)/ys.length;
}
To compute an approximation to the value of the integral we’ve seen, we run
I(10_000, 0, 1, x => (x-1)/Math.log(x) );
0.6916867623261724
This is fairly close to 0.69. And we got there in four lines of JavaScript, as promised.
Improved approximation through splittage
We can try this on the next example too. Now we’re asking about the integral
\[\int_0^{\frac{\pi}{2}} \frac{\ln{(1 - \sin{x})}}{\sin{x}} \mathrm{d}x\]
which, translated to JavaScript, becomes
I(10_000, 0, Math.PI/2, x => Math.log(1 - Math.sin(x))/Math.sin(x) );
-3.67
This is again fairly close to the desired −3.7, but not quite there yet. The tricky shape of the function is the reason we aren’t getting as close as we want.
At the upper endpoint of the integration interval, this function goes to negative infinity. The random piles we draw come primarily from the well behaved region of the function, and thus don’t help the computer realise this behaviour.
There are clever ways to sample adaptively from the trickier parts of the function, but an easy solution is to just visually find a breakpoint, split the interval on that, and then estimate the sensible part separately from the crazy-looking part. Since the total area must be the sum of both areas, we can add their results together for a final estimation.
In this case, we might want to pick e.g. 1.5 as the breakpoint, so we combine the area estimations from 0–1.5 and then 1.5–\(\frac{\pi}{2}\). The result is
I(2_000, 0, 1.5, x => Math.log(1 - Math.sin(x))/Math.sin(x)) + I(8_000, 1.5, Math.PI/2, x => Math.log(1 - Math.sin(x))/Math.sin(x));
-3.70
which is indeed much closer to the actual value of −3.7.
Note that we aren’t taking more samples, we’re just sprinkling them more wisely over the number line. We spend 2,000 samples in the relatively well-behaved region where the function takes values from −1 to −6, and then we spend the other 8,000 samples in the small region that goes from −6 to negative infinity. Here it is graphically:
The reason this helps us is that this latter region contributes a lot to the value of the integral, but it is so small on the number line that we benefit from oversampling it compared to the other region. This is a form of sample unit engineering, which we have seen before in different contexts.
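To make the splitting reusable, here is a minimal sketch of a wrapper – the name Isplit and its argument order are my own invention, not from the article – that takes interior breakpoints and a matching list of sample counts, estimates each sub-interval with I, and sums the pieces.

// Estimate an integral piecewise: breaks holds interior breakpoints in
// increasing order, counts holds one sample count per sub-interval.
Isplit = (counts, lo, breaks, hi, f) => {
  let edges = [lo, ...breaks, hi];
  return counts.reduce((total, n, i) => total + I(n, edges[i], edges[i+1], f), 0);
}

With it, the estimation above becomes Isplit([2_000, 8_000], 0, [1.5], Math.PI/2, x => Math.log(1 - Math.sin(x))/Math.sin(x)).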
More evidence of sufficiency
We can continue with some more examples from the Feynman trick article. That gets us the following table.
| Integral | Value | Estimation | Difference | Notes |
|---|---|---|---|---|
| \(\int_0^1 \frac{x-1}{\ln{x}} \mathrm{d}x\) | \(\ln{2}\) | 0.6943 | 0.2 % | |
| \(\int_0^{\frac{\pi}{2}} \frac{\ln{(1 - \sin{x})}}{\sin{x}} \mathrm{d}x\) | \(\frac{-3 \pi^2}{8}\) | -3.702 | < 0.1 % | |
| \(\int_0^1 \frac{\ln{(1 - x + x^2)}}{x - x^2} \mathrm{d}x\) | \(\frac{-\pi^2}{9}\) | -1.097 | < 0.1 % | |
| \(\int_0^{\frac{\pi}{2}} \frac{\arctan{(\sin{x})}}{\sin{x}} \mathrm{d}x\) | \(\frac{\pi}{2}\log{(1 + \sqrt{2})}\) | 1.385 | < 0.1 % | |
| \(\int_0^\infty x^2 e^{-\left(4x^2 + \frac{9}{x^2}\right)} \mathrm{d}x\) | \(\frac{13 \sqrt{\pi}}{32 e^{12}}\) | 0.000004414 | 0.2 % | (1) |
| \(\int_0^1 \frac{\ln{x}}{1 - x^2} \mathrm{d}x\) | \(\frac{-\pi^2}{8}\) | -1.227 | 0.5 % | (2) |
| \(\int_0^\infty \frac{e^{-x^2}}{1 + x^2} \mathrm{d}x\) | \(\frac{\pi e}{2}\mathrm{erfc}(1)\) | 0.6696 | 0.3 % | (3) |
Notes (a code sketch of these adjustments follows the list):
1. The integration is from zero to infinity, but the function practically only has a value between zero and three, so that’s the region we estimate over.
2. This is another case where the function goes to infinity near zero, so we split up the estimation into one for the range 0–0.1 and the other for 0.1–1.0. We have not increased the sample count, only reallocated the 10,000 samples.
3. Again, the integration is from zero to infinity, but the function practically only has a value between zero and three, so that’s the region we estimate over.
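For concreteness, here is roughly what those adjustments look like as calls to I – the sample allocations are my guesses, not necessarily the ones used to produce the table.

// (1) Truncate the infinite upper limit at 3, where the integrand is negligible.
I(10_000, 0, 3, x => x**2 * Math.exp(-(4*x**2 + 9/x**2)));
// (2) Split at 0.1 to tame the singularity at zero, reallocating the same 10,000 samples.
I(5_000, 0, 0.1, x => Math.log(x)/(1 - x**2)) + I(5_000, 0.1, 1, x => Math.log(x)/(1 - x**2));
// (3) Truncate the infinite upper limit at 3 again.
I(10_000, 0, 3, x => Math.exp(-x**2)/(1 + x**2));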
Finding the error without a ground truth
“Now,” the clever reader says, “this is all well and good when we have the actual value to compare to so we know the size of the error. What will we do if we’re evaluating a brand new integral? What is the size of the error then, huh?”
This is why we sampled the function randomly: our approximation is then a statistical average over samples, and for that we can compute the standard error of the mean. In the JavaScript implementation below, we use the quick variance computation, but we could perhaps more intuitively have used the SPC-inspired method.
Ic = (B, lo, hi, f) => {
let xs = Array.from(
{length: B}, _ =>
lo + (hi - lo) * Math.random()
);
let ys = xs.map(f);
// Compute the variance of the ys from the sum and
// the sum of squared ys.
let s = ys.reduce((r, y) => r + y, 0);
let ssq = ys.reduce((r, y) => r + y**2, 0);
let v = (ssq - s**2/B)/(B-1);
// Compute the mean and the standard error of the mean.
let m = (hi-lo)*s/B;
let se = (hi-lo)*Math.sqrt(v/B);
// Compute the 90 % confidence interval of the value of
// the integral.
return {
p05: m - 1.645*se,
p95: m + 1.645*se,
}
}
If we run this with the first integral as an example, we’ll learn that
Ic(10_000, 0, 1, x => (x-1)/Math.log(x) )
Object {
p05: 0.6896
p95: 0.6963
}
Not only is this range an illustration of the approximation error (small!), it is also very likely to capture the actual value of the integral. Here are some more examples from the same integrals as above:
| 5 % | 95 % | Actual | Contained? |
|---|---|---|---|
| 0.6904 | 0.6972 | 0.6931 | ✅ |
| -3.7673 | -3.6787 | -3.7011 | ✅ |
| -1.0975 | -1.0960 | -1.0966 | ✅ |
| 1.3832 | 1.3871 | 1.3845 | ✅ |
| 0.4372 | 0.4651 | 0.4424 | ✅ |
| -1.2545 | -1.2254 | -1.2337 | ✅ |
| 0.6619 | 0.6937 | 0.6716 | ✅ |
These are all built naïvely from 10,000 uniform samples. In other words, in none of the cases has the computation been split up to allocate samples more cleverly.
Again, we could spend a lot of time learning to integrate by hand … or we could ask the computer for less than a second of its time first, and see whether the accuracy it achieves is appropriate for our use case. In my experience, it generally is.
Seeing the effect of sample unit engineering
What’s neat is we can still split up the computation like we did before, if we believe it will make the error smaller and the confidence interval narrower. Let’s use the following integral as an example.
\[\int_0^\infty \frac{\sin{x}}{x} \mathrm{d}x\]
This oscillates up and down quite a bit for small \(x\), and then decays but still provides significant contributions for larger \(x\). A naive evaluation – truncating the infinite upper limit at 100, beyond which the contributions are small – would have a confidence interval of
Ic(10_000, 0, 100, x => Math.sin(x)/x)
Object {
p05: 1.461
p95: 1.884
}
and while this is certainly correct (the actual value of the integral is half \(\pi\), or approximately 1.571), we can do better. We’ll estimate the region 0–6 separately from the region 6–100, using half the samples for each. Why put the breakpoint at 6? The period of sine is a full turn, which is roughly 6 radians, so this gives roughly symmetric contributions from the two pieces – not necessary for the technique to work, but it makes the illustration a little cleaner:
Ic(5_000, 0, 6, x => Math.sin(x)/x)
Object {
p05: 1.236
p95: 1.468
}
This contains the bulk of the value of the integral, it seems. Let’s see what remains in the rest of it.
Ic(5_000, 6, 100, x => Math.sin(x)/x)
Object {
p05: 0.080
p95: 0.198
}
We can work backwards to what the standard errors must have been to produce these confidence intervals: the midpoint is the point estimation for each region, and the standard error is 1/1.645 times the distance between the 5 % point and the midpoint.
| Region | Value | Standard error |
|---|---|---|
| 0–6 | 1.4067 | 0.0372 |
| 6–100 | 0.1390 | 0.0359 |
The estimation of the total area would be the values summed, i.e. 1.5457. We get the standard error of this sum through Pythagorean addition, which gives approximately 0.05143. We convert it back to a confidence interval and compare with the result when we did not break the computation up into multiple components.
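Scripted, that back-calculation and combination might look like the following sketch – fromCI and combineCI are helper names of my own, not from the article.

// Recover the point estimate and standard error behind a 90 % confidence interval.
fromCI = ({p05, p95}) => {
  let m = (p05 + p95)/2;
  return {m: m, se: (p95 - m)/1.645};
}

// Sum the means of independent region estimates, add their standard
// errors in quadrature, and rebuild a 90 % confidence interval.
combineCI = (...cis) => {
  let parts = cis.map(fromCI);
  let m = parts.reduce((r, p) => r + p.m, 0);
  let se = Math.sqrt(parts.reduce((r, p) => r + p.se**2, 0));
  return {p05: m - 1.645*se, p95: m + 1.645*se};
}

combineCI(Ic(5_000, 0, 6, x => Math.sin(x)/x), Ic(5_000, 6, 100, x => Math.sin(x)/x));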
| Method | 5 % | 95 % | Range |
|---|---|---|---|
| Single operation (10,000 samples) | 1.461 | 1.884 | 0.423 |
| Two operations (5,000 samples × 2) | 1.461 | 1.630 | 0.169 |
Although in this case the two methods happen to share a lower bound, the upper bound has been dramatically reduced. The total range of the confidence interval is more than halved! This was because we allocated the samples more cleverly – concentrated them in the early parts of the function – rather than increased the number of samples.
That said, we’re at a computer, so we could try increasing the sample count. Or maybe both?
| Method | 5 % | 95 % | Range |
|---|---|---|---|
| Single operation (10,000 samples) | 1.461 | 1.884 | 0.423 |
| Two operations (5,000 samples × 2) | 1.461 | 1.630 | 0.169 |
| Single operation (100,000 samples) | 1.549 | 1.680 | 0.131 |
| Two operations (50,000 samples × 2) | 1.524 | 1.578 | 0.054 |
It seems like sampling more cleverly has almost the same effect as taking ten times as many samples.
We could play around with where to put the breakpoint, and how many samples to allocate to each side of it, and see which combination yields the lowest error. Then we can run that combination with a lot of samples to get the most accurate final result. That would take maybe 15 minutes of tooting about and exploring sensible-seeming alternatives, so it’s probably still quicker than integrating by hand.
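That exploration is easy to script as well. A rough sketch, reusing the combineCI helper from above and judging each candidate by the width of its 90 % confidence interval (the candidate breakpoints are arbitrary picks of mine):

// Try a few candidate breakpoints and report the width of the combined
// confidence interval for each, so we can pick the narrowest.
f = x => Math.sin(x)/x;
[3, 6, 12, 25, 50].map(b => {
  let ci = combineCI(Ic(5_000, 0, b, f), Ic(5_000, b, 100, f));
  return {breakpoint: b, range: ci.p95 - ci.p05};
});

Whichever breakpoint comes out narrowest would then get the full sample budget for the final run.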
When the computer is not enough
It should be said that there are times when numeric solutions aren’t great. I hear that in electronics and quantum dynamics, there are sometimes integrals whose value is not a number, but a function, and knowing that function is important in order to know how the thing it’s modeling behaves in interactions with other things.
Those are not my domains, though. When a number is all we need, the computer beats Feynman any day of the week.