In physics it’s fairly common to come across integrals which take the form
$$Z=\int\text{d} x e^{-\lambda f(x)}$$
for some sufficiently nice function \(f(x)\) and where \(\lambda\) is some constant parameter. We will also always assume here that this integral is convergent. For example, in statistical mechanics the integral would be over phase space, the parameter \(\lambda\) would be \(\beta=1/T\), and the function would be \(f=H\), the Hamiltonian. In quantum field theory (QFT) the integral would be over the configuration space of the fields, \(f\) would be the action functional, and \(1/\lambda\) would be a loop-counting parameter.
In general, integrals of this form are effectively impossible to compute exactly, and even approximating them numerically often presents a challenge. For example, it is common to add an additional parameter to the function in the exponent, \(f(x,J)\), so that the value of the integral becomes a function of this parameter, \(Z=Z(J)\). In this way, \(Z(J)\) becomes a generating function for other integrals.
For example, we might want to compute the integral
$$Z(J)=\int\text{d} x\exp\left[-\lambda\left(\frac{1}{2}ax^2+Jx\right)\right].$$
By completing the square and shifting the integration variable, this becomes a familiar Gaussian integral whose value we know to be
$$Z(J)=\sqrt{\frac{2\pi}{\lambda a}}\exp\left[\frac{\lambda}{2a}J^2\right].$$
By taking \(J\) derivatives, we could generate from this expression any integral of the form
$$\frac{(-1)^n}{\lambda^n}\frac{\text{d}^n}{\text{d} J^n}Z(J)=\int\text{d}x x^ne^{-\lambda\left(\frac{a}{2}x^2+Jx\right)}.$$
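As a quick sanity check of these two formulas, here is a short Python sketch comparing the closed form for \(Z(J)\), and the \(n=1\) moment it generates, against direct numerical quadrature (the values of lam, a, and J below are arbitrary illustrative choices):

```python
import numpy as np
from scipy.integrate import quad

lam, a, J = 3.0, 2.0, 0.7   # arbitrary illustrative values

def integrand(x, n=0):
    # x^n * exp(-lam*(a*x^2/2 + J*x))
    return x**n * np.exp(-lam * (0.5 * a * x**2 + J * x))

# Z(J) by quadrature vs. the closed form sqrt(2*pi/(lam*a)) * exp(lam*J^2/(2a))
Z_quad, _ = quad(integrand, -np.inf, np.inf)
Z_exact = np.sqrt(2 * np.pi / (lam * a)) * np.exp(lam * J**2 / (2 * a))
print(Z_quad, Z_exact)

# n = 1 case: (-1/lam) dZ/dJ = -(J/a) * Z(J), from differentiating the closed form
M1_quad, _ = quad(integrand, -np.inf, np.inf, args=(1,))
M1_exact = -(J / a) * Z_exact
print(M1_quad, M1_exact)
```

Both pairs of numbers agree to quadrature precision.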
All of this depended on the integral in question being a Gaussian one, though. If it weren't Gaussian, we would essentially be at a loss. Even if we were to approximate the integral numerically, as soon as we add a parameter we would like to differentiate with respect to, like \(J\), we need to compute the integral for many different values of \(J\) and approximate the derivatives by finite differences.
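To see what that brute-force approach looks like in practice, here is a minimal sketch using a quartic exponent \(f(x,J)=\frac{1}{4}x^4+Jx\) as a stand-in non-Gaussian example (the choice of \(f\) and the parameter values are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

lam = 3.0   # arbitrary illustrative value

def Z(J):
    # Z(J) = integral of exp(-lam*(x^4/4 + J*x)) over the real line, by quadrature
    val, _ = quad(lambda x: np.exp(-lam * (0.25 * x**4 + J * x)), -np.inf, np.inf)
    return val

# Estimate (-1/lam) dZ/dJ, i.e. the integral of x*exp(-lam*f), by a central difference.
J0, h = 0.5, 1e-4
first_moment_fd = -(Z(J0 + h) - Z(J0 - h)) / (2 * h * lam)

# Compare against computing the moment integral directly.
first_moment_direct, _ = quad(
    lambda x: x * np.exp(-lam * (0.25 * x**4 + J0 * x)), -np.inf, np.inf
)
print(first_moment_fd, first_moment_direct)
```

Each derivative requires several full quadratures, which quickly becomes expensive in higher dimensions.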
There is, however, a trick we can use to obtain approximate analytic results by hand in the limit \(\lambda\rightarrow\infty\), known as the stationary phase or steepest descent approximation.
The idea behind the stationary phase approximation lies in the observation that when \(\lambda\) becomes very large, the integral is dominated by the regions about the stationary points of the function \(f(x)\). To get a sense for this, consider the Taylor expansion of the exponent about some stationary point, say \(x_0\), where the linear term drops out because \(f^{\prime}(x_0)=0\): \(f(x)=f(x_0)+\frac{1}{2}f^{\prime\prime}(x_0)(x-x_0)^2+\mathcal{O}\left((x-x_0)^3\right)\). If we think about the ratio of the integrand at a point near the stationary point to its value at the stationary point itself, we would have
$$\frac{e^{-\lambda f(x_0+\delta x)}}{e^{-\lambda f(x_0)}}\approx e^{-\frac{\lambda}{2}f^{\prime\prime}(x_0)\delta x^2}.$$
So long as \(f^{\prime\prime}(x_0)>0\), this will vanish as \(\lambda\rightarrow\infty\).
In the case where \(f^{\prime\prime}(x_0)<0\), we can remember that we are really only looking for the value of an integral over the real line, and there's no reason we can't change the variable of integration via \(x\rightarrow ix\). Once we do this, the ratio above changes by a sign in the exponent, which is exactly the sign required by \(f^{\prime\prime}(x_0)<0\) to make the ratio vanish, so the stationary point once more dominates. The case \(f^{\prime\prime}(x_0)=0\) is more complicated and lies outside the scope of this discussion.
So, we can conclude that the stationary points dominate the integral in the \(\lambda\rightarrow\infty\) limit. If \(f(x)\) has just a single stationary point at the location \(x_0\), then this suggests that the integral \(Z\) is approximately equal to the integral computed over a small region surrounding \(x_0\), say \(B(x_0)\). But so long as we are restricting ourselves to a small region near \(x_0\), the exponent is also well-approximated by its Taylor expansion about \(x_0\). That is,
$$Z\approx\int_{B(x_0)}\text{d}xe^{-\lambda f(x)}\approx e^{-\lambda f(x_0)}\int_{B(x_0)}\text{d}x\exp\left[-\frac{\lambda}{2}f^{\prime\prime}(x_0)(x-x_0)^2\right].$$
However, since we are working in the large-\(\lambda\) limit, we are now essentially looking at the integral of a Gaussian in the limit of zero variance. We know that the tails of a Gaussian contribute very little to the integral, so we can approximate the Gaussian integral over the small region \(B(x_0)\) above by the same Gaussian integral taken over all reals:
$$Z\approx e^{-\lambda f(x_0)}\int\text{d}x e^{-\frac{\lambda}{2}f^{\prime\prime}(x_0)x^2}=\sqrt{\frac{2\pi}{\lambda f^{\prime\prime}(x_0)}}e^{-\lambda f(x_0)}.$$
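To get a feel for how well this works, here is a minimal numerical sketch using \(f(x)=\cosh x\), chosen only because it has a single stationary point at \(x_0=0\) with \(f(x_0)=1\) and \(f^{\prime\prime}(x_0)=1\). The common factor \(e^{-\lambda f(x_0)}\) is divided out of both sides so the quadrature deals with numbers of order one:

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    return np.cosh(x)   # single stationary point at x0 = 0, f(0) = 1, f''(0) = 1

x0, fpp = 0.0, 1.0

for lam in (1.0, 5.0, 25.0, 100.0):
    # Compare Z * exp(lam*f(x0)) computed by quadrature against sqrt(2*pi/(lam*f''(x0))).
    I_quad, _ = quad(lambda x: np.exp(-lam * (f(x) - f(x0))), -np.inf, np.inf)
    I_laplace = np.sqrt(2 * np.pi / (lam * fpp))
    print(lam, I_quad, I_laplace, abs(I_laplace / I_quad - 1))
```

The relative error shrinks as \(\lambda\) grows, as expected for an approximation derived in the \(\lambda\rightarrow\infty\) limit.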
If we had more than just the one stationary point, say at \(x_1,x_2,\ldots,x_N\), we could follow the same reasoning; the only thing that changes is which parts of the original integration region we ignore. We now construct a small region \(B(x_n)\) about each stationary point and take the integral \(Z\) to be over the union of these regions. So long as \(\lambda\) is large enough, we can choose the regions \(B(x_n)\) to be small enough that their pairwise intersections are empty, and an integral over a collection of disjoint regions can always be written as the sum of the integrals over each region:
$$Z\approx\sum_{n=1}^N\int_{B(x_n)}\text{d}xe^{-\lambda f(x)}.$$
Now we have \(Z\) written as a sum over integrals of the type we dealt with in the case of a single stationary point. So we follow the same reasoning to approximate each one as a Gaussian integral:
$$Z\approx\sum_{n=1}^Ne^{-\lambda f(x_n)}\int\text{d}x\exp\left[-\frac{\lambda}{2}f^{\prime\prime}(x_n)x^2\right]=\sqrt{\frac{2\pi}{\lambda}}\sum_{n=1}^N\frac{e^{-\lambda f(x_n)}}{\sqrt{f^{\prime\prime}(x_n)}}.$$
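As an illustrative check of the sum formula, here is a sketch using the double well \(f(x)=\frac{1}{4}(x^2-1)^2\), which has minima at \(x=\pm 1\) (where \(f=0\) and \(f^{\prime\prime}=2\)) and a local maximum at \(x=0\). Only the two minima are summed over below; the maximum has \(f^{\prime\prime}<0\) and its contribution is exponentially suppressed at large \(\lambda\), so it is dropped here:

```python
import numpy as np
from scipy.integrate import quad

def f(x):
    # Double well: minima at x = +/-1 with f = 0, f'' = 2; local max at x = 0.
    return 0.25 * (x**2 - 1)**2

minima = [(-1.0, 2.0), (1.0, 2.0)]   # (x_n, f''(x_n)) at the two minima

for lam in (2.0, 10.0, 50.0):
    Z_quad, _ = quad(lambda x: np.exp(-lam * f(x)), -np.inf, np.inf)
    Z_laplace = np.sqrt(2 * np.pi / lam) * sum(
        np.exp(-lam * f(xn)) / np.sqrt(fpp) for xn, fpp in minima
    )
    print(lam, Z_quad, Z_laplace, abs(Z_laplace / Z_quad - 1))
```

Again the agreement improves as \(\lambda\) grows, once the two wells become well separated on the scale of their Gaussian widths.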
Whenever we apply this approximation in physics, it's often mentioned offhand that the approximation is an asymptotic one. Looking at the approximation in this way, it's clear why the series is asymptotic: the approximations we made in neglecting certain integration regions only make sense when we take \(\lambda\rightarrow\infty\), while the final expression we found is a power series in \(\lambda\). So it never makes sense to talk about \(\lambda\) being small, which is what would be necessary for a properly convergent series.
It should be pointed out that this issue can't be avoided just by defining a new parameter \(\alpha=1/\lambda\) and considering \(\alpha\) small. If we were to do this, we would find a power series in \(1/\alpha\), which is just as bad.
I have done some numerical tests of how this works and will talk about them in a future post. In another post, I will talk about how we can extend these ideas to include a parameter \(J\), as we did in the Gaussian example from the beginning.