How to discover the Contraction Mapping Theorem

A little while ago I set myself the exercise of stating and proving the Contraction Mapping Theorem. It turned out that I mis-stated it in three different aspects (“contraction”, “non-empty” and “complete”), but I was able to correct the statement because there were several points in the proof where it was very natural to do a certain thing (and where that thing turned out to rely on a correct statement of the theorem).

Here, then, is how you might go about discovering it from the point of having a definition of a Lipschitz function on a metric space (X,d)(X, d) (that is, a function ff for which there exists λR>0\lambda \in \mathbb{R}^{>0} such that for all x,yXx, y \in X, d(f(x),f(y))λd(x,y)d(f(x),f(y)) \leq \lambda d(x,y)). We’ll aim for a statement describing the fixed points of such a function.

Define the terms

What is a “fixed point”? There’s nowhere obvious to start other than working out what we mean by one of these. Well, what we mean is “a point xXx \in X such that f(x)=xf(x) = x”. We’ll also define (X,d)(X, d) to be an arbitrary metric space, and ff an arbitrary Lipschitz function on that space with Lipschitz constant λ\lambda.

How might we proceed?

We’re looking for a fixed point. We have a Lipschitz function (that is, one which “draws points together”, in the sense that two points which are originally δ\delta apart end up λδ\lambda \delta apart, or closer, after ff is applied to them). That suggests the idea of starting out with two arbitrary points, repeatedly pulling them closer together with ff, and seeing where we end up. Actually, on second thoughts, we can dispense with one of the arbitrary points, because we can make another point given our arbitrary xx - namely f(x)f(x).

What did we assume?

So far, we’ve made a (silly) assumption: that the space XX is not empty, because we’ve just picked a point in it. In order to use this “ff draws points together”, we’re going to want λ<1\lambda < 1, otherwise it’s actually blowing them outwards.

How might we proceed?

We have two points, xx and f(x)f(x). We want to pull them together using ff, so it’s natural to keep applying ff to them. So that we can have access to all these values, we’ll define a sequence zi=f(zi1)z_i = f(z_{i-1}) and z0=xz_0 = x. What we really want is for this sequence to converge to the fixed point (after all, if we’re drawing the points together to some limit, we’d imagine that the limit of the sequence is a local accumulator in some sense).

Now, we know nothing about this metric space, and we know nothing about the limit of the sequence. There’s a key thing we do in analysis if we want a limit of a sequence but know nothing about it: we show that it is Cauchy. In order to use this, though, we’ll need to suppose that the metric space is complete (so that Cauchy sequences converge).

Then we want to show that this sequence ziz_i is Cauchy. That is, we want d(zi,zj)0d(z_i,z_j) \to 0 as i,ji,j \to \infty independently of each other, which means that for all ϵ>0\epsilon > 0 there exists NNN \in \mathbb{N} such that for all i,j>Ni, j > N, d(fi(x),fj(x))<ϵd(f^i(x), f^j(x)) < \epsilon.

Aha - we have d(fi(z),fj(z))d(f^i(z), f^j(z)). We know ff is Lipschitz, so (wlog iji \leq j) this is d(fi(z),fj(z))λid(x,fji(x))d(f^i(z), f^j(z)) \leq \lambda^i d(x, f^{j-i}(x)). It would be very convenient if the dd expression were bounded, because then as ii \to \infty, the λi\lambda^i will take care of the rest (since λ<1\lambda < 1).

But what else do we know about dd? We’re going to need something to bound d(x,fji(x))d(x, f^{j-i}(x)), but we don’t know anything about this expression - we only know about d(zi,f(zi))λd(zi1,zi)d(z_i, f(z_i)) \leq \lambda d(z_{i-1}, z_i), by the Lipschitzness of ff. But in fact we can make d(x,fji(x))d(x, f^{j-i}(x)) in terms of those: dd is a metric, which means that it obeys the triangle inequality.

Hence d(x,fji(x))d(x,f(x))+d(f(x),fji(x))k=1jid(zk1,zk)\displaystyle d(x, f^{j-i}(x)) \leq d(x, f(x)) + d(f(x), f^{j-i}(x)) \leq \dots \leq \sum_{k=1}^{j-i} d(z_{k-1}, z_k). This we can bound: it’s k=1jiλk1d(z0,z1)=d(z0,z1)k=1jiλk1\displaystyle \leq \sum_{k=1}^{j-i} \lambda^{k-1} d(z_0, z_1) = d(z_0, z_1) \sum_{k=1}^{j-i} \lambda^{k-1}. And, joy of joys, this sum is bounded, because the infinite sum k=1λk1=11λ\displaystyle \sum_{k=1}^{\infty} \lambda^{k-1} = \dfrac{1}{1-\lambda}.

Hence d(zi,zj)<λid(z0,z1)11λd(z_i, z_j) < \lambda^i d(z_0, z_1) \dfrac{1}{1-\lambda}. This goes to 00 as ii \to \infty, so the sequence ziz_i is Cauchy.

What did we assume?

In this section, we assumed that the space was complete.

Summary

So far, we have shown that the sequence fi(x)f^i(x) is Cauchy, so it converges to a limit. We’ll call the limit LL: so we have fi(x)Lf^i(x) \to L as ii \to \infty.

What next?

It feels like we’re very close to a result now. What we really want is for LL to be a fixed point: we need f(L)=Lf(L) = L. Equivalently, we need f(limzi)limzif(\lim z_i) \to \lim z_i; but zi=f(zi1)z_i = f(z_{i-1}), so this is f(limzi)limf(zi)f(\lim z_i) \to \lim f(z_i). This will be trivial if ff is continuous. But ff is Lipschitz, so it is uniformly continuous and hence continuous (this is a really simple lemma).

That is, LL is a fixed point of ff: we have proved that ff has a fixed point.

Extension

But we don’t have to stop there - if we’re drawing points together using ff, and we end up at a fixed point, surely there can’t be two fixed points (since if there were, ff would draw them together). Let’s aim to prove that ff’s fixed point is unique, by supposing that L1,L2L_1, L_2 are fixed points. Then d(L1,L2)=d(f(L1),f(L2))d(L_1, L_2) = d(f(L_1), f(L_2)), because L1,L2L_1, L_2 are fixed points, and then d(f(L1),f(L2))λd(L1,L2)<d(L1,L2)d(f(L_1), f(L_2)) \leq \lambda d(L_1, L_2) < d(L_1, L_2), contradiction.

Summary

We have shown that there exists a unique fixed point LL of a Lipschitz function ff with Lipschitz constant λ<1\lambda < 1 on a non-empty complete metric space. Moreover, we have shown that f(i)(x)Lf^(i)(x) \to L for all xx, because we can perform this same construction of ziz_i starting from any point xx. Even more, we have shown that convergence is geometrically fast (by the λi\lambda^i term). This is a really strong theorem, and all I needed to remember in order to construct it was that Lipschitz functions were important and that we were looking for information about fixed points. (I didn’t look up anything during the proof - I checked my statement of it afterwards, and it turned out to be correct. I didn’t change anything after I finished it.)