How to discover the Contraction Mapping Theorem
A little while ago I set myself the exercise of stating and proving the Contraction Mapping Theorem. It turned out that I mis-stated it in three different aspects (“contraction”, “non-empty” and “complete”), but I was able to correct the statement because there were several points in the proof where it was very natural to do a certain thing (and where that thing turned out to rely on a correct statement of the theorem).
Here, then, is how you might go about discovering it from the point of having a definition of a Lipschitz function on a metric space (that is, a function for which there exists such that for all , ). We’ll aim for a statement describing the fixed points of such a function.
Define the terms
What is a “fixed point”? There’s nowhere obvious to start other than working out what we mean by one of these. Well, what we mean is “a point such that ”. We’ll also define to be an arbitrary metric space, and an arbitrary Lipschitz function on that space with Lipschitz constant .
How might we proceed?
We’re looking for a fixed point. We have a Lipschitz function (that is, one which “draws points together”, in the sense that two points which are originally apart end up apart, or closer, after is applied to them). That suggests the idea of starting out with two arbitrary points, repeatedly pulling them closer together with , and seeing where we end up. Actually, on second thoughts, we can dispense with one of the arbitrary points, because we can make another point given our arbitrary - namely .
What did we assume?
So far, we’ve made a (silly) assumption: that the space is not empty, because we’ve just picked a point in it. In order to use this “ draws points together”, we’re going to want , otherwise it’s actually blowing them outwards.
How might we proceed?
We have two points, and . We want to pull them together using , so it’s natural to keep applying to them. So that we can have access to all these values, we’ll define a sequence and . What we really want is for this sequence to converge to the fixed point (after all, if we’re drawing the points together to some limit, we’d imagine that the limit of the sequence is a local accumulator in some sense).
Now, we know nothing about this metric space, and we know nothing about the limit of the sequence. There’s a key thing we do in analysis if we want a limit of a sequence but know nothing about it: we show that it is Cauchy. In order to use this, though, we’ll need to suppose that the metric space is complete (so that Cauchy sequences converge).
Then we want to show that this sequence is Cauchy. That is, we want as independently of each other, which means that for all there exists such that for all , .
Aha - we have . We know is Lipschitz, so (wlog ) this is . It would be very convenient if the expression were bounded, because then as , the will take care of the rest (since ).
But what else do we know about ? We’re going to need something to bound , but we don’t know anything about this expression - we only know about , by the Lipschitzness of . But in fact we can make in terms of those: is a metric, which means that it obeys the triangle inequality.
Hence . This we can bound: it’s . And, joy of joys, this sum is bounded, because the infinite sum .
Hence . This goes to as , so the sequence is Cauchy.
What did we assume?
In this section, we assumed that the space was complete.
Summary
So far, we have shown that the sequence is Cauchy, so it converges to a limit. We’ll call the limit : so we have as .
What next?
It feels like we’re very close to a result now. What we really want is for to be a fixed point: we need . Equivalently, we need ; but , so this is . This will be trivial if is continuous. But is Lipschitz, so it is uniformly continuous and hence continuous (this is a really simple lemma).
That is, is a fixed point of : we have proved that has a fixed point.
Extension
But we don’t have to stop there - if we’re drawing points together using , and we end up at a fixed point, surely there can’t be two fixed points (since if there were, would draw them together). Let’s aim to prove that ’s fixed point is unique, by supposing that are fixed points. Then , because are fixed points, and then , contradiction.
Summary
We have shown that there exists a unique fixed point of a Lipschitz function with Lipschitz constant on a non-empty complete metric space. Moreover, we have shown that for all , because we can perform this same construction of starting from any point . Even more, we have shown that convergence is geometrically fast (by the term). This is a really strong theorem, and all I needed to remember in order to construct it was that Lipschitz functions were important and that we were looking for information about fixed points. (I didn’t look up anything during the proof - I checked my statement of it afterwards, and it turned out to be correct. I didn’t change anything after I finished it.)