The tiny proof that primes 1 mod 4 are sums of two squares

Recall the incredibly terse proof by Zagier that primes \(1 \pmod{4}\) can be expressed as sums of two squares:

The An involution is a function which is its own inverse. on the finite set \(S = \{(x, y, z) \in \mathbb{N}^3 : x^2 + 4yz = p \}\) defined by:

\( (x, y, z) \mapsto \begin{cases} (x+2z, z, y-x-z) & \text{if $x < y-z$} \\ (2y-x, y, x-y+z) & \text{if $y-z < x < 2y$} \\ (x-2y, x-y+z, y) & \text{if $x > 2y$} \end{cases} \)

has exactly one fixed point, so \(|S|\) is odd and the involution defined by \((x,y,z) \mapsto (x, z, y)\) also has a fixed point.

This proof is really quite beautiful, and I want to wave at how one might come up with it.

Core idea

The point is to look at the question of finding a solution to \(x^2 + y^2 = p\), and to note that this can be rephrased as finding a fixed point of a certain function. Moreover, if we build the function to have the property that it is an involution (that is, it is inverse to itself), then we get access to a wildly non-constructive but correspondingly perhaps-easier method to show that there is a fixed point. We then use exactly the same method in reverse, coupled with a different choice of involution which is absolute witchcraft, and the answer pops out.

Core idea, but concrete

An involution on a set can be viewed as a way to pair up elements of that set. For example, \(x \mapsto -x\) pairs up the reals, each one being paired with its negation.

Some elements, though, have a “degenerate” pairing formed this way, where we pair up the elemet with itself. (In the negations example, there’s exactly one degenerate element which gets paired up with itself, namely \(0\).) To be precise, an involution pairs up each element with exactly one other element, except for the fixed points of that involution. The fixed points of an involution are precisely those which aren’t in a pair but instead stand alone.

The involution is a permutation of the set. It's self-inverse, so the subgroup of the symmetric group generated by that permutation has size 2. Therefore the orbits of the set under the usual action of permutations on sets are of size 2 or 1, by the orbit-stabiliser theorem.

So to show that an involution has a fixed point, it’s enough to show that the set it’s permuting has an odd number of elements. Indeed, if the set has an odd number of elements, then we can’t possibly pair all up (that’s what it means for a number to be odd!), so in particular no involution can possibly pair them all up, and in even more particular our chosen involution can’t pair them all up.

Moreover, to show that a set has an odd number of elements, it’s enough to find some involution on that set which has an odd number of fixed points. In greater detail: if an involution has an odd number of fixed points, then the set consists of “everything which is paired with something” (necessarily there are evenly many of these, because we constructed it out of pairs), together with “everything which doesn’t get paired with something” (i.e. “the fixed points”, which we’ve just assumed is a set of odd cardinality); and even plus odd is odd.

The upshot is: we can show that one particular involution on a set has a fixed point by showing that some other involution of our choice on that set has an odd number of fixed points. (Then the set itself must have odd cardinality, so our original involution must have a fixed point because all involutions on that set must.)

The involution that would witness \(p\) being a sum of two squares

We want \(x^2 + y^2 = p\).

Since \(p\) was assumed to be odd, we know by parity that exactly one of \(x\) and \(y\) is even. So (since squaring and square-rooting preserves parity) we could instead assume \(y\) is even and divide through to write the equation as \(x^2 + 4y^2 = p\).

The magic rabbit out of the hat is to introduce a third variable \(z\), so that we’re not finding solutions to \(x^2 + 4y^2 = p\) but instead \(x^2 + 4yz = p\) subject to the constraint that \(y=z\). Then that’s just the same as asking for a fixed point of the involution \((x, y, z) \mapsto (x, z, y)\) where the involution is defined on a set whose members are all solutions to \(x^2 + 4yz = p\).

This involution is super general: it’s an involution with only very weak restrictions on its domain (that is, you need to be able to swap \(y\) with \(z\), but nothing else matters). To make that involution useful for our problem, we’ve additionally assumed that it’s defined on a subset of \(\mathbb{Z}^3\), so that our fixed point is giving us integer solutions to the equation.

The other half of the proof requires defining an involution on that same subset with an odd number of fixed points, and that’s not necessarily easy; we’ll later find it helpful to restrict the domain further so that that involution has fewer possible fixed points.

The involution that would witness \(|S|\) being odd

Now we’ve defined \(S = {(x, y, z) : x^2 + 4yz = p}\) - which took the magic idea of introducing the third variable. So far, we’ve also hand-waved around the exact domain because we don’t want to paint ourselves into a corner: we need all the flexibility we can get to define an involution that has an odd number of fixed points (at which point we’ll be done with the entire proof).

In fact, we’ll make it conceptually easier for ourselves and go for an involution that has exactly one fixed point.

Choosing the fixed point

How can we guarantee exactly one fixed point? Well, since \(p\) is \(1 \pmod{4}\), and we need some way for our proof to fail when \(p\) violates the hypotheses (otherwise we might prove too much and show the falsehood that every number is a sum of two squares!), it’s not too much of a stretch to declare that our fixed point is going to be \((x, y, z) = \left(1, 1, \frac{p-1}{4} \right)\). (I chose that because it’s definitely in \(S\), so the line of the proof that’s going to fail when \(p\) violates the hypothesis is the line that says “this proposed fixed point is in \(S\)”.)

Now we “just” need to build our involution of some subset of \(S\), in such a way that the above \((x, y, z)\) is a fixed point, and in such a way that there are no other fixed points! (This isn’t necessarily as hard as it sounds: there’s a lot of freedom, especially because we’ve now listed literally all the requirements we’re imposing on the involution.)

Possible functions which stay in \(S\)

We can make this a fixed point with something like \((x, y, z) \mapsto (y, x, z)\), but we do need to make sure that we stay within \(S\), and we don’t want to use up too much freedom by defining \(S\) so early to be some perverse subset of what we’ve currently got.

Let’s just hope for the best, and define \(y \mapsto y\). (We could also have tried keeping the \(x\) coordinate unchanged. I haven’t checked happens when you try that.)

Then \((x, y, z) \mapsto (f(x,y,z), y, g(x,y,z))\), and also \(f(x,y,z)^2 + 4 y g(x,y,z) = p\) (which is \(x^2 + 4yz\)).

How can we make the left and right hand sides look more alike? \(f\) is definitely going to want to contain an \(x\) component so that we get the squared terms to simplify, so say \(f(x,y,z) = \pm x + f’(x, y, z)\). Expanding, \(x^2 \pm 2 x f’ + f’^2 + 4yg = x^2 + 4yz\); so \(\pm 2xf’ + f’^2 + 4yg = 4yz\).

This is screaming at us to factor out at least a \(2\), so let \(f’ := 2 f’’ \); then \(\pm xf’’ + f’’^2 + yg = yz\). Similarly, we now want to factor out at least a \(y\), so let \(f’’ = y \hat{f}\); then \(\pm x \hat{f} + y \hat{f}^2 + g = z\) (assuming \(y\) is nonzero, but we can assume whatever we want when we’re playing around, as long as we check our final answer works at the end).

To summarise, our involution so far looks like \((x, y, z) \mapsto (\pm x + 2y \hat{f}, y, z \mp x \hat{f} - y \hat{f}^2)\). All of this was done essentially without loss of generality: any generality we’ve lost can be made up for in \(\hat{f}\).

Making sure this has the right fixed point

We haven’t yet done anything to make this be an involution - everything so far was aimed at making sure we stayed within \(S\). We also haven’t yet made sure that \(\left(1, 1, \frac{p-1}{4}\right)\) is a fixed point, so let’s plug those in (writing \(k\) for the fraction): \((1, 1, k) = (\pm 1 + 2 \hat{f}(1, 1, k), 1, k \mp \hat{f}(1, 1, k) - \hat{f}(1, 1, k)^2)\).

Looking at the \(x\)-coordinate, we see that either \(\hat{f} = 0\) or \(\hat{f} = 1\) at these coordinates. Moreover, if we look at what it means to have an involution, we end up with a very complicated constraint on what \(\hat{f}\) must be at certain points, but actually that condition is trivial if \(\hat{f} = 0\) always, or if \(\hat{f} = 1\) always and the \(\pm\) is the negative case. Before we tie ourselves completely in knots, it’s worth checking whether either of those will do.

Firstly, if \(\hat{f} = 0\) then our involution is \((x, y, z) \mapsto (\pm x, y, z)\). We’ll discard the positive case straight away because that’s got all the fixed points in the world. In the negative case: this certainly is an involution, and it’s indeed well-defined as a function \(S \to S\), but it’s not got \((1, 1, k)\) as a fixed point.

So we try \(\hat{f} = 1\). Then the involution is \((x, y, z) \mapsto (2y-x, y, z + x - y)\).

Checking for any other fixed points

We constructed this to be an involution, but we should sanity-check at this point, because some of my reasoning was a bit sloppy (e.g. I didn’t check that its image was a subset of \(S\)!).

  • If \((x, y, z) \in S\), then we need \((2y-x, y, z+x-y) \in S\). This follows from some handle-cranking algebra.
  • It’s got a fixed point at \((1, 1, k)\) for any \(k\).

But in fact its fixed points are precisely when \(x=y\), which might be a bit of a problem, because we wanted to ensure there was exactly one fixed point. So far, we’ve got fixed points for all \(x, z\) such that \(x^2 + 4xz = p\).

That’s not such a problem, though. The left-hand side is divisible by \(x\), so (since the right-hand side is prime) we have \(x = 0\) (impossible because \(p\) is positive), or \(x = 1\) (in which case there’s exactly one solution for \(z\)), or \(x = p\) (which we can rule out by decreeing that \(x,y,z\) all be positive, whereupon this choice would make \(z\) negative).

Adjusting for the further restriction we just applied

We’re actually kind of done! The only problem is that we ruled out \(x=p\) by decreeing that the members of \(S\) were nonnegative, which means our involution now no longer maps into \(S\). So we just need to carve up the involution so that it does what we’ve just described wherever that lands in \(S\), and otherwise we define it to be anything that retains the “involution” property but has no fixed points.

When does our involution escape \(S\)? That’s when \(2y-x < 0\) or \(z+x-y < 0\). We can do whatever we want to make sure we don’t leave any fixed points in our new definitions in those ranges; one way is just to send the points from one case into the other case, and vice versa.

If \(2y < x\), then, we’ll be sending \((x,y,z)\) to some new values such that \(x < y-z\), and vice versa. We also need to preserve the property that \(x^2 + 4yz = p\), and we need to stay nonnegative.

For the first case, \(x\) is very big, so let’s just translate it down to be small enough: \(x \mapsto x - 2y\). (It’s got to stay positive, so this is the obvious thing to try: make it as close as possible to being negative.) Then we have \((x-2y)^2 + 4 \hat{y} \hat{z} = p\), which after a litle algebra (using that \(x^2 + 4yz = p\)) simplifies by Mathematica’s FullSimplify to \(\hat{y} \hat{z} = y (x - y + z) \). That suggests two possible ways to define the unknown \(\hat{y}, \hat{z}\), and once we’ve done that, it only remains to find the right one of those two ways!

Summary

There was only really one magic idea: we can express \(p\) as a sum of two squares if we can find a fixed point of a certain involution. I don’t have a good reason why one might not consider switching out \(x^2 = xz\) instead to get an involution. (I haven’t tried it, and maybe it actually works!) “Could this be a fixed point of an involution?” is the sort of magic which you could have in the general box of tricks: it doesn’t strike me as being particularly specialised, it just happens that this is a problem on which it works beautifully.

The rest comes from following your nose, and in fact the situation turns out to be very welcoming:

  • There was actually an involution with the right properties that kept one coordinate constant.
  • It could easily have been that a much more complicated \(\hat{f}\) was needed.