Sylow theorems

(This post is mostly to set up a kind of structure for the website; in particular, to be the first in a series of posts summarising some mathematical results I stumble across.)

EDIT: There is now an Anki deck of this proof, and a collection of poems summarising it.

In Part IB of the Mathematical Tripos (that is, second-year material), there is a course called Groups, Rings and Modules. I took it in the academic year 2012-2013, when it was lectured by Imre Leader. He told us that there were three main proofs of the Sylow theorems, two of which were horrible and one of which was nice; he presented the “nice” one. At the time, I thought this was the most beautiful proof of anything I’d ever seen, although other people have told me it’s a disgusting proof.

Theorem - the Sylow Theorems

Let \(G\) be a group, of order \(p^k m\) for some prime \(p\), where the HCF \((p,m) = 1\). Then:

  1. There is a subgroup \(H\) of \(G\), of order \(p^k\) (a Sylow p-subgroup);
  2. All such subgroups are conjugate to each other;
  3. The number of such subgroups, \(n_p\), satisfies \(n_p \equiv 1 \pmod p\) and \(n_p \mid m\).
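As a sanity check on the statement (not part of the proof), here is a minimal brute-force sketch in Python. It assumes we model \(S_4\), of order \(24 = 2^3 \cdot 3\), as tuples of images, and it relies on the fact that every Sylow subgroup of \(S_4\) can be generated by at most two elements. The helper names `compose`, `generate` and `subgroups_of_order` are made up for this illustration.

```python
from itertools import permutations

def compose(a, b):
    """Permutation 'a after b': apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))

def generate(gens, n):
    """Subgroup of S_n generated by gens (closure under composition)."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(g, x)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return frozenset(group)

def subgroups_of_order(G, n, order):
    """Subgroups of the given order generated by at most two elements of G."""
    found = set()
    for a in G:
        for b in G:
            H = generate([a, b], n)
            if len(H) == order:
                found.add(H)
    return found

n = 4
G = list(permutations(range(n)))        # S_4, order 24 = 2^3 * 3
sylow2 = subgroups_of_order(G, n, 8)    # p = 2, p^k = 8, m = 3
sylow3 = subgroups_of_order(G, n, 3)    # p = 3, p^k = 3, m = 8

for p, m, sylows in [(2, 3, sylow2), (3, 8, sylow3)]:
    n_p = len(sylows)
    # Print p, n_p, and whether n_p = 1 (mod p) and n_p divides m.
    print(p, n_p, n_p % p == 1, m % n_p == 0)
```

For \(S_4\) the counts come out as \(n_2 = 3\) and \(n_3 = 4\), both consistent with part 3 of the theorem.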


The proof goes as follows: pick a p-subgroup \(P\) of maximal size; then introduce its normaliser \(N\), and show that the orbit of \(P\) when \(G\) acts on its subgroups by conjugation is precisely the set of Sylow p-subgroups.

First Sylow theorem

The proof starts out in a natural way, by naming a subgroup \(P\) of order \(p^a\) for some \(a\). Such a subgroup certainly exists (taking \(k \geq 1\)), by Cauchy’s Theorem (which gives one with \(a=1\)). If we select \(a\) to be maximal, then we wish to show that \(a=k\), or equivalently (which seems even easier) that \(\dfrac{ \vert G \vert }{ \vert P \vert }\) is not a multiple of \(p\).

Now, how do we show that \(\dfrac{ \vert G \vert }{ \vert P \vert }\) is not a multiple of \(p\)? Well, we don’t know anything about such a quotient unless \(P\) is normal in \(G\). But we can’t guarantee this - so let’s introduce a subgroup, \(N\), in which \(P\) is normal. The natural one to pick, because we’re trying to make the subgroup as big as possible, is the normaliser \(N(P)\) - that is, \(\{g \in G : g P g^{-1} = P\}\), or \(\mathrm{Stab}_G(P)\) under the conjugation action. This is the largest subgroup of \(G\) in which \(P\) is normal.
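To make the normaliser concrete, here is a small Python sketch that computes it by brute force. The choice of \(G = S_4\) and of \(P\) as the subgroup generated by a 3-cycle is an assumption made purely for illustration, as are the helper names.

```python
from itertools import permutations

def compose(a, b):
    """Permutation 'a after b': apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, x in enumerate(a):
        inv[x] = i
    return tuple(inv)

def conjugate(g, H):
    """The subgroup g H g^{-1}."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in H)

G = list(permutations(range(4)))                 # S_4, order 24
c = (1, 2, 0, 3)                                 # the 3-cycle 0 -> 1 -> 2 -> 0
P = frozenset({(0, 1, 2, 3), c, compose(c, c)})  # subgroup of order 3

# Normaliser of P: the stabiliser of P under the conjugation action of G.
N = [g for g in G if conjugate(g, P) == P]

# The two indices |G|/|N| and |N|/|P|.
print(len(G) // len(N), len(N) // len(P))
```

Here \(N\) turns out to be the copy of \(S_3\) fixing the point 3, so \( \vert N \vert = 6\), and the two indices are \(4\) and \(2\), neither a multiple of \(3\).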

Then we want to show that \(\dfrac{ \vert G \vert }{ \vert N \vert } \times \dfrac{ \vert N \vert }{ \vert P \vert }\) is not a multiple of \(p\); this is true if and only if neither of the multiplicands is divisible by \(p\).

The second multiplicand

It looks like it will be easier to start with the second multiplicand, because it’s got a really really obvious interpretation.

We want to show that \(\dfrac{ \vert N \vert }{ \vert P \vert }\) is not a multiple of \(p\). Now, \(P\) is normal in \(N\), so the quotient group \(\dfrac{N}{P}\) exists, and by Lagrange’s Theorem we have \(\dfrac{ \vert N \vert }{ \vert P \vert } = \vert \dfrac{N}{P} \vert \).

Suppose \( \vert \dfrac{N}{P} \vert \equiv 0 \pmod p\). Then by Cauchy’s Theorem, there is an element \(h \in \dfrac{N}{P}\) such that the order \(o(h) = p\); let \(H = \langle h \rangle\), the group generated by \(h\). But we got to this quotient group \(\dfrac{N}{P}\) by applying the projection map \(\pi : N \rightarrow \dfrac{N}{P}\), so what happens when we “un-quotient” (that is, take the preimage under \(\pi\))? The preimage \(\pi^{-1}(H)\) is a subgroup of \(N\), and it has order \( \vert H \vert \vert P \vert \), because \(\pi\) is a \( \vert P \vert \)-to-one mapping. So \(\pi^{-1}(H) \leq N \leq G\) is a subgroup of order \(p \vert P \vert = p^{a+1}\). This contradicts the maximality of \(P\).

Hence \( \vert \dfrac{N}{P} \vert \not \equiv 0 \pmod p\).
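The “un-quotient” step can be watched in action on a p-subgroup that is deliberately not maximal. The sketch below assumes \(G = S_4\), \(p = 2\), and \(P\) generated by a double transposition (all choices made for illustration only); it pulls back a subgroup of order \(2\) in \(\dfrac{N}{P}\) to a 2-subgroup twice the size of \(P\).

```python
from itertools import permutations

def compose(a, b):
    """Permutation 'a after b': apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, x in enumerate(a):
        inv[x] = i
    return tuple(inv)

def conjugate(g, H):
    """The subgroup g H g^{-1}."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in H)

G = list(permutations(range(4)))        # S_4, with p = 2
t = (1, 0, 3, 2)                        # the double transposition (0 1)(2 3)
P = frozenset({(0, 1, 2, 3), t})        # a 2-subgroup of order 2, not maximal

N = [g for g in G if conjugate(g, P) == P]   # normaliser of P

# An element h of N whose coset hP has order 2 in N/P:
# h is not in P, but h*h is.
h = next(g for g in N if g not in P and compose(g, g) in P)

# Preimage of the subgroup {P, hP} of N/P under the quotient map: P union hP.
lifted = P | frozenset(compose(h, x) for x in P)

print(len(N) // len(P), len(lifted))
```

Since \( \vert \dfrac{N}{P} \vert = 4\) is divisible by \(2\) here, the lift exists and has order \(4 = p \vert P \vert \), which is exactly what cannot happen when \(P\) is maximal.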

The first multiplicand

The first multiplicand, \(\dfrac{ \vert G \vert }{ \vert N \vert }\): this is the number of conjugates of \(P\), by the Orbit-Stabiliser Theorem (by using the conjugation action: the stabiliser is \(N\); while the orbit of \(P\) is simply the set of conjugate subgroups). We want to show that this is not divisible by \(p\). We can do much more with the conjugates themselves, so let \(X = \{gPg^{-1} : g \in G\}\).

We would like to show that \( \vert X \vert \not \equiv 0 \pmod p\). This expression rings a bell - we’ve seen it before, as a key idea in the class equation. In order to use the class equation, we need to act on \(X\). There are only three groups we’ve met so far: \(N\), \(P\) and \(G\). The group we haven’t yet used is \(P\), and it’s a p-group (and we know a bit about actions of p-groups). What’s the only obvious action to use? It has to be conjugation.

Let \(P\) act on \(X\) by conjugation. Since the orbits partition the set \(X\) and have size dividing \( \vert P \vert \) (by the Orbit-Stabiliser Theorem), the size of each orbit is one of \(1, p, p^2, \dots , p^a = \vert P \vert \). \(P\) is clearly in an orbit all of its own (since \(x P x^{-1} = P\) for every \(x \in P\)). What we really want is for \(P = e P e^{-1}\) to be the only conjugate of \(P\) which is in an orbit of size \(1\), because then every other orbit has size divisible by \(p\), and so \( \vert X \vert \equiv 1 \pmod p\) (since the orbits partition the set).

Suppose we have \(g\) such that \(g P g^{-1}\) is in an orbit of size 1. Then \(x g P g^{-1} x^{-1} = g P g^{-1}\) for all \(x \in P\), and so (by conjugating with \(g^{-1}\)) we have \(g^{-1} x g P g^{-1} x^{-1} g = P\); that is, \(g^{-1} x g\) stabilises \(P\) and so is in \(N\). So \(g^{-1} P g\) is contained within \(N\).

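Now, we know that \(g^{-1} P g\) is contained within \(N\), so we can now use functions defined on \(N\). We have that \(\pi : N \rightarrow \dfrac{N}{P}\) (the quotient map) is a homomorphism with kernel \(P\). That is, \(\pi(P) = \{e\}\). Consider the subgroup \(\pi(g^{-1} P g)\) of \(\dfrac{N}{P}\). Its order divides \( \vert g^{-1} P g \vert = p^a\), because applying a homomorphism to a subgroup yields a subgroup of order dividing that of the original; so its order is a power of \(p\). Its order also divides \( \vert \dfrac{N}{P} \vert \), by Lagrange, and we have already shown that this is not a multiple of \(p\). The only power of \(p\) dividing a non-multiple of \(p\) is \(1\), so \(\pi(g^{-1} P g) = \{e\}\).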

Hence \(g^{-1} P g\) is contained in the kernel of \(\pi\). But it’s also the same size as \(P\) which is itself the kernel of \(\pi\). Hence \(g^{-1} P g = P\), and so \(g P g^{-1} = P\) too.

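So there is only one orbit of size \(1\), namely \(\{P\}\), and every other orbit has size divisible by \(p\). Hence, because orbits partition the set, \( \vert X \vert = \dfrac{ \vert G \vert }{ \vert N \vert } \equiv 1 \pmod p\), and in particular it is not divisible by \(p\).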
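The orbit count can be checked directly on a small example. This Python sketch again assumes \(G = S_4\) and \(P\) generated by a 3-cycle (illustrative choices, with made-up helper names); it builds the set \(X\) of conjugates and lists the orbit sizes when \(P\) acts on \(X\) by conjugation.

```python
from itertools import permutations

def compose(a, b):
    """Permutation 'a after b': apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, x in enumerate(a):
        inv[x] = i
    return tuple(inv)

def conjugate(g, H):
    """The subgroup g H g^{-1}."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in H)

def orbit(Y, H):
    """Orbit of the subgroup Y when the group H acts by conjugation."""
    return frozenset(conjugate(x, Y) for x in H)

G = list(permutations(range(4)))                 # S_4, with p = 3
c = (1, 2, 0, 3)                                 # the 3-cycle 0 -> 1 -> 2 -> 0
P = frozenset({(0, 1, 2, 3), c, compose(c, c)})  # subgroup of order 3

X = {conjugate(g, P) for g in G}                 # all conjugates of P
orbits = {orbit(Y, P) for Y in X}                # orbits of P acting on X
sizes = sorted(len(o) for o in orbits)

print(len(X), sizes)
```

In this example \( \vert X \vert = 4\), splitting as one orbit of size \(1\) (just \(P\) itself) and one orbit of size \(3\), so \( \vert X \vert \equiv 1 \pmod 3\).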

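Neither multiplicand is divisible by \(p\), so \(\dfrac{ \vert G \vert }{ \vert P \vert }\) isn’t either; that is, \(a = k\). This concludes the proof of the first Sylow theorem.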

Second Sylow theorem

Given a Sylow p-subgroup \(Q\) of \(G\), we want to show that it is conjugate to \(P\).

Use \(X\) as before, the set \(\{g P g^{-1} : g \in G\}\). In the first theorem, we had \(P\) acting on \(X\); now let’s use \(Q\) in the same way. We want to show that there is some \(g \in G\) such that \(g^{-1} Q g = P\), or equivalently that \(Q \in X\).

Let \(Q\) act on \(X\) by conjugation. We have that \( \vert X \vert \) is not a multiple of \(p\) by the earlier part, but \(X\) is a union of orbits, each of size \(p^s\) for some \(s \geq 0\) (since orbit sizes divide \( \vert Q \vert = p^k\)). If every orbit had \(s \geq 1\), then \( \vert X \vert \) would be a multiple of \(p\); hence some orbit has size \(1\). That is, there is a \(g \in G\) such that \(\{g P g^{-1}\}\) is an entire orbit when \(Q\) acts on \(X\); in other words, \(q g P g^{-1} q^{-1} = g P g^{-1}\) for all \(q \in Q\). Hence, as before, all elements of \(g^{-1} Q g\) fix \(P\) under conjugation, and hence \(g^{-1} Q g \subset N\).

Now, \(g^{-1} Q g \subset N\) so we can apply the projection map \(\pi\) to it. We show that \(\pi(g^{-1} Q g) = \{e\}\). Indeed, suppose it isn’t. Then \(H = \pi(g^{-1} Q g)\) is a non-trivial subgroup of \(\dfrac{N}{P}\), because \(g^{-1} Q g\) was a subgroup of \(N\). It has order dividing that of \(g^{-1} Q g\), which is \(p^k\), because applying a homomorphism to a subgroup yields a subgroup of order dividing that of the original - and so, being non-trivial, its order is a multiple of \(p\). Also, its order divides that of \(\dfrac{N}{P}\), by Lagrange, because it’s a subgroup of \(\dfrac{N}{P}\) - and this is not a multiple of \(p\). But now we have a multiple of \(p\) which divides a non-multiple of \(p\) - contradiction.

Hence \(\pi(g^{-1} Q g) = \{e\}\); that is, \(g^{-1} Q g \subset \mathrm{Ker}(\pi) = P\). Since \(g^{-1} Q g\) and \(P\) both have order \(p^k\), we get \(g^{-1} Q g = P\).
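Here is the same argument run on a concrete pair of Sylow subgroups, as a Python sketch. It assumes \(G = S_4\), with \(P\) and \(Q\) generated by two different 3-cycles (choices and helper names made up for illustration), and looks for the conjugates of \(P\) that are fixed when \(Q\) acts on \(X\).

```python
from itertools import permutations

def compose(a, b):
    """Permutation 'a after b': apply b first, then a."""
    return tuple(a[b[i]] for i in range(len(b)))

def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for i, x in enumerate(a):
        inv[x] = i
    return tuple(inv)

def conjugate(g, H):
    """The subgroup g H g^{-1}."""
    g_inv = inverse(g)
    return frozenset(compose(compose(g, h), g_inv) for h in H)

G = list(permutations(range(4)))                 # S_4, with p = 3
e = (0, 1, 2, 3)
c = (1, 2, 0, 3)                                 # 3-cycle on the points 0, 1, 2
d = (0, 2, 3, 1)                                 # 3-cycle on the points 1, 2, 3
P = frozenset({e, c, compose(c, c)})
Q = frozenset({e, d, compose(d, d)})             # another Sylow 3-subgroup

X = {conjugate(g, P) for g in G}                 # all conjugates of P

# Conjugates of P that sit in an orbit of size 1 when Q acts on X.
fixed = [Y for Y in X if all(conjugate(q, Y) == Y for q in Q)]

print(len(fixed), fixed[0] == Q)
```

The only conjugate of \(P\) fixed by \(Q\) is \(Q\) itself, which in particular shows \(Q \in X\).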

This concludes the proof of the second Sylow theorem.

Third Sylow theorem

We now want to show that the number \(n_p\) of Sylow p-subgroups is congruent to \(1\) modulo \(p\), and divides \(m\).

We certainly have that \(n_p = \vert X \vert \), because every Sylow p-subgroup is a conjugate of \(P\), but also every conjugate of \(P\) (that is, every member of \(X\)) is itself a subgroup of \(G\), and has the same size as \(P\), so is also a Sylow p-subgroup. Hence, just as before, \(n_p \equiv 1 \pmod p\).

Also, \(n_p\) is the size of an orbit under conjugation, and hence by the Orbit/Stabiliser Theorem, it divides \( \vert G \vert = p^k m\); but \(n_p \equiv 1 \pmod p\) means \(n_p\) does not have a factor of \(p\), so it must divide \(m\).
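The two arithmetic conditions are easy to tabulate. The following Python sketch lists the values of \(n_p\) that the third theorem permits for a given group order; the function name `sylow_count_candidates` is hypothetical, and the function only applies the two constraints, so it says nothing about which candidates actually occur.

```python
def sylow_count_candidates(order, p):
    """Values of n_p allowed by n_p = 1 (mod p) and n_p | m, where order = p^k * m."""
    m = order
    while m % p == 0:          # strip the full power of p, leaving m coprime to p
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

print(sylow_count_candidates(24, 2))   # order 24 = 2^3 * 3
print(sylow_count_candidates(24, 3))
print(sylow_count_candidates(15, 3), sylow_count_candidates(15, 5))
```

For order \(15\) the only candidate is \(1\) for both primes, so a group of order \(15\) has exactly one Sylow 3-subgroup and exactly one Sylow 5-subgroup.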

This concludes the proof of the third Sylow theorem.


So the proof went as follows:

  1. We’re looking for information about Sylow p-subgroups, so we pick the maximum possible p-subgroup and hope that it’s a Sylow one.
  2. How do we know whether this p-group is Sylow? If \(\dfrac{ \vert G \vert }{ \vert P \vert }\) is not divisible by \(p\).
  3. What can we do with a quotient? Not much, but we can use a quotient by a normal subgroup. We can’t guarantee that \(P\) is normal in \(G\), so we split up the fraction into \(\dfrac{ \vert G \vert }{ \vert N \vert }\) and \(\dfrac{ \vert N \vert }{ \vert P \vert }\), where \(N\) is a subgroup in which \(P\) is normal.
  4. What’s a good subgroup \(N\) to use? We have a choice. We’ll go for the normaliser \(N = N(P)\), because that gives a nice interpretation to \(\dfrac{ \vert G \vert }{ \vert N \vert }\). (But otherwise, this step seems a bit arbitrary to me.)
  5. Now we’ll go for \(\dfrac{ \vert N \vert }{ \vert P \vert }\); this is definitely something to do with the quotient group \(\dfrac{N}{P}\). Let’s imagine its size were divisible by \(p\); then we can use Cauchy on \(\dfrac{N}{P}\) and get a contradiction on moving back to \(N\).
  6. Let’s now consider \(\dfrac{ \vert G \vert }{ \vert N \vert }\); the normaliser is something to do with conjugates, so we’ll consider the conjugation action. Happily, this expression then becomes the size of the orbit of \(P\) under the conjugation action; call that orbit \(X\).
  7. We need \( \vert X \vert \not \equiv 0 \pmod p\). Remember the class equation; we want to act on \(X\) using a p-group. \(P\) is such a p-group, so we’ll let \(P\) act on \(X\). The only natural action to use is conjugation. We know straight away that \(P\) is in an orbit all to itself; we need it to be the only one.
  8. Name a conjugate of \(P\) which is in an orbit of size \(1\); call it \(g P g^{-1}\). We need this to be exactly \(P\), or equivalently we need \(g^{-1} P g = P\). It’s got the right size already, so we just need \(g^{-1} P g\) to be contained in \(P\). Here a leap of faith: what’s special about \(P\)? It’s the kernel of a homomorphism \(\pi: N \rightarrow \dfrac{N}{P}\) (because it’s a normal subgroup of \(N\)). So, after proving that \(\pi\) is defined on what we want to give as its arguments (that is, after showing that \(g^{-1} P g\) is contained in \(N\), or equivalently that all elements of \(g^{-1} P g\) stabilise \(P\) under conjugation), consider \(\pi(g^{-1} P g)\). Its order is a power of \(p\) which divides \( \vert \dfrac{N}{P} \vert \), and that is not a multiple of \(p\) by step 5; so it is \(\{e\}\), and hence \(g^{-1} P g\) is in the kernel of \(\pi\), and hence is a subset of \(P\), as required.
  9. Now the second theorem: all the Sylow p-subgroups need to be conjugate. Name a Sylow p-subgroup \(Q\), and have it act on \(X\) as above. Then in exactly the same way as in step 7, since \( \vert X \vert \) is not a multiple of \(p\), we have that there is some \(h \in G\) such that \(\{h P h^{-1}\}\) is an entire orbit under conjugation by \(Q\).
  10. Exactly as in step 8, the conjugate \(h P h^{-1}\) is on its own in an orbit, so \(P\) is fixed under conjugation by every element in \(h^{-1} Q h\). Hence \(H = h^{-1} Q h\) is contained within \(N\) and we can use \(\pi\). Suppose that \(H\) is not fully contained in the kernel of \(\pi\); then applying \(\pi\) to it gives us a non-trivial subgroup, whose order must be a power of \(p\) (from the fact that \(h^{-1} Q h\) had order a power of \(p\)); it also has order dividing that of \(\dfrac{N}{P}\), which is not a multiple of \(p\): contradiction.
  11. \(H\), a conjugate of \(Q\), is hence contained in the kernel of \(\pi\). Then since it is of the same size as the kernel, it must be the kernel, but that is \(P\).
  12. Now the third theorem: we’ve just shown that \(X\) is precisely the set of Sylow p-subgroups, so \( \vert X \vert \equiv 1 \pmod p\) is just what we want (but we’ve already shown it back in steps 7 and 8); and since \(X\) is also precisely the orbit of \(P\) when \(G\) acts on its subgroups by conjugation, its size must divide that of \(G\), and hence (having no factor of \(p\)) must divide \(m\).