Invitation to Modular Forms: Representation of an integer into sum of eight squares

Created: August 23, 2025 8:54 PM

Last Updated: September 13, 2025

Note: This is an edited reupload intended to be hosted on my blog. The reason is that I received complaints about performance issues with the old site hosted on Notion, so here is a spare copy. There are minor typographical fixes but no changes to the content.

Introduction

We’re interested in the following computational problem: given n \in \mathbb{N}, in how many ways (i.e., for how many tuples (x_1, \dots, x_8) \in \mathbb{Z}^8) can n be written as n = x_1^2 + \dots + x_8^2?

Our basic computer programming knowledge tells us that we can design a naive algorithm as follows:

n = int(input())
ans = 0
for x1 in range(-n, n+1):
    for x2 in range(-n, n+1):
        for x3 in range(-n, n+1):
            for x4 in range(-n, n+1):
                for x5 in range(-n, n+1):
                    for x6 in range(-n, n+1):
                        for x7 in range(-n, n+1):
                            for x8 in range(-n, n+1):
                                if x1**2 + x2**2 + x3**2 + x4**2 + x5**2 + x6**2 + x7**2 + x8**2 == n:
                                    ans += 1
print(ans)

Of course, this has time complexity O(n^8), which is not very good. A basic, obvious improvement is to observe that x^2 \geq 0 for all x \in \mathbb{Z}, so for a tuple to work, no entry can exceed \sqrt{n} in absolute value: otherwise (say, if |x_1| > \sqrt{n}) we would have x_1^2 + \dots + x_8^2 > n + x_2^2 + \dots + x_8^2 \geq n. This shrinks each loop to size O(\sqrt{n}).

n = int(input())
ans = 0
k = int(n**0.5)+1
for x1 in range(-k, k+1):
    for x2 in range(-k, k+1):
        for x3 in range(-k, k+1):
            for x4 in range(-k, k+1):
                for x5 in range(-k, k+1):
                    for x6 in range(-k, k+1):
                        for x7 in range(-k, k+1):
                            for x8 in range(-k, k+1):
                                if x1**2 + x2**2 + x3**2 + x4**2 + x5**2 + x6**2 + x7**2 + x8**2 == n:
                                    ans += 1
print(ans)

This takes time O(n4)O(n^4), a bit better, huh?

Obviously, through the eyes of an algorithmist, we can do better. One standard move when we see something of this form is the meet-in-the-middle trick: instead of solving x_1^2 + \dots + x_8^2 = n directly, we solve x_1^2 + x_2^2 + x_3^2 + x_4^2 = m and x_5^2 + x_6^2 + x_7^2 + x_8^2 = n-m, count the solutions of each, multiply the two counts, and sum over m \in \{0, \dots, n\}. Let us try to implement this.

n = int(input())

def count_sol(m):
    k = int(m**0.5)+1
    local_ans = 0
    for a in range(-k, k+1):
        for b in range(-k, k+1):
            for c in range(-k, k+1):
                for d in range(-k, k+1):
                    if a**2 + b**2 + c**2 + d**2 == m:
                        local_ans += 1
    return local_ans

ans = 0
for m in range(n+1):
    ans += count_sol(m) * count_sol(n-m)
print(ans)

The count_sol subroutine runs in time O(\sqrt{m}^4) = O(m^2). We call it 2(n+1) times, each with m \leq n, so the total time is 2(n+1) \cdot O(n^2) = O(n^3).

Can we do better? Still yes. Skilled algorithmists might even come up with this formulation immediately (note: I’m not that good; I asked a bunch of friends and they immediately gave me the following formulation). We formulate this as a knapsack-style dynamic program. Define \mathrm{dp}_{i,j} to be the quantity \#\{(x_1, \dots, x_i) \in \mathbb{Z}^i \colon \sum_{k=1}^i x_k^2 = j\}, and look for a recurrence.

In order to know the quantity dpi,j\mathrm{dp}_{i,j}, we can enumerate all possible xix_i’s and observe that

\begin{align*} &\{(x_1, \dots, x_i) \in \mathbb{Z}^i \colon \sum_{k=1}^i x_k^2 = j\} \\ &= \bigcup_{0 \leq x_i \leq \sqrt{j}} \{(x_1, \dots, x_{i-1}) \in \mathbb{Z}^{i-1} \colon \sum_{k=1}^{i} x_k^2 = j\} \times \{(x_i), (-x_i)\} \\ &= \bigcup_{0 \leq x_i \leq \sqrt{j}} \{(x_1, \dots, x_{i-1}) \in \mathbb{Z}^{i-1} \colon \sum_{k=1}^{i-1} x_k^2 = j-x_i^2\} \times \{(x_i), (-x_i)\}. \end{align*}

Observe that the union is disjoint, so the cardinality of the whole set is the sum of the cardinalities of the pieces, and therefore

\mathrm{dp}_{i,j} = \sum_{1 \leq a \leq \sqrt{j}} 2\mathrm{dp}_{i-1, j-a^2} + \mathrm{dp}_{i-1,j}.

With this formula, we can implement our dynamic programming algorithm as follows.

n = int(input())
dp = [[0 for j in range(n+1)] for i in range(9)]
dp[0][0] = 1
for i in range(1, 9):
    for j in range(0, n+1):
        dp[i][j] += dp[i-1][j]
        for a in range(1, int(j**0.5)+1):
            if j >= a**2:
                dp[i][j] += 2*dp[i-1][j-a**2]
print(dp[8][n])

This takes time O(n^{3/2}), already a big improvement over what we had before. Is this the end? Can we do better? An algorithmist might give up at this point, but I refuse to give up just yet! Can we use some mathematical magic to improve this even more? It turns out that we can!
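As a quick sanity check (this is my own addition; the helper names r8_dp and r8_mitm are mine), we can wrap the dynamic-programming table and the earlier meet-in-the-middle routine into functions and compare them on small inputs; they should agree exactly.

def r8_dp(n):
    # dp[i][j] = number of (x_1, ..., x_i) in Z^i with x_1^2 + ... + x_i^2 = j
    dp = [[0] * (n + 1) for _ in range(9)]
    dp[0][0] = 1
    for i in range(1, 9):
        for j in range(n + 1):
            dp[i][j] = dp[i - 1][j]          # contribution of x_i = 0
            a = 1
            while a * a <= j:
                dp[i][j] += 2 * dp[i - 1][j - a * a]   # x_i = +a or -a
                a += 1
    return dp[8][n]

def count_sol4(m):
    # number of (a, b, c, d) in Z^4 with a^2 + b^2 + c^2 + d^2 = m (brute force)
    k = int(m ** 0.5) + 1
    return sum(1 for a in range(-k, k + 1) for b in range(-k, k + 1)
                 for c in range(-k, k + 1) for d in range(-k, k + 1)
                 if a * a + b * b + c * c + d * d == m)

def r8_mitm(n):
    return sum(count_sol4(m) * count_sol4(n - m) for m in range(n + 1))

for n in range(15):
    assert r8_dp(n) == r8_mitm(n)
print("the two counts agree for all n < 15")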

In this article, I’m going to invite you into the fascinating (and relatively deep) theory of modular forms, not just as an algorithmist, but as an algorithmic number theorist.

The prerequisites for filling in all the gaps may be quite demanding, but I intend to make this suitable for a general audience, so when a deep mathematical result appears and you’re not used to it, please feel free to “just assume the claim is true, move on to see how things fit, and try to understand the proof later”. The true prerequisites are:

Acknowledgement

I’d like to dedicate this section to thank my professor, Emmanuel Kowalski, for teaching me the basics of number theory and modular forms. I want to also thank a friend of mine, Pitchayut (Mark) Saengrungkongka, for helping me through mathematical obstacles during my studies. Finally, after all the messes I’ve made in my life, I’d like to thank everyone who still believes in me, who still believes that I can become a mathematician too, despite my sluggish brain and the distractions that slowed me down.

Disclaimer. Any mistakes found in this article are my own. I’m more than happy to receive feedback and reports of errors.

Now let us begin.

Modular Forms

Note. The main reference for this whole section is the course Number Theory II: Introduction to Modular Forms given by Emmanuel Kowalski (see https://metaphor.ethz.ch/x/2025/fs/401-3112-23L/). Other small references will be given throughout the article.

It is quite surprising that modular forms are studied so extensively and, even though they are initially of a number-theoretic nature, now have applications in very different areas of mathematics. When I first saw them, they looked very artificial, as if they came out of nowhere. (Who on earth would have thought up this structure???) The following quote is attributed to Martin Eichler.

There are five fundamental operations in mathematics: addition, subtraction, multiplication, division and modular forms.

So what is a modular form? In simple terms, it is a function f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} that satisfies the following property (here H:={zC ⁣:Im(z)>0}\mathbb{H} := \{z \in \mathbb{C} \colon \operatorname{Im}(z) > 0\} is the Poincaré upper-half plane):

There exists an integer k0k \geq 0 such that f(az+bcz+d)=(cz+d)kf(z)f(\frac{az+b}{cz+d}) = (cz+d)^kf(z) for all zHz \in \mathbb{H} and for all a,b,c,dZa, b, c, d \in \mathbb{Z} such that adbc=1ad-bc = 1.

If furthermore, ff is holomorphic everywhere in H\mathbb{H}, we say that it is a holomorphic modular form of weight kk and of level 11. For meromorphic modular forms, we don’t require ff to be defined on the whole H\mathbb{H} — there can be isolated poles, only that the relation f(az+bcz+d)=(cz+d)kf(z)f(\frac{az+b}{cz+d}) = (cz+d)^kf(z) holds for all zz such that f(z)f(z) is defined (i.e. not a pole). We will generalize to other levels later.

Now we ramp up the machinery a bit for some spiciness. Define \mathrm{GL}_n(K) as the set of n \times n invertible matrices with entries in the field K; equivalently, it is the set of n \times n matrices with nonzero determinant. Since \det(AB) = \det(A)\det(B) for square matrices A, B of the same size, and by invertibility, we see that \mathrm{GL}_n(K) has a group structure given by multiplication. Now, define the subset \mathrm{SL}_n(K) \subseteq \mathrm{GL}_n(K) of matrices with determinant one. This subset is also a group, so it's a subgroup. If we try to generalize to the case where K is not a field but a ring such as \mathbb{Z}, the "nonzero determinant" description of \mathrm{GL}_n runs into an immediate problem: the inverse of an integer matrix need not have integer entries, so that set is not a group anymore. Fortunately, for \mathrm{SL}_2(\mathbb{Z}) there is no problem. If \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z}) is a matrix with determinant one, i.e. ad-bc = 1, then consider the matrix \begin{pmatrix}d & -b\\-c & a\end{pmatrix}. We have

\begin{pmatrix}a & b\\c & d\end{pmatrix}\begin{pmatrix}d & -b\\-c & a\end{pmatrix} = \begin{pmatrix}ad-bc & -ab+ba\\cd-dc & -cb+da\end{pmatrix} = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}.

This means \begin{pmatrix}d & -b\\-c & a\end{pmatrix} is the inverse of \begin{pmatrix}a & b\\c & d\end{pmatrix}, and it again has integer entries and determinant one! So \mathrm{SL}_2(\mathbb{Z}) forms a group.
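As a tiny sanity check (my own snippet, not from the original text), we can verify this inverse formula on a few determinant-one integer matrices:

# check that (a, b; c, d)^(-1) = (d, -b; -c, a) whenever ad - bc = 1

def mat_mul(M, N):
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def inv_sl2(M):
    (a, b), (c, d) = M
    return ((d, -b), (-c, a))

identity = ((1, 0), (0, 1))
for M in (((1, 1), (0, 1)), ((0, -1), (1, 0)), ((2, 3), (1, 2)), ((5, 7), (2, 3))):
    (a, b), (c, d) = M
    assert a * d - b * c == 1
    assert mat_mul(M, inv_sl2(M)) == identity and mat_mul(inv_sl2(M), M) == identity
print("inverse formula verified on the sample matrices")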

Now, define a group action k|_k of SL2(Z)\mathrm{SL}_2(\mathbb{Z}) acting from the right on the space of functions HC\mathbb{H} \to \mathbb{C} by

f \Big\vert_k \begin{pmatrix}a & b\\c & d\end{pmatrix} := \left(z \mapsto f\left(\frac{az+b}{cz+d}\right)(cz+d)^{-k}\right).

It restricts to the space of holomorphic functions. Now, in fancier words, we can define modular forms of weight kk and of level 11 to be functions f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} such that fkg=ff|_kg = f for all gSL2(Z)g \in \mathrm{SL}_2(\mathbb{Z}), i.e., it is the subset of (SL2(Z),k)(\mathrm{SL}_2(\mathbb{Z}), |_k)-invariant functions. In particular, it is enough to show that ff is invariant under a generating set of SL2(Z)\mathrm{SL}_2(\mathbb{Z}) to conclude that it is invariant under the whole group. We will use this fact later.

Definition. (Modular form of weight kk and of level 11) A function f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} is said to be a modular form of weight kk and of level 11 if ff is invariant under the group action k|_k of the group SL2(Z)\operatorname{SL}_2(\mathbb{Z}).
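To make the generating-set remark concrete (a small addition of mine, anticipating Theorem 1 below, which shows that \operatorname{SL}_2(\mathbb{Z}) is generated by T = \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix} and S = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}): weight-k modularity then amounts to the two identities

f(z+1) = f(z) \qquad\text{and}\qquad f\left(\frac{-1}{z}\right) = z^k f(z) \qquad \text{for all } z \in \mathbb{H},

which are exactly f|_kT = f and f|_kS = f.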

Now, consider the action of \mathrm{SL}_2(\mathbb{Z}) on \mathbb{H} defined by \begin{pmatrix}a & b\\c & d\end{pmatrix} \cdot z = \frac{az+b}{cz+d}. This action sends a point of \mathbb{H} to another (possibly the same) point of \mathbb{H}. We can start by just being curious about this action and asking natural questions. First of all, given a fixed z \in \mathbb{H}, what is its orbit under the action? Well, we can pick \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix} from \mathrm{SL}_2(\mathbb{Z}) and observe that it sends z to \frac{1z+1}{0z+1} = z+1. Therefore, z+n is in the orbit for all n \in \mathbb{Z}. Of course, this is not the end. Let's try something different: \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z}), which sends z \mapsto \frac{-1}{z}. It may be hard to visualize at first, but observe that if |z| > 1, then |\frac{-1}{z}| = \frac{1}{|z|} < 1, so a point outside the unit disk can be sent somewhere inside the unit disk. This suggests that "it is enough to consider points inside a vertical strip of width one, because points outside the strip can be reached from points inside it by the group action", and also that "it is enough to consider points inside the unit disk, because points outside the unit disk can be reached via the transformation z \mapsto -1/z". It turns out this is something we will state in more precise terms and use repeatedly afterwards (but for reasons that will come later, instead of the inside of the disk, we'd rather use the outside).

The Fundamental Domain

Definition. The fundamental domain is defined as D:={zH ⁣:z1;Re(z)12}\mathcal{D} := \{z \in \mathbb{H} \colon |z| \geq 1; |\operatorname{Re}(z)| \leq \frac{1}{2}\}.

Theorem 1 [Serre, Ch. VII, §1, Th 1. and 2.].

  1. For every zHz \in \mathbb{H} there exists gSL2(Z)g \in \mathrm{SL}_2(\mathbb{Z}) such that gzDgz \in \mathcal{D}.
  2. Suppose z,zDz, z' \in \mathcal{D} are two distinct points, such that there exists gSL2(Z)g \in \mathrm{SL}_2(\mathbb{Z}) with z=gzz' = g \cdot z, then either (a) Re(z){12,12}\operatorname{Re}(z) \in \{-\frac{1}{2}, \frac{1}{2}\} and z=z±1z = z' \pm 1, or (b) z=1|z| = 1 and z=1zz' = \frac{-1}{z}.
  3. For all z \in \mathcal{D}^\circ, \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z) = \{\pm\operatorname{id}\} (note that -\operatorname{id} acts trivially on \mathbb{H}).
  4. The group SL2(Z)\mathrm{SL}_2(\mathbb{Z}) is generated by S=(0110)S = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix} and T=(1101)T = \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix}.

Proof. Define G:=S,TSL2(Z)G' := \langle S, T \rangle \leq \mathrm{SL}_2(\mathbb{Z}). We will eventually show that G=SL2(Z)G' = \mathrm{SL}_2(\mathbb{Z}) later, but now let us show 1. Take zHz \in \mathbb{H}, and let g=(abcd)Gg = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in G'be arbitrary first (we will fix it later). We have

\begin{align*} \operatorname{Im}(gz) &= \operatorname{Im}\left(\frac{az+b}{cz+d}\right) \\ &= \operatorname{Im}\left(\frac{(az+b)\overline{(cz+d)}}{|cz+d|^2}\right) \\ &= \frac{1}{|cz+d|^2}\operatorname{Im}(a\overline{c}|z|^2+b\overline{cz} + a\overline{d}z + b\overline{d})\\ &= \frac{1}{|cz+d|^2} \operatorname{Im}(bc\overline{z}+adz)\\ &= \frac{1}{|cz+d|^2} (ad-bc) \operatorname{Im}(z)\\ &= \frac{\operatorname{Im}(z)}{|cz+d|^2}. \end{align*}

(We used the simple facts that a,b,c,dZa, b, c, d \in \mathbb{Z}, so c=c;d=d\overline{c} = c; \overline{d} = d. We know that adbc=1ad-bc = 1, and we know that Im(z)=Im(z)\operatorname{Im}(\overline{z}) = -\operatorname{Im}(z).) Now, the idea is that for a given zz, we will try to send it as far away (from the origin) as possible, then slide back by repeatedly applying TT until it stays in D\mathcal{D}. For any fixed M>0M > 0, The set Sz,M:={(c,d)Z2 ⁣:cz+dM}S_{z, M} :=\left\{(c, d) \in \mathbb{Z}^2 \colon |cz+d| \leq M\right\} is finite. To see this, write z=x+iyz = x+iy and observe that cz+dM|cz+d| \leq M if and only if M2cz+d2=c(x+iy)+d2=(cx+d)2+(cy)2M^2 \geq |cz+d|^2 = |c(x+iy)+d|^2 = (cx+d)^2 + (cy)^2. So if (c,d)(c, d) is in that set, then cMy|c| \leq \frac{M}{|y|} (finitely many choices), and for a fixed cc, the integer dd must satisfy cx+dM|cx+d| \leq M, that is, d[cxM,cx+M]d \in [-cx-M, -cx+M], which contains finitely many integers. This proves that the set Sz,MS_{z, M} is finite. If we fix M=zM = |z|, then we see that (1,0)Sz,M(1, 0) \in S_{z, M}, so the set is not empty. Furthermore, we’re looking for such pair with smallest cz+d|cz+d|, so enlarging MM wouldn’t help. This means min(c,d)Z2cz+d\min_{(c,d) \in \mathbb{Z}^2} |cz+d| exists, with (c,d)Sz,z(c, d) \in S_{z, |z|}. However, we want to only look at pairs corresponding to elements of GG', so we consider the set

\tilde{S}_z := \left\{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in G' \colon |cz+d| \leq |z|\right\}.

Well, SS~zS \in \tilde{S}_z so it is not empty, and there are only finitely many choices of (c,d)(c, d) here, so, to minimize cz+d|cz+d|, we loop over all possible pairs in Sz,zS_{z,|z|}, and check if there exists a pair (a,b)(a, b) making (abcd)S~z\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \tilde{S}_z. We don’t need to loop over all possible (a,b)(a, b). Just that if one exists, we track the minimum over such cz+d|cz+d|. This shows that there exists gS~zg \in \tilde{S}_z minimizing cz+d|cz+d|.

Now that we have g \in G', we want to slide gz back into the strip [-\frac{1}{2},\frac{1}{2}]+i\mathbb{R}_{>0}. Indeed, pick n \in \mathbb{Z} such that \operatorname{Re}(gz)+n \in [-\frac{1}{2},\frac{1}{2}], so that \operatorname{Re}(T^ngz) \in [-\frac{1}{2},\frac{1}{2}]. Furthermore, we have |T^ngz| \geq 1, because otherwise |T^ngz| < 1 and, considering ST^ng \in G', we would get

\operatorname{Im}(ST^ngz) = \frac{\operatorname{Im}(T^ngz)}{|T^ngz|^2} > \operatorname{Im}(T^ngz) = \operatorname{Im}(gz),

a contradiction of the maximality of Im(gz)\operatorname{Im}(gz) for gGg \in G'. This shows that TngzDT^ngz \in \mathcal{D}.
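The proof of part 1 is essentially an algorithm: repeatedly translate by a power of T to bring the real part into [-\frac{1}{2}, \frac{1}{2}], and apply S whenever the point lies strictly inside the unit circle (which strictly increases the imaginary part). Here is a minimal numerical sketch of that procedure; it is my own illustration, not part of the original argument, and it only handles boundary cases up to a small tolerance.

# reduce a point of H into the fundamental domain D using only
# T: z -> z + 1 and S: z -> -1/z (minimal sketch, floating point only)

def reduce_to_fundamental_domain(z, eps=1e-12):
    assert z.imag > 0
    while True:
        n = round(z.real)      # translate so that Re(z) lands in [-1/2, 1/2]
        z -= n
        if abs(z) < 1 - eps:
            z = -1 / z         # apply S; this strictly increases Im(z)
        else:
            return z

w = reduce_to_fundamental_domain(0.37 + 0.002j)
print(w, abs(w) >= 1 - 1e-9, abs(w.real) <= 0.5 + 1e-9)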

Phew. That was long. Now let us move to the second point (which is even longer, because I want to make it very clear to myself, and not leave the casework to the reader). We now want to show that if two distinct points of the fundamental domain are congruent modulo G', then they fall into a few special cases only. Suppose z, z' \in \mathcal{D} are two distinct points and g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in G' is such that z' = gz. Assume without loss of generality that \operatorname{Im}(z) \leq \operatorname{Im}(z') (otherwise swap z and z'). Then \operatorname{Im}(z) \leq \operatorname{Im}(gz) = \frac{\operatorname{Im}(z)}{|cz+d|^2}, so |cz+d| \leq 1. Write z = x+iy with |x| \leq \frac{1}{2} and y \geq \frac{\sqrt{3}}{2}. If |c| \geq 2, then

\begin{align*} |cz+d|^2 &= |cx+icy+d|^2 = (cx+d)^2 + (cy)^2 \\ &\geq c^2y^2 \geq 4 \frac{\sqrt{3}^2}{2^2} = 3, \end{align*}

a contradiction. So c<2|c| < 2, i.e. c{1,0,1}c \in \{-1, 0, 1\}. If c=0c = 0 then cz+d1|cz+d| \leq 1 implies d{1,1}d \in \{-1, 1\} (remember that c=d=0c = d = 0 is not possible since this means detg=0\det g = 0, impossible in GSL2(Z)G' \subseteq \mathrm{SL}_2(\mathbb{Z})). Furthermore, in SL2(Z)\mathrm{SL}_2(\mathbb{Z}), we have adbc=1ad - bc = 1 so for c=0c = 0, ad=1ad = 1 means a=da = d. This means z=gz=az+bd=z±(integer)z' = gz = \frac{az+b}{d} = z \pm (\text{integer}), but z,zDz, z' \in \mathcal{D} so such integer must be ±1\pm 1 only. And in this case, Re(z)Re(z)=1|\operatorname{Re}(z)-\operatorname{Re}(z')| = 1, so they must lie on somewhere inside the vertical boundaries of D\mathcal{D}. This concludes that in case c=0c = 0, we have Re(z)=±12\operatorname{Re}(z) = \pm \frac{1}{2} and z=z±1z' = z \pm 1, i.e., case (a) in the theorem.

Now, for the case c \in \{-1, 1\} and d \ne 0 (the subcase d = 0 is treated separately below as case (b.2)): we have 1 \geq |cz+d|^2 = |cx+icy+d|^2 = (cx+d)^2 + y^2 \geq (cx+d)^2 + \frac{3}{4}. This means |cx+d| \leq \frac{1}{2}. So d cannot be too far away: |d| > 1 is impossible, because then cx \in [-d-\frac{1}{2}, -d+\frac{1}{2}] is incompatible with |cx| = |x| \leq \frac{1}{2}. For d = -1, the only possible value of cx is \frac{1}{2}, and in this case 1 \geq |cz+d|^2 = (cx+d)^2 + y^2 = \frac{1}{4} + y^2, so \frac{\sqrt{3}}{2} \leq y \leq \frac{\sqrt{3}}{2}, i.e., y = \frac{\sqrt{3}}{2}; for d = 1, cx = -\frac{1}{2}, and 1 \geq |cz+d|^2 = \frac{1}{4}+y^2, so y = \frac{\sqrt{3}}{2} as well. In these two subcases we have z = \pm \frac{1}{2} + i\frac{\sqrt{3}}{2}, and so |z| = 1.

In both subcases, we have d+2cx=0d+2cx = 0. Consider the fact that gz=zDgz = z' \in \mathcal{D}, so Re(gz)12|\operatorname{Re}(gz)| \leq \frac{1}{2}, i.e. that

\begin{align*} \frac{1}{2} &\geq \left|\operatorname{Re}\left(\frac{az+b}{cz+d}\right)\right| \\ &= \left|\operatorname{Re}\left(\frac{(az+b)\overline{(cz+d)}}{|cz+d|^2}\right)\right| \\ &= \frac{1}{|cz+d|^2} |\operatorname{Re}((a(x+iy)+b)(c(x-iy)+d))| \\ &= \frac{1}{|cz+d|^2} |(ax+b)(cx+d) + acy^2|. \end{align*}

But we knew that cz+d1|cz+d| \leq 1, so 12acx2+bcx+adx+bd+acy2\frac{1}{2} \geq |acx^2+bcx+adx+bd+acy^2|. In both subcases, 1=z2=x2+y21 = |z|^2 = x^2+y^2, so we have 12ac+(bc+ad)x+bd=ac+(2bc+1)x+bd=b(d+2cx)+ac+x=ac+x\frac{1}{2} \geq |ac + (bc+ad)x + bd| = |ac + (2bc+1)x + bd| = |b(d+2cx) + ac+x| = |ac+x|. If a2|a| \geq 2, then 2ac+xxac+x+x=ac+x+122 \leq |ac+x-x| \leq |ac+x| + |x| = |ac+x| + \frac{1}{2}, a contradiction, so a{1,0,1}a \in \{-1, 0, 1\}.

Observe that in both subcases, 2cxd=12cxd = -1. If b0b \ne 0, then ad=1+bcad = 1+bc means ad1ad-1 also cannot be zero, i.e. ad1ad \ne 1. If a=da = -d then 12ac+x=dc+x=cdx+x2x=12+14±12=32\frac{1}{2} \geq|ac+x| = |-dc+x| = |\frac{-cdx+x^2}{x}| = |\frac{\frac{1}{2}+\frac{1}{4}}{\pm\frac{1}{2}}| = \frac{3}{2}, a contradiction, so the only option when b0b \ne 0 is that a=0a = 0. In such scenario, 1=adbc=bc1 = ad-bc = -bc so b=cb = -c. Now for b=0b = 0, we have 1=adbc=ad1 = ad - bc = ad so a=da = d.

To summarize, for the case (b.1) that c{1,1}c \in \{-1, 1\} and d0d \ne 0, we have deduced the candidates for gg:

g \in \left\{\begin{pmatrix}0 & -c\\c & d\end{pmatrix} \colon cx = -\frac{d}{2}\right\} \cup \left\{\begin{pmatrix}d & 0\\c & d\end{pmatrix} \colon cx = -\frac{d}{2}\right\}.

If gg is in the first set, gz=ccz+d=cc(x+iy)+d=cd2+ic32=1d2c+i32=1x+iy=1xiy=1z=zz2=zgz = \frac{-c}{cz+d} = \frac{-c}{c(x+iy)+d} = \frac{-c}{\frac{d}{2}+ic\frac{\sqrt{3}}{2}} = \frac{-1}{\frac{d}{2c}+i\frac{\sqrt{3}}{2}} = \frac{-1}{-x+iy} = \frac{1}{x-iy} = \frac{1}{\overline{z}} = \frac{z}{|z|^2} = z, a contradiction with the fact that zzz' \ne z. If gg is in the second set, gz=dzcz+d=dzcz+dcz+dcz+d=(dz)(cz+d)cz+d2=dcz2+d2z=cd+z=2x+x+iy=x+iy=zˉ=zzˉz=1zgz = \frac{dz}{cz+d} = \frac{dz}{cz+d} \frac{\overline{cz+d}}{\overline{cz+d}} = \frac{(dz)(\overline{cz+d})}{|cz+d|^2} = dc|z|^2 + d^2z = cd+z = -2x+x+iy = -x+iy = -\bar{z} = -\frac{z\bar{z}}{z} = -\frac{1}{z}, and now it is clear that g=(102x1)g = \begin{pmatrix}1 & 0\\-2x & 1\end{pmatrix} or g=(102x1)g = \begin{pmatrix}-1 & 0\\2x & -1\end{pmatrix} for the case (b.1).

Now we’re left with the case that c{1,1}c \in \{-1, 1\} and d=0d = 0 (b.2). In this case, 1=adbc=bc1=ad-bc=-bc so b=cb = -c. We’ve assumed that 1cz+d1 \geq |cz+d|, but for d=0d = 0 and c=1|c| = 1, this means z=1|z| = 1. Recall that Re(gz)12|\operatorname{Re}(gz)| \leq \frac{1}{2} implies that 12acx2+bcx+adx+bd+acy2\frac{1}{2} \geq |acx^2+bcx+adx+bd+acy^2|. Plug in what we have to get 12acx2c2x+acy2=ac(x2+y2)x.\frac{1}{2} \geq |acx^2 -c^2x +acy^2| = |ac(x^2+y^2) - x|. Suppose a2|a| \geq 2, then, we have, by triangle inequality, that ac(x2+y2)ac(x2+y2)x+x|ac(x^2+y^2)| \leq |ac(x^2+y^2)-x| +|x|, and so

\begin{align*} \frac{1}{2} &\geq |ac(x^2+y^2)-x| \\ &\geq |ac(x^2+y^2)|-|x| \\ &= |a| |c| |z|^2 -|x| \\ &\geq 2 \cdot 1 \cdot 1 - \frac{1}{2} = \frac{3}{2}, \end{align*}

a contradiction. So, a{1,0,1}a \in \{-1, 0, 1\}. If a0a \ne 0, then a=±1a = \pm 1, so ac=±1ac = \pm 1, and 12Re(gz)=1cz+d2(ax+b)(cx+d)+acy2=(axc)(cx)+acy2=acx2+acy2x=acz2x=acx\frac{1}{2} \geq |\operatorname{Re}(gz)| = \frac{1}{|cz+d|^2} |(ax+b)(cx+d) + acy^2| = |(ax-c)(cx) + acy^2| = |acx^2 + acy^2 - x| = |ac|z|^2 - x| = |ac-x| would mean xx must be near ac=±1ac = \pm 1 with distance no more than 12\frac{1}{2}. The only case is back to when x=12|x| = \frac{1}{2}, and this shows that y=32y = \frac{\sqrt{3}}{2} and Re(gz)=12|\operatorname{Re}(gz)| = \frac{1}{2} also in this case. Furthermore, if ac=1ac = -1 then x=12x = -\frac{1}{2} and if ac=1ac = 1 then x=12x = \frac{1}{2}. We can summarize this as 2acx=12acx=1 (or equivalently ac=2xac = 2x). This shows that gz2=azc2cz2=(axc)2+(ay)2=22acx=1|gz|^2 = \frac{|az-c|^2}{|cz|^2} = (ax-c)^2 + (ay)^2 = 2-2acx = 1, so gz=1|gz|=1 also. Now, we have

\begin{align*} gz &= \frac{az-c}{cz} = \frac{a}{c}-\frac{1}{z} = ac-\frac{1}{z} = 2x-\frac{1}{z} \\ &= z+\bar{z}-\frac{1}{z} = \frac{z^2+|z|^2-1}{z} = \frac{z^2}{z} = z, \end{align*}

a contradiction. This rules out a=±1a = \pm 1. Now let us check a=0a = 0. We have

gz = \frac{-c}{cz} = \frac{-1}{z}.

This shows that for c0c \ne 0, z=1|z| = 1 and z=1zz' = -\frac{1}{z}, and completes the case (b).

Now the third part follows by the same kind of argument as in the previous part. Take z \in \mathcal{D}^\circ. Suppose g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z), i.e. gz = z. We have \operatorname{Im}(gz) = \frac{\operatorname{Im}(z)}{|cz+d|^2} = \operatorname{Im}(z), so |cz+d|=1. By the same argument as before, |c| \geq 2 is impossible, so c \in \{-1, 0, 1\}. If c \ne 0, then the previous argument forces either z = \pm \frac{1}{2} + \frac{\sqrt{3}}{2}i or |z| = 1, a contradiction since we assumed z is in the interior. So c = 0 and d = \pm 1, and ad-bc = 1 then forces a = d. This means z = gz = \frac{az+b}{cz+d} = z \pm b, so b = 0 and g = \pm\operatorname{id}. This tells us that \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z) \subseteq \{\pm\operatorname{id}\}; the reverse inclusion is clear since -\operatorname{id} acts trivially.

Now let us show the fourth part, namely that G' = \mathrm{SL}_2(\mathbb{Z}). Take g \in \mathrm{SL}_2(\mathbb{Z}) and let us show that g \in G'. Choose z_0 := 2i \in \mathcal{D}^\circ, and let z = gz_0 \in \mathbb{H}. By part 1 (whose proof only used elements of G'), there exists g' \in G' such that g'z \in \mathcal{D}. The points z_0 and g'z = g'gz_0 are both in \mathcal{D} and are congruent modulo \operatorname{SL}_2(\mathbb{Z}). But z_0 is not on the boundary, so by part 2 they cannot be distinct; therefore g'gz_0 = z_0 \in \mathcal{D}^\circ. So g'g \in \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z_0) = \{\pm\operatorname{id}\}, i.e. g = \pm g'^{-1}. Since S^2 = -\operatorname{id} \in G', in either case g \in G'. This shows that \operatorname{SL}_2(\mathbb{Z}) \subseteq G' and hence completes the proof. \square

We now turn to a more restricted notion: holomorphic modular forms which are also defined and holomorphic at infinity.

At Infinity

The real line R\mathbb{R} is not compact. The sequence 1,2,3,1, 2, 3, \dots has no accumulation point in R\mathbb{R}. However, we can compactify it by adding a point at infinity, say, define R^:=R{}\hat{\mathbb{R}} := \mathbb{R} \cup \{\infty\}. In the same way, the complex plane is not compact, but we can think of it as a sphere with one punctured hole corresponding to \infty.

Riemann Sphere

Riemann Sphere, GKFXtalk, de:User:Bjoern_klipp, CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/, via Wikimedia Commons

Now, with this picture in mind, we go back to modular forms and try to complete them at infinity.

Proposition 2. Let ff be a meromorphic modular form of weight kk, then there exists a meromorphic function f~ ⁣:D×{isolated poles}C\tilde{f} \colon \mathbb{D}^\times \setminus \{\text{isolated poles}\}\to \mathbb{C} such that f(z)=f~(e2πiz)f(z) = \tilde{f}(e^{2\pi i z}) for all zH{isolated poles}z \in\mathbb{H} \setminus \{\text{isolated poles}\}, where D×:={zC ⁣:0<z<1}\mathbb{D}^\times := \{z \in \mathbb{C} \colon 0 < |z| < 1\}.

Proof. First, for any aRa \in \mathbb{R}, there is a bijection

\Phi_a \colon \begin{cases}\text{Strip}_a &\to \mathbb{D}^\times \setminus \{re^{2\pi i a} \colon 0 < r < 1\}\\ z &\mapsto e^{2\pi i z}\end{cases}

where \text{Strip}_a :=\{z \in \mathbb{C} \colon \operatorname{Re}(z) \in (a, a+1);\ \operatorname{Im}(z) \in (0, +\infty)\}, which is very convenient in our use case. Furthermore, \Phi_a is holomorphic everywhere in its strip, and its derivative \Phi_a'(z) = 2\pi i e^{2\pi i z} never vanishes there, so \Phi_a is a conformal equivalence onto its image. Now we can define \tilde{f} by patching two charts. The first chart is \Phi_{-\frac{1}{2}}, sending our favorite strip onto \mathbb{D}^\times \setminus (-1, 0), and the second chart is \Phi_0, sending the offset strip \text{Strip}_0 onto \mathbb{D}^\times \setminus (0, 1). We define \tilde{f}(q) as f(\Phi_{-\frac{1}{2}}^{-1}(q)) whenever q \in \mathbb{D}^\times \setminus (-1, 0), and as f(\Phi_0^{-1}(q)) whenever q \in \mathbb{D}^\times \setminus (0, 1). Observe that this is well-defined (except at the images of the isolated poles of f, where nothing is defined) because the two values agree on the intersection. Indeed, if q \in \mathbb{D}^\times \setminus (-1, 1), then z_{-\frac{1}{2}} := \Phi_{-\frac{1}{2}}^{-1}(q) and z_0 := \Phi_0^{-1}(q) satisfy e^{2\pi iz_{-1/2}} = e^{2\pi i z_0}, so e^{2 \pi i(z_{-1/2}-z_0)} = 1. This means n := z_{-1/2}-z_0 \in \mathbb{Z}, and so f(z_{-1/2}) = f(z_0 + n) = f\left(\begin{pmatrix}1 & n\\0 & 1\end{pmatrix} z_0\right) = (0 \cdot z_0+1)^k f(z_0) = f(z_0) by the modularity of f (wherever it is defined). The two charts therefore glue to a meromorphic \tilde{f} \colon \mathbb{D}^\times \to \mathbb{C} (defined away from the images of the poles) such that f(z) = \tilde{f}(e^{2\pi iz}) for all z \in \mathbb{H} wherever defined. (One can check this by sliding z back into one of the two strips: f is periodic thanks to modularity, and z \mapsto e^{2\pi iz} is periodic by definition.) \square

Note. Inspecting the proof carefully, we haven’t actually used the (rather strong) fact that ff is modular. We’ve only used the fact that ff is periodic. We will recall this construction for “just periodic, but not necessarily modular” meromorphic functions later on.

So now the disk D×\mathbb{D}^\times is our model for H\mathbb{H}. Think of it as the upper hemisphere of the Riemann sphere, with the center removed. Do you see the point of doing this now? If we can complete f~\tilde{f} at the hole, we would complete ff at infinity! If there is a meromorphic continuation of f~\tilde{f} to all of D:=D×{0}\mathbb{D} := \mathbb{D}^\times \cup \{0\}, then we’re done.

Definition. A meromorphic modular form ff is said to be meromorphic at infinity if there exists a meromorphic continuation of f~\tilde{f} to D\mathbb{D}. It is holomorphic at infinity if f~\tilde{f} is holomorphic at 00. If f~\tilde{f} is defined at 00, we may write f()f(\infty) to denote f~(0)\tilde{f}(0).

Fourier Expansion

For a meromorphic modular form that is also meromorphic at infinity, complex analysis lets us expand \tilde{f} into a Laurent series \tilde{f}(q) = \sum_{n \geq k} a_n q^n for some k \in \mathbb{Z} (possibly negative) and some sequence (a_n)_{n \geq k} of complex numbers with a_k \ne 0, such that the equality holds for all q in a neighborhood of 0 (except possibly 0 itself if k < 0). Such an expansion is unique: for a given f, the pair (k, (a_n)_{n \geq k}) is uniquely determined (even if we pick different neighborhoods, the pair coincides). Equivalently, f(z) = \sum_{n \geq k} a_ne^{2 \pi i n z} for all z with large enough \operatorname{Im}(z); and if f is holomorphic at infinity (so k \geq 0), the extra value at infinity is f(\infty) = \tilde{f}(0) = a_0. To simplify notation, we write the sequence as (a_n)_{n \in \mathbb{Z}}, keeping in mind that it vanishes once the index is negative enough.

Definition. For a meromorphic modular form f that is also meromorphic at infinity, there exists a unique sequence (a_n)_{n \in \mathbb{Z}} of complex numbers, vanishing for all sufficiently negative indices, such that \tilde{f}(q) = \sum_{n \in \mathbb{Z}} a_nq^n in a (possibly punctured) neighborhood of zero. The series is called the Fourier expansion (or q-expansion) of f at infinity, and the numbers a_n are called its Fourier coefficients.
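For completeness, here is a standard remark of mine that is not spelled out in the text: since f(z) = \sum_{n} a_n e^{2\pi i n z} and the exponentials x \mapsto e^{2\pi i n x} are orthonormal on [0, 1], the coefficients can be recovered, for any fixed y large enough, by

a_n = e^{2\pi n y}\int_0^1 f(x+iy)\, e^{-2\pi i n x}\, \mathrm{d}x,

and the right-hand side does not depend on the choice of y.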

Definition. We write Mk\mathcal{M}_k for the set of holomorphic modular forms of weight kk, which is also holomorphic at infinity, and we write Sk\mathcal{S}_k for the subset of those forms that take value 00 at infinity.

Also observe that the Fourier coefficients start at n \geq 0 for forms in \mathcal{M}_k (holomorphy at infinity forces a_n = 0 for n < 0) and at n \geq 1 for forms in \mathcal{S}_k (vanishing at infinity additionally forces a_0 = 0).

Now that we’ve defined Mk\mathcal{M}_k and Sk\mathcal{S}_k, what do you see? This is where things get interesting.

Proposition 3. For f,hMkf, h \in \mathcal{M}_k and λC\lambda \in \mathbb{C}, we have f+hMkf+h \in \mathcal{M}_k and λfMk\lambda f \in \mathcal{M}_k. The result is still true if we replace Mk\mathcal{M}_k by Sk\mathcal{S}_k.

Proof. For g=(abcd)SL2(Z)g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}) and for all zHz \in \mathbb{H},

\begin{align*} (f+h)|_k g(z) &= (f+h)(gz)(cz+d)^{-k}\\ &= (f(gz)+h(gz))(cz+d)^{-k}\\ &= f|_kg(z) + h|_kg(z)\\ &= f(z)+h(z)\\ &= (f+h)(z). \end{align*}

Also,

(\lambda f)|_k g(z) = (\lambda f)(gz)(cz+d)^{-k} = \lambda (f|_kg)(z) = \lambda f(z).

This checks that the modularity condition still holds. The fact that they are holomorphic and also holomorphic at infinity is a standard fact from complex analysis. Same thing holds when we replace Mk\mathcal{M}_k by Sk\mathcal{S}_k. Easy to check that they still attain zero at infinity. \square

This shows that Mk\mathcal{M}_k is a vector space, and Sk\mathcal{S}_k is a subspace.

Soon, we will want to study the shape of Mk\mathcal{M}_k and Sk\mathcal{S}_k. Let us just be curious about these objects, and so we can ask natural questions: Can we compute the dimension of the spaces for each weight? Is there an explicit formula? Furthermore, can we find a basis for them? It turns out that luckily, we can do all of that. But anyway, we need some heavy complex analytic machinery here first.

The Valence Formula

Here I’m going to follow almost exactly the same as in [Serre, Ch. VII, §3, 3.1.], but written in my own wording (and I’ll try to simplify as much as I can).

From complex analysis, for any meromorphic function f≢0f \not\equiv 0 in an open set UU, if we fix aUa \in U then we can find a neighborhood NN of aa inside UU and a Laurent expansion nkan(za)n\sum_{n \geq k} a_n (z-a)^n (with ak0a_k \ne 0) such that the expansion coincides with ff in N{a}N \setminus \{a\}. Such expansion is unique, even if the neighborhood is not, that is, even in a different choice of neighborhood, the expansion is still the same. The number kk is called the order of vanishing of ff at aa, and is denoted by orda(f)\operatorname{ord}_a(f). The residue of ff at aa, denoted by Resz=a(f(z))\operatorname{Res}_{z=a}(f(z)) or Resa(f)\operatorname{Res}_a(f), is defined as a1a_{-1}.

Let us first set up the complex-analytic machinery.

Lemma 4. For a meromorphic function ff defined on an open set UU, and for a point aUa \in U, we have

\operatorname{Res}_a\left(\frac{f'}{f}\right) = \operatorname{ord}_a(f).

Proof. Write the Laurent expansion of ff at aa as nkan(za)n\sum_{n \geq k} a_n (z-a)^n, ak0a_k \ne 0. Put g=nk+1an(za)ng = \sum_{n \geq k+1} a_n(z-a)^n, then, in a neighborhood of aa except at aa, we have

\begin{align*} \frac{f'}{f} &= \frac{a_k k (z-a)^{k-1} + g'}{a_k (z-a)^k + g} \\ &= \frac{a_k(z-a)^kk(z-a)^{-1} + gk(z-a)^{-1}+g'-gk(z-a)^{-1}}{a_k(z-a)^k+g} \\ &= k(z-a)^{-1} + \frac{g'-gk(z-a)^{-1}}{a_k(z-a)^k+g}. \end{align*}

Observe that the second term is

\frac{(g'-gk(z-a)^{-1})(z-a)^{-k}}{a_k+g(z-a)^{-k}}

which, after plugging in z = a, gives \frac{a_{k+1}}{a_k}; in particular it is well-defined without a pole, so the function z \mapsto \frac{g'-gk(z-a)^{-1}}{a_k(z-a)^k + g} is holomorphic in that neighborhood, i.e. its Laurent series at a is a Taylor series. Hence the residue of f'/f at a is k, which is the order of f at a. \square
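As a concrete sanity check of Lemma 4 (my own snippet, assuming the sympy library is available), take f(z) = (z-2)^3 e^z, which has a zero of order 3 at z = 2, and g(z) = e^z (z-2)^{-5}, which has a pole of order 5 there; the residues of f'/f and g'/g at 2 should come out as 3 and -5.

# symbolic check of Lemma 4: Res_a(f'/f) = ord_a(f); assumes sympy
from sympy import symbols, exp, diff, residue

z = symbols('z')
f = (z - 2)**3 * exp(z)                 # zero of order 3 at z = 2
print(residue(diff(f, z) / f, z, 2))    # expected: 3

g = exp(z) / (z - 2)**5                 # pole of order 5 at z = 2
print(residue(diff(g, z) / g, z, 2))    # expected: -5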

Recall the basic formula from complex analysis (Cauchy Integral Formula): If f ⁣:UCf \colon U \to \mathbb{C} is a holomorphic function on an open set UU, let γ ⁣:S1U\gamma \colon S^1 \to U be a null-homotopic Jordan curve with Jordan interior N\mathcal{N} such that NU\mathcal{N} \subseteq U and with winding number +1+1. Then, for any aNa \in \mathcal{N}, we have

\frac{1}{2\pi i}\int_\gamma \frac{f(z)}{z-a} \mathrm{d}z =f(a).

Lemma 5. Suppose UCU \subseteq \mathbb{C} is an open set, where f ⁣:UCf \colon U \to \mathbb{C} is holomorphic. We define, for any aU;r>0;α(0,2π];θRa \in U;\, r > 0;\, \alpha \in (0, 2\pi];\, \theta \in \mathbb{R} such that the disk {a+zC ⁣:zr}\{a+z \in \mathbb{C} \colon |z| \leq r\} is contained in UU, the curve

\gamma_{a, r, \alpha, \theta} \colon \begin{cases} [0, 1] &\to \mathbb{C}\\ t &\mapsto a+re^{i (\theta + t \alpha)}. \end{cases}

Then, we have for any aUa \in U

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a, r, \alpha, \theta}} \frac{f(z)}{z-a} \mathrm{d}z = i\alpha f(a).

Proof. Since ff is holomorphic at any aUa \in U, we fix aUa \in U then there exists a Taylor expansion in a neighborhood N\mathcal{N} of aa in UU. By definition of a neighborhood, there exists a closed disk of some radius RR centered at aa that lies entirely in N\mathcal{N}. Write f(z)=f(a)+g(z)(za)f(z) = f(a) + g(z)(z-a) with gg holomorphic. This equality holds in N\mathcal{N}. Now, we fix θR\theta \in \mathbb{R}, and we have, for 0<rR0 < r \leq R,

\int_{\gamma_{a, r, \alpha, \theta}} \frac{f(z)}{z-a} \mathrm{d}z = f(a)\int_{\gamma_{a,r,\alpha,\theta}} \frac{1}{z-a} \mathrm{d}z + \int_{\gamma_{a,r,\alpha,\theta}} g(z)\mathrm{d}z.

For the main part, we have

\begin{align*} \int_{\gamma_{a,r,\alpha,\theta}} \frac{1}{z-a} \mathrm{d}z &= \int_0^1 \frac{1}{a+re^{i(\theta+t\alpha)}-a} \gamma_{a,r,\alpha,\theta}'(t)\mathrm{d}t \\ &= \int_0^1 \frac{1}{re^{i(\theta+t\alpha)}} i\alpha re^{i(\theta+t\alpha)} \mathrm{d}t \\ &= \int_0^1 i\alpha \mathrm{d}t = i\alpha. \end{align*}

For the error part, observe that

\int_{\gamma_{a,r,\alpha,\theta}} g(z)\mathrm{d}z = \int_0^1 g(a+re^{i(\theta+t\alpha)}) i\alpha re^{i(\theta+t\alpha)} \mathrm{d}t

and consider the compact disk {a+z ⁣:zR}N\{a+z \colon |z| \leq R\} \subseteq \mathcal{N}. Its image under zg(z)z \mapsto |g(z)| must also be compact, so we can bound g(a+rei(θ+tα))C|g(a+re^{i(\theta+t\alpha)})| \leq C for some constant CC not depending on rr. This means

\begin{align*} \left|\int_{\gamma_{a,r,\alpha,\theta}} g(z)\mathrm{d}z\right| &\leq \int_0^1 |g(a+re^{i(\theta+t\alpha)})| |i\alpha r e^{i(\theta+t\alpha)}| \mathrm{d}t \\ &\leq \alpha r \int_0^1 C \mathrm{d}t = \alpha r C. \end{align*}

Taking limit r>00r \xrightarrow{> 0} 0 yields the result. \square
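Lemma 5 is easy to see in action numerically; the following is my own illustration (a plain midpoint-rule quadrature, nothing more), showing that for a small radius the arc integral is already very close to i\alpha f(a).

# numerical illustration of Lemma 5 with f = exp, using a midpoint rule
import cmath

def arc_integral(f, a, r, alpha, theta, samples=4000):
    total = 0j
    for j in range(samples):
        t = (j + 0.5) / samples
        w = cmath.exp(1j * (theta + t * alpha))
        z = a + r * w
        dz = 1j * alpha * r * w / samples      # gamma'(t) dt
        total += f(z) / (z - a) * dz
    return total

a = 0.3 + 0.2j
alpha = cmath.pi / 2
print(arc_integral(cmath.exp, a, 1e-4, alpha, 0.7))
print(1j * alpha * cmath.exp(a))    # the two values should nearly agree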

Now we bring another tool from the basic complex analysis toolbox. (Cauchy’s Argument Principle). For a meromorphic function ff meromorphic on an open set UCU \subseteq \mathbb{C}, if γ ⁣:S1C\gamma \colon S^1 \to \mathbb{C} is a null-homotopic Jordan curve with winding number +1+1 and with Jordan interior NU\mathcal{N} \subseteq U, then

\int_\gamma \frac{f'}{f} = 2\pi i \sum_{z \in \mathcal{N}} \operatorname{ord}_z(f).

Corollary 6. For a meromorphic function ff meromorphic on an open set UCU \subseteq \mathbb{C}, for any aUa \in U, reusing the notation of curves from Lemma 5., we have

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a, r, \alpha, \theta}} \frac{f'}{f} = i \alpha \operatorname{ord}_a(f).

Proof. Recall from Lemma 4. that in some neighborhood N\mathcal{N} of aUa \in U, we have that f/forda(f)za+gf'/f \equiv \frac{\operatorname{ord}_a(f)}{z-a} + g is the Laurent expansion at aa, with gg holomorphic. Apply Lemma 5 on zorda(f)+g(z)(za)z \mapsto \operatorname{ord}_a(f) + g(z)(z-a), which is holomorphic in a (maybe another) neighborhood N\mathcal{N}' of aa, to see that

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a,r,\alpha,\theta}} \frac{\operatorname{ord}_a(f)+g(z)(z-a)}{z-a} = i\alpha \operatorname{ord}_a(f).

Now observe that in NN\mathcal{N} \cap \mathcal{N}', the meromorphic equality fforda(f)za+g\frac{f'}{f} \equiv \frac{\operatorname{ord}_a(f)}{z-a} +g is actually defined, i.e., that f(z)f(z)=orda(f)za+g(z)\frac{f'(z)}{f(z)} = \frac{\operatorname{ord}_a(f)}{z-a} + g(z) for all zNNz \in \mathcal{N} \cap \mathcal{N}'. This shows that

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a, r, \alpha, \theta}} \frac{f'}{f} = i\alpha \operatorname{ord}_a(f)

and hence completes the proof. \square

Now our complex analysis toolbox is complete, and we’re ready to proceed to our case of meromorphic modular forms of weight kk. Define ord(f)\operatorname{ord}_\infty(f) to be ord0(f~)\operatorname{ord}_0(\tilde{f}).

Theorem 7. Let f≢0f \not\equiv 0 be a meromorphic modular form of weight kk. One has

\operatorname{ord}_\infty(f) + \frac{1}{2}\operatorname{ord}_i(f) + \frac{1}{3}\operatorname{ord}_{\exp(2i\pi/3)}(f) + \sum_{z \in \mathcal{D} \setminus \{i,e^{2i\pi/3}, e^{2i\pi/6}\}/\operatorname{SL}_2(\mathbb{Z})} \operatorname{ord}_z(f) = \frac{k}{12}.

Proof. First, we observe that there are finitely many zeros and poles. Indeed, since f~\tilde{f} is meromorphic at zero, there exists a neighborhood (and hence we can assume to be a smaller disk) U={zC ⁣:0<z<r}U = \{z \in \mathbb{C} \colon 0 < |z| < r\} such that f~\tilde{f} has neither zeros nor poles in UU. This means ff has neither zeros nor poles when 0<e2πiz<r0 < |e^{2\pi i z}| < r. Writing z=x+iyz = x+iy, this is equivalent to 0<e2πi(x+iy)<r0 < |e^{2\pi i (x+iy)}| < r, but e2πi(x+iy)=e2πixe2πye^{2\pi i(x+iy)} = e^{2\pi i x} e^{-2\pi y} so this is equivalent to 2πy<logr-2\pi y < \log r, which is y>logr2π=log(1/r)2πy > \frac{-\log r}{2\pi} = \frac{\log(1/r)}{2\pi}. So zeros and poles of ff must have imaginary part no more than log(1/r)2π\frac{\log (1/r)}{2\pi}. Now, the set Dr:=D{x+iyH ⁣:ylog(1/r)2π}\mathcal{D}_r := \mathcal{D} \cap \{x+iy \in \mathbb{H} \colon y \leq \frac{\log(1/r)}{2\pi}\} is compact, and so there can be only finitely many zeros and poles.

Now we set up an integral over the contour Dr\mathcal{D}_r.

[Figure: the truncated fundamental domain \mathcal{D}_r with the contour \Gamma_R, its vertices labelled \mathrm{A}, \mathrm{B}, \mathrm{B'}, \mathrm{C}, \mathrm{C'}, \mathrm{D}, \mathrm{D'}, \mathrm{E}, and small arcs of radius R removed around e^{2i\pi/3}, i, e^{2i\pi/6}.]

Let ΓR\Gamma_R denote the boundary curve of Dr\mathcal{D}_r minus small disks of radius RR around e2πi/6,i,e2πi/3e^{2\pi i/6}, i, e^{2\pi i/3}, going counterclockwise, with any parametrization with nonzero speed. Suppose first that there is no zero and no pole of ff on the boundary curve ΓR\Gamma_R. We will take care of this later. We have, by the Cauchy’s argument principle, that

\int_{\Gamma_R} \frac{f'}{f} = 2\pi i \sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f).

Now, we try to do the same integral but from another point of view. First, the segment EA\mathrm{EA} gets mapped by the conformal equivalence to a circle CC in the unit disk. Let DD be the disk such that its boundary is CC. This means

\begin{align*} \int_{\mathrm{EA}} \frac{f'(z)}{f(z)} \mathrm{d}z &= \int_{\mathrm{EA}} \frac{\tilde{f}'(e^{2\pi i z})}{\tilde{f}(e^{2\pi i z})}\, 2\pi i e^{2\pi i z}\, \mathrm{d}z \\ &= \oint_{C,\ \text{clockwise}} \frac{\tilde{f}'(q)}{\tilde{f}(q)} \mathrm{d}q\\ &= -2\pi i \sum_{q \in D^\circ} \operatorname{ord}_q(\tilde{f})\\ &= -2\pi i \operatorname{ord}_0(\tilde{f})\\ &= -2\pi i\operatorname{ord}_\infty(f). \end{align*}

Here we used that f(z) = \tilde{f}(e^{2\pi iz}), so by the chain rule f'(z) = 2\pi i e^{2\pi i z}\tilde{f}'(e^{2\pi iz}); the substitution q = e^{2\pi i z}, \mathrm{d}q = 2\pi i e^{2\pi i z}\mathrm{d}z turns the integral into an integral over the circle C, which is traversed clockwise as z runs from \mathrm{E} to \mathrm{A}, whence the minus sign. The second-to-last equality comes from the fact that we have already assumed there are no zeros or poles of f in the upper region of our favorite strip, so there are no zeros or poles of \tilde{f} in D^\circ \setminus \{0\}.

Next, we have

\begin{align*} &\int_{\mathrm{AB}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{D'E}} \frac{f'(z)}{f(z)} \mathrm{d}z \\ &= \int_{\mathrm{AB}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{D'E}} \frac{f'(z-1)}{f(z-1)} \mathrm{d}z \\ &= \int_{\mathrm{AB}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{BA}} \frac{f'(z)}{f(z)} \mathrm{d}z \\ &= 0. \end{align*}

The fact that f(z)=f(z1)f(z) = f(z-1) comes from the modularity of ff.

Now, we have

\begin{align*} &\int_{\mathrm{B'C}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{C'D}} \frac{f'(z)}{f(z)} \mathrm{d}z \\ &= \int_{\mathrm{B'C}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{C'D}} \frac{f'(-1/z)z^{-k-2} - kz^{-k-1}f(-1/z)}{f(-1/z)z^{-k}} \mathrm{d}z \end{align*}

since the fact that f(1/z)=zkf(z)f(-1/z) = z^kf(z) tells us that f(1/z)z2=kzk1f(z)+zkf(z)f'(-1/z) z^{-2} = kz^{k-1} f(z) + z^k f'(z) and so f(z)=f(1/z)zk2kz1f(z)=f(1/z)zk2kzk1f(1/z)f'(z) = f'(-1/z)z^{-k-2} - kz^{-1}f(z) = f'(-1/z)z^{-k-2} - kz^{-k-1}f(-1/z). Simplifying this, we have

\int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{C'D}} \frac{f'}{f} = \int_{\mathrm{B'C}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{C'D}} \frac{f'(-1/z)}{f(-1/z)}z^{-2} \mathrm{d}z + \int_{\mathrm{C'D}} -kz^{-1} \mathrm{d}z.

Putting u=1/zu = -1/z, we have du=z2dz\mathrm{d}u = z^{-2}\mathrm{d}z, and the segment CD\mathrm{C'D} gets mapped into CB\mathrm{CB'}. This means

\int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{C'D}} \frac{f'}{f} = \int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{CB'}} \frac{f'}{f} + \int_{\mathrm{C'D}} -kz^{-1}\mathrm{d}z = \int_{\mathrm{C'D}}-kz^{-1}\mathrm{d}z.

Now we’re left with taking care of the removed disks. For BB\mathrm{BB'}, we have

\int_{\mathrm{BB'}} \frac{f'}{f} = - \int_{\gamma_{e^{2i\pi/3},\, R,\, \alpha_R,\, \theta_R}} \frac{f'}{f}

where αR\alpha_R is the arc angle of BB\mathrm{BB'}, and θR\theta_R is the offset angle. Observe that as RR tends to zero, αR\alpha_R tends to π3\frac{\pi}{3}. The same argument applies for DD\mathrm{DD'}. Same for CC\mathrm{CC'} but the angle tends to π\pi instead.

Summing up all the integrals, we see that

\begin{align*} 2\pi i \sum_{z \text{ inside } \Gamma_R} \operatorname{ord}_z(f) &= \int_{\Gamma_R} \frac{f'}{f} \\ &= \int_{\mathrm{AB}} \frac{f'}{f} + \int_{\mathrm{BB'}} \frac{f'}{f} + \int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{CC'}} \frac{f'}{f} + \int_{\mathrm{C'D}} \frac{f'}{f} + \int_{\mathrm{DD'}} \frac{f'}{f} + \int_{\mathrm{D'E}} \frac{f'}{f} + \int_{\mathrm{EA}} \frac{f'}{f}\\ &= -2\pi i\operatorname{ord}_\infty(f) + \int_{\mathrm{BB'}} \frac{f'}{f} + \int_{\mathrm{CC'}} \frac{f'}{f} + \int_{\mathrm{DD'}} \frac{f'}{f} + \int_{\mathrm{C'D}} -kz^{-1}\mathrm{d}z. \end{align*}

Taking the limit as R tends to zero, and applying Corollary 6 to the three small arcs (and a direct computation, as in the proof of Lemma 5, for the last integral), we have

2 \pi i\sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f) = -2 \pi i \operatorname{ord}_\infty(f) -i \frac{\pi}{3} \operatorname{ord}_{e^{2i\pi/3}}(f) - i\frac{\pi}{3} \operatorname{ord}_{e^{2i\pi/6}}(f) - i\pi\operatorname{ord}_i(f) + i \frac{\pi}{6} k.

We’ve seen that e2iπ/3e^{2i\pi/3} and e2iπ/6e^{2i\pi/6} are congruent modulo SL2(Z)\operatorname{SL}_2(\mathbb{Z}), so, we have

2\operatorname{ord}_\infty(f) + \frac{2\operatorname{ord}_{e^{2i\pi/3}}(f)}{3} + \operatorname{ord}_i(f) + 2\sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f) =\frac{k}{6},

or, finally,

\frac{k}{12} = \frac{1}{2}\operatorname{ord}_i(f) + \frac{1}{3}\operatorname{ord}_{e^{2i\pi/3}}(f) + \operatorname{ord}_\infty(f) + \sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f).

Now we’re left with handling the case where there exists some zeros or poles on Γ0{e2iπ/3,e2iπ/6,i}\Gamma_0 \setminus \{e^{2i\pi/3}, e^{2i\pi/6}, i\}. Note that we don’t have to worry in the process of taking R0R \to 0 because zeros and poles are isolated, and so the limit still works (we can just skip those RR’s). In this limiting scenario, B\mathrm{B} and B\mathrm{B'} coincide, same for (C,C)(\mathrm{C}, \mathrm{C'}) and (D,D)(\mathrm{D}, \mathrm{D'}). If zABz \in\mathrm{AB} is a zero or a pole, then by modularity, z+1DEz+1 \in \mathrm{DE} would also be the same. We can edit ΓR\Gamma_R by perturbing it by a disk of a very small radius.

[Figure: the contour \Gamma_R near a zero or pole lying on the boundary, indented around it by a disk of very small radius.]

In this way, the integral would stay the same, but the sum of orders will count this point once. In order to avoid problems of double counting, we denote by zDr/SL2(Z)ordz(f)\sum_{z \in \mathcal{D}_r/\operatorname{SL}_2(\mathbb{Z})} \operatorname{ord}_z(f) instead of simply summing over the whole contour (or just the interior). Now we’re left with dealing with zeros and poles on the arc BCD{i,e2πi/3,e2πi/6}\mathrm{BCD} \setminus \{i, e^{2\pi i/3}, e^{2 \pi i/6}\}. We apply the same trick, but when we see any problem point zz now we will also see the congruent one at 1/z-1/z. We can just put two disks, remove one from the contour and indent one to the contour. This completes the proof. \square

First Examples of Modular Forms: Eisenstein Series and Poincaré Series

When I was writing everything before this, I thought I could skip these explicit examples and then move on to our special case, but now I changed my mind and I think that it’s a nicer way to approach the subject by introducing these explicit examples.

The thing is, even if I said it’s “explicit”, it’s not that explicit. I’ll begin by showing the idea and then move on to the definition.

Let’s play a little game here. Suppose I have a grid with n rows and m columns, each cell contains at most one object, and I have some function f whose value depends only on the multiset of objects in each row (not on the order within the row). What happens now? If we permute the columns of the grid, the multiset of objects in each row doesn't change, so the value of the grid under f stays the same, right? Now, if we want to construct a function whose value is preserved under permutations of rows and permutations of columns, how would we do it? We would define

\tilde{f}(\text{grid}) := \sum_{\sigma \text{ permuting rows}} f(\sigma \cdot \text{grid}).

And now, for any permutation σ\sigma of rows, of columns, or composition of both, we have f~(σgrid)=f~(grid)\tilde{f}(\sigma \cdot \text{grid}) = \tilde{f}(\text{grid}).
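Here is a toy version of this game in code (my own illustration). The function f below depends only on the multiset of entries in each row, so it is invariant under column permutations but not under row permutations; summing over all row permutations produces a function invariant under both.

# symmetrization toy example: summing over row permutations makes a
# column-invariant function invariant under row permutations as well
from itertools import permutations
import random

def f(grid):
    # weight row i by (i + 1); depends only on each row's multiset of entries
    return sum((i + 1) * sum(row) for i, row in enumerate(grid))

def f_tilde(grid):
    return sum(f([grid[i] for i in sigma])
               for sigma in permutations(range(len(grid))))

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows = [0, 1, 2]; random.shuffle(rows)
cols = [0, 1, 2]; random.shuffle(cols)
shuffled = [[grid[r][c] for c in cols] for r in rows]

print(f(grid), f(shuffled))              # generally differ
print(f_tilde(grid), f_tilde(shuffled))  # always equal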

Now I’ll take this to the algebraic setting and make it more precise.

Lemma 8. For a group G, a subgroup H of G, a set X on which G acts, and a function f \colon X \to \mathbb{C} which is H-invariant (i.e. f(hx) = f(x) for all h \in H and x \in X), we can define

\tilde{f}(x) := \sum_{\bar{g} \in H \backslash G} f(gx).

If the sum makes sense, i.e., if it converges absolutely, then f~(gx)=f~(x)\tilde{f}(gx) = \tilde{f}(x) for all gGg \in G and xXx \in X.

Proof. First, f~\tilde{f} is well-defined, because for two elements g1g_1 and g2g_2 in GG corresponding to the same coset, we have Hg1=Hg2Hg_1 = Hg_2, i.e., there exists hHh \in H with h=g2g11h = g_2 g_1^{-1}. This means f(g2x)=f(g2g11g1x)=f(hg1x)f(g_2x) = f(g_2g_1^{-1}g_1x) = f(hg_1x). But ff is HH-invariant, so this is actually f(g1x)f(g_1x). This means one can take any element in the same coset inside the sum. The condition of absolute convergence implies that we can commute the sum however we want.

Now, pick any gGg' \in G. We have

\tilde{f}(g'x) = \sum_{\bar{g} \in H \backslash G} f(gg'x).

But \Phi \colon Hg \mapsto Hgg' only permutes the cosets in H \backslash G. Indeed, for any coset Hg_0 we can pick g = g_0g'^{-1}; then \Phi(Hg) = Hgg' = Hg_0g'^{-1}g' = Hg_0, so \Phi is surjective. Now if Hg_1g' = Hg_2g', then g_2g'(g_1g')^{-1} = g_2g_1^{-1} \in H, so with h = g_2g_1^{-1} we have hg_1 = g_2, i.e., Hg_1 = Hg_2, so \Phi is injective. This means the inner sum can be rewritten in the original form, and so \tilde{f}(g'x) = \tilde{f}(x). \square

We will adapt this idea into our settings of modular forms. Let G:=SL2(Z)G := \operatorname{SL}_2(\mathbb{Z}), H:=±TH := \langle \pm T \rangle, and X:=HX := \mathbb{H}. Then there exists a very simple function that is ±T\pm T-invariant (i.e. periodic). Fix mZm \in \mathbb{Z} and define

f_m(z) := e^{2\pi i m z}.

This is periodic. Now we can do something similar to the lemma above, but adapt things a bit to make it satisfy the modularity condition. Define

P_{m,k}(z):= \sum_{\left[ \begin{pmatrix}a & b\\c & d\end{pmatrix}\right] \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m\left(\frac{az+b}{cz+d}\right)(cz+d)^{-k}.

(To make it easier, we write gg and gg' for (abcd)\begin{pmatrix}a & b\\c & d\end{pmatrix} and (abcd)\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} next.)

We have, for any gSL2(Z)g' \in \operatorname{SL}_2(\mathbb{Z}), that

\begin{align*} P_{m,k}(g'z) &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(gg'z)(cg'z+d)^{-k} \\ &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(gg'z)\left(c\frac{a'z+b'}{c'z+d'} + d\right)^{-k} \\ &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(gg'z)\left(\frac{(ca'+dc')z + cb'+d'd}{c'z+d'}\right)^{-k} \\ &= (c'z+d')^{k} \sum_{\overline{g''} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(g''z)(c''z+d'')^{-k} \\ &= (c'z+d')^k P_{m,k}(z), \end{align*}

where g=ggg'' = gg', with g=(abcd)g'' =\begin{pmatrix}a'' & b''\\c'' & d''\end{pmatrix}. This Pm,kP_{m,k} is called the Poincaré series, and for m=0m = 0 it is called the Eisenstein series, denoted by Ek:=P0,kE_k := P_{0,k}. We’ve showed that if they are well-defined, then they are modular forms of weight kk. We’re now left with showing that they are well-defined, i.e., that the series converges absolutely.

Proposition 9. For even k4k \geq 4 and m0m \geq 0, the series Pm,kP_{m,k} converges locally uniformly absolutely for all zHz \in \mathbb{H}.

Proof. Let us observe the absolute form:

\begin{align*} &\sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |f_m(gz)(cz+d)^{-k}| \\ &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |e^{2 \pi i m gz}| |cz+d|^{-k}. \end{align*}

We know that gzHgz \in \mathbb{H} for all gSL2(Z)g \in \operatorname{SL}_2(\mathbb{Z}), so we may write gz=x+iygz = x+iy with y>0y > 0, and observe that e2πimgz=e2πim(x+iy)=e2πimx2πmye^{2\pi i m gz} = e^{2 \pi i m (x+iy)} = e^{2 \pi i m x - 2 \pi m y}, and so its absolute value (i.e. complex modulus) is e2πmye^{-2\pi m y}, which is between zero and one. This means

\sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |e^{2 \pi i m gz}| |cz+d|^{-k} \leq \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |cz+d|^{-k}.

Let us now show the convergence for gˉT\SL2(Z)cz+dk\sum_{\bar{g} \in \langle T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |cz+d|^{-k}. We claim that for each (c,d)Z2(c, d) \in \mathbb{Z}^2 with gcd(c,d)=1\gcd(c,d)=1, there exists (a,b)Z2(a, b) \in \mathbb{Z}^2 such that (abcd)SL2(Z)\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}). Furthermore, if there are multiple candidates, then they belong to the same coset. To show this, fix (c,d)Z2(c, d) \in \mathbb{Z}^2 with gcd(c,d)=1\gcd(c,d)=1. By Bézout’s identity from elementary number theory, there exists an integer solution (a0,b0)Z2(a_0, b_0) \in \mathbb{Z}^2 satisfying a0db0c=1a_0d-b_0c = 1. This is a linear Diophantine equation, and so all solutions are described by (at,bt)(a_t, b_t), defined by at=a0+cta_t = a_0+ct and bt=b0+dtb_t = b_0+dt for tZt \in \mathbb{Z}. Now we’re left with showing that (a0b0cd)\begin{pmatrix}a_0 & b_0\\c & d\end{pmatrix} and (atbtcd)\begin{pmatrix}a_t & b_t\\c & d\end{pmatrix} belong to the same coset for all tZt \in \mathbb{Z}. Indeed, we have

(atbtcd)(a0b0cd)1=(atbtcd)(db0ca0)=(atdbtcb0at+bta0cddccb0+da0)=(1b0at+bta001)±T.\begin{align*} \begin{pmatrix}a_t & b_t\\c & d\end{pmatrix}\begin{pmatrix}a_0 & b_0\\c & d\end{pmatrix}^{-1} &= \begin{pmatrix}a_t & b_t\\c & d\end{pmatrix}\begin{pmatrix}d & -b_0\\-c & a_0\end{pmatrix} \\ &= \begin{pmatrix}a_td-b_tc &-b_0a_t + b_ta_0\\cd-dc & -cb_0+da_0\end{pmatrix}\\ &= \begin{pmatrix}1 & -b_0a_t + b_ta_0\\0 & 1\end{pmatrix} \in \langle \pm T \rangle. \end{align*}

This shows that they belong to the same coset. Now, also observe that elements of the same coset determine (c, d) up to an overall sign: multiplying on the left by a power of T leaves the bottom row untouched, while multiplying by -I \in \langle \pm T \rangle flips it to (-c, -d). This means what we were doing is essentially defining a projection

\pi \colon \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z}) \to \{(c, d) \in \mathbb{Z}^2 \colon \gcd(c,d) = 1\}/\{\pm 1\},

sending \begin{pmatrix}a & b\\c & d\end{pmatrix} to the class of (c, d), and showed that it is well-defined, injective, and surjective, hence a bijection. Since |cz+d|^{-k} takes the same value at (c, d) and (-c, -d), we go back to the sum and see that

\begin{align*} \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |cz+d|^{-k} = \frac{1}{2}\sum_{\substack{(c,d) \in \mathbb{Z}^2\\ \gcd(c,d)=1}} |cz+d|^{-k} \leq \sum_{(c,d) \in \mathbb{Z}^2 \setminus\{(0,0)\}} |cz+d|^{-k}. \end{align*}

Now we recall some basic facts from analysis. In \mathbb{R}^2, all norms are equivalent. Observe that (x, y) \mapsto |xz+y| is a norm (here z \in \mathbb{H} is fixed, and xz+y = 0 with x, y real forces x = y = 0), so there exist \alpha(z), \beta(z) > 0 such that for all (x, y) \in \mathbb{R}^2, \alpha(z) |xz+y| \leq \max(|x|, |y|) \leq \beta(z) |xz+y|. This means

\begin{align*} \sum_{(c,d) \in \mathbb{Z}^2 \setminus \{(0,0)\}} |cz+d|^{-k} &\leq \sum_{(c,d) \in \mathbb{Z}^2 \setminus \{(0,0)\}} \beta(z)^k\max(|c|, |d|)^{-k} \leq \beta(z)^k \sum_{r \geq 1} (8r)r^{-k}, \end{align*}

where the last inequality follows from counting the integer lattice points (c,d) with \max(|c|, |d|) = r. (They are the lattice points on the boundary of the square [-r, r]^2, which has 2r+1 points on each side; remove one endpoint from each side to avoid double counting the corners, then multiply by four, which gives 8r.) For k > 2 the series \sum_{r \geq 1} (8r) r^{-k} converges, so in particular, for k \geq 4, the Poincaré series converges absolutely for each fixed z. Now, let us show that the convergence is locally uniform. We can take \beta(z) := \sup_{\max(|x|, |y|) = 1} |xz+y|^{-1}, which is finite since (x, y) \mapsto |xz+y| is continuous and nonvanishing on the compact set \{\max(|x|,|y|) = 1\}. The function z \mapsto \beta(z) is continuous, so on a small compact neighborhood K of any given point of \mathbb{H} it is bounded, and the estimate above is then uniform over K. This proves that the convergence is locally uniform. \square
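As a quick sanity check of the lattice-point count used above (a throwaway script in the spirit of the code from the introduction, not part of the proof):

# Count lattice points (c, d) with max(|c|, |d|) = r, and compare the direct
# truncated sum of max(|c|, |d|)^(-k) with the regrouped series sum 8r * r^(-k).
def boundary_points(r):
    return sum(1 for c in range(-r, r + 1) for d in range(-r, r + 1)
               if max(abs(c), abs(d)) == r)

for r in range(1, 8):
    assert boundary_points(r) == 8 * r          # the "8r" count used above

k, R = 4, 200
direct = sum(max(abs(c), abs(d))**(-k)
             for c in range(-R, R + 1) for d in range(-R, R + 1)
             if (c, d) != (0, 0))
regrouped = sum(8 * r * r**(-k) for r in range(1, R + 1))
print(direct, regrouped)                        # the two numbers agree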

Now that P_{m, k} is locally uniformly absolutely convergent for any m \geq 0 and even k \geq 4, we observe that each piece z \mapsto f_m(gz)(cz+d)^{-k} is a holomorphic function, and a locally uniformly absolutely convergent series of holomorphic functions is again holomorphic. This means the following.

Corollary 10. For any m1m \geq 1 and even k4k \geq 4, we have Pm,kSkP_{m,k} \in \mathcal{S}_k, and for m=0,P0,k=EkMkSkm = 0, P_{0, k} = E_k \in \mathcal{M}_k \setminus \mathcal{S}_k.

Proof. We’ve shown the holomorphicity in H\mathbb{H}, and that they satisfy the modularity condition. Now we’re left with studying their behavior at infinity. We have, for m1m \geq 1, that

\lim_{y \to \infty} P_{m,k}(x+iy) = \lim_{y \to \infty} \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(g(x+iy))\,(c(x+iy)+d)^{-k} = 0.

Indeed, each individual term tends to zero: the trivial coset contributes e^{2\pi i m (x+iy)}, whose modulus e^{-2\pi m y} tends to zero since m \geq 1, while every other coset has c \neq 0, so |c(x+iy)+d|^{-k} \to 0 and |f_m(g(x+iy))| \leq 1. (The interchange of limit and sum is justified because the estimates of Proposition 9 are uniform on the strip \{|\operatorname{Re}(z)| \leq \frac{1}{2},\ \operatorname{Im}(z) \geq 1\} and P_{m,k} is 1-periodic.) In particular, P_{m,k} is holomorphic at infinity and vanishes there, so P_{m,k} \in \mathcal{S}_k. However, for P_{0,k} = E_k, we have

limyEk(x+iy)=limygˉ±T\SL2(Z)(cx+icy+d)k.\lim_{y \to \infty} E_k(x+iy) = \lim_{y \to \infty} \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} (cx+icy+d)^{-k}.

For the trivial coset, we have (c,d) = (0, 1), and so the term inside is one. For the rest, the terms inside tend to zero, and so we have E_k(\infty) = 1, so E_k \in \mathcal{M}_k \setminus \mathcal{S}_k. \square

It seems like we've constructed a huge family of modular forms, namely the Poincaré series. However (and this is far from obvious), many of them turn out to be identically zero, so the family is not as large as it first appears. The Eisenstein series, at least, are certainly nonzero, since they extend to one at infinity.

The Space of Holomorphic Modular Forms

Now we’re ready to study Mk\mathcal{M}_k and Sk\mathcal{S}_k. Recall that Mk\mathcal{M}_k is the space of holomorphic modular forms that are also holomorphic at infinity. This means, if fMkf \in \mathcal{M}_k, then ordz(f)0\operatorname{ord}_z(f) \geq 0 for all zH{}z \in \mathbb{H} \cup \{\infty\}. Before we continue, observe that if ff is a modular form, we have

f(1/z)=(z)kf(z),f(-1/z) = (-z)^kf(z),

and so

f(z)=f(1/(1/z))=((1/z))kf(1/z)=zk(z)kf(z)=(1)kf(z).f(z) = f(-1/(-1/z)) = (-(-1/z))^kf(-1/z) = z^{-k}(-z)^k f(z) = (-1)^kf(z).

So if kk is odd, then ff must be identically zero! This tells us that Mk=0\mathcal{M}_k = 0 for kk odd, and so from now on we will only consider even kk.

Now, we recall the valence formula from Theorem 7. Suppose f≢0f \not\equiv 0, then

ord(f)+12ordi(f)+13ordexp(2iπ/3)(f)+zD{i,e2iπ/3,e2iπ/6}/SL2(Z)ordz(f)=k12.\operatorname{ord}_\infty(f) + \frac{1}{2}\operatorname{ord}_i(f) + \frac{1}{3}\operatorname{ord}_{\exp(2i\pi/3)}(f) + \sum_{z \in \mathcal{D} \setminus \{i,e^{2i\pi/3}, e^{2i\pi/6}\}/\operatorname{SL}_2(\mathbb{Z})} \operatorname{ord}_z(f) = \frac{k}{12}.

The idea here is to try to solve the equation (solve for ordz(f)\operatorname{ord}_z(f)) given by the valence formula.

Let us begin with \mathcal{M}_0. For k = 0, the right hand side is zero, and we know that the orders of a holomorphic modular form that is also holomorphic at infinity must be nonnegative, so the only possible outcome is that the order of f is zero everywhere, including at infinity. (In particular, |f| is bounded away from zero and infinity, by modularity and compactness of the completed fundamental domain below.) Let f(z) =\tilde{f}(e^{2\pi i z}) = \sum_{n \geq 0} a_n e^{2\pi i n z} be its Fourier expansion at infinity. Recall the fundamental domain \mathcal{D} := \{z \in \mathbb{H} \colon |z| \geq 1; |\operatorname{Re}(z)| \leq \frac{1}{2}\}. We thicken the fundamental domain by a small margin \varepsilon > 0 (to make it concrete one can just pick \varepsilon := \frac{1}{100}) and define \mathcal{D}_\varepsilon := \{z \in \mathbb{H} \colon \operatorname{dist}(z, \mathcal{D}) < \varepsilon\}. We push \mathcal{D}_\varepsilon through the conformal equivalence \Phi into an open subset of the unit disk. This misses the point at infinity (the origin of the disk), so we complete it and define \bar{\mathcal{D}'_\varepsilon} := \Phi(\mathcal{D}_\varepsilon) \cup \{0\}. This is very similar to an open disk (except that its boundary need not be a perfectly round circle) inside the unit disk. We also send the original fundamental domain (with completion) to the unit disk as \bar{\mathcal{D}'} := \Phi(\mathcal{D}) \cup \{0\}. This is a compact set inside the unit disk, with \bar{\mathcal{D}'} \subseteq \bar{\mathcal{D}'_\varepsilon}. By the maximum modulus principle, if \tilde{f} is not constant, then |\tilde{f}| cannot attain a local maximum inside \bar{\mathcal{D}'_\varepsilon}. But since f is modular, its values on \mathbb{H} \cup \{\infty\} are already realized by its values on \mathcal{D} \cup \{\infty\}, i.e. by the values of \tilde{f} on \bar{\mathcal{D}'}; hence \sup_{\bar{\mathcal{D}'_\varepsilon}} |\tilde{f}| = \sup_{\bar{\mathcal{D}'}} |\tilde{f}|, and the latter supremum is attained at some point of the compact set \bar{\mathcal{D}'}. That point is then a local maximum of |\tilde{f}| inside \bar{\mathcal{D}'_\varepsilon}, which is impossible unless \tilde{f} is constant. Therefore \tilde{f} is constant, and by modularity f is constant on all of \mathbb{H}. This means \mathcal{M}_0 = \{f \colon \mathbb{H} \to \mathbb{C} \colon f \equiv c \text{ for some } c \in \mathbb{C}\} is the space of constant functions.

For \mathcal{M}_2, we put k = 2 into the valence formula and observe that there is no solution to the equation: there cannot be points with negative order, the smallest possible nonzero contribution to the left hand side is \frac{1}{3}, which already exceeds \frac{2}{12} = \frac{1}{6}, and taking every order to be zero gives 0 \ne \frac{1}{6}. So \mathcal{M}_2 = \{0\} is the zero vector space.

Before moving on to other weights, let me clarify that, in \mathcal{M}_k for a fixed k, two nonzero modular forms are linearly dependent (i.e. one is a multiple of the other) if and only if their orders (at every orbit) match. Why? First, if f = \lambda g \in \mathcal{M}_k \setminus \{0\} for some \lambda \in \mathbb{C}^\times, then \operatorname{ord}_z(f) = \operatorname{ord}_z(\lambda g) = \operatorname{ord}_z(\lambda) + \operatorname{ord}_z(g) = 0 + \operatorname{ord}_z(g), so the orders match everywhere. Conversely, suppose f, g \in \mathcal{M}_k \setminus \{0\} are two modular forms whose orders match everywhere; how can we deduce that they're linearly dependent? Put h = f/g. This is a meromorphic modular form of weight zero, and for any z \in \mathbb{H} \cup \{\infty\}, we have \operatorname{ord}_z(h) = \operatorname{ord}_z(f/g) = \operatorname{ord}_z(f) - \operatorname{ord}_z(g) = 0, since the orders match at every point. This means h is holomorphic everywhere, including at infinity, so h \in \mathcal{M}_0 \setminus \{0\}, and we've already deduced that \mathcal{M}_0 consists of the constant functions, so h = f/g = \lambda for some \lambda \in \mathbb{C}^\times, i.e., f = \lambda g. The case where one of them (or both) is identically zero is uninteresting and doesn't give much sense to the "order" at a point.

For \mathcal{M}_4, put k = 4 into the valence formula, and observe that the only way for the left hand side to equal \frac{1}{3} is a simple zero at the point \exp(2i\pi/3) and nothing else (every other possibility contributes strictly more than \frac{1}{3}). Recall that E_4 \in \mathcal{M}_4 \setminus \{0\}. This means any nonzero modular form in \mathcal{M}_4 must be a multiple of E_4, so \{E_4\} forms a basis of \mathcal{M}_4.

For M6\mathcal{M}_6, we repeat the same game, and deduce that the only possible scenario is when one order comes from the point ii. Recall that E6M6{0}E_6 \in \mathcal{M}_6 \setminus \{0\} and so it forms a basis.

For M8\mathcal{M}_8, we repeat the same game, and deduce that the only possible way is that exp(2iπ/3)\exp(2i\pi/3) has double order. Recall that E8M8{0}E_8 \in \mathcal{M}_8 \setminus \{0\} and so it forms a basis.

For M10\mathcal{M}_{10}, we have 1012=56=13+12.\frac{10}{12} = \frac{5}{6} = \frac{1}{3} + \frac{1}{2}. One order comes from ii, another comes from exp(2iπ/3)\exp(2i\pi/3) and no other possibilities. So E10E_{10} forms a basis.

For M12\mathcal{M}_{12}, this is where things get harder. We have 1212=1\frac{12}{12} = 1. Maybe triple order at exp(2iπ/3)\exp(2i\pi/3)? Maybe double order at ii? Or maybe single order at somewhere else? We’re still not sure, but at least we have E12M12E_{12} \in \mathcal{M}_{12}.

It turns out that we cannot simply keep playing the same game for larger weights. However, we can get around this in another way.

Lemma 11. The form Δ:=E43E62\Delta := E_4^3 - E_6^2 is actually in S12\mathcal{S}_{12}. It is not identically zero, and it has no zero and no pole in H\mathbb{H}. The zero at infinity is simple.

Proof. We know that E_4^3, E_6^2 \in \mathcal{M}_{12}, and so their difference is also in there. We're left with showing that \Delta(\infty) = 0. Indeed, we've seen in Corollary 10 that E_4(\infty) = E_6(\infty) = 1, and so \Delta(\infty) = 0. This shows that it belongs to \mathcal{S}_{12}. Next, it is not identically zero: from the weight-6 analysis above, the only zero of E_6 sits at i, so E_6(i) = 0, while the only zero of E_4 sits at \exp(2i\pi/3), so E_4(i) \ne 0; hence \Delta(i) = E_4(i)^3 - E_6(i)^2 = E_4(i)^3 \ne 0. Now, we see that \operatorname{ord}_\infty(\Delta) \geq 1. By the valence formula, it must be equal to one, and the order everywhere else is zero, hence no zero and no pole anywhere else. Since the order is one, the zero at infinity is simple. \square

Lemma 12. The subspace SkMk\mathcal{S}_k \subseteq \mathcal{M}_k is a vector subspace of dimension dimMk1\dim \mathcal{M}_k -1 or dimMk\dim \mathcal{M}_k, for all k0k \geq 0. Moreover, for even k4k \geq 4, dimSk=dimMk1\dim \mathcal{S}_k = \dim \mathcal{M}_k - 1.

Proof. The subspace Sk\mathcal{S}_k can be defined as the kernel of the linear map ff()f \mapsto f(\infty). By the rank-nullity theorem, dimker+dimim=dimMk\dim \operatorname{ker} + \dim \operatorname{im} = \dim \mathcal{M}_k. The dimension of the image can be zero (if the whole space matches the subspace of cusp forms), or one (since the linear map lands in C\mathbb{C}). Now, we’ve seen from Corollary 10., that for even k4k \geq 4, EkMkSkE_k \in \mathcal{M}_k \setminus \mathcal{S}_k, so necessarily dimSk\dim \mathcal{S}_k cannot be equal to dimMk\dim \mathcal{M}_k. This completes the proof. \square

Lemma 13. The map MkSk+12\mathcal{M}_k \to \mathcal{S}_{k+12} sending ffΔf \mapsto f\Delta is an isomorphism of vector spaces.

Proof. Obviously the map is well-defined and C\mathbb{C}-linear. It is injective: if fΔf\Delta is identically zero, then on H\mathbb{H}, ff must be identically zero (because Δ\Delta is nowhere zero), and so f0f \equiv 0. It is surjective: if gSk+12g \in \mathcal{S}_{k+12}, we can define f=g/Δf = g/\Delta. A priori this is just a meromorphic modular form, but by evaluating the order at each point zz, we see that ordz(f)=ordz(g)ordz(Δ)=ordz(g)0\operatorname{ord}_z(f) = \operatorname{ord}_z(g) - \operatorname{ord}_z(\Delta) = \operatorname{ord}_z(g) \geq 0 for all zHz \in \mathbb{H}. Now, for \infty, we have ord(g)1\operatorname{ord}_\infty(g) \geq 1 since gg is a cusp form. And we knew (Lemma 11) that ord(Δ)=1\operatorname{ord}_\infty(\Delta) = 1, so ord(f)=ord(g)ord(Δ)0\operatorname{ord}_\infty(f) = \operatorname{ord}_\infty(g) - \operatorname{ord}_\infty(\Delta) \geq 0. This means ff is also holomorphic at infinity. This shows that fMkf \in \mathcal{M}_k satisfies fΔ=gf\Delta = g, and so the linear map is surjective. Therefore, it is an isomorphism of vector spaces. \square

This shows us that M0S12\mathcal{M}_0 \cong \mathcal{S}_{12}, and so dimS12=1\dim \mathcal{S}_{12} = 1. Since 0≢ΔS120 \not\equiv \Delta \in \mathcal{S}_{12}, it forms a basis there.

By Lemma 12., we see that dimM12=2\dim \mathcal{M}_{12} = 2, and E12M12S12E_{12} \in \mathcal{M}_{12} \setminus \mathcal{S}_{12}, so {E12,Δ}\{E_{12}, \Delta\} forms a basis for M12\mathcal{M}_{12}. We’re now ready to attack the spaces of bigger weights. But before moving on, let us quickly recap our knowledge of smaller spaces.

k     A basis for M_k     A basis for S_k
0     {1}                 {}
2     {}                  {}
4     {E_4}               {}
6     {E_6}               {}
8     {E_8}               {}
10    {E_10}              {}
12    {E_12, Δ}           {Δ}

Theorem 14. For even k0k \geq 0, one has dimMk={k12;k21(mod6)k12+1;k2≢1(mod6).\dim \mathcal{M}_k = \begin{cases} \lfloor \frac{k}{12}\rfloor &; \frac{k}{2} \equiv 1 \pmod{6}\\ \lfloor \frac{k}{12}\rfloor+1 &; \frac{k}{2} \not\equiv 1 \pmod{6}.\end{cases} Otherwise, for kk odd or negative, one has Mk={0}\mathcal{M}_k = \{0\}. Moreover, there exists an algorithm to construct a basis for Mk\mathcal{M}_k in terms of Eisenstein series and Δ\Delta. The same can be done for Sk\mathcal{S}_k as well, where for even k4k \geq 4, dimSk=dimMk1\dim \mathcal{S}_k = \dim \mathcal{M}_k - 1.

Proof. For k \in \{0, 2, 4, 6, 8, 10, 12\} we've finished the proof already. For the rest, we proceed by induction. For a fixed even k \geq 14, one has \mathcal{S}_k \cong \mathcal{M}_{k-12} by Lemma 13. Suppose \mathscr{B}_{k-12} is a basis for \mathcal{M}_{k-12}. We can use the isomorphism \mathcal{S}_k \cong \mathcal{M}_{k-12} to construct a basis for \mathcal{S}_k: explicitly, \{f\Delta \colon f \in \mathscr{B}_{k-12}\} is one, and we see that \dim \mathcal{S}_k = \dim \mathcal{M}_{k-12}. Applying Lemma 12, \dim \mathcal{S}_k = \dim \mathcal{M}_k - 1. Since \mathcal{S}_k \subseteq \mathcal{M}_k, we can extend this basis by one more modular form that is not in \mathcal{S}_k (hence linearly independent from it) to obtain a basis for \mathcal{M}_k, and we have a natural candidate: E_k. Hence,

Bk:={fΔ ⁣:fBk12}{Ek}\mathscr{B}_k := \{f\Delta \colon f \in \mathscr{B}_{k-12}\} \cup \{E_k\}

is a basis for \mathcal{M}_k. In particular, \dim \mathcal{M}_k = \dim \mathcal{M}_{k-12}+1; plugging in the formula for \dim \mathcal{M}_{k-12} (and noting that k/2 and (k-12)/2 have the same residue modulo 6) shows that the formula for \dim \mathcal{M}_k holds as well. \square
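For concreteness, here is a small throwaway script that compares the closed formula of Theorem 14 with the recursion \dim \mathcal{M}_k = \dim \mathcal{M}_{k-12} + 1, seeded by the small weights from the table above:

# dim M_k from the closed formula of Theorem 14 versus the recursion
# dim M_k = dim M_{k-12} + 1 seeded with the small-weight cases.
def dim_formula(k):
    if k < 0 or k % 2 == 1:
        return 0
    return k // 12 if (k // 2) % 6 == 1 else k // 12 + 1

def dim_recursive(k):
    base = {0: 1, 2: 0, 4: 1, 6: 1, 8: 1, 10: 1, 12: 2}
    if k < 0 or k % 2 == 1:
        return 0
    if k in base:
        return base[k]
    return dim_recursive(k - 12) + 1

assert all(dim_formula(k) == dim_recursive(k) for k in range(0, 200))
print([dim_formula(k) for k in range(0, 30, 2)])
# [1, 0, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 3, 2, 3]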

Now we've made huge progress. We've studied the space of modular forms to the point that we can construct a basis for the space of holomorphic modular forms of every possible integer weight! What's so nice about this? The idea is: if we know that a function f \colon \mathbb{H} \to \mathbb{C} is a holomorphic modular form of weight k that is also holomorphic at infinity, then we know that f \in \mathcal{M}_k. But we've already constructed the bases! And so \mathcal{M}_k = \operatorname{Span}_{\mathbb{C}}(\mathscr{B}_k), so there exists a unique (\lambda_1, \dots, \lambda_d) \in \mathbb{C}^d such that f = \sum_{i=1}^d \lambda_i b_i, where d := \dim \mathcal{M}_k and \mathscr{B}_k = \{b_1, \dots, b_d\}. We've transformed the problem of studying f into the problem of studying the basis elements b_1, \dots, b_d!

Now the next thing we want to do is to study the Eisenstein series further (and maybe also the Poincaré series, but we can skip them, since we don't use them to construct the bases). However, the content there requires heavier complex-analytic machinery like the Poisson summation formula, so let's take a quick break from modular forms and look at these tools from analysis.

Fourier Transform and Poisson Summation Formula

First, we recall our basic result from the theory of Fourier transform that will be useful for us.

Definition. For a Lebesgue-integrable function f ⁣:RCf \colon \mathbb{R} \to \mathbb{C}, we define its Fourier transform f^ ⁣:RC\hat{f} \colon \mathbb{R} \to \mathbb{C} as

f^(t):=f(x)e2πixtdx.\hat{f}(t) := \int_{-\infty}^\infty f(x) e^{-2\pi i x t} \mathrm{d}x.

(For simplicity of notation, we use the variable xx in f(x)f(x) and use tt in f^(t)\hat{f}(t).)

Lemma 15. The function g(x):=eπx2g(x) := e^{-\pi x^2} defined for all xRx \in \mathbb{R} has g^(t)=eπt2\hat{g}(t) = e^{-\pi t^2}, i.e., its Fourier transform is itself.

Note. We’re not going to use Lemma 15 immediately right now, so you can comfortably skip this and come back later. I’m just putting it here because I don’t want to put results from Fourier transform all over the place.

Proof. We first expand the definition for g^\hat{g} and see that for all tRt \in \mathbb{R}, we have

g^(t)=eπx2e2πixtdx,\hat{g}(t) = \int_{-\infty}^\infty e^{-\pi x^2}e^{-2\pi i x t} \mathrm{d}x,

i.e., the integral converges absolutely, since the integrand has modulus e^{-\pi x^2}; in particular \hat{g} is well-defined, and it will turn out to be \mathcal{C}^1 once we justify differentiation under the integral sign. So, we differentiate with respect to t on both sides to see

tg^(t)=teπx22πixtdx=teπx22πixtdx,\partial_t \hat{g}(t) = \partial_t \int_{-\infty}^\infty e^{-\pi x^2 - 2 \pi i x t} \mathrm{d}x = \int_{-\infty}^\infty \partial_t e^{-\pi x^2 - 2 \pi i x t} \mathrm{d}x,

where the right equality holds by Lebesgue’s dominated convergence theorem for derivatives. Let us check the conditions we need. Define f(x,t):=eπx22πixtf(x, t) := e^{-\pi x^2-2\pi i xt} for all x,tRx, t \in \mathbb{R}. We need to check:

  1. For each t0Rt_0 \in \mathbb{R}, the function tf(x,t)t \mapsto f(x, t) is differentiable at t0t_0.
  2. For each t0Rt_0 \in \mathbb{R} there exists a neighborhood UU of t0t_0 in R\mathbb{R} and a function hL1(R)h \in L^1(\mathbb{R}) such that tf(x,t)h(x)|\partial_t f(x, t)| \leq h(x) for all xRx \in \mathbb{R} and all tUt \in U.
  3. For each t1Ut_1 \in U, the function xf(x,t1)x \mapsto f(x, t_1) is in L1(R)L^1(\mathbb{R}).

First of all, we compute \partial_t f(x, t) = e^{-\pi x^2} (-2\pi i x)e^{- 2 \pi i x t}, which exists everywhere, so the first condition is satisfied. Now, for any fixed t_0 \in \mathbb{R}, we take U := (t_0-1, t_0+1) and observe that |\partial_t f(x, t)| = 2\pi |x| e^{-\pi x^2} |e^{-2\pi i x t}| = 2\pi |x| e^{-\pi x^2} for every t. Define h(x) := 2\pi |x| e^{-\pi x^2} and observe that

\|h\|_1 =\int_{-\infty}^{\infty} |h(x)| \mathrm{d}x = \int_{-\infty}^\infty 2\pi|x|e^{-\pi x^2} \mathrm{d}x = 4\pi \int_0^\infty xe^{-\pi x^2} \mathrm{d}x.

Put u=x2u = x^2 substitution (this works since xx2x \mapsto x^2 is a C1\mathcal{C}^1-diffeomorphism on (0,)(0, \infty)) with du=2xdx\mathrm{d}u = 2x\mathrm{d}x. We have

\|h\|_1 = 2\pi \int_0^\infty e^{-\pi u} \mathrm{d}u = 2\pi \left[-\frac{1}{\pi} e^{-\pi u}\right]_0^\infty = 2\pi\left(0 - \left(-\frac{1}{\pi}\right)\right) = 2.

This shows that hL1(R)h \in L^1(\mathbb{R}), and the second condition is satisfied. Now, let us check the third condition. Fix t1Ut_1 \in U and observe that

xf(x,t1)1=eπx2e2πixt1dx=eπx2dx=1.\|x \mapsto f(x, t_1)\|_1 = \int_{-\infty}^\infty |e^{-\pi x^2}| |e^{-2\pi i x t_1}| \mathrm{d}x = \int_{-\infty}^\infty e^{-\pi x^2} \mathrm{d}x = 1.

This shows that the third condition is satisfied. Now let us forget about f(x,t),h(x),t0,t1,Uf(x,t), h(x), t_0, t_1, U and the extra notations we used in DCT, and let’s go back to tg^(t).\partial_t \hat{g}(t) . We have

tg^(t)=teπx22πixtdx=eπx2(2πix)e2πixtdx.\partial_t \hat{g}(t) = \int_{-\infty}^\infty \partial_t e^{-\pi x^2 - 2 \pi i x t} \mathrm{d}x = \int_{-\infty}^\infty e^{-\pi x^2} (-2\pi i x)e^{- 2 \pi i x t} \mathrm{d}x.

To continue, let us try to attack this with integration by parts. Observe that the derivative of xeπx2x \mapsto e^{-\pi x^2} is x2πxeπx2x \mapsto -2\pi x e^{-\pi x^2}. Put u:=ie2πixtu := ie^{-2\pi i x t} and dv:=2πxeπx2\mathrm{d}v := -2\pi x e^{-\pi x^2} so that

udv=[uv]vdu.\int_{-\infty}^\infty u \mathrm{d}v = [uv]_{-\infty}^{\infty} - \int_{-\infty}^\infty v \mathrm{d}u.

We have du=i(2πit)e2πixtdx=2πte2πixtdx\mathrm{d}u = i(-2\pi i t)e^{-2\pi i x t} \mathrm{d}x = 2\pi te^{-2\pi i xt} \mathrm{d}x and v=eπx2v = e^{-\pi x^2}, so we have

tg^(t)=udv=[ie2πixteπx2]eπx22πte2πixtdx.\partial_t \hat{g}(t) = \int_{-\infty}^\infty u\mathrm{d}v = [ie^{-2\pi i x t} e^{-\pi x^2}]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} e^{-\pi x^2} 2\pi t e^{-2\pi i x t} \mathrm{d}x.

Since limx+eπx2=limxeπx2=0\lim_{x \to +\infty} e^{-\pi x^2} = \lim_{x \to -\infty} e^{-\pi x^2} = 0, the middle boundary term vanishes, and we have

tg^(t)=2πteπx2e2πixtdx=2πtg^(t).\partial_t \hat{g}(t) = -2\pi t \int_{-\infty}^\infty e^{-\pi x^2} e^{- 2 \pi i x t} \mathrm{d}x = -2\pi t \hat{g}(t).

Now we’re looking for g^\hat{g} such that g^=2πtg^\hat{g}' = -2\pi t \hat{g}. This is a basic ODE. We give up our dignity and resort to the physicists’ notation for a while:

dg^dt=2πtg^\frac{\mathrm{d}\hat{g}}{\mathrm{d}t} = -2\pi t \hat{g}

so

1g^dg^=2πtdt\frac{1}{\hat{g}} \mathrm{d}\hat{g} = -2\pi t \mathrm{d}t

and we integrate both sides like a boss:

logg^+(const)=1g^dg^=2πtdt=πt2+(const).\log \hat{g} + (\mathrm{const}) =\int \frac{1}{\hat{g}} \mathrm{d}\hat{g} = \int-2\pi t\mathrm{d}t = -\pi t^2 + (\mathrm{const}).

Exponentiating both sides, we end up with

g^=Ceπt2\hat{g} = Ce^{-\pi t^2}

for some CCC \in \mathbb{C}. Let us try to plug in some basic value for g^\hat{g} from the definition, say, evaluate at t=0t = 0 and see that g^(0)=eπx2e2πix(0)dx=eπx2dx=1.\hat{g}(0) = \int_{-\infty}^\infty e^{-\pi x^2} e^{-2\pi i x (0)} \mathrm{d}x = \int_{-\infty}^\infty e^{-\pi x^2} \mathrm{d}x = 1. This means C=1C = 1, and our only solution for g^\hat{g} is g^(t)=eπt2\hat{g}(t) = e^{-\pi t^2}, defined for all tRt \in \mathbb{R}. We can easily plug this in and check that it satisfies the ODE and the initial conditions. This completes the proof of Lemma 15. \square
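Since Lemma 15 will be used later, here is a quick numerical sanity check (a throwaway script using a plain Riemann sum; the truncation bound 10 and the grid size are arbitrary):

# Approximate the Fourier transform of g(x) = exp(-pi x^2) numerically and
# compare with exp(-pi t^2).
import cmath, math

def g_hat(t, X=10.0, N=20000):
    h = 2 * X / N            # Riemann sum over [-X, X]; the tail beyond is negligible
    return h * sum(math.exp(-math.pi * (-X + i * h)**2)
                   * cmath.exp(-2j * math.pi * (-X + i * h) * t)
                   for i in range(N))

for t in [0.0, 0.5, 1.0, 2.0]:
    approx, exact = g_hat(t), math.exp(-math.pi * t * t)
    assert abs(approx - exact) < 1e-6
    print(t, approx.real, exact)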

Theorem 16. (A version of the Poisson Summation Formula) Let f ⁣:RCf \colon \mathbb{R} \to \mathbb{C} be C1\mathcal{C}^1 and L1(R)L^1(\mathbb{R}), such that fL1(R)f' \in L^1(\mathbb{R}) also. If the series nZf(n+x)\sum_{n \in \mathbb{Z}} f'(n+x) is locally uniformly absolutely convergent, then

nZf(n)=mZf^(m).\sum_{n \in \mathbb{Z}} f(n) = \sum_{m \in \mathbb{Z}} \hat{f}(m).

Note. The final hypothesis that nZf(n+x)\sum_{n \in \mathbb{Z}} f'(n+x) is locally uniformly absolutely convergent can be removed, but it will complicate the proof too much since we would otherwise have to detour through Dirichlet-Jordan theorem for pointwise Fourier series convergence for functions with bounded variation.

Proof. Put F(x):=nZf(n+x)F(x) := \sum_{n \in \mathbb{Z}} f(n+x). We will check that this is well-defined by proving that the series is locally uniformly absolutely convergent. First, observe that, by the fundamental theorem of calculus, we have abf(t)dt=f(b)f(a)\int_a^b f'(t) \mathrm{d}t = f(b) - f(a). Put a=n+xa = n+x and observe that

f(n+x)=f(b)n+xbf(t)dt.f(n+x) = f(b) - \int_{n+x}^b f'(t)\mathrm{d}t.

By taking absolute values, and applying triangle inequality, we have

f(n+x)f(b)+n+xbf(t)dt|f(n+x)| \leq |f(b)| + \int_{n+x}^b |f'(t)| \mathrm{d}t

Taking integral for bb from n+xn+x to n+x+1n+x+1, we get

f(n+x)=n+xn+x+1f(n+x)dbn+xn+x+1f(b)db+n+xn+x+1n+xbf(t)dtn+xn+x+1f(b)db+n+xn+x+1n+xn+x+1f(t)dt=n+xn+x+1(f(b)+f(b))db.\begin{align*} |f(n+x)| &= \int_{n+x}^{n+x+1} |f(n+x)| \mathrm{d}b \\ &\leq \int_{n+x}^{n+x+1}|f(b)| \mathrm{d}b + \int_{n+x}^{n+x+1}\int_{n+x}^b |f'(t)| \mathrm{d}t \\ &\leq \int_{n+x}^{n+x+1} |f(b)| \mathrm{d}b + \int_{n+x}^{n+x+1} \int_{n+x}^{n+x+1} |f'(t)| \mathrm{d}t \\ &= \int_{n+x}^{n+x+1} (|f(b)|+|f'(b)|)\mathrm{d}b. \end{align*}

Now, for any compact interval IRI \subseteq \mathbb{R}, say I=[cL,cR]I = [c_L, c_R], we have

supxIn>Nf(n+x)supxIn>Nn+xn+x+1(f(b)+f(b))db=supxI(,xN+1][x+N,)(f(b)+f(b))dbsupx[cL,cR](,xN+1](f(b)+f(b))db+supx[cL,cR][x+N,)(f(b)+f(b))db(,cRN+1](f(b)+f(b))db+[cL+N,)(f(b)+f(b))db,\begin{align*} \sup_{x \in I} \sum_{|n| > N} |f(n+x)| &\leq \sup_{x \in I} \sum_{|n| > N} \int_{n+x}^{n+x+1}(|f(b)|+|f'(b)|) \mathrm{d}b \\ &= \sup_{x \in I} \int_{(-\infty, x-N+1] \cup [x+N, \infty)} (|f(b)|+|f'(b)|) \mathrm{d}b \\ &\leq \sup_{x \in [c_L, c_R]} \int_{(-\infty, x-N+1]} (|f(b)|+|f'(b)|)\mathrm{d}b + \sup_{x \in [c_L, c_R]} \int_{[x+N, \infty)}(|f(b)|+|f'(b)|)\mathrm{d}b \\ &\leq \int_{(-\infty, c_R-N+1]} (|f(b)|+|f'(b)|)\mathrm{d}b + \int_{[c_L+N, \infty)} (|f(b)|+|f'(b)|) \mathrm{d}b, \end{align*}

and as N tends to infinity, since f, f' \in L^1(\mathbb{R}), both integrals vanish. This shows the local uniform absolute convergence for F. By local uniform absolute convergence, we also see that F is continuous, and since \sum_{n \in \mathbb{Z}} f'(n+x) is also locally uniformly absolutely convergent, we see that F is also \mathcal{C}^1. Now, by definition, F is 1-periodic. By the theory of Fourier series, such a (\mathcal{C}^1 and 1-periodic) function converges uniformly to its Fourier series

nZcne2πinx\sum_{n \in \mathbb{Z}} c_ne^{2 \pi inx}

where cn:=01F(x)e2πinxdxc_n := \int_0^1 F(x) e^{-2\pi i n x} \mathrm{d}x.

In particular, we have (the left equality comes from definition of FF and the right equality comes from the convergence of the Fourier series at zero)

nZf(n)=F(0)=nZcn.\sum_{n \in \mathbb{Z}} f(n)= F(0) = \sum_{n \in \mathbb{Z}} c_n.

Now let us compute cnc_n. We have

cn=01F(x)e2πinxdx=01mZf(m+x)e2πin(m+x)dx=f(x)e2πinxdx=f^(n),c_n = \int_{0}^1 F(x) e^{-2\pi i n x} \mathrm{d}x = \int_{0}^1 \sum_{m \in \mathbb{Z}} f(m+x)e^{-2\pi i n (m+x)} \mathrm{d}x = \int_{-\infty}^\infty f(x)e^{-2 \pi i n x} \mathrm{d}x = \hat{f}(n),

where the equality before the last follows from interchanging the sum and the integral (legitimate by the uniform convergence on [0,1]) and then gluing the integrals: e^{-2\pi i n (m+x)} = e^{-2\pi i n x}, and the translates [m, m+1] of [0,1] tile \mathbb{R}. And so this completes the proof. \square
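As a sanity check of Theorem 16 (again a throwaway numerical experiment), take f(x) = e^{-\pi s x^2} for a parameter s > 0; by Lemma 15 and a change of variables its Fourier transform is \hat{f}(t) = s^{-1/2} e^{-\pi t^2/s}, so the two sides of the Poisson summation formula can be compared directly. (This particular example is exactly the transformation law of the theta function that will show up later.)

# Check sum_n f(n) = sum_m fhat(m) for f(x) = exp(-pi*s*x^2),
# whose Fourier transform is s^(-1/2) * exp(-pi*m^2/s).
import math

def lhs(s, N=50):
    return sum(math.exp(-math.pi * s * n * n) for n in range(-N, N + 1))

def rhs(s, N=50):
    return sum(math.exp(-math.pi * m * m / s) for m in range(-N, N + 1)) / math.sqrt(s)

for s in [0.3, 1.0, 2.7]:
    assert abs(lhs(s) - rhs(s)) < 1e-12
    print(s, lhs(s), rhs(s))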

The Fourier Coefficients of Eisenstein Series

Now that we have our toolbox ready, let us go back to modular forms and compute those coefficients!

Theorem 17. Let k \geq 4 be an even integer. Then the Fourier coefficients of E_k are given by

an=(2πi)kζ(k)(k1)!σk1(n)a_n = \frac{(2\pi i)^k}{\zeta(k)(k-1)!} \sigma_{k-1}(n)

for n1n \geq 1, and a0=1a_0 = 1, where σk1(n):=dndk1\sigma_{k-1}(n) := \sum_{d \mid n} d^{k-1} is the sum of (k1)(k-1)-th power of divisors function.

The proof is going to be quite long, so before getting into it, let me give a brief overview. The first step is to recall the definition of E_k: we sum over cosets in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z}), where we can select any representative and the result will be the same. This means we want to parametrize \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z}) by a nice collection of representatives; this is the Bruhat decomposition. Once this is done, we try to compute the sum over those representatives and get stuck. However, the innermost piece is a sum over m \in \mathbb{Z} of a function nice enough to apply the Poisson summation formula to, so we sum over its Fourier transform instead. This gives another ugly integral, over a vertical line in the complex plane. We solve it by applying the residue theorem, switching to a contour with a semicircular arc: when the radius goes to infinity, the contribution of the arc vanishes, and the ugly integral is solved. The residue is the source of the \frac{1}{(k-1)!} term. Now the outer sums are parametrized in an arithmetically convenient way, and this gives something called the Ramanujan sum. The Ramanujan sum can be evaluated explicitly as a sum over divisors, and we then obtain a Dirichlet series; applying Möbius inversion, out comes the \zeta(k).

To prepare for the proof, let me introduce the basic analytic theory of numbers: Dirichlet convolution, Dirichlet series, and Möbius inversion.

Definition. For functions f, g \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, we define their Dirichlet convolution, denoted by f \star g \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, as

(fg)(n):=dnf(d)g(nd).(f\star g)(n) := \sum_{d \mid n} f(d)g\left(\frac{n}{d}\right).

Observe that \star is commutative, associative, and there is a neutral element 1=1\mathbf{1}_{=1} defined by

1=1(n):={1;n=10;n1.\mathbf{1}_{=1}(n) := \begin{cases}1 &; n=1\\ 0 &; n\ne 1.\end{cases}

Furthermore, for each function ff, if f(1)0f(1) \ne 0, then there is an inverse ff^{-\star} such that ff=1=1f \star f^{-\star} = \mathbf{1}_{=1}. To see this, we can define ff^{-\star} inductively. We want such function to satisfy

dnf(d)f(n/d)=(ff)(n)=1=1(n).\sum_{d \mid n} f(d) f^{-\star}(n/d)=(f \star f^{-\star})(n) = \mathbf{1}_{=1}(n).

For n=1n = 1, define f(1):=1f(1)f^{-\star}(1) := \frac{1}{f(1)}. Now, for any nn, assume that ff^{-\star} has already been defined for all positive integers less than nn. Then, we have

f(n):=dn;d1f(d)f(n/d)f(1).f^{-\star}(n) := \frac{-\sum_{d \mid n; d \ne 1} f(d)f^{-\star}(n/d)}{f(1)}.

Observe that ff^{-\star} is uniquely defined, and this is the only choice to make it an inverse of ff. This shows that {f ⁣:Z1C ⁣:f(1)0}\{f \colon \mathbb{Z}_{\geq 1} \to \mathbb{C} \colon f(1) \ne 0\} forms an abelian group under the operation \star.
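The recursive construction of the inverse is really an algorithm, so here is a small sketch (the function names are mine) that computes it and checks the defining property f \star f^{-\star} = \mathbf{1}_{=1}:

# Dirichlet convolution and the recursively defined Dirichlet inverse.
def dirichlet_convolve(f, g, n):
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

def dirichlet_inverse(f, N):
    # values of f^{-*} at 1, ..., N, assuming f(1) != 0
    inv = {1: 1 / f(1)}
    for n in range(2, N + 1):
        s = sum(f(d) * inv[n // d] for d in range(2, n + 1) if n % d == 0)
        inv[n] = -s / f(1)
    return inv

f = lambda n: n                     # a simple test function with f(1) != 0
inv = dirichlet_inverse(f, 200)
for n in range(1, 201):
    value = dirichlet_convolve(f, lambda m: inv[m], n)
    assert abs(value - (1 if n == 1 else 0)) < 1e-9
print("f * f^{-*} = 1_{=1} checked up to n = 200")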

Proposition 18. (Möbius Inversion) The Möbius function μ\mu defined by

μ(n):={(1)k;n is a product of k distinct primes0;p prime,p2n\mu(n) := \begin{cases}(-1)^k &; n \text{ is a product of } k \text{ distinct primes}\\0 &; \exists p \text{ prime}, p^2 \mid n\end{cases}

for n1n \geq 1 (in particular μ(1)=1\mu(1) = 1) satisfies

μ1=1=1.\mu \star 1 = \mathbf{1}_{=1}.

In other words, it is the inverse of the function that sends everything to one.

Proof. It is enough to show that

(μ1)(n)=0(\mu \star 1)(n) = 0

for n2n \geq 2. (The case for n=1n = 1 is obvious) Let us compute.

(μ1)(n)=dnμ(d).(\mu \star 1)(n) = \sum_{d \mid n} \mu(d).

Suppose n=i=1mpiain = \prod_{i=1}^m p_i^{a_i} with ai>0a_i > 0 (apply the fundamental theorem of arithmetic to get this). Then, we have

\sum_{d \mid n} \mu(d) = \sum_{(x_1, \dots, x_m) \in \{0,1\}^m} \mu\left(\prod_{i=1}^m p_i^{x_i}\right) = \sum_{x \in \{0,1\}^m} (-1)^{\|x\|_1} = \sum_{k=0}^m {m \choose k} (-1)^k = (1-1)^m = 0.

The equality before the last follows from the binomial theorem. The equality before that follows by reparametrizing the sum by x1=k\|x\|_1 = k for each kk, then count the number of tuples satisfying the condition. \square
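And a matching numerical check of Proposition 18, with a naive trial-division Möbius function (self-contained throwaway code):

# mu(n) via trial division, and a check that sum_{d | n} mu(d) = [n == 1].
def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:          # a square factor divides the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result   # a leftover prime factor flips the sign

for n in range(1, 2001):
    total = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert total == (1 if n == 1 else 0)
print("mu * 1 = 1_{=1} verified for n <= 2000")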

For the Dirichlet series, the actual power lies in defining it over the complex plane (with suitable analytic continuation). However, that is an overkill for our purpose right now, and so I’ll only introduce it for a subset of positive reals.

Definition. For a function f ⁣:Z1Cf \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, one can define its Dirichlet series DfD_f as

Df(s):=n1f(n)ns.D_f(s) := \sum_{n \geq 1} \frac{f(n)}{n^s}.

If ff grows polynomially, i.e. f(n)=O(nc)f(n) = O(n^c) for some c>0c > 0, then the series converges absolutely (pointwise) for s>c+1s > c+1.

Note. For functions f,g ⁣:Z1Cf, g \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, if both DfD_f and DgD_g converge absolutely, we have DfDg=DfgD_f D_g = D_{f \star g} for sufficiently large input. To see this, write

Df(s)Dg(s)=(n1f(n)ns)(m1g(m)ms)=n1;m1f(n)g(m)(nm)s.D_f(s) D_g(s) = \left(\sum_{n \geq 1}\frac{f(n)}{n^{s}}\right)\left(\sum_{m \geq 1} \frac{g(m)}{m^{s}}\right) = \sum_{n \geq 1; m \geq 1} \frac{f(n)g(m)}{(nm)^s}.

Now we reparametrize the sum by nm=knm = k to see that

Df(s)Dg(s)=k1nm=kf(n)g(m)ks=k1nkf(n)g(k/n)ks=k1(fg)(k)ks=Dfg(s).D_f(s)D_g(s) = \sum_{k \geq 1} \sum_{nm = k} \frac{f(n)g(m)}{k^s} = \sum_{k \geq 1} \sum_{n \mid k} \frac{f(n)g(k/n)}{k^s} = \sum_{k \geq 1} \frac{(f \star g)(k)}{k^s} = D_{f \star g}(s).

Now we can go back to Theorem 17 by proving a sequence of results here.

Lemma 19. (Bruhat Decomposition)

SL2(Z)=±Tc10dc1gcd(c,d)=1±T(ac,dbc,dcd)T\operatorname{SL}_2(\mathbb{Z}) = \langle \pm T \rangle \sqcup \bigsqcup_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d) = 1}} \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle

where ac,d,bc,dZa_{c,d}, b_{c,d} \in \mathbb{Z} can be chosen arbitrarily such that ac,ddbc,dc=1a_{c,d}d - b_{c,d}c = 1.

Proof. Take g=(αβγδ)SL2(Z)g =\begin{pmatrix}\alpha & \beta\\\gamma & \delta\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}). If γ=0\gamma = 0, then αδβγ=1\alpha \delta - \beta\gamma = 1 implies α=δ=±1\alpha = \delta = \pm 1 and so g±Tg \in\langle \pm T \rangle. Now assume γ0\gamma \ne 0. First let’s assume γ1\gamma \geq 1. If we multiply gg by TnT^n, we have

gTn=(αβγδ)(1n01)=(αnα+βγnγ+δ).gT^n = \begin{pmatrix}\alpha & \beta\\\gamma & \delta\end{pmatrix}\begin{pmatrix}1 & n\\0 & 1\end{pmatrix} = \begin{pmatrix}\alpha & n\alpha +\beta\\\gamma & n\gamma + \delta\end{pmatrix}.

Observe that there exists unique nZn \in \mathbb{Z} such that 0nγ+δγ10 \leq n\gamma + \delta \leq \gamma -1. Fix such nn and put c=γ;d=nγ+δc = \gamma;\, d = n\gamma + \delta. Write a=αa = \alpha and b=nα+βb = n\alpha + \beta. If we multiply gTngT^n by TmT^m from the left, we get

TmgTn=(1m01)(abcd)=(a+mcb+mdcd).T^mgT^n = \begin{pmatrix}1 & m\\0 & 1\end{pmatrix} \begin{pmatrix}a & b\\c & d\end{pmatrix} = \begin{pmatrix}a+mc & b+md\\c & d\end{pmatrix}.

But since ac,ddbc,dc=1a_{c,d}d - b_{c,d}c = 1 and adbc=1ad-bc=1, we see that there exists a unique mZm \in \mathbb{Z} such that a+mc=ac,da+mc = a_{c,d} and b+md=bc,db+md = b_{c,d}. Fix such mm to get

TmgTn=(ac,dbc,dcd),T^mgT^n = \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix},

and so

g=Tm(ac,dbc,dcd)Tn±T(ac,dbc,dcd)T.g = T^{-m}\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}T^{-n} \in \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle.

This shows that gg belongs to the decomposition when γ1\gamma \geq 1. Now for γ1\gamma \leq -1, we know that g-g belongs to such decomposition. Multiplying the left TmT^{-m} by 1-1 gives

g=Tm(ac,dbc,dcd)Tn±T(ac,dbc,dcd)T.g = -T^{-m}\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}T^{-n} \in \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle.

This shows that

SL2(Z)±Tc10dc1gcd(c,d)=1±T(ac,dbc,dcd)T.\operatorname{SL}_2(\mathbb{Z}) \subseteq \langle \pm T \rangle \sqcup \bigsqcup_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d) = 1}} \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle.

The other inclusion is trivial since everything is included in SL2(Z)\operatorname{SL}_2(\mathbb{Z}). The fact that the union is disjoint comes from the fact that mm and nn are unique. If there were to be a nonempty intersection between the decompositions, then there must be a gSL2(Z)g \in \operatorname{SL}_2(\mathbb{Z}) such that it can be put into multiple decompositions, but by uniqueness of (m,n)(m, n), we see that there is only one choice of (c,d)(c, d). This completes the proof. \square
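The proof of Lemma 19 is effectively an algorithm, so here is a sketch of it (helper names are mine; the representatives a_{c,d}, b_{c,d} are produced by the extended Euclidean algorithm, which is one admissible arbitrary choice). It decomposes a matrix with nonzero lower-left entry as \pm T^{-m} \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} T^{-n} and verifies the result by multiplying back:

# Bruhat-decompose g in SL_2(Z) (with nonzero lower-left entry) following the proof.
def mat_mul(A, B):
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def T_power(n):
    return [[1, n], [0, 1]]

def ext_gcd(x, y):                      # returns (g, u, v) with u*x + v*y = g
    if y == 0:
        return (x, 1, 0)
    g, u, v = ext_gcd(y, x % y)
    return (g, v, u - (x // y) * v)

def bruhat(g):
    (alpha, beta), (gamma, delta) = g
    sign = 1
    if gamma < 0:                        # work with -g instead, as in the proof
        sign, alpha, beta, gamma, delta = -1, -alpha, -beta, -gamma, -delta
    n = -(delta // gamma)                # so that 0 <= n*gamma + delta <= gamma - 1
    c, d = gamma, n * gamma + delta
    a, b = alpha, n * alpha + beta
    _, a_cd, v = ext_gcd(d, c)           # a_cd * d + v * c = 1, so b_cd = -v
    b_cd = -v
    m = (a_cd - a) // c                  # T^m * (a, b; c, d) = (a_cd, b_cd; c, d)
    M = [[a_cd, b_cd], [c, d]]
    rebuilt = mat_mul(mat_mul(T_power(-m), M), T_power(-n))
    if sign == -1:
        rebuilt = [[-x for x in row] for row in rebuilt]
    assert rebuilt == g, (g, rebuilt)
    return sign, -m, (c, d), -n          # g = sign * T^(-m) * M(c,d) * T^(-n)

print(bruhat([[7, 3], [9, 4]]))          # 7*4 - 3*9 = 1
print(bruhat([[-2, -1], [5, 2]]))        # (-2)*2 - (-1)*5 = 1
print(bruhat([[1, 0], [-4, 1]]))         # lower-left entry negative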

Corollary 20.

Ek(z)=(abcd)±T\SL2(Z)(cz+d)k=1+c10dc1gcd(c,d)=1mZ(c(z+m)+d)k.E_k(z) = \sum_{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} (cz+d)^{-k} = 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} (c(z+m)+d)^{-k}.

Proof. By definition, Ek(z)=(abcd)±T\SL2(Z)(cz+d)kE_k(z) = \sum_{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} (cz+d)^{-k}. Parametrize the cosets by Bruhat decomposition to get

Ek(z)=(abcd){(1001)}c10dc1gcd(c,d)=1(ac,dbc,dcd)T(cz+d)k=1+c10dc1gcd(c,d)=1(abcd)(ac,dbc,dcd)T(cz+d)k=1+c10dc1gcd(c,d)=1mZ(c(z+m)+d)k,\begin{align*} E_k(z) &= \sum_{\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} \in \left\{\begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}\right\} \sqcup \bigsqcup_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d)=1}}\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle} (c'z+d')^{-k} \\ &= 1+ \sum_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} \in \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}\langle T \rangle} (c'z+d')^{-k} \\ &= 1+ \sum_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} (c(z+m)+d)^{-k}, \end{align*}

where the last equality comes from the fact that T={Tm ⁣:mZ}\langle T \rangle = \{T^m \colon m \in \mathbb{Z}\} and

(abcd)=(ac,dbc,dcd)Tm=(ac,dmac,d+bc,dcmc+d)\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} =\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}T^m = \begin{pmatrix}a_{c,d} & ma_{c,d} + b_{c,d}\\c & mc+d\end{pmatrix}

varies over all mZm \in \mathbb{Z}. \square

Now we look at the innermost sum and use the next lemma to apply Poisson summation formula there.

Lemma 21. Put \varphi(x) := (c(z+x)+d)^{-k}. Then, for t > 0, \hat{\varphi}(t) = (2\pi i)^k t^{k-1}c^{-k} e^{2 \pi i (d/c+z)t} \frac{1}{(k-1)!}, while \hat{\varphi}(t) = 0 for t \leq 0.

Proof. This is nothing more than computing the Fourier transform for φ\varphi. However, even if it looks trivial, we still need to do some work. We have

φ^(t)=φ(x)e2πixtdx=(c(z+x)+d)ke2πixtdx.\hat{\varphi}(t) = \int_{-\infty}^\infty \varphi(x) e^{-2 \pi i x t} \mathrm{d}x = \int_{-\infty}^\infty (c(z+x)+d)^{-k} e^{-2\pi i x t} \mathrm{d}x.

Put u:=c(z+x)+du := c(z+x)+d. Obviously this is a C1\mathcal{C}^1-diffeomorphism, with du=cdx\mathrm{d}u = c \mathrm{d}x, and so we get

φ^(t)=icyicy+uke2πi(udcz)t1cdu=1ce2πi(dc+z)ticyicy+uke2πi1cutdu.\hat{\varphi}(t) = \int_{icy-\infty}^{icy+\infty} u^{-k}e^{-2\pi i(\frac{u-d}{c}-z)t} \frac{1}{c} \mathrm{d}u = \frac{1}{c} e^{2\pi i (\frac{d}{c}+z)t}\int_{icy-\infty}^{icy+\infty} u^{-k} e^{-2\pi i \frac{1}{c}ut} \mathrm{d}u.

Put v = 2 \pi i \frac{1}{c} ut, and assume for now that t > 0. This is again a \mathcal{C}^1 change of variable, with \mathrm{d}v = 2\pi i \frac{t}{c} \mathrm{d}u, and the new contour is the vertical line \operatorname{Re}(v) = -2\pi t y, traversed upward, so

\hat{\varphi}(t) = \frac{1}{c}e^{2\pi i(\frac{d}{c}+z)t} \int_{-2\pi ty-i\infty}^{-2 \pi ty+i\infty} \left(\frac{vc}{2 \pi it}\right)^{-k} e^{-v} \frac{\mathrm{d}v}{2 \pi i \frac{t}{c}} = e^{2 \pi i(\frac{d}{c}+z)t} c^{-k} (2\pi it)^{k-1} \int_{-2\pi ty-i\infty}^{-2\pi ty+i\infty} v^{-k}e^{-v} \mathrm{d}v.

Now we're left with computing the integral I_a := \int_{-a-i\infty}^{-a+i\infty} v^{-k} e^{-v} \mathrm{d}v, where a = 2\pi t y; note that a > 0 precisely because t > 0 and y > 0. For a > 0, consider the contour C_{a, R} made of the vertical segment from -a+iR down to -a-iR together with the right half-circle of radius R centered at -a, traversed counterclockwise from -a-iR (through -a+R) back to -a+iR. For R > a this closed contour winds once, positively, around the only singularity of v \mapsto v^{-k}e^{-v}, namely the pole at v = 0. By the residue theorem, the contour integral equals 2\pi i times the residue there. Observe that, near zero, we have a Laurent expansion

vkev=vkn0(v)nn!=n0(1)nvnkn!.v^{-k}e^{-v} = v^{-k} \sum_{n \geq 0} \frac{(-v)^n}{n!} = \sum_{n \geq 0} \frac{(-1)^nv^{n-k}}{n!}.

So the residue, that is, the (1)(-1)-th term, is (1)k1(k1)!\frac{(-1)^{k-1}}{(k-1)!}. Since kk is known to be even, the residue is 1(k1)!\frac{-1}{(k-1)!}. We get

(2 \pi i)\frac{-1}{(k-1)!} = \int_{C_{a,R}} v^{-k} e^{-v} \mathrm{d}v = \int_{-a+iR}^{-a-iR}v^{-k}e^{-v} \mathrm{d}v + \int_{\mathrm{arc}} v^{-k} e^{-v} \mathrm{d}v.

Parametrize the arc as v = -a+Re^{i\theta} with \theta running from -\frac{\pi}{2} to \frac{\pi}{2} (a \mathcal{C}^1 change of variable with \mathrm{d}v = iRe^{i\theta}\mathrm{d}\theta), so that the arc integral becomes

\int_{-\pi/2}^{\pi/2} (-a+Re^{i \theta})^{-k}\, e^{a - R\cos\theta - iR\sin\theta}\, iRe^{i\theta}\, \mathrm{d}\theta.

On the arc we have \operatorname{Re}(v) \geq -a, hence |e^{-v}| \leq e^{a}, while |-a+Re^{i\theta}| \geq R-a. The arc integral is therefore bounded in absolute value by \pi R \cdot e^a (R-a)^{-k}, which tends to zero as R \to \infty since k \geq 2. The vertical segment, traversed downward, tends to -I_a as R \to \infty. Taking limits on both sides, we get

2πi(k1)!=Ia.\frac{-2\pi i}{(k-1)!} = -I_a.

that is, I_a = \frac{2\pi i}{(k-1)!}, independently of a > 0. Plugging this back into \hat{\varphi}(t) = e^{2 \pi i(\frac{d}{c}+z)t} c^{-k} (2\pi it)^{k-1} I_{2 \pi t y} gives

φ^(t)=e2πi(dc+z)tck(2πit)k12πi(k1)!.\hat{\varphi}(t) = e^{2\pi i (\frac{d}{c}+z)t} c^{-k} (2\pi it)^{k-1} \frac{2 \pi i}{(k-1)!}.

for t > 0, which is exactly the claimed formula. Finally, for t \leq 0, go back to the original integral \hat{\varphi}(t) = \int_{-\infty}^\infty (c(z+x)+d)^{-k} e^{-2\pi i x t} \mathrm{d}x. The integrand extends holomorphically to \operatorname{Im}(x) \geq 0 (its only pole sits at x = -z - d/c, below the real axis), is O((1+|x|)^{-k}) there, and satisfies |e^{-2\pi i x t}| = e^{2\pi t \operatorname{Im}(x)} \leq 1 when t \leq 0. Shifting the line of integration to \operatorname{Im}(x) = A and letting A \to \infty therefore shows that \hat{\varphi}(t) = 0 for t \leq 0. This completes the proof. \square

In order to apply the Poisson summation formula, we must check that the function \varphi satisfies the hypotheses. First, it is easy to see that it's \mathcal{C}^1, since \operatorname{Im}(z) > 0 and so c(z+x)+d never vanishes for real x. Next, \varphi(x) = O((1+|x|)^{-k}), so the integral of its absolute value over the reals converges. We have \varphi'(x) = -kc(c(z+x)+d)^{-k-1}, and hence its integral also converges. Finally, for fixed x, \varphi'(n+x) = O(|n|^{-k-1}), so the sum \sum_{n \in \mathbb{Z}} \varphi'(n+x) converges locally uniformly absolutely. This allows us to apply the Poisson summation formula. Note also that, by Lemma 21, \hat{\varphi}(m) = 0 for m \leq 0, so only the terms with m \geq 1 survive. We have

\begin{align*} E_k(z) &= 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} \varphi(m) = 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} \hat{\varphi}(m) \\ &= 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \geq 1} (2\pi i)^k m^{k-1}c^{-k} e^{2 \pi i (d/c+z)m} \frac{1}{(k-1)!} \\ &= 1 + \frac{(2\pi i)^k}{(k-1)!} \sum_{m \geq 1} e^{2 \pi i m z} m^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2 \pi i m d/c}. \end{align*}

Lemma 22. Fix c1c \geq 1 and vary dd. We have

0dc1gcd(c,d)=1e2πimd/c=δgcd(c,m)μ(c/δ)δ.\sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2 \pi i m d/c} = \sum_{\delta \mid \gcd(c, m)} \mu(c/\delta)\delta.

Proof. Define Rn(c):=0dc1gcd(c,d)=1e2πind/cR_n(c) := \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2\pi i nd/c}. First, observe that the full sum (without the coprime constraint) is easier to calculate. We have

0dc1e2πiNd/c={c;cN0;cN.\sum_{0 \leq d \leq c-1} e^{2 \pi i N d/c} = \begin{cases}c &; c \mid N\\0 &; c \nmid N.\end{cases}

Why? The first case is clear: if cc divides NN then the exponent is 2πi2\pi i times an integer, and so, by periodicity we get e0=1e^0 = 1. Summing over cc times, we get cc. Now, if cc doesn’t divide NN, consider the identity 1+z++zc1=zc1z11+z+\dots+z^{c-1} = \frac{z^c-1}{z-1}. The complex number z=e2πiN/cz = e^{2\pi i N/c} is not equal to 11, and so by plugging into this identity, the right hand side becomes zero. Now, we have

0dc1e2πiNd/c=δc0dc1gcd(d,c)=δe2πiNd/c=δc0d<cδgcd(d,cδ)=1e2πiNdδcδδ=δcRN(cδ).\sum_{0 \leq d \leq c-1} e^{2 \pi i N d/c} = \sum_{\delta \mid c} \sum_{\substack{0 \leq d \leq c-1\\\gcd(d, c) = \delta}} e^{2\pi i N d/c} = \sum_{\delta \mid c} \sum_{\substack{0 \leq d' <\frac{c}{\delta}\\\gcd(d', \frac{c}{\delta}) = 1}} e^{2 \pi i N \frac{d'\delta}{\frac{c}{\delta} \delta}} = \sum_{\delta \mid c} R_N\left(\frac{c}{\delta}\right).

In Dirichlet convolution notation, this means id1N=1RN\mathrm{id}\mathbf{1}_{\cdot \mid N} = 1 \star R_N. Apply Möbius inversion to see that

μid1N=μ1RN=1=1RN=RN.\mu \star \mathrm{id}\mathbf{1}_{\cdot \mid N} = \mu \star 1 \star R_N = \mathbf{1}_{=1} \star R_N = R_N.

Put N:=mN := m and observe that

Rm(c)=(id1mμ)(c)=δcid1m(δ)μ(c/δ)=δcδmδμ(cδ)=δgcd(c,m)δμ(c/δ).R_m(c) = (\mathrm{id}\mathbf{1}_{\cdot \mid m} \star \mu)(c) = \sum_{\delta \mid c} \mathrm{id}\mathbf{1}_{\cdot \mid m}(\delta) \mu(c/\delta) = \sum_{\substack{\delta \mid c\\\delta \mid m}} \delta\mu\left(\frac{c}{\delta}\right) = \sum_{\delta \mid \gcd(c,m)} \delta \mu(c/\delta).

This completes the proof. \square
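A quick numerical check of Lemma 22 (throwaway code; the Möbius helper is restated so that the snippet is self-contained):

# Check: sum over 0 <= d < c with gcd(c, d) = 1 of e^(2 pi i m d / c)
# equals the divisor sum over delta | gcd(c, m) of mu(c/delta) * delta.
import cmath, math

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:               # a square factor
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def ramanujan_direct(c, m):
    return sum(cmath.exp(2j * math.pi * m * d / c)
               for d in range(c) if math.gcd(c, d) == 1)

def ramanujan_formula(c, m):
    g = math.gcd(c, m)
    return sum(mobius(c // delta) * delta for delta in range(1, g + 1) if g % delta == 0)

for c in range(1, 30):
    for m in range(30):
        assert abs(ramanujan_direct(c, m) - ramanujan_formula(c, m)) < 1e-8
print("Ramanujan sum identity checked for c, m < 30")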

Continuing from where we left, applying the previous lemma, we have

\begin{align*} E_k(z) &= 1 + \frac{(2\pi i)^k}{(k-1)!} \sum_{m \geq 1} e^{2 \pi i m z} m^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2 \pi i m d/c} \\ &= 1 + \frac{(2\pi i)^k}{(k-1)!} \sum_{m \geq 1} e^{2 \pi i m z} m^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, m)} \mu(c/\delta)\delta. \end{align*}

Comparing with our template of Fourier expansion Ek(z)=n0ane2πinzE_k(z) = \sum_{n \geq 0} a_n e^{2 \pi i n z}, we have a0=1a_0 = 1 and

an=(2πi)k(k1)!nk1c1ckδgcd(c,n)μ(c/δ)δa_n = \frac{(2\pi i)^k}{(k-1)!} n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu(c/\delta)\delta

for n1n \geq 1. We can simplify this by the next lemma.

Lemma 23.

nk1c1ckδgcd(c,n)μ(cδ)δ=σk1(n)ζ(k).n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu\left(\frac{c}{\delta}\right) \delta = \frac{\sigma_{k-1}(n)}{\zeta(k)}.

Proof. We have

nk1c1ckδgcd(c,n)μ(cδ)δ=nk1δnc1δcckμ(cδ)δ=nk1δnc1(cδ)kμ(c)δ=nk1δnδ1kc1ckμ(c).\begin{align*} &n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu\left(\frac{c}{\delta}\right) \delta \\ &= n^{k-1} \sum_{\delta \mid n} \sum_{\substack{c \geq 1\\\delta \mid c}} c^{-k} \mu\left(\frac{c}{\delta}\right) \delta \\ &= n^{k-1} \sum_{\delta \mid n} \sum_{c' \geq 1} (c'\delta)^{-k} \mu(c') \delta \\ &= n^{k-1} \sum_{\delta \mid n} \delta^{1-k}\sum_{c' \geq 1} c'^{-k} \mu(c'). \end{align*}

Now observe that kc1ckμ(c)k \mapsto \sum_{c' \geq 1} c'^{-k} \mu(c') is actually the Dirichlet series for μ\mu, evaluated at k4k \geq 4. Since μ=O(1)\mu = O(1), the Dirichlet series absolutely converges for input greater than one. Also, note that D1(s)D_1(s) absolutely converges for s>1s > 1, and so, we have Dμ(s)D1(s)=Dμ1(s)D_\mu(s)D_1(s) = D_{\mu \star 1}(s) for s>1s > 1. In particular, putting s=k4s = k \geq 4 works. And by the Möbius inversion, μ1=1=1\mu \star 1 = \mathbf{1}_{=1}, so, we have Dμ(k)D1(k)=D1=1(k)=11k=1D_\mu(k)D_1(k) = D_{\mathbf{1}_{=1}}(k) = \frac{1}{1^k} = 1. But we have D1(s)=n11ns=ζ(s)D_1(s) = \sum_{n \geq 1} \frac{1}{n^s} = \zeta(s), so Dμ(k)=1ζ(k)D_\mu(k) = \frac{1}{\zeta(k)}. This means

nk1δnδ1kc1ckμ(c)=nk1δnδ1kDμ(k)=nk1δnδ1k1ζ(k)=1ζ(k)δn(n/δ)k1=1ζ(k)δnδk1=σk1(n)ζ(k).\begin{align*} &n^{k-1} \sum_{\delta \mid n} \delta^{1-k}\sum_{c' \geq 1} c'^{-k} \mu(c') \\ &= n^{k-1} \sum_{\delta \mid n} \delta^{1-k} D_\mu(k) \\ &= n^{k-1} \sum_{\delta \mid n} \delta^{1-k} \frac{1}{\zeta(k)} \\ &= \frac{1}{\zeta(k)} \sum_{\delta \mid n} (n/\delta)^{k-1} \\ &= \frac{1}{\zeta(k)} \sum_{\delta' \mid n} \delta'^{k-1} \\ &= \frac{\sigma_{k-1}(n)}{\zeta(k)}. \end{align*}

And this completes the proof. \square

With this lemma in place, we apply it to our Fourier coefficients to get

an=(2πi)k(k1)!nk1c1ckδgcd(c,n)μ(c/δ)δ=(2πi)kσk1(n)(k1)!ζ(k).\begin{align*} a_n &= \frac{(2\pi i)^k}{(k-1)!} n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu(c/\delta)\delta \\ &= \frac{(2\pi i)^k \sigma_{k-1}(n)}{(k-1)! \zeta(k)}. \end{align*}

This completes the Proof of Theorem 17. \square
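As a sanity check of Theorem 17 (a throwaway script; the zeta values are obtained by crude truncation rather than quoted), we can build truncated q-expansions of E_4 and E_8 from the coefficient formula and verify the identity E_4^2 = E_8, which must hold because both sides lie in the one-dimensional space \mathcal{M}_8 and have constant term one:

# Truncated q-expansions of E_4 and E_8 from the formula
# a_n = (2 pi i)^k / (zeta(k) (k-1)!) * sigma_{k-1}(n), and a check of E_4^2 = E_8.
import math

def sigma(n, p):
    return sum(d**p for d in range(1, n + 1) if n % d == 0)

def zeta(s, terms=100000):               # crude truncation of the Dirichlet series
    return sum(1.0 / m**s for m in range(1, terms + 1))

def eisenstein_coeffs(k, N):
    const = (2j * math.pi)**k / (zeta(k) * math.factorial(k - 1))
    return [1.0] + [(const * sigma(n, k - 1)).real for n in range(1, N + 1)]

N = 12
e4, e8 = eisenstein_coeffs(4, N), eisenstein_coeffs(8, N)
e4_squared = [sum(e4[i] * e4[n - i] for i in range(n + 1)) for n in range(N + 1)]
for n in range(N + 1):
    assert abs(e4_squared[n] - e8[n]) <= 1e-3 * max(1.0, abs(e8[n]))
print(e4[:4])   # approximately [1, 240, 2160, 6720]
print(e8[:4])   # approximately [1, 480, 61920, 1050240]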

That was a long, hard, and quite dry journey. But finally we’ve obtained the coefficients for EkE_k. Let us now begin the real thing (finally) about representing an integer into sum of eight squares.

Theta Functions

The actual theory of theta functions is quite vast and could be generalized heavily (even into the case of modular forms of half-integer weights, which we’re not going to touch for now). We begin with defining our favorite theta function:

θ(z):=nZe2πin2z\theta(z) := \sum_{n \in \mathbb{Z}} e^{2 \pi i n^2 z}

What if we raise this to the eighth power? We get

θ8(z)=(nZe2πin2z)8=(x1,,x8)Z8e2πi(x12++x82)z.\theta^8(z) = \left(\sum_{n \in \mathbb{Z}} e^{2 \pi i n^2 z}\right)^8 = \sum_{(x_1, \dots, x_8) \in \mathbb{Z}^8} e^{2 \pi i (x_1^2 + \dots + x_8^2)z}.

If we can reparametrize the sum as a Fourier expansion

θ8(z)=n0ane2πinz,\theta^8(z) = \sum_{n \geq 0} a_n e^{2\pi i n z},

then, by comparing the coefficients, the nnth Fourier coefficient is precisely the number of ways to represent nn into sum of eight squares! Now our goal is to compute the Fourier coefficients of θ8\theta^8 efficiently. The idea is to first show that θ8\theta^8 satisfies some automorphy condition (which is a bit weaker than our classical modularity condition), and consider the whole space of forms that satisfy this condition. Once we can show that it’s a vector space, and we can compute a basis for this, then we can write θ8\theta^8 as a linear combination of elements from the basis. And if we manage to find an explicit formula for the Fourier coefficients of such basis, then we get an explicit formula for ana_n!
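Before doing anything clever, we can at least compute the first few of these coefficients by brute force, multiplying out the truncated power series; this is just a repackaging of the naive counting problem we started with (a throwaway script):

# First few Fourier coefficients of theta^8: the coefficient of q^n is the
# number of ways to write n as a sum of eight squares.
N = 20
M = int(N**0.5) + 1
theta = [0] * (N + 1)                 # truncated q-expansion of theta
for m in range(-M, M + 1):
    if m * m <= N:
        theta[m * m] += 1

series = [1] + [0] * N                # start from the constant series 1
for _ in range(8):                    # multiply by theta eight times
    series = [sum(series[i] * theta[n - i] for i in range(n + 1))
              for n in range(N + 1)]

print(series[:9])
# [1, 16, 112, 448, 1136, 2016, 3136, 5504, 9328]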

Let us define φz(x):=e2πix2z\varphi_z(x) := e^{2 \pi i x^2 z} for zHz \in \mathbb{H}. We see that θ(z)=nZφz(n)\theta(z) = \sum_{n \in \mathbb{Z}} \varphi_z(n). This is where we want to apply the Poisson summation formula. Observe that φz\varphi_z is smooth and decays (even more rapidly than) exponentially. Just this fact already shows that φz\varphi_z is C1\mathcal{C}^1, L1(R)L^1(\mathbb{R}), and that φz\varphi'_z is L1(R)L^1(\mathbb{R}) with nZφz(n+x)\sum_{n \in \mathbb{Z}} \varphi'_z(n+x) absolutely convergent. This allows us to apply the Poisson summation formula. Let us first compute φ^z(t):=φz(x)e2πixtdx\hat{\varphi}_z(t) := \int_{-\infty}^\infty \varphi_z(x) e^{-2 \pi i x t} \mathrm{d}x. We have

φ^z(t)=e2πix2ze2πixtdx.\hat{\varphi}_z(t) = \int_{-\infty}^\infty e^{2 \pi i x^2 z} e^{-2 \pi i x t} \mathrm{d}x.

We want to transform the main term into something like eπu2e^{-\pi u^2} so that we can apply Lemma 15. To do this we have to put u:=i2izxu := i\sqrt{2 iz}x. But \sqrt{\cdot} is not well-defined in C\mathbb{C}! Luckily, we observe that since zHz \in \mathbb{H}, Re(iz)<0\operatorname{Re}(iz) < 0 and we can define the square root locally inside the half-plane {wC ⁣:Re(w)<0}\{w \in \mathbb{C} \colon \operatorname{Re}(w) < 0\} as reiθreiθ/2re^{i\theta} \mapsto \sqrt{r}e^{i\theta/2}. This makes the change of variable a well-defined C1\mathcal{C}^1-diffeomorphism. We get

φ^z(t)=iiziizeπu2e2πiui2iztdui2iz\hat{\varphi}_z(t) = \int_{-i\sqrt{iz}\infty}^{i\sqrt{iz}\infty} e^{-\pi u^2} e^{-2\pi i \frac{u}{i\sqrt{2iz}} t} \frac{\mathrm{d}u}{i\sqrt{2iz}}

Now we're stuck. But if we assume z is a purely imaginary number, then we have i\sqrt{2iz} \in \mathbb{R}_{< 0}, and so in such case we have

φ^z(t)=1i2izeπu2e2πiuti2izdu=1i2izg^(ti2iz)=1i2izg(ti2iz)=1i2izeπt22iz=i2izeπi12zt2.\begin{align*} \hat{\varphi}_z(t) &= \frac{1}{i\sqrt{2iz}}\int_{\infty}^{-\infty} e^{-\pi u^2} e^{-2\pi i u \frac{t}{i\sqrt{2iz}}} \mathrm{d}u\\ &= -\frac{1}{i\sqrt{2iz}} \hat{g}\left(\frac{t}{i\sqrt{2iz}}\right) \\ &= -\frac{1}{i\sqrt{2iz}} g\left(\frac{t}{i\sqrt{2iz}}\right) \\ &= -\frac{1}{i\sqrt{2iz}} e^{-\pi \frac{t^2}{-2iz}} = \frac{i}{\sqrt{2iz}} e^{-\pi i \frac{1}{2z} t^2}. \end{align*}

This means

θ(iy)=nZφiy(n)=nZφ^iy(n)=nZi2ye2πin214iy=12yθ(14iy).\theta(iy) = \sum_{n \in \mathbb{Z}} \varphi_{iy}(n) = \sum_{n \in \mathbb{Z}} \hat{\varphi}_{iy}(n) = \sum_{n \in \mathbb{Z}} \frac{i}{\sqrt{-2y}} e^{2\pi i n^2\frac{-1}{4iy}} = \frac{1}{\sqrt{2y}} \theta\left(-\frac{1}{4iy}\right).

More precisely, for any zHz \in \mathbb{H} with Re(z)=0\operatorname{Re}(z) = 0, we have

θ(z)=12izθ(14z)\theta(z) = \frac{1}{\sqrt{-2iz}} \theta\left(-\frac{1}{4z}\right)

where this square root is the square root function for real numbers. We want to extend this result to other zHz \in \mathbb{H}.

Proposition 24. For all z \in \mathbb{H}, with \sqrt{-2iz} defined through the principal branch \sqrt{w} := \sqrt{|w|}\, e^{i \arg(w)/2} for \arg(w) \in (-\frac{\pi}{2}, \frac{\pi}{2}) (this makes sense because \operatorname{Re}(-2iz) > 0, and it agrees with the usual real square root on the positive reals), we have \theta(z) = \frac{1}{\sqrt{-2iz}} \theta\left(-\frac{1}{4z}\right).

Proof. Define Φ(z):=θ(z)12izθ(14z)\Phi(z) := \theta(z) - \frac{1}{\sqrt{-2iz}}\theta(-\frac{1}{4z}) for zHz \in \mathbb{H}. We’ve seen that φz(x)\varphi_z(x) decays rapidly, and so θ\theta is holomorphic on H\mathbb{H}. Since z2izz \mapsto \sqrt{-2iz}, defined that way, is also holomorphic on zHz \in \mathbb{H}, and is never zero, we see that Φ\Phi is holomorphic on H\mathbb{H}. Since it takes value zero on the set {iy ⁣:y>0}\{iy \colon y > 0\}, and this set has an accumulation point on H\mathbb{H}, apply the identity theorem for holomorphic functions to see that Φ0\Phi \equiv 0 on H\mathbb{H}. This completes the proof. \square

As a result, we observe that \theta^8 satisfies the following automorphy conditions.

Corollary 25. For all zHz \in \mathbb{H},

  1. θ8(z+1)=θ8(z)\theta^8(z+1) = \theta^8(z)
  2. θ8(14z)=(2z)4θ8(z)\theta^8(-\frac{1}{4z}) = (2z)^4 \theta^8(z)
  3. θ8(z4z+1)=(4z+1)4θ8(z)\theta^8(\frac{z}{4z+1}) = (4z+1)^4\theta^8(z).

Proof. The first one is obvious by periodicity of z \mapsto e^{2 i \pi n^2 z} for each fixed n. For the second one, apply Proposition 24 in the form \theta(-\frac{1}{4z}) = \sqrt{-2iz}\,\theta(z) and raise it to the eighth power, noting that (\sqrt{-2iz})^8 = (-2iz)^4 = 16z^4 = (2z)^4. For the third one, we have (by the second one)

θ8(z4z+1)=θ8(14(4z14z))=(2(4z14z))4θ8(4z14z).\theta^8\left(\frac{z}{4z+1}\right) = \theta^8\left(-\frac{1}{4(\frac{-4z-1}{4z})}\right) = \left(2\left(\frac{-4z-1}{4z}\right)\right)^4\theta^8\left(\frac{-4z-1}{4z}\right).

We also have (by the first one, then the second one),

θ8(4z14z)=θ8(14z)=(2z)4θ8(z).\theta^8\left(\frac{-4z-1}{4z}\right) = \theta^8\left(-\frac{1}{4z}\right) = (2z)^4 \theta^8(z).

Putting this back, we get

θ8(z4z+1)=(2(4z14z))4(2z)4θ8(z)=(4z+1)4θ8(z).\theta^8\left(\frac{z}{4z+1}\right) = \left(2\left(\frac{-4z-1}{4z}\right)\right)^4(2z)^4 \theta^8(z) = (4z+1)^4 \theta^8(z).

This completes the proof. \square

What does this mean? It means that for U := \begin{pmatrix}1 & 0\\4 & 1\end{pmatrix} and the same old T, we have \theta^8 |_4 U = \theta^8 |_4 T = \theta^8; that is, \theta^8 is invariant under the subgroup generated by U and T. This subgroup is strictly smaller than \operatorname{SL}_2(\mathbb{Z}), but it is still structured enough to work in. Let us continue by studying modular forms invariant under it.
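
Here is a small numerical illustration of this invariance (not needed for anything later; slash4 is my ad-hoc helper implementing the weight-4 action on functions, and the test point and truncation level are arbitrary):

import cmath

def theta8(z, N=50):
    # truncated theta(z)^8
    return sum(cmath.exp(2j * cmath.pi * n * n * z) for n in range(-N, N + 1)) ** 8

def slash4(f, g):
    # the weight-4 action: (f |_4 g)(z) = f((az+b)/(cz+d)) * (cz+d)^(-4)
    (a, b), (c, d) = g
    return lambda z: f((a * z + b) / (c * z + d)) * (c * z + d) ** (-4)

U = ((1, 0), (4, 1))
T = ((1, 1), (0, 1))
z = -0.2 + 1.1j
for g in (U, T):
    print(abs(slash4(theta8, g)(z) - theta8(z)))  # both differences are tiny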

Sum of Eight Squares of Integers

Define

Γ0(n):={(abcd)SL2(Z) ⁣:nc}.\Gamma_0(n) := \left\{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}) \colon n \mid c\right\}.

This is a subgroup of \operatorname{SL}_2(\mathbb{Z}): one can check this by multiplying two elements and observing that the lower-left entry of the product (and of the inverse) is still divisible by n; the computation is spelled out below. We claim that U and T (and a little \pm) generate \Gamma_0(4).
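
Spelling this out (this is the whole check):

\begin{pmatrix}a & b\\c & d\end{pmatrix}\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} = \begin{pmatrix}aa'+bc' & ab'+bd'\\ ca'+dc' & cb'+dd'\end{pmatrix},

and if n \mid c and n \mid c', then n \mid ca'+dc'; likewise the inverse \begin{pmatrix}d & -b\\-c & a\end{pmatrix} has lower-left entry -c, which is still divisible by n.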

The Subgroup Γ0(4)\Gamma_0(4)

Proposition 26. U,T,I=Γ0(4)\langle U, T, -I \rangle = \Gamma_0(4), where I=(1001)-I = \begin{pmatrix}-1 & 0\\0 & -1\end{pmatrix}.

Proof. We’d actually want to mimic the proof of Theorem 1. However, (spoiler alert) the fundamental domain for Γ0(4)\Gamma_0(4) isn’t that nice. It’s a union of six copies of the original fundamental domain, patched for each coset. We’ll see this after the characterization of cosets given in Proposition 27.

So, right now, let us try to do an algebraic proof (without resorting to the action \langle U, T, -I \rangle \times \operatorname{SL}_2(\mathbb{Z}) \to \operatorname{SL}_2(\mathbb{Z})). Put G' := \langle U, T, -I \rangle. The generators belong to \Gamma_0(4), and \Gamma_0(4) is closed under products and inverses, so G' is a subgroup of \Gamma_0(4). We’re left with showing that \Gamma_0(4) \subseteq G'.

Well actually not an algebraic proof, but an algorithmic proof. Let us construct a reduction algorithm (i.e. descent). We claim that for all gΓ0(4)g \in \Gamma_0(4) there exists a finite sequence of generators (i.e. a word) such that g=g1g2gkg =g_1g_2\dots g_k, with gi{U,T,I,U1,T1}g_i \in \{U, T, -I, U^{-1}, T^{-1}\}.

To do this, suppose we’re given an input g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4). We observe that d cannot be zero: since ad - bc = 1 and c is divisible by four (in particular by two), ad is odd, so d is odd. If c = 0, then ad = 1 forces a = d = 1 or a = d = -1, which means g = T^b or g = (-I)T^{-b}, and we’re done with this base case.

Note. We can rephrase this “algorithmic proof” into induction: We perform an induction on NN for the statement “for all g=(abcd)Γ0(4)g=\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4) such that c+d=N|c|+|d| = N, there exists a finite sequence g1,g2,,gkg_1, g_2, \dots, g_k with gi{U,T,I,U1,T1}g_i \in \{U, T, -I, U^{-1}, T^{-1}\}, such that g=g1g2gkg = g_1g_2\dots g_k”. The base case covers N=0,1N = 0, 1.

Now, we consider a general induction for a given g=(abcd)Γ0(4)g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4) with c+d=N|c|+|d|=N, supposing that the induction hypothesis holds for 0,,N10, \dots, N-1. Since we’ve cleared the case c=0c = 0, we suppose c0c \ne 0, then we compare c|c| with 2d2|d|. First, it is impossible that c=2d|c| = 2|d| because in such case, since 4c4 \mid c, we see that 2d2 \mid d, and so adbcad-bc is even, but it must satisfy adbc=1ad-bc = 1, a contradiction.

Case A: c<2d|c| < 2|d|. If c<0c < 0 then we replace gg with g(I)g(-I) so that c>0c > 0. By the division algorithm, there exists a unique nZn \in \mathbb{Z} such that 0cn+d<c0 \leq cn+d < c. And so, we get

(abcd):=gTn=(abcd)(1n01)=(aan+bccn+d).\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} := gT^n = \begin{pmatrix}a & b\\c & d\end{pmatrix}\begin{pmatrix}1 & n\\0 & 1\end{pmatrix} = \begin{pmatrix}a & an+b\\c & cn+d\end{pmatrix}.

We see that d=cn+d<c=cd' = cn+d < c = c'. If 0dc20 \leq d' \leq \frac{c'}{2} then we’re done (for now), with the fact that c+dc+c2=c+c2<c+d=N|c'|+|d'| \leq |c'| + |\frac{c'}{2}| = |c| + \frac{|c|}{2} < |c|+|d| = N. Since c+dN1|c'| + |d'| \leq N-1, we can write gTn=g1gkgT^n = g_1 \dots g_k and so g=g1gkTng = g_1 \dots g_k T^{-n}. Otherwise we have c2<d<c\frac{c'}{2} < d' < c'. Observe that we can reflect dd' to d-d' then add cc' back: c2>d>c-\frac{c'}{2} > -d' > -c' and so c2>cd>0\frac{c'}{2} > c'-d' > 0. We have

(abcd):=gTn(I)T1=(abcd)(1001)(1101)=(aabccd),\begin{pmatrix}a'' & b'' \\ c'' & d''\end{pmatrix} :=gT^n(-I)T^{-1} = \begin{pmatrix}a' & b'\\c' & d'\end{pmatrix}\begin{pmatrix}-1 & 0\\0 & -1\end{pmatrix}\begin{pmatrix}1 & -1\\0 & 1\end{pmatrix} = \begin{pmatrix}-a' & a'-b'\\-c' & c'-d'\end{pmatrix},

with c+d=c+cd<c+c2<c+d=N|c''| + |d''| = |-c'| + |c'-d'| < |c| + \frac{c'}{2} < |c|+|d| = N. By induction hypothesis, write gTn(I)T1=g1gkgT^n(-I)T^{-1} = g_1 \dots g_k and so g=g1gkT(I)Tng = g_1 \dots g_k T (-I) T^{-n}.

Case B: c>2d|c| > 2|d|. If d<0d < 0 we replace gg with g(I)g(-I) so that d>0d > 0. By the division algorithm, there exists a unique nZn \in \mathbb{Z} such that 0c+4dn<4d0 \leq c+4dn < 4d. Fix such nn and put

(abcd):=gUn=(abcd)(104n1)=(a+4bnbc+4dnd).\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} := gU^n = \begin{pmatrix}a & b\\c & d\end{pmatrix} \begin{pmatrix}1 & 0\\4n & 1\end{pmatrix} = \begin{pmatrix}a+4bn & b\\ c+4dn & d\end{pmatrix}.

This means d=dd' = d and 0c=c+4dn<4d0 \leq c' = c+4dn < 4d. If c2dc' \leq 2d' we’re done, because c+d2d+d=2d+d<c+d=N|c'|+|d'| \leq |2d'| + |d'| = 2|d| + |d| < |c|+|d| = N and so the induction hypothesis applies to gUngU^n, and we’re done. Now if c>2dc' > 2d', then put

(abcd):=gUn(I)U1=(abcd)(1001)(1041)=(a+4bbc+4dd).\begin{pmatrix}a'' & b''\\c'' & d''\end{pmatrix} := gU^n(-I)U^{-1} = \begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} \begin{pmatrix}-1 & 0\\0 & -1\end{pmatrix}\begin{pmatrix}1 & 0\\-4 & 1\end{pmatrix} = \begin{pmatrix}-a'+4b' & -b'\\-c'+4d' & -d'\end{pmatrix}.

Since 4d>c>2d4d > c' > 2d', we have 4d<c<2d-4d' < -c' < -2d' and so 0<c+4d<2d0 < -c'+4d' < 2d'. This shows that c+d<2d+d<c+d=N|c''| + |d''| < 2|d| + |d| < |c|+|d|= N. By induction hypothesis, gUn(I)U1gU^n(-I)U^{-1} can be written as a word with each character in {U,T,I,U1,T1}\{U, T, -I, U^{-1}, T^{-1}\}, and so gg is that word multiplied by U(I)UnU(-I)U^{-n} from the right.

This completes the inductive step for c+d=N|c|+|d|=N, and hence completes the strong induction. This shows that Γ0(4)U,T,I\Gamma_0(4) \subseteq \langle U, T, -I\rangle. \square
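
For the record, here is a small Python sketch of the descent just described. It is not literally the case split from the proof: it uses the nearest-integer form of the division step, which shrinks |c| + |d| in the same way; the helper names and the bookkeeping are mine.

from random import choice, seed

T = ((1, 1), (0, 1))
U = ((1, 0), (4, 1))
NEG_I = ((-1, 0), (0, -1))
GEN = {'T': T, 'U': U, '-I': NEG_I}

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def power(A, n):
    # A^n for n in Z (A has determinant 1, so its inverse is an integer matrix)
    if n < 0:
        (a, b), (c, d) = A
        A, n = ((d, -b), (-c, a)), -n
    R = ((1, 0), (0, 1))
    for _ in range(n):
        R = mul(R, A)
    return R

def nearest(p, q):
    # nearest integer to p/q; in our use ties never occur (parity argument from the proof)
    if q < 0:
        p, q = -p, -q
    return (2*p + q) // (2*q)

def decompose(g):
    # write g in Gamma_0(4) as a word in T, U, -I, returned as a list of (symbol, exponent)
    (a, b), (c, d) = g
    assert a*d - b*c == 1 and c % 4 == 0
    steps = []            # right multiplications applied to g, in order
    h = g
    while h[1][0] != 0:   # reduce until the lower-left entry is zero
        c, d = h[1]
        if abs(c) < 2*abs(d):
            n = nearest(-d, c)       # afterwards |c*n + d| < |c|/2
            h, step = mul(h, power(T, n)), ('T', n)
        else:
            n = nearest(-c, 4*d)     # afterwards |c + 4*d*n| < 2|d|
            h, step = mul(h, power(U, n)), ('U', n)
        steps.append(step)
    # base case: h = T^b or (-I) T^(-b)
    word = [('T', h[0][1])] if h[0][0] == 1 else [('-I', 1), ('T', -h[0][1])]
    # undo the reduction: g = h * (last step)^-1 * ... * (first step)^-1
    word += [(s, -n) for s, n in reversed(steps)]
    return word

def evaluate(word):
    R = ((1, 0), (0, 1))
    for s, n in word:
        R = mul(R, power(GEN[s], n))
    return R

# quick test: build a random element of Gamma_0(4) and decompose it
seed(0)
g = ((1, 0), (0, 1))
for _ in range(20):
    g = mul(g, power(GEN[choice('TU')], choice([-2, -1, 1, 2])))
word = decompose(g)
assert evaluate(word) == g
print(g, '=', word)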

But now that we see that \Gamma_0(4) = \langle U, T, -I\rangle, and that \theta^8|_4U = \theta^8|_4 T = \theta^8|_4 (-I) = \theta^8 (the last equality holds because the weight is even, so the automorphy factor for -I is (-1)^{-4} = 1), it becomes clear that \theta^8 is stabilized by the action |_4 of \Gamma_0(4) from the right. This is the "modularity condition" for our subgroup \Gamma_0(4).

Proposition 27. [SL2(Z) ⁣:Γ0(4)]=6[\operatorname{SL}_2(\mathbb{Z}) \colon \Gamma_0(4)] = 6, where the cosets can be represented by

A1=I;A2=(1021);A3=(0110);A4=(0111);A5=(0112);A6=(0113).\begin{align*} A_1 &= I;\\ A_2 &= \begin{pmatrix}1 & 0\\-2 & 1\end{pmatrix};\\ A_3 &= \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix};\\ A_4 &= \begin{pmatrix}0 & -1\\1 & 1\end{pmatrix};\\ A_5 &= \begin{pmatrix}0 & -1\\1 & 2\end{pmatrix};\\ A_6 &= \begin{pmatrix}0 & -1\\1 & 3\end{pmatrix}. \end{align*}

Proof. Observe that for g,gSL2(Z)g, g' \in \operatorname{SL}_2(\mathbb{Z}) to belong to the same coset, i.e., Γ0(4)g=Γ0(4)g\Gamma_0(4)g = \Gamma_0(4)g', it is equivalent to gg1Γ0(4)g'g^{-1} \in \Gamma_0(4). Write g=(abcd)g = \begin{pmatrix}a & b\\c & d\end{pmatrix} and g=(abcd)g' = \begin{pmatrix}a' & b'\\c' & d'\end{pmatrix}. The necessary and sufficient condition is exactly that

(adbcab+bacddccb+da)=(abcd)(dbca)Γ0(4),\begin{pmatrix}a'd - b'c & -a'b + b'a\\c'd - d'c & -c'b + d'a\end{pmatrix} =\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix}\begin{pmatrix}d & -b\\-c & a\end{pmatrix} \in \Gamma_0(4),

or equivalently, that cddcc'd - d'c is divisible by four, i.e. cddc(mod4)c'd \equiv d'c \pmod{4}. Now we vary gg as an input and claim that for any gg there exists g{A1,A2,,A6}g' \in \{A_1, A_2, \dots, A_6\} satisfying cddc(mod4)c'd \equiv d'c \pmod{4}. First of all, suppose c0(mod4)c \equiv 0 \pmod{4} then g=A1g' = A_1 does the job. Now if c1(mod4)c \equiv 1 \pmod{4} we split into four subcases depending on dmod4d \bmod 4.

  1. d0(mod4)d \equiv 0 \pmod {4}: then this forces d0(mod4)d' \equiv 0 \pmod{4}. Pick g=A3g' = A_3.
  2. d1(mod4)d \equiv 1 \pmod{4}: then this forces cd(mod4)c' \equiv d' \pmod{4}. Pick g=A4g' = A_4.
  3. d2(mod4)d \equiv 2 \pmod{4}: then this forces 2cd(mod4)2c' \equiv d' \pmod{4}. Pick g=A5g' = A_5.
  4. d3(mod4)d \equiv 3 \pmod{4}: then this forces 3cd(mod4)3c' \equiv d' \pmod{4}. Pick g=A6g' = A_6.

Now, if c3(mod4)c \equiv 3 \pmod{4} then cddcd(1)(mod4)c'd \equiv d'c \equiv d'(-1) \pmod{4} so c(d)d(mod4)c'(-d) \equiv d' \pmod{4}, i.e., we’ve reduced to the same cases as in c1(mod4)c \equiv 1 \pmod{4}, just, instead of comparing dmod4d \bmod 4, we compare (d)mod4(-d)\bmod 4 and pick suitable g=A3,A4,A5,A6g' = A_3, A_4, A_5, A_6.

The only case left is c \equiv 2 \pmod{4}, where the condition becomes c'd \equiv 2d' \pmod{4}. Observe that d \not\equiv 0, 2 \pmod{4}, because otherwise ad-bc = 1 could not hold. So d \equiv \pm 1 \pmod{4}, and since 2 \equiv -2 \pmod{4}, we are left with solving c' \equiv 2d' \pmod{4}; picking g' = A_2 does the job. This shows that every coset equals one of \Gamma_0(4)A_1, \dots, \Gamma_0(4)A_6. Finally, applying the same congruence criterion to pairs of representatives shows that c_id_j - d_ic_j \not\equiv 0 \pmod{4} for all i \ne j (a quick check of the fifteen pairs), so these six cosets are pairwise distinct, and the index is exactly 6. \square
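
As a sanity check, here is a tiny script (helper names are mine) that locates the coset representative of a given matrix using exactly the congruence c'd \equiv d'c \pmod{4} from the proof, and asserts that exactly one of A_1, \dots, A_6 matches:

from random import choice, seed

S, T = ((0, -1), (1, 0)), ((1, 1), (0, 1))
REPS = [((1, 0), (0, 1)), ((1, 0), (-2, 1)), ((0, -1), (1, 0)),
        ((0, -1), (1, 1)), ((0, -1), (1, 2)), ((0, -1), (1, 3))]  # A_1, ..., A_6

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def coset_index(g):
    # Gamma_0(4) g = Gamma_0(4) A_i  iff  c_i * d - d_i * c is divisible by 4
    c, d = g[1]
    hits = [i for i, A in enumerate(REPS, start=1) if (A[1][0] * d - A[1][1] * c) % 4 == 0]
    assert len(hits) == 1   # exactly one representative matches
    return hits[0]

seed(1)
for _ in range(5):
    g = ((1, 0), (0, 1))
    for _ in range(15):
        g = mul(g, choice([S, T]))
    print(g, '-> coset of A_%d' % coset_index(g))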

Now that we see the coset representatives, we could try to visualize the fundamental domain, which is

DΓ0(4):=i=16Di;Di:=AiD.\mathcal{D}_{\Gamma_0(4)} := \bigcup_{i=1}^6 \mathcal{D}_i; \qquad \mathcal{D}_i := A_i\mathcal{D}.

Here I try to sample points using the following code (where go_to_domain_1 is the algorithm given in Theorem 1 that takes a point to the original fundamental domain):

import random

domain_1_sample = []
for rep in range(10000):
	x = random.uniform(-3, 3)
	y = 2**random.uniform(-5, 5)
	z = go_to_domain_1(x + 1j * y)
	domain_1_sample.append(z)

Then, I construct A1,,A6A_1, \dots, A_6 and let them act on the samples.

g1 = CongruentElement(1, 0, 0, 1)
g2 = CongruentElement(1, 0, -2, 1)
g3 = CongruentElement(0, -1, 1, 0)
g4 = CongruentElement(0, -1, 1, 1)
g5 = CongruentElement(0, -1, 1, 2)
g6 = CongruentElement(0, -1, 1, 3)
D1 = [g1.act(z) for z in domain_1_sample]
D2 = [g2.act(z) for z in domain_1_sample]
D3 = [g3.act(z) for z in domain_1_sample]
D4 = [g4.act(z) for z in domain_1_sample]
D5 = [g5.act(z) for z in domain_1_sample]
D6 = [g6.act(z) for z in domain_1_sample]

Then I plot the points.

[Figure: the sampled points of the six translates \mathcal{D}_1, \dots, \mathcal{D}_6 plotted in the upper half-plane.]

We can see that the fundamental domain looks uglier than before. In principle, we could now redo the integral over this domain (avoiding points with nontrivial stabilizers), derive a valence formula for \Gamma_0(4), and from it deduce the dimension and a basis for the space of modular forms for \Gamma_0(4). I’m simply too lazy for this; it would take a lot of hard work. So I propose that we take a shortcut directly to our special case of \mathcal{M}_4(\Gamma_0(4)), but before that, we should at least define this space properly.

Definition. For a subgroup Γ\Gamma' of SL2(Z)\operatorname{SL}_2(\mathbb{Z}), a cusp is an orbit of the action Γ×P1(Q)P1(Q)\Gamma' \times \mathbb{P}^1(\mathbb{Q}) \to \mathbb{P}^1(\mathbb{Q}) defined by ((abcd),x)ax+bcx+d\left(\begin{pmatrix}a & b\\c & d\end{pmatrix}, x\right) \mapsto \frac{ax+b}{cx+d} with the usual rule that 10=\frac{1}{0} = \infty and 1=0\frac{1}{\infty} = 0.

Proposition 28. For \Gamma' = \operatorname{SL}_2(\mathbb{Z}), this action is transitive, and so there is only one cusp (which we usually represent by \infty).

Proof. Let us show that for any \frac{p}{q} \in \mathbb{P}^1(\mathbb{Q}), there exists \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}) sending \frac{p}{q} to \infty. (If the input is already infinity, we can take the identity matrix and be done, so we assume the input is a finite rational number; furthermore, we assume \gcd(p,q) = 1 without loss of generality.) Consider the equation a(-p) - bq = 1. Since \gcd(p,q) = 1, by Bézout’s identity there exists a pair (a, b) \in \mathbb{Z}^2 satisfying the equation. Now take the matrix \begin{pmatrix}a & b\\q & -p\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}). It sends \frac{p}{q} to \frac{a\frac{p}{q}+b}{q\frac{p}{q}-p} = \frac{\frac{ap}{q}+b}{0} = \infty. This works because the numerator isn’t zero: if it were, then ap+bq = 0, contradicting a(-p)-bq = 1. This shows that any rational number can be sent to infinity, and conversely, the inverse of this matrix sends infinity to the input rational number. Finally, one can send any rational number to any other by first sending it to infinity and then coming back via the second one’s matrix. \square

Lemma 29. If ΓSL2(Z)\Gamma' \leq \operatorname{SL}_2(\mathbb{Z}) with Γ\SL2(Z)={Γg1,,Γgk}\Gamma' \backslash \operatorname{SL}_2(\mathbb{Z}) = \{\Gamma'g_1, \dots, \Gamma'g_k\}, then

{OrbΓ(gi) ⁣:i{1,,k}}=Γ\P1(Q),\{\operatorname{Orb}_{\Gamma'}(g_i \infty) \colon i \in \{1, \dots, k\}\} = \Gamma' \backslash \mathbb{P}^1(\mathbb{Q}) ,

i.e., every cusp is the orbit of g_i\infty for some i (though different i’s may give the same cusp).

Proof. The inclusion from left to right is clear: each orbit \operatorname{Orb}_{\Gamma'}(g_i\infty) corresponds to a cusp. The question is whether all cusps are found among these candidates, or whether there could be some cusp not connected to any of the g_i’s. Let us show that this cannot happen. A strong algebraist might even say this is obvious. Let \mathfrak{a} \in \Gamma' \backslash \mathbb{P}^1(\mathbb{Q}) be a cusp, and represent it by some x \in \mathbb{P}^1(\mathbb{Q}). By the previous proposition, there exists g \in \operatorname{SL}_2(\mathbb{Z}) such that g\infty = x. Project g into the cosets, i.e., consider the natural quotient \rho \colon \operatorname{SL}_2(\mathbb{Z}) \to \Gamma'\backslash \operatorname{SL}_2(\mathbb{Z}) and suppose \rho(g) = \Gamma'g_j (so that g = g'g_j for some g' \in \Gamma'). Then \operatorname{Orb}_{\Gamma'}(g_j\infty) = \{g''g_j\infty \colon g'' \in \Gamma'\} \ni g'g_j\infty = g\infty = x. This means \mathfrak{a} = \operatorname{Orb}_{\Gamma'}(x) = \operatorname{Orb}_{\Gamma'}(g_j\infty). \square

Proposition 30. There are three cusps for Γ0(4)\Gamma_0(4), which can be represented by ,0,12\infty, 0, -\frac{1}{2}.

Note. A priori we might not know this, but it is suggested by the numerical plot above. Once guessed, we can try to prove it.

Proof. Let us apply Lemma 29 and compute x_i := A_i\infty. We have

\begin{align*} x_1 &= A_1\infty = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}\infty = \infty; \\ x_2 &= A_2\infty = \begin{pmatrix}1 & 0\\-2 & 1\end{pmatrix} \infty = \frac{\infty}{-2\infty + 1} = -\frac{1}{2}; \\ x_3 &= A_3\infty = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}\infty = \frac{-1}{\infty} = 0; \\ x_4 &= A_4\infty = \begin{pmatrix}0 & -1\\1 & 1\end{pmatrix}\infty = \frac{-1}{\infty+1} = 0; \\ x_5 &= A_5\infty = \begin{pmatrix}0 & -1\\1 & 2\end{pmatrix}\infty = \frac{-1}{\infty+2} = 0;\\ x_6 &= A_6\infty = \begin{pmatrix}0 & -1\\1 & 3\end{pmatrix}\infty = \frac{-1}{\infty+3} = 0. \end{align*}

These three candidates are pairwise inequivalent. Write a point of \mathbb{P}^1(\mathbb{Q}) as \frac{p}{q} in lowest terms (with \infty = \frac{1}{0}). For \gamma = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4), we have \gamma\frac{p}{q} = \frac{ap+bq}{cp+dq}, and this is again in lowest terms: any common prime divisor of ap+bq and cp+dq would divide d(ap+bq) - b(cp+dq) = p and a(cp+dq) - c(ap+bq) = q. Since 4 \mid c and d is odd (because ad-bc=1 and 4 \mid c), we get cp+dq \equiv dq \pmod{4} and \gcd(dq, 4) = \gcd(q, 4), so \gcd(q, 4) is constant along each orbit. It equals 4, 1, 2 for \infty, 0, -\frac{1}{2} respectively, so these are three distinct cusps. This shows that \Gamma_0(4)\backslash\mathbb{P}^1(\mathbb{Q}) = \{\Gamma_0(4)\infty, \Gamma_0(4)(-\frac{1}{2}), \Gamma_0(4)0\}. \square
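
Here is a quick numerical illustration of the invariant \gcd(q, 4) used in the proof (not a proof; the random-walk setup and helper names are mine):

from math import gcd
from random import choice, seed

T, U = ((1, 1), (0, 1)), ((1, 0), (4, 1))
TI, UI = ((1, -1), (0, 1)), ((1, 0), (-4, 1))
NEG_I = ((-1, 0), (0, -1))

def act(g, pq):
    # g acting on p/q in P^1(Q); points are stored as coprime pairs (p, q), with infinity = (1, 0)
    (a, b), (c, d) = g
    p, q = pq
    P, Q = a * p + b * q, c * p + d * q
    e = gcd(P, Q)
    return (P // e, Q // e) if e else (P, Q)

seed(2)
for pq in [(1, 0), (0, 1), (-1, 2)]:        # the representatives infinity, 0, -1/2
    x = pq
    for _ in range(30):                     # a random walk inside Gamma_0(4)
        x = act(choice([T, TI, U, UI, NEG_I]), x)
    print(pq, '->', x, '  gcd(q, 4) =', gcd(abs(x[1]), 4))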

Now we can do Fourier expansion as before. Recall that in Proposition 2, we have a conformal equivalence

Φa ⁣:{StripaD×{re2πia ⁣:0<r<1}ze2πiz\Phi_a \colon \begin{cases}\text{Strip}_a &\to \mathbb{D}^\times \setminus \{re^{2\pi i a} \colon 0 < r < 1\}\\z &\mapsto e^{2\pi i z}\end{cases}

which allowed us to patch the disk D\mathbb{D}. Now we want to do something similar for each of the cusps (not just infinity). We define f~x\tilde{f}_x as follows.

Proposition 31. Let \Gamma' \leq \operatorname{SL}_2(\mathbb{Z}) be a subgroup of finite index with T \in \Gamma' (for us, \Gamma' = \Gamma_0(4)). If x \in \mathbb{P}^1(\mathbb{Q}) represents a cusp for \Gamma', then by Proposition 28, there exists an element \sigma_x \in \operatorname{SL}_2(\mathbb{Z}) such that \sigma_x \infty = x. There is then a smallest integer h_x \geq 1 (the width of the cusp) such that \sigma_xT^{h_x}\sigma_x^{-1} \in \Gamma'. Analogous to Proposition 2, if f \colon \mathbb{H} \to \mathbb{C} is holomorphic on \mathbb{H} and satisfies the modularity condition (with weight k) for \Gamma', then there exists a holomorphic function \tilde{f}_x \colon \mathbb{D}^\times \to \mathbb{C} such that (f |_k \sigma_x)(z) = \tilde{f}_x(e^{2\pi iz/h_x}) for all z \in \mathbb{H}.

Proof. Let \bar{f}_x := f|_k \sigma_x. First, why does h_x exist? Since \Gamma' has finite index, among the cosets \Gamma'\sigma_xT^j for j \in \{0, 1, \dots, [\operatorname{SL}_2(\mathbb{Z}) \colon \Gamma']\} two must coincide, say \Gamma'\sigma_xT^j = \Gamma'\sigma_xT^{j'} with j > j'; this means \sigma_xT^{j-j'}\sigma_x^{-1} \in \Gamma', so some positive power of T works, and we take h_x to be the smallest one. Write \gamma := \sigma_xT^{h_x}\sigma_x^{-1} \in \Gamma', and note that \gamma stabilizes x:

\gamma x = \sigma_xT^{h_x}\sigma_x^{-1}x = \sigma_xT^{h_x}\infty = \sigma_x\infty = x.

Now, observe that

\bar{f}_x(z+h_x) = (f|_k\sigma_x)(T^{h_x}z) = ((f|_k\sigma_x)|_kT^{h_x})(z)(0z+1)^k = (f|_k\sigma_xT^{h_x})(z) = (f|_k\gamma\sigma_x)(z) = (f|_k\sigma_x)(z) = \bar{f}_x(z),

where the equality before the last follows since f is \Gamma'-invariant under the action |_k (this is the modularity condition for \Gamma'). This shows that \bar{f}_x is periodic with period h_x, and so z \mapsto \bar{f}_x(h_xz) is 1-periodic and holomorphic on \mathbb{H}. Apply Proposition 2 to it to deduce the existence of \tilde{f}_x \colon \mathbb{D}^\times \to \mathbb{C} such that \bar{f}_x(h_xz) = \tilde{f}_x(e^{2 \pi i z}) for all z \in \mathbb{H}, that is, \bar{f}_x(z) = \tilde{f}_x(e^{2\pi i z/h_x}). If f is holomorphic on \mathbb{H}, then \tilde{f}_x is holomorphic on \mathbb{D}^\times. \square
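
For \Gamma_0(4), the widths can be found by brute force. Here is a tiny sketch (helper names are mine); it reports width 1 for \infty, width 4 for 0, and width 1 for -\frac{1}{2}:

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def inv(A):
    # inverse of a determinant-1 integer matrix
    (a, b), (c, d) = A
    return ((d, -b), (-c, a))

def width(sigma):
    # smallest h >= 1 with sigma T^h sigma^{-1} in Gamma_0(4)
    T = ((1, 1), (0, 1))
    h, P = 1, T
    while mul(mul(sigma, P), inv(sigma))[1][0] % 4 != 0:
        h, P = h + 1, mul(P, T)
    return h

sigmas = {'infinity': ((1, 0), (0, 1)),
          '0':        ((0, -1), (1, 0)),
          '-1/2':     ((1, 0), (-2, 1))}
for name, s in sigmas.items():
    print(name, width(s))   # expect 1, 4, 1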

Proposition 32. Assume moreover that k is even. Then whether \tilde{f}_x extends holomorphically to \mathbb{D}, and the value \tilde{f}_x(0) of such an extension, do not depend on the choice of \sigma_x. Moreover, if y \in \mathbb{P}^1(\mathbb{Q}) is another representative of the same cusp as x, then \tilde{f}_y leads to the same conclusion.

Proof. Suppose \sigma'_x \in \operatorname{SL}_2(\mathbb{Z}) is another element sending \infty to x. Then \sigma_x^{-1}\sigma'_x fixes \infty. Writing \sigma_x^{-1}\sigma'_x = \begin{pmatrix}a & b\\c & d\end{pmatrix}, the condition \frac{a\infty+b}{c\infty+d} = \infty forces c = 0 (otherwise the result is \frac{a}{c}, which is finite), and then ad = 1 gives a = d = 1 or a = d = -1; that is, \sigma'_x = \pm\sigma_xT^m for some m \in \mathbb{Z}. The width is unchanged, since \sigma'_xT^h(\sigma'_x)^{-1} = \sigma_xT^mT^hT^{-m}\sigma_x^{-1} = \sigma_xT^h\sigma_x^{-1}. Since k is even, f|_k(-g) = f|_kg, so f|_k\sigma'_x = f|_k\sigma_xT^m, i.e. \bar{f}'_x(z) = \bar{f}_x(z+m) and hence \tilde{f}'_x(q) = \tilde{f}_x(e^{2\pi i m/h_x}q). The two functions differ only by a rotation of the variable, so one extends holomorphically across 0 if and only if the other does, and the values at 0 agree.

Now, suppose y is another representative of the same cusp, so that y = \gamma x for some \gamma \in \Gamma'. We may choose \sigma_y := \gamma\sigma_x. Then \sigma_yT^h\sigma_y^{-1} = \gamma(\sigma_xT^h\sigma_x^{-1})\gamma^{-1} lies in \Gamma' exactly when \sigma_xT^h\sigma_x^{-1} does, so the widths agree, and f|_k\sigma_y = f|_k\gamma\sigma_x = f|_k\sigma_x by modularity, so \tilde{f}_y = \tilde{f}_x for this choice. By the first part, nothing depends on the choice of \sigma_y. \square

This allows us to properly define the space M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)) accordingly.

Definition. The space M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)) is the set of holomorphic functions f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} such that

  1. (Modularity/Automorphy) ff satisfies the modularity condition for Γ0(4)\Gamma_0(4) with weight 44, i.e., for all gΓ0(4)g \in \Gamma_0(4), f4g=ff|_4 g = f.
  2. (Holomorphy) f is holomorphic at each cusp. More precisely, for all x \in \{\infty, 0, -\frac{1}{2}\}, the function \tilde{f}_x \colon \mathbb{D}^\times \to \mathbb{C} from Proposition 31 can be extended analytically to a holomorphic function \mathbb{D} \to \mathbb{C} (by Proposition 32, this condition does not depend on the choices made).

Theorem 33. θ8M4(Γ0(4))\theta^8 \in \mathcal{M}_4(\Gamma_0(4)).

Proof. We’ve already checked that θ8\theta^8 is holomorphic on H\mathbb{H} and satisfies the automorphy for U,T,IU, T, -I, which generate Γ0(4)\Gamma_0(4). We’re left with checking the holomorphy at the cusps. We’ve seen that the series

nZe2πin2z\sum_{n \in \mathbb{Z}} e^{2 \pi i n^2 z}

converges absolutely, and also normally away from the real axis. This allows us to write

θ8(z)=(nZe2πin2z)8=n0ane2πinz\theta^8(z) =\left(\sum_{n \in \mathbb{Z}} e^{2\pi i n^2 z}\right)^8 = \sum_{n \geq 0} a_n e^{2 \pi i n z}

with a_0 = 1 (this is the expansion in e^{2\pi i z}; the cusp \infty has width 1). So \theta^8 is holomorphic at \infty, and \tilde{\theta}^8_\infty extends to \mathbb{D} with value 1 at 0. Now, consider

\tilde{\theta}^8_0(e^{2 \pi i z/4}) = (\theta^8 |_4 \sigma_0)(z)

with \sigma_0 = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}; the cusp 0 has width 4, since \sigma_0T^h\sigma_0^{-1} = \begin{pmatrix}1 & 0\\-h & 1\end{pmatrix} lies in \Gamma_0(4) exactly when 4 \mid h. Put z = x+iy and take the limit y \to \infty. We get

limy(θ84σ0)(x+iy)=limyθ8(1x+iy)(x+iy)4.\lim_{y \to \infty} (\theta^8 |_4 \sigma_0)(x+iy) = \lim_{y \to \infty} \theta^8\left(-\frac{1}{x+iy}\right)(x+iy)^{-4}.

Use Corollary 25(2) to see that \theta^8(-\frac{1}{x+iy}) = (\frac{x+iy}{2})^4 \theta^8(\frac{x+iy}{4}), and so

limy(θ84σ0)(x+iy)=limy116(x+iy)4(x+iy)4θ8(x+iy4)=116.\lim_{y \to \infty} (\theta^8 |_4 \sigma_0)(x+iy) = \lim_{y \to \infty} \frac{1}{16}(x+iy)^4(x+iy)^{-4} \theta^8\left(\frac{x+iy}{4}\right) = \frac{1}{16}.

The convergence is uniform in x: indeed, by Corollary 25(2) we have (\theta^8|_4\sigma_0)(z) = \frac{1}{16}\theta^8(\frac{z}{4}) identically, and \theta^8(\frac{z}{4}) \to 1 uniformly as \operatorname{Im}(z) \to \infty. In particular \tilde{\theta}^8_0 is bounded near 0, so by Riemann’s removable singularity theorem we can put \tilde{\theta}^8_0(0) := \frac{1}{16}. Now let us look at the cusp -\frac{1}{2}. We have

\tilde{\theta}_{-\frac{1}{2}}^8(e^{2\pi i z}) = (\theta^8 |_4 \sigma_{-\frac{1}{2}})(z).

Take \sigma_{-\frac{1}{2}} := \begin{pmatrix}1 & 0\\-2 & 1\end{pmatrix} (this cusp has width 1, since \sigma_{-\frac{1}{2}}T\sigma_{-\frac{1}{2}}^{-1} = \begin{pmatrix}3 & 1\\-4 & -1\end{pmatrix} \in \Gamma_0(4)), and so

(θ84σ12)(z)=θ8(z2z+1)(2z+1)4.(\theta^8 |_4 \sigma_{-\frac{1}{2}})(z) = \theta^8\left(\frac{z}{-2z+1}\right)(-2z+1)^{-4}.

limy(θ84σ12)(x+iy)=limyθ8(x+iy2x2iy+1)(2x2iy+1)4.\lim_{y \to \infty} (\theta^8|_4\sigma_{-\frac{1}{2}})(x+iy) = \lim_{y \to \infty}\theta^8\left(\frac{x+iy}{-2x-2iy+1}\right)(-2x-2iy+1)^{-4}.

The way I’ll proceed here is to use Proposition 24 to rewrite \theta^8|_4\sigma_{-\frac{1}{2}} exactly. Put w := \frac{z}{-2z+1} and

v := w + \frac{1}{2} = \frac{2z + (-2z+1)}{2(-2z+1)} = \frac{1}{2(-2z+1)},

and note that v \in \mathbb{H}: its imaginary part is \frac{y}{|-2z+1|^2} > 0. Then, using (-1)^{n^2} = (-1)^n and (-1)^n = 2 \cdot \mathbf{1}_{2 \mid n} - 1 (the rearrangement below is justified by absolute convergence),

\theta(w) = \theta\left(-\frac{1}{2} + v\right) = \sum_{n \in \mathbb{Z}} (-1)^{n} e^{2\pi i n^2 v} = 2\sum_{\substack{n \in \mathbb{Z}\\ 2 \mid n}} e^{2\pi i n^2 v} - \sum_{n \in \mathbb{Z}} e^{2\pi i n^2 v} = 2\theta(4v) - \theta(v).

Now apply Proposition 24 to both terms: \theta(4v) = \frac{1}{\sqrt{-8iv}}\theta(-\frac{1}{16v}) and \theta(v) = \frac{1}{\sqrt{-2iv}}\theta(-\frac{1}{4v}), where \sqrt{-8iv} = 2\sqrt{-2iv} (same branch, as -8iv and -2iv have the same argument). Plugging in v = \frac{1}{2(-2z+1)} gives -\frac{1}{4v} = z - \frac{1}{2} and -\frac{1}{16v} = \frac{z}{4} - \frac{1}{8}, and so

\theta\left(\frac{z}{-2z+1}\right) = \frac{1}{\sqrt{-2iv}}\left(\theta\left(\frac{z}{4} - \frac{1}{8}\right) - \theta\left(z - \frac{1}{2}\right)\right).

Raising this to the eighth power and using (-2iv)^4 = 16v^4 = (-2z+1)^{-4}, the automorphy factor cancels exactly:

(\theta^8|_4\sigma_{-\frac{1}{2}})(z) = \frac{\theta^8(\frac{z}{-2z+1})}{(-2z+1)^4} = \left(\theta\left(\frac{z}{4} - \frac{1}{8}\right) - \theta\left(z - \frac{1}{2}\right)\right)^8.

But both \theta(\frac{z}{4} - \frac{1}{8}) and \theta(z - \frac{1}{2}) tend to 1 as y = \operatorname{Im}(z) \to \infty, uniformly in x = \operatorname{Re}(z), since |\theta(w') - 1| \leq 2\sum_{n \geq 1} e^{-2\pi n^2 \operatorname{Im}(w')} \to 0 as \operatorname{Im}(w') \to \infty. Therefore

\lim_{y \to \infty} (\theta^8|_4\sigma_{-\frac{1}{2}})(x+iy) = (1-1)^8 = 0,

uniformly in x. In particular \tilde{\theta}^8_{-\frac{1}{2}}(q) \to 0 as q \to 0, and \tilde{\theta}^8_{-\frac{1}{2}} is bounded near 0, so (again by Riemann’s removable singularity theorem) we can put \tilde{\theta}^8_{-\frac{1}{2}}(0) := 0. After all this work, we see that \theta^8 is holomorphic at the cusps. This shows that \theta^8 \in \mathcal{M}_4(\Gamma_0(4)). \square
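
As a numerical sanity check of the three cusp values (1 at \infty, \frac{1}{16} at 0, and 0 at -\frac{1}{2}), one can evaluate truncated series along vertical lines; the helpers below are my ad-hoc choices:

import cmath

def theta8(z, N=60):
    return sum(cmath.exp(2j * cmath.pi * n * n * z) for n in range(-N, N + 1)) ** 8

def slash4(f, g):
    (a, b), (c, d) = g
    return lambda z: f((a * z + b) / (c * z + d)) * (c * z + d) ** (-4)

sigma_0 = ((0, -1), (1, 0))
sigma_half = ((1, 0), (-2, 1))
for y in (2.0, 4.0, 8.0):
    z = 0.3 + 1j * y
    print(y, abs(theta8(z)), abs(slash4(theta8, sigma_0)(z)), abs(slash4(theta8, sigma_half)(z)))
# the three columns approach 1, 1/16 = 0.0625, and 0 as y grows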

The Space M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4))

Now, we take the shortcut and try to come up with a basis for M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)). Consider the following proposition.

Proposition 34. If f \in \mathcal{M}_4, define f \circ n := (z \mapsto f(nz)) for a positive integer n; then f, f \circ 2, f \circ 4 \in \mathcal{M}_4(\Gamma_0(4)).

Proof. Take f \in \mathcal{M}_4. Since U, T, -I \in \operatorname{SL}_2(\mathbb{Z}), we immediately get f |_4 U = f|_4 T = f|_4 (-I) = f. Moreover, for any cusp x and any \sigma_x \in \operatorname{SL}_2(\mathbb{Z}) we have f|_4\sigma_x = f, so holomorphy of f at the cusps of \Gamma_0(4) reduces to holomorphy of f at \infty, which we have. This means f \in \mathcal{M}_4(\Gamma_0(4)). Now consider f \circ 2. We have (f \circ 2)(z) = f(2z) = f(2z+2) = (f \circ 2)(z+1), and

(f \circ 2)\left(\frac{z}{4z+1}\right) = f\left(\frac{2z}{4z+1}\right) = f\left(-\frac{4z+1}{2z}\right)\left(\frac{2z}{4z+1}\right)^{-4} = f\left(-\frac{1}{2z}\right)\left(\frac{2z}{4z+1}\right)^{-4} = f(2z)(2z)^4 \left(\frac{2z}{4z+1}\right)^{-4} = (f \circ 2)(z)(4z+1)^4,

using the weight-4 transformation under S twice and the periodicity of f in the middle step (note -\frac{4z+1}{2z} = -2-\frac{1}{2z}). This shows that (f \circ 2)|_4 U = (f \circ 2)|_4 T = (f \circ 2)|_4 (-I) = f\circ 2. Moreover, for each cusp x of \Gamma_0(4), the point 2x again lies in \mathbb{P}^1(\mathbb{Q}) = \operatorname{Orb}_{\operatorname{SL}_2(\mathbb{Z})}(\infty), and since f is a level-one form that is holomorphic at its single cusp, one checks that f \circ 2 is holomorphic at every cusp of \Gamma_0(4). Now consider f \circ 4. We have (f \circ 4)(z+1) = f(4(z+1)) = f(4z+4) = f(4z) = (f \circ 4)(z), and

(f \circ 4)\left(\frac{z}{4z+1}\right) = f\left(\frac{4z}{4z+1}\right) = f\left(1-\frac{1}{4z+1}\right) = f\left(-\frac{1}{4z+1}\right) = (4z+1)^4f(4z+1) = (4z+1)^4f(4z) = (4z+1)^4 (f \circ 4)(z).

This shows that (f \circ 4)|_4 U = (f \circ 4)|_4 T = (f \circ 4)|_4 (-I) = f \circ 4. The same kind of argument gives holomorphy of f \circ 4 at the cusps. \square

Proposition 35. The forms E4,E42,E44M4(Γ0(4))E_4, E_4 \circ 2, E_4 \circ 4 \in \mathcal{M}_4(\Gamma_0(4)) form a linearly independent family.

Proof. We recall their expansions at infinity:

E4(z)=n0ane2πinz;E4(2z)=n0ane2πin2z=n02nan2e2πinz;E4(4z)=n0ane2πin4z=n04nan4e2πinz.\begin{align*} E_4(z) &= \sum_{n \geq 0} a_n e^{2 \pi i n z}; \\ E_4(2z) &= \sum_{n \geq 0} a_n e^{2 \pi i n 2z} = \sum_{\substack{n \geq 0\\2 \mid n}} a_{\frac{n}{2}} e^{2 \pi i n z}; \\ E_4(4z) &= \sum_{n \geq 0} a_n e^{2 \pi i n 4z} = \sum_{\substack{n \geq 0\\4 \mid n}} a_{\frac{n}{4}} e^{2 \pi i n z}. \end{align*}

Suppose there exist \lambda_1, \lambda_2, \lambda_4 \in \mathbb{C} such that \lambda_1E_4 + \lambda_2(E_4 \circ 2) + \lambda_4(E_4 \circ 4) = 0. Then, by comparing the coefficients of index 0, 1, 2, we have

λ1a0+λ2a0+λ4a0=0λ1a1+λ20+λ40=0λ1a2+λ2a1+λ40=0.\begin{align*} \lambda_1 a_0 +\lambda_2 a_0 + \lambda_4 a_0 &= 0\\ \lambda_1a_1 + \lambda_2 0 + \lambda_4 0 &= 0\\ \lambda_1 a_2 + \lambda_2 a_1 + \lambda_4 0 &= 0. \end{align*}

Solving this system, since we’ve seen (Theorem 17) that a0,a1,a20a_0, a_1, a_2 \ne 0, we conclude that λ1=λ2=λ4=0\lambda_1 = \lambda_2 = \lambda_4 = 0, and so they’re linearly independent. \square

In fact, these forms form a basis for M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)). We will prove this.

Lemma 36. If f(z)=n0ane2πinzM4(Γ0(4))f(z) = \sum_{n \geq 0} a_n e^{2\pi i nz} \in \mathcal{M}_4(\Gamma_0(4)) has a0=a1=a2=0a_0 = a_1 = a_2 = 0, then f0f \equiv 0.

Proof. Define F(z) := \prod_{j=1}^6 (f |_4 A_j)(z). Then F(z+1) = \prod_{j=1}^6 (f|_4A_j)(Tz) = \prod_{j=1}^6 (f|_4A_jT)(z), but A_j \mapsto A_jT permutes the cosets. To see this, if \Gamma_0(4)A_jT = \Gamma_0(4)A_{j'}T then A_{j'}TT^{-1}A_j^{-1} \in \Gamma_0(4) means A_{j'}A_j^{-1} \in \Gamma_0(4), i.e. that \Gamma_0(4)A_j = \Gamma_0(4)A_{j'}, which happens only when j = j'. Since \Gamma_0(4)A_j \mapsto \Gamma_0(4)A_jT is a map from \Gamma_0(4) \backslash \operatorname{SL}_2(\mathbb{Z}) to itself, and is injective, we conclude that it is also bijective. This means there exists a permutation \sigma \in \mathfrak{S}_6 such that A_jT \in \Gamma_0(4)A_{\sigma(j)} for each j \in \{1, \dots, 6\}. Pick \gamma_j \in \Gamma_0(4) such that A_jT = \gamma_jA_{\sigma(j)}; then we have

F(z+1)=j=16(f4AjT)(z)=j=16(f4γjAσ(j))(z).F(z+1) = \prod_{j=1}^6 (f|_4A_jT)(z) = \prod_{j=1}^6 (f|_4\gamma_jA_{\sigma(j)})(z).

But fM4(Γ0(4))f \in \mathcal{M}_4(\Gamma_0(4)) so f4γj=ff|_4\gamma_j = f. This gives

F(z+1)=j=16(f4Aσ(j))(z)=j=16(f4Aj)(z)=F(z).F(z+1) = \prod_{j=1}^6 (f|_4A_{\sigma(j)})(z) = \prod_{j=1}^6 (f|_4A_j)(z) = F(z).

Now, the same argument applies for AjAjSA_j \mapsto A_jS. That is, there exists τS6\tau \in \mathfrak{S}_6 such that AjSΓ0(4)Aτ(j)A_j S \in \Gamma_0(4)A_{\tau(j)}. Pick γjΓ0(4)\gamma'_j \in \Gamma_0(4) such that AjS=γjAτ(j)A_jS = \gamma'_jA_{\tau(j)}. Then, we have

\begin{align*} (F|_{24} S)(z) &= F(Sz)z^{-24}\\ &= \prod_{j=1}^6\left[(f|_4A_j)(Sz)\,z^{-4}\right]\\ &= \prod_{j=1}^6(f|_4A_jS)(z)\\ &= \prod_{j=1}^6 (f|_4\gamma'_jA_{\tau(j)})(z)\\ &= \prod_{j=1}^6 (f|_4A_{\tau(j)})(z)\\ &= \prod_{j=1}^6 (f|_4A_j)(z)\\ &= F(z). \end{align*}

This shows that F satisfies the weight-24 modularity condition for \operatorname{SL}_2(\mathbb{Z}). Each factor f|_4 A_j is bounded as \operatorname{Im}(z) \to \infty (this is exactly holomorphy of f at the cusp represented by A_j\infty), so F is bounded as \operatorname{Im}(z) \to \infty; being 1-periodic, F is therefore holomorphic at infinity, and F \in \mathcal{M}_{24}. Since f has a_0 = a_1 = a_2 = 0, we see that \operatorname{ord}_\infty(F) = \sum_{j=1}^6 \operatorname{ord}_\infty(f|_4 A_j) = \operatorname{ord}_\infty(f) + \sum_{j=2}^6 \operatorname{ord}_\infty(f|_4 A_j) \geq \operatorname{ord}_\infty(f), because each of the remaining factors is bounded and hence has nonnegative order. But \tilde{f}(q) = \sum_{n \geq 3} a_n q^n has \operatorname{ord}_0(\tilde{f}) \geq 3, so \operatorname{ord}_\infty(F) \geq \operatorname{ord}_\infty(f) = \operatorname{ord}_0(\tilde{f}) \geq 3. However, recall the valence formula (Theorem 7) for k = 24. We have

k12=ord(F)+12ordi(F)+13orde2πi/3(F)+other zordz(F).\frac{k}{12} = \operatorname{ord}_\infty(F) + \frac{1}{2}\operatorname{ord}_i(F) + \frac{1}{3}\operatorname{ord}_{e^{2\pi i/3}}(F) + \sum_{\text{other } z} \operatorname{ord}_z(F).

Plugging in k = 24 and \operatorname{ord}_\infty(F) \geq 3, we see that 2 \geq 3 + (\text{nonnegative quantity}), a contradiction (the formula applies to nonzero forms). This shows that F must be identically zero, i.e. \prod_{j=1}^6 (f|_4A_j) \equiv 0. Since \mathbb{H} is connected and each factor is holomorphic, some factor f|_4A_j must vanish identically on \mathbb{H}, and then f = (f|_4A_j)|_4A_j^{-1} \equiv 0. \square

Theorem 37. dimM4(Γ0(4))=3\dim \mathcal{M}_4(\Gamma_0(4)) = 3.

Proof. The map

{M4(Γ0(4))C3n0ane2πinz(a0,a1,a2)\begin{cases}\mathcal{M}_4(\Gamma_0(4)) &\to \mathbb{C}^3 \\ \sum_{n \geq 0} a_n e^{2 \pi i n z} & \mapsto (a_0, a_1, a_2)\end{cases}

is linear and injective by Lemma 36, so dimM4(Γ0(4))3\dim \mathcal{M}_4(\Gamma_0(4)) \leq 3. Proposition 35 shows that dimM4(Γ0(4))3\dim \mathcal{M}_4(\Gamma_0(4)) \geq 3. \square

In particular, this means {E4,E42,E44}\{E_4, E_4 \circ 2, E_4 \circ 4\} forms a basis for M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)).

The Secret Formula

Now that we know that \theta^8 \in \mathcal{M}_4(\Gamma_0(4)) = \operatorname{Span}_\mathbb{C}(\{E_4, E_4 \circ 2, E_4 \circ 4\}), there exist \lambda_1, \lambda_2, \lambda_4 \in \mathbb{C} such that \theta^8 = \lambda_1 E_4 + \lambda_2 (E_4 \circ 2) + \lambda_4(E_4 \circ 4). Comparing the Fourier coefficients at infinity, writing \theta^8(z) = \sum_{n \geq 0} r_n e^{2 \pi i n z} and E_4(z) = \sum_{n \geq 0} a_n e^{2 \pi i n z}, we get

r0=λ1a0+λ2a0+λ4a0r1=λ1a1r2=λ1a2+λ2a1.\begin{align*} r_0 &= \lambda_1 a_0 + \lambda_2 a_0 + \lambda_4 a_0 \\ r_1 &= \lambda_1 a_1 \\ r_2 &= \lambda_1 a_2 + \lambda_2 a_1. \end{align*}

We can run a very basic computer program (as written at the beginning of the article) to compute r0,r1,r2r_0, r_1, r_2. We have r0=1;r1=16;r2=112.r_0 = 1; r_1 = 16; r_2 = 112. Recall Theorem 17: for n1n \geq 1,

an=(2πi)kζ(k)(k1)!σk1(n).a_n = \frac{(2\pi i)^k}{\zeta(k)(k-1)!} \sigma_{k-1}(n).

In particular, a0=1;a1=16π4ζ(4)3!=240;a2=16π4ζ(4)3!σ3(2)=240(13+23)=2160a_0 = 1; a_1 = \frac{16\pi^4}{\zeta(4)3!} = 240; a_2 = \frac{16\pi^4}{\zeta(4)3!}\sigma_3(2) = 240 \cdot (1^3 + 2^3) = 2160, recalling the fact that ζ(4)=π490\zeta(4) = \frac{\pi^4}{90}. We’re left with solving the system

1=λ1+λ2+λ416=λ1240112=λ12160+λ2240.\begin{align*} 1 &= \lambda_1 + \lambda_2 + \lambda_4 \\ 16 &= \lambda_1 \cdot 240 \\ 112 &= \lambda_1 \cdot 2160 + \lambda_2 \cdot 240. \end{align*}

We get \lambda_1 = \frac{1}{15}, \lambda_2 = -\frac{2}{15}, and so \lambda_4 = \frac{16}{15}. This shows that \theta^8 = \frac{1}{15}E_4 - \frac{2}{15}(E_4 \circ 2) + \frac{16}{15}(E_4 \circ 4). Furthermore, we now have a way to compute r_n!
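
For completeness, here is a short script reproducing this computation with exact arithmetic (the brute-force counter and helper names are mine; it is only meant for tiny n):

from fractions import Fraction
from itertools import product

def r8(n):
    # brute-force count of representations of n as a sum of eight squares (small n only)
    k = int(n ** 0.5)
    return sum(1 for v in product(range(-k, k + 1), repeat=8)
               if sum(x * x for x in v) == n)

def sigma3(n):
    return sum(d ** 3 for d in range(1, n + 1) if n % d == 0)

r0, r1, r2 = 1, r8(1), r8(2)                       # 1, 16, 112
a0, a1, a2 = 1, 240 * sigma3(1), 240 * sigma3(2)   # 1, 240, 2160

l1 = Fraction(r1, a1)
l2 = (Fraction(r2) - l1 * a2) / a1
l4 = 1 - l1 - l2
print(l1, l2, l4)   # 1/15 -2/15 16/15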

Theorem 38. rn=16σ3(n)32σ3(n2)12n+256σ3(n4)14nr_n = 16 \sigma_3(n) - 32 \sigma_3(\frac{n}{2})\mathbf{1}_{2 \mid n} + 256\sigma_3(\frac{n}{4})\mathbf{1}_{4 \mid n} for all n1n \geq 1.

Proof. We’ve seen that θ8=115E4215(E42)+1615(E44)\theta^8 = \frac{1}{15}E_4 - \frac{2}{15}(E_4 \circ 2) + \frac{16}{15}(E_4 \circ 4). This means rn=115an215an2+1615an4r_n = \frac{1}{15}a_n - \frac{2}{15}a_{\frac{n}{2}} + \frac{16}{15}a_{\frac{n}{4}}, where at:=0a_t := 0 for tNt \notin \mathbb{N}. We’ve also seen that

an=(2πi)4ζ(4)3!σ3(n)=16π4π4906σ3(n)=240σ3(n).a_n = \frac{(2\pi i)^4}{\zeta(4)3!} \sigma_3(n) = \frac{16\pi^4}{\frac{\pi^4}{90} \cdot 6}\sigma_3(n) = 240\sigma_3(n).

This completes the formula. \square

We’re now ready to write an optimized code:

n = int(input())
def sigma3(n):
	# sum of d^3 over the divisors d of n, enumerating divisor pairs up to sqrt(n)
	ret = 0
	for d in range(1, int(n**0.5)+1):
		if d**2 == n:
			ret += d**3
		elif n % d == 0:
			ret += d**3 + (n//d)**3
	return ret
if n == 0:
	print(1)
else:
	# Theorem 38: r_n = 16*sigma3(n) - 32*sigma3(n/2) [if 2 | n] + 256*sigma3(n/4) [if 4 | n]
	t1 = 16*sigma3(n)
	t2 = -32*sigma3(n // 2) if n % 2 == 0 else 0
	t3 = 256*sigma3(n // 4) if n % 4 == 0 else 0
	print(t1+t2+t3)

And yes, this runs in time O(n)O(\sqrt{n})!

What’s next?

From a computational perspective, this is still inefficient! The running time is not polynomial in logn\log n. Is there a better algorithm? Is there a polynomial time algorithm? I don’t know! Sounds like an interesting problem to work on.

From a mathematical perspective, this use case is very limited. Can we generalize? Can we use tools like this to count the number of representations of an integer as a sum of k squares? It turns out that we can generalize a bit! But for odd k, we’ll have to look into half-integer weight modular forms. Also, for larger k there might be complications involving cusp forms, and the dimension of the relevant space might be larger. (I don’t know.)

However, this already looks like a cool computational problem! I hope you enjoy it!

References