Invitation to Modular Forms: Representation of an integer into sum of eight squares

Created: August 23, 2025 8:54 PM

Last Updated: September 13, 2025

Note: This is an edited reupload intended to be hosted on my blog. The reason is that I received complaints about performance issues with the old site hosted on Notion, so here is a spare copy. There are minor typographical fixes but no changes to the content.

Introduction

We’re interested in the following computational problem: given n \in \mathbb{N}, in how many ways (i.e., for how many tuples (x_1, \dots, x_8) \in \mathbb{Z}^8) can n be written as n = x_1^2 + \dots + x_8^2?

Our basic computer programming knowledge tells us that we can design a naive algorithm as follows:

n = int(input())
ans = 0
for x1 in range(-n, n+1):
    for x2 in range(-n, n+1):
        for x3 in range(-n, n+1):
            for x4 in range(-n, n+1):
                for x5 in range(-n, n+1):
                    for x6 in range(-n, n+1):
                        for x7 in range(-n, n+1):
                            for x8 in range(-n, n+1):
                                if x1**2 + x2**2 + x3**2 + x4**2 + x5**2 + x6**2 + x7**2 + x8**2 == n:
                                    ans += 1
print(ans)

Of course, this has time complexity O(n^8), which is not very good. A basic, obvious improvement is to observe that x^2 \geq 0 for all x \in \mathbb{Z}, so for a tuple to work, no entry can exceed \sqrt{n} in absolute value: otherwise (say, if |x_1| > \sqrt{n}) we would have x_1^2 + \dots + x_8^2 > n + x_2^2 + \dots + x_8^2 \geq n. This shrinks each loop to size O(\sqrt{n}).

n = int(input())
ans = 0
k = int(n**0.5)+1
for x1 in range(-k, k+1):
    for x2 in range(-k, k+1):
        for x3 in range(-k, k+1):
            for x4 in range(-k, k+1):
                for x5 in range(-k, k+1):
                    for x6 in range(-k, k+1):
                        for x7 in range(-k, k+1):
                            for x8 in range(-k, k+1):
                                if x1**2 + x2**2 + x3**2 + x4**2 + x5**2 + x6**2 + x7**2 + x8**2 == n:
                                    ans += 1
print(ans)

This takes time O(n4)O(n^4), a bit better, huh?

Obviously, through the eyes of an algorithmist, we can do better. One standard move when we see something of this form is the meet-in-the-middle trick: instead of solving x_1^2 + \dots + x_8^2 = n directly, we solve x_1^2 + x_2^2 + x_3^2 + x_4^2 = m and x_5^2 + x_6^2 + x_7^2 + x_8^2 = n-m, count the solutions of each, multiply the two counts, and sum over m \in \{0, \dots, n\}. Let us try to implement this.

n = int(input())

def count_sol(m):
    k = int(m**0.5)+1
    local_ans = 0
    for a in range(-k, k+1):
        for b in range(-k, k+1):
            for c in range(-k, k+1):
                for d in range(-k, k+1):
                    if a**2 + b**2 + c**2 + d**2 == m:
                        local_ans += 1
    return local_ans

ans = 0
for m in range(n+1):
    ans += count_sol(m) * count_sol(n-m)
print(ans)

The count_sol subroutine runs in time O(\sqrt{m}^4) = O(m^2). We call it 2(n+1) times, each with m \leq n, so the total time is 2(n+1) \cdot O(n^2) = O(n^3).

Can we do better? Still yes. Skilled algorithmists might even come up with this formulation immediately (note: I’m not that good; I asked a bunch of friends and they immediately gave me the following formulation). We formulate this as a knapsack-style dynamic program. Define \mathrm{dp}_{i,j} to be the quantity \#\{(x_1, \dots, x_i) \in \mathbb{Z}^i \colon \sum_{k=1}^i x_k^2 = j\}, and look for a recurrence.

In order to know the quantity dpi,j\mathrm{dp}_{i,j}, we can enumerate all possible xix_i’s and observe that

\begin{align*} &\{(x_1, \dots, x_i) \in \mathbb{Z}^i \colon \sum_{k=1}^i x_k^2 = j\} \\ &= \bigcup_{0 \leq x_i \leq \sqrt{j}} \{(x_1, \dots, x_{i-1}) \in \mathbb{Z}^{i-1} \colon \sum_{k=1}^{i} x_k^2 = j\} \times \{(x_i), (-x_i)\} \\ &= \bigcup_{0 \leq x_i \leq \sqrt{j}} \{(x_1, \dots, x_{i-1}) \in \mathbb{Z}^{i-1} \colon \sum_{k=1}^{i-1} x_k^2 = j-x_i^2\} \times \{(x_i), (-x_i)\}. \end{align*}

Observe that the union is disjoint, so the cardinality of the whole set is the sum of the cardinalities of the pieces, and therefore

\mathrm{dp}_{i,j} = \sum_{1 \leq a \leq \sqrt{j}} 2\mathrm{dp}_{i-1, j-a^2} + \mathrm{dp}_{i-1,j}.

With this formula, we can implement our dynamic programming algorithm as follows.

n = int(input())
dp = [[0 for j in range(n+1)] for i in range(9)]
dp[0][0] = 1
for i in range(1, 9):
    for j in range(0, n+1):
        dp[i][j] += dp[i-1][j]
        for a in range(1, int(j**0.5)+1):
            if j >= a**2:
                dp[i][j] += 2*dp[i-1][j-a**2]
print(dp[8][n])

This takes time O(n^{3/2}), already a big improvement over what we had before. Is this the end? Can we do better? An algorithmist might give up at this point, but I refuse to give up just yet! Can we use some mathematical magic to improve this even more? It turns out that we can!
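As a quick sanity check (this is my own addition; the helper names r8_dp and r8_mitm are mine), we can wrap the dynamic-programming table and the earlier meet-in-the-middle routine into functions and compare them on small inputs; they should agree exactly.

def r8_dp(n):
    # dp[i][j] = number of (x_1, ..., x_i) in Z^i with x_1^2 + ... + x_i^2 = j
    dp = [[0] * (n + 1) for _ in range(9)]
    dp[0][0] = 1
    for i in range(1, 9):
        for j in range(n + 1):
            dp[i][j] = dp[i - 1][j]          # contribution of x_i = 0
            a = 1
            while a * a <= j:
                dp[i][j] += 2 * dp[i - 1][j - a * a]   # x_i = +a or -a
                a += 1
    return dp[8][n]

def count_sol4(m):
    # number of (a, b, c, d) in Z^4 with a^2 + b^2 + c^2 + d^2 = m (brute force)
    k = int(m ** 0.5) + 1
    return sum(1 for a in range(-k, k + 1) for b in range(-k, k + 1)
                 for c in range(-k, k + 1) for d in range(-k, k + 1)
                 if a * a + b * b + c * c + d * d == m)

def r8_mitm(n):
    return sum(count_sol4(m) * count_sol4(n - m) for m in range(n + 1))

for n in range(15):
    assert r8_dp(n) == r8_mitm(n)
print("the two counts agree for all n < 15")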

In this article, I’m going to invite you into the fascinating (and relatively deep) theory of modular forms, not just as an algorithmist, but as an algorithmic number theorist.

The prerequisites for filling in all the gaps may be quite demanding, but I intend to make this suitable for a general audience, so when a deep mathematical result appears and you’re not used to it, please feel free to “just assume the claim is true, move on to see how things fit, and try to understand the proof later”. The true prerequisites are:

Acknowledgement

I’d like to dedicate this section to thank my professor, Emmanuel Kowalski, for teaching me the basics of number theory and modular forms. I want to also thank a friend of mine, Pitchayut (Mark) Saengrungkongka, for helping me through mathematical obstacles during my studies. Finally, after all the messes I’ve made in my life, I’d like to thank everyone who still believes in me, who still believes that I can become a mathematician too, despite my sluggish brain and the distractions that slowed me down.

Disclaimer. Any mistakes found in this article are my own. I’m more than happy to receive feedback and reports of errors.

Now let us begin.

Modular Forms

Note. The main reference for this whole section is the course Number Theory II: Introduction to Modular Forms given by Emmanuel Kowalski (see https://metaphor.ethz.ch/x/2025/fs/401-3112-23L/). Other small references will be given throughout the article.

It is quite surprising that modular forms are studied so extensively and, even though they are initially of a number-theoretic nature, now have applications in very different areas of mathematics. When I first saw them, they looked very artificial, as if they came out of nowhere. (Who on earth would have thought up this structure???) The following quote is attributed to Martin Eichler.

There are five fundamental operations in mathematics: addition, subtraction, multiplication, division and modular forms.

So what is a modular form? In simple terms, it is a function f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} that satisfies the following property (here H:={zC ⁣:Im(z)>0}\mathbb{H} := \{z \in \mathbb{C} \colon \operatorname{Im}(z) > 0\} is the Poincaré upper-half plane):

There exists an integer k0k \geq 0 such that f(az+bcz+d)=(cz+d)kf(z)f(\frac{az+b}{cz+d}) = (cz+d)^kf(z) for all zHz \in \mathbb{H} and for all a,b,c,dZa, b, c, d \in \mathbb{Z} such that adbc=1ad-bc = 1.

If furthermore, ff is holomorphic everywhere in H\mathbb{H}, we say that it is a holomorphic modular form of weight kk and of level 11. For meromorphic modular forms, we don’t require ff to be defined on the whole H\mathbb{H} — there can be isolated poles, only that the relation f(az+bcz+d)=(cz+d)kf(z)f(\frac{az+b}{cz+d}) = (cz+d)^kf(z) holds for all zz such that f(z)f(z) is defined (i.e. not a pole). We will generalize to other levels later.

Now we ramp up the machinery a bit for some spiciness. Define \mathrm{GL}_n(K) as the set of n \times n invertible matrices with entries in the field K; equivalently, it is the set of n \times n matrices with nonzero determinant. Since \det(AB) = \det(A)\det(B) for square matrices A, B of the same size, and by invertibility, we see that \mathrm{GL}_n(K) has a group structure given by multiplication. Now, define the subset \mathrm{SL}_n(K) \subseteq \mathrm{GL}_n(K) of matrices with determinant one. This subset is also a group, so it's a subgroup. If we try to generalize to the case where K is not a field but a ring such as \mathbb{Z}, the "nonzero determinant" description of \mathrm{GL}_n runs into an immediate problem: the inverse of an integer matrix need not have integer entries, so that set is not a group anymore. Fortunately, for \mathrm{SL}_2(\mathbb{Z}) there is no problem. If \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z}) is a matrix with determinant one, i.e. ad-bc = 1, then consider the matrix \begin{pmatrix}d & -b\\-c & a\end{pmatrix}. We have

\begin{pmatrix}a & b\\c & d\end{pmatrix}\begin{pmatrix}d & -b\\-c & a\end{pmatrix} = \begin{pmatrix}ad-bc & -ab+ba\\cd-dc & -cb+da\end{pmatrix} = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}.

This means \begin{pmatrix}d & -b\\-c & a\end{pmatrix} is the inverse of \begin{pmatrix}a & b\\c & d\end{pmatrix}, and it again has integer entries and determinant one! So \mathrm{SL}_2(\mathbb{Z}) forms a group.
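As a tiny sanity check (my own snippet, not from the original text), we can verify this inverse formula on a few determinant-one integer matrices:

# check that (a, b; c, d)^(-1) = (d, -b; -c, a) whenever ad - bc = 1

def mat_mul(M, N):
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def inv_sl2(M):
    (a, b), (c, d) = M
    return ((d, -b), (-c, a))

identity = ((1, 0), (0, 1))
for M in (((1, 1), (0, 1)), ((0, -1), (1, 0)), ((2, 3), (1, 2)), ((5, 7), (2, 3))):
    (a, b), (c, d) = M
    assert a * d - b * c == 1
    assert mat_mul(M, inv_sl2(M)) == identity and mat_mul(inv_sl2(M), M) == identity
print("inverse formula verified on the sample matrices")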

Now, define a group action k|_k of SL2(Z)\mathrm{SL}_2(\mathbb{Z}) acting from the right on the space of functions HC\mathbb{H} \to \mathbb{C} by

f \Big\vert_k \begin{pmatrix}a & b\\c & d\end{pmatrix} := \left(z \mapsto f\left(\frac{az+b}{cz+d}\right)(cz+d)^{-k}\right).

It restricts to the space of holomorphic functions. Now, in fancier words, we can define modular forms of weight kk and of level 11 to be functions f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} such that fkg=ff|_kg = f for all gSL2(Z)g \in \mathrm{SL}_2(\mathbb{Z}), i.e., it is the subset of (SL2(Z),k)(\mathrm{SL}_2(\mathbb{Z}), |_k)-invariant functions. In particular, it is enough to show that ff is invariant under a generating set of SL2(Z)\mathrm{SL}_2(\mathbb{Z}) to conclude that it is invariant under the whole group. We will use this fact later.

Definition. (Modular form of weight kk and of level 11) A function f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} is said to be a modular form of weight kk and of level 11 if ff is invariant under the group action k|_k of the group SL2(Z)\operatorname{SL}_2(\mathbb{Z}).
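To make the generating-set remark concrete (a small addition of mine, anticipating Theorem 1 below, which shows that \operatorname{SL}_2(\mathbb{Z}) is generated by T = \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix} and S = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}): weight-k modularity then amounts to the two identities

f(z+1) = f(z) \qquad\text{and}\qquad f\left(\frac{-1}{z}\right) = z^k f(z) \qquad \text{for all } z \in \mathbb{H},

which are exactly f|_kT = f and f|_kS = f.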

Now, consider the action of \mathrm{SL}_2(\mathbb{Z}) on \mathbb{H} defined by \begin{pmatrix}a & b\\c & d\end{pmatrix} \cdot z = \frac{az+b}{cz+d}. This action sends a point of \mathbb{H} to another (possibly the same) point of \mathbb{H}. We can start by just being curious about this action and asking natural questions. First of all, given a fixed z \in \mathbb{H}, what is its orbit under the action? Well, we can pick \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix} from \mathrm{SL}_2(\mathbb{Z}) and observe that it sends z to \frac{1z+1}{0z+1} = z+1. Therefore, z+n is in the orbit for all n \in \mathbb{Z}. Of course, this is not the end. Let's try something different: \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix} \in \mathrm{SL}_2(\mathbb{Z}), which sends z \mapsto \frac{-1}{z}. It may be hard to visualize at first, but observe that if |z| > 1, then |\frac{-1}{z}| = \frac{1}{|z|} < 1, so a point outside the unit disk can be sent somewhere inside the unit disk. This suggests that "it is enough to consider points inside a vertical strip of width one, because points outside the strip can be reached from points inside it by the group action", and also that "it is enough to consider points inside the unit disk, because points outside the unit disk can be reached via the transformation z \mapsto -1/z". It turns out this is something we will state in more precise terms and use repeatedly afterwards (but for reasons that will come later, instead of the inside of the disk, we'd rather use the outside).

The Fundamental Domain

Definition. The fundamental domain is defined as D:={zH ⁣:z1;Re(z)12}\mathcal{D} := \{z \in \mathbb{H} \colon |z| \geq 1; |\operatorname{Re}(z)| \leq \frac{1}{2}\}.

Theorem 1 [Serre, Ch. VII, §1, Th 1. and 2.].

  1. For every zHz \in \mathbb{H} there exists gSL2(Z)g \in \mathrm{SL}_2(\mathbb{Z}) such that gzDgz \in \mathcal{D}.
  2. Suppose z,zDz, z' \in \mathcal{D} are two distinct points, such that there exists gSL2(Z)g \in \mathrm{SL}_2(\mathbb{Z}) with z=gzz' = g \cdot z, then either (a) Re(z){12,12}\operatorname{Re}(z) \in \{-\frac{1}{2}, \frac{1}{2}\} and z=z±1z = z' \pm 1, or (b) z=1|z| = 1 and z=1zz' = \frac{-1}{z}.
  3. For all z \in \mathcal{D}^\circ, \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z) = \{\pm\operatorname{id}\} (note that -\operatorname{id} acts trivially on \mathbb{H}).
  4. The group SL2(Z)\mathrm{SL}_2(\mathbb{Z}) is generated by S=(0110)S = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix} and T=(1101)T = \begin{pmatrix}1 & 1\\0 & 1\end{pmatrix}.

Proof. Define G:=S,TSL2(Z)G' := \langle S, T \rangle \leq \mathrm{SL}_2(\mathbb{Z}). We will eventually show that G=SL2(Z)G' = \mathrm{SL}_2(\mathbb{Z}) later, but now let us show 1. Take zHz \in \mathbb{H}, and let g=(abcd)Gg = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in G'be arbitrary first (we will fix it later). We have

\begin{align*} \operatorname{Im}(gz) &= \operatorname{Im}\left(\frac{az+b}{cz+d}\right) \\ &= \operatorname{Im}\left(\frac{(az+b)\overline{(cz+d)}}{|cz+d|^2}\right) \\ &= \frac{1}{|cz+d|^2}\operatorname{Im}(a\overline{c}|z|^2+b\overline{cz} + a\overline{d}z + b\overline{d})\\ &= \frac{1}{|cz+d|^2} \operatorname{Im}(bc\overline{z}+adz)\\ &= \frac{1}{|cz+d|^2} (ad-bc) \operatorname{Im}(z)\\ &= \frac{\operatorname{Im}(z)}{|cz+d|^2}. \end{align*}

(We used the simple facts that a,b,c,dZa, b, c, d \in \mathbb{Z}, so c=c;d=d\overline{c} = c; \overline{d} = d. We know that adbc=1ad-bc = 1, and we know that Im(z)=Im(z)\operatorname{Im}(\overline{z}) = -\operatorname{Im}(z).) Now, the idea is that for a given zz, we will try to send it as far away (from the origin) as possible, then slide back by repeatedly applying TT until it stays in D\mathcal{D}. For any fixed M>0M > 0, The set Sz,M:={(c,d)Z2 ⁣:cz+dM}S_{z, M} :=\left\{(c, d) \in \mathbb{Z}^2 \colon |cz+d| \leq M\right\} is finite. To see this, write z=x+iyz = x+iy and observe that cz+dM|cz+d| \leq M if and only if M2cz+d2=c(x+iy)+d2=(cx+d)2+(cy)2M^2 \geq |cz+d|^2 = |c(x+iy)+d|^2 = (cx+d)^2 + (cy)^2. So if (c,d)(c, d) is in that set, then cMy|c| \leq \frac{M}{|y|} (finitely many choices), and for a fixed cc, the integer dd must satisfy cx+dM|cx+d| \leq M, that is, d[cxM,cx+M]d \in [-cx-M, -cx+M], which contains finitely many integers. This proves that the set Sz,MS_{z, M} is finite. If we fix M=zM = |z|, then we see that (1,0)Sz,M(1, 0) \in S_{z, M}, so the set is not empty. Furthermore, we’re looking for such pair with smallest cz+d|cz+d|, so enlarging MM wouldn’t help. This means min(c,d)Z2cz+d\min_{(c,d) \in \mathbb{Z}^2} |cz+d| exists, with (c,d)Sz,z(c, d) \in S_{z, |z|}. However, we want to only look at pairs corresponding to elements of GG', so we consider the set

\tilde{S}_z := \left\{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in G' \colon |cz+d| \leq |z|\right\}.

Well, SS~zS \in \tilde{S}_z so it is not empty, and there are only finitely many choices of (c,d)(c, d) here, so, to minimize cz+d|cz+d|, we loop over all possible pairs in Sz,zS_{z,|z|}, and check if there exists a pair (a,b)(a, b) making (abcd)S~z\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \tilde{S}_z. We don’t need to loop over all possible (a,b)(a, b). Just that if one exists, we track the minimum over such cz+d|cz+d|. This shows that there exists gS~zg \in \tilde{S}_z minimizing cz+d|cz+d|.

Now that we have g \in G', we want to slide gz back into the strip [-\frac{1}{2},\frac{1}{2}]+i\mathbb{R}_{>0}. Indeed, pick n \in \mathbb{Z} such that \operatorname{Re}(gz)+n \in [-\frac{1}{2},\frac{1}{2}], so that \operatorname{Re}(T^ngz) \in [-\frac{1}{2},\frac{1}{2}]. Furthermore, we have |T^ngz| \geq 1, because otherwise |T^ngz| < 1 and, considering ST^ng \in G', we would get

\operatorname{Im}(ST^ngz) = \frac{\operatorname{Im}(T^ngz)}{|T^ngz|^2} > \operatorname{Im}(T^ngz) = \operatorname{Im}(gz),

a contradiction of the maximality of Im(gz)\operatorname{Im}(gz) for gGg \in G'. This shows that TngzDT^ngz \in \mathcal{D}.
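The proof of part 1 is essentially an algorithm: repeatedly translate by a power of T to bring the real part into [-\frac{1}{2}, \frac{1}{2}], and apply S whenever the point lies strictly inside the unit circle (which strictly increases the imaginary part). Here is a minimal numerical sketch of that procedure; it is my own illustration, not part of the original argument, and it only handles boundary cases up to a small tolerance.

# reduce a point of H into the fundamental domain D using only
# T: z -> z + 1 and S: z -> -1/z (minimal sketch, floating point only)

def reduce_to_fundamental_domain(z, eps=1e-12):
    assert z.imag > 0
    while True:
        n = round(z.real)      # translate so that Re(z) lands in [-1/2, 1/2]
        z -= n
        if abs(z) < 1 - eps:
            z = -1 / z         # apply S; this strictly increases Im(z)
        else:
            return z

w = reduce_to_fundamental_domain(0.37 + 0.002j)
print(w, abs(w) >= 1 - 1e-9, abs(w.real) <= 0.5 + 1e-9)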

Phew. That was long. Now let us move to the second point (which is even longer, because I want to make it very clear to myself, and not leave the casework to the reader). We now want to show that if two distinct points of the fundamental domain are congruent modulo G', then they fall into a few special cases only. Suppose z, z' \in \mathcal{D} are two distinct points and g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in G' is such that z' = gz. Assume without loss of generality that \operatorname{Im}(z) \leq \operatorname{Im}(z') (otherwise swap z and z'). Then \operatorname{Im}(z) \leq \operatorname{Im}(gz) = \frac{\operatorname{Im}(z)}{|cz+d|^2}, so |cz+d| \leq 1. Write z = x+iy with |x| \leq \frac{1}{2} and y \geq \frac{\sqrt{3}}{2}. If |c| \geq 2, then

\begin{align*} |cz+d|^2 &= |cx+icy+d|^2 = (cx+d)^2 + (cy)^2 \\ &\geq c^2y^2 \geq 4 \frac{\sqrt{3}^2}{2^2} = 3, \end{align*}

a contradiction. So c<2|c| < 2, i.e. c{1,0,1}c \in \{-1, 0, 1\}. If c=0c = 0 then cz+d1|cz+d| \leq 1 implies d{1,1}d \in \{-1, 1\} (remember that c=d=0c = d = 0 is not possible since this means detg=0\det g = 0, impossible in GSL2(Z)G' \subseteq \mathrm{SL}_2(\mathbb{Z})). Furthermore, in SL2(Z)\mathrm{SL}_2(\mathbb{Z}), we have adbc=1ad - bc = 1 so for c=0c = 0, ad=1ad = 1 means a=da = d. This means z=gz=az+bd=z±(integer)z' = gz = \frac{az+b}{d} = z \pm (\text{integer}), but z,zDz, z' \in \mathcal{D} so such integer must be ±1\pm 1 only. And in this case, Re(z)Re(z)=1|\operatorname{Re}(z)-\operatorname{Re}(z')| = 1, so they must lie on somewhere inside the vertical boundaries of D\mathcal{D}. This concludes that in case c=0c = 0, we have Re(z)=±12\operatorname{Re}(z) = \pm \frac{1}{2} and z=z±1z' = z \pm 1, i.e., case (a) in the theorem.

Now, for the case c \in \{-1, 1\} and d \ne 0 (the subcase d = 0 is treated separately below as case (b.2)): we have 1 \geq |cz+d|^2 = |cx+icy+d|^2 = (cx+d)^2 + y^2 \geq (cx+d)^2 + \frac{3}{4}. This means |cx+d| \leq \frac{1}{2}. So d cannot be too far away: |d| > 1 is impossible, because then cx \in [-d-\frac{1}{2}, -d+\frac{1}{2}] is incompatible with |cx| = |x| \leq \frac{1}{2}. For d = -1, the only possible value of cx is \frac{1}{2}, and in this case 1 \geq |cz+d|^2 = (cx+d)^2 + y^2 = \frac{1}{4} + y^2, so \frac{\sqrt{3}}{2} \leq y \leq \frac{\sqrt{3}}{2}, i.e., y = \frac{\sqrt{3}}{2}; for d = 1, cx = -\frac{1}{2}, and 1 \geq |cz+d|^2 = \frac{1}{4}+y^2, so y = \frac{\sqrt{3}}{2} as well. In these two subcases we have z = \pm \frac{1}{2} + i\frac{\sqrt{3}}{2}, and so |z| = 1.

In both subcases, we have d+2cx=0d+2cx = 0. Consider the fact that gz=zDgz = z' \in \mathcal{D}, so Re(gz)12|\operatorname{Re}(gz)| \leq \frac{1}{2}, i.e. that

\begin{align*} \frac{1}{2} &\geq \left|\operatorname{Re}\left(\frac{az+b}{cz+d}\right)\right| \\ &= \left|\operatorname{Re}\left(\frac{(az+b)\overline{(cz+d)}}{|cz+d|^2}\right)\right| \\ &= \frac{1}{|cz+d|^2} |\operatorname{Re}((a(x+iy)+b)(c(x-iy)+d))| \\ &= \frac{1}{|cz+d|^2} |(ax+b)(cx+d) + acy^2|. \end{align*}

But we knew that cz+d1|cz+d| \leq 1, so 12acx2+bcx+adx+bd+acy2\frac{1}{2} \geq |acx^2+bcx+adx+bd+acy^2|. In both subcases, 1=z2=x2+y21 = |z|^2 = x^2+y^2, so we have 12ac+(bc+ad)x+bd=ac+(2bc+1)x+bd=b(d+2cx)+ac+x=ac+x\frac{1}{2} \geq |ac + (bc+ad)x + bd| = |ac + (2bc+1)x + bd| = |b(d+2cx) + ac+x| = |ac+x|. If a2|a| \geq 2, then 2ac+xxac+x+x=ac+x+122 \leq |ac+x-x| \leq |ac+x| + |x| = |ac+x| + \frac{1}{2}, a contradiction, so a{1,0,1}a \in \{-1, 0, 1\}.

Observe that in both subcases, 2cxd=12cxd = -1. If b0b \ne 0, then ad=1+bcad = 1+bc means ad1ad-1 also cannot be zero, i.e. ad1ad \ne 1. If a=da = -d then 12ac+x=dc+x=cdx+x2x=12+14±12=32\frac{1}{2} \geq|ac+x| = |-dc+x| = |\frac{-cdx+x^2}{x}| = |\frac{\frac{1}{2}+\frac{1}{4}}{\pm\frac{1}{2}}| = \frac{3}{2}, a contradiction, so the only option when b0b \ne 0 is that a=0a = 0. In such scenario, 1=adbc=bc1 = ad-bc = -bc so b=cb = -c. Now for b=0b = 0, we have 1=adbc=ad1 = ad - bc = ad so a=da = d.

To summarize, for the case (b.1) that c{1,1}c \in \{-1, 1\} and d0d \ne 0, we have deduced the candidates for gg:

g \in \left\{\begin{pmatrix}0 & -c\\c & d\end{pmatrix} \colon cx = -\frac{d}{2}\right\} \cup \left\{\begin{pmatrix}d & 0\\c & d\end{pmatrix} \colon cx = -\frac{d}{2}\right\}.

If gg is in the first set, gz=ccz+d=cc(x+iy)+d=cd2+ic32=1d2c+i32=1x+iy=1xiy=1z=zz2=zgz = \frac{-c}{cz+d} = \frac{-c}{c(x+iy)+d} = \frac{-c}{\frac{d}{2}+ic\frac{\sqrt{3}}{2}} = \frac{-1}{\frac{d}{2c}+i\frac{\sqrt{3}}{2}} = \frac{-1}{-x+iy} = \frac{1}{x-iy} = \frac{1}{\overline{z}} = \frac{z}{|z|^2} = z, a contradiction with the fact that zzz' \ne z. If gg is in the second set, gz=dzcz+d=dzcz+dcz+dcz+d=(dz)(cz+d)cz+d2=dcz2+d2z=cd+z=2x+x+iy=x+iy=zˉ=zzˉz=1zgz = \frac{dz}{cz+d} = \frac{dz}{cz+d} \frac{\overline{cz+d}}{\overline{cz+d}} = \frac{(dz)(\overline{cz+d})}{|cz+d|^2} = dc|z|^2 + d^2z = cd+z = -2x+x+iy = -x+iy = -\bar{z} = -\frac{z\bar{z}}{z} = -\frac{1}{z}, and now it is clear that g=(102x1)g = \begin{pmatrix}1 & 0\\-2x & 1\end{pmatrix} or g=(102x1)g = \begin{pmatrix}-1 & 0\\2x & -1\end{pmatrix} for the case (b.1).

Now we’re left with the case that c{1,1}c \in \{-1, 1\} and d=0d = 0 (b.2). In this case, 1=adbc=bc1=ad-bc=-bc so b=cb = -c. We’ve assumed that 1cz+d1 \geq |cz+d|, but for d=0d = 0 and c=1|c| = 1, this means z=1|z| = 1. Recall that Re(gz)12|\operatorname{Re}(gz)| \leq \frac{1}{2} implies that 12acx2+bcx+adx+bd+acy2\frac{1}{2} \geq |acx^2+bcx+adx+bd+acy^2|. Plug in what we have to get 12acx2c2x+acy2=ac(x2+y2)x.\frac{1}{2} \geq |acx^2 -c^2x +acy^2| = |ac(x^2+y^2) - x|. Suppose a2|a| \geq 2, then, we have, by triangle inequality, that ac(x2+y2)ac(x2+y2)x+x|ac(x^2+y^2)| \leq |ac(x^2+y^2)-x| +|x|, and so

\begin{align*} \frac{1}{2} &\geq |ac(x^2+y^2)-x| \\ &\geq |ac(x^2+y^2)|-|x| \\ &= |a| |c| |z|^2 -|x| \\ &\geq 2 \cdot 1 \cdot 1 - \frac{1}{2} = \frac{3}{2}, \end{align*}

a contradiction. So, a{1,0,1}a \in \{-1, 0, 1\}. If a0a \ne 0, then a=±1a = \pm 1, so ac=±1ac = \pm 1, and 12Re(gz)=1cz+d2(ax+b)(cx+d)+acy2=(axc)(cx)+acy2=acx2+acy2x=acz2x=acx\frac{1}{2} \geq |\operatorname{Re}(gz)| = \frac{1}{|cz+d|^2} |(ax+b)(cx+d) + acy^2| = |(ax-c)(cx) + acy^2| = |acx^2 + acy^2 - x| = |ac|z|^2 - x| = |ac-x| would mean xx must be near ac=±1ac = \pm 1 with distance no more than 12\frac{1}{2}. The only case is back to when x=12|x| = \frac{1}{2}, and this shows that y=32y = \frac{\sqrt{3}}{2} and Re(gz)=12|\operatorname{Re}(gz)| = \frac{1}{2} also in this case. Furthermore, if ac=1ac = -1 then x=12x = -\frac{1}{2} and if ac=1ac = 1 then x=12x = \frac{1}{2}. We can summarize this as 2acx=12acx=1 (or equivalently ac=2xac = 2x). This shows that gz2=azc2cz2=(axc)2+(ay)2=22acx=1|gz|^2 = \frac{|az-c|^2}{|cz|^2} = (ax-c)^2 + (ay)^2 = 2-2acx = 1, so gz=1|gz|=1 also. Now, we have

\begin{align*} gz &= \frac{az-c}{cz} = \frac{a}{c}-\frac{1}{z} = ac-\frac{1}{z} = 2x-\frac{1}{z} \\ &= z+\bar{z}-\frac{1}{z} = \frac{z^2+|z|^2-1}{z} = \frac{z^2}{z} = z, \end{align*}

a contradiction. This rules out a=±1a = \pm 1. Now let us check a=0a = 0. We have

gz = \frac{-c}{cz} = \frac{-1}{z}.

This shows that for c0c \ne 0, z=1|z| = 1 and z=1zz' = -\frac{1}{z}, and completes the case (b).

Now the third part follows by the same kind of argument as in the previous part. Take z \in \mathcal{D}^\circ. Suppose g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z), i.e. gz = z. We have \operatorname{Im}(gz) = \frac{\operatorname{Im}(z)}{|cz+d|^2} = \operatorname{Im}(z), so |cz+d|=1. By the same argument as before, |c| \geq 2 is impossible, so c \in \{-1, 0, 1\}. If c \ne 0, then the previous argument forces either z = \pm \frac{1}{2} + \frac{\sqrt{3}}{2}i or |z| = 1, a contradiction since we assumed z is in the interior. So c = 0 and d = \pm 1, and ad-bc = 1 then forces a = d. This means z = gz = \frac{az+b}{cz+d} = z \pm b, so b = 0 and g = \pm\operatorname{id}. This tells us that \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z) \subseteq \{\pm\operatorname{id}\}; the reverse inclusion is clear since -\operatorname{id} acts trivially.

Now let us show the fourth part, namely that G' = \mathrm{SL}_2(\mathbb{Z}). Take g \in \mathrm{SL}_2(\mathbb{Z}) and let us show that g \in G'. Choose z_0 := 2i \in \mathcal{D}^\circ, and let z = gz_0 \in \mathbb{H}. By part 1 (whose proof only used elements of G'), there exists g' \in G' such that g'z \in \mathcal{D}. The points z_0 and g'z = g'gz_0 are both in \mathcal{D} and are congruent modulo \operatorname{SL}_2(\mathbb{Z}). But z_0 is not on the boundary, so by part 2 they cannot be distinct; therefore g'gz_0 = z_0 \in \mathcal{D}^\circ. So g'g \in \operatorname{Stab}_{\operatorname{SL}_2(\mathbb{Z})}(z_0) = \{\pm\operatorname{id}\}, i.e. g = \pm g'^{-1}. Since S^2 = -\operatorname{id} \in G', in either case g \in G'. This shows that \operatorname{SL}_2(\mathbb{Z}) \subseteq G' and hence completes the proof. \square

We now turn to a more restricted notion: holomorphic modular forms which are also defined and holomorphic at infinity.

At Infinity

The real line R\mathbb{R} is not compact. The sequence 1,2,3,1, 2, 3, \dots has no accumulation point in R\mathbb{R}. However, we can compactify it by adding a point at infinity, say, define R^:=R{}\hat{\mathbb{R}} := \mathbb{R} \cup \{\infty\}. In the same way, the complex plane is not compact, but we can think of it as a sphere with one punctured hole corresponding to \infty.

Riemann Sphere

Riemann Sphere, GKFXtalk, de:User:Bjoern_klipp, CC BY-SA 3.0 http://creativecommons.org/licenses/by-sa/3.0/, via Wikimedia Commons

Now, with this picture in mind, we go back to modular forms and try to complete them at infinity.

Proposition 2. Let ff be a meromorphic modular form of weight kk, then there exists a meromorphic function f~ ⁣:D×{isolated poles}C\tilde{f} \colon \mathbb{D}^\times \setminus \{\text{isolated poles}\}\to \mathbb{C} such that f(z)=f~(e2πiz)f(z) = \tilde{f}(e^{2\pi i z}) for all zH{isolated poles}z \in\mathbb{H} \setminus \{\text{isolated poles}\}, where D×:={zC ⁣:0<z<1}\mathbb{D}^\times := \{z \in \mathbb{C} \colon 0 < |z| < 1\}.

Proof. First, for any aRa \in \mathbb{R}, there is a bijection

\Phi_a \colon \begin{cases}\text{Strip}_a &\to \mathbb{D}^\times \setminus \{re^{2\pi i a} \colon 0 < r < 1\}\\ z &\mapsto e^{2\pi i z}\end{cases}

where \text{Strip}_a :=\{z \in \mathbb{C} \colon \operatorname{Re}(z) \in (a, a+1);\ \operatorname{Im}(z) \in (0, +\infty)\}, which is very convenient in our use case. Furthermore, \Phi_a is holomorphic everywhere in its strip, and its derivative \Phi_a'(z) = 2\pi i e^{2\pi i z} never vanishes there, so \Phi_a is a conformal equivalence onto its image. Now we can define \tilde{f} by patching two charts. The first chart is \Phi_{-\frac{1}{2}}, sending our favorite strip onto \mathbb{D}^\times \setminus (-1, 0), and the second chart is \Phi_0, sending the offset strip \text{Strip}_0 onto \mathbb{D}^\times \setminus (0, 1). We define \tilde{f}(q) as f(\Phi_{-\frac{1}{2}}^{-1}(q)) whenever q \in \mathbb{D}^\times \setminus (-1, 0), and as f(\Phi_0^{-1}(q)) whenever q \in \mathbb{D}^\times \setminus (0, 1). Observe that this is well-defined (except at the images of the isolated poles of f, where nothing is defined) because the two values agree on the intersection. Indeed, if q \in \mathbb{D}^\times \setminus (-1, 1), then z_{-\frac{1}{2}} := \Phi_{-\frac{1}{2}}^{-1}(q) and z_0 := \Phi_0^{-1}(q) satisfy e^{2\pi iz_{-1/2}} = e^{2\pi i z_0}, so e^{2 \pi i(z_{-1/2}-z_0)} = 1. This means n := z_{-1/2}-z_0 \in \mathbb{Z}, and so f(z_{-1/2}) = f(z_0 + n) = f\left(\begin{pmatrix}1 & n\\0 & 1\end{pmatrix} z_0\right) = (0 \cdot z_0+1)^k f(z_0) = f(z_0) by the modularity of f (wherever it is defined). The two charts therefore glue to a meromorphic \tilde{f} \colon \mathbb{D}^\times \to \mathbb{C} (defined away from the images of the poles) such that f(z) = \tilde{f}(e^{2\pi iz}) for all z \in \mathbb{H} wherever defined. (One can check this by sliding z back into one of the two strips: f is periodic thanks to modularity, and z \mapsto e^{2\pi iz} is periodic by definition.) \square

Note. Inspecting the proof carefully, we haven’t actually used the (rather strong) fact that ff is modular. We’ve only used the fact that ff is periodic. We will recall this construction for “just periodic, but not necessarily modular” meromorphic functions later on.

So now the disk D×\mathbb{D}^\times is our model for H\mathbb{H}. Think of it as the upper hemisphere of the Riemann sphere, with the center removed. Do you see the point of doing this now? If we can complete f~\tilde{f} at the hole, we would complete ff at infinity! If there is a meromorphic continuation of f~\tilde{f} to all of D:=D×{0}\mathbb{D} := \mathbb{D}^\times \cup \{0\}, then we’re done.

Definition. A meromorphic modular form ff is said to be meromorphic at infinity if there exists a meromorphic continuation of f~\tilde{f} to D\mathbb{D}. It is holomorphic at infinity if f~\tilde{f} is holomorphic at 00. If f~\tilde{f} is defined at 00, we may write f()f(\infty) to denote f~(0)\tilde{f}(0).

Fourier Expansion

For a meromorphic modular form that is also meromorphic at infinity, complex analysis lets us expand \tilde{f} into a Laurent series \tilde{f}(q) = \sum_{n \geq k} a_n q^n for some k \in \mathbb{Z} (possibly negative) and some sequence (a_n)_{n \geq k} of complex numbers with a_k \ne 0, such that the equality holds for all q in a neighborhood of 0 (except possibly 0 itself if k < 0). Such an expansion is unique: for a given f, the pair (k, (a_n)_{n \geq k}) is uniquely determined (even if we pick different neighborhoods, the pair coincides). Equivalently, f(z) = \sum_{n \geq k} a_ne^{2 \pi i n z} for all z with large enough \operatorname{Im}(z); and if f is holomorphic at infinity (so k \geq 0), the extra value at infinity is f(\infty) = \tilde{f}(0) = a_0. To simplify notation, we write the sequence as (a_n)_{n \in \mathbb{Z}}, keeping in mind that it vanishes once the index is negative enough.

Definition. For a meromorphic modular form f that is also meromorphic at infinity, there exists a unique sequence (a_n)_{n \in \mathbb{Z}} of complex numbers, vanishing for all sufficiently negative indices, such that \tilde{f}(q) = \sum_{n \in \mathbb{Z}} a_nq^n in a (possibly punctured) neighborhood of zero. The series is called the Fourier expansion (or q-expansion) of f at infinity, and the numbers a_n are called its Fourier coefficients.
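For completeness, here is a standard remark of mine that is not spelled out in the text: since f(z) = \sum_{n} a_n e^{2\pi i n z} and the exponentials x \mapsto e^{2\pi i n x} are orthonormal on [0, 1], the coefficients can be recovered, for any fixed y large enough, by

a_n = e^{2\pi n y}\int_0^1 f(x+iy)\, e^{-2\pi i n x}\, \mathrm{d}x,

and the right-hand side does not depend on the choice of y.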

Definition. We write Mk\mathcal{M}_k for the set of holomorphic modular forms of weight kk, which is also holomorphic at infinity, and we write Sk\mathcal{S}_k for the subset of those forms that take value 00 at infinity.

Also observe that the Fourier coefficients start at n \geq 0 for forms in \mathcal{M}_k (holomorphy at infinity forces a_n = 0 for n < 0) and at n \geq 1 for forms in \mathcal{S}_k (vanishing at infinity additionally forces a_0 = 0).

Now that we’ve defined Mk\mathcal{M}_k and Sk\mathcal{S}_k, what do you see? This is where things get interesting.

Proposition 3. For f,hMkf, h \in \mathcal{M}_k and λC\lambda \in \mathbb{C}, we have f+hMkf+h \in \mathcal{M}_k and λfMk\lambda f \in \mathcal{M}_k. The result is still true if we replace Mk\mathcal{M}_k by Sk\mathcal{S}_k.

Proof. For g=(abcd)SL2(Z)g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}) and for all zHz \in \mathbb{H},

\begin{align*} (f+h)|_k g(z) &= (f+h)(gz)(cz+d)^{-k}\\ &= (f(gz)+h(gz))(cz+d)^{-k}\\ &= f|_kg(z) + h|_kg(z)\\ &= f(z)+h(z)\\ &= (f+h)(z). \end{align*}

Also,

(\lambda f)|_k g(z) = (\lambda f)(gz)(cz+d)^{-k} = \lambda (f|_kg)(z) = \lambda f(z).

This checks that the modularity condition still holds. The fact that they are holomorphic and also holomorphic at infinity is a standard fact from complex analysis. Same thing holds when we replace Mk\mathcal{M}_k by Sk\mathcal{S}_k. Easy to check that they still attain zero at infinity. \square

This shows that Mk\mathcal{M}_k is a vector space, and Sk\mathcal{S}_k is a subspace.

Soon, we will want to study the shape of Mk\mathcal{M}_k and Sk\mathcal{S}_k. Let us just be curious about these objects, and so we can ask natural questions: Can we compute the dimension of the spaces for each weight? Is there an explicit formula? Furthermore, can we find a basis for them? It turns out that luckily, we can do all of that. But anyway, we need some heavy complex analytic machinery here first.

The Valence Formula

Here I’m going to follow almost exactly the same as in [Serre, Ch. VII, §3, 3.1.], but written in my own wording (and I’ll try to simplify as much as I can).

From complex analysis, for any meromorphic function f≢0f \not\equiv 0 in an open set UU, if we fix aUa \in U then we can find a neighborhood NN of aa inside UU and a Laurent expansion nkan(za)n\sum_{n \geq k} a_n (z-a)^n (with ak0a_k \ne 0) such that the expansion coincides with ff in N{a}N \setminus \{a\}. Such expansion is unique, even if the neighborhood is not, that is, even in a different choice of neighborhood, the expansion is still the same. The number kk is called the order of vanishing of ff at aa, and is denoted by orda(f)\operatorname{ord}_a(f). The residue of ff at aa, denoted by Resz=a(f(z))\operatorname{Res}_{z=a}(f(z)) or Resa(f)\operatorname{Res}_a(f), is defined as a1a_{-1}.

Let us first set up the complex-analytic machinery.

Lemma 4. For a meromorphic function ff defined on an open set UU, and for a point aUa \in U, we have

\operatorname{Res}_a\left(\frac{f'}{f}\right) = \operatorname{ord}_a(f).

Proof. Write the Laurent expansion of ff at aa as nkan(za)n\sum_{n \geq k} a_n (z-a)^n, ak0a_k \ne 0. Put g=nk+1an(za)ng = \sum_{n \geq k+1} a_n(z-a)^n, then, in a neighborhood of aa except at aa, we have

\begin{align*} \frac{f'}{f} &= \frac{a_k k (z-a)^{k-1} + g'}{a_k (z-a)^k + g} \\ &= \frac{a_k(z-a)^kk(z-a)^{-1} + gk(z-a)^{-1}+g'-gk(z-a)^{-1}}{a_k(z-a)^k+g} \\ &= k(z-a)^{-1} + \frac{g'-gk(z-a)^{-1}}{a_k(z-a)^k+g}. \end{align*}

Observe that the second term is

\frac{(g'-gk(z-a)^{-1})(z-a)^{-k}}{a_k+g(z-a)^{-k}}

which, after plugging in z = a, gives \frac{a_{k+1}}{a_k}; in particular it is well-defined without a pole, so the function z \mapsto \frac{g'-gk(z-a)^{-1}}{a_k(z-a)^k + g} is holomorphic in that neighborhood, i.e. its Laurent series at a is a Taylor series. Hence the residue of f'/f at a is k, which is the order of f at a. \square
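As a concrete sanity check of Lemma 4 (my own snippet, assuming the sympy library is available), take f(z) = (z-2)^3 e^z, which has a zero of order 3 at z = 2, and g(z) = e^z (z-2)^{-5}, which has a pole of order 5 there; the residues of f'/f and g'/g at 2 should come out as 3 and -5.

# symbolic check of Lemma 4: Res_a(f'/f) = ord_a(f); assumes sympy
from sympy import symbols, exp, diff, residue

z = symbols('z')
f = (z - 2)**3 * exp(z)                 # zero of order 3 at z = 2
print(residue(diff(f, z) / f, z, 2))    # expected: 3

g = exp(z) / (z - 2)**5                 # pole of order 5 at z = 2
print(residue(diff(g, z) / g, z, 2))    # expected: -5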

Recall the basic formula from complex analysis (Cauchy Integral Formula): If f ⁣:UCf \colon U \to \mathbb{C} is a holomorphic function on an open set UU, let γ ⁣:S1U\gamma \colon S^1 \to U be a null-homotopic Jordan curve with Jordan interior N\mathcal{N} such that NU\mathcal{N} \subseteq U and with winding number +1+1. Then, for any aNa \in \mathcal{N}, we have

\frac{1}{2\pi i}\int_\gamma \frac{f(z)}{z-a} \mathrm{d}z =f(a).

Lemma 5. Suppose UCU \subseteq \mathbb{C} is an open set, where f ⁣:UCf \colon U \to \mathbb{C} is holomorphic. We define, for any aU;r>0;α(0,2π];θRa \in U;\, r > 0;\, \alpha \in (0, 2\pi];\, \theta \in \mathbb{R} such that the disk {a+zC ⁣:zr}\{a+z \in \mathbb{C} \colon |z| \leq r\} is contained in UU, the curve

\gamma_{a, r, \alpha, \theta} \colon \begin{cases} [0, 1] &\to \mathbb{C}\\ t &\mapsto a+re^{i (\theta + t \alpha)}. \end{cases}

Then, we have for any aUa \in U

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a, r, \alpha, \theta}} \frac{f(z)}{z-a} \mathrm{d}z = i\alpha f(a).

Proof. Since ff is holomorphic at any aUa \in U, we fix aUa \in U then there exists a Taylor expansion in a neighborhood N\mathcal{N} of aa in UU. By definition of a neighborhood, there exists a closed disk of some radius RR centered at aa that lies entirely in N\mathcal{N}. Write f(z)=f(a)+g(z)(za)f(z) = f(a) + g(z)(z-a) with gg holomorphic. This equality holds in N\mathcal{N}. Now, we fix θR\theta \in \mathbb{R}, and we have, for 0<rR0 < r \leq R,

\int_{\gamma_{a, r, \alpha, \theta}} \frac{f(z)}{z-a} \mathrm{d}z = f(a)\int_{\gamma_{a,r,\alpha,\theta}} \frac{1}{z-a} \mathrm{d}z + \int_{\gamma_{a,r,\alpha,\theta}} g(z)\mathrm{d}z.

For the main part, we have

\begin{align*} \int_{\gamma_{a,r,\alpha,\theta}} \frac{1}{z-a} \mathrm{d}z &= \int_0^1 \frac{1}{a+re^{i(\theta+t\alpha)}-a} \gamma_{a,r,\alpha,\theta}'(t)\mathrm{d}t \\ &= \int_0^1 \frac{1}{re^{i(\theta+t\alpha)}} i\alpha re^{i(\theta+t\alpha)} \mathrm{d}t \\ &= \int_0^1 i\alpha \mathrm{d}t = i\alpha. \end{align*}

For the error part, observe that

\int_{\gamma_{a,r,\alpha,\theta}} g(z)\mathrm{d}z = \int_0^1 g(a+re^{i(\theta+t\alpha)}) i\alpha re^{i(\theta+t\alpha)} \mathrm{d}t

and consider the compact disk {a+z ⁣:zR}N\{a+z \colon |z| \leq R\} \subseteq \mathcal{N}. Its image under zg(z)z \mapsto |g(z)| must also be compact, so we can bound g(a+rei(θ+tα))C|g(a+re^{i(\theta+t\alpha)})| \leq C for some constant CC not depending on rr. This means

\begin{align*} \left|\int_{\gamma_{a,r,\alpha,\theta}} g(z)\mathrm{d}z\right| &\leq \int_0^1 |g(a+re^{i(\theta+t\alpha)})| |i\alpha r e^{i(\theta+t\alpha)}| \mathrm{d}t \\ &\leq \alpha r \int_0^1 C \mathrm{d}t = \alpha r C. \end{align*}

Taking limit r>00r \xrightarrow{> 0} 0 yields the result. \square
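Lemma 5 is easy to see in action numerically; the following is my own illustration (a plain midpoint-rule quadrature, nothing more), showing that for a small radius the arc integral is already very close to i\alpha f(a).

# numerical illustration of Lemma 5 with f = exp, using a midpoint rule
import cmath

def arc_integral(f, a, r, alpha, theta, samples=4000):
    total = 0j
    for j in range(samples):
        t = (j + 0.5) / samples
        w = cmath.exp(1j * (theta + t * alpha))
        z = a + r * w
        dz = 1j * alpha * r * w / samples      # gamma'(t) dt
        total += f(z) / (z - a) * dz
    return total

a = 0.3 + 0.2j
alpha = cmath.pi / 2
print(arc_integral(cmath.exp, a, 1e-4, alpha, 0.7))
print(1j * alpha * cmath.exp(a))    # the two values should nearly agree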

Now we bring another tool from the basic complex analysis toolbox. (Cauchy’s Argument Principle). For a meromorphic function ff meromorphic on an open set UCU \subseteq \mathbb{C}, if γ ⁣:S1C\gamma \colon S^1 \to \mathbb{C} is a null-homotopic Jordan curve with winding number +1+1 and with Jordan interior NU\mathcal{N} \subseteq U, then

\int_\gamma \frac{f'}{f} = 2\pi i \sum_{z \in \mathcal{N}} \operatorname{ord}_z(f).

Corollary 6. For a meromorphic function ff meromorphic on an open set UCU \subseteq \mathbb{C}, for any aUa \in U, reusing the notation of curves from Lemma 5., we have

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a, r, \alpha, \theta}} \frac{f'}{f} = i \alpha \operatorname{ord}_a(f).

Proof. Recall from Lemma 4. that in some neighborhood N\mathcal{N} of aUa \in U, we have that f/forda(f)za+gf'/f \equiv \frac{\operatorname{ord}_a(f)}{z-a} + g is the Laurent expansion at aa, with gg holomorphic. Apply Lemma 5 on zorda(f)+g(z)(za)z \mapsto \operatorname{ord}_a(f) + g(z)(z-a), which is holomorphic in a (maybe another) neighborhood N\mathcal{N}' of aa, to see that

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a,r,\alpha,\theta}} \frac{\operatorname{ord}_a(f)+g(z)(z-a)}{z-a} = i\alpha \operatorname{ord}_a(f).

Now observe that in NN\mathcal{N} \cap \mathcal{N}', the meromorphic equality fforda(f)za+g\frac{f'}{f} \equiv \frac{\operatorname{ord}_a(f)}{z-a} +g is actually defined, i.e., that f(z)f(z)=orda(f)za+g(z)\frac{f'(z)}{f(z)} = \frac{\operatorname{ord}_a(f)}{z-a} + g(z) for all zNNz \in \mathcal{N} \cap \mathcal{N}'. This shows that

\lim_{\substack{r \to 0\\r > 0}} \int_{\gamma_{a, r, \alpha, \theta}} \frac{f'}{f} = i\alpha \operatorname{ord}_a(f)

and hence completes the proof. \square

Now our complex analysis toolbox is complete, and we’re ready to proceed to our case of meromorphic modular forms of weight kk. Define ord(f)\operatorname{ord}_\infty(f) to be ord0(f~)\operatorname{ord}_0(\tilde{f}).

Theorem 7. Let f≢0f \not\equiv 0 be a meromorphic modular form of weight kk. One has

\operatorname{ord}_\infty(f) + \frac{1}{2}\operatorname{ord}_i(f) + \frac{1}{3}\operatorname{ord}_{\exp(2i\pi/3)}(f) + \sum_{z \in \mathcal{D} \setminus \{i,e^{2i\pi/3}, e^{2i\pi/6}\}/\operatorname{SL}_2(\mathbb{Z})} \operatorname{ord}_z(f) = \frac{k}{12}.

Proof. First, we observe that there are finitely many zeros and poles. Indeed, since f~\tilde{f} is meromorphic at zero, there exists a neighborhood (and hence we can assume to be a smaller disk) U={zC ⁣:0<z<r}U = \{z \in \mathbb{C} \colon 0 < |z| < r\} such that f~\tilde{f} has neither zeros nor poles in UU. This means ff has neither zeros nor poles when 0<e2πiz<r0 < |e^{2\pi i z}| < r. Writing z=x+iyz = x+iy, this is equivalent to 0<e2πi(x+iy)<r0 < |e^{2\pi i (x+iy)}| < r, but e2πi(x+iy)=e2πixe2πye^{2\pi i(x+iy)} = e^{2\pi i x} e^{-2\pi y} so this is equivalent to 2πy<logr-2\pi y < \log r, which is y>logr2π=log(1/r)2πy > \frac{-\log r}{2\pi} = \frac{\log(1/r)}{2\pi}. So zeros and poles of ff must have imaginary part no more than log(1/r)2π\frac{\log (1/r)}{2\pi}. Now, the set Dr:=D{x+iyH ⁣:ylog(1/r)2π}\mathcal{D}_r := \mathcal{D} \cap \{x+iy \in \mathbb{H} \colon y \leq \frac{\log(1/r)}{2\pi}\} is compact, and so there can be only finitely many zeros and poles.

Now we set up an integral over the contour Dr\mathcal{D}_r.

[Figure: the truncated fundamental domain \mathcal{D}_r with the contour \Gamma_R, its vertices labelled \mathrm{A}, \mathrm{B}, \mathrm{B'}, \mathrm{C}, \mathrm{C'}, \mathrm{D}, \mathrm{D'}, \mathrm{E}, and small arcs of radius R removed around e^{2i\pi/3}, i, e^{2i\pi/6}.]

Let ΓR\Gamma_R denote the boundary curve of Dr\mathcal{D}_r minus small disks of radius RR around e2πi/6,i,e2πi/3e^{2\pi i/6}, i, e^{2\pi i/3}, going counterclockwise, with any parametrization with nonzero speed. Suppose first that there is no zero and no pole of ff on the boundary curve ΓR\Gamma_R. We will take care of this later. We have, by the Cauchy’s argument principle, that

\int_{\Gamma_R} \frac{f'}{f} = 2\pi i \sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f).

Now, we try to do the same integral but from another point of view. First, the segment EA\mathrm{EA} gets mapped by the conformal equivalence to a circle CC in the unit disk. Let DD be the disk such that its boundary is CC. This means

\begin{align*} \int_{\mathrm{EA}} \frac{f'(z)}{f(z)} \mathrm{d}z &= \int_{\mathrm{EA}} \frac{\tilde{f}'(e^{2\pi i z})}{\tilde{f}(e^{2\pi i z})}\, 2\pi i e^{2\pi i z}\, \mathrm{d}z \\ &= \oint_{C,\ \text{clockwise}} \frac{\tilde{f}'(q)}{\tilde{f}(q)} \mathrm{d}q\\ &= -2\pi i \sum_{q \in D^\circ} \operatorname{ord}_q(\tilde{f})\\ &= -2\pi i \operatorname{ord}_0(\tilde{f})\\ &= -2\pi i\operatorname{ord}_\infty(f). \end{align*}

Here we used that f(z) = \tilde{f}(e^{2\pi iz}), so by the chain rule f'(z) = 2\pi i e^{2\pi i z}\tilde{f}'(e^{2\pi iz}); the substitution q = e^{2\pi i z}, \mathrm{d}q = 2\pi i e^{2\pi i z}\mathrm{d}z turns the integral into an integral over the circle C, which is traversed clockwise as z runs from \mathrm{E} to \mathrm{A}, whence the minus sign. The second-to-last equality comes from the fact that we have already assumed there are no zeros or poles of f in the upper region of our favorite strip, so there are no zeros or poles of \tilde{f} in D^\circ \setminus \{0\}.

Next, we have

\begin{align*} &\int_{\mathrm{AB}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{D'E}} \frac{f'(z)}{f(z)} \mathrm{d}z \\ &= \int_{\mathrm{AB}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{D'E}} \frac{f'(z-1)}{f(z-1)} \mathrm{d}z \\ &= \int_{\mathrm{AB}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{BA}} \frac{f'(z)}{f(z)} \mathrm{d}z \\ &= 0. \end{align*}

The fact that f(z)=f(z1)f(z) = f(z-1) comes from the modularity of ff.

Now, we have

\begin{align*} &\int_{\mathrm{B'C}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{C'D}} \frac{f'(z)}{f(z)} \mathrm{d}z \\ &= \int_{\mathrm{B'C}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{C'D}} \frac{f'(-1/z)z^{-k-2} - kz^{-k-1}f(-1/z)}{f(-1/z)z^{-k}} \mathrm{d}z \end{align*}

since the fact that f(1/z)=zkf(z)f(-1/z) = z^kf(z) tells us that f(1/z)z2=kzk1f(z)+zkf(z)f'(-1/z) z^{-2} = kz^{k-1} f(z) + z^k f'(z) and so f(z)=f(1/z)zk2kz1f(z)=f(1/z)zk2kzk1f(1/z)f'(z) = f'(-1/z)z^{-k-2} - kz^{-1}f(z) = f'(-1/z)z^{-k-2} - kz^{-k-1}f(-1/z). Simplifying this, we have

\int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{C'D}} \frac{f'}{f} = \int_{\mathrm{B'C}} \frac{f'(z)}{f(z)} \mathrm{d}z + \int_{\mathrm{C'D}} \frac{f'(-1/z)}{f(-1/z)}z^{-2} \mathrm{d}z + \int_{\mathrm{C'D}} -kz^{-1} \mathrm{d}z.

Putting u=1/zu = -1/z, we have du=z2dz\mathrm{d}u = z^{-2}\mathrm{d}z, and the segment CD\mathrm{C'D} gets mapped into CB\mathrm{CB'}. This means

\int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{C'D}} \frac{f'}{f} = \int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{CB'}} \frac{f'}{f} + \int_{\mathrm{C'D}} -kz^{-1}\mathrm{d}z = \int_{\mathrm{C'D}}-kz^{-1}\mathrm{d}z.

Now we’re left with taking care of the removed disks. For BB\mathrm{BB'}, we have

\int_{\mathrm{BB'}} \frac{f'}{f} = - \int_{\gamma_{e^{2i\pi/3},\, R,\, \alpha_R,\, \theta_R}} \frac{f'}{f}

where αR\alpha_R is the arc angle of BB\mathrm{BB'}, and θR\theta_R is the offset angle. Observe that as RR tends to zero, αR\alpha_R tends to π3\frac{\pi}{3}. The same argument applies for DD\mathrm{DD'}. Same for CC\mathrm{CC'} but the angle tends to π\pi instead.

Summing up all the integrals, we see that

\begin{align*} 2\pi i \sum_{z \text{ inside } \Gamma_R} \operatorname{ord}_z(f) &= \int_{\Gamma_R} \frac{f'}{f} \\ &= \int_{\mathrm{AB}} \frac{f'}{f} + \int_{\mathrm{BB'}} \frac{f'}{f} + \int_{\mathrm{B'C}} \frac{f'}{f} + \int_{\mathrm{CC'}} \frac{f'}{f} + \int_{\mathrm{C'D}} \frac{f'}{f} + \int_{\mathrm{DD'}} \frac{f'}{f} + \int_{\mathrm{D'E}} \frac{f'}{f} + \int_{\mathrm{EA}} \frac{f'}{f}\\ &= -2\pi i\operatorname{ord}_\infty(f) + \int_{\mathrm{BB'}} \frac{f'}{f} + \int_{\mathrm{CC'}} \frac{f'}{f} + \int_{\mathrm{DD'}} \frac{f'}{f} + \int_{\mathrm{C'D}} -kz^{-1}\mathrm{d}z. \end{align*}

Taking the limit as R tends to zero, and applying Corollary 6 to the three small arcs (and a direct computation, as in the proof of Lemma 5, for the last integral), we have

2 \pi i\sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f) = -2 \pi i \operatorname{ord}_\infty(f) -i \frac{\pi}{3} \operatorname{ord}_{e^{2i\pi/3}}(f) - i\frac{\pi}{3} \operatorname{ord}_{e^{2i\pi/6}}(f) - i\pi\operatorname{ord}_i(f) + i \frac{\pi}{6} k.

We’ve seen that e2iπ/3e^{2i\pi/3} and e2iπ/6e^{2i\pi/6} are congruent modulo SL2(Z)\operatorname{SL}_2(\mathbb{Z}), so, we have

2\operatorname{ord}_\infty(f) + \frac{2\operatorname{ord}_{e^{2i\pi/3}}(f)}{3} + \operatorname{ord}_i(f) + 2\sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f) =\frac{k}{6},

or, finally,

\frac{k}{12} = \frac{1}{2}\operatorname{ord}_i(f) + \frac{1}{3}\operatorname{ord}_{e^{2i\pi/3}}(f) + \operatorname{ord}_\infty(f) + \sum_{z \in \mathcal{D}_r} \operatorname{ord}_z(f).

Now we’re left with handling the case where there exists some zeros or poles on Γ0{e2iπ/3,e2iπ/6,i}\Gamma_0 \setminus \{e^{2i\pi/3}, e^{2i\pi/6}, i\}. Note that we don’t have to worry in the process of taking R0R \to 0 because zeros and poles are isolated, and so the limit still works (we can just skip those RR’s). In this limiting scenario, B\mathrm{B} and B\mathrm{B'} coincide, same for (C,C)(\mathrm{C}, \mathrm{C'}) and (D,D)(\mathrm{D}, \mathrm{D'}). If zABz \in\mathrm{AB} is a zero or a pole, then by modularity, z+1DEz+1 \in \mathrm{DE} would also be the same. We can edit ΓR\Gamma_R by perturbing it by a disk of a very small radius.

[Figure: the contour \Gamma_R near a zero or pole lying on the boundary, indented around it by a disk of very small radius.]

In this way, the integral would stay the same, but the sum of orders will count this point once. In order to avoid problems of double counting, we denote by zDr/SL2(Z)ordz(f)\sum_{z \in \mathcal{D}_r/\operatorname{SL}_2(\mathbb{Z})} \operatorname{ord}_z(f) instead of simply summing over the whole contour (or just the interior). Now we’re left with dealing with zeros and poles on the arc BCD{i,e2πi/3,e2πi/6}\mathrm{BCD} \setminus \{i, e^{2\pi i/3}, e^{2 \pi i/6}\}. We apply the same trick, but when we see any problem point zz now we will also see the congruent one at 1/z-1/z. We can just put two disks, remove one from the contour and indent one to the contour. This completes the proof. \square

First Examples of Modular Forms: Eisenstein Series and Poincaré Series

When I was writing everything before this, I thought I could skip these explicit examples and then move on to our special case, but now I changed my mind and I think that it’s a nicer way to approach the subject by introducing these explicit examples.

The thing is, even if I said it’s “explicit”, it’s not that explicit. I’ll begin by showing the idea and then move on to the definition.

Let’s play a little game here. Suppose I have a grid with n rows and m columns, each cell contains at most one object, and I have some function f whose value depends only on the multiset of objects in each row (not on the order within the row). What happens now? If we permute the columns of the grid, the multiset of objects in each row doesn't change, so the value of the grid under f stays the same, right? Now, if we want to construct a function whose value is preserved under permutations of rows and permutations of columns, how would we do it? We would define

\tilde{f}(\text{grid}) := \sum_{\sigma \text{ permuting rows}} f(\sigma \cdot \text{grid}).

And now, for any permutation σ\sigma of rows, of columns, or composition of both, we have f~(σgrid)=f~(grid)\tilde{f}(\sigma \cdot \text{grid}) = \tilde{f}(\text{grid}).
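Here is a toy version of this game in code (my own illustration). The function f below depends only on the multiset of entries in each row, so it is invariant under column permutations but not under row permutations; summing over all row permutations produces a function invariant under both.

# symmetrization toy example: summing over row permutations makes a
# column-invariant function invariant under row permutations as well
from itertools import permutations
import random

def f(grid):
    # weight row i by (i + 1); depends only on each row's multiset of entries
    return sum((i + 1) * sum(row) for i, row in enumerate(grid))

def f_tilde(grid):
    return sum(f([grid[i] for i in sigma])
               for sigma in permutations(range(len(grid))))

grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows = [0, 1, 2]; random.shuffle(rows)
cols = [0, 1, 2]; random.shuffle(cols)
shuffled = [[grid[r][c] for c in cols] for r in rows]

print(f(grid), f(shuffled))              # generally differ
print(f_tilde(grid), f_tilde(shuffled))  # always equal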

Now I’ll take this to the algebraic setting and make it more precise.

Lemma 8. For a group G, a subgroup H of G, a set X on which G acts, and a function f \colon X \to \mathbb{C} which is H-invariant (i.e. f(hx) = f(x) for all h \in H and x \in X), we can define

\tilde{f}(x) := \sum_{\bar{g} \in H \backslash G} f(gx).

If the sum makes sense, i.e., if it converges absolutely, then f~(gx)=f~(x)\tilde{f}(gx) = \tilde{f}(x) for all gGg \in G and xXx \in X.

Proof. First, f~\tilde{f} is well-defined, because for two elements g1g_1 and g2g_2 in GG corresponding to the same coset, we have Hg1=Hg2Hg_1 = Hg_2, i.e., there exists hHh \in H with h=g2g11h = g_2 g_1^{-1}. This means f(g2x)=f(g2g11g1x)=f(hg1x)f(g_2x) = f(g_2g_1^{-1}g_1x) = f(hg_1x). But ff is HH-invariant, so this is actually f(g1x)f(g_1x). This means one can take any element in the same coset inside the sum. The condition of absolute convergence implies that we can commute the sum however we want.

Now, pick any gGg' \in G. We have

\tilde{f}(g'x) = \sum_{\bar{g} \in H \backslash G} f(gg'x).

But \Phi \colon Hg \mapsto Hgg' only permutes the cosets in H \backslash G. Indeed, for any coset Hg_0 we can pick g = g_0g'^{-1}; then \Phi(Hg) = Hgg' = Hg_0g'^{-1}g' = Hg_0, so \Phi is surjective. Now if Hg_1g' = Hg_2g', then g_2g'(g_1g')^{-1} = g_2g_1^{-1} \in H, so with h = g_2g_1^{-1} we have hg_1 = g_2, i.e., Hg_1 = Hg_2, so \Phi is injective. This means the inner sum can be rewritten in the original form, and so \tilde{f}(g'x) = \tilde{f}(x). \square

We will adapt this idea into our settings of modular forms. Let G:=SL2(Z)G := \operatorname{SL}_2(\mathbb{Z}), H:=±TH := \langle \pm T \rangle, and X:=HX := \mathbb{H}. Then there exists a very simple function that is ±T\pm T-invariant (i.e. periodic). Fix mZm \in \mathbb{Z} and define

f_m(z) := e^{2\pi i m z}.

This is periodic. Now we can do something similar to the lemma above, but adapt things a bit to make it satisfy the modularity condition. Define

P_{m,k}(z):= \sum_{\left[ \begin{pmatrix}a & b\\c & d\end{pmatrix}\right] \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m\left(\frac{az+b}{cz+d}\right)(cz+d)^{-k}.

(To make it easier, we write gg and gg' for (abcd)\begin{pmatrix}a & b\\c & d\end{pmatrix} and (abcd)\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} next.)

We have, for any gSL2(Z)g' \in \operatorname{SL}_2(\mathbb{Z}), that

\begin{align*} P_{m,k}(g'z) &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(gg'z)(cg'z+d)^{-k} \\ &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(gg'z)\left(c\frac{a'z+b'}{c'z+d'} + d\right)^{-k} \\ &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(gg'z)\left(\frac{(ca'+dc')z + cb'+d'd}{c'z+d'}\right)^{-k} \\ &= (c'z+d')^{k} \sum_{\overline{g''} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(g''z)(c''z+d'')^{-k} \\ &= (c'z+d')^k P_{m,k}(z), \end{align*}

where g=ggg'' = gg', with g=(abcd)g'' =\begin{pmatrix}a'' & b''\\c'' & d''\end{pmatrix}. This Pm,kP_{m,k} is called the Poincaré series, and for m=0m = 0 it is called the Eisenstein series, denoted by Ek:=P0,kE_k := P_{0,k}. We’ve showed that if they are well-defined, then they are modular forms of weight kk. We’re now left with showing that they are well-defined, i.e., that the series converges absolutely.

Proposition 9. For even k4k \geq 4 and m0m \geq 0, the series Pm,kP_{m,k} converges locally uniformly absolutely for all zHz \in \mathbb{H}.

Proof. Let us observe the absolute form:

\begin{align*} &\sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |f_m(gz)(cz+d)^{-k}| \\ &= \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |e^{2 \pi i m gz}| |cz+d|^{-k}. \end{align*}

We know that gzHgz \in \mathbb{H} for all gSL2(Z)g \in \operatorname{SL}_2(\mathbb{Z}), so we may write gz=x+iygz = x+iy with y>0y > 0, and observe that e2πimgz=e2πim(x+iy)=e2πimx2πmye^{2\pi i m gz} = e^{2 \pi i m (x+iy)} = e^{2 \pi i m x - 2 \pi m y}, and so its absolute value (i.e. complex modulus) is e2πmye^{-2\pi m y}, which is between zero and one. This means

\sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |e^{2 \pi i m gz}| |cz+d|^{-k} \leq \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |cz+d|^{-k}.

Let us now show the convergence for gˉT\SL2(Z)cz+dk\sum_{\bar{g} \in \langle T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |cz+d|^{-k}. We claim that for each (c,d)Z2(c, d) \in \mathbb{Z}^2 with gcd(c,d)=1\gcd(c,d)=1, there exists (a,b)Z2(a, b) \in \mathbb{Z}^2 such that (abcd)SL2(Z)\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}). Furthermore, if there are multiple candidates, then they belong to the same coset. To show this, fix (c,d)Z2(c, d) \in \mathbb{Z}^2 with gcd(c,d)=1\gcd(c,d)=1. By Bézout’s identity from elementary number theory, there exists an integer solution (a0,b0)Z2(a_0, b_0) \in \mathbb{Z}^2 satisfying a0db0c=1a_0d-b_0c = 1. This is a linear Diophantine equation, and so all solutions are described by (at,bt)(a_t, b_t), defined by at=a0+cta_t = a_0+ct and bt=b0+dtb_t = b_0+dt for tZt \in \mathbb{Z}. Now we’re left with showing that (a0b0cd)\begin{pmatrix}a_0 & b_0\\c & d\end{pmatrix} and (atbtcd)\begin{pmatrix}a_t & b_t\\c & d\end{pmatrix} belong to the same coset for all tZt \in \mathbb{Z}. Indeed, we have

(atbtcd)(a0b0cd)1=(atbtcd)(db0ca0)=(atdbtcb0at+bta0cddccb0+da0)=(1b0at+bta001)±T.\begin{align*} \begin{pmatrix}a_t & b_t\\c & d\end{pmatrix}\begin{pmatrix}a_0 & b_0\\c & d\end{pmatrix}^{-1} &= \begin{pmatrix}a_t & b_t\\c & d\end{pmatrix}\begin{pmatrix}d & -b_0\\-c & a_0\end{pmatrix} \\ &= \begin{pmatrix}a_td-b_tc &-b_0a_t + b_ta_0\\cd-dc & -cb_0+da_0\end{pmatrix}\\ &= \begin{pmatrix}1 & -b_0a_t + b_ta_0\\0 & 1\end{pmatrix} \in \langle \pm T \rangle. \end{align*}

This shows that they belong to the same coset. Now, also observe that elements of the same coset determine (c, d) up to an overall sign: multiplying on the left by a power of T leaves the bottom row untouched, while multiplying by -I \in \langle \pm T \rangle flips it to (-c, -d). This means what we were doing is essentially defining a projection

\pi \colon \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z}) \to \{(c, d) \in \mathbb{Z}^2 \colon \gcd(c,d) = 1\}/\{\pm 1\},

sending \begin{pmatrix}a & b\\c & d\end{pmatrix} to the class of (c, d), and showed that it is well-defined, injective, and surjective, hence a bijection. Since |cz+d|^{-k} takes the same value at (c, d) and (-c, -d), we go back to the sum and see that

\begin{align*} \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} |cz+d|^{-k} = \frac{1}{2}\sum_{\substack{(c,d) \in \mathbb{Z}^2\\ \gcd(c,d)=1}} |cz+d|^{-k} \leq \sum_{(c,d) \in \mathbb{Z}^2 \setminus\{(0,0)\}} |cz+d|^{-k}. \end{align*}

Now we recall some basic facts from analysis. In \mathbb{R}^2, all norms are equivalent. Observe that (x, y) \mapsto |xz+y| is a norm (here z \in \mathbb{H} is fixed, and xz+y = 0 with x, y real forces x = y = 0), so there exist \alpha(z), \beta(z) > 0 such that for all (x, y) \in \mathbb{R}^2, \alpha(z) |xz+y| \leq \max(|x|, |y|) \leq \beta(z) |xz+y|. This means

\begin{align*} \sum_{(c,d) \in \mathbb{Z}^2 \setminus \{(0,0)\}} |cz+d|^{-k} &\leq \sum_{(c,d) \in \mathbb{Z}^2 \setminus \{(0,0)\}} \beta(z)^k\max(|c|, |d|)^{-k} \leq \beta(z)^k \sum_{r \geq 1} (8r)r^{-k}, \end{align*}

where the last inequality follows from counting the integer lattice points (c,d) with \max(|c|, |d|) = r. (They are the lattice points on the boundary of the square [-r, r]^2, which has 2r+1 points on each side; remove one endpoint from each side to avoid double counting the corners, then multiply by four, which gives 8r.) For k > 2 the series \sum_{r \geq 1} (8r) r^{-k} converges, so in particular, for k \geq 4, the Poincaré series converges absolutely for each fixed z. Now, let us show that the convergence is locally uniform. We can take \beta(z) := \sup_{\max(|x|, |y|) = 1} |xz+y|^{-1}, which is finite since (x, y) \mapsto |xz+y| is continuous and nonvanishing on the compact set \{\max(|x|,|y|) = 1\}. The function z \mapsto \beta(z) is continuous, so on a small compact neighborhood K of any given point of \mathbb{H} it is bounded, and the estimate above is then uniform over K. This proves that the convergence is locally uniform. \square
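As a quick sanity check of the lattice-point count used above (a throwaway script in the spirit of the code from the introduction, not part of the proof):

# Count lattice points (c, d) with max(|c|, |d|) = r, and compare the direct
# truncated sum of max(|c|, |d|)^(-k) with the regrouped series sum 8r * r^(-k).
def boundary_points(r):
    return sum(1 for c in range(-r, r + 1) for d in range(-r, r + 1)
               if max(abs(c), abs(d)) == r)

for r in range(1, 8):
    assert boundary_points(r) == 8 * r          # the "8r" count used above

k, R = 4, 200
direct = sum(max(abs(c), abs(d))**(-k)
             for c in range(-R, R + 1) for d in range(-R, R + 1)
             if (c, d) != (0, 0))
regrouped = sum(8 * r * r**(-k) for r in range(1, R + 1))
print(direct, regrouped)                        # the two numbers agree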

Now that P_{m, k} is locally uniformly absolutely convergent for any m \geq 0 and even k \geq 4, we observe that each piece z \mapsto f_m(gz)(cz+d)^{-k} is a holomorphic function, and a locally uniformly absolutely convergent series of holomorphic functions is again holomorphic. This means the following.

Corollary 10. For any m1m \geq 1 and even k4k \geq 4, we have Pm,kSkP_{m,k} \in \mathcal{S}_k, and for m=0,P0,k=EkMkSkm = 0, P_{0, k} = E_k \in \mathcal{M}_k \setminus \mathcal{S}_k.

Proof. We’ve shown the holomorphicity in H\mathbb{H}, and that they satisfy the modularity condition. Now we’re left with studying their behavior at infinity. We have, for m1m \geq 1, that

\lim_{y \to \infty} P_{m,k}(x+iy) = \lim_{y \to \infty} \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} f_m(g(x+iy))\,(c(x+iy)+d)^{-k} = 0.

Indeed, each individual term tends to zero: the trivial coset contributes e^{2\pi i m (x+iy)}, whose modulus e^{-2\pi m y} tends to zero since m \geq 1, while every other coset has c \neq 0, so |c(x+iy)+d|^{-k} \to 0 and |f_m(g(x+iy))| \leq 1. (The interchange of limit and sum is justified because the estimates of Proposition 9 are uniform on the strip \{|\operatorname{Re}(z)| \leq \frac{1}{2},\ \operatorname{Im}(z) \geq 1\} and P_{m,k} is 1-periodic.) In particular, P_{m,k} is holomorphic at infinity and vanishes there, so P_{m,k} \in \mathcal{S}_k. However, for P_{0,k} = E_k, we have

limyEk(x+iy)=limygˉ±T\SL2(Z)(cx+icy+d)k.\lim_{y \to \infty} E_k(x+iy) = \lim_{y \to \infty} \sum_{\bar{g} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} (cx+icy+d)^{-k}.

For the trivial coset, we have (c,d) = (0, 1), and so the term inside is one. For the rest, the terms inside tend to zero, and so we have E_k(\infty) = 1, so E_k \in \mathcal{M}_k \setminus \mathcal{S}_k. \square

It seems like we've constructed a huge family of modular forms, namely the Poincaré series. However (and this is far from obvious), many of them turn out to be identically zero, so the family is not as large as it first appears. The Eisenstein series, at least, are certainly nonzero, since they extend to one at infinity.

The Space of Holomorphic Modular Forms

Now we’re ready to study Mk\mathcal{M}_k and Sk\mathcal{S}_k. Recall that Mk\mathcal{M}_k is the space of holomorphic modular forms that are also holomorphic at infinity. This means, if fMkf \in \mathcal{M}_k, then ordz(f)0\operatorname{ord}_z(f) \geq 0 for all zH{}z \in \mathbb{H} \cup \{\infty\}. Before we continue, observe that if ff is a modular form, we have

f(1/z)=(z)kf(z),f(-1/z) = (-z)^kf(z),

and so

f(z)=f(1/(1/z))=((1/z))kf(1/z)=zk(z)kf(z)=(1)kf(z).f(z) = f(-1/(-1/z)) = (-(-1/z))^kf(-1/z) = z^{-k}(-z)^k f(z) = (-1)^kf(z).

So if kk is odd, then ff must be identically zero! This tells us that Mk=0\mathcal{M}_k = 0 for kk odd, and so from now on we will only consider even kk.

Now, we recall the valence formula from Theorem 7. Suppose f≢0f \not\equiv 0, then

ord(f)+12ordi(f)+13ordexp(2iπ/3)(f)+zD{i,e2iπ/3,e2iπ/6}/SL2(Z)ordz(f)=k12.\operatorname{ord}_\infty(f) + \frac{1}{2}\operatorname{ord}_i(f) + \frac{1}{3}\operatorname{ord}_{\exp(2i\pi/3)}(f) + \sum_{z \in \mathcal{D} \setminus \{i,e^{2i\pi/3}, e^{2i\pi/6}\}/\operatorname{SL}_2(\mathbb{Z})} \operatorname{ord}_z(f) = \frac{k}{12}.

The idea here is to try to solve the equation (solve for ordz(f)\operatorname{ord}_z(f)) given by the valence formula.

Let us begin with \mathcal{M}_0. For k = 0, the right hand side is zero, and we know that the orders of a holomorphic modular form that is also holomorphic at infinity must be nonnegative, so the only possible outcome is that the order of f is zero everywhere, including at infinity. (In particular, |f| is bounded away from zero and infinity, by modularity and compactness of the completed fundamental domain below.) Let f(z) =\tilde{f}(e^{2\pi i z}) = \sum_{n \geq 0} a_n e^{2\pi i n z} be its Fourier expansion at infinity. Recall the fundamental domain \mathcal{D} := \{z \in \mathbb{H} \colon |z| \geq 1; |\operatorname{Re}(z)| \leq \frac{1}{2}\}. We thicken the fundamental domain by a small margin \varepsilon > 0 (to make it concrete one can just pick \varepsilon := \frac{1}{100}) and define \mathcal{D}_\varepsilon := \{z \in \mathbb{H} \colon \operatorname{dist}(z, \mathcal{D}) < \varepsilon\}. We push \mathcal{D}_\varepsilon through the conformal equivalence \Phi into an open subset of the unit disk. This misses the point at infinity (the origin of the disk), so we complete it and define \bar{\mathcal{D}'_\varepsilon} := \Phi(\mathcal{D}_\varepsilon) \cup \{0\}. This is very similar to an open disk (except that its boundary need not be a perfectly round circle) inside the unit disk. We also send the original fundamental domain (with completion) to the unit disk as \bar{\mathcal{D}'} := \Phi(\mathcal{D}) \cup \{0\}. This is a compact set inside the unit disk, with \bar{\mathcal{D}'} \subseteq \bar{\mathcal{D}'_\varepsilon}. By the maximum modulus principle, if \tilde{f} is not constant, then |\tilde{f}| cannot attain a local maximum inside \bar{\mathcal{D}'_\varepsilon}. But since f is modular, its values on \mathbb{H} \cup \{\infty\} are already realized by its values on \mathcal{D} \cup \{\infty\}, i.e. by the values of \tilde{f} on \bar{\mathcal{D}'}; hence \sup_{\bar{\mathcal{D}'_\varepsilon}} |\tilde{f}| = \sup_{\bar{\mathcal{D}'}} |\tilde{f}|, and the latter supremum is attained at some point of the compact set \bar{\mathcal{D}'}. That point is then a local maximum of |\tilde{f}| inside \bar{\mathcal{D}'_\varepsilon}, which is impossible unless \tilde{f} is constant. Therefore \tilde{f} is constant, and by modularity f is constant on all of \mathbb{H}. This means \mathcal{M}_0 = \{f \colon \mathbb{H} \to \mathbb{C} \colon f \equiv c \text{ for some } c \in \mathbb{C}\} is the space of constant functions.

For \mathcal{M}_2, we put k = 2 into the valence formula and observe that there is no solution to the equation: there cannot be points with negative order, the smallest possible nonzero contribution to the left hand side is \frac{1}{3}, which already exceeds \frac{2}{12} = \frac{1}{6}, and taking every order to be zero gives 0 \ne \frac{1}{6}. So \mathcal{M}_2 = \{0\} is the zero vector space.

Before moving on to other weights, let me clarify that, in \mathcal{M}_k for a fixed k, two nonzero modular forms are linearly dependent (i.e. one is a multiple of the other) if and only if their orders (at every orbit) match. Why? First, if f = \lambda g \in \mathcal{M}_k \setminus \{0\} for some \lambda \in \mathbb{C}^\times, then \operatorname{ord}_z(f) = \operatorname{ord}_z(\lambda g) = \operatorname{ord}_z(\lambda) + \operatorname{ord}_z(g) = 0 + \operatorname{ord}_z(g), so the orders match everywhere. Conversely, suppose f, g \in \mathcal{M}_k \setminus \{0\} are two modular forms whose orders match everywhere; how can we deduce that they're linearly dependent? Put h = f/g. This is a meromorphic modular form of weight zero, and for any z \in \mathbb{H} \cup \{\infty\}, we have \operatorname{ord}_z(h) = \operatorname{ord}_z(f/g) = \operatorname{ord}_z(f) - \operatorname{ord}_z(g) = 0, since the orders match at every point. This means h is holomorphic everywhere, including at infinity, so h \in \mathcal{M}_0 \setminus \{0\}, and we've already deduced that \mathcal{M}_0 consists of the constant functions, so h = f/g = \lambda for some \lambda \in \mathbb{C}^\times, i.e., f = \lambda g. The case where one of them (or both) is identically zero is uninteresting and doesn't give much sense to the "order" at a point.

For \mathcal{M}_4, put k = 4 into the valence formula, and observe that the only way for the left hand side to equal \frac{1}{3} is a simple zero at the point \exp(2i\pi/3) and nothing else (every other possibility contributes strictly more than \frac{1}{3}). Recall that E_4 \in \mathcal{M}_4 \setminus \{0\}. This means any nonzero modular form in \mathcal{M}_4 must be a multiple of E_4, so \{E_4\} forms a basis of \mathcal{M}_4.

For M6\mathcal{M}_6, we repeat the same game, and deduce that the only possible scenario is when one order comes from the point ii. Recall that E6M6{0}E_6 \in \mathcal{M}_6 \setminus \{0\} and so it forms a basis.

For M8\mathcal{M}_8, we repeat the same game, and deduce that the only possible way is that exp(2iπ/3)\exp(2i\pi/3) has double order. Recall that E8M8{0}E_8 \in \mathcal{M}_8 \setminus \{0\} and so it forms a basis.

For M10\mathcal{M}_{10}, we have 1012=56=13+12.\frac{10}{12} = \frac{5}{6} = \frac{1}{3} + \frac{1}{2}. One order comes from ii, another comes from exp(2iπ/3)\exp(2i\pi/3) and no other possibilities. So E10E_{10} forms a basis.

For M12\mathcal{M}_{12}, this is where things get harder. We have 1212=1\frac{12}{12} = 1. Maybe triple order at exp(2iπ/3)\exp(2i\pi/3)? Maybe double order at ii? Or maybe single order at somewhere else? We’re still not sure, but at least we have E12M12E_{12} \in \mathcal{M}_{12}.

It turns out that we cannot simply keep playing the same game for larger weights. However, we can get around this in another way.

Lemma 11. The form Δ:=E43E62\Delta := E_4^3 - E_6^2 is actually in S12\mathcal{S}_{12}. It is not identically zero, and it has no zero and no pole in H\mathbb{H}. The zero at infinity is simple.

Proof. We know that E_4^3, E_6^2 \in \mathcal{M}_{12}, and so their difference is also in there. We're left with showing that \Delta(\infty) = 0. Indeed, we've seen in Corollary 10 that E_4(\infty) = E_6(\infty) = 1, and so \Delta(\infty) = 0. This shows that it belongs to \mathcal{S}_{12}. Next, it is not identically zero: from the weight-6 analysis above, the only zero of E_6 sits at i, so E_6(i) = 0, while the only zero of E_4 sits at \exp(2i\pi/3), so E_4(i) \ne 0; hence \Delta(i) = E_4(i)^3 - E_6(i)^2 = E_4(i)^3 \ne 0. Now, we see that \operatorname{ord}_\infty(\Delta) \geq 1. By the valence formula, it must be equal to one, and the order everywhere else is zero, hence no zero and no pole anywhere else. Since the order is one, the zero at infinity is simple. \square

Lemma 12. The subspace SkMk\mathcal{S}_k \subseteq \mathcal{M}_k is a vector subspace of dimension dimMk1\dim \mathcal{M}_k -1 or dimMk\dim \mathcal{M}_k, for all k0k \geq 0. Moreover, for even k4k \geq 4, dimSk=dimMk1\dim \mathcal{S}_k = \dim \mathcal{M}_k - 1.

Proof. The subspace Sk\mathcal{S}_k can be defined as the kernel of the linear map ff()f \mapsto f(\infty). By the rank-nullity theorem, dimker+dimim=dimMk\dim \operatorname{ker} + \dim \operatorname{im} = \dim \mathcal{M}_k. The dimension of the image can be zero (if the whole space matches the subspace of cusp forms), or one (since the linear map lands in C\mathbb{C}). Now, we’ve seen from Corollary 10., that for even k4k \geq 4, EkMkSkE_k \in \mathcal{M}_k \setminus \mathcal{S}_k, so necessarily dimSk\dim \mathcal{S}_k cannot be equal to dimMk\dim \mathcal{M}_k. This completes the proof. \square

Lemma 13. The map MkSk+12\mathcal{M}_k \to \mathcal{S}_{k+12} sending ffΔf \mapsto f\Delta is an isomorphism of vector spaces.

Proof. Obviously the map is well-defined and C\mathbb{C}-linear. It is injective: if fΔf\Delta is identically zero, then on H\mathbb{H}, ff must be identically zero (because Δ\Delta is nowhere zero), and so f0f \equiv 0. It is surjective: if gSk+12g \in \mathcal{S}_{k+12}, we can define f=g/Δf = g/\Delta. A priori this is just a meromorphic modular form, but by evaluating the order at each point zz, we see that ordz(f)=ordz(g)ordz(Δ)=ordz(g)0\operatorname{ord}_z(f) = \operatorname{ord}_z(g) - \operatorname{ord}_z(\Delta) = \operatorname{ord}_z(g) \geq 0 for all zHz \in \mathbb{H}. Now, for \infty, we have ord(g)1\operatorname{ord}_\infty(g) \geq 1 since gg is a cusp form. And we knew (Lemma 11) that ord(Δ)=1\operatorname{ord}_\infty(\Delta) = 1, so ord(f)=ord(g)ord(Δ)0\operatorname{ord}_\infty(f) = \operatorname{ord}_\infty(g) - \operatorname{ord}_\infty(\Delta) \geq 0. This means ff is also holomorphic at infinity. This shows that fMkf \in \mathcal{M}_k satisfies fΔ=gf\Delta = g, and so the linear map is surjective. Therefore, it is an isomorphism of vector spaces. \square

This shows us that M0S12\mathcal{M}_0 \cong \mathcal{S}_{12}, and so dimS12=1\dim \mathcal{S}_{12} = 1. Since 0≢ΔS120 \not\equiv \Delta \in \mathcal{S}_{12}, it forms a basis there.

By Lemma 12., we see that dimM12=2\dim \mathcal{M}_{12} = 2, and E12M12S12E_{12} \in \mathcal{M}_{12} \setminus \mathcal{S}_{12}, so {E12,Δ}\{E_{12}, \Delta\} forms a basis for M12\mathcal{M}_{12}. We’re now ready to attack the spaces of bigger weights. But before moving on, let us quickly recap our knowledge of smaller spaces.

k     A basis for M_k     A basis for S_k
0     {1}                 {}
2     {}                  {}
4     {E_4}               {}
6     {E_6}               {}
8     {E_8}               {}
10    {E_10}              {}
12    {E_12, Δ}           {Δ}

Theorem 14. For even k0k \geq 0, one has dimMk={k12;k21(mod6)k12+1;k2≢1(mod6).\dim \mathcal{M}_k = \begin{cases} \lfloor \frac{k}{12}\rfloor &; \frac{k}{2} \equiv 1 \pmod{6}\\ \lfloor \frac{k}{12}\rfloor+1 &; \frac{k}{2} \not\equiv 1 \pmod{6}.\end{cases} Otherwise, for kk odd or negative, one has Mk={0}\mathcal{M}_k = \{0\}. Moreover, there exists an algorithm to construct a basis for Mk\mathcal{M}_k in terms of Eisenstein series and Δ\Delta. The same can be done for Sk\mathcal{S}_k as well, where for even k4k \geq 4, dimSk=dimMk1\dim \mathcal{S}_k = \dim \mathcal{M}_k - 1.

Proof. For k \in \{0, 2, 4, 6, 8, 10, 12\} we've finished the proof already. For the rest, we proceed by induction. For a fixed even k \geq 14, one has \mathcal{S}_k \cong \mathcal{M}_{k-12} by Lemma 13. Suppose \mathscr{B}_{k-12} is a basis for \mathcal{M}_{k-12}. We can use the isomorphism \mathcal{S}_k \cong \mathcal{M}_{k-12} to construct a basis for \mathcal{S}_k: explicitly, \{f\Delta \colon f \in \mathscr{B}_{k-12}\} is one, and we see that \dim \mathcal{S}_k = \dim \mathcal{M}_{k-12}. Applying Lemma 12, \dim \mathcal{S}_k = \dim \mathcal{M}_k - 1. Since \mathcal{S}_k \subseteq \mathcal{M}_k, we can extend this basis by one more modular form that is not in \mathcal{S}_k (hence linearly independent from it) to obtain a basis for \mathcal{M}_k, and we have a natural candidate: E_k. Hence,

Bk:={fΔ ⁣:fBk12}{Ek}\mathscr{B}_k := \{f\Delta \colon f \in \mathscr{B}_{k-12}\} \cup \{E_k\}

is a basis for \mathcal{M}_k. In particular, \dim \mathcal{M}_k = \dim \mathcal{M}_{k-12}+1; plugging in the formula for \dim \mathcal{M}_{k-12} (and noting that k/2 and (k-12)/2 have the same residue modulo 6) shows that the formula for \dim \mathcal{M}_k holds as well. \square
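For concreteness, here is a small throwaway script that compares the closed formula of Theorem 14 with the recursion \dim \mathcal{M}_k = \dim \mathcal{M}_{k-12} + 1, seeded by the small weights from the table above:

# dim M_k from the closed formula of Theorem 14 versus the recursion
# dim M_k = dim M_{k-12} + 1 seeded with the small-weight cases.
def dim_formula(k):
    if k < 0 or k % 2 == 1:
        return 0
    return k // 12 if (k // 2) % 6 == 1 else k // 12 + 1

def dim_recursive(k):
    base = {0: 1, 2: 0, 4: 1, 6: 1, 8: 1, 10: 1, 12: 2}
    if k < 0 or k % 2 == 1:
        return 0
    if k in base:
        return base[k]
    return dim_recursive(k - 12) + 1

assert all(dim_formula(k) == dim_recursive(k) for k in range(0, 200))
print([dim_formula(k) for k in range(0, 30, 2)])
# [1, 0, 1, 1, 1, 1, 2, 1, 2, 2, 2, 2, 3, 2, 3]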

Now we've made huge progress. We've studied the space of modular forms to the point that we can construct a basis for the space of holomorphic modular forms of every possible integer weight! What's so nice about this? The idea is: if we know that a function f \colon \mathbb{H} \to \mathbb{C} is a holomorphic modular form of weight k that is also holomorphic at infinity, then we know that f \in \mathcal{M}_k. But we've already constructed the bases! And so \mathcal{M}_k = \operatorname{Span}_{\mathbb{C}}(\mathscr{B}_k), so there exists a unique (\lambda_1, \dots, \lambda_d) \in \mathbb{C}^d such that f = \sum_{i=1}^d \lambda_i b_i, where d := \dim \mathcal{M}_k and \mathscr{B}_k = \{b_1, \dots, b_d\}. We've transformed the problem of studying f into the problem of studying the basis elements b_1, \dots, b_d!

Now the next thing we want to do is to study the Eisenstein series further (and maybe also the Poincaré series, but we can skip them, since we don't use them to construct the bases). However, the content there requires heavier complex-analytic machinery like the Poisson summation formula, so let's take a quick break from modular forms and look at these tools from analysis.

Fourier Transform and Poisson Summation Formula

First, we recall our basic result from the theory of Fourier transform that will be useful for us.

Definition. For a Lebesgue-integrable function f ⁣:RCf \colon \mathbb{R} \to \mathbb{C}, we define its Fourier transform f^ ⁣:RC\hat{f} \colon \mathbb{R} \to \mathbb{C} as

f^(t):=f(x)e2πixtdx.\hat{f}(t) := \int_{-\infty}^\infty f(x) e^{-2\pi i x t} \mathrm{d}x.

(For simplicity of notation, we use the variable xx in f(x)f(x) and use tt in f^(t)\hat{f}(t).)

Lemma 15. The function g(x):=eπx2g(x) := e^{-\pi x^2} defined for all xRx \in \mathbb{R} has g^(t)=eπt2\hat{g}(t) = e^{-\pi t^2}, i.e., its Fourier transform is itself.

Note. We’re not going to use Lemma 15 immediately right now, so you can comfortably skip this and come back later. I’m just putting it here because I don’t want to put results from Fourier transform all over the place.

Proof. We first expand the definition for g^\hat{g} and see that for all tRt \in \mathbb{R}, we have

g^(t)=eπx2e2πixtdx,\hat{g}(t) = \int_{-\infty}^\infty e^{-\pi x^2}e^{-2\pi i x t} \mathrm{d}x,

i.e., the integral converges absolutely, since the integrand has modulus e^{-\pi x^2}; in particular \hat{g} is well-defined, and it will turn out to be \mathcal{C}^1 once we justify differentiation under the integral sign. So, we differentiate with respect to t on both sides to see

tg^(t)=teπx22πixtdx=teπx22πixtdx,\partial_t \hat{g}(t) = \partial_t \int_{-\infty}^\infty e^{-\pi x^2 - 2 \pi i x t} \mathrm{d}x = \int_{-\infty}^\infty \partial_t e^{-\pi x^2 - 2 \pi i x t} \mathrm{d}x,

where the right equality holds by Lebesgue’s dominated convergence theorem for derivatives. Let us check the conditions we need. Define f(x,t):=eπx22πixtf(x, t) := e^{-\pi x^2-2\pi i xt} for all x,tRx, t \in \mathbb{R}. We need to check:

  1. For each t0Rt_0 \in \mathbb{R}, the function tf(x,t)t \mapsto f(x, t) is differentiable at t0t_0.
  2. For each t0Rt_0 \in \mathbb{R} there exists a neighborhood UU of t0t_0 in R\mathbb{R} and a function hL1(R)h \in L^1(\mathbb{R}) such that tf(x,t)h(x)|\partial_t f(x, t)| \leq h(x) for all xRx \in \mathbb{R} and all tUt \in U.
  3. For each t1Ut_1 \in U, the function xf(x,t1)x \mapsto f(x, t_1) is in L1(R)L^1(\mathbb{R}).

First of all, we compute \partial_t f(x, t) = e^{-\pi x^2} (-2\pi i x)e^{- 2 \pi i x t}, which exists everywhere, so the first condition is satisfied. Now, for any fixed t_0 \in \mathbb{R}, we take U := (t_0-1, t_0+1) and observe that |\partial_t f(x, t)| = 2\pi |x| e^{-\pi x^2} |e^{-2\pi i x t}| = 2\pi |x| e^{-\pi x^2} for every t. Define h(x) := 2\pi |x| e^{-\pi x^2} and observe that

\|h\|_1 =\int_{-\infty}^{\infty} |h(x)| \mathrm{d}x = \int_{-\infty}^\infty 2\pi|x|e^{-\pi x^2} \mathrm{d}x = 4\pi \int_0^\infty xe^{-\pi x^2} \mathrm{d}x.

Put u=x2u = x^2 substitution (this works since xx2x \mapsto x^2 is a C1\mathcal{C}^1-diffeomorphism on (0,)(0, \infty)) with du=2xdx\mathrm{d}u = 2x\mathrm{d}x. We have

\|h\|_1 = 2\pi \int_0^\infty e^{-\pi u} \mathrm{d}u = 2\pi \left[-\frac{1}{\pi} e^{-\pi u}\right]_0^\infty = 2\pi\left(0 - \left(-\frac{1}{\pi}\right)\right) = 2.

This shows that hL1(R)h \in L^1(\mathbb{R}), and the second condition is satisfied. Now, let us check the third condition. Fix t1Ut_1 \in U and observe that

xf(x,t1)1=eπx2e2πixt1dx=eπx2dx=1.\|x \mapsto f(x, t_1)\|_1 = \int_{-\infty}^\infty |e^{-\pi x^2}| |e^{-2\pi i x t_1}| \mathrm{d}x = \int_{-\infty}^\infty e^{-\pi x^2} \mathrm{d}x = 1.

This shows that the third condition is satisfied. Now let us forget about f(x,t),h(x),t0,t1,Uf(x,t), h(x), t_0, t_1, U and the extra notations we used in DCT, and let’s go back to tg^(t).\partial_t \hat{g}(t) . We have

tg^(t)=teπx22πixtdx=eπx2(2πix)e2πixtdx.\partial_t \hat{g}(t) = \int_{-\infty}^\infty \partial_t e^{-\pi x^2 - 2 \pi i x t} \mathrm{d}x = \int_{-\infty}^\infty e^{-\pi x^2} (-2\pi i x)e^{- 2 \pi i x t} \mathrm{d}x.

To continue, let us try to attack this with integration by parts. Observe that the derivative of xeπx2x \mapsto e^{-\pi x^2} is x2πxeπx2x \mapsto -2\pi x e^{-\pi x^2}. Put u:=ie2πixtu := ie^{-2\pi i x t} and dv:=2πxeπx2\mathrm{d}v := -2\pi x e^{-\pi x^2} so that

udv=[uv]vdu.\int_{-\infty}^\infty u \mathrm{d}v = [uv]_{-\infty}^{\infty} - \int_{-\infty}^\infty v \mathrm{d}u.

We have du=i(2πit)e2πixtdx=2πte2πixtdx\mathrm{d}u = i(-2\pi i t)e^{-2\pi i x t} \mathrm{d}x = 2\pi te^{-2\pi i xt} \mathrm{d}x and v=eπx2v = e^{-\pi x^2}, so we have

tg^(t)=udv=[ie2πixteπx2]eπx22πte2πixtdx.\partial_t \hat{g}(t) = \int_{-\infty}^\infty u\mathrm{d}v = [ie^{-2\pi i x t} e^{-\pi x^2}]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} e^{-\pi x^2} 2\pi t e^{-2\pi i x t} \mathrm{d}x.

Since limx+eπx2=limxeπx2=0\lim_{x \to +\infty} e^{-\pi x^2} = \lim_{x \to -\infty} e^{-\pi x^2} = 0, the middle boundary term vanishes, and we have

tg^(t)=2πteπx2e2πixtdx=2πtg^(t).\partial_t \hat{g}(t) = -2\pi t \int_{-\infty}^\infty e^{-\pi x^2} e^{- 2 \pi i x t} \mathrm{d}x = -2\pi t \hat{g}(t).

Now we’re looking for g^\hat{g} such that g^=2πtg^\hat{g}' = -2\pi t \hat{g}. This is a basic ODE. We give up our dignity and resort to the physicists’ notation for a while:

dg^dt=2πtg^\frac{\mathrm{d}\hat{g}}{\mathrm{d}t} = -2\pi t \hat{g}

so

1g^dg^=2πtdt\frac{1}{\hat{g}} \mathrm{d}\hat{g} = -2\pi t \mathrm{d}t

and we integrate both sides like a boss:

logg^+(const)=1g^dg^=2πtdt=πt2+(const).\log \hat{g} + (\mathrm{const}) =\int \frac{1}{\hat{g}} \mathrm{d}\hat{g} = \int-2\pi t\mathrm{d}t = -\pi t^2 + (\mathrm{const}).

Exponentiating both sides, we end up with

g^=Ceπt2\hat{g} = Ce^{-\pi t^2}

for some CCC \in \mathbb{C}. Let us try to plug in some basic value for g^\hat{g} from the definition, say, evaluate at t=0t = 0 and see that g^(0)=eπx2e2πix(0)dx=eπx2dx=1.\hat{g}(0) = \int_{-\infty}^\infty e^{-\pi x^2} e^{-2\pi i x (0)} \mathrm{d}x = \int_{-\infty}^\infty e^{-\pi x^2} \mathrm{d}x = 1. This means C=1C = 1, and our only solution for g^\hat{g} is g^(t)=eπt2\hat{g}(t) = e^{-\pi t^2}, defined for all tRt \in \mathbb{R}. We can easily plug this in and check that it satisfies the ODE and the initial conditions. This completes the proof of Lemma 15. \square
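Since Lemma 15 will be used later, here is a quick numerical sanity check (a throwaway script using a plain Riemann sum; the truncation bound 10 and the grid size are arbitrary):

# Approximate the Fourier transform of g(x) = exp(-pi x^2) numerically and
# compare with exp(-pi t^2).
import cmath, math

def g_hat(t, X=10.0, N=20000):
    h = 2 * X / N            # Riemann sum over [-X, X]; the tail beyond is negligible
    return h * sum(math.exp(-math.pi * (-X + i * h)**2)
                   * cmath.exp(-2j * math.pi * (-X + i * h) * t)
                   for i in range(N))

for t in [0.0, 0.5, 1.0, 2.0]:
    approx, exact = g_hat(t), math.exp(-math.pi * t * t)
    assert abs(approx - exact) < 1e-6
    print(t, approx.real, exact)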

Theorem 16. (A version of the Poisson Summation Formula) Let f ⁣:RCf \colon \mathbb{R} \to \mathbb{C} be C1\mathcal{C}^1 and L1(R)L^1(\mathbb{R}), such that fL1(R)f' \in L^1(\mathbb{R}) also. If the series nZf(n+x)\sum_{n \in \mathbb{Z}} f'(n+x) is locally uniformly absolutely convergent, then

nZf(n)=mZf^(m).\sum_{n \in \mathbb{Z}} f(n) = \sum_{m \in \mathbb{Z}} \hat{f}(m).

Note. The final hypothesis that nZf(n+x)\sum_{n \in \mathbb{Z}} f'(n+x) is locally uniformly absolutely convergent can be removed, but it will complicate the proof too much since we would otherwise have to detour through Dirichlet-Jordan theorem for pointwise Fourier series convergence for functions with bounded variation.

Proof. Put F(x):=nZf(n+x)F(x) := \sum_{n \in \mathbb{Z}} f(n+x). We will check that this is well-defined by proving that the series is locally uniformly absolutely convergent. First, observe that, by the fundamental theorem of calculus, we have abf(t)dt=f(b)f(a)\int_a^b f'(t) \mathrm{d}t = f(b) - f(a). Put a=n+xa = n+x and observe that

f(n+x)=f(b)n+xbf(t)dt.f(n+x) = f(b) - \int_{n+x}^b f'(t)\mathrm{d}t.

By taking absolute values, and applying triangle inequality, we have

f(n+x)f(b)+n+xbf(t)dt|f(n+x)| \leq |f(b)| + \int_{n+x}^b |f'(t)| \mathrm{d}t

Taking integral for bb from n+xn+x to n+x+1n+x+1, we get

f(n+x)=n+xn+x+1f(n+x)dbn+xn+x+1f(b)db+n+xn+x+1n+xbf(t)dtn+xn+x+1f(b)db+n+xn+x+1n+xn+x+1f(t)dt=n+xn+x+1(f(b)+f(b))db.\begin{align*} |f(n+x)| &= \int_{n+x}^{n+x+1} |f(n+x)| \mathrm{d}b \\ &\leq \int_{n+x}^{n+x+1}|f(b)| \mathrm{d}b + \int_{n+x}^{n+x+1}\int_{n+x}^b |f'(t)| \mathrm{d}t \\ &\leq \int_{n+x}^{n+x+1} |f(b)| \mathrm{d}b + \int_{n+x}^{n+x+1} \int_{n+x}^{n+x+1} |f'(t)| \mathrm{d}t \\ &= \int_{n+x}^{n+x+1} (|f(b)|+|f'(b)|)\mathrm{d}b. \end{align*}

Now, for any compact interval IRI \subseteq \mathbb{R}, say I=[cL,cR]I = [c_L, c_R], we have

supxIn>Nf(n+x)supxIn>Nn+xn+x+1(f(b)+f(b))db=supxI(,xN+1][x+N,)(f(b)+f(b))dbsupx[cL,cR](,xN+1](f(b)+f(b))db+supx[cL,cR][x+N,)(f(b)+f(b))db(,cRN+1](f(b)+f(b))db+[cL+N,)(f(b)+f(b))db,\begin{align*} \sup_{x \in I} \sum_{|n| > N} |f(n+x)| &\leq \sup_{x \in I} \sum_{|n| > N} \int_{n+x}^{n+x+1}(|f(b)|+|f'(b)|) \mathrm{d}b \\ &= \sup_{x \in I} \int_{(-\infty, x-N+1] \cup [x+N, \infty)} (|f(b)|+|f'(b)|) \mathrm{d}b \\ &\leq \sup_{x \in [c_L, c_R]} \int_{(-\infty, x-N+1]} (|f(b)|+|f'(b)|)\mathrm{d}b + \sup_{x \in [c_L, c_R]} \int_{[x+N, \infty)}(|f(b)|+|f'(b)|)\mathrm{d}b \\ &\leq \int_{(-\infty, c_R-N+1]} (|f(b)|+|f'(b)|)\mathrm{d}b + \int_{[c_L+N, \infty)} (|f(b)|+|f'(b)|) \mathrm{d}b, \end{align*}

and as N tends to infinity, since f, f' \in L^1(\mathbb{R}), both integrals vanish. This shows the local uniform absolute convergence for F. By local uniform absolute convergence, we also see that F is continuous, and since \sum_{n \in \mathbb{Z}} f'(n+x) is also locally uniformly absolutely convergent, we see that F is also \mathcal{C}^1. Now, by definition, F is 1-periodic. By the theory of Fourier series, such a (\mathcal{C}^1 and 1-periodic) function converges uniformly to its Fourier series

nZcne2πinx\sum_{n \in \mathbb{Z}} c_ne^{2 \pi inx}

where cn:=01F(x)e2πinxdxc_n := \int_0^1 F(x) e^{-2\pi i n x} \mathrm{d}x.

In particular, we have (the left equality comes from definition of FF and the right equality comes from the convergence of the Fourier series at zero)

nZf(n)=F(0)=nZcn.\sum_{n \in \mathbb{Z}} f(n)= F(0) = \sum_{n \in \mathbb{Z}} c_n.

Now let us compute cnc_n. We have

cn=01F(x)e2πinxdx=01mZf(m+x)e2πin(m+x)dx=f(x)e2πinxdx=f^(n),c_n = \int_{0}^1 F(x) e^{-2\pi i n x} \mathrm{d}x = \int_{0}^1 \sum_{m \in \mathbb{Z}} f(m+x)e^{-2\pi i n (m+x)} \mathrm{d}x = \int_{-\infty}^\infty f(x)e^{-2 \pi i n x} \mathrm{d}x = \hat{f}(n),

where the equality before the last follows from interchanging the sum and the integral (legitimate by the uniform convergence on [0,1]) and then gluing the integrals: e^{-2\pi i n (m+x)} = e^{-2\pi i n x}, and the translates [m, m+1] of [0,1] tile \mathbb{R}. And so this completes the proof. \square
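As a sanity check of Theorem 16 (again a throwaway numerical experiment), take f(x) = e^{-\pi s x^2} for a parameter s > 0; by Lemma 15 and a change of variables its Fourier transform is \hat{f}(t) = s^{-1/2} e^{-\pi t^2/s}, so the two sides of the Poisson summation formula can be compared directly. (This particular example is exactly the transformation law of the theta function that will show up later.)

# Check sum_n f(n) = sum_m fhat(m) for f(x) = exp(-pi*s*x^2),
# whose Fourier transform is s^(-1/2) * exp(-pi*m^2/s).
import math

def lhs(s, N=50):
    return sum(math.exp(-math.pi * s * n * n) for n in range(-N, N + 1))

def rhs(s, N=50):
    return sum(math.exp(-math.pi * m * m / s) for m in range(-N, N + 1)) / math.sqrt(s)

for s in [0.3, 1.0, 2.7]:
    assert abs(lhs(s) - rhs(s)) < 1e-12
    print(s, lhs(s), rhs(s))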

The Fourier Coefficients of Eisenstein Series

Now that we have our toolbox ready, let us go back to modular forms and compute those coefficients!

Theorem 17. Let k \geq 4 be an even integer. Then the Fourier coefficients of E_k are given by

an=(2πi)kζ(k)(k1)!σk1(n)a_n = \frac{(2\pi i)^k}{\zeta(k)(k-1)!} \sigma_{k-1}(n)

for n1n \geq 1, and a0=1a_0 = 1, where σk1(n):=dndk1\sigma_{k-1}(n) := \sum_{d \mid n} d^{k-1} is the sum of (k1)(k-1)-th power of divisors function.

The proof is going to be quite long, so before getting into it, let me give a brief overview. The first step is to recall the definition of E_k: we sum over cosets in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z}), where we can select any representative and the result will be the same. This means we want to parametrize \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z}) by a nice collection of representatives; this is the Bruhat decomposition. Once this is done, we try to compute the sum over those representatives and get stuck. However, the innermost piece is a sum over m \in \mathbb{Z} of a function nice enough to apply the Poisson summation formula to, so we sum over its Fourier transform instead. This gives another ugly integral, over a vertical line in the complex plane. We solve it by applying the residue theorem, switching to a contour with a semicircular arc: when the radius goes to infinity, the contribution of the arc vanishes, and the ugly integral is solved. The residue is the source of the \frac{1}{(k-1)!} term. Now the outer sums are parametrized in an arithmetically convenient way, and this gives something called the Ramanujan sum. The Ramanujan sum can be evaluated explicitly as a sum over divisors, and we then obtain a Dirichlet series; applying Möbius inversion, out comes the \zeta(k).

To prepare for the proof, let me introduce the basic analytic theory of numbers: Dirichlet convolution, Dirichlet series, and Möbius inversion.

Definition. For functions f, g \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, we define their Dirichlet convolution, denoted by f \star g \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, as

(fg)(n):=dnf(d)g(nd).(f\star g)(n) := \sum_{d \mid n} f(d)g\left(\frac{n}{d}\right).

Observe that \star is commutative, associative, and there is a neutral element 1=1\mathbf{1}_{=1} defined by

1=1(n):={1;n=10;n1.\mathbf{1}_{=1}(n) := \begin{cases}1 &; n=1\\ 0 &; n\ne 1.\end{cases}

Furthermore, for each function ff, if f(1)0f(1) \ne 0, then there is an inverse ff^{-\star} such that ff=1=1f \star f^{-\star} = \mathbf{1}_{=1}. To see this, we can define ff^{-\star} inductively. We want such function to satisfy

dnf(d)f(n/d)=(ff)(n)=1=1(n).\sum_{d \mid n} f(d) f^{-\star}(n/d)=(f \star f^{-\star})(n) = \mathbf{1}_{=1}(n).

For n=1n = 1, define f(1):=1f(1)f^{-\star}(1) := \frac{1}{f(1)}. Now, for any nn, assume that ff^{-\star} has already been defined for all positive integers less than nn. Then, we have

f(n):=dn;d1f(d)f(n/d)f(1).f^{-\star}(n) := \frac{-\sum_{d \mid n; d \ne 1} f(d)f^{-\star}(n/d)}{f(1)}.

Observe that ff^{-\star} is uniquely defined, and this is the only choice to make it an inverse of ff. This shows that {f ⁣:Z1C ⁣:f(1)0}\{f \colon \mathbb{Z}_{\geq 1} \to \mathbb{C} \colon f(1) \ne 0\} forms an abelian group under the operation \star.
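The recursive construction of the inverse is really an algorithm, so here is a small sketch (the function names are mine) that computes it and checks the defining property f \star f^{-\star} = \mathbf{1}_{=1}:

# Dirichlet convolution and the recursively defined Dirichlet inverse.
def dirichlet_convolve(f, g, n):
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

def dirichlet_inverse(f, N):
    # values of f^{-*} at 1, ..., N, assuming f(1) != 0
    inv = {1: 1 / f(1)}
    for n in range(2, N + 1):
        s = sum(f(d) * inv[n // d] for d in range(2, n + 1) if n % d == 0)
        inv[n] = -s / f(1)
    return inv

f = lambda n: n                     # a simple test function with f(1) != 0
inv = dirichlet_inverse(f, 200)
for n in range(1, 201):
    value = dirichlet_convolve(f, lambda m: inv[m], n)
    assert abs(value - (1 if n == 1 else 0)) < 1e-9
print("f * f^{-*} = 1_{=1} checked up to n = 200")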

Proposition 18. (Möbius Inversion) The Möbius function μ\mu defined by

μ(n):={(1)k;n is a product of k distinct primes0;p prime,p2n\mu(n) := \begin{cases}(-1)^k &; n \text{ is a product of } k \text{ distinct primes}\\0 &; \exists p \text{ prime}, p^2 \mid n\end{cases}

for n1n \geq 1 (in particular μ(1)=1\mu(1) = 1) satisfies

μ1=1=1.\mu \star 1 = \mathbf{1}_{=1}.

In other words, it is the inverse of the function that sends everything to one.

Proof. It is enough to show that

(μ1)(n)=0(\mu \star 1)(n) = 0

for n2n \geq 2. (The case for n=1n = 1 is obvious) Let us compute.

(μ1)(n)=dnμ(d).(\mu \star 1)(n) = \sum_{d \mid n} \mu(d).

Suppose n=i=1mpiain = \prod_{i=1}^m p_i^{a_i} with ai>0a_i > 0 (apply the fundamental theorem of arithmetic to get this). Then, we have

\sum_{d \mid n} \mu(d) = \sum_{(x_1, \dots, x_m) \in \{0,1\}^m} \mu\left(\prod_{i=1}^m p_i^{x_i}\right) = \sum_{x \in \{0,1\}^m} (-1)^{\|x\|_1} = \sum_{k=0}^m {m \choose k} (-1)^k = (1-1)^m = 0.

The equality before the last follows from the binomial theorem. The equality before that follows by reparametrizing the sum by x1=k\|x\|_1 = k for each kk, then count the number of tuples satisfying the condition. \square
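And a matching numerical check of Proposition 18, with a naive trial-division Möbius function (self-contained throwaway code):

# mu(n) via trial division, and a check that sum_{d | n} mu(d) = [n == 1].
def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:          # a square factor divides the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result   # a leftover prime factor flips the sign

for n in range(1, 2001):
    total = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert total == (1 if n == 1 else 0)
print("mu * 1 = 1_{=1} verified for n <= 2000")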

For the Dirichlet series, the actual power lies in defining it over the complex plane (with suitable analytic continuation). However, that is an overkill for our purpose right now, and so I’ll only introduce it for a subset of positive reals.

Definition. For a function f ⁣:Z1Cf \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, one can define its Dirichlet series DfD_f as

Df(s):=n1f(n)ns.D_f(s) := \sum_{n \geq 1} \frac{f(n)}{n^s}.

If ff grows polynomially, i.e. f(n)=O(nc)f(n) = O(n^c) for some c>0c > 0, then the series converges absolutely (pointwise) for s>c+1s > c+1.

Note. For functions f,g ⁣:Z1Cf, g \colon \mathbb{Z}_{\geq 1} \to \mathbb{C}, if both DfD_f and DgD_g converge absolutely, we have DfDg=DfgD_f D_g = D_{f \star g} for sufficiently large input. To see this, write

Df(s)Dg(s)=(n1f(n)ns)(m1g(m)ms)=n1;m1f(n)g(m)(nm)s.D_f(s) D_g(s) = \left(\sum_{n \geq 1}\frac{f(n)}{n^{s}}\right)\left(\sum_{m \geq 1} \frac{g(m)}{m^{s}}\right) = \sum_{n \geq 1; m \geq 1} \frac{f(n)g(m)}{(nm)^s}.

Now we reparametrize the sum by nm=knm = k to see that

Df(s)Dg(s)=k1nm=kf(n)g(m)ks=k1nkf(n)g(k/n)ks=k1(fg)(k)ks=Dfg(s).D_f(s)D_g(s) = \sum_{k \geq 1} \sum_{nm = k} \frac{f(n)g(m)}{k^s} = \sum_{k \geq 1} \sum_{n \mid k} \frac{f(n)g(k/n)}{k^s} = \sum_{k \geq 1} \frac{(f \star g)(k)}{k^s} = D_{f \star g}(s).

Now we can go back to Theorem 17 by proving a sequence of results here.

Lemma 19. (Bruhat Decomposition)

SL2(Z)=±Tc10dc1gcd(c,d)=1±T(ac,dbc,dcd)T\operatorname{SL}_2(\mathbb{Z}) = \langle \pm T \rangle \sqcup \bigsqcup_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d) = 1}} \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle

where ac,d,bc,dZa_{c,d}, b_{c,d} \in \mathbb{Z} can be chosen arbitrarily such that ac,ddbc,dc=1a_{c,d}d - b_{c,d}c = 1.

Proof. Take g=(αβγδ)SL2(Z)g =\begin{pmatrix}\alpha & \beta\\\gamma & \delta\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}). If γ=0\gamma = 0, then αδβγ=1\alpha \delta - \beta\gamma = 1 implies α=δ=±1\alpha = \delta = \pm 1 and so g±Tg \in\langle \pm T \rangle. Now assume γ0\gamma \ne 0. First let’s assume γ1\gamma \geq 1. If we multiply gg by TnT^n, we have

gTn=(αβγδ)(1n01)=(αnα+βγnγ+δ).gT^n = \begin{pmatrix}\alpha & \beta\\\gamma & \delta\end{pmatrix}\begin{pmatrix}1 & n\\0 & 1\end{pmatrix} = \begin{pmatrix}\alpha & n\alpha +\beta\\\gamma & n\gamma + \delta\end{pmatrix}.

Observe that there exists unique nZn \in \mathbb{Z} such that 0nγ+δγ10 \leq n\gamma + \delta \leq \gamma -1. Fix such nn and put c=γ;d=nγ+δc = \gamma;\, d = n\gamma + \delta. Write a=αa = \alpha and b=nα+βb = n\alpha + \beta. If we multiply gTngT^n by TmT^m from the left, we get

TmgTn=(1m01)(abcd)=(a+mcb+mdcd).T^mgT^n = \begin{pmatrix}1 & m\\0 & 1\end{pmatrix} \begin{pmatrix}a & b\\c & d\end{pmatrix} = \begin{pmatrix}a+mc & b+md\\c & d\end{pmatrix}.

But since ac,ddbc,dc=1a_{c,d}d - b_{c,d}c = 1 and adbc=1ad-bc=1, we see that there exists a unique mZm \in \mathbb{Z} such that a+mc=ac,da+mc = a_{c,d} and b+md=bc,db+md = b_{c,d}. Fix such mm to get

TmgTn=(ac,dbc,dcd),T^mgT^n = \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix},

and so

g=Tm(ac,dbc,dcd)Tn±T(ac,dbc,dcd)T.g = T^{-m}\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}T^{-n} \in \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle.

This shows that gg belongs to the decomposition when γ1\gamma \geq 1. Now for γ1\gamma \leq -1, we know that g-g belongs to such decomposition. Multiplying the left TmT^{-m} by 1-1 gives

g=Tm(ac,dbc,dcd)Tn±T(ac,dbc,dcd)T.g = -T^{-m}\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}T^{-n} \in \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle.

This shows that

SL2(Z)±Tc10dc1gcd(c,d)=1±T(ac,dbc,dcd)T.\operatorname{SL}_2(\mathbb{Z}) \subseteq \langle \pm T \rangle \sqcup \bigsqcup_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d) = 1}} \langle \pm T \rangle \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle.

The other inclusion is trivial since everything is included in SL2(Z)\operatorname{SL}_2(\mathbb{Z}). The fact that the union is disjoint comes from the fact that mm and nn are unique. If there were to be a nonempty intersection between the decompositions, then there must be a gSL2(Z)g \in \operatorname{SL}_2(\mathbb{Z}) such that it can be put into multiple decompositions, but by uniqueness of (m,n)(m, n), we see that there is only one choice of (c,d)(c, d). This completes the proof. \square
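The proof of Lemma 19 is effectively an algorithm, so here is a sketch of it (helper names are mine; the representatives a_{c,d}, b_{c,d} are produced by the extended Euclidean algorithm, which is one admissible arbitrary choice). It decomposes a matrix with nonzero lower-left entry as \pm T^{-m} \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} T^{-n} and verifies the result by multiplying back:

# Bruhat-decompose g in SL_2(Z) (with nonzero lower-left entry) following the proof.
def mat_mul(A, B):
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def T_power(n):
    return [[1, n], [0, 1]]

def ext_gcd(x, y):                      # returns (g, u, v) with u*x + v*y = g
    if y == 0:
        return (x, 1, 0)
    g, u, v = ext_gcd(y, x % y)
    return (g, v, u - (x // y) * v)

def bruhat(g):
    (alpha, beta), (gamma, delta) = g
    sign = 1
    if gamma < 0:                        # work with -g instead, as in the proof
        sign, alpha, beta, gamma, delta = -1, -alpha, -beta, -gamma, -delta
    n = -(delta // gamma)                # so that 0 <= n*gamma + delta <= gamma - 1
    c, d = gamma, n * gamma + delta
    a, b = alpha, n * alpha + beta
    _, a_cd, v = ext_gcd(d, c)           # a_cd * d + v * c = 1, so b_cd = -v
    b_cd = -v
    m = (a_cd - a) // c                  # T^m * (a, b; c, d) = (a_cd, b_cd; c, d)
    M = [[a_cd, b_cd], [c, d]]
    rebuilt = mat_mul(mat_mul(T_power(-m), M), T_power(-n))
    if sign == -1:
        rebuilt = [[-x for x in row] for row in rebuilt]
    assert rebuilt == g, (g, rebuilt)
    return sign, -m, (c, d), -n          # g = sign * T^(-m) * M(c,d) * T^(-n)

print(bruhat([[7, 3], [9, 4]]))          # 7*4 - 3*9 = 1
print(bruhat([[-2, -1], [5, 2]]))        # (-2)*2 - (-1)*5 = 1
print(bruhat([[1, 0], [-4, 1]]))         # lower-left entry negative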

Corollary 20.

Ek(z)=(abcd)±T\SL2(Z)(cz+d)k=1+c10dc1gcd(c,d)=1mZ(c(z+m)+d)k.E_k(z) = \sum_{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} (cz+d)^{-k} = 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} (c(z+m)+d)^{-k}.

Proof. By definition, Ek(z)=(abcd)±T\SL2(Z)(cz+d)kE_k(z) = \sum_{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \langle \pm T \rangle \backslash \operatorname{SL}_2(\mathbb{Z})} (cz+d)^{-k}. Parametrize the cosets by Bruhat decomposition to get

Ek(z)=(abcd){(1001)}c10dc1gcd(c,d)=1(ac,dbc,dcd)T(cz+d)k=1+c10dc1gcd(c,d)=1(abcd)(ac,dbc,dcd)T(cz+d)k=1+c10dc1gcd(c,d)=1mZ(c(z+m)+d)k,\begin{align*} E_k(z) &= \sum_{\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} \in \left\{\begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}\right\} \sqcup \bigsqcup_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d)=1}}\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix} \langle T \rangle} (c'z+d')^{-k} \\ &= 1+ \sum_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} \in \begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}\langle T \rangle} (c'z+d')^{-k} \\ &= 1+ \sum_{\substack{c \geq 1\\0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} (c(z+m)+d)^{-k}, \end{align*}

where the last equality comes from the fact that T={Tm ⁣:mZ}\langle T \rangle = \{T^m \colon m \in \mathbb{Z}\} and

(abcd)=(ac,dbc,dcd)Tm=(ac,dmac,d+bc,dcmc+d)\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} =\begin{pmatrix}a_{c,d} & b_{c,d}\\c & d\end{pmatrix}T^m = \begin{pmatrix}a_{c,d} & ma_{c,d} + b_{c,d}\\c & mc+d\end{pmatrix}

varies over all mZm \in \mathbb{Z}. \square

Now we look at the innermost sum and use the next lemma to apply Poisson summation formula there.

Lemma 21. Put \varphi(x) := (c(z+x)+d)^{-k}. Then, for t > 0, \hat{\varphi}(t) = (2\pi i)^k t^{k-1}c^{-k} e^{2 \pi i (d/c+z)t} \frac{1}{(k-1)!}, while \hat{\varphi}(t) = 0 for t \leq 0.

Proof. This is nothing more than computing the Fourier transform for φ\varphi. However, even if it looks trivial, we still need to do some work. We have

φ^(t)=φ(x)e2πixtdx=(c(z+x)+d)ke2πixtdx.\hat{\varphi}(t) = \int_{-\infty}^\infty \varphi(x) e^{-2 \pi i x t} \mathrm{d}x = \int_{-\infty}^\infty (c(z+x)+d)^{-k} e^{-2\pi i x t} \mathrm{d}x.

Put u:=c(z+x)+du := c(z+x)+d. Obviously this is a C1\mathcal{C}^1-diffeomorphism, with du=cdx\mathrm{d}u = c \mathrm{d}x, and so we get

φ^(t)=icyicy+uke2πi(udcz)t1cdu=1ce2πi(dc+z)ticyicy+uke2πi1cutdu.\hat{\varphi}(t) = \int_{icy-\infty}^{icy+\infty} u^{-k}e^{-2\pi i(\frac{u-d}{c}-z)t} \frac{1}{c} \mathrm{d}u = \frac{1}{c} e^{2\pi i (\frac{d}{c}+z)t}\int_{icy-\infty}^{icy+\infty} u^{-k} e^{-2\pi i \frac{1}{c}ut} \mathrm{d}u.

Put v = 2 \pi i \frac{1}{c} ut, and assume for now that t > 0. This is again a \mathcal{C}^1 change of variable, with \mathrm{d}v = 2\pi i \frac{t}{c} \mathrm{d}u, and the new contour is the vertical line \operatorname{Re}(v) = -2\pi t y, traversed upward, so

\hat{\varphi}(t) = \frac{1}{c}e^{2\pi i(\frac{d}{c}+z)t} \int_{-2\pi ty-i\infty}^{-2 \pi ty+i\infty} \left(\frac{vc}{2 \pi it}\right)^{-k} e^{-v} \frac{\mathrm{d}v}{2 \pi i \frac{t}{c}} = e^{2 \pi i(\frac{d}{c}+z)t} c^{-k} (2\pi it)^{k-1} \int_{-2\pi ty-i\infty}^{-2\pi ty+i\infty} v^{-k}e^{-v} \mathrm{d}v.

Now we're left with computing the integral I_a := \int_{-a-i\infty}^{-a+i\infty} v^{-k} e^{-v} \mathrm{d}v, where a = 2\pi t y; note that a > 0 precisely because t > 0 and y > 0. For a > 0, consider the contour C_{a, R} made of the vertical segment from -a+iR down to -a-iR together with the right half-circle of radius R centered at -a, traversed counterclockwise from -a-iR (through -a+R) back to -a+iR. For R > a this closed contour winds once, positively, around the only singularity of v \mapsto v^{-k}e^{-v}, namely the pole at v = 0. By the residue theorem, the contour integral equals 2\pi i times the residue there. Observe that, near zero, we have a Laurent expansion

vkev=vkn0(v)nn!=n0(1)nvnkn!.v^{-k}e^{-v} = v^{-k} \sum_{n \geq 0} \frac{(-v)^n}{n!} = \sum_{n \geq 0} \frac{(-1)^nv^{n-k}}{n!}.

So the residue, that is, the (1)(-1)-th term, is (1)k1(k1)!\frac{(-1)^{k-1}}{(k-1)!}. Since kk is known to be even, the residue is 1(k1)!\frac{-1}{(k-1)!}. We get

(2 \pi i)\frac{-1}{(k-1)!} = \int_{C_{a,R}} v^{-k} e^{-v} \mathrm{d}v = \int_{-a+iR}^{-a-iR}v^{-k}e^{-v} \mathrm{d}v + \int_{\mathrm{arc}} v^{-k} e^{-v} \mathrm{d}v.

Parametrize the arc as v = -a+Re^{i\theta} with \theta running from -\frac{\pi}{2} to \frac{\pi}{2} (a \mathcal{C}^1 change of variable with \mathrm{d}v = iRe^{i\theta}\mathrm{d}\theta), so that the arc integral becomes

\int_{-\pi/2}^{\pi/2} (-a+Re^{i \theta})^{-k}\, e^{a - R\cos\theta - iR\sin\theta}\, iRe^{i\theta}\, \mathrm{d}\theta.

On the arc we have \operatorname{Re}(v) \geq -a, hence |e^{-v}| \leq e^{a}, while |-a+Re^{i\theta}| \geq R-a. The arc integral is therefore bounded in absolute value by \pi R \cdot e^a (R-a)^{-k}, which tends to zero as R \to \infty since k \geq 2. The vertical segment, traversed downward, tends to -I_a as R \to \infty. Taking limits on both sides, we get

2πi(k1)!=Ia.\frac{-2\pi i}{(k-1)!} = -I_a.

that is, I_a = \frac{2\pi i}{(k-1)!}, independently of a > 0. Plugging this back into \hat{\varphi}(t) = e^{2 \pi i(\frac{d}{c}+z)t} c^{-k} (2\pi it)^{k-1} I_{2 \pi t y} gives

φ^(t)=e2πi(dc+z)tck(2πit)k12πi(k1)!.\hat{\varphi}(t) = e^{2\pi i (\frac{d}{c}+z)t} c^{-k} (2\pi it)^{k-1} \frac{2 \pi i}{(k-1)!}.

for t > 0, which is exactly the claimed formula. Finally, for t \leq 0, go back to the original integral \hat{\varphi}(t) = \int_{-\infty}^\infty (c(z+x)+d)^{-k} e^{-2\pi i x t} \mathrm{d}x. The integrand extends holomorphically to \operatorname{Im}(x) \geq 0 (its only pole sits at x = -z - d/c, below the real axis), is O((1+|x|)^{-k}) there, and satisfies |e^{-2\pi i x t}| = e^{2\pi t \operatorname{Im}(x)} \leq 1 when t \leq 0. Shifting the line of integration to \operatorname{Im}(x) = A and letting A \to \infty therefore shows that \hat{\varphi}(t) = 0 for t \leq 0. This completes the proof. \square

In order to apply the Poisson summation formula, we must check that the function \varphi satisfies the hypotheses. First, it is easy to see that it's \mathcal{C}^1, since \operatorname{Im}(z) > 0 and so c(z+x)+d never vanishes for real x. Next, \varphi(x) = O((1+|x|)^{-k}), so the integral of its absolute value over the reals converges. We have \varphi'(x) = -kc(c(z+x)+d)^{-k-1}, and hence its integral also converges. Finally, for fixed x, \varphi'(n+x) = O(|n|^{-k-1}), so the sum \sum_{n \in \mathbb{Z}} \varphi'(n+x) converges locally uniformly absolutely. This allows us to apply the Poisson summation formula. Note also that, by Lemma 21, \hat{\varphi}(m) = 0 for m \leq 0, so only the terms with m \geq 1 survive. We have

\begin{align*} E_k(z) &= 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} \varphi(m) = 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \in \mathbb{Z}} \hat{\varphi}(m) \\ &= 1 + \sum_{c \geq 1} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} \sum_{m \geq 1} (2\pi i)^k m^{k-1}c^{-k} e^{2 \pi i (d/c+z)m} \frac{1}{(k-1)!} \\ &= 1 + \frac{(2\pi i)^k}{(k-1)!} \sum_{m \geq 1} e^{2 \pi i m z} m^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2 \pi i m d/c}. \end{align*}

Lemma 22. Fix c1c \geq 1 and vary dd. We have

0dc1gcd(c,d)=1e2πimd/c=δgcd(c,m)μ(c/δ)δ.\sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2 \pi i m d/c} = \sum_{\delta \mid \gcd(c, m)} \mu(c/\delta)\delta.

Proof. Define Rn(c):=0dc1gcd(c,d)=1e2πind/cR_n(c) := \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2\pi i nd/c}. First, observe that the full sum (without the coprime constraint) is easier to calculate. We have

0dc1e2πiNd/c={c;cN0;cN.\sum_{0 \leq d \leq c-1} e^{2 \pi i N d/c} = \begin{cases}c &; c \mid N\\0 &; c \nmid N.\end{cases}

Why? The first case is clear: if cc divides NN then the exponent is 2πi2\pi i times an integer, and so, by periodicity we get e0=1e^0 = 1. Summing over cc times, we get cc. Now, if cc doesn’t divide NN, consider the identity 1+z++zc1=zc1z11+z+\dots+z^{c-1} = \frac{z^c-1}{z-1}. The complex number z=e2πiN/cz = e^{2\pi i N/c} is not equal to 11, and so by plugging into this identity, the right hand side becomes zero. Now, we have

0dc1e2πiNd/c=δc0dc1gcd(d,c)=δe2πiNd/c=δc0d<cδgcd(d,cδ)=1e2πiNdδcδδ=δcRN(cδ).\sum_{0 \leq d \leq c-1} e^{2 \pi i N d/c} = \sum_{\delta \mid c} \sum_{\substack{0 \leq d \leq c-1\\\gcd(d, c) = \delta}} e^{2\pi i N d/c} = \sum_{\delta \mid c} \sum_{\substack{0 \leq d' <\frac{c}{\delta}\\\gcd(d', \frac{c}{\delta}) = 1}} e^{2 \pi i N \frac{d'\delta}{\frac{c}{\delta} \delta}} = \sum_{\delta \mid c} R_N\left(\frac{c}{\delta}\right).

In Dirichlet convolution notation, this means id1N=1RN\mathrm{id}\mathbf{1}_{\cdot \mid N} = 1 \star R_N. Apply Möbius inversion to see that

μid1N=μ1RN=1=1RN=RN.\mu \star \mathrm{id}\mathbf{1}_{\cdot \mid N} = \mu \star 1 \star R_N = \mathbf{1}_{=1} \star R_N = R_N.

Put N:=mN := m and observe that

Rm(c)=(id1mμ)(c)=δcid1m(δ)μ(c/δ)=δcδmδμ(cδ)=δgcd(c,m)δμ(c/δ).R_m(c) = (\mathrm{id}\mathbf{1}_{\cdot \mid m} \star \mu)(c) = \sum_{\delta \mid c} \mathrm{id}\mathbf{1}_{\cdot \mid m}(\delta) \mu(c/\delta) = \sum_{\substack{\delta \mid c\\\delta \mid m}} \delta\mu\left(\frac{c}{\delta}\right) = \sum_{\delta \mid \gcd(c,m)} \delta \mu(c/\delta).

This completes the proof. \square
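A quick numerical check of Lemma 22 (throwaway code; the Möbius helper is restated so that the snippet is self-contained):

# Check: sum over 0 <= d < c with gcd(c, d) = 1 of e^(2 pi i m d / c)
# equals the divisor sum over delta | gcd(c, m) of mu(c/delta) * delta.
import cmath, math

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:               # a square factor
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def ramanujan_direct(c, m):
    return sum(cmath.exp(2j * math.pi * m * d / c)
               for d in range(c) if math.gcd(c, d) == 1)

def ramanujan_formula(c, m):
    g = math.gcd(c, m)
    return sum(mobius(c // delta) * delta for delta in range(1, g + 1) if g % delta == 0)

for c in range(1, 30):
    for m in range(30):
        assert abs(ramanujan_direct(c, m) - ramanujan_formula(c, m)) < 1e-8
print("Ramanujan sum identity checked for c, m < 30")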

Continuing from where we left, applying the previous lemma, we have

\begin{align*} E_k(z) &= 1 + \frac{(2\pi i)^k}{(k-1)!} \sum_{m \geq 1} e^{2 \pi i m z} m^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\substack{0 \leq d \leq c-1\\\gcd(c,d)=1}} e^{2 \pi i m d/c} \\ &= 1 + \frac{(2\pi i)^k}{(k-1)!} \sum_{m \geq 1} e^{2 \pi i m z} m^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, m)} \mu(c/\delta)\delta. \end{align*}

Comparing with our template of Fourier expansion Ek(z)=n0ane2πinzE_k(z) = \sum_{n \geq 0} a_n e^{2 \pi i n z}, we have a0=1a_0 = 1 and

an=(2πi)k(k1)!nk1c1ckδgcd(c,n)μ(c/δ)δa_n = \frac{(2\pi i)^k}{(k-1)!} n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu(c/\delta)\delta

for n1n \geq 1. We can simplify this by the next lemma.

Lemma 23.

nk1c1ckδgcd(c,n)μ(cδ)δ=σk1(n)ζ(k).n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu\left(\frac{c}{\delta}\right) \delta = \frac{\sigma_{k-1}(n)}{\zeta(k)}.

Proof. We have

nk1c1ckδgcd(c,n)μ(cδ)δ=nk1δnc1δcckμ(cδ)δ=nk1δnc1(cδ)kμ(c)δ=nk1δnδ1kc1ckμ(c).\begin{align*} &n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu\left(\frac{c}{\delta}\right) \delta \\ &= n^{k-1} \sum_{\delta \mid n} \sum_{\substack{c \geq 1\\\delta \mid c}} c^{-k} \mu\left(\frac{c}{\delta}\right) \delta \\ &= n^{k-1} \sum_{\delta \mid n} \sum_{c' \geq 1} (c'\delta)^{-k} \mu(c') \delta \\ &= n^{k-1} \sum_{\delta \mid n} \delta^{1-k}\sum_{c' \geq 1} c'^{-k} \mu(c'). \end{align*}

Now observe that kc1ckμ(c)k \mapsto \sum_{c' \geq 1} c'^{-k} \mu(c') is actually the Dirichlet series for μ\mu, evaluated at k4k \geq 4. Since μ=O(1)\mu = O(1), the Dirichlet series absolutely converges for input greater than one. Also, note that D1(s)D_1(s) absolutely converges for s>1s > 1, and so, we have Dμ(s)D1(s)=Dμ1(s)D_\mu(s)D_1(s) = D_{\mu \star 1}(s) for s>1s > 1. In particular, putting s=k4s = k \geq 4 works. And by the Möbius inversion, μ1=1=1\mu \star 1 = \mathbf{1}_{=1}, so, we have Dμ(k)D1(k)=D1=1(k)=11k=1D_\mu(k)D_1(k) = D_{\mathbf{1}_{=1}}(k) = \frac{1}{1^k} = 1. But we have D1(s)=n11ns=ζ(s)D_1(s) = \sum_{n \geq 1} \frac{1}{n^s} = \zeta(s), so Dμ(k)=1ζ(k)D_\mu(k) = \frac{1}{\zeta(k)}. This means

nk1δnδ1kc1ckμ(c)=nk1δnδ1kDμ(k)=nk1δnδ1k1ζ(k)=1ζ(k)δn(n/δ)k1=1ζ(k)δnδk1=σk1(n)ζ(k).\begin{align*} &n^{k-1} \sum_{\delta \mid n} \delta^{1-k}\sum_{c' \geq 1} c'^{-k} \mu(c') \\ &= n^{k-1} \sum_{\delta \mid n} \delta^{1-k} D_\mu(k) \\ &= n^{k-1} \sum_{\delta \mid n} \delta^{1-k} \frac{1}{\zeta(k)} \\ &= \frac{1}{\zeta(k)} \sum_{\delta \mid n} (n/\delta)^{k-1} \\ &= \frac{1}{\zeta(k)} \sum_{\delta' \mid n} \delta'^{k-1} \\ &= \frac{\sigma_{k-1}(n)}{\zeta(k)}. \end{align*}

And this completes the proof. \square

With this lemma in place, we apply it to our Fourier coefficients to get

an=(2πi)k(k1)!nk1c1ckδgcd(c,n)μ(c/δ)δ=(2πi)kσk1(n)(k1)!ζ(k).\begin{align*} a_n &= \frac{(2\pi i)^k}{(k-1)!} n^{k-1} \sum_{c \geq 1} c^{-k} \sum_{\delta \mid \gcd(c, n)} \mu(c/\delta)\delta \\ &= \frac{(2\pi i)^k \sigma_{k-1}(n)}{(k-1)! \zeta(k)}. \end{align*}

This completes the Proof of Theorem 17. \square
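As a sanity check of Theorem 17 (a throwaway script; the zeta values are obtained by crude truncation rather than quoted), we can build truncated q-expansions of E_4 and E_8 from the coefficient formula and verify the identity E_4^2 = E_8, which must hold because both sides lie in the one-dimensional space \mathcal{M}_8 and have constant term one:

# Truncated q-expansions of E_4 and E_8 from the formula
# a_n = (2 pi i)^k / (zeta(k) (k-1)!) * sigma_{k-1}(n), and a check of E_4^2 = E_8.
import math

def sigma(n, p):
    return sum(d**p for d in range(1, n + 1) if n % d == 0)

def zeta(s, terms=100000):               # crude truncation of the Dirichlet series
    return sum(1.0 / m**s for m in range(1, terms + 1))

def eisenstein_coeffs(k, N):
    const = (2j * math.pi)**k / (zeta(k) * math.factorial(k - 1))
    return [1.0] + [(const * sigma(n, k - 1)).real for n in range(1, N + 1)]

N = 12
e4, e8 = eisenstein_coeffs(4, N), eisenstein_coeffs(8, N)
e4_squared = [sum(e4[i] * e4[n - i] for i in range(n + 1)) for n in range(N + 1)]
for n in range(N + 1):
    assert abs(e4_squared[n] - e8[n]) <= 1e-3 * max(1.0, abs(e8[n]))
print(e4[:4])   # approximately [1, 240, 2160, 6720]
print(e8[:4])   # approximately [1, 480, 61920, 1050240]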

That was a long, hard, and quite dry journey. But finally we’ve obtained the coefficients for EkE_k. Let us now begin the real thing (finally) about representing an integer into sum of eight squares.

Theta Functions

The actual theory of theta functions is quite vast and could be generalized heavily (even into the case of modular forms of half-integer weights, which we’re not going to touch for now). We begin with defining our favorite theta function:

θ(z):=nZe2πin2z\theta(z) := \sum_{n \in \mathbb{Z}} e^{2 \pi i n^2 z}

What if we raise this to the eighth power? We get

θ8(z)=(nZe2πin2z)8=(x1,,x8)Z8e2πi(x12++x82)z.\theta^8(z) = \left(\sum_{n \in \mathbb{Z}} e^{2 \pi i n^2 z}\right)^8 = \sum_{(x_1, \dots, x_8) \in \mathbb{Z}^8} e^{2 \pi i (x_1^2 + \dots + x_8^2)z}.

If we can reparametrize the sum as a Fourier expansion

θ8(z)=n0ane2πinz,\theta^8(z) = \sum_{n \geq 0} a_n e^{2\pi i n z},

then, by comparing the coefficients, the nnth Fourier coefficient is precisely the number of ways to represent nn into sum of eight squares! Now our goal is to compute the Fourier coefficients of θ8\theta^8 efficiently. The idea is to first show that θ8\theta^8 satisfies some automorphy condition (which is a bit weaker than our classical modularity condition), and consider the whole space of forms that satisfy this condition. Once we can show that it’s a vector space, and we can compute a basis for this, then we can write θ8\theta^8 as a linear combination of elements from the basis. And if we manage to find an explicit formula for the Fourier coefficients of such basis, then we get an explicit formula for ana_n!
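Before doing anything clever, we can at least compute the first few of these coefficients by brute force, multiplying out the truncated power series; this is just a repackaging of the naive counting problem we started with (a throwaway script):

# First few Fourier coefficients of theta^8: the coefficient of q^n is the
# number of ways to write n as a sum of eight squares.
N = 20
M = int(N**0.5) + 1
theta = [0] * (N + 1)                 # truncated q-expansion of theta
for m in range(-M, M + 1):
    if m * m <= N:
        theta[m * m] += 1

series = [1] + [0] * N                # start from the constant series 1
for _ in range(8):                    # multiply by theta eight times
    series = [sum(series[i] * theta[n - i] for i in range(n + 1))
              for n in range(N + 1)]

print(series[:9])
# [1, 16, 112, 448, 1136, 2016, 3136, 5504, 9328]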

Let us define φz(x):=e2πix2z\varphi_z(x) := e^{2 \pi i x^2 z} for zHz \in \mathbb{H}. We see that θ(z)=nZφz(n)\theta(z) = \sum_{n \in \mathbb{Z}} \varphi_z(n). This is where we want to apply the Poisson summation formula. Observe that φz\varphi_z is smooth and decays (even more rapidly than) exponentially. Just this fact already shows that φz\varphi_z is C1\mathcal{C}^1, L1(R)L^1(\mathbb{R}), and that φz\varphi'_z is L1(R)L^1(\mathbb{R}) with nZφz(n+x)\sum_{n \in \mathbb{Z}} \varphi'_z(n+x) absolutely convergent. This allows us to apply the Poisson summation formula. Let us first compute φ^z(t):=φz(x)e2πixtdx\hat{\varphi}_z(t) := \int_{-\infty}^\infty \varphi_z(x) e^{-2 \pi i x t} \mathrm{d}x. We have

φ^z(t)=e2πix2ze2πixtdx.\hat{\varphi}_z(t) = \int_{-\infty}^\infty e^{2 \pi i x^2 z} e^{-2 \pi i x t} \mathrm{d}x.

We want to transform the main term into something like eπu2e^{-\pi u^2} so that we can apply Lemma 15. To do this we have to put u:=i2izxu := i\sqrt{2 iz}x. But \sqrt{\cdot} is not well-defined in C\mathbb{C}! Luckily, we observe that since zHz \in \mathbb{H}, Re(iz)<0\operatorname{Re}(iz) < 0 and we can define the square root locally inside the half-plane {wC ⁣:Re(w)<0}\{w \in \mathbb{C} \colon \operatorname{Re}(w) < 0\} as reiθreiθ/2re^{i\theta} \mapsto \sqrt{r}e^{i\theta/2}. This makes the change of variable a well-defined C1\mathcal{C}^1-diffeomorphism. We get

φ^z(t)=iiziizeπu2e2πiui2iztdui2iz\hat{\varphi}_z(t) = \int_{-i\sqrt{iz}\infty}^{i\sqrt{iz}\infty} e^{-\pi u^2} e^{-2\pi i \frac{u}{i\sqrt{2iz}} t} \frac{\mathrm{d}u}{i\sqrt{2iz}}

Now we're stuck. But if we assume z is a purely imaginary number, then we have i\sqrt{2iz} \in \mathbb{R}_{< 0}, and so in such case we have

φ^z(t)=1i2izeπu2e2πiuti2izdu=1i2izg^(ti2iz)=1i2izg(ti2iz)=1i2izeπt22iz=i2izeπi12zt2.\begin{align*} \hat{\varphi}_z(t) &= \frac{1}{i\sqrt{2iz}}\int_{\infty}^{-\infty} e^{-\pi u^2} e^{-2\pi i u \frac{t}{i\sqrt{2iz}}} \mathrm{d}u\\ &= -\frac{1}{i\sqrt{2iz}} \hat{g}\left(\frac{t}{i\sqrt{2iz}}\right) \\ &= -\frac{1}{i\sqrt{2iz}} g\left(\frac{t}{i\sqrt{2iz}}\right) \\ &= -\frac{1}{i\sqrt{2iz}} e^{-\pi \frac{t^2}{-2iz}} = \frac{i}{\sqrt{2iz}} e^{-\pi i \frac{1}{2z} t^2}. \end{align*}

This means

θ(iy)=nZφiy(n)=nZφ^iy(n)=nZi2ye2πin214iy=12yθ(14iy).\theta(iy) = \sum_{n \in \mathbb{Z}} \varphi_{iy}(n) = \sum_{n \in \mathbb{Z}} \hat{\varphi}_{iy}(n) = \sum_{n \in \mathbb{Z}} \frac{i}{\sqrt{-2y}} e^{2\pi i n^2\frac{-1}{4iy}} = \frac{1}{\sqrt{2y}} \theta\left(-\frac{1}{4iy}\right).

More precisely, for any zHz \in \mathbb{H} with Re(z)=0\operatorname{Re}(z) = 0, we have

θ(z)=12izθ(14z)\theta(z) = \frac{1}{\sqrt{-2iz}} \theta\left(-\frac{1}{4z}\right)

where this square root is the square root function for real numbers. We want to extend this result to other zHz \in \mathbb{H}.

Proposition 24. For all z \in \mathbb{H}, with \sqrt{-2iz} defined through the principal branch \sqrt{w} := \sqrt{|w|}\, e^{i \arg(w)/2} for \arg(w) \in (-\frac{\pi}{2}, \frac{\pi}{2}) (this makes sense because \operatorname{Re}(-2iz) > 0, and it agrees with the usual real square root on the positive reals), we have \theta(z) = \frac{1}{\sqrt{-2iz}} \theta\left(-\frac{1}{4z}\right).

Proof. Define Φ(z):=θ(z)12izθ(14z)\Phi(z) := \theta(z) - \frac{1}{\sqrt{-2iz}}\theta(-\frac{1}{4z}) for zHz \in \mathbb{H}. We’ve seen that φz(x)\varphi_z(x) decays rapidly, and so θ\theta is holomorphic on H\mathbb{H}. Since z2izz \mapsto \sqrt{-2iz}, defined that way, is also holomorphic on zHz \in \mathbb{H}, and is never zero, we see that Φ\Phi is holomorphic on H\mathbb{H}. Since it takes value zero on the set {iy ⁣:y>0}\{iy \colon y > 0\}, and this set has an accumulation point on H\mathbb{H}, apply the identity theorem for holomorphic functions to see that Φ0\Phi \equiv 0 on H\mathbb{H}. This completes the proof. \square

As a result, we observe that \theta^8 satisfies the following automorphy conditions.

Corollary 25. For all zHz \in \mathbb{H},

  1. θ8(z+1)=θ8(z)\theta^8(z+1) = \theta^8(z)
  2. θ8(14z)=(2z)4θ8(z)\theta^8(-\frac{1}{4z}) = (2z)^4 \theta^8(z)
  3. θ8(z4z+1)=(4z+1)4θ8(z)\theta^8(\frac{z}{4z+1}) = (4z+1)^4\theta^8(z).

Proof. The first one is obvious by periodicity of z \mapsto e^{2 i \pi n^2 z} for each fixed n. For the second one, apply Proposition 24 in the form \theta(-\frac{1}{4z}) = \sqrt{-2iz}\,\theta(z) and raise it to the eighth power, noting that (\sqrt{-2iz})^8 = (-2iz)^4 = 16z^4 = (2z)^4. For the third one, we have (by the second one)

θ8(z4z+1)=θ8(14(4z14z))=(2(4z14z))4θ8(4z14z).\theta^8\left(\frac{z}{4z+1}\right) = \theta^8\left(-\frac{1}{4(\frac{-4z-1}{4z})}\right) = \left(2\left(\frac{-4z-1}{4z}\right)\right)^4\theta^8\left(\frac{-4z-1}{4z}\right).

We also have (by the first one, then the second one),

θ8(4z14z)=θ8(14z)=(2z)4θ8(z).\theta^8\left(\frac{-4z-1}{4z}\right) = \theta^8\left(-\frac{1}{4z}\right) = (2z)^4 \theta^8(z).

Putting this back, we get

θ8(z4z+1)=(2(4z14z))4(2z)4θ8(z)=(4z+1)4θ8(z).\theta^8\left(\frac{z}{4z+1}\right) = \left(2\left(\frac{-4z-1}{4z}\right)\right)^4(2z)^4 \theta^8(z) = (4z+1)^4 \theta^8(z).

This completes the proof. \square

What does this mean? It means that for U := \begin{pmatrix}1 & 0\\4 & 1\end{pmatrix} and the same old T, we have \theta^8 |_4 U = \theta^8 |_4 T = \theta^8; that is, \theta^8 is invariant under the subgroup generated by U and T. This subgroup is strictly smaller than \operatorname{SL}_2(\mathbb{Z}), but it is still structured enough to work in. Let us continue by studying modular forms invariant under it.
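
Here is a small numerical illustration of this invariance (not needed for anything later; slash4 is my ad-hoc helper implementing the weight-4 action on functions, and the test point and truncation level are arbitrary):

import cmath

def theta8(z, N=50):
    # truncated theta(z)^8
    return sum(cmath.exp(2j * cmath.pi * n * n * z) for n in range(-N, N + 1)) ** 8

def slash4(f, g):
    # the weight-4 action: (f |_4 g)(z) = f((az+b)/(cz+d)) * (cz+d)^(-4)
    (a, b), (c, d) = g
    return lambda z: f((a * z + b) / (c * z + d)) * (c * z + d) ** (-4)

U = ((1, 0), (4, 1))
T = ((1, 1), (0, 1))
z = -0.2 + 1.1j
for g in (U, T):
    print(abs(slash4(theta8, g)(z) - theta8(z)))  # both differences are tiny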

Sum of Eight Squares of Integers

Define

Γ0(n):={(abcd)SL2(Z) ⁣:nc}.\Gamma_0(n) := \left\{\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}) \colon n \mid c\right\}.

This is a subgroup of \operatorname{SL}_2(\mathbb{Z}): one can check this by multiplying two elements and observing that the lower-left entry of the product (and of the inverse) is still divisible by n; the computation is spelled out below. We claim that U and T (and a little \pm) generate \Gamma_0(4).
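
Spelling this out (this is the whole check):

\begin{pmatrix}a & b\\c & d\end{pmatrix}\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} = \begin{pmatrix}aa'+bc' & ab'+bd'\\ ca'+dc' & cb'+dd'\end{pmatrix},

and if n \mid c and n \mid c', then n \mid ca'+dc'; likewise the inverse \begin{pmatrix}d & -b\\-c & a\end{pmatrix} has lower-left entry -c, which is still divisible by n.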

The Subgroup Γ0(4)\Gamma_0(4)

Proposition 26. U,T,I=Γ0(4)\langle U, T, -I \rangle = \Gamma_0(4), where I=(1001)-I = \begin{pmatrix}-1 & 0\\0 & -1\end{pmatrix}.

Proof. We’d actually want to mimic the proof of Theorem 1. However, (spoiler alert) the fundamental domain for Γ0(4)\Gamma_0(4) isn’t that nice. It’s a union of six copies of the original fundamental domain, patched for each coset. We’ll see this after the characterization of cosets given in Proposition 27.

So, right now, let us try to do an algebraic proof (without resorting to the action \langle U, T, -I \rangle \times \operatorname{SL}_2(\mathbb{Z}) \to \operatorname{SL}_2(\mathbb{Z})). Put G' := \langle U, T, -I \rangle. The generators belong to \Gamma_0(4), and \Gamma_0(4) is closed under products and inverses, so G' is a subgroup of \Gamma_0(4). We’re left with showing that \Gamma_0(4) \subseteq G'.

Well actually not an algebraic proof, but an algorithmic proof. Let us construct a reduction algorithm (i.e. descent). We claim that for all gΓ0(4)g \in \Gamma_0(4) there exists a finite sequence of generators (i.e. a word) such that g=g1g2gkg =g_1g_2\dots g_k, with gi{U,T,I,U1,T1}g_i \in \{U, T, -I, U^{-1}, T^{-1}\}.

To do this, suppose we’re given an input g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4). We observe that d cannot be zero: since ad - bc = 1 and c is divisible by four (in particular by two), ad is odd, so d is odd. If c = 0, then ad = 1 forces a = d = 1 or a = d = -1, which means g = T^b or g = (-I)T^{-b}, and we’re done with this base case.

Note. We can rephrase this “algorithmic proof” into induction: We perform an induction on NN for the statement “for all g=(abcd)Γ0(4)g=\begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4) such that c+d=N|c|+|d| = N, there exists a finite sequence g1,g2,,gkg_1, g_2, \dots, g_k with gi{U,T,I,U1,T1}g_i \in \{U, T, -I, U^{-1}, T^{-1}\}, such that g=g1g2gkg = g_1g_2\dots g_k”. The base case covers N=0,1N = 0, 1.

Now, we consider a general induction for a given g=(abcd)Γ0(4)g = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4) with c+d=N|c|+|d|=N, supposing that the induction hypothesis holds for 0,,N10, \dots, N-1. Since we’ve cleared the case c=0c = 0, we suppose c0c \ne 0, then we compare c|c| with 2d2|d|. First, it is impossible that c=2d|c| = 2|d| because in such case, since 4c4 \mid c, we see that 2d2 \mid d, and so adbcad-bc is even, but it must satisfy adbc=1ad-bc = 1, a contradiction.

Case A: c<2d|c| < 2|d|. If c<0c < 0 then we replace gg with g(I)g(-I) so that c>0c > 0. By the division algorithm, there exists a unique nZn \in \mathbb{Z} such that 0cn+d<c0 \leq cn+d < c. And so, we get

(abcd):=gTn=(abcd)(1n01)=(aan+bccn+d).\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} := gT^n = \begin{pmatrix}a & b\\c & d\end{pmatrix}\begin{pmatrix}1 & n\\0 & 1\end{pmatrix} = \begin{pmatrix}a & an+b\\c & cn+d\end{pmatrix}.

We see that d=cn+d<c=cd' = cn+d < c = c'. If 0dc20 \leq d' \leq \frac{c'}{2} then we’re done (for now), with the fact that c+dc+c2=c+c2<c+d=N|c'|+|d'| \leq |c'| + |\frac{c'}{2}| = |c| + \frac{|c|}{2} < |c|+|d| = N. Since c+dN1|c'| + |d'| \leq N-1, we can write gTn=g1gkgT^n = g_1 \dots g_k and so g=g1gkTng = g_1 \dots g_k T^{-n}. Otherwise we have c2<d<c\frac{c'}{2} < d' < c'. Observe that we can reflect dd' to d-d' then add cc' back: c2>d>c-\frac{c'}{2} > -d' > -c' and so c2>cd>0\frac{c'}{2} > c'-d' > 0. We have

(abcd):=gTn(I)T1=(abcd)(1001)(1101)=(aabccd),\begin{pmatrix}a'' & b'' \\ c'' & d''\end{pmatrix} :=gT^n(-I)T^{-1} = \begin{pmatrix}a' & b'\\c' & d'\end{pmatrix}\begin{pmatrix}-1 & 0\\0 & -1\end{pmatrix}\begin{pmatrix}1 & -1\\0 & 1\end{pmatrix} = \begin{pmatrix}-a' & a'-b'\\-c' & c'-d'\end{pmatrix},

with c+d=c+cd<c+c2<c+d=N|c''| + |d''| = |-c'| + |c'-d'| < |c| + \frac{c'}{2} < |c|+|d| = N. By induction hypothesis, write gTn(I)T1=g1gkgT^n(-I)T^{-1} = g_1 \dots g_k and so g=g1gkT(I)Tng = g_1 \dots g_k T (-I) T^{-n}.

Case B: c>2d|c| > 2|d|. If d<0d < 0 we replace gg with g(I)g(-I) so that d>0d > 0. By the division algorithm, there exists a unique nZn \in \mathbb{Z} such that 0c+4dn<4d0 \leq c+4dn < 4d. Fix such nn and put

(abcd):=gUn=(abcd)(104n1)=(a+4bnbc+4dnd).\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} := gU^n = \begin{pmatrix}a & b\\c & d\end{pmatrix} \begin{pmatrix}1 & 0\\4n & 1\end{pmatrix} = \begin{pmatrix}a+4bn & b\\ c+4dn & d\end{pmatrix}.

This means d=dd' = d and 0c=c+4dn<4d0 \leq c' = c+4dn < 4d. If c2dc' \leq 2d' we’re done, because c+d2d+d=2d+d<c+d=N|c'|+|d'| \leq |2d'| + |d'| = 2|d| + |d| < |c|+|d| = N and so the induction hypothesis applies to gUngU^n, and we’re done. Now if c>2dc' > 2d', then put

(abcd):=gUn(I)U1=(abcd)(1001)(1041)=(a+4bbc+4dd).\begin{pmatrix}a'' & b''\\c'' & d''\end{pmatrix} := gU^n(-I)U^{-1} = \begin{pmatrix}a' & b'\\c' & d'\end{pmatrix} \begin{pmatrix}-1 & 0\\0 & -1\end{pmatrix}\begin{pmatrix}1 & 0\\-4 & 1\end{pmatrix} = \begin{pmatrix}-a'+4b' & -b'\\-c'+4d' & -d'\end{pmatrix}.

Since 4d>c>2d4d > c' > 2d', we have 4d<c<2d-4d' < -c' < -2d' and so 0<c+4d<2d0 < -c'+4d' < 2d'. This shows that c+d<2d+d<c+d=N|c''| + |d''| < 2|d| + |d| < |c|+|d|= N. By induction hypothesis, gUn(I)U1gU^n(-I)U^{-1} can be written as a word with each character in {U,T,I,U1,T1}\{U, T, -I, U^{-1}, T^{-1}\}, and so gg is that word multiplied by U(I)UnU(-I)U^{-n} from the right.

This completes the inductive step for c+d=N|c|+|d|=N, and hence completes the strong induction. This shows that Γ0(4)U,T,I\Gamma_0(4) \subseteq \langle U, T, -I\rangle. \square
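
For the record, here is a small Python sketch of the descent just described. It is not literally the case split from the proof: it uses the nearest-integer form of the division step, which shrinks |c| + |d| in the same way; the helper names and the bookkeeping are mine.

from random import choice, seed

T = ((1, 1), (0, 1))
U = ((1, 0), (4, 1))
NEG_I = ((-1, 0), (0, -1))
GEN = {'T': T, 'U': U, '-I': NEG_I}

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def power(A, n):
    # A^n for n in Z (A has determinant 1, so its inverse is an integer matrix)
    if n < 0:
        (a, b), (c, d) = A
        A, n = ((d, -b), (-c, a)), -n
    R = ((1, 0), (0, 1))
    for _ in range(n):
        R = mul(R, A)
    return R

def nearest(p, q):
    # nearest integer to p/q; in our use ties never occur (parity argument from the proof)
    if q < 0:
        p, q = -p, -q
    return (2*p + q) // (2*q)

def decompose(g):
    # write g in Gamma_0(4) as a word in T, U, -I, returned as a list of (symbol, exponent)
    (a, b), (c, d) = g
    assert a*d - b*c == 1 and c % 4 == 0
    steps = []            # right multiplications applied to g, in order
    h = g
    while h[1][0] != 0:   # reduce until the lower-left entry is zero
        c, d = h[1]
        if abs(c) < 2*abs(d):
            n = nearest(-d, c)       # afterwards |c*n + d| < |c|/2
            h, step = mul(h, power(T, n)), ('T', n)
        else:
            n = nearest(-c, 4*d)     # afterwards |c + 4*d*n| < 2|d|
            h, step = mul(h, power(U, n)), ('U', n)
        steps.append(step)
    # base case: h = T^b or (-I) T^(-b)
    word = [('T', h[0][1])] if h[0][0] == 1 else [('-I', 1), ('T', -h[0][1])]
    # undo the reduction: g = h * (last step)^-1 * ... * (first step)^-1
    word += [(s, -n) for s, n in reversed(steps)]
    return word

def evaluate(word):
    R = ((1, 0), (0, 1))
    for s, n in word:
        R = mul(R, power(GEN[s], n))
    return R

# quick test: build a random element of Gamma_0(4) and decompose it
seed(0)
g = ((1, 0), (0, 1))
for _ in range(20):
    g = mul(g, power(GEN[choice('TU')], choice([-2, -1, 1, 2])))
word = decompose(g)
assert evaluate(word) == g
print(g, '=', word)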

But now that we see that \Gamma_0(4) = \langle U, T, -I\rangle, and that \theta^8|_4U = \theta^8|_4 T = \theta^8|_4 (-I) = \theta^8 (the last equality holds because the weight is even, so the automorphy factor for -I is (-1)^{-4} = 1), it becomes clear that \theta^8 is stabilized by the action |_4 of \Gamma_0(4) from the right. This is the "modularity condition" for our subgroup \Gamma_0(4).

Proposition 27. [SL2(Z) ⁣:Γ0(4)]=6[\operatorname{SL}_2(\mathbb{Z}) \colon \Gamma_0(4)] = 6, where the cosets can be represented by

A1=I;A2=(1021);A3=(0110);A4=(0111);A5=(0112);A6=(0113).\begin{align*} A_1 &= I;\\ A_2 &= \begin{pmatrix}1 & 0\\-2 & 1\end{pmatrix};\\ A_3 &= \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix};\\ A_4 &= \begin{pmatrix}0 & -1\\1 & 1\end{pmatrix};\\ A_5 &= \begin{pmatrix}0 & -1\\1 & 2\end{pmatrix};\\ A_6 &= \begin{pmatrix}0 & -1\\1 & 3\end{pmatrix}. \end{align*}

Proof. Observe that for g,gSL2(Z)g, g' \in \operatorname{SL}_2(\mathbb{Z}) to belong to the same coset, i.e., Γ0(4)g=Γ0(4)g\Gamma_0(4)g = \Gamma_0(4)g', it is equivalent to gg1Γ0(4)g'g^{-1} \in \Gamma_0(4). Write g=(abcd)g = \begin{pmatrix}a & b\\c & d\end{pmatrix} and g=(abcd)g' = \begin{pmatrix}a' & b'\\c' & d'\end{pmatrix}. The necessary and sufficient condition is exactly that

(adbcab+bacddccb+da)=(abcd)(dbca)Γ0(4),\begin{pmatrix}a'd - b'c & -a'b + b'a\\c'd - d'c & -c'b + d'a\end{pmatrix} =\begin{pmatrix}a' & b'\\c' & d'\end{pmatrix}\begin{pmatrix}d & -b\\-c & a\end{pmatrix} \in \Gamma_0(4),

or equivalently, that cddcc'd - d'c is divisible by four, i.e. cddc(mod4)c'd \equiv d'c \pmod{4}. Now we vary gg as an input and claim that for any gg there exists g{A1,A2,,A6}g' \in \{A_1, A_2, \dots, A_6\} satisfying cddc(mod4)c'd \equiv d'c \pmod{4}. First of all, suppose c0(mod4)c \equiv 0 \pmod{4} then g=A1g' = A_1 does the job. Now if c1(mod4)c \equiv 1 \pmod{4} we split into four subcases depending on dmod4d \bmod 4.

  1. d0(mod4)d \equiv 0 \pmod {4}: then this forces d0(mod4)d' \equiv 0 \pmod{4}. Pick g=A3g' = A_3.
  2. d1(mod4)d \equiv 1 \pmod{4}: then this forces cd(mod4)c' \equiv d' \pmod{4}. Pick g=A4g' = A_4.
  3. d2(mod4)d \equiv 2 \pmod{4}: then this forces 2cd(mod4)2c' \equiv d' \pmod{4}. Pick g=A5g' = A_5.
  4. d3(mod4)d \equiv 3 \pmod{4}: then this forces 3cd(mod4)3c' \equiv d' \pmod{4}. Pick g=A6g' = A_6.

Now, if c3(mod4)c \equiv 3 \pmod{4} then cddcd(1)(mod4)c'd \equiv d'c \equiv d'(-1) \pmod{4} so c(d)d(mod4)c'(-d) \equiv d' \pmod{4}, i.e., we’ve reduced to the same cases as in c1(mod4)c \equiv 1 \pmod{4}, just, instead of comparing dmod4d \bmod 4, we compare (d)mod4(-d)\bmod 4 and pick suitable g=A3,A4,A5,A6g' = A_3, A_4, A_5, A_6.

The only case left is c \equiv 2 \pmod{4}, where the condition becomes c'd \equiv 2d' \pmod{4}. Observe that d \not\equiv 0, 2 \pmod{4}, because otherwise ad-bc = 1 could not hold. So d \equiv \pm 1 \pmod{4}, and since 2 \equiv -2 \pmod{4}, we are left with solving c' \equiv 2d' \pmod{4}; picking g' = A_2 does the job. This shows that every coset equals one of \Gamma_0(4)A_1, \dots, \Gamma_0(4)A_6. Finally, applying the same congruence criterion to pairs of representatives shows that c_id_j - d_ic_j \not\equiv 0 \pmod{4} for all i \ne j (a quick check of the fifteen pairs), so these six cosets are pairwise distinct, and the index is exactly 6. \square
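
As a sanity check, here is a tiny script (helper names are mine) that locates the coset representative of a given matrix using exactly the congruence c'd \equiv d'c \pmod{4} from the proof, and asserts that exactly one of A_1, \dots, A_6 matches:

from random import choice, seed

S, T = ((0, -1), (1, 0)), ((1, 1), (0, 1))
REPS = [((1, 0), (0, 1)), ((1, 0), (-2, 1)), ((0, -1), (1, 0)),
        ((0, -1), (1, 1)), ((0, -1), (1, 2)), ((0, -1), (1, 3))]  # A_1, ..., A_6

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def coset_index(g):
    # Gamma_0(4) g = Gamma_0(4) A_i  iff  c_i * d - d_i * c is divisible by 4
    c, d = g[1]
    hits = [i for i, A in enumerate(REPS, start=1) if (A[1][0] * d - A[1][1] * c) % 4 == 0]
    assert len(hits) == 1   # exactly one representative matches
    return hits[0]

seed(1)
for _ in range(5):
    g = ((1, 0), (0, 1))
    for _ in range(15):
        g = mul(g, choice([S, T]))
    print(g, '-> coset of A_%d' % coset_index(g))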

Now that we see the coset representatives, we could try to visualize the fundamental domain, which is

DΓ0(4):=i=16Di;Di:=AiD.\mathcal{D}_{\Gamma_0(4)} := \bigcup_{i=1}^6 \mathcal{D}_i; \qquad \mathcal{D}_i := A_i\mathcal{D}.

Here I try to sample points using the following code (where go_to_domain_1 is the algorithm given in Theorem 1 that takes a point to the original fundamental domain):

import random

domain_1_sample = []
for rep in range(10000):
	x = random.uniform(-3, 3)
	y = 2**random.uniform(-5, 5)
	z = go_to_domain_1(x + 1j * y)
	domain_1_sample.append(z)

Then, I construct A1,,A6A_1, \dots, A_6 and let them act on the samples.

g1 = CongruentElement(1, 0, 0, 1)
g2 = CongruentElement(1, 0, -2, 1)
g3 = CongruentElement(0, -1, 1, 0)
g4 = CongruentElement(0, -1, 1, 1)
g5 = CongruentElement(0, -1, 1, 2)
g6 = CongruentElement(0, -1, 1, 3)
D1 = [g1.act(z) for z in domain_1_sample]
D2 = [g2.act(z) for z in domain_1_sample]
D3 = [g3.act(z) for z in domain_1_sample]
D4 = [g4.act(z) for z in domain_1_sample]
D5 = [g5.act(z) for z in domain_1_sample]
D6 = [g6.act(z) for z in domain_1_sample]

Then I plot the points.

[Figure: the sampled points of the six translates \mathcal{D}_1, \dots, \mathcal{D}_6 plotted in the upper half-plane.]

We can see that the fundamental domain looks uglier than before. In principle, we could now redo the integral over this domain (avoiding points with nontrivial stabilizers), derive a valence formula for \Gamma_0(4), and from it deduce the dimension and a basis for the space of modular forms for \Gamma_0(4). I’m simply too lazy for this; it would take a lot of hard work. So I propose that we take a shortcut directly to our special case of \mathcal{M}_4(\Gamma_0(4)), but before that, we should at least define this space properly.

Definition. For a subgroup Γ\Gamma' of SL2(Z)\operatorname{SL}_2(\mathbb{Z}), a cusp is an orbit of the action Γ×P1(Q)P1(Q)\Gamma' \times \mathbb{P}^1(\mathbb{Q}) \to \mathbb{P}^1(\mathbb{Q}) defined by ((abcd),x)ax+bcx+d\left(\begin{pmatrix}a & b\\c & d\end{pmatrix}, x\right) \mapsto \frac{ax+b}{cx+d} with the usual rule that 10=\frac{1}{0} = \infty and 1=0\frac{1}{\infty} = 0.

Proposition 28. For \Gamma' = \operatorname{SL}_2(\mathbb{Z}), this action is transitive, and so there is only one cusp (which we usually represent by \infty).

Proof. Let us show that for any \frac{p}{q} \in \mathbb{P}^1(\mathbb{Q}), there exists \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}) sending \frac{p}{q} to \infty. (If the input is already infinity, we can take the identity matrix and be done, so we assume the input is a finite rational number; furthermore, we assume \gcd(p,q) = 1 without loss of generality.) Consider the equation a(-p) - bq = 1. Since \gcd(p,q) = 1, by Bézout’s identity there exists a pair (a, b) \in \mathbb{Z}^2 satisfying the equation. Now take the matrix \begin{pmatrix}a & b\\q & -p\end{pmatrix} \in \operatorname{SL}_2(\mathbb{Z}). It sends \frac{p}{q} to \frac{a\frac{p}{q}+b}{q\frac{p}{q}-p} = \frac{\frac{ap}{q}+b}{0} = \infty. This works because the numerator isn’t zero: if it were, then ap+bq = 0, contradicting a(-p)-bq = 1. This shows that any rational number can be sent to infinity, and conversely, the inverse of this matrix sends infinity to the input rational number. Finally, one can send any rational number to any other by first sending it to infinity and then coming back via the second one’s matrix. \square

Lemma 29. If ΓSL2(Z)\Gamma' \leq \operatorname{SL}_2(\mathbb{Z}) with Γ\SL2(Z)={Γg1,,Γgk}\Gamma' \backslash \operatorname{SL}_2(\mathbb{Z}) = \{\Gamma'g_1, \dots, \Gamma'g_k\}, then

{OrbΓ(gi) ⁣:i{1,,k}}=Γ\P1(Q),\{\operatorname{Orb}_{\Gamma'}(g_i \infty) \colon i \in \{1, \dots, k\}\} = \Gamma' \backslash \mathbb{P}^1(\mathbb{Q}) ,

i.e., every cusp is the orbit of g_i\infty for some i (though different i’s may give the same cusp).

Proof. The inclusion from left to right is clear: each orbit \operatorname{Orb}_{\Gamma'}(g_i\infty) corresponds to a cusp. The question is whether all cusps are found among these candidates, or whether there could be some cusp not connected to any of the g_i’s. Let us show that this cannot happen. A strong algebraist might even say this is obvious. Let \mathfrak{a} \in \Gamma' \backslash \mathbb{P}^1(\mathbb{Q}) be a cusp, and represent it by some x \in \mathbb{P}^1(\mathbb{Q}). By the previous proposition, there exists g \in \operatorname{SL}_2(\mathbb{Z}) such that g\infty = x. Project g into the cosets, i.e., consider the natural quotient \rho \colon \operatorname{SL}_2(\mathbb{Z}) \to \Gamma'\backslash \operatorname{SL}_2(\mathbb{Z}) and suppose \rho(g) = \Gamma'g_j (so that g = g'g_j for some g' \in \Gamma'). Then \operatorname{Orb}_{\Gamma'}(g_j\infty) = \{g''g_j\infty \colon g'' \in \Gamma'\} \ni g'g_j\infty = g\infty = x. This means \mathfrak{a} = \operatorname{Orb}_{\Gamma'}(x) = \operatorname{Orb}_{\Gamma'}(g_j\infty). \square

Proposition 30. There are three cusps for Γ0(4)\Gamma_0(4), which can be represented by ,0,12\infty, 0, -\frac{1}{2}.

Note. A priori we might not know this, but it is suggested by the numerical plot above. Once guessed, we can try to prove it.

Proof. Let us apply Lemma 29 and compute x_i := A_i\infty. We have

\begin{align*} x_1 &= A_1\infty = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix}\infty = \infty; \\ x_2 &= A_2\infty = \begin{pmatrix}1 & 0\\-2 & 1\end{pmatrix} \infty = \frac{\infty}{-2\infty + 1} = -\frac{1}{2}; \\ x_3 &= A_3\infty = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}\infty = \frac{-1}{\infty} = 0; \\ x_4 &= A_4\infty = \begin{pmatrix}0 & -1\\1 & 1\end{pmatrix}\infty = \frac{-1}{\infty+1} = 0; \\ x_5 &= A_5\infty = \begin{pmatrix}0 & -1\\1 & 2\end{pmatrix}\infty = \frac{-1}{\infty+2} = 0;\\ x_6 &= A_6\infty = \begin{pmatrix}0 & -1\\1 & 3\end{pmatrix}\infty = \frac{-1}{\infty+3} = 0. \end{align*}

These three candidates are pairwise inequivalent. Write a point of \mathbb{P}^1(\mathbb{Q}) as \frac{p}{q} in lowest terms (with \infty = \frac{1}{0}). For \gamma = \begin{pmatrix}a & b\\c & d\end{pmatrix} \in \Gamma_0(4), we have \gamma\frac{p}{q} = \frac{ap+bq}{cp+dq}, and this is again in lowest terms: any common prime divisor of ap+bq and cp+dq would divide d(ap+bq) - b(cp+dq) = p and a(cp+dq) - c(ap+bq) = q. Since 4 \mid c and d is odd (because ad-bc=1 and 4 \mid c), we get cp+dq \equiv dq \pmod{4} and \gcd(dq, 4) = \gcd(q, 4), so \gcd(q, 4) is constant along each orbit. It equals 4, 1, 2 for \infty, 0, -\frac{1}{2} respectively, so these are three distinct cusps. This shows that \Gamma_0(4)\backslash\mathbb{P}^1(\mathbb{Q}) = \{\Gamma_0(4)\infty, \Gamma_0(4)(-\frac{1}{2}), \Gamma_0(4)0\}. \square
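
Here is a quick numerical illustration of the invariant \gcd(q, 4) used in the proof (not a proof; the random-walk setup and helper names are mine):

from math import gcd
from random import choice, seed

T, U = ((1, 1), (0, 1)), ((1, 0), (4, 1))
TI, UI = ((1, -1), (0, 1)), ((1, 0), (-4, 1))
NEG_I = ((-1, 0), (0, -1))

def act(g, pq):
    # g acting on p/q in P^1(Q); points are stored as coprime pairs (p, q), with infinity = (1, 0)
    (a, b), (c, d) = g
    p, q = pq
    P, Q = a * p + b * q, c * p + d * q
    e = gcd(P, Q)
    return (P // e, Q // e) if e else (P, Q)

seed(2)
for pq in [(1, 0), (0, 1), (-1, 2)]:        # the representatives infinity, 0, -1/2
    x = pq
    for _ in range(30):                     # a random walk inside Gamma_0(4)
        x = act(choice([T, TI, U, UI, NEG_I]), x)
    print(pq, '->', x, '  gcd(q, 4) =', gcd(abs(x[1]), 4))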

Now we can do Fourier expansion as before. Recall that in Proposition 2, we have a conformal equivalence

Φa ⁣:{StripaD×{re2πia ⁣:0<r<1}ze2πiz\Phi_a \colon \begin{cases}\text{Strip}_a &\to \mathbb{D}^\times \setminus \{re^{2\pi i a} \colon 0 < r < 1\}\\z &\mapsto e^{2\pi i z}\end{cases}

which allowed us to patch the disk D\mathbb{D}. Now we want to do something similar for each of the cusps (not just infinity). We define f~x\tilde{f}_x as follows.

Proposition 31. Let \Gamma' \leq \operatorname{SL}_2(\mathbb{Z}) be a subgroup of finite index with T \in \Gamma' (for us, \Gamma' = \Gamma_0(4)). If x \in \mathbb{P}^1(\mathbb{Q}) represents a cusp for \Gamma', then by Proposition 28, there exists an element \sigma_x \in \operatorname{SL}_2(\mathbb{Z}) such that \sigma_x \infty = x. There is then a smallest integer h_x \geq 1 (the width of the cusp) such that \sigma_xT^{h_x}\sigma_x^{-1} \in \Gamma'. Analogous to Proposition 2, if f \colon \mathbb{H} \to \mathbb{C} is holomorphic on \mathbb{H} and satisfies the modularity condition (with weight k) for \Gamma', then there exists a holomorphic function \tilde{f}_x \colon \mathbb{D}^\times \to \mathbb{C} such that (f |_k \sigma_x)(z) = \tilde{f}_x(e^{2\pi iz/h_x}) for all z \in \mathbb{H}.

Proof. Let \bar{f}_x := f|_k \sigma_x. First, why does h_x exist? Since \Gamma' has finite index, among the cosets \Gamma'\sigma_xT^j for j \in \{0, 1, \dots, [\operatorname{SL}_2(\mathbb{Z}) \colon \Gamma']\} two must coincide, say \Gamma'\sigma_xT^j = \Gamma'\sigma_xT^{j'} with j > j'; this means \sigma_xT^{j-j'}\sigma_x^{-1} \in \Gamma', so some positive power of T works, and we take h_x to be the smallest one. Write \gamma := \sigma_xT^{h_x}\sigma_x^{-1} \in \Gamma', and note that \gamma stabilizes x:

\gamma x = \sigma_xT^{h_x}\sigma_x^{-1}x = \sigma_xT^{h_x}\infty = \sigma_x\infty = x.

Now, observe that

\bar{f}_x(z+h_x) = (f|_k\sigma_x)(T^{h_x}z) = ((f|_k\sigma_x)|_kT^{h_x})(z)(0z+1)^k = (f|_k\sigma_xT^{h_x})(z) = (f|_k\gamma\sigma_x)(z) = (f|_k\sigma_x)(z) = \bar{f}_x(z),

where the equality before the last follows since f is \Gamma'-invariant under the action |_k (this is the modularity condition for \Gamma'). This shows that \bar{f}_x is periodic with period h_x, and so z \mapsto \bar{f}_x(h_xz) is 1-periodic and holomorphic on \mathbb{H}. Apply Proposition 2 to it to deduce the existence of \tilde{f}_x \colon \mathbb{D}^\times \to \mathbb{C} such that \bar{f}_x(h_xz) = \tilde{f}_x(e^{2 \pi i z}) for all z \in \mathbb{H}, that is, \bar{f}_x(z) = \tilde{f}_x(e^{2\pi i z/h_x}). If f is holomorphic on \mathbb{H}, then \tilde{f}_x is holomorphic on \mathbb{D}^\times. \square
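
For \Gamma_0(4), the widths can be found by brute force. Here is a tiny sketch (helper names are mine); it reports width 1 for \infty, width 4 for 0, and width 1 for -\frac{1}{2}:

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def inv(A):
    # inverse of a determinant-1 integer matrix
    (a, b), (c, d) = A
    return ((d, -b), (-c, a))

def width(sigma):
    # smallest h >= 1 with sigma T^h sigma^{-1} in Gamma_0(4)
    T = ((1, 1), (0, 1))
    h, P = 1, T
    while mul(mul(sigma, P), inv(sigma))[1][0] % 4 != 0:
        h, P = h + 1, mul(P, T)
    return h

sigmas = {'infinity': ((1, 0), (0, 1)),
          '0':        ((0, -1), (1, 0)),
          '-1/2':     ((1, 0), (-2, 1))}
for name, s in sigmas.items():
    print(name, width(s))   # expect 1, 4, 1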

Proposition 32. Assume moreover that k is even. Then whether \tilde{f}_x extends holomorphically to \mathbb{D}, and the value \tilde{f}_x(0) of such an extension, do not depend on the choice of \sigma_x. Moreover, if y \in \mathbb{P}^1(\mathbb{Q}) is another representative of the same cusp as x, then \tilde{f}_y leads to the same conclusion.

Proof. Suppose \sigma'_x \in \operatorname{SL}_2(\mathbb{Z}) is another element sending \infty to x. Then \sigma_x^{-1}\sigma'_x fixes \infty. Writing \sigma_x^{-1}\sigma'_x = \begin{pmatrix}a & b\\c & d\end{pmatrix}, the condition \frac{a\infty+b}{c\infty+d} = \infty forces c = 0 (otherwise the result is \frac{a}{c}, which is finite), and then ad = 1 gives a = d = 1 or a = d = -1; that is, \sigma'_x = \pm\sigma_xT^m for some m \in \mathbb{Z}. The width is unchanged, since \sigma'_xT^h(\sigma'_x)^{-1} = \sigma_xT^mT^hT^{-m}\sigma_x^{-1} = \sigma_xT^h\sigma_x^{-1}. Since k is even, f|_k(-g) = f|_kg, so f|_k\sigma'_x = f|_k\sigma_xT^m, i.e. \bar{f}'_x(z) = \bar{f}_x(z+m) and hence \tilde{f}'_x(q) = \tilde{f}_x(e^{2\pi i m/h_x}q). The two functions differ only by a rotation of the variable, so one extends holomorphically across 0 if and only if the other does, and the values at 0 agree.

Now, suppose y is another representative of the same cusp, so that y = \gamma x for some \gamma \in \Gamma'. We may choose \sigma_y := \gamma\sigma_x. Then \sigma_yT^h\sigma_y^{-1} = \gamma(\sigma_xT^h\sigma_x^{-1})\gamma^{-1} lies in \Gamma' exactly when \sigma_xT^h\sigma_x^{-1} does, so the widths agree, and f|_k\sigma_y = f|_k\gamma\sigma_x = f|_k\sigma_x by modularity, so \tilde{f}_y = \tilde{f}_x for this choice. By the first part, nothing depends on the choice of \sigma_y. \square

This allows us to properly define the space M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)) accordingly.

Definition. The space M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)) is the set of holomorphic functions f ⁣:HCf \colon \mathbb{H} \to \mathbb{C} such that

  1. (Modularity/Automorphy) ff satisfies the modularity condition for Γ0(4)\Gamma_0(4) with weight 44, i.e., for all gΓ0(4)g \in \Gamma_0(4), f4g=ff|_4 g = f.
  2. (Holomorphy) f is holomorphic at each cusp. More precisely, for all x \in \{\infty, 0, -\frac{1}{2}\}, the function \tilde{f}_x \colon \mathbb{D}^\times \to \mathbb{C} from Proposition 31 can be extended analytically to a holomorphic function \mathbb{D} \to \mathbb{C} (by Proposition 32, this condition does not depend on the choices made).

Theorem 33. θ8M4(Γ0(4))\theta^8 \in \mathcal{M}_4(\Gamma_0(4)).

Proof. We’ve already checked that θ8\theta^8 is holomorphic on H\mathbb{H} and satisfies the automorphy for U,T,IU, T, -I, which generate Γ0(4)\Gamma_0(4). We’re left with checking the holomorphy at the cusps. We’ve seen that the series

nZe2πin2z\sum_{n \in \mathbb{Z}} e^{2 \pi i n^2 z}

converges absolutely, and also normally away from the real axis. This allows us to write

θ8(z)=(nZe2πin2z)8=n0ane2πinz\theta^8(z) =\left(\sum_{n \in \mathbb{Z}} e^{2\pi i n^2 z}\right)^8 = \sum_{n \geq 0} a_n e^{2 \pi i n z}

with a_0 = 1 (this is the expansion in e^{2\pi i z}; the cusp \infty has width 1). So \theta^8 is holomorphic at \infty, and \tilde{\theta}^8_\infty extends to \mathbb{D} with value 1 at 0. Now, consider

\tilde{\theta}^8_0(e^{2 \pi i z/4}) = (\theta^8 |_4 \sigma_0)(z)

with \sigma_0 = \begin{pmatrix}0 & -1\\1 & 0\end{pmatrix}; the cusp 0 has width 4, since \sigma_0T^h\sigma_0^{-1} = \begin{pmatrix}1 & 0\\-h & 1\end{pmatrix} lies in \Gamma_0(4) exactly when 4 \mid h. Put z = x+iy and take the limit y \to \infty. We get

limy(θ84σ0)(x+iy)=limyθ8(1x+iy)(x+iy)4.\lim_{y \to \infty} (\theta^8 |_4 \sigma_0)(x+iy) = \lim_{y \to \infty} \theta^8\left(-\frac{1}{x+iy}\right)(x+iy)^{-4}.

Use Corollary 25(2) to see that \theta^8(-\frac{1}{x+iy}) = (\frac{x+iy}{2})^4 \theta^8(\frac{x+iy}{4}), and so

limy(θ84σ0)(x+iy)=limy116(x+iy)4(x+iy)4θ8(x+iy4)=116.\lim_{y \to \infty} (\theta^8 |_4 \sigma_0)(x+iy) = \lim_{y \to \infty} \frac{1}{16}(x+iy)^4(x+iy)^{-4} \theta^8\left(\frac{x+iy}{4}\right) = \frac{1}{16}.

The convergence is uniform in x: indeed, by Corollary 25(2) we have (\theta^8|_4\sigma_0)(z) = \frac{1}{16}\theta^8(\frac{z}{4}) identically, and \theta^8(\frac{z}{4}) \to 1 uniformly as \operatorname{Im}(z) \to \infty. In particular \tilde{\theta}^8_0 is bounded near 0, so by Riemann’s removable singularity theorem we can put \tilde{\theta}^8_0(0) := \frac{1}{16}. Now let us look at the cusp -\frac{1}{2}. We have

\tilde{\theta}_{-\frac{1}{2}}^8(e^{2\pi i z}) = (\theta^8 |_4 \sigma_{-\frac{1}{2}})(z).

Take \sigma_{-\frac{1}{2}} := \begin{pmatrix}1 & 0\\-2 & 1\end{pmatrix} (this cusp has width 1, since \sigma_{-\frac{1}{2}}T\sigma_{-\frac{1}{2}}^{-1} = \begin{pmatrix}3 & 1\\-4 & -1\end{pmatrix} \in \Gamma_0(4)), and so

(θ84σ12)(z)=θ8(z2z+1)(2z+1)4.(\theta^8 |_4 \sigma_{-\frac{1}{2}})(z) = \theta^8\left(\frac{z}{-2z+1}\right)(-2z+1)^{-4}.

limy(θ84σ12)(x+iy)=limyθ8(x+iy2x2iy+1)(2x2iy+1)4.\lim_{y \to \infty} (\theta^8|_4\sigma_{-\frac{1}{2}})(x+iy) = \lim_{y \to \infty}\theta^8\left(\frac{x+iy}{-2x-2iy+1}\right)(-2x-2iy+1)^{-4}.

The way I’ll proceed here is to use Proposition 24 to rewrite \theta^8|_4\sigma_{-\frac{1}{2}} exactly. Put w := \frac{z}{-2z+1} and

v := w + \frac{1}{2} = \frac{2z + (-2z+1)}{2(-2z+1)} = \frac{1}{2(-2z+1)},

and note that v \in \mathbb{H}: its imaginary part is \frac{y}{|-2z+1|^2} > 0. Then, using (-1)^{n^2} = (-1)^n and (-1)^n = 2 \cdot \mathbf{1}_{2 \mid n} - 1 (the rearrangement below is justified by absolute convergence),

\theta(w) = \theta\left(-\frac{1}{2} + v\right) = \sum_{n \in \mathbb{Z}} (-1)^{n} e^{2\pi i n^2 v} = 2\sum_{\substack{n \in \mathbb{Z}\\ 2 \mid n}} e^{2\pi i n^2 v} - \sum_{n \in \mathbb{Z}} e^{2\pi i n^2 v} = 2\theta(4v) - \theta(v).

Now apply Proposition 24 to both terms: \theta(4v) = \frac{1}{\sqrt{-8iv}}\theta(-\frac{1}{16v}) and \theta(v) = \frac{1}{\sqrt{-2iv}}\theta(-\frac{1}{4v}), where \sqrt{-8iv} = 2\sqrt{-2iv} (same branch, as -8iv and -2iv have the same argument). Plugging in v = \frac{1}{2(-2z+1)} gives -\frac{1}{4v} = z - \frac{1}{2} and -\frac{1}{16v} = \frac{z}{4} - \frac{1}{8}, and so

\theta\left(\frac{z}{-2z+1}\right) = \frac{1}{\sqrt{-2iv}}\left(\theta\left(\frac{z}{4} - \frac{1}{8}\right) - \theta\left(z - \frac{1}{2}\right)\right).

Raising this to the eighth power and using (-2iv)^4 = 16v^4 = (-2z+1)^{-4}, the automorphy factor cancels exactly:

(\theta^8|_4\sigma_{-\frac{1}{2}})(z) = \frac{\theta^8(\frac{z}{-2z+1})}{(-2z+1)^4} = \left(\theta\left(\frac{z}{4} - \frac{1}{8}\right) - \theta\left(z - \frac{1}{2}\right)\right)^8.

But both \theta(\frac{z}{4} - \frac{1}{8}) and \theta(z - \frac{1}{2}) tend to 1 as y = \operatorname{Im}(z) \to \infty, uniformly in x = \operatorname{Re}(z), since |\theta(w') - 1| \leq 2\sum_{n \geq 1} e^{-2\pi n^2 \operatorname{Im}(w')} \to 0 as \operatorname{Im}(w') \to \infty. Therefore

\lim_{y \to \infty} (\theta^8|_4\sigma_{-\frac{1}{2}})(x+iy) = (1-1)^8 = 0,

uniformly in x. In particular \tilde{\theta}^8_{-\frac{1}{2}}(q) \to 0 as q \to 0, and \tilde{\theta}^8_{-\frac{1}{2}} is bounded near 0, so (again by Riemann’s removable singularity theorem) we can put \tilde{\theta}^8_{-\frac{1}{2}}(0) := 0. After all this work, we see that \theta^8 is holomorphic at the cusps. This shows that \theta^8 \in \mathcal{M}_4(\Gamma_0(4)). \square
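
As a numerical sanity check of the three cusp values (1 at \infty, \frac{1}{16} at 0, and 0 at -\frac{1}{2}), one can evaluate truncated series along vertical lines; the helpers below are my ad-hoc choices:

import cmath

def theta8(z, N=60):
    return sum(cmath.exp(2j * cmath.pi * n * n * z) for n in range(-N, N + 1)) ** 8

def slash4(f, g):
    (a, b), (c, d) = g
    return lambda z: f((a * z + b) / (c * z + d)) * (c * z + d) ** (-4)

sigma_0 = ((0, -1), (1, 0))
sigma_half = ((1, 0), (-2, 1))
for y in (2.0, 4.0, 8.0):
    z = 0.3 + 1j * y
    print(y, abs(theta8(z)), abs(slash4(theta8, sigma_0)(z)), abs(slash4(theta8, sigma_half)(z)))
# the three columns approach 1, 1/16 = 0.0625, and 0 as y grows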

The Space M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4))

Now, we take the shortcut and try to come up with a basis for M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)). Consider the following proposition.

Proposition 34. If f \in \mathcal{M}_4, define f \circ n := (z \mapsto f(nz)) for a positive integer n; then f, f \circ 2, f \circ 4 \in \mathcal{M}_4(\Gamma_0(4)).

Proof. Take f \in \mathcal{M}_4. Since U, T, -I \in \operatorname{SL}_2(\mathbb{Z}), we immediately get f |_4 U = f|_4 T = f|_4 (-I) = f. Moreover, for any cusp x and any \sigma_x \in \operatorname{SL}_2(\mathbb{Z}) we have f|_4\sigma_x = f, so holomorphy of f at the cusps of \Gamma_0(4) reduces to holomorphy of f at \infty, which we have. This means f \in \mathcal{M}_4(\Gamma_0(4)). Now consider f \circ 2. We have (f \circ 2)(z) = f(2z) = f(2z+2) = (f \circ 2)(z+1), and

(f \circ 2)\left(\frac{z}{4z+1}\right) = f\left(\frac{2z}{4z+1}\right) = f\left(-\frac{4z+1}{2z}\right)\left(\frac{2z}{4z+1}\right)^{-4} = f\left(-\frac{1}{2z}\right)\left(\frac{2z}{4z+1}\right)^{-4} = f(2z)(2z)^4 \left(\frac{2z}{4z+1}\right)^{-4} = (f \circ 2)(z)(4z+1)^4,

using the weight-4 transformation under S twice and the periodicity of f in the middle step (note -\frac{4z+1}{2z} = -2-\frac{1}{2z}). This shows that (f \circ 2)|_4 U = (f \circ 2)|_4 T = (f \circ 2)|_4 (-I) = f\circ 2. Moreover, for each cusp x of \Gamma_0(4), the point 2x again lies in \mathbb{P}^1(\mathbb{Q}) = \operatorname{Orb}_{\operatorname{SL}_2(\mathbb{Z})}(\infty), and since f is a level-one form that is holomorphic at its single cusp, one checks that f \circ 2 is holomorphic at every cusp of \Gamma_0(4). Now consider f \circ 4. We have (f \circ 4)(z+1) = f(4(z+1)) = f(4z+4) = f(4z) = (f \circ 4)(z), and

(f \circ 4)\left(\frac{z}{4z+1}\right) = f\left(\frac{4z}{4z+1}\right) = f\left(1-\frac{1}{4z+1}\right) = f\left(-\frac{1}{4z+1}\right) = (4z+1)^4f(4z+1) = (4z+1)^4f(4z) = (4z+1)^4 (f \circ 4)(z).

This shows that (f \circ 4)|_4 U = (f \circ 4)|_4 T = (f \circ 4)|_4 (-I) = f \circ 4. The same kind of argument gives holomorphy of f \circ 4 at the cusps. \square

Proposition 35. The forms E4,E42,E44M4(Γ0(4))E_4, E_4 \circ 2, E_4 \circ 4 \in \mathcal{M}_4(\Gamma_0(4)) form a linearly independent family.

Proof. We recall their expansions at infinity:

E4(z)=n0ane2πinz;E4(2z)=n0ane2πin2z=n02nan2e2πinz;E4(4z)=n0ane2πin4z=n04nan4e2πinz.\begin{align*} E_4(z) &= \sum_{n \geq 0} a_n e^{2 \pi i n z}; \\ E_4(2z) &= \sum_{n \geq 0} a_n e^{2 \pi i n 2z} = \sum_{\substack{n \geq 0\\2 \mid n}} a_{\frac{n}{2}} e^{2 \pi i n z}; \\ E_4(4z) &= \sum_{n \geq 0} a_n e^{2 \pi i n 4z} = \sum_{\substack{n \geq 0\\4 \mid n}} a_{\frac{n}{4}} e^{2 \pi i n z}. \end{align*}

Suppose there exist \lambda_1, \lambda_2, \lambda_4 \in \mathbb{C} such that \lambda_1E_4 + \lambda_2(E_4 \circ 2) + \lambda_4(E_4 \circ 4) = 0. Then, by comparing the coefficients of index 0, 1, 2, we have

λ1a0+λ2a0+λ4a0=0λ1a1+λ20+λ40=0λ1a2+λ2a1+λ40=0.\begin{align*} \lambda_1 a_0 +\lambda_2 a_0 + \lambda_4 a_0 &= 0\\ \lambda_1a_1 + \lambda_2 0 + \lambda_4 0 &= 0\\ \lambda_1 a_2 + \lambda_2 a_1 + \lambda_4 0 &= 0. \end{align*}

Solving this system, since we’ve seen (Theorem 17) that a0,a1,a20a_0, a_1, a_2 \ne 0, we conclude that λ1=λ2=λ4=0\lambda_1 = \lambda_2 = \lambda_4 = 0, and so they’re linearly independent. \square

In fact, these forms form a basis for M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)). We will prove this.

Lemma 36. If f(z)=n0ane2πinzM4(Γ0(4))f(z) = \sum_{n \geq 0} a_n e^{2\pi i nz} \in \mathcal{M}_4(\Gamma_0(4)) has a0=a1=a2=0a_0 = a_1 = a_2 = 0, then f0f \equiv 0.

Proof. Define F(z) := \prod_{j=1}^6 (f |_4 A_j)(z). Then F(z+1) = \prod_{j=1}^6 (f|_4A_j)(Tz) = \prod_{j=1}^6 (f|_4A_jT)(z), but A_j \mapsto A_jT permutes the cosets. To see this, if \Gamma_0(4)A_jT = \Gamma_0(4)A_{j'}T then A_{j'}TT^{-1}A_j^{-1} \in \Gamma_0(4) means A_{j'}A_j^{-1} \in \Gamma_0(4), i.e. that \Gamma_0(4)A_j = \Gamma_0(4)A_{j'}, which happens only when j = j'. Since \Gamma_0(4)A_j \mapsto \Gamma_0(4)A_jT is a map from \Gamma_0(4) \backslash \operatorname{SL}_2(\mathbb{Z}) to itself, and is injective, we conclude that it is also bijective. This means there exists a permutation \sigma \in \mathfrak{S}_6 such that A_jT \in \Gamma_0(4)A_{\sigma(j)} for each j \in \{1, \dots, 6\}. Pick \gamma_j \in \Gamma_0(4) such that A_jT = \gamma_jA_{\sigma(j)}; then we have

F(z+1)=j=16(f4AjT)(z)=j=16(f4γjAσ(j))(z).F(z+1) = \prod_{j=1}^6 (f|_4A_jT)(z) = \prod_{j=1}^6 (f|_4\gamma_jA_{\sigma(j)})(z).

But fM4(Γ0(4))f \in \mathcal{M}_4(\Gamma_0(4)) so f4γj=ff|_4\gamma_j = f. This gives

F(z+1)=j=16(f4Aσ(j))(z)=j=16(f4Aj)(z)=F(z).F(z+1) = \prod_{j=1}^6 (f|_4A_{\sigma(j)})(z) = \prod_{j=1}^6 (f|_4A_j)(z) = F(z).

Now, the same argument applies for AjAjSA_j \mapsto A_jS. That is, there exists τS6\tau \in \mathfrak{S}_6 such that AjSΓ0(4)Aτ(j)A_j S \in \Gamma_0(4)A_{\tau(j)}. Pick γjΓ0(4)\gamma'_j \in \Gamma_0(4) such that AjS=γjAτ(j)A_jS = \gamma'_jA_{\tau(j)}. Then, we have

\begin{align*} (F|_{24} S)(z) &= F(Sz)z^{-24}\\ &= \prod_{j=1}^6\left[(f|_4A_j)(Sz)\,z^{-4}\right]\\ &= \prod_{j=1}^6(f|_4A_jS)(z)\\ &= \prod_{j=1}^6 (f|_4\gamma'_jA_{\tau(j)})(z)\\ &= \prod_{j=1}^6 (f|_4A_{\tau(j)})(z)\\ &= \prod_{j=1}^6 (f|_4A_j)(z)\\ &= F(z). \end{align*}

This shows that F satisfies the weight-24 modularity condition for \operatorname{SL}_2(\mathbb{Z}). Each factor f|_4 A_j is bounded as \operatorname{Im}(z) \to \infty (this is exactly holomorphy of f at the cusp represented by A_j\infty), so F is bounded as \operatorname{Im}(z) \to \infty; being 1-periodic, F is therefore holomorphic at infinity, and F \in \mathcal{M}_{24}. Since f has a_0 = a_1 = a_2 = 0, we see that \operatorname{ord}_\infty(F) = \sum_{j=1}^6 \operatorname{ord}_\infty(f|_4 A_j) = \operatorname{ord}_\infty(f) + \sum_{j=2}^6 \operatorname{ord}_\infty(f|_4 A_j) \geq \operatorname{ord}_\infty(f), because each of the remaining factors is bounded and hence has nonnegative order. But \tilde{f}(q) = \sum_{n \geq 3} a_n q^n has \operatorname{ord}_0(\tilde{f}) \geq 3, so \operatorname{ord}_\infty(F) \geq \operatorname{ord}_\infty(f) = \operatorname{ord}_0(\tilde{f}) \geq 3. However, recall the valence formula (Theorem 7) for k = 24. We have

k12=ord(F)+12ordi(F)+13orde2πi/3(F)+other zordz(F).\frac{k}{12} = \operatorname{ord}_\infty(F) + \frac{1}{2}\operatorname{ord}_i(F) + \frac{1}{3}\operatorname{ord}_{e^{2\pi i/3}}(F) + \sum_{\text{other } z} \operatorname{ord}_z(F).

Plugging in k = 24 and \operatorname{ord}_\infty(F) \geq 3, we see that 2 \geq 3 + (\text{nonnegative quantity}), a contradiction (the formula applies to nonzero forms). This shows that F must be identically zero, i.e. \prod_{j=1}^6 (f|_4A_j) \equiv 0. Since \mathbb{H} is connected and each factor is holomorphic, some factor f|_4A_j must vanish identically on \mathbb{H}, and then f = (f|_4A_j)|_4A_j^{-1} \equiv 0. \square

Theorem 37. dimM4(Γ0(4))=3\dim \mathcal{M}_4(\Gamma_0(4)) = 3.

Proof. The map

{M4(Γ0(4))C3n0ane2πinz(a0,a1,a2)\begin{cases}\mathcal{M}_4(\Gamma_0(4)) &\to \mathbb{C}^3 \\ \sum_{n \geq 0} a_n e^{2 \pi i n z} & \mapsto (a_0, a_1, a_2)\end{cases}

is linear and injective by Lemma 36, so dimM4(Γ0(4))3\dim \mathcal{M}_4(\Gamma_0(4)) \leq 3. Proposition 35 shows that dimM4(Γ0(4))3\dim \mathcal{M}_4(\Gamma_0(4)) \geq 3. \square

In particular, this means {E4,E42,E44}\{E_4, E_4 \circ 2, E_4 \circ 4\} forms a basis for M4(Γ0(4))\mathcal{M}_4(\Gamma_0(4)).

The Secret Formula

Now that we know that \theta^8 \in \mathcal{M}_4(\Gamma_0(4)) = \operatorname{Span}_\mathbb{C}(\{E_4, E_4 \circ 2, E_4 \circ 4\}), there exist \lambda_1, \lambda_2, \lambda_4 \in \mathbb{C} such that \theta^8 = \lambda_1 E_4 + \lambda_2 (E_4 \circ 2) + \lambda_4(E_4 \circ 4). Comparing the Fourier coefficients at infinity, writing \theta^8(z) = \sum_{n \geq 0} r_n e^{2 \pi i n z} and E_4(z) = \sum_{n \geq 0} a_n e^{2 \pi i n z}, we get

r0=λ1a0+λ2a0+λ4a0r1=λ1a1r2=λ1a2+λ2a1.\begin{align*} r_0 &= \lambda_1 a_0 + \lambda_2 a_0 + \lambda_4 a_0 \\ r_1 &= \lambda_1 a_1 \\ r_2 &= \lambda_1 a_2 + \lambda_2 a_1. \end{align*}

We can run a very basic computer program (as written at the beginning of the article) to compute r0,r1,r2r_0, r_1, r_2. We have r0=1;r1=16;r2=112.r_0 = 1; r_1 = 16; r_2 = 112. Recall Theorem 17: for n1n \geq 1,

an=(2πi)kζ(k)(k1)!σk1(n).a_n = \frac{(2\pi i)^k}{\zeta(k)(k-1)!} \sigma_{k-1}(n).

In particular, a0=1;a1=16π4ζ(4)3!=240;a2=16π4ζ(4)3!σ3(2)=240(13+23)=2160a_0 = 1; a_1 = \frac{16\pi^4}{\zeta(4)3!} = 240; a_2 = \frac{16\pi^4}{\zeta(4)3!}\sigma_3(2) = 240 \cdot (1^3 + 2^3) = 2160, recalling the fact that ζ(4)=π490\zeta(4) = \frac{\pi^4}{90}. We’re left with solving the system

1=λ1+λ2+λ416=λ1240112=λ12160+λ2240.\begin{align*} 1 &= \lambda_1 + \lambda_2 + \lambda_4 \\ 16 &= \lambda_1 \cdot 240 \\ 112 &= \lambda_1 \cdot 2160 + \lambda_2 \cdot 240. \end{align*}

We get \lambda_1 = \frac{1}{15}, \lambda_2 = -\frac{2}{15}, and so \lambda_4 = \frac{16}{15}. This shows that \theta^8 = \frac{1}{15}E_4 - \frac{2}{15}(E_4 \circ 2) + \frac{16}{15}(E_4 \circ 4). Furthermore, we now have a way to compute r_n!
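
For completeness, here is a short script reproducing this computation with exact arithmetic (the brute-force counter and helper names are mine; it is only meant for tiny n):

from fractions import Fraction
from itertools import product

def r8(n):
    # brute-force count of representations of n as a sum of eight squares (small n only)
    k = int(n ** 0.5)
    return sum(1 for v in product(range(-k, k + 1), repeat=8)
               if sum(x * x for x in v) == n)

def sigma3(n):
    return sum(d ** 3 for d in range(1, n + 1) if n % d == 0)

r0, r1, r2 = 1, r8(1), r8(2)                       # 1, 16, 112
a0, a1, a2 = 1, 240 * sigma3(1), 240 * sigma3(2)   # 1, 240, 2160

l1 = Fraction(r1, a1)
l2 = (Fraction(r2) - l1 * a2) / a1
l4 = 1 - l1 - l2
print(l1, l2, l4)   # 1/15 -2/15 16/15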

Theorem 38. rn=16σ3(n)32σ3(n2)12n+256σ3(n4)14nr_n = 16 \sigma_3(n) - 32 \sigma_3(\frac{n}{2})\mathbf{1}_{2 \mid n} + 256\sigma_3(\frac{n}{4})\mathbf{1}_{4 \mid n} for all n1n \geq 1.

Proof. We’ve seen that θ8=115E4215(E42)+1615(E44)\theta^8 = \frac{1}{15}E_4 - \frac{2}{15}(E_4 \circ 2) + \frac{16}{15}(E_4 \circ 4). This means rn=115an215an2+1615an4r_n = \frac{1}{15}a_n - \frac{2}{15}a_{\frac{n}{2}} + \frac{16}{15}a_{\frac{n}{4}}, where at:=0a_t := 0 for tNt \notin \mathbb{N}. We’ve also seen that

an=(2πi)4ζ(4)3!σ3(n)=16π4π4906σ3(n)=240σ3(n).a_n = \frac{(2\pi i)^4}{\zeta(4)3!} \sigma_3(n) = \frac{16\pi^4}{\frac{\pi^4}{90} \cdot 6}\sigma_3(n) = 240\sigma_3(n).

This completes the formula. \square

We’re now ready to write an optimized code:

n = int(input())
def sigma3(n):
	# sum of d^3 over the divisors d of n, enumerating divisor pairs up to sqrt(n)
	ret = 0
	for d in range(1, int(n**0.5)+1):
		if d**2 == n:
			ret += d**3
		elif n % d == 0:
			ret += d**3 + (n//d)**3
	return ret
if n == 0:
	print(1)
else:
	# Theorem 38: r_n = 16*sigma3(n) - 32*sigma3(n/2) [if 2 | n] + 256*sigma3(n/4) [if 4 | n]
	t1 = 16*sigma3(n)
	t2 = -32*sigma3(n // 2) if n % 2 == 0 else 0
	t3 = 256*sigma3(n // 4) if n % 4 == 0 else 0
	print(t1+t2+t3)

And yes, this runs in time O(n)O(\sqrt{n})!

What’s next?

From a computational perspective, this is still inefficient! The running time is not polynomial in logn\log n. Is there a better algorithm? Is there a polynomial time algorithm? I don’t know! Sounds like an interesting problem to work on.

From a mathematical perspective, this use case is very limited. Can we generalize? Can we use tools like this to count the number of representations of an integer as a sum of k squares? It turns out that we can generalize a bit! But for odd k, we’ll have to look into half-integer weight modular forms. Also, for larger k there might be complications involving cusp forms, and the dimension of the relevant space might be larger. (I don’t know.)

However, this already looks like a cool computational problem! I hope you enjoy it!

References