The large sieve for self-dual Eisenstein series of varying levels

We prove an essentially optimal large sieve inequality for self-dual Eisenstein series of varying levels. This bound can alternatively be interpreted as a large sieve inequality for rationals ordered by height. The method of proof is recursive, and has some elements in common with Heath-Brown's quadratic large sieve, and the asymptotic large sieve of Conrey, Iwaniec, and Soundararajan.

A particularly interesting choice of $\lambda_{m,n}$ is $\lambda_f(n)$, where $f$ ranges over a family $\mathcal{F}$ of automorphic forms or $L$-functions, $n$ ranges over an interval of positive integers, say $N/2 < n \le N$, and $\lambda_f(n)$ is the $n$-th Dirichlet series coefficient of the $L$-function $L(f, s)$. In this case, we write $\Delta(\mathcal{F}, N)$ for the norm of this large sieve matrix, namely The dual norm $\Delta^*(\mathcal{F}, N)$ is given by The classical multiplicative large sieve inequality concerns the case where $\lambda_f(n) = \chi(n)$, and where the family runs over primitive Dirichlet characters $\chi$ modulo $q$, with $q \le Q$. Applications include the Bombieri-Vinogradov theorem, estimates for moments of $L$-functions, zero density estimates, and a variety of sieving problems. See [M] for details.
There are many works on large sieve inequalities for other families. For instance, Deshouillers and Iwaniec [DI] obtained a sharp bound for cusp forms on $\mathrm{GL}_2$, which in turn has been a powerful tool in studying statistical properties of the Riemann zeta function on the critical line. Heath-Brown [H-B] showed an essentially optimal upper bound on the sparse sub-family of quadratic Dirichlet characters. Many state-of-the-art works on quadratic twists of modular forms, with elliptic curves being of particular interest, have relied on Heath-Brown's bound.
In this paper, we are interested in the following family $\mathcal{F}$. For any Dirichlet character $\psi$ modulo $r$ and real number $t$, define Here $\sum_{ab=n} \lambda_{\psi,t}(a, b) =: \lambda_{\psi,t}(n)$ is the $n$-th Hecke eigenvalue of a self-dual Eisenstein series on $\Gamma_0(r^2)$, and when $\psi$ is primitive, the Eisenstein series is a newform. Let $k$ be a positive integer, and let $\theta$ run over all Dirichlet characters modulo $k$. Let $Q \ge 1$ be a real number, and for each $Q/2 < q \le Q$ with $(q, k) = 1$, let $\chi$ run over primitive Dirichlet characters modulo $q$. Finally, let $T \ge 1$ be a real number, and let $|t| \le T$. Then define $\mathcal{F}$ to consist of the characters $\chi\theta$, with corresponding data $\lambda_{\chi\theta,t}(a, b)$, with $N/2 < ab \le N$ and $(a, b) = 1$. We write which agrees with $\Delta(\mathcal{F}, N)$ for this family $\mathcal{F}$. The dual norm $\Delta^*(Q, k, T, N)$ is given by As a 'trivial' bound, which we mainly state for a frame of reference, one may deduce from the classical large sieve inequality the bound Deducing the estimate (1.5) uses the idea of the Dirichlet hyperbola method, by summing over $a \le \sqrt{N}$ trivially, and applying the classical large sieve to the sum over $b \ll N/a$. The condition $(a, b) = 1$ may be easily overlooked, yet it is vital. The above sketch shows that the trivial bound (1.5) holds even without the condition $(a, b) = 1$. In fact, if the condition $(a, b) = 1$ were to be omitted in (1.3), then the term of size $Q^2kT\sqrt{N}$ in (1.5) would not be removable, because one could choose $\alpha_{a,b}$ to be the indicator function that $a = b$ in (1.3). For this, note $\lambda_{\chi,t}(a, a) = 1$ for $a$ coprime to the modulus of $\chi$. Therefore, any substantial improvement over this trivial bound must use the condition $(a, b) = 1$. The restriction $(a, b) = 1$ is similar in spirit to the (necessary) square-free restriction when studying quadratic characters, as in [H-B]; for more on this point, see Section 1.4.1. We also observe that choosing $\alpha_{a,b} = \alpha_{ab}$ to depend only on the product $ab$ would give rise to sums of the form $\sum_n \alpha_n \lambda_{\psi,t}(n)$ appearing in (1.3). Then considering $n = p^2$ would lead to a large term as discussed above.
1.2. Main results, and discussion.
Theorem 1.1. We have This estimate is optimal (up to the $\varepsilon$-aspect) by general principles (see [IK, Chapter 7]). We may interpret this as a spectral large sieve inequality for the family of trivial nebentypus newform Eisenstein series on $\Gamma_0(q^2k^2)$, with varying level $q$. Theorem 1.1 appears to be the first sharp large sieve inequality for a $\mathrm{GL}_2$ family with varying levels. The classical large sieve inequality can be interpreted as a $\mathrm{GL}_1$ large sieve inequality, while Heath-Brown's celebrated quadratic large sieve can be viewed as an estimate for the sub-family of self-dual $\mathrm{GL}_1$ forms. The $\mathrm{GL}_2$ families of varying nebentypus do not seem to have strong orthogonality properties, as shown by Iwaniec and Li [IL].
We also have an additive character variant of Theorem 1.1.
Theorem 1.2 follows quickly from Theorem 1.1, by the method in [IK, Section 7.5]. We have omitted the $T$ and $k$ aspects solely to simplify the expressions; hybrid bounds analogous to (1.6) hold for the additive characters as well.
We may interpret Theorem 1.1 as a large sieve inequality for rationals, which we now explain. Let $v_p$ be the usual $p$-adic valuation. For $q \ge 1$, let $\mathbb{Q}_{(q)} = \{x \in \mathbb{Q} : v_p(x) \ge 0 \text{ for all } p \mid q\}$, which is a ring. Indeed, with the multiplicative set $S$ defined by $S = \{n \in \mathbb{Z} : (n, q) = 1\}$, we have $\mathbb{Q}_{(q)} = S^{-1}\mathbb{Z}$, the localization of $\mathbb{Z}$ by $S$. There exists a natural reduction map $\mathrm{red}_q : \mathbb{Q}_{(q)} \to \mathbb{Z}/q\mathbb{Z}$. The reduction map may be restricted to $\mathbb{Q}^{\times}_{(q)} = \{x \in \mathbb{Q} : v_p(x) = 0 \text{ for all } p \mid q\}$, which is a multiplicative subgroup of $\mathbb{Q}^{\times}$. If $\chi$ is a Dirichlet character modulo $q$, and $n \in \mathbb{Q}^{\times}_{(q)}$, then define $\chi(n)$ by $\chi(\mathrm{red}_q(n))$. That is, if $n = a/b \in \mathbb{Q}^{\times}_{(q)}$, then $\chi(n) = \chi(a\overline{b})$, where $\overline{b}$ denotes the inverse of $b$ modulo $q$. For $n = a/b \in \mathbb{Q}^{\times}$ in lowest terms, define $\mathrm{ht}(n) = |ab|$, which is a cousin of a height function. Note that $|\{n \in \mathbb{Q}^{\times} : \mathrm{ht}(n) \le X\}| = X^{1+o(1)}$.
Theorem 1.3. We have This is simply a restatement of Theorem 1.1 in this notation, with $k = 1$ and the omission of $T$. These specializations are not necessary, and are only in place to de-clutter the statement.
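The reduction map and the evaluation of a character on a rational can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the helper names `red`, `ht`, `legendre_mod7` and `chi_on_rational` are hypothetical, and the quadratic character modulo 7 is chosen only as a convenient example.

```python
from fractions import Fraction

def red(n: Fraction, q: int) -> int:
    """Reduce n = a/b (with gcd(b, q) = 1) to Z/qZ as a * b^{-1} mod q."""
    a, b = n.numerator, n.denominator
    # pow(b, -1, q) computes the modular inverse (Python 3.8+);
    # it raises ValueError when b is not invertible modulo q.
    return (a * pow(b, -1, q)) % q

def ht(n: Fraction) -> int:
    """ht(a/b) = |ab| for a/b in lowest terms (Fraction normalizes)."""
    return abs(n.numerator * n.denominator)

def legendre_mod7(m: int) -> int:
    """Quadratic character modulo 7 via Euler's criterion (illustrative choice)."""
    m %= 7
    if m == 0:
        return 0
    return 1 if pow(m, 3, 7) == 1 else -1

def chi_on_rational(n: Fraction) -> int:
    """chi(n) := chi(red_7(n)) for n whose numerator and denominator are prime to 7."""
    return legendre_mod7(red(n, 7))
```

For instance, `red(Fraction(3, 4), 7)` uses the inverse of 4 modulo 7, and `ht(Fraction(6, 8))` is computed from the reduced form 3/4.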
From Theorem 1.3 one can also easily derive results about rationals ordered by the more standard height function. For $n = a/b \in \mathbb{Q}^{\times}$ in lowest terms, let $\mathrm{Ht}(n) = \max(|a|, |b|)$. Note that $\mathrm{ht}(n) \le \mathrm{Ht}(n)^2$, from which we immediately deduce:
Corollary 1.4. We have q≤Q * χ (mod q) n∈Q × (q) Ht(n)≤N This is sharp, since $|\{n \in \mathbb{Q}^{\times} : \mathrm{Ht}(n) \le X\}| = X^{2+o(1)}$. Since Theorem 1.3 easily implies Corollary 1.4, but not vice-versa, this supports our usage of $\mathrm{ht}$ in place of $\mathrm{Ht}$.
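To get a feel for the two orderings, one can count rationals by brute force. The sketch below is an illustration only: the function names are hypothetical, and it restricts to positive rationals (the text counts all of $\mathbb{Q}^{\times}$, which doubles the counts).

```python
from math import gcd

def count_ht(X: int) -> int:
    """Number of positive rationals a/b in lowest terms with ht = ab <= X."""
    return sum(1 for a in range(1, X + 1)
                 for b in range(1, X // a + 1) if gcd(a, b) == 1)

def count_Ht(X: int) -> int:
    """Number of positive rationals a/b in lowest terms with Ht = max(a, b) <= X."""
    return sum(1 for a in range(1, X + 1)
                 for b in range(1, X + 1) if gcd(a, b) == 1)
```

Since $\mathrm{ht}(n) \le \mathrm{Ht}(n)^2$, every rational counted by `count_Ht(N)` is also counted by `count_ht(N * N)`.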
For $n \in \mathbb{Q}^{\times}$, one may define $\alpha_n = e(n)$, or $\alpha_{a/b} = e_b(a)$, etc. These examples illustrating Theorem 1.3 are somewhat similar to the quantities studied in [DFI].
The proof of Theorem 1.1 attacks the problem from both sides, via $\Delta$ and $\Delta^*$. In this sense, the proof has new features not seen in previous large sieve inequality bounds. Very briefly, the strategy of proof is as follows. If $N \gg Q^2kT$, then we study the dual norm $\Delta^*$ and apply the functional equation of Dirichlet $L$-functions. The dual side is effective in this range of parameters because the functional equation will shorten the lengths of summation. On the other hand, if $N \ll Q^2kT$, then we more directly study the family average. The main tool on this side is the divisor-switching method used by Conrey-Iwaniec-Soundararajan on the asymptotic large sieve [CIS] (see also [H, p.210]). On both sides, we derive a recursive bound which relates the norm to itself, but with different (smaller) parameters.
When $N \approx Q^2kT$, both methods are essentially circular. The key to breaking out of this deadlock is to use monotonicity, lengthening one of the sums. The use of the functional equation and monotonicity were both crucial tools in Heath-Brown's quadratic large sieve. A major difference between our method and Heath-Brown's is that in the quadratic case, the norm was almost self-dual by quadratic reciprocity. This property completely fails in our situation.
We now discuss the two main workhorse results used to prove Theorem 1.1, both of which require defining some variants on ∆.Let Theorem 1.1 will show these norms are essentially the same order of magnitude.On a first pass, the reader is encouraged to think of ∆ ′ (Q, k, T, N) as ∆(Q, k, T, N) itself.Another notational convenience is to write and similarly for other norms, such as ∆ ′ .In practice, the choices of ε will be either unimportant, or apparent from the context, and no confusion should arise from suppressing them on the left hand side of (1.9).
We also derive a recursive bound on ∆ by the family average approach.
Theorem 1.6 (Recursive family average).Suppose Q 2 kT ≫ N(QkT ) −ε .Then The proofs of Theorems 1.5 and 1.6, appearing in Sections 4 and 5, respectively, are logically independent, and can be read in either order.Although very different in the fine details, the two proofs have important structural similarities.Because of the logical independence of these two sections, and due to the strong analogies, we have deliberately chosen to 'refresh' notation when passing from Section 4 to Section 5.Even more, we have structured the proofs in a similar way, and chosen notation to help draw the reader's attention to analogous quantities in the two proofs.
Our main interest in Theorem 1.1 is with $k = T = 1$. However, the recursive nature of the proof and the appearance of the generalized norm $\Delta'$ in Theorems 1.5 and 1.6 force us to consider more general values of $k$ and $T$.
1.3. Applications. The classical large sieve has a wealth of important applications, and we consider some variants for the new rational large sieve (Theorem 1.1). The literature in analytic number theory on sieving problems for the rational numbers is relatively sparse. The authors of [EEHK, Z] give versions of Gallagher's larger sieve for rationals, and deduce some impressive algebraic applications. More applications could be of great interest.
Consider the following sieving problem. Let $\mathcal{N} = \{n \in \mathbb{Q}_{>0} : \mathrm{ht}(n) \le N\}$. Let $\mathcal{P}$ be a finite set of prime numbers. For each $p \in \mathcal{P}$, let $\Omega_p \subset \mathbb{Z}/p\mathbb{Z}$. Define the sifted set Note that if $v_p(n) \ne 0$, then $\mathrm{red}_p(n)$ may not be defined. Let $\omega(p) = |\Omega_p|$, and suppose that $\omega(p) < p$ for all $p \in \mathcal{P}$. Let $h(p) = \frac{\omega(p)}{p - \omega(p)}$ for $p \in \mathcal{P}$, and $h(p) = 0$ for $p \notin \mathcal{P}$, and extend $h$ multiplicatively on the squarefree integers. Define $H = \sum_{q \le Q} h(q)$.
Proposition 1.7. With the above notation, we have One can prove this following the method of [IK, Theorem 7.14]. Alternatively, see [K, Proposition 2.3] for a proof in much greater generality. For a nontrivial result, one needs $H \gg N^{\varepsilon}$, which is more restrictive than in the classical arithmetic large sieve.
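The quantity $H$ is easy to compute exactly for small $Q$. The following sketch is illustrative only: the names `primes_up_to` and `sieve_H` are hypothetical, and it assumes (as in the usual arithmetic large sieve convention) that $h(q) = 0$ when $q$ is not squarefree. It also relies on the hypothesis $\omega(p) < p$ so that no denominator vanishes.

```python
from fractions import Fraction

def primes_up_to(n: int) -> list:
    """Primes <= n by trial division (adequate for small n)."""
    return [p for p in range(2, n + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def sieve_H(Q: int, omega: dict) -> Fraction:
    """H = sum_{q <= Q} h(q), where h(p) = omega(p) / (p - omega(p)) for p in
    the sieving set (the keys of omega), h(p) = 0 otherwise, and h is extended
    multiplicatively to squarefree q. Assumption: h(q) = 0 if q is not squarefree."""
    total = Fraction(0)
    for q in range(1, Q + 1):
        h, m = Fraction(1), q
        for p in primes_up_to(q):
            if m % p == 0:
                m //= p
                if m % p == 0 or p not in omega:
                    h = Fraction(0)  # p^2 divides q, or p is outside the sieving set
                    break
                h *= Fraction(omega[p], p - omega[p])
        total += h
    return total
```

For example, with $(p-1)/2$ excluded classes for each odd prime $p \le Q$, one may call `sieve_H(Q, {p: (p - 1) // 2 for p in primes_up_to(Q) if p > 2})`.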
A standard application of the classical large sieve is to let Ω p consist of p−1 2 residue classes chosen arbitrarily, for all p ≤ Q.Then H ≫ Q, and taking We also present a Barban-Davenport-Halberstam type theorem.Suppose that α n is a sequence supported on Q >0 , with ht(n) ≤ X.We assume a weak Siegel-Walfisz type condition for the sequence, as follows.Define For χ = χ ′ χ 0 with χ ′ of conductor r > 1, and χ 0 trivial modulo s, we assume for some k-fold divisor function τ k , and all r ≤ (log X) B .
We prove Proposition 1.8 in Section 3.
1.4. Proof sketches. Here we present some oversimplified outlines of the proofs. In this section we freely drop factors of size $(Q^2kTN)^{\varepsilon}$, as if they were 1.
1.4.1.Theorem 1.5.For simplicity, we omit the t-aspect, and write ∆(Q, k, N) for the norm.For a bump function w supported on [1/2, 2], consider The condition (a, b) = 1 is necessary but difficult to use.In comparison to the quadratic large sieve, this condition is analogous to the restriction to fundamental discriminants.Inspired by this similarity, and following [H-B], let 1 ≤ Y < N/10 to be chosen later, and note S ≤ S >Y where We then write S >Y = S ∞ −S ≤Y , where S ≤Y has ab Next shift contours to the line ε, passing a pole at s = 1/2.The contribution to S ≤Y from the new contour is essentially This term (1.13) is not satisfactorily bounded on its own.Indeed, even if we accept Theorem 1.1, then by breaking up into dyadic segments M/2 < ab ≤ M, with 1 ≤ M ≤ Y , we can at best bound (1.13) by max The former term of size Q 2 k √ N is the culprit, and matches with (1.5).Luckily, and crucially, the term (1.13) will partially cancel with another term from S ∞ .This cancellation property also appeared in [H-B].
Next consider S ∞ .Opening |T (a, b)| and applying the Mellin inversion formula gives where Φ = χ 1 χ 2 θ 1 θ 2 .Unfortunately, Φ may not be primitive, and this complicates the application of the functional equation.For this sketch, we consider the two extremes, where either Φ is primitive of conductor q 1 q 2 k, or where Φ is trivial.The trivial case is easy to control, since this means χ 1 = χ 2 (whence q 1 = q 2 ) and θ 1 = θ 2 .This gives rise to a diagonal term of acceptable size O(N|β| 2 ).For the primitive characters, we shift contours to the line −1, change variables s → 1 − s, and apply the functional equation.This gives (roughly) where γ(s) is the product of gamma factors in the completed L-function of L(s, Φ)L(s, Φ).
Next re-open the Dirichlet series and rearrange, giving Letting g = (a, b), replacing a by ga and b by gb, and summing over g, we obtain as the sum can be truncated at ab (by shifting the contour far to the right).Next we shift contours back to the line ε, crossing a pole at s = 1/2.This polar term has a nice simplification, and takes the same form as (1.13), but with ab truncated at then causes these two polar terms to cancel!The contribution on the line ε essentially becomes bounded by , in line with Theorem 1.5.
1.4.2.Theorem 1.6.For simplicity take k = 1 and omit t, and write ∆(Q, N) for the norm.For a bump function w, let The condition that χ is primitive is necessary but difficult to use.In analogy with the proof of Theorem 1.5, let Y < Q/10, and define Then S ≤ S >Y , by positivity.Again, write S = S ∞ − S ≤Y where S ≤Y has characters modulo q with cond(χ) ≤ Y and S ∞ has χ ranging over all characters of modulus q.
For S ≤Y , replace q by qq 0 and χ by χχ 0 where (the new) χ has conductor q, and χ 0 is trivial.Ignoring coprimality, we have T (χχ 0 , t) ≈ T (χ, t).Applying Mellin inversion, and summing over q 0 to form a zeta function, we obtain We shift contours to the line ε, passing a pole at s = 1 only.This polar term takes the form On the new line ε, we essentially obtain an expression of size ∆(Y, N)|β| 2 .This polar term is the analog of (1.13), and as before, it is not satisfactorily bounded on its own.Indeed, Theorem 1.1 would imply that at best it is bounded by Here the term QN is the culprit, and as before, we will cancel this polar term with one arising within S ∞ .Now consider S ∞ .Opening the square and applying orthogonality of characters gives where w 1 (x) = xw(x).The range of possible values of gcd(a 1 b 2 , a 2 b 1 ) causes some arithmetical difficulties.For this sketch, we consider the two extreme cases, where either they are coprime, or where a 1 b 2 = a 2 b 1 , which we call the diagonal case.Since (a 1 , b 1 ) = (a 2 , b 2 ) = 1, the diagonal reduces to a 1 = a 2 and b 1 = b 2 , giving a term of size O(Q 2 |α| 2 ), which is acceptable.
We now focus on the case (a 1 b 2 , a 2 b 1 ) = 1.Write a 1 b 2 = a 2 b 1 + qr, which we now interpret as a 1 b 2 ≡ a 2 b 1 (mod r), with q = a 1 b 2 −a 2 b 1 r .Note typically r ≪ N/Q, so this reduces the modulus when Q 2 ≫ N.This leads to Next we detect the congruence with characters modulo r, as in [CIS], giving Since the characters are not primitive, replace χ by χχ 0 and r by rr 0 where the new χ has conductor r, and χ 0 is trivial modulo r 0 .Applying Mellin inversion, and evaluating the r 0 -sum in terms of a zeta function, we obtain that S ∞ is roughly Next we shift contours to the line −1 + ε, passing a pole at s = 0 only.Note that w 1 (0) = w(1).This polar term nicely simplifies, and takes the same form as (1.14), but with r truncated at N/Q instead of Y .Taking Y = N/Q causes the two polar terms to cancel.
Next consider the integral along the line −1 + ε.The variables a i , b i are not separated, but one might hope that this is only a technical issue solvable with integral transform techniques (indeed, see Lemma 5.2).We might then expect that the contribution from the new line of integration to be bounded by Q 2 N ∆(N/Q, N)|α| 2 , which is consistent with Theorem 1.6.The wealth of extra parameters in the definition of ∆ ′ in (1.8) are there to account for the overlooked conditions (both arithmetical and archimedean).
1.4.3. Reflections. The similarities between the proofs are remarkable, even if the fine details are different. We also observe that the divisor-switching method used in the proof of Theorem 1.6 is analogous to the functional equation of the Dirichlet $L$-functions used for Theorem 1.5. At the cost of some exaggeration, one might call the divisor switch itself a functional equation. In support of this, consider the family of functions $\tau_s(n) = \sum_{ab=n} (a/b)^s$, which does indeed satisfy the functional equation $\tau_{-s}(n) = \tau_s(n)$, by the divisor switch. Moreover, these coefficients $\tau_s(n)$ appear as Fourier coefficients of the level 1 Eisenstein series, and the functional equation of the Eisenstein series is entwined with the functional equation of its Fourier coefficients.
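The functional equation $\tau_{-s}(n) = \tau_s(n)$ can be checked numerically. The following minimal sketch (the function name `tau` is our own) sums over all factorizations $ab = n$; swapping $a \leftrightarrow b$ permutes the terms, which is exactly the divisor switch.

```python
def tau(s: complex, n: int) -> complex:
    """tau_s(n) = sum over factorizations ab = n of (a/b)^s."""
    return sum((a / (n // a)) ** s for a in range(1, n + 1) if n % a == 0)
```

At $s = 0$ every term equals 1, so `tau(0, n)` returns the number of divisors of `n`.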
1.4.4. Theorem 1.1. Theorem 1.1 is deduced from Theorems 1.5 and 1.6 in Section 2. The proof uses that the norm $\Delta$ is monotonic, and applies the two self-referential theorems in a recursive manner. In retrospect, some of these ideas have similarities to elements used in [BI1, BI2].
We use the notation A B as a synonym for
1.6. Acknowledgments. I thank Henryk Iwaniec and Emmanuel Kowalski for valuable comments. I am also grateful to the referee for a careful reading which uncovered several inaccuracies.

2. Deduction of Theorem 1.1
In this section, we use Theorems 1.5 and 1.6 to prove Theorem 1.1.
2.1. Monotonicity. As in the quadratic large sieve [H-B], it is vital that the norm $\Delta(Q, k, T, N)$ is essentially monotonic in the $N$- and $Q$-components. The proofs differ a bit depending on the case, but the overall theme is similar, and based on an idea of Forti and Viola [FV].
Lemma 2.1.Suppose P ≫ log QN with a large (but absolute) implied constant.Then there exists a prime p ∈ [P, 2P ] so that Proof.Since k and T are frozen, we suppress them from the discussion, writing ∆(Q, N) in place of ∆(Q, k, T, N).Let γ a,b be complex numbers supported on N/2 < ab ≤ N, and (a, b) = 1.Let P ≥ 1 be a parameter to be chosen, and let P * denote the number of primes p ∈ [P, 2P ].The prime number theorem implies P * ∼ P log P .Now we have For the terms with p|q, we simply use 1 Taking P ≫ log Q large enough so that P * log P ≥ 2 log Q, and rearranging, we obtain Next we separate the values of a and b to make two sub-sums corresponding to (p, ab) = 1 and p|ab.This gives We bound the terms with p|ab similarly to the treatment of p|q, giving max We choose P ≫ log N large enough so that 4 log N P * log P ≤ 1 2 , whence Now we freely multiply by |χ(p)| 2 , which has absolute value 1 since p ∤ q.In addition, we change variables Lemma 2.2.Suppose P ≫ log NQ with a large (but absolute) implied constant.Then there exists a prime p ∈ [P, 2P ] so that Proof.Since k and T are frozen, we suppress them in the notation.Let P ≥ 10 to be chosen, and let P * * = P ≤p≤2P * ψ(mod p) 1, so P * * ≍ P 2 log P .We have For the terms with p|ab, we simply use 1 1 ≤ 2P log N P * * log P , and choose P ≫ log N large enough so that 2P log N P * * log P ≤ 1 2 .For the terms with p ∤ ab, we freely multiply by |ψ(a)ψ(b)| 2 , which is 1 for such primes.This gives Next we separate the values of q to make two sub-sums corresponding to (p, q) = 1 and p|q.This gives We upper bound the terms with p|q, giving We choose P ≫ log Q large enough so that 4P log Q P * * log P ≤ 1 2 , whence Now χψ is a character of conductor pq, with pQ/2 ≤ pq ≤ pQ, so we obtain Remark.The norm ∆ is also monotonic in the k and T -aspects, but this property is not needed in this work, so we do not give proofs.

2.2. Relations between norms.
To simplify the recursive steps in the proof of Theorem 1.1, it is convenient to have the following relations.
Lemma 2.3.Suppose that there exists e > 1 such that Lemma 2.4.Suppose that there exists e > 1 such that Proof.The proofs of both lemmas follow from the definitions (1.8) and (1.9).

2.3. The recursions.
Proposition 2.5.Suppose that there exists e > 1 such that By Lemma 2.3, we can use the assumption (2.1) to obtain .
We also have a complementary version: Proposition 2.6.Suppose that there exists e > 1 such that By Lemma 2.4, we can use the assumption (2.2) to obtain 1) .

3. Proof of Proposition 1.8
The following proof is based on [IK,Section 17.2].Decomposing with Dirichlet characters and applying orthogonality gives Write q = q 0 q ′ and χ = χ 0 χ ′ , where χ has conductor q ′ .Then (3.1) is at most We break up this sum according to q ′ ≤ Q 0 = (log X) B and q ′ > Q 0 .For q ′ ≤ Q 0 , we apply the S-W condition (1.12), giving a bound of the form The terms with Q 0 < q ′ ≤ Q/q 0 are bounded by For R ≤ (XQ) 1/10 , we use the "ε-free" bound ∆(R, X) ≪ (R 4 + X log X) (see (1.5)), while for R > (XQ) 1/10 , we use Theorem 1.1.In total, we obtain the following bound for the terms with q ′ > Q 0 : 4. Proof of Theorem 1.5 4.1.Miscellany.We begin with some miscellaneous results that will be useful later.
Let ω and ω 0 be as in Definition 4.1.Then for Proof.A tedious but straightforward calculation with Stirling's approximation gives , where the c i are some polynomials in s, of degree at most 2i+1.This provides an asymptotic expansion as r → ∞ provided s ≪ |r| 1/2 , say.From this, one may derive By Fourier inversion, we have Integration by parts, aided with (4.8), gives (4.5).For T ′ = 1 and |s| ≪ T ε , then the asymptotic Stirling formula does not hold, yet we can claim a crude but uniform upper bound of the form γ (j) (r) ≪ (T ε ) j , which suffices to give (4.7).

4.2. Preparation.
Here we begin the proof of Theorem 1.5.Choose a nonnegative smooth weight function w, with w(x) ≥ 1 for 1/2 ≤ x ≤ 1, and w(x) = 0 for x < 1/4 and for x ≥ 2. From (1.4), we have We will assume that β χ,θ,t is supported on and that an otherwise un-labeled integral/sum over t,q,χ,θ is implied to run over this domain.
In particular, we will often suppress these conditions and recall them only when needed.
To prove Theorem 1.5, it suffices to prove the bound for $\chi$ and $\theta$ of fixed parities, so for convenience we also assume that this condition is enforced by the support of $\beta_{\chi,\theta,t}$. Let $1 \le Y \le N/100$ be a parameter to be chosen later. Then $S \le S_{>Y}$, where (4.12) by positivity, since if $(a, b) = 1$, then the condition $ab > Y$ is redundant to the support of $w(ab/N)$. By simple inclusion-exclusion, we have where for $* \in \{Y, \infty\}$, $S_{\le *}$ corresponds to the sum over $\frac{ab}{(a,b)^2} \le *$. We will often write $S_{\infty}$ as an alias for $S_{\le \infty}$.
One of the main issues with applying the functional equation is that, after opening the square, we obtain a character of the form χ 1 χ 2 θ 1 θ 2 which may be imprimitive.In order to facilitate the problem of controlling the conductor, we will apply some combinatorial-type decompositions.These preparatory results are bookended by Lemmas 4.5 and 4.11.
Lemma 4.5 (Detecting primitivity).Let q ≥ 1 be an integer.There exist complex numbers c ℓ = c ℓ (q) supported on a finite set of integers with the following two properties: , where τ (q) denotes the number of divisors of q.
Proof.Suppose ψ has conductor q * .Consider the expression The inner sum inside the parentheses is 1 if q * divides q/d (equivalently, d divides q/q * ), and 0 otherwise.Hence the above sum evaluates as d|q/q * µ(d), which by Möbius inversion is the indicator function that q * = q, i.e., that ψ is primitive.To finish the proof, we can let c ℓ be supported on 1 ≤ ℓ ≤ q + 1, and let (4.13) µ(q/e) q/e , so that ℓ |c ℓ | ≤ τ (q).Suppose q, r ≥ 1 are integers with r|q.Let G q (resp.G r ) be the group of Dirichlet characters modulo q (resp.r).By a slight abuse of notation, we can view G r as a subgroup of G q , by multiplying every element of G r by the trivial character modulo q.
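The key identity in this proof, that $\sum_{d \mid q/q^*} \mu(d)$ is the indicator of $q^* = q$, can be verified directly. The sketch below uses hypothetical helper names and checks only this Möbius identity, not the full construction of the coefficients $c_\ell$.

```python
def mobius(n: int) -> int:
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0  # a square divides the original n
            result = -result
        p += 1
    return -result if n > 1 else result

def primitivity_indicator(q: int, q_star: int) -> int:
    """sum_{d | q/q*} mu(d), for a conductor q* dividing the modulus q.
    Equals 1 when q* == q (the character is primitive) and 0 otherwise."""
    m = q // q_star
    return sum(mobius(d) for d in range(1, m + 1) if m % d == 0)
```

Here `q_star` plays the role of the conductor $q^*$ of $\psi$; the function assumes `q_star` divides `q`.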
Lemma 4.6.Let q, r, G q , and G r be as above.Let F (χ 1 , χ 2 ) be a function defined on pairs of Dirichlet characters modulo q.Then Remark.Lemma 4.6 is analogous to Lemma 4.2.
Proof.The condition that χ 1 χ 2 has modulus r means that χ 1 χ 2 ∈ G r .Now say G q = ∪ γ γG r , where γ runs over G q /G r .By basic group theory, we can write uniquely χ 1 = γψ 1 and Corollary 4.7 (Separation of variables).Let notation be as in Lemma 4.6.Then Proof.We first apply Lemma 4.6 to detect that χ 1 χ 2 has modulus r, and then use Lemma 4.5 to detect that ψ 1 ψ 2 is primitive.
and where δ runs over coset representatives of G k /G k ′ .Lemma 4.9.Let k ≥ 1 be an integer, and let b θ be any sequence of complex numbers indexed by Dirichlet characters θ modulo k.Then we have a decomposition of the form which can alternatively be written as Proof.Begin by opening the square, obtaining a double sum θ 1 ,θ 2 (mod k) b θ 1 b θ 2 .Parameterizing the sum according to the conductor (say k ′ ) of θ 1 θ 2 , we obtain Next we apply Corollary 4.7 with With a further factorization , we obtain (4.14).The variant (4.15) is similar.
We also need more elaborate versions of Definition 4.8 and Lemma 4.9 to handle χ of varying modulus.
Definition 4.10.For i = 1, 2, suppose χ i is primitive of conductor q i .Factor (4.16) i , where χ ′ i has conductor q ′ i , χ + i has conductor q + i , and so on, and the factorization is defined in terms of local information as follows.
(i) The primes making up q ′ 1 are those that divide q 1 but do not divide q 2 , and likewise the primes in q ′ 2 are those that divide q 2 but not q 1 .(ii) The factors q + 1 and q − 2 are characterized by 1 ≤ v p (q − 2 ) < v p (q + 1 ) for all p|q + 1 .Similarly, q + 2 and q − 1 are characterized by 1 ≤ v p (q − 1 ) < v p (q + 2 ) for all p|q + 2 .(iii) The remaining factor r corresponds to the primes where v p (q 1 ) = v p (q 2 ).Definition 4.10 is motivated by the fact that (4.17) which has conductor q ′ 1 q ′ 2 q + 1 q + 2 cond(χ 2 ).Let b χ be any sequence of complex numbers indexed by primitive Dirichlet characters χ modulo q, with q varying over a finite set of positive integers.Consider the sum | q,χ b χ | 2 .Opening the square gives a sum of the form q 1 ,q 2 ,χ 1 ,χ 2 b χ 1 b χ 2 .Definition 4.10 shows that the parameters q ′ i , q + i , etc., are uniquely determined.We can then arrange the sum according to the values of these parameters, giving (4.18) , where the reference to (Def.4.10) in the summation conditions indicates the conditions translated into appropriate summation form.
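The local rules (i)-(iii) of Definition 4.10 translate into a short computation on prime valuations. The sketch below is our reading of the definition, under the assumption that the lost display (4.16) factors the conductors as $q_i = q_i' q_i^{+} q_i^{-} r$; the function and key names are hypothetical.

```python
def factorize(n: int) -> dict:
    """Prime factorization {p: exponent} by trial division."""
    f, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            f[p] = f.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def split_moduli(q1: int, q2: int) -> dict:
    """Split q1, q2 prime by prime following rules (i)-(iii)."""
    f1, f2 = factorize(q1), factorize(q2)
    out = {"q1_prime": 1, "q2_prime": 1, "q1_plus": 1, "q1_minus": 1,
           "q2_plus": 1, "q2_minus": 1, "r": 1}
    for p in set(f1) | set(f2):
        v1, v2 = f1.get(p, 0), f2.get(p, 0)
        if v2 == 0:            # (i) p divides q1 only
            out["q1_prime"] *= p ** v1
        elif v1 == 0:          # (i) p divides q2 only
            out["q2_prime"] *= p ** v2
        elif v1 > v2:          # (ii) 1 <= v_p(q2^-) < v_p(q1^+)
            out["q1_plus"] *= p ** v1
            out["q2_minus"] *= p ** v2
        elif v2 > v1:          # (ii) 1 <= v_p(q1^-) < v_p(q2^+)
            out["q2_plus"] *= p ** v2
            out["q1_minus"] *= p ** v1
        else:                  # (iii) equal valuations
            out["r"] *= p ** v1
    return out
```

For example, `split_moduli(5880, 22638)` handles $q_1 = 2^3 \cdot 3 \cdot 5 \cdot 7^2$ and $q_2 = 2 \cdot 3 \cdot 7^3 \cdot 11$, and the parts multiply back to $q_1$ and $q_2$ respectively.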
Lemma 4.11.Let b χ be any sequence of complex numbers indexed by primitive Dirichlet character χ modulo q, with q varying over a finite set of positive integers.Then In reference to (4.17), now = ψ 1 ψ 2 |γ| 2 , which has conductor r ′ , so χ 1 χ 2 has conductor q ′ 1 q ′ 2 q + 1 q + 2 r ′ .We are now ready to apply the preceding decompositions to S ≤ * (see (4.12) to infer the definition).Specifically, we apply Lemma 4.9 (in the form (4.15)) and Lemma 4.11, giving (Def 4.10) We remind the reader that there are additional conditions encoded in the support of the coefficients, as recorded in (4.11), which will be recalled as needed.Observe that the finite part of Φ (i.e., omitting m it 1 −it 2 ) is primitive of modulus q ′ 1 q ′ 2 q + 1 q + 2 r ′ k ′ .It is convenient to record here for later purposes that for i = 1, 2, (4.25) At this point our treatments of S ≤ * for * = Y and * = ∞ diverge.

Now we estimate $S'_{\le Y}(k, q)$. We arrange the expression to most closely resemble (4.10), specifically Referring back to (1.4), and noting that our new family has varying modulus $q_i'$ of size , and fixed modulus $q_i^{+} q_i^{-} r' k'$, we see (4.32) Using Cauchy's inequality and monotonicity (Lemma 2.1) leads quickly to (4.26).
4.4.Functional equation side.In this section we will apply the functional equation of Dirichlet L-functions to S ∞ (k, q), picking up from the expression (4.23).To facilitate this, we first apply Möbius inversion, in the form (4.33) To continue the theme of concise notation, let g = (g 1 , g 2 , g 3 , g 4 ), µ(g) = µ(g 1 )µ(g 2 )µ(g 3 )µ(g 4 ), Φ(g) = Φ(g 1 g 3 g 2 g 4 ), and |g| = g 1 g 2 g 3 g 4 .The summation condition on g is that (4.34) g 1 |k 0 , g 2 |k 0 , g 3 |r 0 , g 4 |r 0 , though we will usually suppress this and only recall it as needed.Then S ∞ (k, q) equals g (4.34) holds µ(g) (Def 4.10) We also have need to decompose the t i -integrals to help pin down the archimedean conductor.Applying the partition from Definition 4.1, we obtain that S ∞ (k, q) equals (4.35) (Def 4.10) Define quantities (4.36) and note that among the variables of summation, Q * depends only on the outer variables q, k, and T ′ , while N * depends only on q, k, T ′ , and g.Proposition 4.13.We have a decomposition The diagonal term satisfies the bound and the term E ∞ is negligibly small.Proof of 4.13.Applying the Mellin inversion formula to w and writing the sum over a and b as a product of Dirichlet L-functions in (4.35) gives (Def 4.10) We shift contours to the line −ε, crossing a pair of poles at s = 1 ± i(t 1 − t 2 ), which exist only when Φ is trivial, and let S ′ ∞ (k, q) be the new integral on the line −ε.Recall that the finite part of Φ is primitive of modulus (4.40) , and the rapid decay of w(s) practically forces |t 1 − t 2 | ≪ T ε .It is easy to see that the contribution of this diagonal polar term is consistent with (4.39).
Recall that the parity of the χ i and θ i was assumed to be fixed, so that χ 1 χ 2 θ 1 θ 2 is even, and hence the gamma factor is as stated in (4.4).For later use, note that γ s | s=1/2 = 1.In addition, recall the bound (4.8), which in the present context means γ s (r) ≪ (T ′ ) 2σ−1 .We then obtain (Def 4.10) Next we will re-open the Dirichlet series expansions of the Dirichlet L-functions.A small modification is that we write and likewise for L(s, Φ).This gives (Def 4.10) We then factor out the gcd of a and b, by writing g ′ = (a, b) and changing variables a → g ′ a and b → g ′ b.The sum over g ′ forms a Dirichlet L-function of principal character of modulus qk 0 r 0 , which is given by (4.27).Then S ′ ∞ (k, q) equals g,T ′ µ(g) (Def 4.10) Shifting the integral far to the right shows that the portion of the sum with ab ≫ and hence Thus we can truncate the sum at ab ≤ N * .Let S ′′ ∞ (k, q) denote the contribution to S ′ ∞ (k, q) from the terms with ab ≤ N * .Let q = q 1 q 2 , where Next we apply Lemma 4.5 to detect the condition that θ ′ 1 θ ′ 2 is primitive of modulus k ′ , and likewise for ψ 1 ψ 2 of modulus r ′ .We also apply Möbius inversion to detect (q ′ 1 , q ′ 2 ) = 1, as preceding (4.28).Our final arithmetical separation of variables step is to write and likewise for ρ Φ,k 0 r 0 (indexing the sum with the letter d 2 ).We need an archimedean separation of variables as well, and this is provided by Corollary 4.4.With this, and rearranging, we then obtain where with β 1,j 1 taking the form β * ,T −T ′ /2+T ′ j 1 +t 1 (i.e., with a linear change of variables as in Corollary 4.4), and B 2 is given by a similar definition.
We next shift the contour of integration back to the line $\mathrm{Re}(s) = \varepsilon$, crossing a pole at $s = 1/2$ only. Let $S^{(0)}_{\infty}(k, q)$ denote this polar term, and let $S'''_{\infty}(k, q)$ be the new integral. We record the polar term: Now we turn to $S'''_{\infty}(k, q)$. By the triangle inequality, and using (4.5) to bound the $L^1$ norm of $\eta_{T'}$, we obtain Analogously to (4.32), on the line $\mathrm{Re}(s) = \varepsilon$, we obtain the bound (4.46) We note that $\sum_{j_1} |\beta_{1,j_1}|^2 = |\beta_1|^2$, since this simply re-assembles the integral to all of $[T/2, T]$ (also, for each $j_1$, the number of $j_2$ with $|j_1 - j_2| \le 1$ is at most 3). Applying (4.46) to (4.45) via Cauchy's inequality and using (4.25) (and the previous sentence to handle the sum over the $j_i$) completes the proof of Proposition 4.13.
4.5.Conclusion.Now we use Propositions 4.12 and 4.13 to prove Theorem 1.5.We have a decomposition The diagonal term is acceptable for Theorem 1.5, as is the small error term E ∞ .
Next we turn to the terms S ′ * (k, q), where * refers to ≤ Y or ∞.We choose with the same value of ε as in the definition of N * (see (4.36)).First consider S ′ ≤Y , where Cauchy's inequality implies Recalling the definition (1.8), it is easy to see that max k,q In summary, we have shown which is consistent with Theorem 1.5.The case of S ′ ∞ is fairly similar to that of S ′ ≤Y , though the details are more complicated.Following similar steps as the case of S ′ ≤Y , and using the AM-GM inequality, we derive plus a similar term with the i = 2 variables (q + 2 , q − 2 , etc.).By symmetry, this latter term will give the same bound as the displayed one.Substituting the values of Q * and N * from (4.36), we obtain (4.49) 1 , shows this is consistent with Theorem 1.5.Finally, we consider the polar terms from s = 1/2, namely S (0) ≤Y (k, q).We need to show there is substantial cancellation between these two terms.To aid in this, we first simplify S (0) ∞ (k, q) which recall is defined in (4.43).Observe that (4.50) ≤Y .To see this, we sum over g and d 1 and d 2 in (4.43) (though modified to read ab ≤ Y in place of ab ≤ N * ).The sum over g is not constrained, and we have For d and d 2 , we have Therefore, these two evaluations perfectly cancel.The sums over j 1 and j 2 can be simplified by using Lemma 4.2 in the reverse order.Moreover, since γ s (t 1 − t 2 ) = 1 at s = 1/2, we can write T ′ ω T ′ (t 1 − t 2 ) = 1.Hence, the partition of unity is fully re-assembled.Comparing (4.29) and (4.44), it is not hard to see that ∞,Y * , which for ease of reference we write directly as follows: Now the estimations are similar to those of S ′ ≤Y and S ′ ∞ , though the details are a little different.Following the same initial steps as in S ′ ≤Y , we obtain We claim this is bounded consistently with Theorem 1.5.To see this, first note The condition "X ≤ C" from (1.8) is easy to check, by setting M = Y /C.This completes the proof of Theorem 1.5.
5. Proof of Theorem 1.6
5.1. Miscellany. Here we present a couple of tools with self-contained proofs.
Lemma 5.1.Let c, d be positive integers, and define the Dirichlet series where Z 1,1 (s) has meromorphic continuation to Re(s) > 0 with a simple pole at s = 1 only, and where from which the lemma follows with a bit of calculation.
Lemma 5.2 (Separation of variables).Let ω = ω V be a smooth, even, function supported on [−2V, 2V ], where V > 0, satisfying ω (j) V (x) ≪ V −j , for all j = 0, 1, . . . .Let w(x, y, z, w) be smooth of compact support on R 4 >0 .Let g be a Schwartz-class function.Define F : R 4 where T , X, Y are positive parameters.Let R = V XY , and set U = max(T, R −1 ).Then where G (depending on T, V, X, Y ) satisfies the bound for any A > 0 , then one may apply the lemma to ω(x, s), giving rise to a family of functions G = G s .The proof shows that G s satisfies (5.3) with an implied constant depending polynomially on s.

Proof. By Mellin inversion,
(5.4) In (5.5), change variables It is easy to check that and that x 1 is concentrated on x 1 = 1 + O(min(R, T −1 )), whence integration by parts gives Taking Re(u i ) = 0 and defining G on R 4 appropriately completes the proof.

Preparation.
It is convenient to work with a couple of modified norms that are closely related to (1.3). Define , and in the other direction, we have Secondly, define It is easy to see that ∆ 1 (Q, k, T, N) ≤ ∆ 2 (Q, k, T, N), since when (q, k) = 1, the map (χ, θ) → χθ is a bijection onto the set of primitive characters modulo qk. After having done this, we then arrive at (5.8) by dropping the condition (q, k) = 1, by positivity. For the proof of Theorem 1.6, we will bound the norm ∆ 2 . Indeed, we can deduce Theorem 1.6 from the bound Let w be a nonnegative smooth weight function with w(x) ≥ 1 for 1/2 ≤ x ≤ 1, and w(x) = 0 for x < 1/4 and for x ≥ 2. Then ∆ 2 (Q, k, T, N) ≤ max |α|=1 S, where dt.
A simple argument with a dyadic partition of unity and Cauchy's inequality shows that Hence, in the proof of Theorem 1.6, we may assume that a and b are each supported in dyadic ranges, say a ≍ N 1 and b ≍ N 2 .
Let 1 ≤ Y ≤ Q/100 be a parameter to be chosen later. For ψ (mod qk), say qk = q ′ (dk) where d|k ∞ and (q ′ , k) = 1, and write ψ = ψ k ψ ′ where ψ k has modulus dk and ψ ′ has modulus q ′ . Let m k (ψ) = dk denote the modulus of the k-part of ψ, and cond q ′ (ψ) denote the conductor of ψ ′ , i.e., the coprime to k part of ψ. Then S ≤ S >Y , where by positivity, since if ψ is primitive modulo qk, then cond q ′ (ψ)m k (ψ) = cond(ψ) = qk. This uses that the condition qk > Y k is redundant to the support of w(q/Q).
We begin with some arithmetic manipulations that are in common between S ∞ and S ≤Y .Opening the square, we have
The e i are pairwise coprime, by the support of γ.Thus and A 2 has a similar definition.Lemma 5.4 follows by using and monotonicity (Lemma 2.1).
Proposition 5.5. We have a decomposition ∞ is given by (5.34) below, and S ′ ∞ satisfies the bound The diagonal term satisfies the bound , and the term E ∞ is negligibly small. Proof. We carry on with (5.15) and apply orthogonality of characters to the sum over ψ. This picks out the congruence γ 1 a 1 b 2 ≡ γ 2 a 2 b 1 (mod kq), however with a side condition (γ 1 γ 2 a 1 a 2 b 1 b 2 , kq) = 1. This side condition can be dropped, since the congruence γ Additionally evaluating the t-integral, in all we obtain be the non-diagonal portion of S ∞ . Write γ 1 a 1 b 2 = γ 2 a 2 b 1 + qkr, where r ≠ 0. Additionally, we detect the condition (q, g 1 g 2 ) = 1 by Möbius inversion in the form d|(q,g 1 g 2 ) µ(d), and substitute q = de. This gives It is convenient to record that the side condition follows from the congruence (5.28) together with the coprimality (γ 1 a 1 b 2 , γ 2 a 2 b 1 ) = 1. We also factor r as r = r 0 r 1 , r 0 |(kg 1 g 2 ) ∞ , (r 1 , kg 1 g 2 ) = 1. With these substitutions, we obtain Next we express the congruences using Dirichlet characters modulo dkr 0 and |r 1 |; this is enabled by the side condition (5.29). This leads to The characters of varying modulus need to be primitive, so we substitute where r |(q ′ ) ∞ , (r 2 , q ′ ) = 1, χ is primitive of modulus |q ′ |, and χ 0 is trivial modulo r 2 . With this, we obtain Let w 1 (x) = x −1 w 2 (x), so w 2 (x) = x 2 w(x), and w 2 (−s) = w(2 − s). In addition, apply the Mellin inversion formula to w 2 . Then we obtain that S ′′ ∞ equals where the summand (sgn) is shorthand for the indicator function that (5.30) sgn(q ′ ) = sgn(γ 1 a 1 b 2 − γ 2 a 2 b 1 ).
Prior to the Mellin inversion formula, (5.30) was enforced by the support of w 2 . The sums over r 1 and r 2 evaluate exactly as in (5.21). Thus Now we apply a dyadic partition of unity of the form where ω is smooth, even, and supported on [1, 2] ∪ [−2, −1]. By the rapid decay of w, and recalling (5.17), note that Therefore, we may assume that With this partition, we obtain where ω_s(x) = x^{s−1} ω(x). By shifting the contour far to the right, q ′ may be truncated at We next want to apply Lemma 5.2. Note that where recall the support of α implies β 13 a 1 ≍ β 14 a 2 ≍ N 1 and β 23 b 2 ≍ β 24 b 1 ≍ N 2 . We may then freely attach a redundant weight function of the form Now this is set up to apply Lemma 5.2 with g 2 , and with ω = ω_s . Observe that with this substitution, then γ 1 a 1 b 2 − γ 2 a 2 b 1 = x 1 y 2 − x 2 y 1 , as desired. This gives (a,g)=1 (sgn) δ a 1 b 1 (s)δ a 2 b 2 (s)α (1,g) plus a small error term. Here G = G s depends on s, via ω_s(x) = x^{s−1} ω(x). We also record (5.33) R = V g_1 g_2 / N, and U = (N/(g_1 g_2 V)) (QkTN)^{o(1)}. Now we shift the contour to the line Re(s) = ε. In doing so we cross a pole at s = 1, and we denote its residue by S (0) ∞ . There is a small but convenient simplification with the sign condition (5.30), namely that all the summands are independent of sgn(γ 1 a 1 b 2 − γ 2 a 2 b 1 ) and sgn(q ′ ), except for the indicator function that these signs agree. We may therefore take q ′ > 0.
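For orientation, here is a minimal numerical sketch of a dyadic partition of unity with a weight ω that is smooth, even, and supported on [1, 2] ∪ [−2, −1]. The explicit formula for ω and the choice that V runs over half-integer powers of 2 are assumptions made only for this illustration (the function names `smooth_step`, `omega`, `partition_sum` are hypothetical); the argument above only uses that such a partition exists.

```python
import math


def smooth_step(t):
    """A C-infinity step function: equals 0 for t <= 0 and 1 for t >= 1."""
    def f(u):
        return math.exp(-1.0 / u) if u > 0 else 0.0
    return f(t) / (f(t) + f(1.0 - t))


def omega(x):
    """Hypothetical even weight supported on [1, 2] and [-2, -1]."""
    x = abs(x)
    if x <= 1.0 or x >= 2.0:
        return 0.0
    y = 2.0 * math.log2(x)  # y lies in (0, 2)
    # Rise on [1, sqrt(2)], then fall on [sqrt(2), 2].
    return smooth_step(y) if y <= 1.0 else 1.0 - smooth_step(y - 1.0)


def partition_sum(x, exponents=range(-80, 81)):
    """Sum of omega(x / V) over V = 2^(j/2) (assumed spacing of the V's)."""
    return sum(omega(x / 2.0 ** (j / 2.0)) for j in exponents)


if __name__ == "__main__":
    for x in (0.003, 0.7, 1.0, 1.5, 37.2, 950.0, -5.0):
        print(x, partition_sum(x))
```

With this spacing, exactly two consecutive values of V contribute for a generic x, and their weights are s and 1 − s for the same value s of the step function, so the sum is 1 for every nonzero x in the sampled range.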
We also make a small modification by factoring r 0 = r g r k where r g |(g 1 g 2 ) ∞ and r k |k ∞ .With this simplification and others, we obtain (5.34) S (0) ∞ = QkT G 1 (u 1 , u 2 , u 3 , t) θ (mod dkrgr k ) q ′ ≤Q * (q ′ ,kg 1 g 2 )=1 w(1)Z 1,1 ν q ′ δ kg ϕ(q ′ ) * (1,g) Let S ′ ∞ denote the remaining contour integral along Re(s) = ε.Here we obtain |G s (u 1 , u 2 , u 3 , t)|du 1 du 2 du 3 θ (mod dkr 0 ) q ′ ≤Q * (q ′ ,kg 1 g 2 )=1 * χ (mod q ′ ) (a 1 b 1 ,a 2 b 2 )=1 (a,g)=1 δ a 1 b 1 (s)α (1,g) A small issue here concerns the dependence of G s on s.By the rapid decay of | w(2 − s)|, we may truncate the s-integral at |s| 1.The remark following Lemma 5.2 shows that the family of functions G s have a good uniform bound.We may then truncate the t-integral at U(QkT N) o(1) .Lemma 5.4 allows us to essentially remove the coprimality condition (a 1 b 1 , a 2 b 2 ) = 1; we apply this lemma with M ≪ N g 1 g 2 and γ where recall U is given by (5.33) and Q * was defined by (5.32).Note UV = N g 1 g 2 (QkT N) o(1) .It is convenient to write V = V max /P , where 1 ≪ P ≪ V max , in which case (5.35) simplifies as , dkr g r k , P T, N g 1 g 2 |α (1,g) Recalling the definition (1.8), this completes the proof.
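Two elementary identities drive the arithmetic manipulations in the proof above: orthogonality of Dirichlet characters, which picks out a congruence together with a coprimality side condition, and the Möbius sum over d | (q, g), which detects (q, g) = 1. The following sketch checks both numerically in a toy case. The modulus 15 = 3 · 5, the restriction to products of two distinct odd primes, and all function names are assumptions made for illustration only; they are not taken from the paper.

```python
import cmath
from math import gcd


def characters_mod_prime(p):
    """All Dirichlet characters modulo an odd prime p, as dicts n -> value."""
    # Brute-force search for a primitive root (fine for small p).
    for g in range(2, p):
        if len({pow(g, e, p) for e in range(p - 1)}) == p - 1:
            break
    dlog = {pow(g, e, p): e for e in range(p - 1)}
    chars = []
    for idx in range(p - 1):
        chi = {n: 0j for n in range(p)}
        for n, e in dlog.items():
            chi[n] = cmath.exp(2j * cmath.pi * idx * e / (p - 1))
        chars.append(chi)
    return chars


def characters_mod_product(p1, p2):
    """All characters modulo p1*p2 (distinct odd primes), built via CRT."""
    return [
        (lambda n, a=a, b=b: a[n % p1] * b[n % p2])
        for a in characters_mod_prime(p1)
        for b in characters_mod_prime(p2)
    ]


def orthogonality_sum(chars, x, y):
    """Sum over psi of psi(x) * conj(psi(y))."""
    return sum(psi(x) * psi(y).conjugate() for psi in chars)


def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def coprime_indicator(q, g):
    """Sum of mu(d) over d dividing gcd(q, g); equals 1 iff gcd(q, g) = 1."""
    h = gcd(q, g)
    return sum(mobius(d) for d in range(1, h + 1) if h % d == 0)


if __name__ == "__main__":
    chars = characters_mod_product(3, 5)
    print(abs(orthogonality_sum(chars, 2, 17)))  # x = y mod 15, both coprime to 15
    print(abs(orthogonality_sum(chars, 2, 7)))   # x != y mod 15
    print(coprime_indicator(6, 35), coprime_indicator(6, 10))
```

The character sum equals ϕ(15) = 8 exactly when x ≡ y (mod 15) and (xy, 15) = 1, and vanishes otherwise; this is the toy analogue of the congruence γ 1 a 1 b 2 ≡ γ 2 a 2 b 1 (mod kq) with its side condition. The same Möbius sum, applied to D in place of (q, g), is what detects D = 1.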
5.5. Conclusion. Now we use Propositions 5.3 and 5.5 to prove Theorem 1.6. Recall that we need to show that S >Y satisfies (5.9), that is where for the convenience of the reader we recall the definition (1.8): We have a decomposition The diagonal term S diag ∞ is acceptable for Theorem 1.6, as is E ∞ . Now we turn to the terms S ′ ≤ * . Recall the definitions (5.32) and (5.31). We choose (5.37) Y = (QkTN)^ε N/(QkT), with a value of ε so that when V = V max , then Q * = Y/(g_1 g_2 r_g r_k). Using the assumption Q^2 kT ≫ N^{1−ε}, it is easy to check that (5.19) is acceptable for Theorem 1.6, and also that Y ≤ Q/100, so this is a valid choice of Y. Moreover, (5.25) directly shows that S ′ ∞ is bounded in accord with the theorem.
Finally, consider the polar terms from s = 1, namely S ≤Y given by (5.34) and (5.22). We simplify S (0) ∞ , continuing with (5.34). We reverse the orders of summation between V and q ′ ; the condition q ′ ≤ Q * = CV/(Qk r_g r_k) (where C here is shorthand for (QkTN)^ε) becomes instead V > C^{−1} q ′ Qk r_k r_g (on the inside) and q ′ ≤ Y/(r_g r_k g_1 g_2) (on the outside). We then write S ∞,2 has V ≤ C^{−1} q ′ Qk r_k r_g . A pleasant feature of S (0) ∞,1 is that the sum over V re-assembles the partition of unity, since G 1 corresponds to ω_s(x)| s=1 = ω(x). We also re-open the definition of w. Together, these steps give (5.38) (1,g) Next we further cut this sum into four pieces, via (5.39) Call the corresponding sums S i , for i = 1, 2, 3, 4. There is a pleasant simplification available for S 1 , S 2 , and S 3 . In these three sums, both the summation conditions in (5.39), as well as all the summands in (5.38), depend only on the product dr g = D (say), with the exception of the presence of µ(d). Möbius inversion means that the sum over d|D detects D = 1. This immediately implies S 3 = 0. Moreover, we see that S 1 = S (q ′ ,kg 1 g 2 )=1 1 ϕ(q ′ ) * χ (mod q ′ ) w(1)Z 1,1 ν q ′ δ kg (a 1 b 1 ,a 2 b 2 )=1 (a,g)=1 δ a 1 b 1 δ a 2 b 2 α (1,g) Similarly to the estimation of S ′ ∞ , using Lemma 5.4 we obtain

Setting up the problem. A general large sieve inequality is an upper bound on the operator norm of an arithmetically-defined matrix Λ = (λ m,n ), with λ m,n ∈ C. Define the norm of Λ, denoted ‖Λ‖,

$\lambda_{\psi,t}(a, b) = \psi(a)\overline{\psi}(b)(a/b)^{it}$.
(a,b) 2 ≤ Y , and S ∞ has a and b unconstrained.These two sums are treated in completely different ways.For S ≤Y , let g = (a, b) and change variables a → ga and b → gb.Ignoring coprimality issues, then T (ga, gb) ≈ T (a, b), and so