Singular Integrals

Let \mathbb{T} = \{ z\in \mathbb{C}\, : \, |z|=1\} be the unit circle. Trigonometric polynomials on the unit circle can be written as finite sums g(e^{i \theta}) = \sum_{k \in \mathbb{Z}} c_{k} e^{ik \theta}. More generally, we can consider trigonometric polynomials on a product of unit circles \mathbb{T}^{d} = \mathbb{T}\times \cdots\times \mathbb{T}, in which case
\begin{aligned} g(e^{i \theta_{1}}, \ldots, e^{i \theta_{d}}) = \sum_{k=(k_{1}, \ldots, k_{d}) \in \mathbb{Z}^{d}} c_{k} e^{i \theta_{1} k_{1}}\cdots e^{i \theta_{d} k_{d}} \end{aligned}
and only finitely many c_{k} are nonzero. It is convenient to denote \xi_{\ell} =e^{i \theta_{\ell}}, \ell=1,\ldots, d, and think about \xi_{1}, \ldots, \xi_{d} as “independent, uniformly distributed random variables on \mathbb{T}”. For a vector k=(k_{1}, \ldots, k_{d}) \in \mathbb{Z}^{d} we can set \xi^{k} = \prod_{j=1}^{d}\xi_{j}^{k_{j}} so that the expression for our trigonometric polynomial g takes a compact form
\begin{aligned} g(\xi) = \sum_{k \in \mathbb{Z}^{d}}c_{k} \xi^{k}. \end{aligned}
In our previous lecture notes we proved (for d=1)

Theorem 1. The Riesz projection

\begin{aligned} P_{+} (\sum_{k \in \mathbb{Z}} c_{k}\xi^{k})=\sum_{k \geq 0} c_{k} \xi^{k} \end{aligned}

and the Hilbert transform

\begin{aligned} \mathcal{H}(\sum_{k \in \mathbb{Z}} c_{k} \xi^{k}) =\sum_{k \in \mathbb{Z}} -i \mathrm{sign}(k)c_{k}\xi^{k}\end{aligned}

are both bounded in L^{p}(\mathbb{T}) for 1<p<\infty, i.e., \| P_{+} g\|_{p}\leq C_{p} \|g\|_{p} and \|\mathcal{H}(g)\|_{p}\leq C_{p} \|g\|_{p}. Moreover, boundedness of P_{+} (or \mathcal{H}) implies boundedness of \mathcal{H} (or P_{+}) due to the identity

\begin{aligned} \mathcal{H}(g) = i g + i\mathbb{E}g - 2i P_{+}(g), \end{aligned}

where \mathbb{E}g = \int_{\mathbb{T}}g(\xi) dm(\xi) and dm(\xi) is the uniform probability measure on \mathbb{T}.
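
As a quick sanity check of this identity (a sketch, verified on the monomials g(\xi)=\xi^{k}, which suffices by linearity), one can compare both sides in the three cases k>0, k=0, k<0:

\begin{aligned} k>0: &\quad \mathcal{H}(\xi^{k}) = -i\xi^{k}, \qquad i\xi^{k} + i\cdot 0 - 2i\xi^{k} = -i\xi^{k}; \\ k=0: &\quad \mathcal{H}(1) = 0, \qquad i + i - 2i = 0; \\ k<0: &\quad \mathcal{H}(\xi^{k}) = i\xi^{k}, \qquad i\xi^{k} + i\cdot 0 - 2i\cdot 0 = i\xi^{k}. \end{aligned}

Here we used \mathbb{E}\xi^{k}=0 for k\neq 0, \mathbb{E}1=1, and the convention \mathrm{sign}(0)=0, which the identity implicitly assumes.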


The Hilbert transform \mathcal{H} (and, more generally, a “singular integral”) is closely related to another object from probability called the “martingale transform”. Let us briefly describe the connection (for this we need to increase the dimension d of \mathbb{T}^{d} considerably).

It will be convenient to borrow some notation from probability. For example, for a trigonometric polynomial g :\mathbb{T}^{d} \to \mathbb{C} we denote

\begin{aligned} \mathbb{E} g = \int_{\mathbb{T}^{d}} g(\xi) dm(\xi_{1})\ldots dm(\xi_{d}) \end{aligned}

i.e., the average over all variables \xi_{1}, \ldots, \xi_{d}. Sometimes we want to fix some variables, say \xi_{1}, \xi_{2}, \ldots, \xi_{m} and to take the average with respect to the rest of the variables

\begin{aligned}  \mathbb{E} \left( g \, |\, \xi_{1}, \xi_{2}, \ldots, \xi_{m} \right) = \underbrace{\int_{\mathbb{T}} \ldots \int_{\mathbb{T}}}_{d-m} g(\xi_{1}, \xi_{2}, \ldots, \xi_{m}, \xi_{m+1}, \ldots, \xi_{d}) dm(\xi_{m+1})\ldots dm(\xi_{d}) \end{aligned}

We will shortly show

Proposition 1. Boundedness of the Hilbert transform \|\mathcal{H}(g)\|_{L^{p}(\mathbb{T})}\leq C_{p} \|g\|_{L^{p}(\mathbb{T})} implies

\begin{aligned} \| \varepsilon_{1} f_{1}(\xi_{1})+\varepsilon_{2} f_{2}(\xi_{1}, \xi_{2}) +\ldots+\varepsilon_{n} f_{n}(\xi_{1}, \xi_{2}, \xi_{3}, \ldots, \xi_{n})\|_{L^{p}(\mathbb{T}^{n})}\leq \end{aligned}
\begin{aligned} C_{p}^{2}\| f_{1}(\xi_{1})+f_{2}(\xi_{1}, \xi_{2}) +\ldots+ f_{n}(\xi_{1}, \xi_{2}, \xi_{3}, \ldots, \xi_{n})\|_{L^{p}(\mathbb{T}^{n})} \end{aligned}

for all n\geq 1, all choices of signs \varepsilon_{j} = \pm 1, and all trigonometric polynomials f_{1}(\xi_{1}), …, f_{n}(\xi_{1}, \ldots, \xi_{n}) such that

\begin{aligned} \int_{\mathbb{T}} f_{\ell}(\xi_{1}, \ldots, \xi_{\ell}) dm(\xi_{\ell})=0, \end{aligned}

in other words “every time we add one more coordinate \xi_{\ell} to f_{\ell} the average with respect to that coordinate must be zero”.
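
In terms of Fourier coefficients the mean-zero condition has a simple meaning. The following computation is a sketch, where c_{k} denotes the coefficients of f_{\ell} (the same notation is used in the proof below):

\begin{aligned} \int_{\mathbb{T}} f_{\ell}(\xi_{1}, \ldots, \xi_{\ell})\, dm(\xi_{\ell}) = \sum_{k \in \mathbb{Z}^{\ell}} c_{k}\, \xi_{1}^{k_{1}}\cdots \xi_{\ell-1}^{k_{\ell-1}} \int_{\mathbb{T}} \xi_{\ell}^{k_{\ell}}\, dm(\xi_{\ell}) = \sum_{k \in \mathbb{Z}^{\ell}, \, k_{\ell}=0} c_{k}\, \xi_{1}^{k_{1}}\cdots \xi_{\ell-1}^{k_{\ell-1}}. \end{aligned}

Since distinct monomials are linearly independent, the right hand side vanishes identically if and only if c_{k}=0 whenever k_{\ell}=0. For instance, f_{2}(\xi_{1}, \xi_{2}) = \xi_{1}\xi_{2} + \xi_{2}^{-3} is admissible, whereas f_{2}(\xi_{1}, \xi_{2}) = \xi_{1}+\xi_{2} is not.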

Remark 1: The sequence F = \{ F_{\ell}\}_{\ell=1}^{n}, where

\begin{aligned}F_{\ell} := f_{1}(\xi_{1})+f_{2}(\xi_{1}, \xi_{2}) +\ldots+f_{\ell}(\xi_{1}, \xi_{2}, \xi_{3}, \ldots, \xi_{\ell}) \end{aligned}

is called a martingale, and it has the “martingale property” \mathbb{E} \left( F_{k}\, |\, \xi_{1}, \xi_{2}, \ldots, \xi_{m} \right) = F_{m} for all k \geq m. The sequence G=\{G_{\ell}\}_{\ell=1}^{n}, where

\begin{aligned} G_{\ell} :=\varepsilon_{1} f_{1}(\xi_{1})+\varepsilon_{2} f_{2}(\xi_{1}, \xi_{2}) +\ldots+\varepsilon_{\ell}f_{\ell}(\xi_{1}, \xi_{2}, \xi_{3}, \ldots, \xi_{\ell}) \end{aligned}

is called the “martingale transform” of F. The inequality in the proposition is referred to as “boundedness of the martingale transform” and is usually written as \| G\|_{p}\leq C_{p} \|F\|_{p}, where G is the martingale transform of F.

Proof of Proposition 1.

Let N_{1}, \ldots, N_{n} be nonzero integers (to be determined later) such that \mathrm{sign}(N_{\ell})=\varepsilon_{\ell} for all \ell = 1, \ldots, n. If a,b are nonzero integers then of course

\begin{aligned} \mathrm{sign}(ab) = \mathrm{sign}(a)\, \mathrm{sign}(b). \end{aligned}

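Set N = (N_{1}, \ldots, N_{n}), and let w \in \mathbb{T} (think of it as a dummy variable). Write f_{\ell}(\xi_{1}, \ldots, \xi_{\ell}) = \sum_{k \in \mathbb{Z}^{\ell}, \, k_{\ell}\neq 0} c_{k}\xi^{k} (the restriction k_{\ell}\neq 0 encodes the mean-zero assumption on f_{\ell}), and for k \in \mathbb{Z}^{\ell} let N\cdot k = N_{1}k_{1}+\ldots+N_{\ell}k_{\ell}. Then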

\begin{aligned}  \varepsilon_{\ell}f_{\ell}(\xi_{1}w^{N_{1}}, \ldots, \xi_{\ell}w^{N_{\ell}})= \sum_{k \in \mathbb{Z}^{\ell}, \, k_{\ell}\neq 0} \mathrm{sign}(N_{\ell})c_{k} \xi^{k} w^{N\cdot k} = \sum_{k \in \mathbb{Z}^{\ell}, \, k_{\ell}\neq 0} \mathrm{sign}(k_{\ell}) \mathrm{sign}(N_{\ell} k_{\ell}) c_{k} \xi^{k} w^{N\cdot k} \end{aligned}

Now, if |N_{1}|, \ldots, |N_{n}| increase sufficiently fast, then we can replace \mathrm{sign}(N_{\ell} k_{\ell}) by \mathrm{sign}(N\cdot k), because k_{\ell}\neq 0 and the vectors k range over a fixed bounded subset of \mathbb{Z}^{n}. Thus we can express the full sum through a Hilbert transform with respect to the variable w, i.e.,

\begin{aligned} \varepsilon_{\ell}f_{\ell}(\xi_{1}w^{N_{1}}, \ldots, \xi_{\ell}w^{N_{\ell}}) =i \mathcal{H}_{w}\left( \sum_{k \in \mathbb{Z}^{\ell}, \, k_{\ell}\neq 0} \mathrm{sign}(k_{\ell}) c_{k} \xi^{k} w^{N\cdot k}\right) \end{aligned}
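
To make “sufficiently fast” concrete, here is one possible choice (a sketch; the bound K and the specific formula for N_{j} are illustrative assumptions, not the only option). Suppose every frequency vector k appearing in f_{1}, \ldots, f_{n} satisfies |k_{j}|\leq K for all j, and take |N_{j}| = (K+1)^{j-1} with \mathrm{sign}(N_{j})=\varepsilon_{j}. Then for k \in \mathbb{Z}^{\ell} with k_{\ell}\neq 0,

\begin{aligned} \left| \sum_{j<\ell} N_{j}k_{j} \right| \leq K \sum_{j=1}^{\ell-1} (K+1)^{j-1} = (K+1)^{\ell-1}-1 < |N_{\ell}| \leq |N_{\ell}k_{\ell}|, \end{aligned}

so the term N_{\ell}k_{\ell} dominates the sum N\cdot k = \sum_{j\leq \ell}N_{j}k_{j}, and therefore \mathrm{sign}(N\cdot k) = \mathrm{sign}(N_{\ell}k_{\ell}).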

Another observation is that adding the “dummy” variable w does not change L^{p} norms, by rotation invariance of the measure dm (we can insert the dummy integral \int_{\mathbb{T}}dm(w) if necessary). We obtain

\begin{aligned} \| \sum_{\ell=1}^{n} \varepsilon_{\ell} f_{\ell}(\xi_{1}, \ldots, \xi_{\ell}) \|_{L^{p}(\mathbb{T}^{n})} = \| \sum_{\ell=1}^{n} \mathrm{sign}(N_{\ell}) f_{\ell}(\xi_{1}w^{N_{1}}, \ldots, \xi_{\ell}w^{N_{\ell}}) \|_{L^{p}(\mathbb{T}^{n}\times \mathbb{T})}=\end{aligned}
\begin{aligned} \|\mathcal{H}_{w} \left( \sum_{\ell=1}^{n} \tilde{f}_{\ell}(\xi_{1}w^{N_{1}}, \ldots, \xi_{\ell}w^{N_{\ell}}) \right)\|_{L^{p}(\mathbb{T}^{n}\times \mathbb{T})}\leq C_{p}\|\sum_{\ell=1}^{n} \tilde{f}_{\ell}(\xi_{1}w^{N_{1}}, \ldots, \xi_{\ell}w^{N_{\ell}}) \|_{L^{p}(\mathbb{T}^{n}\times \mathbb{T})}= \end{aligned}
\begin{aligned} C_{p}\|\sum_{\ell=1}^{n} \tilde{f}_{\ell}(\xi_{1}, \ldots, \xi_{\ell}) \|_{L^{p}(\mathbb{T}^{n})} \end{aligned}

where \tilde{f}_{\ell}(\xi_{1}, \ldots, \xi_{\ell}) = \sum_{k \in \mathbb{Z}^{\ell}, \, k_{\ell}\neq 0} \mathrm{sign}(k_{\ell}) c_{k} \xi^{k}. We can now play the same game again, replacing (\xi_{1}, \ldots, \xi_{\ell}) by (\xi_{1}w^{M_{1}}, \ldots, \xi_{\ell}w^{M_{\ell}}) with positive(!) integers M_{1}, \ldots, M_{n} that increase sufficiently fast, so that \mathrm{sign}(M\cdot k)=\mathrm{sign}(M_{\ell}k_{\ell})=\mathrm{sign}(k_{\ell}). Hence we can replace \tilde{f}_{\ell} by f_{\ell} using the identity \mathrm{sign}(k_{\ell})\mathrm{sign}(k_{\ell})=1. This leads to the final upper bound C^{2}_{p}\|\sum_{\ell=1}^{n} f_{\ell}(\xi_{1}, \ldots, \xi_{\ell}) \|_{L^{p}(\mathbb{T}^{n})}.

Remark 2: In fact the reverse implication also holds (due to Burkholder), i.e., boundedness of martingale transforms implies boundedness of the Hilbert transform (and actually boundedness of almost any singular integral operator of Calderon–Zygmund type). This direction is more delicate and uses the fact that the Hilbert transform can be bounded in terms of averages of dilates and translates of so-called dyadic shifts, which in turn can be easily bounded in L^{p} using only martingale transforms and their boundedness (also applied twice!).

The equivalence of these two inequalities can be proved in an arbitrary Banach space X. In this case one considers finite sums f =\sum_{k} c_{k} \xi^{k} with c_{k} \in X (instead of c_{k} \in\mathbb{C}), and L^{p} norms are calculated as \|f\|_{p} := (\mathbb{E} \|f\|_{X}^{p})^{1/p}, where \|\cdot \|_{X} is the norm in the Banach space. As soon as one of the estimates holds in the Banach space X (for some 1<p<\infty), X is called a UMD space (here UMD = Unconditional Martingale Difference). In other words, harmonic analysis can be developed on UMD spaces with “almost the same results” as for X=\mathbb{C}.

Calderon-Zygmund theory

The Hilbert transform on the unit circle has Fourier multiplier -i\, \mathrm{sign}(k). By the transference theorems obtained in our previous lecture notes, boundedness of the Hilbert transform on the unit circle both implies and follows from boundedness of the operator

\begin{aligned} T_{m}f = \mathcal{F}^{-1}(m(\cdot) \mathcal{F}(f)(\cdot)), \quad f \in \mathcal{S}(\mathbb{R}), \end{aligned}

from L^{p}(\mathbb{R}) to L^{p}(\mathbb{R}), where m(\xi) = -i\, \mathrm{sign}(\xi). One can show that for this m we have

\begin{aligned} \mathcal{F}^{-1}(m(\cdot) \mathcal{F}(f)(\cdot))(x) = p.v. \frac{1}{\pi} \int_{\mathbb{R}} \frac{f(y)}{x-y}dy , \quad f \in \mathcal{S}(\mathbb{R}). \end{aligned}

Hint: formally the right hand side is p.v. f *\frac{1}{\pi x}, so taking the Fourier transform, using the formal identity “\mathcal{F}(\frac{1}{\pi x})(\xi) =-i \mathrm{sign}(\xi)”, and then taking \mathcal{F}^{-1} of both sides, one informally gets the identity. There are several issues: first of all, x \mapsto \frac{1}{\pi x} is not in L^{1}, whereas \mathcal{F}(f*g)=\mathcal{F}(f) \mathcal{F}(g) was proved for f,g \in L^{1}. To fix the issue one replaces the kernel (\pi x)^{-1} by the truncation
k_{\varepsilon, R}(x) = (\pi x)^{-1}1_{\varepsilon<|x|<R}, considers f*k_{\varepsilon, R} = \mathcal{F}^{-1}(\mathcal{F}(f*k_{\varepsilon, R}))=\mathcal{F}^{-1}(\mathcal{F}(k_{\varepsilon, R}) \mathcal{F}(f)),
and, after invoking the dominated convergence theorem, passes to the limit \varepsilon \to 0, R \to \infty.

Exercise 1: For f \in \mathcal{S}(\mathbb{R}) denote H(f)(x) =p.v. \frac{1}{\pi}\int_{\mathbb{R}}\frac{f(y)}{x-y}dy. Show that \lim_{|x| \to \infty} xH(f)(x) = \frac{1}{\pi} \int_{\mathbb{R}}f. Conclude that Hf is not in L^{1}(\mathbb{R}) whenever \int_{\mathbb{R}} f \neq 0, whereas Hf is always in L^{2}(\mathbb{R}).

Exercise 2: Show that \mathcal{F} (Hf)(\xi) = -i \mathrm{sign}(\xi) \mathcal{F}(f)(\xi) for all f \in \mathcal{S}(\mathbb{R}) (here \mathcal{F} (Hf) should be understood in the Fourier–Plancherel sense because Hf \in L^{2}, but in general Hf is not in L^{1}). In particular, conclude that \|H(f)\|_{2} = \|f\|_{2} for all f \in \mathcal{S}(\mathbb{R}). Thus H extends to \tilde{H}, i.e., to a continuous linear operator on L^{2}. Conclude that \tilde{H}^{2}f=-f for all f \in L^{2}, and hence that \tilde{H} is a unitary operator on L^{2}.

Next, let us ignore the constants \pi, \frac{1}{2\pi}, i, and let us recall that boundedness of the Hilbert transform on the unit circle amounts to showing that

\begin{aligned} T(f)(\theta):= p.v. \int_{-\pi}^{\pi} f(x) \mathrm{ctg}\left(\frac{\theta -x}{2}\right)dx \end{aligned}

is bounded from L^{p}(\mathbb{T}) to L^{p}(\mathbb{T}). On the real line the boundedness of the Hilbert transform amounts to showing that

\begin{aligned} T(f)(x) := p.v. \int_{\mathbb{R}} f(y)\frac{1}{x-y}dy \end{aligned}

is bounded from L^{p}(\mathbb{R}) to L^{p}(\mathbb{R}). We have proved that both of them are bounded for 1<p<\infty.

We would be happy to have a general theorem which describes the kernels K(x,y) such that the operator f \mapsto Tf(x) = \int_{\mathbb{R}^{d}} K(x,y)f(y)dy is bounded from L^{p} to L^{p}. There are several issues: in general K(x,y) : \mathbb{R}^{d}\times \mathbb{R}^{d} \to \mathbb{C} is not defined for x=y (recall the example K(x,y)=\frac{1}{x-y}), and \int_{\mathbb{R}^{d}} K(x,y)f(y)dy may not make sense (we had to use p.v. in the case of the Hilbert transform). The general theorem that we prove is called the Calderon–Zygmund theorem. We will see that the kernels it covers essentially look like \frac{1}{x-y}.

Theorem 2. Let T be a linear operator bounded from L^{2}(\mathbb{R}^{d}) to L^{2}(\mathbb{R}^{d}), and such that for any compactly supported f \in L^{2}(\mathbb{R}^{d}) we have

\begin{aligned} T(f)(x) = \int_{\mathbb{R}^{d}}K(x,y)f(y)dy \end{aligned}

for all x \notin \mathrm{supp}(f), where K : \mathbb{R}^{d} \times \mathbb{R}^{d} \to \mathbb{C} is defined for x\neq y and satisfies the following conditions

\begin{aligned} \, (i) \quad \quad |K(x,y)| < \frac{C}{|x-y|^{d}}; \end{aligned}
\begin{aligned} \, (ii) \quad \int_{\mathbb{R}^{d}\setminus 2Q}\left|K(x,y)-K(x,y')\right|dx<C_{d} \quad \text{for all}\quad y,y' \in Q\, \, \text{and all cubes}\, \, Q \subset \mathbb{R}^{d} \end{aligned}

and the same condition (ii) holds for K replaced by \tilde{K}(x,y)=K(y,x). Then for all 1<p<\infty the inequality \| Tf\|_{p} \leq C_{p,d} \|f\|_{p} holds on a dense class of functions in L^{p}.

In particular the theorem says that T extends to a bounded operator from L^{p}(\mathbb{R}^{d}) to L^{p}(\mathbb{R}^{d}).

Before we start proving the theorem let us briefly discuss its assumptions, which essentially come from the proof.

1) The validity of the integral formula Tf(x) = \int K(x,y) f(y)dy only for x \notin \mathrm{supp}(f) is a technical assumption, needed only to avoid issues related to the principal value. We require the integral formula for x \notin \mathrm{supp}(f) and do not care what Tf(x) is for x \in \mathrm{supp}(f) (we are really sweeping the principal value issues under the carpet). It is a fact(!) that if two operators T_{1} and T_{2} as in the theorem have the same kernel, then T_{1}f - T_{2} f = b(x) f(x) for some bounded b, i.e., the kernel K does not determine T uniquely.

2) We assume(!) that T is bounded from L^{2} to L^{2}, i.e., \|Tf\|_{2}\leq C \|f\|_{2}. This seems to be a pretty strong assumption (we are assuming something that we really would like to prove). In practice, deciding whether a given operator is bounded in L^{2} is usually a simple task and almost follows from Plancherel’s theorem (as we did with the Hilbert transform, and as can be done similarly for any convolution type operator as long as the Fourier transform of its kernel is in L^{\infty}). However, there are other operators for which it is really difficult to prove boundedness even in L^{2} (in this case there are other arguments: a) the TT^{*} argument; b) the Cotlar–Stein lemma, see for instance the proof of the Calderón–Vaillancourt theorem, i.e., Theorem 4 in Tao’s notes).

3) The assumption (i) is only needed to see why the integral \int K(x,y) f(y)dy converges as soon as x\notin \mathrm{supp}(f) and f \in L^{2} is compactly supported. Indeed, by the Cauchy–Schwarz inequality,

\begin{aligned} \left| \int_{\mathbb{R}^{d}} K(x,y) f(y)dy \right| \leq \int_{\mathrm{supp}(f)} |K(x,y)| |f(y)|dy \leq \left(\int_{\mathrm{supp}(f)} |K(x,y)|^{2}dy\right)^{1/2} \| f\|_{L^{2}(\mathbb{R}^{d})} \end{aligned}

Of course, one can impose much weaker assumptions than (i), say for example \int_{E} |K(x,y)|^{2}dy<\infty for any compact E as long as x \notin E. For now we prefer not to be too demanding.
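
For completeness, here is how (i) makes the integral on the right hand side of the Cauchy–Schwarz bound finite (a sketch; the symbol \delta is introduced here only for illustration). Put \delta = \mathrm{dist}(x, \mathrm{supp}(f)), which is positive because \mathrm{supp}(f) is compact and x \notin \mathrm{supp}(f). Then |x-y|\geq \delta for all y \in \mathrm{supp}(f), and

\begin{aligned} \int_{\mathrm{supp}(f)} |K(x,y)|^{2}dy \leq \int_{\mathrm{supp}(f)} \frac{C^{2}}{|x-y|^{2d}}dy \leq \frac{C^{2}\, |\mathrm{supp}(f)|}{\delta^{2d}} < \infty. \end{aligned}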

4) Condition (ii) seems a little bit artificial, but it is in fact the most important condition: it lies at the heart of the proof of the weak type (1,1) estimate

\begin{aligned} |\{ |Tf| > \lambda \}| \leq C \frac{\|f\|_{1}}{\lambda} \quad \text{for all} \quad \lambda >0, \end{aligned}

a substitute for the desired \| Tf\|_{1}<C \|f\|_{1} which, unfortunately, may not hold in general. Let us remark that the constant C_{d} in (ii) is independent of the choice of the cube Q. Here 2Q denotes the cube with the same center as Q and twice the side length. For example, if Q=[-1,1] then 2Q=[-2,2]; and if Q=[0,1] then 2Q = [-\frac{1}{2}, \frac{3}{2}]. The constant 2 in the integral \int_{\mathbb{R}^{d}\setminus 2Q} is not important. One can replace it by any number \alpha>1 (but it should be one and the same(!) for all cubes Q); one can also replace cubes Q by balls B (or by some other family of sets obtained by translates and dilates of a fixed convex body).

A simpler condition which implies (ii) is

\begin{aligned} (ii)' \qquad |\nabla_{y} K(x,y)| \leq \frac{C'}{|x-y|^{d+1}} \quad \text{for all}\quad x\neq y. \end{aligned}

Indeed, notice that |x-y| \geq c' \ell(Q) whenever x \notin 2Q and y \in Q for some universal constant c'>0 (for example, c'=\frac{1}{2} works fine), where \ell(Q) denotes the side length of the cube. Therefore

\begin{aligned} |K(x,y)-K(x,y')| \leq |y-y'| |\nabla_{y}K(x,\xi)| \leq C''_{d} \frac{|y-y'|}{|x-\xi|^{d+1} } \leq \frac{C'''_{d}\, \ell(Q)}{|x-\mathrm{center}(Q)|^{d+1}} \end{aligned}

for some \xi belonging to the segment [y,y']. Thus we obtain

\begin{aligned} \int_{\mathbb{R}^{d} \setminus{2Q}} |K(x,y)-K(x,y')| dx \leq C'''_{d}\, \ell(Q) \int_{\mathbb{R}^{d} \setminus 2Q}\frac{dx}{|x-\mathrm{center}(Q)|^{d+1}}. \end{aligned}

And the last integral we can estimate as

\begin{aligned} \ell(Q)\int_{\mathbb{R}^{d} \setminus 2Q}\frac{dx}{|x-\mathrm{center}(Q)|^{d+1}} = \ell(Q)\sum_{k=1}^{\infty}\int_{2^{k+1}Q \setminus 2^{k}Q}\frac{dx}{|x-\mathrm{center}(Q)|^{d+1}} \leq  \ell(Q)\sum_{k=1}^{\infty}\, \frac{\ell(Q)^{d}\, [2^{d(k+1)}-2^{kd}]}{(c 2^{k+1}\ell(Q))^{d+1}} < C_{d}. \end{aligned}
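
One way to make the constants in the last estimate explicit (a sketch, bounding the Euclidean distance to the center from below by the sup-norm distance; the value c=\frac{1}{4} is our choice): for x \in 2^{k+1}Q \setminus 2^{k}Q we have |x-\mathrm{center}(Q)| \geq 2^{k-1}\ell(Q) = \frac{1}{4}\, 2^{k+1}\ell(Q), and therefore

\begin{aligned} \ell(Q)\sum_{k=1}^{\infty}\, \frac{\ell(Q)^{d}\, [2^{d(k+1)}-2^{kd}]}{(2^{k-1}\ell(Q))^{d+1}} = \sum_{k=1}^{\infty} (2^{d}-1)\, 2^{kd - (k-1)(d+1)} = (2^{d}-1)\, 2^{d+1} \sum_{k=1}^{\infty} 2^{-k} = (2^{d}-1)\, 2^{d+1}. \end{aligned}

In particular the bound depends only on the dimension d and not on the cube Q.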

Another class of kernels K satisfying (ii) (with some fixed c>1 instead of 2), which need not have the luxury of satisfying (ii)', consists of those for which

\begin{aligned} (ii)'' \qquad |K(x,y)-K(x,y')| \leq C_{d}\, \frac{|y-y'|^{\beta}}{|x-y|^{d+\beta}} \quad \text{for all} \quad x \notin 2Q, \; y,y' \in Q \end{aligned}

for all cubes Q \subset \mathbb{R}^{d} and some fixed \beta >0. The proof is just a repetition of what we did in the case (ii)' and is left to the reader as an exercise. There seems to be an asymmetry in the right hand side of (ii)'', but notice that we can replace |x-y|^{-d-\beta} by |x-y'|^{-d-\beta}: nothing really changes except the constants. A more compact way of writing (ii)'' is to use balls instead of cubes; for example, the condition

\begin{aligned} |K(x,y)-K(x,y')| \leq C_{d}\, \frac{|y-y'|^{\beta}}{|x-y|^{d+\beta}} \quad \text{whenever}\quad \, |y-y'|\leq \frac{1}{2}|y-x| \end{aligned}

does imply (ii) with \int_{\mathbb{R}^{d}\setminus c\, Q} for some c>1 instead of 2, which is still enough for the conclusion of the theorem.

5) “and the same condition (ii) holds for K replaced by \tilde{K}(x,y)=K(y,x)”: let us explain the reason for this requirement. Without it we can only get the estimate
\| Tf\|_{p} \leq C_{p} \|f\|_{p} for 1<p<2. The standard way to handle p>2 is to invoke a duality argument. Let us present an argument which has a gap but at least clearly explains what is going on. To prove \| Tf\|_{q} \leq C_{q} \|f\|_{q} for q>2, by duality (the converse of Hölder’s inequality) it suffices to show

\begin{aligned} \left| \int_{\mathbb{R}^{d}}\int_{\mathbb{R}^{d}}K(x,y)f(y) \overline{g(x)} dxdy \right| \leq C_{q} \| f\|_{q} \|g\|_{p} \quad (1)\end{aligned}

where \frac{1}{p}+\frac{1}{q}=1. Now taking the supremum over all f with \|f\|_{q}=1, it suffices to show that \| \int_{\mathbb{R}^{d}} \overline{K(x,y)}g(x)dx \|_{p} \leq C_{q} \|g\|_{p}. But now the kernel \tilde{K}(x,y)=\overline{K(y,x)} is again of the type described in the theorem, and therefore the last inequality holds for 1<p\leq 2. In particular, T extends to a bounded operator L^{p} \to L^{p} for all 1<p<\infty.

Of course in this reasoning the issue is that (1) holds true only when f and g have disjoint compact supports. To fix the issue first one proves that the adjoint operator T^{*} is bounded on L^{2} and T^{*}f(x) = \int_{\mathbb{R}^{d}} \overline{K(y,x)}f(y)dy whenever x\notin \mathrm{supp}(f). But now the kernel \tilde{K}(x,y)=\overline{K(y,x)} is again of the type described in the theorem which means that T^{*} extends to a bounded operator on L^{p} for 1<p<2. By an abstract duality theorem T extends to a bounded operator L^{q} \to L^{q} if and only if T^{*} extends to a bounded operator L^{p} \to L^{p} (here \frac{1}{p}+\frac{1}{q}=1). In particular, one obtains that T extends to a bounded operator L^{p} \to L^{p} for all 1<p<\infty.

After discussing all aspects of the assumptions in the theorem, let us fix the notion of a Calderon–Zygmund operator.

Definition. The linear operators described in Theorem 2 are called Calderon–Zygmund Operators (CZO).

Let us make one last comment. Sometimes one studies weighted estimates, namely, one wants to understand under what conditions on a nonnegative function w \geq 0 (later called a weight) one has the inequality

\begin{aligned} \| Tf\|^{p}_{L^{p}(w)} = \int_{\mathbb{R}^{d}} |Tf|^{p} w(x)dx \leq C_{p} \int_{\mathbb{R}^{d}} |f|^{p} w(x)dx = C_{p} \| f\|^{p}_{L^{p}(w)}   \quad \end{aligned}

for all 1<p<\infty, where T is a CZO. The best result in this direction is the inequality

\begin{aligned} \|Tf\|_{L^{p}(w)} \leq C_{p,T} [w]_{A_{p}}^{\max\{1, \frac{1}{p-1}\}} \|f\|_{L^{p}(w)} \quad \text{for all} \quad 1<p<\infty  \quad (2)\end{aligned}

where [w]_{A_{p}}, the A_{p} characteristic of w, is defined as

\begin{aligned}  [w]_{A_{p}} = \sup_{Q} \left( \frac{1}{|Q|}\int_{Q} w\right) \left(\frac{1}{|Q|}\int_{Q} w^{-\frac{1}{p-1}}\right)^{p-1} \end{aligned}

and the weights w for which [w]_{A_{p}}<\infty are called Muckenhoupt weights; this class is denoted by A_{p}. Here the supremum is taken over all cubes Q \subset \mathbb{R}^{d}, and T is almost any Calderon–Zygmund operator whose kernel additionally satisfies a slightly stronger assumption, namely (ii)'' instead of (ii). The estimate is sharp in the sense that one cannot replace the power \max\{1, \frac{1}{p-1}\} by a smaller one: otherwise, already for the Hilbert transform, there exists a weight which would violate the inequality. The estimate (2) may seem a little bit involved; however, there are certain extrapolation arguments which say that “instead of proving (2) for all w \in A_{p} and all 1<p<\infty it is enough to prove the estimate for all w \in A_{2} and p=2, i.e., \|Tf\|_{L^{2}(w)} \leq C [w]_{A_{2}} \|f\|_{L^{2}(w)}”.
Another remark is that one can first try to prove such an L^{2} weighted inequality for the “martingale transforms” and then (due to the Bourgain–Burkholder equivalence) most likely the same sharp inequality should hold for CZO. Perhaps this was the starting point, and the “Bellman function technique” turned out to be successful; however, later the arguments were simplified considerably.
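
Two quick observations may help to get used to the definition of the A_{p} characteristic (a sketch; p'=\frac{p}{p-1} denotes the conjugate exponent, a notation introduced here). For w\equiv 1 both averages equal 1, so [w]_{A_{p}}=1 and (2) reduces to the unweighted estimate. Moreover, [w]_{A_{p}}\geq 1 for every weight, by Hölder’s inequality applied to 1 = w^{1/p}w^{-1/p}:

\begin{aligned} 1 = \frac{1}{|Q|}\int_{Q} w^{\frac{1}{p}} w^{-\frac{1}{p}} \leq \left(\frac{1}{|Q|}\int_{Q} w\right)^{\frac{1}{p}} \left(\frac{1}{|Q|}\int_{Q} w^{-\frac{p'}{p}}\right)^{\frac{1}{p'}}, \qquad \frac{p'}{p} = \frac{1}{p-1}. \end{aligned}

Raising both sides to the power p and using \frac{p}{p'}=p-1 gives 1 \leq \left( \frac{1}{|Q|}\int_{Q} w\right) \left(\frac{1}{|Q|}\int_{Q} w^{-\frac{1}{p-1}}\right)^{p-1} for every cube Q.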

It is a good time to start proving Theorem 2.

Proof of Theorem 2.

There will be many constants appearing in the proof. We denote all of them by C. The important thing is that they will be independent of the functions f, the cubes Q, and a certain parameter \lambda>0 that will be constantly present in the proof. The functions f will always be bounded and compactly supported unless we say otherwise.

The proof consists of several lemmas.

Lemma 1. Let T be a CZO. Then the following weak type estimates hold true

\begin{aligned} | \{x \in \mathbb{R}^{d}\, :\, |Tf(x)|>\lambda\}| \leq C \frac{\|f\|_{1}}{\lambda} \quad \text{for all} \quad \lambda>0, \quad (3) \end{aligned}
\begin{aligned} | \{x \in \mathbb{R}^{d}\, :\, |Tf(x)| > \lambda\}| \leq C\frac{\|f\|_{2}^{2}}{\lambda^{2}} \quad \text{for all} \quad \lambda>0.   \quad(4) \end{aligned}

Before we prove the lemma let us see why this implies the inequality \| Tf \|_{p} \leq C_{p} \|f\|_{p} for 1<p<2 (and then for 2<p<\infty by duality as we discussed).

Indeed, we start from a useful identity which relates \|Tf\|_{p} to the function \lambda \mapsto |\{ |Tf|>\lambda\}|, namely,

\begin{aligned} \int |Tf(x)|^{p} dx = p \int_{0}^{\infty} \lambda^{p-1} |\{x \in \mathbb{R}^{d}\, :\, |Tf(x)|>\lambda\}| d\lambda \end{aligned}

To prove the identity notice that

\begin{aligned} \int_{\mathbb{R}^{d}} |Tf(x)|^{p}dx = \int_{\mathbb{R}^{d}} \int_{0}^{\infty} 1_{[0, |Tf(x)|^{p})}(s)\, ds\, dx \stackrel{Tonelli}{=} \int_{0}^{\infty} \int_{\mathbb{R}^{d}} 1_{[0, |Tf(x)|^{p})} (s)\, dx\, ds= \end{aligned}

\begin{aligned} \int_{0}^{\infty}|\{x \in \mathbb{R}^{d}\, : \, |Tf(x)|^{p}>s\}| ds \stackrel{(s^{1/p}=\lambda)}{=} p\int_{0}^{\infty}\lambda^{p-1}|\{x \in \mathbb{R}^{d}\, : \, |Tf(x)|>\lambda\}| d\lambda. \end{aligned}
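
As a sanity check of the identity (a sketch on a hypothetical simple function, not needed for the proof), suppose |Tf| = a 1_{E} for some a>0 and a set E of finite measure. Then |\{|Tf|>\lambda\}| equals |E| for \lambda<a and 0 for \lambda \geq a, so

\begin{aligned} p\int_{0}^{\infty}\lambda^{p-1}|\{ |Tf|>\lambda\}|\, d\lambda = p\, |E| \int_{0}^{a} \lambda^{p-1} d\lambda = a^{p}|E| = \int_{\mathbb{R}^{d}} |Tf(x)|^{p}dx. \end{aligned}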

Next, instead of |\{x \in \mathbb{R}^{d}\, : \, |Tf(x)|>\lambda\}| we will write |\{|Tf|>\lambda\}| for short. For each \lambda>0 we split f as

\begin{aligned} f = f 1_{\{|f|\leq \lambda\}} + f 1_{\{|f| > \lambda\}} = f^{\lambda}_{1} +f^{\lambda}_{2} \end{aligned}

By linearity, |Tf| \leq |Tf^{\lambda}_{1}|+|Tf^{\lambda}_{2}|, and hence \{ |Tf|>\lambda \} \subset \{ |Tf^{\lambda}_{1}|>\frac{\lambda}{2}\} \cup \{ |Tf^{\lambda}_{2}|>\frac{\lambda}{2}\}. Therefore

\begin{aligned} |\{ |Tf|>\lambda \}| \leq | \{ |Tf^{\lambda}_{1}|>\frac{\lambda}{2}\}| + |\{ |Tf^{\lambda}_{2}|>\frac{\lambda}{2}\}|. \end{aligned}

Integrating this against p\lambda^{p-1}d\lambda and using the identity above, we obtain

\begin{aligned} \int |Tf|^{p} \leq p \int_{0}^{\infty}\lambda^{p-1}| \{ |Tf^{\lambda}_{1}|>\frac{\lambda}{2}\}| d\lambda + p \int_{0}^{\infty}\lambda^{p-1}| \{ |Tf^{\lambda}_{2}|>\frac{\lambda}{2}\}| d\lambda=p(I_{1}+I_{2}). \end{aligned}

To estimate I_{1} we use (4). Indeed, we get

\begin{aligned}  I_{1} =\int_{0}^{\infty}\lambda^{p-1}| \{ |Tf^{\lambda}_{1}|>\frac{\lambda}{2}\}| d\lambda \leq 4C \int_{0}^{\infty}\lambda^{p-3}\int_{\mathbb{R}^{d}} |f|^{2} 1_{|f|\leq \lambda }dx d\lambda \end{aligned}

\begin{aligned} \stackrel{Tonelli}{=} 4C\int_{\mathbb{R}^{d}} |f|^{2} \int_{|f|}^{\infty} \lambda^{p-3} d\lambda dx = \frac{4C}{2-p} \int |f|^{p}, \end{aligned}

where the inner integral converges because p<2. To estimate I_{2} we use (3):

\begin{aligned} I_{2}=\int_{0}^{\infty}\lambda^{p-1}| \{ |Tf^{\lambda}_{2}|>\frac{\lambda}{2}\}| d\lambda \leq 2C \int_{0}^{\infty} \lambda^{p-2} \int _{\mathbb{R}^{d}} |f| 1_{|f|>\lambda} dx d\lambda \end{aligned}

\begin{aligned} \stackrel{Tonelli}{=} 2C \int_{\mathbb{R}^{d}}|f| \int_{0}^{|f|} \lambda^{p-2}d\lambda dx = \frac{2C}{p-1} \int |f|^{p}, \end{aligned}

where we used p>1. Combining the two estimates, we obtain

\begin{aligned} \int |Tf|^{p} dx \leq \left(\frac{4Cp}{2-p} + \frac{2Cp}{p-1} \right) \int |f|^{p} \leq C_{p} \int |f|^{p}. \end{aligned}

Remark 3: The argument above is known as the Marcinkiewicz interpolation theorem. Note also that the estimate |\{ |Tf|>\lambda\}| \leq C^{p} \frac{\|f\|^{p}_{p}}{\lambda^{p}} (called the weak type (p,p) inequality) follows from \|Tf\|_{p} \leq C \|f\|_{p}, by the simple chain of inequalities

\begin{aligned} C^{p} \int |f|^{p} \geq \int |Tf|^{p} \geq \int_{|Tf| > \lambda } |Tf|^{p}\geq \lambda^{p} |\{ |Tf|>\lambda \}| \end{aligned}

(this is also called Chebyshev’s inequality).

All that remains is to prove Lemma 1. Inequality (4) is trivial: it follows from \|Tf\|_{2} \leq C \|f\|_{2} (as we just explained in the remark above). It is the estimate (3), i.e., the weak type (1,1) inequality for a CZO T, namely,

\begin{aligned} | \{ |Tf|>\lambda \}| \leq C \frac{\| f\|_{1}}{\lambda} \quad \text{for any} \quad \lambda>0 \end{aligned}

that is nontrivial and requires the proof.

Proof of Lemma 1.

After learning about the crazy idea of Marcinkiewicz, i.e., for each \lambda>0 to decompose f = g+h, where g and h depend(!) on \lambda, and to use the trivial estimate

\begin{aligned} |\{ |Tf|>\lambda \}| \leq |\{ |Tg|>\lambda/2 \}| + |\{ |Th|>\lambda/2 \}| = I_{1}+I_{2}, \quad (5)\end{aligned}

it is tempting to try the same approach. All we know about T is

\begin{aligned} (a) \quad \|Tb\|_{2} \leq C \|b\|_{2} \quad \text{for any} \quad b \in L^{2}; \end{aligned}

\begin{aligned} (b) \quad \int_{\mathbb{R}^{d}\setminus 2Q}|K(x,y)-K(x,y')|dx\leq C \quad \text{for all} \quad y, y' \in Q. \end{aligned}

By property (a), we can estimate one of the terms in (5), say I_{1}, as

\begin{aligned} I_{1} = |\{ |Tg|>\lambda/2 \}| \leq 4 \frac{\int_{\mathbb{R}^{d}} |Tg|^{2}}{\lambda^{2}} \leq C \frac{\int_{\mathbb{R}^{d}}|g|^{2}}{\lambda^{2}} \end{aligned}

So if we succeed in choosing the decomposition f=g+h so that

\begin{aligned} \int_{\mathbb{R}^{d}} |g|^{2} \leq C \lambda \|f\|_{1} \end{aligned}

then this would give us the right estimate I_{1} \leq C \frac{\|f\|_{1}}{\lambda} (it is not really difficult to find such a decomposition f=g+h, we can just choose g = f 1_{\{ |f|\leq \lambda\}}, then even pointwise |g(x)| \leq \sqrt{\lambda} \sqrt{|f(x)|}).

For the second term I_{2} it seems that the best we can do is to use again Chebyshev inequality but now with L^{1} norm, namely,

\begin{aligned}  I_{2} = |\{ |Th|>\lambda/2 \}| \leq 2 \frac{\int_{\mathbb{R}^{d}} |Th| }{\lambda}. \quad (6) \end{aligned}

At this moment it is not clear at all how to use part (b). After looking at (b) again we can make a nice observation: if h is supported in a cube Q and has zero average, \int_{Q} h =0, then for x outside of Q and any fixed y' \in Q we have

\begin{aligned} |Th(x)| = \left|\int_{\mathbb{R}^{d}} K(x,y)h(y)dy\right| = \left| \int_{\mathbb{R}^{d}} (K(x,y)-K(x,y'))h(y) dy \right| \leq \end{aligned}
\begin{aligned} \int_{\mathbb{R}^{d}} |K(x,y)-K(x,y')| |h(y)| dy, \end{aligned}

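where the second equality holds because \int_{\mathbb{R}^{d}} K(x,y')h(y)dy = K(x,y')\int_{Q} h = 0. Integrating over x \notin 2Q and using (b), we immediately obtain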

\begin{aligned} \int_{\mathbb{R}^{d} \setminus 2Q} |Th(x)| dx \stackrel{Tonelli}{\leq} \int_{Q} |h(y)| \int_{\mathbb{R}^{d}\setminus 2Q} |K(x,y)-K(x,y')|dx dy \leq C \|h\|_{1}. \end{aligned}

Thus, if h is supported in Q, \int_{Q} h=0, and \|h\|_{1} \leq C \| f\|_{1} then this almost gives the right bound for the right hand side in (6) except we still have to deal with the remaining part \frac{\int_{2Q} |Th|}{\lambda}.

Here is another trick: if |2Q| is small enough, there is hope that \int_{2Q} |Th| is also small. The issue is that |Th| can be arbitrarily large on 2Q, i.e., we do not have “any good control” on K(x,y) when x \in 2Q and y \in Q. How can we avoid estimating |Th| there? A simple remedy is to adapt the original estimate of I_{2} as

\begin{aligned} I_{2} = |\{ |Th| > \lambda/2\}| = |\{ 1_{2Q} |Th| > \lambda/2\}| + |\{ 1_{\mathbb{R}^{d}\setminus 2Q} |Th| > \lambda/2\}| \leq    |2Q| + 2 \frac{\int_{\mathbb{R}^{d} \setminus 2Q} |Th|}{\lambda} = I'_{2} + I''_{2}.  \end{aligned}

Perfect! So the proof is complete as long as we find a decomposition f = g+h with \|g\|_{2}^{2} \leq C\lambda \|f\|_{1}, \mathrm{supp}(h) \subset Q, \int_{Q} h =0, \|h\|_{1} \leq C \|f\|_{1} and |Q|\leq C \frac{\|f\|_{1}}{\lambda}.

It looks like too many requirements! The trivial Marcinkiewicz choice g = f 1_{\{|f|\leq \lambda \}} determines h automatically, and this “bad” h has no reason to be supported in a small cube or to have zero average. It is getting hopeless… If only we could tweak things a little bit to get more control on h … but how?

“There is no failure except in no longer trying”.

Elbert Hubbard

Here comes another trick: if it happens that h is supported on a disjoint collection of cubes \mathcal{W} (instead of only one cube), i.e., \mathrm{supp}(h) \subset \cup_{Q \in \mathcal{W}} Q, with essentially the same properties

\begin{aligned} (A) \quad \int_{Q} h =0 \quad \text{for any} \quad Q \in \mathcal{W}\end{aligned}
\begin{aligned} (B) \quad \|h\|_{1} \leq C \|f\|_{1}\end{aligned}
\begin{aligned} (C) \quad \sum_{Q \in \mathcal{W}} |Q| \leq C \frac{\|f\|_{1}}{\lambda} \end{aligned}

then by repeating the same steps as before, and using the identity

\begin{aligned} h = \sum_{S \in\mathcal{W}} h 1_{S}, \end{aligned}

we obtain

\begin{aligned} I_{2} = |\{ |Th| > \lambda/2\}| =  |\{ 1_{\cup_{Q \in \mathcal{W}} 2Q} |Th| > \lambda/2\}| +  |\{ 1_{\mathbb{R}^{d}\setminus \cup_{Q \in \mathcal{W}} 2Q} |Th| > \lambda/2\}| \leq \end{aligned}

\begin{aligned} \left|\cup_{Q \in \mathcal{W}} 2Q \right| + 2 \frac{\int_{\mathbb{R}^{d} \setminus \cup_{Q \in \mathcal{W}}2Q} |Th|}{\lambda} \leq \sum_{Q \in \mathcal{W}} |2Q| + \frac{2}{\lambda} \sum_{S \in \mathcal{W}} \int_{\mathbb{R}^{d} \setminus \cup_{Q \in \mathcal{W}}2Q} |T(h 1_{S})|\leq \end{aligned}

\begin{aligned} C \frac{\|f\|_{1}}{\lambda} + \frac{2}{\lambda} \sum_{S \in \mathcal{W}} \int_{\mathbb{R}^{d} \setminus 2S} |T(h 1_{S})| \leq C \frac{\|f\|_{1}}{\lambda} + \frac{C}{\lambda} \sum_{S \in \mathcal{W}} \| h 1_{S}\|_{1} \leq C \frac{\|f\|_{1}+\|h\|_{1}}{\lambda} \leq C\frac{\|f\|_{1}}{\lambda} \end{aligned}

Thus for all \lambda >0 it remains to find a decomposition f=g+h with \|g\|_{2}^{2}\leq \lambda \|f\|_{1}, \mathrm{supp}(h) \subset \cup_{Q\in \mathcal{W}}Q, where h satisfies (A), (B), and (C). This is definitely a smarter decomposition (called the Calderon–Zygmund decomposition) compared to the trivial Marcinkiewicz decomposition.

Lemma 2 (Calderon–Zygmund decomposition). For any f \in L^{1}(\mathbb{R}^{d}) and any \lambda>0 there exists a decomposition f = g+ \sum_{Q \in \mathcal{W}} b_{Q} for some disjoint collection \mathcal{W} of cubes such that each b_{Q} is supported in Q and

\begin{aligned} \| g\|_{1} \leq \|f\|_{1}, \quad \|g\|_{\infty} \leq 2^{d} \lambda, \quad \int_{Q}|b_{Q}| \leq 2^{d+1}\lambda |Q|, \quad \int_{Q} b_{Q}=0, \end{aligned}

and most importantly \sum_{Q \in \mathcal{W}} |Q| \leq \frac{\|f\|_{1}}{\lambda}.

To complete the proof of the weak type (1,1) inequality for T, we take h = \sum_{Q \in \mathcal{W}} b_{Q} and keep g as in Lemma 2. Clearly

\begin{aligned} \int_{\mathbb{R}^{d}} |g|^{2} \leq \|g\|_{\infty} \int_{\mathbb{R}^{d}} |g| \leq 2^{d} \lambda \|f\|_{1},\end{aligned}

\begin{aligned} \|h\|_{1} \leq \sum_{Q \in \mathcal{W}} \int_{Q}|b_{Q}| \leq 2^{d+1}\lambda \sum_{Q \in \mathcal{W}}|Q| \leq 2^{d+1} \|f\|_{1},\end{aligned}

\begin{aligned}  \mathrm{supp}(h) \subset \cup_{Q \in \mathcal{W}} Q, \quad \text{and} \quad \int_{Q} h=0 \quad \text{for any} \quad Q \in \mathcal{W}. \end{aligned}

The proof of Lemma 2 mimics stopping time arguments, which are quite often used in probability. It is related to martingales and closely tied to what is called “dyadic calculus”.

Proof of Lemma 2.

Let us introduce a “dyadic grid”. Take [0,1], and divide it in two halves [0,\frac{1}{2}] and [\frac{1}{2},1], and continue this process. We will obtain what is called “dyadic subdivision” of the interval [0,1] which consists of

\begin{aligned}  \text{first generation} \quad [0,1);\end{aligned}

\begin{aligned}  \text{second generation} \quad \frac{1}{2}[0,1), \, \, \frac{1}{2}+\frac{1}{2}[0,1);\end{aligned}

\begin{aligned}  \text{third generation} \, \quad \frac{1}{4}[0,1), \; \frac{1}{4}+\frac{1}{4}[0,1), \; \frac{2}{4} + \frac{1}{4}[0,1), \; \frac{3}{4} + \frac{1}{4}[0,1);\end{aligned}

\begin{aligned}  \mathrm{etc}. \end{aligned}

Here \frac{1}{2^{k}}[0,1)=[0, \frac{1}{2^{k}}), and x+\mathrm{[interval]} means we shift the interval by x, for example, \frac{1}{2}+\frac{1}{2}[0,1)=[\frac{1}{2},1). Clearly after such dyadic subdivision any interval from a given generation looks like 2^{-k} \ell + 2^{-k}[0,1) for some k \geq 0 and \ell =0, \ldots, 2^{k}-1. Let us denote the collection of such dyadic intervals (from all generations) by \mathcal{D}([0,1)). The following property is crucial: for any two dyadic intervals A, B \in \mathcal{D}([0,1)), A\neq B, one and only one of the following holds

\begin{aligned}  a) \, A \subset B; \quad b)\, B\subset A; \quad c)\, A\cap B = \emptyset \quad (7) \end{aligned}
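Property (7) is easy to test by brute force on the first few generations. The following is a minimal sketch; the helper names `dyadic_intervals`, `relation` and `check_trichotomy` are hypothetical, and generation k = 0 here corresponds to the interval [0,1) itself.

```python
from fractions import Fraction
from itertools import combinations

def dyadic_intervals(max_gen):
    """All dyadic intervals [l/2^k, (l+1)/2^k) inside [0,1) for k = 0..max_gen."""
    out = []
    for k in range(max_gen + 1):
        for l in range(2 ** k):
            out.append((Fraction(l, 2 ** k), Fraction(l + 1, 2 ** k)))
    return out

def relation(A, B):
    """Classify two distinct half-open intervals as in (7).

    Returns 'A in B', 'B in A' or 'disjoint'; returns None when the
    intervals overlap only partially, which (7) forbids for dyadic ones.
    """
    (a0, a1), (b0, b1) = A, B
    if b0 <= a0 and a1 <= b1:
        return "A in B"
    if a0 <= b0 and b1 <= a1:
        return "B in A"
    if a1 <= b0 or b1 <= a0:
        return "disjoint"
    return None

def check_trichotomy(max_gen):
    """True if every pair of distinct dyadic intervals satisfies (7)."""
    return all(relation(A, B) is not None
               for A, B in combinations(dyadic_intervals(max_gen), 2))
```

For comparison, the non-dyadic pair [0, 1/2) and [1/4, 3/4) overlaps partially, so `relation` returns `None` for it.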

Similarly we want to construct the dyadic subdivision for \mathbb{R}^{d}. It will look like this

\begin{aligned} \{2^{k}\ell + 2^{k}[0,1)^{d}\, :\ell \in \mathbb{Z}^{d}, \, k \in \mathbb{Z}\}, \end{aligned}

and after a little bit of thought we can see that this family also satisfies the property (7). Let us denote it by \mathcal{D}.

Pick any \lambda >0. Let \mathcal{W} be the collection of maximal dyadic cubes in \mathcal{D} such that

\begin{aligned} \frac{1}{|Q|}\int_{Q} |f| > \lambda.\quad (8) \end{aligned}

Here maximal means that if Q satisfies (8) then there is no other dyadic cube (i.e., “parent”) containing Q and satisfying (8). Since \int_{\mathbb{R}^{d}}|f| <\infty it follows that as soon as the dyadic cubes are sufficiently large they fail to satisfy (8).

Clearly the collection \mathcal{W} is disjoint, and by (8)

\begin{aligned} \left| \cup_{Q \in \mathcal{W}} Q \right| = \sum_{Q \in \mathcal{W}} |Q| \leq \sum_{Q\in \mathcal{W}} \frac{1}{\lambda} \int_{Q}|f| \leq \frac{1}{\lambda} \int_{\mathbb{R}^{d}} |f|. \end{aligned}


Now define

\begin{aligned}  b_{Q} = \left( f - \frac{1}{|Q|}\int_{Q} f \right) 1_{Q} \quad \text{for any} \quad Q \in \mathcal{W};\end{aligned}

\begin{aligned}  g = f 1_{\mathbb{R}^{d}\setminus \cup_{Q \in \mathcal{W}} Q} + \sum_{Q \in \mathcal{W}} \left( \frac{1}{|Q|}\int_{Q}f\right) 1_{Q}. \end{aligned}

Then of course f = g + \sum_{Q \in \mathcal{W}} b_{Q}.

What was the point of being maximal? We can bound \frac{1}{|Q|}\int_{Q}|f| \leq 2^{d} \lambda for any Q \in \mathcal{W}. Indeed, since Q is maximal with property (8), its “parent” \tilde{Q} (i.e., the smallest cube in \mathcal{D} strictly containing Q) must fail (8). Since \frac{|\tilde{Q}|}{|Q|}=2^{d} we obtain

\begin{aligned} \frac{1}{|Q|}\int_{Q}|f| \leq 2^{d} \frac{1}{|\tilde{Q}|}\int_{\tilde{Q}}|f| \leq 2^{d} \lambda. \end{aligned}

Thus we obtain \|b_{Q}\|_{1} \leq 2 \int_{Q}|f| \leq 2^{d+1}\lambda |Q| for any Q \in \mathcal{W}; \|g\|_{1} \leq \|f\|_{1}; |g(x)| \leq 2^{d} \lambda if x \in \cup_{Q\in \mathcal{W}} Q, and |g(x)| \leq \lambda for almost every other x. Indeed, if x \notin \cup_{Q\in \mathcal{W}} Q then for any dyadic cube Q \ni x the inequality (8) fails, so we can take |Q| \to 0 and use the Lebesgue differentiation theorem to conclude that |f(x)|\leq \lambda for almost every x \in \mathbb{R}^{d}\setminus \cup_{Q \in \mathcal{W}}Q. This proves Lemma 2.
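The stopping-time selection of maximal cubes can be illustrated in dimension d=1 on a step function. The following is a minimal sketch and not part of the proof: it works on [0,1) instead of \mathbb{R}, so it assumes that \lambda is at least the average of |f| over [0,1) (this plays the role of "large cubes fail (8)"). The function name `cz_decompose` and the data layout are hypothetical.

```python
def cz_decompose(f, lam):
    """Calderon-Zygmund decomposition of a step function on [0,1).

    f   : list of 2**n values; f[i] is the value on [i/2**n, (i+1)/2**n).
    lam : threshold, assumed >= the average of |f| over [0,1).
    Returns (g, bad), where bad = [((lo, hi), b_Q), ...] and (lo, hi)
    is the index range [lo, hi) of a selected maximal dyadic interval.
    """
    N = len(f)
    if N == 0 or N & (N - 1):
        raise ValueError("len(f) must be a power of two")
    if sum(abs(x) for x in f) / N > lam:
        raise ValueError("lam must be at least the average of |f|")

    selected = []

    def descend(lo, hi):
        # Invariant: the average of |f| over [lo, hi) is <= lam.
        if hi - lo == 1:
            return
        mid = (lo + hi) // 2
        for a, b in ((lo, mid), (mid, hi)):
            if sum(abs(x) for x in f[a:b]) / (b - a) > lam:
                selected.append((a, b))   # maximal: every ancestor failed (8)
            else:
                descend(a, b)

    descend(0, N)

    g = list(f)
    bad = []
    for a, b in selected:
        mean = sum(f[a:b]) / (b - a)
        b_Q = [0.0] * N
        for i in range(a, b):
            b_Q[i] = f[i] - mean          # mean-zero piece supported on Q
            g[i] = mean                   # g equals the average of f on Q
        bad.append(((a, b), b_Q))
    return g, bad
```

For instance, with `f = [0, 0, 0, 8, 1, 1, 0, 0]` and `lam = 2` the only selected interval is the index range `(2, 4)`, i.e. [1/4, 1/2), and g equals 4 there.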


Theorem 3 (Mikhlin–Hormander multipliers). Let m :\mathbb{R}^{d} \to \mathbb{C} be a bounded function with m \in C^{[\frac{d}{2}]+1}(\mathbb{R}^{d} \setminus\{0\}) satisfying \begin{aligned}  |\xi|^{|\alpha|} |D^{\alpha} m(\xi)| \leq C_{\alpha}\end{aligned} for all \alpha = (\alpha_{1}, \ldots, \alpha_{d}) \in (\mathbb{N}\cup \{0\})^{d} with |\alpha| = \alpha_{1}+\ldots+\alpha_{d} \leq [\frac{d}{2}]+1. Then the operator

\begin{aligned} f \mapsto \mathcal{F}^{-1}(m \mathcal{F}(f))\end{aligned}

defined for Schwartz functions extends to a bounded operator L^{p}(\mathbb{R}^{d}) \to L^{p}(\mathbb{R}^{d}) for all 1<p<\infty.

Hint: Boundedness in L^{2} is trivial. Next, ideally we want to think that \mathcal{F}^{-1}(m \mathcal{F}(f)) = (\mathcal{F}^{-1}m)*f, so that we can use the Calderon–Zygmund theorem for the kernel K(x,y) =\mathcal{F}^{-1}m(x-y) after verifying (i) and (ii). The issue is that \mathcal{F}^{-1}m may not make sense as a function (think about m(\xi)  =  \mathrm{sign}(\xi) in dimension d=1). Therefore, one invokes the Littlewood–Paley decomposition: pick a smooth nonnegative \varphi such that \varphi(\xi) =1 for |\xi|\leq 1, and \varphi(\xi)=0 for |\xi|\geq 2. Then 1 = \sum_{j \in \mathbb{Z}} \psi_{j}(\xi) for \xi \neq 0, where \psi_{j}(\xi) = \varphi(2^{-j}\xi)-\varphi(2^{-j+1}\xi). Then m = \sum_{j \in \mathbb{Z}} m_{j} where m_{j} = m\psi_{j}. If f, g are bounded and have disjoint compact supports then one can show that

\begin{aligned}\int \mathcal{F}^{-1}(m \mathcal{F}(f)) g = \lim_{N \to \infty}\int (\sum_{j=-N}^{N} \mathcal{F}^{-1}m_{j})*f\, \,  g \end{aligned}

Thus it suffices to obtain uniform bounds (uniform in N) \left| \int \int   K_{N} (x,y) f(y) g(x) \right| \leq C_{p} \|f\|_{p} \|g\|_{p'} where 1/p+ 1/p'=1, and K_{N}(x,y) = \sum_{j=-N}^{N} (\mathcal{F}^{-1}m_{j})(x-y). The last inequality can be achieved by checking that K_{N}(x,y) satisfies the assumptions of the Calderon–Zygmund theorem (with constants uniform in N). It is easier to check the Calderon–Zygmund assumptions (just by doing a very rough integration by parts) under the extra condition that |\xi|^{|\alpha|}|D^{\alpha}m(\xi)|<C_{\alpha} for all \alpha (not just |\alpha|\leq [d/2]+1). The general case requires slightly more careful estimates.

Theorem 4 (Riesz transform). The operator R_{j} with multiplier m(\xi) = i \frac{\xi_{j}}{\sqrt{\xi_{1}^{2}+...+\xi_{d}^{2}}} is called the Riesz transform, and it is denoted by R_{j} = \frac{\partial}{\partial x_{j}}(-\Delta)^{-1/2}. Then

\begin{aligned}\| R_{j} f\|_{p}\leq C_{p,d} \|f\|_{p} \quad 1<p<\infty \end{aligned}

holds for all f \in \mathcal{S}(\mathbb{R}^{d}) and all j=1, \ldots, d.

In particular

\begin{aligned} \| \sqrt{(R_{1}f)^{2}+...+(R_{d}f)^{2}}\|_{p} \leq C_{p,d}\|f\|_{p}\end{aligned}

Hint: Notice that m(\xi) = i \frac{\xi_{j}}{\sqrt{\xi_{1}^{2}+...+\xi_{d}^{2}}} satisfies the assumptions in the Mikhlin–Hormander theorem.

One can verify the identity R_{i}R_{j}(-\Delta) =\frac{\partial^{2}}{\partial x_{i} \partial x_{j}}. In particular, double application of the previous theorem gives

Theorem 5 (second derivatives and the Laplacian). For any 1<p<\infty, and any f \in \mathcal{S}(\mathbb{R}^{d}) we have

\begin{aligned} \| \frac{\partial^{2}}{\partial x_{i} \partial x_{j}} f\|_{p} \leq C_{p,d}\|\Delta f\|_{p} \quad 1<p<\infty\end{aligned}
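The identity R_{i}R_{j}(-\Delta) =\frac{\partial^{2}}{\partial x_{i} \partial x_{j}} can be checked on the Fourier side. The following is a sketch under the assumption that the Fourier transform is normalized so that \frac{\partial}{\partial x_{j}} corresponds to multiplication by i\xi_{j} (hence -\Delta corresponds to |\xi|^{2}); this normalization is consistent with the multiplier of R_{j} in Theorem 4, but it is not fixed explicitly in these notes:

\begin{aligned} \mathcal{F}\left(R_{i}R_{j}(-\Delta) f\right)(\xi) = \left(i \frac{\xi_{i}}{|\xi|}\right)\left(i \frac{\xi_{j}}{|\xi|}\right) |\xi|^{2}\, \mathcal{F}f(\xi) = (i\xi_{i})(i\xi_{j})\, \mathcal{F}f(\xi) = \mathcal{F}\left(\frac{\partial^{2} f}{\partial x_{i} \partial x_{j}}\right)(\xi). \end{aligned}

Applying the bound of Theorem 4 twice (after extending R_{i} to L^{p} by density) then gives

\begin{aligned} \left\| \frac{\partial^{2} f}{\partial x_{i} \partial x_{j}} \right\|_{p} = \| R_{i}R_{j}(\Delta f)\|_{p} \leq C_{p,d} \| R_{j}(\Delta f)\|_{p} \leq C_{p,d}^{2} \|\Delta f\|_{p}. \end{aligned}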

Theorem 6 (Littlewood–Paley: martingale version).
For any sequence of functions f_{1}(\xi_{1}),f_{2}(\xi_{1}, \xi_{2}) \ldots, f_{N}(\xi_{1}, \ldots, \xi_{N}) on \mathbb{T}^{N} with \int_{\mathbb{T}} f_{\ell}(\xi_{1}, \ldots, \xi_{\ell}) dm(\xi_{\ell})=0 for all \ell =1, \ldots, N, we have

\begin{aligned} c_{p} \| f_{1}+...+f_{N}\|_{p} \leq \| (f_{1}^{2}+....+f_{N}^{2})^{1/2}\|_{L^{p}(\mathbb{T}^{N})} \leq C_{p} \| f_{1}+...+f_{N}\|_{p} \quad 1<p<\infty\end{aligned}

holds true for all N\geq 1 where c_{p}, C_{p} are some universal constants.

Hint: It follows from Proposition 1 that

\begin{aligned} \frac{1}{C_{p}^{2}}\| f_{1}+...+f_{N}\|_{p}\leq \|\varepsilon_{1} f_{1}+...+\varepsilon_{N}f_{N}\|_{p}\leq C_{p}^{2} \|f_{1}+...+f_{N}\|_{p} \end{aligned}

for all choices of signs \varepsilon_{j} = \pm 1. Now, take power p of the inequality, average over all signs \varepsilon_{j} = \pm 1, and use Khinchin’s inequality

\begin{aligned}\mathrm{average}_{\varepsilon} \left|\sum_{j=1}^{N} \varepsilon_{j} x_{j} \right|^{p} \asymp \left(\sum_{j=1}^{N}|x_{j}|^{2}\right)^{p/2} \end{aligned}

which holds for all complex numbers x_{j}; this gives the desired result (the proof of Khinchin’s inequality is given below).

Let \sum_{j \in \mathbb{Z}} \psi_{j}=1 be the Littlewood–Paley decomposition as before, i.e., \psi_{j}(\xi) = \varphi(2^{-j}\xi)-\varphi(2^{-j+1}\xi), and for f \in \mathcal{S}(\mathbb{R}^{d}) let P_{j}f = \mathcal{F}^{-1}(\psi_{j} \mathcal{F}f) be “projections”. Then

Theorem 7 (Littlewood–Paley: smooth Euclidean version) For any f \in \mathcal{S}(\mathbb{R}^{d}) we have

\begin{aligned}c_{p}\|f\|_{p}\leq  \| \left( \sum_{j \in \mathbb{Z}} (P_{j} f)^{2}\right)^{1/2}\|_{L^{p}(\mathbb{R}^{d})} \leq C_{p} \|f\|_{p} \quad 1<p<\infty \end{aligned}

Hint: the operator \sum_{j=-N}^{N} \varepsilon_{j} P_{j} has the symbol \sum_{j=-N}^{N} \varepsilon_{j} \psi_{j} which satisfies assumptions in Hormander–Mikhlin’s theorem (with constants independent of \varepsilon and N). Thus

\begin{aligned} \left\| \sum_{j=-N}^{N} \varepsilon_{j} P_{j} f  \right\|_{L^{p}(\mathbb{R}^{d})} \leq C_{p}\|f\|_{L^{p}(\mathbb{R}^{d})}\end{aligned}

Now, take the power p of both sides, average over all signs \varepsilon_{j} and use Khinchin’s inequality, and finally take the limit N \to \infty.

For the reverse inequality we proceed by duality. It follows from the upper bound and Holder that \int (\sum_{j} (P_{j} f)^{2} )^{1/2}g \leq C_{p}\|f\|_{p} \|g\|_{q} where 1/p+1/q=1 and g \geq 0. Choose g = (\sum_{j} g_{j}^{2})^{1/2} where g_{j} is an arbitrary collection of nice functions. Then by Cauchy–Schwarz we get |\int f \sum_{j} P_{j}g_{j}|=|\int \sum_{j} (P_{j}f)g_{j}| \leq \int (\sum_{j} (P_{j}f)^{2})^{1/2} (\sum_{j} g_{j}^{2})^{1/2} \leq C_{p}\|f\|_{p} \|(\sum_{j} g_{j}^{2})^{1/2}\|_{q}. Optimizing over all f with \|f\|_{p} \leq 1 we obtain

\begin{aligned} \| \sum_{j} P_{j} g_{j}\|_{q} \leq C \|(\sum_{j} g^{2}_{j})^{1/2} \|_{q} \quad (LP)\end{aligned}.

Now it seems like we just need to choose g_{j} = P_{j}f to get what we want, but the problem is that \sum_{j} P_{j} P_{j} f \neq f. To fix the issue we first obtain (LP) with \tilde{P}_{j} corresponding to multipliers \tilde{\psi}_{j} which are equal to 1 on the support of \psi_{j}, say \tilde{\psi}_{j}(\xi) = \varphi(\xi 2^{-j-2}) -\varphi(\xi 2^{-j+3}). Then it is an easy exercise to show that \sum_{j} \tilde{\psi}_{j}(\xi) \psi_{j}(\xi)=1, which leads to the desired lower bound \| f \|_{q} \leq C_{q} \|(\sum_{j} (P_{j}f)^{2} )^{1/2} \|_{q}.

Let \varepsilon_{1}, \ldots, \varepsilon_{n} be independent identically distributed symmetric Bernoulli random variables taking values +1 or -1. Let \mathbb{E} be the average taken with respect to these random variables, i.e., for any f :\{-1,1\}^{n} \to \mathbb{C} let

\begin{aligned}\mathbb{E} f(\varepsilon_{1}, \ldots, \varepsilon_{n}) = \frac{1}{2^{n}} \sum_{x = (x_{1}, \ldots, x_{n}) \in \{-1,1\}^{n}} f(x_{1}, \ldots, x_{n}) \end{aligned}

Theorem 8 (Hypercontractivity on \{-1,1\}^{n}). Let 1\leq p <q, and r \in \mathbb{R}. For any complex numbers a_{S}, S \subset \{1, \ldots, n\} and any n \geq 1 we have

\begin{aligned} \left( \mathbb{E} \left|\sum_{S \subset \{1, \ldots, n\}} r^{|S|}a_{S} \prod_{j \in S} \varepsilon_{j} \right|^{q}\right)^{1/q} \leq \left(\mathbb{E} \left|\sum_{S \subset \{1, \ldots, n\}} a_{S} \prod_{j \in S}\varepsilon_{j}  \right|^{p}\right)^{1/p}\end{aligned}

if and only if |r| \leq \sqrt{\frac{p-1}{q-1}}. Here |S| denotes the cardinality of the set S.

Remark: we will see from the proof that inequality holds also when a_{S} are from an arbitrary normed space (X, \| \cdot \|) instead of \mathbb{C}. In that case the absolute values \left|\sum_{S \subset \{1, \ldots, n\}} a_{S} \prod_{j \in S}\varepsilon_{j} \right| should be replaced by \left\|\sum_{S \subset \{1, \ldots, n\}} a_{S} \prod_{j \in S}\varepsilon_{j} \right\|

Before we prove the theorem let us mention its corollary.

Corollary 9 (Khinchin’s inequality). For any 0<p<\infty we have

\begin{aligned} \mathbb{E} \left| \sum_{j=1}^{n} \varepsilon_{j} x_{j} \right|^{p} \asymp_{p} \left( \sum_{j=1}^{n} |x_{j}|^{2}\right)^{p/2} \end{aligned}

where \asymp_{p} means that it is comparable up to some universal constants depending only on p and independent of n and x_{j}.

Hint: apply hypercontractivity with a_{S}=0 whenever |S|\neq 1, and a_{\{j\}} = x_{j}. Choosing p=2 and r = \frac{1}{\sqrt{q-1}} we obtain that

\begin{aligned} \left( \mathbb{E} \left| \sum_{j=1}^{n} x_{j} \varepsilon_{j} \right|^{q}\right)^{1/q} \leq \sqrt{q-1} \left( \mathbb{E} \left| \sum_{j=1}^{n} x_{j} \varepsilon_{j} \right|^{2}\right)^{1/2} = \sqrt{q-1} \sqrt{\sum_{j=1}^{n}|x_{j}|^{2}} \end{aligned}

holds true for all q\geq 2. Now play with the Holder inequality to get Khinchin’s inequality for all powers. For example, suppose you know the estimate \| f\|_{4} \leq C \|f\|_{2}. Then using Holder you can get \|f\|_{2}\leq C^{2} \|f\|_{1}. Indeed, by Holder we have \|f\|_{2}\leq \|f\|_{1}^{\theta} \|f\|_{4}^{1-\theta} \leq \|f\|_{1}^{\theta}C^{1-\theta} \|f\|_{2}^{1-\theta}, and hence \|f\|_{2} \leq C^{\frac{1-\theta}{\theta}} \|f\|_{1}, where \theta solves \frac{1}{2}=\frac{\theta}{1} + \frac{1-\theta}{4}, so \theta =\frac{1}{3}. Show that similarly one can go to p\in (0,1) as well.
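For small n both sides of Khinchin's inequality can be computed exactly by enumerating all 2^{n} sign patterns. The following is a minimal sketch (the function names `rademacher_moment` and `khinchin_ratio` are hypothetical) that can be used to sanity-check the constant \sqrt{q-1} for q \geq 2, and the lower bound \|f\|_{1} \geq \frac{1}{3}\|f\|_{2} obtained from the Holder trick with C=\sqrt{3}.

```python
from itertools import product

def rademacher_moment(x, q):
    """Exact value of E|sum_j eps_j x_j|^q over all 2^n sign patterns."""
    n = len(x)
    total = 0.0
    for eps in product((-1, 1), repeat=n):
        total += abs(sum(e * xj for e, xj in zip(eps, x))) ** q
    return total / 2 ** n

def khinchin_ratio(x, q):
    """(E|sum_j eps_j x_j|^q)^(1/q) divided by (sum_j |x_j|^2)^(1/2)."""
    l2 = sum(abs(xj) ** 2 for xj in x) ** 0.5
    return rademacher_moment(x, q) ** (1.0 / q) / l2
```

For x = (1,1,1,1) and q=4 the moment equals 40, so the ratio is 40^{1/4}/2, which is below \sqrt{3}. The coefficients x_{j} may also be complex, since only `abs` is applied to the sums.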

The proof of hypercontractivity.

Let us first show the case n=1, i.e., let g(\varepsilon_{1})=a+b\varepsilon_{1} with a,b \in \mathbb{C} (the proof proceeds absolutely verbatim if \mathbb{C} is replaced by an arbitrary normed space (X, \|\cdot \|), then in the proof we just replace the absolute values |\cdot | by the norms \| \cdot \|, the only property we need is triangle inequality! ) then

\begin{aligned} \left( \mathbb{E} |a+rb\varepsilon_{1}|^{q} \right)^{1/q} \leq \left( \mathbb{E} |a+b\varepsilon_{1}|^{p} \right)^{1/p}\end{aligned}

Notice that a+rb\varepsilon_{1} = \mathbb{E}_{\delta_{1}} g(\delta_{1}) (1+r\varepsilon_{1} \delta_{1}) where \delta_{1} is independent copy of \varepsilon_{1} (i.e., it is another \pm 1 symmetric Bernoulli random variable), here \mathbb{E}_{\delta_{1}} takes average only with respect to \delta_{1}. Therefore by the triangle inequality (and the fact that |r|\leq 1, r \in \mathbb{R})

\begin{aligned} |a+rb\varepsilon_{1}| = \left| \mathbb{E}_{\delta_{1}}g(\delta_{1}) (1+r\varepsilon_{1} \delta_{1}) \right|  \leq \mathbb{E}_{\delta_{1}}|g(\delta_{1})| (1+r\varepsilon_{1} \delta_{1}). \end{aligned}

If we let |g(1)|=u and |g(-1)|=v then

\begin{aligned}\mathbb{E}_{\delta_{1}}|g(\delta_{1})| (1+r\varepsilon_{1} \delta_{1}) = \frac{u+v}{2}+\frac{u-v}{2}r\varepsilon_{1} \end{aligned}

\begin{aligned}\left( \mathbb{E} |a+b\varepsilon_{1}|^{p} \right)^{1/p} =\left( \mathbb{E} |g(\varepsilon_{1})|^{p} \right)^{1/p} = \left( \frac{u^{p}+v^{p}}{2}\right)^{1/p} \end{aligned}


\begin{aligned} \left( \mathbb{E}_{\varepsilon_{1}} |a+rb\varepsilon_{1}|^{q}\right)^{1/q} \leq \left( \mathbb{E}_{\varepsilon_{1}} \left(\frac{u+v}{2}+\frac{u-v}{2}r\varepsilon_{1}\right)^{q}\right)^{1/q}  = \left( \frac{\left(\frac{u+v}{2}+\frac{u-v}{2}r\right)^{q} +\left(\frac{u+v}{2}-\frac{u-v}{2}r\right)^{q}}{2}\right)^{1/q}\end{aligned}

Denoting A = \frac{u+v}{2} \geq 0, and B =\frac{u-v}{2}, |B|\leq A we see that to verify \left( \mathbb{E} |a+rb\varepsilon_{1}|^{q} \right)^{1/q} \leq \left( \mathbb{E} |a+b\varepsilon_{1}|^{p} \right)^{1/p} it suffices to show that

\begin{aligned}\left( \frac{\left(A+Br\right)^{q} +\left(A-Br\right)^{q}}{2}\right)^{1/q} \leq  \left( \frac{\left(A+B\right)^{p} +\left(A-B\right)^{p}}{2}\right)^{1/p}\end{aligned}

Perfect! Thus we reduced the question to real numbers! So how to prove the last inequality? There are several ways to proceed from here a) one uses Taylor’s formula for (1+x)^{\alpha} for |x|\leq 1; b) the second one goes through what is called “log-Sobolev inequality”. We prefer the second one because it is a general approach and works for other similar questions.

Using homogeneity we can assume A=1 (otherwise we can divide both sides by A). The map r \mapsto \left(1+Br\right)^{q} +\left(1-Br\right)^{q} is even and convex, thus increasing on [0,1], therefore it suffices to consider the “worst case scenario” when r = \sqrt{\frac{p-1}{q-1}}. Replacing B by \frac{B}{\sqrt{p-1}} it suffices to show the inequality

\begin{aligned}\left( \frac{\left(1+\frac{B}{\sqrt{q-1}}\right)^{q} +\left(1-\frac{B}{\sqrt{q-1}}\right)^{q}}{2}\right)^{1/q} \leq \left( \frac{\left(1+\frac{B}{\sqrt{p-1}}\right)^{p} +\left(1-\frac{B}{\sqrt{p-1}}\right)^{p}}{2}\right)^{1/p}\end{aligned}

Clearly this is the same as proving that the map

\begin{aligned} p \mapsto \left(\mathbb{E}\left( 1+\varepsilon_{1} \frac{B}{\sqrt{p-1}}\right)^{p}\right)^{1/p} = \left( \frac{\left(1+\frac{B}{\sqrt{p-1}}\right)^{p} +\left(1-\frac{B}{\sqrt{p-1}}\right)^{p}}{2}\right)^{1/p} \end{aligned}

is decreasing for p > 1. Due to symmetry we can assume B\geq 0. Taking the logarithm and differentiating we obtain (let us denote B_{p} = \frac{B}{\sqrt{p-1}})

\begin{aligned}-\frac{1}{p^{2}} \log \mathbb{E} (1+\varepsilon_{1} B_{p})^{p} + \frac{1}{p} \frac{\mathbb{E} (1+\varepsilon_{1} B_{p})^{p} \left[ \log (1+\varepsilon_{1} B_{p}) - \frac{p \varepsilon_{1} B_{p}}{2(p-1)(1+\varepsilon_{1} B_{p})}\right]}{\mathbb{E} (1+\varepsilon_{1} B_{p})^{p}} \stackrel{?}{\leq} 0 \end{aligned}

Let A :=B_{p}. Then the last inequality can be simply rewritten as follows

\begin{aligned}\mathbb{E} (1+\varepsilon_{1} A)^{p} \log (1+\varepsilon_{1}A)^{p} - \mathbb{E} (1+\varepsilon_{1} A)^{p} \log \mathbb{E} (1+\varepsilon_{1} A)^{p} \stackrel{?}{\leq} \frac{p^{2}A}{2(p-1)} \mathbb{E} (1+\varepsilon_{1}A)^{p-1} \varepsilon_{1} \end{aligned}

Great! But still seems to be a nontrivial inequality (especially taking into account that we want to show it for all p >1). Maybe(!) it suffices only to show the inequality for p=2? For example, suppose we can prove the following inequality (replace 1 by u>0, A by v>0 and p=2),

\begin{aligned}\mathbb{E} (u+\varepsilon_{1} v)^{2} \log (u+\varepsilon_{1}v)^{2} - \mathbb{E} (u+\varepsilon_{1} v)^{2} \log \mathbb{E} (u+\varepsilon_{1} v)^{2} \stackrel{?}{\leq} 2v \mathbb{E} (u+\varepsilon_{1}v) \varepsilon_{1} \end{aligned}

for all u\geq v >0. Then it turns out that this simple inequality implies the previous one. Indeed, suppose we can prove the last inequality (the case p=2); let us see why it implies the general inequality. Choose u \geq v \geq 0 so that u+\varepsilon_{1}v = (1+A \varepsilon_{1})^{p/2} for \varepsilon_{1} = \pm 1, i.e., u = \frac{(1+A)^{p/2}+(1-A)^{p/2}}{2} and v = \frac{(1+A)^{p/2}-(1-A)^{p/2}}{2}. Then the last inequality implies

\begin{aligned}\mathbb{E} (1+\varepsilon_{1} A)^{p} \log (1+\varepsilon_{1}A)^{p} - \mathbb{E} (1+\varepsilon_{1} A)^{p} \log \mathbb{E} (1+\varepsilon_{1} A)^{p} \leq 2v \mathbb{E} (u+\varepsilon_{1}v) \varepsilon_{1} = 2v^{2}. \end{aligned}

If we can show that 2v^{2}\leq \frac{p^{2}A}{2(p-1)} \mathbb{E} (1+\varepsilon_{1}A)^{p-1} \varepsilon_{1} then this will prove the general inequality. Since u+\varepsilon_{1}v = (1+A \varepsilon_{1})^{p/2} it follows that

\begin{aligned} 2v^{2} = 2\left[\frac{(1+A)^{p/2}-(1-A)^{p/2}}{2}\right]^{2}  =  \frac{p^{2}}{8} \left(\int_{-A}^{A} (1+x)^{\frac{p}{2}-1}dx \right)^{2} \leq \frac{p^{2}}{8} \left(\int_{-A}^{A}(1+x)^{p-2}dx \right)\left(\int_{-A}^{A}dx \right)= \end{aligned}

\begin{aligned}\frac{p^{2}A}{4(p-1)}\left(  (1+A)^{p-1}-(1-A)^{p-1} \right) = \frac{p^{2}A}{2(p-1)} \mathbb{E} (1+\varepsilon_{1}A)^{p-1} \varepsilon_{1}\end{aligned}

So it suffices to prove only the inequality \begin{aligned}\mathbb{E} (u+\varepsilon_{1} v)^{2} \log (u+\varepsilon_{1}v)^{2} - \mathbb{E} (u+\varepsilon_{1} v)^{2} \log \mathbb{E} (u+\varepsilon_{1} v)^{2} \stackrel{?}{\leq} 2v \mathbb{E} (u+\varepsilon_{1}v) \varepsilon_{1}  = 2v^{2}\end{aligned}

Of course (replacing v by v/u) the inequality reduces to the case u=1, i.e.,

\begin{aligned}\mathbb{E} (1+\varepsilon_{1} v)^{2} \log (1+\varepsilon_{1}v)^{2} - \mathbb{E} (1+\varepsilon_{1} v)^{2} \log \mathbb{E} (1+\varepsilon_{1} v)^{2} \stackrel{?}{\leq} 2v^{2}\end{aligned}

Which simplifies to

\begin{aligned}\frac{(1+ v)^{2} \log (1+v)^{2}+(1-v)^{2} \log (1-v)^{2}}{2} -  (1+ v^{2}) \log  (1+v^{2}) \stackrel{?}{\leq} 2v^{2}\end{aligned}

Now if we let

\begin{aligned} f(v):=2v^{2}+(1+ v^{2}) \log (1+v^{2}) - \frac{(1+ v)^{2} \log (1+v)^{2}+(1-v)^{2} \log (1-v)^{2}}{2} \end{aligned}

Then it follows from direct calculation that f(0)=f'(0)=f''(0)=f'''(0)=0, and
f'''(v) = \frac{16v}{(1-v^{2})(1+v^{2})^{2}}\geq 0 on [0,1). Integrating three times, this implies f'' \geq 0, then f' \geq 0, and finally f \geq 0 on [0,1), which is exactly the inequality we wanted. This completes the proof of

\begin{aligned} \left( \mathbb{E} |a+rb\varepsilon_{1}|^{q} \right)^{1/q} \leq \left( \mathbb{E} |a+b\varepsilon_{1}|^{p} \right)^{1/p}\end{aligned}.
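For the reader who wants to see the direct calculation behind the formula for f''', here is one way to organize it for 0 \leq v < 1 (using \log (1\pm v)^{2} = 2\log(1\pm v)):

\begin{aligned} f'(v) = 4v + 2v \log(1+v^{2}) - 2(1+v)\log(1+v) + 2(1-v)\log(1-v), \end{aligned}

\begin{aligned} f''(v) = 2\log(1+v^{2}) + \frac{4v^{2}}{1+v^{2}} - 2\log(1-v^{2}), \end{aligned}

\begin{aligned} f'''(v) = \frac{4v}{1+v^{2}} + \frac{8v}{(1+v^{2})^{2}} + \frac{4v}{1-v^{2}} = \frac{4v\left[(1-v^{4}) + 2(1-v^{2}) + (1+v^{2})^{2}\right]}{(1-v^{2})(1+v^{2})^{2}} = \frac{16v}{(1-v^{2})(1+v^{2})^{2}}. \end{aligned}

In particular all three derivatives vanish at v=0, and the bracket in the last line simplifies to the constant 4.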

To prove the general case let

\begin{aligned} f(\varepsilon_{1}, \ldots, \varepsilon_{n}) = \sum_{S \subset \{1, \ldots, n\}}a_{S} \prod_{j \in S} \varepsilon_{j} \end{aligned}

and notice that \varepsilon_{1} \mapsto f(\varepsilon_{1}, \ldots, \varepsilon_{n}) is a map of the type a+\varepsilon_{1} b where a, b do not depend on \varepsilon_{1}. The goal is to show that \| f(r\varepsilon_{1}, \ldots, r\varepsilon_{n})\|^{p}_{q} \leq \| f(\varepsilon_{1}, \ldots, \varepsilon_{n})\|^{p}_{p}. Indeed, we have

\begin{aligned} (\mathbb{E}|f(r\varepsilon_{1}, \ldots, r\varepsilon_{n})|^{q})^{p/q}   = (\mathbb{E}_{\varepsilon_{2},...,\varepsilon_{n}} \mathbb{E}_{\varepsilon_{1}}|f(r\varepsilon_{1}, \ldots, r\varepsilon_{n})|^{q})^{p/q} \leq (\mathbb{E}_{\varepsilon_{2}, \ldots, \varepsilon_{n}} (\mathbb{E}_{\varepsilon_{1}}|f(\varepsilon_{1},r\varepsilon_{2} \ldots, r\varepsilon_{n})|^{p})^{q/p})^{p/q}\end{aligned}

\begin{aligned}\stackrel{\mathrm{Minkowski}}{\leq}  \mathbb{E}_{\varepsilon_{1}}(\mathbb{E}_{\varepsilon_{2}, \ldots, \varepsilon_{n}} |f(\varepsilon_{1},r\varepsilon_{2} \ldots, r\varepsilon_{n})|^{q})^{p/q}\end{aligned}

where in the last inequality we used Minkowski’s inequality (recall that q/p \geq 1). Thus we got rid of one r. Iterating this process we get rid of all r’s and finally we end up with \mathbb{E} |f(\varepsilon_{1}, \ldots, \varepsilon_{n})|^{p}. This finishes the proof of hypercontractivity.
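The statement of Theorem 8 can be tested numerically for small n by enumerating the cube \{-1,1\}^{n}. The following is a minimal sketch, not a proof: the names `walsh_eval`, `lp_norm` and `random_coeffs` are hypothetical, coefficients are stored in a dictionary indexed by subsets S of \{0, \ldots, n-1\}, and the random coefficients are just one arbitrary choice of test data.

```python
import math
import random
from itertools import product

def walsh_eval(coeffs, eps, r=1.0):
    """Evaluate sum_S r^{|S|} a_S prod_{j in S} eps_j at the point eps.

    coeffs maps frozenset S (subset of range(n)) to the coefficient a_S.
    """
    total = 0
    for S, a in coeffs.items():
        term = a * r ** len(S)
        for j in S:
            term *= eps[j]
        total += term
    return total

def lp_norm(coeffs, n, p, r=1.0):
    """(E|sum_S r^{|S|} a_S prod eps_j|^p)^(1/p), averaged over {-1,1}^n."""
    vals = [abs(walsh_eval(coeffs, eps, r)) ** p
            for eps in product((-1, 1), repeat=n)]
    return (sum(vals) / 2 ** n) ** (1.0 / p)

def random_coeffs(n, rng):
    """Random complex coefficients a_S for every subset S of range(n)."""
    subsets = [frozenset(j for j in range(n) if (mask >> j) & 1)
               for mask in range(2 ** n)]
    return {S: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for S in subsets}
```

A typical check takes p=1.5, q=3, so that the critical value is r=\sqrt{(p-1)/(q-1)}=0.5, and compares `lp_norm(c, n, q, r)` with `lp_norm(c, n, p)`. To see that the threshold matters, take n=1, a=1, b=0.1, p=2, q=4: with r=0.9 the left hand side exceeds the right hand side.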

Concluding remarks

  1. For a certain class of martingales (also called “conditionally symmetric martingales”), say

    \begin{aligned}F = \varepsilon_{1}+g_{1}(\varepsilon_{1})\varepsilon_{2}+g_{2}(\varepsilon_{1}, \varepsilon_{2})\varepsilon_{3}+...+g_{N-1}(\varepsilon_{1}, ...,\varepsilon_{N-1})\varepsilon_{N} \end{aligned}

    where (\varepsilon_{1}, ..., \varepsilon_{N}) \in \{-1,1\}^{N}, and g_{1}, ..., g_{N-1} are arbitrary functions, the “extended” Littlewood–Paley inequality holds

    \begin{aligned} \frac{1}{C_{p}}\|\varepsilon_{1}+g_{1}(\varepsilon_{1})\varepsilon_{2}+g_{2}(\varepsilon_{1}, \varepsilon_{2})\varepsilon_{3}+...+g_{N-1}(\varepsilon_{1}, ...,\varepsilon_{N-1})\varepsilon_{N}\|_{p}  \leq \|(1+g^{2}_{1}(\varepsilon_{1})+g^{2}_{2}(\varepsilon_{1}, \varepsilon_{2})+...+g^{2}_{N-1}(\varepsilon_{1}, ...,\varepsilon_{N-1}))^{1/2}\|_{p}  \end{aligned}

    for all 0<p<\infty, here L^{p} norms are calculated with respect to the average over \varepsilon_{1}, ..., \varepsilon_{N}.
  2. The martingale Littlewood–Paley inequalities are closely related (in fact sometimes one implies the other, or both imply each other) to square function estimates for “stochastic processes”. Without giving definitions, the estimate (1/C_{p})\|T^{1/2}\|_{p}\leq \|B_{T}\|_{p} \leq C_{p}\|T^{1/2}\|_{p} holds (the right hand side for all 0<p<\infty, and the left hand side for all 1<p<\infty provided that \|T^{1/2}\|_{p} <\infty); here B_{t} is the standard Brownian motion, and T is any stopping time.
  3. There are several versions of Littlewood–Paley inequalities in Euclidean space. In analogy with “conditionally symmetric martingales” the following extension of the Littlewood–Paley inequality holds true on the unit circle: let \{J\}_{J \in D} be the dyadic decomposition of \mathbb{Z}, say J = \varepsilon [2^{j-1}, 2^{j}), \varepsilon = \pm 1 and j \in \mathbb{Z}, then for any finite sum f = \sum_{k \in \mathbb{Z}} c_{k} \xi^{k} the inequality

    \begin{aligned} \| f \|_{L^{p}(\mathbb{T})} \leq C_{p} \|(\sum_{J \in D} f_{J}^{2})^{1/2} \|_{L^{p}(\mathbb{T})} \quad 0<p<\infty \end{aligned}

    holds true, where f_{J} = \sum_{k \in J} c_{k} \xi^{k}. Moreover, when 0<p\leq 2 the inequality holds for any collection of disjoint intervals \{ J\}_{J \in D}: the case 1<p\leq 2 is due to Rubio de Francia, the case p=1 is due to Bourgain, and the case 0<p<1 is due to Kislyakov–Parilov (their result is written in H^{p} spaces but it is easy to get it for the case of (not analytic) finite sums as well).
  4. The hypercontractivity

    \begin{aligned} \left( \mathbb{E} \left|\sum_{S \subset \{1, \ldots, n\}} r^{|S|}a_{S} \prod_{j \in S} \varepsilon_{j} \right|^{q}\right)^{1/q} \leq \left(\mathbb{E} \left|\sum_{S \subset \{1, \ldots, n\}} a_{S} \prod_{j \in S}\varepsilon_{j} \right|^{p}\right)^{1/p}\end{aligned}

    was recently extended to complex r’s, namely if we let r=z be a complex number then the inequality holds if and only if

    \begin{aligned} |(p-2)-z^{2}(q-2)|\leq p-|z|^{2}q. \end{aligned}

    Here it is important that a_{S} are complex numbers (and not from an arbitrary Banach space).

Fourier multipliers: examples on the torus

Two problems on \mathbb{T}^{d}

In our previous lecture notes we proved the boundedness of the Riesz projection P_{+}, namely
\begin{aligned} \left\| \sum_{k \geq 0} c_{k} e^{ik \theta} \right\|_{p} \leq C_{p} \left\| \sum_{k \in \mathbb{Z}}  c_{k} e^{i k \theta} \right\|_{p} \quad 1<p<\infty\end{aligned}
and, as a corollary, the boundedness of the Hilbert transform
\begin{aligned} \left\| \sum_{k \in \mathbb{Z}}  \mathrm{sign}(k)c_{k} e^{ik \theta} \right\|_{p} \leq C_{p} \left\| \sum_{k \in \mathbb{Z}} c_{k} e^{i k \theta} \right\|_{p} \quad 1<p<\infty\end{aligned}
for all finite sums \sum c_{k} e^{i \theta k}, where c_{k} are complex numbers.
Arguably, the most important corollary of the boundedness of the Riesz projection is the inequality

\begin{aligned} \left\| \sum_{k \in [a,b]\cap \mathbb{Z}} c_{k} e^{ik \theta} \right\|_{p} \leq C_{p} \left\| \sum_{k \in \mathbb{Z}} c_{k} e^{i k \theta} \right\|_{p} \quad 1<p<\infty\end{aligned}

which holds for all intervals [a,b], with C_{p} independent of [a,b].
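One standard way to see why the constant does not depend on [a,b] is to write the projection onto integer frequencies a \leq k \leq b as a difference of two shifted Riesz projections, \xi^{a} P_{+}(\xi^{-a} g) - \xi^{b+1} P_{+}(\xi^{-(b+1)} g); multiplication by \xi^{s} does not change the L^{p} norm, so the constant is at most 2C_{p}. The following is a minimal sketch of this identity at the level of coefficients; the dictionary representation and the function names are hypothetical, and a, b are assumed to be integers.

```python
def shift(c, s):
    """Coefficients of xi^s * g, where g = sum_k c[k] xi^k."""
    return {k + s: v for k, v in c.items()}

def riesz_projection(c):
    """P_+ keeps the frequencies k >= 0."""
    return {k: v for k, v in c.items() if k >= 0}

def subtract(c1, c2):
    """Coefficients of g1 - g2, dropping zero coefficients."""
    out = dict(c1)
    for k, v in c2.items():
        out[k] = out.get(k, 0) - v
    return {k: v for k, v in out.items() if v != 0}

def interval_projection(c, a, b):
    """Projection onto frequencies a <= k <= b via two Riesz projections."""
    keep_from_a = shift(riesz_projection(shift(c, -a)), a)                # k >= a
    keep_from_b1 = shift(riesz_projection(shift(c, -(b + 1))), b + 1)     # k >= b + 1
    return subtract(keep_from_a, keep_from_b1)
```

For example, with c_{k} = k+10 for -5 \leq k \leq 5, calling `interval_projection(c, -2, 3)` returns exactly the coefficients with -2 \leq k \leq 3.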

We can make one step further and out of curiosity consider \mathbb{T}^{d} torus, where we consider finite sums

\begin{aligned} \sum_{k=(k_{1}, \ldots, k_{d}) \in \mathbb{Z}^{d}} c_{k} e^{i k_{1} \theta_{1}} \cdots e^{i k_{d} \theta_{d}}  \end{aligned}.

In what follows, instead of e^{i \theta_{j}} we will be writing \xi_{j}, j=1, \ldots, d, and more generally, for a multi index k = (k_{1}, \ldots, k_{d}) \in \mathbb{Z}^{d} we will denote \xi^{k}=\prod_{j=1}^{d} \xi_{j}^{k_{j}}, \xi = (\xi_{1}, \ldots, \xi_{d}), so that our trigonometric polynomials will take a compact form \sum_{k} c_{k} \xi^{k}. The following remark will be helpful!

Remark 1: It might be a good idea to think about \xi_{1}, \ldots, \xi_{d} as random variables, independent and uniformly distributed on \mathbb{T}. Moreover, instead of writing

\begin{aligned} \frac{1}{(2\pi)^{d}} \int_{0}^{2\pi}\ldots \int_{0}^{2\pi} g(\xi) d\theta_{1} \ldots d\theta_{d} \end{aligned}

we will be using the notation \mathbb{E} g(\xi). Similarly we will have

\begin{aligned} \| \sum_{k} c_{k} \xi^{k}\|^{p}_{p} = \mathbb{E} | \sum_{k} c_{k} \xi^{k}|^{p} \end{aligned}.

Also notice that if we let f = \sum c_{k} \xi^{k} then c_{k} = \mathbb{E} f \xi^{-k} for all k \in \mathbb{Z}^{d}, just because \mathbb{E} \xi^{k}=0 if and only if k \neq (0,0, \ldots, 0).
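The formula c_{k} = \mathbb{E} f \xi^{-k} can be tested numerically. The following is a minimal sketch in which \mathbb{E} is replaced by an average over all d-tuples of M-th roots of unity; this replacement is an assumption of the sketch, and it is exact only when all frequency differences involved are smaller than M in absolute value in each coordinate. The function names are hypothetical.

```python
import cmath
from itertools import product

def torus_points(d, M):
    """All points (xi_1, ..., xi_d) whose coordinates are M-th roots of unity."""
    roots = [cmath.exp(2j * cmath.pi * t / M) for t in range(M)]
    return product(roots, repeat=d)

def monomial(xi, k):
    """xi^k = prod_j xi_j^{k_j} for an integer multi-index k."""
    out = 1
    for x, kj in zip(xi, k):
        out *= x ** kj
    return out

def evaluate(coeffs, xi):
    """Value of sum_k c_k xi^k; coeffs maps multi-index tuples to c_k."""
    return sum(c * monomial(xi, k) for k, c in coeffs.items())

def expectation(func, d, M):
    """Average of func over the grid of M-th roots of unity in T^d."""
    pts = list(torus_points(d, M))
    return sum(func(xi) for xi in pts) / len(pts)

def fourier_coefficient(coeffs, k, d, M):
    """Approximates c_k = E[f xi^{-k}] for f = sum_k c_k xi^k."""
    minus_k = tuple(-kj for kj in k)
    return expectation(lambda xi: evaluate(coeffs, xi) * monomial(xi, minus_k), d, M)
```

With d=2, M=8 and f = (2+i)\xi_{1}\xi_{2}^{-2} + 3 - i\xi_{1}^{-1}\xi_{2}, the function `fourier_coefficient` recovers 2+i at k=(1,-2) and 0 at k=(2,2) up to rounding, and the same averaging gives \mathbb{E}|f|^{2} = 5+9+1 = 15.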

Let 1\leq p, q \leq \infty be arbitrary numbers (one may push curiosity further and consider 0 < p, q \leq \infty). Suppose we are given a collection of complex numbers \{ m(k)\}_{k \in \mathbb{Z}^{d}} (for historical reasons they are called Fourier multipliers). We are interested in those (p,q,d) for which the inequality

\begin{aligned} \| \sum_{k \in \mathbb{Z}^{d}} m(k)c_{k} \xi^{k} \|_{q} \leq C_{p,q,d} \| \sum_{k \in \mathbb{Z}^{d}} c_{k} \xi^{k} \|_{p} \quad (1)\end{aligned}

holds true. We can take one more step further and ask the question not for all finite sums \sum_{k} c_{k} \xi^{k}, but only for those finite sums whose frequencies live in a prescribed set E \subset \mathbb{Z}^{d}. Namely,

\begin{aligned} \| \sum_{k \in E} m(k)c_{k} \xi^{k} \|_{q} \leq C_{p,q,d, E} \| \sum_{k \in E} c_{k} \xi^{k} \|_{p} \end{aligned}

There are three important examples one needs to keep in mind: 0) E = \mathbb{Z}^{d}; 1) E = B(0,n)\cap \mathbb{Z}^{d}, i.e., the ball of radius n; 2) E = \mathbb{Z}^{d}\setminus B(0,n). Notice that if d=1 then 1) corresponds to trigonometric polynomials of degree at most n, i.e., g = \sum_{|k|\leq n} c_{k} \xi^{k}. In the second case 2) we have g = \sum_{|k|\geq n} c_{k} \xi^{k}, and we say that g lives on high frequencies.

Question 1. “What is the easiest way” to figure out for which 4-tuples (p,q,d, E), we have the inequality

\begin{aligned} \| \sum_{k \in E} m(k)c_{k} \xi^{k} \|_{q} \leq C_{p,q,d, E} \| \sum_{k \in E} c_{k} \xi^{k} \|_{p} \quad (2)\end{aligned}

for all complex numbers c_{k}? Also, how does the constant C_{p,q,d, E} depend on the parameters (p,q,d, E) (especially on the dimension d and the “size” of E)?

This is a very beautiful (of course, open!) question. There are series of interesting particular examples which are more or less understood and we will mention them shortly. Several remarks are in order.

Remark 2: For g = \sum_{k \in E} c_{k} \xi^{k} , and given complex numbers \{ m(k)\}_{k \in \mathbb{Z}^{d}} the linear operator

\begin{aligned} g \mapsto \sum_{k \in E} m(k)c_{k} \xi^{k} \end{aligned}

is conveniently denoted by T_{m} g, and the inequality (2) in Question 1 can be rewritten in a more compact form

\begin{aligned} \| T_{m} g\|_{q} \leq C_{p,q,d,E} \|g\|_{p}\quad \text{for all} \quad g = \sum_{E} c_{k} \xi^{k} \quad (3)\end{aligned}

If it happens that the kernel
\begin{aligned} K_{m}(\xi) \; "=" \;  \sum_{k\in E} m(k)\xi^{k} \end{aligned}
makes some reasonable sense, then we could think of T_{m}g(\xi) = \mathbb{E}_{\eta} K_{m}(\eta) g(\xi \overline{\eta}), as a convolution similarly as we did with Hilbert transform (here \mathbb{E}_{\eta} means we integrate with respect to variable \eta = (\eta_{1}, \ldots, \eta_{d}) =(e^{i \theta_{1}}, \ldots, e^{i \theta_{d}}), and \overline{\eta} denotes complex conjugate). This gives a slightly different look at the inequality (3) which will be helpful when using Calderon–Zygmund techniques.

Remark 3. Sometimes one considers more sophisticated questions where one allows m(k, \xi), i.e., our Fourier multipliers m to depend on \xi, namely

\begin{aligned} \| \sum_{k \in E} m(k, \xi)c_{k} \xi^{k} \|_{q} \leq C_{p,q,d, E} \| \sum_{k \in E} c_{k} \xi^{k} \|_{p} \end{aligned}

These inequalities arise when studying pseudodifferential operators. Such multipliers create issues with orthogonality, namely the identity
\begin{aligned} \mathbb{E} |\sum_{E} m(k)c_{k} \xi^{k}|^{2} = \sum_{E} |m(k) c_{k}|^{2} \end{aligned}
may not hold anymore if we allow m to depend on \xi, therefore, even L^{2} boundedness becomes problematic.

In these notes we will work only with m(k), i.e., there is no dependence on \xi.

Remark 4: The cases a) E = \mathbb{Z}^{d} and b) E \neq  \mathbb{Z}^{d} should (always!) be kept in mind separately. In case a) we ask inequality (2) to hold for all finite sums \sum c_{k} \xi^{k} without any restrictions on the coefficients c_{k}. This case is usually “easier”.

Case b) (i.e., when the finite sums \sum_{E} c_{k} \xi^{k} do have restrictions on their frequencies) is more delicate, and the corresponding inequalities are in general very hard to prove (it is especially hard to understand how C_{p,q,d,E} depends on the “size” of E). They are closely related to questions arising in Approximation Theory.

Sometimes one also separates the third case c) when E=(\mathbb{Z}_{+})^{d}=\{(k_{1}, \ldots, k_{d}), \, k_{j} \geq 0, j=1,..,d\}. This corresponds to analytic polynomials, and techniques from complex analysis will be helpful to better understand (2).

When looking at (2) it might be helpful to first think about the case p=q, and E = \mathbb{Z}^{d} (or even consider first p=q=2 to open parentheses and use orthogonality

\mathbb{E} |\sum c_{k} \xi^{k}|^{2} = \sum |c_{k}|^{2}).

I think the second important question (closely tied to the first one) is about Fourier coefficients. Given a finite sum g = \sum_{k \in E}c_{k} \xi^{k} it is reasonable to ask what information one can obtain on g if we know something about c_{k}. Since c_{k} = \hat{g}(k) it seems convenient to introduce the following norm
\begin{aligned} \| \hat{g} \|_{\ell^{r}} =\left( \sum_{k} |c_{k}|^{r}\right)^{1/r}  \end{aligned}. Sometimes we will just write \| \hat{g}\|_{r} instead of \| \hat{g} \|_{\ell^{r}}.

Question 2. “What is the easiest way” to figure out for which 4-tuples (p,q,d, E), we have the inequality

\begin{aligned} \left( \sum_{E} |c_{k}|^{q}\right)^{1/q} \leq C_{p,q,d, E} \|\sum_{E} c_{k} \xi^{k}\|_{p} \quad (4)\end{aligned};

\begin{aligned}   \|\sum_{E} c_{k} \xi^{k}\|_{q}\leq C'_{p,q,d, E}\left( \sum_{E} |c_{k}|^{p}\right)^{1/p}  \quad (4')\end{aligned}

Also, how do the constants C_{p,q,d, E}, C'_{p,q,d, E} depend on the parameters (p,q,d, E) (especially on the dimension d and the “size” of E)?

In the next section we give a series of examples (results) which shed some light on the two questions presented here.

Examples, Examples, and more Examples!

Example 1.(Hörmander–Mikhlin multiplier theorem) Let m  : \mathbb{R}^{d} \to \mathbb{C} be such that m \in C^{r}(\mathbb{R}^{d}\setminus\{0\}) (here r = \left[ \frac{d}{2}\right]+1), and |x|^{|\alpha|}|D^{\alpha} m(x)|<C for all multi-indices \alpha=(\alpha_{1}, \ldots, \alpha_{d}), \alpha_{j} \in \mathbb{Z}_{+}, with |\alpha|=\alpha_{1}+\ldots+\alpha_{d} \leq r, where D^{\alpha} m  = \frac{\partial^{|\alpha|}}{\partial^{\alpha_{1}}x_{1}\ldots \partial^{\alpha_{d}}x_{d}} m(x) and |x| = \sqrt{x_{1}^{2}+\ldots+x_{d}^{2}}. Then

\begin{aligned} \| \sum_{\mathbb{Z}^{d}} m(k) c_{k} \xi^{k}\|_{p} \leq C_{p,d} \| \sum_{\mathbb{Z}^{d}} c_{k} \xi^{k}\|_{p} \end{aligned}

holds for all 1<p<\infty, and all finite sums \sum c_{k} \xi^{k}.

Hint: see the hint to the first theorem in section of “Corollaries” here.

The choice d=1 and m(x) = -i \mathrm{sign}(x) gives the Hilbert transform. The number m(0) can be any finite complex number; there are no regularity assumptions at x=0. Also notice that the case \alpha=0 of the assumption gives \sup_{x\in \mathbb{R}^{d}}|m(x)|<C. The Fourier multiplier m is, in a sense, inherited from \mathbb{R}^{d} due to certain transference theorems between \mathbb{R}^{d} and \mathbb{T}^{d} which will be discussed later. The choice m(x) = |x|^{it}, t \in \mathbb{R}, corresponds to the Riesz potential, and the choice m(x) = \frac{x_{j}}{|x|} corresponds to the Riesz transform. A drawback of Mikhlin’s theorem is that it does not capture the right dependence of C_{p,d} on the dimension d.

Example 2 (Hypercontractivity).
1.(trigonometric case) For all 1 \leq p \leq q <\infty, and all real |r| \leq \sqrt{\frac{p-1}{q-1}} we have

\begin{aligned}\| \sum_{\mathbb{Z}^{d}} r^{|k|_{1}}c_{k} \xi^{k}\|_{q} \leq  \| \sum_{\mathbb{Z}^{d}} c_{k} \xi^{k}\|_{p}\end{aligned}

holds for all finite sums \sum c_{k} \xi^{k}, here |k|_{1}=|k_{1}|+...+|k_{d}|.

2.(analytic case) For all 0<p\leq q <\infty, and all complex |z|\leq \sqrt{\frac{p}{q}} we have

\begin{aligned}\| \sum_{(\mathbb{Z}_{+})^{d}} z^{|k|_{1}}c_{k} \xi^{k}\|_{q} \leq \| \sum_{(\mathbb{Z}_{+})^{d}} c_{k} \xi^{k}\|_{p}\end{aligned}

The beautiful part in hypercontractivity is that constants in the inequality do not depend on p,q and most importantly on the dimension d. The next example follows from hypercontractivity.

Example 3. (Nikolskij type inequalities). We have

\begin{aligned}\| \sum_{ |k|_{1}\leq N} c_{k} \xi^{k}\|_{q} \leq (\sqrt{q-1})^{N} \|\sum_{ |k|_{1}\leq N} c_{k} \xi^{k}\|_{2} \qquad 2\leq q  \quad  (5)\end{aligned}

\begin{aligned}\| \sum_{k \in (\mathbb{Z}_{+})^{d},\; |k|_{1}\leq N} c_{k} \xi^{k}\|_{q} \leq \left(\sqrt{\frac{q}{p}}\right)^{N} \|\sum_{k \in (\mathbb{Z}_{+})^{d}, \, |k|_{1}\leq N} c_{k} \xi^{k}\|_{p} \qquad 0<p \leq q  \quad (5')\end{aligned}

hold true where |k|_{1}:=|k_{1}|+\ldots+|k_{d}|. Next

\begin{aligned}\| \sum_{|k|_{\infty}\leq N} c_{k} \xi^{k}\|_{q} \leq 2^{d} N^{d(\frac{1}{p}-\frac{1}{q})} \|\sum_{|k|_{\infty}\leq N} c_{k} \xi^{k}\|_{p}  \quad 0< p \leq q  \qquad (6)\end{aligned}

\begin{aligned}\| \sum_{k \in E} c_{k} \xi^{k}\|_{q} \leq  |E|^{\frac{1}{p}-\frac{1}{q}} \|\sum_{k \in E} c_{k} \xi^{k}\|_{p} \quad 0< p \leq 2, \, p \leq q \qquad (6')\end{aligned}

hold true where |k|_{\infty} = \max_{j = 1, \ldots, d} |k_{j}|. Here |E| denotes number of points in E.

Hint to (6′): it suffices to consider q=\infty. Use \|\sum_{E} c_{k} \xi^{k}\|_{\infty}\leq |E|^{1/2}\|\sum_{E} c_{k} \xi^{k}\|_{2} (Cauchy–Schwarz and orthogonality), and the interpolation inequality \|f\|_{a}\leq \|f\|_{b}^{\theta} \|f\|_{c}^{1-\theta}, valid for a,b,c>0 and \theta \in [0,1] such that \frac{1}{a}=\frac{\theta}{b}+\frac{1-\theta}{c}.

Hint to (6): it suffices to consider d=1, p\geq 2. A trigonometric polynomial of degree N is reproduced by convolution with the Dirichlet kernel D_{N}. Use Young’s convolution inequality and the fact that \|D_{N}\|_{p} \asymp N^{1-\frac{1}{p}} for 1<p\leq \infty.

Hint to (5): use hypercontractivity.

Hint to (5′): use hypercontractivity for analytic functions and reduce to the case of N-homogeneous polynomials.

The advantage of (5) is that the constants do not depend on dimension d. However, if d is small then, of course, (6) is better as it depends only polynomially on N.
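To get a feeling for the endpoint estimate \|\sum_{E} c_{k} \xi^{k}\|_{\infty}\leq |E|^{1/2}\|\sum_{E} c_{k} \xi^{k}\|_{2} behind (6′), here is a rough numerical sketch for d=1 in plain Python. The helper names are hypothetical; the sup norm is only sampled on a finite grid of angles (so it is a lower bound for the true supremum), and the L^{2} norm is computed from the coefficients by orthogonality.

```python
import cmath
import math

def sup_norm_on_grid(c, M=2048):
    """Maximum of |g(e^{i theta})| over M equally spaced angles; c maps k to c_k."""
    best = 0.0
    for j in range(M):
        theta = 2 * math.pi * j / M
        value = sum(ck * cmath.exp(1j * k * theta) for k, ck in c.items())
        best = max(best, abs(value))
    return best

def l2_norm(c):
    """||g||_2 = (sum |c_k|^2)^{1/2}, by orthogonality of the characters xi^k."""
    return math.sqrt(sum(abs(ck) ** 2 for ck in c.values()))

def nikolskii_bound(c):
    """|E|^{1/2} ||g||_2, where E is the set of frequencies listed in c."""
    return math.sqrt(len(c)) * l2_norm(c)
```

Taking all c_{k}=1 on a set E shows that the bound is attained: the sup norm equals |E| (at \xi = 1) and \|g\|_{2} = |E|^{1/2}.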

The next example is due to Andrea Carbonaro and Oliver Dragičević.

Example 4.(Universal multipliers) Suppose m is a bounded holomorphic function in the cone \{z \in \mathbb{C}\, :\, |\mathrm{arg}(z)|<\varphi\}, where 0<\varphi \leq \pi/2. Then

\begin{aligned} \| \sum_{\mathbb{Z}^{d}} m(|k|^{2}_{2})c_{k} \xi^{k} \|_{p} \leq C(p, m, \varphi, r) \| \sum_{\mathbb{Z}^{d}} c_{k} \xi^{k} \|_{p} \quad \text{if } \quad \arcsin \left|1 - \frac{2}{p}\right|< \varphi \end{aligned},

Hint: the heat semigroup is contractive on \mathbb{T}^{d}, now use the theorem of Carbonaro–Dragičević.

It should be noted that for the choice m(z) = z^{it} we once again (compare it to Mikhlin’s multiplier result), obtain the boundedness of Riesz potential

\begin{aligned} \| \sum_{\mathbb{Z}^{d}}  |k|_{2}^{it}c_{k} \xi^{k} \|_{p} \leq C(p, t) \| \sum_{\mathbb{Z}^{d}} c_{k} \xi^{k} \|_{p} \quad \text{if } \quad 1<p<\infty \end{aligned}

and unlike Mikhlin’s theorem this bound does not depend on d!

Side note to experts: Let (X, d\mu) be an arbitrary \sigma-finite measure space, L^{2}(X) the space of square integrable functions, and let \{P_{t}\}_{t \geq 0} be a strongly continuous semigroup on L^{2}(X, d\mu) which is selfadjoint, i.e., P_{t}^{*}=P_{t}; positive, i.e., P_{t}f \geq 0 whenever f \geq 0; satisfies P_{t}1 =1; and is L^{p} contractive, i.e., \|P_{t} f\|_{p}\leq \|f\|_{p} for all 1\leq p \leq \infty. Then it has a spectral resolution P_{t} = \int_{0}^{\infty} e^{-t\lambda }dE(\lambda), and Stein proved that the multiplier T_{m} =\int_{0}^{\infty} m(\lambda)dE(\lambda), where m(\lambda) = \lambda \int_{0}^{\infty} e^{-t\lambda}M(t)dt for some M with \| M\|_{\infty}<\infty, is bounded: \|T_m f\|_{p}\leq C(p, \|M\|_{\infty}) \|f\|_{p} for all 1<p<\infty. The main idea of the proof is just Burkholder’s inequality and Rota’s construction. Since \frac{\mathrm{d}}{\mathrm{d}t} P_{t} = -\int_{0}^{\infty} \lambda e^{-t\lambda} dE(\lambda), we start from the identity
\begin{aligned} T_{m} f = -\int_{0}^{\infty}M(t) \frac{\mathrm{d}}{\mathrm{d}t} P_{t} f \, dt. \end{aligned}
Then Rota’s construction says that P_{t} f = \hat{\mathbb{E}} f_{t} for a certain martingale f_{t} and some conditional expectation \hat{\mathbb{E}}. Then
\begin{aligned} \|T_{m} f\|_{p} = \|\int_{0}^{\infty}M(t) \frac{\mathrm{d}}{\mathrm{d}t} P_{t} f \, dt\|_{p} =\|\int_{0}^{\infty}M(t)  \hat{\mathbb{E}}d f_{t}\|_{p} \leq \|\int_{0}^{\infty}M(t) df_{t}\|_{p} \stackrel{\mathrm{Burkholder}}{\leq}C_{p} \|M\|_{\infty} \|f\|_{p}.    \end{aligned}

Example 5. (Zygmund 4/3). For all R>0 we have

\begin{aligned} \|\sum_{k \in \mathbb{Z}^{2}, |k|_{2}=R} c_{k} \xi^{k} \|_{4} \leq C \|\sum_{k \in \mathbb{Z}^{2}, |k|_{2}=R} c_{k} \xi^{k} \|_{2} \end{aligned}.

It is a challenging question what happens in higher dimensions \mathbb{Z}^{d}, d>2. Bourgain made some progress. See also these lecture notes on the Euclidean space \mathbb{R}^{d}. Estimates of this type are called Fourier restriction estimates.

Example 6 (Bernstein–Markov inequalities) We have

\begin{aligned} \| \sum_{k \in \mathbb{Z}, \, |k|\leq N} ik c_{k} \xi^{k} \|_{p} \leq C N \| \sum_{k \in \mathbb{Z}, \, |k|\leq N} c_{k} \xi^{k} \|_{p} \quad \text{for all} \quad 0<p\leq \infty\quad (7)\end{aligned},

and its converse

\begin{aligned} \| \sum_{k \in \mathbb{Z}, \, |k|\geq N} ik c_{k} \xi^{k} \|_{p} \geq C N \| \sum_{k \in \mathbb{Z}, \, |k|\geq N} c_{k} \xi^{k} \|_{p} \quad \text{for all} \quad 1\leq p\leq \infty\quad (8)\end{aligned}.

Hint to (8): see my response to this mathoverflow question.

Both inequalities are critical in approximation theory. If one thinks about h = \sum c_{k} \xi^{k} as a function of variable \theta, \xi = e^{i \theta}, then (7) and (8) simply give the estimates on h'(\theta) in terms of h(\theta). One more observation is that one can show (8) is equivalent to Jackson’s inequality

\begin{aligned} \inf_{\mathrm{deg}(g) \leq N } \|h - g\|_{p}\leq \frac{C}{N}\| h'\|_{p}, \quad 1\leq p \leq \infty,  \end{aligned}

where infimum is taken over all trigonometric polynomials on \mathbb{T} of degree at most N.

Sidenote: Perhaps it was Markov who first investigated bounds of the form
\begin{aligned}\max_{x \in [-1,1]}| r(0)a_{0}+r(1)a_{1} x+...+r(n)a_{n}x^{n}| \leq C(r(0), \ldots, r(n))\max_{x \in [-1,1]}|a_{0}+a_{1} x+...+a_{n}x^{n}| \end{aligned} for real algebraic polynomials a_{0}+a_{1}x+...+a_{n}x^{n} on the interval [-1,1].

The next example relates sizes of g and \hat{g}.

Example 7.(Hausdorff–Young inequality).
Let \frac{1}{p}+\frac{1}{q}=1 and 1\leq p\leq 2. Then

\begin{aligned}\left(\sum_{\mathbb{Z}^{d}} |c_{k}|^{q} \right)^{1/q} \leq \| \sum_{\mathbb{Z}^{d}} c_{k} \xi^{k}\|_{p} \end{aligned}


\begin{aligned} \| \sum_{\mathbb{Z}^{d}} c_{k} \xi^{k}\|_{q} \leq \left(\sum_{\mathbb{Z}^{d}} |c_{k}|^{p} \right)^{1/p} \end{aligned}

Hint: verify p=1,2 and use Riesz–Thorin interpolation theorem.
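The two Hausdorff–Young inequalities are easy to test numerically for d=1. The sketch below (plain Python, hypothetical helper names) approximates the L^{p}(\mathbb{T}) norm by an average over M equally spaced angles; for even integers p this average is exact once M exceeds p/2 times the spread of the frequencies, while for other p it is only an approximation.

```python
import cmath
import math

def lp_norm_on_circle(c, p, M=256):
    """(average of |g|^p over M equally spaced angles)^{1/p}; c maps k to c_k."""
    total = 0.0
    for j in range(M):
        theta = 2 * math.pi * j / M
        value = sum(ck * cmath.exp(1j * k * theta) for k, ck in c.items())
        total += abs(value) ** p
    return (total / M) ** (1.0 / p)

def coefficient_norm(c, r):
    """The l^r norm of the Fourier coefficients, (sum |c_k|^r)^{1/r}."""
    return sum(abs(ck) ** r for ck in c.values()) ** (1.0 / r)
```

For example, for g = 1 + 2\xi - \xi^{3} one has g^{2} = 1+4\xi+4\xi^{2}-2\xi^{3}-4\xi^{4}+\xi^{6}, so \|g\|_{4}^{4} = 1+16+16+4+16+1 = 54, and one can compare \|g\|_{4} with the \ell^{4/3} norm of the coefficients (p=4/3, q=4).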

Surprisingly, similar questions become difficult when the frequencies of g are restricted.

Example 8. (Bohnenblust–Hille inequality).
We have

\begin{aligned}\left(\sum_{(\mathbb{Z}_{+})^{d}, \, |k|_{1}\leq n} |c_{k}|^{\frac{2n}{n+1}} \right)^{\frac{n+1}{2n}} \leq C^{\sqrt{n \log n}}\| \sum_{(\mathbb{Z}_{+})^{d}, \, |k|_{1}\leq n} c_{k} \xi^{k}\|_{\infty} \end{aligned}

holds true for all n\geq 1 and all d \geq 1.

See the previous notes, where the analogous question (on the Hamming cube instead of \mathbb{T}^{d}) has important implications in quantum computing. It is in fact conjectured that the bound C^{\sqrt{n \log n}} can be replaced by n^{C}, but this is open.

Improving a little bit over Cauchy–Schwarz in certain estimates on \mathbb{T}^{d} has interesting implications for counting the number of solutions of systems of equations in a certain set.

Example 9. (Bourgain–Demeter–Guth)

For any \varepsilon >0, and any d\geq 1 there exists C(\varepsilon, d)>0 such that

\begin{aligned} \left\|\sum_{k\in \Gamma_{N}} c_{k}\xi^{k} \right\|_{d(d+1)} \leq C(\varepsilon, d) N^{\varepsilon} \left\| \sum_{k \in \Gamma_{N}} c_{k}\xi^{k}\right\|_{2} \quad \text{for all} \quad c_{k} \in \mathbb{C}\end{aligned}

holds true for all N\geq 1, where \Gamma_{N}=\{(n, n^{2}, \ldots, n^{d}), \, 1\leq n \leq N\}.

Of course the nontrivial case is d \geq 2. If we choose c_{k}=1 then, after raising the left hand side of the inequality in Example 9 to the power d(d+1), one can open the parentheses and explicitly calculate the resulting expression. One obtains that the number of solutions of the equation

\begin{aligned} K_{1}+\ldots+K_{\frac{d(d+1)}{2}}=L_{1}+\ldots+L_{\frac{d(d+1)}{2}} \end{aligned}

for K_{1}, \ldots, K_{d(d+1)/2}, L_{1}, \ldots, L_{d(d+1)/2}\in \Gamma_{N} is bounded by C'(\varepsilon, d) N^{\varepsilon} N^{\frac{d(d+1)}{2}}.
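For very small N one can simply count these solutions by brute force. The following plain Python sketch (the function name count_solutions is hypothetical) groups all ordered tuples (K_{1}, \ldots, K_{s}) \in \Gamma_{N}^{s}, s = d(d+1)/2, by their vector sum; the number of solutions is then the sum of the squares of the group sizes. The cost grows like N^{s}, so this is only meant as an illustration.

```python
from collections import Counter
from itertools import product

def count_solutions(N, d=2):
    """Number of solutions of K_1+...+K_s = L_1+...+L_s with K_i, L_i in Gamma_N, s = d(d+1)/2."""
    s = d * (d + 1) // 2
    # Gamma_N = {(n, n^2, ..., n^d) : 1 <= n <= N}
    points = [tuple(n**j for j in range(1, d + 1)) for n in range(1, N + 1)]
    sums = Counter()
    for combo in product(points, repeat=s):
        key = tuple(sum(coords) for coords in zip(*combo))
        sums[key] += 1
    # Each pair of ordered tuples with the same vector sum is one solution.
    return sum(v * v for v in sums.values())
```

For d=2 and N=2 the sum of three points of \Gamma_{2} is determined by how many of them equal (2,4), so the count is 1^{2}+3^{2}+3^{2}+1^{2}=20.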

There are some interesting questions under extra assumptions on c_{k} which are not well understood. The next question seems to be open.

Example 10. Show that for all distinct integers k_{1}, \ldots, k_{m} we have

\begin{aligned} \| \sum_{j=1}^{m} \xi^{k_{j}}\|_{1} \leq 0.9999\,  \| \sum_{j=1}^{m} \xi^{k_{j}}\|_{2}\end{aligned}

holds true for all m\geq 2.

There are some more sophisticated examples on \mathbb{T}^{d} about pointwise almost everywhere convergence for functions in L^{2}, namely an open problem on \mathbb{T}^{2} is to show the estimate

\begin{aligned} \left\| \sup_{R>0} \left| \sum_{\mathbb{Z}^{2}} 1_{|k|_{2}\leq R}(k) c_{k} \xi^{k} \right| \right\|_{2} \leq C \left\|\sum _{\mathbb{Z}^{2}} c_{k} \xi^{k}\right\|_{2}\end{aligned}

Some other questions, which we are completely avoiding due to lack of time and space, concern the boundedness of Bochner–Riesz multipliers: for which (p, \alpha, d), \alpha>0, 1\leq p\leq \infty, does the estimate

\begin{aligned} \sup_{R>0}\| \sum_{\mathbb{Z}^{d}} \left(1-\frac{|k|_{2}^{2}}{R^{2}} \right)_{+}^{\alpha} c_{k} \xi^{k} \|_{p} \leq C_{p} \| \sum_{\mathbb{Z}^{d}}  c_{k} \xi^{k} \|_{p} \end{aligned}

hold true, where x_{+} = \max(0,x)? Despite some partial results, no complete answer is known.

In the next section we discuss how these questions on \mathbb{T}^{d} are related to analogous questions on \mathbb{R}^{d}.

Transference theorems

We would like to take our knowledge on \mathbb{T}^{d} and translate it to \mathbb{R}^{d}. Since the arguments we present do not see the difference between dimensions (and all the estimates we obtain are dimension independent), for simplicity we consider d=1.

Let us take a smooth f :\mathbb{R} \to \mathbb{C}. We will assume that f and all its derivatives decay faster than any polynomial at \pm \infty, namely,

\begin{aligned} \sup_{x \in \mathbb{R}} |x^{\alpha} D^{\beta} f| < C_{\alpha, \beta}<\infty \end{aligned}

for all \alpha, \beta \geq 0. This class of functions is called the Schwartz class, and is usually denoted by \mathcal{S}(\mathbb{R}). A typical example is f(x) = e^{-x^{2}}, or any smooth compactly supported function, i.e., f \in C_{0}^{\infty}(\mathbb{R}).

Pick an arbitrary \varepsilon>0 and periodize f as follows

\begin{aligned} f_{\varepsilon}(x) = \sum_{k \in \mathbb{Z}} f(x+\frac{k}{\varepsilon}) = \ldots+f(x-\frac{1}{\varepsilon})+f(x) + f(x+\frac{1}{\varepsilon})+\ldots\end{aligned}

The sum converges in any possible way on compact sets, so there are no issues with differentiating inside the sum and integrating over compact sets.

It is clear that for each fixed x \in \mathbb{R}, f_{\varepsilon}(x) \to f(x) as \varepsilon \to 0 (in fact for any compact set K \subset \mathbb{R} we have \sup_{x \in K} | f_{\varepsilon}(x)-f(x)| \to 0 as \varepsilon \to 0).

Also f_{\varepsilon}(x) = f_{\varepsilon}(x+\frac{1}{\varepsilon}), i.e., f_{\varepsilon} is \frac{1}{\varepsilon}-periodic! So we can expand it into a Fourier series

\begin{aligned} f_{\varepsilon}(x)  = \sum_{k \in \mathbb{Z}} \hat{f_{\varepsilon}}(k) e^{2\pi i k x \varepsilon} \quad (9) \end{aligned}

We have

\begin{aligned} \hat{f_{\varepsilon}}(k) = \varepsilon\int_{0}^{1/\varepsilon} \sum_{\ell  \in \mathbb{Z}} f(y+\frac{\ell }{\varepsilon}) e^{-2\pi i k y \varepsilon} dy  = \int_{\mathbb{R}}f (\frac{t}{\varepsilon}) e^{-2 \pi i k t}dt = \varepsilon \hat{f}(\varepsilon k),\end{aligned} where \hat{f}(x) := \int_{\mathbb{R}} f(y) e^{-2\pi i x y}dy for non-periodic functions! Thus our equation (9) takes the form

\begin{aligned}f_{\varepsilon}(x) = \sum_{k \in \mathbb{Z}} \varepsilon \hat{f}(\varepsilon k ) e^{2\pi i k\varepsilon  x } \quad (10)\end{aligned}

Notice that for each fixed x as \varepsilon \to 0 the left hand side of (10) tends to f(x). For the right hand side we should recognize the Riemann sum for \int_{\mathbb{R}} \hat{f}(y) e^{2\pi i x y}dy. Indeed, notice that \hat{f} \in \mathcal{S}(\mathbb{R}), in particular |\hat{f}(y)| < \frac{C}{1+|y|^{2}} which implies that taking N large enough

\begin{aligned} \left| \sum_{|k \varepsilon |> N} \varepsilon \hat{f}(\varepsilon k) e^{2 \pi i x k \varepsilon} \right| \leq \sum_{|k| \geq N/\varepsilon} \frac{C}{|k|^{2}\varepsilon} < \frac{C}{N}\end{aligned}

In particular on a compact set [-N, N] we have

\begin{aligned} \lim_{\varepsilon \to 0} \sum_{|k \varepsilon |\leq  N} \varepsilon \hat{f}(\varepsilon k) e^{2 \pi i x k \varepsilon} = \int_{-N}^{N} \hat{f}(y) e^{2\pi i xy}dy \end{aligned}

Taking into account that \left| \int_{|y|>N} \hat{f}(y) e^{2\pi i x y} dy\right| < \frac{C}{N} \to 0 as N \to \infty we obtain

\begin{aligned}f(x) = \lim_{\varepsilon \to 0} f_{\varepsilon}(x) = \lim_{\varepsilon \to 0} \sum_{k \in \mathbb{Z}} \varepsilon \hat{f}(\varepsilon k ) e^{2\pi i k\varepsilon x } = \int_{\mathbb{R}} \hat{f}(y) e^{2\pi i x y}dy \end{aligned}

Let us summarize (even for \mathbb{R}^{d}) what we have discovered.

Definition. For any f \in L^{1}(\mathbb{R}^{d}) its Fourier transform is defined as

\begin{aligned} \hat{f}(x) = \int_{\mathbb{R}^{d}} f(y) e^{-2\pi i x \cdot y}dy\end{aligned}

where x\cdot y = x_{1}y_{1}+\ldots+x_{d}y_{d} for x = (x_{1}, \ldots, x_{d}), y=(y_{1}, \ldots, y_{d}).

Definition. Let \mathcal{S}(\mathbb{R}^{d}) denote smooth functions f \in C^{\infty}(\mathbb{R}^{d}) such that for any multi index \alpha=(\alpha_{1}, \ldots, \alpha_{d}) and \beta = (\beta_{1}, \ldots, \beta_{d}) where \alpha_{i}, \beta_{j} \geq 0 are integers for all i,j=1,..., d we have

\begin{aligned} \sup_{x \in \mathbb{R}^{d}} |x^{\alpha} D^{\beta}f(x)|<C_{\alpha, \beta} <\infty\end{aligned}

where x=(x_{1}, \ldots, x_{d}), \, x^{\alpha} = x_{1}^{\alpha_{1}}\cdots x_{d}^{\alpha_{d}}, and

\begin{aligned} D^{\beta}f(x) = \frac{\partial^{|\beta|}}{\partial^{\beta_{1}} x_{1}\ldots \partial^{\beta_{d}} x_{d}}f(x) \end{aligned},

where |\beta| = \beta_{1}+...+\beta_{d}.

The identity (10) that we got is called Poisson summation formula.

Proposition 1(Poisson summation formula) For any f \in S(\mathbb{R}^{d}) we have

\begin{aligned} \sum_{k \in \mathbb{Z}^{d}} f\left( x+\frac{k}{\varepsilon}\right) = \varepsilon^{d} \sum_{k \in \mathbb{Z}^{d}} \hat{f}(k \varepsilon) e^{2 \pi i k \cdot x \varepsilon} \quad (11)\end{aligned}

holds true for all x \in \mathbb{R}^{d}, and all \varepsilon >0.
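As a quick numerical illustration of (11) for d=1, one can take the Gaussian f(x) = e^{-\pi x^{2}}, which coincides with its own Fourier transform, and compare truncated versions of both sides. The sketch below is plain Python with hypothetical function names; the truncation parameter K is an assumption that is harmless here because of the fast decay of the Gaussian.

```python
import cmath
import math

def gaussian(x):
    """f(x) = exp(-pi x^2); its Fourier transform is the same function."""
    return math.exp(-math.pi * x * x)

def periodized_sum(f, x, eps, K=50):
    """Left hand side of (11) for d = 1, truncated to |k| <= K."""
    return sum(f(x + k / eps) for k in range(-K, K + 1))

def fourier_side(f_hat, x, eps, K=50):
    """Right hand side of (11) for d = 1, truncated to |k| <= K."""
    return eps * sum(
        f_hat(k * eps) * cmath.exp(2j * math.pi * k * eps * x)
        for k in range(-K, K + 1)
    )
```

With x = 0.3 and \varepsilon = 0.5 the two sides should agree up to rounding errors, and for small \varepsilon (say \varepsilon = 0.1) both are already very close to f(0.3).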

Remark: One can relax the assumption f  \in S(\mathbb{R}^{d}). For example, with a little bit of work one can show that |f(x)|+|\hat{f}(x)| <C (1+|x|)^{-d-\delta} for some \delta>0 is enough.

Remark.(Packing, sphere packing). Consider a parallelepiped in \mathbb{R}^{d}, say [0,1]^{d}, or more generally the image of [0,1]^{d} under an invertible linear map A :\mathbb{R}^{d} \to \mathbb{R}^{d}, namely, Q = A [0,1]^{d}. Denote its volume by |Q|. How many points can we find in Q so that the balls of radius 1/2 centered at these points lie completely inside Q and do not intersect each other? How does this number depend on the volume |Q|? The goal is to maximize this number (in other words, we want to pack the parallelepiped Q with balls of radius 1/2).

Here is a trick using the Poisson summation formula to bound this number in terms of |Q|. Apply the Poisson summation formula to f(x) = g(Ax) with \varepsilon =1 and use the fact that \widehat{g(A\cdot )}(x) = \frac{1}{|\mathrm{det} A|}\hat{g}((A^{T})^{-1}x) together with |\mathrm{det} A| = |Q|. After replacing x by A^{-1}x the formula becomes
\sum_{k \in \mathbb{Z}^{d}} g(Ak+x) = \frac{1}{|Q|}\sum_{k \in \mathbb{Z}^{d}}\hat{g}((A^{T})^{-1}k)e^{2 \pi i x \cdot (A^{T})^{-1}k}

Or \begin{aligned} \sum_{u \in A\mathbb{Z}^{d}} g(u+x) = \frac{1}{|Q|}\sum_{v \in (A^{T})^{-1} \mathbb{Z}^{d}}\hat{g}(v)e^{2 \pi i x \cdot v} \end{aligned}
holds for all x \in \mathbb{R}^{d} (usually A\mathbb{Z}^{d} is called a lattice, and (A^{T})^{-1} \mathbb{Z}^{d} is called the dual lattice). Next, let v_{1}, \ldots, v_{N} be points chosen in Q so that the balls of radius 1/2 centered at these points lie inside Q and do not intersect. Substituting x=v_{j}-v_{\ell} in the last formula and summing over all 1\leq j, \ell \leq N we obtain

\begin{aligned} \sum_{1\leq j,\ell\leq N}\sum_{u \in A\mathbb{Z}^{d}} g(u+v_{j}-v_{\ell}) = \frac{1}{|Q|}\sum_{v \in (A^{T})^{-1} \mathbb{Z}^{d}}\hat{g}(v)\left| \sum_{\ell=1}^{N}e^{2 \pi i v \cdot v_{\ell}}  \right|^{2}\end{aligned}

Perfect! Now if g is chosen so that \hat{g}\geq 0, and g(x) \leq 0 for |x|\geq 1, then we can estimate the right hand side from below by \frac{N^{2}}{|Q|} \hat{g}(0) (keep the term v=0 and drop the rest, which are nonnegative). For the left hand side we notice that |u+v_{j}-v_{\ell}|<1 if and only if u=0 and j=\ell (the translates Q+u, u \in A\mathbb{Z}^{d}, tile \mathbb{R}^{d}, so all the balls centered at the points u+v_{j} are disjoint), therefore we can estimate it from above by Ng(0). Thus we obtain \frac{N}{|Q|}\leq \frac{g(0)}{\hat{g}(0)} (assuming that \hat{g}(0)\neq 0). This turns out to be a pretty good bound!

If one tries to minimize the quantity \frac{g(0)}{\hat{g}(0)} from symmetry considerations it suffices to consider only radial g, i.e., g(x)=g(|x|). So the problem becomes “1-dimensional”, however it seems that the “optimizers” are not so easy to find, only recently Viazovska found the optimizer in \mathbb{R}^{8} which gives \frac{g(0)}{\hat{g}(0)} = 2^{4}, and in particular the solution to the best sphere packing problem in dimension 8 (and later in dimension d=24 the problem was solved where the optimizer satisfies \frac{g(0)}{\hat{g}(0)} = 2^{24}).

Remark: another question of a similar flavor is the following “uncertainty principle”: let f : \mathbb{R}^{12} \to \mathbb{R} be not identically zero and such that both f and \mathcal{F}(f) are integrable and real valued. Assume further that f(0), \mathcal{F}f(0)\leq 0, and f(x) 1_{|x|\geq r_{1}}, \mathcal{F}f(x) 1_{|x|\geq r_{2}}\geq 0. Then necessarily r_{1} r_{2} \geq 2, and this bound is sharp.

The limit \varepsilon \to 0 in (11) gives

Proposition 2. (Fourier inversion formula). For any f \in S(\mathbb{R}^{d}) we have

\begin{aligned} f(x)  = \int_{\mathbb{R}^{d}} \hat{f}(y) e^{2 \pi i  y \cdot x}dy \end{aligned}

It will be convenient to introduce

Definition. (inverse Fourier transform). For any f \in \mathcal{S}(\mathbb{R}^{d}) we define

\begin{aligned} \check{f}(x)  = \int_{\mathbb{R}^{d}} f(y) e^{2\pi i y \cdot x}dy \end{aligned}

Thus the Fourier inversion formula takes a compact form f(x) = \hat{f}\, \check{}(x). However, because of issues with typing the symbols in this blog we will use \mathcal{F}f(x) instead of \hat{f}(x), and \mathcal{F}^{-1}f(x) instead of \check{f}(x).

The space \mathbb{R}^{d} is rich: it remains invariant under translations and invertible linear transformations, and it is easy to analyze how the Lebesgue measure changes under such operations. In general we almost never have such a luxury; therefore we will try to squeeze as many good consequences as we can from these group actions on \mathbb{R}^{d}.

The following properties are elementary and are left as an exercise to the reader. In what follows f, g : \mathbb{R}^{d} \to \mathbb{C}, f,g \in S(\mathbb{R}^{d}) unless stated otherwise.

  1. \begin{aligned} \mathcal{F}\left[ \frac{1}{\varepsilon^{d}}g\left(\frac{\cdot -\ell}{\varepsilon}\right)\right](\xi) =e^{-2\pi i \xi \cdot \ell} \mathcal{F}(g)(\varepsilon \xi) \end{aligned} where g \in L^{1}

    The way one should think about this identity is as follows: if we let g_{\varepsilon}(x) = \frac{1}{\varepsilon^{d}}g(\frac{x -\ell}{\varepsilon}), then g_{\varepsilon} approximates the delta measure \delta_{\ell}(x) \int g as \varepsilon \to 0, i.e., for any f \in S(\mathbb{R}^{d}) we have \lim_{\varepsilon \to 0} \int_{\mathbb{R}^{d}} g_{\varepsilon}(x)f(x)dx = f(\ell) \int_{\mathbb{R}^{d}} g dx = f(\ell) \mathcal{F}(g)(0). Therefore \mathcal{F}(g_{\varepsilon})(x) should be pretty close to \mathcal{F}( \delta_{\ell}(\cdot ) \int g)(x) =e^{-2\pi i \ell \cdot x} \mathcal{F}(g)(0), which is exactly what we wrote in 1.
  2. \mathcal{F}(f(A\cdot ))(x) = \frac{1}{|\mathrm{det} A|}\mathcal{F}(f)((A^{T})^{-1}x) for any invertible linear map A : \mathbb{R}^{d} \to \mathbb{R}^{d} (here f \in L^{1}). In particular, \mathcal{F} commutes with orthogonal transformations, for which (A^{T})^{-1}=A (and hence |\mathrm{det} A|=1).
  3. \mathcal{F}^{-1}(g)(\xi) = \mathcal{F}(g)(-\xi) (here g \in L^{1})
  4. \mathcal{F}^{-1}(\mathcal{F}(g))(\xi) = \mathcal{F}(\mathcal{F}^{-1}(g))(\xi)=g(\xi);
  5. \mathcal{F}(\mathcal{F}(g))(\xi) = g(-\xi);
  6. \int \mathcal{F}(f) g = \int f \mathcal{F}(g) here f,g \in L^{1};
  7. \overline{\mathcal{F}(g)(x)} = \mathcal{F}(\overline{g})(-x) here g \in L^{1}
  8. \int |\mathcal{F}(f)|^{2} = \int \mathcal{F}(f) \overline{\mathcal{F}(f)(x)} = \int |f|^{2}; for all f \in S(\mathbb{R}^{d}).
  9. \mathcal{F}(D^{\alpha}g)(\xi) = (2\pi i \xi)^{\alpha} \mathcal{F}(g)(\xi), where (C\xi)^{\alpha} = C^{|\alpha|}\xi^{\alpha} for a constant C. In general if p : \mathbb{R}^{d} \to \mathbb{C} is a polynomial then \mathcal{F}(p(D)g)(\xi) = p(2\pi i \xi) \mathcal{F}(g)(\xi). A typical example one needs to keep in mind is \mathcal{F}(\Delta g)(\xi) = (2 \pi i)^{2} (\xi_{1}^{2}+\ldots+\xi_{d}^{2})\mathcal{F}(g)(\xi);
  10. D^{\beta}\mathcal{F}(f)(\xi) = \mathcal{F}((-2\pi i \cdot)^{\beta}f(\cdot))(\xi);
  11. \mathcal{F}(f*g)(\xi)=\mathcal{F}(f)(\xi) \mathcal{F}(g)(\xi) where f*g(x) = \int f(x-y)g(y)dy, \quad f,g \in L^{1}. In particular, if we let \tilde{f}(x) = \overline{f}(-x) then \mathcal{F}(f*\tilde{f}) = |\mathcal{F}(f)|^{2}\geq 0, i.e., that is how you generate functions h = f*\tilde{f} with nonnegative Fourier transform!
  12. \sup_{\xi \in \mathbb{R}^{d}}|\mathcal{F}(f)(\xi)|\leq \|f\|_{1}, here f \in L^{1}.
  13. \mathcal{F}(e^{2 \pi i \ell \cdot y} f(y))(\xi)=\mathcal{F}(f)(\xi-\ell), here f \in L^{1}.
  14. If f (x) = e^{-\pi |x|^{2}} then \mathcal{F}(f)=f.
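Some of these properties are easy to test numerically for d=1. Below is a rough sketch in plain Python (the names fourier_transform and rescaled are hypothetical) which approximates the Fourier integral by a Riemann sum on [-L, L] and can be used to check property 14 and property 1 for the Gaussian. The truncation length L and the number of nodes n are assumptions that work well for rapidly decaying smooth functions.

```python
import cmath
import math

def gaussian(x):
    """f(x) = exp(-pi x^2), the function from property 14."""
    return math.exp(-math.pi * x * x)

def fourier_transform(f, xi, L=12.0, n=4800):
    """Riemann-sum approximation of int_{-L}^{L} f(x) e^{-2 pi i x xi} dx (d = 1)."""
    h = 2 * L / n
    total = 0j
    for j in range(n + 1):
        x = -L + j * h
        total += f(x) * cmath.exp(-2j * math.pi * x * xi)
    return h * total

def rescaled(g, eps, ell):
    """x -> (1/eps) g((x - ell)/eps), the function on the left hand side of property 1."""
    return lambda x: g((x - ell) / eps) / eps
```

For example, fourier_transform(gaussian, 0.7) should be very close to gaussian(0.7), and fourier_transform(rescaled(gaussian, 0.5, 1.0), 1.3) should be very close to e^{-2\pi i \cdot 1.3} times gaussian(0.65).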

Exercise: Recall that \mathcal{F}(f)(\xi) = \int_{\mathbb{R}^{d}}f(x)e^{-2\pi i x \cdot \xi}dx is defined for f \in L^{1}(\mathbb{R}^{d}). The property 8. says

\begin{aligned}\int_{\mathbb{R}^{d}} |\mathcal{F}f|^{2}dx =\int_{\mathbb{R}^{d}} |\mathcal{F}^{-1}f|^{2}dx= \int_{\mathbb{R}^{d}}|f|^{2} dx  \quad (12)\end{aligned}

for all f \in S(\mathbb{R}^{d}). Show that (12) holds true also for all f \in L^{1}\cap L^{2} (this is called the Plancherel theorem). In particular \mathcal{F} can be extended in a unique way to a continuous linear operator (an isometry) from L^{2} to L^{2}. This extension is usually called the “Fourier–Plancherel transform”.

Hint: First prove that if g is continuous, bounded, and in L^{1} with \mathcal{F}(g)\geq 0 then g(0)=\int_{\mathbb{R}^{d}} \mathcal{F}(g). Apply this with g=f*\tilde{f} where \tilde{f}(x)=\overline{f}(-x).

The analog of the operator T_{m} g(\xi) = \sum_{k \in \mathbb{Z}^{d}} m(k)c_{k}\xi^{k} on the Euclidean space \mathbb{R}^{d} is T_{m}f =  \mathcal{F}^{-1}(m(x) \mathcal{F}(f))(x), say for all f \in S(\mathbb{R}^{d}) and m \in L^{\infty}(\mathbb{R}^{d}). As in the case of the unit circle, we are interested in the inequality

\begin{aligned} \| \mathcal{F}^{-1}(m(x) \mathcal{F}(f))\|_{L^{p}(\mathbb{R}^{d})} \leq C_{p} \|f\|_{L^{p}(\mathbb{R}^{d})} \quad (13)\end{aligned}

for all test functions f (say from some nice dense class of functions, for example S(\mathbb{R}^{d})). Here is the first transference theorem from \mathbb{R}^{d} to \mathbb{T}^{d}.

Proposition 1.(\mathbb{R}^{d} to \mathbb{T}^{d}) Let 1\leq p <\infty and m \in L^{\infty}(\mathbb{R}^{d}) be such that the points of \mathbb{Z}^{d} are Lebesgue points of m and (13) holds true for all f \in S(\mathbb{R}^{d}). Then

\begin{aligned} \|\sum_{\mathbb{Z}^{d}}m(k)c_{k}e^{2\pi i k\cdot x}\|_{p}\leq C_{p} \|\sum_{\mathbb{Z}^{d}}c_{k}e^{2\pi i k\cdot x}\|_{p} \end{aligned}

holds true for all finite sums \sum_{\mathbb{Z}^{d}}c_{k}e^{2 \pi i k \cdot x}.

Proof. How to “transfer” \mathbb{T}^{d} to \mathbb{R}^{d}? Let f(x) = \sum_{\mathbb{Z}^{d}} c_{k} e^{2\pi i k \cdot x} be our trigonometric polynomial. Pick any Schwartz function \varphi \in \mathcal{S}(\mathbb{R}^{d}), then f_{\varepsilon}(x) = \varphi(x \varepsilon) f(x) is a pretty nice Schwartz function on \mathbb{R}^{d}, moreover, f_{\varepsilon}(x) \to \varphi(0) f(x) as \varepsilon \to 0, and the limit is essentially our original f up to a scalar factor \varphi(0), i.e., f_{\varepsilon} approximates our periodic function f. Now what happens with Fourier transform of f_{\varepsilon}? We have

\begin{aligned}\mathcal{F} f_{\varepsilon} = \sum_{\mathbb{Z}^{d}} c_{k} \mathcal{F} (\varphi(\varepsilon x) e^{2 \pi i k \cdot x}) = \sum_{\mathbb{Z}^{d}} c_{k} \frac{1}{\varepsilon^{d}} (\mathcal{F}^{-1}\varphi)\left(\frac{-x+k}{\varepsilon}\right) \end{aligned}

by property 1 of Fourier transform. Since \frac{1}{\varepsilon^{d}} (\mathcal{F}^{-1}\varphi)\left(\frac{-x+k}{\varepsilon}\right) is pretty close to a delta function \delta_{k}(x) up to a constant factor \int (\mathcal{F}^{-1}\varphi)=\varphi(0) the whole sum is about \varphi(0) \sum c_{k} \delta_{k}(x) as \varepsilon \to 0.

On the other hand if \Phi is any continuous function, then

\begin{aligned}  \varepsilon^{d} \int_{\mathbb{R}^{d}} \Phi(f_{\varepsilon}) = \int_{[0,1]^{d}} \varepsilon^{d}\sum_{k \in \mathbb{Z}^{d}} \Phi(\varphi(\varepsilon k + \varepsilon x) f(x))  dx  \to \int_{[0,1]^{d}} \int_{\mathbb{R}^{d}} \Phi(\varphi(y) f(x)) dy dx \end{aligned}

as \varepsilon \to 0. In particular choosing \Phi(x) = |x|^{p} and taking the p-th root we see

\begin{aligned} \varepsilon^{d/p}\| f_{\varepsilon} \|_{L^{p}(\mathbb{R}^{d})} \to   \| \varphi\|_{L^{p}(\mathbb{R}^{d})}\|f\|_{L^{p}(\mathbb{T}^{d})} \quad \text{as} \quad \varepsilon \to 0. \end{aligned}
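The limit of \varepsilon^{d} \int_{\mathbb{R}^{d}} |f_{\varepsilon}|^{p} can be observed numerically for d=1. The sketch below is plain Python with hypothetical names; it assumes \varphi(x) = e^{-\pi x^{2}} and computes (\varepsilon \int_{\mathbb{R}} |\varphi(\varepsilon x) f(x)|^{p} dx)^{1/p} by a Riemann sum on |x| \leq L/\varepsilon, where the cutoff L and the step h are assumptions suited to this Gaussian.

```python
import cmath
import math

def gaussian(x):
    """phi(x) = exp(-pi x^2), a convenient Schwartz function."""
    return math.exp(-math.pi * x * x)

def scaled_norm(coeffs, eps, p, L=6.0, h=0.01):
    """(eps * int |phi(eps x) f(x)|^p dx)^{1/p} for f(x) = sum_k c_k e^{2 pi i k x}, d = 1."""
    n = int(L / (eps * h))  # integrate over |x| <= L/eps, where phi(eps x) is negligible
    total = 0.0
    for j in range(-n, n + 1):
        x = j * h
        fx = sum(ck * cmath.exp(2j * math.pi * k * x) for k, ck in coeffs.items())
        total += abs(gaussian(eps * x) * fx) ** p
    return (eps * h * total) ** (1.0 / p)
```

For f(x) = 1 + \frac{1}{2} e^{2\pi i x} and p=2 we have \|f\|_{L^{2}(\mathbb{T})}^{2} = 1.25 and \|\varphi\|_{2}^{2} = \int e^{-2\pi x^{2}}dx = 1/\sqrt{2}, so scaled_norm should approach \sqrt{1.25/\sqrt{2}} \approx 0.94 as \varepsilon \to 0; already \varepsilon = 0.25 is very close, while \varepsilon = 1 is visibly off.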

Also notice that if \varphi, \psi are Schwartz functions then

\begin{aligned} \lim_{\varepsilon \to 0} \frac{1}{\varepsilon^{d}}\int_{\mathbb{R}^{d}} m(x) \varphi(\frac{x-\ell}{\varepsilon})\psi(\frac{x-k}{\varepsilon})dx = \delta_{k \ell}m(k) \int_{\mathbb{R}^{d}} \varphi \psi \end{aligned}

where \delta_{k \ell}=0 if k \neq \ell and \delta_{k \ell}=1 if k =\ell. Perfect! Now the inequality \| \mathcal{F}^{-1}(m \mathcal{F} h)\|_{L^{p}(\mathbb{R}^{d})}\leq C_{p} \|h\|_{L^{p}(\mathbb{R}^{d})} together with Hölder’s inequality implies

\begin{aligned}\left|\int_{\mathbb{R}^{d}} m(x) (\mathcal{F}h) (\mathcal{F}^{-1}\overline{b}) \right| \leq C_{p}\|h\|_{L^{p}(\mathbb{R}^{d})}\|b\|_{L^{q}(\mathbb{R}^{d})} \quad (14) \end{aligned}

where \frac{1}{p}+\frac{1}{q}=1. Pick any two arbitrary (real valued) Schwartz functions \varphi and \psi, and instead of h plug f_{\varepsilon} = \varphi(x \varepsilon) \sum c_{k} e^{2\pi i k \cdot x}, and instead of b plug g_{\varepsilon} = \psi(\varepsilon x) \sum a_{k} e^{2 \pi i k \cdot x}. After multiplying both sides of (14) by \varepsilon^{d} and taking the limit \varepsilon \to 0 we obtain

\begin{aligned} \left| \sum m(k)c_{k} \overline{a_{k}}\right| \left|\int \varphi \psi \right| \leq C_{p} \| \varphi\|_{p}  \|f\|_{L^{p}(\mathbb{T}^{d})} \| \psi\|_{q}\| g\|_{L^{q}(\mathbb{T}^{d})} \end{aligned}

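I am sure there are many real valued Schwartz functions which give equality in Hölder’s inequality \left|\int \varphi \psi \right|\leq \| \varphi\|_{p} \| \psi\|_{q}. Choosing such a pair, we can cancel the common factor on both sides. Finally, since \sum m(k) c_{k} \overline{a_{k}} is the pairing of \sum m(k) c_{k} \xi^{k} with g on \mathbb{T}^{d}, taking the supremum over all trigonometric polynomials g with \| g\|_{L^{q}(\mathbb{T}^{d})}=1 we obtain \|\sum m(k) c_{k} \xi^{k} \|_{p}\leq C_{p} \|\sum c_{k} \xi^{k}\|_{p}.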
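
Two steps of the argument above can be made explicit. What follows is only a sketch, and the particular Gaussians are our own choice rather than something fixed by the text. First, (14) comes from pairing \mathcal{F}^{-1}(m\mathcal{F}h) with \overline{b}: by Parseval's identity and the relation \overline{\mathcal{F}b}=\mathcal{F}^{-1}\overline{b},

\begin{aligned} \left|\int_{\mathbb{R}^{d}} m (\mathcal{F}h) (\mathcal{F}^{-1}\overline{b}) \right| = \left|\int_{\mathbb{R}^{d}} \mathcal{F}^{-1}(m \mathcal{F}h) \, \overline{b} \right| \leq \| \mathcal{F}^{-1}(m \mathcal{F}h)\|_{L^{p}(\mathbb{R}^{d})} \|b\|_{L^{q}(\mathbb{R}^{d})} \leq C_{p} \|h\|_{L^{p}(\mathbb{R}^{d})} \|b\|_{L^{q}(\mathbb{R}^{d})}. \end{aligned}

Second, for 1<p<\infty one pair of real valued Schwartz functions giving equality in Hölder's inequality is \varphi(x) = e^{-\pi q |x|^{2}}, \psi(x) = e^{-\pi p |x|^{2}}. Indeed, \frac{1}{p}+\frac{1}{q}=1 is the same as p+q=pq, so \varphi \psi = \varphi^{p} = \psi^{q} = e^{-\pi p q |x|^{2}}, and using \int_{\mathbb{R}^{d}} e^{-\pi a |x|^{2}}dx = a^{-d/2} we get

\begin{aligned} \int_{\mathbb{R}^{d}} \varphi \psi = (pq)^{-d/2} = (pq)^{-\frac{d}{2p}} (pq)^{-\frac{d}{2q}} = \| \varphi\|_{p} \| \psi\|_{q}. \end{aligned}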


After having such a result one may think that probably we can forget about \mathbb{T}^{d} and work only with \mathbb{R}^{d}. Well, not always. We do also have a converse statement to Proposition 1, namely

Proposition 2. (\mathbb{T}^{d} to \mathbb{R}^{d}). Let 1\leq p <\infty, and let m \in C(\mathbb{R}^{d}\setminus\{0\}) be such that

\begin{aligned} \left\| \sum_{\mathbb{Z}^{d}\setminus\{0\}} m(k \varepsilon) c_{k} e^{2\pi i k \cdot x} \right\|_{p} \leq C_{p} \left\| \sum_{\mathbb{Z}^{d}\setminus\{0\}}  c_{k} e^{2\pi i k \cdot x} \right\|_{p}  \end{aligned}

holds true for all finite sums \sum_{\mathbb{Z}^{d}\setminus\{0\}} c_{k} e^{2\pi i k \cdot x} along some sequence \varepsilon \to 0 (with C_{p} independent of \varepsilon). Then

\begin{aligned} \| \mathcal{F}^{-1} (m \mathcal{F}f)\|_{L^{p}(\mathbb{R}^{d})} \leq C_{p} \| f\|_{L^{p}(\mathbb{R}^{d})} \end{aligned}

holds for all Schwartz functions f.

Proof. Hint: now go backwards (as we did for the Poisson summation formula). Namely, pick a function f \in \mathcal{S}(\mathbb{R}^{d}), periodize it:

\begin{aligned} f_{\varepsilon}(x) :=\sum_{\mathbb{Z}^{d}} f(x+\frac{k}{\varepsilon})  = \varepsilon^{d} \sum \hat{f}(k \varepsilon) e^{2\pi i k \cdot x \varepsilon}\end{aligned}

and show the identity

\begin{aligned} \lim_{\varepsilon \to 0}\varepsilon^{d/q} \left\| \sum_{k \in \mathbb{Z}^{d}} \mathcal{F}(f)(k \varepsilon) e^{2 \pi i k \cdot x}\right\|_{L^{p}(\mathbb{T}^{d})} = \| f\|_{L^{p}(\mathbb{R}^{d})}\end{aligned}

holds true for all f \in \mathcal{S}(\mathbb{R}^{d}) (here \frac{1}{p}+\frac{1}{q}=1). Then proceed in the same way as before, invoking Riemann sums.
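
One possible way to see this identity is sketched below. The cube Q=[-\frac{1}{2},\frac{1}{2}]^{d} as a fundamental domain for \mathbb{T}^{d}, the decay exponent N>d and the constant c_{d} are our own choices, not the text's. Replacing x by x/\varepsilon in the periodization formula gives

\begin{aligned} \sum_{k \in \mathbb{Z}^{d}} \mathcal{F}(f)(k \varepsilon) e^{2 \pi i k \cdot x} = \varepsilon^{-d} \sum_{k \in \mathbb{Z}^{d}} f\left(\frac{x+k}{\varepsilon}\right). \end{aligned}

The term k=0 already produces the limit, since d/q - d + d/p = 0:

\begin{aligned} \varepsilon^{d/q} \left\| \varepsilon^{-d} f\left(\frac{x}{\varepsilon}\right) \right\|_{L^{p}(Q)} = \left( \int_{Q/\varepsilon} |f(y)|^{p} dy \right)^{1/p} \to \| f\|_{L^{p}(\mathbb{R}^{d})}. \end{aligned}

For k \neq 0 and x \in Q we have |x+k| \geq c_{d}|k| with some c_{d}>0 depending only on d, so the Schwartz bound |f(y)| \leq C_{N}|y|^{-N} gives

\begin{aligned} \varepsilon^{d/q}\, \varepsilon^{-d} \sum_{k \neq 0} \left| f\left(\frac{x+k}{\varepsilon}\right) \right| \leq C_{N}\, c_{d}^{-N}\, \varepsilon^{N - d/p} \sum_{k \neq 0} |k|^{-N} \to 0 \quad \text{as} \quad \varepsilon \to 0, \end{aligned}

uniformly in x \in Q, provided N>d.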

The condition m \in C(\mathbb{R}^{d} \setminus\{0\}) is not accidental. In practice most multipliers m of interest are irregular at x=0, for example m(x) = \mathrm{sign}(x) on the real line, so the class C(\mathbb{R}^{d}) would exclude the most interesting examples of multipliers.
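
As an illustration, take d=1 and m(x) = -i\,\mathrm{sign}(x), the multiplier of the Hilbert transform from Theorem 1; this is our choice of example, and it takes Proposition 2 for granted. Since \mathrm{sign}(k\varepsilon)=\mathrm{sign}(k) for every \varepsilon>0, the dilation disappears:

\begin{aligned} \sum_{k \neq 0} m(k \varepsilon) c_{k} e^{2\pi i k x} = \sum_{k \neq 0} -i\, \mathrm{sign}(k) c_{k} e^{2\pi i k x} = \mathcal{H}\left( \sum_{k \neq 0} c_{k} e^{2\pi i k x} \right). \end{aligned}

By Theorem 1 the hypothesis of Proposition 2 then holds for 1<p<\infty with a constant independent of \varepsilon, and the conclusion is the L^{p}(\mathbb{R}) bound \| \mathcal{F}^{-1}(-i\,\mathrm{sign} \cdot \mathcal{F}f)\|_{L^{p}(\mathbb{R})} \leq C_{p} \|f\|_{L^{p}(\mathbb{R})} for Schwartz functions f.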