Let be the unit circle. Trigonometric polynomials on the unit circle can be written as finite sums . More generally we can consider trigonometric polynomials on a product of unit circles in which case

and only finitely many are nonzero. It is convenient to denote , , and think about as “independent, uniformly distributed random variables on ”. For a vector we can set so that the expression for our trigonometric polynomial takes a compact form

In our previous lecture notes we proved (for )

Theorem 1. The Riesz projection

and the Hilbert transform

both are bounded in for i.e., and . Moreover boundedness of (or ) implies the boundedness of (or ) due to the identity

where , here is the uniform probability measure on

### Bourgain-Burkholder

The Hilbert transform (and in general a “singular integral”) is closely related to another object in probability called the “martingale transform”. Let us briefly mention their connection (for this we need to increase the dimension of considerably).

It will be convenient to borrow some notations from probability. For example, for a trigonometric polynomial we denote

i.e., the average over all variables . Sometimes we want to fix some variables, say and to take the average with respect to the rest of the variables

We will shortly show

Proposition 1. Boundedness of the Hilbert transform implies

for all , all choices of signs , and all trigonometric polynomials , …, such that

in other words “every time we add one more coordinate to the average with respect to that coordinate must be zero”.

Remark 1: The sequence where

is called a martingale, and it has the “martingale property” for all . The sequence where

is called the “martingale transform” of . The inequality in the proposition is referred to as “boundedness of the martingale transform” and is usually written as where is the martingale transform of .
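The objects in Remark 1 can be illustrated numerically on a simpler model. The sketch below is not the trigonometric setting of the proposition: it assumes the discrete cube $\{-1,1\}^4$ with the uniform measure, and the differences `d_k = x_k * phi_k(x_1, ..., x_{k-1})` use arbitrary, purely illustrative choices of `phi_k`. Each `d_k` averages to zero in its last variable, so the differences are orthogonal and every sign change preserves the $L^2$ norm; for $p=4$ the ratio of norms is merely bounded.

```python
from itertools import product

N = 4
CUBE = list(product((-1, 1), repeat=N))  # all points of {-1, 1}^4


def differences(x):
    """Martingale differences d_k = x_k * phi_k(x_1..x_{k-1}) (phi_k are illustrative)."""
    d1 = 2.0 * x[0]
    d2 = x[1] * (1.0 + x[0])
    d3 = x[2] * (x[0] * x[1] - 0.5)
    d4 = x[3] * (x[0] + x[1] + x[2])
    return [d1, d2, d3, d4]


def transform(signs):
    """Return the function x -> sum_k signs[k] * d_k(x)."""
    return lambda x: sum(e * d for e, d in zip(signs, differences(x)))


def lp_norm(g, p):
    """L^p norm of g with respect to the uniform probability measure on the cube."""
    return (sum(abs(g(x)) ** p for x in CUBE) / len(CUBE)) ** (1.0 / p)


f = transform((1, 1, 1, 1))  # the martingale itself
all_signs = list(product((-1, 1), repeat=N))
l2_ratios = [lp_norm(transform(s), 2) / lp_norm(f, 2) for s in all_signs]
l4_ratios = [lp_norm(transform(s), 4) / lp_norm(f, 4) for s in all_signs]
print(min(l2_ratios), max(l2_ratios))  # both equal 1 up to rounding
print(max(l4_ratios))                  # finite, but in general not 1
```

The $L^2$ case is just orthogonality; the content of the proposition is that a uniform bound survives for every $1<p<\infty$.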

*Proof of Proposition 1.*

Let be arbitrary nonzero integers (will be determined later) such that for all . If are nonzero integers then of course

Set , and let (think of it as a dummy variable). Then

Now, if is increasing sufficiently fast then we can replace by due to the fact that and the vectors are from a fixed bounded set in Thus we can replace the full sum by its Hilbert transform with respect to variable, i.e.,

Another observation is that an addition of “dummy” variable does not change norms (we can add dummy integral if necessary). We obtain

where . We can continue and play the same game again now replacing by with sufficiently fast increasing and positive(!) integers so that , and hence we can replace by using the identity . This leads to the final upper bound .

Remark 2: In fact the reverse implication also holds true (due to Burkholder), i.e., boundedness of martingale transforms implies boundedness of the Hilbert transform (and actually boundedness of *almost any* Singular Integral Operator of Calderon–Zygmund type). This direction is more delicate and uses the fact that the Hilbert transform can be bounded in terms of *averages of dilates and translates* of so-called *dyadic shifts*, which eventually can be *easily* bounded in $L^{p}$ using only martingale transforms and their boundedness (also applied twice!).

The equivalence of these two inequalities can be proved in an arbitrary Banach space , i.e., in this case one considers finite sums with (instead of ), and norms are calculated as , where is the norm in the Banach space. As soon as one of the estimates holds in the Banach space (for some ), the space is called a *UMD space* (here UMD = Unconditional Martingale Difference). In other words harmonic analysis can be developed on UMD spaces with “almost the same results” as for .

### Calderon-Zygmund theory

Hilbert transform on the unit circle has Fourier multipliers . Due to transference theorems that we obtained in our previous lecture notes, we see that the boundedness of the Hilbert transform on the unit circle (implies) and follows from the boundedness of the operator

from where . One can show that for we have *Hint*: formally the right hand side is , so taking the Fourier transform, using the formal identity “”, and then taking of both sides one informally gets the identity. Now there are several issues, first of all is not in and was proved for . To fix the issue one modifies the kernel by replacing it with

, one considers

and after invoking dominated convergence theorems one passes to the limit , .

Exercise 1: For denote . Show that . In particular is not in , and is in .

Exercise 2: Show that for all (here should be understood in the Fourier–Plancherel sense, because it is not in ). In particular, conclude that for all . Thus extends to , i.e., to a continuous linear operator on . Conclude that for all , i.e., is a unitary operator on .

Next, let us ignore the constants , and let us recall that boundedness of the Hilbert transform on the unit circle amounts to show that

is bounded from to . On the real line the boundedness of the Hilbert transform amounts to show

is bounded from to . We have proved that both of them are bounded for .

We would be happy to have a *general theorem* which describes the kernels such that the operator is bounded from . There are several issues: in general is not defined for (recall the example ), and may also not make sense (we had to use p.v. in the case of the Hilbert transform). The general theorem that we prove is called the Calderon–Zygmund theorem. We will see that the family of kernels that it covers essentially looks like .

Theorem 2. Let be a linear operator bounded from to , and such that for any compactly supported we have

for all , where is defined for and satisfies the following conditions

and the same condition (ii) holds for replaced by . Then for all the inequality holds on a dense class of functions in .

In particular the theorem says that extends to a bounded operator from to .

Before we start proving the theorem let us talk a little about the assumptions of the theorem, which essentially *come from its proof*.

**1)** The validity of the integral formula only for is technical, and it is needed only to avoid the issues related to *principal value*. We say that for we have this *integral formula*, and let it be something for but we really don’t need it (we are actually putting *under a carpet* issues about principal value). It is a fact(!) that if there are two operators and with the same kernel defined as in the theorem then for some bounded , i.e., the kernels do not determine uniquely.

**2)** We assume(!) that is bounded from to , i.e., . This seems to be a pretty strong assumption (we are assuming something that we really would like to prove). In practice, figuring out when a given operator is bounded in is usually a simple task and almost follows from Plancherel’s theorem (as we did with the Hilbert transform, and similarly can be done for any *convolution* type operator as long as the Fourier transform of its kernel is in ). However, there are other operators for which it is really a difficult task to prove boundedness even in (in this case there are other arguments: a) the $TT^{*}$ argument; b) the Cotlar–Stein lemma, see for instance the proof of the Calderón–Vaillancourt theorem, i.e., Theorem 4 in Tao’s notes).

**3)** The assumption (i) is only needed to see why the integral converges as soon as and is compactly supported. Well, simply because

Of course, one can impose much weaker assumptions than (i), say for example for any compact as long as . For now we prefer not to be too demanding.

**4)** Condition (ii) seems a little bit artificial, but it is, in fact, the most important condition: it lies at the heart of the proof of the *weak type (1,1)* estimate

$$|\{x : |Tf(x)| > \lambda\}| \le \frac{C}{\lambda}\,\|f\|_{L^{1}},$$

a *substitute* of the desired which, unfortunately, may not hold in general. Let us remark that the constant in (ii) is independent of the choice of the cube . Here, means we enlarge the cube twice by keeping in the same center. For example if then ; and if then . The constant in the integral is not important. One can replace it by any number (but it should be one and the same(!) for all cubes ), one can also replace cubes by balls (or by some other family of sets which is obtained by translates, shifts, and dilates of a fixed *convex body*).

A *simpler* condition which implies (ii) is

Indeed, notice that whenever and for some universal constant (for example, works fine), where denotes the side length of the cube. Therefore

for some belonging to the segment . Thus we obtain

And the last integral we can estimate as

Another type of kernels which imply (ii) (with some fixed instead of ) but do not have a luxury to satisfy (ii)’ are the ones for which

for all cubes and some fixed . The proof is just a repetition of what we did in the case (ii)’ and is left to the reader as an exercise. There seems to be an asymmetry in the right hand side of (ii)” but notice that we can replace by ; nothing really changes except the constants. Another, more compact way of writing (ii)” is to use balls instead of cubes, for example the condition

does imply (ii) with for some instead of which is still enough for the conclusion of the theorem

**5)** “*and the same condition (ii) holds for $K(x,y)$ replaced by $K(y,x)$*”: let us explain the reason for such a requirement. Without this requirement we can only get the estimate

for . The standard way to explain what happens for is to invoke duality arguments. Let us present an argument which has an issue but at least clearly explains what is going on. To prove for by Holder’s inequality it suffices to show

where . Now taking supremum over all with it suffices to show that . But now the kernel is again of the type described in the theorem, and thereby the last inequality holds for . In particular, extends to a bounded operator for all .

Of course in this reasoning the issue is that (1) holds true only when and have disjoint compact supports. To fix the issue first one proves that the adjoint operator is bounded on and whenever . But now the kernel is again of the type described in the theorem which means that extends to a bounded operator on for . By an *abstract duality theorem* extends to a bounded operator if and only if extends to a bounded operator (here ). In particular, one obtains that extends to a bounded operator for all .

After discussing all aspects of the assumptions in the theorem let us fix the notion of a *Calderon–Zygmund Operator*.

Definition. The linear operators described in Theorem 2 are called Calderon–Zygmund Operators (CZO).

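Let us make one last comment. Sometimes one studies *weighted estimates*, namely, one wants to understand under what conditions on a nonnegative function $w$ (later called a weight) one has the inequality

$$\|Tf\|_{L^{p}(w)} \le C\,\|f\|_{L^{p}(w)}, \qquad \|f\|_{L^{p}(w)} = \left(\int |f|^{p}\, w\right)^{1/p},$$

for all $f$, where $T$ is a CZO. The best result in this direction is the inequality

$$\|Tf\|_{L^{p}(w)} \le C\,[w]_{A_p}^{\max\left(1,\frac{1}{p-1}\right)}\,\|f\|_{L^{p}(w)}, \qquad (2)$$

where $[w]_{A_p}$ is the $A_p$-*characteristic* of $w$, namely, it is defined as

$$[w]_{A_p} = \sup_{Q}\left(\frac{1}{|Q|}\int_Q w\right)\left(\frac{1}{|Q|}\int_Q w^{-\frac{1}{p-1}}\right)^{p-1},$$

and the weights for which $[w]_{A_p}<\infty$ are called Muckenhoupt weights; this class is denoted by $A_p$. Here the supremum is taken over all cubes $Q$, and $T$ is almost any Calderon–Zygmund operator whose kernel additionally satisfies a slightly stronger assumption, namely (ii)” instead of (ii). The estimate is sharp in the sense that one cannot replace the power $\max\left(1,\frac{1}{p-1}\right)$ by a smaller one; otherwise for the Hilbert transform there exists a weight which would violate the inequality. The estimate (2) may seem a little bit involved; however, there are certain *extrapolation arguments* which say that “instead of proving (2) for all $p\in(1,\infty)$ and all $w\in A_p$ it is enough to prove the estimate for all $w \in A_2$ and $p=2$, i.e., $\|Tf\|_{L^{2}(w)} \le C\,[w]_{A_2}\,\|f\|_{L^{2}(w)}$”.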

Another remark is that one can first try to prove such weighted inequality for the “martingale transforms” and then (due to equivalence of Bourgain–Burkholder) most likely the same sharp inequality should hold for CZO. Perhaps this was the starting point and “Bellman function technique” turned out to be successful, however, later the arguments were simplified considerably.

It is a good time to start proving Theorem 2.

*Proof of Theorem 2.*

There will be many constants appearing in the proof. We decided to denote all of them by . Important thing is that they will be independent of functions , cubes , and a certain parameter that will be constantly present in the proof. The functions will be always compactly supported bounded unless we say otherwise.

The proof consists of several lemmas.

Lemma 1. Let be a CZO. Then the following weak type estimates hold true

Before we prove the lemma let us see why this implies the inequality for (and then for by duality as we discussed).

Indeed, we start from a useful identity which relates to the function , namely,

To prove the identity notice that

Next, instead of we will be writing shortly . Any we can split

By linearity , and hence .

Therefore

Thus

To estimate we use (4). Indeed, we get

To estimate we use (3).

Thus

Remark 3: The argument is called the *Marcinkiewicz interpolation theorem*. The estimate follows from (it is called the *weak type (p,p)* inequality). Indeed, it follows from a simple chain of inequalities

(this is also called Chebyshev’s inequality).
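The identity relating the norm to the distribution function, which drives the interpolation argument above, is in its standard form $\|f\|_p^p = p\int_0^\infty \lambda^{p-1}\,|\{|f|>\lambda\}|\,d\lambda$. A minimal numerical sketch follows, assuming a step function on $[0,1)$ that takes the listed values on equal pieces; the function names and the midpoint rule are illustrative choices, not part of the notes.

```python
def lp_norm_to_p(values, p):
    """Integral of |f|^p over [0,1) for a step function with equal pieces."""
    return sum(abs(v) ** p for v in values) / len(values)


def layer_cake(values, p, steps=20000):
    """Midpoint-rule approximation of p * int_0^inf lam^(p-1) |{|f| > lam}| d(lam)."""
    n = len(values)
    top = max(abs(v) for v in values)  # the distribution function vanishes beyond top
    h = top / steps
    total = 0.0
    for k in range(steps):
        lam = (k + 0.5) * h
        measure = sum(1 for v in values if abs(v) > lam) / n  # |{|f| > lam}|
        total += p * lam ** (p - 1) * measure * h
    return total


values = [3.0, -1.0, 0.5, 2.0]
p = 3
print(lp_norm_to_p(values, p), layer_cake(values, p))


def chebyshev_holds(values, p, lam):
    """Check |{|f| > lam}| <= ||f||_p^p / lam^p for one level lam > 0."""
    measure = sum(1 for v in values if abs(v) > lam) / len(values)
    return measure <= lp_norm_to_p(values, p) / lam ** p
```

The two printed numbers agree up to the discretisation error of the midpoint rule, and `chebyshev_holds` checks the chain of inequalities from Remark 3 at a single level.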

All that remains is to prove Lemma 1. Inequality (4) is trivial and it follows from (we just explained it in the remark above). It is the estimate (3), i.e., weak type inequality for CZO , namely,

that is nontrivial and requires the proof.

*Proof of Lemma 1.*

After learning about the crazy idea of Marcinkiewicz, i.e., for each to decompose , where and depend(!) on , and to use the trivial estimate

it is tempting to try the same approach. All we know about is

By property (a), we can estimate one of the terms in (5), say , as

So if we succeed to choose the decomposition so that

then this would give us the right estimate (it is not really difficult to find such a decomposition , we can just choose , then even pointwise ).

For the second term it seems that the best we can do is to use again Chebyshev inequality but now with norm, namely,

At this moment it is not clear at all how to use part (b). After looking at (b) again we can make a nice observation: if is supported in a cube with zero average then for outside of we have

and then immediately from (a) we obtain

Thus, if is supported in , , and then this *almost* gives the right bound for the right hand side in (6) except we still have to deal with the remaining part .

Here is another trick: if is small enough, this gives a hope that is also small enough, but the issue is can be arbitrarily large on , i.e., we do not have “any good control” on when . How to ignore ? A simple remedy is to adapt the original estimate of as

Perfect! So if then the proof is complete as long as we find the desired decomposition with , , , and .

It looks like too many requirements! The trivial Marcinkiewicz choice determines automatically, and it is this “bad” already failing any of its desired properties. It is getting hopeless… . If we could tweak a little bit to get more control on … but how?

“There is no failure except in no longer trying”.

Elbert Hubbard

Here comes another trick: if it happens that is supported on a disjoint collection of cubes (instead of only one cube), i.e., , with essentially the same properties

then by repeating the same steps as before, and using the identity

we obtain

Thus for all it remains to find a decomposition with , , and satisfies (A), (B), and (C). This is definitely a smart decomposition (called the Calderon–Zygmund decomposition) compared to the trivial Marcinkiewicz decomposition.

Lemma 2 (Calderon–Zygmund decomposition). For any and any there exists a decomposition for some disjoint collection of cubes such that each is supported in and

and most importantly .

To complete the proof of weak type (1,1) inequality for , the desired decomposition follows by taking , and . Clearly

The proof of Lemma 2 mimics *stopping time arguments*, which are quite often used in probability. It is related to martingales and closely tied to what is called “dyadic calculus”.

*Proof of Lemma 2.*

Let us introduce a “dyadic grid”. Take , and divide it in two halves and , and continue this process. We will obtain what is called “dyadic subdivision” of the interval which consists of

Here , and means we shift the interval by , for example, . Clearly after such dyadic subdivision any interval from a given generation looks like for some . Let us denote the collection of such dyadic intervals (from all generations) by . The following property is crucial: for any two dyadic intervals , , one and only one of the following holds

Similarly we want to construct the dyadic subdivision for . It will look like this

and after a little bit of thought we can see that this family also satisfies the property (7), let us denote it by .

Pick any . Let be the collection of maximal dyadic cubes in such that

Here maximal means that if satisfies (8) then there is no other dyadic cube (i.e., “parent”) containing and satisfying (8). Since it follows that as soon as the dyadic cubes are sufficiently large they fail to satisfy (8).

Clearly the collection is disjoint and

Define

Then of course .

What was the point of being **maximal**? We can bound for any . Indeed, since is maximal with property (8), therefore its “parent” (i.e., the smallest one in containing ) must fail (8). Since we obtain

Thus we obtain for any ; ; if , and otherwise. Indeed, if then for any dyadic cube the inequality (8) fails, we can take and use the Lebesgue differentiation theorem to conclude that for almost every .

.
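The stopping-time construction in the proof can be sketched in a discrete one-dimensional model. The assumptions here are mine: the function is a step function on $[0,1)$ given by $2^m$ integer samples, the top interval plays the role of a "sufficiently large" cube, and `lam` is taken larger than the mean of $|f|$ so that the top interval fails the selection test. The code selects the maximal dyadic intervals on which the average of $|f|$ exceeds `lam`, sets the good part equal to the average of $f$ on each selected interval, and puts the rest into the bad part.

```python
from fractions import Fraction


def cz_decompose(f, lam):
    """Dyadic Calderon-Zygmund decomposition of a step function on [0,1).

    f   : list of 2**m integers (values on equal cells)
    lam : threshold, assumed larger than the mean of |f|
    Returns (g, b, cubes) with f = g + b and cubes as (start, end) index ranges.
    """
    cubes = []

    def visit(lo, hi):
        avg_abs = Fraction(sum(abs(v) for v in f[lo:hi]), hi - lo)
        if avg_abs > lam:
            cubes.append((lo, hi))  # maximal: every ancestor failed the test
        elif hi - lo > 1:
            mid = (lo + hi) // 2
            visit(lo, mid)
            visit(mid, hi)

    visit(0, len(f))
    g = [Fraction(v) for v in f]
    for lo, hi in cubes:
        mean = Fraction(sum(f[lo:hi]), hi - lo)
        for i in range(lo, hi):
            g[i] = mean  # good part: average of f on each selected interval
    b = [Fraction(v) - gv for v, gv in zip(f, g)]  # bad part, supported on the cubes
    return g, b, cubes


f = [0, 1, 0, 0, 9, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 3]
lam = 2
g, b, cubes = cz_decompose(f, lam)
total_measure = Fraction(sum(hi - lo for lo, hi in cubes), len(f))
l1_norm = Fraction(sum(abs(v) for v in f), len(f))
print(cubes, total_measure, l1_norm / lam)
```

In one dimension the parent of a selected interval is twice as long, which is why the good part is bounded by `2 * lam`; in $\mathbb{R}^n$ the same argument gives the factor $2^n$.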

## Corollaries

Theorem 3 (Mikhlin–Hormander multipliers). Let be a bounded function with satisfying for all with . Then the operator

defined for Schwartz functions extends to a bounded operator for all .

Hint: Boundedness in is trivial. Next, ideally we want to think that so that to use Calderon–Zygmund theorem for the kernel to verify (i) and (ii). The issue is that may not make any sense (think about ). Therefore, one invokes Littlewood–Paley decomposition: pick a smooth nonnegative such that for , and . Then where . Then where . If are bounded and have disjoint compact supports then one can show that

Thus it suffices to obtain uniform bounds (uniform in ) where , and . The last inequality can be achieved by checking that satisfies assumptions in Calderon–Zygmund theorem (with constants uniform in ). It is easier to check (just doing very rough integration by parts) the Calderon–Zygmund assumptions under the extra condition that for all (not just ). The general case requires a little bit more careful estimates.

Theorem 4 (Riesz Transform). The operator which has multipliers is called the Riesz transform, and it is denoted by . Then

holds for all and all .

Hint: Notice that satisfies the assumptions in Mikhlin–Hormander theorem.

In particular

One can verify the identity . In particular, double application of the previous theorem gives

Theorem 5 (second derivatives and the Laplacian). For any , and any we have

Theorem 6 (Littlewood–Paley: martingale version).

For any sequence of functions on with for all , we have

for all , where are some universal constants.

Hint: It follows from Proposition 1 that

for all choices of signs . Now, take the power of the inequality, average over all signs , and use Khinchin’s inequality

true for all complex numbers , which gives the desired result (the proof of Khinchin’s inequality is given below).

Let be the Littlewood–Paley decomposition as before i.e., , and for let be “projections”. Then

Theorem 7 (Littlewood–Paley: smooth Euclidean version). For any we have

Hint: the operator has the symbol which satisfies the assumptions in Hormander–Mikhlin’s theorem (with constants independent of and ). Thus

Now, take the power of both sides, average over all signs, use Khinchin’s inequality, and finally take the limit .

For the reverse inequality we proceed by duality. It follows from Holder that where and . Choose where is arbitrary collection of nice functions. Then by Cauchy we get . Optimizing over all we obtain

.

Now it seems like we just need to choose to get what we want but the problem is that . To fix the issue we first obtain (LP) with corresponding to multipliers which are equal to 1 on the support of , say . Then it is an easy exercise to show that which leads to the desired lower bound.

Let be independent identically distributed symmetric Bernoulli random variables taking values or . Let be the average taken with respect to these random variables, i.e., for any let

Theorem 8 (Hypercontractivity on ). Let , and . For any complex numbers and any we have if and only if . Here denotes the cardinality of the set .

Remark: we will see from the proof that inequality holds also when are from an arbitrary normed space instead of . In that case the absolute values should be replaced by

Before we prove the theorem let us mention its corollary.

Corollary 9 (Khinchin’s inequality). For any we have

where means that it is comparable up to some universal constants depending only on and independent of and .

Hint: apply hypercontractivity to those when provided that . Then we obtain (choose )

holds true for all . Now play with the Holder inequality to get Khinchin’s inequality for all powers. For example, the trick is that if you know the estimate , then using Holder you can get . Indeed, by Holder we have , and hence , where solves , so . Show that similarly one can go to as well.
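Khinchin's inequality can be checked exactly for small sums by enumerating all sign patterns. The sketch below assumes real coefficients and the exponent $p=4$, where the fourth moment has the closed form $\mathbb{E}\left|\sum_j a_j\varepsilon_j\right|^4 = 3\left(\sum_j a_j^2\right)^2 - 2\sum_j a_j^4$ (a standard expansion using independence and $\varepsilon_j^2=1$); the function name is illustrative.

```python
from itertools import product


def rademacher_moment(a, p):
    """E |sum_j a_j eps_j|^p over independent symmetric signs eps_j, by enumeration."""
    n = len(a)
    total = sum(abs(sum(e * x for e, x in zip(eps, a))) ** p
                for eps in product((-1, 1), repeat=n))
    return total / 2 ** n


a = [1.0, 2.0, -3.0, 0.5]
second = rademacher_moment(a, 2)   # equals sum of a_j^2
fourth = rademacher_moment(a, 4)
closed_form = 3 * sum(x * x for x in a) ** 2 - 2 * sum(x ** 4 for x in a)
print(second, fourth, closed_form)

# Two-sided comparison of the L^4 and L^2 norms for this choice of coefficients.
ratio = fourth ** 0.25 / second ** 0.5
print(ratio)  # lies between 1 and 3 ** 0.25
```

The closed form shows directly that the $L^4$ norm is at most $3^{1/4}$ times the $L^2$ norm, uniformly in the number of terms, which is the $p=4$ instance of the corollary.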

*The proof of hypercontractivity.*

Let us first show the case , i.e., let with (the proof proceeds absolutely verbatim if is replaced by an arbitrary normed space , then in the proof we just replace the absolute values by the norms , the only property we need is triangle inequality! ) then

Notice that where is independent copy of (i.e., it is another symmetric Bernoulli random variable), here takes average only with respect to . Therefore by the triangle inequality (and the fact that , )

If we let and then

and

Denoting , and , we see that to verify it suffices to show that

Perfect! Thus we reduced the question to real numbers! So how to prove the last inequality? There are several ways to proceed from here a) one uses Taylor’s formula for for ; b) the second one goes through what is called “log-Sobolev inequality”. We prefer the second one because it is a general approach and works for other similar questions.

Using homogeneity we can assume (otherwise we can divide both sides by ). The map is even and convex, thus increasing on , therefore it suffices to consider the “worst case scenario” when . Replacing by it suffices to show the inequality

Clearly this is the same as prove that the map

is decreasing for . Due to symmetry we can assume . Taking the logarithm and differentiating we obtain (let us denote )

Let . Then the last inequality can be simply rewritten as follows

Great! But still seems to be a nontrivial inequality (especially taking into account that we want to show it for all ). Maybe(!) it suffices only to show the inequality for ? For example, suppose we can prove the following inequality (replace by , by and ),

for all . Then it turns out that this simple inequality implies the previous inequality. Indeed, suppose we can prove the last inequality (), then let us see why does it imply the general inequality. Choose so that . Of course this system has a solution with . Then the last inequality implies

If we can show that then this will prove the general inequality. Since it follows that

So it suffices to prove only the inequality

Of course (replacing by ) the inequality reduces to the case , i.e.,

Which simplifies to

Now if we let

Then it follows from direct calculation that , and

on . Now this implies that is convex and hence increasing, this completes the proof of

.
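The reduction above ends in a two-point inequality for real numbers, which can be probed numerically. The sketch assumes the standard form of that inequality, $\left(\frac{|a+rb|^q+|a-rb|^q}{2}\right)^{1/q}\le\left(\frac{|a+b|^p+|a-b|^p}{2}\right)^{1/p}$ with $1<p\le q$ and the critical value $r=\sqrt{(p-1)/(q-1)}$; the grid, the normalisation $a=1$ and the function names are illustrative choices.

```python
from math import sqrt


def two_point_gap(a, b, p, q, r):
    """Right hand side minus left hand side of the two-point inequality."""
    lhs = ((abs(a + r * b) ** q + abs(a - r * b) ** q) / 2) ** (1 / q)
    rhs = ((abs(a + b) ** p + abs(a - b) ** p) / 2) ** (1 / p)
    return rhs - lhs


def worst_gap(p, q, r, grid=400):
    """Smallest gap over b in [-4, 4] with a normalised to 1 (homogeneity)."""
    return min(two_point_gap(1.0, -4 + 8 * k / grid, p, q, r)
               for k in range(grid + 1))


pairs = [(1.5, 2.0), (2.0, 4.0), (1.2, 3.0), (3.0, 3.0)]
gaps = [worst_gap(p, q, sqrt((p - 1) / (q - 1))) for p, q in pairs]
print(gaps)  # nonnegative up to rounding

# Above the critical value the inequality fails already for small b:
print(two_point_gap(1.0, 0.01, 2.0, 4.0, 0.7))  # negative, since 0.7**2 > 1/3
```

The failure for small `b` reflects the second-order expansion: the left side behaves like $1+(q-1)r^2b^2/2$ and the right side like $1+(p-1)b^2/2$.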

To prove the general case let

and notice that is a map of the type where do not depend on . The goal is to show that . Indeed, we have

where in the last inequality we used Minkowski’s inequality. Thus we got rid of one . Iterating this process we get rid of all ’s and finally we end up with . This finishes the proof of hypercontractivity.

.
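The full statement can also be tested by brute force on a small cube. The sketch below assumes the usual formulation, in which the coefficient of the Walsh function $w_S(\varepsilon)=\prod_{j\in S}\varepsilon_j$ is damped by $r^{|S|}$, takes $n=3$, and uses arbitrary illustrative complex coefficients; `both_sides` is a hypothetical helper name.

```python
from itertools import combinations, product
from math import prod, sqrt


def lp_norm(values, p):
    """L^p norm with respect to the uniform probability measure."""
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1 / p)


def both_sides(coeffs, n, p, q, r):
    """Return (|| sum_S r^|S| a_S w_S ||_q, || sum_S a_S w_S ||_p) on {-1,1}^n."""
    cube = list(product((-1, 1), repeat=n))

    def evaluate(eps, damp):
        # prod over an empty set S equals 1, which handles the constant term
        return sum(a * damp ** len(S) * prod(eps[j] for j in S)
                   for S, a in coeffs.items())

    lhs = lp_norm([evaluate(eps, r) for eps in cube], q)
    rhs = lp_norm([evaluate(eps, 1.0) for eps in cube], p)
    return lhs, rhs


n = 3
subsets = [S for k in range(n + 1) for S in combinations(range(n), k)]
coeffs = {S: complex(1 + len(S), 0.5 * sum(S) - 1) for S in subsets}  # illustrative
p, q = 1.5, 3.0
lhs, rhs = both_sides(coeffs, n, p, q, sqrt((p - 1) / (q - 1)))
print(lhs, rhs)  # lhs <= rhs at the critical r

lhs_no_damping, rhs_no_damping = both_sides(coeffs, n, p, q, 1.0)
print(lhs_no_damping, rhs_no_damping)  # without damping the L^q norm dominates
```

With `r = 1` the comparison reverses, as it must by Jensen's inequality, so the damping factor is what makes the estimate possible.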

## Concluding remarks

- For certain class of martingales (also called “conditionally symmetric martingales”), say

where , and are arbitrary functions the “extended” Littlewood–Paley inequality holds

for all , here norms are calculated with respect to the average over .
- The martingale Littlewood–Paley inequalities are closely related (in fact sometimes one implies the other or both imply each other) to square function estimates for the “stochastic processes”. Without giving definitions the estimate holds (the right hand side for all , and the left hand side for all provided that ), here is the standard Brownian motion, and is any stopping time.
- There are several versions of Littlewood–Paley inequalities in Euclidean space. In analogy with “conditionally symmetric martingales” the following extension of the Littlewood–Paley inequality holds true on the unit circle: let be the dyadic decomposition of , say , and , then for any finite sum we have

holds true where . Moreover, when the inequality holds for any collection of disjoint intervals ; the case is due to Rubio de Francia, the case is due to Bourgain, and the case is due to Kislyakov–Parilov (their result is written in spaces but it is easy to get it for the case of (not analytic) finite sums as well).
- The hypercontractivity

was recently extended to complex ’s, namely if we let be a complex number then the inequality holds if and only if

Here it is important that are complex numbers (and not from an arbitrary Banach space).