Wiener’s lemma

When I was an undergraduate student back in 2006–2011, I once saw a trick about polynomials by user “fedja” on this math forum (at that time I did not know who “fedja” was; I learned only later that fedja was Fedor Nazarov). The trick was impressive; however, I did not understand how one could come up with such a nice idea, and therefore I quickly forgot about it. The question was to show that
\begin{aligned} \sup_{z \in \mathbb{D}} \left| \sum_{k=0}^{n} a_{k} z^{k} \right|  \geq  \sum_{k=0}^{n} \frac{|a_{k}|}{3^{k}}  \quad (*)\end{aligned}
holds for all complex numbers a_{k} \in \mathbb{C} and all n \geq 1, where \mathbb{D} \subset  \mathbb{C} is the unit disk in the complex plane.

The proof starts with the following key lemma

Lemma 1 (Wiener’s lemma): for all polynomials p(z) = \sum_{k=0}^{n} a_{k} z^{k} with \sup_{z \in \mathbb{D}} |p(z)|\leq 1 we have |a_{\ell}| \leq 1-|a_{0}|^{2} for all 1 \leq \ell \leq n.

Let us show that the lemma immediately implies the inequality (*) asked in the beginning. Indeed, without loss of generality assume that \sup_{z \in \mathbb{D}} \left| \sum_{k=0}^{n} a_{k} z^{k} \right| \leq 1. Then \sum_{k=0}^{n} \frac{|a_{k}|}{3^{k}} \leq |a_{0}| + (1-|a_{0}|^{2})\sum_{k=1}^{n} (1/3)^{k} \leq |a_{0}| + \frac{1}{2}(1-|a_{0}|^{2}) \leq 1. The last inequality holds because 1-|a_{0}|-\frac{1}{2}(1-|a_{0}|^{2}) = \frac{1}{2}(1-|a_{0}|)^{2} \geq 0 (note that |a_{0}| = |p(0)| \leq 1, so the lemma applies).
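Before proving the lemma, here is a quick numerical sanity check of (*) (a sketch, not part of the proof): sample random complex coefficients, approximate the sup over the disk by the max on a fine grid of the unit circle (legitimate by the maximum principle), and compare with \sum_{k} |a_{k}|/3^{k}.

```python
import numpy as np

# Sanity check of (*): sup_{|z|<=1} |p(z)| >= sum_k |a_k| / 3^k.
# The sup over the closed disk equals the max on |z| = 1 (maximum principle),
# which we approximate on a fine grid; for random coefficients the inequality
# has plenty of slack, so the tiny grid error does not matter.
rng = np.random.default_rng(0)
z = np.exp(2j * np.pi * np.arange(4096) / 4096)  # grid on the unit circle

for _ in range(200):
    n = int(rng.integers(1, 10))
    a = rng.normal(size=n + 1) + 1j * rng.normal(size=n + 1)
    sup = np.abs(np.polyval(a[::-1], z)).max()   # np.polyval wants a_n first
    bohr_sum = sum(abs(a[k]) / 3 ** k for k in range(n + 1))
    assert sup + 1e-9 >= bohr_sum
```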

The proof of the lemma: let p(z) = \sum_{k=0}^{n} a_{k} z^{k} be an arbitrary polynomial such that |p(z)| \leq 1 for all |z| \leq 1. Fix 1 \leq m \leq n and let \omega_{m} be a primitive m’th root of unity. All we want from this number is that \sum_{j=1}^{m} \omega_{m}^{jk}=0 whenever k is not divisible by m. Let us consider the average g(z):=\frac{1}{m} \sum_{j=1}^{m}p(\omega^{j}_{m}z) = a_{0}+ a_{m}z^{m} +O(z^{m+1}). We have |g(z)|\leq 1 on \mathbb{D}. If |a_{0}|=1 then p is constant by the maximum principle and there is nothing to prove, so we may assume a_{0} \in \mathbb{D}; then the map \begin{aligned} z  \mapsto \frac{g(z) - a_{0}}{1-\bar{a}_{0} g(z)}= \frac{a_{m}}{1-|a_{0}|^{2}} z^{m}+ O(z^{m+1}) \end{aligned} is also bounded by 1 in absolute value on the unit disk. Then it follows from Cauchy’s formula that \frac{|a_{m}|}{1-|a_{0}|^{2}}  \leq 1. \square
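The root-of-unity averaging is the only computation in the proof, and it is mechanical to verify: multiplying the k’th coefficient by \frac{1}{m}\sum_{j=1}^{m}\omega_{m}^{jk} kills every coefficient whose index is not a multiple of m. A small illustrative sketch:

```python
import numpy as np

# g(z) = (1/m) * sum_{j=1}^{m} p(w^j z) keeps only the coefficients a_k with
# m | k, since (1/m) sum_{j=1}^{m} w^{jk} equals 1 if m | k and 0 otherwise.
m, n = 3, 11
rng = np.random.default_rng(1)
a = rng.normal(size=n + 1) + 1j * rng.normal(size=n + 1)
w = np.exp(2j * np.pi / m)  # primitive m-th root of unity

g = np.array([a[k] * np.mean([w ** (j * k) for j in range(1, m + 1)])
              for k in range(n + 1)])

for k in range(n + 1):
    expected = a[k] if k % m == 0 else 0.0
    assert abs(g[k] - expected) < 1e-9
```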

Remark 1: The number \frac{1}{3} is the best possible in (*), and it is called the Bohr radius. Inequality (*) is called the Bohr phenomenon. Initially Harald Bohr proved (*) with 1/6 instead of 1/3. This was quickly improved to 1/3 independently by M. Riesz, I. Schur, and F. Wiener.

Remark 2: Inequality (*) is a byproduct of studies of Dirichlet series. The question became interesting in its own right, it found some other applications (Sidon constant, unconditional basis constant), and it can be asked for other orthogonal systems (Boas–Khavinson), for example on \mathbb{D}^{k} (a product of disks) for polynomials p(z) = \sum_{|\alpha|\leq n} a_{\alpha} z^{\alpha}, where \alpha=(\alpha_{1}, \ldots, \alpha_{k}) is a multi-index with nonnegative integers \alpha_{1}, \ldots, \alpha_{k} and z^{\alpha} = z_{1}^{\alpha_{1}}\cdots z_{k}^{\alpha_{k}}. For each k \geq 2 it is not known what is the best possible (largest) r=r(k) such that \left| \sum_{|\alpha|\leq n} a_{\alpha} z^{\alpha} \right|\geq \sum_{|\alpha|\leq n}  r^{|\alpha|} |a_{\alpha}|, where |\alpha| = \alpha_{1}+\ldots+\alpha_{k}, holds for all polynomials on \mathbb{D}^{k}. It was only obtained recently (2014) by Bayart–Pellegrino–Seoane-Sepúlveda that \lim_{k \to \infty} \frac{r(k)}{\sqrt{\frac{\ln k}{k}}} =1. Using probabilistic techniques it is easy to obtain the upper bound \limsup_{k \to \infty} \frac{r(k)}{\sqrt{\frac{\ln k}{k}}} \leq 1 just by studying the homogeneous polynomial \sum_{|\alpha| = n}  \pm a_{\alpha} z^{\alpha} and choosing the signs \pm to minimize the L^{\infty} norm on the left-hand side (the L^{\infty} norm can be made no more than C \sqrt{k \ln n} \sqrt{\sum_{|\alpha|=n} |a_{\alpha}|^{2}} with some universal constant C).


A little bit of Fourier Analysis

Dean’s problem and the Sherrington–Kirkpatrick model

Let a_{ij} be real numbers where 1\leq i<j\leq n. And let s_{1}, \ldots, s_{n} be spins, i.e., these are real numbers which take only two values +1 or -1.

Question (the Dean’s problem): Given real numbers a_{ij}, choose the signs s_{j} so as to maximize the sum \begin{aligned} \sum_{1\leq i <j\leq n} a_{ij} s_{i} s_{j} \end{aligned}

Notice that if a_{ij}=1 for all i and j then the maximum is attained if all spins are equal, s_{1}=s_{2}=\ldots=s_{n}, i.e., they have the same sign. In the Ising model this is called the ferromagnetic case. Clearly in this case the double sum is of order \binom{n}{2}\asymp n^{2}. Here the symbol \asymp means that the two quantities are comparable up to some multiplicative constant as n \to \infty. We are going to use this notation very often.

However, if a_{ij}=-1 then \sum_{1\leq i<j\leq n} s_{i} s_{j} = -\frac{1}{2}\left( \sum_{j=1}^{n} s_{j} \right)^{2} +\frac{n}{2}. Then the maximum is achieved if half of the spins take value 1 and the other half -1. This is called antiferromagnetic case, and clearly in this case the maximum is of order \asymp n.
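The algebraic identity behind the antiferromagnetic case, \sum_{1\leq i<j\leq n} s_{i}s_{j} = \frac{1}{2}\left(\left(\sum_{j} s_{j}\right)^{2}-n\right), admits a one-line exhaustive check (a sketch; the choice n = 6 is arbitrary):

```python
from itertools import product

# Check sum_{i<j} s_i s_j = ((s_1 + ... + s_n)^2 - n) / 2 on all of {-1,1}^n:
# expanding (sum s_j)^2 gives n diagonal terms s_j^2 = 1 plus twice the off-
# diagonal sum.
n = 6
for s in product([-1, 1], repeat=n):
    lhs = sum(s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    rhs = (sum(s) ** 2 - n) / 2
    assert lhs == rhs
```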

The problem of maximizing the double sum is also known as the Dean’s problem: imagine a faculty of n people, and let a_{ij} measure the interaction between the i‘th and j‘th person. For example, a_{ij}>0 means that the i‘th and j‘th persons like each other; if a_{ij}<0 then they dislike each other. By maximizing the sum \sum_{1\leq i <j \leq n} a_{ij}s_{i} s_{j} the Dean splits the faculty into two groups (those who are assigned s_{i}=1, and those who are assigned s_{i}=-1) so as to maximize the positive interaction in the department.

When a_{ij} are arbitrary real numbers the problem becomes nontrivial. Let us see what the typical order of the maximum is when the a_{ij} are i.i.d. symmetric \pm 1 random variables. So, in the spirit of mean field theory (as also usually happens in mathematics), we look at the sum
\begin{aligned} \mathbb{E} \max_{s \in \{-1,1\}^{n}} \sum_{1\leq i<j\leq n} a_{ij} s_{i} s_{j} \end{aligned}

where s=(s_{1}, \ldots, s_{n}) \in \{-1,1\}^{n}, and the expectation is taken with respect to random variables a_{ij}. This randomized problem, also known as Sherrington–Kirkpatrick model in spin glasses, nowadays is well understood:
\begin{aligned} \lim_{n \to \infty} \frac{1}{n^{3/2}} \mathbb{E} \max_{s \in \{-1,1\}^{n}} \sum_{1\leq i<j\leq n} a_{ij} s_{i} s_{j}= 0.76... \end{aligned}
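At any n small enough for brute force the constant 0.76… has not yet converged, but the n^{3/2} normalization can still be watched in action. A rough Monte Carlo sketch (all sizes are arbitrary choices, and no claim about the limiting constant is made):

```python
import numpy as np

# Brute-force estimate of E max_s sum_{i<j} a_ij s_i s_j / n^{3/2} for small n.
# This is a finite-size sketch: the limit 0.76... is attained only as n grows.
rng = np.random.default_rng(3)
n = 12
# all 2^n spin configurations as rows of +-1
S = np.array([[1 - 2 * ((c >> i) & 1) for i in range(n)] for c in range(2 ** n)])

vals = []
for _ in range(30):
    A = rng.choice([-1.0, 1.0], size=(n, n))
    A = np.triu(A, 1)          # keep only i < j
    A = A + A.T                # symmetrize: s^T A s / 2 = sum_{i<j} a_ij s_i s_j
    energies = ((S @ A) * S).sum(axis=1) / 2
    vals.append(energies.max() / n ** 1.5)

mean_ratio = float(np.mean(vals))  # of order 1, above the asymptotic regime
```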

In fact a little bit more is known due to the generality of the proof of this fact. For S \subset \{1,\ldots, n\}, and x = (x_{1}, \ldots, x_{n})\in \{-1,1\}^{n}, let x^{S} = \prod_{j \in S} x_{j}, and let H_{n}(x) = \frac{1}{\sqrt{n}}\sum_{S\subset \{1,\ldots, n\}, \, |S|=2} a_{S} x^{S} (in the physics literature H_{n}(x) is called the Hamiltonian, and it measures the negative energy of a system). Next, take any \beta >0 (called the inverse temperature, \beta = \frac{1}{T}), and consider the sum
\begin{aligned} \frac{1}{\beta n} \mathbb{E}_{a} \ln \mathbb{E}_{x} \exp(\beta H_{n}(x)) \end{aligned}, here \mathbb{E}_{x} takes expectation with respect to uniform counting measure on \{-1,1\}^{n}, and \mathbb{E}_{a} takes expectation with respect to random variables a_{S}. Clearly
\begin{aligned} \lim_{\beta \to \infty}  \frac{1}{\beta n} \mathbb{E}_{a} \ln \mathbb{E}_{x} \exp(\beta H_{n}(x)) = \frac{1}{n}\mathbb{E}_{a} \max_{x} H_{n}(x) \end{aligned},

therefore it would be sufficient to study the map \beta \mapsto \frac{1}{\beta n} \mathbb{E}_{a} \ln \mathbb{E}_{x} \exp(\beta H_{n}(x)) (also called the free energy; I should mention that \beta \mapsto 2^{n}\mathbb{E}_{x} \exp(\beta H_{n}(x)) is called the partition function).
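The \beta \to \infty limit above is just the elementary sandwich \max H - \frac{\ln |X|}{\beta} \leq \frac{1}{\beta}\ln \mathbb{E}_{x} e^{\beta H} \leq \max H over a finite state space X. A toy sketch (state space of size 16, values of H drawn at random):

```python
import numpy as np

# (1/beta) ln E_x exp(beta*H) is squeezed between max(H) - ln(N)/beta and
# max(H), hence converges to max(H) as beta -> infinity.  N = 16 toy states.
rng = np.random.default_rng(4)
H = rng.normal(size=16)

def free_energy(beta):
    # log-mean-exp computed stably by shifting by max(H)
    return (beta * H.max() + np.log(np.mean(np.exp(beta * (H - H.max()))))) / beta

for beta in [1.0, 10.0, 100.0, 1000.0]:
    f = free_energy(beta)
    assert H.max() - np.log(16) / beta - 1e-12 <= f <= H.max() + 1e-12

assert abs(free_energy(1000.0) - H.max()) < 0.01
```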

Theorem (Parisi formula; Talagrand 2006): For any \beta >0 we have

\begin{aligned}\lim_{n \to \infty}\frac{1}{\beta n} \mathbb{E}_{a} \ln \mathbb{E}_{x} \exp(\beta H_{n}(x)) =\inf_{\mu \in \mathcal{P}([0,1])} \left(\frac{g(0,0;\mu)}{\beta}-\frac{\beta}{2}\int_{0}^{1}\mu([0,r])rdr \right) \end{aligned}

where the infimum is taken over all probability measures \mu \in \mathcal{P}([0,1]), and g(t,y;\mu), t \in [0,1], y \in \mathbb{R} is the unique solution of the parabolic PDE
\begin{aligned}\begin{cases} 2g_{t} + g_{yy}+\mu([0,t]) (g_{y})^{2} =0,\\ g(1,y;\mu) = \ln \cosh(\beta y). \end{cases} \end{aligned}

Remark 1(Talagrand, Panchenko): We have the following extension of Sherrington–Kirkpatrick model to a mixed d-spin case.
\begin{aligned}\lim_{n \to \infty}\frac{1}{\beta n} \mathbb{E}_{a} \ln \mathbb{E}_{x} \exp\left( \beta \sum_{k=2}^{d}\frac{c_{k} \sqrt{k!}}{n^{\frac{k-1}{2}}}\sum_{|S|=k} a_{S} x^{S} \right)=\inf_{\mu \in \mathcal{P}([0,1])} \left(\frac{g(0,0;\mu)}{\beta}-\frac{\beta}{2}\int_{0}^{1}\mu([0,r])\xi''(r)rdr \right) \end{aligned},

where \xi(r) = \sum_{k=2}^{d}c_{k}^{2} r^{k}, and g(t,y), t \in [0,1], y \in \mathbb{R} is the unique solution of the parabolic PDE

\begin{aligned}\begin{cases} 2g_{t} + \xi''(t)(g_{yy}+\mu([0,t]) (g_{y})^{2}) =0,\\ g(1,y;\mu) = \ln \cosh(\beta y). \end{cases} \end{aligned}

Remark 2 (Auffinger–Chen): In the zero-temperature case \beta=+\infty one can simplify the answer
\begin{aligned}\lim_{n \to \infty} \mathbb{E}_{a}\max_{x \in \{-1,1\}^{n}} \sum_{k=2}^{d}\frac{c_{k} \sqrt{k!}}{n^{\frac{k+1}{2}}}\sum_{|S|=k} a_{S} x^{S} =\inf_{\nu \in M([0,\infty))} \left(g(0,0;\nu)-\frac{1}{2}\int_{0}^{1}\nu([0,r])\xi''(r)rdr \right) \end{aligned},

where \xi(r) = \sum_{k=2}^{d}c_{k}^{2} r^{k}, the infimum is taken over all nonnegative finite measures \nu on [0,\infty), i.e., \nu([0,\infty))<\infty, and g(t,y;\nu), t \in [0,1), y \in \mathbb{R} is the unique solution of the parabolic PDE

\begin{aligned}\begin{cases} 2g_{t} + \xi''(t)(g_{yy}+\nu([0,t]) (g_{y})^{2}) =0,\\ g(1,y;\nu) = |y|. \end{cases} \end{aligned}

To study existence/uniqueness of the minimizer \nu it would be helpful if the expression we take the infimum of were convex in \nu. For example, strict convexity in \nu would clearly imply uniqueness of the minimizer.

Remark 3: Let r \mapsto \xi(r) be convex. Then the functional M([0,\infty))\ni \nu \mapsto g(0,0;\nu)-\frac{1}{2}\int_{0}^{1}\nu([0,r])\xi''(r)rdr is convex. Indeed, we can recognize a Hamilton–Jacobi–Bellman–Isaacs PDE in 2g_{t} + \xi''(t)(g_{yy}+\nu([0,t]) (g_{y})^{2}) =0, namely we can “linearize” it as follows:
\begin{aligned} g_{t} + \sup_{q \in \mathbb{R}} \left\{ \frac{\xi''(t)}{2}g_{yy} +\nu([0,t])\xi''(t) \left(qg_{y} -\frac{q^{2}}{2}\right) \right\}=0\end{aligned} with boundary condition g(1,y;\nu)=|y| (note that \sup_{q}(qg_{y}-q^{2}/2) = g_{y}^{2}/2); it then follows that:
g(0,0;\nu) = \sup_{\alpha_{s}}\mathbb{E} \left\{ \left| \int_{0}^{1} \alpha_{s} \xi''(s) \nu([0,s])ds + \int_{0}^{1}\sqrt{\xi''(s)}dB_{s} \right|-\frac{1}{2}\int_{0}^{1} \xi''(s) \nu([0,s]) \alpha_{s}^{2}ds \right\}, where the supremum is taken over all progressively measurable (with respect to the Brownian filtration) bounded controls \alpha_{s}, and B_{s} is a standard Brownian motion. The expression under the expectation is convex in \nu (an absolute value of an affine function of \nu minus a linear function of \nu), therefore the supremum of its averages is convex. So the desired functional is convex. \square

The problem of convexity of the functional in \mu was open for quite a while (see, for example, Panchenko’s conjecture). Even though the solution essentially fits in the last remark, this should not be very surprising, because the Feynman–Kac formula together with Hamilton–Jacobi–Bellman–Isaacs PDEs are not well popularized in certain communities. Another such example that comes to my mind is this mathoverflow question. Can you prove it without using any probability?

The reader will notice that I also include the name Isaacs in the above remarks. The reason is that it is not always possible to “linearize” the PDE (i.e., write it as a supremum of linear objects) unless there is some convexity in the g_{y} variable. Sometimes the points g_{y} can be saddle points, and one may need to invoke a \sup_{\alpha} \inf_{\beta} representation. See, for example, the recent proof of the Ehrhard inequality by Ramon van Handel, where a \sup_{\alpha} \inf_{\beta} representation was used for the quantity \Phi^{-1}\left(\int \Phi(f) d\gamma\right), where f is a test function, \gamma is the standard Gaussian measure, and \Phi(t) = \gamma((-\infty,t]).

Remark 4: With a little bit of work one can prove strict convexity of the functional \mu \mapsto \frac{g(0,0;\mu)}{\beta}-\frac{\beta}{2}\int_{0}^{1}\mu([0,r])\xi''(r)rdr in the case \beta >0 (by an appropriate concatenation of optimal controls). This gives the existence of a unique minimizer \mu.

My final remark is that if we take the a_{S} to be independent random variables (not necessarily identically distributed) such that \mathbb{E}a_{S}=0, \mathbb{E} a_{S}^{2}=1, and \mathbb{E} |a_{S}|^{3} <C with some universal constant C, then due to a theorem of Carmona–Hu all the results (the limits) will be exactly the same.

Lower bound in the case of arbitrary real coefficients, and Bohnenblust–Hille inequality

Let us go back to the original question when a_{ij} are real numbers. Let
f(x) = \sum_{1 \leq i<j \leq n} a_{ij}x_{i}x_{j} where a_{ij} are real numbers, and x = (x_{1}, \ldots, x_{n})\in \{-1,1\}^{n}.

Question: How can we estimate \max_{x \in \{-1,1\}^{n}}  f(x) (from above or below) in terms of the real numbers a_{ij}?

For certain reasons I will prefer to work with \max_{x \in \{-1,1\}^{n}} | f(x)| instead of \max_{x \in \{-1,1\}^{n}} f(x); notice, however, that in most cases these two quantities are comparable to each other.

By playing L^{\infty} vs L^{2} game, one trivial lower bound would be
\max_{x\in \{-1,1\}^{n}} |f(x)|^{2} \geq  \frac{1}{2^{n}}\int_{[-1,1]^{n}} \left|\sum_{1 \leq i<j \leq n} a_{ij}x_{i}x_{j}\right|^{2}dx = \frac{1}{3^{2}} \sum_{1\leq i <j \leq n} a_{ij}^{2}. We can even get rid of the factor \frac{1}{3^{2}} by replacing the measure dx with the uniform counting measure on the extreme points of the set [-1,1]^{n} (over which we integrate a convex function), namely the uniform counting measure on \{-1,1\}^{n} (as we always do in this blog). So in fact we have
\| f\|_{L^{\infty}(\{-1,1\}^{n})} \geq \| f\|_{L^{2}(\{-1,1\}^{n})} = \sqrt{\sum_{|S|=2} a_{S}^{2}}, where on the right-hand side the sum runs over all subsets S \subset \{1,\ldots, n\} of cardinality |S|=2. In fact this seems to be a bad lower bound: for example, if a_{S} = \pm 1 then on average the left-hand side is of order n^{3/2} whereas the right-hand side is of order \sqrt{\binom{n}{2}} \asymp n. The average case suggests that one may have the following inequality
\| f\|_{\infty} \geq C \left(\sum_{|S|=2} |a_{S}|^{4/3} \right)^{3/4} with some universal positive constant C>0. It turns out that this is true, and in fact a little bit more is true.
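The L^{\infty} vs L^{2} bound above can be checked exactly, since the cube is finite and the Walsh functions x_{i}x_{j} are orthonormal. A small enumeration sketch (sizes arbitrary):

```python
import numpy as np
from itertools import combinations, product

# Exact check of ||f||_inf >= ||f||_2 = sqrt(sum a_S^2) on {-1,1}^n for
# quadratic f(x) = sum_{i<j} a_ij x_i x_j; orthonormality of the Walsh
# monomials gives the L^2 identity.
rng = np.random.default_rng(5)
n = 8
pairs = list(combinations(range(n), 2))
a = rng.normal(size=len(pairs))

sup, sq_mean = 0.0, 0.0
for x in product([-1, 1], repeat=n):
    fx = sum(a[k] * x[i] * x[j] for k, (i, j) in enumerate(pairs))
    sup = max(sup, abs(fx))
    sq_mean += fx ** 2 / 2 ** n

assert sup >= np.sqrt(sq_mean) - 1e-9
assert abs(sq_mean - (a ** 2).sum()) < 1e-6
```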

Theorem (Defant–Mastylo–Perez): For any f(x) = \sum_{|S|\leq d} a_{S} x^{S} on \{-1,1\}^{n} we have
\begin{aligned}\left( \sum_{|S|\leq d} |a_{S}|^{\frac{2d}{d+1}}\right)^{\frac{d+1}{2d}}  \leq C^{\sqrt{d \ln d}} \|f\|_{\infty}\end{aligned} with some universal constant C.
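For d = 2 the theorem (the 4/3 inequality suggested above) can be watched in action by exact enumeration; the constant 3 below is a generous placeholder for this sketch, not the sharp constant:

```python
import numpy as np
from itertools import combinations, product

# Ratio (sum |a_S|^{4/3})^{3/4} / ||f||_inf for random +-1 coefficients, d = 2.
# The theorem says this ratio is bounded by a universal constant; here we only
# watch it stay below a generous bound on an exactly enumerated cube.
rng = np.random.default_rng(6)
n = 8
pairs = list(combinations(range(n), 2))
cube = list(product([-1, 1], repeat=n))

for _ in range(10):
    a = rng.choice([-1.0, 1.0], size=len(pairs))
    sup = max(abs(sum(a[k] * x[i] * x[j] for k, (i, j) in enumerate(pairs)))
              for x in cube)
    lhs = (np.abs(a) ** (4 / 3)).sum() ** (3 / 4)
    assert lhs <= 3.0 * sup   # generous constant for this sketch
```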

There has been an interesting development of such inequalities, especially concerning the constant on the right-hand side. In the beginning the constant was of the form d^{d}; then it was replaced by C^{d}; and finally, due to recent breakthroughs by Bayart–Pellegrino–Seoane-Sepúlveda (see also Nunez-Alarcon–Pellegrino–Seoane-Sepúlveda–Serrano-Rodriguez) pushing the constant to the subexponential order C^{\sqrt{d\ln d}} for a very similar problem (on the torus \mathbb{T}^{n} instead of the Hamming cube \{-1,1\}^{n}), it became clear that one can get the same bound on the Hamming cube as well.

Remark 5: If we consider f(x) = \sum_{|S|=d} a_{S} x^{S} in the Sherrington–Kirkpatrick model then one can extract that \mathbb{E} \| f\|_{\infty} \asymp n^{\frac{d+1}{2}} e^{d/2}d^{-d/2}\mathrm{poly}(d), where \mathrm{poly}(d) means something of order d^{c} with some universal constant c>0. On the other hand, \left( \sum_{|S| = d} |a_{S}|^{\frac{2d}{d+1}}\right)^{\frac{d+1}{2d}} = \binom{n}{d}^{\frac{d+1}{2d}} \asymp e^{d/2} d^{-d/2} n^{\frac{d+1}{2}} \mathrm{poly}(d). Therefore a natural question arises

Conjecture: Show that there exists a universal constant C>0 such that for any f(x) = \sum_{|S|\leq d} a_{S} x^{S} on \{-1,1\}^{n} with any d\geq 1 we have
\begin{aligned} \left( \sum_{|S|\leq d} |a_{S}|^{\frac{2d}{d+1}}\right)^{\frac{d+1}{2d}}  \leq d^{C} \|f\|_{\infty} \end{aligned}.

Exercise: use the theorem of Defant–Mastylo–Perez to show that in the Aaronson–Ambainis conjecture one can get the lower bound \max_{j} \mathbb{E} |D_{j} f|^{2} \geq \left( \frac{\mathrm{Var}(f)}{c^{\sqrt{d \ln d}}}\right)^{c'} for polynomials |f|\leq 1 whose coefficients a_{S} are constant in absolute value, say |a_{S}|=t for all S \subset \{1, \ldots, n\}.

The proof of the Defant–Mastylo–Perez estimate is quite nice. The arguments and main ideas behind such estimates originated in a series of works, the earliest of which goes back to Littlewood’s 4/3 inequality.

Following Defant–Mastylo–Perez I will illustrate the proof in the case of d-homogeneous polynomials, namely for f(x) = \sum_{|S|=d}a_{S}x^{S}.

Proof of \left( \sum_{|S| =  d} |a_{S}|^{\frac{2d}{d+1}}\right)^{\frac{d+1}{2d}}  \leq C^{\sqrt{d \ln d}} \|\sum_{|S|=d} a_{S} x^{S}\|_{\infty}.

Let us sketch the main idea and then give the full proof. Let B(d) be the best possible constant in the inequality \left( \sum_{|S| =  d} |a_{S}|^{\frac{2d}{d+1}}\right)^{\frac{d+1}{2d}}  \leq B(d) \|\sum_{|S|=d} a_{S} x^{S}\|_{L^{\infty}(\{-1,1\}^{n})} independent of n (such a constant exists by Theorem 34 of Ron Blei). It is not hard to see that B(d) is nondecreasing. Indeed, if f is a near-optimizer which achieves the constant B(d-1) up to a small additive error \varepsilon>0 on the Hamming cube \{-1,1\}^{n}, then we have
\begin{aligned} B(d-1) - \varepsilon < \frac{\left( \sum_{|S|=d-1} |a_{S}|^{\frac{2(d-1)}{(d-1)+1}} \right)^{\frac{(d-1)+1}{2(d-1)}}}{\|f\|_{L^{\infty}(\{-1,1\}^{n})}}  \leq  \frac{\left( \sum_{|S|=d-1} |a_{S}|^{\frac{2d}{d+1}} \right)^{\frac{d+1}{2d}}}{\|x_{n+1}f\|_{L^{\infty}(\{-1,1\}^{n+1})}} \leq B(d)  \end{aligned} where we have used that x_{n+1} f is a d-homogeneous polynomial with the same L^{\infty} norm, and that \| \{ a_{S}\} \|_{\ell^{p}} \leq \| \{ a_{S}\} \|_{\ell^{q}} whenever q\leq  p. Next, instead of directly bounding the constant B(d), one can show

Lemma 1: For all 1\leq d \leq n we have
\begin{aligned} B(d) \leq B(\lceil\sqrt{d}\rceil) e^{\alpha \sqrt{d \ln d}} \end{aligned} with some universal constant \alpha >0.

Finally iterating the lemma, i.e.,
\begin{aligned} B(d) \leq B(\lceil\sqrt{d}\rceil) e^{\alpha \sqrt{d \ln d}}  \leq B(\lceil d^{1/4} \rceil)e^{\alpha \sqrt{d \ln d}}  e^{\alpha \sqrt{\sqrt{d} \ln \sqrt{d}}} \leq B(\lceil d^{1/4} \rceil) e^{\alpha \sqrt{d \ln d} (1+\frac{1}{\sqrt{2}})}\leq   B(\lceil d^{1/8} \rceil) e^{\alpha \sqrt{d \ln d} (1+\frac{1}{\sqrt{2}} + \left(\frac{1}{\sqrt{2}}\right)^{2})}\end{aligned} we obtain that B(d)\leq B(2) e^{C \sqrt{d \ln d}}, and the result follows due to the fact that B(2)<\infty.

The proof of Lemma 1 contains 3 ingredients.

Lemma 2 (Ivanisvili-Tkocz) For any f = \sum_{|S|=d} a_{S} x^{S} we have \| f\|_{2} \leq e^{d(\frac{1}{p}-\frac{1}{2})}\| f\|_{p} for all p \in [1,2].

With slightly worse constants Lemma 2 can be obtained from real hypercontractivity. The bound e^{d(\frac{1}{p}-\frac{1}{2})} comes from complex hypercontractivity (and this is currently the best upper bound that I know).

Next, for A \subset \{1,\ldots,n\} let J(A,n) denote the set of strictly increasing maps j: A \to \{1,\ldots, n\}. If A = \{1,\ldots, d\} then we will just use the notation J(d,n). Instead of a_{S} for S \subset [n] with |S|=d, it will be convenient for us to use the notation a_{j} with j \in J(d,n). If i \in J(A,n) and j \in J(B,n) (where A and B are disjoint), we define i\oplus j = j \oplus i to be the map A\cup B \to \{1,\ldots, n\} such that (i\oplus j)|_{A} = i and (i\oplus j)|_{B} = j. Notice that we can also write \sum_{S \subset [n],\, |S|=d} a_{S} x^{S} = \sum_{j \in J(d,n)} a_{j} x_{j}, where x_{j}:= x_{j(1)}\cdots x_{j(d)}.
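In concrete terms J(d,n) is just the family of d-element subsets of \{1,\ldots,n\} listed in increasing order, and i \oplus j is the merge of two such tuples. A small bookkeeping sketch (the helper names J and merge are my own, purely illustrative):

```python
from itertools import combinations
from math import comb

# J(d, n): strictly increasing maps {1,...,d} -> {1,...,n}, i.e. sorted d-tuples.
def J(d, n):
    return list(combinations(range(1, n + 1), d))

# i (+) j merges the values of i (defined on A) and j (defined on the disjoint
# complement); its increasing rearrangement [i (+) j] is what indexes a_[i(+)j].
def merge(i, j):
    return tuple(sorted(i + j))

d, n = 3, 6
assert len(J(d, n)) == comb(n, d)   # |J(d,n)| = binomial(n, d)
assert merge((2, 5), (1,)) == (1, 2, 5)
```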

Lemma 3 (Blei’s inequality): For all 0<k<d we have
\begin{aligned}\left( \sum_{j \in J(d,n)} |a_{j}|^{\frac{2d}{d+1}} \right)^{\frac{d+1}{2d}} \leq \left[ \prod_{S\subset \{1,\ldots,d\},\, |S|=k} \left( \sum_{i \in J(S,n)}  \left( \sum_{j \in J(\bar{S},n)} |a_{i \oplus j}|^{2} \right)^{\frac{1}{2} \frac{2k}{k+1}}\right)^{\frac{k+1}{2k}} \right]^{\frac{1}{\binom{d}{k}}}\end{aligned}, where the a‘s are real numbers, and \bar{S} denotes the complement of S in \{1, \ldots, d\}.

Notice that the right-hand side contains terms a_{i \oplus j}, and the map i \oplus j may not be strictly increasing, so many extra numbers appear on the right-hand side even though they do not participate in the left-hand side. In what follows, if i \oplus j is injective but not strictly increasing we will just put a_{i \oplus j}=a_{[i \oplus j]}, where [i \oplus j] is the increasing rearrangement (for example, a_{231} = a_{123}); if i \oplus j is not injective (its rearrangement is only non-decreasing) we just put a_{i \oplus j}=0. Finally, we need one last ingredient, a polarization inequality invented by Harris.

Lemma 4: For any S \subset \{1, \ldots, d\} with |S|=k we have
\begin{aligned} \sup_{x,y \in \{-1,1\}^{n}} \left| \sum_{i \in J(S,n)} \sum_{j \in J(\bar{S},n)} a_{i \oplus j} x_{i} y_{j} \right| \leq \frac{d^{d} M_{k,d}}{k^{k} (d-k)^{d-k}} \sup_{x \in \{-1,1\}^{n}} \left|\sum_{i \in J(d,n) } a_{i} x_{i}\right| \end{aligned}, where the Markov numbers are M_{k,d} :=2^{k-1} \frac{d'}{k!}\frac{\left(\frac{d'+k-2}{2}\right)!}{\left( \frac{d'-k}{2}\right)!} with d'=d if k \equiv d \ (\mathrm{mod}\ 2), and d'=d-1 if k \equiv d-1 \ (\mathrm{mod}\ 2).

The proof is quite standard in approximation theory: M_{k,d} is the absolute value of the coefficient of x^{k} in the Chebyshev polynomial of degree d' (where d'=d or d-1 is chosen to match the parity of k).
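The closed formula for M_{k,d} can be matched against Chebyshev coefficients directly; a quick sketch for the case d'=d, i.e. k and d of the same parity:

```python
import numpy as np
from math import factorial
from numpy.polynomial import chebyshev

# Check that M_{k,d} (with d' = d, i.e. k and d of the same parity) equals the
# absolute value of the x^k coefficient of the Chebyshev polynomial T_d.
def M(k, d):
    return (2 ** (k - 1) * d / factorial(k)
            * factorial((d + k - 2) // 2) / factorial((d - k) // 2))

for d in range(2, 10):
    coeffs = chebyshev.cheb2poly([0] * d + [1])  # power-basis coefficients of T_d
    for k in range(1, d + 1):
        if (d - k) % 2 == 0:
            assert abs(abs(coeffs[k]) - M(k, d)) < 1e-6
```

For example T_4(x) = 8x^{4}-8x^{2}+1 gives M_{2,4} = 8.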

Remark 6: It is not hard to see that for d \geq 2 and k \approx \sqrt{d/ \ln d} we have M_{k,d} \leq  e^{\alpha \sqrt{d \ln d}} for some universal constant \alpha >0.

Now we combine Lemmas 2, 3, and 4 in order to prove Lemma 1.

Take an arbitrary set S \subset \{1, \ldots, d\} with |S|=k. Then
\begin{aligned} \left( \sum_{i \in J(S,n)}  \left( \sum_{j \in J(\bar{S},n)} |a_{i \oplus j}|^{2} \right)^{\frac{1}{2} \frac{2k}{k+1}}\right)^{\frac{k+1}{2k}}   = \left( \sum_{i \in J(S,n)}  \left( \mathbb{E}_{x}  \left| \sum_{j \in J(\bar{S},n)} a_{i \oplus j} x_{j} \right|^{2}\right)^{\frac{1}{2} \frac{2k}{k+1}}\right)^{\frac{k+1}{2k}}\end{aligned}
\begin{aligned} \stackrel{\mathrm{Lemma\, 2}}{\leq}  e^{\frac{d-k}{2k}}\left( \sum_{i \in J(S,n)}  \mathbb{E}_{x}  \left| \sum_{j \in J(\bar{S},n)} a_{i \oplus j} x_{j} \right|^{\frac{2k}{k+1}}\right)^{\frac{k+1}{2k}}  \leq   e^{\frac{d-k}{2k}} \sup_{x \in \{-1,1\}^{n}}\left( \sum_{i \in J(S,n)} \left| \sum_{j \in J(\bar{S},n)} a_{i \oplus j} x_{j} \right|^{\frac{2k}{k+1}}\right)^{\frac{k+1}{2k}}\end{aligned}
\begin{aligned}\stackrel{\mathrm{Def.}\, B(k)}{\leq} e^{\frac{d-k}{2k}} B(k) \sup_{x,y \in \{-1,1\}^{n}} \left| \sum_{i \in J(S,n)}\sum_{j \in J(\bar{S},n)}a_{i \oplus j} x_{j} y_{i}\right| \end{aligned}
\begin{aligned} \stackrel{\mathrm{Lemma\, 4}}{\leq}e^{\frac{d-k}{2k}} B(k)  \frac{d^{d} M_{k,d}}{k^{k} (d-k)^{d-k}} \sup_{x \in \{-1,1\}^{n}} \left| \sum_{j \in J(d,n)} a_{j} x_{j}\right| \end{aligned}.

Therefore Lemma 3 implies
\begin{aligned}\left( \sum_{j \in J(d,n)} |a_{j}|^{\frac{2d}{d+1}} \right)^{\frac{d+1}{2d}} \leq e^{\frac{d-k}{2k}} B(k)  \frac{d^{d} M_{k,d}}{k^{k} (d-k)^{d-k}} \sup_{x \in \{-1,1\}^{n}} \left| \sum_{j \in J(d,n)} a_{j} x_{j}\right| \end{aligned}. Therefore (again by the definition of B(d)) we obtain
\begin{aligned} B(d) \leq e^{\frac{d-k}{2k}} B(k)  \frac{d^{d} M_{k,d}}{k^{k} (d-k)^{d-k}} \end{aligned}. Taking k \approx \sqrt{\frac{d}{\ln d}}, using Stirling’s formula, Remark 6, and the monotonicity of B(d), we obtain B(d) \leq B(\lceil\sqrt{d}\rceil) e^{\alpha \sqrt{d \ln d}}. Lemma 1 (and hence the theorem) is proved. \square


Aaronson-Ambainis Conjecture

Any function f:\{-1,1\}^{n} \to \mathbb{R} can be written as

\begin{aligned} f(x) = \sum_{S \subset \{1, \ldots, n\}} a_{S} \prod_{j \in S}x_{j}, \quad x=(x_{1}, \ldots, x_{n}) \in \{-1,1\}^{n} \end{aligned}, for some real numbers a_{S}.

We say that f has degree at most d if \begin{aligned} f(x) = \sum_{\substack{S \subset \{1, \ldots, n\}\\ |S|\leq d}} a_{S} \prod_{j \in S}x_{j} \end{aligned}.

Conjecture(Aaronson-Ambainis). There exist universal constants c_{1}, c_{2}>0 such that for any f :\{-1,1\}^{n} \to [-1,1] of degree at most d\geq 1, there exists j \in \{1, \ldots, n\} such that
\begin{aligned}\mathbb{E} |D_{j} f|^{2}\geq c_{1} \left(\frac{\mathbb{E} |f-\mathbb{E}f|^{2}}{d}\right)^{c_{2}} , \quad \text{where} \quad D_{j} f = \frac{f(x_{1}, \ldots, x_{j}, \ldots, x_{n})-f(x_{1}, \ldots, -x_{j}, \ldots, x_{n})}{2}\end{aligned}.

In other words, if f is a polynomial of degree at most d with nonzero variance then f has an influential variable. The application of the conjecture, on a very informal level, is that “quantum query algorithms will never offer exponential speedups over classical query algorithms (on a certain big class of problems) if we allow small errors”. Apparently the best reference to learn about the origins of the problem is the paper of Aaronson and Ambainis itself.

The problem seems to be well popularized in certain communities, for example, Y. Filmus, H. Hatami, S. Heilman, E. Mossel, R. O’Donnell, S. Sachdeva, A. Wan, and K. Wimmer list the question in their collection of open problems “Real Analysis in Computer Science: A collection of Open Problems“, where it is mentioned that the conjecture is resolved for some special class of functions, namely, f:\{-1,1\}^{n}\to \{-1,1\}, i.e., when f takes only two values. In fact, the conjecture can be proved for symmetric polynomials f : \{-1,1\}^{n} \to [-1,1] as well.

Definition: f : \{-1,1\}^{n} \to \mathbb{R} is said to be symmetric if f(x_{1}, \ldots, x_{n}) is independent of permutation of variables x_{1}, \ldots, x_{n}.

The proof for the symmetric case can be found in A. Backurs’ preprint. It seems to be a nice proof, so I decided to present it on my blog in a slightly adapted way.

Theorem: There exists a universal constant c_{1}>0 such that for any symmetric f : \{-1,1\}^{n} \to [-1,1] of degree at most d one can find j \in \{1, \ldots, n\} such that \begin{aligned}\mathbb{E} |D_{j} f|^{2}\geq c_{1} \left(\frac{\mathbb{E} |f-\mathbb{E}f|^{2}}{d}\right)^{4} \end{aligned}.

Remark: before we write the proof let us mention that the class of symmetric functions on the Hamming cube is apparently one of the most important classes in the analysis of boolean functions (we will talk more about them on this blog). Thus the validity of the conjecture in this class suggests that most likely the conjecture holds true in general.

Remark: the reason the proof of the conjecture becomes feasible in the symmetric case is that the expression \mathbb{E} |D_{j} f|^{2} becomes independent of j, i.e., one can replace it by \frac{1}{n} \mathbb{E} \sum_{j=1}^{n}|D_{j}f|^{2}, so the question in a sense becomes “a typical Fourier multiplier problem”, whatever that may mean.

Proof of the theorem:

If f is symmetric then \mathbb{E} |D_{j} f|^{2} is independent of j. Therefore the question becomes to show that \frac{\mathbb{E} \sum_{j=1}^{n} |D_{j} f|^{2}}{n} \geq c_{1} \left(\frac{\mathbb{E} |f-\mathbb{E}f|^{2}}{d}\right)^{4}. We will need the following classical Poincaré inequality.

Lemma 1: Let f:\{-1,1\}^{n} \to \mathbb{R} be a polynomial of degree d, and let |\nabla f|^{2}(x) = \sum_{j=1}^{n} |D_{j} f(x)|^{2}. Then \mathbb{E} |f-\mathbb{E} f|^{2} \leq \mathbb{E} |\nabla f|^{2} \leq d\, \mathbb{E} |f-\mathbb{E} f|^{2}.
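Although I postpone the proof, the two-sided bound is easy to test by brute force for a random low-degree multilinear polynomial (a sketch; the sizes n = 6, d = 3 are arbitrary):

```python
import numpy as np
from itertools import combinations, product

# Exhaustive check of E|f - Ef|^2 <= E|grad f|^2 <= d * E|f - Ef|^2 on {-1,1}^n
# for a random degree-d multilinear polynomial, with D_j the discrete derivative
# D_j f(x) = (f(x) - f(x with x_j flipped)) / 2.
rng = np.random.default_rng(7)
n, d = 6, 3
coeffs = {S: rng.normal() for r in range(d + 1) for S in combinations(range(n), r)}

def f(x):
    return sum(c * np.prod([x[j] for j in S]) for S, c in coeffs.items())

cube = [np.array(x) for x in product([-1, 1], repeat=n)]
fx = np.array([f(x) for x in cube])
var = np.mean((fx - fx.mean()) ** 2)

grad2 = 0.0
for j in range(n):
    flipped = [x * (1 - 2 * (np.arange(n) == j)) for x in cube]  # negate x_j
    Dj = (fx - np.array([f(y) for y in flipped])) / 2
    grad2 += np.mean(Dj ** 2)

assert var - 1e-9 <= grad2 <= d * var + 1e-9
```

(In Fourier terms the check amounts to var = \sum_{S\neq \emptyset} a_{S}^{2} and \mathbb{E}|\nabla f|^{2} = \sum_{S} |S| a_{S}^{2}.)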

Since I am planning to talk about Poincaré (and isoperimetric) inequalities on the Hamming cube in another post, I will skip the proof of the lemma for now (in fact the proof is quite easy, and the reader can verify it without too much effort).

Thus it suffices to show the inequality \frac{1}{n} \mathbb{E} |f-\mathbb{E} f|^{2} \geq c_{1} \left(\frac{\mathbb{E} |f-\mathbb{E}f|^{2}}{d}\right)^{4}, i.e., \left(\frac{d^{4}}{n}\right)^{1/3}  \frac{1}{c_{1}^{1/3}} \geq \mathbb{E} |f-\mathbb{E} f|^{2}.

Lemma 2(Bernstein-Markov inequality): There exists a universal constant C>0 such that for any symmetric polynomial f:\{-1,1\}^{n} \to \mathbb{R} of degree at most d we have \begin{aligned} \mathbb{E} |f-\mathbb{E} f|^{2} \leq C \left( \frac{d^{4}}{n} \right)^{1/3} \| f\|^{2}_{\infty}\end{aligned}.

Remark: Clearly, once Lemma 2 is verified the theorem follows immediately (take c_{1} =\frac{1}{C^{3}}). One should think about the lemma as a certain Markov–Bernstein type inequality: the left-hand side, up to a factor of d, represents the L^{2} norm of the discrete gradient |\nabla f|, whereas the right-hand side is just the L^{\infty} norm of the polynomial. An important assumption is that f is symmetric, otherwise the statement is false: take f(x_{1}, \ldots, x_{n}) = x_{1}. It should be a little bit surprising that we gain the factor n in the denominator of the right-hand side at all.

Proof of Lemma 2: Without loss of generality assume that |f| \leq 1. If d=0 there is nothing to prove. Any symmetric polynomial of degree at most d, say f(x_{1}, \ldots, x_{n}), can be written as p(x_{1}+\ldots+x_{n}) where p(t) = \sum_{0\leq k \leq d} b_{k} t^{k}. Indeed, notice that f(x) = \sum_{\ell =0}^{d} a_{\ell} \sum_{S\subset \{1, \ldots, n\}, |S| = \ell} \prod_{j \in S}x_{j}, because the Fourier–Walsh coefficients a_{S} = \mathbb{E} f(x) \prod_{j \in S}x_{j} only depend on the cardinality of S, i.e., on |S|. Next, by Newton’s identities, for any elementary symmetric function we have
\begin{aligned} \sum_{\substack{S\subset \{1, \ldots, n\}\\ |S| = \ell}} \prod_{j \in S}x_{j}  = Q(p_{1}, \ldots, p_{\ell})\end{aligned} where Q is a polynomial of degree at most \ell, and p_{k}(x_{1}, \ldots, x_{n}) = x_{1}^{k}+\ldots+x_{n}^{k}=x_{1}+\ldots+x_{n} if k is odd, and p_{k} =n if k is even. So we can write \mathbb{E} |f- \mathbb{E} f|^{2}  = \frac{1}{2} \mathbb{E} |f(x)-f(y)|^{2} = \frac{1}{2} \mathbb{E} |p(x_{1}+\ldots+x_{n})-p(y_{1}+\ldots+y_{n})|^{2}, where (y_{1}, \ldots, y_{n}) \in \{-1,1\}^{n} is an independent copy of (x_{1}, \ldots, x_{n}).

Since |p(y_{1}+\ldots+y_{n})|\leq 1, an obvious bound would be \mathbb{E} |p(x_{1}+\ldots+x_{n})-p(y_{1}+\ldots+y_{n})|^{2} \leq 2^{2}, which is not good enough (we want to gain the factor n in the denominator). Another trick would be to use Markov’s inequality: for any polynomial p on the real line of degree at most d we have \max_{[a,b]} |p'| \leq \frac{d^{2}}{(b-a)/2}\max_{[a,b]}|p|. Therefore we can write \begin{aligned} |p(x_{1}+\ldots+x_{n})-p(y_{1}+\ldots+y_{n})|^{2} = |p'(\xi) (x_{1}+\ldots+x_{n}-(y_{1}+\ldots+y_{n}))|^{2} \leq \left(\frac{d^{2}}{n}\max_{[-n,n]} |p| \cdot 2n\right)^{2} = (2d^{2})^{2}  \max_{[-n,n]} |p|^{2},\end{aligned} which is of order d^{4} even if we somehow had the extra assumption that \max_{[-n,n]} |p| \leq 1 (notice that the latter inequality is guaranteed only at integer points separated by two, namely at the points x_{1}+\ldots+x_{n} with x_{j}=\pm 1). The bound d^{4} is too bad compared to the trivial bound \mathbb{E} |p(x_{1}+\ldots+x_{n})-p(y_{1}+\ldots+y_{n})|^{2}\leq 2^{2}=4.
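Markov’s inequality itself is easy to probe numerically: on [-1,1] the Chebyshev polynomial T_{d} is the extremal case, with |T_{d}|\leq 1 and T_{d}'(1) = d^{2}. A quick sketch:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Markov's inequality on [-1,1]: max |p'| <= d^2 * max |p|, with equality
# attained by the Chebyshev polynomial T_d: |T_d| <= 1 and T_d'(1) = d^2.
for d in range(1, 10):
    Td = [0] * d + [1]                      # T_d in the Chebyshev basis
    deriv_at_1 = C.chebval(1.0, C.chebder(Td))
    assert abs(deriv_at_1 - d ** 2) < 1e-9

    xs = np.linspace(-1, 1, 2001)
    assert np.abs(C.chebval(xs, Td)).max() <= 1 + 1e-12
```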

A simple remedy to fix the last argument is to invoke the measure concentration phenomenon, namely to consider the average \mathbb{E} |p(x_{1}+\ldots+x_{n})-p(y_{1}+\ldots+y_{n})|^{2} only over the points with |x_{1}+\ldots+x_{n}|<\lambda and |y_{1}+\ldots+y_{n}|<\lambda, where \lambda = \lambda_{n} is sufficiently smaller than n (so that |x_{1}+\ldots+x_{n}-(y_{1}+\ldots+y_{n})|< 2\lambda \ll n), and then we can still proceed with Markov’s inequality. The remaining averages, over the sets |x_{1}+\ldots+x_{n}|>\lambda, can be made very small because \mathbb{P}(|x_{1}+\ldots+x_{n}|>\lambda) is small as soon as \lambda=\lambda_{n}\gg \sqrt{n}, by the central limit theorem. So we will want \sqrt{n} \ll \lambda_{n} \ll n.

Another observation is that instead of d^{4} we can get a bound of order d^{2} just by starting with the trivial estimate |p(x_{1}+\ldots+x_{n}) - p(y_{1}+\ldots+y_{n})|^{2} \leq 2 |p(x_{1}+\ldots+x_{n}) - p(y_{1}+\ldots+y_{n})|.

Indeed, let us see how it works. Pick a positive number \lambda=\lambda_{n}>0 which will be determined later. Then \mathbb{E} |p(x_{1}+...+x_{n})-p(y_{1}+...+y_{n})|^{2} \leq 2 \mathbb{E} |p(x_{1}+...+x_{n})-p(y_{1}+...+y_{n})|. Next let us split the average
\begin{aligned}\mathbb{E} |p(x_{1}+...+x_{n})-p(y_{1}+...+y_{n})| = \mathbb{E} |p(x_{1}+...+x_{n})-p(y_{1}+...+y_{n})| \left( 1_{U}(x) 1_{V}(y)+ 1_{U^{c}}(x) 1_{V}(y)+1_{U}(x) 1_{V^{c}}(y)+1_{U^{c}}(x)1_{V^{c}}(y)\right) \end{aligned}
here U = \{x \in \{-1,1\}^{n} \, :\, |x_{1}+...+x_{n}|\leq \lambda\}, V = \{y \in \{-1,1\}^{n} \, :\, |y_{1}+...+y_{n}|\leq \lambda\}, and U^{c}, V^{c} denote the complements of these sets. Since |p(x_{1}+...+x_{n})-p(y_{1}+...+y_{n})|\leq 2, the integral with the factor 1_{U^{c}}(x) 1_{V}(y)+1_{U}(x) 1_{V^{c}}(y)+1_{U^{c}}(x)1_{V^{c}}(y) can be estimated by 6 \mathbb{P}(|x_{1}+...+x_{n}|\geq \lambda). Next, the main contribution \mathbb{E} |p(x_{1}+...+x_{n})-p(y_{1}+...+y_{n})|  1_{U}(x) 1_{V}(y) can be bounded by \max_{[-n,n]} |p'| \cdot 2\lambda  \leq 2\lambda \frac{d^{2}}{n} \max_{[-n,n]}|p|. Thus we get the bound
\begin{aligned} \mathbb{E} |f-\mathbb{E}f|^{2} \leq 4\lambda \frac{d^{2}}{n} \max_{[-n,n]}|p| + 12 \mathbb{P}(|x_{1}+...+x_{n}|\geq \lambda) \end{aligned}.

Finally, if we choose \lambda such that
\begin{aligned} 12 \mathbb{P}(|x_{1}+...+x_{n}|\geq \lambda) \leq \frac{1}{2} \mathbb{E} |f-\mathbb{E}f|^{2} ,  (1) \end{aligned}
then we would obtain the estimate
\begin{aligned} \mathbb{E} |f-\mathbb{E}f|^{2} \leq 8\lambda \frac{d^{2}}{n} \max_{[-n,n]}|p|, \end{aligned} which seems pretty good except that we still need to 1) estimate \max_{[-n,n]}|p|, and 2) figure out the smallest \lambda which gives the bound (1). The second issue is easy to fix: for example, by Chebyshev's inequality we have
\begin{aligned} \mathbb{P}(|x_{1}+...+x_{n}|\geq \lambda) \leq \frac{\mathbb{E} |x_{1}+...+x_{n}|^{2}}{\lambda^{2}} = \frac{n}{\lambda^{2}} \end{aligned}. So the choice \lambda =\lambda_{n}  = \sqrt{n} \sqrt{\frac{24 }{\mathbb{E}|f-\mathbb{E}f|^{2}}} works fine (notice that we may assume \mathbb{E} |f-\mathbb{E}f|^{2}\neq 0, otherwise there is nothing to prove in the lemma). Thus we obtain the final bound
\begin{aligned} \mathbb{E} |f-\mathbb{E}f|^{2} \leq 8\lambda \frac{d^{2}}{n} \max_{[-n,n]}|p|  = 8\sqrt{24} \frac{d^{2}}{\sqrt{n}} \frac{1}{\sqrt{\mathbb{E}|f-\mathbb{E}f|^{2}}}  \max_{[-n,n]}|p|, \end{aligned}
and solving this for \mathbb{E}|f-\mathbb{E}f|^{2} we obtain
\begin{aligned} \mathbb{E} |f- \mathbb{E}f|^{2} \leq C  \left(\frac{d^{4}}{n}\right)^{1/3} (\max_{[-n,n]}|p|)^{2/3} \end{aligned}.

We almost proved the lemma, except we still need to bound \max_{[-n,n]}|p| by a universal constant. Here is another trick: let \max_{[-n,n]} |p(t)| = |p(t_{0})|, t_{0} \in [-n,n]. Find x_{1}, \ldots, x_{n} = \pm 1 so that k=x_{1}+...+x_{n} is the closest such point to t_{0} (these sums run over the integers of a fixed parity in [-n,n], spaced two apart). Clearly |k-t_{0}|\leq 1 and |p(k)|\leq 1. Then
\begin{aligned} |p(t_{0})| = |p(k)+p'(\xi)(t_{0}-k)|\leq 1+\max_{[-n,n]}|p'|\leq 1+\frac{d^{2}}{n} |p(t_{0})|, \end{aligned} where in the last inequality we used Markov’s inequality again. Now if \frac{d^{2}}{n} <\frac{1}{2}, then this implies \max_{[-n,n]}|p| = |p(t_{0})| \leq 2, and Lemma 2 is proved. On the other hand, if \frac{d^{2}}{n} \geq \frac{1}{2} then there is nothing to prove in Lemma 2 because we can choose C large enough. This finishes the proof of the lemma \square. Thus the theorem is proved \square.

Remark: One can get different powers in the right-hand side of the inequality in Lemma 2. For this one needs to invoke Hoeffding’s inequality, which gives exponential tail decay instead of the trivial Chebyshev bound.
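As a quick numerical illustration of this remark (a sketch using only the standard library; the bound 2\exp(-\lambda^{2}/2n) is Hoeffding's inequality for sums of \pm 1 Rademacher variables), one can compare the two tail estimates at \lambda = c\sqrt{n}:

```python
import math

def chebyshev_tail(n, lam):
    # Chebyshev: P(|x_1+...+x_n| >= lam) <= n / lam^2
    return n / lam**2

def hoeffding_tail(n, lam):
    # Hoeffding: P(|x_1+...+x_n| >= lam) <= 2*exp(-lam^2 / (2n))
    return 2 * math.exp(-lam**2 / (2 * n))

n = 10_000
for c in (2, 5, 10):           # lam = c * sqrt(n)
    lam = c * math.sqrt(n)
    print(c, chebyshev_tail(n, lam), hoeffding_tail(n, lam))
```

Already at c=5 the Hoeffding bound is smaller by several orders of magnitude, which is why it yields better powers in Lemma 2.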


Bernoulli Random Matrices and Multilinear Boolean Polynomials

In my previous post on Hadamard conjecture we talked about uncertainty principle for boolean functions. The main open question was the following

Problem 1: Let n > 2, and let B \subset \{1, \ldots, n\} be a given nonempty subset. Consider all multilinear polynomials

\begin{aligned} f(x_{1}, \ldots, x_{n}) = \sum_{\substack{S\subset \{1, \ldots, n\},\\  \, |S|\notin B}} a_{S}\prod_{j \in S}x_{j}, \quad a_{S} \in \mathbb{R},\end{aligned}

such that when restricted to \{-1,1\}^{n} they take values 0 or 1. What can be the smallest positive value for f(0)?

In particular, we showed that when B=\{2\} then \begin{aligned}\min_{f\, :\, f(0)>0 }f(0)\geq  \frac{n}{2^{n}}\end{aligned}. Moreover, we have equality for n divisible by 4 if and only if the Hadamard conjecture holds true.

Remark: For an arbitrary n>2 we also noticed that the validity of Hadamard conjecture implies the equality \begin{aligned}\min_{f\, :\, f(0)>0 }f(0)=  \frac{k}{2^{n}}\end{aligned} where k\geq n is the smallest positive integer divisible by 4.

These types of questions about the uncertainty principle for boolean functions are closely related to the existence of linear codes. Indeed, let us consider the case when B = \{1,2, \ldots, \ell\} for some \ell, 1<\ell \leq n, i.e., we are looking for multilinear polynomials of the form

\begin{aligned} f(x_{1}, \ldots, x_{n}) = a_{\emptyset}+\sum_{S\subset \{1, \ldots, n\}, |S|> \ell} a_{S}\prod_{j \in S}x_{j}, \quad a_{S} \in \mathbb{R}, \quad a_{\emptyset}>0, \quad (1) \end{aligned}
such that when restricted to \{-1,1\}^{n} they take values in \{0,1\}. The goal is to make a_{\emptyset} as small as possible.

One obvious way to construct such polynomials is to look at \begin{aligned} f(x_{1}, \ldots, x_{n}) = \frac{1}{2^{m}}\prod_{k=1}^{m}\left( 1+\prod_{j \in S_{k}} x_{j}\right)\end{aligned} where S_{1}, \ldots, S_{m} are certain subsets of \{1, \ldots, n\}. Clearly such f takes values 0 or 1 whenever x=(x_{1}, \ldots, x_{n}) \in \{-1,1\}^{n}. It looks like a good candidate, except we also need to make sure that when we open the parentheses (using the identities x_{j}^{2}=1 as many times as possible) the resulting expression has the form (1), i.e., it does not contain low degree monomials. Using the identity x_{j}^{2}=1 we see
\begin{aligned} \prod_{k=1}^{m}\left( 1+\prod_{j \in S_{k}} x_{j}\right) = 1+\sum_{k=1}^{m} \prod_{j \in S_{k}} x_{j} +  \sum_{1\leq k_{1}<k_{2}\leq m} \prod_{j \in S_{k_{1}}\Delta S_{k_{2}}} x_{j}+\sum_{1\leq k_{1}<k_{2}<k_{3}\leq m } \prod_{j \in S_{k_{1}}\Delta S_{k_{2}}\Delta S_{k_{3}}} x_{j} +\ldots\end{aligned}
where \Delta denotes the symmetric difference. Therefore our candidate has the form (1) if and only if |S_{j}|>\ell for all j=1, \ldots, m, |S_{i}\Delta S_{j}| >\ell for all 1\leq i <j\leq m, and so on for all higher symmetric differences. So the question is how to find such sets S_{1}, \ldots, S_{m}. To each set S_{k} \subset \{1, \ldots, n\} we can associate a point x=(x_{1}, \ldots, x_{n}) \in \{0,1\}^{n} in a unique way by setting x_{j}=1 if j \in S_{k} and x_{j}=0 otherwise. Then the symmetric difference S_{i}\Delta S_{j} corresponds to the sum x+y \pmod 2 of the corresponding points, and the cardinality of the set S_{k} corresponds to the Hamming weight of the corresponding point x, i.e., the Hamming distance from x to the origin. Let us consider \{0,1\}^{n} as a vector space over the field \mathbb{F}_{2} with addition mod 2 (the traditional notation for such a vector space is \mathbb{F}_{2}^{n}). Thus it would suffice to find m linearly independent vectors v_{1}, \ldots, v_{m} \in \mathbb{F}_{2}^{n} such that for any two different vectors u, v \in \mathrm{span}\{v_{1},\ldots, v_{m}\} we have d_{H}(u,v)\geq \ell+1, i.e., the Hamming distance d_{H} between u and v is at least \ell+1. Our goal is to minimize the number \frac{1}{2^{m}}; therefore, given n>2 and \ell+1, we would like to maximize m. Since d_{H}(u,v)=d_{H}(u-v,0), it suffices to find a subspace C \subset \mathbb{F}_{2}^{n} of the largest dimension k (i.e., k linearly independent vectors in \mathbb{F}_{2}^{n}) such that d_{H}(v,0)\geq \ell+1 for any nonzero v \in C.
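The correspondence between sets and points of \mathbb{F}_{2}^{n} can be sketched with integer bitmasks (an illustrative encoding; `set_to_mask` and `hamming_weight` are helper names chosen here, not from the text):

```python
def set_to_mask(S):
    # encode a subset S of {1,...,n} as a bitmask: bit j-1 is set iff j is in S
    mask = 0
    for j in S:
        mask |= 1 << (j - 1)
    return mask

def hamming_weight(mask):
    # number of ones = cardinality of the encoded set
    return bin(mask).count("1")

S1, S2 = {1, 2, 3}, {2, 3, 4}
m1, m2 = set_to_mask(S1), set_to_mask(S2)

# symmetric difference of sets corresponds to XOR (addition mod 2) of masks
assert m1 ^ m2 == set_to_mask(S1 ^ S2)   # Python's ^ on sets is symmetric difference
# cardinality corresponds to Hamming weight
assert hamming_weight(m1) == len(S1)
print(hamming_weight(m1 ^ m2))           # |S1 Δ S2| = |{1,4}| = 2
```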

Definition: Let 1\leq k \leq n and 1\leq d \leq n. A subspace C \subset \mathbb{F}_{2}^{n} of dimension k such that d_{H}(v,0)\geq d for every nonzero v \in C is called a linear code and is denoted by [n,k,d]_{2}.

A major open question in coding theory is

Problem 2: Given arbitrary integers 1\leq d \leq n find the largest possible k=k(n, d) such that the linear code [n,k,d]_{2} exists.

Remark 1: Clearly the knowledge of k(n, d) would give us the upper bound \begin{aligned}\min_{f\, :\, f(0)>0 }f(0)\leq \frac{1}{2^{k(n,\ell+1)}}\end{aligned} where minimum is taken over all multilinear polynomials (1) which take values 0,1 on the Hamming cube \{-1,1\}^{n}.

For certain parameters 1 \leq d \leq n the largest possible dimension k = k(n,d) is known.

Lemma 1: We have k(n,3)=\lfloor n-\log_{2}(n+1)\rfloor, i.e., there exists a linear code [n,\lfloor n-\log_{2}(n+1)\rfloor, 3]_{2}.

Definition: Let us say that C \subset \mathbb{F}_{2}^{n} has Hamming distance at least d>0 if for any u, v \in C, u \neq v we have d_{H}(u,v)\geq d.

Proof of Lemma 1: Let C \subset \mathbb{F}_{2}^{n} be a subspace with Hamming distance at least 3 having the largest possible dimension k. Let v_{1}, \ldots, v_{k} be linearly independent vectors which span C over the field \mathbb{F}_{2}, and let h_{1}, \ldots, h_{n-k} be linearly independent vectors such that each h_{j} is orthogonal to all of v_{1}, \ldots, v_{k} (i.e., take a basis of the orthogonal complement of C in \mathbb{F}_{2}^{n}). Let H = (h_{1}, \ldots, h_{n-k}) be the n \times (n-k) matrix with these columns. Clearly v H =0 if and only if v \in C  = \mathrm{span}\{v_{1}, \ldots, v_{k}\}. In particular, if we take w of Hamming weight exactly one, say w = (1, 0, \ldots, 0), then w H \neq 0 because w \notin C. This means that no row of H is identically zero. Next, if we consider w‘s of Hamming weight 2, say w=(1,1,0,\ldots, 0), then wH\neq 0, which means that the sum of any two different rows of H is never zero. However, p+q=0 in \mathbb{F}_{2}^{n-k} if and only if p=q. Thus we conclude that all rows of H are nonzero and pairwise distinct. In addition, H has rank n-k. How large can k be? The rows of H are n distinct nonzero vectors in \mathbb{F}_{2}^{n-k}, and there are only 2^{n-k}-1 such vectors in total, so necessarily 2^{n-k}-1\geq n. Thus we have the upper bound k \leq \lfloor n-\log_{2}(n+1)\rfloor. Let us show that this upper bound is attained. Let k=\lfloor n-\log_{2}(n+1)\rfloor, so that 2^{n-k}-1\geq n. Construct H as follows: take the first row to be (1, 0, \ldots, 0), the second row (0,1,0, \ldots, 0), and so on up to the (n-k)‘th row (0,\ldots, 0,1); then take all possible rows of Hamming weight 2, then of Hamming weight 3, etc., until we have n distinct nonzero rows. Such a matrix automatically has rank n-k since its first n-k rows are the standard basis vectors. Finally, choose C to be the orthogonal complement of \mathrm{span}\{h_{1}, \ldots, h_{n-k}\} in \mathbb{F}_{2}^{n}.
Since this H has nonzero rows and any two rows are distinct, C does not contain vectors of Hamming weight 1 or 2. The lemma is proved \square.
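The construction in the proof is easy to carry out by brute force for n=7, where n-k=3 and the rows of H are exactly all nonzero vectors of \mathbb{F}_{2}^{3} (a sketch; the resulting code is the [7,4,3]_{2} Hamming code of Corollary 1 below):

```python
from itertools import product

n = 7          # here 2^(n-k) - 1 = n exactly, so n - k = 3
r = 3          # n - k parity coordinates; dimension k = 4
# rows of H: the n distinct nonzero vectors of F_2^r, as in the proof
H = [tuple((j >> i) & 1 for i in range(r)) for j in range(1, n + 1)]

# C = {v in F_2^n : vH = 0 (mod 2)}, found by exhausting all 2^7 vectors
code = []
for v in product((0, 1), repeat=n):
    if all(sum(v[i] * H[i][t] for i in range(n)) % 2 == 0 for t in range(r)):
        code.append(v)

weights = sorted(sum(v) for v in code)
print(len(code), weights[1])   # 16 codewords; smallest nonzero weight is 3
```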

Corollary 1 (Hamming code): For any k \geq 2 there exists a linear code [2^{k}-1, 2^{k}-k-1, 3]_{2}.

Definition: Let B \subset \{1, \ldots, n\}. Denote by f_{B} any multilinear polynomial of the form \begin{aligned} f(x_{1}, \ldots, x_{n}) = \sum_{\substack{S\subset \{1, \ldots, n\},\\  \, |S|\notin B}} a_{S}\prod_{j \in S}x_{j}, \quad a_{S} \in \mathbb{R},\end{aligned} such that when restricted to \{-1,1\}^{n} it takes values 0 or 1.

Corollary 2. For any n \geq 2 we have

\begin{aligned} \frac{2^{\lceil \log_{2}(n+1) \rceil }}{2^{n}}\geq \min_{f_{\{1,2\}}, \, f(0)>0} f(0)\geq \min_{f_{\{2\}}, \, f(0)>0} f(0)\geq \frac{n}{2^{n}}\end{aligned}.

The picture represents the graph of \varphi(n) = 2^{\lceil \log_{2}(n+1) \rceil} - n for 1\leq n \leq 200. We also see that \varphi(2^{\ell}-1)=1 for all \ell \geq 1.

Gilbert-Varshamov bound for linear codes.

In what follows we will find good lower bounds for k(n,d) for an arbitrary d \geq 3.

Theorem (Gilbert-Varshamov): Let n> d \geq 2. There exists a linear code [n,\lfloor-\log_{2}(2^{-n}+V_{n}(d))\rfloor,d]_{2}, where V_{n}(d) = \mathbb{P}(\xi_{1}+\ldots+\xi_{n}<d) and \xi_{1}, \ldots, \xi_{n} are i.i.d. Bernoulli 0,1 random variables with probability 1/2.

Proof: If V \subset \mathbb{F}_{2}^{n} is a subspace of dimension k, then we can find k linearly independent vectors v_{1}, \ldots, v_{k} which span V. Let C = (v_1, \ldots, v_k) be the n\times k matrix with these columns. Clearly V = \{Cx \, :\, x \in \mathbb{F}_{2}^{k}\}. Thus if we pick an arbitrary n \times k matrix with entries 0,1 of maximal rank, then we automatically obtain some such V. So let us look at random 0,1 matrices C of size n \times k. They are good candidates to generate such vector subspaces V, except we need to make sure that \mathrm{rank} (C) = k. As we will see, the set of n\times k Bernoulli matrices with \mathrm{rank}(C)=k is quite large compared to the set of all such matrices.

Let \mathbb{P} be the uniform probability measure over such n \times k Bernoulli 0,1 random matrices. Let us find for which k, we have

\begin{aligned} \mathbb{P} ( \exists x \in \mathbb{F}_{2}^{k}, x\neq 0,\, :\,  d_{H}(Cx, 0)<d)+\mathbb{P}(\forall x \in \mathbb{F}_{2}^{k}, x \neq 0, d_{H}(Cx,0)\geq d, \, \mathrm{rank}(C)<k)<1  \quad (1)\end{aligned}.

Indeed, if this is the case then the complement of the union of these events has nonzero probability, which means that there exists at least one C with \mathrm{rank}(C)=k such that
\forall x \in \mathbb{F}_{2}^{k}, x\neq 0 we have d_{H}(Cx, 0)\geq d.

Let us estimate the second term trivially as \begin{aligned}\mathbb{P}(\forall x \in \mathbb{F}_{2}^{k}, x \neq 0, d_{H}(Cx,0)\geq d, \, \mathrm{rank}(C)<k)\leq \mathbb{P}(\mathrm{rank}(C)<k) = 1-\mathbb{P}(\mathrm{rank}(C)=k)\end{aligned}

Lemma 1: Let n \geq k \geq 1. Let C be n \times k Bernoulli 0,1 (with probability 1/2) matrix with i.i.d. entries. Then \begin{aligned}\mathbb{P}(\mathrm{rank}(C)=k) = \prod_{j=n-k+1}^{n}\left(1-\frac{1}{2^{j}}\right)\end{aligned}.

Remark: If one asks the same question with linear independence over the field \mathbb{R} instead of \mathbb{F}_{2} then it seems to be a difficult problem. The best results in this direction are due to Konstantin Tikhomirov, see Singularity of random Bernoulli matrices.

Proof of Lemma 1: What is the probability that the first column of C, call it v, is nonzero? The answer is \frac{2^{n}-1}{2^{n}}. This column spans a 1-dimensional subspace of \mathbb{F}_{2}^{n}. What is the probability that the second column falls outside \mathrm{span}\{v\}? In total we have 2^{n} vectors, and \mathrm{span}\{v\}, as a 1-dimensional subspace, contains 2^{1} vectors. So the probability of picking the second column outside the span is \frac{2^{n}-2}{2^{n}}. Now the first two columns span a 2-dimensional subspace; therefore the probability of picking the third column outside the span of the first two columns is \frac{2^{n}-2^{2}}{2^{n}}, etc. Finally, we obtain \begin{aligned} \mathbb{P}(\mathrm{rank}(C)=k) =  \prod_{j=0}^{k-1} \left(\frac{2^{n}-2^{j}}{2^{n}}\right) = \prod_{j=n-k+1}^{n}\left(1-\frac{1}{2^{j}}\right)\end{aligned}. The lemma is proved \square.
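The formula can be sanity-checked by Monte Carlo (a sketch; `rank_f2` is a small Gaussian elimination over \mathbb{F}_{2} written for this illustration, with columns stored as integer bitmasks):

```python
import random

def rank_f2(rows, ncols):
    # Gaussian elimination over F_2; rows are integers (bitmasks of length ncols)
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def full_rank_prob_formula(n, k):
    # prod_{j=n-k+1}^{n} (1 - 2^{-j}) from Lemma 1
    p = 1.0
    for j in range(n - k + 1, n + 1):
        p *= 1 - 2.0 ** (-j)
    return p

random.seed(0)
n, k, trials = 6, 4, 20_000
hits = 0
for _ in range(trials):
    # an n x k Bernoulli matrix, stored as k columns, each an n-bit integer
    cols = [random.getrandbits(n) for _ in range(k)]
    if rank_f2(cols, n) == k:
        hits += 1
print(hits / trials, full_rank_prob_formula(n, k))  # the two numbers should be close
```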

Thus we can estimate the second term in (1) by 1-\prod_{j=n-k+1}^{n}\left(1-\frac{1}{2^{j}}\right).

Exercise: Let n \geq k \geq 1. Show that 1-\prod_{j=n-k+1}^{n}\left(1-\frac{1}{2^{j}}\right) \leq \frac{1}{2^{n -k}}.
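The inequality in the exercise is easy to sanity-check numerically before proving it (a sketch; the 1e-12 slack only guards against floating-point rounding):

```python
def lhs(n, k):
    # 1 - prod_{j=n-k+1}^{n} (1 - 2^{-j})
    p = 1.0
    for j in range(n - k + 1, n + 1):
        p *= 1 - 2.0 ** (-j)
    return 1 - p

# check the claimed bound lhs(n, k) <= 2^{-(n-k)} on a small grid
for n in range(1, 12):
    for k in range(1, n + 1):
        assert lhs(n, k) <= 2.0 ** (-(n - k)) + 1e-12
print("inequality verified for all 1 <= k <= n <= 11")
```

The proof itself follows the same line as the check suggests: 1-\prod(1-a_{j}) \leq \sum a_{j} with a_{j}=2^{-j}.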

Next, let us estimate the first term. First we will use the union bound

\begin{aligned} \mathbb{P} ( \exists x \in \mathbb{F}_{2}^{k}, x\neq 0,\, :\,  d_{H}(Cx, 0)<d) \leq \sum_{x \in \mathbb{F}_{2}^{k}, x \neq 0}\mathbb{P} (d_{H}(Cx, 0)<d) \end{aligned}

Since x \neq 0, we work over the field \mathbb{F}_{2}, and the entries of C are i.i.d. Bernoulli 0,1 with probability 1/2, we see that the coordinates of Cx are \xi_{1}, \ldots, \xi_{n}, i.i.d. Bernoulli 0,1 with probability 1/2 (whose distribution does not depend on x). Thus we have

\begin{aligned} \mathbb{P} (d_{H}(Cx, 0)<d) = \mathbb{P}(\xi_{1}+\ldots +\xi_{n}<d) \end{aligned} where \xi_{1}, \ldots, \xi_{n} are i.i.d. Bernoulli 0, 1 random variables with probability 1/2. Finally we get the bound:

\begin{aligned} \mathbb{P} ( \exists x \in \mathbb{F}_{2}^{k}, x\neq 0,\, :\,  d_{H}(Cx, 0)<d)+\mathbb{P}(\forall x \in \mathbb{F}_{2}^{k}, x \neq 0, d_{H}(Cx,0)\geq d, \, \mathrm{rank}(C)<k)  \leq\end{aligned}
\begin{aligned}(2^{k}-1)\mathbb{P}(\xi_{1}+\ldots +\xi_{n}<d)+\frac{1}{2^{n-k}} <2^{k}[\mathbb{P}(\xi_{1}+\ldots +\xi_{n}<d)+2^{-n}] \end{aligned}, and the right-hand side is at most 1 for k = \lfloor-\log_{2}(2^{-n}+V_{n}(d))\rfloor. This finishes the proof of the theorem \square.
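The dimension guaranteed by the theorem is easy to evaluate for concrete parameters (a sketch; we use V_{n}(d) = 2^{-n}\sum_{i<d}\binom{n}{i}, so 2^{-n}+V_{n}(d) = (1+\sum_{i<d}\binom{n}{i})/2^{n}):

```python
from math import comb, floor, log2

def gv_dimension(n, d):
    # floor(-log2(2^{-n} + V_n(d))) with V_n(d) = 2^{-n} * sum_{i<d} C(n, i)
    ball = sum(comb(n, i) for i in range(d))
    return floor(n - log2(1 + ball))

for n, d in [(7, 3), (15, 3), (23, 7)]:
    print(n, d, gv_dimension(n, d))
```

Note that the Gilbert–Varshamov bound is only a lower bound on k(n,d): e.g. for n=7, d=3 it gives dimension 2, while the Hamming code achieves 4.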

Remark: The reader with probabilistic background may notice that the proof of the Gilbert-Varshamov bound (1952) for linear codes is essentially the same probabilistic argument as the one for the Johnson–Lindenstrauss lemma (1984).

Corollary 3 (Uncertainty Principle): Let n \geq \ell \geq 1. There exists f : \{-1,1\}^{n} \to \{0,1\}, not identically zero, with a_{S}(f)=0 whenever 1\leq |S| \leq \ell, such that
\mathbb{E} f \leq 2^{\lceil \log_{2}(2^{-n}+V_{n}(\ell+1)) \rceil}, where V_{n}(d) = \mathbb{P}(\xi_{1}+\ldots+\xi_{n}<d) denotes the volume of the Hamming ball of radius d (here \xi_{1}, \ldots, \xi_{n} are i.i.d. Bernoulli 0,1 with probability 1/2).


Hadamard Conjecture, Uncertainty Principle, and the Hamming Code

Back in October 2018, Asaf Ferber from MIT (now my colleague at UCI) gave an interesting talk at our probability seminar at UC Irvine, based on his paper “On the number of Hadamard matrices via anti-concentration”, joint with V. Jain and Y. Zhao. Surprisingly, this was the first time I learned about the Hadamard conjecture. After the talk I had a series of discussions with Wencai Liu about this interesting question, and we found some (to us) unexpected connections of the conjecture with the uncertainty principle in harmonic analysis and the Hamming codes.

Hadamard Conjecture

Can one find n vectors in \mathbb{R}^{n}, say v_{1}, \ldots, v_{n}, such that the dot products v_{i}\cdot  v_{j} =0 for all i, j =1, \ldots, n, i \neq j? Well, one can just choose (1, 0, \ldots, 0), (0,1,0,\ldots, 0), …, (0,\ldots, 0,1).

On the other hand, if we require that all vectors v_{1}, \ldots, v_{n} must have coordinates either 1 or -1 then it becomes an interesting problem. For n=2 we could just take (1,1), and (1,-1). For n=3 after trying different possibilities one realizes that this is impossible. However, if we try n=4 then after a moment of thinking we see that (1,1,1,1), (1,-1,1,-1), (1,1,-1,-1) and (1,-1,-1,1) works.
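The n=4 example is easy to check directly (a minimal sketch):

```python
vectors = [
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# pairwise dot products of distinct vectors all vanish
assert all(dot(vectors[i], vectors[j]) == 0
           for i in range(4) for j in range(4) if i != j)
print("the four vectors are pairwise orthogonal")
```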

It is easy to see that if such vectors exist then n must be divisible by 2. In fact one can show

Exercise 1: Let n>2. If v_{1}, \ldots, v_{n} \in \mathbb{R}^{n} are vectors with coordinates \pm 1 such that v_{i} \cdot v_{j}=0 for all i\neq j, 1\leq i,j\leq n, then n=4k for some positive integer k\geq 1. [Hint: show that if there are 3 such orthogonal vectors in \mathbb{R}^{n} then n must be divisible by 4]

The Hadamard conjecture states that the converse is also true: if n=4k then such vectors exist. The conjecture has been verified for all n=4k with n \leq 664 (the smallest open case is n=668), for n=2^{\ell}, and for certain other families of n‘s.

Uncertainty Principle

The set \{-1,1\}^{n} is called the Hamming cube of dimension n. It consists of vectors x=(x_{1}, \ldots, x_{n}) with x_{j}= \pm 1. Any function f : \{-1,1\}^{n} \to \mathbb{R} can be decomposed into an orthogonal sum, the Fourier–Walsh expansion f(x) = \sum_{S \subset \{1,\ldots, n\}} a_{S} \prod_{j \in S} x_{j}, where the summation runs over all subsets S \subset \{1,\ldots, n\}, the a_{S}=a_{S}(f) are called the Fourier coefficients of f, and the \prod_{j\in S} x_{j} =: W_{S}(x) are called Walsh functions. Define the average of f :\{-1,1\}^{n} \to \mathbb{R} as \mathbb{E} f = \frac{1}{2^{n}} \sum_{x \in \{-1,1\}^{n}} f(x). Then, with respect to the inner product \mathbb{E} fg, the system \{ W_{S}\}_{S \subset \{1,\ldots, n\}} is an orthonormal system in L^{2} (\{-1,1\}^{n}) equipped with the uniform probability measure. In other words, one can show that

\mathbb{E} W_{S} W_{S'} =\begin{cases}1 & S=S',\\ 0 & S\neq S'. \end{cases}
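This orthonormality can be verified by brute force for small n (a sketch; subsets S are represented as tuples of 0-based indices):

```python
from itertools import product, combinations, chain

n = 3
cube = list(product((-1, 1), repeat=n))
subsets = list(chain.from_iterable(combinations(range(n), r) for r in range(n + 1)))

def walsh(S, x):
    # W_S(x) = prod_{j in S} x_j
    out = 1
    for j in S:
        out *= x[j]
    return out

def expect(g):
    # E g = average of g over the Hamming cube
    return sum(g(x) for x in cube) / len(cube)

for S in subsets:
    for T in subsets:
        e = expect(lambda x: walsh(S, x) * walsh(T, x))
        assert e == (1 if S == T else 0)
print("Walsh system is orthonormal for n =", n)
```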

Then it follows that a_{S} = \mathbb{E} f W_{S}. Notice that in total we have 2^{n} numbers a_{S}, and f takes at most 2^{n} values. So one may view it as a linear algebra problem to see that there is a one-to-one correspondence between multilinear polynomials x \mapsto  \sum_{S \subset \{1,\ldots, n\}} a_{S} \prod_{j \in S} x_{j} and functions f :\{-1,1\}^{n} \to \mathbb{R}.

Proof: Indeed, given an arbitrary f :\{-1,1\}^{n} \to \mathbb{R}, let us show that the system f(x) = \sum_{S \subset \{1,\ldots, n\}} a_{S}W_{S}(x), x \in \{-1,1\}^{n}, has a unique solution \{a_{S}\}_{S \subset \{1, \ldots, n\}}, a_{S} \in \mathbb{R}. This is easy to see by induction. When n=1, we have the polynomial a_{\emptyset}+ a_{1} x_{1}. Since x_{1} takes only two values \pm 1, we get the system of equations \begin{cases}a_{\emptyset} + a_{1} = f(1)\\ a_{\emptyset} - a_{1} = f(-1) \end{cases} which has a unique solution in the variables \{a_{\emptyset}, a_{1}\} because \det \begin{pmatrix} 1 &  1 \\ 1 & -1 \end{pmatrix} \neq 0.

In general, for n>1, instead of \begin{pmatrix} 1 &  1 \\ 1 & -1 \end{pmatrix} we should look at the matrix of size 2^{n}\times 2^{n} with rows \{ W_{S}(x) \}_{S \subset \{1, \ldots, n\}}, x \in \{-1,1\}^{n}. Next, in the row \{ W_{S}(x) \}_{S \subset \{1, \ldots, n\}} we can rearrange the terms: first put those monomials W_{S}(x) which do not contain the variable x_{1}, and then the rest. So we can write the rows of our matrix as \{ W_{S}(x) \}_{S \subset \{1, \ldots, n\}} = \{ \{ W_{S}(x)\}_{S \subset \{2, \ldots, n\}}, x_{1} \{W_{S}(x)\}_{S \subset \{2, \ldots, n\}}\}. Therefore, if we denote the matrix with rows \{ W_{S}(x)\}_{S \subset \{2, \ldots, n\}} by A (which has nonzero determinant by the induction assumption), then our matrix will be of the form \begin{pmatrix} A  & A \\ A & -A \end{pmatrix}, whose determinant (by Gaussian elimination, subtracting the first block row from the second) equals \det(A)\det(-2A) = (-2)^{2^{n-1}} \det(A)^{2} \neq 0. \square

Remark: Notice an interesting relation: if we look at the rows of the matrix A= \begin{pmatrix} 1 &  1 \\ 1 & -1 \end{pmatrix} as vectors in \mathbb{R}^{2}, pairwise orthogonal with coordinates \pm 1, then the rows of the new matrix \begin{pmatrix} A  & A \\ A & -A \end{pmatrix} will be vectors in \mathbb{R}^{4}, pairwise orthogonal with coordinates \pm 1. In this way we can easily generate such orthogonal vectors in \mathbb{R}^{n} for n=2^{\ell}.
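This recursive (Sylvester-type) construction is a few lines of code (a sketch; `sylvester` is a name chosen here for the doubling step \begin{pmatrix} A & A \\ A & -A \end{pmatrix}):

```python
def sylvester(ell):
    # builds the 2^ell x 2^ell matrix by repeatedly forming [[A, A], [A, -A]],
    # starting from the 1 x 1 matrix [[1]]
    H = [[1]]
    for _ in range(ell):
        H = [row + row for row in H] + [row + [-a for a in row] for row in H]
    return H

H = sylvester(3)   # an 8 x 8 matrix with +/-1 entries
size = len(H)
assert all(sum(H[i][t] * H[j][t] for t in range(size)) == 0
           for i in range(size) for j in range(size) if i != j)
print("rows of the 8 x 8 matrix are pairwise orthogonal")
```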

Question 1: Let f :\{-1,1\}^{n} \to \{0,1\} be not identically zero and such that a_{S}(f)=0 for all S \subset \{1, \ldots, n\} with |S| \in B \subset \{1, \ldots, n\}, where B is some fixed subset. How small can the size of the support of f be?

By support we mean the number of points x \in \{-1,1\}^{n} such that f(x) \neq 0.

In other words, if many Fourier coefficients of f are zero, how small/localized can the function itself be? In analysis these types of questions are regarded as uncertainty principles.

Next, we consider the simplest cases, when B is a single-element subset of \{1, \ldots, n\}.

Case B = \{0\} (i.e., a_{\emptyset}=0). This case is not interesting because a_{\emptyset} = \mathbb{E} f =0 cannot happen for a nonzero f with values in \{0,1\}.

Case B = \{1\}. In this case we have a_{\{j\}}(f)=0 for all j=1,\ldots,n. Suppose f is supported at the points u_{1}, \ldots, u_{k} \in \{-1,1\}^{n}. Then a_{\{j\}}(f) = \mathbb{E} fW_{\{j\}} = \frac{1}{2^{n}} \sum_{\ell=1}^{k} W_{\{j\}}(u_{\ell})=0. The latter equality means that the sum of the j‘th coordinates of the vectors u_{1}, \ldots, u_{k} is zero. Clearly the smallest such k is k=2, and we can choose u_{1}=(1, \ldots, 1), u_{2} = (-1, \ldots, -1).

Case B =\{ 2\}. Let us show that this case is no different from the Hadamard conjecture. Indeed, suppose f is nonzero at the points u_{1}, \ldots, u_{k} \in \{-1, 1\}^{n}. Then for any S \subset \{1, \ldots, n\} with |S|=2 we have a_{S}(f) = \frac{1}{2^{n}} \sum_{\ell=1}^{k} W_{S}(u_{\ell})=0. This means that if we take the \{i,j\}= S coordinates of each vector u_{\ell}, multiply them by each other, and sum up over all vectors, we get 0. Consider the n \times k matrix A with columns u_{1}, \ldots, u_{k}. Then the above condition simply means that the rows of A are pairwise orthogonal. If n nonzero vectors in \mathbb{R}^{k} are pairwise orthogonal then necessarily n \leq k. The Hadamard conjecture says that we can choose k=n if and only if n is divisible by 4.

Theorem 1. There exists not identically zero f :\{-1,1\}^{n} \to \{0,1\} such that all its second order Fourier coefficients are zero with |\mathrm{supp}(f)|=k if and only if there exist pairwise orthogonal v_{1}, \ldots, v_{n} in \mathbb{R}^{k} with coordinates \pm 1.

Proof: We already proved that the existence of such a function implies the existence of such vectors. Let us prove the converse. Let v_{1}, \ldots, v_{n} be pairwise orthogonal vectors in \mathbb{R}^{k} with coordinates \pm 1. Consider the n \times k matrix A with rows v_{1}, \ldots, v_{n}. Let u_{1}, \ldots, u_{k} be the columns of A. Define f : \{-1,1\}^{n} \to \{0,1\} so that f(u_{j})=1 for all j=1, \ldots, k, and f=0 otherwise. Clearly |\mathrm{supp}(f)|=k. On the other hand, for any S \subset \{1, \ldots, n\} with |S|=2 we have a_{S}(f) = \mathbb{E} f W_{S} = \frac{1}{2^{n}} \sum_{\ell=1}^{k} W_{S}(u_{\ell}). The latter sum is zero because the product of the \{i,j\} = S coordinates of the vectors u_{\ell}, summed over all \ell=1, \ldots, k, is exactly the inner product of the orthogonal vectors v_{i} and v_{j}. \square
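The construction in the converse direction can be sketched for the n=k=4 Hadamard matrix from the beginning of the post (illustrative helper names; `coeff` computes a_{S} = \mathbb{E} f W_{S} by brute force):

```python
from itertools import product, combinations

# rows: pairwise orthogonal +/-1 vectors (the n = k = 4 Hadamard case)
A = [
    (1, 1, 1, 1),
    (1, -1, 1, -1),
    (1, 1, -1, -1),
    (1, -1, -1, 1),
]
n = 4
columns = list(zip(*A))          # the support points u_1, ..., u_4
support = set(columns)

def f(x):
    return 1 if x in support else 0

def coeff(S):
    # a_S = E f W_S, averaged over the Hamming cube {-1,1}^n
    total = 0
    for x in product((-1, 1), repeat=n):
        w = 1
        for j in S:
            w *= x[j]
        total += f(x) * w
    return total / 2 ** n

assert all(coeff(S) == 0 for S in combinations(range(n), 2))
print("all second-order Fourier coefficients vanish; support size =", len(support))
```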

Remark: Let n\geq 2. Assuming the Hadamard Conjecture it then follows that the smallest possible size of the support of not identically zero f : \{-1,1\}^{n} \to \{0,1\} whose all second order Fourier coefficients are zero is the least possible integer k=k(n) \geq n which is divisible by 4.

Hamming code

By Theorem 1 it suffices to study the following problem

Problem 1. Let n\geq 2. Consider an arbitrary multilinear polynomial

\begin{aligned} f(x_{1}, \ldots, x_{n}) = \sum_{\substack{S \subset \{1, \ldots, n\} \\ |S|\neq 2}} a_{S} \prod_{j \in S} x_{j}, \quad a_{S} \in \mathbb{R}\end{aligned}

so that when restricted to \{-1,1\}^{n} it takes values in \{0,1\}. What can be the smallest positive value for f(0) ?

Notice that |\mathrm{supp}(f)| = 2^{n}\mathbb{E} f = 2^{n} a_{\emptyset}=2^{n}f(0), and since we require in the problem that f(0)>0 it excludes the case when f is identically zero.

If n=2 we can take f(x) =\frac{1+x_{1}}{2}. Then |\mathrm{supp}(f)|  = 2^{n} \cdot \frac{1}{2} =  2.

If n=3 we still can take f(x) =\frac{1+x_{1}}{2}. Then |\mathrm{supp}(f)|  = 2^{n} \cdot \frac{1}{2} =  4.

If n=4 we should take f(x) = \frac{1}{2^{2}}(1+x_{1})(1+x_{2}x_{3}x_{4}). Clearly such f does not have second order monomials. Then |\mathrm{supp}(f)| = 2^{n} \cdot \frac{1}{2^{2}}=4. Since we always have the inequality |\mathrm{supp}(f)| \geq n this choice is the best possible.

If n=5 then f(x) =  \frac{1}{2^{2}}(1+x_{1})(1+x_{2}x_{3}x_{4}) gives |\mathrm{supp}(f)| =8.

If n=8 then f(x) = \frac{1}{2^{5}}(1+x_{1})(1+x_{2}x_{3}x_{4})(1+x_{5}x_{6}x_{7}x_{8})(1+x_{2}x_{3}x_{6}x_{7})(1+x_{3}x_{4}x_{7}x_{8}) gives |\mathrm{supp}(f)| =8.
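The n=8 example can be verified exhaustively (a sketch; the 2^{8} points are enumerated and the second-order coefficients a_{S} = \mathbb{E} f W_{S} computed directly):

```python
from itertools import product, combinations

def f(x):
    # x is padded so that x[1], ..., x[8] match the formula; x[0] is unused
    factors = [
        1 + x[1],
        1 + x[2] * x[3] * x[4],
        1 + x[5] * x[6] * x[7] * x[8],
        1 + x[2] * x[3] * x[6] * x[7],
        1 + x[3] * x[4] * x[7] * x[8],
    ]
    out = 1
    for t in factors:
        out *= t
    return out / 2 ** 5

cube = [(0,) + p for p in product((-1, 1), repeat=8)]
values = [f(x) for x in cube]
assert set(values) == {0.0, 1.0}     # f is {0,1}-valued on the cube
assert sum(values) == 8              # |supp(f)| = 8

# all second-order Fourier coefficients vanish
for i, j in combinations(range(1, 9), 2):
    a = sum(f(x) * x[i] * x[j] for x in cube) / 2 ** 8
    assert a == 0
print("f is {0,1}-valued with support 8 and no degree-2 terms")
```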

It is a good question to see in general how much we can factorize these polynomials.

Proposition 1: Let \ell>1, and let n=2^{\ell}. There exists a polynomial of the form \begin{aligned} f(x) = \frac{1}{2^{n-\ell}}\prod_{j=1}^{n-\ell}\left(1+\prod_{k \in S_{j}} x_{k}\right) \stackrel{\{\pm 1\}^{n}}{=} \frac{1}{2^{n-\ell}} + \sum_{\substack{S\subset \{1, \ldots, n\}\\ |S|>0, |S|\neq 2}} a_{S} \prod_{j \in S}x_{j}\end{aligned} for some S_{1}, \ldots, S_{n-\ell} \subset \{1, \ldots, n\}, so that when restricted to the Hamming cube \{-1,1\}^{n} it takes values 0 or 1.

Remark: notice that the symbol \stackrel{\{\pm 1\}^{n}}{=} means equality on the Hamming cube, i.e., we should take into account the identities x_{j}^{2}=1 for all j=1, \ldots, n.

Remark: notice that |\mathrm{supp}(f)| = 2^{n} \mathbb{E} f = 2^{\ell} =n. In particular, Proposition 1 together with Theorem 1 gives one more solution of the Hadamard conjecture in the case n=2^{\ell}.

Proof of Proposition 1: We shall find f in the form f = \frac{1}{2^{n-\ell}}(1+x_{1}) \prod_{j=1}^{n-\ell-1} (1+W_{S_{j}}(x)), i.e., S_{n-\ell}=\{1\}. The factor (1+x_{1}) can be justified as follows: if v_{1}, \ldots, v_{n} are pairwise orthogonal vectors in \mathbb{R}^{n} with coordinates \pm 1, then without loss of generality we can assume that v_{1} = (1, \ldots, 1). Indeed, consider the matrix A with rows v_{1}, \ldots, v_{n}: if we permute columns or rows, or multiply a column or a row by -1, then the rows of the new matrix still have the orthogonality property, and by such operations we can make the first row equal to (1, \ldots, 1). This corresponds to the fact that whenever x_{1}=-1 we have f=0. Thus we can always assume that f = \frac{1}{2^{n-\ell}}(1+x_{1}) \prod_{j=1}^{n-\ell-1} (1+W_{S_{j}}(x)).

Notice that W_{S_{j}} W_{S_{i}} \stackrel{\{\pm 1\}^{n}}{=} W_{S_{i}\Delta S_{j}} where S_{i}\Delta S_{j} is the symmetric difference of the sets S_{i}, and S_{j}. Therefore

\begin{aligned} \prod_{j=1}^{n-\ell-1} (1+W_{S_{j}}(x)) \stackrel{\{\pm 1\}^{n}}{=} 1+ \sum_{j} W_{S_{j}}+\sum_{i<j} W_{S_{i}\Delta S_{j}} +\sum_{i<j<k} W_{S_{i}\Delta S_{j}\Delta S_{k}}+\ldots\end{aligned}

Therefore, let us look for those subsets S_{1}, \ldots, S_{n-\ell-1} \subset \{2, \ldots, n\} which have cardinality |S_{j}|\geq 3, and such that symmetric difference of any finite number of them has cardinality at least 3. If such family of sets exists then f = \frac{1}{2^{n-\ell}}(1+x_{1}) \prod_{j=1}^{n-\ell-1} (1+W_{S_{j}}(x))\stackrel{\{\pm 1\}^{n}}{=} \frac{1}{2^{n-\ell}} + \sum_{\substack{S\subset \{1, \ldots, n\}\\ |S|>0, |S|\neq 2}} a_{S} \prod_{j \in S}x_{j} is the desired polynomial. It remains to prove the following

Lemma 1: Let \ell \geq 2. There exist subsets S_{1}, \ldots, S_{2^{\ell}-\ell-1} \subset \{1, \ldots, 2^{\ell}-1\} having cardinality at least 3 and such that any finite number of symmetric differences of these sets have cardinality at least 3.

Proof of Lemma 1: it is a good idea sometimes to go from sets to corresponding points in \{0,1\}^{m}. For example, any subset of \{1, \ldots, 2^{\ell}-1\} can be identified in a unique way with a point in \{0,1\}^{2^{\ell}-1}. Indeed, let S \subset \{1, \ldots, 2^{\ell}-1\}. Define x=(x_{1}, \ldots, x_{2^{\ell}-1}) \in \{0,1\}^{2^{\ell}-1} so that x_{j}=1 if j \in S, and x_{j}=0 otherwise. This gives one-to-one correspondence.

Notice that the symmetric difference of two sets S, W \subset  \{1, \ldots, 2^{\ell}-1\} corresponds to addition mod 2 of the corresponding points in \{0,1\}^{2^{\ell}-1}. Therefore we would like to find 2^{\ell}-\ell-1 points in \{0,1\}^{2^{\ell}-1} with Hamming distance at least 3 from the origin, and such that the Hamming distance of the sum mod 2 of any finite number of them to the origin is also at least 3. The existence of such a set of points simply follows from the existence of the Hamming code [2^{\ell}-1, 2^{\ell}-\ell-1, 3]_{2}. Let us elaborate on this a little bit. The existence of a binary linear code [n,k,d]_{2} means the following: consider \{0,1\}^{n} as a vector space with addition mod 2; then there exists a linear subspace C \subset \{0,1\}^{n} of dimension k such that the Hamming distance from any nonzero point x \in C to the origin is at least d. Therefore the existence of the linear Hamming code [2^{\ell}-1, 2^{\ell}-\ell-1, 3]_{2} implies that we can choose 2^{\ell}-\ell-1 linearly independent vectors in C \subset \{0,1\}^{2^{\ell}-1} such that the Hamming distance from the origin to the sum of any finite number of these vectors is at least 3. The lemma is proved \square. This finishes the proof of the proposition \square.
