
Principles of operator algebras

Teo Banica Department of Mathematics, University of Cergy-Pontoise, F-95000 Cergy-Pontoise, France. [email protected]
Abstract.

This is an introduction to the algebras $A\subset B(H)$ that the linear operators $T:H\to H$ can form, once a complex Hilbert space $H$ is given. Motivated by quantum mechanics, we are mainly interested in the von Neumann algebras, which are stable under taking adjoints, $T\to T^{*}$, and are weakly closed. When the algebra has a trace $tr:A\to\mathbb{C}$, we can think of it as being of the form $A=L^{\infty}(X)$, with $X$ being a quantum measured space. Of particular interest is the free case, where the center of the algebra reduces to the scalars, $Z(A)=\mathbb{C}$. Following von Neumann, Connes, Jones, Voiculescu and others, we discuss the basic properties of such algebras $A$, and how to do algebra, geometry, analysis and probability on the underlying quantum spaces $X$.

Key words and phrases:
Linear operator, Operator algebra
2010 Mathematics Subject Classification:
46L10

Preface

Quantum mechanics as we know it is the source of many puzzling questions. The simplest quantum mechanical system is the hydrogen atom, consisting of a negative charge, an electron, moving around a positive charge, a proton. This is reminiscent of electrodynamics, and accepting the fact that the electron is a bit of a slippery particle, whose position and speed are described by probability, rather than by exact formulae, the hydrogen atom can indeed be solved, by starting with electrodynamics, and making a long series of corrections, for the most part coming from experiments, but sometimes coming as well from intuition, with the idea in mind that beautiful mathematics should correspond to true physics. The solution, as we presently know it, is something quite complicated.


Mathematically, the commonly accepted belief is that the good framework for the study of quantum mechanics is an infinite dimensional complex Hilbert space $H$, whose vectors can be thought of as being states of the system, and with the linear operators $T:H\to H$ corresponding to the observables. This is however to be taken with care, because in order to do “true physics”, things must be far sharper than that. Always remember indeed that the simplest object of quantum mechanics is the hydrogen atom, whose simplest states and observables are something quite complicated. Thus when talking about “states and observables”, we have a whole continuum of possible considerations and theories, ranging from true physics to very abstract mathematics.


To make things worse, even the existence and exact relevance of the Hilbert space $H$ is subject to debate. This is something more philosophical, related to the 2-body hydrogen problem evoked above, which has twisted the minds of many scientists, starting with Einstein and others. Can we get someday to a better quantum mechanics, by adding more variables to those available inside $H$? No one really knows the answer here.


The present book is an introduction to the algebras $A\subset B(H)$ that the bounded linear operators $T:H\to H$ can form, once a Hilbert space $H$ is given. There has been an enormous amount of work on such algebras, starting with von Neumann in the 1930s, and we will insist here on the aspects which are beautiful. With the idea, or rather hope in mind, that beautiful mathematics should correspond to true physics.


So, what is beauty, in the operator algebra framework? In our opinion, the source of all possible beauty is an old result of von Neumann, related to the Spectral Theorem for normal operators, which states that any commutative von Neumann algebra $A\subset B(H)$ must be of the form $A=L^{\infty}(X)$, with $X$ being a measured space.


This is something subtle and interesting, which suggests doing several things with the von Neumann algebras $A\subset B(H)$. Given such an algebra we can write the center as $Z(A)=L^{\infty}(X)$, we have then a decomposition of type $A=\int_{X}A_{x}dx$, and the problem is that of understanding the structure of the fibers, called “factors”. This is what von Neumann himself, and then Connes and others, did. Another idea, more speculative, following later work of Connes, and in parallel work of Voiculescu, is that of writing $A=L^{\infty}(X)$, with $X$ being an abstract “quantum measured space”, and then trying to understand the geometry and probabilistic theory of $X$. Finally, yet another beautiful idea, due this time to Jones, is that of looking at the inclusions $A_{0}\subset A_{1}$ of von Neumann algebras, instead of at the von Neumann algebras themselves, the point being that the “symmetries” of such an inclusion lead to interesting combinatorics.


All in all, there are many things that can be done with a von Neumann algebra $A\subset B(H)$, and explaining the basics, plus having a look at the above 4 directions of research, is already what a medium sized book can cover. And this book is written exactly with this idea in mind. We will talk about all the above, keeping things as simple as possible, and with everything being accessible with a minimal knowledge of undergraduate mathematics.


The book is organized in 4 parts, with Part I explaining the basics of operator theory, Part II explaining the basics of operator algebras, with a look into geometry and probability too, then Part III going into the structure of the von Neumann factors, and finally Part IV being an introduction to the subfactor theory of Jones.


This book contains, besides the basics of the operator algebra theory, some modern material as well, namely quantum group illustrations for pretty much everything, and I am grateful to Julien Bichon, Benoît Collins, Steve Curran and the others, for our joint work. Many thanks go as well to my cats. Their views and opinions on mathematics, and knowledge of advanced functional analysis, have always been of great help.


Cergy, August 2024

Teo Banica

Part I Linear operators


Does anybody here remember Vera Lynn

Remember how she said that

We would meet again

Some sunny day

Chapter 1 Linear algebra

1a. Linear maps

According to various findings in physics, starting with those of Heisenberg from the early 1920s, basic quantum mechanics involves linear operators $T:H\to H$ from a complex Hilbert space $H$ to itself. The space $H$ is typically infinite dimensional, a basic example being the Schrödinger space $H=L^{2}(\mathbb{R}^{3})$ of the wave functions $\psi:\mathbb{R}^{3}\to\mathbb{C}$ of the electron. In fact, in what regards the electron, this space $H=L^{2}(\mathbb{R}^{3})$ is basically the correct one, with the only adjustment needed, due to Pauli and others, being that of tensoring with a copy of $K=\mathbb{C}^{2}$, in order to account for the electron spin.


But more on this later. Let us start this book more modestly, as follows:

Fact 1.1.

We are interested in quantum mechanics, taking place in infinite dimensions, but as a main source of inspiration we will have $H=\mathbb{C}^{N}$, with scalar product

<x,y>=\sum_{i}x_{i}\bar{y}_{i}

with the linearity at left being the standard mathematical convention. More specifically, we will be interested in the mathematics of the linear operators $T:H\to H$.

The point now, that you surely know about, is that the above operators $T:H\to H$ correspond to the square matrices $A\in M_{N}(\mathbb{C})$. Thus, as a preliminary to what we want to do in this book, we need a good knowledge of linear algebra over $\mathbb{C}$.


You probably know linear algebra well, but it is always good to recall it, and this will be the purpose of the present chapter. Let us start with the very basics:

Theorem 1.2.

The linear maps $T:\mathbb{C}^{N}\to\mathbb{C}^{N}$ are in correspondence with the square matrices $A\in M_{N}(\mathbb{C})$, with the linear map associated to such a matrix being

Tx=Ax

and with the matrix associated to a linear map being $A_{ij}=<Te_{j},e_{i}>$.

Proof.

The first assertion is clear, because a linear map $T:\mathbb{C}^{N}\to\mathbb{C}^{N}$ must send a vector $x\in\mathbb{C}^{N}$ to a certain vector $Tx\in\mathbb{C}^{N}$, all of whose components are linear combinations of the components of $x$. Thus, we can write, for certain complex numbers $A_{ij}\in\mathbb{C}$:

T\begin{pmatrix}x_{1}\\ \vdots\\ \vdots\\ x_{N}\end{pmatrix}=\begin{pmatrix}A_{11}x_{1}+\ldots+A_{1N}x_{N}\\ \vdots\\ \vdots\\ A_{N1}x_{1}+\ldots+A_{NN}x_{N}\end{pmatrix}

Now the parameters $A_{ij}\in\mathbb{C}$ can be regarded as being the entries of a square matrix $A\in M_{N}(\mathbb{C})$, and with the usual convention for matrix multiplication, we have:

Tx=Ax

Regarding the second assertion, with $Tx=Ax$ as above, if we denote by $e_{1},\ldots,e_{N}$ the standard basis of $\mathbb{C}^{N}$, then we have the following formula:

Te_{j}=\begin{pmatrix}A_{1j}\\ \vdots\\ \vdots\\ A_{Nj}\end{pmatrix}

But this gives the second formula, $<Te_{j},e_{i}>=A_{ij}$, as desired. ∎

Our claim now is that, no matter what we want to do with $T$ or $A$, of advanced type, we will run at some point into their adjoints $T^{*}$ and $A^{*}$, constructed as follows:

Theorem 1.3.

The adjoint operator $T^{*}:\mathbb{C}^{N}\to\mathbb{C}^{N}$, which is given by

<Tx,y>=<x,T^{*}y>

corresponds to the adjoint matrix $A^{*}\in M_{N}(\mathbb{C})$, given by

(A^{*})_{ij}=\bar{A}_{ji}

via the correspondence between linear maps and matrices constructed above.

Proof.

Given a linear map $T:\mathbb{C}^{N}\to\mathbb{C}^{N}$, fix $y\in\mathbb{C}^{N}$, and consider the linear form $\varphi(x)=<Tx,y>$. This form must be as follows, for a certain vector $T^{*}y\in\mathbb{C}^{N}$:

\varphi(x)=<x,T^{*}y>

Thus, we have constructed a map $y\to T^{*}y$ as in the statement, which is obviously linear, and that we can call $T^{*}$. Now by taking the vectors $x,y\in\mathbb{C}^{N}$ to be elements of the standard basis of $\mathbb{C}^{N}$, our defining formula for $T^{*}$ reads:

<Te_{i},e_{j}>=<e_{i},T^{*}e_{j}>

By reversing the scalar product on the right, this formula can be written as:

<T^{*}e_{j},e_{i}>=\overline{<Te_{i},e_{j}>}

But this means that the matrix of $T^{*}$ is given by $(A^{*})_{ij}=\bar{A}_{ji}$, as desired. ∎
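
As a quick illustration, here is a minimal numerical sketch in Python, with NumPy assumed as an outside tool (the text itself uses no software); it checks that the conjugate transpose implements the adjoint, in the sense that $<Ax,y>=<x,A^{*}y>$:

```python
# Minimal check that the conjugate transpose is the adjoint matrix,
# for the scalar product <x,y> = sum_i x_i conj(y_i), linear at left.
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def scalar(u, v):
    # the convention of the text: linear at left, antilinear at right
    return np.sum(u * np.conj(v))

A_star = A.conj().T  # (A*)_{ij} = conj(A_{ji})
print(np.isclose(scalar(A @ x, y), scalar(x, A_star @ y)))  # True
```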

Getting back to our claim, the adjoints * are indeed ubiquitous, as shown by:

Theorem 1.4.

The following happen:

  1. (1)

    $T(x)=Ux$ with $U\in M_{N}(\mathbb{C})$ is an isometry precisely when $U^{*}=U^{-1}$.

  2. (2)

    $T(x)=Px$ with $P\in M_{N}(\mathbb{C})$ is a projection precisely when $P^{2}=P^{*}=P$.

Proof.

Let us first recall that the lengths, or norms, of the vectors $x\in\mathbb{C}^{N}$ can be recovered from the knowledge of the scalar products, as follows:

||x||=\sqrt{<x,x>}

Conversely, we can recover the scalar products out of norms, by using the following difficult to remember formula, called complex polarization identity:

4<x,y>=||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}

The proof of this latter formula is indeed elementary, as follows:

||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}
=||x||^{2}+||y||^{2}-||x||^{2}-||y||^{2}+i||x||^{2}+i||y||^{2}-i||x||^{2}-i||y||^{2}
\quad+2Re(<x,y>)+2Re(<x,y>)+2iIm(<x,y>)+2iIm(<x,y>)
=4<x,y>

Finally, we will use Theorem 1.3, and more specifically the following formula coming from there, valid for any matrix $A\in M_{N}(\mathbb{C})$ and any two vectors $x,y\in\mathbb{C}^{N}$:

<Ax,y>=<x,A^{*}y>

(1) Given a matrix $U\in M_{N}(\mathbb{C})$, we have indeed the following equivalences, with the first one coming from the polarization identity, and the other ones being clear:

||Ux||=||x||
\iff <Ux,Uy>=<x,y>
\iff <x,U^{*}Uy>=<x,y>
\iff U^{*}Uy=y
\iff U^{*}U=1
\iff U^{*}=U^{-1}

(2) Given a matrix $P\in M_{N}(\mathbb{C})$, in order for $x\to Px$ to be an oblique projection, we must have $P^{2}=P$. Now observe that this projection is orthogonal when:

<Px-x,Py>=0
\iff <P^{*}Px-P^{*}x,y>=0
\iff P^{*}Px-P^{*}x=0
\iff P^{*}P-P^{*}=0
\iff P^{*}P=P^{*}

The point now is that by conjugating the last formula, we obtain $P^{*}P=P$. Thus we must have $P=P^{*}$, and this gives the result. ∎
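
As a numerical companion to this, here is a small Python sketch, NumPy assumed; the unitary is produced by a QR decomposition, and the projection by the standard formula $P=B(B^{*}B)^{-1}B^{*}$ onto the column space of a matrix $B$, this formula being an outside ingredient, quoted for illustration:

```python
# Checking the two characterizations of Theorem 1.4 on concrete matrices.
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Random unitary: Q from the QR decomposition of a random complex matrix
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
print(np.allclose(Q.conj().T @ Q, np.eye(N)))                # U*U = 1
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # isometry

# Orthogonal projection onto a 2-dimensional subspace
B = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
P = B @ np.linalg.inv(B.conj().T @ B) @ B.conj().T
print(np.allclose(P @ P, P), np.allclose(P.conj().T, P))     # P^2 = P* = P
```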

Summarizing, the linear operators come in pairs $T,T^{*}$, and the associated matrices come as well in pairs $A,A^{*}$. This is something quite interesting, philosophically speaking, and we will keep this in mind, and come back to it later, on numerous occasions.

1b. Diagonalization

Let us discuss now the diagonalization question for the linear maps and matrices. Again, we will be quite brief here, and for more, we refer to any standard linear algebra book. By the way, there will be some complex analysis involved too, and here we refer to Rudin [rud]. Which book of Rudin will be in fact the one and only true prerequisite for reading the present book, but more on references and reading later.


The basic diagonalization theory, formulated in terms of matrices, is as follows:

Proposition 1.5.

A vector $v\in\mathbb{C}^{N}$ is called an eigenvector of $A\in M_{N}(\mathbb{C})$, with corresponding eigenvalue $\lambda$, when $A$ multiplies by $\lambda$ in the direction of $v$:

Av=\lambda v

In the case where $\mathbb{C}^{N}$ has a basis $v_{1},\ldots,v_{N}$ formed by eigenvectors of $A$, with corresponding eigenvalues $\lambda_{1},\ldots,\lambda_{N}$, in this new basis $A$ becomes diagonal, as follows:

A\sim\begin{pmatrix}\lambda_{1}\\ &\ddots\\ &&\lambda_{N}\end{pmatrix}

Equivalently, if we denote by $D=diag(\lambda_{1},\ldots,\lambda_{N})$ the above diagonal matrix, and by $P=[v_{1}\ldots v_{N}]$ the square matrix formed by the eigenvectors of $A$, we have:

A=PDP^{-1}

In this case we say that the matrix $A$ is diagonalizable.

Proof.

This is something which is clear, the idea being as follows:

(1) The first assertion is clear, because the matrix which multiplies each basis element $v_{i}$ by a number $\lambda_{i}$ is precisely the diagonal matrix $D=diag(\lambda_{1},\ldots,\lambda_{N})$.

(2) The second assertion follows from the first one, by changing the basis. We can prove this by a direct computation as well, because we have $Pe_{i}=v_{i}$, and so:

PDP^{-1}v_{i}=PDe_{i}=P\lambda_{i}e_{i}=\lambda_{i}Pe_{i}=\lambda_{i}v_{i}

Thus, the matrices $A$ and $PDP^{-1}$ coincide, as stated. ∎
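
As a quick numerical illustration, assuming NumPy, the formula $A=PDP^{-1}$ can be checked directly on a small diagonalizable matrix:

```python
# eig returns the eigenvalues and an eigenvector matrix P, and we can
# verify the diagonalization formula A = P D P^{-1} of Proposition 1.5.
import numpy as np

A = np.array([[3., 1.],
              [0., 2.]])
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```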

Let us recall as well that the basic example of a non-diagonalizable matrix, over the complex numbers as above, is the following matrix:

J=\begin{pmatrix}0&1\\ 0&0\end{pmatrix}

Indeed, we have $J\binom{x}{y}=\binom{y}{0}$, so the eigenvectors are the vectors of type $\binom{x}{0}$, all with eigenvalue 0. Thus, we do not have enough eigenvectors for constructing a basis of $\mathbb{C}^{2}$.


In general, in order to study the diagonalization problem, the idea is that the eigenvectors can be grouped into linear spaces, called eigenspaces, as follows:

Theorem 1.6.

Let $A\in M_{N}(\mathbb{C})$, and for any eigenvalue $\lambda\in\mathbb{C}$ define the corresponding eigenspace as being the vector space formed by the corresponding eigenvectors:

E_{\lambda}=\left\{v\in\mathbb{C}^{N}\,\Big|\,Av=\lambda v\right\}

These eigenspaces $E_{\lambda}$ are then in a direct sum position, in the sense that given vectors $v_{1}\in E_{\lambda_{1}},\ldots,v_{k}\in E_{\lambda_{k}}$ corresponding to different eigenvalues $\lambda_{1},\ldots,\lambda_{k}$, we have:

\sum_{i}c_{i}v_{i}=0\implies c_{i}=0

In particular we have the following estimate, with sum over all the eigenvalues,

\sum_{\lambda}\dim(E_{\lambda})\leq N

and our matrix is diagonalizable precisely when we have equality.

Proof.

We prove the first assertion by recurrence on $k\in\mathbb{N}$. Assume by contradiction that we have a formula as follows, with the scalars $c_{1},\ldots,c_{k}$ being not all zero:

c_{1}v_{1}+\ldots+c_{k}v_{k}=0

By dividing by one of these scalars, we can assume that our formula is:

v_{k}=c_{1}v_{1}+\ldots+c_{k-1}v_{k-1}

Now let us apply $A$ to this vector. On the left we obtain:

Av_{k}=\lambda_{k}v_{k}=\lambda_{k}c_{1}v_{1}+\ldots+\lambda_{k}c_{k-1}v_{k-1}

On the right we obtain something different, as follows:

A(c_{1}v_{1}+\ldots+c_{k-1}v_{k-1})=c_{1}Av_{1}+\ldots+c_{k-1}Av_{k-1}=c_{1}\lambda_{1}v_{1}+\ldots+c_{k-1}\lambda_{k-1}v_{k-1}

We conclude from this that the following equality must hold:

\lambda_{k}c_{1}v_{1}+\ldots+\lambda_{k}c_{k-1}v_{k-1}=c_{1}\lambda_{1}v_{1}+\ldots+c_{k-1}\lambda_{k-1}v_{k-1}

On the other hand, we know by recurrence that the vectors $v_{1},\ldots,v_{k-1}$ must be linearly independent. Thus, the coefficients must be equal, at right and at left:

\lambda_{k}c_{1}=c_{1}\lambda_{1}
\vdots
\lambda_{k}c_{k-1}=c_{k-1}\lambda_{k-1}

Now since at least one of the numbers $c_{i}$ must be nonzero, from $\lambda_{k}c_{i}=c_{i}\lambda_{i}$ we obtain $\lambda_{k}=\lambda_{i}$, which is a contradiction. Thus our proof by recurrence of the first assertion is complete. As for the second assertion, this follows from the first one. ∎

In order to reach more advanced results, we can use the characteristic polynomial, which appears via the following fundamental result:

Theorem 1.7.

Given a matrix $A\in M_{N}(\mathbb{C})$, consider its characteristic polynomial:

P(x)=\det(A-x1_{N})

The eigenvalues of $A$ are then the roots of $P$. Also, we have the inequality

\dim(E_{\lambda})\leq m_{\lambda}

where $m_{\lambda}$ is the multiplicity of $\lambda$, as root of $P$.

Proof.

The first assertion follows from the following computation, using the fact that a linear map is bijective when the determinant of the associated matrix is nonzero:

\exists v,\ Av=\lambda v\iff\exists v,\ (A-\lambda 1_{N})v=0\iff\det(A-\lambda 1_{N})=0

Regarding now the second assertion, given an eigenvalue $\lambda$ of our matrix $A$, consider the dimension $d_{\lambda}=\dim(E_{\lambda})$ of the corresponding eigenspace. By changing the basis of $\mathbb{C}^{N}$, as for the eigenspace $E_{\lambda}$ to be spanned by the first $d_{\lambda}$ basis elements, our matrix becomes as follows, with $B$ being a certain smaller matrix:

A\sim\begin{pmatrix}\lambda 1_{d_{\lambda}}&0\\ 0&B\end{pmatrix}

We conclude that the characteristic polynomial of $A$ is of the following form:

P_{A}=P_{\lambda 1_{d_{\lambda}}}P_{B}=(\lambda-x)^{d_{\lambda}}P_{B}

Thus the multiplicity $m_{\lambda}$ of our eigenvalue $\lambda$, as a root of $P$, satisfies $m_{\lambda}\geq d_{\lambda}$, and this leads to the conclusion in the statement. ∎

Now recall that we are over $\mathbb{C}$, which is something that we have not used yet, in our last two statements. And the point here is that we have the following key result:

Theorem 1.8.

Any polynomial $P\in\mathbb{C}[X]$ decomposes as

P=c(X-a_{1})\ldots(X-a_{N})

with $c\in\mathbb{C}$ and with $a_{1},\ldots,a_{N}\in\mathbb{C}$.

Proof.

It is enough to prove that $P$ has one root, and we do this by contradiction. Assume that $P$ has no roots, and pick a number $z\in\mathbb{C}$ where $|P|$ attains its minimum:

|P(z)|=\min_{x\in\mathbb{C}}|P(x)|>0

Since $Q(t)=P(z+t)-P(z)$ is a polynomial which vanishes at $t=0$, this polynomial must be of the form $ct^{k}$ + higher terms, with $c\neq 0$, and with $k\geq 1$ being an integer. We obtain from this that, with $t\in\mathbb{C}$ small, we have the following estimate:

P(z+t)\simeq P(z)+ct^{k}

Now let us write $t=rw$, with $r>0$ small, and with $|w|=1$. Our estimate becomes:

P(z+rw)\simeq P(z)+cr^{k}w^{k}

Now recall that we have assumed $P(z)\neq 0$. We can therefore choose $w\in\mathbb{T}$ such that $cw^{k}$ points in the opposite direction to that of $P(z)$, and we obtain in this way:

|P(z+rw)|\simeq|P(z)+cr^{k}w^{k}|=|P(z)|-|c|r^{k}

Now by choosing $r>0$ small enough, as for the error in the first estimate to be small, and overcome by the negative quantity $-|c|r^{k}$, we obtain from this:

|P(z+rw)|<|P(z)|

But this contradicts our definition of $z\in\mathbb{C}$, as a point where $|P|$ attains its minimum. Thus $P$ has a root, and by recurrence it has $N$ roots, as stated. ∎

Now by putting everything together, we obtain the following result:

Theorem 1.9.

Given a matrix $A\in M_{N}(\mathbb{C})$, consider its characteristic polynomial

P(X)=\det(A-X1_{N})

then factorize this polynomial, by computing the complex roots, with multiplicities,

P(X)=(-1)^{N}(X-\lambda_{1})^{n_{1}}\ldots(X-\lambda_{k})^{n_{k}}

and finally compute the corresponding eigenspaces, for each eigenvalue found:

E_{i}=\left\{v\in\mathbb{C}^{N}\,\Big|\,Av=\lambda_{i}v\right\}

The dimensions of these eigenspaces satisfy then the following inequalities,

\dim(E_{i})\leq n_{i}

and $A$ is diagonalizable precisely when we have equality for any $i$.

Proof.

This follows by combining Theorem 1.6, Theorem 1.7 and Theorem 1.8. Indeed, the statement is well formulated, thanks to Theorem 1.8. By summing the inequalities $\dim(E_{\lambda})\leq m_{\lambda}$ from Theorem 1.7, we obtain an inequality as follows:

\sum_{\lambda}\dim(E_{\lambda})\leq\sum_{\lambda}m_{\lambda}\leq N

On the other hand, we know from Theorem 1.6 that our matrix is diagonalizable when we have global equality. Thus, we are led to the conclusion in the statement. ∎

This was for the main result of linear algebra. There are countless applications of this, and generally speaking, advanced linear algebra consists in building on Theorem 1.9.


In practice, diagonalizing a matrix remains something quite complicated. Let us record a useful algorithmic version of the above result, as follows:

Theorem 1.10.

The square matrices $A\in M_{N}(\mathbb{C})$ can be diagonalized as follows:

  1. (1)

    Compute the characteristic polynomial.

  2. (2)

    Factorize the characteristic polynomial.

  3. (3)

    Compute the eigenvectors, for each eigenvalue found.

  4. (4)

    If there are not $N$ linearly independent eigenvectors, $A$ is not diagonalizable.

  5. (5)

    Otherwise, $A$ is diagonalizable, $A=PDP^{-1}$.

Proof.

This is an informal reformulation of Theorem 1.9, with (4) referring to the total number of linearly independent eigenvectors found in (3), and with $A=PDP^{-1}$ in (5) being the usual diagonalization formula, with $P,D$ being as before. ∎
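
As an informal numerical sketch of this algorithm, assuming NumPy: steps (1)-(3) are delegated to the eigensolver, and step (4) becomes a rank test on the matrix of eigenvectors found. This is a floating-point approximation of the exact procedure, good enough for illustration, and the helper name below is ours, not standard:

```python
# A rank test on the eigenvector matrix implements step (4) of Theorem 1.10.
import numpy as np

def is_diagonalizable(A, tol=1e-10):
    # steps (1)-(3) are done inside np.linalg.eig; step (4) is the rank test
    _, P = np.linalg.eig(A)
    return np.linalg.matrix_rank(P, tol=tol) == A.shape[0]

J = np.array([[0., 1.],
              [0., 0.]])    # the Jordan block from before: not diagonalizable
A = np.array([[0., -1.],
              [1., 0.]])    # rotation by 90 degrees: diagonalizable over C
print(is_diagonalizable(J))  # False
print(is_diagonalizable(A))  # True
```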

As an illustration for all this, which is a must-know computation, we have:

Proposition 1.11.

The rotation of angle $t\in\mathbb{R}$ in the plane diagonalizes as:

\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}=\frac{1}{2}\begin{pmatrix}1&1\\ i&-i\end{pmatrix}\begin{pmatrix}e^{-it}&0\\ 0&e^{it}\end{pmatrix}\begin{pmatrix}1&-i\\ 1&i\end{pmatrix}

Over the reals this is impossible, unless $t=0,\pi$, where the rotation is diagonal.

Proof.

Observe first that, as indicated, unless we are in the case $t=0,\pi$, where our rotation is $\pm 1_{2}$, our rotation is a “true” rotation, having no eigenvectors in the plane. Fortunately the complex numbers come to the rescue, via the following computation:

\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{i}=\binom{\cos t-i\sin t}{i\cos t+\sin t}=e^{-it}\binom{1}{i}

We have as well a second complex eigenvector, coming from:

\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{-i}=\binom{\cos t+i\sin t}{-i\cos t+\sin t}=e^{it}\binom{1}{-i}

Thus, we are led to the conclusion in the statement. ∎
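
Here is a quick numerical check of this, NumPy assumed:

```python
# The rotation of angle t has eigenvalues e^{-it}, e^{it}, with
# eigenvectors (1,i) and (1,-i), as in Proposition 1.11.
import numpy as np

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
v = np.array([1., 1j])
print(np.allclose(R @ v, np.exp(-1j * t) * v))  # True
w = np.array([1., -1j])
print(np.allclose(R @ w, np.exp(1j * t) * w))   # True
```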

1c. Matrix tricks

At the level of basic examples of diagonalizable matrices, we first have the following result, which provides us with the “generic” examples:

Theorem 1.12.

For a matrix $A\in M_{N}(\mathbb{C})$ the following conditions are equivalent,

  1. (1)

    The eigenvalues are different, $\lambda_{i}\neq\lambda_{j}$,

  2. (2)

    The characteristic polynomial $P$ has simple roots,

  3. (3)

    The characteristic polynomial satisfies $(P,P^{\prime})=1$,

  4. (4)

    The resultant of $P,P^{\prime}$ is nonzero, $R(P,P^{\prime})\neq 0$,

  5. (5)

    The discriminant of $P$ is nonzero, $\Delta(P)\neq 0$,

and in this case, the matrix is diagonalizable.

Proof.

The last assertion holds indeed, due to Theorem 1.9. As for the equivalences in the statement, these are all standard, the idea for their proofs, along with some more theory, needed for using in practice the present result, being as follows:

$(1)\iff(2)$ This follows from Theorem 1.9.

$(2)\iff(3)$ This is standard, the double roots of $P$ being roots of $P^{\prime}$.

$(3)\iff(4)$ The idea here is that associated to any two polynomials $P,Q$ is their resultant $R(P,Q)$, which checks whether $P,Q$ have a common root. Let us write:

P=c(X-a_{1})\ldots(X-a_{k})
Q=d(X-b_{1})\ldots(X-b_{l})

We can define then the resultant as being the following quantity:

R(P,Q)=c^{l}d^{k}\prod_{ij}(a_{i}-b_{j})

The point now, that we will explain as well, is that this is a polynomial in the coefficients of $P,Q$, with integer coefficients. Indeed, this can be checked as follows:

– We can expand the formula of $R(P,Q)$, and in what regards $a_{1},\ldots,a_{k}$, which are the roots of $P$, we obtain in this way certain symmetric functions in these variables, which will be therefore polynomials in the coefficients of $P$, with integer coefficients.

– We can then look what happens with respect to the remaining variables $b_{1},\ldots,b_{l}$, which are the roots of $Q$. Once again what we have here are certain symmetric functions, and so polynomials in the coefficients of $Q$, with integer coefficients.

– Thus, we are led to the above conclusion, that $R(P,Q)$ is a polynomial in the coefficients of $P,Q$, with integer coefficients, and with the remark that the $c^{l}d^{k}$ factor is there for these latter coefficients to be indeed integers, instead of rationals.

Alternatively, let us write our two polynomials in usual form, as follows:

P=p_{k}X^{k}+\ldots+p_{1}X+p_{0}
Q=q_{l}X^{l}+\ldots+q_{1}X+q_{0}

The corresponding resultant appears then as the determinant of an associated matrix, having size $k+l$, and having 0 coefficients at the blank spaces, as follows:

R(P,Q)=\begin{vmatrix}p_{k}&&&q_{l}\\ \vdots&\ddots&&\vdots&\ddots\\ p_{0}&&p_{k}&q_{0}&&q_{l}\\ &\ddots&\vdots&&\ddots&\vdots\\ &&p_{0}&&&q_{0}\end{vmatrix}

$(4)\iff(5)$ Once again this is something standard, the idea here being that the discriminant $\Delta(P)$ of a polynomial $P\in\mathbb{C}[X]$ is, modulo scalars, the resultant $R(P,P^{\prime})$. To be more precise, let us write our polynomial as follows:

P(X)=cX^{N}+dX^{N-1}+\ldots

Its discriminant is then defined as being the following quantity:

\Delta(P)=\frac{(-1)^{\binom{N}{2}}}{c}R(P,P^{\prime})

This is a polynomial in the coefficients of $P$, with integer coefficients, with the division by $c$ being indeed possible, over $\mathbb{Z}$, and with the sign being there for various reasons, including the compatibility with some well-known formulae, at small values of $N$. ∎

All the above might seem a bit complicated, so as an illustration, let us work out an example. Consider the case of a polynomial of degree 2, and a polynomial of degree 1:

P=ax^{2}+bx+c\quad,\quad Q=dx+e

In order to compute the resultant, let us factorize our polynomials:

P=a(x-p)(x-q)\quad,\quad Q=d(x-r)

The resultant can be then computed as follows, by using the two-step method:

R(P,Q)=ad^{2}(p-r)(q-r)=ad^{2}(pq-(p+q)r+r^{2})=cd^{2}+bd^{2}r+ad^{2}r^{2}=cd^{2}-bde+ae^{2}

Observe that $R(P,Q)=0$ corresponds indeed to the fact that $P,Q$ have a common root. Indeed, the root of $Q$ is $r=-e/d$, and we have:

P(r)=\frac{ae^{2}}{d^{2}}-\frac{be}{d}+c=\frac{R(P,Q)}{d^{2}}

We can recover as well the resultant as a determinant, as follows:

R(P,Q)=\begin{vmatrix}a&d&0\\ b&e&d\\ c&0&e\end{vmatrix}=ae^{2}-bde+cd^{2}

Finally, in what regards the discriminant, let us see what happens in degree 2. Here we must compute the resultant of the following two polynomials:

P=aX^{2}+bX+c\quad,\quad P^{\prime}=2aX+b

The resultant is then given by the following formula:

R(P,P^{\prime})=ab^{2}-b(2a)b+c(2a)^{2}=4a^{2}c-ab^{2}=-a(b^{2}-4ac)

Now by doing the discriminant normalizations, we obtain, as we should:

\Delta(P)=b^{2}-4ac
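
As a computational complement, here is a small Python sketch, NumPy assumed, which computes the resultant as the determinant of the matrix displayed above, and recovers the degree 2 discriminant; the helper function is ours, for illustration only:

```python
# Resultant via the determinant of the Sylvester-type matrix from the text.
import numpy as np

def resultant(p, q):
    # p, q: coefficient lists, highest degree first, of degrees k and l
    k, l = len(p) - 1, len(q) - 1
    S = np.zeros((k + l, k + l))
    for j in range(l):                 # l shifted copies of p, down columns
        S[j:j + k + 1, j] = p
    for j in range(k):                 # k shifted copies of q, down columns
        S[j:j + l + 1, l + j] = q
    return np.linalg.det(S)

a, b, c = 1.0, 5.0, 6.0                  # P = X^2+5X+6 = (X+2)(X+3)
print(resultant([a, b, c], [2*a, b]))    # -1.0
print(-a * (b**2 - 4*a*c))               # -a(b^2-4ac) = -1.0, as it should be
```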

As already mentioned, one can prove that the matrices having distinct eigenvalues are “generic”, and so the above result basically captures the whole situation. We have in fact the following collection of density results, which are quite advanced:

Theorem 1.13.

The following happen, inside $M_{N}(\mathbb{C})$:

  1. (1)

    The invertible matrices are dense.

  2. (2)

    The matrices having distinct eigenvalues are dense.

  3. (3)

    The diagonalizable matrices are dense.

Proof.

These are quite advanced results, which can be proved as follows:

(1) This is clear, intuitively speaking, because the invertible matrices are given by the condition $\det A\neq 0$. Thus, the set formed by these matrices appears as the complement of the hypersurface $\det A=0$, and so must be dense inside $M_{N}(\mathbb{C})$, as claimed.

(2) Here we can use a similar argument, this time by saying that the set formed by the matrices having distinct eigenvalues appears as the complement of the hypersurface given by $\Delta(P_{A})=0$, and so must be dense inside $M_{N}(\mathbb{C})$, as claimed.

(3) This follows from (2), via the fact that the matrices having distinct eigenvalues are diagonalizable, as we know from Theorem 1.12. There are of course some other proofs as well, for instance by putting the matrix in Jordan form. ∎

As an application of the above results, and of our methods in general, we have:

Theorem 1.14.

The following happen:

  1. (1)

    We have $P_{AB}=P_{BA}$, for any two matrices $A,B\in M_{N}(\mathbb{C})$.

  2. (2)

    $AB,BA$ have the same eigenvalues, with the same multiplicities.

  3. (3)

    If $A$ has eigenvalues $\lambda_{1},\ldots,\lambda_{N}$, then $f(A)$ has eigenvalues $f(\lambda_{1}),\ldots,f(\lambda_{N})$.

Proof.

These results can be deduced by using Theorem 1.13, as follows:

(1) It follows from definitions that the characteristic polynomial of a matrix is invariant under conjugation, in the sense that we have the following formula:

P_{C}=P_{ACA^{-1}}

Now observe that, when assuming that $A$ is invertible, we have:

AB=A(BA)A^{-1}

Thus, we have the result when $A$ is invertible. By using now Theorem 1.13 (1), we conclude that this formula holds for any matrix $A$, by continuity.

(2) This is a reformulation of (1), via the fact that $P$ encodes the eigenvalues, with multiplicities, which is hard to prove with bare hands.

(3) This is something quite informal, clear for the diagonal matrices $D$, then for the diagonalizable matrices $PDP^{-1}$, and finally for all matrices, by using Theorem 1.13 (3), provided that $f$ has suitable regularity properties. We will be back to this. ∎
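
Here is a small numerical check of assertion (2), NumPy assumed, done on purpose with a non-invertible matrix $A$, which is the subtle case in the above proof:

```python
# AB and BA have the same eigenvalues, with multiplicities, even when A
# is singular; we compare the sorted spectra.
import numpy as np

rng = np.random.default_rng(2)
N = 4
A = rng.standard_normal((N, N))
A[0] = 0.0                    # make A non-invertible on purpose
B = rng.standard_normal((N, N))

ev_AB = np.sort_complex(np.linalg.eigvals(A @ B))
ev_BA = np.sort_complex(np.linalg.eigvals(B @ A))
print(np.allclose(ev_AB, ev_BA))  # True
```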

Let us go back to the main problem raised by the diagonalization procedure, namely the computation of the roots of characteristic polynomials. We have here:

Theorem 1.15.

The complex eigenvalues of a matrix $A\in M_{N}(\mathbb{C})$, counted with multiplicities, have the following properties:

  1. (1)

    Their sum is the trace.

  2. (2)

    Their product is the determinant.

Proof.

Consider indeed the characteristic polynomial $P$ of the matrix:

P(X)=\det(A-X1_{N})=(-1)^{N}X^{N}+(-1)^{N-1}Tr(A)X^{N-1}+\ldots+\det(A)

We can factorize this polynomial, by using its $N$ complex roots, and we obtain:

P(X)=(-1)^{N}(X-\lambda_{1})\ldots(X-\lambda_{N})=(-1)^{N}X^{N}+(-1)^{N-1}\left(\sum_{i}\lambda_{i}\right)X^{N-1}+\ldots+\prod_{i}\lambda_{i}

Thus, we are led to the conclusion in the statement. ∎
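
A quick numerical verification of this, NumPy assumed:

```python
# The sum of the complex eigenvalues is the trace, their product is the
# determinant, as in Theorem 1.15.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
ev = np.linalg.eigvals(A)
print(np.isclose(ev.sum(), np.trace(A)))        # True
print(np.isclose(ev.prod(), np.linalg.det(A)))  # True
```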

Regarding now the intermediate terms, we have here:

Theorem 1.16.

Assume that $A\in M_{N}(\mathbb{C})$ has eigenvalues $\lambda_{1},\ldots,\lambda_{N}\in\mathbb{C}$, counted with multiplicities. The basic symmetric functions of these eigenvalues, namely

c_{k}=\sum_{i_{1}<\ldots<i_{k}}\lambda_{i_{1}}\ldots\lambda_{i_{k}}

are then given by the fact that the characteristic polynomial of the matrix is:

P(X)=(-1)^{N}\sum_{k=0}^{N}(-1)^{k}c_{k}X^{N-k}

Moreover, all symmetric functions of the eigenvalues, such as the sums of powers

d_{s}=\lambda_{1}^{s}+\ldots+\lambda_{N}^{s}

appear as polynomials in these characteristic polynomial coefficients $c_{k}$.

Proof.

These results can be proved by doing some algebra, as follows:

(1) Consider indeed the characteristic polynomial $P$ of the matrix, factorized by using its $N$ complex roots, taken with multiplicities. By expanding, we obtain:

P(X)=(-1)^{N}(X-\lambda_{1})\ldots(X-\lambda_{N})
=(-1)^{N}X^{N}+(-1)^{N-1}\left(\sum_{i}\lambda_{i}\right)X^{N-1}+\ldots+\prod_{i}\lambda_{i}
=(-1)^{N}X^{N}+(-1)^{N-1}c_{1}X^{N-1}+\ldots+(-1)^{0}c_{N}
=(-1)^{N}\left(X^{N}-c_{1}X^{N-1}+\ldots+(-1)^{N}c_{N}\right)

With the convention $c_{0}=1$, we are led to the conclusion in the statement.

(2) This is something standard, coming by doing some abstract algebra. Working out the formulae for the sums of powers $d_{s}=\sum_{i}\lambda_{i}^{s}$, at small values of the exponent $s\in\mathbb{N}$, is an excellent exercise, which shows how to proceed in general, by recurrence. ∎
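
As a hint for this exercise, the first few such formulae are the classical Newton identities, namely $d_{1}=c_{1}$, $d_{2}=c_{1}^{2}-2c_{2}$, $d_{3}=c_{1}^{3}-3c_{1}c_{2}+3c_{3}$, standard facts quoted here from outside the text. A small Python sketch, NumPy assumed, verifying them on a random matrix:

```python
# Newton identities: power sums d_s as polynomials in the elementary
# symmetric functions c_k of the eigenvalues.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
ev = np.linalg.eigvals(A)

# elementary symmetric functions c_1, c_2, c_3 of the eigenvalues
c1 = ev.sum()
c2 = sum(ev[i] * ev[j] for i in range(4) for j in range(i + 1, 4))
c3 = sum(ev[i] * ev[j] * ev[k]
         for i in range(4) for j in range(i + 1, 4) for k in range(j + 1, 4))

d1, d2, d3 = (ev**1).sum(), (ev**2).sum(), (ev**3).sum()
print(np.isclose(d1, c1))                      # True
print(np.isclose(d2, c1**2 - 2*c2))            # True
print(np.isclose(d3, c1**3 - 3*c1*c2 + 3*c3))  # True
```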

1d. Spectral theorems

Let us go back now to the diagonalization question. Here is a key result:

Theorem 1.17.

Any matrix $A\in M_{N}(\mathbb{C})$ which is self-adjoint, $A=A^{*}$, is diagonalizable, with the diagonalization being of the following type,

A=UDU^{*}

with $U\in U_{N}$, and with $D\in M_{N}(\mathbb{R})$ diagonal. The converse holds too.

Proof.

As a first remark, the converse trivially holds, because if we take a matrix of the form $A=UDU^{*}$, with $U$ unitary and $D$ diagonal and real, then we have:

A^{*}=(UDU^{*})^{*}=UD^{*}U^{*}=UDU^{*}=A

In the other sense now, assume that $A$ is self-adjoint, $A=A^{*}$. Our first claim is that the eigenvalues are real. Indeed, assuming $Av=\lambda v$, we have:

\lambda<v,v>=<\lambda v,v>=<Av,v>=<v,Av>=<v,\lambda v>=\bar{\lambda}<v,v>

Thus we obtain $\lambda\in\mathbb{R}$, as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

Av=\lambda v\quad,\quad Aw=\mu w

We have then the following computation, using $\lambda,\mu\in\mathbb{R}$:

\lambda<v,w>=<\lambda v,w>=<Av,w>=<v,Aw>=<v,\mu w>=\mu<v,w>

Thus $\lambda\neq\mu$ implies $v\perp w$, as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of $A$ span the whole space $\mathbb{C}^{N}$. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

Av=\lambda v

Assuming now that we have a vector $w$ orthogonal to it, $v\perp w$, we have:

<Aw,v>=<w,Av>=<w,\lambda v>=\lambda<w,v>=0

Thus, if $v$ is an eigenvector, then the vector space $v^{\perp}$ is invariant under $A$. Moreover, since a matrix $A$ is self-adjoint precisely when $<Av,v>\in\mathbb{R}$ for any vector $v\in\mathbb{C}^{N}$, as one can see by expanding the scalar product, the restriction of $A$ to the subspace $v^{\perp}$ is self-adjoint. Thus, we can proceed by recurrence, and we obtain the result. ∎
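
Numerically, and assuming NumPy, the diagonalization of a self-adjoint matrix is produced by the dedicated routine eigh, and the conclusions of the above theorem can be checked directly:

```python
# For a self-adjoint matrix, eigh returns real eigenvalues and a unitary
# eigenvector matrix, so that A = U D U*, as in Theorem 1.17.
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                      # self-adjoint by construction
d, U = np.linalg.eigh(A)
print(d.dtype)                          # float64: the eigenvalues are real
print(np.allclose(U.conj().T @ U, np.eye(4)))       # U is unitary
print(np.allclose(U @ np.diag(d) @ U.conj().T, A))  # A = U D U*
```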

As basic examples of self-adjoint matrices, we have the orthogonal projections. The diagonalization result regarding them is as follows:

Proposition 1.18.

The matrices $P\in M_{N}(\mathbb{C})$ which are projections,

P^{2}=P^{*}=P

are precisely those which diagonalize as follows,

P=UDU^{*}

with $U\in U_{N}$, and with $D\in M_{N}(0,1)$ being diagonal.

Proof.

The equation for the projections being $P^{2}=P^{*}=P$, the eigenvalues $\lambda$ are real, and we have as well the following condition, coming from $P^{2}=P$:

\lambda<v,v>=<\lambda v,v>=<Pv,v>=<P^{2}v,v>=<Pv,Pv>=<\lambda v,\lambda v>=\lambda^{2}<v,v>

Thus we obtain $\lambda\in\{0,1\}$, as claimed, and as a final conclusion here, the diagonalization of the projections is as follows, with $e_{i}\in\{0,1\}$:

P\sim\begin{pmatrix}e_{1}\\ &\ddots\\ &&e_{N}\end{pmatrix}

To be more precise, the number of 1 values is the dimension of the image of $P$, and the number of 0 values is the dimension of the space of vectors sent to 0 by $P$. ∎

An important class of self-adjoint matrices, which includes for instance all the projections, are the positive matrices. The theory here is as follows:

Theorem 1.19.

For a matrix $A\in M_{N}(\mathbb{C})$ the following conditions are equivalent, and if they are satisfied, we say that $A$ is positive:

  1. (1)

    $A=B^{2}$, with $B=B^{*}$.

  2. (2)

    $A=CC^{*}$, for some $C\in M_{N}(\mathbb{C})$.

  3. (3)

    $<Ax,x>\geq 0$, for any vector $x\in\mathbb{C}^{N}$.

  4. (4)

    $A=A^{*}$, and the eigenvalues are positive, $\lambda_{i}\geq 0$.

  5. (5)

    $A=UDU^{*}$, with $U\in U_{N}$ and with $D\in M_{N}(\mathbb{R}_{+})$ diagonal.

Proof.

The idea is that the equivalences in the statement basically follow from some elementary computations, with only Theorem 1.17 needed, at some point:

$(1)\implies(2)$ This is clear, because we can take $C=B$.

$(2)\implies(3)$ This follows from the following computation:

<Ax,x>=<CC^{*}x,x>=<C^{*}x,C^{*}x>\geq 0

$(3)\implies(4)$ By using the fact that $<Ax,x>$ is real, we have:

<Ax,x>=<x,A^{*}x>=<A^{*}x,x>

Thus we have $A=A^{*}$, and the remaining assertion, regarding the eigenvalues, follows from the following computation, assuming $Ax=\lambda x$:

<Ax,x>=<\lambda x,x>=\lambda<x,x>\geq 0

$(4)\implies(5)$ This follows indeed by using Theorem 1.17.

$(5)\implies(1)$ Assuming $A=UDU^{*}$, with $U\in U_{N}$, and with $D\in M_{N}(\mathbb{R}_{+})$ being diagonal, we can set $B=U\sqrt{D}U^{*}$. Then $B$ is self-adjoint, and its square is given by:

B^{2}=U\sqrt{D}U^{*}\cdot U\sqrt{D}U^{*}=UDU^{*}=A

Thus, we are led to the conclusion in the statement. ∎

Let us record as well the following technical version of the above result:

Theorem 1.20.

For a matrix $A\in M_{N}(\mathbb{C})$ the following conditions are equivalent, and if they are satisfied, we say that $A$ is strictly positive:

  1. (1)

    $A=B^{2}$, with $B=B^{*}$, invertible.

  2. (2)

    $A=CC^{*}$, for some $C\in M_{N}(\mathbb{C})$ invertible.

  3. (3)

    $<Ax,x>>0$, for any nonzero vector $x\in\mathbb{C}^{N}$.

  4. (4)

    $A=A^{*}$, and the eigenvalues are strictly positive, $\lambda_{i}>0$.

  5. (5)

    $A=UDU^{*}$, with $U\in U_{N}$ and with $D\in M_{N}(\mathbb{R}_{+}^{*})$ diagonal.

Proof.

This follows either from Theorem 1.19, by adding the various extra assumptions in the statement, or from the proof of Theorem 1.19, by modifying where needed. ∎

Let us discuss now the case of the unitary matrices. We have here:

Theorem 1.21.

Any matrix $U\in M_{N}(\mathbb{C})$ which is unitary, $U^{*}=U^{-1}$, is diagonalizable, with the eigenvalues on $\mathbb{T}$. More precisely we have

U=VDV^{*}

with $V\in U_{N}$, and with $D\in M_{N}(\mathbb{T})$ diagonal. The converse holds too.

Proof.

As a first remark, the converse trivially holds, because given a matrix of type $U=VDV^{*}$, with $V\in U_{N}$, and with $D\in M_{N}(\mathbb{T})$ being diagonal, we have:

U^{*}=(VDV^{*})^{*}=VD^{*}V^{*}=VD^{-1}V^{-1}=(V^{*})^{-1}D^{-1}V^{-1}=(VDV^{*})^{-1}=U^{-1}

Let us prove now the first assertion, stating that the eigenvalues of a unitary matrix $U\in U_{N}$ belong to $\mathbb{T}$. Indeed, assuming $Uv=\lambda v$, we have:

<v,v>=<U^{*}Uv,v>=<Uv,Uv>=<\lambda v,\lambda v>=|\lambda|^{2}<v,v>

Thus we obtain $\lambda\in\mathbb{T}$, as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

Uv=\lambda v\quad,\quad Uw=\mu w

We have then the following computation, using $U^{*}=U^{-1}$ and $\lambda,\mu\in\mathbb{T}$:

\lambda<v,w>=<\lambda v,w>=<Uv,w>=<v,U^{*}w>=<v,U^{-1}w>=<v,\mu^{-1}w>=\mu<v,w>

Thus $\lambda\neq\mu$ implies $v\perp w$, as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of $U$ span the whole space $\mathbb{C}^{N}$. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

Uv=\lambda v

Assuming that we have a vector $w$ orthogonal to it, $v\perp w$, we have:

<Uw,v>=<w,U^{*}v>=<w,U^{-1}v>=<w,\lambda^{-1}v>=\lambda<w,v>=0

Thus, if $v$ is an eigenvector, then the vector space $v^{\perp}$ is invariant under $U$. Now since $U$ is an isometry, so is its restriction to this space $v^{\perp}$. Thus this restriction is a unitary, and so we can proceed by recurrence, and we obtain the result. ∎

The self-adjoint matrices and the unitary matrices are particular cases of the general notion of a “normal matrix”, and we have here:

Theorem 1.22.

Any matrix $A\in M_{N}(\mathbb{C})$ which is normal, $AA^{*}=A^{*}A$, is diagonalizable, with the diagonalization being of the following type,

A=UDU^{*}

with $U\in U_{N}$, and with $D\in M_{N}(\mathbb{C})$ diagonal. The converse holds too.

Proof.

As a first remark, the converse trivially holds, because if we take a matrix of the form $A=UDU^{*}$, with $U$ unitary and $D$ diagonal, then we have:

AA^{*}=UDU^{*}\cdot UD^{*}U^{*}=UDD^{*}U^{*}=UD^{*}DU^{*}=UD^{*}U^{*}\cdot UDU^{*}=A^{*}A

In the other sense now, this is something more technical. Our first claim is that a matrix $A$ is normal precisely when the following happens, for any vector $v$:

||Av||=||A^{*}v||

Indeed, the above equality can be written as follows:

<AA^{*}v,v>=<A^{*}Av,v>

But this is equivalent to $AA^{*}=A^{*}A$, by expanding the scalar products. Our next claim is that $A,A^{*}$ have the same eigenvectors, with conjugate eigenvalues:

Av=\lambda v\implies A^{*}v=\bar{\lambda}v

Indeed, this follows from the following computation, and from the trivial fact that if $A$ is normal, then so is any matrix of type $A-\lambda 1_{N}$:

||(A^{*}-\bar{\lambda}1_{N})v||=||(A-\lambda 1_{N})^{*}v||=||(A-\lambda 1_{N})v||=0

Let us prove now, by using this, that the eigenspaces of $A$ are pairwise orthogonal. Assume that we have two eigenvectors, corresponding to different eigenvalues, $\lambda\neq\mu$:

Av=\lambda v\quad,\quad Aw=\mu w

We have the following computation, which shows that $\lambda\neq\mu$ implies $v\perp w$:

\lambda<v,w>=<\lambda v,w>=<Av,w>=<v,A^{*}w>=<v,\bar{\mu}w>=\mu<v,w>

In order to finish, it remains to prove that the eigenspaces of $A$ span the whole $\mathbb{C}^{N}$. This is something that we have already seen for the self-adjoint matrices, and for unitaries, and we will use here these results, in order to deal with the general normal case. As a first observation, given an arbitrary matrix $A$, the matrix $AA^{*}$ is self-adjoint:

(AA^{*})^{*}=AA^{*}

Thus, we can diagonalize this matrix $AA^{*}$, as follows, with the passage matrix being a unitary, $V\in U_{N}$, and with the diagonal form being real, $E\in M_{N}(\mathbb{R})$:

AA^{*}=VEV^{*}

Now observe that, for matrices of type $A=UDU^{*}$, which are those that we want to end up with, we have the following formulae:

V=U\quad,\quad E=D\bar{D}

In particular, the matrices $A$ and $AA^{*}$ have the same eigenspaces. So, this will be our idea, proving that the eigenspaces of $AA^{*}$ are eigenspaces of $A$. In order to do so, let us pick two eigenvectors $v,w$ of the matrix $AA^{*}$, corresponding to different eigenvalues, $\lambda\neq\mu$. The eigenvalue equations are then as follows:

AA^{*}v=\lambda v\quad,\quad AA^{*}w=\mu w

We have the following computation, using the normality condition $AA^{*}=A^{*}A$, and the fact that the eigenvalues of $AA^{*}$, and in particular $\mu$, are real:

\lambda<Av,w>=<\lambda Av,w>=<A\lambda v,w>=<AAA^{*}v,w>=<AA^{*}Av,w>=<Av,AA^{*}w>=<Av,\mu w>=\mu<Av,w>

We conclude that we have $<Av,w>=0$. But this reformulates as follows:

\lambda\neq\mu\implies A(E_{\lambda})\perp E_{\mu}

Now since the eigenspaces of $AA^{*}$ are pairwise orthogonal, and span the whole $\mathbb{C}^{N}$, we deduce from this that these eigenspaces are invariant under $A$:

A(E_{\lambda})\subset E_{\lambda}

But with this result in hand, we can finish. Indeed, we can decompose the problem, and the matrix $A$ itself, following these eigenspaces of $AA^{*}$, which in practice amounts to saying that we can assume that we only have 1 eigenspace. Now by rescaling, this is the same as assuming that we have $AA^{*}=1$. But with this, we are now in the unitary case, which we know how to solve, as explained in Theorem 1.21, and so we are done. ∎
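
Here is a small numerical sketch around this, NumPy assumed: we build a normal matrix $A=UDU^{*}$ with $U$ unitary and $D$ diagonal complex, and check the normality condition, which fails for a generic matrix:

```python
# Matrices of the form U D U* with U unitary, D diagonal, are normal.
import numpy as np

rng = np.random.default_rng(6)
N = 4
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
D = np.diag(rng.standard_normal(N) + 1j * rng.standard_normal(N))
A = U @ D @ U.conj().T
print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal

G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
print(np.allclose(G @ G.conj().T, G.conj().T @ G))   # False, generically
```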

As a first application, we have the following result:

Theorem 1.23.

Given a matrix $A\in M_{N}(\mathbb{C})$, we can construct a matrix $|A|$ as follows, by using the fact that $A^{*}A$ is diagonalizable, with positive eigenvalues:

|A|=\sqrt{A^{*}A}

This matrix $|A|$ is then positive, and its square is $|A|^{2}=A^{*}A$. In the case $N=1$, we obtain in this way the usual absolute value of the complex numbers.

Proof.

Consider indeed the matrix $A^{*}A$, which is normal. According to Theorem 1.22, we can diagonalize this matrix as follows, with $U\in U_{N}$, and with $D$ diagonal:

A^{*}A=UDU^{*}

From $A^{*}A\geq 0$ we obtain $D\geq 0$. But this means that the entries of $D$ are real, and positive. Thus we can extract the square root $\sqrt{D}$, and then set:

\sqrt{A^{*}A}=U\sqrt{D}U^{*}

Thus, we are basically done. Indeed, if we call this latter matrix $|A|$, then we are led to the conclusions in the statement. Finally, the last assertion is clear from definitions. ∎
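
As a numerical sketch of this construction, NumPy assumed, we can diagonalize $A^{*}A$ with eigh, take square roots of the eigenvalues, and conjugate back:

```python
# |A| = sqrt(A*A), computed via the diagonalization of the positive
# matrix A*A, as in the proof of Theorem 1.23.
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
d, U = np.linalg.eigh(A.conj().T @ A)        # A*A = U diag(d) U*, d >= 0
absA = U @ np.diag(np.sqrt(np.clip(d, 0, None))) @ U.conj().T
print(np.allclose(absA @ absA, A.conj().T @ A))   # |A|^2 = A*A
print(np.allclose(absA, absA.conj().T))           # |A| is self-adjoint
```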

We can now formulate a first polar decomposition result, as follows:

Theorem 1.24.

Any invertible matrix $A\in M_{N}(\mathbb{C})$ decomposes as

A=U|A|

with $U\in U_{N}$, and with $|A|=\sqrt{A^{*}A}$ as above.

Proof.

This is routine, and follows by comparing the actions of $A,|A|$ on the vectors $v\in\mathbb{C}^{N}$, and deducing from this the existence of a unitary $U\in U_{N}$ as above. We will be back to this, later on, directly in the case of the linear operators on Hilbert spaces. ∎

Observe that at $N=1$ we obtain in this way the usual polar decomposition of the nonzero complex numbers. More generally now, we have the following result:

Theorem 1.25.

Any square matrix $A\in M_{N}(\mathbb{C})$ decomposes as

A=U|A|

with $U$ being a partial isometry, and with $|A|=\sqrt{A^{*}A}$ as above.

Proof.

Again, this follows by comparing the actions of $A,|A|$ on the vectors $v\in\mathbb{C}^{N}$, and deducing from this the existence of a partial isometry $U$ as above. Alternatively, we can get this from Theorem 1.24, applied on the complement of the 0-eigenvectors. ∎
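
As a numerical sketch, assuming NumPy, the polar decomposition can be obtained from the singular value decomposition $A=VSW^{*}$, a standard outside ingredient: the unitary part is $U=VW^{*}$, and $|A|=WSW^{*}$:

```python
# Polar decomposition A = U|A| via the SVD, as a numerical illustration
# of Theorems 1.24-1.25.
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
V, s, Wh = np.linalg.svd(A)             # A = V diag(s) Wh, with Wh = W*
U = V @ Wh                              # the unitary part
absA = Wh.conj().T @ np.diag(s) @ Wh    # |A| = W diag(s) W* = sqrt(A*A)
print(np.allclose(U @ absA, A))                 # A = U|A|
print(np.allclose(U.conj().T @ U, np.eye(4)))   # U is unitary
```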

This was for our basic presentation of linear algebra. There are of course many other things that can be said, but we will come back to some of them in what follows, directly in the case of the linear operators on the arbitrary Hilbert spaces.

1e. Exercises

Linear algebra is a wide topic, and there are countless interesting matrices, and exercises about them. As a continuation of our discussion about rotations, we have:

Exercise 1.26.

Prove that the symmetry and projection with respect to the $Ox$ axis rotated by an angle $t/2\in\mathbb{R}$ are given by the matrices

S_{t}=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix}
P_{t}=\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix}

and then diagonalize these matrices, and if possible without computations.

Here the first part can only be clear on pictures, and by the way, prior to this, do not forget to verify as well that our formula of $R_{t}$ is the good one. As for the second part, just don’t go head-first into computations, there might be some geometry over there.

Exercise 1.27.

Prove that the isometries of $\mathbb{R}^{2}$ are rotations or symmetries,

R_{t}=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\quad,\quad S_{t}=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix}

and then try as well to find a formula for the isometries of $\mathbb{R}^{3}$.

Here for the first question you should look first at the determinant of such an isometry. As for the second question, this is something quite difficult. If you’re good at computers, you can look into the code of 3D games, the rotation formula is probably there.

Exercise 1.28.

Prove that the isometries of $\mathbb{C}^{2}$ of determinant $1$ are

U=\begin{pmatrix}a&b\\ -\bar{b}&\bar{a}\end{pmatrix}\quad,\quad|a|^{2}+|b|^{2}=1

then work out as well the general case, of arbitrary determinant.

As a comment here, if done with this exercise about $\mathbb{C}^{2}$, but not yet with the previous one about $\mathbb{R}^{3}$, you can go back to that exercise, by using a $\mathbb{C}^{2}\simeq\mathbb{R}^{4}$ trick. And in case this trick leads to tough computations and big headache, look it up.

Exercise 1.29.

Prove that the flat matrix, which is the all-one $N\times N$ matrix, diagonalizes over the complex numbers as follows,

\begin{pmatrix}1&\ldots&\ldots&1\\ \vdots&&&\vdots\\ \vdots&&&\vdots\\ 1&\ldots&\ldots&1\end{pmatrix}=\frac{1}{N}\,F_{N}\begin{pmatrix}N\\ &0\\ &&\ddots\\ &&&0\end{pmatrix}F_{N}^{*}

where $F_{N}=(w^{ij})_{ij}$ with $w=e^{2\pi i/N}$ is the Fourier matrix, with the convention that the indices are taken to be $i,j=0,1,\ldots,N-1$.

This is something very instructive. Normally you have to look for eigenvectors for the flat matrix, and you are led in this way to the equation $x_{0}+\ldots+x_{N-1}=0$. The problem however is that this equation, while looking very gentle, has no “canonical” solutions over the real numbers. Thus you are led to the complex numbers, and more specifically to the roots of unity, and their magic, leading to the above result. Enjoy.
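
And here is a numerical check of the above formula, NumPy assumed:

```python
# The Fourier matrix F_N = (w^{ij}), with w = e^{2pi i/N}, diagonalizes
# the all-one matrix, with eigenvalues N, 0, ..., 0.
import numpy as np

N = 5
i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
F = np.exp(2j * np.pi * i * j / N)        # F_N = (w^{ij}), i,j = 0,...,N-1
flat = np.ones((N, N))
D = np.diag([N] + [0] * (N - 1)).astype(complex)
print(np.allclose(flat, F @ D @ F.conj().T / N))   # True
```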

Chapter 2 Linear operators

2a. Hilbert spaces

We discuss in what follows an extension of the linear algebra results from the previous chapter, obtained by looking at the linear operators $T:H\to H$, with the space $H$ being no longer assumed to be finite dimensional. Our motivations come from quantum mechanics, and in order to get motivated, here is some suggested reading:


(1) Generally speaking, physics is best learned from Feynman [fey]. If you already know some, and want to learn quantum mechanics, go with Griffiths [gri]. And if you’re already a bit familiar with quantum mechanics, a good book is Weinberg [wei].


(2) A look at classics like Dirac [dir], von Neumann [vn4] or Weyl [wey] can be instructive too. On the opposite, you have as well modern, fancy books on quantum information, such as Bengtsson-Życzkowski [bzy], Nielsen-Chuang [nch] or Watrous [wat].


(3) In short, many ways of getting familiar with this big mess which is quantum mechanics, and as long as you stay away from books advertised as “rigorous”, “axiomatic”, “mathematical”, things will be fine. By the way, you can try as well my book [ba4].


Getting to work now, physics tells us to look at infinite dimensional complex spaces, such as the space of wave functions ψ:3\psi:\mathbb{R}^{3}\to\mathbb{C} of the electron. In order to do some mathematics on these spaces, we will need scalar products. So, let us start with:

Definition 2.1.

A scalar product on a complex vector space HH is a binary operation H×HH\times H\to\mathbb{C}, denoted (x,y)<x,y>(x,y)\to<x,y>, satisfying the following conditions:

  1. (1)

    <x,y><x,y> is linear in xx, and antilinear in yy.

  2. (2)

    <x,y>¯=<y,x>\overline{<x,y>}=<y,x>, for any x,yx,y.

  3. (3)

    <x,x>>0<x,x>>0, for any x0x\neq 0.

As before in chapter 1, we use here mathematicians’ convention for scalar products, that is, <,><\,,> linear at left, as opposed to physicists’ convention, <,><\,,> linear at right. The reasons for this are quite subtle, coming from the fact that, while basic quantum mechanics looks better with <,><\,,> linear at right, advanced quantum mechanics looks better with <,><\,,> linear at left. Or at least that’s what my cats say.


As a basic example for Definition 2.1, we have the finite dimensional vector space H=NH=\mathbb{C}^{N}, with its usual scalar product, namely:

<x,y>=ixiy¯i<x,y>=\sum_{i}x_{i}\bar{y}_{i}

There are many other examples, and notably various spaces of L2L^{2} functions, which naturally appear in problems coming from physics. We will discuss them later on. In order to study now the scalar products, let us formulate the following definition:

Definition 2.2.

The norm of a vector xHx\in H is the following quantity:

||x||=<x,x>||x||=\sqrt{<x,x>}

We also call this number length of xx, or distance from xx to the origin.

The terminology comes from what happens in N\mathbb{C}^{N}, where the length of the vector, as defined above, coincides with the usual length, given by:

||x||=i|xi|2||x||=\sqrt{\sum_{i}|x_{i}|^{2}}

In analogy with what happens in finite dimensions, we have two important results regarding the norms. First we have the Cauchy-Schwarz inequality, as follows:

Theorem 2.3.

We have the Cauchy-Schwarz inequality

|<x,y>|||x||||y|||<x,y>|\leq||x||\cdot||y||

and the equality case holds precisely when x,yx,y are proportional.

Proof.

This is something very standard. Consider indeed the following quantity, depending on a real variable tt\in\mathbb{R}, and on a variable on the unit circle, w𝕋w\in\mathbb{T}:

f(t)=||twx+y||2f(t)=||twx+y||^{2}

By developing ff, we see that this is a degree 2 polynomial in tt:

f(t)\displaystyle f(t) =\displaystyle= <twx+y,twx+y>\displaystyle<twx+y,twx+y>
=\displaystyle= t2<x,x>+tw<x,y>+tw¯<y,x>+<y,y>\displaystyle t^{2}<x,x>+tw<x,y>+t\bar{w}<y,x>+<y,y>
=\displaystyle= t2||x||2+2tRe(w<x,y>)+||y||2\displaystyle t^{2}||x||^{2}+2tRe(w<x,y>)+||y||^{2}

Since ff is obviously positive, its discriminant must be negative:

4Re(w<x,y>)24||x||2||y||204Re(w<x,y>)^{2}-4||x||^{2}\cdot||y||^{2}\leq 0

But this is equivalent to the following condition:

|Re(w<x,y>)|||x||||y|||Re(w<x,y>)|\leq||x||\cdot||y||

Now the point is that we can arrange for the number w𝕋w\in\mathbb{T} to be such that the quantity w<x,y>w<x,y> is real. Thus, we obtain the following inequality:

|<x,y>|||x||||y|||<x,y>|\leq||x||\cdot||y||

Finally, the study of the equality case is straightforward, by using the fact that the discriminant of ff vanishes precisely when we have a root. But this leads to the conclusion in the statement, namely that the vectors x,yx,y must be proportional. ∎

As a second main result now, we have the Minkowski inequality:

Theorem 2.4.

We have the Minkowski inequality

||x+y||||x||+||y||||x+y||\leq||x||+||y||

and the equality case holds precisely when x,yx,y are proportional.

Proof.

This follows indeed from the Cauchy-Schwarz inequality, as follows:

||x+y||||x||+||y||\displaystyle||x+y||\leq||x||+||y||
\displaystyle\iff ||x+y||2(||x||+||y||)2\displaystyle||x+y||^{2}\leq(||x||+||y||)^{2}
\displaystyle\iff ||x||2+||y||2+2Re<x,y>||x||2+||y||2+2||x||||y||\displaystyle||x||^{2}+||y||^{2}+2Re<x,y>\leq||x||^{2}+||y||^{2}+2||x||\cdot||y||
\displaystyle\iff Re<x,y>||x||||y||\displaystyle Re<x,y>\leq||x||\cdot||y||

As for the equality case, this is clear from Cauchy-Schwarz as well. ∎

As a consequence of this, we have the following result:

Theorem 2.5.

The following function is a distance on HH,

d(x,y)=||xy||d(x,y)=||x-y||

in the usual sense, that of the abstract metric spaces.

Proof.

This follows indeed from the Minkowski inequality, which corresponds to the triangle inequality, the other two axioms for a distance being trivially satisfied. ∎

The above result is quite important, because it shows that we can do geometry and analysis in our present setting, with distances and angles, a bit as in the finite dimensional case. In order to do such abstract geometry, we will often need the following key result, which shows that everything can be recovered in terms of distances:

Proposition 2.6.

The scalar products can be recovered from distances, via the formula

4<x,y>=||x+y||2||xy||2+i||x+iy||2i||xiy||24<x,y>=||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}

called complex polarization identity.

Proof.

This is something that we have already met in finite dimensions. In arbitrary dimensions the proof is similar, as follows:

||x+y||2||xy||2+i||x+iy||2i||xiy||2\displaystyle||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}
=\displaystyle= ||x||2+||y||2||x||2||y||2+i||x||2+i||y||2i||x||2i||y||2\displaystyle||x||^{2}+||y||^{2}-||x||^{2}-||y||^{2}+i||x||^{2}+i||y||^{2}-i||x||^{2}-i||y||^{2}
+2Re(<x,y>)+2Re(<x,y>)+2iIm(<x,y>)+2iIm(<x,y>)\displaystyle+2Re(<x,y>)+2Re(<x,y>)+2iIm(<x,y>)+2iIm(<x,y>)
=\displaystyle= 4<x,y>\displaystyle 4<x,y>

Thus, we are led to the conclusion in the statement. ∎
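
As a quick numerical check of this polarization identity, here is a minimal sketch in Python with numpy, with the scalar product taken linear at left, as in our conventions, and with all names being ours:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)

ip = lambda a, b: np.sum(a * np.conj(b))   # <a,b>, linear at left
nsq = lambda a: ip(a, a).real              # the squared norm ||a||^2

rhs = nsq(x + y) - nsq(x - y) + 1j * nsq(x + 1j * y) - 1j * nsq(x - 1j * y)
print(np.allclose(4 * ip(x, y), rhs))      # True: <x,y> is recovered from norms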

In order to do analysis on our spaces, we need the Cauchy sequences that we construct to converge. This is something which is automatic in finite dimensions, but in arbitrary dimensions, this can fail. It is convenient here to formulate a detailed new definition, as follows, which will be the starting point for our various considerations to follow:

Definition 2.7.

A Hilbert space is a complex vector space HH given with a scalar product <x,y><x,y>, satisfying the following conditions:

  1. (1)

    <x,y><x,y> is linear in xx, and antilinear in yy.

  2. (2)

    <x,y>¯=<y,x>\overline{<x,y>}=<y,x>, for any x,yx,y.

  3. (3)

    <x,x>>0<x,x>>0, for any x0x\neq 0.

  4. (4)

    HH is complete with respect to the norm ||x||=<x,x>||x||=\sqrt{<x,x>}.

In other words, we have taken here Definition 2.1 above, and added the condition that HH must be complete with respect to the norm ||x||=<x,x>||x||=\sqrt{<x,x>}, that we know indeed to be a norm, according to the Minkowski inequality proved above. As a basic example, as before, we have the space H=NH=\mathbb{C}^{N}, with its usual scalar product, namely:

<x,y>=ixiy¯i<x,y>=\sum_{i}x_{i}\bar{y}_{i}

More generally now, we have the following construction of Hilbert spaces:

Proposition 2.8.

The sequences of complex numbers (xi)(x_{i}) which are square-summable,

i|xi|2<\sum_{i}|x_{i}|^{2}<\infty

form a Hilbert space l2()l^{2}(\mathbb{N}), with the following scalar product:

<x,y>=ixiy¯i<x,y>=\sum_{i}x_{i}\bar{y}_{i}

In fact, given any index set II, we can construct a Hilbert space l2(I)l^{2}(I), in this way.

Proof.

There are several things to be proved, as follows:

(1) Our first claim is that l2()l^{2}(\mathbb{N}) is a vector space. For this purpose, we must prove that x,yl2()x,y\in l^{2}(\mathbb{N}) implies x+yl2()x+y\in l^{2}(\mathbb{N}). But this leads us into proving ||x+y||||x||+||y||||x+y||\leq||x||+||y||, where ||x||=<x,x>||x||=\sqrt{<x,x>}. Now since we know this inequality to hold on each subspace Nl2()\mathbb{C}^{N}\subset l^{2}(\mathbb{N}) obtained by truncating, this inequality holds everywhere, as desired.

(2) Our second claim is that <,><\,,> is well-defined on l2()l^{2}(\mathbb{N}). But this follows from the Cauchy-Schwarz inequality, |<x,y>|||x||||y|||<x,y>|\leq||x||\cdot||y||, which can be established by truncating, a bit like we established the Minkowski inequality in (1) above.

(3) It is also clear that <,><\,,> is a scalar product on l2()l^{2}(\mathbb{N}), so it remains to prove that l2()l^{2}(\mathbb{N}) is complete with respect to ||x||=<x,x>||x||=\sqrt{<x,x>}. But this is clear, because if we pick a Cauchy sequence {xn}nl2()\{x^{n}\}_{n\in\mathbb{N}}\subset l^{2}(\mathbb{N}), then for each fixed index ii the numeric sequence {xni}n\{x^{n}_{i}\}_{n\in\mathbb{N}}\subset\mathbb{C} is Cauchy, and by setting xi=limnxnix_{i}=\lim_{n\to\infty}x^{n}_{i}, we have xnxx^{n}\to x inside l2()l^{2}(\mathbb{N}), as desired.

(4) Finally, the same arguments extend to the case of an arbitrary index set II, leading to a Hilbert space l2(I)l^{2}(I), and with the remark here that there is absolutely no problem in talking about quantities of type ||x||2=iI|xi|2[0,]||x||^{2}=\sum_{i\in I}|x_{i}|^{2}\in[0,\infty], even if the index set II is uncountable, because we are summing positive numbers. ∎

Even more generally, we have the following construction of Hilbert spaces:

Theorem 2.9.

Given a measured space XX, the functions f:Xf:X\to\mathbb{C}, taken up to equality almost everywhere, which are square-summable,

X|f(x)|2dx<\int_{X}|f(x)|^{2}dx<\infty

form a Hilbert space L2(X)L^{2}(X), with the following scalar product:

<f,g>=Xf(x)g(x)¯dx<f,g>=\int_{X}f(x)\overline{g(x)}dx

In the case X=IX=I, with the counting measure, we obtain in this way the space l2(I)l^{2}(I).

Proof.

This is a straightforward generalization of Proposition 2.8, with the arguments from its proof carrying over to our case, as follows:

(1) The first part, regarding Cauchy-Schwarz and Minkowski, extends without problems, by using this time approximation by step functions.

(2) Regarding the fact that <,><\,,> is indeed a scalar product on L2(X)L^{2}(X), there is a subtlety here, because if we want <f,f>>0<f,f>>0 for f0f\neq 0, we must declare that f=0f=0 when f=0f=0 almost everywhere, and so that f=gf=g when f=gf=g almost everywhere.

(3) Regarding the fact that L2(X)L^{2}(X) is complete with respect to ||f||=<f,f>||f||=\sqrt{<f,f>}, this is again basic measure theory, by picking a Cauchy sequence {fn}nL2(X)\{f_{n}\}_{n\in\mathbb{N}}\subset L^{2}(X), then extracting a subsequence which converges almost everywhere to some function ff, which is then an L2L^{2} limit of the whole sequence, fnff_{n}\to f.

(4) Finally, the last assertion is clear, because the integration with respect to the counting measure is by definition a sum, and so L2(I)=l2(I)L^{2}(I)=l^{2}(I) in this case. ∎

Quite remarkably, any Hilbert space must be of the form L2(X)L^{2}(X), and even of the particular form l2(I)l^{2}(I). This follows indeed from the following key result:

Theorem 2.10.

Let HH be a Hilbert space.

  1. (1)

    Any algebraic basis of this space {fi}iI\{f_{i}\}_{i\in I} can be turned into an orthonormal basis {ei}iI\{e_{i}\}_{i\in I}, by using the Gram-Schmidt procedure.

  2. (2)

    Thus, HH has an orthonormal basis, and so we have Hl2(I)H\simeq l^{2}(I), with II being the indexing set for this orthonormal basis.

Proof.

All this is standard by Gram-Schmidt, the idea being as follows:

(1) First of all, in finite dimensions an orthonormal basis {ei}iI\{e_{i}\}_{i\in I} is by definition a usual algebraic basis, satisfying <ei,ej>=δij<e_{i},e_{j}>=\delta_{ij}. But the existence of such a basis follows by applying the Gram-Schmidt procedure to any algebraic basis {fi}iI\{f_{i}\}_{i\in I}, as claimed.

(2) In infinite dimensions, a first issue comes from the fact that the standard basis {δi}i\{\delta_{i}\}_{i\in\mathbb{N}} of the space l2()l^{2}(\mathbb{N}) is not an algebraic basis in the usual sense, with the finite linear combinations of the functions δi\delta_{i} producing only a dense subspace of l2()l^{2}(\mathbb{N}), that of the functions having finite support. Thus, we must fine-tune our definition of “basis”.

(3) But this can be done in two ways, by saying that {fi}iI\{f_{i}\}_{i\in I} is a basis of HH when the functions fif_{i} are linearly independent, and when either the finite linear combinations of these functions fif_{i} form a dense subspace of HH, or the linear combinations with l2(I)l^{2}(I) coefficients of these functions fif_{i} form the whole HH. For orthogonal bases {ei}iI\{e_{i}\}_{i\in I} these definitions are equivalent, and in any case, our statement makes now sense.

(4) Regarding now the proof, in infinite dimensions, this follows again from Gram-Schmidt, exactly as in the finite dimensional case, but by using this time a tool from logic, called Zorn's lemma, in order to correctly run the recursion. ∎

The above result, and its relation with Theorem 2.9, is something quite subtle, so let us further get into this. First, we have the following definition, based on the above:

Definition 2.11.

A Hilbert space HH is called separable when the following equivalent conditions are satisfied:

  1. (1)

    HH has a countable algebraic basis {fi}i\{f_{i}\}_{i\in\mathbb{N}}.

  2. (2)

    HH has a countable orthonormal basis {ei}i\{e_{i}\}_{i\in\mathbb{N}}.

  3. (3)

    We have Hl2()H\simeq l^{2}(\mathbb{N}), isomorphism of Hilbert spaces.

In what follows we will be mainly interested in the separable Hilbert spaces, where most of the questions coming from quantum physics take place. In view of the above, the following philosophical question appears: why not simply talk about l2()l^{2}(\mathbb{N})?


In answer to this, we cannot really do so, because many of the separable spaces that we are interested in appear as spaces of functions, and such spaces do not necessarily have a very simple or explicit orthonormal basis, as shown by the following result:

Proposition 2.12.

The Hilbert space H=L2[0,1]H=L^{2}[0,1] is separable, having as orthonormal basis the orthonormalized version of the algebraic basis fn=xnf_{n}=x^{n} with nn\in\mathbb{N}.

Proof.

This follows from the Weierstrass theorem, which provides us with the basis fn=xnf_{n}=x^{n}, which can be orthogonalized by using the Gram-Schmidt procedure, as explained in Theorem 2.10. Working out the details here is actually an excellent exercise. ∎
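
For those wanting to see this Gram-Schmidt procedure in action, here is a minimal numerical sketch in Python with numpy, with the code being our own; the resulting polynomials are, up to normalization conventions, the shifted Legendre polynomials:

import numpy as np
from numpy.polynomial import Polynomial as P

def ip(p, q):
    # the scalar product of L^2[0,1]: <p,q> = int_0^1 p(x)q(x)dx
    F = (p * q).integ()
    return F(1.0) - F(0.0)

basis = []
for n in range(4):
    f = P([0.0] * n + [1.0])       # the monomial f_n = x^n
    for e in basis:                # Gram-Schmidt: subtract the projections
        f = f - ip(f, e) * e
    basis.append(f / np.sqrt(ip(f, f)))

for e in basis:                    # e_0 = 1, e_1 = sqrt(3)(2x-1), and so on
    print(np.round(e.coef, 4))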

As a conclusion to all this, we are interested in one space, namely the unique separable Hilbert space HH, but due to various technical reasons, it is often better to forget that we have H=l2()H=l^{2}(\mathbb{N}), and say instead that we have H=L2(X)H=L^{2}(X), with XX being a separable measured space, or simply say that HH is an abstract separable Hilbert space.

2b. Linear operators

Let us get now into the study of linear operators T:HHT:H\to H. Before anything, we should mention that things are quite tricky with respect to quantum mechanics, and physics in general. Indeed, if there is a central operator in physics, this is the Laplace operator on the smooth functions f:Nf:\mathbb{R}^{N}\to\mathbb{C}, given by:

Δf(x)=id2fdxi2\Delta f(x)=\sum_{i}\frac{d^{2}f}{dx_{i}^{2}}

And the problem is that what we have here is an operator Δ:C(N)C(N)\Delta:C^{\infty}(\mathbb{R}^{N})\to C^{\infty}(\mathbb{R}^{N}), which does not extend into an operator Δ:L2(N)L2(N)\Delta:L^{2}(\mathbb{R}^{N})\to L^{2}(\mathbb{R}^{N}). Thus, we should perhaps look at operators T:HHT:H\to H which are densely defined, instead of looking at operators T:HHT:H\to H which are everywhere defined. We will not do so, for two reasons:


(1) Tactical retreat. When physics looks too complicated, as is the case now, you can always declare that mathematics comes first. So, let us be pure mathematicians, simply looking to generalize linear algebra to infinite dimensions. And from this viewpoint, it is a no-brainer to look at everywhere defined operators T:HHT:H\to H.


(2) Modern physics. We will see later, towards the middle of the present book, when talking about various mathematical physics findings of Connes, Jones, Voiculescu and others, that a lot of interesting mathematics, which is definitely related to modern physics, can be developed by using the everywhere defined operators T:HHT:H\to H.


In short, you’ll have to trust me here. And hang on, we are not done yet, because with this choice made, there is one more problem, mathematical this time. The problem comes from the fact that in infinite dimensions the everywhere defined operators T:HHT:H\to H can be bounded or not, and for reasons which are mathematically intuitive and obvious, and physically acceptable too, we want to deal with the bounded case only.


Long story short, let us avoid too much thinking, and start in a simple way, with:

Proposition 2.13.

For a linear operator T:HHT:H\to H, the following are equivalent:

  1. (1)

    TT is continuous.

  2. (2)

    TT is continuous at 0.

  3. (3)

    T(B)cBT(B)\subset cB for some c<c<\infty, where BHB\subset H is the unit ball.

  4. (4)

    TT is bounded, in the sense that ||T||=sup||x||1||Tx||||T||=\sup_{||x||\leq 1}||Tx|| satisfies ||T||<||T||<\infty.

Proof.

This is elementary, with (1)(2)(1)\iff(2) coming from the linearity of TT, then (2)(3)(2)\iff(3) coming from definitions, and finally (3)(4)(3)\iff(4) coming from the fact that the number ||T||||T|| from (4) is the infimum of the numbers cc making (3) work. ∎

Regarding such operators, we have the following result:

Theorem 2.14.

The linear operators T:HHT:H\to H which are bounded,

||T||=sup||x||1||Tx||<||T||=\sup_{||x||\leq 1}||Tx||<\infty

form a complex algebra with unit B(H)B(H), having the property

||ST||||S||||T||||ST||\leq||S||\cdot||T||

and which is complete with respect to the norm.

Proof.

The fact that we have indeed an algebra, satisfying the product condition in the statement, follows from the following estimates, which are all elementary:

||S+T||||S||+||T||||S+T||\leq||S||+||T||
||λT||=|λ|||T||||\lambda T||=|\lambda|\cdot||T||
||ST||||S||||T||||ST||\leq||S||\cdot||T||

Regarding now the last assertion, if {Tn}B(H)\{T_{n}\}\subset B(H) is Cauchy then {Tnx}\{T_{n}x\} is Cauchy for any xHx\in H, so we can define the limit T=limnTnT=\lim_{n\to\infty}T_{n} by setting:

Tx=limnTnxTx=\lim_{n\to\infty}T_{n}x

Let us first check that the application xTxx\to Tx is linear. We have:

T(x+y)\displaystyle T(x+y) =\displaystyle= limnTn(x+y)\displaystyle\lim_{n\to\infty}T_{n}(x+y)
=\displaystyle= limnTn(x)+Tn(y)\displaystyle\lim_{n\to\infty}T_{n}(x)+T_{n}(y)
=\displaystyle= limnTn(x)+limnTn(y)\displaystyle\lim_{n\to\infty}T_{n}(x)+\lim_{n\to\infty}T_{n}(y)
=\displaystyle= T(x)+T(y)\displaystyle T(x)+T(y)

Similarly, we have as well the following computation:

T(λx)\displaystyle T(\lambda x) =\displaystyle= limnTn(λx)\displaystyle\lim_{n\to\infty}T_{n}(\lambda x)
=\displaystyle= λlimnTn(x)\displaystyle\lambda\lim_{n\to\infty}T_{n}(x)
=\displaystyle= λT(x)\displaystyle\lambda T(x)

Thus we have a linear map T:HHT:H\to H. It remains to prove that we have TB(H)T\in B(H), and that we have TnTT_{n}\to T in norm. For this purpose, observe that we have:

||TnTm||ε,n,mN\displaystyle||T_{n}-T_{m}||\leq\varepsilon\ ,\ \forall n,m\geq N
\displaystyle\implies ||TnxTmx||ε,||x||=1,n,mN\displaystyle||T_{n}x-T_{m}x||\leq\varepsilon\ ,\ \forall||x||=1\ ,\ \forall n,m\geq N
\displaystyle\implies ||TnxTx||ε,||x||=1,nN\displaystyle||T_{n}x-Tx||\leq\varepsilon\ ,\ \forall||x||=1\ ,\ \forall n\geq N
\displaystyle\implies ||TNxTx||ε,||x||=1\displaystyle||T_{N}x-Tx||\leq\varepsilon\ ,\ \forall||x||=1
\displaystyle\implies ||TNT||ε\displaystyle||T_{N}-T||\leq\varepsilon

As a first consequence, we obtain TB(H)T\in B(H), because we have:

||T||\displaystyle||T|| =\displaystyle= ||TN+(TTN)||\displaystyle||T_{N}+(T-T_{N})||
\displaystyle\leq ||TN||+||TTN||\displaystyle||T_{N}||+||T-T_{N}||
\displaystyle\leq ||TN||+ε\displaystyle||T_{N}||+\varepsilon
<\displaystyle< \displaystyle\infty

As a second consequence, we obtain TNTT_{N}\to T in norm, and we are done. ∎

In the case where HH comes with a basis {ei}iI\{e_{i}\}_{i\in I}, we can talk about the infinite matrices MMI()M\in M_{I}(\mathbb{C}), with the remark that the multiplication of such matrices is not always defined, in the case |I|=|I|=\infty. In this context, we have the following result:

Theorem 2.15.

Let HH be a Hilbert space, with orthonormal basis {ei}iI\{e_{i}\}_{i\in I}. The bounded operators TB(H)T\in B(H) can be then identified with matrices MMI()M\in M_{I}(\mathbb{C}) via

Tx=Mx,Mij=<Tej,ei>Tx=Mx\quad,\quad M_{ij}=<Te_{j},e_{i}>

and we obtain in this way an embedding as follows, which is multiplicative:

B(H)MI()B(H)\subset M_{I}(\mathbb{C})

In the case H=NH=\mathbb{C}^{N} we obtain in this way the usual isomorphism B(H)MN()B(H)\simeq M_{N}(\mathbb{C}). In the separable case we obtain in this way a proper embedding B(H)M()B(H)\subset M_{\infty}(\mathbb{C}).

Proof.

We have several assertions to be proved, the idea being as follows:

(1) Regarding the first assertion, given a bounded operator T:HHT:H\to H, let us associate to it a matrix MMI()M\in M_{I}(\mathbb{C}) as in the statement, by the following formula:

Mij=<Tej,ei>M_{ij}=<Te_{j},e_{i}>

It is clear that this correspondence TMT\to M is linear, and also that its kernel is {0}\{0\}. Thus, we have an embedding of linear spaces B(H)MI()B(H)\subset M_{I}(\mathbb{C}).

(2) Our claim now is that this embedding is multiplicative. But this is clear too, because if we denote by TMTT\to M_{T} our correspondence, we have:

(MST)ij\displaystyle(M_{ST})_{ij} =\displaystyle= <STej,ei>\displaystyle<STe_{j},e_{i}>
=\displaystyle= Sk<Tej,ek>ek,ei\displaystyle\left<S\sum_{k}<Te_{j},e_{k}>e_{k},e_{i}\right>
=\displaystyle= k<Sek,ei><Tej,ek>\displaystyle\sum_{k}<Se_{k},e_{i}><Te_{j},e_{k}>
=\displaystyle= k(MS)ik(MT)kj\displaystyle\sum_{k}(M_{S})_{ik}(M_{T})_{kj}
=\displaystyle= (MSMT)ij\displaystyle(M_{S}M_{T})_{ij}

(3) Finally, we must prove that the original operator T:HHT:H\to H can be recovered from its matrix MMI()M\in M_{I}(\mathbb{C}) via the formula in the statement, namely Tx=MxTx=Mx. But this latter formula holds for the vectors of the basis, x=ejx=e_{j}, because we have:

(Tej)i\displaystyle(Te_{j})_{i} =\displaystyle= <Tej,ei>\displaystyle<Te_{j},e_{i}>
=\displaystyle= Mij\displaystyle M_{ij}
=\displaystyle= (Mej)i\displaystyle(Me_{j})_{i}

Now by linearity we obtain from this that the formula Tx=MxTx=Mx holds everywhere, on any vector xHx\in H, and this finishes the proof of the first assertion.

(4) In finite dimensions we obtain an isomorphism, because any matrix MMN()M\in M_{N}(\mathbb{C}) determines an operator T:NNT:\mathbb{C}^{N}\to\mathbb{C}^{N}, according to the formula <Tej,ei>=Mij<Te_{j},e_{i}>=M_{ij}. In infinite dimensions, however, we do not have an isomorphism. For instance on H=l2()H=l^{2}(\mathbb{N}) the following matrix does not define an operator:

M=(1111)M=\begin{pmatrix}1&1&\ldots\\ 1&1&\ldots\\ \vdots&\vdots\end{pmatrix}

Indeed, T(e1)T(e_{1}) should be the all-one vector, which is not square-summable. ∎
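
As a finite dimensional illustration of the correspondence TMT\to M and of its multiplicativity, here is a short numerical sketch, in Python with numpy, with the setup and names being ours:

import numpy as np

rng = np.random.default_rng(1)
N = 4
S = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
e = np.eye(N)                              # the orthonormal basis e_0,...,e_{N-1}
ip = lambda a, b: np.sum(a * np.conj(b))   # <a,b>, linear at left

def mat(A):
    # the matrix M_ij = <A e_j, e_i> associated to the operator A
    return np.array([[ip(A @ e[j], e[i]) for j in range(N)] for i in range(N)])

print(np.allclose(mat(T), T))                     # M recovers the operator
print(np.allclose(mat(S @ T), mat(S) @ mat(T)))   # M_{ST} = M_S M_T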

In connection with our previous comments on bases, the above result is something quite theoretical, because for basic Hilbert spaces like L2[0,1]L^{2}[0,1], which do not have a simple orthonormal basis, the embedding B(H)M()B(H)\subset M_{\infty}(\mathbb{C}) that we obtain is not something very useful. In short, while the bounded operators T:HHT:H\to H are basically some infinite matrices, it is better to think of these operators as being objects on their own.


As another comment, the construction TMT\to M makes sense for any linear operator T:HHT:H\to H, but when dimH=\dim H=\infty, we do not obtain an embedding (H)MI()\mathcal{L}(H)\subset M_{I}(\mathbb{C}) in this way. Indeed, set H=l2()H=l^{2}(\mathbb{N}), let E=span(ei)E=span(e_{i}) be the linear space spanned by the standard basis, and pick an algebraic complement FF of this space EE, so that we have H=EFH=E\oplus F, as an algebraic direct sum. Then any linear operator S:FFS:F\to F gives rise to a linear operator T:HHT:H\to H, given by T(e,f)=(0,S(f))T(e,f)=(0,S(f)), whose associated matrix is 0. And, retrospectively speaking, it is in order to avoid such pathologies that we decided some time ago to restrict our attention to the bounded case, TB(H)T\in B(H).


As in the finite dimensional case, we can talk about adjoint operators, in this setting, the definition and main properties of the construction TTT\to T^{*} being as follows:

Theorem 2.16.

Given a bounded operator TB(H)T\in B(H), the following formula defines a bounded operator TB(H)T^{*}\in B(H), called the adjoint of TT:

<Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>

The correspondence TTT\to T^{*} is antilinear, antimultiplicative, an involution, and an isometry. In finite dimensions, we recover the usual adjoint operator.

Proof.

There are several things to be done here, the idea being as follows:

(1) We will need a standard functional analysis result, stating that the continuous linear forms φ:H\varphi:H\to\mathbb{C} appear as scalar products, as follows, with zHz\in H:

φ(x)=<x,z>\varphi(x)=<x,z>

Indeed, in one sense this is clear, because given zHz\in H, the application φ(x)=<x,z>\varphi(x)=<x,z> is linear, and continuous as well, because by Cauchy-Schwarz we have:

|φ(x)|||x||||z|||\varphi(x)|\leq||x||\cdot||z||

Conversely now, by using a basis we can assume H=l2()H=l^{2}(\mathbb{N}), and our linear form φ:H\varphi:H\to\mathbb{C} must be then, by linearity, given by a formula of the following type:

φ(x)=ixiz¯i\varphi(x)=\sum_{i}x_{i}\bar{z}_{i}

But, again by Cauchy-Schwarz, in order for such a formula to define indeed a continuous linear form φ:H\varphi:H\to\mathbb{C} we must have zl2()z\in l^{2}(\mathbb{N}), and so zHz\in H, as desired.

(2) With this in hand, we can now construct the adjoint TT^{*}, by the formula in the statement. Indeed, given yHy\in H, the formula φ(x)=<Tx,y>\varphi(x)=<Tx,y> defines a linear map HH\to\mathbb{C}. Thus, we must have a formula as follows, for a certain vector TyHT^{*}y\in H:

φ(x)=<x,Ty>\varphi(x)=<x,T^{*}y>

Moreover, this vector TyHT^{*}y\in H is unique with this property, and we conclude from this that the formula yTyy\to T^{*}y defines a certain map T:HHT^{*}:H\to H, which is unique with the property in the statement, namely <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y> for any x,yx,y.

(3) Let us prove that we have TB(H)T^{*}\in B(H). By using once again the uniqueness of TT^{*}, we conclude that we have the following formulae, which show that TT^{*} is linear:

T(x+y)=Tx+Ty,T(λx)=λTxT^{*}(x+y)=T^{*}x+T^{*}y\quad,\quad T^{*}(\lambda x)=\lambda T^{*}x

Observe also that TT^{*} is bounded as well, because we have:

||T||\displaystyle||T|| =\displaystyle= sup||x||=1sup||y||=1|<Tx,y>|\displaystyle\sup_{||x||=1}\sup_{||y||=1}|<Tx,y>|
=\displaystyle= sup||y||=1sup||x||=1|<x,Ty>|\displaystyle\sup_{||y||=1}\sup_{||x||=1}|<x,T^{*}y>|
=\displaystyle= ||T||\displaystyle||T^{*}||

(4) The fact that the correspondence TTT\to T^{*} is antilinear, antimultiplicative, and is an involution comes from the following formulae, coming from uniqueness:

(S+T)=S+T,(λT)=λ¯T(S+T)^{*}=S^{*}+T^{*}\quad,\quad(\lambda T)^{*}=\bar{\lambda}T^{*}
(ST)=TS,(T)=T(ST)^{*}=T^{*}S^{*}\quad,\quad(T^{*})^{*}=T

As for the isometry property with respect to the operator norm, ||T||=||T||||T||=||T^{*}||, this is something that we already know, from the proof of (3) above.

(5) Regarding finite dimensions, let us first examine the general case where our Hilbert space comes with a basis, H=l2(I)H=l^{2}(I). We can compute the matrix MMI()M^{*}\in M_{I}(\mathbb{C}) associated to the operator TB(H)T^{*}\in B(H), by using <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>, in the following way:

(M)ij\displaystyle(M^{*})_{ij} =\displaystyle= <Tej,ei>\displaystyle<T^{*}e_{j},e_{i}>
=\displaystyle= <ei,Tej>¯\displaystyle\overline{<e_{i},T^{*}e_{j}>}
=\displaystyle= <Tei,ej>¯\displaystyle\overline{<Te_{i},e_{j}>}
=\displaystyle= M¯ji\displaystyle\overline{M}_{ji}

Thus, we have reached the usual formula for the adjoints of matrices, and in the particular case H=NH=\mathbb{C}^{N}, we conclude that TT^{*} comes indeed from the usual MM^{*}. ∎
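
Here is as well a quick numerical check of the defining formula of the adjoint, and of the fact that, at the matrix level, it is the conjugate transpose. This is a minimal sketch in Python with numpy, with our conventions:

import numpy as np

rng = np.random.default_rng(2)
N = 5
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

ip = lambda a, b: np.sum(a * np.conj(b))   # <a,b>, linear at left
Tstar = T.conj().T                         # candidate adjoint: conjugate transpose

print(np.isclose(ip(T @ x, y), ip(x, Tstar @ y)))   # <Tx,y> = <x,T*y>: True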

As in finite dimensions, the operators T,TT,T^{*} can be thought of as being “twin brothers”, and there is a lot of interesting mathematics connecting them. We first have:

Proposition 2.17.

Given a bounded operator TB(H)T\in B(H), the following happen:

  1. (1)

    kerT=(ImT)\ker T^{*}=(ImT)^{\perp}.

  2. (2)

    ImT¯=(kerT)\overline{ImT^{*}}=(\ker T)^{\perp}.

Proof.

Both these assertions are elementary, as follows:

(1) Let us first prove “\subset”. Assuming Tx=0T^{*}x=0, we have indeed xImTx\perp ImT, because:

<x,Ty>=<Tx,y>=0<x,Ty>=<T^{*}x,y>=0

As for “\supset”, assuming <x,Ty>=0<x,Ty>=0 for any yy, we have Tx=0T^{*}x=0, because:

<Tx,y>=<x,Ty>=0<T^{*}x,y>=<x,Ty>=0

(2) This can be deduced from (1), applied to the operator TT^{*}, as follows:

(kerT)=(ImT)=ImT¯(\ker T)^{\perp}=(ImT^{*})^{\perp\perp}=\overline{ImT^{*}}

Here we have used the formula K=K¯K^{\perp\perp}=\bar{K}, valid for any linear subspace KHK\subset H of a Hilbert space, which for KK closed reads K=KK^{\perp\perp}=K, and comes from H=KKH=K\oplus K^{\perp}, and which in general follows from KK¯=K¯K^{\perp\perp}\subset\bar{K}^{\perp\perp}=\bar{K}, the reverse inclusion being clear. ∎

Let us record as well the following useful formula, relating TT and TT^{*}:

Theorem 2.18.

We have the following formula,

||TT||=||T||2||TT^{*}||=||T||^{2}

valid for any operator TB(H)T\in B(H).

Proof.

We recall from Theorem 2.16 that the correspondence TTT\to T^{*} is an isometry with respect to the operator norm, in the sense that we have:

||T||=||T||||T||=||T^{*}||

In order to prove now the formula in the statement, observe first that we have:

||TT||||T||||T||=||T||2||TT^{*}||\leq||T||\cdot||T^{*}||=||T||^{2}

On the other hand, we have as well the following estimate:

||T||2\displaystyle||T||^{2} =\displaystyle= sup||x||=1|<Tx,Tx>|\displaystyle\sup_{||x||=1}|<Tx,Tx>|
=\displaystyle= sup||x||=1|<x,TTx>|\displaystyle\sup_{||x||=1}|<x,T^{*}Tx>|
\displaystyle\leq ||TT||\displaystyle||T^{*}T||

By replacing TTT\to T^{*} we obtain from this that we have:

||T||2||TT||||T||^{2}\leq||TT^{*}||

Thus, we have obtained the needed inequality, and we are done. ∎
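
In finite dimensions this formula can be verified numerically, the operator norm being the largest singular value. Here is a minimal sketch, in Python with numpy, with the code being ours:

import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

opnorm = lambda A: np.linalg.norm(A, 2)    # operator norm: largest singular value
print(np.isclose(opnorm(T @ T.conj().T), opnorm(T) ** 2))   # ||TT*|| = ||T||^2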

2c. Unitaries, projections

Let us discuss now some explicit examples of operators, in analogy with what happens in finite dimensions. The most basic examples of linear transformations are the rotations, symmetries and projections. Then, we have certain remarkable classes of linear transformations, such as the positive, self-adjoint and normal ones. In what follows we will develop the basic theory of such transformations, in the present Hilbert space setting.


Let us begin with the rotations. The situation here is quite tricky in arbitrary dimensions, and we have several notions instead of one. We first have the following result:

Theorem 2.19.

For a linear operator UB(H)U\in B(H) the following conditions are equivalent, and if they are satisfied, we say that UU is an isometry:

  1. (1)

    UU is a metric space isometry, d(Ux,Uy)=d(x,y)d(Ux,Uy)=d(x,y).

  2. (2)

    UU is a normed space isometry, ||Ux||=||x||||Ux||=||x||.

  3. (3)

    UU preserves the scalar product, <Ux,Uy>=<x,y><Ux,Uy>=<x,y>.

  4. (4)

    UU satisfies the isometry condition UU=1U^{*}U=1.

In finite dimensions, we recover in this way the usual unitary transformations.

Proof.

The proofs are similar to those in finite dimensions, as follows:

(1)(2)(1)\iff(2) This follows indeed from the formula of the distances, namely:

d(x,y)=||xy||d(x,y)=||x-y||

(2)(3)(2)\iff(3) This is again standard, because we can pass from scalar products to distances, and vice versa, by using ||x||=<x,x>||x||=\sqrt{<x,x>}, and the polarization formula.

(3)(4)(3)\iff(4) We have indeed the following equivalences, by using the standard formula <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>, which defines the adjoint operator:

<Ux,Uy>=<x,y>\displaystyle<Ux,Uy>=<x,y> \displaystyle\iff <x,UUy>=<x,y>\displaystyle<x,U^{*}Uy>=<x,y>
\displaystyle\iff UUy=y\displaystyle U^{*}Uy=y
\displaystyle\iff UU=1\displaystyle U^{*}U=1

Thus, we are led to the conclusions in the statement. ∎

The point now is that the condition UU=1U^{*}U=1 does not imply in general UU=1UU^{*}=1, the simplest counterexample here being the shift operator on l2()l^{2}(\mathbb{N}):

Proposition 2.20.

The shift operator on the space l2()l^{2}(\mathbb{N}), given by

S(ei)=ei+1S(e_{i})=e_{i+1}

is an isometry, SS=1S^{*}S=1. However, we have SS1SS^{*}\neq 1.

Proof.

The adjoint of the shift is given by the following formula:

S(ei)={ei1ifi>00ifi=0S^{*}(e_{i})=\begin{cases}e_{i-1}&{\rm if}\ i>0\\ 0&{\rm if}\ i=0\end{cases}

When composing S,SS,S^{*}, in one sense we obtain the following formula:

SS(ei)=eiS^{*}S(e_{i})=e_{i}

In the other sense now, we obtain the following formula:

SS(ei)={eiifi>00ifi=0SS^{*}(e_{i})=\begin{cases}e_{i}&{\rm if}\ i>0\\ 0&{\rm if}\ i=0\end{cases}

Summarizing, the compositions are given by the following formulae:

SS=1,SS=Proj(e0)S^{*}S=1\quad,\quad SS^{*}=Proj(e_{0}^{\perp})

Thus, we are led to the conclusions in the statement. ∎
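
The above computations can be illustrated on a finite dimensional truncation of the shift, with the caveat that the truncation also loses the last basis vector, so that the relation SS=1S^{*}S=1 only becomes exact in the infinite dimensional limit. A minimal sketch in Python with numpy, with our setup:

import numpy as np

N = 5
S = np.zeros((N, N))
S[1:, :-1] = np.eye(N - 1)    # truncated shift: S e_i = e_{i+1}, with S e_{N-1} = 0

print(np.diag(S.T @ S))       # [1 1 1 1 0]: S*S = 1, up to the truncation effect
print(np.diag(S @ S.T))       # [0 1 1 1 1]: SS* = the projection away from e_0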

As a conclusion, the notion of isometry is not the correct infinite dimensional analogue of the notion of unitary, and the unitary operators must be introduced as follows:

Theorem 2.21.

For a linear operator UB(H)U\in B(H) the following conditions are equivalent, and if they are satisfied, we say that UU is a unitary:

  1. (1)

    UU is an isometry, which is invertible.

  2. (2)

    UU, U1U^{-1} are both isometries.

  3. (3)

    UU, UU^{*} are both isometries.

  4. (4)

    UU=UU=1UU^{*}=U^{*}U=1.

  5. (5)

    U=U1U^{*}=U^{-1}.

Moreover, the unitary operators form a group U(H)B(H)U(H)\subset B(H).

Proof.

There are several statements here, the idea being as follows:

(1) The various equivalences in the statement are all clear from definitions, and from Theorem 2.19 in what regards the various possible notions of isometries which can be used, by using the formula (ST)=TS(ST)^{*}=T^{*}S^{*} for the adjoints of the products of operators.

(2) The fact that the products and inverses of unitaries are unitaries is also clear, and we conclude that the unitary operators form a group U(H)B(H)U(H)\subset B(H), as stated. ∎

Let us discuss now the projections. Modulo the fact that the subspaces KHK\subset H onto which these projections project must be assumed to be closed, in the present setting, the result here is perfectly similar to the one in finite dimensions, as follows:

Theorem 2.22.

For a linear operator PB(H)P\in B(H) the following conditions are equivalent, and if they are satisfied, we say that PP is a projection:

  1. (1)

    PP is the orthogonal projection on a closed subspace KHK\subset H.

  2. (2)

    PP satisfies the projection equations P2=P=PP^{2}=P^{*}=P.

Proof.

As in finite dimensions, PP is an abstract projection, not necessarily orthogonal, when it is an idempotent, algebraically speaking, in the sense that we have:

P2=PP^{2}=P

The point now is that this projection is orthogonal when:

<Pxx,Py>=0\displaystyle<Px-x,Py>=0 \displaystyle\iff <PPxPx,y>=0\displaystyle<P^{*}Px-P^{*}x,y>=0
\displaystyle\iff PPxPx=0\displaystyle P^{*}Px-P^{*}x=0
\displaystyle\iff PPP=0\displaystyle P^{*}P-P^{*}=0
\displaystyle\iff PP=P\displaystyle P^{*}P=P^{*}

Now observe that by conjugating, we obtain PP=PP^{*}P=P. Thus, we must have P=PP=P^{*}, and so we have shown that any orthogonal projection must satisfy, as claimed:

P2=P=PP^{2}=P^{*}=P

Conversely, if this condition is satisfied, P2=PP^{2}=P shows that PP is a projection, and P=PP=P^{*} shows via the above computation that PP is indeed orthogonal. ∎

There is a relation between the projections and the general isometries, such as the shift SS that we met before, and we have the following result:

Proposition 2.23.

Given an isometry UB(H)U\in B(H), the operator

P=UUP=UU^{*}

is a projection, namely the orthogonal projection on Im(U)Im(U).

Proof.

Assume indeed that we have an isometry, UU=1U^{*}U=1. The fact that P=UUP=UU^{*} is indeed a projection can be checked abstractly, as follows:

(UU)=UU(UU^{*})^{*}=UU^{*}
UUUU=UUUU^{*}UU^{*}=UU^{*}

As for the last assertion, this is something that we already met, for the shift, and the situation in general is similar, with the result itself being clear. ∎

More generally now, along the same lines, and clarifying the whole situation with the unitaries and isometries, we have the following result:

Theorem 2.24.

An operator UB(H)U\in B(H) is a partial isometry, in the usual geometric sense, when the following two operators are projections:

P=UU,Q=UUP=UU^{*}\quad,\quad Q=U^{*}U

Moreover, the isometries, adjoints of isometries and unitaries are respectively characterized by the conditions Q=1Q=1, P=1P=1, P=Q=1P=Q=1.

Proof.

The first assertion is a straightforward extension of Proposition 2.23, and the second assertion follows from various results regarding isometries established above. ∎
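
In finite dimensions any isometry is a unitary, so in order to model the above phenomena numerically we can use instead a rectangular matrix with orthonormal columns, which is an isometry nm\mathbb{C}^{n}\to\mathbb{C}^{m}, with n<mn<m. Here is a sketch in Python with numpy, with the QR trick being our own choice:

import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
V, _ = np.linalg.qr(A)        # V has orthonormal columns: an isometry C^3 -> C^5

Q = V.conj().T @ V            # V*V
P = V @ V.conj().T            # VV*
print(np.allclose(Q, np.eye(n)))                          # V*V = 1: isometry
print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))  # VV* is a projection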

It is possible to talk as well about symmetries, in the following way:

Definition 2.25.

An operator SB(H)S\in B(H) is called a symmetry when S2=1S^{2}=1, and a unitary symmetry when one of the following equivalent conditions is satisfied:

  1. (1)

    SS is a unitary, S=S1S^{*}=S^{-1}, and a symmetry as well, S2=1S^{2}=1.

  2. (2)

    SS satisfies the equations S=S=S1S=S^{*}=S^{-1}.

Here the terminology is a bit non-standard, because even in finite dimensions, S2=1S^{2}=1 is not exactly what you would require for a “true” symmetry, as shown by the following transformation, which is a symmetry in our sense, but not a unitary symmetry:

(021/20)(xy)=(2yx/2)\begin{pmatrix}0&2\\ 1/2&0\end{pmatrix}\binom{x}{y}=\binom{2y}{x/2}

Let us study now some larger classes of operators, which are of particular importance, namely the self-adjoint, positive and normal ones. We first have:

Theorem 2.26.

For an operator TB(H)T\in B(H), the following conditions are equivalent, and if they are satisfied, we call TT self-adjoint:

  1. (1)

    T=TT=T^{*}.

  2. (2)

    <Tx,x><Tx,x>\in\mathbb{R}.

In finite dimensions, we recover in this way the usual self-adjointness notion.

Proof.

There are several assertions here, the idea being as follows:

(1)(2)(1)\implies(2) This is clear, because we have:

<Tx,x>¯\displaystyle\overline{<Tx,x>} =\displaystyle= <x,Tx>\displaystyle<x,Tx>
=\displaystyle= <Tx,x>\displaystyle<T^{*}x,x>
=\displaystyle= <Tx,x>\displaystyle<Tx,x>

(2)(1)(2)\implies(1) In order to prove this, observe that the beginning of the above computation shows that, when assuming <Tx,x><Tx,x>\in\mathbb{R}, the following happens:

<Tx,x>=<Tx,x><Tx,x>=<T^{*}x,x>

Thus, in terms of the operator S=TTS=T-T^{*}, we have:

<Sx,x>=0<Sx,x>=0

In order to finish, we use a polarization trick. We have the following formula:

<S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x><S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x>

Since the first 3 terms vanish, the sum of the 2 last terms vanishes too. But, by using S=SS^{*}=-S, coming from S=TTS=T-T^{*}, we can process this latter vanishing as follows:

<Sx,y>\displaystyle<Sx,y> =\displaystyle= <Sy,x>\displaystyle-<Sy,x>
=\displaystyle= <y,Sx>\displaystyle<y,Sx>
=\displaystyle= <Sx,y>¯\displaystyle\overline{<Sx,y>}

Thus we must have <Sx,y><Sx,y>\in\mathbb{R}, and with yiyy\to iy we obtain <Sx,y>i<Sx,y>\in i\mathbb{R} too, and so <Sx,y>=0<Sx,y>=0. Thus S=0S=0, which gives T=TT=T^{*}, as desired.

(3) Finally, in what regards the finite dimensions, or more generally the case where our Hilbert space comes with a basis, H=l2(I)H=l^{2}(I), here the condition T=TT=T^{*} corresponds to the usual self-adjointness condition M=MM=M^{*} at the level of the associated matrices. ∎

At the level of the basic examples, the situation is as follows:

Proposition 2.27.

The following operators are self-adjoint:

  1. (1)

    The projections, P2=P=PP^{2}=P^{*}=P. In fact, an abstract, algebraic projection is an orthogonal projection precisely when it is self-adjoint.

  2. (2)

    The unitary symmetries, S=S=S1S=S^{*}=S^{-1}. In fact, a unitary is a unitary symmetry precisely when it is self-adjoint.

Proof.

These assertions are indeed all clear from definitions. ∎

Next in line, we have the notion of positive operator. We have here:

Theorem 2.28.

The positive operators, which are the operators TB(H)T\in B(H) satisfying <Tx,x>0<Tx,x>\geq 0, have the following properties:

  1. (1)

    They are self-adjoint, T=TT=T^{*}.

  2. (2)

    As examples, we have the projections, P2=P=PP^{2}=P^{*}=P.

  3. (3)

    More generally, T=SST=S^{*}S is positive, for any SB(H)S\in B(H).

  4. (4)

    In finite dimensions, we recover the usual positive operators.

Proof.

All these assertions are elementary, the idea being as follows:

(1) This follows from Theorem 2.26, because <Tx,x>0<Tx,x>\geq 0 implies <Tx,x><Tx,x>\in\mathbb{R}.

(2) This is clear from P2=P=PP^{2}=P=P^{*}, because we have:

<Px,x>\displaystyle<Px,x> =\displaystyle= <P2x,x>\displaystyle<P^{2}x,x>
=\displaystyle= <Px,Px>\displaystyle<Px,Px>
=\displaystyle= ||Px||2\displaystyle||Px||^{2}

(3) This follows from a similar computation, namely:

<SSx,x>=<Sx,Sx>=||Sx||2<S^{*}Sx,x>=<Sx,Sx>=||Sx||^{2}

(4) This is well-known, the idea being that the condition <Tx,x>0<Tx,x>\geq 0 corresponds to the usual positivity condition A0A\geq 0, at the level of the associated matrix. ∎

It is possible to talk as well about strictly positive operators, and we have here:

Theorem 2.29.

The strictly positive operators, which are the operators TB(H)T\in B(H) satisfying <Tx,x>>0<Tx,x>>0, for any x0x\neq 0, have the following properties:

  1. (1)

    They are self-adjoint, T=TT=T^{*}.

  2. (2)

    As examples, T=SST=S^{*}S is strictly positive, for any SB(H)S\in B(H) injective.

  3. (3)

    In finite dimensions, we recover the usual strictly positive operators.

Proof.

As before, all these assertions are elementary, the idea being as follows:

(1) This is something that we know, from Theorem 2.28.

(2) This follows from the injectivity of SS, because for any x0x\neq 0 we have:

<SSx,x>\displaystyle<S^{*}Sx,x> =\displaystyle= <Sx,Sx>\displaystyle<Sx,Sx>
=\displaystyle= ||Sx||2\displaystyle||Sx||^{2}
>\displaystyle> 0\displaystyle 0

(3) This is well-known, the idea being that the condition <Tx,x>>0<Tx,x>>0 corresponds to the usual strict positivity condition A>0A>0, at the level of the associated matrix. ∎

As a comment, while any strictly positive matrix A>0A>0 is well-known to be invertible, the analogue of this fact does not hold in infinite dimensions, a counterexample here being the following operator on l2()l^{2}(\mathbb{N}), which is strictly positive but not invertible:

T=(11213)T=\begin{pmatrix}1\\ &\frac{1}{2}\\ &&\frac{1}{3}\\ &&&\ddots\end{pmatrix}
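
This failure of invertibility can be seen numerically on the truncations of the above operator, whose inverses blow up in norm. A minimal sketch in Python with numpy, with the code being ours:

import numpy as np

# truncations of the strictly positive operator T = diag(1, 1/2, 1/3, ...)
for N in (10, 100, 1000):
    T = np.diag(1.0 / np.arange(1, N + 1))
    print(N, np.linalg.norm(np.linalg.inv(T), 2))   # ||T^{-1}|| = N, unbounded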

As a last remarkable class of operators, we have the normal ones. We have here:

Theorem 2.30.

For an operator TB(H)T\in B(H), the following conditions are equivalent, and if they are satisfied, we call TT normal:

  1. (1)

    TT=TTTT^{*}=T^{*}T.

  2. (2)

    ||Tx||=||Tx||||Tx||=||T^{*}x||.

In finite dimensions, we recover in this way the usual normality notion.

Proof.

There are several assertions here, the idea being as follows:

(1)(2)(1)\implies(2) This is clear, due to the following computation:

||Tx||2\displaystyle||Tx||^{2} =\displaystyle= <Tx,Tx>\displaystyle<Tx,Tx>
=\displaystyle= <TTx,x>\displaystyle<T^{*}Tx,x>
=\displaystyle= <TTx,x>\displaystyle<TT^{*}x,x>
=\displaystyle= <Tx,Tx>\displaystyle<T^{*}x,T^{*}x>
=\displaystyle= ||Tx||2\displaystyle||T^{*}x||^{2}

(2)(1)(2)\implies(1) This is clear as well, because the above computation shows that, when assuming ||Tx||=||Tx||||Tx||=||T^{*}x||, the following happens:

<TTx,x>=<TTx,x><TT^{*}x,x>=<T^{*}Tx,x>

Thus, in terms of the operator S=TTTTS=TT^{*}-T^{*}T, we have:

<Sx,x>=0<Sx,x>=0

In order to finish, we use a polarization trick. We have the following formula:

<S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x><S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x>

Since the first 3 terms vanish, the sum of the 2 last terms vanishes too. But, by using S=SS=S^{*}, coming from S=TTTTS=TT^{*}-T^{*}T, we can process this latter vanishing as follows:

<Sx,y>\displaystyle<Sx,y> =\displaystyle= <Sy,x>\displaystyle-<Sy,x>
=\displaystyle= <y,Sx>\displaystyle-<y,Sx>
=\displaystyle= <Sx,y>¯\displaystyle-\overline{<Sx,y>}

Thus we must have <Sx,y>i<Sx,y>\in i\mathbb{R}, and with yiyy\to iy we obtain <Sx,y><Sx,y>\in\mathbb{R} too, and so <Sx,y>=0<Sx,y>=0. Thus S=0S=0, which gives TT=TTTT^{*}=T^{*}T, as desired.

(3) Finally, in what regards finite dimensions, or more generally the case where our Hilbert space comes with a basis, H=l2(I)H=l^{2}(I), here the condition TT=TTTT^{*}=T^{*}T corresponds to the usual normality condition MM=MMMM^{*}=M^{*}M at the level of the associated matrices. ∎

Observe that the normal operators generalize both the self-adjoint operators, and the unitaries. We will be back to such operators, on many occasions, in what follows.
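
As a numerical summary of these classes, here is a quick verification that T=SST=S^{*}S is positive, hence self-adjoint, hence normal, as a minimal sketch in Python with numpy, with our conventions:

import numpy as np

rng = np.random.default_rng(5)
S = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
T = S.conj().T @ S                          # T = S*S

ip = lambda a, b: np.sum(a * np.conj(b))    # <a,b>, linear at left
print(np.isclose(ip(T @ x, x).imag, 0.0))   # <Tx,x> real: self-adjointness
print(ip(T @ x, x).real >= 0)               # <Tx,x> >= 0: positivity
print(np.allclose(T @ T.conj().T, T.conj().T @ T))   # TT* = T*T: normality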

2d. Diagonal operators

Let us work out now what happens in the case that we are mostly interested in, namely H=L2(X)H=L^{2}(X), with XX being a measured space. We first have:

Theorem 2.31.

Given a measured space XX, consider the Hilbert space H=L2(X)H=L^{2}(X). Associated to any function fL(X)f\in L^{\infty}(X) is then the multiplication operator

Tf:HH,Tf(g)=fgT_{f}:H\to H\quad,\quad T_{f}(g)=fg

which is well-defined, linear and bounded, having norm as follows:

||Tf||=||f||||T_{f}||=||f||_{\infty}

Moreover, the correspondence fTff\to T_{f} is linear, multiplicative and involutive.

Proof.

There are several assertions here, the idea being as follows:

(1) We must first prove that the formula in the statement, Tf(g)=fgT_{f}(g)=fg, defines indeed an operator HHH\to H, which amounts in saying that we have:

fL(X),gL2(X)fgL2(X)f\in L^{\infty}(X),\ g\in L^{2}(X)\implies fg\in L^{2}(X)

But this follows from the following explicit estimate:

||fg||2\displaystyle||fg||_{2} =\displaystyle= X|f(x)|2|g(x)|2dμ(x)\displaystyle\sqrt{\int_{X}|f(x)|^{2}|g(x)|^{2}d\mu(x)}
\displaystyle\leq supxX|f(x)|X|g(x)|2dμ(x)\displaystyle\sup_{x\in X}|f(x)|\sqrt{\int_{X}|g(x)|^{2}d\mu(x)}
=\displaystyle= ||f||||g||2\displaystyle||f||_{\infty}||g||_{2}
<\displaystyle< \displaystyle\infty

(2) Next in line, we must prove that TfT_{f} is linear and bounded. We have:

Tf(g+h)=Tf(g)+Tf(h),Tf(λg)=λTf(g)T_{f}(g+h)=T_{f}(g)+T_{f}(h)\quad,\quad T_{f}(\lambda g)=\lambda T_{f}(g)

As for the boundedness condition, this follows from the estimate from the proof of (1), which gives, in terms of the operator norm of B(H)B(H):

||Tf||||f||||T_{f}||\leq||f||_{\infty}

(3) Let us prove now that we have equality, ||Tf||=||f||||T_{f}||=||f||_{\infty}, in the above estimate. For this purpose, given n1n\geq 1, we can pick a subset YnXY_{n}\subset X of positive finite measure on which we have |f(x)|||f||1/n|f(x)|\geq||f||_{\infty}-1/n, and set gn=χYnL2(X)g_{n}=\chi_{Y_{n}}\in L^{2}(X). We obtain:

||fgn||2=Yn|f(x)|2dμ(x)(||f||1/n)||gn||2||fg_{n}||_{2}=\sqrt{\int_{Y_{n}}|f(x)|^{2}d\mu(x)}\geq(||f||_{\infty}-1/n)||g_{n}||_{2}

Thus, with nn\to\infty we obtain ||Tf||||f||||T_{f}||\geq||f||_{\infty}, which is reverse to the inequality obtained in the proof of (2), and this leads to the conclusion in the statement.

(4) Regarding now the fact that the correspondence fTff\to T_{f} is indeed linear and multiplicative, the corresponding formulae are as follows, both clear:

Tf+h(g)=Tf(g)+Th(g),Tλf(g)=λTf(g)T_{f+h}(g)=T_{f}(g)+T_{h}(g)\quad,\quad T_{\lambda f}(g)=\lambda T_{f}(g)

(5) Finally, let us prove that the correspondence fTff\to T_{f} is involutive, in the sense that it transforms the standard involution ff¯f\to\bar{f} of the algebra L(X)L^{\infty}(X) into the standard involution TTT\to T^{*} of the algebra B(H)B(H). We must prove that we have:

Tf=Tf¯T_{f}^{*}=T_{\bar{f}}

But this follows from the following computation:

<Tfg,h>\displaystyle<T_{f}g,h> =\displaystyle= <fg,h>\displaystyle<fg,h>
=\displaystyle= Xf(x)g(x)h¯(x)dμ(x)\displaystyle\int_{X}f(x)g(x)\bar{h}(x)d\mu(x)
=\displaystyle= Xg(x)f(x)h¯(x)dμ(x)\displaystyle\int_{X}g(x)f(x)\bar{h}(x)d\mu(x)
=\displaystyle= <g,f¯h>\displaystyle<g,\bar{f}h>
=\displaystyle= <g,Tf¯h>\displaystyle<g,T_{\bar{f}}h>

Indeed, since the adjoint is unique, we obtain from this Tf=Tf¯T_{f}^{*}=T_{\bar{f}}. Thus the correspondence fTff\to T_{f} is indeed involutive, as claimed. ∎

In what regards now the basic classes of operators, the above construction provides us with many new examples, which are very explicit, and complementary to the finite dimensional examples that we usually have in mind, as follows:

Theorem 2.32.

The multiplication operators Tf(g)=fgT_{f}(g)=fg on the Hilbert space H=L2(X)H=L^{2}(X) associated to the functions fL(X)f\in L^{\infty}(X) are as follows:

  1. (1)

    TfT_{f} is unitary when f:X𝕋f:X\to\mathbb{T}.

  2. (2)

    TfT_{f} is a symmetry when f:X{1,1}f:X\to\{-1,1\}.

  3. (3)

    TfT_{f} is a projection when f=χYf=\chi_{Y} with YXY\subset X.

  4. (4)

    There are no non-unitary isometries.

  5. (5)

    There are no non-unitary symmetries.

  6. (6)

    TfT_{f} is positive when f:X+f:X\to\mathbb{R}_{+}.

  7. (7)

    TfT_{f} is self-adjoint when f:Xf:X\to\mathbb{R}.

  8. (8)

    TfT_{f} is always normal, for any f:Xf:X\to\mathbb{C}.

Proof.

All these assertions are clear from definitions, and from the various properties of the correspondence fTff\to T_{f}, established above, as follows:

(1) The unitarity condition U=U1U^{*}=U^{-1} for the operator TfT_{f} reads f¯=f1\bar{f}=f^{-1}, which means that we must have f:X𝕋f:X\to\mathbb{T}, as claimed.

(2) The symmetry condition S2=1S^{2}=1 for the operator TfT_{f} reads f2=1f^{2}=1, which means that we must have f:X{1,1}f:X\to\{-1,1\}, as claimed.

(3) The projection condition P2=P=PP^{2}=P^{*}=P for the operator TfT_{f} reads f2=f=f¯f^{2}=f=\bar{f}, which means that we must have f:X{0,1}f:X\to\{0,1\}, or equivalently, f=χYf=\chi_{Y} with YXY\subset X.

(4) A non-unitary isometry must satisfy by definition UU=1,UU1U^{*}U=1,UU^{*}\neq 1, and for the operator TfT_{f} this means that we must have |f|2=1,|f|21|f|^{2}=1,|f|^{2}\neq 1, which is impossible.

(5) This follows from (1) and (2), because the solutions found in (2) for the symmetry problem are included in the solutions found in (1) for the unitarity problem.

(6) The fact that TfT_{f} is positive amounts in saying that we must have <fg,g>0<fg,g>\geq 0 for any gL2(X)g\in L^{2}(X), and this is equivalent to the fact that we must have f0f\geq 0, as desired.

(7) The self-adjointness condition T=TT=T^{*} for the operator TfT_{f} reads f=f¯f=\bar{f}, which means that we must have f:Xf:X\to\mathbb{R}, as claimed.

(8) The normality condition TT=TTTT^{*}=T^{*}T for the operator TfT_{f} reads ff¯=f¯ff\bar{f}=\bar{f}f, which is automatic for any function f:Xf:X\to\mathbb{C}, as claimed. ∎

The above result might look quite puzzling, at a first glance, messing up our intuition with various classes of operators, coming from usual linear algebra. However, a bit of further thinking tells us that there is no contradiction, and that Theorem 2.32 in fact is very similar to what we know about the diagonal matrices. To be more precise, the diagonal matrices are unitaries precisely when their entries are in 𝕋\mathbb{T}, there are no non-unitary isometries, all such matrices are normal, and so on. In order to understand all this, let us work out what happens with the correspondence fTff\to T_{f}, in finite dimensions. The situation here is in fact extremely simple, and illuminating, as follows:

Theorem 2.33.

Assuming X={1,,N}X=\{1,\ldots,N\} with the counting measure, the embedding

L(X)B(L2(X))L^{\infty}(X)\subset B(L^{2}(X))

constructed via multiplication operators, Tf(g)=fgT_{f}(g)=fg, corresponds to the embedding

NMN()\mathbb{C}^{N}\subset M_{N}(\mathbb{C})

given by the diagonal matrices, constructed as follows:

fdiag(f1,,fN)f\to diag(f_{1},\ldots,f_{N})

Thus, Theorem 2.32 generalizes what we know about the diagonal matrices.

Proof.

The idea is that all this is trivial, with not a single new computation needed, modulo some algebraic thinking, of quite soft type. Let us go back indeed to Theorem 2.31 above and its proof, with the abstract measured space XX appearing there being now the following finite space, with its counting measure:

X={1,,N}X=\{1,\ldots,N\}

Regarding the functions fL(X)f\in L^{\infty}(X), these are now functions as follows:

f:{1,,N}f:\{1,\ldots,N\}\to\mathbb{C}

We can identify such a function with the corresponding vector (f(i))iN(f(i))_{i}\in\mathbb{C}^{N}, and so we conclude that our input algebra L(X)L^{\infty}(X) is the algebra N\mathbb{C}^{N}:

L(X)=NL^{\infty}(X)=\mathbb{C}^{N}

Regarding now the Hilbert space H=L2(X)H=L^{2}(X), this is equal as well to N\mathbb{C}^{N}, and for the same reasons, namely that gL2(X)g\in L^{2}(X) can be identified with the vector (g(i))iN(g(i))_{i}\in\mathbb{C}^{N}:

L2(X)=NL^{2}(X)=\mathbb{C}^{N}

Observe that, due to our assumption that XX comes with its counting measure, the scalar product that we obtain on N\mathbb{C}^{N} is the usual one, without weights. Now, let us identify the operators on L2(X)=NL^{2}(X)=\mathbb{C}^{N} with the square matrices, in the usual way:

B(L2(X))=MN()B(L^{2}(X))=M_{N}(\mathbb{C})

This was our final identification, in order to get started. Now by getting back to Theorem 2.31, the embedding L(X)B(L2(X))L^{\infty}(X)\subset B(L^{2}(X)) constructed there reads:

NMN()\mathbb{C}^{N}\subset M_{N}(\mathbb{C})

But this can only be the embedding given by the diagonal matrices, so we are basically done. In order to finish, however, let us understand what the operator associated to an arbitrary vector fNf\in\mathbb{C}^{N} is. We can regard this vector as a function, f(i)=fif(i)=f_{i}, and so the action Tf(g)=fgT_{f}(g)=fg on the vectors of L2(X)=NL^{2}(X)=\mathbb{C}^{N} is by componentwise multiplication by the numbers f1,,fNf_{1},\ldots,f_{N}. But this is exactly the action of the diagonal matrix diag(f1,,fN)diag(f_{1},\ldots,f_{N}), and so we are led to the conclusion in the statement. ∎
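
Here is finally a numerical sketch of the above identification, in Python with numpy, with the sample function ff being our own assumption, checking both Tf(g)=fgT_{f}(g)=fg and the norm formula ||Tf||=||f||||T_{f}||=||f||_{\infty}:

import numpy as np

f = np.array([2.0, -1.0, 0.5, 3.0])   # a function f on X = {1,2,3,4}
g = np.array([1.0, 1.0, 2.0, 0.0])    # a vector g in L^2(X) = C^4
Tf = np.diag(f)                       # the multiplication operator T_f

print(np.allclose(Tf @ g, f * g))     # T_f(g) = fg, componentwise
print(np.isclose(np.linalg.norm(Tf, 2), np.max(np.abs(f))))   # ||T_f|| = ||f||_inf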

There are other things that can be said about the embedding L(X)B(L2(X))L^{\infty}(X)\subset B(L^{2}(X)), a key observation here, which is elementary to prove, being the fact that the image of L(X)L^{\infty}(X) is closed with respect to the weak topology, the one where TnTT_{n}\to T when TnxTxT_{n}x\to Tx for any xHx\in H. And with this meaning that L(X)L^{\infty}(X) is a so-called von Neumann algebra on L2(X)L^{2}(X). We will be back to this, on numerous occasions, in what follows.

2e. Exercises

As before with linear algebra, operator theory is a wide area of mathematics, and there are many interesting operators, and exercises about them. We first have:

Exercise 2.34.

Find an explicit orthonormal basis for the Hilbert space

H=L2[0,1]H=L^{2}[0,1]

by starting with the algebraic basis fn=xnf_{n}=x^{n} with nn\in\mathbb{N}, and applying Gram-Schmidt.

This is actually quite non-trivial, and in case you’re stuck with complicated computations, better look it up, preferably in the physics literature, physicists being well-known to adore such things, and then write a brief account of what you found.

Exercise 2.35.

Find all the 2×22\times 2 complex matrices

S=(abcd)S=\begin{pmatrix}a&b\\ c&d\end{pmatrix}

which are symmetries, S2=1S^{2}=1, and interpret them geometrically.

Here you can of course start with the real case first, SM2()S\in M_{2}(\mathbb{R}). Also, you can have a look at 3 dimensions too, real or complex, and beware of the computations here.

Exercise 2.36.

Prove that any positive operator T0T\geq 0 appears as

T=S2T=S^{2}

with SS self-adjoint, first in finite dimensions, then in general.

Here the discussion in finite dimensions involves positive eigenvalues and their square roots, which is something quite standard. In infinite dimensions things are a bit more complicated, because we do not yet have such eigenvalue technology, which will come in the next chapter; but you can of course try some other tricks.

Chapter 3 Spectral theorems

3a. Basic theory

We discuss in this chapter the diagonalization problem for the operators TB(H)T\in B(H), in analogy with the diagonalization problem for the usual matrices AMN()A\in M_{N}(\mathbb{C}). As a first observation, we can talk about eigenvalues and eigenvectors, as follows:

Definition 3.1.

Given an operator TB(H)T\in B(H), assuming that we have

Tx=λxTx=\lambda x

we say that xHx\in H is an eigenvector of TT, with eigenvalue λ\lambda\in\mathbb{C}.

We know many things about eigenvalues and eigenvectors, in the finite dimensional case. However, most of these will not extend to the infinite dimensional case, or at least not extend in a straightforward way, due to a number of reasons:

  1. (1)

Much of basic linear algebra is based on the fact that Tx=λxTx=\lambda x is equivalent to (Tλ)x=0(T-\lambda)x=0, so that λ\lambda is an eigenvalue when TλT-\lambda is not invertible. In the infinite dimensional setting TλT-\lambda might be injective and not surjective, or vice versa, or injective with dense image but with an unbounded inverse on that image, and so on.

  2. (2)

    Also, in linear algebra TλT-\lambda is not invertible when det(Tλ)=0\det(T-\lambda)=0, and with this leading to most of the advanced results about eigenvalues and eigenvectors. In infinite dimensions, however, it is impossible to construct a determinant function det:B(H)\det:B(H)\to\mathbb{C}, and this even for the diagonal operators on l2()l^{2}(\mathbb{N}).

Summarizing, we are in trouble with our extension program, and this right from the beginning. In order to have some theory started, however, let us forget about (2), which obviously leads nowhere, and focus on the difficulties in (1).


In order to cut short the discussion there, regarding the various properties of TλT-\lambda, we can just say that TλT-\lambda is either invertible with bounded inverse, the “good case”, or not. We are led in this way to the following definition:

Definition 3.2.

The spectrum of an operator TB(H)T\in B(H) is the set

σ(T)={λ|TλB(H)1}\sigma(T)=\left\{\lambda\in\mathbb{C}\Big{|}T-\lambda\not\in B(H)^{-1}\right\}

where B(H)1B(H)B(H)^{-1}\subset B(H) is the set of invertible operators.

As a basic example, in the finite dimensional case, H=NH=\mathbb{C}^{N}, the spectrum of a usual matrix AMN()A\in M_{N}(\mathbb{C}) is the collection of its eigenvalues, taken without multiplicities. We will see many other examples. In general, the spectrum has the following properties:

Proposition 3.3.

The spectrum of TB(H)T\in B(H) contains the eigenvalue set

ε(T)={λ|ker(Tλ){0}}\varepsilon(T)=\left\{\lambda\in\mathbb{C}\Big{|}\ker(T-\lambda)\neq\{0\}\right\}

and ε(T)σ(T)\varepsilon(T)\subset\sigma(T) is an equality in finite dimensions, but not in infinite dimensions.

Proof.

We have several assertions here, the idea being as follows:

(1) First of all, the eigenvalue set is indeed the one in the statement, because Tx=λxTx=\lambda x tells us precisely that TλT-\lambda must not be injective. The fact that we have ε(T)σ(T)\varepsilon(T)\subset\sigma(T) is clear as well, because if TλT-\lambda is not injective, it is not bijective.

(2) In finite dimensions we have ε(T)=σ(T)\varepsilon(T)=\sigma(T), because TλT-\lambda is injective if and only if it is bijective, with the boundedness of the inverse being automatic.

(3) In infinite dimensions we can assume H=l2()H=l^{2}(\mathbb{N}), and the shift operator S(ei)=ei+1S(e_{i})=e_{i+1} is injective but not surjective. Thus 0σ(S)ε(S)0\in\sigma(S)-\varepsilon(S). ∎

We will see more examples and counterexamples, and some general theory, in a moment. Philosophically speaking, the best way of thinking about all this is as follows:

– The numbers λσ(T)\lambda\notin\sigma(T) are good, because we can invert TλT-\lambda.

– The numbers λσ(T)ε(T)\lambda\in\sigma(T)-\varepsilon(T) are bad.

– The eigenvalues λε(T)\lambda\in\varepsilon(T) are evil.

Note that this is somewhat contrary to what happens in linear algebra, where the eigenvalues are highly valued, and cherished, and regarded as being the source of all good things on Earth. Welcome to operator theory, where some things are upside down.


Let us develop now some general theory for the spectrum, or perhaps for its complement, with the promise to come back to eigenvalues later. As a first result, we would like to prove that the spectra are non-empty. This is something tricky, and we will need:

Proposition 3.4.

The following happen:

  1. (1)

    ||T||<1(1T)1=1+T+T2+||T||<1\implies(1-T)^{-1}=1+T+T^{2}+\ldots

  2. (2)

    The set B(H)1B(H)^{-1} is open.

  3. (3)

    The map TT1T\to T^{-1} is differentiable.

Proof.

All these assertions are elementary, as follows:

(1) This follows as in the scalar case, the computation being as follows, provided that everything converges under the norm, which amounts to saying that ||T||<1||T||<1:

(1T)(1+T+T2+)\displaystyle(1-T)(1+T+T^{2}+\ldots) =\displaystyle= 1T+TT2+T2T3+\displaystyle 1-T+T-T^{2}+T^{2}-T^{3}+\ldots
=\displaystyle= 1\displaystyle 1

(2) Assuming TB(H)1T\in B(H)^{-1}, let us pick SB(H)S\in B(H) such that:

||TS||<1||T1||||T-S||<\frac{1}{||T^{-1}||}

We have then the following estimate:

||1T1S||\displaystyle||1-T^{-1}S|| =\displaystyle= ||T1(TS)||\displaystyle||T^{-1}(T-S)||
\displaystyle\leq ||T1||||TS||\displaystyle||T^{-1}||\cdot||T-S||
<\displaystyle< 1\displaystyle 1

Thus we have T1SB(H)1T^{-1}S\in B(H)^{-1}, and so SB(H)1S\in B(H)^{-1}, as desired.

(3) In the scalar case, the derivative of f(t)=t1f(t)=t^{-1} is f(t)=t2f^{\prime}(t)=-t^{-2}. In the present normed space setting the derivative is no longer a number, but rather a linear transformation, which can be found by developing f(T)=T1f(T)=T^{-1} at order 1, as follows:

(T+S)1\displaystyle(T+S)^{-1} =\displaystyle= ((1+ST1)T)1\displaystyle((1+ST^{-1})T)^{-1}
=\displaystyle= T1(1+ST1)1\displaystyle T^{-1}(1+ST^{-1})^{-1}
=\displaystyle= T1(1ST1+(ST1)2)\displaystyle T^{-1}(1-ST^{-1}+(ST^{-1})^{2}-\ldots)
\displaystyle\simeq T1(1ST1)\displaystyle T^{-1}(1-ST^{-1})
=\displaystyle= T1T1ST1\displaystyle T^{-1}-T^{-1}ST^{-1}

Thus f(T)=T1f(T)=T^{-1} is indeed differentiable, with derivative f(T)S=T1ST1f^{\prime}(T)S=-T^{-1}ST^{-1}. ∎
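As a quick numerical verification of (1) and (3), here is a numpy sketch, with the matrix rescaled so as to have ||T||<1||T||<1, and with all code conventions being ad-hoc, just for illustration:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))
T *= 0.5 / np.linalg.norm(T, 2)            # rescale, so that ||T|| = 1/2

# (1) Neumann series: (1 - T)^{-1} = 1 + T + T^2 + ...
series = sum(np.linalg.matrix_power(T, k) for k in range(60))
print(np.allclose(series, np.linalg.inv(np.eye(4) - T)))      # True

# (3) derivative of the inverse map: (A + S)^{-1} - A^{-1} ~ -A^{-1} S A^{-1}
A = np.eye(4) - T                           # some invertible operator
S = 1e-6 * rng.standard_normal((4, 4))     # a small perturbation
lhs = np.linalg.inv(A + S) - np.linalg.inv(A)
rhs = -np.linalg.inv(A) @ S @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))                # True, up to O(||S||^2)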

We can now formulate our first theorem about spectra, as follows:

Theorem 3.5.

The spectrum of a bounded operator TB(H)T\in B(H) is:

  1. (1)

    Compact.

  2. (2)

    Contained in the disc D0(||T||)D_{0}(||T||).

  3. (3)

    Non-empty.

Proof.

This can be proved by using Proposition 3.4, along with a bit of complex and functional analysis, for which we refer to Rudin [rud] and Lax [lax], as follows:

(1) In view of (2) below, it is enough to prove that σ(T)\sigma(T) is closed. But this follows from the following computation, with |ε||\varepsilon| being small:

λσ(T)\displaystyle\lambda\notin\sigma(T) \displaystyle\implies TλB(H)1\displaystyle T-\lambda\in B(H)^{-1}
\displaystyle\implies TλεB(H)1\displaystyle T-\lambda-\varepsilon\in B(H)^{-1}
\displaystyle\implies λ+εσ(T)\displaystyle\lambda+\varepsilon\notin\sigma(T)

(2) This follows from the following computation:

|λ|>||T||\displaystyle|\lambda|>||T|| \displaystyle\implies ||Tλ||<1\displaystyle\Big{|}\Big{|}\frac{T}{\lambda}\Big{|}\Big{|}<1
\displaystyle\implies 1TλB(H)1\displaystyle 1-\frac{T}{\lambda}\in B(H)^{-1}
\displaystyle\implies λTB(H)1\displaystyle\lambda-T\in B(H)^{-1}
\displaystyle\implies λσ(T)\displaystyle\lambda\notin\sigma(T)

(3) Assume by contradiction σ(T)=\sigma(T)=\emptyset. Given a linear form fB(H)f\in B(H)^{*}, consider the following map, which is well-defined, due to our assumption σ(T)=\sigma(T)=\emptyset:

φ:,λf((Tλ)1)\varphi:\mathbb{C}\to\mathbb{C}\quad,\quad\lambda\to f((T-\lambda)^{-1})

By using the fact that TT1T\to T^{-1} is differentiable, that we know from Proposition 3.4, we conclude that this map is differentiable, and so holomorphic. Also, we have:

λ\displaystyle\lambda\to\infty \displaystyle\implies Tλ\displaystyle T-\lambda\to\infty
\displaystyle\implies (Tλ)10\displaystyle(T-\lambda)^{-1}\to 0
\displaystyle\implies f((Tλ)1)0\displaystyle f((T-\lambda)^{-1})\to 0

Thus by the Liouville theorem we obtain φ=0\varphi=0. But this being valid for any linear form fB(H)f\in B(H)^{*}, by the Hahn-Banach theorem we obtain (Tλ)1=0(T-\lambda)^{-1}=0, which is a contradiction, as desired. ∎

Here is now a second basic result regarding the spectra, inspired from what happens in finite dimensions, for the usual complex matrices, and which shows that things do not necessarily extend without troubles to the infinite dimensional setting:

Theorem 3.6.

We have the following formula, valid for any operators S,TS,T:

σ(ST){0}=σ(TS){0}\sigma(ST)\cup\{0\}=\sigma(TS)\cup\{0\}

In finite dimensions we have σ(ST)=σ(TS)\sigma(ST)=\sigma(TS), but this fails in infinite dimensions.

Proof.

There are several assertions here, the idea being as follows:

(1) This is something that we know in finite dimensions, coming from the fact that the characteristic polynomials of the associated matrices A,BA,B coincide:

PAB=PBAP_{AB}=P_{BA}

Thus we obtain σ(ST)=σ(TS)\sigma(ST)=\sigma(TS) in this case, as claimed. Observe that this improves the general formula in the statement in two ways, first because we have no issues at 0, and second because what we obtain is actually an equality of sets with multiplicities.

(2) In general now, let us first prove the main assertion, stating that σ(ST),σ(TS)\sigma(ST),\sigma(TS) coincide outside 0. We first prove that we have the following implication:

1σ(ST)1σ(TS)1\notin\sigma(ST)\implies 1\notin\sigma(TS)

Assume indeed that 1ST1-ST is invertible, with inverse denoted RR:

R=(1ST)1R=(1-ST)^{-1}

We have then the following formulae, relating our variables R,S,TR,S,T:

RST=STR=R1RST=STR=R-1

By using RST=R1RST=R-1, we have the following computation:

(1+TRS)(1TS)\displaystyle(1+TRS)(1-TS) =\displaystyle= 1+TRSTSTRSTS\displaystyle 1+TRS-TS-TRSTS
=\displaystyle= 1+TRSTSTRS+TS\displaystyle 1+TRS-TS-TRS+TS
=\displaystyle= 1\displaystyle 1

A similar computation, using STR=R1STR=R-1, shows that we have:

(1TS)(1+TRS)=1(1-TS)(1+TRS)=1

Thus 1TS1-TS is invertible, with inverse 1+TRS1+TRS, which proves our claim. Now by multiplying by scalars, we deduce from this that for any λ{0}\lambda\in\mathbb{C}-\{0\} we have:

λσ(ST)λσ(TS)\lambda\notin\sigma(ST)\implies\lambda\notin\sigma(TS)

But this leads to the conclusion in the statement.

(3) Regarding now the counterexample to the formula σ(ST)=σ(TS)\sigma(ST)=\sigma(TS), in general, let us take SS to be the shift on H=l2()H=l^{2}(\mathbb{N}), given by the following formula:

S(ei)=ei+1S(e_{i})=e_{i+1}

As for TT, we can take it to be the adjoint of SS, which is the following operator:

S(ei)={ei1ifi>00ifi=0S^{*}(e_{i})=\begin{cases}e_{i-1}&{\rm if}\ i>0\\ 0&{\rm if}\ i=0\end{cases}

Let us compose now these two operators. In one sense, we have:

SS=10σ(SS)S^{*}S=1\implies 0\notin\sigma(S^{*}S)

In the other sense, however, the situation is different, as follows:

SS=Proj(e0)0σ(SS)SS^{*}=Proj(e_{0}^{\perp})\implies 0\in\sigma(SS^{*})

Thus, the spectra do not match on 0, and we have our counterexample, as desired. ∎
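As an illustration of the main formula above, here is a numpy verification of it, using rectangular matrices S,TS,T, whose products ST,TSST,TS have different sizes, and so can only have matching spectra outside 0; the code conventions here are ad-hoc, for illustration only:

import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 3))
T = rng.standard_normal((3, 5))

# ST is 5x5 with rank <= 3, TS is 3x3: same nonzero eigenvalues
print(np.sort_complex(np.linalg.eigvals(S @ T)))   # the 3 values from TS, plus zeros
print(np.sort_complex(np.linalg.eigvals(T @ S)))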

3b. Spectral radius

Let us develop now some systematic theory for the computation of the spectra, based on what we know about the eigenvalues of the usual complex matrices. As a first result, which is well-known for the usual matrices, and extends well, we have:

Theorem 3.7.

We have the “polynomial functional calculus” formula

σ(P(T))=P(σ(T))\sigma(P(T))=P(\sigma(T))

valid for any polynomial P[X]P\in\mathbb{C}[X], and any operator TB(H)T\in B(H).

Proof.

We pick a scalar λ\lambda\in\mathbb{C}, and we decompose the polynomial PλP-\lambda:

P(X)λ=c(Xr1)(Xrn)P(X)-\lambda=c(X-r_{1})\ldots(X-r_{n})

We have then the following equivalences:

λσ(P(T))\displaystyle\lambda\notin\sigma(P(T)) \displaystyle\iff P(T)λB(H)1\displaystyle P(T)-\lambda\in B(H)^{-1}
\displaystyle\iff c(Tr1)(Trn)B(H)1\displaystyle c(T-r_{1})\ldots(T-r_{n})\in B(H)^{-1}
\displaystyle\iff Tr1,,TrnB(H)1\displaystyle T-r_{1},\ldots,T-r_{n}\in B(H)^{-1}
\displaystyle\iff r1,,rnσ(T)\displaystyle r_{1},\ldots,r_{n}\notin\sigma(T)
\displaystyle\iff λP(σ(T))\displaystyle\lambda\notin P(\sigma(T))

Thus, we are led to the formula in the statement. ∎
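Here is a quick numpy check of this polynomial functional calculus formula, for a random matrix and a sample polynomial, with the spectra sorted for comparison; as usual, this is just an illustrative sketch, with ad-hoc conventions:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
P = lambda x: 2 * x**3 - x + 4                  # P(X) = 2X^3 - X + 4

PT = 2 * np.linalg.matrix_power(T, 3) - T + 4 * np.eye(5)
lhs = np.sort_complex(np.linalg.eigvals(PT))    # sigma(P(T))
rhs = np.sort_complex(P(np.linalg.eigvals(T)))  # P(sigma(T))
print(np.allclose(lhs, rhs))                    # True, up to rounding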

The above result is something very useful, and generalizing it will be our next task. As a first ingredient here, assuming that AMN()A\in M_{N}(\mathbb{C}) is invertible, we have:

σ(A1)=σ(A)1\sigma(A^{-1})=\sigma(A)^{-1}

It is possible to extend this formula to the arbitrary operators, and we will do this in a moment. Before starting, however, we have to think in advance on how to unify this potential result, that we have in mind, with Theorem 3.7 itself.


What we have to do here is to find a class of functions generalizing both the polynomials P[X]P\in\mathbb{C}[X] and the inverse function xx1x\to x^{-1}, and the answer to this question is provided by the rational functions, which are as follows:

Definition 3.8.

A rational function f(X)f\in\mathbb{C}(X) is a quotient of polynomials:

f=PQf=\frac{P}{Q}

Assuming that P,QP,Q are prime to each other, we can regard ff as a usual function,

f:Xf:\mathbb{C}-X\to\mathbb{C}

with XX being the set of zeros of QQ, also called poles of ff.

Here the term “poles” comes from the fact that, if you want to imagine the graph of such a rational function ff, in two complex dimensions, what you get is some sort of tent, supported by poles of infinite height, situated at the zeros of QQ. For more on all this, and on complex analysis in general, we refer as usual to Rudin [rud]. Although a look at an abstract algebra book can be interesting as well.


Now that we have our class of functions, the next step consists in applying them to operators. Here we cannot expect f(T)f(T) to make sense for any ff and any TT, for instance because T1T^{-1} is defined only when TT is invertible. We are led in this way to:

Definition 3.9.

Given an operator TB(H)T\in B(H), and a rational function f=P/Qf=P/Q having poles outside σ(T)\sigma(T), we can construct the following operator,

f(T)=P(T)Q(T)1f(T)=P(T)Q(T)^{-1}

that we can denote as a usual fraction, as follows,

f(T)=P(T)Q(T)f(T)=\frac{P(T)}{Q(T)}

due to the fact that P(T),Q(T)P(T),Q(T) commute, so that the order is irrelevant.

To be more precise, f(T)f(T) is indeed well-defined, and the fraction notation is justified too. In more formal terms, we can say that we have a morphism of complex algebras as follows, with (X)T\mathbb{C}(X)^{T} standing for the rational functions having poles outside σ(T)\sigma(T):

(X)TB(H),ff(T)\mathbb{C}(X)^{T}\to B(H)\quad,\quad f\to f(T)

Summarizing, we have now a good class of functions, generalizing both the polynomials and the inverse map xx1x\to x^{-1}. We can now extend Theorem 3.7, as follows:

Theorem 3.10.

We have the “rational functional calculus” formula

σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T))

valid for any rational function f(X)f\in\mathbb{C}(X) having poles outside σ(T)\sigma(T).

Proof.

We pick a scalar λ\lambda\in\mathbb{C}, we write f=P/Qf=P/Q, and we set:

F=PλQF=P-\lambda Q

By using now Theorem 3.7, for this polynomial, we obtain:

λσ(f(T))\displaystyle\lambda\in\sigma(f(T)) \displaystyle\iff F(T)B(H)1\displaystyle F(T)\notin B(H)^{-1}
\displaystyle\iff 0σ(F(T))\displaystyle 0\in\sigma(F(T))
\displaystyle\iff 0F(σ(T))\displaystyle 0\in F(\sigma(T))
\displaystyle\iff μσ(T),F(μ)=0\displaystyle\exists\mu\in\sigma(T),F(\mu)=0
\displaystyle\iff λf(σ(T))\displaystyle\lambda\in f(\sigma(T))

Thus, we are led to the formula in the statement. ∎

As an application of the above methods, we can investigate certain special classes of operators, such as the self-adjoint ones, and the unitary ones. Let us start with:

Proposition 3.11.

The following happen:

  1. (1)

    We have σ(T)=σ(T)¯\sigma(T^{*})=\overline{\sigma(T)}, for any TB(H)T\in B(H).

  2. (2)

    If T=TT=T^{*} then X=σ(T)X=\sigma(T) satisfies X=X¯X=\overline{X}.

  3. (3)

    If U=U1U^{*}=U^{-1} then X=σ(U)X=\sigma(U) satisfies X1=X¯X^{-1}=\overline{X}.

Proof.

We have several assertions here, the idea being as follows:

(1) The spectrum of the adjoint operator TT^{*} can be computed as follows:

σ(T)\displaystyle\sigma(T^{*}) =\displaystyle= {λ|TλB(H)1}\displaystyle\left\{\lambda\in\mathbb{C}\Big{|}T^{*}-\lambda\notin B(H)^{-1}\right\}
=\displaystyle= {λ|Tλ¯B(H)1}\displaystyle\left\{\lambda\in\mathbb{C}\Big{|}T-\bar{\lambda}\notin B(H)^{-1}\right\}
=\displaystyle= σ(T)¯\displaystyle\overline{\sigma(T)}

(2) This is clear indeed from (1).

(3) For a unitary operator, U=U1U^{*}=U^{-1}, Theorem 3.10 and (1) give:

σ(U)1=σ(U1)=σ(U)=σ(U)¯\sigma(U)^{-1}=\sigma(U^{-1})=\sigma(U^{*})=\overline{\sigma(U)}

Thus, we are led to the conclusion in the statement. ∎

In analogy with what happens for the usual matrices, we would like to improve now (2,3) above, with results stating that the spectrum X=σ(T)X=\sigma(T) satisfies XX\subset\mathbb{R} for self-adjoints, and X𝕋X\subset\mathbb{T} for unitaries. This will be tricky. Let us start with:

Theorem 3.12.

The spectrum of a unitary operator

U=U1U^{*}=U^{-1}

is on the unit circle, σ(U)𝕋\sigma(U)\subset\mathbb{T}.

Proof.

Assuming U=U1U^{*}=U^{-1}, we have the following norm computation:

||U||=||UU||=1=1||U||=\sqrt{||UU^{*}||}=\sqrt{1}=1

Now if we denote by DD the unit disk, we obtain from this:

σ(U)D\sigma(U)\subset D

On the other hand, once again by using U=U1U^{*}=U^{-1}, we have as well:

||U1||=||U||=||U||=1||U^{-1}||=||U^{*}||=||U||=1

Thus, as before with DD being the unit disk in the complex plane, we have:

σ(U1)D\sigma(U^{-1})\subset D

Now by using Theorem 3.10, we obtain σ(U)DD1=𝕋\sigma(U)\subset D\cap D^{-1}=\mathbb{T}, as desired. ∎

We have as well a similar result for self-adjoints, as follows:

Theorem 3.13.

The spectrum of a self-adjoint operator

T=TT=T^{*}

consists of real numbers, σ(T)\sigma(T)\subset\mathbb{R}.

Proof.

The idea is that we can deduce the result from Theorem 3.12, by using the following remarkable rational function, depending on a parameter rr\in\mathbb{R}:

f(z)=z+irzirf(z)=\frac{z+ir}{z-ir}

Indeed, for r>>0r>>0 the operator f(T)f(T) is well-defined, and we have:

(T+irTir)=TirT+ir=(T+irTir)1\left(\frac{T+ir}{T-ir}\right)^{*}=\frac{T-ir}{T+ir}=\left(\frac{T+ir}{T-ir}\right)^{-1}

Thus f(T)f(T) is unitary, and by using Theorem 3.12 we obtain:

σ(T)\displaystyle\sigma(T) \displaystyle\subset f1(f(σ(T)))\displaystyle f^{-1}(f(\sigma(T)))
=\displaystyle= f1(σ(f(T)))\displaystyle f^{-1}(\sigma(f(T)))
\displaystyle\subset f1(𝕋)\displaystyle f^{-1}(\mathbb{T})
=\displaystyle= \displaystyle\mathbb{R}

Thus, we are led to the conclusion in the statement. ∎

As a theoretical remark, it is possible to deduce as well Theorem 3.12 from Theorem 3.13, by performing the above computation in the other sense. Indeed, by assuming that Theorem 3.13 holds indeed, and starting with a unitary UB(H)U\in B(H), we obtain:

σ(U)\displaystyle\sigma(U) \displaystyle\subset f(f1(σ(U)))\displaystyle f(f^{-1}(\sigma(U)))
=\displaystyle= f(σ(f1(U)))\displaystyle f(\sigma(f^{-1}(U)))
\displaystyle\subset f()\displaystyle f(\mathbb{R})
=\displaystyle= 𝕋\displaystyle\mathbb{T}

As a conclusion now, we have so far a beginning of spectral theory, with results allowing us to investigate the unitaries and the self-adjoints, and with the remark that these two classes of operators are related by a certain wizarding rational function, namely:

f(z)=z+irzirf(z)=\frac{z+ir}{z-ir}
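As a numerical verification of all this, here is a numpy sketch, constructing a self-adjoint matrix, whose eigenvalues are then real, and applying to it the above function, with the output being indeed unitary, with spectrum on the unit circle; the code conventions are ad-hoc, for illustration only:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A + A.conj().T                                    # T = T*
print(np.allclose(np.linalg.eigvals(T).imag, 0))      # True: sigma(T) in R

r = 10.0                                               # any r >> 0 does the job
U = (T + 1j * r * np.eye(4)) @ np.linalg.inv(T - 1j * r * np.eye(4))
print(np.allclose(U @ U.conj().T, np.eye(4)))         # True: f(T) is unitary
print(np.allclose(abs(np.linalg.eigvals(U)), 1))      # True: sigma(f(T)) in T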

Let us keep building on this, with more complex analysis involved. One key thing that we know about matrices, and which follows for instance by using the fact that the diagonalizable matrices are dense, is the following formula:

σ(eA)=eσ(A)\sigma(e^{A})=e^{\sigma(A)}

We would like to have such formulae for the general operators TB(H)T\in B(H), but this is something quite technical. Consider the rational calculus morphism from Definition 3.9, which is as follows, with the exponent standing for “having poles outside σ(T)\sigma(T)”:

(X)TB(H),ff(T)\mathbb{C}(X)^{T}\to B(H)\quad,\quad f\to f(T)

As mentioned before, the rational functions are holomorphic outside their poles, and this raises the question of extending this morphism, as follows:

Hol(σ(T))B(H),ff(T)Hol(\sigma(T))\to B(H)\quad,\quad f\to f(T)

Normally this can be done in several steps. Let us start with:

Proposition 3.14.

We can exponentiate any operator TB(H)T\in B(H), by setting:

eT=k=0Tkk!e^{T}=\sum_{k=0}^{\infty}\frac{T^{k}}{k!}

Similarly, we can define f(T)f(T), for any holomorphic function f:f:\mathbb{C}\to\mathbb{C}.

Proof.

We must prove that the series defining eTe^{T} converges, and this follows from:

||eT||k=0||T||kk!=e||T||||e^{T}||\leq\sum_{k=0}^{\infty}\frac{||T||^{k}}{k!}=e^{||T||}

The case of the arbitrary holomorphic functions f:f:\mathbb{C}\to\mathbb{C} is similar. ∎
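Here is a quick numerical check of the above series formula, in numpy, against the matrix exponential implemented in scipy; the truncation at 40 terms is of course an ad-hoc choice, good enough for comparison purposes:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))

E = np.zeros((4, 4))
term = np.eye(4)                     # current term T^k / k!
for k in range(1, 40):
    E = E + term
    term = term @ T / k
print(np.allclose(E, expm(T)))       # True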

In general, the holomorphic functions are not entire, and the above method won’t cover the rational functions f(X)Tf\in\mathbb{C}(X)^{T} that we want to generalize. Thus, we must use something else. And the answer here comes from the Cauchy formula:

f(t)=12πiγf(z)ztdzf(t)=\frac{1}{2\pi i}\int_{\gamma}\frac{f(z)}{z-t}\,dz

Indeed, given a rational function f(X)Tf\in\mathbb{C}(X)^{T}, the operator f(T)B(H)f(T)\in B(H), constructed in Definition 3.9, can be recaptured in an analytic way, as follows:

f(T)=12πiγf(z)zTdzf(T)=\frac{1}{2\pi i}\int_{\gamma}\frac{f(z)}{z-T}\,dz

Now given an arbitrary function fHol(σ(T))f\in Hol(\sigma(T)), we can define f(T)B(H)f(T)\in B(H) by exactly the same formula, and we obtain in this way the desired correspondence:

Hol(σ(T))B(H),ff(T)Hol(\sigma(T))\to B(H)\quad,\quad f\to f(T)

This was for the plan. In practice now, all this needs a bit of care, with many verifications needed, and with the technical remark that a winding number must be added to the above Cauchy formulae, for things to be correct. The result is as follows:

Theorem 3.15.

We have the “holomorphic functional calculus” formula

σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T))

valid for any holomorphic function fHol(σ(T))f\in Hol(\sigma(T)).

Proof.

This is something that we will not really need, for the purposes of the present book, which is more algebraic than analytic, but here is the general idea:

(1) As explained above, given a rational function f(X)Tf\in\mathbb{C}(X)^{T}, the corresponding operator f(T)B(H)f(T)\in B(H) can be recaptured in an analytic way, as follows:

f(T)=12πiγf(z)zTdzf(T)=\frac{1}{2\pi i}\int_{\gamma}\frac{f(z)}{z-T}\,dz

(2) Now given an arbitrary function fHol(σ(T))f\in Hol(\sigma(T)), we can define f(T)B(H)f(T)\in B(H) by exactly the same formula, and we obtain in this way the desired correspondence:

Hol(σ(T))B(H),ff(T)Hol(\sigma(T))\to B(H)\quad,\quad f\to f(T)

(3) In practice now, all this needs a bit of care, notably with the verification of the fact that the operator f(T)B(H)f(T)\in B(H) does not depend on γ\gamma, and with the technical remark that a winding number must be added to the above Cauchy formulae, for things to be correct. But this can be done via a standard study, keeping in mind the fact that in the case H=H=\mathbb{C}, where our operators are usual numbers, B(H)=B(H)=\mathbb{C}, what we want to prove is simply that the usual Cauchy formula holds.

(4) Now with this correspondence ff(T)f\to f(T) constructed, and so with the formula in the statement, namely σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)), making now sense, it remains to prove that this formula holds indeed. But this follows as well via a careful use of the Cauchy formula, or by using approximation by polynomials, or rational functions. ∎

As already said, the above result is important for advanced operator theory and applications, and we will not get further into this subject. We will be back, however, to all this in the special case of the normal operators, which is of particular interest for us.
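Although we will not need this in what follows, here is a numerical illustration of the above Cauchy formula for f(T)f(T), with f=expf=\exp, the contour integral being discretized in the obvious way; everything here, radius, number of quadrature points and so on, is an ad-hoc sketch:

import numpy as np
from scipy.linalg import expm, solve

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))
rho = 1.5 * max(abs(np.linalg.eigvals(T)))      # contour radius, enclosing sigma(T)

M = 500                                          # quadrature points on |z| = rho
z = rho * np.exp(2j * np.pi * np.arange(M) / M)
dz = 2j * np.pi * z / M                          # dz = iz dtheta
F = sum(np.exp(zk) * solve(zk * np.eye(4) - T, np.eye(4)) * dzk
        for zk, dzk in zip(z, dz)) / (2j * np.pi)

print(np.allclose(F, expm(T)))                   # True: f(T), with f = exp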


In order to formulate now our next result, we will need the following notion:

Definition 3.16.

Given an operator TB(H)T\in B(H), its spectral radius

ρ(T)[0,||T||]\rho(T)\in\big{[}0,||T||\big{]}

is the radius of the smallest disk centered at 0 containing σ(T)\sigma(T).

Here we have implicitly used two basic results from Theorem 3.5, namely the fact that the spectrum is non-empty, which gives ρ(T)0\rho(T)\geq 0, with the usual convention sup=\sup\emptyset=-\infty, and the fact that the spectrum is contained in the disk D0(||T||)D_{0}(||T||), which gives ρ(T)||T||\rho(T)\leq||T||. Now with this notion in hand, we have the following key result, improving our main result so far, namely σ(T)\sigma(T)\neq\emptyset, from Theorem 3.5:

Theorem 3.17.

The spectral radius of an operator TB(H)T\in B(H) is given by

ρ(T)=limn||Tn||1/n\rho(T)=\lim_{n\to\infty}||T^{n}||^{1/n}

and in this formula, we can replace the limit by an inf.

Proof.

We have several things to be proved, the idea being as follows:

(1) Our first claim is that the numbers un=||Tn||1/nu_{n}=||T^{n}||^{1/n} satisfy:

(n+m)un+mnun+mum(n+m)u_{n+m}\leq nu_{n}+mu_{m}

Indeed, we have the following estimate, using the Young inequality abap/p+bq/qab\leq a^{p}/p+b^{q}/q, with exponents p=(n+m)/np=(n+m)/n and q=(n+m)/mq=(n+m)/m:

un+m\displaystyle u_{n+m} =\displaystyle= ||Tn+m||1/(n+m)\displaystyle||T^{n+m}||^{1/(n+m)}
\displaystyle\leq ||Tn||1/(n+m)||Tm||1/(n+m)\displaystyle||T^{n}||^{1/(n+m)}||T^{m}||^{1/(n+m)}
\displaystyle\leq ||Tn||1/nnn+m+||Tm||1/mmn+m\displaystyle||T^{n}||^{1/n}\cdot\frac{n}{n+m}+||T^{m}||^{1/m}\cdot\frac{m}{n+m}
=\displaystyle= nun+mumn+m\displaystyle\frac{nu_{n}+mu_{m}}{n+m}

(2) Our second claim is that the second assertion holds, namely:

limn||Tn||1/n=infn||Tn||1/n\lim_{n\to\infty}||T^{n}||^{1/n}=\inf_{n}||T^{n}||^{1/n}

For this purpose, we just need the inequality found in (1). Indeed, fix m1m\geq 1, let n1n\geq 1, and write n=lm+rn=lm+r with 0rm10\leq r\leq m-1. By using twice uabubu_{ab}\leq u_{b}, we get:

un\displaystyle u_{n} \displaystyle\leq 1n(lmulm+rur)\displaystyle\frac{1}{n}(lmu_{lm}+ru_{r})
\displaystyle\leq 1n(lmum+ru1)\displaystyle\frac{1}{n}(lmu_{m}+ru_{1})
\displaystyle\leq um+rnu1\displaystyle u_{m}+\frac{r}{n}\,u_{1}

It follows that we have limsupnunum\lim\sup_{n}u_{n}\leq u_{m}, which proves our claim.

(3) Summarizing, we are left with proving the main formula, which is as follows, and with the remark that we already know that the sequence on the right converges:

ρ(T)=limn||Tn||1/n\rho(T)=\lim_{n\to\infty}||T^{n}||^{1/n}

In one sense, we can use the polynomial calculus formula σ(Tn)=σ(T)n\sigma(T^{n})=\sigma(T)^{n}. Indeed, this gives the following estimate, valid for any nn, as desired:

ρ(T)\displaystyle\rho(T) =\displaystyle= supλσ(T)|λ|\displaystyle\sup_{\lambda\in\sigma(T)}|\lambda|
=\displaystyle= supρσ(T)n|ρ|1/n\displaystyle\sup_{\rho\in\sigma(T)^{n}}|\rho|^{1/n}
=\displaystyle= supρσ(Tn)|ρ|1/n\displaystyle\sup_{\rho\in\sigma(T^{n})}|\rho|^{1/n}
=\displaystyle= ρ(Tn)1/n\displaystyle\rho(T^{n})^{1/n}
\displaystyle\leq ||Tn||1/n\displaystyle||T^{n}||^{1/n}

(4) For the reverse inequality, we fix a number ρ>ρ(T)\rho>\rho(T), and we want to prove that we have ρlimn||Tn||1/n\rho\geq\lim_{n\to\infty}||T^{n}||^{1/n}. By using the Cauchy formula, we have:

12πi|z|=ρznzTdz\displaystyle\frac{1}{2\pi i}\int_{|z|=\rho}\frac{z^{n}}{z-T}\,dz =\displaystyle= 12πi|z|=ρk=0znk1Tkdz\displaystyle\frac{1}{2\pi i}\int_{|z|=\rho}\sum_{k=0}^{\infty}z^{n-k-1}T^{k}\,dz
=\displaystyle= k=012πi(|z|=ρznk1dz)Tk\displaystyle\sum_{k=0}^{\infty}\frac{1}{2\pi i}\left(\int_{|z|=\rho}z^{n-k-1}dz\right)T^{k}
=\displaystyle= k=0δk,nTk\displaystyle\sum_{k=0}^{\infty}\delta_{k,n}T^{k}
=\displaystyle= Tn\displaystyle T^{n}

By applying the norm we obtain from this formula:

||Tn||\displaystyle||T^{n}|| \displaystyle\leq 12π|z|=ρ||znzT||dz\displaystyle\frac{1}{2\pi}\int_{|z|=\rho}\left|\left|\frac{z^{n}}{z-T}\right|\right|\,dz
\displaystyle\leq ρn+1sup|z|=ρ||1zT||\displaystyle\rho^{n+1}\cdot\sup_{|z|=\rho}\left|\left|\frac{1}{z-T}\right|\right|

Since the sup does not depend on nn, by taking nn-th roots, we obtain in the limit:

ρlimn||Tn||1/n\rho\geq\lim_{n\to\infty}||T^{n}||^{1/n}

Now recall that ρ\rho was by definition an arbitrary number satisfying ρ>ρ(T)\rho>\rho(T). Thus, we have obtained the following estimate, valid for any TB(H)T\in B(H):

ρ(T)limn||Tn||1/n\rho(T)\geq\lim_{n\to\infty}||T^{n}||^{1/n}

Thus, we are led to the conclusion in the statement. ∎
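Here is this formula at work, numerically, in numpy; we normalize the matrix so as to have ρ(T)=1\rho(T)=1, in order to avoid overflow when taking high powers, and the quantities ||Tn||1/n||T^{n}||^{1/n} are then seen to approach 1:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
T /= max(abs(np.linalg.eigvals(T)))             # normalize: rho(T) = 1

for n in [1, 10, 100, 1000]:
    un = np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1 / n)
    print(n, un)                                 # tends to rho(T) = 1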

In the case of the normal elements, we have the following finer result:

Theorem 3.18.

The spectral radius of a normal element,

TT=TTTT^{*}=T^{*}T

is equal to its norm.

Proof.

We can proceed in two steps, as follows:

Step 1. In the case T=TT=T^{*} we have ||Tn||=||T||n||T^{n}||=||T||^{n} for any exponent of the form n=2kn=2^{k}, by using the formula ||TT||=||T||2||TT^{*}||=||T||^{2}, and by taking nn-th roots we get:

ρ(T)||T||\rho(T)\geq||T||

Thus, we are done with the self-adjoint case, with the result ρ(T)=||T||\rho(T)=||T||.

Step 2. In the general normal case TT=TTTT^{*}=T^{*}T we have Tn(Tn)=(TT)nT^{n}(T^{n})^{*}=(TT^{*})^{n}, and by using this, along with the result from Step 1, applied to TTTT^{*}, we obtain:

ρ(T)\displaystyle\rho(T) =\displaystyle= limn||Tn||1/n\displaystyle\lim_{n\to\infty}||T^{n}||^{1/n}
=\displaystyle= limn||Tn(Tn)||1/n\displaystyle\sqrt{\lim_{n\to\infty}||T^{n}(T^{n})^{*}||^{1/n}}
=\displaystyle= limn||(TT)n||1/n\displaystyle\sqrt{\lim_{n\to\infty}||(TT^{*})^{n}||^{1/n}}
=\displaystyle= ρ(TT)\displaystyle\sqrt{\rho(TT^{*})}
=\displaystyle= ||T||2\displaystyle\sqrt{||T||^{2}}
=\displaystyle= ||T||\displaystyle||T||

Thus, we are led to the conclusion in the statement. ∎

As a first comment, the spectral radius formula ρ(T)=||T||\rho(T)=||T|| does not hold in general, the simplest counterexample being the following non-normal matrix:

J=(0100)J=\begin{pmatrix}0&1\\ 0&0\end{pmatrix}
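Here is this counterexample doublechecked numerically, with the norm of JJ being its biggest singular value, namely 1, and its spectral radius being 0:

import numpy as np

J = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.linalg.norm(J, 2))                  # 1.0, the norm ||J||
print(max(abs(np.linalg.eigvals(J))))        # 0.0, the spectral radius rho(J)
print(np.allclose(J @ J.T, J.T @ J))         # False: J is not normal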

As another comment, we can combine the formula ρ(T)=||T||\rho(T)=||T|| for normal operators with the formula ||TT||=||T||2||TT^{*}||=||T||^{2}, and we are led to the following statement:

Theorem 3.19.

The norm of B(H)B(H) is given by

||T||=sup{λ|TTλB(H)1}||T||=\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}TT^{*}-\lambda\notin B(H)^{-1}\right\}}

and so is a purely algebraic quantity.

Proof.

We have the following computation, using the formula ||TT||=||T||2||TT^{*}||=||T||^{2}, then the spectral radius formula for TTTT^{*}, and finally the definition of the spectral radius:

||T||\displaystyle||T|| =\displaystyle= ||TT||\displaystyle\sqrt{||TT^{*}||}
=\displaystyle= ρ(TT)\displaystyle\sqrt{\rho(TT^{*})}
=\displaystyle= sup{λ|λσ(TT)}\displaystyle\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}\lambda\in\sigma(TT^{*})\right\}}
=\displaystyle= sup{λ|TTλB(H)1}\displaystyle\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}TT^{*}-\lambda\notin B(H)^{-1}\right\}}

Thus, we are led to the conclusion in the statement. ∎

The above result is quite interesting, philosophically speaking. We will be back to this, with further results and comments on B(H)B(H), and other algebras of the same type.

3c. Normal operators

By using Theorem 3.18 we can say a number of non-trivial things concerning the normal operators, commonly known as “spectral theorem for normal operators”. As a first result here, we can improve the polynomial functional calculus formula:

Theorem 3.20.

Given TB(H)T\in B(H) normal, we have a morphism of algebras

[X]B(H),PP(T)\mathbb{C}[X]\to B(H)\quad,\quad P\to P(T)

having the properties ||P(T)||=||P|σ(T)||||P(T)||=||P_{|\sigma(T)}||, and σ(P(T))=P(σ(T))\sigma(P(T))=P(\sigma(T)).

Proof.

This is an improvement of Theorem 3.7 in the normal case, with the extra assertion being the norm estimate. But the element P(T)P(T) being normal, we can apply to it the spectral radius formula for normal elements, and we obtain:

||P(T)||\displaystyle||P(T)|| =\displaystyle= ρ(P(T))\displaystyle\rho(P(T))
=\displaystyle= supλσ(P(T))|λ|\displaystyle\sup_{\lambda\in\sigma(P(T))}|\lambda|
=\displaystyle= supλP(σ(T))|λ|\displaystyle\sup_{\lambda\in P(\sigma(T))}|\lambda|
=\displaystyle= ||P|σ(T)||\displaystyle||P_{|\sigma(T)}||

Thus, we are led to the conclusions in the statement. ∎

We can improve as well the rational calculus formula, and the holomorphic calculus formula, in the same way. Importantly now, at a more advanced level, we have:

Theorem 3.21.

Given TB(H)T\in B(H) normal, we have a morphism of algebras

C(σ(T))B(H),ff(T)C(\sigma(T))\to B(H)\quad,\quad f\to f(T)

which is isometric, ||f(T)||=||f||||f(T)||=||f||, and has the property σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)).

Proof.

The idea here is to “complete” the morphism in Theorem 3.20, namely:

[X]B(H),PP(T)\mathbb{C}[X]\to B(H)\quad,\quad P\to P(T)

Indeed, we know from Theorem 3.20 that this morphism is continuous, and is in fact isometric, when regarding the polynomials P[X]P\in\mathbb{C}[X] as functions on σ(T)\sigma(T):

||P(T)||=||P|σ(T)||||P(T)||=||P_{|\sigma(T)}||

We conclude from this that we have a unique isometric extension, as follows:

C(σ(T))B(H),ff(T)C(\sigma(T))\to B(H)\quad,\quad f\to f(T)

It remains to prove σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)), and we can do this by double inclusion:

\subset” Given a continuous function fC(σ(T))f\in C(\sigma(T)), we must prove that we have:

λf(σ(T))λσ(f(T))\lambda\notin f(\sigma(T))\implies\lambda\notin\sigma(f(T))

For this purpose, consider the following function, which is well-defined:

1fλC(σ(T))\frac{1}{f-\lambda}\in C(\sigma(T))

We can therefore apply this function to TT, and we obtain:

(1fλ)(T)=1f(T)λ\left(\frac{1}{f-\lambda}\right)(T)=\frac{1}{f(T)-\lambda}

In particular f(T)λf(T)-\lambda is invertible, so λσ(f(T))\lambda\notin\sigma(f(T)), as desired.

\supset” Given a continuous function fC(σ(T))f\in C(\sigma(T)), we must prove that we have:

λf(σ(T))λσ(f(T))\lambda\in f(\sigma(T))\implies\lambda\in\sigma(f(T))

But this is the same as proving that we have:

μσ(T)f(μ)σ(f(T))\mu\in\sigma(T)\implies f(\mu)\in\sigma(f(T))

For this purpose, we approximate our function by polynomials, PnfP_{n}\to f, and we examine the following convergence, which follows from PnfP_{n}\to f:

Pn(T)Pn(μ)f(T)f(μ)P_{n}(T)-P_{n}(\mu)\to f(T)-f(\mu)

We know from polynomial functional calculus that we have:

Pn(μ)Pn(σ(T))=σ(Pn(T))P_{n}(\mu)\in P_{n}(\sigma(T))=\sigma(P_{n}(T))

Thus, the operators Pn(T)Pn(μ)P_{n}(T)-P_{n}(\mu) are not invertible. On the other hand, we know that the set formed by the invertible operators is open, so its complement is closed. Thus the limit f(T)f(μ)f(T)-f(\mu) is not invertible either, and so f(μ)σ(f(T))f(\mu)\in\sigma(f(T)), as desired. ∎
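In finite dimensions, and for self-adjoint matrices, where Theorem 3.21 is fully powerful, the morphism ff(T)f\to f(T) is something very concrete, given by f(T)=Uf(D)Uf(T)=Uf(D)U^{*}, with T=UDUT=UDU^{*} being the diagonalization of TT. Here is a numpy sketch of this, with f=cosf=\cos, and with the code conventions being ad-hoc:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A + A.conj().T                           # self-adjoint, sigma(T) real

d, U = np.linalg.eigh(T)                     # T = U diag(d) U*
fT = U @ np.diag(np.cos(d)) @ U.conj().T     # f(T), with f = cos

print(np.allclose(np.linalg.norm(fT, 2), max(abs(np.cos(d)))))   # ||f(T)|| = ||f||
print(np.allclose(np.sort(np.linalg.eigvalsh(fT)), np.sort(np.cos(d))))   # sigma(f(T)) = f(sigma(T))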

As an important comment, Theorem 3.21 is not exactly in final form, because it misses an important point, namely that our correspondence maps:

z¯T\bar{z}\to T^{*}

However, this is something non-trivial, and we will be back to this later. Observe however that Theorem 3.21 is fully powerful for the self-adjoint operators, T=TT=T^{*}, where the spectrum is real, and so where z=z¯z=\bar{z} on the spectrum. We will be back to this.


As a second result now, along the same lines, we can further extend Theorem 3.21 into a measurable functional calculus theorem, as follows:

Theorem 3.22.

Given TB(H)T\in B(H) normal, we have a morphism of algebras as follows, with LL^{\infty} standing for abstract measurable functions, or Borel functions,

L(σ(T))B(H),ff(T)L^{\infty}(\sigma(T))\to B(H)\quad,\quad f\to f(T)

which is isometric, ||f(T)||=||f||||f(T)||=||f||, and has the property σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)).

Proof.

As before, the idea will be that of “completing” what we have. To be more precise, we can use the Riesz theorem and a polarization trick, as follows:

(1) Given a vector xHx\in H, consider the following functional:

C(σ(T)),g<g(T)x,x>C(\sigma(T))\to\mathbb{C}\quad,\quad g\to<g(T)x,x>

By the Riesz theorem, this functional must be the integration with respect to a certain measure μ\mu on the space σ(T)\sigma(T). Thus, we have a formula as follows:

<g(T)x,x>=σ(T)g(z)dμ(z)<g(T)x,x>=\int_{\sigma(T)}g(z)d\mu(z)

Now given an arbitrary Borel function fL(σ(T))f\in L^{\infty}(\sigma(T)), as in the statement, we can define a number <f(T)x,x><f(T)x,x>\in\mathbb{C}, by using exactly the same formula, namely:

<f(T)x,x>=σ(T)f(z)dμ(z)<f(T)x,x>=\int_{\sigma(T)}f(z)d\mu(z)

Thus, we have managed to define numbers <f(T)x,x><f(T)x,x>\in\mathbb{C}, for all vectors xHx\in H, and in addition we can recover these numbers as follows, with gnC(σ(T))g_{n}\in C(\sigma(T)):

<f(T)x,x>=limgnf<gn(T)x,x><f(T)x,x>=\lim_{g_{n}\to f}<g_{n}(T)x,x>

(2) In order to define now numbers <f(T)x,y><f(T)x,y>\in\mathbb{C}, for all vectors x,yHx,y\in H, we can use a polarization trick. Indeed, for any operator SB(H)S\in B(H) we have:

<S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x><S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x>

By replacing yiyy\to iy, we have as well the following formula:

<S(x+iy),x+iy>=<Sx,x>+<Sy,y>i<Sx,y>+i<Sy,x><S(x+iy),x+iy>=<Sx,x>+<Sy,y>-i<Sx,y>+i<Sy,x>

By multiplying this latter formula by ii, we obtain the following formula:

i<S(x+iy),x+iy>=i<Sx,x>+i<Sy,y>+<Sx,y><Sy,x>i<S(x+iy),x+iy>=i<Sx,x>+i<Sy,y>+<Sx,y>-<Sy,x>

Now by summing this latter formula with the first one, we obtain:

<S(x+y),x+y>+i<S(x+iy),x+iy>\displaystyle<S(x+y),x+y>+i<S(x+iy),x+iy> =\displaystyle= (1+i)[<Sx,x>+<Sy,y>]\displaystyle(1+i)[<Sx,x>+<Sy,y>]
+\displaystyle+ 2<Sx,y>\displaystyle 2<Sx,y>

(3) But with this, we can now finish. Indeed, by combining (1,2), given a Borel function fL(σ(T))f\in L^{\infty}(\sigma(T)), we can define numbers <f(T)x,y><f(T)x,y>\in\mathbb{C} for any x,yHx,y\in H, and it is routine to check, by using approximation by continuous functions gnfg_{n}\to f as in (1), that we obtain in this way an operator f(T)B(H)f(T)\in B(H), having all the desired properties. ∎

The same comments as before apply. Theorem 3.22 is not exactly in final form, because it misses an important point, namely that our correspondence maps:

z¯T\bar{z}\to T^{*}

However, this is something non-trivial, and we will be back to this later. Observe however that Theorem 3.22 is fully powerful for the self-adjoint operators, T=TT=T^{*}, where the spectrum is real, and so where z=z¯z=\bar{z} on the spectrum. We will be back to this.


As another comment, the above result and its proof provide us with more than a Borel functional calculus, because what we got is a certain measure on the spectrum σ(T)\sigma(T), along with a functional calculus for the LL^{\infty} functions with respect to this measure. We will be back to this later, and for the moment we will only need Theorem 3.22 as formulated, with L(σ(T))L^{\infty}(\sigma(T)) standing, a bit abusively, for the Borel functions on σ(T)\sigma(T).

3d. Diagonalization

We can now diagonalize the normal operators. We will do this in 3 steps, first for the self-adjoint operators, then for the families of commuting self-adjoint operators, and finally for the general normal operators, by using a trick of the following type:

T=Re(T)+iIm(T)T=Re(T)+iIm(T)

The diagonalization in infinite dimensions is more tricky than in finite dimensions, and instead of writing a formula of type T=UDUT=UDU^{*}, with U,DB(H)U,D\in B(H) being respectively unitary and diagonal, we will express our operator as T=UMUT=U^{*}MU, with U:HKU:H\to K being a certain unitary, and with MB(K)M\in B(K) being a certain diagonal operator.


This is indeed how the spectral theorem is best formulated, in view of applications. In practice, the explicit construction of U,MU,M, which will be actually rather part of the proof, is also needed. For the self-adjoint operators, the statement and proof are as follows:

Theorem 3.23.

Any self-adjoint operator TB(H)T\in B(H) can be diagonalized,

T=UMfUT=U^{*}M_{f}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to TT, with f:Xf:X\to\mathbb{R} being a certain function, once again associated to TT, and with

Mf(g)=fgM_{f}(g)=fg

being the usual multiplication operator by ff, on the Hilbert space L2(X)L^{2}(X).

Proof.

The construction of U,fU,f can be done in several steps, as follows:

(1) We first prove the result in the special case where our operator TT has a cyclic vector xHx\in H, with this meaning that the following holds:

span(Tkx|k)¯=H\overline{span\left(T^{k}x\Big{|}k\in\mathbb{N}\right)}=H

For this purpose, let us go back to the proof of Theorem 3.22. We will use the following formula from there, with μ\mu being the measure on X=σ(T)X=\sigma(T) associated to xx:

<g(T)x,x>=σ(T)g(z)dμ(z)<g(T)x,x>=\int_{\sigma(T)}g(z)d\mu(z)

Our claim is that we can define a unitary U:HL2(X)U:H\to L^{2}(X), first on the dense part spanned by the vectors TkxT^{k}x, by the following formula, and then by continuity:

U[g(T)x]=gU[g(T)x]=g

Indeed, the following computation shows that UU is well-defined, and isometric:

||g(T)x||2\displaystyle||g(T)x||^{2} =\displaystyle= <g(T)x,g(T)x>\displaystyle<g(T)x,g(T)x>
=\displaystyle= <g(T)g(T)x,x>\displaystyle<g(T)^{*}g(T)x,x>
=\displaystyle= <|g|2(T)x,x>\displaystyle<|g|^{2}(T)x,x>
=\displaystyle= σ(T)|g(z)|2dμ(z)\displaystyle\int_{\sigma(T)}|g(z)|^{2}d\mu(z)
=\displaystyle= ||g||22\displaystyle||g||_{2}^{2}

We can then extend UU by continuity into a unitary U:HL2(X)U:H\to L^{2}(X), as claimed. Now observe that we have the following formula:

UTUg\displaystyle UTU^{*}g =\displaystyle= U[Tg(T)x]\displaystyle U[Tg(T)x]
=\displaystyle= U[(zg)(T)x]\displaystyle U[(zg)(T)x]
=\displaystyle= zg\displaystyle zg

Thus our result is proved in the present case, with UU as above, and with f(z)=zf(z)=z.

(2) We discuss now the general case. Our first claim is that HH has a decomposition as follows, with each HiH_{i} being invariant under TT, and admitting a cyclic vector xix_{i}:

H=iHiH=\bigoplus_{i}H_{i}

Indeed, this is something elementary, the construction being by recursion in finite dimensions, in the obvious way, and by using the Zorn lemma in general. Now with this decomposition in hand, we can make a direct sum of the diagonalizations obtained in (1), for each of the restrictions T|HiT_{|H_{i}}, and we obtain the formula in the statement. ∎
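In finite dimensions, Theorem 3.23 amounts to the usual diagonalization of self-adjoint matrices, with X={1,…,N}X=\{1,\ldots,N\} with its counting measure, and with ff being the eigenvalue function. Here is a numpy sketch of this dictionary, with all code conventions being ad-hoc:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A + A.conj().T                     # self-adjoint operator on H = C^4

f, V = np.linalg.eigh(T)               # eigenvalues f : X -> R, eigenvectors V
U = V.conj().T                         # the unitary U : H -> L^2(X)
Mf = np.diag(f)                        # the multiplication operator M_f

print(np.allclose(T, U.conj().T @ Mf @ U))    # True: T = U* M_f U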

We have the following technical generalization of the above result:

Theorem 3.24.

Any family of commuting self-adjoint operators TiB(H)T_{i}\in B(H) can be jointly diagonalized,

Ti=UMfiUT_{i}=U^{*}M_{f_{i}}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to {Ti}\{T_{i}\}, with fi:Xf_{i}:X\to\mathbb{R} being certain functions, once again associated to TiT_{i}, and with

Mfi(g)=figM_{f_{i}}(g)=f_{i}g

being the usual multiplication operator by fif_{i}, on the Hilbert space L2(X)L^{2}(X).

Proof.

This is similar to the proof of Theorem 3.23, by suitably modifying the measurable calculus formula, and the measure μ\mu itself, as to have this formula working for all the operators TiT_{i}. With this modification done, everything extends. ∎

In order to discuss now the case of the arbitrary normal operators, we will need:

Proposition 3.25.

Any operator TB(H)T\in B(H) can be written as

T=Re(T)+iIm(T)T=Re(T)+iIm(T)

with Re(T),Im(T)B(H)Re(T),Im(T)\in B(H) being self-adjoint, and this decomposition is unique.

Proof.

This is something elementary, the idea being as follows:

(1) As a first observation, in the case H=H=\mathbb{C} our operators are usual complex numbers, and the formula in the statement corresponds to the following basic fact:

z=Re(z)+iIm(z)z=Re(z)+iIm(z)

(2) In general now, we can use the same formulae for the real and imaginary part as in the complex number case, the decomposition formula being as follows:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

To be more precise, both the operators on the right are self-adjoint, and the summing formula holds indeed, and so we have our decomposition result, as desired.

(3) Regarding now the uniqueness, by linearity it is enough to show that R+iS=0R+iS=0 with R,SR,S both self-adjoint implies R=S=0R=S=0. But this follows by applying the adjoint to R+iS=0R+iS=0, which gives RiS=0R-iS=0, and so R=S=0R=S=0, as desired. ∎
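As usual, all this can be doublechecked numerically, in a few lines of numpy; this is just an illustrative sketch, with ad-hoc conventions:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

Re = (T + T.conj().T) / 2
Im = (T - T.conj().T) / 2j

print(np.allclose(Re, Re.conj().T))       # True: Re(T) is self-adjoint
print(np.allclose(Im, Im.conj().T))       # True: Im(T) is self-adjoint
print(np.allclose(T, Re + 1j * Im))       # True: T = Re(T) + i Im(T)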

As a comment here, the above result is just the “tip of the iceberg”, in what regards decomposition results for the operators TB(H)T\in B(H), in analogy with decomposition results for the complex numbers zz\in\mathbb{C}. As a sample result here, improving Proposition 3.25, we can write any operator TB(H)T\in B(H) as a linear combination of 4 positive operators, by writing both Re(T),Im(T)Re(T),Im(T) as differences of positive operators. More on this later.


Good news, after all these preliminaries, which I hope you enjoyed as much as I did, we can finally discuss the case of arbitrary normal operators. We have here the following result, generalizing what we know from chapter 1 about the normal matrices:

Theorem 3.26.

Any normal operator TB(H)T\in B(H) can be diagonalized,

T=UMfUT=U^{*}M_{f}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to TT, with f:Xf:X\to\mathbb{C} being a certain function, once again associated to TT, and with

Mf(g)=fgM_{f}(g)=fg

being the usual multiplication operator by ff, on the Hilbert space L2(X)L^{2}(X).

Proof.

This is our main diagonalization theorem, the idea being as follows:

(1) Consider the decomposition of TT into its real and imaginary parts, as constructed in the proof of Proposition 3.25, namely:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

We know that the real and imaginary parts are self-adjoint operators. Now since TT was assumed to be normal, TT=TTTT^{*}=T^{*}T, these real and imaginary parts commute:

[T+T2,TT2i]=0\left[\frac{T+T^{*}}{2}\,,\,\frac{T-T^{*}}{2i}\right]=0

Thus Theorem 3.24 applies to these real and imaginary parts, and gives the result.

(2) Alternatively, we can use methods similar to those that we used in chapter 1, in order to deal with the usual normal matrices, involving the special relation between TT and the operator TTTT^{*}, which is self-adjoint. We will leave this as an instructive exercise. ∎

This was for our series of diagonalization theorems. There is of course one more result here, regarding the families of commuting normal operators, as follows:

Theorem 3.27.

Any family of commuting normal operators TiB(H)T_{i}\in B(H) can be jointly diagonalized,

Ti=UMfiUT_{i}=U^{*}M_{f_{i}}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to {Ti}\{T_{i}\}, with fi:Xf_{i}:X\to\mathbb{C} being certain functions, once again associated to TiT_{i}, and with

Mfi(g)=figM_{f_{i}}(g)=f_{i}g

being the usual multiplication operator by fif_{i}, on the Hilbert space L2(X)L^{2}(X).

Proof.

This is similar to the proof of Theorem 3.24 and Theorem 3.26, by combining the arguments there. To be more precise, this follows as Theorem 3.24, by using the decomposition trick from the proof of Theorem 3.26. ∎

With the above diagonalization results in hand, we can now “fix” the continuous and measurable functional calculus theorems, with a key complement, as follows:

Theorem 3.28.

Given a normal operator TB(H)T\in B(H), the following hold, for both the functional calculus and the measurable calculus morphisms:

  1. (1)

    These morphisms are *-morphisms.

  2. (2)

    The function z¯\bar{z} gets mapped to TT^{*}.

  3. (3)

    The functions Re(z),Im(z)Re(z),Im(z) get mapped to Re(T),Im(T)Re(T),Im(T).

  4. (4)

    The function |z|2|z|^{2} gets mapped to TT=TTTT^{*}=T^{*}T.

  5. (5)

    If ff is real, then f(T)f(T) is self-adjoint.

Proof.

These assertions are more or less equivalent, with (1) being the main one, which obviously implies everything else. But this assertion (1) follows from the diagonalization result for normal operators, from Theorem 3.26. ∎

This was for the spectral theory of arbitrary and normal operators, or at least for the basics of this theory. As a conclusion here, our main results are as follows:

  1. (1)

    Regarding the arbitrary operators, the main results here, or rather the most advanced results, are the holomorphic calculus formula from Theorem 3.15, and the spectral radius estimate from Theorem 3.17.

  2. (2)

    For the self-adjoint operators, the main results are the spectral radius formula from Theorem 3.18, the measurable calculus formula from Theorem 3.22, and the diagonalization result from Theorem 3.23.

  3. (3)

    For general normal operators, the main results are the spectral radius formula from Theorem 3.18, the measurable calculus formula from Theorem 3.22, complemented by Theorem 3.28, and the diagonalization result in Theorem 3.26.

There are of course many other things that can be said about the spectral theory of the bounded operators TB(H)T\in B(H), and on that of the unbounded operators too. As a complement, we recommend any good operator theory book, with the comment however that there is a bewildering choice here, depending on taste, and on what exactly you want to do with your operators TB(H)T\in B(H). In what concerns us, who are rather into general quantum mechanics, but with our operators being bounded, good choices are the functional analysis book of Lax [lax], or the operator algebra book of Blackadar [bla].

3e. Exercises

The main theoretical notion introduced in this chapter was that of the spectrum of an operator, and as a first exercise here, we have:

Exercise 3.29.

Prove that for the usual matrices A,BMN()A,B\in M_{N}(\mathbb{C}) we have

σ+(AB)=σ+(BA)\sigma^{+}(AB)=\sigma^{+}(BA)

where σ+\sigma^{+} denotes the set of eigenvalues, taken with multiplicities.

As a remark, we have seen in the above that σ(AB)=σ(BA)\sigma(AB)=\sigma(BA) holds outside {0}\{0\}, and the equality on {0}\{0\} holds as well, because ABAB is invertible if and only if BABA is invertible. However, in what regards the eigenvalues taken with multiplicities, things are more tricky here, and the answer should be somewhere inside your linear algebra knowledge.

Exercise 3.30.

Clarify, with examples and counterexamples, the relation between the eigenvalues of an operator TB(H)T\in B(H), and its spectrum σ(T)\sigma(T)\subset\mathbb{C}.

Here, as usual, the counterexamples could only come from the shift operator SS, on the space H=l2()H=l^{2}(\mathbb{N}). As a bonus exercise here, try computing the spectrum of SS.

Exercise 3.31.

Draw the picture of the following function, and of its inverse,

f(z)=z+irzirf(z)=\frac{z+ir}{z-ir}

with rr\in\mathbb{R}, and prove that for r>>0r>>0 and T=TT=T^{*}, the element f(T)f(T) is well-defined.

This is something that we used in the above, when computing spectra of self-adjoints and unitaries, and the problem is that of working out all the details.

Exercise 3.32.

Comment on the spectral radius theorem, stating that for a normal operator, TT=TTTT^{*}=T^{*}T, the spectral radius is equal to the norm,

ρ(T)=||T||\rho(T)=||T||

with examples and counterexamples, and simpler proofs as well, in various particular cases of interest, such as the finite dimensional one.

This is of course something a bit philosophical, but the spectral radius theorem being our key technical result so far, some further thinking on it is definitely a good thing.

Exercise 3.33.

Develop a theory of *-algebras AA for which the quantity

||a||=sup{λ|aaλA1}||a||=\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}aa^{*}-\lambda\notin A^{-1}\right\}}

defines a norm, for the elements aAa\in A.

As pointed out in the above, the spectral radius formula shows that for A=B(H)A=B(H) the norm is given by the above formula, and so there should be such a theory of “good” *-algebras, with A=B(H)A=B(H) as a main example. However, this is tricky.

Exercise 3.34.

Find and write down a proof for the spectral theorem for normal operators in the spirit of the proof for normal matrices from chapter 1, and vice versa.

To be more precise, the problem is that the proof of the spectral theorem for the usual matrices, from chapter 1, was using a certain kind of trick, while the proof of the spectral theorem for the arbitrary operators, given in this chapter, was using some other kind of trick. Thus, for fully understanding all this, working out more proofs, both for the usual matrices and for the arbitrary operators, is a useful thing.

Exercise 3.35.

Find and write down an enhancement of the proof given above for the spectral theorem, as for z¯T\bar{z}\to T^{*} to appear way before the end of the proof.

This is something a bit philosophical, and check here first the various comments made above, and maybe work out this as well in parallel with the previous exercise.

Chapter 4 Compact operators

4a. Polar decomposition

We have seen so far the basic theory of bounded operators, in the arbitrary, normal and self-adjoint cases, and in a few other cases of interest. In this chapter we discuss a number of more specialized questions, for the most dealing with the compact operators, which are particularly close, conceptually speaking, to the usual complex matrices.


We have in fact considerably many interesting things that we can talk about, in this final chapter on operator theory, and our choices will be as follows:


(1) Before anything, at the general level, we would like to understand the matrix and operator theory analogues of the various things that we know about the complex numbers zM1()z\in M_{1}(\mathbb{C}), such as zz¯=|z|2z\bar{z}=|z|^{2}, or z=reitz=re^{it} and so on. We will discuss this first.


(2) Then, motivated by advanced linear algebra, we will go on a lengthy discussion on the algebra of compact operators K(H)B(H)K(H)\subset B(H), which for many advanced operator theory purposes is the correct generalization of the matrix algebra MN()M_{N}(\mathbb{C}).


(3) Our discussion on the compact operators will feature as well some more specialized types of operators, F(H)B1(H)B2(H)K(H)F(H)\subset B_{1}(H)\subset B_{2}(H)\subset K(H), with F(H)F(H) being the finite rank ones, B1(H)B_{1}(H) being the trace class ones, and B2(H)B_{2}(H) being the Hilbert-Schmidt ones.


And that is pretty much it, all basic things that must be known. Of course this will be just the tip of the iceberg, and more of an introduction to modern operator theory.


Getting started now, we would first like to systematically develop the theory of positive operators, and then establish polar decomposition results for the operators TB(H)T\in B(H). We first have the following result, improving our knowledge from chapter 2:

Theorem 4.1.

For an operator TB(H)T\in B(H), the following are equivalent:

  1. (1)

    <Tx,x>0<Tx,x>\geq 0, for any xHx\in H.

  2. (2)

    TT is normal, and σ(T)[0,)\sigma(T)\subset[0,\infty).

  3. (3)

    T=S2T=S^{2}, for some SB(H)S\in B(H) satisfying S=SS=S^{*}.

  4. (4)

    T=RRT=R^{*}R, for some RB(H)R\in B(H).

If these conditions are satisfied, we call TT positive, and write T0T\geq 0.

Proof.

We have already seen some implications in chapter 2, but the best is to forget the few partial results that we know, and prove everything, as follows:

(1)(2)(1)\implies(2) Assuming <Tx,x>0<Tx,x>\geq 0, with S=TTS=T-T^{*} we have:

<Sx,x>\displaystyle<Sx,x> =\displaystyle= <Tx,x><Tx,x>\displaystyle<Tx,x>-<T^{*}x,x>
=\displaystyle= <Tx,x><x,Tx>\displaystyle<Tx,x>-<x,Tx>
=\displaystyle= <Tx,x><Tx,x>¯\displaystyle<Tx,x>-\overline{<Tx,x>}
=\displaystyle= 0\displaystyle 0

The next step is to use a polarization trick, as follows:

<Sx,y>\displaystyle<Sx,y> =\displaystyle= <S(x+y),x+y><Sx,x><Sy,y><Sy,x>\displaystyle<S(x+y),x+y>-<Sx,x>-<Sy,y>-<Sy,x>
=\displaystyle= <Sy,x>\displaystyle-<Sy,x>
=\displaystyle= <y,Sx>\displaystyle<y,Sx>
=\displaystyle= <Sx,y>¯\displaystyle\overline{<Sx,y>}

Thus we must have <Sx,y><Sx,y>\in\mathbb{R}, and with yiyy\to iy we obtain <Sx,y>i<Sx,y>\in i\mathbb{R} too, and so <Sx,y>=0<Sx,y>=0. Thus S=0S=0, which gives T=TT=T^{*}. Now since TT is self-adjoint, it is normal as claimed. Moreover, by self-adjointness, we have:

σ(T)\sigma(T)\subset\mathbb{R}

In order to prove now that we have indeed σ(T)[0,)\sigma(T)\subset[0,\infty), as claimed, we must invert T+λT+\lambda, for any λ>0\lambda>0. For this purpose, observe that we have:

<(T+λ)x,x>\displaystyle<(T+\lambda)x,x> =\displaystyle= <Tx,x>+<λx,x>\displaystyle<Tx,x>+<\lambda x,x>
\displaystyle\geq <λx,x>\displaystyle<\lambda x,x>
=\displaystyle= λ||x||2\displaystyle\lambda||x||^{2}

But this shows that T+λT+\lambda is injective. In order to prove now the surjectivity, and the boundedness of the inverse, observe first that we have:

Im(T+λ)\displaystyle Im(T+\lambda)^{\perp} =\displaystyle= ker(T+λ)\displaystyle\ker(T+\lambda)^{*}
=\displaystyle= ker(T+λ)\displaystyle\ker(T+\lambda)
=\displaystyle= {0}\displaystyle\{0\}

Thus Im(T+λ)Im(T+\lambda) is dense. On the other hand, observe that we have:

||(T+λ)x||2\displaystyle||(T+\lambda)x||^{2} =\displaystyle= <Tx+λx,Tx+λx>\displaystyle<Tx+\lambda x,Tx+\lambda x>
=\displaystyle= ||Tx||2+2λ<Tx,x>+λ2||x||2\displaystyle||Tx||^{2}+2\lambda<Tx,x>+\lambda^{2}||x||^{2}
\displaystyle\geq λ2||x||2\displaystyle\lambda^{2}||x||^{2}

Thus for any vector in the image yIm(T+λ)y\in Im(T+\lambda) we have:

||y||λ||(T+λ)1y||||y||\geq\lambda\big{|}\big{|}(T+\lambda)^{-1}y\big{|}\big{|}

As a conclusion to what we have so far, T+\lambda is injective, and invertible as a bounded operator from H onto its image, with the following norm bound:

||(T+λ)1||λ1||(T+\lambda)^{-1}||\leq\lambda^{-1}

But this shows that Im(T+λ)Im(T+\lambda) is complete, hence closed, and since we already knew that Im(T+λ)Im(T+\lambda) is dense, our operator T+λT+\lambda is surjective, and we are done.

(2)\implies(3) Since T is normal, and with spectrum contained in [0,\infty), we can use the continuous functional calculus formula for the normal operators from chapter 3, with the function f(x)=\sqrt{x}, so as to construct a square root S=\sqrt{T}.

(3)(4)(3)\implies(4) This is trivial, because we can set R=SR=S.

(4)(1)(4)\implies(1) This is clear, because we have the following computation:

<RRx,x>=<Rx,Rx>=||Rx||2<R^{*}Rx,x>=<Rx,Rx>=||Rx||^{2}

Thus, we have the equivalences in the statement. ∎
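Before going on, here is a quick numerical illustration of these equivalences, in finite dimensions. This is a minimal sketch, assuming numpy, with the positivity tested on random vectors, the spectrum computed by diagonalization, and the square root S=\sqrt{T} extracted by functional calculus:

```python
# Sanity check for Theorem 4.1, in finite dimensions: starting from an
# arbitrary matrix R, the operator T = R*R should be positive in all the
# equivalent senses (1-4). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N = 5
R = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
T = R.conj().T @ R                          # condition (4): T = R*R

# (1): <Tx,x> >= 0, tested here on random vectors
for _ in range(100):
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    assert np.vdot(x, T @ x).real >= -1e-10

# (2): T is self-adjoint, hence normal, with spectrum in [0, infinity)
assert np.allclose(T, T.conj().T)
assert np.all(np.linalg.eigvalsh(T) >= -1e-10)

# (3): T = S^2, with S = sqrt(T) self-adjoint, via functional calculus
w, v = np.linalg.eigh(T)
S = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.conj().T
assert np.allclose(S, S.conj().T) and np.allclose(S @ S, T)
print("Theorem 4.1: all conditions verified")
```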

In analogy with what happens in finite dimensions, where among the positive matrices A0A\geq 0 we have the strictly positive ones, A>0A>0, given by the fact that the eigenvalues are strictly positive, we have as well a “strict” version of the above result, as follows:

Theorem 4.2.

For an operator TB(H)T\in B(H), the following are equivalent:

(1) T is positive and invertible.

(2) T is normal, and \sigma(T)\subset(0,\infty).

(3) T=S^{2}, for some S\in B(H) invertible, satisfying S=S^{*}.

(4) T=R^{*}R, for some R\in B(H) invertible.

If these conditions are satisfied, we call TT strictly positive, and write T>0T>0.

Proof.

Our claim is that the above conditions (1-4) are precisely the conditions (1-4) in Theorem 4.1, with the assumption “TT is invertible” added. Indeed:

(1) This is clear by definition.

(2) In the context of Theorem 4.1 (2), namely when TT is normal, and σ(T)[0,)\sigma(T)\subset[0,\infty), the invertibility of TT, which means 0σ(T)0\notin\sigma(T), gives σ(T)(0,)\sigma(T)\subset(0,\infty), as desired.

(3) In the context of Theorem 4.1 (3), namely when T=S2T=S^{2}, with S=SS=S^{*}, by using the basic properties of the functional calculus for normal operators, the invertibility of TT is equivalent to the invertibility of its square root S=TS=\sqrt{T}, as desired.

(4) In the context of Theorem 4.1 (4), namely when T=R^{*}R, the invertibility of T is equivalent to the invertibility of R. This can be either checked directly, or deduced via the equivalence (3)\iff(4) from Theorem 4.1, by using the above argument (3). ∎

As a subtlety now, we have the following complement to the above result:

Proposition 4.3.

For a strictly positive operator, T>0T>0, we have

<Tx,x>>0,x0<Tx,x>>0\quad,\quad\forall x\neq 0

but the converse of this fact is not true, unless we are in finite dimensions.

Proof.

We have several things to be proved, the idea being as follows:

(1) Regarding the main assertion, the inequality can be deduced as follows, by using the fact that the operator S=TS=\sqrt{T} is invertible, and in particular injective:

<Tx,x> = <S^{2}x,x> = <Sx,S^{*}x> = <Sx,Sx> = ||Sx||^{2} > 0

(2) In finite dimensions, assuming <Tx,x>>0<Tx,x>>0 for any x0x\neq 0, we know from Theorem 4.1 that we have T0T\geq 0. Thus we have σ(T)[0,)\sigma(T)\subset[0,\infty), and assuming by contradiction 0σ(T)0\in\sigma(T), we obtain that TT has λ=0\lambda=0 as eigenvalue, and the corresponding eigenvector x0x\neq 0 has the property <Tx,x>=0<Tx,x>=0, contradiction. Thus T>0T>0, as claimed.

(3) Regarding now the counterexample, consider the following operator on l2()l^{2}(\mathbb{N}):

T=\begin{pmatrix}1\\ &\frac{1}{2}\\ &&\frac{1}{3}\\ &&&\ddots\end{pmatrix}

This operator TT is well-defined and bounded, and we have <Tx,x>>0<Tx,x>>0 for any x0x\neq 0. However TT is not invertible, and so the converse does not hold, as stated. ∎
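As a numerical complement, here is a sketch, assuming numpy, of what happens with the above diagonal operator, truncated to N dimensions: the bottom of the spectrum goes to 0, so the inverses blow up in norm, and there can be no bounded inverse in infinite dimensions:

```python
# The counterexample from Proposition 4.3, truncated to N dimensions: the
# operator T = diag(1, 1/2, 1/3, ...) has <Tx,x> > 0 for any x != 0, but
# the bottom of its spectrum tends to 0. Illustrative only.
import numpy as np

for N in (10, 100, 1000):
    d = 1.0 / np.arange(1, N + 1)      # the eigenvalues 1, 1/2, 1/3, ...
    # min of spectrum -> 0, and ||T^{-1}|| = 1/min = N blows up with N
    print(N, d.min(), 1.0 / d.min())
```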

With this done, let us discuss now some decomposition results for the bounded operators TB(H)T\in B(H). We know that any zz\in\mathbb{C} can be written as follows, with a,ba,b\in\mathbb{R}:

z=a+ibz=a+ib

Also, we know that both the real and imaginary parts a,ba,b\in\mathbb{R}, and more generally any real number cc\in\mathbb{R}, can be written as follows, with r,s0r,s\geq 0:

c=rsc=r-s

Here are the operator theoretic generalizations of these results:

Proposition 4.4.

Given an operator TB(H)T\in B(H), the following happen:

(1) We can write T=A+iB, with A,B\in B(H) being self-adjoint.

(2) When T=T^{*}, we can write T=R-S, with R,S\in B(H) being positive.

(3) Thus, we can write any T as a linear combination of 4 positive elements.

Proof.

All this follows from basic spectral theory, as follows:

(1) This is something that we have already met in chapter 3, when proving the spectral theorem in its general form, the decomposition formula being as follows:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

(2) This follows from the measurable functional calculus. Indeed, assuming T=TT=T^{*} we have σ(T)\sigma(T)\subset\mathbb{R}, so we can use the following decomposition formula on \mathbb{R}:

1=χ[0,)+χ(,0)1=\chi_{[0,\infty)}+\chi_{(-\infty,0)}

To be more precise, let us multiply by zz, and rewrite this formula as follows:

z=χ[0,)zχ(,0)(z)z=\chi_{[0,\infty)}z-\chi_{(-\infty,0)}(-z)

Now by applying these measurable functions to T, we obtain a formula as follows, with both the operators T_{+},T_{-}\in B(H) being positive, as desired:

T=T+TT=T_{+}-T_{-}

(3) This follows indeed by combining the results in (1) and (2) above. ∎

Going ahead with our decomposition results, another basic thing that we know about complex numbers is that any zz\in\mathbb{C} appears as a real multiple of a unitary:

z=reitz=re^{it}

Finding the correct operator theoretic analogue of this is quite tricky, even for the usual matrices A\in M_{N}(\mathbb{C}). As a basic result here, we have:

Proposition 4.5.

Given an operator TB(H)T\in B(H), the following happen:

(1) When T=T^{*} and ||T||\leq 1, we can write T as an average of 2 unitaries:

T=\frac{U+V}{2}

(2) In the general T=T^{*} case, we can write T as a rescaled sum of unitaries:

T=\lambda(U+V)

(3) Thus, in general, we can write T as a rescaled sum of 4 unitaries.

Proof.

This follows from the results that we have, as follows:

(1) Assuming T=TT=T^{*} and ||T||1||T||\leq 1 we have 1T201-T^{2}\geq 0, and the decomposition that we are looking for is as follows, with both the components being unitaries:

T=T+i1T22+Ti1T22T=\frac{T+i\sqrt{1-T^{2}}}{2}+\frac{T-i\sqrt{1-T^{2}}}{2}

To be more precise, the square root can be extracted as in Theorem 4.1 (3), and the check of the unitarity of the components goes as follows:

(T+i\sqrt{1-T^{2}})(T-i\sqrt{1-T^{2}}) = T^{2}+(1-T^{2}) = 1

(2) This simply follows by applying (1) to the operator T/||T||T/||T||.

(3) Assuming first ||T||1||T||\leq 1, we know from Proposition 4.4 (1) that we can write T=A+iBT=A+iB, with A,BA,B being self-adjoint, and satisfying ||A||,||B||1||A||,||B||\leq 1. Now by applying (1) to both AA and BB, we obtain a decomposition of TT as follows:

T=U+V+W+X2T=\frac{U+V+W+X}{2}

In general, we can apply this to the operator T/||T||T/||T||, and we obtain the result. ∎
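Here is a numerical illustration of the assertion (1), in finite dimensions, as a sketch assuming numpy, with the square root \sqrt{1-T^{2}} extracted by diagonalization:

```python
# Proposition 4.5 (1), numerically: a self-adjoint contraction T is the
# average of the unitaries U, V = T +- i sqrt(1 - T^2). Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))
T = (A + A.T) / 2
T = T / (2 * np.linalg.norm(T, 2))          # self-adjoint, with ||T|| = 1/2

w, v = np.linalg.eigh(np.eye(N) - T @ T)    # 1 - T^2 >= 0
S = v @ np.diag(np.sqrt(w)) @ v.T           # S = sqrt(1 - T^2), commutes with T
U, V = T + 1j * S, T - 1j * S

assert np.allclose(U @ U.conj().T, np.eye(N))   # U unitary
assert np.allclose(V @ V.conj().T, np.eye(N))   # V unitary
assert np.allclose((U + V) / 2, T)              # T = (U + V)/2
print("T is an average of 2 unitaries")
```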

All this gets us into the multiplicative theory of the complex numbers, that we will attempt to generalize now. As a first construction, that we would like to generalize to the bounded operator setting, we have the construction of the modulus, as follows:

|z|=zz¯|z|=\sqrt{z\bar{z}}

The point now is that we can indeed generalize this construction, as follows:

Proposition 4.6.

Given an operator TB(H)T\in B(H), we can construct a positive operator |T|B(H)|T|\in B(H) as follows, by using the fact that TTT^{*}T is positive:

|T|=TT|T|=\sqrt{T^{*}T}

The square of this operator is then |T|2=TT|T|^{2}=T^{*}T. In the case H=H=\mathbb{C}, we obtain in this way the usual absolute value of the complex numbers:

|z|=zz¯|z|=\sqrt{z\bar{z}}

More generally, in the case where H=NH=\mathbb{C}^{N} is finite dimensional, we obtain in this way the usual moduli of the complex matrices AMN()A\in M_{N}(\mathbb{C}).

Proof.

We have several things to be proved, the idea being as follows:

(1) The first assertion follows from Theorem 4.1. Indeed, according to (4) there the operator TTT^{*}T is indeed positive, and then according to (2) there we can extract the square root of this latter positive operator, by applying to it the function z\sqrt{z}.

(2) By functional calculus we have then |T|2=TT|T|^{2}=T^{*}T, as desired.

(3) In the case H=H=\mathbb{C}, we obtain indeed the absolute value of complex numbers.

(4) In the case where the space HH is finite dimensional, H=NH=\mathbb{C}^{N}, we obtain indeed the usual moduli of the complex matrices AMN()A\in M_{N}(\mathbb{C}). ∎

As a comment here, it is possible to talk as well about TT\sqrt{TT^{*}}, which is in general different from TT\sqrt{T^{*}T}. Note that when TT is normal, no issue, because we have:

TT=TTTT=TTTT^{*}=T^{*}T\implies\sqrt{TT^{*}}=\sqrt{T^{*}T}

Regarding now the polar decomposition formula, let us start with a weak version of this statement, regarding the invertible operators, as follows:

Theorem 4.7.

We have the polar decomposition formula

T=UTTT=U\sqrt{T^{*}T}

with UU being a unitary, for any TB(H)T\in B(H) invertible.

Proof.

According to our definition of the modulus, |T|=TT|T|=\sqrt{T^{*}T}, we have:

<|T|x,|T|y> = <x,|T|^{2}y> = <x,T^{*}Ty> = <Tx,Ty>

Since T is invertible, so is |T|=\sqrt{T^{*}T}, and so the above computation shows that the following formula defines an isometry from H onto H, that is, a unitary operator U\in B(H):

U(|T|x)=TxU(|T|x)=Tx

But this formula shows that we have T=U|T|T=U|T|, as desired. ∎

Observe that we have uniqueness in the above result, in what regards the choice of the unitary UB(H)U\in B(H), due to the fact that we can write this unitary as follows:

U=T(TT)1U=T(\sqrt{T^{*}T})^{-1}

More generally now, we have the following result:

Theorem 4.8.

We have the polar decomposition formula

T=UTTT=U\sqrt{T^{*}T}

with UU being a partial isometry, for any TB(H)T\in B(H).

Proof.

As before, we have the following equality, for any two vectors x,yHx,y\in H:

<|T|x,|T|y>=<Tx,Ty><|T|x,|T|y>=<Tx,Ty>

We conclude that the following linear application is well-defined, and isometric:

U:Im|T|Im(T),|T|xTxU:Im|T|\to Im(T)\quad,\quad|T|x\to Tx

Now by continuity we can extend this isometry UU into an isometry between certain Hilbert subspaces of HH, as follows:

U:Im|T|¯Im(T)¯,|T|xTxU:\overline{Im|T|}\to\overline{Im(T)}\quad,\quad|T|x\to Tx

Moreover, we can further extend UU into a partial isometry U:HHU:H\to H, by setting Ux=0Ux=0, for any xIm|T|¯x\in\overline{Im|T|}^{\perp}, and with this convention, the result follows. ∎
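In finite dimensions the polar decomposition can be computed from the singular value decomposition, and here is a sketch of this, assuming numpy: writing T=WDV^{*} with W,V unitaries and D diagonal and positive, we have |T|=VDV^{*} and U=WV^{*}. The sketch also illustrates the above remark, that \sqrt{TT^{*}}=WDW^{*} is generically different from \sqrt{T^{*}T}:

```python
# Polar decomposition T = U|T| from the SVD, in finite dimensions: if
# T = W D V* then |T| = V D V* and U = W V*. Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
N = 4
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

W, d, Vh = np.linalg.svd(T)                 # T = W diag(d) V*
absT = Vh.conj().T @ np.diag(d) @ Vh        # |T| = sqrt(T*T)
U = W @ Vh                                  # here unitary; partial isometry in general

assert np.allclose(U @ absT, T)                     # T = U|T|
assert np.allclose(absT @ absT, T.conj().T @ T)     # |T|^2 = T*T

other = W @ np.diag(d) @ W.conj().T                 # sqrt(TT*)
print("sqrt(TT*) == sqrt(T*T):", np.allclose(other, absT))   # generically False
```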

4b. Compact operators

We have seen so far the basic theory of the bounded operators, in the arbitrary, normal and self-adjoint cases, and in a few other cases of interest. We will keep building on this, with a number of more specialized results, regarding the finite rank operators and compact operators, and other special classes of related operators, namely the trace class operators, and the Hilbert-Schmidt operators. Let us start with a basic definition, as follows:

Definition 4.9.

An operator TB(H)T\in B(H) is said to be of finite rank if its image

Im(T)HIm(T)\subset H

is finite dimensional. The set of such operators is denoted F(H)F(H).

There are many interesting examples of finite rank operators, the most basic ones being the finite rank projections, on the finite dimensional subspaces K\subset H. Observe also that in the case where H is finite dimensional, any operator T\in B(H) is automatically of finite rank. In infinite dimensions this of course fails, but we have the following result:

Proposition 4.10.

The set of finite rank operators

F(H)B(H)F(H)\subset B(H)

is a two-sided *-ideal.

Proof.

We have several assertions to be proved, the idea being as follows:

(1) It is clear from definitions that F(H)F(H) is indeed a vector space, with this due to the following formulae, valid for any S,TB(H)S,T\in B(H), which are both clear:

dim(Im(S+T))dim(Im(S))+dim(Im(T))\dim(Im(S+T))\leq\dim(Im(S))+\dim(Im(T))
dim(Im(λT))=dim(Im(T))\dim(Im(\lambda T))=\dim(Im(T))

(2) Let us prove now that F(H)F(H) is stable under *. Given TF(H)T\in F(H), we can regard it as an invertible operator between finite dimensional Hilbert spaces, as follows:

T:(kerT)Im(T)T:(\ker T)^{\perp}\to Im(T)

We conclude from this that we have the following dimension equality:

dim((kerT))=dim(Im(T))\dim((\ker T)^{\perp})=\dim(Im(T))

Our claim now, in relation with our problem, is that we have equalities as follows:

\dim(Im(T^{*})) = \dim(\overline{Im(T^{*})}) = \dim((\ker T)^{\perp}) = \dim(Im(T))

Indeed, the third equality is the one above, and the second equality is something that we know too, from chapter 2. Now by combining these two equalities we deduce that Im(T)Im(T^{*}) is finite dimensional, and so the first equality holds as well. Thus, our equalities are proved, and this shows that we have TF(H)T^{*}\in F(H), as desired.

(3) Finally, regarding the ideal property, this follows from the following two formulae, valid for any S,TB(H)S,T\in B(H), which are once again clear from definitions:

dim(Im(ST))dim(Im(T))\dim(Im(ST))\leq\dim(Im(T))
dim(Im(TS))dim(Im(T))\dim(Im(TS))\leq\dim(Im(T))

Thus, we are led to the conclusion in the statement. ∎

Let us discuss now the compact operators, which will be the main topic of discussion, for the present chapter. These are best introduced as follows:

Definition 4.11.

An operator TB(H)T\in B(H) is said to be compact if the closed set

T(B1)¯H\overline{T(B_{1})}\subset H

is compact, where B1HB_{1}\subset H is the unit ball. The set of such operators is denoted K(H)K(H).

Equivalently, an operator T\in B(H) is compact when for any sequence \{x_{n}\}\subset B_{1}, or more generally for any bounded sequence \{x_{n}\}\subset H, the sequence \{T(x_{n})\} has a convergent subsequence. We will see later some further criteria of compactness.


In finite dimensions any operator is compact. In general, as a first observation, any finite rank operator is compact. We have in fact the following result:

Proposition 4.12.

Any finite rank operator is compact,

F(H)K(H)F(H)\subset K(H)

and the finite rank operators are dense inside the compact operators.

Proof.

The first assertion is clear, because if Im(T)Im(T) is finite dimensional, then the following subset is closed and bounded, and so it is compact:

T(B1)¯Im(T)\overline{T(B_{1})}\subset Im(T)

Regarding the second assertion, let us pick a compact operator TK(H)T\in K(H), and a number ε>0\varepsilon>0. By compactness of TT we can find a finite set SB1S\subset B_{1} such that:

T(B1)xSBε(Tx)T(B_{1})\subset\bigcup_{x\in S}B_{\varepsilon}(Tx)

Consider now the orthogonal projection PP onto the following finite dimensional space:

E=span(Tx|xS)E=span\left(Tx\Big{|}x\in S\right)

Since the set SS is finite, this space EE is finite dimensional, and so PP is of finite rank, PF(H)P\in F(H). Now observe that for any norm one yHy\in H and any xSx\in S we have:

||Ty-Tx||^{2} = ||Ty-PTx||^{2} = ||Ty-PTy+PTy-PTx||^{2} = ||Ty-PTy||^{2}+||PTx-PTy||^{2}

Now by picking xSx\in S such that the ball Bε(Tx)B_{\varepsilon}(Tx) covers the point TyTy, we conclude from this that we have the following estimate:

||TyPTy||||TyTx||ε||Ty-PTy||\leq||Ty-Tx||\leq\varepsilon

Thus we have ||TPT||ε||T-PT||\leq\varepsilon, which gives the density result. ∎
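Here is a numerical illustration of this density result, as a sketch assuming numpy, for a compact diagonal operator T=diag(\lambda_{n}) with \lambda_{n}\to 0, truncated to a large ambient dimension: the rank N truncation T_{N} satisfies ||T-T_{N}||=\sup_{n>N}\lambda_{n}\to 0:

```python
# Proposition 4.12, numerically: for the compact operator T = diag(lambda_n)
# with lambda_n = 1/n -> 0, the rank N truncations T_N converge to T in
# norm, with ||T - T_N|| = lambda_{N+1}. Illustrative only.
import numpy as np

M = 2000                                   # ambient dimension, standing in for infinity
lam = 1.0 / np.arange(1, M + 1)            # the eigenvalues, tending to 0
for N in (5, 50, 500):
    # T - T_N is diagonal, with entries lam[N], lam[N+1], ...
    print(N, lam[N:].max())                # the approximation error, going to 0
```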

Quite remarkably, the set of compact operators is closed, and we have:

Theorem 4.13.

The set of compact operators

K(H)B(H)K(H)\subset B(H)

is a closed two-sided *-ideal.

Proof.

We have several assertions here, the idea being as follows:

(1) It is clear from definitions that K(H)K(H) is indeed a vector space, with this due to the following formulae, valid for any S,TB(H)S,T\in B(H), which are both clear:

(S+T)(B1)S(B1)+T(B1)(S+T)(B_{1})\subset S(B_{1})+T(B_{1})
(λT)(B1)=|λ|T(B1)(\lambda T)(B_{1})=|\lambda|\cdot T(B_{1})

(2) In order to prove now that K(H)K(H) is closed, assume that a sequence TnK(H)T_{n}\in K(H) converges to TB(H)T\in B(H). Given ε>0\varepsilon>0, let us pick NN\in\mathbb{N} such that:

||TTN||ε||T-T_{N}||\leq\varepsilon

By compactness of TNT_{N} we can find a finite set SB1S\subset B_{1} such that:

TN(B1)xSBε(TNx)T_{N}(B_{1})\subset\bigcup_{x\in S}B_{\varepsilon}(T_{N}x)

We conclude that for any yB1y\in B_{1} there exists xSx\in S such that:

||Ty-Tx|| \leq ||Ty-T_{N}y||+||T_{N}y-T_{N}x||+||T_{N}x-Tx|| \leq \varepsilon+\varepsilon+\varepsilon = 3\varepsilon

Thus, we have an inclusion as follows, with SB1S\subset B_{1} being finite:

T(B1)xSB3ε(Tx)T(B_{1})\subset\bigcup_{x\in S}B_{3\varepsilon}(Tx)

But this shows that our limiting operator TT is compact, as desired.

(3) Regarding the fact that K(H)K(H) is stable under involution, this follows from Proposition 4.10, Proposition 4.12 and (2). Indeed, by using Proposition 4.12, given TK(H)T\in K(H) we can write it as a limit of finite rank operators, as follows:

T=limnTnT=\lim_{n\to\infty}T_{n}

Now by applying the adjoint, we obtain that we have as well:

T=limnTnT^{*}=\lim_{n\to\infty}T_{n}^{*}

We know from Proposition 4.10 that the operators TnT_{n}^{*} are of finite rank, and so compact by Proposition 4.12, and by using (2) we obtain that TT^{*} is compact too, as desired.

(4) Finally, regarding the ideal property, this follows from the following two formulae, valid for any S,TB(H)S,T\in B(H), which are once again clear from definitions:

(ST)(B1)=S(T(B1))(ST)(B_{1})=S(T(B_{1}))
(TS)(B1)||S||T(B1)(TS)(B_{1})\subset||S||\cdot T(B_{1})

Thus, we are led to the conclusion in the statement. ∎

Here is now a second key result regarding the compact operators:

Theorem 4.14.

A bounded operator TB(H)T\in B(H) is compact precisely when

Ten0Te_{n}\to 0

for any orthonormal system {en}H\{e_{n}\}\subset H.

Proof.

We have two implications to be proved, the idea being as follows:

\implies” Assume that TT is compact. By contradiction, assume Ten0Te_{n}\not\to 0. This means that there exists ε>0\varepsilon>0 and a subsequence satisfying ||Tenk||>ε||Te_{n_{k}}||>\varepsilon, and by replacing {en}\{e_{n}\} with this subsequence, we can assume that the following holds, with ε>0\varepsilon>0:

||Ten||>ε||Te_{n}||>\varepsilon

Since TT was assumed to be compact, and the sequence {en}\{e_{n}\} is bounded, a certain subsequence {Tenk}\{Te_{n_{k}}\} must converge. Thus, by replacing once again {en}\{e_{n}\} with a subsequence, we can assume that the following holds, with x0x\neq 0:

TenxTe_{n}\to x

But this is a contradiction, because we obtain in this way:

<x,x> = \lim_{n\to\infty}<Te_{n},x> = \lim_{n\to\infty}<e_{n},T^{*}x> = 0

Thus our assumption Ten0Te_{n}\not\to 0 was wrong, and we obtain the result.

\Longleftarrow” Assume Ten0Te_{n}\to 0, for any orthonormal system {en}H\{e_{n}\}\subset H. In order to prove that TT is compact, we use the various results established above, which show that this is the same as proving that TT is in the closure of the space of finite rank operators:

TF(H)¯T\in\overline{F(H)}

We do this by contradiction. So, assume that the above is wrong, and so that there exists ε>0\varepsilon>0 such that the following holds:

SF(H)||TS||>εS\in F(H)\implies||T-S||>\varepsilon

As a first observation, by using S=0S=0 we obtain ||T||>ε||T||>\varepsilon. Thus, we can find a norm one vector e1He_{1}\in H such that the following holds:

||Te1||>ε||Te_{1}||>\varepsilon

Our claim, which will bring the desired contradiction, is that we can construct by recurrence orthonormal vectors e_{1},\ldots,e_{n} such that the following holds, for any i:

||Tei||>ε||Te_{i}||>\varepsilon

Indeed, assume that we have constructed such vectors e1,,ene_{1},\ldots,e_{n}. Let EHE\subset H be the linear space spanned by these vectors, and let us set:

P=Proj(E)P=Proj(E)

Since the operator TPTP has finite rank, our assumption above shows that we have:

||TTP||>ε||T-TP||>\varepsilon

Thus, we can find a norm one vector x\in H such that the following holds:

||(TTP)x||>ε||(T-TP)x||>\varepsilon

We have then xEx\not\in E, and so we can consider the following nonzero vector:

y=(1P)xy=(1-P)x

With this nonzero vector y constructed in this way, let us now set:

en+1=y||y||e_{n+1}=\frac{y}{||y||}

This vector e_{n+1} is then orthogonal to E, has norm one, and since Ty=(T-TP)x, with ||y||\leq||x||=1, it satisfies:

||Te_{n+1}|| = ||y||^{-1}||Ty|| \geq ||y||^{-1}\varepsilon \geq \varepsilon

Thus we are done with our construction by recurrence, and this contradicts our assumption that Ten0Te_{n}\to 0, for any orthonormal system {en}H\{e_{n}\}\subset H, as desired. ∎

Summarizing, we have so far a number of results regarding the compact operators, in analogy with what we know about the usual complex matrices. Let us discuss now the spectral theory of the compact operators. We first have the following result:

Proposition 4.15.

Assuming that TB(H)T\in B(H), with dimH=\dim H=\infty, is compact and self-adjoint, the following happen:

(1) The eigenvalues of T form a sequence \lambda_{n}\to 0.

(2) All eigenvalues \lambda_{n}\neq 0 have finite multiplicity.

Proof.

We prove both the assertions at the same time. For this purpose, we fix a number ε>0\varepsilon>0, we consider all the eigenvalues satisfying |λ|ε|\lambda|\geq\varepsilon, and for each such eigenvalue we consider the corresponding eigenspace EλHE_{\lambda}\subset H. Let us set:

E=span(Eλ||λ|ε)E=span\left(E_{\lambda}\,\Big{|}\,|\lambda|\geq\varepsilon\right)

Our claim, which will prove both (1) and (2), is that this space E is finite dimensional. In order to prove this claim, we can proceed as follows:

(1) We know that we have EIm(T)E\subset Im(T). Our claim is that we have:

E¯Im(T)\bar{E}\subset Im(T)

Indeed, assume that we have a sequence g_{n}\in E which converges, g_{n}\to g\in\bar{E}. Since T maps E bijectively onto itself, we can write g_{n}=Tf_{n}, with f_{n}\in E. By definition of E, the following condition is satisfied:

hE||Th||ε||h||h\in E\implies||Th||\geq\varepsilon||h||

Now since the sequence {gn}\{g_{n}\} is Cauchy we obtain from this that the sequence {fn}\{f_{n}\} is Cauchy as well, and with fnff_{n}\to f we have TfnTfTf_{n}\to Tf, as desired.

(2) Consider now the projection PB(H)P\in B(H) onto the closure E¯\bar{E} of the above vector space EE. The composition PTPT is then as follows, surjective on its target:

PT:HE¯PT:H\to\bar{E}

On the other hand, since T is compact, so is PT. But a compact operator cannot be surjective onto an infinite dimensional closed subspace, by the open mapping theorem, and it follows from this that the space \bar{E} is finite dimensional. Thus E itself must be finite dimensional too, and as explained in the beginning of the proof, this gives (1) and (2), as desired. ∎

In order to construct now eigenvalues, we will need:

Proposition 4.16.

If TT is compact and self-adjoint, one of the numbers

||T||\ ,\ -||T||

must be an eigenvalue of TT.

Proof.

We know from the spectral theory of the self-adjoint operators that the spectral radius ||T||||T|| of our operator TT is attained, and so one of the numbers ||T||,||T||||T||,-||T|| must be in the spectrum. In order to prove now that one of these numbers must actually appear as an eigenvalue, we must use the compactness of TT, as follows:

(1) First, we can assume ||T||=1||T||=1. By functional calculus this implies ||T3||=1||T^{3}||=1 too, and so we can find a sequence of norm one vectors xnHx_{n}\in H such that:

|<T3xn,xn>|1|<T^{3}x_{n},x_{n}>|\to 1

By using our assumption T=TT=T^{*}, we can rewrite this formula as follows:

|<T2xn,Txn>|1|<T^{2}x_{n},Tx_{n}>|\to 1

Now since TT is compact, and {xn}\{x_{n}\} is bounded, we can assume, up to changing the sequence {xn}\{x_{n}\} to one of its subsequences, that the sequence TxnTx_{n} converges:

TxnyTx_{n}\to y

Thus, the convergence formula found above reformulates as follows, with y0y\neq 0:

|<Ty,y>|=1|<Ty,y>|=1

(2) Our claim now, which will finish the proof, is that this latter formula implies Ty=±yTy=\pm y. Indeed, by using Cauchy-Schwarz and ||T||=1||T||=1, we have:

|<Ty,y>|||Ty||||y||1|<Ty,y>|\leq||Ty||\cdot||y||\leq 1

We know that this must be an equality, so Ty,yTy,y must be proportional. But since TT is self-adjoint the proportionality factor must be ±1\pm 1, and so we obtain, as claimed:

Ty=±yTy=\pm y

Thus, we have constructed an eigenvector for λ=±1\lambda=\pm 1, as desired. ∎
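In finite dimensions this statement is something very familiar, namely the fact that the norm of a symmetric matrix is the biggest absolute value of an eigenvalue, and here is a quick numerical check of this, as a sketch assuming numpy:

```python
# Proposition 4.16, in finite dimensions: for a symmetric matrix, the norm
# is attained on the spectrum, so one of +-||T|| is an eigenvalue.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
T = (A + A.T) / 2
eig = np.linalg.eigvalsh(T)                          # sorted eigenvalues
print(np.isclose(max(-eig[0], eig[-1]), np.linalg.norm(T, 2)))   # True
```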

We can further build on the above results in the following way:

Proposition 4.17.

If TT is compact and self-adjoint, there is an orthogonal basis of HH made of eigenvectors of TT.

Proof.

We use Proposition 4.15. According to the results there, we can arrange the nonzero eigenvalues of TT, taken with multiplicities, into a sequence λn0\lambda_{n}\to 0. Let ynHy_{n}\in H be the corresponding eigenvectors, and consider the following space:

E=span(yn)¯E=\overline{span(y_{n})}

The result follows then from the following observations:

(1) Since we have T=TT=T^{*}, both EE and its orthogonal EE^{\perp} are invariant under TT.

(2) On the space EE, our operator TT is by definition diagonal.

(3) On the space EE^{\perp}, our claim is that we have T=0T=0. Indeed, assuming that the restriction S=TES=T_{E^{\perp}} is nonzero, we can apply Proposition 4.16 to this restriction, and we obtain an eigenvalue for SS, and so for TT, contradicting the maximality of EE. ∎

With the above results in hand, we can now formulate a first spectral theory result for compact operators, which closes the discussion in the self-adjoint case:

Theorem 4.18.

Assuming that TB(H)T\in B(H), with dimH=\dim H=\infty, is compact and self-adjoint, the following happen:

(1) The spectrum \sigma(T)\subset\mathbb{R} consists of a sequence \lambda_{n}\to 0.

(2) All spectral values \lambda\in\sigma(T)-\{0\} are eigenvalues.

(3) All eigenvalues \lambda\in\sigma(T)-\{0\} have finite multiplicity.

(4) There is an orthogonal basis of H made of eigenvectors of T.

Proof.

This follows from the various results established above:

(1) In view of Proposition 4.15 (1), this will follow from (2) below.

(2) Assume that λ0\lambda\neq 0 belongs to the spectrum σ(T)\sigma(T), but is not an eigenvalue. By using Proposition 4.17, let us pick an orthonormal basis {en}\{e_{n}\} of HH consisting of eigenvectors of TT, and then consider the following operator:

Sx=n<x,en>λnλenSx=\sum_{n}\frac{<x,e_{n}>}{\lambda_{n}-\lambda}\,e_{n}

Observe that S is bounded, because \lambda_{n}\to 0 and \lambda\neq 0 is not an eigenvalue, so that \inf_{n}|\lambda_{n}-\lambda|>0. But then S is an inverse for T-\lambda, and so we have \lambda\notin\sigma(T), as desired.

(3) This is something that we know, from Proposition 4.15 (2).

(4) This is something that we know too, from Proposition 4.17. ∎

Finally, we have the following result, regarding the general case:

Theorem 4.19.

The compact operators TB(H)T\in B(H), with dimH=\dim H=\infty, are the operators of the following form, with {en}\{e_{n}\}, {fn}\{f_{n}\} being orthonormal families, and with λn0\lambda_{n}\searrow 0:

T(x)=nλn<x,en>fnT(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

The numbers λn\lambda_{n}, called singular values of TT, are the eigenvalues of |T||T|. In fact, the polar decomposition of TT is given by T=U|T|T=U|T|, with

|T|(x)=nλn<x,en>en|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>e_{n}

and with UU being given by Uen=fnUe_{n}=f_{n}, and U=0U=0 on the complement of span(ei)span(e_{i}).

Proof.

This basically follows from Theorem 4.8 and Theorem 4.18, as follows:

(1) Given two orthonormal families {en}\{e_{n}\}, {fn}\{f_{n}\}, and a sequence of real numbers λn0\lambda_{n}\searrow 0, consider the linear operator given by the formula in the statement, namely:

T(x)=nλn<x,en>fnT(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

Our first claim is that TT is bounded. Indeed, when assuming |λn|ε|\lambda_{n}|\leq\varepsilon for any nn, which is something that we can do if we want to prove that TT is bounded, we have:

||T(x)||^{2} = \Big{|}\Big{|}\sum_{n}\lambda_{n}<x,e_{n}>f_{n}\Big{|}\Big{|}^{2} = \sum_{n}|\lambda_{n}|^{2}|<x,e_{n}>|^{2} \leq \varepsilon^{2}\sum_{n}|<x,e_{n}>|^{2} \leq \varepsilon^{2}||x||^{2}

(2) The next observation is that this operator is indeed compact, because it appears as the norm limit, TNTT_{N}\to T, of the following sequence of finite rank operators:

T_{N}(x)=\sum_{n\leq N}\lambda_{n}<x,e_{n}>f_{n}

(3) Regarding now the polar decomposition assertion, for the above operator, this follows once again from definitions. Indeed, the adjoint is given by:

T(x)=nλn<x,fn>enT^{*}(x)=\sum_{n}\lambda_{n}<x,f_{n}>e_{n}

Thus, when composing TT^{*} with TT, we obtain the following operator:

TT(x)=nλn2<x,en>enT^{*}T(x)=\sum_{n}\lambda_{n}^{2}<x,e_{n}>e_{n}

Now by extracting the square root, we obtain the formula in the statement, namely:

|T|(x)=nλn<x,en>en|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>e_{n}

(4) Conversely now, assume that TB(H)T\in B(H) is compact. Then TTT^{*}T, which is self-adjoint, must be compact as well, and so by Theorem 4.18 we have a formula as follows, with {en}\{e_{n}\} being a certain orthonormal family, and with λn0\lambda_{n}\searrow 0:

TT(x)=nλn2<x,en>enT^{*}T(x)=\sum_{n}\lambda_{n}^{2}<x,e_{n}>e_{n}

By extracting the square root we obtain the formula of |T||T| in the statement, and then by setting U(en)=fnU(e_{n})=f_{n} we obtain a second orthonormal family, {fn}\{f_{n}\}, such that:

T(x)=U|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

Thus, our compact operator TB(H)T\in B(H) appears indeed as in the statement. ∎
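Here is a numerical illustration of this singular value writing, in finite dimensions, as a sketch assuming numpy: the vectors e_{n}, f_{n} are the right and left singular vectors coming from the SVD, with the convention <a,b>=\sum_{i}a_{i}\bar{b}_{i} for the scalar product:

```python
# The singular value writing of Theorem 4.19, in finite dimensions, via the
# SVD: with e_n = conj(Vh[n]) and f_n = W[:,n], the coefficient <x,e_n> is
# the dot product Vh[n].x, and T(x) = sum_n lam_n <x,e_n> f_n.
import numpy as np

rng = np.random.default_rng(4)
N = 5
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W, lam, Vh = np.linalg.svd(T)               # T = W diag(lam) Vh

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Tx = sum(lam[n] * (Vh[n] @ x) * W[:, n] for n in range(N))
assert np.allclose(Tx, T @ x)
print("T(x) = sum of lambda_n <x,e_n> f_n verified")
```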

As a technical remark here, it is possible to slightly improve a part of the above statement. Consider indeed an operator of the following form, with {en}\{e_{n}\}, {fn}\{f_{n}\} being orthonormal families as before, and with λn0\lambda_{n}\to 0 being now complex numbers:

T(x)=nλn<x,en>fnT(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

Then the same proof as before shows that TT is compact, and that the polar decomposition of TT is given by T=U|T|T=U|T|, with the modulus |T||T| being as follows:

|T|(x)=n|λn|<x,en>en|T|(x)=\sum_{n}|\lambda_{n}|<x,e_{n}>e_{n}

As for the partial isometry UU, this is given by Uen=wnfnUe_{n}=w_{n}f_{n}, and U=0U=0 on the complement of span(ei)span(e_{i}), where wn𝕋w_{n}\in\mathbb{T} are such that λn=|λn|wn\lambda_{n}=|\lambda_{n}|w_{n}.

4c. Trace class operators

We have not talked so far about the trace of operators TB(H)T\in B(H), in analogy with the trace of the usual matrices MMN()M\in M_{N}(\mathbb{C}). This is because the trace can be finite or infinite, or even not well-defined, and we will discuss this now. Let us start with:

Proposition 4.20.

Given a positive operator TB(H)T\in B(H), the quantity

Tr(T)=n<Ten,en>[0,]Tr(T)=\sum_{n}<Te_{n},e_{n}>\in[0,\infty]

is independent of the choice of an orthonormal basis \{e_{n}\}.

Proof.

If {fn}\{f_{n}\} is another orthonormal basis, we have:

\sum_{n}<Tf_{n},f_{n}> = \sum_{n}<\sqrt{T}f_{n},\sqrt{T}f_{n}> = \sum_{n}||\sqrt{T}f_{n}||^{2} = \sum_{mn}|<\sqrt{T}f_{n},e_{m}>|^{2} = \sum_{mn}|<T^{1/4}f_{n},T^{1/4}e_{m}>|^{2}

Since this quantity is symmetric in e,fe,f, this gives the result. ∎

We can now introduce the trace class operators, as follows:

Definition 4.21.

An operator TB(H)T\in B(H) is said to be of trace class if:

Tr|T|<Tr|T|<\infty

The set of such operators, also called integrable, is denoted B1(H)B_{1}(H).

In finite dimensions, any operator is of course of trace class. In arbitrary dimension, finite or not, we first have the following result, regarding such operators:

Proposition 4.22.

Any finite rank operator is of trace class, and any trace class operator is compact, so that we have embeddings as follows:

F(H)B1(H)K(H)F(H)\subset B_{1}(H)\subset K(H)

Moreover, for any compact operator TK(H)T\in K(H) we have the formula

Tr|T|=nλnTr|T|=\sum_{n}\lambda_{n}

where λn0\lambda_{n}\geq 0 are the singular values, and so TB1(H)T\in B_{1}(H) precisely when nλn<\sum_{n}\lambda_{n}<\infty.

Proof.

We have several assertions here, the idea being as follows:

(1) If TT is of finite rank, it is clearly of trace class.

(2) In order to prove now the second assertion, assume first that T\geq 0 is of trace class. For any orthonormal basis \{e_{n}\} we have:

\sum_{n}||\sqrt{T}e_{n}||^{2} = \sum_{n}<Te_{n},e_{n}> \leq Tr(T) < \infty

But this shows that we have a convergence as follows:

Ten0\sqrt{T}e_{n}\to 0

Thus the operator T\sqrt{T} is compact. Now since the compact operators form an ideal, it follows that T=TTT=\sqrt{T}\cdot\sqrt{T} is compact as well, as desired.

(3) In order to prove now the second assertion in general, assume that TB(H)T\in B(H) is of trace class. Then |T||T| is also of trace class, and so compact by (2), and since we have T=U|T|T=U|T| by polar decomposition, it follows that TT is compact too.

(4) Finally, in order to prove the last assertion, assume that TT is compact. The singular value decomposition of |T||T|, from Theorem 4.19, is then as follows:

|T|(x)=nλn<x,en>en|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>e_{n}

But this gives the formula for Tr|T|Tr|T| in the statement, and proves the last assertion. ∎
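As a numerical illustration of the formula Tr|T|=\sum_{n}\lambda_{n}, here is a sketch, assuming numpy, with |T| computed by diagonalizing T^{*}T, and the singular values coming from the SVD:

```python
# The formula Tr|T| = sum of the singular values, from Proposition 4.22,
# checked numerically: |T| is computed by diagonalizing T*T. Illustrative only.
import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((6, 6))
w, v = np.linalg.eigh(T.T @ T)
absT = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.T     # |T| = sqrt(T*T)
lam = np.linalg.svd(T, compute_uv=False)                   # singular values
print(np.isclose(np.trace(absT), lam.sum()))               # True
```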

Here is a useful reformulation of the above result, or rather of the above result coupled with Theorem 4.19, without reference to compact operators:

Theorem 4.23.

The trace class operators are precisely the operators of the form

T(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

with \{e_{n}\}, \{f_{n}\} being orthonormal systems, and with \lambda_{n}\searrow 0 being a sequence satisfying:

nλn<\sum_{n}\lambda_{n}<\infty

Moreover, for such an operator we have the following estimate:

|Tr(T)|Tr|T|=nλn|Tr(T)|\leq Tr|T|=\sum_{n}\lambda_{n}
Proof.

This follows indeed from Proposition 4.22, or rather from step (4) in the proof of Proposition 4.22, coupled with Theorem 4.19. ∎

Next, we have the following result, which comes as a continuation of Proposition 4.22, and is our central result here, regarding the trace class operators:

Theorem 4.24.

The space of trace class operators, which appears as an intermediate space between the finite rank operators and the compact operators,

F(H)B1(H)K(H)F(H)\subset B_{1}(H)\subset K(H)

is a two-sided *-ideal of K(H)K(H). The following is a Banach space norm on B1(H)B_{1}(H),

||T||1=Tr|T|||T||_{1}=Tr|T|

satisfying ||T||||T||1||T||\leq||T||_{1}, and for TB1(H)T\in B_{1}(H) and SB(H)S\in B(H) we have:

||ST||1||S||||T||1||ST||_{1}\leq||S||\cdot||T||_{1}

Also, the subspace F(H)F(H) is dense inside B1(H)B_{1}(H), with respect to this norm.

Proof.

There are several assertions here, the idea being as follows:

(1) In order to prove that B1(H)B_{1}(H) is a linear space, and that ||T||1=Tr|T|||T||_{1}=Tr|T| is a norm on it, the only non-trivial point is that of proving the following inequality:

Tr|S+T|Tr|S|+Tr|T|Tr|S+T|\leq Tr|S|+Tr|T|

For this purpose, consider the polar decompositions of these operators:

S=U|S|,T=V|T|,S+T=W|S+T|S=U|S|\quad,\quad T=V|T|\quad,\quad S+T=W|S+T|

Given an orthonormal basis {en}\{e_{n}\}, we have the following formula:

Tr|S+T| = \sum_{n}<|S+T|e_{n},e_{n}> = \sum_{n}<W^{*}(S+T)e_{n},e_{n}> = \sum_{n}<W^{*}U|S|e_{n},e_{n}>+\sum_{n}<W^{*}V|T|e_{n},e_{n}>

The point now is that the first sum can be estimated as follows:

\sum_{n}<W^{*}U|S|e_{n},e_{n}> = \sum_{n}<\sqrt{|S|}e_{n},\sqrt{|S|}U^{*}We_{n}> \leq \sum_{n}||\sqrt{|S|}e_{n}||\cdot||\sqrt{|S|}U^{*}We_{n}|| \leq \sqrt{\sum_{n}||\sqrt{|S|}e_{n}||^{2}}\cdot\sqrt{\sum_{n}||\sqrt{|S|}U^{*}We_{n}||^{2}}

In order to estimate the terms on the right, we can proceed as follows:

\sum_{n}||\sqrt{|S|}U^{*}We_{n}||^{2} = \sum_{n}<W^{*}U|S|U^{*}We_{n},e_{n}> = Tr(W^{*}U|S|U^{*}W) \leq Tr(U|S|U^{*}) \leq Tr(|S|)

The second sum in the above formula of Tr|S+T|Tr|S+T| can be estimated in the same way, and in the end we obtain, as desired:

Tr|S+T|Tr|S|+Tr|T|Tr|S+T|\leq Tr|S|+Tr|T|

(2) The estimate ||T||||T||1||T||\leq||T||_{1} can be established as follows:

||T|| = ||\,|T|\,|| = \sup_{||x||=1}<|T|x,x> \leq Tr|T|

(3) The fact that B1(H)B_{1}(H) is indeed a Banach space follows by constructing a limit for any Cauchy sequence, by using the singular value decomposition.

(4) The fact that B1(H)B_{1}(H) is indeed closed under the involution follows from:

Tr(T^{*}) = \sum_{n}<T^{*}e_{n},e_{n}> = \sum_{n}<e_{n},Te_{n}> = \overline{Tr(T)}

(5) In order to prove now the ideal property of B1(H)B_{1}(H), we use the standard fact, that we know from Proposition 4.5, that any bounded operator TB(H)T\in B(H) can be written as a linear combination of 4 unitary operators, as follows:

T=λ1U1+λ2U2+λ3U3+λ4U4T=\lambda_{1}U_{1}+\lambda_{2}U_{2}+\lambda_{3}U_{3}+\lambda_{4}U_{4}

Indeed, by taking the real and imaginary part we can first write T as a linear combination of 2 self-adjoint operators, and then by functional calculus each of these 2 self-adjoint operators can be written as a linear combination of 2 unitary operators.

(6) With this trick in hand, we can now prove the ideal property of B1(H)B_{1}(H). Indeed, it is enough to prove that we have:

TB1(H),UU(H)UT,TUB1(H)T\in B_{1}(H),U\in U(H)\implies UT,TU\in B_{1}(H)

But this latter result follows by using the polar decomposition theorem.

(7) With a bit more care, we obtain from this the estimate ||ST||1||S||||T||1||ST||_{1}\leq||S||\cdot||T||_{1} from the statement. As for the last assertion, this is clear as well. ∎
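Here is a quick numerical check of the two main estimates above, namely the triangle inequality for the trace norm and the inequality ||T||\leq||T||_{1}, as a sketch assuming numpy, with ||T||_{1} computed as the sum of the singular values:

```python
# Numerical check of the basic trace norm estimates in Theorem 4.24, with
# ||T||_1 computed as the sum of the singular values. Illustrative only.
import numpy as np

rng = np.random.default_rng(6)
tn = lambda T: np.linalg.svd(T, compute_uv=False).sum()    # trace norm
for _ in range(100):
    S, T = rng.standard_normal((2, 5, 5))
    assert tn(S + T) <= tn(S) + tn(T) + 1e-10              # triangle inequality
    assert np.linalg.norm(T, 2) <= tn(T) + 1e-10           # ||T|| <= ||T||_1
print("trace norm estimates verified")
```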

This was for the basic theory of the trace class operators. Much more can be said, and we refer here to the literature, such as Lax [lax]. In what concerns us, we will be back to these operators later in this book, in Part III, when discussing operator algebras.

4d. Hilbert-Schmidt operators

As a last topic of this chapter, let us discuss yet another important class of operators, namely the Hilbert-Schmidt ones. These operators, that we will need on several key occasions in what follows, when talking about operator algebras, are introduced as follows:

Definition 4.25.

An operator TB(H)T\in B(H) is said to be Hilbert-Schmidt if:

Tr(TT)<Tr(T^{*}T)<\infty

The set of such operators is denoted B2(H)B_{2}(H).

As before with other sets of operators, in finite dimensions we obtain in this way all the operators. In general, we have the following result, regarding such operators:

Theorem 4.26.

The space B2(H)B_{2}(H) of Hilbert-Schmidt operators, which appears as an intermediate space between the trace class operators and the compact operators,

F(H)B1(H)B2(H)K(H)F(H)\subset B_{1}(H)\subset B_{2}(H)\subset K(H)

is a two-sided *-ideal of K(H)K(H). This ideal has the property

S,TB2(H)STB1(H)S,T\in B_{2}(H)\implies ST\in B_{1}(H)

and conversely, each TB1(H)T\in B_{1}(H) appears as product of two operators in B2(H)B_{2}(H). In terms of the singular values (λn)(\lambda_{n}), the Hilbert-Schmidt operators are characterized by:

nλn2<\sum_{n}\lambda_{n}^{2}<\infty

Also, the following formula, whose output is finite by Cauchy-Schwarz,

<S,T>=Tr(ST)<S,T>=Tr(ST^{*})

defines a scalar product on B_{2}(H), making it a Hilbert space.

Proof.

All this is quite standard, from the results that we have already, and more specifically from the singular value decomposition theorem, and its applications. To be more precise, the proof of the various assertions goes as follows:

(1) First of all, the fact that the space of Hilbert-Schmidt operators B2(H)B_{2}(H) is stable under taking sums, and so is a vector space, follows from:

(S+T)^{*}(S+T) \leq (S+T)^{*}(S+T)+(S-T)^{*}(S-T) = (S^{*}+T^{*})(S+T)+(S^{*}-T^{*})(S-T) = 2(S^{*}S+T^{*}T)

Regarding now multiplicative properties, we can use here the following inequality:

(ST)(ST)=TSST||S||2TT(ST)^{*}(ST)=T^{*}S^{*}ST\leq||S||^{2}T^{*}T

Thus, the space B2(H)B_{2}(H) is a two-sided *-ideal of K(H)K(H), as claimed.

(2) In order to prove now that the product of any two Hilbert-Schmidt operators is a trace class operator, we can use the following polarization type formula, which is elementary:

S^{*}T = \frac{1}{4}\sum_{k=1}^{4}i^{k}(T+i^{k}S)^{*}(T+i^{k}S)

Indeed, each term on the right is of the form X^{*}X with X\in B_{2}(H), so it is of trace class, and therefore so is S^{*}T.

Conversely, given an arbitrary trace class operator TB1(H)T\in B_{1}(H), we have:

TB1(H)|T|B1(H)|T|B2(H)T\in B_{1}(H)\implies|T|\in B_{1}(H)\implies\sqrt{|T|}\in B_{2}(H)

Thus, by using the polar decomposition T=U|T|T=U|T|, we obtain the following decomposition for TT, with both components being Hilbert-Schmidt operators:

T=U|T|=U|T||T|T=U|T|=U\sqrt{|T|}\cdot\sqrt{|T|}

(3) The condition for the singular values is clear.

(4) The fact that we have a scalar product is clear as well.

(5) The proof of the completeness property is routine as well. ∎
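For matrices the Hilbert-Schmidt scalar product is nothing but the usual Frobenius scalar product, and here is a quick numerical check of this, and of the singular value characterization, as a sketch assuming numpy:

```python
# For matrices, <S,T> = Tr(ST*) is the Frobenius scalar product, and
# ||T||_2^2 = Tr(TT*) is the sum of the squared singular values.
import numpy as np

rng = np.random.default_rng(7)
S = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

assert np.isclose(np.trace(S @ T.conj().T), np.vdot(T, S))     # entrywise formula
lam = np.linalg.svd(T, compute_uv=False)
assert np.isclose(np.trace(T @ T.conj().T).real, (lam ** 2).sum())
print("Hilbert-Schmidt scalar product verified")
```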

We have as well the following key result, regarding the Hilbert-Schmidt operators:

Theorem 4.27.

We have the following formula,

Tr(ST)=Tr(TS)Tr(ST)=Tr(TS)

valid for any Hilbert-Schmidt operators S,T\in B_{2}(H).

Proof.

We can prove this in two steps, as follows:

(1) Assume first that |S||S| is trace class. Consider the polar decomposition S=U|S|S=U|S|, and choose an orthonormal basis {xi}\{x_{i}\} for the image of UU, suitably extended to an orthonormal basis of HH. We have then the following computation, as desired:

Tr(ST) = \sum_{i}<U|S|Tx_{i},x_{i}> = \sum_{i}<|S|TUU^{*}x_{i},U^{*}x_{i}> = Tr(|S|TU) = Tr(TU|S|) = Tr(TS)

(2) Assume now that we are in the general case, where SS is only assumed to be Hilbert-Schmidt. For any finite rank operator SS^{\prime} we have then:

|Tr(ST)-Tr(TS)| = |Tr((S-S^{\prime})T)-Tr(T(S-S^{\prime}))| \leq 2||S-S^{\prime}||_{2}\cdot||T||_{2}

Thus by choosing SS^{\prime} with ||SS||20||S-S^{\prime}||_{2}\to 0, we obtain the result. ∎

This was for the basic theory of bounded operators on a Hilbert space, TB(H)T\in B(H). In the remainder of this book we will be rather interested in the operator algebras AB(H)A\subset B(H) that these operators can form. This is of course related to operator theory, because we can, at least in theory, take A=<T>A=<T>, and then study TT via the properties of AA. Actually, this is something that we already did a few times, when doing spectral theory, and notably when talking about functional calculus for normal operators.


For further operator theory, however, nothing beats a good operator theory book, and various ad-hoc methods, depending on the type of operators involved, and especially, on what you want to do with them. As before, in relation with topics to be later discussed in this book, we recommend here the books of Lax [lax] and Blackadar [bla].


Let us mention as well that there is a lot of interesting theory regarding the unbounded operators T\in\mathcal{L}(H) too, which is something quite technical, and here once again, we warmly recommend a good operator theory book. In addition, we recommend as well a good PDE book, because most of the questions where unbounded operators appear have PDE formulations as well, which are extremely efficient.

4e. Exercises

There has been a lot of theory in this chapter, with some of the things not really explained in great detail, and we have several exercises about all this. First comes:

Exercise 4.28.

Try to find the best operator theoretic analogue of the formula

z=reitz=re^{it}

for the complex numbers, telling us that any number is a real multiple of a unitary.

As explained in the above, a weak analogue of this holds, stating that any operator is a linear combination of 4 unitaries. The problem is that of improving this.

Exercise 4.29.

Work out a few explicit examples of the polar decomposition formula

T=UTTT=U\sqrt{T^{*}T}

with, if possible, a non-trivial computation for the square root.

This is actually something quite tricky, even for the usual matrices. So, as a preliminary exercise here, have some fun with the 2×22\times 2 matrices.

Exercise 4.30.

Look up the various extra general properties of the sets of finite rank, trace class, Hilbert-Schmidt and compact operators,

F(H)B1(H)B2(H)K(H)F(H)\subset B_{1}(H)\subset B_{2}(H)\subset K(H)

coming in addition to what has been said above, about such operators.

This is of course quite vague, and, as good news, it is not indicated either if you should just come with a list of such properties, or with a list of such properties coming with complete proofs. Up to you here, and the more the better.

Part II Operator algebras


There was something in the air that night

The stars were bright, Fernando

They were shining there for you and me

For liberty, Fernando

Chapter 5 Operator algebras

5a. Normed algebras

We have seen that the study of the bounded operators TB(H)T\in B(H) often leads to the consideration of the algebras <T>B(H)<T>\subset B(H) generated by such operators, the idea being that the study of A=<T>A=<T> can lead to results about TT itself. In the remainder of this book we focus on the study of such algebras AB(H)A\subset B(H). Before anything, we should mention that there are countless ways of getting introduced to operator algebras, depending on motivations and taste, with the available books including:


(1) The old book of von Neumann [vn4], which started everything. This is a very classical book, with mathematical physics content, written at times when mathematics and physics were starting to part ways. A great book, still enjoyable nowadays.


(2) Various post-war treatises, such as Dixmier [dix], Kadison-Ringrose [kri], Strătilă-Zsidó [szs] and Takesaki [tak]. As a warning, however, these books are purely mathematical. Also, they sometimes avoid deep results of von Neumann and Connes.


(3) More recent books, including Arveson [arv], Blackadar [bla], Brown-Ozawa [boz], Connes [co3], Davidson [dav], Jones [jo6], Murphy [mur], Pedersen [ped] and Sakai [sak]. These are well-conceived one-volume books, written with various purposes in mind.


Our presentation below is inspired by Blackadar [bla], Connes [co3], Jones [jo6], but is yet another type of beast, often insisting on probabilistic aspects. But probably enough talking, more on this later, and let us get to work. We are interested in the study of the algebras of bounded operators AB(H)A\subset B(H). Let us start our discussion with the following broad definition, obtained by imposing the “minimal” set of reasonable axioms:

Definition 5.1.

An operator algebra is an algebra of bounded operators AB(H)A\subset B(H) which contains the unit, is closed under taking adjoints,

TATAT\in A\implies T^{*}\in A

and is closed as well under the norm.

Here, as in the previous chapters, HH is an arbitrary Hilbert space, with the case that we are mostly interested in being the separable one. By separable we mean having a countable orthonormal basis, {ei}iI\{e_{i}\}_{i\in I} with II countable, and such a space is of course unique. The simplest model is the space l2()l^{2}(\mathbb{N}), but in practice, we are particularly interested in the spaces of the form H=L2(X)H=L^{2}(X), which are separable too, but with the basis {ei}i\{e_{i}\}_{i\in\mathbb{N}} and the subsequent identification Hl2()H\simeq l^{2}(\mathbb{N}) being not necessarily very explicit.


Also as in the previous chapters, B(H)B(H) is the algebra of linear operators T:HHT:H\to H which are bounded, in the sense that the norm ||T||=sup||x||=1||Tx||||T||=\sup_{||x||=1}||Tx|| is finite. This algebra has an involution TTT\to T^{*}, with the adjoint operator TB(H)T^{*}\in B(H) being defined by the formula <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>, and in the above definition, the assumption TATAT\in A\implies T^{*}\in A refers to this involution. Thus, AA must be a *-algebra.


As a first result now regarding the operator algebras, in relation with the normal operators, where most of the non-trivial results that we have so far are, we have:

Theorem 5.2.

The operator algebra <T>B(H)<T>\subset B(H) generated by a normal operator TB(H)T\in B(H) appears as an algebra of continuous functions,

<T>=C(σ(T))<T>=C(\sigma(T))

where σ(T)\sigma(T)\subset\mathbb{C} denotes as usual the spectrum of TT.

Proof.

This is an abstract reformulation of the continuous functional calculus theorem for the normal operators, that we know from chapter 3. Indeed, that theorem tells us that we have a continuous morphism of *-algebras, as follows:

C(σ(T))B(H),ff(T)C(\sigma(T))\to B(H)\quad,\quad f\to f(T)

Moreover, by the general properties of the continuous calculus, also established in chapter 3, this morphism is injective, and its image is the norm closed algebra <T><T> generated by T,TT,T^{*}. Thus, we obtain the isomorphism in the statement. ∎
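Here is a numerical illustration of this identification, in finite dimensions, as a sketch assuming numpy: for a normal matrix, the functional calculus f\to f(T) is computed by letting f act on the eigenvalues, and polynomials in z,\bar{z} correspond to polynomials in T,T^{*}:

```python
# Theorem 5.2 in finite dimensions: for a normal matrix T, the functional
# calculus is given by letting f act on the eigenvalues, and for instance
# f(z) = z^2 + conj(z) produces f(T) = T^2 + T*. Illustrative only.
import numpy as np

theta = 2 * np.pi * np.arange(5) / 5
T = np.diag(np.exp(1j * theta))            # a normal matrix, spectrum = 5th roots of 1
f = lambda z: z ** 2 + np.conj(z)          # a continuous function on sigma(T)

w, v = np.linalg.eig(T)
fT = v @ np.diag(f(w)) @ np.linalg.inv(v)  # f(T), by functional calculus
assert np.allclose(fT, T @ T + T.conj().T)
print("f(T) = T^2 + T* verified")
```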

The above result is very nice, and it is possible to further build on it, by using this time the spectral theorem for families of normal operators, as follows:

Theorem 5.3.

The operator algebra <Ti>B(H)<T_{i}>\subset B(H) generated by a family of normal operators TiB(H)T_{i}\in B(H) appears as an algebra of continuous functions,

<T_{i}>=C(X)

where X is a certain compact space associated to the family \{T_{i}\}. Equivalently, any commutative operator algebra A\subset B(H) is of the form A=C(X).

Proof.

We have two assertions here, the idea being as follows:

(1) Regarding the first assertion, this follows exactly as in the proof of Theorem 5.2, by using this time the spectral theorem for families of normal operators.

(2) As for the second assertion, this is clear from the first one, because any commutative algebra AB(H)A\subset B(H) is generated by its elements TAT\in A, which are all normal. ∎

All this is good to know, but Theorem 5.2 and Theorem 5.3 remain something quite heavy, based on the spectral theorem. We would like to present now an alternative proof for these results, which is rather elementary, and has the advantage of reconstructing the compact space XX directly from the knowledge of the algebra AA. We will need:

Theorem 5.4.

Given an operator TAB(H)T\in A\subset B(H), define its spectrum as:

σ(T)={λ|TλA1}\sigma(T)=\left\{\lambda\in\mathbb{C}\Big{|}T-\lambda\notin A^{-1}\right\}

The following spectral theory results hold, exactly as in the A=B(H)A=B(H) case:

(1) We have \sigma(ST)\cup\{0\}=\sigma(TS)\cup\{0\}.

(2) We have polynomial, rational and holomorphic calculus.

(3) As a consequence, the spectra are compact and non-empty.

(4) The spectra of unitaries (U^{*}=U^{-1}) and self-adjoints (T=T^{*}) are in \mathbb{T},\mathbb{R}.

(5) The spectral radius of normal elements (TT^{*}=T^{*}T) is given by \rho(T)=||T||.

In addition, assuming TABT\in A\subset B, the spectra of TT with respect to AA and to BB coincide.

Proof.

This is something that we know from the beginning of chapter 3, in the case A=B(H)A=B(H). In general the proof is similar, the idea being as follows:

(1) Regarding the assertions (1-5), which are of course formulated a bit informally, the proofs here are perfectly similar to those for the full operator algebra A=B(H)A=B(H). All this is standard material, and in fact, things in chapter 3 were written in such a way as for their extension now, to the general operator algebra setting, to be obvious.

(2) Regarding the last assertion, the inclusion σB(T)σA(T)\sigma_{B}(T)\subset\sigma_{A}(T) is clear. For the converse, assume TλB1T-\lambda\in B^{-1}, and consider the following self-adjoint element:

S=(Tλ)(Tλ)S=(T-\lambda)^{*}(T-\lambda)

The difference between the two spectra of SABS\in A\subset B is then given by:

σA(S)σB(S)={μσB(S)|(Sμ)1BA}\sigma_{A}(S)-\sigma_{B}(S)=\left\{\mu\in\mathbb{C}-\sigma_{B}(S)\Big{|}(S-\mu)^{-1}\in B-A\right\}

Thus this difference is an open subset of \mathbb{C}. On the other hand, S being self-adjoint, its two spectra are both real, and so is their difference. But a nonempty open subset of \mathbb{C} cannot be contained in \mathbb{R}, so the difference is empty. Thus the two spectra of S are equal, and in particular S is invertible in A, and so T-\lambda\in A^{-1}, as desired.

(3) As an observation, the last assertion applied with B=B(H)B=B(H) shows that the spectrum σ(T)\sigma(T) as constructed in the statement coincides with the spectrum σ(T)\sigma(T) as constructed and studied in chapter 3, so the fact that (1-5) hold indeed is no surprise.

(4) Finally, I can hear you screaming that I should have conceived this book differently, as a matter of not proving the same things twice. Good point, with my distinguished colleague Bourbaki saying the same, and in answer, wait for chapter 7 below, where we will prove exactly the same things a third time. We can discuss pedagogy at that time. ∎

We can now get back to the commutative algebras, and we have the following result, due to Gelfand, which provides an alternative to Theorem 5.2 and Theorem 5.3:

Theorem 5.5.

Any commutative operator algebra AB(H)A\subset B(H) is of the form

A=C(X)A=C(X)

with the “spectrum” XX of such an algebra being the space of characters χ:A\chi:A\to\mathbb{C}, with topology making continuous the evaluation maps evT:χχ(T)ev_{T}:\chi\to\chi(T).

Proof.

Given a commutative operator algebra AA, we can define XX as in the statement. Then XX is compact, and TevTT\to ev_{T} is a morphism of algebras, as follows:

ev:AC(X)ev:A\to C(X)

(1) We first prove that evev is involutive. We use the following formula, which is similar to the z=Re(z)+iIm(z)z=Re(z)+iIm(z) formula for the usual complex numbers:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

Thus it is enough to prove the equality evT=evTev_{T^{*}}=ev_{T}^{*} for self-adjoint elements TT. But this is the same as proving that T=TT=T^{*} implies that evTev_{T} is a real function, which is in turn true, because evT(χ)=χ(T)ev_{T}(\chi)=\chi(T) is an element of σ(T)\sigma(T), contained in \mathbb{R}.

(2) Since AA is commutative, each element is normal, so evev is isometric:

||evT||=ρ(T)=||T||||ev_{T}||=\rho(T)=||T||

(3) It remains to prove that evev is surjective. But this follows from the Stone-Weierstrass theorem, because ev(A)ev(A) is a closed subalgebra of C(X)C(X), which separates the points. ∎

The above theorem of Gelfand is something very beautiful, and far-reaching. It is possible to further build on it, indefinitely high. We will be back to this.

5b. Von Neumann algebras

Instead of further building on the above results, which are already quite non-trivial, let us return to our modest status of apprentice operator algebraists, and declare ourselves rather unsatisfied with Definition 5.1, on the following intuitive grounds:

Thought 5.6.

Our assumption that AB(H)A\subset B(H) is norm closed is not satisfying, because we would like AA to be stable under polar decomposition, under taking spectral projections, and more generally, under measurable functional calculus.

Here all these “defects” are best visible in the context of Theorem 5.3, with the algebra A=C(X) found there, with X=\sigma(T), being obviously too small. In fact, Theorem 5.3 teaches us that, when looking for a fix, we should look for a weaker topology on B(H), such that the algebra A=<T> generated by a normal operator becomes A=L^{\infty}(X).


So, let us get now into this, topologies on B(H)B(H), and fine-tunings of Definition 5.1, based on them. The result that we will need, which is elementary, is as follows:

Proposition 5.7.

For a subalgebra AB(H)A\subset B(H), the following are equivalent:

  1. (1)

    AA is closed under the weak operator topology, making each of the linear maps T<Tx,y>T\to<Tx,y> continuous.

  2. (2)

    AA is closed under the strong operator topology, making each of the linear maps TTxT\to Tx continuous.

In the case where these conditions are satisfied, AA is closed under the norm topology.

Proof.

There are several statements here, the proof being as follows:

(1) It is clear that the norm topology is stronger than the strong operator topology, which is in turn stronger than the weak operator topology. At the level of the subsets S\subset B(H) which are closed, things get reversed, in the sense that weakly closed implies strongly closed, which in turn implies norm closed. Thus, we are left with proving that for any algebra A\subset B(H), strongly closed implies weakly closed.

(2) Consider the Hilbert space obtained by summing nn times HH with itself:

K=HHK=H\oplus\ldots\oplus H

The operators over KK can be regarded as being square matrices with entries in B(H)B(H), and in particular, we have a representation π:B(H)B(K)\pi:B(H)\to B(K), as follows:

π(T)=(TT)\pi(T)=\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}

Assume now that we are given an operator T\in\bar{A}, with the bar denoting the weak closure. For any x\in K we have then the following implications, with the first two bars denoting weak closures, and with the last step coming from the Hahn-Banach theorem, the weak and norm closures of a convex set being the same:

TA¯\displaystyle T\in\bar{A} \displaystyle\implies π(T)π(A)¯\displaystyle\pi(T)\in\overline{\pi(A)}
\displaystyle\implies π(T)xπ(A)x¯\displaystyle\pi(T)x\in\overline{\pi(A)x}
\displaystyle\implies π(T)xπ(A)x¯||.||\displaystyle\pi(T)x\in\overline{\pi(A)x}^{\,||.||}

Now observe that the last formula tells us that for any x=(x1,,xn)x=(x_{1},\ldots,x_{n}), and any ε>0\varepsilon>0, we can find SAS\in A such that the following holds, for any ii:

||SxiTxi||<ε||Sx_{i}-Tx_{i}||<\varepsilon

Thus TT belongs to the strong operator closure of AA, as desired. ∎

Observe that in the above the terminology is a bit confusing, because the norm topology is stronger than the strong operator topology. As a solution, we agree to call the norm topology “strong”, and the weak and strong operator topologies “weak”, whenever these two topologies coincide. With this convention made, the algebras AB(H)A\subset B(H) in Proposition 5.7 are those which are weakly closed. Thus, we can now formulate:

Definition 5.8.

A von Neumann algebra is an operator algebra

AB(H)A\subset B(H)

which is closed under the weak topology.

These algebras will be our main objects of study, in what follows. As basic examples, we have the algebra B(H)B(H) itself, then the singly generated algebras, A=<T>A=<T> with TB(H)T\in B(H), and then the multiply generated algebras, A=<Ti>A=<T_{i}> with TiB(H)T_{i}\in B(H). But for the moment, let us keep things simple, and build directly on Definition 5.8, by using basic functional analysis methods. We will need the following key result:

Theorem 5.9.

For an operator algebra AB(H)A\subset B(H), we have

A=A¯A^{\prime\prime}=\bar{A}

with AA^{\prime\prime} being the bicommutant inside B(H)B(H), and A¯\bar{A} being the weak closure.

Proof.

We can prove this by double inclusion, as follows:

\supset” Since any operator commutes with the operators that it commutes with, we have a trivial inclusion SSS\subset S^{\prime\prime}, valid for any set SB(H)S\subset B(H). In particular, we have:

AAA\subset A^{\prime\prime}

Our claim now is that the algebra AA^{\prime\prime} is closed, with respect to the strong operator topology. Indeed, assuming that we have TiTT_{i}\to T in this topology, we have:

TiA\displaystyle T_{i}\in A^{\prime\prime} \displaystyle\implies STi=TiS,SA\displaystyle ST_{i}=T_{i}S,\ \forall S\in A^{\prime}
\displaystyle\implies ST=TS,SA\displaystyle ST=TS,\ \forall S\in A^{\prime}
\displaystyle\implies T\in A^{\prime\prime}

Thus our claim is proved, and together with Proposition 5.7, which allows us to pass from the strong to the weak operator topology, this gives A¯A\bar{A}\subset A^{\prime\prime}, as desired.

\subset” Here we must prove that we have the following implication, valid for any TB(H)T\in B(H), with the bar denoting as usual the weak operator closure:

TATA¯T\in A^{\prime\prime}\implies T\in\bar{A}

For this purpose, we use the same amplification trick as in the proof of Proposition 5.7. Consider the Hilbert space obtained by summing nn times HH with itself:

K=HHK=H\oplus\ldots\oplus H

The operators over KK can be regarded as being square matrices with entries in B(H)B(H), and in particular, we have a representation π:B(H)B(K)\pi:B(H)\to B(K), as follows:

π(T)=(TT)\pi(T)=\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}

The idea will be that of doing the computations in this representation. First, in this representation, the image of our algebra AB(H)A\subset B(H) is given by:

π(A)={(TT)|TA}\pi(A)=\left\{\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}\Big{|}T\in A\right\}

We can compute the commutant of this image, exactly as in the usual scalar matrix case, and we obtain the following formula:

π(A)={(S11S1nSn1Snn)|SijA}\pi(A)^{\prime}=\left\{\begin{pmatrix}S_{11}&\ldots&S_{1n}\\ \vdots&&\vdots\\ S_{n1}&\ldots&S_{nn}\end{pmatrix}\Big{|}S_{ij}\in A^{\prime}\right\}

We conclude from this that, given an operator TAT\in A^{\prime\prime} as above, we have:

(TT)π(A)\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}\in\pi(A)^{\prime\prime}

In other words, the conclusion of all this is that we have:

TAπ(T)π(A)T\in A^{\prime\prime}\implies\pi(T)\in\pi(A)^{\prime\prime}

Now given a vector xKx\in K, consider the orthogonal projection PB(K)P\in B(K) on the norm closure of the vector space π(A)xK\pi(A)x\subset K. Since the subspace π(A)xK\pi(A)x\subset K is invariant under the action of π(A)\pi(A), so is its norm closure inside KK, and we obtain from this:

Pπ(A)P\in\pi(A)^{\prime}

By combining this with what we found above, we conclude that we have:

TAπ(T)P=Pπ(T)T\in A^{\prime\prime}\implies\pi(T)P=P\pi(T)

Now since the algebra A is unital, we have Px=x, and so we obtain \pi(T)x=P\pi(T)x\in\overline{\pi(A)x}. Since this holds for any x\in K, we conclude that any operator T\in A^{\prime\prime} belongs to the strong operator closure of A. By using now Proposition 5.7, which allows us to pass from the strong to the weak operator closure, we conclude that we have:

AA¯A^{\prime\prime}\subset\bar{A}

Thus, we have the desired reverse inclusion, and this finishes the proof. ∎

Now by getting back to the von Neumann algebras, from Definition 5.8, we have the following result, which is a reformulation of Theorem 5.9, by using this notion:

Theorem 5.10.

For an operator algebra AB(H)A\subset B(H), the following are equivalent:

  1. (1)

    AA is weakly closed, so it is a von Neumann algebra.

  2. (2)

    AA equals its algebraic bicommutant AA^{\prime\prime}, taken inside B(H)B(H).

Proof.

This follows from the formula A=A¯A^{\prime\prime}=\bar{A} from Theorem 5.9, along with the trivial fact that the commutants are automatically weakly closed. ∎

The above statement, called bicommutant theorem, and due to von Neumann [vn1], is quite interesting, philosophically speaking. Among others, it shows that the von Neumann algebras are exactly the commutants of the self-adjoint sets of operators:

Proposition 5.11.

Given a subset SB(H)S\subset B(H) which is closed under *, the commutant

A=SA=S^{\prime}

is a von Neumann algebra. Any von Neumann algebra appears in this way.

Proof.

We have two assertions here, the idea being as follows:

(1) Given SB(H)S\subset B(H) satisfying S=SS=S^{*}, the commutant A=SA=S^{\prime} satisfies A=AA=A^{*}, and is also weakly closed. Thus, AA is a von Neumann algebra. Note that this follows as well from the following “tricommutant formula”, which follows from Theorem 5.10:

S=SS^{\prime\prime\prime}=S^{\prime}

(2) Given a von Neumann algebra AB(H)A\subset B(H), we can take S=AS=A^{\prime}. Then SS is closed under the involution, and we have S=AS^{\prime}=A, as desired. ∎

Observe that Proposition 5.11 can be regarded as yet another alternative definition for the von Neumann algebras, and this definition is probably the best one when talking about quantum mechanics, where the self-adjoint operators T:H\to H can be thought of as being “observables” of the system, and where the commutants A=S^{\prime} of the sets of such observables S=\{T_{i}\} are the algebras A\subset B(H) that we are interested in. All this would actually need some discussion about self-adjointness, and about boundedness too, but let us not get into this here, and stay mathematical, as before.


As another interesting consequence of Theorem 5.10, we have:

Proposition 5.12.

Given a von Neumann algebra AB(H)A\subset B(H), its center

Z(A)=AAZ(A)=A\cap A^{\prime}

regarded as an algebra Z(A)B(H)Z(A)\subset B(H), is a von Neumann algebra too.

Proof.

This follows from the fact that the commutants are weakly closed, that we know from the above, which shows that AB(H)A^{\prime}\subset B(H) is a von Neumann algebra. Thus, the intersection Z(A)=AAZ(A)=A\cap A^{\prime} must be a von Neumann algebra too, as claimed. ∎

In order to develop some general theory, let us start by investigating the finite dimensional case. Here the ambient algebra is B(H)=MN()B(H)=M_{N}(\mathbb{C}), any linear subspace AB(H)A\subset B(H) is automatically closed, for all 3 topologies in Proposition 5.7, and we have:

Theorem 5.13.

The *-algebras AMN()A\subset M_{N}(\mathbb{C}) are exactly the algebras of the form

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

depending on parameters kk\in\mathbb{N} and n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N} satisfying

n1++nk=Nn_{1}+\ldots+n_{k}=N

embedded into MN()M_{N}(\mathbb{C}) via the obvious block embedding, twisted by a unitary UUNU\in U_{N}.

Proof.

We have two assertions to be proved, the idea being as follows:

(1) Given numbers n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N} satisfying n1++nk=Nn_{1}+\ldots+n_{k}=N, we have indeed an obvious embedding of *-algebras, via matrix blocks, as follows:

Mn1()Mnk()MN()M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})\subset M_{N}(\mathbb{C})

In addition, we can twist this embedding by a unitary UUNU\in U_{N}, as follows:

MUMUM\to UMU^{*}

(2) In the other sense now, consider a *-algebra AMN()A\subset M_{N}(\mathbb{C}). It is elementary to prove that the center Z(A)=AAZ(A)=A\cap A^{\prime}, as an algebra, is of the following form:

Z(A)kZ(A)\simeq\mathbb{C}^{k}

Consider now the standard basis e1,,ekke_{1},\ldots,e_{k}\in\mathbb{C}^{k}, and let p1,,pkZ(A)p_{1},\ldots,p_{k}\in Z(A) be the images of these vectors via the above identification. In other words, these elements p1,,pkAp_{1},\ldots,p_{k}\in A are central minimal projections, summing up to 1:

p1++pk=1p_{1}+\ldots+p_{k}=1

The idea is then that this partition of the unity will eventually lead to the block decomposition of AA, as in the statement. We prove this in 4 steps, as follows:

Step 1. We first construct the matrix blocks, our claim here being that each of the following linear subspaces of AA are non-unital *-subalgebras of AA:

Ai=piApiA_{i}=p_{i}Ap_{i}

But this is clear, each A_{i} being stable under the various non-unital *-subalgebra operations, thanks to the projection equations p_{i}^{2}=p_{i}^{*}=p_{i}.

Step 2. We prove now that the above algebras AiAA_{i}\subset A are in a direct sum position, in the sense that we have a non-unital *-algebra sum decomposition, as follows:

A=A1AkA=A_{1}\oplus\ldots\oplus A_{k}

As with any direct sum question, we have two things to be proved here. First, by using the formula p1++pk=1p_{1}+\ldots+p_{k}=1 and the projection equations pi2=pi=pip_{i}^{2}=p_{i}^{*}=p_{i}, we conclude that we have the needed generation property, namely:

A1++Ak=AA_{1}+\ldots+A_{k}=A

As for the fact that the sum is indeed direct, this follows as well from the formula p1++pk=1p_{1}+\ldots+p_{k}=1, and from the projection equations pi2=pi=pip_{i}^{2}=p_{i}^{*}=p_{i}.

Step 3. Our claim now, which will finish the proof, is that each of the *-subalgebras Ai=piApiA_{i}=p_{i}Ap_{i} constructed above is a full matrix algebra. To be more precise here, with ni=rank(pi)n_{i}=rank(p_{i}), our claim is that we have isomorphisms, as follows:

AiMni()A_{i}\simeq M_{n_{i}}(\mathbb{C})

In order to prove this claim, recall that the projections piAp_{i}\in A were chosen central and minimal. Thus, the center of each of the algebras AiA_{i} reduces to the scalars:

Z(Ai)=Z(A_{i})=\mathbb{C}

But this shows, either via a direct computation, or via the bicommutant theorem, that each of the algebras A_{i} is a full matrix algebra, as claimed.

Step 4. We can now obtain the result, by putting together what we have. Indeed, by using the results from Step 2 and Step 3, we obtain an isomorphism as follows:

AMn1()Mnk()A\simeq M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

Moreover, a more careful look at the isomorphisms established in Step 3 shows that at the global level, that of the algebra AA itself, the above isomorphism simply comes by twisting the following standard multimatrix embedding, discussed in the beginning of the proof, (1) above, by a certain unitary matrix UUNU\in U_{N}:

Mn1()Mnk()MN()M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})\subset M_{N}(\mathbb{C})

Now by putting everything together, we obtain the result. ∎

In relation with the bicommutant theorem, we have the following result, which fully clarifies the situation, with a very explicit proof, in finite dimensions:

Proposition 5.14.

Consider a *-algebra AMN()A\subset M_{N}(\mathbb{C}), written as above:

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

The commutant of this algebra is then, with respect to the block decomposition used,

A=A^{\prime}=\mathbb{C}\oplus\ldots\oplus\mathbb{C}

and by taking one more time the commutant we obtain AA itself, A=AA=A^{\prime\prime}.

Proof.

Let us decompose indeed our algebra AA as in Theorem 5.13:

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

The center of each matrix algebra being reduced to the scalars, the commutant of this algebra is then as follows, with each copy of \mathbb{C} corresponding to a matrix block:

A=A^{\prime}=\mathbb{C}\oplus\ldots\oplus\mathbb{C}

By taking once again the commutant we obtain AA itself, and we are done. ∎
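
Since everything here happens in finite dimensions, Proposition 5.14 can also be checked by computer. Here is a minimal numerical sketch of this, assuming Python with numpy available, with the commutant computed as the null space of the linear system Sg=gS, over a spanning set of elements g. The function names, and the choice A=\mathbb{C}\oplus M_{2}(\mathbb{C}) inside M_{3}(\mathbb{C}), are just illustrations, not part of the text's formal development:

    import numpy as np

    def commutant(gens, N, tol=1e-10):
        # S is in the commutant iff Sg - gS = 0 for every generator g;
        # with row-major vec, vec(gS) = kron(g, I) vec(S) and
        # vec(Sg) = kron(I, g.T) vec(S), so this is a linear system
        M = np.vstack([np.kron(g, np.eye(N)) - np.kron(np.eye(N), g.T)
                       for g in gens])
        _, s, Vh = np.linalg.svd(M)
        s = np.concatenate([s, np.zeros(Vh.shape[0] - s.size)])
        return [v.reshape(N, N) for v in Vh[s < tol]]

    def unit(N, i, j):
        e = np.zeros((N, N)); e[i, j] = 1.0; return e

    # A = C + M_2(C) inside M_3(C), spanned by the matrix units below
    A = [unit(3, 0, 0)] + [unit(3, i, j) for i in (1, 2) for j in (1, 2)]
    Ap = commutant(A, 3)      # expect dimension 2, i.e. C + C
    App = commutant(Ap, 3)    # expect dimension 5, i.e. A itself
    print(len(Ap), len(App))  # 2 5

The printed dimensions 2 and 5 agree with A^{\prime}=\mathbb{C}\oplus\mathbb{C} and A^{\prime\prime}=A, as in Proposition 5.14.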

As another interesting application of Theorem 5.13, clarifying this time the relation with operator theory, in finite dimensions, we have the following result:

Theorem 5.15.

Given an operator TB(H)T\in B(H) in finite dimensions, H=NH=\mathbb{C}^{N}, the von Neumann algebra A=<T>A=<T> that it generates inside B(H)=MN()B(H)=M_{N}(\mathbb{C}) is

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

with the sizes of the blocks n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N} coming from the spectral theory of the associated matrix MMN()M\in M_{N}(\mathbb{C}). In the normal case TT=TTTT^{*}=T^{*}T, this decomposition comes from

T=UDUT=UDU^{*}

with DMN()D\in M_{N}(\mathbb{C}) diagonal, and with UUNU\in U_{N} unitary.

Proof.

This is something which is routine, by using the linear algebra and spectral theory developed in chapter 1, for the matrices MMN()M\in M_{N}(\mathbb{C}). To be more precise:

(1) The fact that A=<T>A=<T> decomposes into a direct sum of matrix algebras is something that we already know, coming from Theorem 5.13.

(2) By using standard linear algebra, we can compute the block sizes n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N}, from the knowledge of the spectral theory of the associated matrix MMN()M\in M_{N}(\mathbb{C}).

(3) In the normal case, TT=TTTT^{*}=T^{*}T, we can simply invoke the spectral theorem, and by suitably changing the basis, we are led to the conclusion in the statement. ∎
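
As a quick numerical illustration of this, here is a sketch, again assuming numpy, which computes the linear dimension of <T> for a normal matrix with prescribed eigenvalue multiplicities, by looking at the span of the words in T,T^{*}. For a normal matrix, this dimension equals the number of distinct eigenvalues, in agreement with the above:

    import numpy as np

    rng = np.random.default_rng(0)

    # a normal 5x5 matrix with 3 distinct eigenvalues, multiplicities 2, 2, 1
    D = np.diag([1.0, 1.0, 2.0, 2.0, 3.0]).astype(complex)
    U, _ = np.linalg.qr(rng.standard_normal((5, 5))
                        + 1j * rng.standard_normal((5, 5)))
    T = U @ D @ U.conj().T

    # T being normal, the words in T, T* reduce to the monomials T^a (T*)^b
    words = [np.linalg.matrix_power(T, a)
             @ np.linalg.matrix_power(T.conj().T, b)
             for a in range(4) for b in range(4)]
    dim = np.linalg.matrix_rank(np.array([w.flatten() for w in words]))
    print(dim)  # 3 = number of distinct eigenvalues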

Let us get now to infinite dimensions, with Theorem 5.15 as our main source of inspiration. The same argument applies, provided that we are in the normal case, and we have the following result, summarizing our basic knowledge here:

Theorem 5.16.

Given a bounded operator TB(H)T\in B(H) which is normal, TT=TTTT^{*}=T^{*}T, the von Neumann algebra A=<T>A=<T> that it generates inside B(H)B(H) is

<T>=L(σ(T))<T>=L^{\infty}(\sigma(T))

with σ(T)\sigma(T)\subset\mathbb{C} being as usual its spectrum.

Proof.

The measurable functional calculus theorem for the normal operators tells us that we have a weakly continuous morphism of *-algebras, as follows:

L(σ(T))B(H),ff(T)L^{\infty}(\sigma(T))\to B(H)\quad,\quad f\to f(T)

Moreover, by the general properties of the measurable calculus, also established in chapter 3, this morphism is injective, and its image is the weakly closed algebra <T><T> generated by T,TT,T^{*}. Thus, we obtain the isomorphism in the statement. ∎

More generally now, along the same lines, we have the following result:

Theorem 5.17.

Given operators T_{i}\in B(H) which are normal, and which commute, the von Neumann algebra A=<T_{i}> that these operators generate inside B(H) is

<Ti>=L(X)<T_{i}>=L^{\infty}(X)

with XX being a certain measured space, associated to the family {Ti}\{T_{i}\}.

Proof.

This is once again routine, by using the spectral theory for the families of commuting normal operators TiB(H)T_{i}\in B(H) developed in chapter 3. ∎

As a fundamental consequence now of the above results, we have:

Theorem 5.18.

The commutative von Neumann algebras are the algebras

A=L(X)A=L^{\infty}(X)

with XX being a measured space.

Proof.

We have two assertions to be proved, the idea being as follows:

(1) In one sense, we must prove that given a measured space X, we can realize A=L^{\infty}(X) as a von Neumann algebra, on a certain Hilbert space H. But this is something that we know since chapter 2, the representation being as follows:

L(X)B(L2(X)),f(gfg)L^{\infty}(X)\subset B(L^{2}(X))\quad,\quad f\to(g\to fg)

(2) In the other sense, given a commutative von Neumann algebra AB(H)A\subset B(H), we must construct a certain measured space XX, and an identification A=L(X)A=L^{\infty}(X). But this follows from Theorem 5.17, because we can write our algebra as follows:

A=<Ti>A=<T_{i}>

To be more precise, A being commutative, any element T\in A is normal, so we can pick a generating family \{T_{i}\}\subset A, and then we have A=<T_{i}> as above, with T_{i}\in B(H) being commuting normal operators. Thus Theorem 5.17 applies, and gives the result.

(3) Alternatively, and more explicitly, we can deduce this from Theorem 5.16, applied to a suitable self-adjoint element T=T^{*}. Indeed, by using T=Re(T)+iIm(T), we conclude that any von Neumann algebra A\subset B(H) is generated by its self-adjoint elements T\in A. Moreover, by using measurable functional calculus, we conclude that A is linearly generated by its projections. But then, assuming A=\overline{span}\{p_{i}\}, with p_{i} being countably many projections, which is possible for instance when H is separable, we can set:

T=i=0pi3iT=\sum_{i=0}^{\infty}\frac{p_{i}}{3^{i}}

Then T=TT=T^{*}, and by functional calculus we have p0<T>p_{0}\in<T>, then p1<T>p_{1}\in<T>, and so on. Thus A=<T>A=<T>, and A=L(X)A=L^{\infty}(X) comes now via Theorem 5.16, as claimed. ∎
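
The trick at the end of this proof can be illustrated numerically as well. The following sketch, assuming numpy, takes commuting diagonal projections, for simplicity, so that measurable functional calculus amounts to entrywise thresholding, and recovers the projections p_{i} from T=\sum_{i}p_{i}/3^{i}:

    import numpy as np

    rng = np.random.default_rng(0)

    # three commuting projections, taken diagonal with 0/1 entries on C^8
    p = [np.diag(rng.integers(0, 2, size=8)).astype(float) for _ in range(3)]
    T = sum(pi / 3.0**i for i, pi in enumerate(p))

    # recover p_0, p_1, p_2 from T by successive thresholds: the tail
    # sum_{j>i} p_j/3^j is always strictly below the threshold 0.75/3^i
    q, R = [], np.diag(T).copy()
    for i in range(3):
        qi = (R >= 0.75 / 3.0**i).astype(float)
        q.append(np.diag(qi))
        R = R - qi / 3.0**i
    print(all(np.allclose(a, b) for a, b in zip(p, q)))  # True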

The above result is the foundation for all the advanced von Neumann algebra theory, that we will discuss in the remainder of this book, and there are many things that can be said about it. To start with, in relation with the general theory of the normed closed algebras, that we developed in the beginning of this chapter, we have:

Warning 5.19.

Although the von Neumann algebras are norm closed, the theory of norm closed algebras does not always apply well to them. For instance for A=L(X)A=L^{\infty}(X) Gelfand gives A=C(X^)A=C(\widehat{X}), with X^\widehat{X} being a certain technical compactification of XX.

In short, my advice would be: do not mix up the two theories that we will be developing in this book; try finding different rooms for them in your brain. At least at this stage of things, because later, do not worry, we will be playing with both.


Now forgetting about Gelfand, and taking Theorem 5.18 as such, tentative foundation for the theory that we want to develop, as a first consequence of this, we have:

Theorem 5.20.

Given a von Neumann algebra AB(H)A\subset B(H), we have

Z(A)=L(X)Z(A)=L^{\infty}(X)

with XX being a certain measured space.

Proof.

We know from Proposition 5.12 that the center Z(A)B(H)Z(A)\subset B(H) is a von Neumann algebra. Thus Theorem 5.18 applies, and gives the result. ∎

It is possible to further build on this, with a powerful decomposition result as follows, over the measured space XX constructed in Theorem 5.20:

A=XAxdxA=\int_{X}A_{x}\,dx

But more on this later, after developing the appropriate tools for this program, which is something non-trivial. Among others, before getting into such things, we will have to study the von Neumann algebras AA having trivial center, Z(A)=Z(A)=\mathbb{C}, called factors, which include the fibers AxA_{x} in the above decomposition result. More on this later.

5c. Random matrices

Our main results so far on the von Neumann algebras concern the finite dimensional case, where the algebra is of the form A=iMni()A=\oplus_{i}M_{n_{i}}(\mathbb{C}), and the commutative case, where the algebra is of the form A=L(X)A=L^{\infty}(X). In order to advance, we must solve:

Question 5.21.

What are the next simplest von Neumann algebras, generalizing at the same time the finite dimensional ones, A=iMni()A=\oplus_{i}M_{n_{i}}(\mathbb{C}), and the commutative ones, A=L(X)A=L^{\infty}(X), that we can use as input for our study?

In this formulation, our question is a no-brainer, the answer to it being that of looking at the direct integrals of matrix algebras, over an arbitrary measured space XX:

A=XMnx()dxA=\int_{X}M_{n_{x}}(\mathbb{C})dx

However, when thinking a bit, all this looks quite tricky, with most likely lots of technical functional analysis and measure theory involved. So, we will leave the investigation of such algebras, which are indeed quite basic, and called of type I, for later.


Never mind. Let us replace Question 5.21 with something more modest, as follows:

Question 5.22 (update).

What are the next simplest von Neumann algebras, generalizing at the same time the usual matrix algebras, A=MN()A=M_{N}(\mathbb{C}), and the commutative ones, A=L(X)A=L^{\infty}(X), that we can use as input for our study?

But here, what we have is again a no-brainer, because in relation to what has been said above, we just have to restrict attention to the “isotypic” case, where all fibers are isomorphic. And in this case our algebra is a random matrix algebra:

A=XMN()dxA=\int_{X}M_{N}(\mathbb{C})dx

Which looks quite nice, and so good news, we have our algebras. In practice now, although there is some functional analysis to be done with these algebras, the main questions regard the individual operators TAT\in A, called random matrices. Thus, we are basically back to good old operator theory. Let us begin our discussion with:

Definition 5.23.

A random matrix algebra is a von Neumann algebra of the following type, with XX being a probability space, and with NN\in\mathbb{N} being an integer:

A=MN(L(X))A=M_{N}(L^{\infty}(X))

In other words, AA appears as a tensor product, as follows,

A=MN()L(X)A=M_{N}(\mathbb{C})\otimes L^{\infty}(X)

of a matrix algebra and a commutative von Neumann algebra.

As a first observation, our algebra can be written as well as follows, with this latter convention being quite standard in the probability literature:

A=L(X,MN())A=L^{\infty}(X,M_{N}(\mathbb{C}))

In connection with the tensor product notation, which is often the most useful one for computations, we have as well the following possible writing, also used in probability:

A=L(X)MN()A=L^{\infty}(X)\otimes M_{N}(\mathbb{C})

Importantly now, each random matrix algebra AA is naturally endowed with a canonical von Neumann algebra trace tr:Atr:A\to\mathbb{C}, which appears as follows:

Proposition 5.24.

Given a random matrix algebra A=MN(L(X))A=M_{N}(L^{\infty}(X)), consider the linear form tr:Atr:A\to\mathbb{C} given by:

tr(T)=1Ni=1NXTiixdxtr(T)=\frac{1}{N}\sum_{i=1}^{N}\int_{X}T_{ii}^{x}dx

In tensor product notation, A=MN()L(X)A=M_{N}(\mathbb{C})\otimes L^{\infty}(X), we have then the formula

tr=1NTrXtr=\frac{1}{N}\,Tr\otimes\int_{X}

and this functional tr:Atr:A\to\mathbb{C} is a faithful positive unital trace.

Proof.

The first assertion, regarding the tensor product writing of tr, is clear from definitions. As for the second assertion, regarding the various properties of tr, these follow from the corresponding properties of the matrix trace and of the integration over X, all of which are stable under taking tensor products. ∎

As before, there is a discussion here in connection with the other possible writings of AA. With the probabilistic notation A=L(X,MN())A=L^{\infty}(X,M_{N}(\mathbb{C})), the trace appears as:

tr(T)=X1NTr(Tx)dxtr(T)=\int_{X}\frac{1}{N}\,Tr(T^{x})\,dx

Also, with the probabilistic tensor notation A=L(X)MN()A=L^{\infty}(X)\otimes M_{N}(\mathbb{C}), the trace appears exactly as in the second part of Proposition 5.24, with the order inverted:

tr=X1NTrtr=\int_{X}\otimes\,\,\frac{1}{N}\,Tr

To summarize, the random matrix algebras appear to be very basic objects, and the only difficulty, in the beginning, lies in getting familiar with the 4 possible notations for them. Or perhaps 5 possible notations, because we have A=XMN()dxA=\int_{X}M_{N}(\mathbb{C})dx as well.
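
In all these notations, the trace is easy to manipulate on a computer, with a random matrix stored as an array of samples T^{x}, over a discretization of X. Here is a minimal sketch of this, assuming numpy, checking the unitality and traciality of tr; the sample count S and all names are of course just illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    N, S = 4, 1000   # matrix size, and number of sampled points x in X

    # random matrices in M_N(L^inf(X)), stored as S samples of NxN matrices
    A = rng.standard_normal((S, N, N))
    B = rng.standard_normal((S, N, N))

    # tr = (1/N) Tr tensor int_X, with int_X estimated by averaging samples
    tr = lambda T: np.einsum('xii->x', T).mean() / N

    print(tr(np.broadcast_to(np.eye(N), (S, N, N))))  # tr(1) = 1.0
    print(np.isclose(tr(A @ B), tr(B @ A)))           # traciality, True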


Getting to work now, as already said, the main questions about random matrix algebras regard the individual operators TAT\in A, called random matrices. To be more precise, we are interested in computing the laws of such matrices, constructed according to:

Theorem 5.25.

Given an operator algebra AB(H)A\subset B(H) with a faithful trace tr:Atr:A\to\mathbb{C}, any normal element TAT\in A has a law, namely a probability measure μ\mu satisfying

tr(Tk)=zkdμ(z)tr(T^{k})=\int_{\mathbb{C}}z^{k}d\mu(z)

with the powers being with respect to colored exponents k=k=\circ\bullet\bullet\circ\ldots\,, defined via

a=1,a=a,a=aa^{\emptyset}=1\quad,\quad a^{\circ}=a\quad,\quad a^{\bullet}=a^{*}

and multiplicativity. This law is unique, and is supported by the spectrum σ(T)\sigma(T)\subset\mathbb{C}. In the non-normal case, TTTTTT^{*}\neq T^{*}T, such a law does not exist.

Proof.

We have two assertions here, the idea being as follows:

(1) In the normal case, TT=TTTT^{*}=T^{*}T, we know from Theorem 5.2, based on the continuous functional calculus theorem, that we have:

<T>=C(σ(T))<T>=C(\sigma(T))

Thus the functional f(T)tr(f(T))f(T)\to tr(f(T)) can be regarded as an integration functional on the algebra C(σ(T))C(\sigma(T)), and by the Riesz theorem, this latter functional must come from a probability measure μ\mu on the spectrum σ(T)\sigma(T), in the sense that we must have:

tr(f(T))=σ(T)f(z)dμ(z)tr(f(T))=\int_{\sigma(T)}f(z)d\mu(z)

We are therefore led to the conclusions in the statement, with the uniqueness assertion coming from the fact that the operators TkT^{k}, taken as usual with respect to colored integer exponents, k=k=\circ\bullet\bullet\circ\ldots , generate the whole operator algebra C(σ(T))C(\sigma(T)).

(2) In the non-normal case now, TTTTTT^{*}\neq T^{*}T, we must show that such a law does not exist. For this purpose, we can use a positivity trick, as follows:

TTTT0\displaystyle TT^{*}-T^{*}T\neq 0 \displaystyle\implies (TTTT)2>0\displaystyle(TT^{*}-T^{*}T)^{2}>0
\displaystyle\implies TTTTTTTTTTTT+TTTT>0\displaystyle TT^{*}TT^{*}-TT^{*}T^{*}T-T^{*}TTT^{*}+T^{*}TT^{*}T>0
\displaystyle\implies tr(TTTTTTTTTTTT+TTTT)>0\displaystyle tr(TT^{*}TT^{*}-TT^{*}T^{*}T-T^{*}TTT^{*}+T^{*}TT^{*}T)>0
\displaystyle\implies tr(TTTT+TTTT)>tr(TTTT+TTTT)\displaystyle tr(TT^{*}TT^{*}+T^{*}TT^{*}T)>tr(TT^{*}T^{*}T+T^{*}TTT^{*})
\displaystyle\implies tr(TTTT)>tr(TTTT)\displaystyle tr(TT^{*}TT^{*})>tr(TTT^{*}T^{*})

Now assuming that T has a law \mu\in\mathcal{P}(\mathbb{C}), in the sense that the moment formula in the statement holds, the above two different numbers would both have to appear by integrating |z|^{4} with respect to this law \mu, which is contradictory, as desired. ∎
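
The positivity trick above can also be tested numerically. For a generic matrix, which is not normal, the two moments from the end of the proof indeed differ, as the following sketch, assuming numpy, shows:

    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Ts = T.conj().T
    tr = lambda A: (np.trace(A) / 4).real

    # both numbers would be the integral of |z|^4, if a law existed
    print(tr(T @ Ts @ T @ Ts))  # strictly bigger...
    print(tr(T @ T @ Ts @ Ts))  # ...than this one, for T not normal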

Back now to the random matrices, as a basic example, assume X=\{.\}, so that we are dealing with a usual scalar matrix, T\in M_{N}(\mathbb{C}), assumed to be normal. By changing the basis of \mathbb{C}^{N}, which won't affect our trace computations, we can assume that T is diagonal:

T(λ1λN)T\sim\begin{pmatrix}\lambda_{1}\\ &\ddots\\ &&\lambda_{N}\end{pmatrix}

But for such a diagonal matrix, we have the following formula:

tr(Tk)=1N(λ1k++λNk)tr(T^{k})=\frac{1}{N}(\lambda_{1}^{k}+\ldots+\lambda_{N}^{k})

Thus, the law of TT is the average of the Dirac masses at the eigenvalues:

μ=1N(δλ1++δλN)\mu=\frac{1}{N}\left(\delta_{\lambda_{1}}+\ldots+\delta_{\lambda_{N}}\right)
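
Before continuing, here is a quick numerical check of this formula, assuming numpy, comparing the trace moments of a self-adjoint matrix with the moments of the average of the Dirac masses at its eigenvalues:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    T = A + A.T                      # self-adjoint, hence normal
    lam = np.linalg.eigvalsh(T)

    for k in range(1, 6):
        m_tr = np.trace(np.linalg.matrix_power(T, k)) / 5
        m_mu = np.mean(lam**k)       # k-th moment of the spectral measure
        print(k, np.isclose(m_tr, m_mu))  # True at each k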

As a second example now, assume N=1N=1, and so TL(X)T\in L^{\infty}(X). In this case we obtain the usual law of TT, because the equation to be satisfied by μ\mu is:

Xφ(T)=φ(x)dμ(x)\int_{X}\varphi(T)=\int_{\mathbb{C}}\varphi(x)d\mu(x)

At a more advanced level, the main problem regarding the random matrices is that of computing the law of various classes of such matrices, coming in series:

Question 5.26.

What is the law of random matrices coming in series

TNMN(L(X))T_{N}\in M_{N}(L^{\infty}(X))

in the N>>0N>>0 regime?

The general strategy here, coming from physicists, is that of computing first the asymptotic law \mu^{0}, in the N\to\infty limit, and then looking for the higher order terms as well, so as to finally reach a series in N^{-1} giving the law of T_{N}, as follows:

μN=μ0+N1μ1+N2μ2+\mu_{N}=\mu^{0}+N^{-1}\mu^{1}+N^{-2}\mu^{2}+\ldots

As a basic example here, of particular interest are the random matrices having i.i.d. complex normal entries, under the constraint T=TT=T^{*}. Here the asymptotic law μ0\mu^{0} is the Wigner semicircle law on [2,2][-2,2]. We will discuss this in chapter 6 below, and in the meantime we can only recommend some reading, from the original papers of Marchenko-Pastur [mpa], Voiculescu [vo2], Wigner [wig], and from the books of Anderson-Guionnet-Zeitouni [agz], Mehta [meh], Nica-Speicher [nsp], Voiculescu-Dykema-Nica [vdn].
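
As a preview of chapter 6, the semicircle law is easy to observe numerically. Here is a sketch, assuming numpy and matplotlib are available, sampling one big self-adjoint Gaussian matrix, and histogramming its rescaled eigenvalues against the density \sqrt{4-x^{2}}/2\pi:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    N = 1000
    X = (rng.standard_normal((N, N))
         + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    H = (X + X.conj().T) / np.sqrt(2)         # self-adjoint, entry variance 1
    eig = np.linalg.eigvalsh(H / np.sqrt(N))  # rescaled spectrum, in [-2,2]

    x = np.linspace(-2, 2, 200)
    plt.hist(eig, bins=50, density=True)
    plt.plot(x, np.sqrt(4 - x**2) / (2 * np.pi))  # Wigner semicircle density
    plt.show()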

5d. Quantum spaces

Let us end this preliminary chapter on operator algebras with some philosophy, a bit à la Heisenberg. In relation with general “quantum space” goals, Theorem 5.18 is something very interesting, philosophically speaking, suggesting that we formulate:

Definition 5.27.

Given a von Neumann algebra AB(H)A\subset B(H), we write

A=L(X)A=L^{\infty}(X)

and call XX a quantum measured space.

As an example here, for the simplest noncommutative von Neumann algebra that we know, namely the usual matrix algebra A=MN()A=M_{N}(\mathbb{C}), the formula that we want to write is as follows, with MNM_{N} being a certain mysterious quantum space:

MN()=L(MN)M_{N}(\mathbb{C})=L^{\infty}(M_{N})

So, what can we say about this space MNM_{N}? As a first observation, this is a finite space, with its cardinality being defined and computed as follows:

|MN|=dimMN()=N2|M_{N}|=\dim_{\mathbb{C}}M_{N}(\mathbb{C})=N^{2}

Now since this is the same as the cardinality of the set {1,,N2}\{1,\ldots,N^{2}\}, we are led to the conclusion that we should have a twisting result as follows, with the twisting operation XXσX\to X^{\sigma} being something that destroys the points, but keeps the cardinality:

MN={1,,N2}σM_{N}=\{1,\ldots,N^{2}\}^{\sigma}

From an analytic viewpoint now, we would like to understand what is the integration over MNM_{N}, giving rise to the corresponding LL^{\infty} functions. And here, we can set:

MNA=tr(A)\int_{M_{N}}A=tr(A)

To be more precise, on the left we have the integral of an arbitrary function on MNM_{N}, which according to our conventions, should be a usual matrix:

AL(MN)=MN()A\in L^{\infty}(M_{N})=M_{N}(\mathbb{C})

As for the quantity on the right, the outcome of the computation, this can only be the trace of A. In addition, it is better to choose this trace to be normalized, by tr(1)=1, and this in order for our measure on M_{N} to have mass 1, as is ideal:

tr(A)=1NTr(A)tr(A)=\frac{1}{N}\,Tr(A)

We can say even more about this. Indeed, since the traces of positive matrices are positive, we are led to the following formula, to be taken with the above conventions, which shows that the measure on MNM_{N} that we constructed is a probability measure:

A>0MNA>0A>0\implies\int_{M_{N}}A>0

Before going further, let us record what we found, for future reference:

Theorem 5.28.

The quantum measured space MNM_{N} formally given by

MN()=L(MN)M_{N}(\mathbb{C})=L^{\infty}(M_{N})

has cardinality N2N^{2}, appears as a twist, in a purely algebraic sense,

MN={1,,N2}σM_{N}=\{1,\ldots,N^{2}\}^{\sigma}

and is a probability space, its uniform integration being given by

MNA=tr(A)\int_{M_{N}}A=tr(A)

where at right we have the normalized trace of matrices, tr=Tr/Ntr=Tr/N.

Proof.

This is something half-informal, mostly for fun, which basically follows from the above discussion, the details and missing details being as follows:

(1) In what regards the formula |MN|=N2|M_{N}|=N^{2}, coming by computing the complex vector space dimension, as explained above, this is obviously something rock-solid.

(2) Regarding twisting, we would like to have a formula as follows, with the operation AAσA\to A^{\sigma} being something that destroys the commutativity of the multiplication:

L(MN)=L(1,,N2)σL^{\infty}(M_{N})=L^{\infty}(1,\ldots,N^{2})^{\sigma}

In more familiar terms, with usual complex matrices on the left, and with a better-looking product of sets being used on the right, this formula reads:

MN()=L({1,,N}×{1,,N})σM_{N}(\mathbb{C})=L^{\infty}\Big{(}\{1,\ldots,N\}\times\{1,\ldots,N\}\Big{)}^{\sigma}

In order to establish this formula, consider the algebra on the right. As a complex vector space, this algebra has the standard basis {fij}\{f_{ij}\} formed by the Dirac masses at the points (i,j)(i,j), and the multiplicative structure of this algebra is given by:

f_{ij}f_{kl}=\delta_{ij,kl}f_{ij}

Now let us twist this multiplication, according to the formula eijekl=δjkeile_{ij}e_{kl}=\delta_{jk}e_{il}. We obtain in this way the usual combination formulae for the standard matrix units eij:ejeie_{ij}:e_{j}\to e_{i} of the algebra MN()M_{N}(\mathbb{C}), and so we have our twisting result, as claimed.

(3) In what regards the integration formula in the statement, with the conclusion that the underlying measure on MNM_{N} is a probability one, this is something that we fully explained before, and as for the result (1) above, it is something rock-solid.

(4) As a last technical comment, observe that the twisting operation performed in (2) destroys both the involution, and the trace of the algebra. This is something quite interesting, which cannot be fixed, and we will be back to it, later on. ∎

In order to advance now, based on the above result, the key point there is the construction and interpretation of the trace tr:MN()tr:M_{N}(\mathbb{C})\to\mathbb{C}, as an integration functional. But this leads us into the following natural, and quite puzzling question:

Question 5.29.

In the general context of Definition 5.27, where we formally wrote A=L(X)A=L^{\infty}(X), what is the underlying integration functional tr:Atr:A\to\mathbb{C}?

This is a quite subtle question, and there are several possible answers here. For instance, we would like the integration functional to have the following property:

tr(ab)=tr(ba)tr(ab)=tr(ba)

And the problem is that certain von Neumann algebras do not possess such traces. This is actually something quite advanced, that we do not know yet, but by anticipating a bit, we are in trouble, and we must modify Definition 5.27, as follows:

Definition 5.30 (update).

Given a von Neumann algebra AB(H)A\subset B(H), coming with a faithful positive unital trace tr:Atr:A\to\mathbb{C}, we write

A=L(X)A=L^{\infty}(X)

and call XX a quantum probability space. We also write the trace as tr=Xtr=\int_{X}, and call it integration with respect to the uniform measure on XX.

At the level of examples, past the classical probability spaces X, we know from Theorem 5.28 that the quantum space M_{N} is a finite quantum probability space. But this raises the question of understanding what the finite quantum probability spaces are, in general. For this purpose, we need to examine the finite dimensional von Neumann algebras. And the result here, extending Theorem 5.13, is as follows:

Theorem 5.31.

The finite dimensional von Neumann algebras AB(H)A\subset B(H) over an arbitrary Hilbert space HH are exactly the direct sums of matrix algebras,

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

embedded into B(H) by using a partition of unity of B(H) with orthogonal projections

1=P1++Pk1=P_{1}+\ldots+P_{k}

with the “factors” Mni()M_{n_{i}}(\mathbb{C}) being each embedded into the algebra PiB(H)PiP_{i}B(H)P_{i}.

Proof.

This is standard, as in the case AMN()A\subset M_{N}(\mathbb{C}). Consider the center of AA, which is a finite dimensional commutative von Neumann algebra, of the following form:

Z(A)=kZ(A)=\mathbb{C}^{k}

Now let PiP_{i} be the Dirac mass at i{1,,k}i\in\{1,\ldots,k\}. Then PiB(H)P_{i}\in B(H) is an orthogonal projection, and these projections form a partition of unity, as follows:

1=P1++Pk1=P_{1}+\ldots+P_{k}

With Ai=PiAPiA_{i}=P_{i}AP_{i}, we have then a non-unital *-algebra decomposition, as follows:

A=A1AkA=A_{1}\oplus\ldots\oplus A_{k}

On the other hand, it follows from the minimality of each of the projections PiZ(A)P_{i}\in Z(A) that we have unital *-algebra isomorphisms AiMni()A_{i}\simeq M_{n_{i}}(\mathbb{C}), and this gives the result. ∎

We can now deduce what the finite quantum measured spaces are, in the sense of the old Definition 5.27. Indeed, we must solve here the following equation:

L(X)=Mn1()Mnk()L^{\infty}(X)=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

Now since the disjoint unions of sets correspond to direct sums at the level of the associated algebras of functions, in the classical case, we can take the following formula as a definition for a disjoint union of sets, in the general, noncommutative case:

L(X1Xk)=L(X1)L(Xk)L^{\infty}(X_{1}\sqcup\ldots\sqcup X_{k})=L^{\infty}(X_{1})\oplus\ldots\oplus L^{\infty}(X_{k})

With this, and by remembering the definition of MNM_{N}, we are led to the conclusion that the solution to our quantum measured space equation above is as follows:

X=Mn1MnkX=M_{n_{1}}\sqcup\ldots\sqcup M_{n_{k}}

For fully solving our problem, in the spirit of the new Definition 5.30, we still have to discuss the traces on L(X)L^{\infty}(X). We are led in this way to the following statement:

Theorem 5.32.

The finite quantum measured spaces are the spaces

X=Mn1MnkX=M_{n_{1}}\sqcup\ldots\sqcup M_{n_{k}}

according to the following formula, for the associated algebras of functions:

L(X)=Mn1()Mnk()L^{\infty}(X)=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

The cardinality |X||X| of such a space is the following number,

N=n12++nk2N=n_{1}^{2}+\ldots+n_{k}^{2}

and the possible traces are as follows, with λi>0\lambda_{i}>0 summing up to 11:

tr=λ1tr1λktrktr=\lambda_{1}tr_{1}\oplus\ldots\oplus\lambda_{k}tr_{k}

Among these traces, we have the canonical trace, appearing as

tr:L(X)(L(X))tr:L^{\infty}(X)\subset\mathcal{L}(L^{\infty}(X))\to\mathbb{C}

via the left regular representation, having weights λi=ni2/N\lambda_{i}=n_{i}^{2}/N.

Proof.

We have many assertions here, basically coming from the above discussion, with only the last one needing some explanations. Consider the left regular representation of our algebra A=L(X)A=L^{\infty}(X), which is given by the following formula:

π:A(A),π(a):bab\pi:A\subset\mathcal{L}(A)\quad,\quad\pi(a):b\to ab

We know that the algebra (A)\mathcal{L}(A) of linear operators T:AAT:A\to A is isomorphic to a matrix algebra, and more specifically to MN()M_{N}(\mathbb{C}), with N=|X|N=|X| being as before:

(A)MN()\mathcal{L}(A)\simeq M_{N}(\mathbb{C})

Thus, this algebra has a trace tr:(A)tr:\mathcal{L}(A)\to\mathbb{C}, and by composing this trace with the representation π\pi, we obtain a certain trace tr:Atr:A\to\mathbb{C}, that we can call “canonical”:

tr:A(A)tr:A\subset\mathcal{L}(A)\to\mathbb{C}

We can compute the weights of this trace by using a multimatrix basis of AA, formed by matrix units eabie_{ab}^{i}, with i{1,,k}i\in\{1,\ldots,k\} and with a,b{1,,ni}a,b\in\{1,\ldots,n_{i}\}, and we obtain:

λi=ni2N\lambda_{i}=\frac{n_{i}^{2}}{N}

Thus, we are led to the conclusion in the statement. ∎
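
The weights \lambda_{i}=n_{i}^{2}/N can be verified by hand on a small example, say A=\mathbb{C}\oplus M_{2}(\mathbb{C}), where N=5. The following sketch, assuming numpy, builds the left regular representation with respect to the basis of matrix units, which is orthonormal for <a,b>=Tr(a^{*}b), and computes the canonical trace of the two minimal central projections:

    import numpy as np

    def unit(i, j):
        e = np.zeros((3, 3)); e[i, j] = 1.0; return e

    # A = C + M_2(C) as block-diagonal 3x3 matrices, with 5 matrix units
    basis = [unit(0, 0)] + [unit(i, j) for i in (1, 2) for j in (1, 2)]

    # left regular representation, pi(a)_{uv} = <e_u, a e_v> = Tr(e_u^* a e_v)
    def pi(a):
        return np.array([[np.trace(eu.T @ a @ ev) for ev in basis]
                         for eu in basis])

    p1, p2 = unit(0, 0), unit(1, 1) + unit(2, 2)  # minimal central projections
    print(np.trace(pi(p1)) / 5, np.trace(pi(p2)) / 5)  # 0.2, 0.8 = n_i^2 / N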

We will be back to quantum spaces on several occasions, in what follows. In fact, the present book is as much on operator algebras as it is on quantum spaces, and this because these two points of view are both useful, and complementary to each other.

5e. Exercises

The theory in this chapter has been quite exciting, and we have already run into a number of difficult questions. As a basic exercise on all this, we have:

Exercise 5.33.

Find a simple proof for the von Neumann bicommutant theorem, in finite dimensions.

This is something quite subjective, and try not to cheat. That is, do not simply convert the amplification proof that we have in general, by using matrix algebras everywhere, and do not use the structure result for the finite dimensional algebras either.

Exercise 5.34.

Again in finite dimensions, H=NH=\mathbb{C}^{N}, compute explicitly the von Neumann algebra <T>B(H)<T>\subset B(H) generated by a single operator.

As mentioned above, in the normal case the answer is clear, by diagonalizing TT. The problem is that of understanding what happens when TT is not normal.

Exercise 5.35.

Try understanding what the law of the simplest non-normal operator,

J=(0100)J=\begin{pmatrix}0&1\\ 0&0\end{pmatrix}

acting on H=2H=\mathbb{C}^{2} should be. Look also at more general Jordan blocks.

There are many non-trivial computations here. We will be back to this.
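
To get started with this exercise, the colored moments of J can at least be computed mechanically, say with numpy, as below. The outcome already shows, in the spirit of Theorem 5.25, that no complex probability measure can produce them, the two candidates for \int|z|^{4}d\mu(z) being different:

    import numpy as np

    J = np.array([[0., 1.], [0., 0.]])
    tr = lambda A: np.trace(A) / 2

    print(tr(J @ J.T))            # 0.5, the moment of J J*
    print(tr(J.T @ J))            # 0.5, the moment of J* J
    print(tr(J @ J.T @ J @ J.T))  # 0.5, one candidate for the |z|^4 moment
    print(tr(J @ J @ J.T @ J.T))  # 0.0, the other candidate, different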

Exercise 5.36.

Develop a full theory of finite quantum spaces, by enlarging what has been said above, with various geometric topics, of your choice.

This is of course a bit vague, but some further thinking about all this is certainly useful, at this point, and this is what the exercise is about.

Chapter 6 Random matrices

6a. Random matrices

We have seen so far the basics of von Neumann algebras AB(H)A\subset B(H), with a look into some interesting ramifications too, concerning random matrices and quantum spaces. In what regards these ramifications, the situation is as follows:


(1) The random matrix algebras, A=MN(L(X))A=M_{N}(L^{\infty}(X)) acting on H=NL2(X)H=\mathbb{C}^{N}\otimes L^{2}(X), are the simplest von Neumann algebras, from a variety of viewpoints. The main problem regarding them is of operator theoretic nature, regarding the computation of the law of individual elements TAT\in A with respect to the random matrix trace tr:Atr:A\to\mathbb{C}.


(2) The quantum spaces are exciting abstract objects, obtained by looking at an arbitrary von Neumann algebra AB(H)A\subset B(H) coming with a trace tr:Atr:A\to\mathbb{C}, and formally writing the algebra as A=L(X)A=L^{\infty}(X), and its trace as tr=Xtr=\int_{X}. In this picture, XX is our quantum probability space, and X\int_{X} is the integration over it, or expectation.


All this is quite interesting, and we will further explore these two topics, random matrices and quantum spaces, with some basic theory for them, in this chapter and in the next one. As a first observation, these two topics are closely related, due to:

Fact 6.1.

A random matrix algebra can be written in the following way,

MN(L(X))\displaystyle M_{N}(L^{\infty}(X)) =\displaystyle= MN()L(X)\displaystyle M_{N}(\mathbb{C})\otimes L^{\infty}(X)
=\displaystyle= L(MN)L(X)\displaystyle L^{\infty}(M_{N})\otimes L^{\infty}(X)
=\displaystyle= L(MN×X)\displaystyle L^{\infty}(M_{N}\times X)

so the underlying quantum space is something very simple, Y=MN×XY=M_{N}\times X.

With this understood, the philosophical problem is now, what to do with our quantum spaces, be they of random matrix type Y=M_{N}\times X, or more general. Good question, and do not expect a simple answer to it. Indeed, quantum spaces are more or less the same thing as operator algebras, and from this perspective, our question becomes “what are the operator algebras, and what is to be done with them”, obviously difficult.


And it gets even worse, because when remembering that operator algebras are more or less the same thing as quantum mechanics, our question becomes something of type “what is quantum mechanics, and what is to be done with it”. So, modesty.


Getting back to Earth, now that we have our questions and philosophy, for the whole remainder of this book, let us get into random matrices. Quite remarkably, these provide us with an epsilon of answer to our philosophical questions, as follows:

Answer 6.2.

The simplest quantum spaces are those coming from random matrix algebras, which are as follows, with XX being a usual probability space,

Y=MN×XY=M_{N}\times X

and what is to be done with them is the computation of the law of individual elements, the random matrices TL(Y)=MN(L(X))T\in L^{\infty}(Y)=M_{N}(L^{\infty}(X)), in the N>>0N>>0 regime.

Which looks very nice, we have eventually reached some concrete questions, and it is time now for mathematics and computations. Getting started, we must first further build on the material from chapter 5. We recall from there that given a von Neumann algebra A\subset B(H) coming with a trace tr:A\to\mathbb{C}, any normal element T\in A has a law, which is the complex probability measure \mu\in\mathcal{P}(\mathbb{C}) given by the following formula:

tr(Tk)=zkdμ(z)tr(T^{k})=\int_{\mathbb{C}}z^{k}d\mu(z)

In the non-normal case, TTTTTT^{*}\neq T^{*}T, the law does not exist as a complex probability measure μ𝒫()\mu\in\mathcal{P}(\mathbb{C}), as also explained in chapter 5. However, we can trick a bit, and talk about the law of non-normal elements as well, in the following abstract way:

Definition 6.3.

Let AA be a von Neumann algebra, given with a trace tr:Atr:A\to\mathbb{C}.

  1. (1)

    The elements TAT\in A are called random variables.

  2. (2)

    The moments of such a variable are the numbers Mk(T)=tr(Tk)M_{k}(T)=tr(T^{k}).

  3. (3)

    The law of such a variable is the functional μ:Ptr(P(T))\mu:P\to tr(P(T)).

Here k=k=\circ\bullet\bullet\circ\ldots is by definition a colored integer, and the powers TkT^{k} are defined by multiplicativity and the usual formulae, namely:

T=1,T=T,T=TT^{\emptyset}=1\quad,\quad T^{\circ}=T\quad,\quad T^{\bullet}=T^{*}

As for the polynomial PP, this is a noncommuting *-polynomial in one variable:

P<X,X>P\in\mathbb{C}<X,X^{*}>

Observe that the law is uniquely determined by the moments, because:

P(X)=kλkXkμ(P)=kλkMk(T)P(X)=\sum_{k}\lambda_{k}X^{k}\implies\mu(P)=\sum_{k}\lambda_{k}M_{k}(T)

Generally speaking, the above definition, due to Voiculescu [vdn], is something quite abstract, but there is no other way of doing things, at least at this level of generality. However, in the special case where our variable TAT\in A is self-adjoint, or more generally normal, the theory simplifies, and we recover more familiar objects, as follows:

Theorem 6.4.

The law of a normal variable TAT\in A can be identified with the corresponding spectral measure μ𝒫()\mu\in\mathcal{P}(\mathbb{C}), according to the following formula,

tr(f(T))=σ(T)f(x)dμ(x)tr(f(T))=\int_{\sigma(T)}f(x)d\mu(x)

valid for any fL(σ(T))f\in L^{\infty}(\sigma(T)), coming from the measurable functional calculus. In the self-adjoint case the spectral measure is real, μ𝒫()\mu\in\mathcal{P}(\mathbb{R}).

Proof.

This is something that we know well, from chapter 5, coming from the spectral theorem for the normal operators, as developed in chapter 3. ∎

Getting back now to the random matrices, we have all we need, as general formalism, and we are ready for doing some computations. As a first observation, we have:

Theorem 6.5.

The laws of basic random matrices TMN(L(X))T\in M_{N}(L^{\infty}(X)) are as follows:

  1. (1)

    In the case N=1N=1 the random matrix is a usual random variable, TL(X)T\in L^{\infty}(X), automatically normal, and its law as defined above is the usual law.

  2. (2)

    In the case X={.}X=\{.\} the random matrix is a usual scalar matrix, TMN()T\in M_{N}(\mathbb{C}), and in the diagonalizable case, the law is μ=1N(δλ1++δλN)\mu=\frac{1}{N}\left(\delta_{\lambda_{1}}+\ldots+\delta_{\lambda_{N}}\right).

Proof.

This is something that we know, once again, from chapter 5, and which is elementary. Indeed, the first assertion follows from definitions, and the above discussion. As for the second assertion, this follows by diagonalizing the matrix. ∎

In general, what we have can only be a mixture of (1) and (2) above. Our plan will be that of discussing (1) in more detail, and then getting into the general case, or rather into the case of the most interesting random matrices, with inspiration from (2).

6b. Probability theory

So, let us set N=1. Here our algebra is A=L^{\infty}(X), an arbitrary commutative von Neumann algebra. The most interesting linear operators T\in A, which we will rather denote as complex functions f:X\to\mathbb{C}, and call random variables, as is customary, are the normal, or Gaussian variables, which are defined as follows:

Definition 6.6.

A variable f:Xf:X\to\mathbb{R} is called standard normal when its law is:

g1=12πex2/2dxg_{1}=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}dx

More generally, the normal law of parameter t>0t>0 is the following measure:

gt=12πtex2/2tdxg_{t}=\frac{1}{\sqrt{2\pi t}}e^{-x^{2}/2t}dx

These are also called Gaussian distributions, with “g” standing for Gauss.

Observe that these normal laws have indeed mass 1, as they should, as shown by a quick change of variable, and the Gauss formula, namely:

(ex2dx)2\displaystyle\left(\int_{\mathbb{R}}e^{-x^{2}}dx\right)^{2} =\displaystyle= ex2y2dxdy\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}}e^{-x^{2}-y^{2}}dxdy
=\displaystyle= 02π0er2rdrdt\displaystyle\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^{2}}rdrdt
=\displaystyle= 2π×12\displaystyle 2\pi\times\frac{1}{2}
=\displaystyle= π\displaystyle\pi

Let us start with some basic results regarding the normal laws. We first have:

Proposition 6.7.

The normal law gtg_{t} with t>0t>0 has the following properties:

  1. (1)

    The variance is V=tV=t.

  2. (2)

    The density is even, so the odd moments vanish.

  3. (3)

    The even moments are Mk=tk/2×k!!M_{k}=t^{k/2}\times k!!, with k!!=(k1)(k3)(k5)k!!=(k-1)(k-3)(k-5)\ldots\,.

  4. (4)

    Equivalently, the moments are Mk=πP2(k)t|π|M_{k}=\sum_{\pi\in P_{2}(k)}t^{|\pi|}, for any kk\in\mathbb{N}.

  5. (5)

    The Fourier transform Ff(x)=𝔼(eixf)F_{f}(x)=\mathbb{E}(e^{ixf}) is given by F(x)=etx2/2F(x)=e^{-tx^{2}/2}.

  6. (6)

    We have the convolution semigroup formula gsgt=gs+tg_{s}*g_{t}=g_{s+t}, for any s,t>0s,t>0.

Proof.

All this is very standard, with the various notations used in the statement being explained below, the idea being as follows:

(1) The normal law gtg_{t} being centered, its variance is the second moment, V=M2V=M_{2}. Thus the result follows from (3), proved below, which gives in particular:

M2=t2/2×2!!=tM_{2}=t^{2/2}\times 2!!=t

(2) This is indeed something self-explanatory.

(3) We have indeed the following computation, by partial integration:

Mk\displaystyle M_{k} =\displaystyle= 12πtxkex2/2tdx\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}x^{k}e^{-x^{2}/2t}dx
=\displaystyle= 12πt(txk1)(ex2/2t)dx\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}(tx^{k-1})\left(-e^{-x^{2}/2t}\right)^{\prime}dx
=\displaystyle= 12πtt(k1)xk2ex2/2tdx\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}t(k-1)x^{k-2}e^{-x^{2}/2t}dx
=\displaystyle= t(k1)×12πtxk2ex2/2tdx\displaystyle t(k-1)\times\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}x^{k-2}e^{-x^{2}/2t}dx
=\displaystyle= t(k1)Mk2\displaystyle t(k-1)M_{k-2}

The initial value being M0=1M_{0}=1, we obtain the result.

(4) We know from (2,3) that the moments of the normal law gtg_{t} satisfy the following recurrence formula, with the initial data M0=1,M1=0M_{0}=1,M_{1}=0:

Mk=t(k1)Mk2M_{k}=t(k-1)M_{k-2}

Now let us look at P2(k)P_{2}(k), the set of pairings of {1,,k}\{1,\ldots,k\}. In order to have such a pairing, we must pair 1 with a number chosen among 2,,k2,\ldots,k, and then come up with a pairing of the remaining k2k-2 numbers. Thus, the number Nk=|P2(k)|N_{k}=|P_{2}(k)| of such pairings is subject to the following recurrence formula, with initial data N0=1,N1=0N_{0}=1,N_{1}=0:

Nk=(k1)Nk2N_{k}=(k-1)N_{k-2}

But this solves our problem at t=1t=1, because in this case we obtain the following formula, with |.||.| standing as usual for the number of blocks of a partition:

Mk=Nk=|P2(k)|=πP2(k)1=πP2(k)1|π|M_{k}=N_{k}=|P_{2}(k)|=\sum_{\pi\in P_{2}(k)}1=\sum_{\pi\in P_{2}(k)}1^{|\pi|}

Now back to the general case, t>0t>0, our problem here is solved in fact too, because the number of blocks of a pairing πP2(k)\pi\in P_{2}(k) being constant, |π|=k/2|\pi|=k/2, we obtain:

Mk=tk/2Nk=πP2(k)tk/2=πP2(k)t|π|M_{k}=t^{k/2}N_{k}=\sum_{\pi\in P_{2}(k)}t^{k/2}=\sum_{\pi\in P_{2}(k)}t^{|\pi|}

(5) The Fourier transform formula can be established as follows:

F(x)\displaystyle F(x) =\displaystyle= 12πtey2/2t+ixydy\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}e^{-y^{2}/2t+ixy}dy
=\displaystyle= 12πte(y/2tt/2ix)2tx2/2dy\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}e^{-(y/\sqrt{2t}-\sqrt{t/2}ix)^{2}-tx^{2}/2}dy
=\displaystyle= 12πtez2tx2/22tdz\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}e^{-z^{2}-tx^{2}/2}\sqrt{2t}dz
=\displaystyle= 1πetx2/2ez2dz\displaystyle\frac{1}{\sqrt{\pi}}e^{-tx^{2}/2}\int_{\mathbb{R}}e^{-z^{2}}dz
=\displaystyle= etx2/2\displaystyle e^{-tx^{2}/2}

(6) This follows indeed from (5), because logFgt\log F_{g_{t}} is linear in tt. ∎
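
The moment formula M_{k}=t^{k/2}\times k!! is easy to confirm by simulation. Here is a minimal sketch of this, assuming numpy, with the double factorial implemented directly:

    import numpy as np

    rng = np.random.default_rng(0)
    t = 2.0
    f = np.sqrt(t) * rng.standard_normal(10**6)   # samples of g_t

    dfact = lambda k: np.prod(np.arange(k - 1, 0, -2))  # k!! = (k-1)(k-3)...
    for k in (2, 4, 6):
        print(k, np.mean(f**k), t**(k // 2) * dfact(k))  # close pairs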

We are now ready to establish the Central Limit Theorem (CLT), which is a key result, telling us why the normal laws appear a bit everywhere, in real life:

Theorem 6.8.

Given a sequence of real random variables f1,f2,f3,L(X)f_{1},f_{2},f_{3},\ldots\in L^{\infty}(X), which are i.i.d., centered, and with variance t>0t>0, we have

1ni=1nfigt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}f_{i}\sim g_{t}

with nn\to\infty, in moments.

Proof.

In terms of moments, the Fourier transform Ff(x)=𝔼(eixf)F_{f}(x)=\mathbb{E}(e^{ixf}) is given by:

Ff(x)=𝔼(k=0(ixf)kk!)=k=0ikMk(f)k!xkF_{f}(x)=\mathbb{E}\left(\sum_{k=0}^{\infty}\frac{(ixf)^{k}}{k!}\right)=\sum_{k=0}^{\infty}\frac{i^{k}M_{k}(f)}{k!}\,x^{k}

Thus, the Fourier transform of the variable in the statement is:

F(x)\displaystyle F(x) =\displaystyle= [Ff(xn)]n\displaystyle\left[F_{f}\left(\frac{x}{\sqrt{n}}\right)\right]^{n}
=\displaystyle= \left[1-\frac{tx^{2}}{2n}+O(n^{-3/2})\right]^{n}
\displaystyle\simeq [1tx22n]n\displaystyle\left[1-\frac{tx^{2}}{2n}\right]^{n}
\displaystyle\simeq etx2/2\displaystyle e^{-tx^{2}/2}

But this latter function being the Fourier transform of gtg_{t}, we obtain the result. ∎
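
Again, a simulation confirms this convergence quickly. The following sketch, assuming numpy, sums i.i.d. centered uniform variables, of variance t=1/3, and compares the second and fourth moments of the rescaled sum with those of g_{t}:

    import numpy as np

    rng = np.random.default_rng(0)
    n, S = 500, 20000   # terms per sum, and number of simulated sums

    f = rng.uniform(-1, 1, size=(S, n))   # i.i.d., centered, variance t = 1/3
    g = f.sum(axis=1) / np.sqrt(n)

    t = 1 / 3
    print(np.mean(g**2), t)           # ~ t
    print(np.mean(g**4), 3 * t**2)    # ~ 3t^2 = t^2 x 4!!, Gaussian moment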

Let us discuss as well the “discrete” counterpart of the above results, which we will also need a bit later, in relation with the random matrices. We have:

Definition 6.9.

The Poisson law of parameter 11 is the following measure,

p1=1ekδkk!p_{1}=\frac{1}{e}\sum_{k}\frac{\delta_{k}}{k!}

and the Poisson law of parameter t>0t>0 is the following measure,

pt=etktkk!δkp_{t}=e^{-t}\sum_{k}\frac{t^{k}}{k!}\,\delta_{k}

with the letter “p” standing for Poisson.

We will see in a moment why these laws appear everywhere, in discrete probability, the reasons behind this coming from the Poisson Limit Theorem (PLT). Getting started now, in analogy with the normal laws, the Poisson laws have the following properties:

Proposition 6.10.

The Poisson law ptp_{t} with t>0t>0 has the following properties:

  1. (1)

    The variance is V=tV=t.

  2. (2)

    The moments are Mk=πP(k)t|π|M_{k}=\sum_{\pi\in P(k)}t^{|\pi|}.

  3. (3)

    The Fourier transform is F(x)=exp((eix1)t)F(x)=\exp\left((e^{ix}-1)t\right).

  4. (4)

    We have the semigroup formula pspt=ps+tp_{s}*p_{t}=p_{s+t}, for any s,t>0s,t>0.

Proof.

We have four formulae to be proved, the idea being as follows:

(1) The variance is V=M2M12V=M_{2}-M_{1}^{2}, and by using the formulae M1=tM_{1}=t and M2=t+t2M_{2}=t+t^{2}, coming from (2), proved below, we obtain as desired, V=tV=t.

(2) This is something more tricky. Consider indeed the set P(k)P(k) of all partitions of {1,,k}\{1,\ldots,k\}. At t=1t=1, to start with, the formula that we want to prove is:

Mk=|P(k)|M_{k}=|P(k)|

We have the following recurrence formula for the moments of p1p_{1}:

Mk+1\displaystyle M_{k+1} =\displaystyle= 1es(s+1)k+1(s+1)!\displaystyle\frac{1}{e}\sum_{s}\frac{(s+1)^{k+1}}{(s+1)!}
=\displaystyle= 1essks!(1+1s)k\displaystyle\frac{1}{e}\sum_{s}\frac{s^{k}}{s!}\left(1+\frac{1}{s}\right)^{k}
=\displaystyle= 1essks!r(kr)sr\displaystyle\frac{1}{e}\sum_{s}\frac{s^{k}}{s!}\sum_{r}\binom{k}{r}s^{-r}
=\displaystyle= r(kr)1esskrs!\displaystyle\sum_{r}\binom{k}{r}\cdot\frac{1}{e}\sum_{s}\frac{s^{k-r}}{s!}
=\displaystyle= r(kr)Mkr\displaystyle\sum_{r}\binom{k}{r}M_{k-r}

Our claim is that the numbers Bk=|P(k)|B_{k}=|P(k)| satisfy the same recurrence formula. Indeed, since a partition of {1,,k+1}\{1,\ldots,k+1\} appears by choosing rr neighbors for 11, among the kk numbers available, and then partitioning the krk-r elements left, we have:

Bk+1=r(kr)BkrB_{k+1}=\sum_{r}\binom{k}{r}B_{k-r}

Thus we obtain by recurrence Mk=BkM_{k}=B_{k}, as desired. Regarding now the general case, t>0t>0, we can use here a similar method. We have the following recurrence formula for the moments of ptp_{t}, obtained by using the binomial formula:

Mk+1\displaystyle M_{k+1} =\displaystyle= etsts+1(s+1)k+1(s+1)!\displaystyle e^{-t}\sum_{s}\frac{t^{s+1}(s+1)^{k+1}}{(s+1)!}
=\displaystyle= etsts+1sks!(1+1s)k\displaystyle e^{-t}\sum_{s}\frac{t^{s+1}s^{k}}{s!}\left(1+\frac{1}{s}\right)^{k}
=\displaystyle= etsts+1sks!r(kr)sr\displaystyle e^{-t}\sum_{s}\frac{t^{s+1}s^{k}}{s!}\sum_{r}\binom{k}{r}s^{-r}
=\displaystyle= r(kr)etsts+1skrs!\displaystyle\sum_{r}\binom{k}{r}\cdot e^{-t}\sum_{s}\frac{t^{s+1}s^{k-r}}{s!}
=\displaystyle= tr(kr)Mkr\displaystyle t\sum_{r}\binom{k}{r}M_{k-r}

On the other hand, consider the numbers in the statement, Sk=πP(k)t|π|S_{k}=\sum_{\pi\in P(k)}t^{|\pi|}. As before, since a partition of {1,,k+1}\{1,\ldots,k+1\} appears by choosing rr neighbors for 11, among the kk numbers available, and then partitioning the krk-r elements left, we have:

Sk+1=tr(kr)SkrS_{k+1}=t\sum_{r}\binom{k}{r}S_{k-r}

Thus we obtain by recurrence Mk=SkM_{k}=S_{k}, as desired.

(3) The Fourier transform formula can be established as follows:

Fpt(x)\displaystyle F_{p_{t}}(x) =\displaystyle= etktkk!Fδk(x)\displaystyle e^{-t}\sum_{k}\frac{t^{k}}{k!}F_{\delta_{k}}(x)
=\displaystyle= etktkk!eikx\displaystyle e^{-t}\sum_{k}\frac{t^{k}}{k!}\,e^{ikx}
=\displaystyle= etk(eixt)kk!\displaystyle e^{-t}\sum_{k}\frac{(e^{ix}t)^{k}}{k!}
=\displaystyle= exp(t)exp(eixt)\displaystyle\exp(-t)\exp(e^{ix}t)
=\displaystyle= exp((eix1)t)\displaystyle\exp\left((e^{ix}-1)t\right)

(4) This follows from (3), because logFpt\log F_{p_{t}} is linear in tt. ∎
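For the reader wanting to double-check the above, here is a small numerical verification of the moment formula from (2), via the recurrence S_{k+1} = t∑_r C(k,r)S_{k-r} established in the proof, using only the Python standard library, with the truncation of the series being our choice:

```python
import math

# Moments of p_t: the partition recurrence S_{k+1} = t sum_r C(k,r) S_{k-r}
# versus the defining series M_k = e^{-t} sum_s t^s s^k / s!.
t = 1.5

def poisson_moment(k, terms=100):
    return math.exp(-t) * sum(t ** s * s ** k / math.factorial(s)
                              for s in range(terms))

S = [1.0]                                   # S_0 = 1, the empty partition
for k in range(8):
    S.append(t * sum(math.comb(k, r) * S[k - r] for r in range(k + 1)))

for k in range(9):
    print(k, round(S[k], 6), round(poisson_moment(k), 6))
```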

We are now ready to establish the Poisson Limit Theorem (PLT), as follows:

Theorem 6.11.

We have the following convergence, in moments,

((1tn)δ0+tnδ1)npt\left(\left(1-\frac{t}{n}\right)\delta_{0}+\frac{t}{n}\delta_{1}\right)^{*n}\to p_{t}

for any t>0t>0.

Proof.

Let us denote by μn\mu_{n} the Bernoulli measure appearing under the convolution sign. We have then the following computation:

Fδr(x)=eirx\displaystyle F_{\delta_{r}}(x)=e^{irx} \displaystyle\implies Fμn(x)=(1tn)+tneix\displaystyle F_{\mu_{n}}(x)=\left(1-\frac{t}{n}\right)+\frac{t}{n}e^{ix}
\displaystyle\implies Fμnn(x)=((1tn)+tneix)n\displaystyle F_{\mu_{n}^{*n}}(x)=\left(\left(1-\frac{t}{n}\right)+\frac{t}{n}e^{ix}\right)^{n}
\displaystyle\implies Fμnn(x)=(1+(eix1)tn)n\displaystyle F_{\mu_{n}^{*n}}(x)=\left(1+\frac{(e^{ix}-1)t}{n}\right)^{n}
\displaystyle\implies F(x)=exp((eix1)t)\displaystyle F(x)=\exp\left((e^{ix}-1)t\right)

Thus, we obtain the Fourier transform of ptp_{t}, as desired. ∎
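Numerically, the PLT can be watched at work as follows, by convolving the Bernoulli measure with itself, the code below being a minimal sketch, with numpy assumed, and with n being our choice:

```python
import numpy as np
from math import exp, factorial

# Convolve ((1 - t/n) delta_0 + (t/n) delta_1) with itself n times, and
# compare the resulting weights with the Poisson weights e^{-t} t^k / k!.
t, n = 1.0, 1000
bernoulli = np.array([1 - t / n, t / n])

mu = np.array([1.0])
for _ in range(n):
    mu = np.convolve(mu, bernoulli)

for k in range(6):
    print(k, round(mu[k], 6), round(exp(-t) * t ** k / factorial(k), 6))
```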

As a third and last topic from classical probability, let us discuss now the complex normal laws, which we will need too. To start with, we have the following definition:

Definition 6.12.

The complex Gaussian law of parameter t>0t>0 is

Gt=law(12(a+ib))G_{t}=law\left(\frac{1}{\sqrt{2}}(a+ib)\right)

where a,ba,b are independent, each following the law gtg_{t}.

As in the real case, these measures form convolution semigroups:

Proposition 6.13.

The complex Gaussian laws have the property

GsGt=Gs+tG_{s}*G_{t}=G_{s+t}

for any s,t>0s,t>0, and so they form a convolution semigroup.

Proof.

This follows indeed from the real result, namely gsgt=gs+tg_{s}*g_{t}=g_{s+t}, established above, simply by taking real and imaginary parts. ∎

We have the following complex analogue of the CLT:

Theorem 6.14 (CCLT).

Given complex random variables f1,f2,f3,L(X)f_{1},f_{2},f_{3},\ldots\in L^{\infty}(X) which are i.i.d., centered, and with variance t>0t>0, we have, with nn\to\infty, in moments,

1ni=1nfiGt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}f_{i}\sim G_{t}

where GtG_{t} is the complex Gaussian law of parameter tt.

Proof.

This follows indeed from the real CLT, established above, simply by taking the real and imaginary parts of all the variables involved. ∎

Regarding now the moments, we use the general formalism from Definition 6.3, involving colored integer exponents k=k=\circ\bullet\bullet\circ\ldots\,. We say that a pairing πP2(k)\pi\in P_{2}(k) is matching when it pairs \circ-\bullet symbols. With this convention, we have the following result:

Theorem 6.15.

The moments of the complex normal law are the numbers

Mk(Gt)=π𝒫2(k)t|π|M_{k}(G_{t})=\sum_{\pi\in\mathcal{P}_{2}(k)}t^{|\pi|}

where 𝒫2(k)\mathcal{P}_{2}(k) are the matching pairings of {1,,k}\{1,\ldots,k\}, and |.||.| is the number of blocks.

Proof.

This is something well-known, which can be established as follows:

(1) As a first observation, by using a standard dilation argument, it is enough to do this at t=1t=1. So, let us first recall from the above that the moments of the real Gaussian law g1g_{1}, with respect to integer exponents kk\in\mathbb{N}, are the following numbers:

mk=|P2(k)|m_{k}=|P_{2}(k)|

Numerically, we have the following formula, explained as well in the above:

mk={k!!(keven)0(kodd)m_{k}=\begin{cases}k!!&(k\ {\rm even})\\ 0&(k\ {\rm odd})\end{cases}

(2) We will show here that in what concerns the complex Gaussian law G1G_{1}, similar results hold. Numerically, we will prove that we have the following formula, where a colored integer k=k=\circ\bullet\bullet\circ\ldots is called uniform when it contains the same number of \circ and \bullet , and where |k||k|\in\mathbb{N} is the length of such a colored integer:

Mk={(|k|/2)!(kuniform)0(knotuniform)M_{k}=\begin{cases}(|k|/2)!&(k\ {\rm uniform})\\ 0&(k\ {\rm not\ uniform})\end{cases}

Now since the matching pairings π𝒫2(k)\pi\in\mathcal{P}_{2}(k) are counted by exactly the same numbers, and this for trivial reasons, we will obtain the formula in the statement, namely:

Mk=|𝒫2(k)|M_{k}=|\mathcal{P}_{2}(k)|

(3) This was for the plan. In practice now, we must compute the moments, with respect to colored integer exponents k=k=\circ\bullet\bullet\circ\ldots , of the variable in the statement:

c=12(a+ib)c=\frac{1}{\sqrt{2}}(a+ib)

As a first observation, in the case where such an exponent k=k=\circ\bullet\bullet\circ\ldots is not uniform in ,\circ,\bullet , a rotation argument shows that the corresponding moment of cc vanishes. To be more precise, the variable c=wcc^{\prime}=wc can be shown to be complex Gaussian too, for any ww\in\mathbb{C}, and from Mk(c)=Mk(c)M_{k}(c)=M_{k}(c^{\prime}) we obtain Mk(c)=0M_{k}(c)=0, in this case.

(4) In the uniform case now, where k=k=\circ\bullet\bullet\circ\ldots consists of pp copies of \circ and pp copies of \bullet , the corresponding moment can be computed as follows:

Mk\displaystyle M_{k} =\displaystyle= 12p(a2+b2)p\displaystyle\frac{1}{2^{p}}\int(a^{2}+b^{2})^{p}
=\displaystyle= 12ps(ps)a2sb2p2s\displaystyle\frac{1}{2^{p}}\sum_{s}\binom{p}{s}\int a^{2s}\int b^{2p-2s}
=\displaystyle= 12ps(ps)(2s)!!(2p2s)!!\displaystyle\frac{1}{2^{p}}\sum_{s}\binom{p}{s}(2s)!!(2p-2s)!!
=\displaystyle= 12psp!s!(ps)!(2s)!2ss!(2p2s)!2ps(ps)!\displaystyle\frac{1}{2^{p}}\sum_{s}\frac{p!}{s!(p-s)!}\cdot\frac{(2s)!}{2^{s}s!}\cdot\frac{(2p-2s)!}{2^{p-s}(p-s)!}
=\displaystyle= p!4ps(2ss)(2p2sps)\displaystyle\frac{p!}{4^{p}}\sum_{s}\binom{2s}{s}\binom{2p-2s}{p-s}

(5) In order to finish now the computation, let us recall that we have the following formula, coming from the generalized binomial formula, or from the Taylor formula:

11+t=k=0(2kk)(t4)k\frac{1}{\sqrt{1+t}}=\sum_{k=0}^{\infty}\binom{2k}{k}\left(\frac{-t}{4}\right)^{k}

By taking the square of this series, we obtain the following formula:

11+t\displaystyle\frac{1}{1+t} =\displaystyle= k,s(2kk)(2ss)(t4)k+s\displaystyle\sum_{k,s}\binom{2k}{k}\binom{2s}{s}\left(\frac{-t}{4}\right)^{k+s}
=\displaystyle= p(t4)ps(2ss)(2p2sps)\displaystyle\sum_{p}\left(\frac{-t}{4}\right)^{p}\sum_{s}\binom{2s}{s}\binom{2p-2s}{p-s}

Now by looking at the coefficient of tpt^{p} on both sides, we conclude that the sum on the right equals 4p4^{p}. Thus, we can finish the moment computation in (4), as follows:

Mp=p!4p×4p=p!M_{p}=\frac{p!}{4^{p}}\times 4^{p}=p!

(6) As a conclusion, if we denote by |k||k| the length of a colored integer k=k=\circ\bullet\bullet\circ\ldots , the moments of the variable cc in the statement are given by:

Mk={(|k|/2)!(kuniform)0(knotuniform)M_{k}=\begin{cases}(|k|/2)!&(k\ {\rm uniform})\\ 0&(k\ {\rm not\ uniform})\end{cases}

On the other hand, the numbers |𝒫2(k)||\mathcal{P}_{2}(k)| are given by exactly the same formula. Indeed, in order to have matching pairings of kk, our exponent k=k=\circ\bullet\bullet\circ\ldots must be uniform, consisting of pp copies of \circ and pp copies of \bullet, with p=|k|/2p=|k|/2. But then the matching pairings of kk correspond to the permutations of the \bullet symbols, as to be matched with \circ symbols, and so we have p!p! such matching pairings. Thus, we have the same formula as for the moments of cc, and we are led to the conclusion in the statement. ∎
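Again as a verification, and nothing more, the uniform moment formula M_k = (|k|/2)! t^{|k|/2} can be tested numerically as follows, with numpy assumed, and with the sample size being our choice:

```python
import numpy as np
from math import factorial

# c = (a + ib)/sqrt(2), with a, b independent g_t: the uniform word moments
# E( c^p conj(c)^p ) should equal p! t^p, and the non-uniform ones vanish.
rng = np.random.default_rng(1)
t, samples = 1.0, 1_000_000
a = rng.normal(0, np.sqrt(t), samples)
b = rng.normal(0, np.sqrt(t), samples)
c = (a + 1j * b) / np.sqrt(2)

for p in (1, 2, 3):
    print(p, round((c ** p * np.conj(c) ** p).mean().real, 2),
          factorial(p) * t ** p)
print("non-uniform:", (c ** 2 * np.conj(c)).mean())   # close to 0
```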

This was for the basic probability theory, which is in a certain sense advanced operator theory, inside the commutative von Neumann algebras, A=L(X)A=L^{\infty}(X). We will be back to this, with some further limiting theorems, in chapter 8 below.

6c. Wigner matrices

Let us exit now the classical world, that of the commutative von Neumann algebras A=L(X)A=L^{\infty}(X), and do as promised some random matrix theory. We recall that a random matrix algebra is a von Neumann algebra of type A=MN(L(X))A=M_{N}(L^{\infty}(X)), and that we are interested in the computation of the laws of the operators TAT\in A, called random matrices. Regarding the precise classes of random matrices that we are interested in, first we have the complex Gaussian matrices, which are constructed as follows:

Definition 6.16.

A complex Gaussian matrix is a random matrix of type

ZMN(L(X))Z\in M_{N}(L^{\infty}(X))

which has i.i.d. complex normal entries.

We will see that the above matrices have an interesting, and “central” combinatorics, among all kinds of random matrices, with the study of the other random matrices being usually obtained as a modification of the study of the Gaussian matrices.


As a somewhat surprising remark, using real normal variables in Definition 6.16, instead of the complex ones appearing there, leads nowhere. The correct real versions of the Gaussian matrices are the Wigner random matrices, constructed as follows:

Definition 6.17.

A Wigner matrix is a random matrix of type

ZMN(L(X))Z\in M_{N}(L^{\infty}(X))

which has i.i.d. complex normal entries, up to the constraint Z=ZZ=Z^{*}.

In other words, a Wigner matrix must be as follows, with the diagonal entries being real normal variables, aigta_{i}\sim g_{t}, for some t>0t>0, the upper diagonal entries being complex normal variables, bijGtb_{ij}\sim G_{t}, the lower diagonal entries being the conjugates of the upper diagonal entries, as indicated, and with all the variables ai,bija_{i},b_{ij} being independent:

Z=(a1b12b1Nb¯12a2aN1bN1,Nb¯1Nb¯N1,NaN)Z=\begin{pmatrix}a_{1}&b_{12}&\ldots&\ldots&b_{1N}\\ \bar{b}_{12}&a_{2}&\ddots&&\vdots\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ \vdots&&\ddots&a_{N-1}&b_{N-1,N}\\ \bar{b}_{1N}&\ldots&\ldots&\bar{b}_{N-1,N}&a_{N}\end{pmatrix}

As a comment here, for many concrete applications the Wigner matrices are in fact the central objects in random matrix theory, and in particular, they are often more important than the Gaussian matrices. In fact, these are the random matrices which were first considered and investigated, a long time ago, by Wigner himself [wig].


Finally, we will be interested as well in the complex Wishart matrices, which are the positive versions of the above random matrices, constructed as follows:

Definition 6.18.

A complex Wishart matrix is a random matrix of type

Z=YYMN(L(X))Z=YY^{*}\in M_{N}(L^{\infty}(X))

with YY being a complex Gaussian matrix.

As before with the Gaussian and Wigner matrices, there are many possible comments that can be made here, of technical or historical nature. First, using real Gaussian variables instead of complex ones leads to a less interesting combinatorics. Also, these matrices were introduced and studied by Marchenko-Pastur not long after Wigner, in [mpa], and so historically came second. Finally, in what regards their combinatorics and applications, these matrices quite often come first, before both the Gaussian and the Wigner ones, with all this being of course a matter of knowledge and taste.


Summarizing, we have three main types of random matrices, which can be somehow designated as “complex”, “real” and “positive”, and that we will study in what follows. Let us also mention that there are many other interesting classes of random matrices, usually appearing as modifications of the above. More on these later.


In order to compute the asymptotic laws of the above matrices, we will use the moment method. We have the following result, which will be our main tool here:

Theorem 6.19.

Given independent variables XiX_{i}, each following the complex normal law GtG_{t}, with t>0t>0 being a fixed parameter, we have the Wick formula

𝔼(Xi1k1Xisks)=ts/2#{π𝒫2(k)|πkeri}\mathbb{E}\left(X_{i_{1}}^{k_{1}}\ldots X_{i_{s}}^{k_{s}}\right)=t^{s/2}\#\left\{\pi\in\mathcal{P}_{2}(k)\Big{|}\pi\leq\ker i\right\}

where k=k1ksk=k_{1}\ldots k_{s} and i=i1isi=i_{1}\ldots i_{s}, for the joint moments of these variables.

Proof.

This is something well-known, and the basis for all possible computations with complex normal variables, which can be proved in two steps, as follows:

(1) Let us first discuss the case where we have a single complex normal variable XX, which amounts to taking Xi=XX_{i}=X for any ii in the formula in the statement. What we have to compute here are the moments of XX, with respect to colored integer exponents k=k=\circ\bullet\bullet\circ\ldots\,, and the formula in the statement tells us that these moments must be:

𝔼(Xk)=t|k|/2|𝒫2(k)|\mathbb{E}(X^{k})=t^{|k|/2}|\mathcal{P}_{2}(k)|

But this is something that we know well from the above, the idea being that at t=1t=1 this follows by doing some combinatorics and calculus, in analogy with the combinatorics and calculus from the real case, where the moment formula is identical, save for the matching pairings 𝒫2\mathcal{P}_{2} being replaced by the usual pairings P2P_{2}, and then that the general case t>0t>0 follows from this, by rescaling. Thus, we are done with this case.

(2) In general now, the point is that we obtain the formula in the statement. Indeed, when expanding the product Xi1k1XisksX_{i_{1}}^{k_{1}}\ldots X_{i_{s}}^{k_{s}} and rearranging the terms, we are left with doing a number of computations as in (1), and then making the product of the expectations that we found. But this amounts precisely to counting the partitions in the statement, with the condition πkeri\pi\leq\ker i there standing precisely for the fact that we are doing the various type (1) computations independently, and then making the product. ∎
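The Wick formula can also be verified by brute force, on small words, by enumerating the matching pairings below ker(i). Here is such a sketch, in Python with numpy, for the word X_1 conj(X_1) X_2 conj(X_2), with all conventions and parameters below being ours:

```python
import numpy as np

# Enumerate matching pairings of a colored word ('o' for X, 'b' for the
# conjugate), keep those below ker(i), and compare t^{s/2} times their
# count with a Monte Carlo estimate of the joint moment.
def matching_pairings(colors):
    def rec(points):
        if not points:
            yield []
            return
        first, rest = points[0], points[1:]
        for j, other in enumerate(rest):
            if colors[first] != colors[other]:        # pair 'o' with 'b'
                for p in rec(rest[:j] + rest[j + 1:]):
                    yield [(first, other)] + p
    return list(rec(tuple(range(len(colors)))))

t = 1.0
colors  = ['o', 'b', 'o', 'b']        # the word X_1 conj(X_1) X_2 conj(X_2)
indices = [1, 1, 2, 2]

count = sum(all(indices[a] == indices[b] for a, b in pi)
            for pi in matching_pairings(colors))
theory = t ** (len(colors) / 2) * count

rng = np.random.default_rng(2)
n = 500_000
X = {i: rng.normal(0, np.sqrt(t / 2), n) + 1j * rng.normal(0, np.sqrt(t / 2), n)
     for i in set(indices)}
prod = np.ones(n, dtype=complex)
for col, idx in zip(colors, indices):
    prod *= X[idx] if col == 'o' else np.conj(X[idx])
print(theory, round(prod.mean().real, 2))             # both close to 1
```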

Now by getting back to the Gaussian matrices, we have the following result, with 𝒩𝒞2(k)=𝒫2(k)NC(k)\mathcal{NC}_{2}(k)=\mathcal{P}_{2}(k)\cap NC(k) standing for the noncrossing pairings of a colored integer kk:

Theorem 6.20.

Given a sequence of Gaussian random matrices

ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X))

having independent GtG_{t} variables as entries, for some fixed t>0t>0, we have

Mk(ZNN)t|k|/2|𝒩𝒞2(k)|M_{k}\left(\frac{Z_{N}}{\sqrt{N}}\right)\simeq t^{|k|/2}|\mathcal{NC}_{2}(k)|

for any colored integer k=k=\circ\bullet\bullet\circ\ldots\,, in the NN\to\infty limit.

Proof.

This is something standard, which can be done as follows:

(1) We fix NN\in\mathbb{N}, and we let Z=ZNZ=Z_{N}. Let us first compute the trace of ZkZ^{k}. With k=k1ksk=k_{1}\ldots k_{s}, and with the convention (ij)=ij,(ij)=ji(ij)^{\circ}=ij,(ij)^{\bullet}=ji, we have:

Tr(Zk)\displaystyle Tr(Z^{k}) =\displaystyle= Tr(Zk1Zks)\displaystyle Tr(Z^{k_{1}}\ldots Z^{k_{s}})
=\displaystyle= i1=1Nis=1N(Zk1)i1i2(Zk2)i2i3(Zks)isi1\displaystyle\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}(Z^{k_{1}})_{i_{1}i_{2}}(Z^{k_{2}})_{i_{2}i_{3}}\ldots(Z^{k_{s}})_{i_{s}i_{1}}
=\displaystyle= i1=1Nis=1N(Z(i1i2)k1)k1(Z(i2i3)k2)k2(Z(isi1)ks)ks\displaystyle\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}(Z_{(i_{1}i_{2})^{k_{1}}})^{k_{1}}(Z_{(i_{2}i_{3})^{k_{2}}})^{k_{2}}\ldots(Z_{(i_{s}i_{1})^{k_{s}}})^{k_{s}}

(2) Next, we rescale our variable ZZ by a N\sqrt{N} factor, as in the statement, and we also replace the usual trace by its normalized version, tr=Tr/Ntr=Tr/N. Our formula becomes:

tr((ZN)k)=1Ns/2+1i1=1Nis=1N(Z(i1i2)k1)k1(Z(i2i3)k2)k2(Z(isi1)ks)kstr\left(\left(\frac{Z}{\sqrt{N}}\right)^{k}\right)=\frac{1}{N^{s/2+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}(Z_{(i_{1}i_{2})^{k_{1}}})^{k_{1}}(Z_{(i_{2}i_{3})^{k_{2}}})^{k_{2}}\ldots(Z_{(i_{s}i_{1})^{k_{s}}})^{k_{s}}

Thus, the moment that we are interested in is given by:

Mk(ZN)=1Ns/2+1i1=1Nis=1NX(Z(i1i2)k1)k1(Z(i2i3)k2)k2(Z(isi1)ks)ksM_{k}\left(\frac{Z}{\sqrt{N}}\right)=\frac{1}{N^{s/2+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}\int_{X}(Z_{(i_{1}i_{2})^{k_{1}}})^{k_{1}}(Z_{(i_{2}i_{3})^{k_{2}}})^{k_{2}}\ldots(Z_{(i_{s}i_{1})^{k_{s}}})^{k_{s}}

(3) Let us apply now the Wick formula, from Theorem 6.19. We conclude that the moment that we are interested in is given by the following formula:

Mk(ZN)\displaystyle M_{k}\left(\frac{Z}{\sqrt{N}}\right)
=\displaystyle= ts/2Ns/2+1i1=1Nis=1N#{π𝒫2(k)|πker((i1i2)k1,(i2i3)k2,,(isi1)ks)}\displaystyle\frac{t^{s/2}}{N^{s/2+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}\#\left\{\pi\in\mathcal{P}_{2}(k)\Big{|}\pi\leq\ker\left((i_{1}i_{2})^{k_{1}},(i_{2}i_{3})^{k_{2}},\ldots,(i_{s}i_{1})^{k_{s}}\right)\right\}
=\displaystyle= ts/2π𝒫2(k)1Ns/2+1#{i{1,,N}s|πker((i1i2)k1,(i2i3)k2,,(isi1)ks)}\displaystyle t^{s/2}\sum_{\pi\in\mathcal{P}_{2}(k)}\frac{1}{N^{s/2+1}}\#\left\{i\in\{1,\ldots,N\}^{s}\Big{|}\pi\leq\ker\left((i_{1}i_{2})^{k_{1}},(i_{2}i_{3})^{k_{2}},\ldots,(i_{s}i_{1})^{k_{s}}\right)\right\}

(4) Our claim now is that in the NN\to\infty limit the combinatorics of the above sum simplifies, with only the noncrossing partitions contributing to the sum, and with each of them contributing precisely with a 1 factor, so that we will have, as desired:

Mk(ZN)\displaystyle M_{k}\left(\frac{Z}{\sqrt{N}}\right) =\displaystyle= ts/2π𝒫2(k)(δπNC2(k)+O(N1))\displaystyle t^{s/2}\sum_{\pi\in\mathcal{P}_{2}(k)}\Big{(}\delta_{\pi\in NC_{2}(k)}+O(N^{-1})\Big{)}
\displaystyle\simeq ts/2π𝒫2(k)δπNC2(k)\displaystyle t^{s/2}\sum_{\pi\in\mathcal{P}_{2}(k)}\delta_{\pi\in NC_{2}(k)}
=\displaystyle= ts/2|𝒩𝒞2(k)|\displaystyle t^{s/2}|\mathcal{NC}_{2}(k)|

(5) In order to prove this, the first observation is that when kk is not uniform, in the sense that it contains a different number of \circ, \bullet symbols, we have 𝒫2(k)=\mathcal{P}_{2}(k)=\emptyset, and so:

Mk(ZN)=ts/2|𝒩𝒞2(k)|=0M_{k}\left(\frac{Z}{\sqrt{N}}\right)=t^{s/2}|\mathcal{NC}_{2}(k)|=0

(6) Thus, we are left with the case where kk is uniform. Let us examine first the case where kk consists of an alternating sequence of \circ and \bullet symbols, as follows:

k=2pk=\underbrace{\circ\bullet\circ\bullet\ldots\ldots\circ\bullet}_{2p}

In this case it is convenient to relabel our multi-index i=(i1,,is)i=(i_{1},\ldots,i_{s}), with s=2ps=2p, in the form (j1,l1,j2,l2,,jp,lp)(j_{1},l_{1},j_{2},l_{2},\ldots,j_{p},l_{p}). With this done, our moment formula becomes:

Mk(ZN)=tpπ𝒫2(k)1Np+1#{j,l{1,,N}p|πker(j1l1,j2l1,j2l2,,j1lp)}M_{k}\left(\frac{Z}{\sqrt{N}}\right)=t^{p}\sum_{\pi\in\mathcal{P}_{2}(k)}\frac{1}{N^{p+1}}\#\left\{j,l\in\{1,\ldots,N\}^{p}\Big{|}\pi\leq\ker\left(j_{1}l_{1},j_{2}l_{1},j_{2}l_{2},\ldots,j_{1}l_{p}\right)\right\}

Now observe that, with kk being as above, we have an identification 𝒫2(k)Sp\mathcal{P}_{2}(k)\simeq S_{p}, obtained in the obvious way. With this done too, our moment formula becomes:

Mk(ZN)=tpπSp1Np+1#{j,l{1,,N}p|jr=jπ(r)+1,lr=lπ(r),r}M_{k}\left(\frac{Z}{\sqrt{N}}\right)=t^{p}\sum_{\pi\in S_{p}}\frac{1}{N^{p+1}}\#\left\{j,l\in\{1,\ldots,N\}^{p}\Big{|}j_{r}=j_{\pi(r)+1},l_{r}=l_{\pi(r)},\forall r\right\}

(7) We are now ready to do our asymptotic study, and prove the claim in (4). Let indeed γSp\gamma\in S_{p} be the full cycle, which is by definition the following permutation:

γ=(1 2p)\gamma=(1\,2\,\ldots\,p)

In terms of γ\gamma, the conditions jr=jπ(r)+1j_{r}=j_{\pi(r)+1} and lr=lπ(r)l_{r}=l_{\pi(r)} found above read:

γπkerj,πkerl\gamma\pi\leq\ker j\quad,\quad\pi\leq\ker l

Counting the number of free parameters in our moment formula, we obtain:

Mk(ZN)=tpNp+1πSpN|π|+|γπ|=tpπSpN|π|+|γπ|p1M_{k}\left(\frac{Z}{\sqrt{N}}\right)=\frac{t^{p}}{N^{p+1}}\sum_{\pi\in S_{p}}N^{|\pi|+|\gamma\pi|}=t^{p}\sum_{\pi\in S_{p}}N^{|\pi|+|\gamma\pi|-p-1}

(8) The point now is that the last exponent is well-known to be 0\leq 0, with equality precisely when the permutation πSp\pi\in S_{p} is geodesic, which in practice means that π\pi must come from a noncrossing partition. Thus we obtain, in the NN\to\infty limit, as desired:

Mk(ZN)tp|𝒩𝒞2(k)|M_{k}\left(\frac{Z}{\sqrt{N}}\right)\simeq t^{p}|\mathcal{NC}_{2}(k)|

This finishes the proof in the case of the exponents kk which are alternating, and the case where kk is an arbitrary uniform exponent is similar, by permuting everything. ∎

As a conclusion to this, we have obtained as asymptotic law for the Gaussian matrices a certain mysterious distribution, having as moments some numbers which are similar to the moments of the usual normal laws, but with the “underlying matching pairings being now replaced by underlying matching noncrossing pairings”. More on this later.
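Here is also a numerical illustration of Theorem 6.20, on two uniform words of length 4: the alternating word ∘•∘• has 2 noncrossing matching pairings, while ∘∘•• has only 1, and the normalized traces see the difference. A sketch with numpy, our parameters being illustrative:

```python
import numpy as np

# For Z with i.i.d. G_t entries, rescaled by sqrt(N): the normalized trace
# of Z Z* Z Z* should approach 2 t^2, that of Z Z Z* Z* should approach t^2.
rng = np.random.default_rng(3)
t, N, samples = 1.0, 400, 100

m_alt = m_blk = 0.0
for _ in range(samples):
    Z = (rng.normal(0, np.sqrt(t / 2), (N, N))
         + 1j * rng.normal(0, np.sqrt(t / 2), (N, N))) / np.sqrt(N)
    Zs = Z.conj().T
    m_alt += np.trace(Z @ Zs @ Z @ Zs).real / N / samples
    m_blk += np.trace(Z @ Z @ Zs @ Zs).real / N / samples

print("word obob:", round(m_alt, 2), "expected", 2 * t ** 2)
print("word oobb:", round(m_blk, 2), "expected", 1 * t ** 2)
```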


Regarding now the Wigner matrices, we have here the following result, coming as a consequence of Theorem 6.20, via some simple algebraic manipulations:

Theorem 6.21.

Given a sequence of Wigner random matrices

ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X))

having independent GtG_{t} variables as entries, with t>0t>0, up to ZN=ZNZ_{N}=Z_{N}^{*}, we have

Mk(ZNN)tk/2|NC2(k)|M_{k}\left(\frac{Z_{N}}{\sqrt{N}}\right)\simeq t^{k/2}|NC_{2}(k)|

for any integer kk\in\mathbb{N}, in the NN\to\infty limit.

Proof.

This can be deduced from a direct computation based on the Wick formula, similar to that from the proof of Theorem 6.20, but the best is to deduce this result from Theorem 6.20 itself. Indeed, we know from there that for Gaussian matrices YNMN(L(X))Y_{N}\in M_{N}(L^{\infty}(X)) we have the following formula, valid for any colored integer K=K=\circ\bullet\bullet\circ\ldots\,, in the NN\to\infty limit, with 𝒩𝒞2\mathcal{NC}_{2} standing for noncrossing matching pairings:

MK(YNN)t|K|/2|𝒩𝒞2(K)|M_{K}\left(\frac{Y_{N}}{\sqrt{N}}\right)\simeq t^{|K|/2}|\mathcal{NC}_{2}(K)|

By doing some combinatorics, we deduce from this that we have the following formula for the moments of the matrices Re(YN)Re(Y_{N}), with respect to usual exponents, kk\in\mathbb{N}:

Mk(Re(YN)N)\displaystyle M_{k}\left(\frac{Re(Y_{N})}{\sqrt{N}}\right) =\displaystyle= 2kMk(YNN+YNN)\displaystyle 2^{-k}\cdot M_{k}\left(\frac{Y_{N}}{\sqrt{N}}+\frac{Y_{N}^{*}}{\sqrt{N}}\right)
=\displaystyle= 2k|K|=kMK(YNN)\displaystyle 2^{-k}\sum_{|K|=k}M_{K}\left(\frac{Y_{N}}{\sqrt{N}}\right)
\displaystyle\simeq 2k|K|=ktk/2|𝒩𝒞2(K)|\displaystyle 2^{-k}\sum_{|K|=k}t^{k/2}|\mathcal{NC}_{2}(K)|
=\displaystyle= 2ktk/22k/2|NC2(k)|\displaystyle 2^{-k}\cdot t^{k/2}\cdot 2^{k/2}|NC_{2}(k)|
=\displaystyle= 2k/2tk/2|NC2(k)|\displaystyle 2^{-k/2}\cdot t^{k/2}|NC_{2}(k)|

Now since the matrices ZN=2Re(YN)Z_{N}=\sqrt{2}Re(Y_{N}) are of Wigner type, this gives the result. ∎

Summarizing, all this brings us into counting noncrossing pairings. So, let us start with some preliminaries here. We first have the following well-known result:

Theorem 6.22.

The Catalan numbers, which are by definition given by

Ck=|NC2(2k)|C_{k}=|NC_{2}(2k)|

satisfy the following recurrence formula, with initial data C0=C1=1C_{0}=C_{1}=1,

Ck+1=a+b=kCaCbC_{k+1}=\sum_{a+b=k}C_{a}C_{b}

their generating series f(z)=k0Ckzkf(z)=\sum_{k\geq 0}C_{k}z^{k} satisfies the equation

zf2f+1=0zf^{2}-f+1=0

and is given by the following explicit formula,

f(z)=114z2zf(z)=\frac{1-\sqrt{1-4z}}{2z}

and we have the following explicit formula for these numbers:

Ck=1k+1(2kk)C_{k}=\frac{1}{k+1}\binom{2k}{k}

Numerically, these numbers are 1,1,2,5,14,42,132,429,1430,4862,16796,1,1,2,5,14,42,132,429,1430,4862,16796,\ldots

Proof.

We must count the noncrossing pairings of {1,,2k}\{1,\ldots,2k\}. Now observe that such a pairing appears by pairing 1 with an even number, 2a+22a+2, and then inserting a noncrossing pairing of {2,,2a+1}\{2,\ldots,2a+1\}, and a noncrossing pairing of {2a+3,,2k}\{2a+3,\ldots,2k\}. We conclude that we have the following recurrence formula for the Catalan numbers:

Ck=a+b=k1CaCbC_{k}=\sum_{a+b=k-1}C_{a}C_{b}

In terms of the generating series f(z)=k0Ckzkf(z)=\sum_{k\geq 0}C_{k}z^{k}, this recurrence formula reads:

zf2\displaystyle zf^{2} =\displaystyle= a,b0CaCbza+b+1\displaystyle\sum_{a,b\geq 0}C_{a}C_{b}z^{a+b+1}
=\displaystyle= k1a+b=k1CaCbzk\displaystyle\sum_{k\geq 1}\sum_{a+b=k-1}C_{a}C_{b}z^{k}
=\displaystyle= k1Ckzk\displaystyle\sum_{k\geq 1}C_{k}z^{k}
=\displaystyle= f1\displaystyle f-1

Thus ff satisfies zf2f+1=0zf^{2}-f+1=0, and by solving this equation, and choosing the solution which is bounded at z=0z=0, we obtain the following formula:

f(z)=114z2zf(z)=\frac{1-\sqrt{1-4z}}{2z}

In order to finish, we use the generalized binomial formula, which gives:

1+t=12k=11k(2k2k1)(t4)k\sqrt{1+t}=1-2\sum_{k=1}^{\infty}\frac{1}{k}\binom{2k-2}{k-1}\left(\frac{-t}{4}\right)^{k}

Now back to our series ff, we obtain the following formula for it:

f(z)\displaystyle f(z) =\displaystyle= 114z2z\displaystyle\frac{1-\sqrt{1-4z}}{2z}
=\displaystyle= k=11k(2k2k1)zk1\displaystyle\sum_{k=1}^{\infty}\frac{1}{k}\binom{2k-2}{k-1}z^{k-1}
=\displaystyle= k=01k+1(2kk)zk\displaystyle\sum_{k=0}^{\infty}\frac{1}{k+1}\binom{2k}{k}z^{k}

It follows that the Catalan numbers are given by:

Ck=1k+1(2kk)C_{k}=\frac{1}{k+1}\binom{2k}{k}

Thus, we are led to the conclusion in the statement. ∎
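In code, the recurrence and the closed formula can be compared as follows, as a trivial sketch, using the Python standard library only:

```python
from math import comb

# Catalan numbers: the recurrence C_{k+1} = sum_{a+b=k} C_a C_b against
# the closed formula C_k = binom(2k,k) / (k+1).
C = [1]
for k in range(10):
    C.append(sum(C[a] * C[k - a] for a in range(k + 1)))

print(C)
print([comb(2 * k, k) // (k + 1) for k in range(11)])
# both: [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```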

In order to recapture now the Wigner measure from its moments, we can use:

Proposition 6.23.

The Catalan numbers are the even moments of

γ1=12π4x2dx\gamma_{1}=\frac{1}{2\pi}\sqrt{4-x^{2}}dx

called standard semicircle law. As for the odd moments of γ1\gamma_{1}, these all vanish.

Proof.

The even moments of the semicircle law in the statement can be computed with the change of variable x=2costx=2\cos t, and we are led to the following formula:

M2k\displaystyle M_{2k} =\displaystyle= 1π024x2x2kdx\displaystyle\frac{1}{\pi}\int_{0}^{2}\sqrt{4-x^{2}}x^{2k}dx
=\displaystyle= 1π0π/244cos2t(2cost)2k2sintdt\displaystyle\frac{1}{\pi}\int_{0}^{\pi/2}\sqrt{4-4\cos^{2}t}\,(2\cos t)^{2k}2\sin t\,dt
=\displaystyle= 4k+1π0π/2cos2ktsin2tdt\displaystyle\frac{4^{k+1}}{\pi}\int_{0}^{\pi/2}\cos^{2k}t\sin^{2}t\,dt
=\displaystyle= 4k+1ππ2(2k)!!2!!(2k+3)!!\displaystyle\frac{4^{k+1}}{\pi}\cdot\frac{\pi}{2}\cdot\frac{(2k)!!2!!}{(2k+3)!!}
=\displaystyle= 24k(2k)!/2kk!2k+1(k+1)!\displaystyle 2\cdot 4^{k}\cdot\frac{(2k)!/2^{k}k!}{2^{k+1}(k+1)!}
=\displaystyle= Ck\displaystyle C_{k}

As for the odd moments, these all vanish, because the density of γ1\gamma_{1} is an even function. Thus, we are led to the conclusion in the statement. ∎

More generally, we have the following result, involving a parameter t>0t>0:

Proposition 6.24.

Given t>0t>0, the real measure having as even moments the numbers M2k=tkCkM_{2k}=t^{k}C_{k} and having all odd moments 0 is the measure

γt=12πt4tx2dx\gamma_{t}=\frac{1}{2\pi t}\sqrt{4t-x^{2}}dx

called Wigner semicircle law on [2t,2t][-2\sqrt{t},2\sqrt{t}].

Proof.

This follows indeed from Proposition 6.23, via a change of variables. ∎

Now by putting everything together, we obtain the Wigner theorem, as follows:

Theorem 6.25.

Given a sequence of Wigner random matrices

ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X))

which by definition have i.i.d. complex normal entries, up to ZN=ZNZ_{N}=Z_{N}^{*}, we have

ZNγtZ_{N}\sim\gamma_{t}

in the NN\to\infty limit, where γt=12πt4tx2dx\gamma_{t}=\frac{1}{2\pi t}\sqrt{4t-x^{2}}dx is the Wigner semicircle law.

Proof.

This follows indeed from all the above, and more specifically, by combining Theorem 6.21, Theorem 6.22 and Proposition 6.24. ∎
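As usual, the skeptical reader can test all this on the computer, for instance as follows, with numpy assumed, by diagonalizing one large Wigner matrix, the parameters being again our illustrative choices:

```python
import numpy as np
from math import comb

def catalan(p):
    return comb(2 * p, p) // (p + 1)

# Even moments of a rescaled Wigner matrix versus those of gamma_t,
# namely t^{k/2} C_{k/2}, the odd moments vanishing.
rng = np.random.default_rng(4)
t, N = 1.0, 2000

Y = (rng.normal(0, np.sqrt(t / 2), (N, N))
     + 1j * rng.normal(0, np.sqrt(t / 2), (N, N)))
Z = (Y + Y.conj().T) / np.sqrt(2)            # Z = Z*, Wigner of parameter t
eig = np.linalg.eigvalsh(Z / np.sqrt(N))

for k in range(1, 7):
    theory = t ** (k // 2) * catalan(k // 2) if k % 2 == 0 else 0
    print(k, round((eig ** k).mean(), 2), theory)
```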

Regarding now the complex Gaussian matrices, in view of this result, it is natural to think of the law found in Theorem 6.20 as being “circular”. But this is just a thought, and more on this later, in chapter 8 below, when doing free probability.

6d. Wishart matrices

Let us discuss now the Wishart matrices, which are the positive analogues of the Wigner matrices. Quite surprisingly, the computation here leads to the Catalan numbers, but not in the same way as for the Wigner matrices, the result being as follows:

Theorem 6.26.

Given a sequence of complex Wishart matrices

WN=YNYNMN(L(X))W_{N}=Y_{N}Y_{N}^{*}\in M_{N}(L^{\infty}(X))

with YNY_{N} being N×NN\times N complex Gaussian of parameter t>0t>0, we have

Mk(WNN)tkCkM_{k}\left(\frac{W_{N}}{N}\right)\simeq t^{k}C_{k}

for any exponent kk\in\mathbb{N}, in the NN\to\infty limit.

Proof.

There are several possible proofs for this result, as follows:

(1) A first method is by using the formula that we have in Theorem 6.20, for the Gaussian matrices YNY_{N}. Indeed, we know from there that we have the following formula, valid for any colored integer K=K=\circ\bullet\bullet\circ\ldots\,, in the NN\to\infty limit:

MK(YNN)t|K|/2|𝒩𝒞2(K)|M_{K}\left(\frac{Y_{N}}{\sqrt{N}}\right)\simeq t^{|K|/2}|\mathcal{NC}_{2}(K)|

With K=K=\circ\bullet\circ\bullet\ldots\,, alternating word of length 2k2k, with kk\in\mathbb{N}, this gives:

Mk(YNYNN)tk|𝒩𝒞2(K)|M_{k}\left(\frac{Y_{N}Y_{N}^{*}}{N}\right)\simeq t^{k}|\mathcal{NC}_{2}(K)|

Thus, in terms of the Wishart matrix WN=YNYNW_{N}=Y_{N}Y_{N}^{*} we have, for any kk\in\mathbb{N}:

Mk(WNN)tk|𝒩𝒞2(K)|M_{k}\left(\frac{W_{N}}{N}\right)\simeq t^{k}|\mathcal{NC}_{2}(K)|

The point now is that, by doing some combinatorics, we have:

|𝒩𝒞2(K)|=|NC2(2k)|=Ck|\mathcal{NC}_{2}(K)|=|NC_{2}(2k)|=C_{k}

Thus, we are led to the formula in the statement.

(2) A second method, that we will explain now as well, is by proving the result directly, starting from definitions. The matrix entries of our matrix W=WNW=W_{N} are given by:

Wij=r=1NYirY¯jrW_{ij}=\sum_{r=1}^{N}Y_{ir}\bar{Y}_{jr}

Thus, the normalized traces of powers of WW are given by the following formula:

tr(Wk)\displaystyle tr(W^{k}) =\displaystyle= 1Ni1=1Nik=1NWi1i2Wi2i3Wiki1\displaystyle\frac{1}{N}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}W_{i_{1}i_{2}}W_{i_{2}i_{3}}\ldots W_{i_{k}i_{1}}
=\displaystyle= 1Ni1=1Nik=1Nr1=1Nrk=1NYi1r1Y¯i2r1Yi2r2Y¯i3r2YikrkY¯i1rk\displaystyle\frac{1}{N}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}\sum_{r_{1}=1}^{N}\ldots\sum_{r_{k}=1}^{N}Y_{i_{1}r_{1}}\bar{Y}_{i_{2}r_{1}}Y_{i_{2}r_{2}}\bar{Y}_{i_{3}r_{2}}\ldots Y_{i_{k}r_{k}}\bar{Y}_{i_{1}r_{k}}

By rescaling now WW by a 1/N1/N factor, as in the statement, we obtain:

tr((WN)k)=1Nk+1i1=1Nik=1Nr1=1Nrk=1NYi1r1Y¯i2r1Yi2r2Y¯i3r2YikrkY¯i1rktr\left(\left(\frac{W}{N}\right)^{k}\right)=\frac{1}{N^{k+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}\sum_{r_{1}=1}^{N}\ldots\sum_{r_{k}=1}^{N}Y_{i_{1}r_{1}}\bar{Y}_{i_{2}r_{1}}Y_{i_{2}r_{2}}\bar{Y}_{i_{3}r_{2}}\ldots Y_{i_{k}r_{k}}\bar{Y}_{i_{1}r_{k}}

By using now the Wick rule, we obtain the following formula for the moments, with K=K=\circ\bullet\circ\bullet\ldots\,, alternating word of length 2k2k, and with I=(i1r1,i2r1,,ikrk,i1rk)I=(i_{1}r_{1},i_{2}r_{1},\ldots,i_{k}r_{k},i_{1}r_{k}):

Mk(WN)\displaystyle M_{k}\left(\frac{W}{N}\right) =\displaystyle= tkNk+1i1=1Nik=1Nr1=1Nrk=1N#{π𝒫2(K)|πker(I)}\displaystyle\frac{t^{k}}{N^{k+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}\sum_{r_{1}=1}^{N}\ldots\sum_{r_{k}=1}^{N}\#\left\{\pi\in\mathcal{P}_{2}(K)\Big{|}\pi\leq\ker(I)\right\}
=\displaystyle= tkNk+1π𝒫2(K)#{i,r{1,,N}k|πker(I)}\displaystyle\frac{t^{k}}{N^{k+1}}\sum_{\pi\in\mathcal{P}_{2}(K)}\#\left\{i,r\in\{1,\ldots,N\}^{k}\Big{|}\pi\leq\ker(I)\right\}

In order to compute this quantity, we use the standard bijection 𝒫2(K)Sk\mathcal{P}_{2}(K)\simeq S_{k}. By identifying the pairings π𝒫2(K)\pi\in\mathcal{P}_{2}(K) with their counterparts πSk\pi\in S_{k}, we obtain:

Mk(WN)\displaystyle M_{k}\left(\frac{W}{N}\right) =\displaystyle= tkNk+1πSk#{i,r{1,,N}k|is=iπ(s)+1,rs=rπ(s),s}\displaystyle\frac{t^{k}}{N^{k+1}}\sum_{\pi\in S_{k}}\#\left\{i,r\in\{1,\ldots,N\}^{k}\Big{|}i_{s}=i_{\pi(s)+1},r_{s}=r_{\pi(s)},\forall s\right\}

Now let γSk\gamma\in S_{k} be the full cycle, which is by definition the following permutation:

γ=(1 2k)\gamma=(1\,2\,\ldots\,k)

The general factor in the product computed above is then 1 precisely when the following two conditions are simultaneously satisfied:

γπkeri,πkerr\gamma\pi\leq\ker i\quad,\quad\pi\leq\ker r

Counting the number of free parameters in our moment formula, we obtain:

Mk(WN)=tkπSkN|π|+|γπ|k1M_{k}\left(\frac{W}{N}\right)=t^{k}\sum_{\pi\in S_{k}}N^{|\pi|+|\gamma\pi|-k-1}

The point now is that the last exponent is well-known to be 0\leq 0, with equality precisely when the permutation πSk\pi\in S_{k} is geodesic, which in practice means that π\pi must come from a noncrossing partition. Thus we obtain, in the NN\to\infty limit:

Mk(WN)tkCkM_{k}\left(\frac{W}{N}\right)\simeq t^{k}C_{k}

Thus, we are led to the conclusion in the statement. ∎
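Numerically, the above theorem can be illustrated as follows, again as a rough sketch, with numpy assumed, and with the parameters being our choices:

```python
import numpy as np
from math import comb

# Moments of W_N / N, with W_N = Y_N Y_N^* complex Wishart of parameter t,
# against t^k C_k, via the eigenvalues of one large sample.
rng = np.random.default_rng(5)
t, N = 1.0, 2000

Y = (rng.normal(0, np.sqrt(t / 2), (N, N))
     + 1j * rng.normal(0, np.sqrt(t / 2), (N, N)))
eig = np.linalg.eigvalsh(Y @ Y.conj().T) / N

for k in range(1, 5):
    print(k, round((eig ** k).mean(), 2), t ** k * (comb(2 * k, k) // (k + 1)))
```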

As a consequence of the above result, we have a new look on the Catalan numbers, which is more adapted to our present Wishart matrix considerations, as follows:

Proposition 6.27.

The Catalan numbers Ck=|NC2(2k)|C_{k}=|NC_{2}(2k)| appear as well as

Ck=|NC(k)|C_{k}=|NC(k)|

where NC(k)NC(k) is the set of all noncrossing partitions of {1,,k}\{1,\ldots,k\}.

Proof.

This follows indeed from the proof of Theorem 6.26. Observe that we obtain as well a formula in terms of matching pairings of alternating colored integers. ∎

The direct explanation for the above formula, relating noncrossing partitions and pairings, comes from the following result, which is very useful, and good to know:

Proposition 6.28.

We have a bijection between noncrossing partitions and pairings

NC(k)NC2(2k)NC(k)\simeq NC_{2}(2k)

which is constructed as follows:

  1. (1)

    The application NC(k)NC2(2k)NC(k)\to NC_{2}(2k) is the “fattening” one, obtained by doubling all the legs, and doubling all the strings as well.

  2. (2)

    Its inverse NC2(2k)NC(k)NC_{2}(2k)\to NC(k) is the “shrinking” application, obtained by collapsing pairs of consecutive neighbors.

Proof.

The fact that the two operations in the statement are indeed inverse to each other is clear, by computing the corresponding two compositions, with the remark that the construction of the fattening operation requires the partitions to be noncrossing. ∎
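The fattening operation is easy to implement, and here is a quick sketch of it, with the encoding conventions, doubling each point i into 2i-1, 2i and reading each block cyclically, being ours:

```python
from itertools import combinations

# Fattening NC(k) -> NC_2(2k): a block {i1 < ... < im} becomes the pairs
# (2*i1, 2*i2 - 1), ..., (2*im, 2*i1 - 1), after doubling all the points.
def fatten(blocks):
    pairs = []
    for block in blocks:
        b = sorted(block)
        for i, j in zip(b, b[1:] + b[:1]):
            pairs.append(tuple(sorted((2 * i, 2 * j - 1))))
    return sorted(pairs)

def is_noncrossing(pairing):
    return not any(a < c < b < d or c < a < d < b
                   for (a, b), (c, d) in combinations(pairing, 2))

pi = [{1, 4}, {2, 3}, {5}]          # a noncrossing partition of {1,...,5}
fat = fatten(pi)
print(fat, is_noncrossing(fat))     # a noncrossing pairing of {1,...,10}
```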

Getting back now to probability, we are led to the question of finding the law having the Catalan numbers as moments, in the above way. The result here is as follows:

Proposition 6.29.

The real measure having the Catalan numbers as moments is

π1=12π4x11dx\pi_{1}=\frac{1}{2\pi}\sqrt{4x^{-1}-1}\,dx

called Marchenko-Pastur law of parameter 11.

Proof.

The moments of the law π1\pi_{1} in the statement can be computed with the change of variable x=4cos2tx=4\cos^{2}t, as follows:

Mk\displaystyle M_{k} =\displaystyle= 12π044x11xkdx\displaystyle\frac{1}{2\pi}\int_{0}^{4}\sqrt{4x^{-1}-1}\,x^{k}dx
=\displaystyle= 12π0π/2sintcost(4cos2t)k8costsintdt\displaystyle\frac{1}{2\pi}\int_{0}^{\pi/2}\frac{\sin t}{\cos t}\cdot(4\cos^{2}t)^{k}\cdot 8\cos t\sin t\,dt
=\displaystyle= 4k+1π0π/2cos2ktsin2tdt\displaystyle\frac{4^{k+1}}{\pi}\int_{0}^{\pi/2}\cos^{2k}t\sin^{2}t\,dt
=\displaystyle= 4k+1ππ2(2k)!!2!!(2k+3)!!\displaystyle\frac{4^{k+1}}{\pi}\cdot\frac{\pi}{2}\cdot\frac{(2k)!!2!!}{(2k+3)!!}
=\displaystyle= 24k(2k)!/2kk!2k+1(k+1)!\displaystyle 2\cdot 4^{k}\cdot\frac{(2k)!/2^{k}k!}{2^{k+1}(k+1)!}
=\displaystyle= Ck\displaystyle C_{k}

Thus, we are led to the conclusion in the statement. ∎

Now back to the Wishart matrices, we are led to the following result:

Theorem 6.30.

Given a sequence of complex Wishart matrices

WN=YNYNMN(L(X))W_{N}=Y_{N}Y_{N}^{*}\in M_{N}(L^{\infty}(X))

with YNY_{N} being N×NN\times N complex Gaussian of parameter t>0t>0, we have

WNtN12π4x11dx\frac{W_{N}}{tN}\sim\frac{1}{2\pi}\sqrt{4x^{-1}-1}\,dx

with NN\to\infty, with the limiting measure being the Marchenko-Pastur law π1\pi_{1}.

Proof.

This follows indeed from Theorem 6.26 and Proposition 6.29. ∎

As a comment now, while the above result is definitely something interesting at t=1t=1, at general t>0t>0 this looks more like a “fake” generalization of the t=1t=1 result, because the law π1\pi_{1} stays the same, modulo a trivial rescaling. The reasons behind this phenomenon are quite subtle, and skipping some discussion, the point is that Theorem 6.30 is indeed something “fake” at general t>0t>0, and the correct generalization of the t=1t=1 computation, involving more general classes of complex Wishart matrices, is as follows:

Theorem 6.31.

Given a sequence of general complex Wishart matrices

WN=YNYNMN(L(X))W_{N}=Y_{N}Y_{N}^{*}\in M_{N}(L^{\infty}(X))

with YNY_{N} being N×MN\times M complex Gaussian of parameter 11, we have

WNNmax(1t,0)δ0+4t(x1t)22πxdx\frac{W_{N}}{N}\sim\max(1-t,0)\delta_{0}+\frac{\sqrt{4t-(x-1-t)^{2}}}{2\pi x}\,dx

with M=tNM=tN\to\infty, with the limiting measure being the Marchenko-Pastur law πt\pi_{t}.

Proof.

This follows once again by using the moment method, the limiting moments in the M=tNM=tN\to\infty regime being as follows, after doing the combinatorics:

Mk(WNN)πNC(k)t|π|M_{k}\left(\frac{W_{N}}{N}\right)\simeq\sum_{\pi\in NC(k)}t^{|\pi|}

But these numbers are the moments of the Marchenko-Pastur law πt\pi_{t}, which in addition has the density given by the formula in the statement, and this gives the result. ∎
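Here is finally a numerical sketch for these general Wishart matrices, with numpy assumed, comparing the first three moments of W_N/N with the sums over NC(k) of t^{|π|}, worked out by hand for k ≤ 3:

```python
import numpy as np

# Rectangular N x M Gaussian matrix of parameter 1, with M = tN: the
# moments of W_N / N against sum over NC(k) of t^|pi|, which is
# t, t + t^2, t + 3t^2 + t^3 for k = 1, 2, 3.
rng = np.random.default_rng(6)
t, N = 2.0, 1500
M = int(t * N)

Y = (rng.normal(0, np.sqrt(0.5), (N, M))
     + 1j * rng.normal(0, np.sqrt(0.5), (N, M)))
eig = np.linalg.eigvalsh(Y @ Y.conj().T) / N

print(round(eig.mean(), 2), t)
print(round((eig ** 2).mean(), 2), t + t ** 2)
print(round((eig ** 3).mean(), 2), t + 3 * t ** 2 + t ** 3)
```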

As a philosophical conclusion now, we have 4 main laws in what we have been doing so far, namely the Gaussian laws gtg_{t}, the Poisson laws ptp_{t}, the Wigner laws γt\gamma_{t} and the Marchenko-Pastur laws πt\pi_{t}. These laws naturally form a diagram, as follows:

πt --- γt
 |      |
pt --- gt

We will see in chapter 8 that πt,γt\pi_{t},\gamma_{t} appear as “free analogues” of pt,gtp_{t},g_{t}, and that a full theory can be developed, with central limiting theorems for all 4 laws, convolution semigroup results for all 4 laws too, and Lie group type results for all 4 laws too. And also, we will be back to the random matrices as well, with further results about them.

6e. Exercises

There has been a lot of non-trivial combinatorics and calculus in this chapter, sometimes only briefly explained, and as an exercise on all this, we have:

Exercise 6.32.

Clarify all the details in connection with the Wigner and Marchenko-Pastur computations, first at t=1t=1, and then for general t>0t>0.

As before, these are things discussed in the above, but only briefly, this whole chapter having been just a modest introduction to the exciting subject of random matrices. In the hope that you will find some time, and do the exercise.

Chapter 7 Quantum spaces

7a. Gelfand theorem

We have seen that the von Neumann algebras AB(H)A\subset B(H) are interesting objects, and it is tempting to go ahead with a systematic study of such algebras. This is what Murray and von Neumann did, when first coming across such algebras, back in the 1930s, in their series of papers [mv1], [mv2], [mv3], [vn1], [vn2], [vn3]. In what concerns us, we will rather keep this material for later, and talk instead, in this chapter and in the next one, of things which are perhaps more basic, motivated by the following definition:

Definition 7.1.

Given a von Neumann algebra AB(H)A\subset B(H), coming with a faithful positive unital trace tr:Atr:A\to\mathbb{C}, we write

A=L(X)A=L^{\infty}(X)

and call XX a quantum probability space. We also write the trace as tr=Xtr=\int_{X}, and call it integration with respect to the uniform measure on XX.

Obviously, this is something exciting, and we have seen how some interesting theory can be developed along these lines in the simplest case, that of the random matrix algebras. Thus, all this needs a better understanding, before going ahead with the above-mentioned Murray-von Neumann theory. In order to get started, here are a few comments:


(1) Generally speaking, all this comes from the fact that the commutative von Neumann algebras are those of the form A=L(X)A=L^{\infty}(X), with XX being a measured space. Since in the finite measure case, μ(X)<\mu(X)<\infty, the integration can be regarded as being a faithful positive unital trace tr:L(X)tr:L^{\infty}(X)\to\mathbb{C}, we are basically led to Definition 7.1.


(2) Regarding our assumption μ(X)<\mu(X)<\infty, making the integration tr:Atr:A\to\mathbb{C} bounded, this is something advanced, coming from deep classification results of von Neumann and Connes, which roughly state that “modulo classical measure theory, the study of the quantum measured spaces XX basically reduces to the case μ(X)<\mu(X)<\infty”.


(3) Finally, the traciality of tr:Atr:A\to\mathbb{C} is something advanced too, again coming from those classification results of von Neumann and Connes, which in their more precise formulation state that “modulo classical measure theory, the study of the quantum measured spaces XX basically reduces to the case where μ(X)<\mu(X)<\infty, and X\int_{X} is a trace”.


In short, complicated all this, and you will have to trust me here. Moving ahead now, there is one more thing to be discussed in connection with Definition 7.1, and this is physics. Let me formulate here the question that you surely have in mind:

Question 7.2.

As physicists we already agreed, without clear evidence, that our operators T:HHT:H\to H should be bounded. But what about quantum spaces, is it a good idea to assume that these are as above, of finite mass, and with tracial integration?

Well, this is certainly an interesting question. In favor of my choice, I would argue that the mathematical physics of Jones [jo1], [jo2], [jo3], [jo5], [jo6] and Voiculescu [vo1], [vo2], [vdn] needs a trace tr:Atr:A\to\mathbb{C}, as above. And the same goes for certain theoretical physics continuations of the main work of Connes [co3], as for instance the basic theory of the Standard Model spectral triple of Chamseddine-Connes, whose free gauge group has tracial Haar integration. Needless to say, all this is quite subjective. But hey, question of theoretical physics you asked, answer of theoretical physics is what you get.


Hang on, we are not done yet. Now that we are convinced that Definition 7.1 is the correct one, be that on mathematical or physical grounds, let us look for examples. And here the situation is quite grim, because even in the classical case, we have:

Fact 7.3.

The measure on a classical measured space XX cannot come out of nowhere, and is usually a Haar measure, appearing by theorem. Thus, in our picture

AB(H)A\subset B(H)

both the Hilbert space H=L2(X)H=L^{2}(X) and the von Neumann algebra A=L(X)A=L^{\infty}(X) should appear by theorem, not by definition, contrary to what Definition 7.1 says.

To be more precise, in what regards the first assertion, this is certainly the case with simple objects like Lie groups, or spheres and other homogeneous spaces. Of course you might say that [0,1][0,1] with the uniform measure is a measured space, but isn’t [0,1][0,1] obtained by cutting the Lie group \mathbb{R}, with its Haar measure. And the same goes with [0,1][0,1] with an arbitrary measure f(x)dxf(x)dx, or with [0,1][0,1] being deformed into a curve, and so on, because that dxdx, or what is left from it, will always refer to the Haar measure of \mathbb{R}.


As for the second assertion, nothing much to comment here, mathematics has spoken. So, getting back now to Definition 7.1 as it is, looks like we have two dead bodies there, the Hilbert space HH and the operator algebra AA. So let us try to get rid of at least one of them. But which? In the lack of any obvious idea, let us turn to physics:

Question 7.4.

In quantum mechanics, which came first, the Hilbert space HH, or the operator algebra AA?

Unfortunately this question is as difficult as the one regarding the chicken and the egg. A look at what various physicists said on this matter, in a direct or indirect way, does not help much, and by the end of the day we are left with guidelines like “no one understands quantum mechanics” (Feynman), “shut up and compute” (Dirac) and so on. And all this, coming on top of what has already been said on Definition 7.1, of rather unclear nature, is probably too much. That is, the last drop, time to conclude:

Conclusion 7.5.

The theory of von Neumann algebras has the same peculiarity as quantum mechanics: it tends to self-destruct, when approached axiomatically.

And we will take this as good news, providing us with warm evidence that the theory of von Neumann algebras is indeed related to quantum mechanics. This is what matters, being on the right track, and difficulties and all the rest, we won’t be scared by them.


Back to business now, in practice, we must go back to chapter 5, and examine what we were saying right before introducing the von Neumann algebras. And at that time, we were talking about general operator algebras AB(H)A\subset B(H), closed with respect to the norm, but not necessarily with respect to the weak topology. But this suggests formulating the following definition, somewhat as a purely mathematical answer to Question 7.4:

Definition 7.6.

A CC^{*}-algebra is a complex algebra AA, given with:

  1. (1)

    A norm a||a||a\to||a||, making it into a Banach algebra.

  2. (2)

    An involution aaa\to a^{*}, related to the norm by the formula ||aa||=||a||2||aa^{*}||=||a||^{2}.

Here by Banach algebra we mean a complex algebra with a norm satisfying all the conditions for a vector space norm, along with ||ab||||a||||b||||ab||\leq||a||\cdot||b|| and ||1||=1||1||=1, and which is such that our algebra is complete, in the sense that the Cauchy sequences converge. As for the involution, this must be antilinear, antimultiplicative, and satisfying a=aa^{**}=a.


As basic examples, we have the operator algebra B(H)B(H), for any Hilbert space HH, and more generally, the norm closed *-subalgebras AB(H)A\subset B(H). It is possible to prove that any CC^{*}-algebra appears in this way, but this is a non-trivial result, called GNS theorem, and more on this later. Note in passing that this result tells us that there is no need to memorize the above axioms for the CC^{*}-algebras, because these are simply the obvious things that can be said about B(H)B(H), and its norm closed *-subalgebras AB(H)A\subset B(H).


As a second class of basic examples, which are of great interest for us, we have:

Proposition 7.7.

If XX is a compact space, the algebra C(X)C(X) of continuous functions f:Xf:X\to\mathbb{C} is a CC^{*}-algebra, with the usual norm and involution, namely:

||f||=supxX|f(x)|,f(x)=f(x)¯||f||=\sup_{x\in X}|f(x)|\quad,\quad f^{*}(x)=\overline{f(x)}

This algebra is commutative, in the sense that fg=gffg=gf, for any f,gC(X)f,g\in C(X).

Proof.

All this is clear from definitions. Observe that we have indeed:

||ff||=supxX|f(x)|2=||f||2||ff^{*}||=\sup_{x\in X}|f(x)|^{2}=||f||^{2}

Thus, the axioms are satisfied, and finally fg=gffg=gf is clear. ∎

In general, the CC^{*}-algebras can be thought of as being algebras of operators, over some Hilbert space which is not present. By using this philosophy, one can emulate spectral theory in this setting, with extensions of the various results from chapters 3,5:

Theorem 7.8.

Given an element aAa\in A of a CC^{*}-algebra, define its spectrum as:

σ(a)={λ|aλA1}\sigma(a)=\left\{\lambda\in\mathbb{C}\Big{|}a-\lambda\notin A^{-1}\right\}

The following spectral theory results hold, exactly as in the A=B(H)A=B(H) case:

  1. (1)

    We have σ(ab){0}=σ(ba){0}\sigma(ab)\cup\{0\}=\sigma(ba)\cup\{0\}.

  2. (2)

    We have polynomial, rational and holomorphic calculus.

  3. (3)

    As a consequence, the spectra are compact and non-empty.

  4. (4)

    The spectra of unitaries (u=u1)(u^{*}=u^{-1}) and self-adjoints (a=a)(a=a^{*}) are on 𝕋,\mathbb{T},\mathbb{R}.

  5. (5)

    The spectral radius of normal elements (aa=aa)(aa^{*}=a^{*}a) is given by ρ(a)=||a||\rho(a)=||a||.

In addition, assuming aABa\in A\subset B, the spectra of aa with respect to AA and to BB coincide.

Proof.

This is something that we know from chapter 3, in the case A=B(H)A=B(H), and then from chapter 5, in the case AB(H)A\subset B(H). In general, the proof is similar:

(1) Regarding the assertions (1-5), which are of course formulated a bit informally, the proofs here are perfectly similar to those for the full operator algebra A=B(H)A=B(H). All this is standard material, and in fact, things in chapter 3 were written in such a way that their extension now, to the general CC^{*}-algebra setting, is obvious.

(2) Regarding the last assertion, we know this from chapter 5 for ABB(H)A\subset B\subset B(H), and the proof in general is similar. Indeed, the inclusion σB(a)σA(a)\sigma_{B}(a)\subset\sigma_{A}(a) is clear. For the converse, assume aλB1a-\lambda\in B^{-1}, and consider the following self-adjoint element:

b=(aλ)(aλ)b=(a-\lambda)^{*}(a-\lambda)

The difference between the two spectra of bABb\in A\subset B is then given by:

σA(b)σB(b)={μσB(b)|(bμ)1BA}\sigma_{A}(b)-\sigma_{B}(b)=\left\{\mu\in\mathbb{C}-\sigma_{B}(b)\Big{|}(b-\mu)^{-1}\in B-A\right\}

Thus this difference is an open subset of \mathbb{C}. On the other hand bb being self-adjoint, its two spectra are both real, and so is their difference. Thus the two spectra of bb are equal, and in particular bb is invertible in AA, and so aλA1a-\lambda\in A^{-1}, as desired. ∎

We can now get back to the commutative CC^{*}-algebras, and we have the following result, due to Gelfand, which will be of crucial importance for us:

Theorem 7.9.

The commutative CC^{*}-algebras are exactly the algebras of the form

A=C(X)A=C(X)

with the “spectrum” XX of such an algebra being the space of characters χ:A\chi:A\to\mathbb{C}, with topology making continuous the evaluation maps eva:χχ(a)ev_{a}:\chi\to\chi(a).

Proof.

This is something that we basically know from chapter 5, but always good to talk about it again. Given a commutative CC^{*}-algebra AA, we can define XX as in the statement. Then XX is compact, and aevaa\to ev_{a} is a morphism of algebras, as follows:

ev:AC(X)ev:A\to C(X)

(1) We first prove that evev is involutive. We use the following formula, which is similar to the z=Re(z)+iIm(z)z=Re(z)+iIm(z) formula for the usual complex numbers:

a=a+a2+iaa2ia=\frac{a+a^{*}}{2}+i\cdot\frac{a-a^{*}}{2i}

Thus it is enough to prove the equality eva=evaev_{a^{*}}=ev_{a}^{*} for self-adjoint elements aa. But this is the same as proving that a=aa=a^{*} implies that evaev_{a} is a real function, which is in turn true, because eva(χ)=χ(a)ev_{a}(\chi)=\chi(a) is an element of σ(a)\sigma(a), contained in \mathbb{R}.

(2) Since AA is commutative, each element is normal, so evev is isometric:

||eva||=ρ(a)=||a||||ev_{a}||=\rho(a)=||a||

(3) It remains to prove that evev is surjective. But this follows from the Stone-Weierstrass theorem, because ev(A)ev(A) is a closed subalgebra of C(X)C(X), which separates the points. ∎

In view of the Gelfand theorem, we can formulate the following key definition:

Definition 7.10.

Given an arbitrary CC^{*}-algebra AA, we write

A=C(X)A=C(X)

and call XX a compact quantum space.

This might look like something informal, but it is not. Indeed, we can define the category of compact quantum spaces to be the category of the CC^{*}-algebras, with the arrows reversed. When AA is commutative, the above space XX exists indeed, as a Gelfand spectrum, X=Spec(A)X=Spec(A). In general, XX is something rather abstract, and our philosophy here will be that of studying of course AA, but formulating our results in terms of XX. For instance whenever we have a morphism Φ:AB\Phi:A\to B, we will write A=C(X),B=C(Y)A=C(X),B=C(Y), and rather speak of the corresponding morphism ϕ:YX\phi:Y\to X. And so on.


Technically speaking, we will see later that the above formalism has its limitations, and needs a fix. To be more precise, when looking at compact quantum spaces having a probability measure, there are more of them in the sense of Definition 7.10, than in the von Neumann algebra sense. Thus, all this needs a fix. But more on this later.


As a first concrete consequence of the Gelfand theorem, we have:

Proposition 7.11.

Assume that aAa\in A is normal, and let fC(σ(a))f\in C(\sigma(a)).

  1. (1)

    We can define f(a)Af(a)\in A, with ff(a)f\to f(a) being a morphism of CC^{*}-algebras.

  2. (2)

    We have the “continuous functional calculus” formula σ(f(a))=f(σ(a))\sigma(f(a))=f(\sigma(a)).

Proof.

Since aa is normal, the CC^{*}-algebra <a><a> that it generates is commutative, so if we denote by XX the space formed by the characters χ:<a>\chi:<a>\to\mathbb{C}, we have:

<a>=C(X)<a>=C(X)

Now since the map Xσ(a)X\to\sigma(a) given by evaluation at aa is bijective, we obtain:

<a>=C(σ(a))<a>=C(\sigma(a))

Thus, we are dealing with usual functions, and this gives all the assertions. ∎
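In finite dimensions the continuous functional calculus is something very concrete, namely applying f to the eigenvalues, in a spectral decomposition. Here is a sketch with numpy, for f(z) = √z applied to a positive matrix, our example being random:

```python
import numpy as np

# f(a) for a normal: diagonalize, apply f to the eigenvalues, conjugate
# back. With f = sqrt on a positive matrix, f(a) is the positive square
# root, and sigma(f(a)) = f(sigma(a)).
rng = np.random.default_rng(7)
c = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
a = c @ c.conj().T                         # positive, hence normal

vals, vecs = np.linalg.eigh(a)             # a = V D V*
vals = np.maximum(vals, 0.0)               # guard against round-off
f_a = vecs @ np.diag(np.sqrt(vals)) @ vecs.conj().T

print(np.allclose(f_a @ f_a, a))                            # f(a)^2 = a
print(np.allclose(np.linalg.eigvalsh(f_a), np.sqrt(vals)))  # spectra match
```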

As another consequence of the Gelfand theorem, we have:

Proposition 7.12.

For a normal element aAa\in A, the following are equivalent:

  1. (1)

    aa is positive, in the sense that σ(a)[0,)\sigma(a)\subset[0,\infty).

  2. (2)

    a=b2a=b^{2}, for some bAb\in A satisfying b=bb=b^{*}.

  3. (3)

    a=cca=cc^{*}, for some cAc\in A.

Proof.

This is very standard, exactly as in A=B(H)A=B(H) case, as follows:

(1)(2)(1)\implies(2) Since f(z)=zf(z)=\sqrt{z} is well-defined on σ(a)[0,)\sigma(a)\subset[0,\infty), we can set b=ab=\sqrt{a}.

(2)(3)(2)\implies(3) This is trivial, because we can set c=bc=b.

(3)(1)(3)\implies(1) We proceed by contradiction. By multiplying cc by a suitable element of <cc><cc^{*}>, we are led to the existence of an element d0d\neq 0 satisfying dd0-dd^{*}\geq 0. By writing now d=x+iyd=x+iy with x=x,y=yx=x^{*},y=y^{*} we have:

dd+dd=2(x2+y2)0dd^{*}+d^{*}d=2(x^{2}+y^{2})\geq 0

Thus dd0d^{*}d\geq 0, contradicting the fact that σ(dd),σ(dd)\sigma(dd^{*}),\sigma(d^{*}d) must coincide outside {0}\{0\}. ∎

Let us clarify now the relation between CC^{*}-algebras and von Neumann algebras. In order to do so, we need to prove a key result, called GNS representation theorem, stating that any CC^{*}-algebra appears as an operator algebra. As a first result, we have:

Proposition 7.13.

Let AA be a commutative CC^{*}-algebra, write A=C(X)A=C(X), with XX being a compact space, and let μ\mu be a positive measure on XX. We have then

AB(H)A\subset B(H)

where H=L2(X)H=L^{2}(X), with fAf\in A corresponding to the operator gfgg\to fg.

Proof.

Given a continuous function fC(X)f\in C(X), consider the operator Tf(g)=fgT_{f}(g)=fg, on H=L2(X)H=L^{2}(X). Observe that TfT_{f} is indeed well-defined, and bounded as well, because:

||fg||2=X|f(x)|2|g(x)|2dμ(x)||f||||g||2||fg||_{2}=\sqrt{\int_{X}|f(x)|^{2}|g(x)|^{2}d\mu(x)}\leq||f||_{\infty}||g||_{2}

The application fTff\to T_{f} being linear, involutive, continuous, and injective as well, we obtain in this way a CC^{*}-algebra embedding AB(H)A\subset B(H), as claimed. ∎
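
When XX is discretized into nn atoms of equal mass, L^{2}(X) becomes \mathbb{C}^{n} and T_{f} becomes the diagonal matrix with entries f(x), so the estimate in the proof becomes the equality ||T_{f}||=||f||_{\infty}. A one-line numerical check, the size 10 being an arbitrary choice:

import numpy as np

# Discrete model: X = n points of equal mass, L^2(X) = C^n, T_f = diag(f)
rng = np.random.default_rng(4)
f = rng.normal(size=10) + 1j * rng.normal(size=10)
Tf = np.diag(f)
print(np.isclose(np.linalg.norm(Tf, 2), np.abs(f).max()))   # True: ||T_f|| = ||f||_inf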

In order to prove the GNS representation theorem, we must extend the above construction, to the case where AA is not necessarily commutative. Let us start with:

Definition 7.14.

Consider a CC^{*}-algebra AA.

  1. (1)

    φ:A\varphi:A\to\mathbb{C} is called positive when a0φ(a)0a\geq 0\implies\varphi(a)\geq 0.

  2. (2)

    φ:A\varphi:A\to\mathbb{C} is called faithful and positive when a0,a0φ(a)>0a\geq 0,a\neq 0\implies\varphi(a)>0.

In the commutative case, A=C(X)A=C(X), the positive elements are the positive functions, f:X[0,)f:X\to[0,\infty). As for the positive linear forms φ:A\varphi:A\to\mathbb{C}, these appear as follows, with μ\mu being positive, and strictly positive if we want φ\varphi to be faithful and positive:

φ(f)=Xf(x)dμ(x)\varphi(f)=\int_{X}f(x)d\mu(x)

In general, the positive linear forms can be thought of as being integration functionals with respect to some underlying “positive measures”. We can use them as follows:

Proposition 7.15.

Let φ:A\varphi:A\to\mathbb{C} be a positive linear form.

  1. (1)

    <a,b>=φ(ab)<a,b>=\varphi(ab^{*}) defines a generalized scalar product on AA.

  2. (2)

    By separating and completing we obtain a Hilbert space HH.

  3. (3)

    π(a):bab\pi(a):b\to ab defines a representation π:AB(H)\pi:A\to B(H).

  4. (4)

    If φ\varphi is faithful in the above sense, then π\pi is faithful.

Proof.

Almost everything here is straightforward, as follows:

(1) This is clear from definitions, and from the basic properties of the positive elements a0a\geq 0, which can be established exactly as in the A=B(H)A=B(H) case.

(2) This is a standard procedure, which works for any scalar product, the idea being that of dividing by the vectors satisfying <x,x>=0<x,x>=0, then completing.

(3) All the verifications here are standard algebraic computations, in analogy with what we have seen many times, for multiplication operators, or group algebras.

(4) Assuming that we have a0a\neq 0, we have then π(aa)0\pi(aa^{*})\neq 0, which in turn implies by faithfulness that we have π(a)0\pi(a)\neq 0, which gives the result. ∎
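
In finite dimensions the GNS construction can be carried out very explicitly. Here is a sketch for A=M_{2}(\mathbb{C}), with the faithful positive form φ(a)=Tr(aρ), where ρ>0 is a density matrix: the Hilbert space is AA itself with the scalar product <a,b>=φ(ab^{*}), and π(a) is left multiplication, written as a 4x4 matrix in the basis of matrix units. All the concrete choices below, size and seed, are just for illustration:

import numpy as np

n = 2
rng = np.random.default_rng(2)

# Faithful state on A = M_n(C): phi(a) = Tr(a rho), with rho > 0, Tr(rho) = 1
m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = m @ m.conj().T + np.eye(n)
rho /= np.trace(rho).real
phi = lambda a: np.trace(a @ rho)

# GNS scalar product <a,b> = phi(ab*), on the basis of matrix units e_ij
basis = [np.outer(np.eye(n)[:, i], np.eye(n)[:, j]) for i in range(n) for j in range(n)]
gram = np.array([[phi(a @ b.conj().T) for b in basis] for a in basis])
print(np.all(np.linalg.eigvalsh(gram) > 0))   # True: the form is positive definite

# pi(a): b -> ab, as an n^2 x n^2 matrix; pi is multiplicative, hence a representation
def pi(a):
    return np.column_stack([(a @ b).reshape(-1) for b in basis])

a, b = rng.normal(size=(n, n)), rng.normal(size=(n, n))
print(np.allclose(pi(a @ b), pi(a) @ pi(b)))   # True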

In order to establish the embedding theorem, it remains to prove that any CC^{*}-algebra has a faithful positive linear form φ:A\varphi:A\to\mathbb{C}. This is something more technical:

Proposition 7.16.

Let AA be a CC^{*}-algebra.

  1. (1)

    Any positive linear form φ:A\varphi:A\to\mathbb{C} is continuous.

  2. (2)

    A linear form φ\varphi is positive iff there is a norm one hA+h\in A_{+} such that ||φ||=φ(h)||\varphi||=\varphi(h).

  3. (3)

    For any aAa\in A there exists a positive norm one form φ\varphi such that φ(aa)=||a||2\varphi(aa^{*})=||a||^{2}.

  4. (4)

    If AA is separable there is a faithful positive form φ:A\varphi:A\to\mathbb{C}.

Proof.

The proof here is quite technical, inspired by the existence proof for the probability measures on abstract compact spaces, the idea being as follows:

(1) This follows from Proposition 7.15, via the following estimate:

|φ(a)|||π(a)||φ(1)||a||φ(1)|\varphi(a)|\leq||\pi(a)||\varphi(1)\leq||a||\varphi(1)

(2) In one sense we can take h=1h=1. Conversely, let aA+a\in A_{+}, ||a||1||a||\leq 1. We have:

|φ(h)φ(a)|||φ||||ha||φ(h)|\varphi(h)-\varphi(a)|\leq||\varphi||\cdot||h-a||\leq\varphi(h)

Thus we have Re(φ(a))0Re(\varphi(a))\geq 0, and with a=1ha=1-h we obtain:

Re(φ(1h))0Re(\varphi(1-h))\geq 0

Thus Re(φ(1))||φ||Re(\varphi(1))\geq||\varphi||, and so φ(1)=||φ||\varphi(1)=||\varphi||, so we can assume h=1h=1. Now observe that for any self-adjoint element aa, and any tt\in\mathbb{R} we have, with φ(a)=x+iy\varphi(a)=x+iy:

\varphi(1)^{2}(1+t^{2}||a||^{2})\geq\varphi(1)^{2}||1+t^{2}a^{2}||=||\varphi||^{2}\cdot||1+ita||^{2}\geq|\varphi(1+ita)|^{2}=|\varphi(1)-ty+itx|^{2}\geq(\varphi(1)-ty)^{2}

By expanding, dividing by tt, and letting t0t\to 0 from above and from below, we obtain 2φ(1)y=02\varphi(1)y=0. Thus we have y=0y=0, and this finishes the proof of our remaining claim.

(3) We can set φ(λaa)=λ||a||2\varphi(\lambda aa^{*})=\lambda||a||^{2} on the linear space spanned by aaaa^{*}, then extend this functional by Hahn-Banach, to the whole AA. The positivity follows from (2).

(4) This is standard, by starting with a dense sequence (an)(a_{n}), and taking the Cesàro limit of the functionals constructed in (3). We have φ(aa)>0\varphi(aa^{*})>0, and we are done. ∎

With these ingredients in hand, we can now state and prove:

Theorem 7.17.

Any CC^{*}-algebra appears as a norm closed *-algebra of operators

AB(H)A\subset B(H)

over a certain Hilbert space HH. When AA is separable, HH can be taken to be separable.

Proof.

This result, called called GNS representation theorem after Gelfand, Naimark and Segal, follows indeed by combining Proposition 7.15 with Proposition 7.16. ∎

All this might seem quite surprising, and your first reaction would be to say: what have we been doing here, with our CC^{*}-algebra theory? We are now back to operator algebras AB(H)A\subset B(H), and everything that we did with CC^{*}-algebras, extending things that we knew about operator algebras AB(H)A\subset B(H), looks more like a waste of time.


Error. The axioms in Definition 7.6, coupled with the writing A=C(X)A=C(X) in Definition 7.10, are something powerful, because they do not involve any kind of L2L^{2} or LL^{\infty} functions on our quantum spaces XX. Thus, we can start hunting for such spaces, just by defining CC^{*}-algebras with generators and relations, then look for Haar measures on such spaces, and use the GNS construction in order to reach von Neumann algebras. Before getting into this, however, let us summarize the above discussion as follows:

Theorem 7.18.

We can talk about compact quantum measured spaces, as follows:

  1. (1)

    The category of compact quantum measured spaces (X,μ)(X,\mu) is the category of the CC^{*}-algebras with faithful traces (A,φ)(A,\varphi), with the arrows reversed.

  2. (2)

    In the case where we have a non-faithful trace φ\varphi, we can still talk about the corresponding space (X,μ)(X,\mu), by performing the GNS construction.

  3. (3)

    By taking the weak closure in the GNS representation, we obtain the von Neumann algebra A=L(X)A^{\prime\prime}=L^{\infty}(X), in the previous general measured space sense.

Proof.

All this follows from Theorem 7.17, and from the other things that we already know, with the whole result itself being something rather philosophical. ∎

7b. Tori, amenability

In the remainder of this chapter we explore the whole new world opened by the CC^{*}-algebra theory, with the study of several key examples. We will first discuss the group duals, also called noncommutative tori. Let us start with a well-known result:

Theorem 7.19.

The compact abelian groups GG are in correspondence with the discrete abelian groups Γ\Gamma, via Pontrjagin duality,

G=Γ^,Γ=G^G=\widehat{\Gamma}\quad,\quad\Gamma=\widehat{G}

with the dual of a locally compact group LL being the locally compact group L^\widehat{L} consisting of the continuous group characters χ:L𝕋\chi:L\to\mathbb{T}.

Proof.

This is something very standard, the idea being that, given a group LL as above, its continuous characters χ:L𝕋\chi:L\to\mathbb{T} form indeed a group, that we can call L^\widehat{L}. The correspondence LL^L\to\widehat{L} constructed in this way has then the following properties:

(1) We have \widehat{\mathbb{Z}}_{N}=\mathbb{Z}_{N}. This is the basic computation to be performed, before anything else, and it is something algebraic, with roots of unity.

(2) More generally, the dual of a finite abelian group G=N1××NkG=\mathbb{Z}_{N_{1}}\times\ldots\times\mathbb{Z}_{N_{k}} is the group GG itself. This comes indeed from (1) and from G×H^=G^×H^\widehat{G\times H}=\widehat{G}\times\widehat{H}.

(3) At the opposite end now, that of the locally compact groups which are not compact, nor discrete, the main example, which is standard, is ^=\widehat{\mathbb{R}}=\mathbb{R}.

(4) Getting now to what we are interested in, it follows from the definition of the correspondence LL^L\to\widehat{L} that when LL is compact L^\widehat{L} is discrete, and vice versa.

(5) Finally, in order to best understand this latter phenomenon, the best is to work out the main pair of examples, which are 𝕋^=\widehat{\mathbb{T}}=\mathbb{Z} and ^=𝕋\widehat{\mathbb{Z}}=\mathbb{T}. ∎

Our claim now is that, by using operator algebra theory, we can talk about the dual G=Γ^G=\widehat{\Gamma} of any discrete group Γ\Gamma. Let us start our discussion in the von Neumann algebra setting, where things are particularly simple. We have here:

Theorem 7.20.

Given a discrete group Γ\Gamma, we can construct its von Neumann algebra

L(Γ)B(l2(Γ))L(\Gamma)\subset B(l^{2}(\Gamma))

by using the left regular representation. This algebra has a faithful positive trace, tr(g)=δg,1tr(g)=\delta_{g,1}, and when Γ\Gamma is abelian we have an isomorphism of tracial von Neumann algebras

L(Γ)L(G)L(\Gamma)\simeq L^{\infty}(G)

given by a Fourier type transform, where G=Γ^G=\widehat{\Gamma} is the compact dual of Γ\Gamma.

Proof.

There are many assertions here, the idea being as follows:

(1) The first part is standard, with the left regular representation of Γ\Gamma working as expected, and being a unitary representation, as follows:

ΓB(l2(Γ)),π(g):hgh\Gamma\subset B(l^{2}(\Gamma))\quad,\quad\pi(g):h\to gh

(2) The positivity of the trace comes from the following alternative formula for it, with the equivalence with the definition in the statement being clear:

tr(T)=<T\delta_{1},\delta_{1}>

(3) The third part is standard as well, because when Γ\Gamma is abelian the algebra L(Γ)L(\Gamma) is commutative, and its spectral decomposition leads by delinearization to the group characters χ:Γ𝕋\chi:\Gamma\to\mathbb{T}, and so the dual group G=Γ^G=\widehat{\Gamma}, as indicated.

(4) Finally, the fact that our isomorphism transforms the trace of L(Γ)L(\Gamma) into the Haar integration functional of L(G)L^{\infty}(G) is clear. Moreover, the study of various examples shows that what we constructed is in fact the Fourier transform, in its various incarnations. ∎
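
For Γ=\mathbb{Z}_{N} everything in the above proof can be made fully explicit: the left regular representation is by cyclic shift matrices, the trace is tr(T)=<T\delta_{0},\delta_{0}>, and the Fourier transform is the usual DFT, which diagonalizes all the group elements at once. A numerical sketch, with numpy assumed:

import numpy as np

N = 5
# Left regular representation of Z_N: lambda(1) is the cyclic shift on l^2(Z_N)
shift = np.roll(np.eye(N), 1, axis=0)
lam = [np.linalg.matrix_power(shift, k) for k in range(N)]

# tr(g) = <lambda(g) delta_0, delta_0> = delta_{g,0}
print([int(l[0, 0]) for l in lam])                 # [1, 0, 0, 0, 0]

# The DFT matrix diagonalizes the shift, giving L(Z_N) = L^infty(dual of Z_N)
F = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
diag = F.conj().T @ lam[1] @ F
print(np.allclose(diag, np.diag(np.diag(diag))))   # True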

Getting back now to our quantum space questions, we have a beginning of answer, because based on the above, we can formulate the following definition:

Definition 7.21.

Given a discrete group Γ\Gamma, not necessarily abelian, we can construct its abstract dual G=Γ^G=\widehat{\Gamma} as a quantum measured space, via the following formula:

L(G)=L(Γ)L^{\infty}(G)=L(\Gamma)

In the case where Γ\Gamma happens to be abelian, this quantum space G=Γ^G=\widehat{\Gamma} is a classical space, namely the usual Pontrjagin dual of Γ\Gamma, endowed with its Haar measure.

Let us discuss now the same questions, in the CC^{*}-algebra setting. The situation here is more complicated than in the von Neumann algebra setting, as follows:

Proposition 7.22.

Associated to any discrete group Γ\Gamma are several group CC^{*}-algebras,

C(Γ)Cπ(Γ)Cred(Γ)C^{*}(\Gamma)\to C^{*}_{\pi}(\Gamma)\to C^{*}_{red}(\Gamma)

which are constructed as follows:

  1. (1)

    C(Γ)C^{*}(\Gamma) is the closure of the group algebra [Γ]\mathbb{C}[\Gamma], with involution g=g1g^{*}=g^{-1}, with respect to the maximal CC^{*}-seminorm on this *-algebra, which is a CC^{*}-norm.

  2. (2)

    Cred(Γ)C^{*}_{red}(\Gamma) is the norm closure of the group algebra [Γ]\mathbb{C}[\Gamma] in the left regular representation, on the Hilbert space l2(Γ)l^{2}(\Gamma), given by λ(g)(h)=gh\lambda(g)(h)=gh and linearity.

  3. (3)

    Cπ(Γ)C^{*}_{\pi}(\Gamma) can be any intermediate CC^{*}-algebra, but for best results, the indexing object π\pi must be a unitary group representation, satisfying πππ\pi\otimes\pi\subset\pi.

Proof.

This is something quite technical, with (2) being very similar to the von Neumann algebra construction from Theorem 7.20, with (1) being something new, with the norm property there coming from (2), and finally with (3) being an informal statement, that we will comment on later, once we know about compact quantum groups. ∎

When Γ\Gamma is finite, or abelian, or more generally amenable, all the above group algebras coincide. In the abelian case, that we are particularly interested in here, the precise result is as follows, complementing the LL^{\infty} analysis from Theorem 7.20:

Theorem 7.23.

When Γ\Gamma is abelian all its group CC^{*}-algebras coincide, and we have an isomorphism as follows, given by a Fourier type transform,

C(Γ)C(G)C^{*}(\Gamma)\simeq C(G)

where G=Γ^G=\widehat{\Gamma} is the compact dual of Γ\Gamma. Moreover, this isomorphism transforms the standard group algebra trace tr(g)=δg,1tr(g)=\delta_{g,1} into the Haar integration of GG.

Proof.

Since Γ\Gamma is abelian, any of its group CC^{*}-algebras A=Cπ(Γ)A=C^{*}_{\pi}(\Gamma) is commutative. Thus, we can apply the Gelfand theorem, and we obtain A=C(X)A=C(X), with X=Spec(A)X=Spec(A). But the spectrum X=Spec(A)X=Spec(A), consisting of the characters χ:A\chi:A\to\mathbb{C}, can be identified by delinearizing with the Pontrjagin dual G=Γ^G=\widehat{\Gamma}, and this gives the results. ∎
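
At the level of the algebra operations, the isomorphism in the above statement is the fact that the Fourier transform turns the convolution product of the group algebra into the pointwise product of functions on the dual. For Γ=\mathbb{Z}_{N} this is the classical circular convolution theorem, checked below with numpy's FFT; the size and seed are arbitrary:

import numpy as np

# C[Z_N] with convolution, versus C(dual of Z_N) = C^N with pointwise product
N = 6
rng = np.random.default_rng(5)
a = rng.normal(size=N) + 1j * rng.normal(size=N)
b = rng.normal(size=N) + 1j * rng.normal(size=N)

conv = np.array([sum(a[j] * b[(k - j) % N] for j in range(N)) for k in range(N)])
print(np.allclose(np.fft.fft(conv), np.fft.fft(a) * np.fft.fft(b)))   # True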

At a more advanced level now, we have the following result:

Theorem 7.24.

For a discrete group Γ=<g1,,gN>\Gamma=<g_{1},\ldots,g_{N}>, the following conditions are equivalent, and if they are satisfied, we say that Γ\Gamma is amenable:

  1. (1)

    The projection map C(Γ)Cred(Γ)C^{*}(\Gamma)\to C^{*}_{red}(\Gamma) is an isomorphism.

  2. (2)

    The morphism ε:C(Γ)\varepsilon:C^{*}(\Gamma)\to\mathbb{C} given by g1g\to 1 factorizes through Cred(Γ)C^{*}_{red}(\Gamma).

  3. (3)

    We have Nσ(Re(g1++gN))N\in\sigma(Re(g_{1}+\ldots+g_{N})), the spectrum being taken inside Cred(Γ)C^{*}_{red}(\Gamma).

The amenable groups include all finite groups, and all abelian groups. As a basic example of a non-amenable group, we have the free group FNF_{N}, with N2N\geq 2.

Proof.

There are several things to be proved, the idea being as follows:

(1) The implication (1)(2)(1)\implies(2) is trivial, and (2)(3)(2)\implies(3) comes from the following computation, which shows that NRe(g1++gN)N-Re(g_{1}+\ldots+g_{N}) is not invertible inside Cred(Γ)C^{*}_{red}(\Gamma):

\varepsilon\left[N-Re(g_{1}+\ldots+g_{N})\right]=N-Re\left[\varepsilon(g_{1})+\ldots+\varepsilon(g_{N})\right]=N-N=0

As for (3)(1)(3)\implies(1), this is something more advanced, that we will not need for the moment. We will be back to this later, directly in a more general setting.

(2) The fact that any finite group GG is amenable is clear, because all the group CC^{*}-algebras are equal to the usual group *-algebra [G]\mathbb{C}[G], in this case. As for the case of the abelian groups, these are all amenable as well, as shown by Theorem 7.23.

(3) It remains to prove that FNF_{N} with N2N\geq 2 is not amenable. By using F2FNF_{2}\subset F_{N}, it is enough to do this at N=2N=2. So, consider the free group F2=<g,h>F_{2}=<g,h>. In order to prove that F2F_{2} is not amenable, we use the implication (1)(3)(1)\implies(3), in contrapositive form. To be more precise, it is enough to show that 4 is not in the spectrum of the following operator:

T=λ(g)+λ(g1)+λ(h)+λ(h1)T=\lambda(g)+\lambda(g^{-1})+\lambda(h)+\lambda(h^{-1})

This is a sum of four terms, each of them acting via δwδew\delta_{w}\to\delta_{ew}, with ee being a certain length one word. Thus if w1w\neq 1 has length nn then T(δw)T(\delta_{w}) is a sum of four Dirac masses, three of them at words of length n+1n+1 and the remaining one at a length n1n-1 word. We can therefore decompose TT as a sum T++TT_{+}+T_{-}, where T+T_{+} adds and TT_{-} cuts:

T=T++TT=T_{+}+T_{-}

That is, if w1w\neq 1 is a word, say beginning with hh, then T±T_{\pm} act on δw\delta_{w} as follows:

T+(δw)=δgw+δg1w+δhw,T(δw)=δh1wT_{+}(\delta_{w})=\delta_{gw}+\delta_{g^{-1}w}+\delta_{hw}\quad,\quad T_{-}(\delta_{w})=\delta_{h^{-1}w}

It follows from definitions that we have T+=TT_{+}^{*}=T_{-}. We can use the following trick:

(T++T)2+(i(T+T))2=2(T+T+TT+)(T_{+}+T_{-})^{2}+\left(i(T_{+}-T_{-})\right)^{2}=2(T_{+}T_{-}+T_{-}T_{+})

Indeed, this gives (T++T)22(T+T+TT+)(T_{+}+T_{-})^{2}\leq 2(T_{+}T_{-}+T_{-}T_{+}), and we obtain in this way:

||T||2=||T++T||22||T+T+TT+||||T||^{2}=||T_{+}+T_{-}||^{2}\leq 2||T_{+}T_{-}+T_{-}T_{+}||

Let w1w\neq 1 be a word, say beginning with hh. We have then:

TT+(δw)=T(δgw+δg1w+δhw)=3δwT_{-}T_{+}(\delta_{w})=T_{-}(\delta_{gw}+\delta_{g^{-1}w}+\delta_{hw})=3\delta_{w}

The action of TT+T_{-}T_{+} on the remaining vector δ1\delta_{1} is computed as follows:

TT+(δ1)=T(δg+δg1+δh+δh1)=4δ1T_{-}T_{+}(\delta_{1})=T_{-}(\delta_{g}+\delta_{g^{-1}}+\delta_{h}+\delta_{h^{-1}})=4\delta_{1}

Summing up, with P:δwδ1P:\delta_{w}\to\delta_{1} being the projection onto δ1\mathbb{C}\delta_{1}, we have:

TT+=3+PT_{-}T_{+}=3+P

On the other hand we have T+T(δ1)=T+(0)=0T_{+}T_{-}(\delta_{1})=T_{+}(0)=0, so the subspace δ1\mathbb{C}\delta_{1} is invariant under the operator T+T+TT+T_{+}T_{-}+T_{-}T_{+}. We have the following norm estimate:

||T||22||T+T+TT+||2max{||3+P||,||(T+T+TT+)(1P)||}||T||^{2}\leq 2||T_{+}T_{-}+T_{-}T_{+}||\leq 2\cdot\max\left\{||3+P||,\,\,\,||(T_{+}T_{-}+T_{-}T_{+})(1-P)||\right\}

The norm of 3+P3+P is equal to 44, and the other norm is estimated as follows:

||(T_{+}T_{-}+T_{-}T_{+})(1-P)||\leq||T_{+}T_{-}||+||(3+P)(1-P)||=||T_{-}T_{+}||+3=7

Thus we have ||T||14<4||T||\leq\sqrt{14}<4, and this finishes the proof. ∎
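
The bound ||T||\leq\sqrt{14} is all that is needed above, but it is not optimal: by a theorem of Kesten the precise norm is ||T||=2\sqrt{3}\simeq 3.46. One can see this numerically by truncating l^{2}(F_{2}) to the ball of reduced words of length \leq L, which gives lower bounds for ||T|| converging to 2\sqrt{3}, and in particular staying below 4. A rough sketch, using numpy and scipy, with the cutoff L=10 being an arbitrary choice:

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import eigsh

# Reduced words in F_2 = <g,h> of length <= L; letters 0=g, 1=g^-1, 2=h, 3=h^-1,
# with the inverse of the letter x being x ^ 1
L = 10
level, words = [()], [()]
for _ in range(L):
    level = [w + (x,) for w in level for x in range(4) if not w or w[-1] != x ^ 1]
    words += level
index = {w: i for i, w in enumerate(words)}

# T = lambda(g) + lambda(g^-1) + lambda(h) + lambda(h^-1), cut down to the ball;
# lambda(x) maps delta_w to delta_{xw}, with reduction when w starts with x^-1
rows, cols = [], []
for w, i in index.items():
    for x in range(4):
        v = w[1:] if (w and w[0] == x ^ 1) else (x,) + w
        if v in index:
            rows.append(index[v]); cols.append(i)
n = len(words)
T = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()

# Top eigenvalue of the truncation: a lower bound for ||T||, tending to 2*sqrt(3)
print(eigsh(T, k=1, which='LA')[0][0], 2 * np.sqrt(3))   # both < 4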

7c. Quantum groups

The duals of discrete groups have several similarities with the compact groups, and our goal now will be that of unifying these two classes of compact quantum spaces. Let us start with the following definition, due to Woronowicz [wo1]:

Definition 7.25.

A Woronowicz algebra is a CC^{*}-algebra AA, given with a unitary matrix uMN(A)u\in M_{N}(A) whose coefficients generate AA, such that the formulae

Δ(uij)=kuikukj,ε(uij)=δij,S(uij)=uji\Delta(u_{ij})=\sum_{k}u_{ik}\otimes u_{kj}\quad,\quad\varepsilon(u_{ij})=\delta_{ij}\quad,\quad S(u_{ij})=u_{ji}^{*}

define morphisms of CC^{*}-algebras Δ:AAA\Delta:A\to A\otimes A, ε:A\varepsilon:A\to\mathbb{C}, S:AAoppS:A\to A^{opp}.

We say that AA is cocommutative when ΣΔ=Δ\Sigma\Delta=\Delta, where Σ(ab)=ba\Sigma(a\otimes b)=b\otimes a is the flip. We have the following result, which justifies the terminology and axioms:

Proposition 7.26.

The following are Woronowicz algebras:

  1. (1)

    C(G)C(G), with GUNG\subset U_{N} compact Lie group. Here the structural maps are:

    Δ(φ)=(g,h)φ(gh),ε(φ)=φ(1),S(φ)=gφ(g1)\Delta(\varphi)=(g,h)\to\varphi(gh)\quad,\quad\varepsilon(\varphi)=\varphi(1)\quad,\quad S(\varphi)=g\to\varphi(g^{-1})
  2. (2)

    C(Γ)C^{*}(\Gamma), with FNΓF_{N}\to\Gamma finitely generated group. Here the structural maps are:

    Δ(g)=gg,ε(g)=1,S(g)=g1\Delta(g)=g\otimes g\quad,\quad\varepsilon(g)=1\quad,\quad S(g)=g^{-1}

Moreover, we obtain in this way all the commutative/cocommutative algebras.

Proof.

In both cases, we have to exhibit a certain matrix uu. For the first assertion, we can use the matrix u=(uij)u=(u_{ij}) formed by matrix coordinates of GG, given by:

g=(u11(g)u1N(g)uN1(g)uNN(g))g=\begin{pmatrix}u_{11}(g)&\ldots&u_{1N}(g)\\ \vdots&&\vdots\\ u_{N1}(g)&\ldots&u_{NN}(g)\end{pmatrix}

As for the second assertion, here we can use the diagonal matrix formed by generators, u=diag(g1,,gN)u=diag(g_{1},\ldots,g_{N}). Finally, the last assertion follows from the Gelfand theorem, in the commutative case, and in the cocommutative case, we will be back to this later. ∎

In general now, the structural maps Δ,ε,S\Delta,\varepsilon,S have the following properties:

Proposition 7.27.

Let (A,u)(A,u) be a Woronowicz algebra.

  1. (1)

    Δ,ε\Delta,\varepsilon satisfy the usual axioms for a comultiplication and a counit, namely:

    (Δid)Δ=(idΔ)Δ(\Delta\otimes id)\Delta=(id\otimes\Delta)\Delta
    (εid)Δ=(idε)Δ=id(\varepsilon\otimes id)\Delta=(id\otimes\varepsilon)\Delta=id
  2. (2)

    SS satisfies the antipode axiom, on the *-subalgebra generated by entries of uu:

    m(Sid)Δ=m(idS)Δ=ε(.)1m(S\otimes id)\Delta=m(id\otimes S)\Delta=\varepsilon(.)1
  3. (3)

    In addition, the square of the antipode is the identity, S2=idS^{2}=id.

Proof.

When AA is commutative, by using Proposition 7.26 we can write:

Δ=mt,ε=ut,S=it\Delta=m^{t}\quad,\quad\varepsilon=u^{t}\quad,\quad S=i^{t}

The above 3 conditions come then by transposition from the basic 3 group theory conditions satisfied by m,u,im,u,i, which are as follows, with δ(g)=(g,g)\delta(g)=(g,g):

m(m×id)=m(id×m)m(m\times id)=m(id\times m)
m(id×u)=m(u×id)=idm(id\times u)=m(u\times id)=id
m(id×i)δ=m(i×id)δ=1m(id\times i)\delta=m(i\times id)\delta=1

Observe that S2=idS^{2}=id is satisfied as well, coming from i2=idi^{2}=id. In general now, all the formulae in the statement are satisfied on the generators uiju_{ij}, and so by linearity, multiplicativity and continuity they are satisfied everywhere, as desired. ∎
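
In the simplest commutative case, A=C(\mathbb{Z}_{N}), the maps in the above statement are Δ(f)(g,h)=f(g+h), ε(f)=f(0), S(f)(g)=f(-g), and the axioms can be checked by brute force, both sides of the coassociativity condition being the function (g,h,k)\to f(g+h+k). A quick numerical sketch, with the size and seed being arbitrary:

import numpy as np

# A = C(Z_N) = C^N, with Delta(f)(g,h) = f(g+h) and epsilon(f) = f(0)
N = 4
f = np.random.default_rng(6).normal(size=N)
D = lambda v: np.array([[v[(g + h) % N] for h in range(N)] for g in range(N)])
Df = D(f)

# Coassociativity: (Delta x id)Delta = (id x Delta)Delta, both giving f(g+h+k)
lhs = np.stack([D(Df[:, k]) for k in range(N)], axis=2)
rhs = np.stack([D(Df[g, :]) for g in range(N)], axis=0)
print(np.allclose(lhs, rhs))      # True
# Counit axiom: (epsilon x id)Delta(f) = f
print(np.allclose(Df[0, :], f))   # True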

In view of Proposition 7.26, we can formulate the following definition:

Definition 7.28.

Given a Woronowicz algebra AA, we formally write

A=C(G)=C(Γ)A=C(G)=C^{*}(\Gamma)

and call GG compact quantum group, and Γ\Gamma discrete quantum group.

When AA is both commutative and cocommutative, GG is a compact abelian group, Γ\Gamma is a discrete abelian group, and these groups are dual to each other:

G=Γ^,Γ=G^G=\widehat{\Gamma}\quad,\quad\Gamma=\widehat{G}

In general, we still agree to write G=Γ^,Γ=G^G=\widehat{\Gamma},\Gamma=\widehat{G}, in a formal sense. Finally, in relation with functoriality issues, let us complement Definitions 7.25 and 7.28 with:

Definition 7.29.

Given two Woronowicz algebras (A,u)(A,u) and (B,v)(B,v), we write

ABA\simeq B

and we identify as well the corresponding compact and discrete quantum groups, when we have an isomorphism of *-algebras <uij><vij><u_{ij}>\simeq<v_{ij}>, mapping uijviju_{ij}\to v_{ij}.

In order to develop now some theory, let us call corepresentation of AA any unitary matrix vMn(𝒜)v\in M_{n}(\mathcal{A}), with 𝒜=<uij>\mathcal{A}=<u_{ij}>, satisfying the same conditions as uu, namely:

Δ(vij)=kvikvkj,ε(vij)=δij,S(vij)=vji\Delta(v_{ij})=\sum_{k}v_{ik}\otimes v_{kj}\quad,\quad\varepsilon(v_{ij})=\delta_{ij}\quad,\quad S(v_{ij})=v_{ji}^{*}

These can be thought of as corresponding to the unitary representations of the underlying compact quantum group GG. Following Woronowicz [wo1], we have:

Theorem 7.30.

Any Woronowicz algebra has a unique Haar integration functional,

(Gid)Δ=(idG)Δ=G(.)1\left(\int_{G}\otimes id\right)\Delta=\left(id\otimes\int_{G}\right)\Delta=\int_{G}(.)1

which can be constructed by starting with any faithful positive form φA\varphi\in A^{*}, and setting

G=limn1nk=1nφk\int_{G}=\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\varphi^{*k}

where ϕψ=(ϕψ)Δ\phi*\psi=(\phi\otimes\psi)\Delta. Moreover, for any corepresentation vMn()Av\in M_{n}(\mathbb{C})\otimes A we have

(idG)v=P\left(id\otimes\int_{G}\right)v=P

where PP is the orthogonal projection onto Fix(v)={ξn|vξ=ξ}Fix(v)=\{\xi\in\mathbb{C}^{n}|v\xi=\xi\}.

Proof.

Following [wo1], this can be done in 3 steps, as follows:

(1) Given φA\varphi\in A^{*}, our claim is that the following limit converges, for any aAa\in A:

φa=limn1nk=1nφk(a)\int_{\varphi}a=\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\varphi^{*k}(a)

Indeed, by linearity we can assume that aa is a coefficient of a corepresentation, a=(τid)va=(\tau\otimes id)v. But in this case, an elementary computation shows that we have the following formula, where PφP_{\varphi} is the orthogonal projection onto the 11-eigenspace of (idφ)v(id\otimes\varphi)v:

(idφ)v=Pφ\left(id\otimes\int_{\varphi}\right)v=P_{\varphi}

(2) Since vξ=ξv\xi=\xi implies [(idφ)v]ξ=ξ[(id\otimes\varphi)v]\xi=\xi, we have PφPP_{\varphi}\geq P, where PP is the orthogonal projection onto the space Fix(v)={ξn|vξ=ξ}Fix(v)=\{\xi\in\mathbb{C}^{n}|v\xi=\xi\}. The point now is that when φA\varphi\in A^{*} is faithful, by using a positivity trick, one can prove that we have Pφ=PP_{\varphi}=P. Thus our linear form φ\int_{\varphi} is independent of φ\varphi, and is given on coefficients a=(τid)va=(\tau\otimes id)v by:

(idφ)v=P\left(id\otimes\int_{\varphi}\right)v=P

(3) With the above formula in hand, the left and right invariance of G=φ\int_{G}=\int_{\varphi} is clear on coefficients, and so in general, and this gives all the assertions. See [wo1]. ∎
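
The Cesàro construction in the above proof can be tested on a finite group, where states correspond to probability measures, the convolution is the usual one, and the Haar functional is the uniform average. A sketch for G=S_{3}, with the number of iterations being an arbitrary choice:

import numpy as np
from itertools import permutations

# Convolution of measures on S_3: (phi * psi)(x) = sum over gh = x of phi(g) psi(h)
G = list(permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}
mult = [[idx[tuple(g[k] for k in h)] for h in G] for g in G]   # index of g o h

def conv(phi, psi):
    out = np.zeros(len(G))
    for i in range(len(G)):
        for j in range(len(G)):
            out[mult[i][j]] += phi[i] * psi[j]
    return out

# Faithful state = strictly positive probability; Cesaro average of its powers
rng = np.random.default_rng(3)
phi = rng.random(len(G)) + 0.1
phi /= phi.sum()
power, cesaro = phi.copy(), np.zeros(len(G))
for _ in range(200):
    cesaro += power
    power = conv(power, phi)
print(np.round(cesaro / 200, 2))   # ~[0.17 0.17 0.17 0.17 0.17 0.17]: Haar measure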

As a main application, we can develop a Peter-Weyl type theory for the corepresentations of AA. Consider the dense *-subalgebra 𝒜A\mathcal{A}\subset A generated by the coefficients of the fundamental corepresentation uu, and endow it with the following scalar product:

<a,b>=Gab<a,b>=\int_{G}ab^{*}

With this convention, we have the following result, also from Woronowicz [wo1]:

Theorem 7.31.

We have the following Peter-Weyl type results:

  1. (1)

    Any corepresentation decomposes as a sum of irreducible corepresentations.

  2. (2)

    Each irreducible corepresentation appears inside a certain uku^{\otimes k}.

  3. (3)

    𝒜=vIrr(A)Mdim(v)()\mathcal{A}=\bigoplus_{v\in Irr(A)}M_{\dim(v)}(\mathbb{C}), the summands being pairwise orthogonal.

  4. (4)

    The characters of irreducible corepresentations form an orthonormal system.

Proof.

All these results are from [wo1], the idea being as follows:

(1) Given vMn(A)v\in M_{n}(A), its intertwiner algebra End(v)={TMn()|Tv=vT}End(v)=\{T\in M_{n}(\mathbb{C})|Tv=vT\} is a finite dimensional CC^{*}-algebra, and so decomposes as End(v)=Mn1()Mnr()End(v)=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{r}}(\mathbb{C}). But this gives a decomposition of type v=v1++vrv=v_{1}+\ldots+v_{r}, as desired.

(2) Consider indeed the Peter-Weyl corepresentations, uku^{\otimes k} with kk being a colored integer, defined by u=1u^{\otimes\emptyset}=1, u=uu^{\otimes\circ}=u, u=u¯u^{\otimes\bullet}=\bar{u} and multiplicativity. The coefficients of these corepresentations span the dense algebra 𝒜\mathcal{A}, and by using (1), this gives the result.

(3) Here the direct sum decomposition, which is technically a *-coalgebra isomorphism, follows from (2). As for the second assertion, this follows from the fact that (idG)v(id\otimes\int_{G})v is the orthogonal projection PvP_{v} onto the space Fix(v)Fix(v), for any corepresentation vv.

(4) Let us define indeed the character of vMn(A)v\in M_{n}(A) to be the matrix trace, χv=Tr(v)\chi_{v}=Tr(v). Since this character is a coefficient of vv, the orthogonality assertion follows from (3). As for the norm 1 claim, this follows once again from (idG)v=Pv(id\otimes\int_{G})v=P_{v}. ∎
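
For A=C(G) with GG a finite group, the corepresentations are the usual representations of GG, and the assertion (4) above is the classical orthogonality of characters. Here is a check for G=S_{3}, whose irreducible characters are the trivial one, the signature, and the character of the standard 2-dimensional representation, counting fixed points minus 1; numpy is assumed:

import numpy as np
from itertools import permutations

G = list(permutations(range(3)))
sign = lambda g: np.linalg.det(np.eye(3)[list(g)])   # signature, via permutation matrix
fix = lambda g: sum(g[i] == i for i in range(3))     # number of fixed points

chars = np.stack([
    np.ones(len(G)),                     # trivial character
    np.array([sign(g) for g in G]),      # signature
    np.array([fix(g) - 1.0 for g in G])  # standard 2-dimensional character
])

# Orthonormality with respect to the Haar state, the average over G
print(np.round(chars @ chars.T / len(G), 6))   # identity matrix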

We can now solve a problem that we left open before, namely:

Proposition 7.32.

The cocommutative Woronowicz algebras appear as the quotients

C(Γ)ACred(Γ)C^{*}(\Gamma)\to A\to C^{*}_{red}(\Gamma)

given by A=Cπ(Γ)A=C^{*}_{\pi}(\Gamma) with πππ\pi\otimes\pi\subset\pi, with Γ\Gamma being a discrete group.

Proof.

This follows from the Peter-Weyl theory, and clarifies a number of things said before, notably in Proposition 7.26. Indeed, for a cocommutative Woronowicz algebra the irreducible corepresentations are all 1-dimensional, and this gives the results. ∎

As another consequence of the above results, once again by following Woronowicz [wo1], we have the following statement, dealing with functional analysis aspects, and extending what we already knew about the CC^{*}-algebras of the usual discrete groups:

Theorem 7.33.

Let AfullA_{full} be the enveloping CC^{*}-algebra of 𝒜\mathcal{A}, and AredA_{red} be the quotient of AA by the null ideal of the Haar integration. The following are then equivalent:

  1. (1)

    The Haar functional of AfullA_{full} is faithful.

  2. (2)

    The projection map AfullAredA_{full}\to A_{red} is an isomorphism.

  3. (3)

    The counit map ε:Afull\varepsilon:A_{full}\to\mathbb{C} factorizes through AredA_{red}.

  4. (4)

    We have Nσ(Re(χu))N\in\sigma(Re(\chi_{u})), the spectrum being taken inside AredA_{red}.

If this is the case, we say that the underlying discrete quantum group Γ\Gamma is amenable.

Proof.

This is well-known in the group dual case, A=C(Γ)A=C^{*}(\Gamma), with Γ\Gamma being a usual discrete group. In general, the result follows by adapting the group dual case proof:

(1)(2)(1)\iff(2) This simply follows from the fact that the GNS construction for the algebra AfullA_{full} with respect to the Haar functional produces the algebra AredA_{red}.

(2)(3)(2)\iff(3) Here \implies is trivial, and conversely, a counit map ε:Ared\varepsilon:A_{red}\to\mathbb{C} produces an isomorphism AredAfullA_{red}\to A_{full}, via a formula of type (εid)Φ(\varepsilon\otimes id)\Phi. See [wo1].

(3)(4)(3)\iff(4) Here \implies is clear, coming from ε(NRe(χ(u)))=0\varepsilon(N-Re(\chi(u)))=0, and the converse can be proved by doing some functional analysis. Once again, we refer here to [wo1]. ∎

Let us discuss now some interesting examples. Following Wang [wan], we have:

Proposition 7.34.

The following universal algebras are Woronowicz algebras,

C(ON+)=C((uij)i,j=1,,N|u=u¯,ut=u1)C(O_{N}^{+})=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u=\bar{u},u^{t}=u^{-1}\right)
C(UN+)=C((uij)i,j=1,,N|u=u1,ut=u¯1)C(U_{N}^{+})=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u^{*}=u^{-1},u^{t}=\bar{u}^{-1}\right)

so the underlying spaces ON+,UN+O_{N}^{+},U_{N}^{+} are compact quantum groups.

Proof.

This follows from the elementary fact that if a matrix u=(uij)u=(u_{ij}) is orthogonal or biunitary, then so must be the following matrices:

uΔij=kuikukj,uεij=δij,uSij=ujiu^{\Delta}_{ij}=\sum_{k}u_{ik}\otimes u_{kj}\quad,\quad u^{\varepsilon}_{ij}=\delta_{ij}\quad,\quad u^{S}_{ij}=u_{ji}^{*}

Thus, we can indeed define morphisms Δ,ε,S\Delta,\varepsilon,S as in Definition 7.25, by using the universal properties of C(ON+)C(O_{N}^{+}), C(UN+)C(U_{N}^{+}), and this gives the result. ∎

There is a connection here with group duals, coming from:

Proposition 7.35.

Given a closed subgroup GUN+G\subset U_{N}^{+}, consider its “diagonal torus”, which is the closed subgroup TGT\subset G constructed as follows:

C(T)=C(G)/uij=0|ijC(T)=C(G)\Big{/}\left<u_{ij}=0\Big{|}\forall i\neq j\right>

This torus is then a group dual, T=Λ^T=\widehat{\Lambda}, where Λ=<g1,,gN>\Lambda=<g_{1},\ldots,g_{N}> is the discrete group generated by the elements gi=uiig_{i}=u_{ii}, which are unitaries inside C(T)C(T).

Proof.

Since uu is unitary, its diagonal entries gi=uiig_{i}=u_{ii} are unitaries inside C(T)C(T). Moreover, from Δ(uij)=kuikukj\Delta(u_{ij})=\sum_{k}u_{ik}\otimes u_{kj} we obtain, when passing inside the quotient:

Δ(gi)=gigi\Delta(g_{i})=g_{i}\otimes g_{i}

It follows that we have C(T)=C(Λ)C(T)=C^{*}(\Lambda), modulo identifying as usual the CC^{*}-completions of the various group algebras, and so that we have T=Λ^T=\widehat{\Lambda}, as claimed. ∎

With this notion in hand, we have the following result:

Theorem 7.36.

The diagonal tori of the basic rotation groups are as follows,

\begin{matrix}U_{N}&\subset&U_{N}^{+}\\ \cup&&\cup\\ O_{N}&\subset&O_{N}^{+}\end{matrix}\qquad:\qquad\begin{matrix}\mathbb{T}^{N}&\subset&\widehat{F_{N}}\\ \cup&&\cup\\ \mathbb{Z}_{2}^{N}&\subset&\widehat{\mathbb{Z}_{2}^{*N}}\end{matrix}

where FNF_{N} is the free group on NN generators, and * is a group-theoretical free product.

Proof.

This is clear indeed for UN+U_{N}^{+}, whose diagonal torus is by definition the dual of the free group FNF_{N}, and the other results can be obtained by imposing on the generators of FNF_{N} the relations defining the corresponding quantum groups. ∎

As a conclusion to all this, the CC^{*}-algebra theory suggests developing a theory of “noncommutative geometry”, covering both the classical and the free geometry, by using compact quantum groups. We will be back to this in chapter 8.

7d. Cuntz algebras

We would like to end this chapter with an interesting class of CC^{*}-algebras, discovered by Cuntz in [cun], and heavily used since then, for various technical purposes. These algebras are not obviously related to the quantum space program that we have been developing so far, and might even look like some sort of Devil’s invention, orthogonal to what is beautiful in operator algebras, but believe me, if you plan to do some serious operator algebra work, you will certainly run into them. Their definition is as follows:

Definition 7.37.

The Cuntz algebra OnO_{n} is the CC^{*}-algebra generated by isometries S1,,SnS_{1},\ldots,S_{n} satisfying the following condition:

S1S1++SnSn=1S_{1}S_{1}^{*}+\ldots+S_{n}S_{n}^{*}=1

That is, OnB(H)O_{n}\subset B(H) is generated by nn isometries whose ranges sum up to HH.

Observe that HH must be infinite dimensional, in order to have isometries as above. In what follows we will prove that OnO_{n} is independent of the choice of such isometries, and also that this algebra is simple. We will restrict attention to the case n=2n=2, the proof in general being similar. Let us start with some simple computations, as follows:

Proposition 7.38.

Given a word i=i1iki=i_{1}\ldots i_{k} with il{1,2}i_{l}\in\{1,2\}, we associate to it the element Si=Si1SikS_{i}=S_{i_{1}}\ldots S_{i_{k}} of the algebra O2O_{2}. Then SiS_{i} are isometries, and we have

SiSj=δij1S_{i}^{*}S_{j}=\delta_{ij}1

for any two words i,ji,j having the same length.

Proof.

We use the relations defining the algebra O2O_{2}, namely:

S1S1=S2S2=1,S1S1+S2S2=1S_{1}^{*}S_{1}=S_{2}^{*}S_{2}=1\quad,\quad S_{1}S_{1}^{*}+S_{2}S_{2}^{*}=1

The fact that the SiS_{i} are isometries is clear; here is the check for i=12i=12:

S12S12=(S1S2)(S1S2)=S2S1S1S2=S2S2=1S_{12}^{*}S_{12}=(S_{1}S_{2})^{*}(S_{1}S_{2})=S_{2}^{*}S_{1}^{*}S_{1}S_{2}=S_{2}^{*}S_{2}=1

Regarding the last assertion, by induction we just have to establish the formula there for the words of length 1. That is, we want to prove the following formulae:

S1S2=S2S1=0S_{1}^{*}S_{2}=S_{2}^{*}S_{1}=0

But these two formulae follow from the fact that the projections Pi=SiSiP_{i}=S_{i}S_{i}^{*} satisfy by definition P1+P2=1P_{1}+P_{2}=1. Indeed, we have the following computation:

P_{1}+P_{2}=1\implies P_{1}P_{2}=0\implies S_{1}S_{1}^{*}S_{2}S_{2}^{*}=0\implies S_{1}^{*}S_{2}=S_{1}^{*}S_{1}S_{1}^{*}S_{2}S_{2}^{*}S_{2}=0

Thus, we have the first formula, and the proof of the second one is similar. ∎
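
A concrete model for all this: on l^{2}(\mathbb{N}), take S_{1}:e_{n}\to e_{2n} and S_{2}:e_{n}\to e_{2n+1}, the even/odd splitting of the basis. The defining relations, and the rules above, can then be checked numerically; in the sketch below the truncation to dimension 2N is of course an artifact of the computation, with S_{i}^{*}S_{i}=1 holding exactly on the first N coordinates:

import numpy as np

# Truncated even/odd model: S1 e_n = e_{2n}, S2 e_n = e_{2n+1}, cut at dimension 2N
N = 8
S1 = np.zeros((2 * N, 2 * N)); S2 = np.zeros((2 * N, 2 * N))
for k in range(N):
    S1[2 * k, k] = 1.0
    S2[2 * k + 1, k] = 1.0

print(np.allclose((S1.T @ S1)[:N, :N], np.eye(N)))        # S1* S1 = 1
print(np.allclose((S2.T @ S2)[:N, :N], np.eye(N)))        # S2* S2 = 1
print(np.allclose(S1 @ S1.T + S2 @ S2.T, np.eye(2 * N)))  # S1 S1* + S2 S2* = 1
print(np.allclose((S1.T @ S2)[:N, :N], 0))                # S1* S2 = 0, as above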

We can use the formulae in Proposition 7.38 as follows:

Proposition 7.39.

Consider words in O2O_{2}, meaning products of S1,S1,S2,S2S_{1},S_{1}^{*},S_{2},S_{2}^{*}.

  1. (1)

Each word in O2O_{2} is of the form 0 or SiSjS_{i}S_{j}^{*}, for some words i,ji,j.

  2. (2)

    Words of type SiSjS_{i}S_{j}^{*} with l(i)=l(j)=kl(i)=l(j)=k form a system of 2k×2k2^{k}\times 2^{k} matrix units.

  3. (3)

    The algebra AkA_{k} generated by matrix units in (2) is a subalgebra of Ak+1A_{k+1}.

Proof.

Here the first two assertions follow from the formulae in Proposition 7.38, and for the last assertion, we can use the following formula:

SiSj=Si1Sj=Si(S1S1+S2S2)SjS_{i}S_{j}^{*}=S_{i}1S_{j}^{*}=S_{i}(S_{1}S_{1}^{*}+S_{2}S_{2}^{*})S_{j}^{*}

Thus, we obtain an embedding of algebras AkA_{k}, as in the statement. ∎

Observe now that the embedding constructed in (3) above is compatible with the matrix unit systems in (2). Consider indeed the following diagram:

\begin{matrix}A_{k+1}&\simeq&M_{2^{k+1}}(\mathbb{C})\\ \cup&&\cup\\ A_{k}&\simeq&M_{2^{k}}(\mathbb{C})\end{matrix}

With the notation eix,yj=eijexye_{ix,yj}=e_{ij}\otimes e_{xy}, the inclusion on the right is given by:

e_{ij}\to e_{i1,1j}+e_{i2,2j}=e_{ij}\otimes e_{11}+e_{ij}\otimes e_{22}=e_{ij}\otimes 1

Thus, with standard tensor product notations, the inclusion on the right is the canonical inclusion mm1m\to m\otimes 1, and so the above diagram becomes:

\begin{matrix}A_{k+1}&\simeq&M_{2}(\mathbb{C})^{\otimes k+1}\\ \cup&&\cup\\ A_{k}&\simeq&M_{2}(\mathbb{C})^{\otimes k}\end{matrix}

The passage from the algebra A=kAkM2()A=\cup_{k}A_{k}\simeq M_{2}(\mathbb{C})^{\otimes\infty} coming from this observation to the full algebra O2O_{2} that we are interested in can be done by using:

Proposition 7.40.

Each element X<S1,S2>O2X\in<S_{1},S_{2}>\subset O_{2} decomposes as a finite sum

X=i>0S1iXi+X0+i>0XiS1iX=\sum_{i>0}S_{1}^{*i}X_{-i}+X_{0}+\sum_{i>0}X_{i}S_{1}^{i}

where each XiX_{i} is in the union AA of algebras AkA_{k}.

Proof.

By linearity and by using Proposition 7.39 we may assume that XX is a nonzero word, say X=SiSjX=S_{i}S_{j}^{*}. In the case l(i)=l(j)l(i)=l(j) we can set X0=XX_{0}=X and we are done. Otherwise, we just have to add terms of the form 1=S1S11=S_{1}^{*}S_{1}, at left or at right. For instance X=S2X=S_{2} is equal to S2S1S1S_{2}S_{1}^{*}S_{1}, and we can take X1=S2S1A1X_{1}=S_{2}S_{1}^{*}\in A_{1}. ∎

We must show now that the decomposition X(Xi)X\to(X_{i}) found above is unique, and then prove that each map XXiX\to X_{i} has good continuity properties. The following formulae show that in both problems we may restrict attention to the case i=0i=0:

X_{i+1}=(XS_{1}^{*})_{i}\quad,\quad X_{-i-1}=(S_{1}X)_{i}

In order to solve these questions, we use the following fact:

Proposition 7.41.

If PP is a nonzero projection in 𝒪2=<S1,S2>O2\mathcal{O}_{2}=<S_{1},S_{2}>\subset O_{2}, its kk-th average, given by the formula

Q=l(i)=kSiPSiQ=\sum_{l(i)=k}S_{i}PS_{i}^{*}

is a nonzero projection in 𝒪2\mathcal{O}_{2} having the property that the linear subspace QAkQQA_{k}Q is isomorphic to a matrix algebra, and YQYQY\to QYQ is an isomorphism of AkA_{k} onto it.

Proof.

We know that the words of form SiSjS_{i}S_{j}^{*} with l(i)=l(j)=kl(i)=l(j)=k are a system of matrix units in AkA_{k}. We apply to them the map YQYQY\to QYQ, and we obtain:

QS_{i}S_{j}^{*}Q=\sum_{pq}S_{p}PS_{p}^{*}S_{i}S_{j}^{*}S_{q}PS_{q}^{*}=\sum_{pq}\delta_{ip}\delta_{jq}S_{p}P^{2}S_{q}^{*}=S_{i}PS_{j}^{*}

The output being a system of matrix units, YQYQY\to QYQ is an isomorphism from the algebra of matrices AkA_{k} to another algebra of matrices QAkQQA_{k}Q, and this gives the result. ∎

Thus any map YQYQY\to QYQ behaves well on the i=0i=0 part of the decomposition of XX. It remains to find PP such that YQYQY\to QYQ destroys all the i0i\neq 0 terms, and we have here:

Proposition 7.42.

Assuming X0AkX_{0}\in A_{k}, there is a nonzero projection PAP\in A such that QXQ=QX0QQXQ=QX_{0}Q, where QQ is the kk-th average of PP.

Proof.

We want YQYQY\to QYQ to map to zero all terms in the decomposition of XX, except for X0X_{0}. Let us call M1,,Mt𝒪2AM_{1},\ldots,M_{t}\in\mathcal{O}_{2}-A the terms to be destroyed. We want the following equalities to hold, with the sum over all pairs of length kk indices:

ijSiPSiMqSjPSj=0\sum_{ij}S_{i}PS_{i}^{*}M_{q}S_{j}PS_{j}^{*}=0

The simplest way is to look for PP such that all terms of all sums are 0:

SiPSiMqSjPSj=0S_{i}PS_{i}^{*}M_{q}S_{j}PS_{j}^{*}=0

By multiplying to the left by SiS_{i}^{*} and to the right by SjS_{j}, we want to have:

PSiMqSjP=0PS_{i}^{*}M_{q}S_{j}P=0

With Nz=SiMqSjN_{z}=S_{i}^{*}M_{q}S_{j}, where zz belongs to some new index set, we want to have:

PNzP=0PN_{z}P=0

Since Nz𝒪2AN_{z}\in\mathcal{O}_{2}-A, we can write Nz=SmzSnzN_{z}=S_{m_{z}}S_{n_{z}}^{*} with l(mz)l(nz)l(m_{z})\neq l(n_{z}), and we want:

PSmzSnzP=0PS_{m_{z}}S_{n_{z}}^{*}P=0

In order to do this, we can use projections of the form P=SrSrP=S_{r}S_{r}^{*}, with rr being a suitable word. We want:

SrSrSmzSnzSrSr=0S_{r}S_{r}^{*}S_{m_{z}}S_{n_{z}}^{*}S_{r}S_{r}^{*}=0

Let KK be the biggest length of all mz,nzm_{z},n_{z}. Assume that we have fixed rr, of length bigger than KK. If the above product is nonzero then both SrSmzS_{r}^{*}S_{m_{z}} and SnzSrS_{n_{z}}^{*}S_{r} must be nonzero, which gives the following equalities of words:

r1rl(mz)=mz,r1rl(nz)=nzr_{1}\ldots r_{l(m_{z})}=m_{z}\quad,\quad r_{1}\ldots r_{l(n_{z})}=n_{z}

Assuming that these equalities hold indeed, the above product reduces as follows:

SrSrl(r)Srl(mz)+1Srl(nz)+1Srl(r)SrS_{r}S_{r_{l(r)}}^{*}\ldots S_{r_{l(m_{z})+1}}^{*}S_{r_{l(n_{z})+1}}\ldots S_{r_{l(r)}}S_{r}^{*}

Now if this product is nonzero, the middle term must be nonzero:

Srl(r)Srl(mz)+1Srl(nz)+1Srl(r)0S_{r_{l(r)}}^{*}\ldots S_{r_{l(m_{z})+1}}^{*}S_{r_{l(n_{z})+1}}\ldots S_{r_{l(r)}}\neq 0

In order for this to hold, the indices starting from the middle to the right must be equal to the indices starting from the middle to the left. Thus rr must be periodic, of period |l(mz)l(nz)|>0|l(m_{z})-l(n_{z})|>0. But avoiding this is certainly possible, because we can take any aperiodic infinite word, and let rr be the sequence of its first MM letters, with MM big enough. ∎

We can now start solving our problems. We first have:

Proposition 7.43.

The decomposition of XX is unique, and we have

||Xi||||X||||X_{i}||\leq||X||

for any ii.

Proof.

It is enough to do this for i=0i=0. But this follows from the previous result, via the following sequence of equalities and inequalities:

||X0||=||QX0Q||=||QXQ||||X||||X_{0}||=||QX_{0}Q||=||QXQ||\leq||X||

Thus we got the inequality in the statement. As for the uniqueness part, this follows from the fact that X0QX0Q=QXQX_{0}\to QX_{0}Q=QXQ is an isomorphism. ∎

Remember now we want to prove that the Cuntz algebra O2O_{2} does not depend on the choice of the isometries S1,S2S_{1},S_{2}. In order to do so, let 𝒪¯2\overline{\mathcal{O}}_{2} be the completion of the *-algebra 𝒪2=<S1,S2>O2\mathcal{O}_{2}=<S_{1},S_{2}>\subset O_{2} with respect to the biggest CC^{*}-norm. We have:

Proposition 7.44.

We have the equivalence

X=0Xi=0,iX=0\iff X_{i}=0,\forall i

valid for any element X𝒪¯2X\in\overline{\mathcal{O}}_{2}.

Proof.

Assume Xi=0X_{i}=0 for any ii, and choose a sequence XkXX^{k}\to X with Xk𝒪2X^{k}\in\mathcal{O}_{2}. For λ𝕋\lambda\in\mathbb{T} we define a representation ρλ\rho_{\lambda} in the following way:

ρλ:SiλSi\rho_{\lambda}:S_{i}\to\lambda S_{i}

We have then ρλ(Y)=Y\rho_{\lambda}(Y)=Y for any element YAY\in A. We fix norm one vectors ξ,η\xi,\eta and we consider the following continuous functions f:𝕋f:\mathbb{T}\to\mathbb{C}:

fk(λ)=<ρλ(Xk)ξ,η>f^{k}(\lambda)=<\rho_{\lambda}(X^{k})\xi,\eta>

From XkXX^{k}\to X we get, with respect to the usual sup norm of C(𝕋)C(\mathbb{T}):

fkff^{k}\to f

Each Xk𝒪2X^{k}\in\mathcal{O}_{2} can be decomposed, and fkf^{k} is given by the following formula:

f^{k}(\lambda)=\sum_{i>0}\lambda^{-i}<S_{1}^{*i}X^{k}_{-i}\xi,\eta>+<X^{k}_{0}\xi,\eta>+\sum_{i>0}\lambda^{i}<X^{k}_{i}S_{1}^{i}\xi,\eta>

This is a Fourier type expansion of fkf^{k}, that we can write in the following way:

fk(λ)=j=ajkλjf^{k}(\lambda)=\sum_{j=-\infty}^{\infty}a_{j}^{k}\lambda^{j}

By using Proposition 7.43 we obtain that with kk\to\infty, we have:

|ajk|||Xjk||||Xj||=0|a_{j}^{k}|\leq||X_{j}^{k}||\to||X_{j}^{\infty}||=0

On the other hand we have ajkaja_{j}^{k}\to a_{j} with kk\to\infty. Thus all Fourier coefficients aja_{j} of ff are zero, so f=0f=0. With λ=1\lambda=1 this gives the following equality:

<Xξ,η>=0<X\xi,\eta>=0

This is true for arbitrary norm one vectors ξ,η\xi,\eta, so X=0X=0 and we are done. ∎

We can now formulate the Cuntz theorem, from [cun], as follows:

Theorem 7.45 (Cuntz).

Let S1,S2S_{1},S_{2} be isometries satisfying S1S1+S2S2=1S_{1}S_{1}^{*}+S_{2}S_{2}^{*}=1.

  1. (1)

    The CC^{*}-algebra O2O_{2} generated by S1,S2S_{1},S_{2} does not depend on the choice of S1,S2S_{1},S_{2}.

  2. (2)

    For any nonzero XO2X\in O_{2} there are A,BO2A,B\in O_{2} with AXB=1AXB=1.

  3. (3)

    In particular O2O_{2} is simple.

Proof.

This basically follows from the various results established above:

(1) Consider the canonical projection map π:O¯2O2\pi:\overline{O}_{2}\to O_{2}. We know that π\pi is surjective, and we will prove now that π\pi is injective. Indeed, if π(X)=0\pi(X)=0 then π(X)i=0\pi(X)_{i}=0 for any ii. But π(X)i\pi(X)_{i} is in the dense *-algebra AA, so it can be regarded as an element of O¯2\overline{O}_{2}, and with this identification, we have π(X)i=Xi\pi(X)_{i}=X_{i} in O¯2\overline{O}_{2}. Thus Xi=0X_{i}=0 for any ii, so X=0X=0. Thus π\pi is an isomorphism. On the other hand O¯2\overline{O}_{2} depends only on 𝒪2\mathcal{O}_{2}, and the above formulae in 𝒪2\mathcal{O}_{2}, for algebraic calculus and for decomposition of an arbitrary X𝒪2X\in\mathcal{O}_{2}, show that 𝒪2\mathcal{O}_{2} does not depend on the choice of S1,S2S_{1},S_{2}. Thus, we obtain the result.

(2) Choose a sequence XkXX^{k}\to X with Xk𝒪2X^{k}\in\mathcal{O}_{2}. We have the following formula:

(XX)0=limk(i>0XkiXki+Xk0Xk0+i>0S1iXkiXkiS1i)(X^{*}X)_{0}=\lim_{k\to\infty}\left(\sum_{i>0}X^{k*}_{-i}X^{k}_{-i}+X^{k*}_{0}X^{k}_{0}+\sum_{i>0}S_{1}^{*i}X^{k*}_{i}X^{k}_{i}S_{1}^{i}\right)

Thus X0X\neq 0 implies (XX)00(X^{*}X)_{0}\neq 0. By linearity we can assume that we have:

||(XX)0||=1||(X^{*}X)_{0}||=1

Now choose a positive element Y𝒪2Y\in\mathcal{O}_{2} which is close enough to XXX^{*}X:

||XXY||<ε||X^{*}X-Y||<\varepsilon

Since ZZ0Z\to Z_{0} is norm decreasing, we have the following estimate:

||Y0||>1ε||Y_{0}||>1-\varepsilon

We apply Proposition 7.42 to our positive element Y𝒪2Y\in\mathcal{O}_{2}. We obtain in this way a certain projection QQ such that QY0Q=QYQQY_{0}Q=QYQ belongs to a certain matrix algebra. We have QYQ>0QYQ>0, so we can diagonalize this latter element, as follows:

QYQ=λiRiQYQ=\sum\lambda_{i}R_{i}

Here λi\lambda_{i} are positive numbers and RiR_{i} are minimal projections in the matrix algebra. Now since ||QYQ||=||Y0||||QYQ||=||Y_{0}||, there must be an eigenvalue greater than 1ε1-\varepsilon:

λ0>1ε\lambda_{0}>1-\varepsilon

By linear algebra, we can pass from the minimal projection R0R_{0} corresponding to λ0\lambda_{0} to another minimal projection, via a partial isometry UU:

U^{*}U=R_{0}\quad,\quad UU^{*}=S_{1}^{k}S_{1}^{*k}

The element B=QUS1kB=QU^{*}S_{1}^{k} has norm 1\leq 1, and we get the following inequality:

||1-B^{*}X^{*}XB||\leq||1-B^{*}YB||+||B^{*}YB-B^{*}X^{*}XB||<||1-B^{*}YB||+\varepsilon

The last term can be computed by using the diagonalization of QYQQYQ, as follows:

B^{*}YB=S_{1}^{*k}UQYQU^{*}S_{1}^{k}=S_{1}^{*k}\left(\sum\lambda_{i}UR_{i}U^{*}\right)S_{1}^{k}=\lambda_{0}S_{1}^{*k}S_{1}^{k}S_{1}^{*k}S_{1}^{k}=\lambda_{0}

From λ0>1ε\lambda_{0}>1-\varepsilon we get ||1BYB||<ε||1-B^{*}YB||<\varepsilon, and we obtain the following estimate:

||1BXXB||<2ε||1-B^{*}X^{*}XB||<2\varepsilon

Thus BXXBB^{*}X^{*}XB is invertible, say with inverse CC, and we have (BX)X(BC)=1(B^{*}X^{*})X(BC)=1.

(3) This is clear from the formula AXB=1AXB=1 established in (2). ∎

7e. Exercises

We have seen many things in this chapter, and there are many potential exercises, on all this. We will however be short, and as our unique, key exercise, we have:

Exercise 7.46.

Work out the proof of the existence result for the Haar measure on a compact group GG, as a particular case of the result proved for quantum groups.

This is of course something very standard, the problem being that of eliminating algebras, linear forms and other functional analysis notions from the proof given for the quantum groups, so as to have in the end something talking about spaces, and measures on them.

Chapter 8 Geometric aspects

8a. Topology, K-theory

This chapter is a continuation of the previous one, meant to be a grand finale to the CC^{*}-algebra theory that we started to develop there, before getting back to more traditional von Neumann algebra material, following Murray, von Neumann and others. There are countless things to be said, and possible paths to be taken. En hommage to Connes, and his book [co3], which is probably the finest ever on CC^{*}-algebras, we will adopt a geometric viewpoint. To be more precise, we know that a CC^{*}-algebra is a beast of type A=C(X)A=C(X), with XX being a compact quantum space. So, it is the “geometry” of XX that we would like to talk about, everything else being rather of administrative nature.


Let us first look at the classical case, where XX is a usual compact space. You might say right away that this is the wrong way, and that what we need for doing geometry is a manifold. But my answer here is modesty, and no hurry. It is true that you cannot do much geometry with a compact space XX, but you can do some, and we have here, for instance:

Definition 8.1.

Given a compact space XX, its first KK-theory group K0(X)K_{0}(X) is the group of formal differences of complex vector bundles over XX.

This notion is quite interesting, and we can talk in fact about higher KK-theory groups Kn(X)K_{n}(X) as well, and all this is related to the homotopy groups πn(X)\pi_{n}(X) too. There are many non-trivial results on the subject, the end of the game being of course that of understanding the “shape” of XX, which you need to know a bit about before getting into serious geometry, in the case where XX happens to be a manifold.


As a question for us now, operator algebra theorists, we have:

Question 8.2.

Can we talk about the first KK-theory group K0(X)K_{0}(X) of a compact quantum space XX?

We will see that this is a quite subtle question. To be more precise, we will see that we can talk, in a quite straightforward way, of the group K0(A)K_{0}(A) of an arbitrary CC^{*}-algebra AA, which is constructed as to have K0(A)=K0(X)K_{0}(A)=K_{0}(X) in the commutative case, where A=C(X)A=C(X), with XX being a usual compact space. In the noncommutative case, however, K0(A)K_{0}(A) will sometimes depend on the choice of AA satisfying A=C(X)A=C(X), and so all this will eventually lead to a sort of dead end, and to a rather “no” answer to Question 8.2.


Getting started now, in order to talk about the first KK-theory group K0(A)K_{0}(A) of an arbitrary CC^{*}-algebra AA, we will need the following simple fact:

Proposition 8.3.

Given a CC^{*}-algebra AA, the finitely generated projective AA-modules EE appear via quotient maps f:AnEf:A^{n}\to E, so are of the form

E=pAnE=pA^{n}

with pMn(A)p\in M_{n}(A) being an idempotent. In the commutative case, A=C(X)A=C(X) with XX classical, these AA-modules consist of sections of the complex vector bundles over XX.

Proof.

Here the first assertion is clear from definitions, via some standard algebra, and the second assertion, which is the content of the Serre-Swan theorem, comes from definitions too, again via some standard algebra. ∎

With this in hand, let us go back to Definition 8.1. Given a compact space XX, it is now clear that its KK-theory group K0(X)K_{0}(X) can be recaptured from the knowledge of the associated CC^{*}-algebra A=C(X)A=C(X), and to be more precise we have K0(X)=K0(A)K_{0}(X)=K_{0}(A), when the first KK-theory group of an arbitrary CC^{*}-algebra is constructed as follows:

Definition 8.4.

The first KK-theory group of a CC^{*}-algebra AA is the group of formal differences

K0(A)={pq}K_{0}(A)=\big{\{}p-q\big{\}}

of equivalence classes of projections pMn(A)p\in M_{n}(A), with the equivalence being given by

pqu,uu=p,uu=qp\sim q\iff\exists u,uu^{*}=p,u^{*}u=q

and with the additive structure being the obvious one, by diagonal concatenation.

This is very nice, and as a first example, we have K0()=K_{0}(\mathbb{C})=\mathbb{Z}. More generally, as already mentioned above, it follows from Proposition 8.3 that in the commutative case, where A=C(X)A=C(X) with XX being a compact space, we have K0(A)=K0(X)K_{0}(A)=K_{0}(X). Observe also that we have, by definition, the following formula, valid for any nn\in\mathbb{N}:

K0(A)=K0(Mn(A))K_{0}(A)=K_{0}(M_{n}(A))

Some further elementary observations include the fact that K0K_{0} behaves well with respect to direct sums and inductive limits, and also that K0K_{0} is a homotopy invariant, and for details here, we refer to any introductory book on the subject, such as [bla].
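
As an illustration of the equivalence relation in Definition 8.4, for A=\mathbb{C} the projections p\in M_{n}(\mathbb{C}) are classified up to equivalence by their rank, which gives K_{0}(\mathbb{C})=\mathbb{Z}. The partial isometry uu implementing the equivalence of two same-rank projections can be built explicitly, as in the sketch below; the sizes and seeds are arbitrary, with numpy assumed:

import numpy as np

def projection_and_isometry(n, r, seed):
    # Rank r orthogonal projection p = v v*, with v an n x r isometry
    v, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(n, r)))
    return v @ v.T, v

p, vp = projection_and_isometry(6, 2, 0)
q, vq = projection_and_isometry(6, 2, 1)

# u = vp vq* satisfies uu* = p and u*u = q, so [p] = [q] in K_0
u = vp @ vq.T
print(np.allclose(u @ u.T, p), np.allclose(u.T @ u, q))   # True True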


In what concerns us, back to our Question 8.2, what has been said above is certainly not enough for investigating our question, and we need more examples. However, these examples are not easy to find, and for getting them, we need more theory. We have:

Definition 8.5.

The second KK-theory group K1(A)K_{1}(A) of a CC^{*}-algebra AA is the group of connected components of GL(A)GL_{\infty}(A), or equivalently of its unitary group, with

GLn(A)GLn+1(A),a(a001)GL_{n}(A)\subset GL_{n+1}(A)\quad,\quad a\to\begin{pmatrix}a&0\\ 0&1\end{pmatrix}

being the embeddings producing the inductive limit GL(A)GL_{\infty}(A).

Again, for a basic example we can take A=A=\mathbb{C}, and we have here K1()={1}K_{1}(\mathbb{C})=\{1\}, trivially. In fact, in the commutative case, where A=C(X)A=C(X), with XX being a usual compact space, it is possible to establish a formula of type K1(A)=K1(X)K_{1}(A)=K_{1}(X). Further elementary observations include the fact that K1K_{1} behaves well with respect to direct sums and with inductive limits, and also that K1K_{1} is a homotopy invariant.


Importantly, the first and second KK-theory groups are related, as follows:

Theorem 8.6.

Given a CC^{*}-algebra AA, we have isomorphisms as follows, with

SA={fC([0,1],A)|f(0)=0}SA=\left\{f\in C([0,1],A)\Big{|}f(0)=0\right\}

standing for the suspension operation for the CC^{*}-algebras:

  1. (1)

    K1(A)=K0(SA)K_{1}(A)=K_{0}(SA).

  2. (2)

    K0(A)=K1(SA)K_{0}(A)=K_{1}(SA).

Proof.

Here the isomorphism in (1) is something rather elementary, and the isomorphism in (2) is something more complicated. In both cases, the idea is to start first with the commutative case, where A=C(X)A=C(X) with XX being a compact space, and understand there the isomorphisms (1,2), called Bott periodicity isomorphisms. Then, with this understood, the extension to the general CC^{*}-algebra case is quite straightforward. ∎

The above result is quite interesting, making it clear that the groups K0,K1K_{0},K_{1} are of the same nature. In fact, it is possible to be a bit more abstract here, and talk in various clever ways about the higher KK-theory groups, Kn(A)K_{n}(A) with nn\in\mathbb{N}, of an arbitrary CC^{*}-algebra, with the result that these higher KK-theory groups are subject to Bott periodicity:

Kn(A)=Kn+2(A)K_{n}(A)=K_{n+2}(A)

However, in practice, this leads us back to Definition 8.4, Definition 8.5 and Theorem 8.6, with these statements containing in fact all we need to know, at n=0,1n=0,1.


Going ahead with examples, following Cuntz [cun] and related papers, we have:

Theorem 8.7.

The KK-theory groups of the Cuntz algebra OnO_{n} are given by

K0(On)=n1,K1(On)={1}K_{0}(O_{n})=\mathbb{Z}_{n-1}\quad,\quad K_{1}(O_{n})=\{1\}

with the equivalent projections Pi=SiSiP_{i}=S_{i}S_{i}^{*} standing for the standard generator of n1\mathbb{Z}_{n-1}.

Proof.

We recall that the Cuntz algebra OnO_{n} is generated by isometries S1,,SnS_{1},\ldots,S_{n} satisfying S1S1++SnSn=1S_{1}S_{1}^{*}+\ldots+S_{n}S_{n}^{*}=1. Since we have SiSi=1S_{i}^{*}S_{i}=1, with Pi=SiSiP_{i}=S_{i}S_{i}^{*}, we have:

P1Pn1P_{1}\sim\ldots\sim P_{n}\sim 1

On the other hand, we also know that we have P1++Pn=1P_{1}+\ldots+P_{n}=1, and the conclusion is that, in the first KK-theory group K0(On)K_{0}(O_{n}), the following equality happens:

n[1]=[1]n[1]=[1]

Thus (n1)[1]=0(n-1)[1]=0, and it is quite elementary to prove that k[1]=0k[1]=0 happens in fact precisely when kk is a multiple of n1n-1. Thus, we have a group embedding, as follows:

n1K0(On)\mathbb{Z}_{n-1}\subset K_{0}(O_{n})

The whole point now is that of proving that this group embedding is an isomorphism, which in practice amounts to proving that any projection in OnO_{n} is equivalent to a sum of the form P1++PkP_{1}+\ldots+P_{k}, with Pi=SiSiP_{i}=S_{i}S_{i}^{*} as above. This is something non-trivial, requiring the use of Bott periodicity, and the consideration of the second KK-theory group K1(On)K_{1}(O_{n}) as well, and for details here, we refer to Cuntz [cun] and related papers. ∎

The above result is very interesting, for various reasons. First, it shows that the structure of the first KK-theory groups K0(A)K_{0}(A) of arbitrary CC^{*}-algebras can be more complicated than that of the first KK-theory groups K0(X)K_{0}(X) of the usual compact spaces XX, with the group K0(A)K_{0}(A) being for instance not ordered, in the case A=OnA=O_{n}, and with this being the first in a series of no-go observations that can be formulated.


Second, and on a positive note now, what we have in Theorem 8.7 is a true noncommutative computation, dealing with an algebra which is rather of “free” type. The outcome of the computation is something nice and clear, suggesting that, modulo the small technical issues mentioned above, we are on our way to developing a nice theory, and that the answer to Question 8.2 might be “yes”. However, as bad news, we have:

Theorem 8.8.

There are discrete groups Γ\Gamma having the property that the projection

π:C(Γ)Cred(Γ)\pi:C^{*}(\Gamma)\to C^{*}_{red}(\Gamma)

is not an isomorphism, at the level of KK-theory groups.

Proof.

For constructing such a counterexample, the group Γ\Gamma must be definitely non-amenable, and the first thought goes to the free group F2F_{2}. But it is possible to prove that F2F_{2} is KK-amenable, in the sense that π\pi is an isomorphism at the KK-theory level. However, counterexamples do exist, such as the infinite groups Γ\Gamma having Kazhdan’s property (T)(T). Indeed, for such a group the associated Kazhdan projection pC(Γ)p\in C^{*}(\Gamma) defines a nonzero class in K0(C(Γ))K_{0}(C^{*}(\Gamma)), while mapping to the zero element 0K0(Cred(Γ))0\in K_{0}(C^{*}_{red}(\Gamma)), so we have our counterexample. ∎

As a conclusion to all this, which might seem a bit disappointing, we have:

Conclusion 8.9.

The answer to Question 8.2 is no.

Of course, the answer to Question 8.2 remains “yes” in many cases, the general idea being that, as long as we don’t get too far away from the classical case, the answer remains “yes”, so we can talk about the KK-theory groups of our compact quantum spaces XX, and also, about countless other invariants inspired from the classical theory. For a survey of what can be done here, including applications too, we refer to Connes’ book [co3].


In what concerns us, however, we will not take this path. For various reasons, coming from certain quantum physics beliefs, which can be informally summarized as “at sufficiently tiny scales, freeness rules”, we will rather be interested, in this book, in compact quantum spaces XX which are of “free” type, and we will only accept geometric invariants for them which are well-defined. And KK-theory, unfortunately, does not qualify.

8b. Free probability

As a solution to the difficulties met in the previous section, let us turn to probability. This is surely not geometry in a standard sense, but at a more advanced level it does become geometry. For instance, if you have a quantum manifold XX, and you want to talk about its Laplacian, or its Dirac operator, you will certainly need to know a bit about L2(X)L^{2}(X). And isn’t advanced measure theory the same thing as probability theory? Hope we agree on this.


Let us start our discussion with something that we know since chapter 5:

Definition 8.10.

Let AA be a CC^{*}-algebra, given with a trace tr:Atr:A\to\mathbb{C}.

  1. (1)

    The elements aAa\in A are called random variables.

  2. (2)

    The moments of such a variable are the numbers Mk(a)=tr(ak)M_{k}(a)=tr(a^{k}).

  3. (3)

    The law of such a variable is the functional μ:Ptr(P(a))\mu:P\to tr(P(a)).

Here the exponent k=k=\circ\bullet\bullet\circ\ldots is as before a colored integer, with the powers aka^{k} being defined by multiplicativity and the usual formulae, namely:

a=1,a=a,a=aa^{\emptyset}=1\quad,\quad a^{\circ}=a\quad,\quad a^{\bullet}=a^{*}

As for the polynomial PP, this is a noncommuting *-polynomial in one variable:

P<X,X>P\in\mathbb{C}<X,X^{*}>
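Before getting further into the theory, and as a side remark, here is a quick computer illustration of the above definition, as a minimal Python sketch, with our own conventions: we take A=M_N(\mathbb{C}) with its normalized trace, and compute a few colored moments of a random variable.

# A sketch illustrating Definition 8.10: take A = M_N(C), with its
# normalized trace tr = Tr/N, and compute colored moments M_k(a) = tr(a^k),
# with 'o' standing for the white symbol (a) and 'b' for the black one (a*).
import numpy as np

N = 4
rng = np.random.default_rng(0)
a = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

def tr(x):
    return np.trace(x) / N          # the normalized trace on M_N(C)

def moment(a, word):
    x = np.eye(N, dtype=complex)
    for c in word:
        x = x @ (a if c == 'o' else a.conj().T)
    return tr(x)

print(moment(a, 'o'))               # tr(a)
print(moment(a, 'ob'))              # tr(aa*), a positive number
print(moment(a, 'obbo'))            # a longer colored moment, tr(a a*a* a)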

Generally speaking, the above definition is something quite abstract, but there is no other way of doing things, at least at this level of generality. However, in the special case where our variable aAa\in A is self-adjoint, or more generally normal, we have:

Proposition 8.11.

The law of a normal variable aAa\in A can be identified with the corresponding spectral measure μ𝒫()\mu\in\mathcal{P}(\mathbb{C}), according to the following formula,

tr(f(a))=σ(a)f(x)dμ(x)tr(f(a))=\int_{\sigma(a)}f(x)d\mu(x)

valid for any fL(σ(a))f\in L^{\infty}(\sigma(a)), coming from the measurable functional calculus. In the self-adjoint case the spectral measure is real, μ𝒫()\mu\in\mathcal{P}(\mathbb{R}).

Proof.

This is something that we again know well, either from chapter 5, or simply from chapter 3, coming from the spectral theorem for normal operators. ∎

Let us discuss now independence, and its noncommutative versions. As a starting point, we have the following update of the classical notion of independence:

Definition 8.12.

We call two subalgebras B,CAB,C\subset A independent when the following condition is satisfied, for any xBx\in B and yCy\in C:

tr(xy)=tr(x)tr(y)tr(xy)=tr(x)tr(y)

Equivalently, the following condition must be satisfied, for any xBx\in B and yCy\in C:

tr(x)=tr(y)=0tr(xy)=0tr(x)=tr(y)=0\implies tr(xy)=0

Also, b,cAb,c\in A are called independent when B=<b>B=<b> and C=<c>C=<c> are independent.

It is possible to develop some theory here, but this leads to the usual CLT. As a much more interesting notion now, we have Voiculescu’s freeness [vo1]:

Definition 8.13.

Given a pair (A,tr)(A,tr), we call two subalgebras B,CAB,C\subset A free when the following condition is satisfied, for any xiBx_{i}\in B and yiCy_{i}\in C:

tr(xi)=tr(yi)=0tr(x1y1x2y2)=0tr(x_{i})=tr(y_{i})=0\implies tr(x_{1}y_{1}x_{2}y_{2}\ldots)=0

Also, b,cAb,c\in A are called free when B=<b>B=<b> and C=<c>C=<c> are free.

As a first observation, there is a certain lack of symmetry between Definition 8.12 and Definition 8.13, because the latter does not include an explicit formula for quantities of type tr(x1y1x2y2)tr(x_{1}y_{1}x_{2}y_{2}\ldots). But this can be done, the precise result being as follows:

Proposition 8.14.

If B,CAB,C\subset A are free, the restriction of trtr to <B,C><B,C> can be computed in terms of the restrictions of trtr to B,CB,C. To be more precise, we have

tr(x1y1x2y2)=P({tr(xi1xi2)}i,{tr(yj1yj2)}j)tr(x_{1}y_{1}x_{2}y_{2}\ldots)=P\Big{(}\{tr(x_{i_{1}}x_{i_{2}}\ldots)\}_{i},\{tr(y_{j_{1}}y_{j_{2}}\ldots)\}_{j}\Big{)}

where PP is a certain polynomial, depending on the length of x1y1x2y2x_{1}y_{1}x_{2}y_{2}\ldots\,, having as variables the traces of products xi1xi2x_{i_{1}}x_{i_{2}}\ldots and yj1yj2y_{j_{1}}y_{j_{2}}\ldots\,, with i1<i2<i_{1}<i_{2}<\ldots and j1<j2<j_{1}<j_{2}<\ldots

Proof.

With x=xtr(x)x^{\prime}=x-tr(x), we can start our computation as follows:

tr(x_{1}y_{1}x_{2}y_{2}\ldots)=tr\big{[}(x_{1}^{\prime}+tr(x_{1}))(y_{1}^{\prime}+tr(y_{1}))(x_{2}^{\prime}+tr(x_{2}))\ldots\big{]}
=tr(x_{1}^{\prime}y_{1}^{\prime}x_{2}^{\prime}y_{2}^{\prime}\ldots)+{\rm other\ terms}
={\rm other\ terms}

Thus, we are led to a kind of recurrence, and this gives the result. ∎
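As an illustration for this recurrence, here are its simplest instances, which are standard, and worth recording: for b,cAb,c\in A free, we have

tr(bc)=tr(b)tr(c)

tr(bcbc)=tr(b^{2})tr(c)^{2}+tr(b)^{2}tr(c^{2})-tr(b)^{2}tr(c)^{2}

Indeed, with b=b^{\prime}+tr(b) and c=c^{\prime}+tr(c), all the terms containing an alternating product of centered variables vanish, by freeness, and collecting the surviving terms gives the above two formulae.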

Let us discuss now some examples of independence and freeness. We first have the following result, from [vo1], which is something elementary:

Proposition 8.15.

Given two algebras (A,tr)(A,tr) and (B,tr)(B,tr), the following hold:

  1. (1)

    A,BA,B are independent inside their tensor product ABA\otimes B, endowed with its canonical tensor product trace, given on basic tensors by tr(ab)=tr(a)tr(b)tr(a\otimes b)=tr(a)tr(b).

  2. (2)

    A,BA,B are free inside their free product ABA*B, endowed with its canonical free product trace, given by the formulae in Proposition 8.14.

Proof.

Both the assertions are indeed clear from definitions, with just some standard discussion needed for (2), in connection with the free product trace. See [vo1]. ∎

More concretely now, we have the following result, also from Voiculescu [vo1]:

Proposition 8.16.

We have the following results, valid for group algebras:

  1. (1)

    L(Γ),L(Λ)L(\Gamma),L(\Lambda) are independent inside L(Γ×Λ)L(\Gamma\times\Lambda).

  2. (2)

    L(Γ),L(Λ)L(\Gamma),L(\Lambda) are free inside L(ΓΛ)L(\Gamma*\Lambda).

Proof.

In order to prove these results, we can use the general results in Proposition 8.15, along with the following two isomorphisms, which are both standard:

L(Γ×Λ)=L(Λ)L(Γ),L(ΓΛ)=L(Λ)L(Γ)L(\Gamma\times\Lambda)=L(\Lambda)\otimes L(\Gamma)\quad,\quad L(\Gamma*\Lambda)=L(\Lambda)*L(\Gamma)

Alternatively, we can check the independence and freeness formulae on group elements, which is something trivial, and then conclude by linearity. See [vo1]. ∎

We have already seen limiting theorems in classical probability, in chapter 6. In order to deal now with freeness, let us develop some tools. First, we have:

Proposition 8.17.

We have a well-defined operation \boxplus, given by

μaμb=μa+b\mu_{a}\boxplus\mu_{b}=\mu_{a+b}

with a,ba,b being free, called free convolution.

Proof.

We need to check here that if a,ba,b are free, then the distribution μa+b\mu_{a+b} depends only on the distributions μa,μb\mu_{a},\mu_{b}. But for this purpose, we can use the formula in Proposition 8.14. Indeed, by plugging in arbitrary powers of a,ba,b as variables xi,yjx_{i},y_{j}, we obtain a family of formulae of the following type, with PP being certain polynomials:

tr(ak1bl1ak2bl2)=P({tr(ak)}k,{tr(bl)}l)tr(a^{k_{1}}b^{l_{1}}a^{k_{2}}b^{l_{2}}\ldots)=P\Big{(}\{tr(a^{k})\}_{k},\{tr(b^{l})\}_{l}\Big{)}

Thus the moments of a+ba+b depend only on the moments of a,ba,b, and the same argument shows that the same holds for *-moments, and this gives the result. ∎

In order to advance now, we would need an analogue of the Fourier transform, or rather of the log of the Fourier transform. Quite remarkably, such a transform exists indeed, the precise result here, due to Voiculescu [vo1], being as follows:

Theorem 8.18.

Given a probability measure μ\mu, define its RR-transform as follows:

Gμ(ξ)=dμ(t)ξtGμ(Rμ(ξ)+1ξ)=ξG_{\mu}(\xi)=\int_{\mathbb{R}}\frac{d\mu(t)}{\xi-t}\implies G_{\mu}\left(\ R_{\mu}(\xi)+\frac{1}{\xi}\right)=\xi

The free convolution operation is then linearized by the RR-transform.

Proof.

This is something quite tricky, the idea being as follows:

(1) In order to model the free convolution, the best is to use creation operators on free Fock spaces, corresponding to the semigroup von Neumann algebras L(k)L(\mathbb{N}^{*k}). Indeed, we have some freeness here, a bit in the same way as in the free group algebras L(Fk)L(F_{k}).

(2) The point now, motivating this choice, is that the variables of type S+f(S)S^{*}+f(S), with SL()S\in L(\mathbb{N}) being the shift, and with f[X]f\in\mathbb{C}[X] being an arbitrary polynomial, are easily seen to model in moments all the possible distributions μ:[X]\mu:\mathbb{C}[X]\to\mathbb{C}.

(3) Now let f,g[X]f,g\in\mathbb{C}[X] and consider the variables S+f(S)S^{*}+f(S) and T+g(T)T^{*}+g(T), where S,TL()S,T\in L(\mathbb{N}*\mathbb{N}) are the shifts corresponding to the generators of \mathbb{N}*\mathbb{N}. These variables are free, and by using a 4545^{\circ} argument, their sum has the same law as S+(f+g)(S)S^{*}+(f+g)(S).

(4) Thus the operation μf\mu\to f linearizes the free convolution. We are therefore left with a computation inside L()L(\mathbb{N}), which is elementary, and whose conclusion is that Rμ=fR_{\mu}=f can be recaptured from μ\mu via the Cauchy transform GμG_{\mu}, as in the statement. ∎

With the above linearization technology in hand, we can now establish the following remarkable free analogue of the CLT, also due to Voiculescu [vo1]:

Theorem 8.19 (Free CLT).

Given self-adjoint variables x1,x2,x3,,x_{1},x_{2},x_{3},\ldots, which are f.i.d., that is, free and identically distributed, centered, with variance t>0t>0, we have, with nn\to\infty, in moments,

1ni=1nxiγt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_{i}\sim\gamma_{t}

where γt=12πt4tx2dx\gamma_{t}=\frac{1}{2\pi t}\sqrt{4t-x^{2}}dx is the Wigner semicircle law of parameter tt.

Proof.

We follow the same idea as in the proof of the CLT:

(1) At t=1t=1, the RR-transform of the variable in the statement can be computed by using the linearization property from Theorem 8.18, and is given by:

R(\xi)=\sqrt{n}\,R_{x}\left(\frac{\xi}{\sqrt{n}}\right)\simeq\xi

(2) On the other hand, some standard computations show that the Cauchy transform of the Wigner law γ1\gamma_{1} satisfies the following equation:

Gγ1(ξ+1ξ)=ξG_{\gamma_{1}}\left(\xi+\frac{1}{\xi}\right)=\xi

Thus, by using Theorem 8.18, we have the following formula:

Rγ1(ξ)=ξR_{\gamma_{1}}(\xi)=\xi

(3) We conclude that the laws in the statement have the same RR-transforms, and so they are equal. The passage to the general case, t>0t>0, is routine, by dilation. ∎
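As a side check of all this, here is a minimal numerical sketch, in Python, assuming numpy and scipy available, verifying the above Cauchy transform identity, and also the well-known fact that the even moments of γ1\gamma_{1} are the Catalan numbers:

# A sketch checking that G(z) = (z - sqrt(z^2-4))/2, the Cauchy transform
# of gamma_1, satisfies G(xi + 1/xi) = xi for 0 < xi < 1, so R(xi) = xi,
# and that the even moments of gamma_1 are the Catalan numbers.
import numpy as np
from scipy.integrate import quad
from math import comb

def G(z):
    return (z - np.sqrt(z**2 - 4)) / 2

xi = 0.3                                   # any 0 < xi < 1 works here
print(G(xi + 1/xi), "vs", xi)              # both equal to 0.3

for k in range(1, 5):                      # moments vs Catalan numbers
    m, _ = quad(lambda x: x**(2*k) * np.sqrt(4 - x**2) / (2*np.pi), -2, 2)
    print(round(m, 6), comb(2*k, k) // (k + 1))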

In the complex case now, we have a similar result, also from [vo1], as follows:

Theorem 8.20 (Free CCLT).

Given random variables x1,x2,x3,x_{1},x_{2},x_{3},\ldots which are f.i.d., centered, with variance t>0t>0, we have, with nn\to\infty, in moments,

1ni=1nxiΓt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_{i}\sim\Gamma_{t}

where Γt=law((a+ib)/2)\Gamma_{t}=law\big{(}(a+ib)/\sqrt{2}\big{)}, with a,ba,b being free, each following the Wigner semicircle law γt\gamma_{t}, is the Voiculescu circular law of parameter tt.

Proof.

This follows indeed from the free CLT, established before, simply by taking real and imaginary parts of all the variables involved. ∎

Now that we are done with the basic results in continuous case, let us discuss as well the discrete case. We can establish a free version of the PLT, as follows:

Theorem 8.21 (Free PLT).

The following limit converges, for any t>0t>0,

limn((1tn)δ0+tnδ1)n\lim_{n\to\infty}\left(\left(1-\frac{t}{n}\right)\delta_{0}+\frac{t}{n}\delta_{1}\right)^{\boxplus n}

and we obtain the Marchenko-Pastur law of parameter tt,

πt=max(1t,0)δ0+4t(x1t)22πxdx\pi_{t}=\max(1-t,0)\delta_{0}+\frac{\sqrt{4t-(x-1-t)^{2}}}{2\pi x}\,dx

also called free Poisson law of parameter tt.

Proof.

Let μ\mu be the measure in the statement, appearing under the convolution sign. The Cauchy transform of this measure is elementary to compute, given by:

Gμ(ξ)=(1tn)1ξ+tn1ξ1G_{\mu}(\xi)=\left(1-\frac{t}{n}\right)\frac{1}{\xi}+\frac{t}{n}\cdot\frac{1}{\xi-1}

By using Theorem 8.18, we want to compute the following RR-transform:

R=Rμn(y)=nRμ(y)R=R_{\mu^{\boxplus n}}(y)=nR_{\mu}(y)

We know that the equation for this function RR is as follows:

(1tn)1y1+R/n+tn1y1+R/n1=y\left(1-\frac{t}{n}\right)\frac{1}{y^{-1}+R/n}+\frac{t}{n}\cdot\frac{1}{y^{-1}+R/n-1}=y

With nn\to\infty we obtain from this the following formula:

R=t1yR=\frac{t}{1-y}

But this is the RR-transform of πt\pi_{t}, as one can check via some calculus, so we are done. ∎
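Again, as a side remark, this law can be checked on a computer. Here is a minimal Python sketch, assuming scipy available, using the standard fact that the moments of πt\pi_{t} are Mk(πt)=jN(k,j)tjM_{k}(\pi_{t})=\sum_{j}N(k,j)t^{j}, with N(k,j)N(k,j) being the Narayana numbers, counting the noncrossing partitions of kk points into jj blocks:

# A sketch checking the moments of the Marchenko-Pastur law pi_t against
# the Narayana polynomial formula m_k = sum_j Narayana(k,j) t^j, which
# encodes the fact that all free cumulants of pi_t are equal to t.
import numpy as np
from scipy.integrate import quad
from math import comb

t = 2.0                                         # t >= 1, so no atom at 0
a, b = (1 - np.sqrt(t))**2, (1 + np.sqrt(t))**2

def density(x):
    return np.sqrt(4*t - (x - 1 - t)**2) / (2*np.pi*x)

def mp_moment(k):
    return sum(comb(k, j)*comb(k, j-1)//k * t**j for j in range(1, k+1))

for k in range(1, 6):
    m, _ = quad(lambda x: x**k * density(x), a, b)
    print(round(m, 6), mp_moment(k))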

As a first application now of all this, following Voiculescu [vo2], we have:

Theorem 8.22.

Given a sequence of complex Gaussian matrices ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X)), having independent GtG_{t} variables as entries, with t>0t>0, we have

ZNNΓt\frac{Z_{N}}{\sqrt{N}}\sim\Gamma_{t}

in the NN\to\infty limit, with the limiting measure being Voiculescu’s circular law.

Proof.

We know from chapter 6 that the asymptotic moments are:

Mk(ZNN)t|k|/2|𝒩𝒞2(k)|M_{k}\left(\frac{Z_{N}}{\sqrt{N}}\right)\simeq t^{|k|/2}|\mathcal{NC}_{2}(k)|

On the other hand, the free Fock space analysis done in the proof of Theorem 8.18 shows that we have, with the notations there, the following formulae:

S+Sγ1,S+TΓ1S+S^{*}\sim\gamma_{1}\quad,\quad S+T^{*}\sim\Gamma_{1}

By doing some combinatorics, this shows that an abstract noncommutative variable aAa\in A is circular, following the law Γt\Gamma_{t}, precisely when its moments are:

Mk(a)=t|k|/2|𝒩𝒞2(k)|M_{k}(a)=t^{|k|/2}|\mathcal{NC}_{2}(k)|

Thus, we are led to the conclusion in the statement. See [vo2]. ∎

Next in line, comes the main result of Voiculescu in [vo2], as follows:

Theorem 8.23.

Given a family of sequences of Wigner matrices,

ZiNMN(L(X)),iIZ^{i}_{N}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

with pairwise independent entries, each following the complex normal law GtG_{t}, with t>0t>0, up to the constraint ZNi=(ZNi)Z_{N}^{i}=(Z_{N}^{i})^{*}, the rescaled sequences of matrices

ZiNNMN(L(X)),iI\frac{Z^{i}_{N}}{\sqrt{N}}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

become with NN\to\infty semicircular, each following the Wigner law γt\gamma_{t}, and free.

Proof.

We can assume that we are dealing with 2 sequences of matrices, ZN,ZNZ_{N},Z_{N}^{\prime}. In order to prove the asymptotic freeness, consider the following matrix:

YN=12(ZN+iZN)Y_{N}=\frac{1}{\sqrt{2}}(Z_{N}+iZ_{N}^{\prime})

This is then a complex Gaussian matrix, so by using Theorem 8.22, we have:

YNNΓt\frac{Y_{N}}{\sqrt{N}}\sim\Gamma_{t}

We are therefore in the situation where (ZN+iZN)/N(Z_{N}+iZ_{N}^{\prime})/\sqrt{N}, which has asymptotically semicircular real and imaginary parts, converges to the distribution of a free combination of such variables. Thus ZN,ZNZ_{N},Z_{N}^{\prime} become asymptotically free, as desired. ∎
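Here is also a quick Monte Carlo illustration of this asymptotic freeness phenomenon, as a Python sketch, with our own normalizations: for free centered semicircular variables of variance 1 we have tr(a2b2)=1tr(a^{2}b^{2})=1 and tr(abab)=0tr(abab)=0, the latter being the telltale sign of freeness, and two independent rescaled Hermitian Gaussian matrices reproduce this.

# A Monte Carlo sketch of asymptotic freeness: two independent rescaled
# Hermitian Gaussian matrices A, B become free semicircular variables,
# so tr(A^2 B^2) should be close to 1, and tr(ABAB) close to 0.
import numpy as np

N = 1000
rng = np.random.default_rng(0)

def wigner(N):
    z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (z + z.conj().T) / (2 * np.sqrt(N))   # Hermitian, semicircular

A, B = wigner(N), wigner(N)
tr = lambda x: np.trace(x).real / N
print(tr(A @ A @ B @ B))                         # close to 1
print(tr(A @ B @ A @ B))                         # close to 0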

Getting now to the complex case, we have a similar result here, as follows:

Theorem 8.24.

Given a family of sequences of complex Gaussian matrices,

ZiNMN(L(X)),iIZ^{i}_{N}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

with pairwise independent entries, each following the law GtG_{t}, with t>0t>0, the matrices

ZiNNMN(L(X)),iI\frac{Z^{i}_{N}}{\sqrt{N}}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

become with NN\to\infty circular, each following the Voiculescu law Γt\Gamma_{t}, and free.

Proof.

This follows indeed from Theorem 8.23, which applies to the real and imaginary parts of our complex Gaussian matrices, and gives the result. ∎

Finally, we have as well a similar result for the Wishart matrices, as follows:

Theorem 8.25.

Given a family of sequences of complex Wishart matrices,

ZiN=YiN(YiN)MN(L(X)),iIZ^{i}_{N}=Y^{i}_{N}(Y^{i}_{N})^{*}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

with each YiNY^{i}_{N} being a N×MN\times M matrix, with entries following the normal law G1G_{1}, and with all these entries being pairwise independent, the rescaled sequences of matrices

ZiNNMN(L(X)),iI\frac{Z^{i}_{N}}{N}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

become with M=tNM=tN\to\infty Marchenko-Pastur, each following the law πt\pi_{t}, and free.

Proof.

Here the first assertion is the Marchenko-Pastur theorem, from chapter 6, and the second assertion follows from Theorem 8.23, or from Theorem 8.24. ∎

Let us develop now some further limiting theorems, classical and free. We have the following definition, extending the Poisson limit theory developed before:

Definition 8.26.

Associated to any compactly supported positive measure ρ\rho on \mathbb{C} are the probability measures

pρ=limn((1cn)δ0+1nρ)n,πρ=limn((1cn)δ0+1nρ)np_{\rho}=\lim_{n\to\infty}\left(\left(1-\frac{c}{n}\right)\delta_{0}+\frac{1}{n}\rho\right)^{*n}\quad,\quad\pi_{\rho}=\lim_{n\to\infty}\left(\left(1-\frac{c}{n}\right)\delta_{0}+\frac{1}{n}\rho\right)^{\boxplus n}

where c=mass(ρ)c=mass(\rho), called compound Poisson and compound free Poisson laws.

In what follows we will be interested in the case where ρ\rho is discrete, as is for instance the case for ρ=tδ1\rho=t\delta_{1} with t>0t>0, which produces the Poisson and free Poisson laws. The following result allows one to detect compound Poisson/free Poisson laws:

Proposition 8.27.

For ρ=i=1sciδzi\rho=\sum_{i=1}^{s}c_{i}\delta_{z_{i}} with ci>0c_{i}>0 and ziz_{i}\in\mathbb{C}, we have

Fpρ(y)=exp(i=1sci(eiyzi1)),Rπρ(y)=i=1scizi1yziF_{p_{\rho}}(y)=\exp\left(\sum_{i=1}^{s}c_{i}(e^{iyz_{i}}-1)\right)\quad,\quad R_{\pi_{\rho}}(y)=\sum_{i=1}^{s}\frac{c_{i}z_{i}}{1-yz_{i}}

where F,RF,R denote respectively the Fourier transform, and Voiculescu’s RR-transform.

Proof.

Let μn\mu_{n} be the measure appearing in Definition 8.26. We have:

F_{\mu_{n}}(y)=\left(1-\frac{c}{n}\right)+\frac{1}{n}\sum_{i=1}^{s}c_{i}e^{iyz_{i}}
\implies F_{\mu_{n}^{*n}}(y)=\left(\left(1-\frac{c}{n}\right)+\frac{1}{n}\sum_{i=1}^{s}c_{i}e^{iyz_{i}}\right)^{n}
\implies F_{p_{\rho}}(y)=\exp\left(\sum_{i=1}^{s}c_{i}(e^{iyz_{i}}-1)\right)

In the free case we can use a similar method, and we obtain the above formula. ∎

We have the following result, providing an alternative to Definition 8.26, which will be our formulation here of the Compound Poisson Limit Theorem, classical and free:

Theorem 8.28 (CPLT).

For ρ=i=1sciδzi\rho=\sum_{i=1}^{s}c_{i}\delta_{z_{i}} with ci>0c_{i}>0 and ziz_{i}\in\mathbb{C}, we have

pρ/πρ=law(i=1sziαi)p_{\rho}/\pi_{\rho}={\rm law}\left(\sum_{i=1}^{s}z_{i}\alpha_{i}\right)

where the variables αi\alpha_{i} are Poisson/free Poisson(ci)(c_{i}), independent/free.

Proof.

This follows indeed from the fact that the Fourier/RR-transform of the variable in the statement is given by the formulae in Proposition 8.27. ∎

Following [bb+], [bbc], we will be interested here in the main examples of classical and free compound Poisson laws, which are constructed as follows:

Definition 8.29.

The Bessel and free Bessel laws are the compound Poisson laws

bst=ptεs,βst=πtεsb^{s}_{t}=p_{t\varepsilon_{s}}\quad,\quad\beta^{s}_{t}=\pi_{t\varepsilon_{s}}

where εs\varepsilon_{s} is the uniform measure on the ss-th roots of unity. In particular:

  1. (1)

    At s=1s=1 we obtain the usual Poisson and free Poisson laws, pt,πtp_{t},\pi_{t}.

  2. (2)

    At s=2s=2 we obtain the “real” Bessel and free Bessel laws, denoted bt,βtb_{t},\beta_{t}.

  3. (3)

    At s=s=\infty we obtain the “complex” Bessel and free Bessel laws, denoted Bt,𝔅tB_{t},\mathfrak{B}_{t}.

There is a lot of theory regarding these laws, and we refer here to [bb+], [bbc], where these laws were introduced. We will be back to these laws, in a moment.

8c. Algebraic manifolds

We are now ready, or almost, to develop some basic noncommutative geometry. The idea will be that of further building on the material from chapter 7, by enlarging the class of compact quantum groups studied there, with the consideration of quantum homogeneous spaces, X=G/HX=G/H, and with classical and free probability as our main tools.


But let us start with something intuitive, namely basic algebraic geometry. The simplest compact manifolds that we know are the spheres, and if we want to have free analogues of these spheres, there are not many choices here, and we have:

Definition 8.30.

We have compact quantum spaces, constructed as follows,

C(SN1,+)=C(x1,,xN|xi=xi,ixi2=1)C(S^{N-1}_{\mathbb{R},+})=C^{*}\left(x_{1},\ldots,x_{N}\Big{|}x_{i}=x_{i}^{*},\sum_{i}x_{i}^{2}=1\right)
C(SN1,+)=C(x1,,xN|ixixi=ixixi=1)C(S^{N-1}_{\mathbb{C},+})=C^{*}\left(x_{1},\ldots,x_{N}\Big{|}\sum_{i}x_{i}x_{i}^{*}=\sum_{i}x_{i}^{*}x_{i}=1\right)

called respectively free real sphere, and free complex sphere.

Observe that our spheres are indeed well-defined, due to the following estimate:

||xi||2=||xixi||||ixixi||=1||x_{i}||^{2}=||x_{i}x_{i}^{*}||\leq\left|\left|\sum_{i}x_{i}x_{i}^{*}\right|\right|=1

Given a compact quantum space XX, meaning as usual the abstract spectrum of a CC^{*}-algebra, we define its classical version to be the classical space XclassX_{class} obtained by dividing C(X)C(X) by its commutator ideal, then applying the Gelfand theorem:

C(Xclass)=C(X)/I,I=<[a,b]>C(X_{class})=C(X)/I\quad,\quad I=<[a,b]>

Observe that we have an embedding of compact quantum spaces XclassXX_{class}\subset X. In this situation, we also say that XX appears as a “liberation” of XclassX_{class}. We have:

Proposition 8.31.

We have embeddings of compact quantum spaces

S^{N-1}_ℂ ⊂ S^{N-1}_{ℂ,+}
    ∪             ∪
S^{N-1}_ℝ ⊂ S^{N-1}_{ℝ,+}

and the spaces on the right appear as liberations of the spaces of the left.

Proof.

In order to prove this, we must establish the following isomorphisms:

C(SN1)=Ccomm(x1,,xN|xi=xi,ixi2=1)C(S^{N-1}_{\mathbb{R}})=C^{*}_{comm}\left(x_{1},\ldots,x_{N}\Big{|}x_{i}=x_{i}^{*},\sum_{i}x_{i}^{2}=1\right)
C(SN1)=Ccomm(x1,,xN|ixixi=ixixi=1)C(S^{N-1}_{\mathbb{C}})=C^{*}_{comm}\left(x_{1},\ldots,x_{N}\Big{|}\sum_{i}x_{i}x_{i}^{*}=\sum_{i}x_{i}^{*}x_{i}=1\right)

But these isomorphisms are both clear, by using the Gelfand theorem. ∎

We can now introduce a broad class of compact quantum manifolds, as follows:

Definition 8.32.

A real algebraic submanifold XSN1,+X\subset S^{N-1}_{\mathbb{C},+} is a closed quantum space defined, at the level of the corresponding CC^{*}-algebra, by a formula of type

C(X)=C(SN1,+)/<fi(x1,,xN)=0>C(X)=C(S^{N-1}_{\mathbb{C},+})\Big{/}\Big{<}f_{i}(x_{1},\ldots,x_{N})=0\Big{>}

for certain noncommutative polynomials fi<X1,,XN>f_{i}\in\mathbb{C}<X_{1},\ldots,X_{N}>. We identify two such manifolds, XYX\simeq Y, when we have an isomorphism of *-algebras of coordinates

𝒞(X)𝒞(Y)\mathcal{C}(X)\simeq\mathcal{C}(Y)

mapping standard coordinates to standard coordinates.

In practice, while our assumption XSN1,+X\subset S^{N-1}_{\mathbb{C},+} is definitely something technical, we are not losing much when imposing it, and we have the following list of examples:

Proposition 8.33.

The following are algebraic submanifolds XSN1,+X\subset S^{N-1}_{\mathbb{C},+}:

  1. (1)

    The spheres SN1SN1,SN1,+SN1,+S^{N-1}_{\mathbb{R}}\subset S^{N-1}_{\mathbb{C}},S^{N-1}_{\mathbb{R},+}\subset S^{N-1}_{\mathbb{C},+}.

  2. (2)

    Any compact Lie group, GUnG\subset U_{n}, with N=n2N=n^{2}.

  3. (3)

    The duals Γ^\widehat{\Gamma} of finitely generated groups, Γ=<g1,,gN>\Gamma=<g_{1},\ldots,g_{N}>.

  4. (4)

    More generally, the closed subgroups GUn+G\subset U_{n}^{+}, with N=n2N=n^{2}.

Proof.

These facts are all well-known, the proofs being as follows:

(1) This is indeed true by definition of our various spheres.

(2) Given a closed subgroup GUnG\subset U_{n}, we have an embedding GSN1G\subset S^{N-1}_{\mathbb{C}}, with N=n2N=n^{2}, given in double indices by xij=uij/nx_{ij}=u_{ij}/\sqrt{n}, that we can further compose with the standard embedding SN1SN1,+S^{N-1}_{\mathbb{C}}\subset S^{N-1}_{\mathbb{C},+}. As for the fact that we obtain indeed a real algebraic manifold, this is standard too, coming either from Lie theory or from Tannakian duality.

(3) Given a group Γ=<g1,,gN>\Gamma=<g_{1},\ldots,g_{N}>, consider the variables xi=gi/Nx_{i}=g_{i}/\sqrt{N}. These variables satisfy then the quadratic relations ixixi=ixixi=1\sum_{i}x_{i}x_{i}^{*}=\sum_{i}x_{i}^{*}x_{i}=1 defining SN1,+S^{N-1}_{\mathbb{C},+}, and the algebraicity claim for the manifold Γ^SN1,+\widehat{\Gamma}\subset S^{N-1}_{\mathbb{C},+} is clear.

(4) Given a closed subgroup GUn+G\subset U_{n}^{+}, we have indeed an embedding GSN1,+G\subset S^{N-1}_{\mathbb{C},+}, with N=n2N=n^{2}, given by xij=uij/nx_{ij}=u_{ij}/\sqrt{n}. As for the fact that we obtain indeed a real algebraic manifold, this comes from the Tannakian duality results in [mal], [wo2]. ∎

Summarizing, what we have in Definition 8.32 is something quite fruitful, covering many interesting examples. In addition, all this is nice too at the axiomatic level, because the equivalence relation for our algebraic manifolds, as formulated in Definition 8.32, fixes in a quite clever way the functoriality issues of the Gelfand correspondence.


At the level of the general theory now, as a first tool that we can use, for the study of our manifolds, we have the following version of the Gelfand theorem:

Theorem 8.34.

Assuming that XSN1,+X\subset S^{N-1}_{\mathbb{C},+} is an algebraic manifold, given by

C(X)=C(SN1,+)/<fi(x1,,xN)=0>C(X)=C(S^{N-1}_{\mathbb{C},+})\Big{/}\Big{<}f_{i}(x_{1},\ldots,x_{N})=0\Big{>}

for certain noncommutative polynomials fi<X1,,XN>f_{i}\in\mathbb{C}<X_{1},\ldots,X_{N}>, we have

Xclass={xSN1|fi(x1,,xN)=0}X_{class}=\left\{x\in S^{N-1}_{\mathbb{C}}\Big{|}f_{i}(x_{1},\ldots,x_{N})=0\right\}

and XX itself appears as a liberation of XclassX_{class}.

Proof.

This is something that we know well for the spheres, from Proposition 8.31. In general, the proof is similar, coming from the Gelfand theorem. ∎

There are of course many other things that can be said about our manifolds, at the purely algebraic level. But in what follows we will be rather going towards analysis.

8d. Free geometry

We have now all the needed tools in our bag for developing “free geometry”. The idea will be that of going back to the free quantum groups from chapter 7, and further building on that material, with a beginning of free geometry. Let us start with:

Theorem 8.35.

The classical and free, real and complex quantum rotation groups can be complemented with quantum reflection groups, as follows,

K_N^+ ⊂ U_N^+          K_N ⊂ U_N
  ∪        ∪             ∪       ∪
H_N^+ ⊂ O_N^+          H_N ⊂ O_N

with HN=2SNH_{N}=\mathbb{Z}_{2}\wr S_{N} and KN=𝕋SNK_{N}=\mathbb{T}\wr S_{N} being the hyperoctahedral group and the full complex reflection group, and HN+=2SN+H_{N}^{+}=\mathbb{Z}_{2}\wr_{*}S_{N}^{+} and KN+=𝕋SN+K_{N}^{+}=\mathbb{T}\wr_{*}S_{N}^{+} being their free versions.

Proof.

This is something quite tricky, the idea being as follows:

(1) The first observation is that SNS_{N}, regarded as the group of permutations of the NN coordinate axes of N\mathbb{R}^{N}, is a group of orthogonal matrices, SNONS_{N}\subset O_{N}. The corresponding coordinate functions uij:SN{0,1}u_{ij}:S_{N}\to\{0,1\} form a matrix u=(uij)u=(u_{ij}) which is “magic”, in the sense that its entries are projections, summing up to 1 on each row and each column. In fact, by using the Gelfand theorem, we have the following presentation result:

C(SN)=Ccomm((uij)i,j=1,,N|u=magic)C(S_{N})=C^{*}_{comm}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u={\rm magic}\right)

(2) Based on the above, and following Wang’s paper [wan], we can construct the free analogue SN+S_{N}^{+} of the symmetric group SNS_{N} via the following formula:

C(SN+)=C((uij)i,j=1,,N|u=magic)C(S_{N}^{+})=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u={\rm magic}\right)

Here the fact that we have indeed a Woronowicz algebra is standard, exactly as for the free rotation groups in chapter 7, because if a matrix u=(uij)u=(u_{ij}) is magic, then so are the matrices uΔ,uε,uSu^{\Delta},u^{\varepsilon},u^{S} constructed there, and this gives the existence of Δ,ε,S\Delta,\varepsilon,S.

(3) Consider now the group HNsUNH_{N}^{s}\subset U_{N} consisting of permutation-like matrices having as entries the ss-th roots of unity. This group decomposes as follows:

HNs=sSNH_{N}^{s}=\mathbb{Z}_{s}\wr S_{N}

It is straightforward then to construct a free analogue HNs+UN+H_{N}^{s+}\subset U_{N}^{+} of this group, for instance by formulating a definition as follows, with \wr_{*} being a free wreath product:

HNs+=sSN+H_{N}^{s+}=\mathbb{Z}_{s}\wr_{*}S_{N}^{+}

(4) In order to finish, besides the case s=1s=1, of particular interest are the cases s=2,s=2,\infty. Here the corresponding reflection groups are as follows:

HN=2SN,KN=𝕋SNH_{N}=\mathbb{Z}_{2}\wr S_{N}\quad,\quad K_{N}=\mathbb{T}\wr S_{N}

As for the corresponding quantum groups, these are denoted as follows:

HN+=2SN+,KN+=𝕋SN+H_{N}^{+}=\mathbb{Z}_{2}\wr_{*}S_{N}^{+}\quad,\quad K_{N}^{+}=\mathbb{T}\wr_{*}S_{N}^{+}

Thus, we are led to the conclusions in the statement. See [bb+], [bbc]. ∎
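As a quick illustration of the magic condition above, here is a minimal Python sketch, with our own conventions, checking that the coordinate functions on S3S_{3}, viewed as diagonal operators on l2(S3)l^{2}(S_{3}), form indeed a magic matrix:

# A sketch checking the magic condition for S_3: the coordinate functions
# u_ij(s) = 1 iff s(j) = i, viewed as diagonal matrices on l^2(S_3), are
# projections, summing up to 1 on each row and each column.
import numpy as np
from itertools import permutations

group = list(permutations(range(3)))     # the 6 elements of S_3

def u(i, j):
    return np.diag([1.0 if s[j] == i else 0.0 for s in group])

for i in range(3):
    for j in range(3):
        p = u(i, j)
        assert np.allclose(p @ p, p)     # each entry is a projection
    row = sum(u(i, j) for j in range(3))
    col = sum(u(j, i) for j in range(3))
    assert np.allclose(row, np.eye(6)) and np.allclose(col, np.eye(6))
print("u is magic")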

The point now is that we can add to the picture spheres and tori, as follows:

Fact 8.36.

The basic quantum groups can be complemented with spheres and tori,

𝕋_N^+ ⊂ S^{N-1}_{ℂ,+}          𝕋_N ⊂ S^{N-1}_ℂ
  ∪         ∪                    ∪        ∪
T_N^+ ⊂ S^{N-1}_{ℝ,+}          T_N ⊂ S^{N-1}_ℝ

with TN=2N,𝕋N=𝕋NT_{N}=\mathbb{Z}_{2}^{N},\mathbb{T}_{N}=\mathbb{T}^{N}, and with TN+,𝕋N+T_{N}^{+},\mathbb{T}_{N}^{+} standing for the duals of 2N,FN\mathbb{Z}_{2}^{*N},F_{N}.

Again, this is something quite tricky, and there is a long story with all this. We already know from chapter 7 that the diagonal subgroups of the rotation groups are the tori in the statement, but this is just an epsilon of what can be said, and this type of result can be extended as well to the reflection groups, and then we can make the spheres come into play too, with various results connecting them to the quantum groups, and to the tori.


Instead of getting into details here, let us formulate, again a bit informally:

Fact 8.37.

The various quantum manifolds that we have, namely spheres SS, tori TT, unitary groups UU, and reflection groups KK, arrange into 44 diagrams, as follows,

S −− T
 |      |
U −− K

with the arrows standing for various correspondences between (S,T,U,K)(S,T,U,K). These diagrams correspond to 44 main noncommutative geometries, real and complex, classical and free,

ℝ^N_+ ⊂ ℂ^N_+
  ∪       ∪
ℝ^N  ⊂ ℂ^N

with the remark that, technically speaking, N+\mathbb{R}^{N}_{+}, N+\mathbb{C}^{N}_{+} do not exist, as quantum spaces.

As before, things here are quite long and tricky, but we already have some good evidence for all this, so I guess you can just trust me. And if you are truly interested in all this, later, after finishing this book, you can check [bgo] and subsequent papers for details.


Summarizing, we have some beginning of theory. Now with this understood, let us try to integrate on our manifolds. In order to deal with quantum groups, we will need:

Definition 8.38.

The Tannakian category associated to a Woronowicz algebra (A,u)(A,u) is the collection CA=(CA(k,l))C_{A}=(C_{A}(k,l)) of vector spaces

CA(k,l)=Hom(uk,ul)C_{A}(k,l)=Hom(u^{\otimes k},u^{\otimes l})

where the corepresentations uku^{\otimes k} with k=k=\circ\bullet\bullet\circ\ldots colored integer, defined by

u=1,u=u,u=u¯u^{\otimes\emptyset}=1\quad,\quad u^{\otimes\circ}=u\quad,\quad u^{\otimes\bullet}=\bar{u}

and multiplicativity, ukl=ukulu^{\otimes kl}=u^{\otimes k}\otimes u^{\otimes l}, are the Peter-Weyl corepresentations.

As a key remark, the fact that uMN(A)u\in M_{N}(A) is biunitary translates into the following conditions, where R:NNR:\mathbb{C}\to\mathbb{C}^{N}\otimes\mathbb{C}^{N} is the linear map given by R(1)=ieieiR(1)=\sum_{i}e_{i}\otimes e_{i}:

RHom(1,uu¯),RHom(1,u¯u)R\in Hom(1,u\otimes\bar{u})\quad,\quad R\in Hom(1,\bar{u}\otimes u)
RHom(uu¯,1),RHom(u¯u,1)R^{*}\in Hom(u\otimes\bar{u},1)\quad,\quad R^{*}\in Hom(\bar{u}\otimes u,1)

We are therefore led to the following abstract definition, summarizing the main properties of the categories appearing from Woronowicz algebras:

Definition 8.39.

Let HH be a finite dimensional Hilbert space. A tensor category over HH is a collection C=(C(k,l))C=(C(k,l)) of subspaces

C(k,l)(Hk,Hl)C(k,l)\subset\mathcal{L}(H^{\otimes k},H^{\otimes l})

satisfying the following conditions:

  1. (1)

    S,TCS,T\in C implies STCS\otimes T\in C.

  2. (2)

    If S,TCS,T\in C are composable, then STCST\in C.

  3. (3)

    TCT\in C implies TCT^{*}\in C.

  4. (4)

    Each C(k,k)C(k,k) contains the identity operator.

  5. (5)

    C(,)C(\emptyset,\circ\bullet) and C(,)C(\emptyset,\bullet\circ) contain the operator R:1ieieiR:1\to\sum_{i}e_{i}\otimes e_{i}.

The point now is that conversely, we can associate a Woronowicz algebra to any tensor category in the sense of Definition 8.39, in the following way:

Proposition 8.40.

Given a tensor category C=(C(k,l))C=(C(k,l)) over N\mathbb{C}^{N}, as above,

AC=C((uij)i,j=1,,N|THom(uk,ul),k,l,TC(k,l))A_{C}=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}T\in Hom(u^{\otimes k},u^{\otimes l}),\forall k,l,\forall T\in C(k,l)\right)

is a Woronowicz algebra.

Proof.

This is something standard, because the relations THom(uk,ul)T\in Hom(u^{\otimes k},u^{\otimes l}) determine a Hopf ideal, so they allow the construction of Δ,ε,S\Delta,\varepsilon,S as in chapter 7. ∎

With the above constructions in hand, we have the following result:

Theorem 8.41.

The Tannakian duality constructions

CAC,ACAC\to A_{C}\quad,\quad A\to C_{A}

are inverse to each other, modulo identifying full and reduced versions.

Proof.

The idea is that we have CCACC\subset C_{A_{C}}, for any category CC, and so we are left with proving that we have CACCC_{A_{C}}\subset C, for any category CC. But this follows from a long series of algebraic manipulations, and for details we refer to Malacarne [mal], and also to Woronowicz [wo2], where this result was first proved, by using other methods. ∎

In practice now, all this is quite abstract, and we will rather need Brauer type results, for the specific quantum groups that we are interested in. Let us start with:

Definition 8.42.

Let P(k,l)P(k,l) be the set of partitions between an upper colored integer kk, and a lower colored integer ll. A collection of subsets

D=k,lD(k,l)D=\bigsqcup_{k,l}D(k,l)

with D(k,l)P(k,l)D(k,l)\subset P(k,l) is called a category of partitions when it has the following properties:

  1. (1)

    Stability under the horizontal concatenation, (π,σ)[πσ](\pi,\sigma)\to[\pi\sigma].

  2. (2)

    Stability under vertical concatenation (π,σ)[σπ](\pi,\sigma)\to[^{\sigma}_{\pi}], with matching middle symbols.

  3. (3)

    Stability under the upside-down turning *, with switching of colors, \circ\leftrightarrow\bullet.

  4. (4)

Each set D(k,k)D(k,k) contains the identity partition ||||||\ldots||.

  5. (5)

The sets D(,)D(\emptyset,\circ\bullet) and D(,)D(\emptyset,\bullet\circ) both contain the semicircle \cap.

Observe the similarity with Definition 8.39. In fact Definition 8.42 is a delinearized version of Definition 8.39, the relation with the Tannakian categories coming from:

Proposition 8.43.

Given a partition πP(k,l)\pi\in P(k,l), consider the linear map

Tπ:(N)k(N)lT_{\pi}:(\mathbb{C}^{N})^{\otimes k}\to(\mathbb{C}^{N})^{\otimes l}

given by the following formula, where e1,,eNe_{1},\ldots,e_{N} is the standard basis of N\mathbb{C}^{N},

Tπ(ei1eik)=j1jlδπ(i1ikj1jl)ej1ejlT_{\pi}(e_{i_{1}}\otimes\ldots\otimes e_{i_{k}})=\sum_{j_{1}\ldots j_{l}}\delta_{\pi}\begin{pmatrix}i_{1}&\ldots&i_{k}\\ j_{1}&\ldots&j_{l}\end{pmatrix}e_{j_{1}}\otimes\ldots\otimes e_{j_{l}}

and with the Kronecker type symbols δπ{0,1}\delta_{\pi}\in\{0,1\} depending on whether the indices fit or not. The assignment πTπ\pi\to T_{\pi} is then categorical, in the sense that we have

TπTσ=T[πσ],TπTσ=Nc(π,σ)T[σπ],Tπ=TπT_{\pi}\otimes T_{\sigma}=T_{[\pi\sigma]}\quad,\quad T_{\pi}T_{\sigma}=N^{c(\pi,\sigma)}T_{[^{\sigma}_{\pi}]}\quad,\quad T_{\pi}^{*}=T_{\pi^{*}}

where c(π,σ)c(\pi,\sigma) are certain integers, coming from the erased components in the middle.

Proof.

The concatenation property follows from the following computation:

(T_{\pi}\otimes T_{\sigma})(e_{i_{1}}\otimes\ldots\otimes e_{i_{p}}\otimes e_{k_{1}}\otimes\ldots\otimes e_{k_{r}})
=\sum_{j_{1}\ldots j_{q}}\sum_{l_{1}\ldots l_{s}}\delta_{\pi}\begin{pmatrix}i_{1}&\ldots&i_{p}\\ j_{1}&\ldots&j_{q}\end{pmatrix}\delta_{\sigma}\begin{pmatrix}k_{1}&\ldots&k_{r}\\ l_{1}&\ldots&l_{s}\end{pmatrix}e_{j_{1}}\otimes\ldots\otimes e_{j_{q}}\otimes e_{l_{1}}\otimes\ldots\otimes e_{l_{s}}
=\sum_{j_{1}\ldots j_{q}}\sum_{l_{1}\ldots l_{s}}\delta_{[\pi\sigma]}\begin{pmatrix}i_{1}&\ldots&i_{p}&k_{1}&\ldots&k_{r}\\ j_{1}&\ldots&j_{q}&l_{1}&\ldots&l_{s}\end{pmatrix}e_{j_{1}}\otimes\ldots\otimes e_{j_{q}}\otimes e_{l_{1}}\otimes\ldots\otimes e_{l_{s}}
=T_{[\pi\sigma]}(e_{i_{1}}\otimes\ldots\otimes e_{i_{p}}\otimes e_{k_{1}}\otimes\ldots\otimes e_{k_{r}})

As for the other two formulae in the statement, their proofs are similar. ∎
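Such computations can also be done on a computer. Here is a minimal Python sketch, for plain, uncolored partitions, with our own encoding of the blocks, verifying the concatenation formula on a small example:

# A sketch implementation of the maps T_pi, with a partition between k
# upper and l lower points encoded as a list of blocks, the labels being
# 'u0','u1',... for the upper row and 'd0','d1',... for the lower row.
import numpy as np
from itertools import product

N = 2

def T(blocks, k, l):
    out = np.zeros((N**l, N**k))
    for i in product(range(N), repeat=k):
        for j in product(range(N), repeat=l):
            lab = {f'u{r}': i[r] for r in range(k)}
            lab.update({f'd{r}': j[r] for r in range(l)})
            if all(len({lab[p] for p in b}) == 1 for b in blocks):
                row = sum(j[r] * N**(l - 1 - r) for r in range(l))
                col = sum(i[r] * N**(k - 1 - r) for r in range(k))
                out[row, col] += 1
    return out

pi    = [{'u0', 'd0', 'd1'}]                 # 1 upper, 2 lower points
sigma = [{'u0', 'd0'}]                       # the identity partition |
# horizontal concatenation [pi sigma]: sigma's labels shifted past pi's
conc  = [{'u0', 'd0', 'd1'}, {'u1', 'd2'}]
lhs   = np.kron(T(pi, 1, 2), T(sigma, 1, 1))
rhs   = T(conc, 2, 3)
print(np.allclose(lhs, rhs))                 # True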

In relation with quantum groups, we have the following result, from [bsp]:

Theorem 8.44.

Each category of partitions D=(D(k,l))D=(D(k,l)) produces a family of compact quantum groups G=(GN)G=(G_{N}), one for each NN\in\mathbb{N}, via the following formula:

Hom(uk,ul)=span(Tπ|πD(k,l))Hom(u^{\otimes k},u^{\otimes l})=span\left(T_{\pi}\Big{|}\pi\in D(k,l)\right)

To be more precise, the spaces on the right form a Tannakian category, and so produce a certain closed subgroup GNUN+G_{N}\subset U_{N}^{+}, via the Tannakian duality correspondence.

Proof.

This follows indeed from Woronowicz’s Tannakian duality, in its “soft” form from Malacarne [mal], as explained in Theorem 8.41. Indeed, let us set:

C(k,l)=span(Tπ|πD(k,l))C(k,l)=span\left(T_{\pi}\Big{|}\pi\in D(k,l)\right)

By using the axioms in Definition 8.42, and the categorical properties of the operation πTπ\pi\to T_{\pi}, from Proposition 8.43, we deduce that C=(C(k,l))C=(C(k,l)) is a Tannakian category. Thus the Tannakian duality applies, and gives the result. ∎

Philosophically speaking, the quantum groups appearing as in Theorem 8.44 are the simplest, from the perspective of Tannakian duality, so let us formulate:

Definition 8.45.

A closed subgroup GUN+G\subset U_{N}^{+} is called easy when we have

Hom(uk,ul)=span(Tπ|πD(k,l))Hom(u^{\otimes k},u^{\otimes l})=span\left(T_{\pi}\Big{|}\pi\in D(k,l)\right)

for any colored integers k,lk,l, for a certain category of partitions DPD\subset P.

All this might seem a bit complicated, but we will see examples in a moment. Getting back now to integration questions, we have the following key result:

Theorem 8.46.

For an easy quantum group GUN+G\subset U_{N}^{+}, coming from a category of partitions D=(D(k,l))D=(D(k,l)), we have the Weingarten integration formula

Gui1j1e1uikjkek=π,σD(k)δπ(i)δσ(j)WkN(π,σ)\int_{G}u_{i_{1}j_{1}}^{e_{1}}\ldots u_{i_{k}j_{k}}^{e_{k}}=\sum_{\pi,\sigma\in D(k)}\delta_{\pi}(i)\delta_{\sigma}(j)W_{kN}(\pi,\sigma)

for any k=e1ekk=e_{1}\ldots e_{k} and any i,ji,j, where D(k)=D(,k)D(k)=D(\emptyset,k), δ\delta are usual Kronecker symbols, and WkN=GkN1W_{kN}=G_{kN}^{-1}, with GkN(π,σ)=N|πσ|G_{kN}(\pi,\sigma)=N^{|\pi\vee\sigma|}, where |.||.| is the number of blocks.

Proof.

We know from chapter 7 that the integrals in the statement form altogether the orthogonal projection PP onto the space Fix(uk)=span(D(k))Fix(u^{\otimes k})=span(D(k)). Let us set:

E(x)=πD(k)<x,Tπ>TπE(x)=\sum_{\pi\in D(k)}<x,T_{\pi}>T_{\pi}

By standard linear algebra, it follows that we have P=WEP=WE, where WW is the inverse on span(Tπ|πD(k))span(T_{\pi}|\pi\in D(k)) of the restriction of EE. But this restriction is the linear map given by GkNG_{kN}, and so WW is the linear map given by WkNW_{kN}, and this gives the result. ∎

All this is very nice. However, before enjoying the Weingarten formula, we still have to prove that our main quantum groups are easy. The result here is as follows:

Theorem 8.47.

The basic quantum unitary and reflection groups

K_N^+ ⊂ U_N^+          K_N ⊂ U_N
  ∪        ∪             ∪       ∪
H_N^+ ⊂ O_N^+          H_N ⊂ O_N

are all easy, the corresponding categories of partitions being as follows,

𝒩𝒞_even ⊃ 𝒩𝒞_2          𝒫_even ⊃ 𝒫_2
    ∩          ∩              ∩         ∩
NC_even ⊃ NC_2            P_even ⊃ P_2

with P,NCP,NC standing for partitions and noncrossing partitions, 2,even2,even standing for pairings, and partitions with even blocks, and with calligraphic standing for matching.

Proof.

The quantum group UN+U_{N}^{+} is defined via the following relations:

u=u1,ut=u¯1u^{*}=u^{-1}\quad,\quad u^{t}=\bar{u}^{-1}

Thus, the following operators must be in the associated Tannakian category:

Tπ,π=,T_{\pi}\quad,\quad\pi={\ }^{\,\cap}_{\circ\bullet}\ ,{\ }^{\,\cap}_{\bullet\circ}

We conclude that the associated Tannakian category is span(Tπ|πD)span(T_{\pi}|\pi\in D), with:

D=<{\ }^{\,\cap}_{\circ\bullet}\,\,,{\ }^{\,\cap}_{\bullet\circ}>=\mathcal{NC}_{2}

Thus, we have one result, and the other ones are similar. See [bb+], [bbc]. ∎
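As an illustration, for G=ONG=O_{N} and k=4k=4 the Weingarten machinery can now be enjoyed on a computer. Here is a minimal Python sketch, assuming scipy available: the category is P2P_{2}, there are 3 pairings of 4 points, any two distinct ones joining into a single block, and the formula computes for instance the integral of u114u_{11}^{4}:

# A sketch of the Weingarten formula for O_N at k = 4: with the three
# pairings of 4 points we have G(pi,pi) = N^2 and G(pi,sigma) = N
# otherwise, and int u_11^4 = sum of all entries of W = G^{-1}.
import numpy as np
from scipy.stats import ortho_group

N = 5
G = np.full((3, 3), float(N))
np.fill_diagonal(G, float(N)**2)
W = np.linalg.inv(G)
print(W.sum(), "vs", 3 / (N * (N + 2)))      # both equal to 3/(N(N+2))

# Monte Carlo check, over Haar random orthogonal matrices
samples = [ortho_group.rvs(N)[0, 0]**4 for _ in range(20000)]
print(np.mean(samples))                      # close to the above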

We are not ready yet for applications, because we still have to understand which assumptions on NN\in\mathbb{N} make the vectors TπT_{\pi} linearly independent. We will need:

Definition 8.48.

The Möbius function of any lattice, and so of PP, is given by

μ(π,σ)={1ifπ=σπτ<σμ(π,τ)ifπ<σ0ifπσ\mu(\pi,\sigma)=\begin{cases}1&{\rm if}\ \pi=\sigma\\ -\sum_{\pi\leq\tau<\sigma}\mu(\pi,\tau)&{\rm if}\ \pi<\sigma\\ 0&{\rm if}\ \pi\not\leq\sigma\end{cases}

with the construction being performed by recurrence.

The main interest in this function comes from the Möbius inversion formula:

f(σ)=πσg(π)g(σ)=πσμ(π,σ)f(π)f(\sigma)=\sum_{\pi\leq\sigma}g(\pi)\implies g(\sigma)=\sum_{\pi\leq\sigma}\mu(\pi,\sigma)f(\pi)

In linear algebra terms, the statement and proof of this formula are as follows:

Proposition 8.49.

The inverse of the adjacency matrix of PP, given by

Aπσ={1ifπσ0ifπσA_{\pi\sigma}=\begin{cases}1&{\rm if}\ \pi\leq\sigma\\ 0&{\rm if}\ \pi\not\leq\sigma\end{cases}

is the Möbius matrix of PP, given by Mπσ=μ(π,σ)M_{\pi\sigma}=\mu(\pi,\sigma).

Proof.

This is well-known, coming for instance from the fact that AA is upper triangular. Indeed, when inverting, we are led into the recurrence from Definition 8.48. ∎

Now back to our Gram and Weingarten matrix considerations, with WkN=GkN1W_{kN}=G_{kN}^{-1}, as in the statement of Theorem 8.46, we have the following result:

Proposition 8.50.

The Gram matrix is given by GkN=ALG_{kN}=AL, where

L(π,σ)={N(N1)(N|π|+1)ifσπ0otherwiseL(\pi,\sigma)=\begin{cases}N(N-1)\ldots(N-|\pi|+1)&{\rm if}\ \sigma\leq\pi\\ 0&{\rm otherwise}\end{cases}

and where A=M1A=M^{-1} is the adjacency matrix of P(k)P(k).

Proof.

We have indeed the following computation:

N^{|\pi\vee\sigma|}=\#\left\{i_{1},\ldots,i_{k}\in\{1,\ldots,N\}\Big{|}\ker i\geq\pi\vee\sigma\right\}
=\sum_{\tau\geq\pi\vee\sigma}\#\left\{i_{1},\ldots,i_{k}\in\{1,\ldots,N\}\Big{|}\ker i=\tau\right\}
=\sum_{\tau\geq\pi\vee\sigma}N(N-1)\ldots(N-|\tau|+1)

According to the definition of GkNG_{kN} and of A,LA,L, this formula reads:

(GkN)πσ=τπLτσ=τAπτLτσ=(AL)πσ(G_{kN})_{\pi\sigma}=\sum_{\tau\geq\pi}L_{\tau\sigma}=\sum_{\tau}A_{\pi\tau}L_{\tau\sigma}=(AL)_{\pi\sigma}

Thus, we obtain the formula in the statement. ∎

With the above result in hand, we can now formulate:

Theorem 8.51.

The determinant of the Gram matrix GkNG_{kN} is given by:

det(GkN)=πP(k)N!(N|π|)!\det(G_{kN})=\prod_{\pi\in P(k)}\frac{N!}{(N-|\pi|)!}

In particular, the vectors {Tπ|πP(k)}\left\{T_{\pi}|\pi\in P(k)\right\} are linearly independent for NkN\geq k.

Proof.

This is an old formula from the 60s, due to Lindström and others, having many things behind it. By using the formula in Proposition 8.50, we have:

det(GkN)=det(A)det(L)\det(G_{kN})=\det(A)\det(L)

Now if we order P(k)P(k) with respect to the number of blocks, and then lexicographically, AA becomes upper triangular, with 1 on the diagonal, and LL becomes lower triangular, with diagonal entries N(N1)(N|π|+1)N(N-1)\ldots(N-|\pi|+1). Thus det(A)=1\det(A)=1, the determinant det(L)\det(L) is the product of these diagonal entries, and we obtain the above formula. ∎
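Here is also a quick symbolic verification of this formula at k=3k=3, as a Python sketch, assuming sympy available, with the 5 partitions of 3 points being listed as 1|2|3, 12|3, 13|2, 23|1, 123:

# A symbolic sketch of the Gram determinant formula at k = 3: we hardcode
# the block counts |pi v sigma| for the 5 partitions of 3 points, and we
# check det(G) = N(N-1)(N-2) * (N(N-1))^3 * N = N^5 (N-1)^4 (N-2).
import sympy as sp

N = sp.symbols('N')
J = [[3, 2, 2, 2, 1],        # joins with 1|2|3
     [2, 2, 1, 1, 1],        # joins with 12|3
     [2, 1, 2, 1, 1],        # joins with 13|2
     [2, 1, 1, 2, 1],        # joins with 23|1
     [1, 1, 1, 1, 1]]        # joins with 123
G = sp.Matrix(5, 5, lambda i, j: N**J[i][j])
print(sp.simplify(G.det() - N**5 * (N - 1)**4 * (N - 2)))   # 0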

Now back to our quantum groups, let us start with:

Theorem 8.52.

For an easy quantum group G=(GN)G=(G_{N}), coming from a category of partitions D=(D(k,l))D=(D(k,l)), the asymptotic moments of the character χ=iuii\chi=\sum_{i}u_{ii} are

limNGNχk=|D(k)|\lim_{N\to\infty}\int_{G_{N}}\chi^{k}=|D(k)|

where D(k)=D(,k)D(k)=D(\emptyset,k), with the sequence of integrals on the left consisting of certain integers, and being stationary at least starting from the kk-th term.

Proof.

This is something elementary, which follows straight from Peter-Weyl theory, by using the linear independence result from Theorem 8.51. ∎

In practice now, for the basic rotation and reflection groups, we obtain:

Theorem 8.53.

The character laws for basic rotation and reflection groups are

𝔅_1   Γ_1          B_1   G_1
β_1   γ_1          b_1   g_1
(free case)        (classical case)

in the NN\to\infty limit, corresponding to the basic probabilistic limiting theorems, at t=1t=1.

Proof.

This follows indeed from Theorem 8.47 and Theorem 8.52, by using the known moment formulae for the laws in the statement, at t=1t=1. ∎
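For instance, for the ONO_{N} entry of the above diagram, the main character becomes standard Gaussian with NN\to\infty, and this can be seen on a computer, as a Monte Carlo sketch in Python, assuming scipy available:

# A Monte Carlo sketch for the O_N entry of the above diagram: the law of
# the main character chi = Tr(u) over O_N is asymptotically g_1, the
# standard Gaussian, and its mean and variance are already 0 and 1 here.
import numpy as np
from scipy.stats import ortho_group

N = 50
traces = np.array([np.trace(ortho_group.rvs(N)) for _ in range(5000)])
print(traces.mean(), traces.var())       # close to 0 and 1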

In the free case, the convergence can be shown to be stationary starting from N=4N=4. The “fix” comes by looking at truncated characters, constructed as follows:

χt=i=1[tN]uii\chi_{t}=\sum_{i=1}^{[tN]}u_{ii}

With this convention, we have the following final result on the subject, with the convergence being non-stationary at t<1t<1, in both the classical and free cases:

Theorem 8.54.

The truncated character laws for the basic quantum groups are

𝔅_t   Γ_t          B_t   G_t
β_t   γ_t          b_t   g_t
(free case)        (classical case)

in the NN\to\infty limit, corresponding to the basic probabilistic limiting theorems.

Proof.

We already know that the result holds at t=1t=1, and the proof at arbitrary t>0t>0 is once again based on easiness, but this time by using the Weingarten formula for the computation of the moments. We refer here to [bb+], [bbc], [bco], [bsp]. ∎

All this is very nice, as a beginning. Of course, still left for this chapter would be the extension of all this to the case of more general homogeneous spaces X=G/HX=G/H, and other free manifolds, in the sense of the free real and complex geometry axiomatized before.


But hey, we learned enough math in this chapter, time for a beer. We refer here to the 2010 paper [bgo], which started things with the computation for SN1,+S^{N-1}_{\mathbb{R},+}, and to the book [ba3], which explains what was found on the subject, in the 10s. And if interested in this, the hot topic, waiting for input from you, is that of the applications to quantum physics.

8e. Exercises

There has been a lot of exciting theory in this chapter, and as exercise, we have:

Exercise 8.55.

Prove that SN+S_{N}^{+} is easy, coming from the category of all noncrossing partitions NCNC, and compute the asymptotic law of the main character.

As bonus exercise, try as well the truncated characters. Also, don’t forget about SNS_{N}.

Part III Theory of factors


And the story tellers say

That the score brave souls inside

For many a lonely day sailed across the milky seas

Never looked back, never feared, never cried

Chapter 9 Functional analysis

9a. Kaplansky density

Welcome to this second half of the present book. We will get back here to a more normal pace, at least for most of the text to follow, our goal being to discuss the basics of the von Neumann algebra theory, due to Murray, von Neumann and Connes [co1], [co2], [mv1], [mv2], [mv3], [vn1], [vn2], [vn3], or at least the “basics of the basics”, the whole theory being quite complex, and then the most beautiful advanced theory which can be built on this, which is the subfactor theory of Jones [jo1], [jo2], [jo3], [jo4], [jo5], [jms], [jsu].


The material here will be in direct continuation of what we learned in chapter 5, namely the bicommutant theorem, the commutative case, finite dimensions, and a handful of other things. The idea will be that of building directly on that material, using the same basic techniques, namely functional analysis and operator theory.


As an important point, all this is related, but in a subtle way, to what we learned in chapters 6-8 too. To be more precise, what we will be doing in chapters 9-12 here will be more or less orthogonal to what we did in chapters 6-8. However, and here comes our point, chapters 13-16 below, following Jones, will stand as a direct continuation of what we did in chapters 6-8, with Jones’ subfactors being something more general than the random matrices and quantum groups from there.


Getting started, as a first objective we would like to have a better understanding of the precise difference between the norm closed *-algebras, or C^*-algebras, A\subset B(H), and the weakly closed such algebras, which are the von Neumann algebras, from a functional analytic viewpoint.
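As a basic example to keep in mind here, consider the algebra K(H) of compact operators, obtained as the norm closure of the algebra of finite rank operators. With H being assumed infinite dimensional and separable, this is a norm closed *-algebra which is not weakly closed, its weak closure being the whole B(H):

\overline{K(H)}^{\,weak}=B(H)

Indeed, by picking an increasing sequence of finite rank projections P_n\to1, any operator T\in B(H) appears as the weak limit of the finite rank operators P_nTP_n. Getting now to the generalities, we first have: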

Proposition 9.1.

The weak operator topology on B(H) is the topology having the following equivalent properties:

(1) It makes T\to<Tx,y> continuous, for any x,y\in H.

(2) It makes T_n\to T when <T_nx,y>\to<Tx,y>, for any x,y\in H.

(3) It has as subbase the sets U_T(x,y,\varepsilon)=\{S:|<(S-T)x,y>|<\varepsilon\}.

(4) It has as base the sets U_T(x_1,\ldots,x_n,y_1,\ldots,y_n,\varepsilon)=\{S:|<(S-T)x_i,y_i>|<\varepsilon,\forall i\}.

Proof.

The equivalences (1)\iff(2)\iff(3)\iff(4) all follow from definitions, with of course (1,2) referring to the coarsest topology making these things happen. ∎

Similarly, in what regards the strong operator topology, we have:

Proposition 9.2.

The strong operator topology on B(H) is the topology having the following equivalent properties:

(1) It makes T\to Tx continuous, for any x\in H.

(2) It makes T_n\to T when T_nx\to Tx, for any x\in H.

(3) It has as subbase the sets V_T(x,\varepsilon)=\{S:||(S-T)x||<\varepsilon\}.

(4) It has as base the sets V_T(x_1,\ldots,x_n,\varepsilon)=\{S:||(S-T)x_i||<\varepsilon,\forall i\}.

Proof.

Again, the equivalences (1)\iff(2)\iff(3)\iff(4) are all clear, with (1,2) referring to the coarsest topology making these things happen. ∎
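As an illustration of the difference between these two topologies, here is a standard example, on the Hilbert space H=l^2(\mathbb{N}), with orthonormal basis \{e_i\}: consider the shift operator S\in B(H), given by Se_i=e_{i+1}. By Cauchy-Schwarz, applied to the tail of the expansion of y, we have, for any x,y\in H:

<S^nx,y>\to0\quad,\quad||S^nx||=||x||

Thus S^n\to0 weakly, but not strongly. As for the adjoints, these satisfy ||(S^*)^nx||\to0, so (S^*)^n\to0 strongly, but not in norm, because ||(S^*)^n||=1.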

We know from chapter 5 that an operator algebra A\subset B(H) is weakly closed if and only if it is strongly closed. Here is a useful generalization of this fact:

Theorem 9.3.

Given a convex set C\subset B(H), its weak operator closure and strong operator closure coincide.

Proof.

Since the weak operator topology on B(H) is weaker by definition than the strong operator topology on B(H), we have, for any subset C\subset B(H):

\overline{C}^{\,strong}\subset\overline{C}^{\,weak}

Assuming now that C\subset B(H) is convex, we must prove that:

T\in\overline{C}^{\,weak}\implies T\in\overline{C}^{\,strong}

In order to do so, let us pick vectors x_1,\ldots,x_n\in H and \varepsilon>0. We let K=H^{\oplus n}, and we consider the standard embedding i:B(H)\subset B(K), given by:

iT(y_1,\ldots,y_n)=(Ty_1,\ldots,Ty_n)

With the notation x=(x_1,\ldots,x_n)\in K, we have then the following implications, which are all trivial:

T\in\overline{C}^{\,weak}\implies iT\in\overline{iC}^{\,weak}\implies iT(x)\in\overline{iC(x)}^{\,weak}

Now since the set C\subset B(H) was assumed to be convex, the set iC(x)\subset K is convex too, and since for convex sets the weak closure and the norm closure coincide, by the Hahn-Banach separation theorem, it follows that we have:

iT(x)\in\overline{iC(x)}^{\,||.||}

Thus, there exists an operator S\in C such that we have, for any i:

||Sx_i-Tx_i||<\varepsilon

But this shows that we have S\in V_T(x_1,\ldots,x_n,\varepsilon), and since x_1,\ldots,x_n\in H and \varepsilon>0 were arbitrary, by Proposition 9.2 it follows that we have T\in\overline{C}^{\,strong}, as desired. ∎
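As a comment here, in relation with the title of the present section, Theorem 9.3 applies in particular to the unit ball of an operator algebra A\subset B(H), which is indeed convex, due to the following trivial estimate, valid for any S,T\in A with ||S||,||T||\leq1, and any \lambda\in[0,1]:

||\lambda S+(1-\lambda)T||\leq\lambda||S||+(1-\lambda)||T||\leq1

This is precisely the type of convex set that the Kaplansky density theorem, which is the main result of the present section, is about.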

We will need as well the following standard result:

Proposition 9.4.

Given a vector space E\subset B(H), and a linear form f:E\to\mathbb{C}, the following conditions are equivalent:

(1) f is weakly continuous.

(2) f is strongly continuous.

(3) f(T)=\sum_{i=1}^n<Tx_i,y_i>, for certain vectors x_i,y_i\in H.

Proof.

This is something standard, using the same tools as those already used in chapter 5, namely basic functional analysis, and amplification tricks:

(1)\implies(2) Since the weak operator topology on B(H) is weaker than the strong operator topology on B(H), weakly continuous implies strongly continuous. To be more precise, assume T_n\to T strongly. Then T_n\to T weakly, and since f was assumed to be weakly continuous, we have f(T_n)\to f(T). Thus f is strongly continuous, as desired.

(2)\implies(3) Assume indeed that our linear form f:E\to\mathbb{C} is strongly continuous. In particular f is strongly continuous at 0, and Proposition 9.2 provides us with vectors x_1,\ldots,x_n\in H and a number \varepsilon>0 such that, with the notations there: