
Principles of operator algebras

Teo Banica Department of Mathematics, University of Cergy-Pontoise, F-95000 Cergy-Pontoise, France. [email protected]
Abstract.

This is an introduction to the algebras $A\subset B(H)$ that the linear operators $T:H\to H$ can form, once a complex Hilbert space $H$ is given. Motivated by quantum mechanics, we are mainly interested in the von Neumann algebras, which are stable under taking adjoints, $T\to T^{*}$, and are weakly closed. When the algebra has a trace $tr:A\to\mathbb{C}$, we can think of it as being of the form $A=L^{\infty}(X)$, with $X$ being a quantum measured space. Of particular interest is the free case, where the center of the algebra reduces to the scalars, $Z(A)=\mathbb{C}$. Following von Neumann, Connes, Jones, Voiculescu and others, we discuss the basic properties of such algebras $A$, and how to do algebra, geometry, analysis and probability on the underlying quantum spaces $X$.

Key words and phrases:
Linear operator, Operator algebra
2010 Mathematics Subject Classification:
46L10

Preface

Quantum mechanics as we know it is the source of many puzzling questions. The simplest quantum mechanical system is the hydrogen atom, consisting of a negative charge, an electron, moving around a positive charge, a proton. This is reminiscent of electrodynamics, and accepting the fact that the electron is a bit of a slippery particle, whose position and speed are described by probability, rather than by exact formulae, the hydrogen atom can indeed be solved, by starting with electrodynamics, and making a long series of corrections, for the most part coming from experiments, but sometimes coming as well from intuition, with the idea in mind that beautiful mathematics should correspond to true physics. The solution, as we presently know it, is something quite complicated.


Mathematically, the commonly accepted belief is that the good framework for the study of quantum mechanics is an infinite dimensional complex Hilbert space $H$, whose vectors can be thought of as being states of the system, and with the linear operators $T:H\to H$ corresponding to the observables. This is however to be taken with care, because in order to do “true physics”, things must be far sharper than that. Always remember indeed that the simplest object of quantum mechanics is the hydrogen atom, whose simplest states and observables are something quite complicated. Thus when talking about “states and observables”, we have a whole continuum of possible considerations and theories, ranging from true physics to very abstract mathematics.


To make things worse, even the existence and exact relevance of the Hilbert space $H$ is subject to debate. This is something more philosophical, related to the 2-body hydrogen problem evoked above, which has twisted the minds of many scientists, starting with Einstein and others. Can we get someday to a better quantum mechanics, by adding more variables to those available inside $H$? No one really knows the answer here.


The present book is an introduction to the algebras $A\subset B(H)$ that the bounded linear operators $T:H\to H$ can form, once a Hilbert space $H$ is given. There has been an enormous amount of work on such algebras, starting with von Neumann in the 1930s, and we will insist here on the aspects which are beautiful. With the idea, or rather hope in mind, that beautiful mathematics should correspond to true physics.


So, what is beauty, in the operator algebra framework? In our opinion, the source of all possible beauty is an old result of von Neumann, related to the Spectral Theorem for normal operators, which states that any commutative von Neumann algebra $A\subset B(H)$ must be of the form $A=L^{\infty}(X)$, with $X$ being a measured space.


This is something subtle and interesting, which suggests doing several things with the von Neumann algebras $A\subset B(H)$. Given such an algebra we can write the center as $Z(A)=L^{\infty}(X)$, we have then a decomposition of type $A=\int_{X}A_{x}dx$, and the problem is that of understanding the structure of the fibers, called “factors”. This is what von Neumann himself, and then Connes and others, did. Another idea, more speculative, following later work of Connes, and in parallel work of Voiculescu, is that of writing $A=L^{\infty}(X)$, with $X$ being an abstract “quantum measured space”, and then trying to understand the geometry and probabilistic theory of $X$. Finally, yet another beautiful idea, due this time to Jones, is that of looking at the inclusions $A_{0}\subset A_{1}$ of von Neumann algebras, instead of at the von Neumann algebras themselves, the point being that the “symmetries” of such an inclusion lead to interesting combinatorics.


All in all, there are many things that can be done with a von Neumann algebra $A\subset B(H)$, and explaining the basics, plus having a look at the above 4 directions of research, is already what a medium sized book can cover. And this book is written exactly with this idea in mind. We will talk about all the above, keeping things as simple as possible, and with everything being accessible with a minimal knowledge of undergraduate mathematics.


The book is organized in 4 parts, with Part I explaining the basics of operator theory, Part II explaining the basics of operator algebras, with a look into geometry and probability too, then Part III going into the structure of the von Neumann factors, and finally Part IV being an introduction to the subfactor theory of Jones.


This book contains, besides the basics of the operator algebra theory, some modern material as well, namely quantum group illustrations for pretty much everything, and I am grateful to Julien Bichon, Benoît Collins, Steve Curran and the others, for our joint work. Many thanks go as well to my cats. Their views and opinions on mathematics, and knowledge of advanced functional analysis, have always been of great help.


Cergy, August 2024

Teo Banica

Part I Linear operators


Does anybody here remember Vera Lynn

Remember how she said that

We would meet again

Some sunny day

Chapter 1 Linear algebra

1a. Linear maps

According to various findings in physics, starting with those of Heisenberg from the early 1920s, basic quantum mechanics involves linear operators $T:H\to H$ from a complex Hilbert space $H$ to itself. The space $H$ is typically infinite dimensional, a basic example being the Schrödinger space $H=L^{2}(\mathbb{R}^{3})$ of the wave functions $\psi:\mathbb{R}^{3}\to\mathbb{C}$ of the electron. In fact, in what regards the electron, this space $H=L^{2}(\mathbb{R}^{3})$ is basically the correct one, with the only adjustment needed, due to Pauli and others, being that of tensoring with a copy of $K=\mathbb{C}^{2}$, in order to account for the electron spin.


But more on this later. Let us start this book more modestly, as follows:

Fact 1.1.

We are interested in quantum mechanics, taking place in infinite dimensions, but as a main source of inspiration we will have $H=\mathbb{C}^{N}$, with scalar product

<x,y>=\sum_{i}x_{i}\bar{y}_{i}

with the linearity at left being the standard mathematical convention. More specifically, we will be interested in the mathematics of the linear operators $T:H\to H$.

The point now, that you surely know about, is that the above operators $T:H\to H$ correspond to the square matrices $A\in M_{N}(\mathbb{C})$. Thus, as a preliminary to what we want to do in this book, we need a good knowledge of linear algebra over $\mathbb{C}$.


You probably know linear algebra well, but it is always good to recall it, and this will be the purpose of the present chapter. Let us start with the very basics:

Theorem 1.2.

The linear maps $T:\mathbb{C}^{N}\to\mathbb{C}^{N}$ are in correspondence with the square matrices $A\in M_{N}(\mathbb{C})$, with the linear map associated to such a matrix being

Tx=Ax

and with the matrix associated to a linear map being $A_{ij}=<Te_{j},e_{i}>$.

Proof.

The first assertion is clear, because a linear map $T:\mathbb{C}^{N}\to\mathbb{C}^{N}$ must send a vector $x\in\mathbb{C}^{N}$ to a certain vector $Tx\in\mathbb{C}^{N}$, all of whose components are linear combinations of the components of $x$. Thus, we can write, for certain complex numbers $A_{ij}\in\mathbb{C}$:

T\begin{pmatrix}x_{1}\\ \vdots\\ \vdots\\ x_{N}\end{pmatrix}=\begin{pmatrix}A_{11}x_{1}+\ldots+A_{1N}x_{N}\\ \vdots\\ \vdots\\ A_{N1}x_{1}+\ldots+A_{NN}x_{N}\end{pmatrix}

Now the parameters $A_{ij}\in\mathbb{C}$ can be regarded as being the entries of a square matrix $A\in M_{N}(\mathbb{C})$, and with the usual convention for matrix multiplication, we have:

Tx=Ax

Regarding the second assertion, with $Tx=Ax$ as above, if we denote by $e_{1},\ldots,e_{N}$ the standard basis of $\mathbb{C}^{N}$, then we have the following formula:

Te_{j}=\begin{pmatrix}A_{1j}\\ \vdots\\ \vdots\\ A_{Nj}\end{pmatrix}

But this gives the second formula, $<Te_{j},e_{i}>=A_{ij}$, as desired. ∎

Our claim now is that, no matter what we want to do with $T$ or $A$, of advanced type, we will run at some point into their adjoints $T^{*}$ and $A^{*}$, constructed as follows:

Theorem 1.3.

The adjoint operator $T^{*}:\mathbb{C}^{N}\to\mathbb{C}^{N}$, which is given by

<Tx,y>=<x,T^{*}y>

corresponds to the adjoint matrix $A^{*}\in M_{N}(\mathbb{C})$, given by

(A^{*})_{ij}=\bar{A}_{ji}

via the correspondence between linear maps and matrices constructed above.

Proof.

Given a linear map $T:\mathbb{C}^{N}\to\mathbb{C}^{N}$, fix $y\in\mathbb{C}^{N}$, and consider the linear form $\varphi(x)=<Tx,y>$. This form must be as follows, for a certain vector $T^{*}y\in\mathbb{C}^{N}$:

\varphi(x)=<x,T^{*}y>

Thus, we have constructed a map $y\to T^{*}y$ as in the statement, which is obviously linear, and that we can call $T^{*}$. Now by taking the vectors $x,y\in\mathbb{C}^{N}$ to be elements of the standard basis of $\mathbb{C}^{N}$, our defining formula for $T^{*}$ reads:

<Te_{i},e_{j}>=<e_{i},T^{*}e_{j}>

By reversing the scalar product on the right, this formula can be written as:

<T^{*}e_{j},e_{i}>=\overline{<Te_{i},e_{j}>}

But this means that the matrix of $T^{*}$ is given by $(A^{*})_{ij}=\bar{A}_{ji}$, as desired. ∎
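
As a quick illustration, here is a minimal numerical sketch in Python, with NumPy assumed as an outside tool (the text itself uses no software); it checks that the conjugate transpose implements the adjoint, in the sense that $<Ax,y>=<x,A^{*}y>$:

```python
# Minimal check that the conjugate transpose is the adjoint matrix,
# for the scalar product <x,y> = sum_i x_i conj(y_i), linear at left.
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def scalar(u, v):
    # the convention of the text: linear at left, antilinear at right
    return np.sum(u * np.conj(v))

A_star = A.conj().T  # (A*)_{ij} = conj(A_{ji})
print(np.isclose(scalar(A @ x, y), scalar(x, A_star @ y)))  # True
```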

Getting back to our claim, the adjoints * are indeed ubiquitous, as shown by:

Theorem 1.4.

The following happen:

  1. (1)

    $T(x)=Ux$ with $U\in M_{N}(\mathbb{C})$ is an isometry precisely when $U^{*}=U^{-1}$.

  2. (2)

    $T(x)=Px$ with $P\in M_{N}(\mathbb{C})$ is a projection precisely when $P^{2}=P^{*}=P$.

Proof.

Let us first recall that the lengths, or norms, of the vectors $x\in\mathbb{C}^{N}$ can be recovered from the knowledge of the scalar products, as follows:

||x||=\sqrt{<x,x>}

Conversely, we can recover the scalar products out of norms, by using the following difficult to remember formula, called complex polarization identity:

4<x,y>=||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}

The proof of this latter formula is indeed elementary, as follows:

||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}
=||x||^{2}+||y||^{2}-||x||^{2}-||y||^{2}+i||x||^{2}+i||y||^{2}-i||x||^{2}-i||y||^{2}
\quad+2Re(<x,y>)+2Re(<x,y>)+2iIm(<x,y>)+2iIm(<x,y>)
=4<x,y>

Finally, we will use Theorem 1.3, and more specifically the following formula coming from there, valid for any matrix $A\in M_{N}(\mathbb{C})$ and any two vectors $x,y\in\mathbb{C}^{N}$:

<Ax,y>=<x,A^{*}y>

(1) Given a matrix $U\in M_{N}(\mathbb{C})$, we have indeed the following equivalences, with the first one coming from the polarization identity, and the other ones being clear:

||Ux||=||x||
\iff <Ux,Uy>=<x,y>
\iff <x,U^{*}Uy>=<x,y>
\iff U^{*}Uy=y
\iff U^{*}U=1
\iff U^{*}=U^{-1}

(2) Given a matrix $P\in M_{N}(\mathbb{C})$, in order for $x\to Px$ to be an oblique projection, we must have $P^{2}=P$. Now observe that this projection is orthogonal when:

<Px-x,Py>=0
\iff <P^{*}Px-P^{*}x,y>=0
\iff P^{*}Px-P^{*}x=0
\iff P^{*}P-P^{*}=0
\iff P^{*}P=P^{*}

The point now is that by conjugating the last formula, we obtain $P^{*}P=P$. Thus we must have $P=P^{*}$, and this gives the result. ∎
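
As a numerical companion to this, here is a small Python sketch, NumPy assumed; the unitary is produced by a QR decomposition, and the projection by the standard formula $P=B(B^{*}B)^{-1}B^{*}$ onto the column space of a matrix $B$, this formula being an outside ingredient, quoted for illustration:

```python
# Checking the two characterizations of Theorem 1.4 on concrete matrices.
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Random unitary: Q from the QR decomposition of a random complex matrix
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
print(np.allclose(Q.conj().T @ Q, np.eye(N)))                # U*U = 1
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # isometry

# Orthogonal projection onto a 2-dimensional subspace
B = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
P = B @ np.linalg.inv(B.conj().T @ B) @ B.conj().T
print(np.allclose(P @ P, P), np.allclose(P.conj().T, P))     # P^2 = P* = P
```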

Summarizing, the linear operators come in pairs $T,T^{*}$, and the associated matrices come as well in pairs $A,A^{*}$. This is something quite interesting, philosophically speaking, and we will keep this in mind, and come back to it later, on numerous occasions.

1b. Diagonalization

Let us discuss now the diagonalization question for the linear maps and matrices. Again, we will be quite brief here, and for more, we refer to any standard linear algebra book. By the way, there will be some complex analysis involved too, and here we refer to Rudin [rud]. Which book of Rudin will be in fact the one and only true prerequisite for reading the present book, but more on references and reading later.


The basic diagonalization theory, formulated in terms of matrices, is as follows:

Proposition 1.5.

A vector $v\in\mathbb{C}^{N}$ is called an eigenvector of $A\in M_{N}(\mathbb{C})$, with corresponding eigenvalue $\lambda$, when $A$ multiplies by $\lambda$ in the direction of $v$:

Av=\lambda v

In the case where $\mathbb{C}^{N}$ has a basis $v_{1},\ldots,v_{N}$ formed by eigenvectors of $A$, with corresponding eigenvalues $\lambda_{1},\ldots,\lambda_{N}$, in this new basis $A$ becomes diagonal, as follows:

A\sim\begin{pmatrix}\lambda_{1}\\ &\ddots\\ &&\lambda_{N}\end{pmatrix}

Equivalently, if we denote by $D=diag(\lambda_{1},\ldots,\lambda_{N})$ the above diagonal matrix, and by $P=[v_{1}\ldots v_{N}]$ the square matrix formed by the eigenvectors of $A$, we have:

A=PDP^{-1}

In this case we say that the matrix $A$ is diagonalizable.

Proof.

This is something which is clear, the idea being as follows:

(1) The first assertion is clear, because the matrix which multiplies each basis element $v_{i}$ by a number $\lambda_{i}$ is precisely the diagonal matrix $D=diag(\lambda_{1},\ldots,\lambda_{N})$.

(2) The second assertion follows from the first one, by changing the basis. We can prove this by a direct computation as well, because we have $Pe_{i}=v_{i}$, and so:

PDP^{-1}v_{i}=PDe_{i}=P\lambda_{i}e_{i}=\lambda_{i}Pe_{i}=\lambda_{i}v_{i}

Thus, the matrices $A$ and $PDP^{-1}$ coincide, as stated. ∎
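
As a quick numerical illustration, assuming NumPy, the formula $A=PDP^{-1}$ can be checked directly on a small diagonalizable matrix:

```python
# eig returns the eigenvalues and an eigenvector matrix P, and we can
# verify the diagonalization formula A = P D P^{-1} of Proposition 1.5.
import numpy as np

A = np.array([[3., 1.],
              [0., 2.]])
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```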

Let us recall as well that the basic example of a non-diagonalizable matrix, over the complex numbers as above, is the following matrix:

J=\begin{pmatrix}0&1\\ 0&0\end{pmatrix}

Indeed, we have $J\binom{x}{y}=\binom{y}{0}$, so the eigenvectors are the vectors of type $\binom{x}{0}$, all with eigenvalue 0. Thus, we do not have enough eigenvectors for constructing a basis of $\mathbb{C}^{2}$.


In general, in order to study the diagonalization problem, the idea is that the eigenvectors can be grouped into linear spaces, called eigenspaces, as follows:

Theorem 1.6.

Let $A\in M_{N}(\mathbb{C})$, and for any eigenvalue $\lambda\in\mathbb{C}$ define the corresponding eigenspace as being the vector space formed by the corresponding eigenvectors:

E_{\lambda}=\left\{v\in\mathbb{C}^{N}\,\Big|\,Av=\lambda v\right\}

These eigenspaces $E_{\lambda}$ are then in a direct sum position, in the sense that given vectors $v_{1}\in E_{\lambda_{1}},\ldots,v_{k}\in E_{\lambda_{k}}$ corresponding to different eigenvalues $\lambda_{1},\ldots,\lambda_{k}$, we have:

\sum_{i}c_{i}v_{i}=0\implies c_{i}=0

In particular we have the following estimate, with sum over all the eigenvalues,

\sum_{\lambda}\dim(E_{\lambda})\leq N

and our matrix is diagonalizable precisely when we have equality.

Proof.

We prove the first assertion by recurrence on $k\in\mathbb{N}$. Assume by contradiction that we have a formula as follows, with the scalars $c_{1},\ldots,c_{k}$ being not all zero:

c_{1}v_{1}+\ldots+c_{k}v_{k}=0

By dividing by one of these scalars, we can assume that our formula is:

v_{k}=c_{1}v_{1}+\ldots+c_{k-1}v_{k-1}

Now let us apply $A$ to this vector. On the left we obtain:

Av_{k}=\lambda_{k}v_{k}=\lambda_{k}c_{1}v_{1}+\ldots+\lambda_{k}c_{k-1}v_{k-1}

On the right we obtain something different, as follows:

A(c_{1}v_{1}+\ldots+c_{k-1}v_{k-1})=c_{1}Av_{1}+\ldots+c_{k-1}Av_{k-1}=c_{1}\lambda_{1}v_{1}+\ldots+c_{k-1}\lambda_{k-1}v_{k-1}

We conclude from this that the following equality must hold:

\lambda_{k}c_{1}v_{1}+\ldots+\lambda_{k}c_{k-1}v_{k-1}=c_{1}\lambda_{1}v_{1}+\ldots+c_{k-1}\lambda_{k-1}v_{k-1}

On the other hand, we know by recurrence that the vectors $v_{1},\ldots,v_{k-1}$ must be linearly independent. Thus, the coefficients must be equal, at right and at left:

\lambda_{k}c_{1}=c_{1}\lambda_{1}
\vdots
\lambda_{k}c_{k-1}=c_{k-1}\lambda_{k-1}

Now since at least one of the numbers $c_{i}$ must be nonzero, from $\lambda_{k}c_{i}=c_{i}\lambda_{i}$ we obtain $\lambda_{k}=\lambda_{i}$, which is a contradiction. Thus our proof by recurrence of the first assertion is complete. As for the second assertion, this follows from the first one. ∎

In order to reach more advanced results, we can use the characteristic polynomial, which appears via the following fundamental result:

Theorem 1.7.

Given a matrix $A\in M_{N}(\mathbb{C})$, consider its characteristic polynomial:

P(x)=\det(A-x1_{N})

The eigenvalues of $A$ are then the roots of $P$. Also, we have the inequality

\dim(E_{\lambda})\leq m_{\lambda}

where $m_{\lambda}$ is the multiplicity of $\lambda$, as root of $P$.

Proof.

The first assertion follows from the following computation, using the fact that a linear map is bijective when the determinant of the associated matrix is nonzero:

\exists v,\ Av=\lambda v\iff\exists v,\ (A-\lambda 1_{N})v=0\iff\det(A-\lambda 1_{N})=0

Regarding now the second assertion, given an eigenvalue $\lambda$ of our matrix $A$, consider the dimension $d_{\lambda}=\dim(E_{\lambda})$ of the corresponding eigenspace. By changing the basis of $\mathbb{C}^{N}$, as for the eigenspace $E_{\lambda}$ to be spanned by the first $d_{\lambda}$ basis elements, our matrix becomes as follows, with $B$ being a certain smaller matrix:

A\sim\begin{pmatrix}\lambda 1_{d_{\lambda}}&0\\ 0&B\end{pmatrix}

We conclude that the characteristic polynomial of $A$ is of the following form:

P_{A}=P_{\lambda 1_{d_{\lambda}}}P_{B}=(\lambda-x)^{d_{\lambda}}P_{B}

Thus the multiplicity $m_{\lambda}$ of our eigenvalue $\lambda$, as a root of $P$, satisfies $m_{\lambda}\geq d_{\lambda}$, and this leads to the conclusion in the statement. ∎

Now recall that we are over $\mathbb{C}$, which is something that we have not used yet, in our last two statements. And the point here is that we have the following key result:

Theorem 1.8.

Any polynomial $P\in\mathbb{C}[X]$ decomposes as

P=c(X-a_{1})\ldots(X-a_{N})

with $c\in\mathbb{C}$ and with $a_{1},\ldots,a_{N}\in\mathbb{C}$.

Proof.

It is enough to prove that $P$ has one root, and we do this by contradiction. Assume that $P$ has no roots, and pick a number $z\in\mathbb{C}$ where $|P|$ attains its minimum:

|P(z)|=\min_{x\in\mathbb{C}}|P(x)|>0

Since $Q(t)=P(z+t)-P(z)$ is a polynomial which vanishes at $t=0$, this polynomial must be of the form $ct^{k}$ + higher terms, with $c\neq 0$, and with $k\geq 1$ being an integer. We obtain from this that, with $t\in\mathbb{C}$ small, we have the following estimate:

P(z+t)\simeq P(z)+ct^{k}

Now let us write $t=rw$, with $r>0$ small, and with $|w|=1$. Our estimate becomes:

P(z+rw)\simeq P(z)+cr^{k}w^{k}

Now recall that we have assumed $P(z)\neq 0$. We can therefore choose $w\in\mathbb{T}$ such that $cw^{k}$ points in the opposite direction to that of $P(z)$, and we obtain in this way:

|P(z+rw)|\simeq|P(z)+cr^{k}w^{k}|=|P(z)|-|c|r^{k}

Now by choosing $r>0$ small enough, as for the error in the first estimate to be small, and overcome by the negative quantity $-|c|r^{k}$, we obtain from this:

|P(z+rw)|<|P(z)|

But this contradicts our definition of $z\in\mathbb{C}$, as a point where $|P|$ attains its minimum. Thus $P$ has a root, and by recurrence it has $N$ roots, as stated. ∎

Now by putting everything together, we obtain the following result:

Theorem 1.9.

Given a matrix $A\in M_{N}(\mathbb{C})$, consider its characteristic polynomial

P(X)=\det(A-X1_{N})

then factorize this polynomial, by computing the complex roots, with multiplicities,

P(X)=(-1)^{N}(X-\lambda_{1})^{n_{1}}\ldots(X-\lambda_{k})^{n_{k}}

and finally compute the corresponding eigenspaces, for each eigenvalue found:

E_{i}=\left\{v\in\mathbb{C}^{N}\,\Big|\,Av=\lambda_{i}v\right\}

The dimensions of these eigenspaces satisfy then the following inequalities,

\dim(E_{i})\leq n_{i}

and $A$ is diagonalizable precisely when we have equality for any $i$.

Proof.

This follows by combining Theorem 1.6, Theorem 1.7 and Theorem 1.8. Indeed, the statement is well formulated, thanks to Theorem 1.8. By summing the inequalities $\dim(E_{\lambda})\leq m_{\lambda}$ from Theorem 1.7, we obtain an inequality as follows:

\sum_{\lambda}\dim(E_{\lambda})\leq\sum_{\lambda}m_{\lambda}\leq N

On the other hand, we know from Theorem 1.6 that our matrix is diagonalizable when we have global equality. Thus, we are led to the conclusion in the statement. ∎

This was for the main result of linear algebra. There are countless applications of this, and generally speaking, advanced linear algebra consists in building on Theorem 1.9.


In practice, diagonalizing a matrix remains something quite complicated. Let us record a useful algorithmic version of the above result, as follows:

Theorem 1.10.

The square matrices $A\in M_{N}(\mathbb{C})$ can be diagonalized as follows:

  1. (1)

    Compute the characteristic polynomial.

  2. (2)

    Factorize the characteristic polynomial.

  3. (3)

    Compute the eigenvectors, for each eigenvalue found.

  4. (4)

    If there are not $N$ linearly independent eigenvectors, $A$ is not diagonalizable.

  5. (5)

    Otherwise, $A$ is diagonalizable, $A=PDP^{-1}$.

Proof.

This is an informal reformulation of Theorem 1.9, with (4) referring to the total number of linearly independent eigenvectors found in (3), and with $A=PDP^{-1}$ in (5) being the usual diagonalization formula, with $P,D$ being as before. ∎
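
As an informal numerical sketch of this algorithm, assuming NumPy: steps (1)-(3) are delegated to the eigensolver, and step (4) becomes a rank test on the matrix of eigenvectors found. This is a floating-point approximation of the exact procedure, good enough for illustration, and the helper name below is ours, not standard:

```python
# A rank test on the eigenvector matrix implements step (4) of Theorem 1.10.
import numpy as np

def is_diagonalizable(A, tol=1e-10):
    # steps (1)-(3) are done inside np.linalg.eig; step (4) is the rank test
    _, P = np.linalg.eig(A)
    return np.linalg.matrix_rank(P, tol=tol) == A.shape[0]

J = np.array([[0., 1.],
              [0., 0.]])    # the Jordan block from before: not diagonalizable
A = np.array([[0., -1.],
              [1., 0.]])    # rotation by 90 degrees: diagonalizable over C
print(is_diagonalizable(J))  # False
print(is_diagonalizable(A))  # True
```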

As an illustration for all this, which is a must-know computation, we have:

Proposition 1.11.

The rotation of angle $t\in\mathbb{R}$ in the plane diagonalizes as:

\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}=\frac{1}{2}\begin{pmatrix}1&1\\ i&-i\end{pmatrix}\begin{pmatrix}e^{-it}&0\\ 0&e^{it}\end{pmatrix}\begin{pmatrix}1&-i\\ 1&i\end{pmatrix}

Over the reals this is impossible, unless $t=0,\pi$, where the rotation is diagonal.

Proof.

Observe first that, as indicated, unless we are in the case $t=0,\pi$, where our rotation is $\pm 1_{2}$, our rotation is a “true” rotation, having no eigenvectors in the plane. Fortunately the complex numbers come to the rescue, via the following computation:

\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{i}=\binom{\cos t-i\sin t}{i\cos t+\sin t}=e^{-it}\binom{1}{i}

We have as well a second complex eigenvector, coming from:

\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\binom{1}{-i}=\binom{\cos t+i\sin t}{-i\cos t+\sin t}=e^{it}\binom{1}{-i}

Thus, we are led to the conclusion in the statement. ∎
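
Here is a quick numerical check of this, NumPy assumed:

```python
# The rotation of angle t has eigenvalues e^{-it}, e^{it}, with
# eigenvectors (1,i) and (1,-i), as in Proposition 1.11.
import numpy as np

t = 0.7
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
v = np.array([1., 1j])
print(np.allclose(R @ v, np.exp(-1j * t) * v))  # True
w = np.array([1., -1j])
print(np.allclose(R @ w, np.exp(1j * t) * w))   # True
```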

1c. Matrix tricks

At the level of basic examples of diagonalizable matrices, we first have the following result, which provides us with the “generic” examples:

Theorem 1.12.

For a matrix $A\in M_{N}(\mathbb{C})$ the following conditions are equivalent,

  1. (1)

    The eigenvalues are different, $\lambda_{i}\neq\lambda_{j}$,

  2. (2)

    The characteristic polynomial $P$ has simple roots,

  3. (3)

    The characteristic polynomial satisfies $(P,P^{\prime})=1$,

  4. (4)

    The resultant of $P,P^{\prime}$ is nonzero, $R(P,P^{\prime})\neq 0$,

  5. (5)

    The discriminant of $P$ is nonzero, $\Delta(P)\neq 0$,

and in this case, the matrix is diagonalizable.

Proof.

The last assertion holds indeed, due to Theorem 1.9. As for the equivalences in the statement, these are all standard, the idea for their proofs, along with some more theory, needed for using in practice the present result, being as follows:

$(1)\iff(2)$ This follows from Theorem 1.9.

$(2)\iff(3)$ This is standard, the double roots of $P$ being roots of $P^{\prime}$.

$(3)\iff(4)$ The idea here is that associated to any two polynomials $P,Q$ is their resultant $R(P,Q)$, which checks whether $P,Q$ have a common root. Let us write:

P=c(X-a_{1})\ldots(X-a_{k})
Q=d(X-b_{1})\ldots(X-b_{l})

We can define then the resultant as being the following quantity:

R(P,Q)=c^{l}d^{k}\prod_{ij}(a_{i}-b_{j})

The point now, that we will explain as well, is that this is a polynomial in the coefficients of $P,Q$, with integer coefficients. Indeed, this can be checked as follows:

– We can expand the formula of $R(P,Q)$, and in what regards $a_{1},\ldots,a_{k}$, which are the roots of $P$, we obtain in this way certain symmetric functions in these variables, which will be therefore polynomials in the coefficients of $P$, with integer coefficients.

– We can then look what happens with respect to the remaining variables $b_{1},\ldots,b_{l}$, which are the roots of $Q$. Once again what we have here are certain symmetric functions, and so polynomials in the coefficients of $Q$, with integer coefficients.

– Thus, we are led to the above conclusion, that $R(P,Q)$ is a polynomial in the coefficients of $P,Q$, with integer coefficients, and with the remark that the $c^{l}d^{k}$ factor is there for these latter coefficients to be indeed integers, instead of rationals.

Alternatively, let us write our two polynomials in usual form, as follows:

P=p_{k}X^{k}+\ldots+p_{1}X+p_{0}
Q=q_{l}X^{l}+\ldots+q_{1}X+q_{0}

The corresponding resultant appears then as the determinant of an associated matrix, having size $k+l$, and having 0 coefficients at the blank spaces, as follows:

R(P,Q)=\begin{vmatrix}p_{k}&&&q_{l}\\ \vdots&\ddots&&\vdots&\ddots\\ p_{0}&&p_{k}&q_{0}&&q_{l}\\ &\ddots&\vdots&&\ddots&\vdots\\ &&p_{0}&&&q_{0}\end{vmatrix}

$(4)\iff(5)$ Once again this is something standard, the idea here being that the discriminant $\Delta(P)$ of a polynomial $P\in\mathbb{C}[X]$ is, modulo scalars, the resultant $R(P,P^{\prime})$. To be more precise, let us write our polynomial as follows:

P(X)=cX^{N}+dX^{N-1}+\ldots

Its discriminant is then defined as being the following quantity:

\Delta(P)=\frac{(-1)^{\binom{N}{2}}}{c}R(P,P^{\prime})

This is a polynomial in the coefficients of $P$, with integer coefficients, with the division by $c$ being indeed possible, over $\mathbb{Z}$, and with the sign being there for various reasons, including the compatibility with some well-known formulae, at small values of $N$. ∎

All the above might seem a bit complicated, so as an illustration, let us work out an example. Consider the case of a polynomial of degree 2, and a polynomial of degree 1:

P=ax^{2}+bx+c\quad,\quad Q=dx+e

In order to compute the resultant, let us factorize our polynomials:

P=a(x-p)(x-q)\quad,\quad Q=d(x-r)

The resultant can be then computed as follows, by using the two-step method:

R(P,Q)=ad^{2}(p-r)(q-r)=ad^{2}(pq-(p+q)r+r^{2})=cd^{2}+bd^{2}r+ad^{2}r^{2}=cd^{2}-bde+ae^{2}

Observe that $R(P,Q)=0$ corresponds indeed to the fact that $P,Q$ have a common root. Indeed, the root of $Q$ is $r=-e/d$, and we have:

P(r)=\frac{ae^{2}}{d^{2}}-\frac{be}{d}+c=\frac{R(P,Q)}{d^{2}}

We can recover as well the resultant as a determinant, as follows:

R(P,Q)=\begin{vmatrix}a&d&0\\ b&e&d\\ c&0&e\end{vmatrix}=ae^{2}-bde+cd^{2}

Finally, in what regards the discriminant, let us see what happens in degree 2. Here we must compute the resultant of the following two polynomials:

P=aX^{2}+bX+c\quad,\quad P^{\prime}=2aX+b

The resultant is then given by the following formula:

R(P,P^{\prime})=ab^{2}-b(2a)b+c(2a)^{2}=4a^{2}c-ab^{2}=-a(b^{2}-4ac)

Now by doing the discriminant normalizations, we obtain, as we should:

\Delta(P)=b^{2}-4ac
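
As a computational complement, here is a small Python sketch, NumPy assumed, which computes the resultant as the determinant of the matrix displayed above, and recovers the degree 2 discriminant; the helper function is ours, for illustration only:

```python
# Resultant via the determinant of the Sylvester-type matrix from the text.
import numpy as np

def resultant(p, q):
    # p, q: coefficient lists, highest degree first, of degrees k and l
    k, l = len(p) - 1, len(q) - 1
    S = np.zeros((k + l, k + l))
    for j in range(l):                 # l shifted copies of p, down columns
        S[j:j + k + 1, j] = p
    for j in range(k):                 # k shifted copies of q, down columns
        S[j:j + l + 1, l + j] = q
    return np.linalg.det(S)

a, b, c = 1.0, 5.0, 6.0                  # P = X^2+5X+6 = (X+2)(X+3)
print(resultant([a, b, c], [2*a, b]))    # -1.0
print(-a * (b**2 - 4*a*c))               # -a(b^2-4ac) = -1.0, as it should be
```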

As already mentioned, one can prove that the matrices having distinct eigenvalues are “generic”, and so the above result basically captures the whole situation. We have in fact the following collection of density results, which are quite advanced:

Theorem 1.13.

The following happen, inside $M_{N}(\mathbb{C})$:

  1. (1)

    The invertible matrices are dense.

  2. (2)

    The matrices having distinct eigenvalues are dense.

  3. (3)

    The diagonalizable matrices are dense.

Proof.

These are quite advanced results, which can be proved as follows:

(1) This is clear, intuitively speaking, because the invertible matrices are given by the condition $\det A\neq 0$. Thus, the set formed by these matrices appears as the complement of the hypersurface $\det A=0$, and so must be dense inside $M_{N}(\mathbb{C})$, as claimed.

(2) Here we can use a similar argument, this time by saying that the set formed by the matrices having distinct eigenvalues appears as the complement of the hypersurface given by $\Delta(P_{A})=0$, and so must be dense inside $M_{N}(\mathbb{C})$, as claimed.

(3) This follows from (2), via the fact that the matrices having distinct eigenvalues are diagonalizable, as we know from Theorem 1.12. There are of course some other proofs as well, for instance by putting the matrix in Jordan form. ∎

As an application of the above results, and of our methods in general, we have:

Theorem 1.14.

The following happen:

  1. (1)

    We have $P_{AB}=P_{BA}$, for any two matrices $A,B\in M_{N}(\mathbb{C})$.

  2. (2)

    $AB,BA$ have the same eigenvalues, with the same multiplicities.

  3. (3)

    If $A$ has eigenvalues $\lambda_{1},\ldots,\lambda_{N}$, then $f(A)$ has eigenvalues $f(\lambda_{1}),\ldots,f(\lambda_{N})$.

Proof.

These results can be deduced by using Theorem 1.13, as follows:

(1) It follows from definitions that the characteristic polynomial of a matrix is invariant under conjugation, in the sense that we have the following formula:

P_{C}=P_{ACA^{-1}}

Now observe that, when assuming that $A$ is invertible, we have:

AB=A(BA)A^{-1}

Thus, we have the result when $A$ is invertible. By using now Theorem 1.13 (1), we conclude that this formula holds for any matrix $A$, by continuity.

(2) This is a reformulation of (1), via the fact that $P$ encodes the eigenvalues, with multiplicities, which is hard to prove with bare hands.

(3) This is something quite informal, clear for the diagonal matrices $D$, then for the diagonalizable matrices $PDP^{-1}$, and finally for all matrices, by using Theorem 1.13 (3), provided that $f$ has suitable regularity properties. We will be back to this. ∎
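
Here is a small numerical check of assertion (2), NumPy assumed, done on purpose with a non-invertible matrix $A$, which is the subtle case in the above proof:

```python
# AB and BA have the same eigenvalues, with multiplicities, even when A
# is singular; we compare the sorted spectra.
import numpy as np

rng = np.random.default_rng(2)
N = 4
A = rng.standard_normal((N, N))
A[0] = 0.0                    # make A non-invertible on purpose
B = rng.standard_normal((N, N))

ev_AB = np.sort_complex(np.linalg.eigvals(A @ B))
ev_BA = np.sort_complex(np.linalg.eigvals(B @ A))
print(np.allclose(ev_AB, ev_BA))  # True
```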

Let us go back to the main problem raised by the diagonalization procedure, namely the computation of the roots of characteristic polynomials. We have here:

Theorem 1.15.

The complex eigenvalues of a matrix $A\in M_{N}(\mathbb{C})$, counted with multiplicities, have the following properties:

  1. (1)

    Their sum is the trace.

  2. (2)

    Their product is the determinant.

Proof.

Consider indeed the characteristic polynomial $P$ of the matrix:

P(X)=\det(A-X1_{N})=(-1)^{N}X^{N}+(-1)^{N-1}Tr(A)X^{N-1}+\ldots+\det(A)

We can factorize this polynomial, by using its $N$ complex roots, and we obtain:

P(X)=(-1)^{N}(X-\lambda_{1})\ldots(X-\lambda_{N})=(-1)^{N}X^{N}+(-1)^{N-1}\left(\sum_{i}\lambda_{i}\right)X^{N-1}+\ldots+\prod_{i}\lambda_{i}

Thus, we are led to the conclusion in the statement. ∎
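
A quick numerical verification of this, NumPy assumed:

```python
# The sum of the complex eigenvalues is the trace, their product is the
# determinant, as in Theorem 1.15.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
ev = np.linalg.eigvals(A)
print(np.isclose(ev.sum(), np.trace(A)))        # True
print(np.isclose(ev.prod(), np.linalg.det(A)))  # True
```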

Regarding now the intermediate terms, we have here:

Theorem 1.16.

Assume that $A\in M_{N}(\mathbb{C})$ has eigenvalues $\lambda_{1},\ldots,\lambda_{N}\in\mathbb{C}$, counted with multiplicities. The basic symmetric functions of these eigenvalues, namely

c_{k}=\sum_{i_{1}<\ldots<i_{k}}\lambda_{i_{1}}\ldots\lambda_{i_{k}}

are then given by the fact that the characteristic polynomial of the matrix is:

P(X)=(-1)^{N}\sum_{k=0}^{N}(-1)^{k}c_{k}X^{N-k}

Moreover, all symmetric functions of the eigenvalues, such as the sums of powers

d_{s}=\lambda_{1}^{s}+\ldots+\lambda_{N}^{s}

appear as polynomials in these characteristic polynomial coefficients $c_{k}$.

Proof.

These results can be proved by doing some algebra, as follows:

(1) Consider indeed the characteristic polynomial $P$ of the matrix, factorized by using its $N$ complex roots, taken with multiplicities. By expanding, we obtain:

P(X)=(-1)^{N}(X-\lambda_{1})\ldots(X-\lambda_{N})
=(-1)^{N}X^{N}+(-1)^{N-1}\left(\sum_{i}\lambda_{i}\right)X^{N-1}+\ldots+\prod_{i}\lambda_{i}
=(-1)^{N}X^{N}+(-1)^{N-1}c_{1}X^{N-1}+\ldots+(-1)^{0}c_{N}
=(-1)^{N}\left(X^{N}-c_{1}X^{N-1}+\ldots+(-1)^{N}c_{N}\right)

With the convention $c_{0}=1$, we are led to the conclusion in the statement.

(2) This is something standard, coming by doing some abstract algebra. Working out the formulae for the sums of powers $d_{s}=\sum_{i}\lambda_{i}^{s}$, at small values of the exponent $s\in\mathbb{N}$, is an excellent exercise, which shows how to proceed in general, by recurrence. ∎
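
As a hint for this exercise, the first few such formulae are the classical Newton identities, namely $d_{1}=c_{1}$, $d_{2}=c_{1}^{2}-2c_{2}$, $d_{3}=c_{1}^{3}-3c_{1}c_{2}+3c_{3}$, standard facts quoted here from outside the text. A small Python sketch, NumPy assumed, verifying them on a random matrix:

```python
# Newton identities: power sums d_s as polynomials in the elementary
# symmetric functions c_k of the eigenvalues.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
ev = np.linalg.eigvals(A)

# elementary symmetric functions c_1, c_2, c_3 of the eigenvalues
c1 = ev.sum()
c2 = sum(ev[i] * ev[j] for i in range(4) for j in range(i + 1, 4))
c3 = sum(ev[i] * ev[j] * ev[k]
         for i in range(4) for j in range(i + 1, 4) for k in range(j + 1, 4))

d1, d2, d3 = (ev**1).sum(), (ev**2).sum(), (ev**3).sum()
print(np.isclose(d1, c1))                      # True
print(np.isclose(d2, c1**2 - 2*c2))            # True
print(np.isclose(d3, c1**3 - 3*c1*c2 + 3*c3))  # True
```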

1d. Spectral theorems

Let us go back now to the diagonalization question. Here is a key result:

Theorem 1.17.

Any matrix $A\in M_{N}(\mathbb{C})$ which is self-adjoint, $A=A^{*}$, is diagonalizable, with the diagonalization being of the following type,

A=UDU^{*}

with $U\in U_{N}$, and with $D\in M_{N}(\mathbb{R})$ diagonal. The converse holds too.

Proof.

As a first remark, the converse trivially holds, because if we take a matrix of the form $A=UDU^{*}$, with $U$ unitary and $D$ diagonal and real, then we have:

A^{*}=(UDU^{*})^{*}=UD^{*}U^{*}=UDU^{*}=A

In the other sense now, assume that $A$ is self-adjoint, $A=A^{*}$. Our first claim is that the eigenvalues are real. Indeed, assuming $Av=\lambda v$, we have:

\lambda<v,v>=<\lambda v,v>=<Av,v>=<v,Av>=<v,\lambda v>=\bar{\lambda}<v,v>

Thus we obtain $\lambda\in\mathbb{R}$, as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

Av=\lambda v\quad,\quad Aw=\mu w

We have then the following computation, using $\lambda,\mu\in\mathbb{R}$:

\lambda<v,w>=<\lambda v,w>=<Av,w>=<v,Aw>=<v,\mu w>=\mu<v,w>

Thus $\lambda\neq\mu$ implies $v\perp w$, as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of $A$ span the whole space $\mathbb{C}^{N}$. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

Av=\lambda v

Assuming now that we have a vector $w$ orthogonal to it, $v\perp w$, we have:

<Aw,v>=<w,Av>=<w,\lambda v>=\lambda<w,v>=0

Thus, if $v$ is an eigenvector, then the vector space $v^{\perp}$ is invariant under $A$. Moreover, since a matrix $A$ is self-adjoint precisely when $<Av,v>\in\mathbb{R}$ for any vector $v\in\mathbb{C}^{N}$, as one can see by expanding the scalar product, the restriction of $A$ to the subspace $v^{\perp}$ is self-adjoint. Thus, we can proceed by recurrence, and we obtain the result. ∎
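
Numerically, and assuming NumPy, the diagonalization of a self-adjoint matrix is produced by the dedicated routine eigh, and the conclusions of the above theorem can be checked directly:

```python
# For a self-adjoint matrix, eigh returns real eigenvalues and a unitary
# eigenvector matrix, so that A = U D U*, as in Theorem 1.17.
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B + B.conj().T                      # self-adjoint by construction
d, U = np.linalg.eigh(A)
print(d.dtype)                          # float64: the eigenvalues are real
print(np.allclose(U.conj().T @ U, np.eye(4)))       # U is unitary
print(np.allclose(U @ np.diag(d) @ U.conj().T, A))  # A = U D U*
```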

As basic examples of self-adjoint matrices, we have the orthogonal projections. The diagonalization result regarding them is as follows:

Proposition 1.18.

The matrices $P\in M_{N}(\mathbb{C})$ which are projections,

P^{2}=P^{*}=P

are precisely those which diagonalize as follows,

P=UDU^{*}

with $U\in U_{N}$, and with $D\in M_{N}(0,1)$ being diagonal.

Proof.

The equation for the projections being $P^{2}=P^{*}=P$, the eigenvalues $\lambda$ are real, and we have as well the following condition, coming from $P^{2}=P$:

\lambda<v,v>=<\lambda v,v>=<Pv,v>=<P^{2}v,v>=<Pv,Pv>=<\lambda v,\lambda v>=\lambda^{2}<v,v>

Thus we obtain $\lambda\in\{0,1\}$, as claimed, and as a final conclusion here, the diagonalization of the projections is as follows, with $e_{i}\in\{0,1\}$:

P\sim\begin{pmatrix}e_{1}\\ &\ddots\\ &&e_{N}\end{pmatrix}

To be more precise, the number of 1 values is the dimension of the image of $P$, and the number of 0 values is the dimension of the space of vectors sent to 0 by $P$. ∎

An important class of self-adjoint matrices, which includes for instance all the projections, are the positive matrices. The theory here is as follows:

Theorem 1.19.

For a matrix $A\in M_{N}(\mathbb{C})$ the following conditions are equivalent, and if they are satisfied, we say that $A$ is positive:

  1. (1)

    $A=B^{2}$, with $B=B^{*}$.

  2. (2)

    $A=CC^{*}$, for some $C\in M_{N}(\mathbb{C})$.

  3. (3)

    $<Ax,x>\geq 0$, for any vector $x\in\mathbb{C}^{N}$.

  4. (4)

    $A=A^{*}$, and the eigenvalues are positive, $\lambda_{i}\geq 0$.

  5. (5)

    $A=UDU^{*}$, with $U\in U_{N}$ and with $D\in M_{N}(\mathbb{R}_{+})$ diagonal.

Proof.

The idea is that the equivalences in the statement basically follow from some elementary computations, with only Theorem 1.17 needed, at some point:

$(1)\implies(2)$ This is clear, because we can take $C=B$.

$(2)\implies(3)$ This follows from the following computation:

<Ax,x>=<CC^{*}x,x>=<C^{*}x,C^{*}x>\geq 0

$(3)\implies(4)$ By using the fact that $<Ax,x>$ is real, we have:

<Ax,x>=<x,A^{*}x>=<A^{*}x,x>

Thus we have $A=A^{*}$, and the remaining assertion, regarding the eigenvalues, follows from the following computation, assuming $Ax=\lambda x$:

<Ax,x>=<\lambda x,x>=\lambda<x,x>\geq 0

$(4)\implies(5)$ This follows indeed by using Theorem 1.17.

$(5)\implies(1)$ Assuming $A=UDU^{*}$, with $U\in U_{N}$, and with $D\in M_{N}(\mathbb{R}_{+})$ being diagonal, we can set $B=U\sqrt{D}U^{*}$. Then $B$ is self-adjoint, and its square is given by:

B^{2}=U\sqrt{D}U^{*}\cdot U\sqrt{D}U^{*}=UDU^{*}=A

Thus, we are led to the conclusion in the statement. ∎

Let us record as well the following technical version of the above result:

Theorem 1.20.

For a matrix $A\in M_{N}(\mathbb{C})$ the following conditions are equivalent, and if they are satisfied, we say that $A$ is strictly positive:

  1. (1)

    $A=B^{2}$, with $B=B^{*}$, invertible.

  2. (2)

    $A=CC^{*}$, for some $C\in M_{N}(\mathbb{C})$ invertible.

  3. (3)

    $<Ax,x>>0$, for any nonzero vector $x\in\mathbb{C}^{N}$.

  4. (4)

    $A=A^{*}$, and the eigenvalues are strictly positive, $\lambda_{i}>0$.

  5. (5)

    $A=UDU^{*}$, with $U\in U_{N}$ and with $D\in M_{N}(\mathbb{R}_{+}^{*})$ diagonal.

Proof.

This follows either from Theorem 1.19, by adding the various extra assumptions in the statement, or from the proof of Theorem 1.19, by modifying where needed. ∎

Let us discuss now the case of the unitary matrices. We have here:

Theorem 1.21.

Any matrix $U\in M_{N}(\mathbb{C})$ which is unitary, $U^{*}=U^{-1}$, is diagonalizable, with the eigenvalues on $\mathbb{T}$. More precisely we have

U=VDV^{*}

with $V\in U_{N}$, and with $D\in M_{N}(\mathbb{T})$ diagonal. The converse holds too.

Proof.

As a first remark, the converse trivially holds, because given a matrix of type $U=VDV^{*}$, with $V\in U_{N}$, and with $D\in M_{N}(\mathbb{T})$ being diagonal, we have:

U^{*}=(VDV^{*})^{*}=VD^{*}V^{*}=VD^{-1}V^{-1}=(V^{*})^{-1}D^{-1}V^{-1}=(VDV^{*})^{-1}=U^{-1}

Let us prove now the first assertion, stating that the eigenvalues of a unitary matrix $U\in U_{N}$ belong to $\mathbb{T}$. Indeed, assuming $Uv=\lambda v$, we have:

<v,v>=<U^{*}Uv,v>=<Uv,Uv>=<\lambda v,\lambda v>=|\lambda|^{2}<v,v>

Thus we obtain $\lambda\in\mathbb{T}$, as claimed. Our next claim now is that the eigenspaces corresponding to different eigenvalues are pairwise orthogonal. Assume indeed that:

Uv=\lambda v\quad,\quad Uw=\mu w

We have then the following computation, using $U^{*}=U^{-1}$ and $\lambda,\mu\in\mathbb{T}$:

\lambda<v,w>=<\lambda v,w>=<Uv,w>=<v,U^{*}w>=<v,U^{-1}w>=<v,\mu^{-1}w>=\mu<v,w>

Thus $\lambda\neq\mu$ implies $v\perp w$, as claimed. In order now to finish the proof, it remains to prove that the eigenspaces of $U$ span the whole space $\mathbb{C}^{N}$. For this purpose, we will use a recurrence method. Let us pick an eigenvector of our matrix:

Uv=\lambda v

Assuming that we have a vector $w$ orthogonal to it, $v\perp w$, we have:

<Uw,v>=<w,U^{*}v>=<w,U^{-1}v>=<w,\lambda^{-1}v>=\lambda<w,v>=0

Thus, if $v$ is an eigenvector, then the vector space $v^{\perp}$ is invariant under $U$. Now since $U$ is an isometry, so is its restriction to this space $v^{\perp}$. Thus this restriction is a unitary, and so we can proceed by recurrence, and we obtain the result. ∎

The self-adjoint matrices and the unitary matrices are particular cases of the general notion of a “normal matrix”, and we have here:

Theorem 1.22.

Any matrix $A\in M_{N}(\mathbb{C})$ which is normal, $AA^{*}=A^{*}A$, is diagonalizable, with the diagonalization being of the following type,

A=UDU^{*}

with $U\in U_{N}$, and with $D\in M_{N}(\mathbb{C})$ diagonal. The converse holds too.

Proof.

As a first remark, the converse trivially holds, because if we take a matrix of the form $A=UDU^{*}$, with $U$ unitary and $D$ diagonal, then we have:

AA^{*}=UDU^{*}\cdot UD^{*}U^{*}=UDD^{*}U^{*}=UD^{*}DU^{*}=UD^{*}U^{*}\cdot UDU^{*}=A^{*}A

In the other sense now, this is something more technical. Our first claim is that a matrix $A$ is normal precisely when the following happens, for any vector $v$:

||Av||=||A^{*}v||

Indeed, the above equality can be written as follows:

<AA^{*}v,v>=<A^{*}Av,v>

But this is equivalent to $AA^{*}=A^{*}A$, by expanding the scalar products. Our next claim is that $A,A^{*}$ have the same eigenvectors, with conjugate eigenvalues:

Av=\lambda v\implies A^{*}v=\bar{\lambda}v

Indeed, this follows from the following computation, and from the trivial fact that if $A$ is normal, then so is any matrix of type $A-\lambda 1_{N}$:

||(A^{*}-\bar{\lambda}1_{N})v||=||(A-\lambda 1_{N})^{*}v||=||(A-\lambda 1_{N})v||=0

Let us prove now, by using this, that the eigenspaces of $A$ are pairwise orthogonal. Assume that we have two eigenvectors, corresponding to different eigenvalues, $\lambda\neq\mu$:

Av=\lambda v\quad,\quad Aw=\mu w

We have the following computation, which shows that $\lambda\neq\mu$ implies $v\perp w$:

\lambda<v,w>=<\lambda v,w>=<Av,w>=<v,A^{*}w>=<v,\bar{\mu}w>=\mu<v,w>

In order to finish, it remains to prove that the eigenspaces of $A$ span the whole $\mathbb{C}^{N}$. This is something that we have already seen for the self-adjoint matrices, and for unitaries, and we will use here these results, in order to deal with the general normal case. As a first observation, given an arbitrary matrix $A$, the matrix $AA^{*}$ is self-adjoint:

(AA^{*})^{*}=AA^{*}

Thus, we can diagonalize this matrix $AA^{*}$, as follows, with the passage matrix being a unitary, $V\in U_{N}$, and with the diagonal form being real, $E\in M_{N}(\mathbb{R})$:

AA^{*}=VEV^{*}

Now observe that, for matrices of type $A=UDU^{*}$, which are those that we want to end up with, we have the following formulae:

V=U\quad,\quad E=D\bar{D}

In particular, the matrices $A$ and $AA^{*}$ have the same eigenspaces. So, this will be our idea, proving that the eigenspaces of $AA^{*}$ are eigenspaces of $A$. In order to do so, let us pick two eigenvectors $v,w$ of the matrix $AA^{*}$, corresponding to different eigenvalues, $\lambda\neq\mu$. The eigenvalue equations are then as follows:

AA^{*}v=\lambda v\quad,\quad AA^{*}w=\mu w

We have the following computation, using the normality condition $AA^{*}=A^{*}A$, and the fact that the eigenvalues of $AA^{*}$, and in particular $\mu$, are real:

\lambda<Av,w>=<\lambda Av,w>=<A\lambda v,w>=<AAA^{*}v,w>=<AA^{*}Av,w>=<Av,AA^{*}w>=<Av,\mu w>=\mu<Av,w>

We conclude that we have $<Av,w>=0$. But this reformulates as follows:

\lambda\neq\mu\implies A(E_{\lambda})\perp E_{\mu}

Now since the eigenspaces of $AA^{*}$ are pairwise orthogonal, and span the whole $\mathbb{C}^{N}$, we deduce from this that these eigenspaces are invariant under $A$:

A(E_{\lambda})\subset E_{\lambda}

But with this result in hand, we can finish. Indeed, we can decompose the problem, and the matrix $A$ itself, following these eigenspaces of $AA^{*}$, which in practice amounts to saying that we can assume that we only have 1 eigenspace. Now by rescaling, this is the same as assuming that we have $AA^{*}=1$. But with this, we are now in the unitary case, which we know how to solve, as explained in Theorem 1.21, and so we are done. ∎
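
Here is a small numerical sketch around this, NumPy assumed: we build a normal matrix $A=UDU^{*}$ with $U$ unitary and $D$ diagonal complex, and check the normality condition, which fails for a generic matrix:

```python
# Matrices of the form U D U* with U unitary, D diagonal, are normal.
import numpy as np

rng = np.random.default_rng(6)
N = 4
U, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
D = np.diag(rng.standard_normal(N) + 1j * rng.standard_normal(N))
A = U @ D @ U.conj().T
print(np.allclose(A @ A.conj().T, A.conj().T @ A))   # True: A is normal

G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
print(np.allclose(G @ G.conj().T, G.conj().T @ G))   # False, generically
```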

As a first application, we have the following result:

Theorem 1.23.

Given a matrix $A\in M_{N}(\mathbb{C})$, we can construct a matrix $|A|$ as follows, by using the fact that $A^{*}A$ is diagonalizable, with positive eigenvalues:

|A|=\sqrt{A^{*}A}

This matrix $|A|$ is then positive, and its square is $|A|^{2}=A^{*}A$. In the case $N=1$, we obtain in this way the usual absolute value of the complex numbers.

Proof.

Consider indeed the matrix $A^{*}A$, which is normal. According to Theorem 1.22, we can diagonalize this matrix as follows, with $U\in U_{N}$, and with $D$ diagonal:

A^{*}A=UDU^{*}

From $A^{*}A\geq 0$ we obtain $D\geq 0$. But this means that the entries of $D$ are real, and positive. Thus we can extract the square root $\sqrt{D}$, and then set:

\sqrt{A^{*}A}=U\sqrt{D}U^{*}

Thus, we are basically done. Indeed, if we call this latter matrix $|A|$, then we are led to the conclusions in the statement. Finally, the last assertion is clear from definitions. ∎
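
As a numerical sketch of this construction, NumPy assumed, we can diagonalize $A^{*}A$ with eigh, take square roots of the eigenvalues, and conjugate back:

```python
# |A| = sqrt(A*A), computed via the diagonalization of the positive
# matrix A*A, as in the proof of Theorem 1.23.
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
d, U = np.linalg.eigh(A.conj().T @ A)        # A*A = U diag(d) U*, d >= 0
absA = U @ np.diag(np.sqrt(np.clip(d, 0, None))) @ U.conj().T
print(np.allclose(absA @ absA, A.conj().T @ A))   # |A|^2 = A*A
print(np.allclose(absA, absA.conj().T))           # |A| is self-adjoint
```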

We can now formulate a first polar decomposition result, as follows:

Theorem 1.24.

Any invertible matrix $A\in M_{N}(\mathbb{C})$ decomposes as

A=U|A|

with $U\in U_{N}$, and with $|A|=\sqrt{A^{*}A}$ as above.

Proof.

This is routine, and follows by comparing the actions of $A,|A|$ on the vectors $v\in\mathbb{C}^{N}$, and deducing from this the existence of a unitary $U\in U_{N}$ as above. We will be back to this, later on, directly in the case of the linear operators on Hilbert spaces. ∎

Observe that at $N=1$ we obtain in this way the usual polar decomposition of the nonzero complex numbers. More generally now, we have the following result:

Theorem 1.25.

Any square matrix $A\in M_{N}(\mathbb{C})$ decomposes as

A=U|A|

with $U$ being a partial isometry, and with $|A|=\sqrt{A^{*}A}$ as above.

Proof.

Again, this follows by comparing the actions of $A,|A|$ on the vectors $v\in\mathbb{C}^{N}$, and deducing from this the existence of a partial isometry $U$ as above. Alternatively, we can get this from Theorem 1.24, applied on the complement of the 0-eigenvectors. ∎
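
As a numerical sketch, assuming NumPy, the polar decomposition can be obtained from the singular value decomposition $A=VSW^{*}$, a standard outside ingredient: the unitary part is $U=VW^{*}$, and $|A|=WSW^{*}$:

```python
# Polar decomposition A = U|A| via the SVD, as a numerical illustration
# of Theorems 1.24-1.25.
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
V, s, Wh = np.linalg.svd(A)             # A = V diag(s) Wh, with Wh = W*
U = V @ Wh                              # the unitary part
absA = Wh.conj().T @ np.diag(s) @ Wh    # |A| = W diag(s) W* = sqrt(A*A)
print(np.allclose(U @ absA, A))                 # A = U|A|
print(np.allclose(U.conj().T @ U, np.eye(4)))   # U is unitary
```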

This was for our basic presentation of linear algebra. There are of course many other things that can be said, but we will come back to some of them in what follows, directly in the case of the linear operators on the arbitrary Hilbert spaces.

1e. Exercises

Linear algebra is a wide topic, and there are countless interesting matrices, and exercises about them. As a continuation of our discussion about rotations, we have:

Exercise 1.26.

Prove that the symmetry and projection with respect to the $Ox$ axis rotated by an angle $t/2\in\mathbb{R}$ are given by the matrices

S_{t}=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix}
P_{t}=\frac{1}{2}\begin{pmatrix}1+\cos t&\sin t\\ \sin t&1-\cos t\end{pmatrix}

and then diagonalize these matrices, and if possible without computations.

Here the first part can only be clear on pictures, and by the way, prior to this, do not forget to verify as well that our formula of $R_{t}$ is the good one. As for the second part, just don’t go head-first into computations, there might be some geometry over there.

Exercise 1.27.

Prove that the isometries of $\mathbb{R}^{2}$ are rotations or symmetries,

R_{t}=\begin{pmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{pmatrix}\quad,\quad S_{t}=\begin{pmatrix}\cos t&\sin t\\ \sin t&-\cos t\end{pmatrix}

and then try as well to find a formula for the isometries of $\mathbb{R}^{3}$.

Here for the first question you should look first at the determinant of such an isometry. As for the second question, this is something quite difficult. If you’re good at computers, you can look into the code of 3D games, the rotation formula is probably there.

Exercise 1.28.

Prove that the isometries of $\mathbb{C}^{2}$ of determinant $1$ are

U=\begin{pmatrix}a&b\\ -\bar{b}&\bar{a}\end{pmatrix}\quad,\quad|a|^{2}+|b|^{2}=1

then work out as well the general case, of arbitrary determinant.

As a comment here, if done with this exercise about $\mathbb{C}^{2}$, but not yet with the previous one about $\mathbb{R}^{3}$, you can go back to that exercise, by using a $\mathbb{C}^{2}\simeq\mathbb{R}^{4}$ trick. And in case this trick leads to tough computations and big headache, look it up.

Exercise 1.29.

Prove that the flat matrix, which is the all-one $N\times N$ matrix, diagonalizes over the complex numbers as follows,

\begin{pmatrix}1&\ldots&\ldots&1\\ \vdots&&&\vdots\\ \vdots&&&\vdots\\ 1&\ldots&\ldots&1\end{pmatrix}=\frac{1}{N}\,F_{N}\begin{pmatrix}N\\ &0\\ &&\ddots\\ &&&0\end{pmatrix}F_{N}^{*}

where $F_{N}=(w^{ij})_{ij}$ with $w=e^{2\pi i/N}$ is the Fourier matrix, with the convention that the indices are taken to be $i,j=0,1,\ldots,N-1$.

This is something very instructive. Normally you have to look for eigenvectors for the flat matrix, and you are led in this way to the equation $x_{0}+\ldots+x_{N-1}=0$. The problem however is that this equation, while looking very gentle, has no “canonical” solutions over the real numbers. Thus you are led to the complex numbers, and more specifically to the roots of unity, and their magic, leading to the above result. Enjoy.
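
And here is a numerical check of the above formula, NumPy assumed:

```python
# The Fourier matrix F_N = (w^{ij}), with w = e^{2pi i/N}, diagonalizes
# the all-one matrix, with eigenvalues N, 0, ..., 0.
import numpy as np

N = 5
i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
F = np.exp(2j * np.pi * i * j / N)        # F_N = (w^{ij}), i,j = 0,...,N-1
flat = np.ones((N, N))
D = np.diag([N] + [0] * (N - 1)).astype(complex)
print(np.allclose(flat, F @ D @ F.conj().T / N))   # True
```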

Chapter 2 Linear operators

2a. Hilbert spaces

We discuss in what follows an extension of the linear algebra results from the previous chapter, obtained by looking at the linear operators $T:H\to H$, with the space $H$ being no longer assumed to be finite dimensional. Our motivations come from quantum mechanics, and in order to get motivated, here is some suggested reading:


(1) Generally speaking, physics is best learned from Feynman [fey]. If you already know some, and want to learn quantum mechanics, go with Griffiths [gri]. And if you’re already a bit familiar with quantum mechanics, a good book is Weinberg [wei].


(2) A look at classics like Dirac [dir], von Neumann [vn4] or Weyl [wey] can be instructive too. On the opposite, you have as well modern, fancy books on quantum information, such as Bengtsson-Życzkowski [bzy], Nielsen-Chuang [nch] or Watrous [wat].


(3) In short, many ways of getting familiar with this big mess which is quantum mechanics, and as long as you stay away from books advertised as “rigorous”, “axiomatic”, “mathematical”, things will be fine. By the way, you can try as well my book [ba4].


Getting to work now, physics tells us to look at infinite dimensional complex spaces, such as the space of wave functions ψ:3\psi:\mathbb{R}^{3}\to\mathbb{C} of the electron. In order to do some mathematics on these spaces, we will need scalar products. So, let us start with:

Definition 2.1.

A scalar product on a complex vector space HH is a binary operation H×HH\times H\to\mathbb{C}, denoted (x,y)<x,y>(x,y)\to<x,y>, satisfying the following conditions:

  1. (1)

    <x,y><x,y> is linear in xx, and antilinear in yy.

  2. (2)

    <x,y>¯=<y,x>\overline{<x,y>}=<y,x>, for any x,yx,y.

  3. (3)

    <x,x>>0<x,x>>0, for any x0x\neq 0.

As before in chapter 1, we use here mathematicians’ convention for scalar products, that is, <,><\,,> linear at left, as opposed to physicists’ convention, <,><\,,> linear at right. The reasons for this are quite subtle, coming from the fact that, while basic quantum mechanics looks better with <,><\,,> linear at right, advanced quantum mechanics looks better with <,><\,,> linear at left. Or at least that’s what my cats say.


As a basic example for Definition 2.1, we have the finite dimensional vector space H=NH=\mathbb{C}^{N}, with its usual scalar product, namely:

<x,y>=ixiy¯i<x,y>=\sum_{i}x_{i}\bar{y}_{i}

There are many other examples, and notably various spaces of L2L^{2} functions, which naturally appear in problems coming from physics. We will discuss them later on. In order to study now the scalar products, let us formulate the following definition:

Definition 2.2.

The norm of a vector xHx\in H is the following quantity:

||x||=<x,x>||x||=\sqrt{<x,x>}

We also call this number length of xx, or distance from xx to the origin.

The terminology comes from what happens in N\mathbb{C}^{N}, where the length of the vector, as defined above, coincides with the usual length, given by:

||x||=i|xi|2||x||=\sqrt{\sum_{i}|x_{i}|^{2}}

In analogy with what happens in finite dimensions, we have two important results regarding the norms. First we have the Cauchy-Schwarz inequality, as follows:

Theorem 2.3.

We have the Cauchy-Schwarz inequality

|<x,y>|||x||||y|||<x,y>|\leq||x||\cdot||y||

and the equality case holds precisely when x,yx,y are proportional.

Proof.

This is something very standard. Consider indeed the following quantity, depending on a real variable tt\in\mathbb{R}, and on a variable on the unit circle, w𝕋w\in\mathbb{T}:

f(t)=||twx+y||2f(t)=||twx+y||^{2}

By developing ff, we see that this is a degree 2 polynomial in tt:

f(t)\displaystyle f(t) =\displaystyle= <twx+y,twx+y>\displaystyle<twx+y,twx+y>
=\displaystyle= t2<x,x>+tw<x,y>+tw¯<y,x>+<y,y>\displaystyle t^{2}<x,x>+tw<x,y>+t\bar{w}<y,x>+<y,y>
=\displaystyle= t2||x||2+2tRe(w<x,y>)+||y||2\displaystyle t^{2}||x||^{2}+2tRe(w<x,y>)+||y||^{2}

Since ff is obviously positive, its discriminant must be negative:

4Re(w<x,y>)24||x||2||y||204Re(w<x,y>)^{2}-4||x||^{2}\cdot||y||^{2}\leq 0

But this is equivalent to the following condition:

|Re(w<x,y>)|||x||||y|||Re(w<x,y>)|\leq||x||\cdot||y||

Now the point is that we can arrange for the number w𝕋w\in\mathbb{T} to be such that the quantity w<x,y>w<x,y> is real. Thus, we obtain the following inequality:

|<x,y>|||x||||y|||<x,y>|\leq||x||\cdot||y||

Finally, the study of the equality case is straightforward, by using the fact that the discriminant of ff vanishes precisely when we have a root. But this leads to the conclusion in the statement, namely that the vectors x,yx,y must be proportional. ∎

As a second main result now, we have the Minkowski inequality:

Theorem 2.4.

We have the Minkowski inequality

||x+y||||x||+||y||||x+y||\leq||x||+||y||

and the equality case holds precisely when x,yx,y are proportional.

Proof.

This follows indeed from the Cauchy-Schwarz inequality, as follows:

||x+y||||x||+||y||\displaystyle||x+y||\leq||x||+||y||
\displaystyle\iff ||x+y||2(||x||+||y||)2\displaystyle||x+y||^{2}\leq(||x||+||y||)^{2}
\displaystyle\iff ||x||2+||y||2+2Re<x,y>||x||2+||y||2+2||x||||y||\displaystyle||x||^{2}+||y||^{2}+2Re<x,y>\leq||x||^{2}+||y||^{2}+2||x||\cdot||y||
\displaystyle\iff Re<x,y>||x||||y||\displaystyle Re<x,y>\leq||x||\cdot||y||

As for the equality case, this is clear from Cauchy-Schwarz as well. ∎

As a consequence of this, we have the following result:

Theorem 2.5.

The following function is a distance on HH,

d(x,y)=||xy||d(x,y)=||x-y||

in the usual sense, that of the abstract metric spaces.

Proof.

This follows indeed from the Minkowski inequality, which corresponds to the triangle inequality, the other two axioms for a distance being trivially satisfied. ∎

The above result is quite important, because it shows that we can do geometry and analysis in our present setting, with distances and angles, a bit as in the finite dimensional case. In order to do such abstract geometry, we will often need the following key result, which shows that everything can be recovered in terms of distances:

Proposition 2.6.

The scalar products can be recovered from distances, via the formula

4<x,y>=||x+y||2||xy||2+i||x+iy||2i||xiy||24<x,y>=||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}

called complex polarization identity.

Proof.

This is something that we have already met in finite dimensions. In arbitrary dimensions the proof is similar, as follows:

||x+y||2||xy||2+i||x+iy||2i||xiy||2\displaystyle||x+y||^{2}-||x-y||^{2}+i||x+iy||^{2}-i||x-iy||^{2}
=\displaystyle= ||x||2+||y||2||x||2||y||2+i||x||2+i||y||2i||x||2i||y||2\displaystyle||x||^{2}+||y||^{2}-||x||^{2}-||y||^{2}+i||x||^{2}+i||y||^{2}-i||x||^{2}-i||y||^{2}
+2Re(<x,y>)+2Re(<x,y>)+2iIm(<x,y>)+2iIm(<x,y>)\displaystyle+2Re(<x,y>)+2Re(<x,y>)+2iIm(<x,y>)+2iIm(<x,y>)
=\displaystyle= 4<x,y>\displaystyle 4<x,y>

Thus, we are led to the conclusion in the statement. ∎
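
As a quick numerical check of this polarization identity, here is a minimal sketch in Python with numpy, with the scalar product taken linear at left, as in our conventions, and with all names being ours:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)

ip = lambda a, b: np.sum(a * np.conj(b))   # <a,b>, linear at left
nsq = lambda a: ip(a, a).real              # the squared norm ||a||^2

rhs = nsq(x + y) - nsq(x - y) + 1j * nsq(x + 1j * y) - 1j * nsq(x - 1j * y)
print(np.allclose(4 * ip(x, y), rhs))      # True: <x,y> is recovered from norms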

In order to do analysis on our spaces, we need the Cauchy sequences that we construct to converge. This is something which is automatic in finite dimensions, but in arbitrary dimensions, this can fail. It is convenient here to formulate a detailed new definition, as follows, which will be the starting point for our various considerations to follow:

Definition 2.7.

A Hilbert space is a complex vector space HH given with a scalar product <x,y><x,y>, satisfying the following conditions:

  1. (1)

    <x,y><x,y> is linear in xx, and antilinear in yy.

  2. (2)

    <x,y>¯=<y,x>\overline{<x,y>}=<y,x>, for any x,yx,y.

  3. (3)

    <x,x>>0<x,x>>0, for any x0x\neq 0.

  4. (4)

    HH is complete with respect to the norm ||x||=<x,x>||x||=\sqrt{<x,x>}.

In other words, we have taken here Definition 2.1 above, and added the condition that HH must be complete with respect to the norm ||x||=<x,x>||x||=\sqrt{<x,x>}, that we know indeed to be a norm, according to the Minkowski inequality proved above. As a basic example, as before, we have the space H=NH=\mathbb{C}^{N}, with its usual scalar product, namely:

<x,y>=ixiy¯i<x,y>=\sum_{i}x_{i}\bar{y}_{i}

More generally now, we have the following construction of Hilbert spaces:

Proposition 2.8.

The sequences of complex numbers (xi)(x_{i}) which are square-summable,

i|xi|2<\sum_{i}|x_{i}|^{2}<\infty

form a Hilbert space l2()l^{2}(\mathbb{N}), with the following scalar product:

<x,y>=ixiy¯i<x,y>=\sum_{i}x_{i}\bar{y}_{i}

In fact, given any index set II, we can construct a Hilbert space l2(I)l^{2}(I), in this way.

Proof.

There are several things to be proved, as follows:

(1) Our first claim is that l2()l^{2}(\mathbb{N}) is a vector space. For this purpose, we must prove that x,yl2()x,y\in l^{2}(\mathbb{N}) implies x+yl2()x+y\in l^{2}(\mathbb{N}). But this leads us into proving ||x+y||||x||+||y||||x+y||\leq||x||+||y||, where ||x||=<x,x>||x||=\sqrt{<x,x>}. Now since we know this inequality to hold on each subspace Nl2()\mathbb{C}^{N}\subset l^{2}(\mathbb{N}) obtained by truncating, this inequality holds everywhere, as desired.

(2) Our second claim is that <,><\,,> is well-defined on l2()l^{2}(\mathbb{N}). But this follows from the Cauchy-Schwarz inequality, |<x,y>|||x||||y|||<x,y>|\leq||x||\cdot||y||, which can be established by truncating, a bit like we established the Minkowski inequality in (1) above.

(3) It is also clear that <,><\,,> is a scalar product on l2()l^{2}(\mathbb{N}), so it remains to prove that l2()l^{2}(\mathbb{N}) is complete with respect to ||x||=<x,x>||x||=\sqrt{<x,x>}. But this is clear, because if we pick a Cauchy sequence {xn}nl2()\{x^{n}\}_{n\in\mathbb{N}}\subset l^{2}(\mathbb{N}), then for each fixed index ii the numeric sequence {xni}n\{x^{n}_{i}\}_{n\in\mathbb{N}}\subset\mathbb{C} is Cauchy, and by setting xi=limnxnix_{i}=\lim_{n\to\infty}x^{n}_{i}, we have xnxx^{n}\to x inside l2()l^{2}(\mathbb{N}), as desired.

(4) Finally, the same arguments extend to the case of an arbitrary index set II, leading to a Hilbert space l2(I)l^{2}(I), and with the remark here that there is absolutely no problem in talking about quantities of type ||x||2=iI|xi|2[0,]||x||^{2}=\sum_{i\in I}|x_{i}|^{2}\in[0,\infty], even if the index set II is uncountable, because we are summing positive numbers. ∎

Even more generally, we have the following construction of Hilbert spaces:

Theorem 2.9.

Given a measured space XX, the functions f:Xf:X\to\mathbb{C}, taken up to equality almost everywhere, which are square-summable,

X|f(x)|2dx<\int_{X}|f(x)|^{2}dx<\infty

form a Hilbert space L2(X)L^{2}(X), with the following scalar product:

<f,g>=Xf(x)g(x)¯dx<f,g>=\int_{X}f(x)\overline{g(x)}dx

In the case X=IX=I, with the counting measure, we obtain in this way the space l2(I)l^{2}(I).

Proof.

This is a straightforward generalization of Proposition 2.8, with the arguments from its proof carrying over to our case, as follows:

(1) The first part, regarding Cauchy-Schwarz and Minkowski, extends without problems, by using this time approximation by step functions.

(2) Regarding the fact that <,><\,,> is indeed a scalar product on L2(X)L^{2}(X), there is a subtlety here, because if we want <f,f>>0<f,f>>0 for f0f\neq 0, we must declare that f=0f=0 when f=0f=0 almost everywhere, and so that f=gf=g when f=gf=g almost everywhere.

(3) Regarding the fact that L2(X)L^{2}(X) is complete with respect to ||f||=<f,f>||f||=\sqrt{<f,f>}, this is again basic measure theory, by picking a Cauchy sequence {fn}nL2(X)\{f_{n}\}_{n\in\mathbb{N}}\subset L^{2}(X), then extracting a subsequence which converges almost everywhere to some function ff, which is then an L2L^{2} limit of the whole sequence, fnff_{n}\to f.

(4) Finally, the last assertion is clear, because the integration with respect to the counting measure is by definition a sum, and so L2(I)=l2(I)L^{2}(I)=l^{2}(I) in this case. ∎

Quite remarkably, any Hilbert space must be of the form L2(X)L^{2}(X), and even of the particular form l2(I)l^{2}(I). This follows indeed from the following key result:

Theorem 2.10.

Let HH be a Hilbert space.

  1. (1)

    Any algebraic basis of this space {fi}iI\{f_{i}\}_{i\in I} can be turned into an orthonormal basis {ei}iI\{e_{i}\}_{i\in I}, by using the Gram-Schmidt procedure.

  2. (2)

    Thus, HH has an orthonormal basis, and so we have Hl2(I)H\simeq l^{2}(I), with II being the indexing set for this orthonormal basis.

Proof.

All this is standard by Gram-Schmidt, the idea being as follows:

(1) First of all, in finite dimensions an orthonormal basis {ei}iI\{e_{i}\}_{i\in I} is by definition a usual algebraic basis, satisfying <ei,ej>=δij<e_{i},e_{j}>=\delta_{ij}. But the existence of such a basis follows by applying the Gram-Schmidt procedure to any algebraic basis {fi}iI\{f_{i}\}_{i\in I}, as claimed.

(2) In infinite dimensions, a first issue comes from the fact that the standard basis {δi}i\{\delta_{i}\}_{i\in\mathbb{N}} of the space l2()l^{2}(\mathbb{N}) is not an algebraic basis in the usual sense, with the finite linear combinations of the functions δi\delta_{i} producing only a dense subspace of l2()l^{2}(\mathbb{N}), that of the functions having finite support. Thus, we must fine-tune our definition of “basis”.

(3) But this can be done in two ways, by saying that {fi}iI\{f_{i}\}_{i\in I} is a basis of HH when the functions fif_{i} are linearly independent, and when either the finite linear combinations of these functions fif_{i} form a dense subspace of HH, or the linear combinations with l2(I)l^{2}(I) coefficients of these functions fif_{i} form the whole HH. For orthogonal bases {ei}iI\{e_{i}\}_{i\in I} these definitions are equivalent, and in any case, our statement makes now sense.

(4) Regarding now the proof, in infinite dimensions, this follows again from Gram-Schmidt, exactly as in the finite dimensional case, but by using this time a tool from logic, called Zorn's lemma, in order to correctly run the recursion. ∎

The above result, and its relation with Theorem 2.9, is something quite subtle, so let us further get into this. First, we have the following definition, based on the above:

Definition 2.11.

A Hilbert space HH is called separable when the following equivalent conditions are satisfied:

  1. (1)

    HH has a countable algebraic basis {fi}i\{f_{i}\}_{i\in\mathbb{N}}.

  2. (2)

    HH has a countable orthonormal basis {ei}i\{e_{i}\}_{i\in\mathbb{N}}.

  3. (3)

    We have Hl2()H\simeq l^{2}(\mathbb{N}), isomorphism of Hilbert spaces.

In what follows we will be mainly interested in the separable Hilbert spaces, where most of the questions coming from quantum physics take place. In view of the above, the following philosophical question appears: why not simply talk about l2()l^{2}(\mathbb{N})?


In answer to this, we cannot really do so, because many of the separable spaces that we are interested in appear as spaces of functions, and such spaces do not necessarily have a very simple or explicit orthonormal basis, as shown by the following result:

Proposition 2.12.

The Hilbert space H=L2[0,1]H=L^{2}[0,1] is separable, having as orthonormal basis the orthonormalized version of the algebraic basis fn=xnf_{n}=x^{n} with nn\in\mathbb{N}.

Proof.

This follows from the Weierstrass theorem, which provides us with the basis fn=xnf_{n}=x^{n}, which can be orthogonalized by using the Gram-Schmidt procedure, as explained in Theorem 2.10. Working out the details here is actually an excellent exercise. ∎
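
For those wanting to see this Gram-Schmidt procedure in action, here is a minimal numerical sketch in Python with numpy, with the code being our own; the resulting polynomials are, up to normalization conventions, the shifted Legendre polynomials:

import numpy as np
from numpy.polynomial import Polynomial as P

def ip(p, q):
    # the scalar product of L^2[0,1]: <p,q> = int_0^1 p(x)q(x)dx
    F = (p * q).integ()
    return F(1.0) - F(0.0)

basis = []
for n in range(4):
    f = P([0.0] * n + [1.0])       # the monomial f_n = x^n
    for e in basis:                # Gram-Schmidt: subtract the projections
        f = f - ip(f, e) * e
    basis.append(f / np.sqrt(ip(f, f)))

for e in basis:                    # e_0 = 1, e_1 = sqrt(3)(2x-1), and so on
    print(np.round(e.coef, 4))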

As a conclusion to all this, we are interested in one space, namely the unique separable Hilbert space HH, but due to various technical reasons, it is often better to forget that we have H=l2()H=l^{2}(\mathbb{N}), and say instead that we have H=L2(X)H=L^{2}(X), with XX being a separable measured space, or simply say that HH is an abstract separable Hilbert space.

2b. Linear operators

Let us get now into the study of linear operators T:HHT:H\to H. Before anything, we should mention that things are quite tricky with respect to quantum mechanics, and physics in general. Indeed, if there is a central operator in physics, this is the Laplace operator on the smooth functions f:Nf:\mathbb{R}^{N}\to\mathbb{C}, given by:

Δf(x)=id2fdxi2\Delta f(x)=\sum_{i}\frac{d^{2}f}{dx_{i}^{2}}

And the problem is that what we have here is an operator Δ:C(N)C(N)\Delta:C^{\infty}(\mathbb{R}^{N})\to C^{\infty}(\mathbb{R}^{N}), which does not extend into an operator Δ:L2(N)L2(N)\Delta:L^{2}(\mathbb{R}^{N})\to L^{2}(\mathbb{R}^{N}). Thus, we should perhaps look at operators T:HHT:H\to H which are densely defined, instead of looking at operators T:HHT:H\to H which are everywhere defined. We will not do so, for two reasons:


(1) Tactical retreat. When physics looks too complicated, as is the case now, you can always declare that mathematics comes first. So, let us be pure mathematicians, simply looking to generalize linear algebra to infinite dimensions. And from this viewpoint, it is a no-brainer to look at everywhere defined operators T:HHT:H\to H.


(2) Modern physics. We will see later, towards the middle of the present book, when talking about various mathematical physics findings of Connes, Jones, Voiculescu and others, that a lot of interesting mathematics, which is definitely related to modern physics, can be developed by using the everywhere defined operators T:HHT:H\to H.


In short, you’ll have to trust me here. And hang on, we are not done yet, because with this choice made, there is one more problem, mathematical this time. The problem comes from the fact that in infinite dimensions the everywhere defined operators T:HHT:H\to H can be bounded or not, and for reasons which are mathematically intuitive and obvious, and physically acceptable too, we want to deal with the bounded case only.


Long story short, let us avoid too much thinking, and start in a simple way, with:

Proposition 2.13.

For a linear operator T:HHT:H\to H, the following are equivalent:

  1. (1)

    TT is continuous.

  2. (2)

    TT is continuous at 0.

  3. (3)

    T(B)cBT(B)\subset cB for some c<c<\infty, where BHB\subset H is the unit ball.

  4. (4)

    TT is bounded, in the sense that ||T||=sup||x||1||Tx||||T||=\sup_{||x||\leq 1}||Tx|| satisfies ||T||<||T||<\infty.

Proof.

This is elementary, with (1)(2)(1)\iff(2) coming from the linearity of TT, then (2)(3)(2)\iff(3) coming from definitions, and finally (3)(4)(3)\iff(4) coming from the fact that the number ||T||||T|| from (4) is the infimum of the numbers cc making (3) work. ∎

Regarding such operators, we have the following result:

Theorem 2.14.

The linear operators T:HHT:H\to H which are bounded,

||T||=sup||x||1||Tx||<||T||=\sup_{||x||\leq 1}||Tx||<\infty

form a complex algebra with unit B(H)B(H), having the property

||ST||||S||||T||||ST||\leq||S||\cdot||T||

and which is complete with respect to the norm.

Proof.

The fact that we have indeed an algebra, satisfying the product condition in the statement, follows from the following estimates, which are all elementary:

||S+T||||S||+||T||||S+T||\leq||S||+||T||
||λT||=|λ|||T||||\lambda T||=|\lambda|\cdot||T||
||ST||||S||||T||||ST||\leq||S||\cdot||T||

Regarding now the last assertion, if {Tn}B(H)\{T_{n}\}\subset B(H) is Cauchy then {Tnx}\{T_{n}x\} is Cauchy for any xHx\in H, so we can define the limit T=limnTnT=\lim_{n\to\infty}T_{n} by setting:

Tx=limnTnxTx=\lim_{n\to\infty}T_{n}x

Let us first check that the application xTxx\to Tx is linear. We have:

T(x+y)\displaystyle T(x+y) =\displaystyle= limnTn(x+y)\displaystyle\lim_{n\to\infty}T_{n}(x+y)
=\displaystyle= limnTn(x)+Tn(y)\displaystyle\lim_{n\to\infty}T_{n}(x)+T_{n}(y)
=\displaystyle= limnTn(x)+limnTn(y)\displaystyle\lim_{n\to\infty}T_{n}(x)+\lim_{n\to\infty}T_{n}(y)
=\displaystyle= T(x)+T(y)\displaystyle T(x)+T(y)

Similarly, we have as well the following computation:

T(λx)\displaystyle T(\lambda x) =\displaystyle= limnTn(λx)\displaystyle\lim_{n\to\infty}T_{n}(\lambda x)
=\displaystyle= λlimnTn(x)\displaystyle\lambda\lim_{n\to\infty}T_{n}(x)
=\displaystyle= λT(x)\displaystyle\lambda T(x)

Thus we have a linear map T:HHT:H\to H. It remains to prove that we have TB(H)T\in B(H), and that we have TnTT_{n}\to T in norm. For this purpose, observe that we have:

||TnTm||ε,n,mN\displaystyle||T_{n}-T_{m}||\leq\varepsilon\ ,\ \forall n,m\geq N
\displaystyle\implies ||TnxTmx||ε,||x||=1,n,mN\displaystyle||T_{n}x-T_{m}x||\leq\varepsilon\ ,\ \forall||x||=1\ ,\ \forall n,m\geq N
\displaystyle\implies ||TnxTx||ε,||x||=1,nN\displaystyle||T_{n}x-Tx||\leq\varepsilon\ ,\ \forall||x||=1\ ,\ \forall n\geq N
\displaystyle\implies ||TNxTx||ε,||x||=1\displaystyle||T_{N}x-Tx||\leq\varepsilon\ ,\ \forall||x||=1
\displaystyle\implies ||TNT||ε\displaystyle||T_{N}-T||\leq\varepsilon

As a first consequence, we obtain TB(H)T\in B(H), because we have:

||T||\displaystyle||T|| =\displaystyle= ||TN+(TTN)||\displaystyle||T_{N}+(T-T_{N})||
\displaystyle\leq ||TN||+||TTN||\displaystyle||T_{N}||+||T-T_{N}||
\displaystyle\leq ||TN||+ε\displaystyle||T_{N}||+\varepsilon
<\displaystyle< \displaystyle\infty

As a second consequence, we obtain TNTT_{N}\to T in norm, and we are done. ∎

In the case where HH comes with a basis {ei}iI\{e_{i}\}_{i\in I}, we can talk about the infinite matrices MMI()M\in M_{I}(\mathbb{C}), with the remark that the multiplication of such matrices is not always defined, in the case |I|=|I|=\infty. In this context, we have the following result:

Theorem 2.15.

Let HH be a Hilbert space, with orthonormal basis {ei}iI\{e_{i}\}_{i\in I}. The bounded operators TB(H)T\in B(H) can be then identified with matrices MMI()M\in M_{I}(\mathbb{C}) via

Tx=Mx,Mij=<Tej,ei>Tx=Mx\quad,\quad M_{ij}=<Te_{j},e_{i}>

and we obtain in this way an embedding as follows, which is multiplicative:

B(H)MI()B(H)\subset M_{I}(\mathbb{C})

In the case H=NH=\mathbb{C}^{N} we obtain in this way the usual isomorphism B(H)MN()B(H)\simeq M_{N}(\mathbb{C}). In the separable case we obtain in this way a proper embedding B(H)M()B(H)\subset M_{\infty}(\mathbb{C}).

Proof.

We have several assertions to be proved, the idea being as follows:

(1) Regarding the first assertion, given a bounded operator T:HHT:H\to H, let us associate to it a matrix MMI()M\in M_{I}(\mathbb{C}) as in the statement, by the following formula:

Mij=<Tej,ei>M_{ij}=<Te_{j},e_{i}>

It is clear that this correspondence TMT\to M is linear, and also that its kernel is {0}\{0\}. Thus, we have an embedding of linear spaces B(H)MI()B(H)\subset M_{I}(\mathbb{C}).

(2) Our claim now is that this embedding is multiplicative. But this is clear too, because if we denote by TMTT\to M_{T} our correspondence, we have:

(MST)ij\displaystyle(M_{ST})_{ij} =\displaystyle= <STej,ei>\displaystyle<STe_{j},e_{i}>
=\displaystyle= Sk<Tej,ek>ek,ei\displaystyle\left<S\sum_{k}<Te_{j},e_{k}>e_{k},e_{i}\right>
=\displaystyle= k<Sek,ei><Tej,ek>\displaystyle\sum_{k}<Se_{k},e_{i}><Te_{j},e_{k}>
=\displaystyle= k(MS)ik(MT)kj\displaystyle\sum_{k}(M_{S})_{ik}(M_{T})_{kj}
=\displaystyle= (MSMT)ij\displaystyle(M_{S}M_{T})_{ij}

(3) Finally, we must prove that the original operator T:HHT:H\to H can be recovered from its matrix MMI()M\in M_{I}(\mathbb{C}) via the formula in the statement, namely Tx=MxTx=Mx. But this latter formula holds for the vectors of the basis, x=ejx=e_{j}, because we have:

(Tej)i\displaystyle(Te_{j})_{i} =\displaystyle= <Tej,ei>\displaystyle<Te_{j},e_{i}>
=\displaystyle= Mij\displaystyle M_{ij}
=\displaystyle= (Mej)i\displaystyle(Me_{j})_{i}

Now by linearity we obtain from this that the formula Tx=MxTx=Mx holds everywhere, on any vector xHx\in H, and this finishes the proof of the first assertion.

(4) In finite dimensions we obtain an isomorphism, because any matrix MMN()M\in M_{N}(\mathbb{C}) determines an operator T:NNT:\mathbb{C}^{N}\to\mathbb{C}^{N}, according to the formula <Tej,ei>=Mij<Te_{j},e_{i}>=M_{ij}. In infinite dimensions, however, we do not have an isomorphism. For instance on H=l2()H=l^{2}(\mathbb{N}) the following matrix does not define an operator:

M=(1111)M=\begin{pmatrix}1&1&\ldots\\ 1&1&\ldots\\ \vdots&\vdots\end{pmatrix}

Indeed, T(e1)T(e_{1}) should be the all-one vector, which is not square-summable. ∎
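
As a finite dimensional illustration of the correspondence TMT\to M and of its multiplicativity, here is a short numerical sketch, in Python with numpy, with the setup and names being ours:

import numpy as np

rng = np.random.default_rng(1)
N = 4
S = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
e = np.eye(N)                              # the orthonormal basis e_0,...,e_{N-1}
ip = lambda a, b: np.sum(a * np.conj(b))   # <a,b>, linear at left

def mat(A):
    # the matrix M_ij = <A e_j, e_i> associated to the operator A
    return np.array([[ip(A @ e[j], e[i]) for j in range(N)] for i in range(N)])

print(np.allclose(mat(T), T))                     # M recovers the operator
print(np.allclose(mat(S @ T), mat(S) @ mat(T)))   # M_{ST} = M_S M_T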

In connection with our previous comments on bases, the above result is something quite theoretical, because for basic Hilbert spaces like L2[0,1]L^{2}[0,1], which do not have a simple orthonormal basis, the embedding B(H)M()B(H)\subset M_{\infty}(\mathbb{C}) that we obtain is not something very useful. In short, while the bounded operators T:HHT:H\to H are basically some infinite matrices, it is better to think of these operators as being objects on their own.


As another comment, the construction TMT\to M makes sense for any linear operator T:HHT:H\to H, but when dimH=\dim H=\infty, we do not obtain an embedding (H)MI()\mathcal{L}(H)\subset M_{I}(\mathbb{C}) in this way. Indeed, set H=l2()H=l^{2}(\mathbb{N}), let E=span(ei)E=span(e_{i}) be the linear space spanned by the standard basis, and pick an algebraic complement FF of this space EE, so that we have H=EFH=E\oplus F, as an algebraic direct sum. Then any linear operator S:FFS:F\to F gives rise to a linear operator T:HHT:H\to H, given by T(e,f)=(0,S(f))T(e,f)=(0,S(f)), whose associated matrix is 0. And, retrospectively speaking, it is in order to avoid such pathologies that we decided some time ago to restrict our attention to the bounded case, TB(H)T\in B(H).


As in the finite dimensional case, we can talk about adjoint operators, in this setting, the definition and main properties of the construction TTT\to T^{*} being as follows:

Theorem 2.16.

Given a bounded operator TB(H)T\in B(H), the following formula defines a bounded operator TB(H)T^{*}\in B(H), called the adjoint of TT:

<Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>

The correspondence TTT\to T^{*} is antilinear, antimultiplicative, an involution, and an isometry. In finite dimensions, we recover the usual adjoint operator.

Proof.

There are several things to be done here, the idea being as follows:

(1) We will need a standard functional analysis result, stating that the continuous linear forms φ:H\varphi:H\to\mathbb{C} appear as scalar products, as follows, with zHz\in H:

φ(x)=<x,z>\varphi(x)=<x,z>

Indeed, in one sense this is clear, because given zHz\in H, the application φ(x)=<x,z>\varphi(x)=<x,z> is linear, and continuous as well, because by Cauchy-Schwarz we have:

|φ(x)|||x||||z|||\varphi(x)|\leq||x||\cdot||z||

Conversely now, by using a basis we can assume H=l2()H=l^{2}(\mathbb{N}), and our linear form φ:H\varphi:H\to\mathbb{C} must be then, by linearity, given by a formula of the following type:

φ(x)=ixiz¯i\varphi(x)=\sum_{i}x_{i}\bar{z}_{i}

But, again by Cauchy-Schwarz, in order for such a formula to define indeed a continuous linear form φ:H\varphi:H\to\mathbb{C} we must have zl2()z\in l^{2}(\mathbb{N}), and so zHz\in H, as desired.

(2) With this in hand, we can now construct the adjoint TT^{*}, by the formula in the statement. Indeed, given yHy\in H, the formula φ(x)=<Tx,y>\varphi(x)=<Tx,y> defines a linear map HH\to\mathbb{C}. Thus, we must have a formula as follows, for a certain vector TyHT^{*}y\in H:

φ(x)=<x,Ty>\varphi(x)=<x,T^{*}y>

Moreover, this vector TyHT^{*}y\in H is unique with this property, and we conclude from this that the formula yTyy\to T^{*}y defines a certain map T:HHT^{*}:H\to H, which is unique with the property in the statement, namely <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y> for any x,yx,y.

(3) Let us prove that we have TB(H)T^{*}\in B(H). By using once again the uniqueness of TT^{*}, we conclude that we have the following formulae, which show that TT^{*} is linear:

T(x+y)=Tx+Ty,T(λx)=λTxT^{*}(x+y)=T^{*}x+T^{*}y\quad,\quad T^{*}(\lambda x)=\lambda T^{*}x

Observe also that TT^{*} is bounded as well, because we have:

||T||\displaystyle||T|| =\displaystyle= sup||x||=1sup||y||=1|<Tx,y>|\displaystyle\sup_{||x||=1}\sup_{||y||=1}|<Tx,y>|
=\displaystyle= sup||y||=1sup||x||=1|<x,Ty>|\displaystyle\sup_{||y||=1}\sup_{||x||=1}|<x,T^{*}y>|
=\displaystyle= ||T||\displaystyle||T^{*}||

(4) The fact that the correspondence TTT\to T^{*} is antilinear, antimultiplicative, and is an involution comes from the following formulae, coming from uniqueness:

(S+T)=S+T,(λT)=λ¯T(S+T)^{*}=S^{*}+T^{*}\quad,\quad(\lambda T)^{*}=\bar{\lambda}T^{*}
(ST)=TS,(T)=T(ST)^{*}=T^{*}S^{*}\quad,\quad(T^{*})^{*}=T

As for the isometry property with respect to the operator norm, ||T||=||T||||T||=||T^{*}||, this is something that we already know, from the proof of (3) above.

(5) Regarding finite dimensions, let us first examine the general case where our Hilbert space comes with a basis, H=l2(I)H=l^{2}(I). We can compute the matrix MMI()M^{*}\in M_{I}(\mathbb{C}) associated to the operator TB(H)T^{*}\in B(H), by using <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>, in the following way:

(M)ij\displaystyle(M^{*})_{ij} =\displaystyle= <Tej,ei>\displaystyle<T^{*}e_{j},e_{i}>
=\displaystyle= <ei,Tej>¯\displaystyle\overline{<e_{i},T^{*}e_{j}>}
=\displaystyle= <Tei,ej>¯\displaystyle\overline{<Te_{i},e_{j}>}
=\displaystyle= M¯ji\displaystyle\overline{M}_{ji}

Thus, we have reached the usual formula for the adjoints of matrices, and in the particular case H=NH=\mathbb{C}^{N}, we conclude that TT^{*} comes indeed from the usual MM^{*}. ∎
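
Here is as well a quick numerical check of the defining formula of the adjoint, and of the fact that, at the matrix level, it is the conjugate transpose. This is a minimal sketch in Python with numpy, with our conventions:

import numpy as np

rng = np.random.default_rng(2)
N = 5
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

ip = lambda a, b: np.sum(a * np.conj(b))   # <a,b>, linear at left
Tstar = T.conj().T                         # candidate adjoint: conjugate transpose

print(np.isclose(ip(T @ x, y), ip(x, Tstar @ y)))   # <Tx,y> = <x,T*y>: True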

As in finite dimensions, the operators T,TT,T^{*} can be thought of as being “twin brothers”, and there is a lot of interesting mathematics connecting them. We first have:

Proposition 2.17.

Given a bounded operator TB(H)T\in B(H), the following happen:

  1. (1)

    kerT=(ImT)\ker T^{*}=(ImT)^{\perp}.

  2. (2)

    ImT¯=(kerT)\overline{ImT^{*}}=(\ker T)^{\perp}.

Proof.

Both these assertions are elementary, as follows:

(1) Let us first prove “\subset”. Assuming Tx=0T^{*}x=0, we have indeed xImTx\perp ImT, because:

<x,Ty>=<Tx,y>=0<x,Ty>=<T^{*}x,y>=0

As for “\supset”, assuming <x,Ty>=0<x,Ty>=0 for any yy, we have Tx=0T^{*}x=0, because:

<Tx,y>=<x,Ty>=0<T^{*}x,y>=<x,Ty>=0

(2) This can be deduced from (1), applied to the operator TT^{*}, as follows:

(kerT)=(ImT)=ImT¯(\ker T)^{\perp}=(ImT^{*})^{\perp\perp}=\overline{ImT^{*}}

Here we have used the formula K=K¯K^{\perp\perp}=\bar{K}, valid for any linear subspace KHK\subset H of a Hilbert space, which for KK closed reads K=KK^{\perp\perp}=K, and comes from H=KKH=K\oplus K^{\perp}, and which in general follows from KK¯=K¯K^{\perp\perp}\subset\bar{K}^{\perp\perp}=\bar{K}, the reverse inclusion being clear. ∎

Let us record as well the following useful formula, relating TT and TT^{*}:

Theorem 2.18.

We have the following formula,

||TT||=||T||2||TT^{*}||=||T||^{2}

valid for any operator TB(H)T\in B(H).

Proof.

We recall from Theorem 2.16 that the correspondence TTT\to T^{*} is an isometry with respect to the operator norm, in the sense that we have:

||T||=||T||||T||=||T^{*}||

In order to prove now the formula in the statement, observe first that we have:

||TT||||T||||T||=||T||2||TT^{*}||\leq||T||\cdot||T^{*}||=||T||^{2}

On the other hand, we have as well the following estimate:

||T||2\displaystyle||T||^{2} =\displaystyle= sup||x||=1|<Tx,Tx>|\displaystyle\sup_{||x||=1}|<Tx,Tx>|
=\displaystyle= sup||x||=1|<x,TTx>|\displaystyle\sup_{||x||=1}|<x,T^{*}Tx>|
\displaystyle\leq ||TT||\displaystyle||T^{*}T||

By replacing TTT\to T^{*} we obtain from this that we have:

||T||2||TT||||T||^{2}\leq||TT^{*}||

Thus, we have obtained the needed inequality, and we are done. ∎
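
In finite dimensions this formula can be verified numerically, the operator norm being the largest singular value. Here is a minimal sketch, in Python with numpy, with the code being ours:

import numpy as np

rng = np.random.default_rng(3)
T = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))

opnorm = lambda A: np.linalg.norm(A, 2)    # operator norm: largest singular value
print(np.isclose(opnorm(T @ T.conj().T), opnorm(T) ** 2))   # ||TT*|| = ||T||^2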

2c. Unitaries, projections

Let us discuss now some explicit examples of operators, in analogy with what happens in finite dimensions. The most basic examples of linear transformations are the rotations, symmetries and projections. Then, we have certain remarkable classes of linear transformations, such as the positive, self-adjoint and normal ones. In what follows we will develop the basic theory of such transformations, in the present Hilbert space setting.


Let us begin with the rotations. The situation here is quite tricky in arbitrary dimensions, and we have several notions instead of one. We first have the following result:

Theorem 2.19.

For a linear operator UB(H)U\in B(H) the following conditions are equivalent, and if they are satisfied, we say that UU is an isometry:

  1. (1)

    UU is a metric space isometry, d(Ux,Uy)=d(x,y)d(Ux,Uy)=d(x,y).

  2. (2)

    UU is a normed space isometry, ||Ux||=||x||||Ux||=||x||.

  3. (3)

    UU preserves the scalar product, <Ux,Uy>=<x,y><Ux,Uy>=<x,y>.

  4. (4)

    UU satisfies the isometry condition UU=1U^{*}U=1.

In finite dimensions, we recover in this way the usual unitary transformations.

Proof.

The proofs are similar to those in finite dimensions, as follows:

(1)(2)(1)\iff(2) This follows indeed from the formula of the distances, namely:

d(x,y)=||xy||d(x,y)=||x-y||

(2)(3)(2)\iff(3) This is again standard, because we can pass from scalar products to distances, and vice versa, by using ||x||=<x,x>||x||=\sqrt{<x,x>}, and the polarization formula.

(3)(4)(3)\iff(4) We have indeed the following equivalences, by using the standard formula <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>, which defines the adjoint operator:

<Ux,Uy>=<x,y>\displaystyle<Ux,Uy>=<x,y> \displaystyle\iff <x,UUy>=<x,y>\displaystyle<x,U^{*}Uy>=<x,y>
\displaystyle\iff UUy=y\displaystyle U^{*}Uy=y
\displaystyle\iff UU=1\displaystyle U^{*}U=1

Thus, we are led to the conclusions in the statement. ∎

The point now is that the condition UU=1U^{*}U=1 does not imply in general UU=1UU^{*}=1, the simplest counterexample here being the shift operator on l2()l^{2}(\mathbb{N}):

Proposition 2.20.

The shift operator on the space l2()l^{2}(\mathbb{N}), given by

S(ei)=ei+1S(e_{i})=e_{i+1}

is an isometry, SS=1S^{*}S=1. However, we have SS1SS^{*}\neq 1.

Proof.

The adjoint of the shift is given by the following formula:

S(ei)={ei1ifi>00ifi=0S^{*}(e_{i})=\begin{cases}e_{i-1}&{\rm if}\ i>0\\ 0&{\rm if}\ i=0\end{cases}

When composing S,SS,S^{*}, in one sense we obtain the following formula:

SS(ei)=eiS^{*}S(e_{i})=e_{i}

In the other sense now, we obtain the following formula:

SS(ei)={eiifi>00ifi=0SS^{*}(e_{i})=\begin{cases}e_{i}&{\rm if}\ i>0\\ 0&{\rm if}\ i=0\end{cases}

Summarizing, the compositions are given by the following formulae:

SS=1,SS=Proj(e0)S^{*}S=1\quad,\quad SS^{*}=Proj(e_{0}^{\perp})

Thus, we are led to the conclusions in the statement. ∎
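
The above computations can be illustrated on a finite dimensional truncation of the shift, with the caveat that the truncation also loses the last basis vector, so that the relation SS=1S^{*}S=1 only becomes exact in the infinite dimensional limit. A minimal sketch in Python with numpy, with our setup:

import numpy as np

N = 5
S = np.zeros((N, N))
S[1:, :-1] = np.eye(N - 1)    # truncated shift: S e_i = e_{i+1}, with S e_{N-1} = 0

print(np.diag(S.T @ S))       # [1 1 1 1 0]: S*S = 1, up to the truncation effect
print(np.diag(S @ S.T))       # [0 1 1 1 1]: SS* = the projection away from e_0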

As a conclusion, the notion of isometry is not the correct infinite dimensional analogue of the notion of unitary, and the unitary operators must be introduced as follows:

Theorem 2.21.

For a linear operator UB(H)U\in B(H) the following conditions are equivalent, and if they are satisfied, we say that UU is a unitary:

  1. (1)

    UU is an isometry, which is invertible.

  2. (2)

    UU, U1U^{-1} are both isometries.

  3. (3)

    UU, UU^{*} are both isometries.

  4. (4)

    UU=UU=1UU^{*}=U^{*}U=1.

  5. (5)

    U=U1U^{*}=U^{-1}.

Moreover, the unitary operators form a group U(H)B(H)U(H)\subset B(H).

Proof.

There are several statements here, the idea being as follows:

(1) The various equivalences in the statement are all clear from definitions, and from Theorem 2.19 in what regards the various possible notions of isometries which can be used, by using the formula (ST)=TS(ST)^{*}=T^{*}S^{*} for the adjoints of the products of operators.

(2) The fact that the products and inverses of unitaries are unitaries is also clear, and we conclude that the unitary operators form a group U(H)B(H)U(H)\subset B(H), as stated. ∎

Let us discuss now the projections. Modulo the fact that the subspaces KHK\subset H onto which these projections project must be assumed to be closed, in the present setting, the result here is perfectly similar to the one in finite dimensions, as follows:

Theorem 2.22.

For a linear operator PB(H)P\in B(H) the following conditions are equivalent, and if they are satisfied, we say that PP is a projection:

  1. (1)

    PP is the orthogonal projection on a closed subspace KHK\subset H.

  2. (2)

    PP satisfies the projection equations P2=P=PP^{2}=P^{*}=P.

Proof.

As in finite dimensions, PP is an abstract projection, not necessarily orthogonal, when it is an idempotent, algebraically speaking, in the sense that we have:

P2=PP^{2}=P

The point now is that this projection is orthogonal when:

<Pxx,Py>=0\displaystyle<Px-x,Py>=0 \displaystyle\iff <PPxPx,y>=0\displaystyle<P^{*}Px-P^{*}x,y>=0
\displaystyle\iff PPxPx=0\displaystyle P^{*}Px-P^{*}x=0
\displaystyle\iff PPP=0\displaystyle P^{*}P-P^{*}=0
\displaystyle\iff PP=P\displaystyle P^{*}P=P^{*}

Now observe that by conjugating, we obtain PP=PP^{*}P=P. Thus, we must have P=PP=P^{*}, and so we have shown that any orthogonal projection must satisfy, as claimed:

P2=P=PP^{2}=P^{*}=P

Conversely, if this condition is satisfied, P2=PP^{2}=P shows that PP is a projection, and P=PP=P^{*} shows via the above computation that PP is indeed orthogonal. ∎

There is a relation between the projections and the general isometries, such as the shift SS that we met before, and we have the following result:

Proposition 2.23.

Given an isometry UB(H)U\in B(H), the operator

P=UUP=UU^{*}

is a projection, namely the orthogonal projection on Im(U)Im(U).

Proof.

Assume indeed that we have an isometry, UU=1U^{*}U=1. The fact that P=UUP=UU^{*} is indeed a projection can be checked abstractly, as follows:

(UU)=UU(UU^{*})^{*}=UU^{*}
UUUU=UUUU^{*}UU^{*}=UU^{*}

As for the last assertion, this is something that we already met, for the shift, and the situation in general is similar, with the result itself being clear. ∎

More generally now, along the same lines, and clarifying the whole situation with the unitaries and isometries, we have the following result:

Theorem 2.24.

An operator UB(H)U\in B(H) is a partial isometry, in the usual geometric sense, when the following two operators are projections:

P=UU,Q=UUP=UU^{*}\quad,\quad Q=U^{*}U

Moreover, the isometries, adjoints of isometries and unitaries are respectively characterized by the conditions Q=1Q=1, P=1P=1, P=Q=1P=Q=1.

Proof.

The first assertion is a straightforward extension of Proposition 2.23, and the second assertion follows from various results regarding isometries established above. ∎
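
In finite dimensions any isometry is a unitary, so in order to model the above phenomena numerically we can use instead a rectangular matrix with orthonormal columns, which is an isometry nm\mathbb{C}^{n}\to\mathbb{C}^{m}, with n<mn<m. Here is a sketch in Python with numpy, with the QR trick being our own choice:

import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
V, _ = np.linalg.qr(A)        # V has orthonormal columns: an isometry C^3 -> C^5

Q = V.conj().T @ V            # V*V
P = V @ V.conj().T            # VV*
print(np.allclose(Q, np.eye(n)))                          # V*V = 1: isometry
print(np.allclose(P @ P, P), np.allclose(P, P.conj().T))  # VV* is a projection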

It is possible to talk as well about symmetries, in the following way:

Definition 2.25.

An operator SB(H)S\in B(H) is called a symmetry when S2=1S^{2}=1, and a unitary symmetry when one of the following equivalent conditions is satisfied:

  1. (1)

    SS is a unitary, S=S1S^{*}=S^{-1}, and a symmetry as well, S2=1S^{2}=1.

  2. (2)

    SS satisfies the equations S=S=S1S=S^{*}=S^{-1}.

Here the terminology is a bit non-standard, because even in finite dimensions, S2=1S^{2}=1 is not exactly what you would require for a “true” symmetry, as shown by the following transformation, which is a symmetry in our sense, but not a unitary symmetry:

(021/20)(xy)=(2yx/2)\begin{pmatrix}0&2\\ 1/2&0\end{pmatrix}\binom{x}{y}=\binom{2y}{x/2}

Let us study now some larger classes of operators, which are of particular importance, namely the self-adjoint, positive and normal ones. We first have:

Theorem 2.26.

For an operator TB(H)T\in B(H), the following conditions are equivalent, and if they are satisfied, we call TT self-adjoint:

  1. (1)

    T=TT=T^{*}.

  2. (2)

    <Tx,x><Tx,x>\in\mathbb{R}.

In finite dimensions, we recover in this way the usual self-adjointness notion.

Proof.

There are several assertions here, the idea being as follows:

(1)(2)(1)\implies(2) This is clear, because we have:

<Tx,x>¯\displaystyle\overline{<Tx,x>} =\displaystyle= <x,Tx>\displaystyle<x,Tx>
=\displaystyle= <Tx,x>\displaystyle<T^{*}x,x>
=\displaystyle= <Tx,x>\displaystyle<Tx,x>

(2)(1)(2)\implies(1) In order to prove this, observe that the beginning of the above computation shows that, when assuming <Tx,x><Tx,x>\in\mathbb{R}, the following happens:

<Tx,x>=<Tx,x><Tx,x>=<T^{*}x,x>

Thus, in terms of the operator S=TTS=T-T^{*}, we have:

<Sx,x>=0<Sx,x>=0

In order to finish, we use a polarization trick. We have the following formula:

<S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x><S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x>

Since the first 3 terms vanish, the sum of the 2 last terms vanishes too. But, by using S=SS^{*}=-S, coming from S=TTS=T-T^{*}, we can process this latter vanishing as follows:

<Sx,y>\displaystyle<Sx,y> =\displaystyle= <Sy,x>\displaystyle-<Sy,x>
=\displaystyle= <y,Sx>\displaystyle<y,Sx>
=\displaystyle= <Sx,y>¯\displaystyle\overline{<Sx,y>}

Thus we must have <Sx,y><Sx,y>\in\mathbb{R}, and with yiyy\to iy we obtain <Sx,y>i<Sx,y>\in i\mathbb{R} too, and so <Sx,y>=0<Sx,y>=0. Thus S=0S=0, which gives T=TT=T^{*}, as desired.

(3) Finally, in what regards the finite dimensions, or more generally the case where our Hilbert space comes with a basis, H=l2(I)H=l^{2}(I), here the condition T=TT=T^{*} corresponds to the usual self-adjointness condition M=MM=M^{*} at the level of the associated matrices. ∎

At the level of the basic examples, the situation is as follows:

Proposition 2.27.

The following operators are self-adjoint:

  1. (1)

    The projections, P2=P=PP^{2}=P^{*}=P. In fact, an abstract, algebraic projection is an orthogonal projection precisely when it is self-adjoint.

  2. (2)

    The unitary symmetries, S=S=S1S=S^{*}=S^{-1}. In fact, a unitary is a unitary symmetry precisely when it is self-adjoint.

Proof.

These assertions are indeed all clear from definitions. ∎

Next in line, we have the notion of positive operator. We have here:

Theorem 2.28.

The positive operators, which are the operators TB(H)T\in B(H) satisfying <Tx,x>0<Tx,x>\geq 0, have the following properties:

  1. (1)

    They are self-adjoint, T=TT=T^{*}.

  2. (2)

    As examples, we have the projections, P2=P=PP^{2}=P^{*}=P.

  3. (3)

    More generally, T=SST=S^{*}S is positive, for any SB(H)S\in B(H).

  4. (4)

    In finite dimensions, we recover the usual positive operators.

Proof.

All these assertions are elementary, the idea being as follows:

(1) This follows from Theorem 2.26, because <Tx,x>0<Tx,x>\geq 0 implies <Tx,x><Tx,x>\in\mathbb{R}.

(2) This is clear from P2=P=PP^{2}=P=P^{*}, because we have:

<Px,x>\displaystyle<Px,x> =\displaystyle= <P2x,x>\displaystyle<P^{2}x,x>
=\displaystyle= <Px,Px>\displaystyle<Px,Px>
=\displaystyle= ||Px||2\displaystyle||Px||^{2}

(3) This follows from a similar computation, namely:

<SSx,x>=<Sx,Sx>=||Sx||2<S^{*}Sx,x>=<Sx,Sx>=||Sx||^{2}

(4) This is well-known, the idea being that the condition <Tx,x>0<Tx,x>\geq 0 corresponds to the usual positivity condition A0A\geq 0, at the level of the associated matrix. ∎

It is possible to talk as well about strictly positive operators, and we have here:

Theorem 2.29.

The strictly positive operators, which are the operators TB(H)T\in B(H) satisfying <Tx,x>>0<Tx,x>>0, for any x0x\neq 0, have the following properties:

  1. (1)

    They are self-adjoint, T=TT=T^{*}.

  2. (2)

    As examples, T=SST=S^{*}S is strictly positive, for any SB(H)S\in B(H) injective.

  3. (3)

    In finite dimensions, we recover the usual strictly positive operators.

Proof.

As before, all these assertions are elementary, the idea being as follows:

(1) This is something that we know, from Theorem 2.28.

(2) This follows from the injectivity of SS, because for any x0x\neq 0 we have:

<SSx,x>\displaystyle<S^{*}Sx,x> =\displaystyle= <Sx,Sx>\displaystyle<Sx,Sx>
=\displaystyle= ||Sx||2\displaystyle||Sx||^{2}
>\displaystyle> 0\displaystyle 0

(3) This is well-known, the idea being that the condition <Tx,x>>0<Tx,x>>0 corresponds to the usual strict positivity condition A>0A>0, at the level of the associated matrix. ∎

As a comment, while any strictly positive matrix A>0A>0 is well-known to be invertible, the analogue of this fact does not hold in infinite dimensions, a counterexample here being the following operator on l2()l^{2}(\mathbb{N}), which is strictly positive but not invertible:

T=(11213)T=\begin{pmatrix}1\\ &\frac{1}{2}\\ &&\frac{1}{3}\\ &&&\ddots\end{pmatrix}
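
This failure of invertibility can be seen numerically on the truncations of the above operator, whose inverses blow up in norm. A minimal sketch in Python with numpy, with the code being ours:

import numpy as np

# truncations of the strictly positive operator T = diag(1, 1/2, 1/3, ...)
for N in (10, 100, 1000):
    T = np.diag(1.0 / np.arange(1, N + 1))
    print(N, np.linalg.norm(np.linalg.inv(T), 2))   # ||T^{-1}|| = N, unbounded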

As a last remarkable class of operators, we have the normal ones. We have here:

Theorem 2.30.

For an operator TB(H)T\in B(H), the following conditions are equivalent, and if they are satisfied, we call TT normal:

  1. (1)

    TT=TTTT^{*}=T^{*}T.

  2. (2)

    ||Tx||=||Tx||||Tx||=||T^{*}x||.

In finite dimensions, we recover in this way the usual normality notion.

Proof.

There are several assertions here, the idea being as follows:

(1)(2)(1)\implies(2) This is clear, due to the following computation:

||Tx||2\displaystyle||Tx||^{2} =\displaystyle= <Tx,Tx>\displaystyle<Tx,Tx>
=\displaystyle= <TTx,x>\displaystyle<T^{*}Tx,x>
=\displaystyle= <TTx,x>\displaystyle<TT^{*}x,x>
=\displaystyle= <Tx,Tx>\displaystyle<T^{*}x,T^{*}x>
=\displaystyle= ||Tx||2\displaystyle||T^{*}x||^{2}

(2)(1)(2)\implies(1) This is clear as well, because the above computation shows that, when assuming ||Tx||=||Tx||||Tx||=||T^{*}x||, the following happens:

<TTx,x>=<TTx,x><TT^{*}x,x>=<T^{*}Tx,x>

Thus, in terms of the operator S=TTTTS=TT^{*}-T^{*}T, we have:

<Sx,x>=0<Sx,x>=0

In order to finish, we use a polarization trick. We have the following formula:

<S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x><S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x>

Since the first 3 terms vanish, the sum of the 2 last terms vanishes too. But, by using S=SS=S^{*}, coming from S=TTTTS=TT^{*}-T^{*}T, we can process this latter vanishing as follows:

<Sx,y>\displaystyle<Sx,y> =\displaystyle= <Sy,x>\displaystyle-<Sy,x>
=\displaystyle= <y,Sx>\displaystyle-<y,Sx>
=\displaystyle= <Sx,y>¯\displaystyle-\overline{<Sx,y>}

Thus we must have <Sx,y>i<Sx,y>\in i\mathbb{R}, and with yiyy\to iy we obtain <Sx,y><Sx,y>\in\mathbb{R} too, and so <Sx,y>=0<Sx,y>=0. Thus S=0S=0, which gives TT=TTTT^{*}=T^{*}T, as desired.

(3) Finally, in what regards finite dimensions, or more generally the case where our Hilbert space comes with a basis, H=l2(I)H=l^{2}(I), here the condition TT=TTTT^{*}=T^{*}T corresponds to the usual normality condition MM=MMMM^{*}=M^{*}M at the level of the associated matrices. ∎

Observe that the normal operators generalize both the self-adjoint operators, and the unitaries. We will be back to such operators, on many occasions, in what follows.
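
As a numerical summary of these classes, here is a quick verification that T=SST=S^{*}S is positive, hence self-adjoint, hence normal, as a minimal sketch in Python with numpy, with our conventions:

import numpy as np

rng = np.random.default_rng(5)
S = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
T = S.conj().T @ S                          # T = S*S

ip = lambda a, b: np.sum(a * np.conj(b))    # <a,b>, linear at left
print(np.isclose(ip(T @ x, x).imag, 0.0))   # <Tx,x> real: self-adjointness
print(ip(T @ x, x).real >= 0)               # <Tx,x> >= 0: positivity
print(np.allclose(T @ T.conj().T, T.conj().T @ T))   # TT* = T*T: normality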

2d. Diagonal operators

Let us work out now what happens in the case that we are mostly interested in, namely H=L2(X)H=L^{2}(X), with XX being a measured space. We first have:

Theorem 2.31.

Given a measured space XX, consider the Hilbert space H=L2(X)H=L^{2}(X). Associated to any function fL(X)f\in L^{\infty}(X) is then the multiplication operator

Tf:HH,Tf(g)=fgT_{f}:H\to H\quad,\quad T_{f}(g)=fg

which is well-defined, linear and bounded, having norm as follows:

||Tf||=||f||||T_{f}||=||f||_{\infty}

Moreover, the correspondence fTff\to T_{f} is linear, multiplicative and involutive.

Proof.

There are several assertions here, the idea being as follows:

(1) We must first prove that the formula in the statement, Tf(g)=fgT_{f}(g)=fg, defines indeed an operator HHH\to H, which amounts in saying that we have:

fL(X),gL2(X)fgL2(X)f\in L^{\infty}(X),\ g\in L^{2}(X)\implies fg\in L^{2}(X)

But this follows from the following explicit estimate:

||fg||2\displaystyle||fg||_{2} =\displaystyle= X|f(x)|2|g(x)|2dμ(x)\displaystyle\sqrt{\int_{X}|f(x)|^{2}|g(x)|^{2}d\mu(x)}
\displaystyle\leq supxX|f(x)|X|g(x)|2dμ(x)\displaystyle\sup_{x\in X}|f(x)|\sqrt{\int_{X}|g(x)|^{2}d\mu(x)}
=\displaystyle= ||f||||g||2\displaystyle||f||_{\infty}||g||_{2}
<\displaystyle< \displaystyle\infty

(2) Next in line, we must prove that TfT_{f} is linear and bounded. We have:

Tf(g+h)=Tf(g)+Tf(h),Tf(λg)=λTf(g)T_{f}(g+h)=T_{f}(g)+T_{f}(h)\quad,\quad T_{f}(\lambda g)=\lambda T_{f}(g)

As for the boundedness condition, this follows from the estimate from the proof of (1), which gives, in terms of the operator norm of B(H)B(H):

||Tf||||f||||T_{f}||\leq||f||_{\infty}

(3) Let us prove now that we have equality, ||Tf||=||f||||T_{f}||=||f||_{\infty}, in the above estimate. For this purpose, given n1n\geq 1, we can pick a subset YnXY_{n}\subset X of positive finite measure on which we have |f(x)|||f||1/n|f(x)|\geq||f||_{\infty}-1/n, and set gn=χYnL2(X)g_{n}=\chi_{Y_{n}}\in L^{2}(X). We obtain:

||fgn||2=Yn|f(x)|2dμ(x)(||f||1/n)||gn||2||fg_{n}||_{2}=\sqrt{\int_{Y_{n}}|f(x)|^{2}d\mu(x)}\geq(||f||_{\infty}-1/n)||g_{n}||_{2}

Thus, with nn\to\infty we obtain ||Tf||||f||||T_{f}||\geq||f||_{\infty}, which is reverse to the inequality obtained in the proof of (2), and this leads to the conclusion in the statement.

(4) Regarding now the fact that the correspondence fTff\to T_{f} is indeed linear and multiplicative, the corresponding formulae are as follows, both clear:

Tf+h(g)=Tf(g)+Th(g),Tλf(g)=λTf(g)T_{f+h}(g)=T_{f}(g)+T_{h}(g)\quad,\quad T_{\lambda f}(g)=\lambda T_{f}(g)

(5) Finally, let us prove that the correspondence fTff\to T_{f} is involutive, in the sense that it transforms the standard involution ff¯f\to\bar{f} of the algebra L(X)L^{\infty}(X) into the standard involution TTT\to T^{*} of the algebra B(H)B(H). We must prove that we have:

Tf=Tf¯T_{f}^{*}=T_{\bar{f}}

But this follows from the following computation:

<Tfg,h>\displaystyle<T_{f}g,h> =\displaystyle= <fg,h>\displaystyle<fg,h>
=\displaystyle= Xf(x)g(x)h¯(x)dμ(x)\displaystyle\int_{X}f(x)g(x)\bar{h}(x)d\mu(x)
=\displaystyle= Xg(x)f(x)h¯(x)dμ(x)\displaystyle\int_{X}g(x)f(x)\bar{h}(x)d\mu(x)
=\displaystyle= <g,f¯h>\displaystyle<g,\bar{f}h>
=\displaystyle= <g,Tf¯h>\displaystyle<g,T_{\bar{f}}h>

Indeed, since the adjoint is unique, we obtain from this Tf=Tf¯T_{f}^{*}=T_{\bar{f}}. Thus the correspondence fTff\to T_{f} is indeed involutive, as claimed. ∎

In what regards now the basic classes of operators, the above construction provides us with many new examples, which are very explicit, and complementary to the finite dimensional examples that we usually have in mind, as follows:

Theorem 2.32.

The multiplication operators Tf(g)=fgT_{f}(g)=fg on the Hilbert space H=L2(X)H=L^{2}(X) associated to the functions fL(X)f\in L^{\infty}(X) are as follows:

  1. (1)

    TfT_{f} is unitary when f:X𝕋f:X\to\mathbb{T}.

  2. (2)

    TfT_{f} is a symmetry when f:X{1,1}f:X\to\{-1,1\}.

  3. (3)

    TfT_{f} is a projection when f=χYf=\chi_{Y} with YXY\subset X.

  4. (4)

    There are no non-unitary isometries.

  5. (5)

    There are no non-unitary symmetries.

  6. (6)

    TfT_{f} is positive when f:X+f:X\to\mathbb{R}_{+}.

  7. (7)

    TfT_{f} is self-adjoint when f:Xf:X\to\mathbb{R}.

  8. (8)

    TfT_{f} is always normal, for any f:Xf:X\to\mathbb{C}.

Proof.

All these assertions are clear from definitions, and from the various properties of the correspondence fTff\to T_{f}, established above, as follows:

(1) The unitarity condition U=U1U^{*}=U^{-1} for the operator TfT_{f} reads f¯=f1\bar{f}=f^{-1}, which means that we must have f:X𝕋f:X\to\mathbb{T}, as claimed.

(2) The symmetry condition S2=1S^{2}=1 for the operator TfT_{f} reads f2=1f^{2}=1, which means that we must have f:X{1,1}f:X\to\{-1,1\}, as claimed.

(3) The projection condition P2=P=PP^{2}=P^{*}=P for the operator TfT_{f} reads f2=f=f¯f^{2}=f=\bar{f}, which means that we must have f:X{0,1}f:X\to\{0,1\}, or equivalently, f=χYf=\chi_{Y} with YXY\subset X.

(4) A non-unitary isometry must satisfy by definition UU=1,UU1U^{*}U=1,UU^{*}\neq 1, and for the operator TfT_{f} this means that we must have |f|2=1,|f|21|f|^{2}=1,|f|^{2}\neq 1, which is impossible.

(5) This follows from (1) and (2), because the solutions found in (2) for the symmetry problem are included in the solutions found in (1) for the unitarity problem.

(6) The fact that TfT_{f} is positive amounts in saying that we must have <fg,g>0<fg,g>\geq 0 for any gL2(X)g\in L^{2}(X), and this is equivalent to the fact that we must have f0f\geq 0, as desired.

(7) The self-adjointness condition T=TT=T^{*} for the operator TfT_{f} reads f=f¯f=\bar{f}, which means that we must have f:Xf:X\to\mathbb{R}, as claimed.

(8) The normality condition TT=TTTT^{*}=T^{*}T for the operator TfT_{f} reads ff¯=f¯ff\bar{f}=\bar{f}f, which is automatic for any function f:Xf:X\to\mathbb{C}, as claimed. ∎

The above result might look quite puzzling, at a first glance, messing up our intuition with various classes of operators, coming from usual linear algebra. However, a bit of further thinking tells us that there is no contradiction, and that Theorem 2.32 in fact is very similar to what we know about the diagonal matrices. To be more precise, the diagonal matrices are unitaries precisely when their entries are in 𝕋\mathbb{T}, there are no non-unitary isometries, all such matrices are normal, and so on. In order to understand all this, let us work out what happens with the correspondence fTff\to T_{f}, in finite dimensions. The situation here is in fact extremely simple, and illuminating, as follows:

Theorem 2.33.

Assuming X={1,,N}X=\{1,\ldots,N\} with the counting measure, the embedding

L(X)B(L2(X))L^{\infty}(X)\subset B(L^{2}(X))

constructed via multiplication operators, Tf(g)=fgT_{f}(g)=fg, corresponds to the embedding

NMN()\mathbb{C}^{N}\subset M_{N}(\mathbb{C})

given by the diagonal matrices, constructed as follows:

fdiag(f1,,fN)f\to diag(f_{1},\ldots,f_{N})

Thus, Theorem 2.32 generalizes what we know about the diagonal matrices.

Proof.

The idea is that all this is trivial, with not a single new computation needed, modulo some algebraic thinking, of quite soft type. Let us go back indeed to Theorem 2.31 above and its proof, with the abstract measured space XX appearing there being now the following finite space, with its counting measure:

X={1,,N}X=\{1,\ldots,N\}

Regarding the functions fL(X)f\in L^{\infty}(X), these are now functions as follows:

f:{1,,N}f:\{1,\ldots,N\}\to\mathbb{C}

We can identify such a function with the corresponding vector (f(i))iN(f(i))_{i}\in\mathbb{C}^{N}, and so we conclude that our input algebra L(X)L^{\infty}(X) is the algebra N\mathbb{C}^{N}:

L(X)=NL^{\infty}(X)=\mathbb{C}^{N}

Regarding now the Hilbert space H=L2(X)H=L^{2}(X), this is equal as well to N\mathbb{C}^{N}, and for the same reasons, namely that gL2(X)g\in L^{2}(X) can be identified with the vector (g(i))iN(g(i))_{i}\in\mathbb{C}^{N}:

L2(X)=NL^{2}(X)=\mathbb{C}^{N}

Observe that, due to our assumption that XX comes with its counting measure, the scalar product that we obtain on N\mathbb{C}^{N} is the usual one, without weights. Now, let us identify the operators on L2(X)=NL^{2}(X)=\mathbb{C}^{N} with the square matrices, in the usual way:

B(L2(X))=MN()B(L^{2}(X))=M_{N}(\mathbb{C})

This was our final identification, in order to get started. Now by getting back to Theorem 2.31, the embedding L(X)B(L2(X))L^{\infty}(X)\subset B(L^{2}(X)) constructed there reads:

NMN()\mathbb{C}^{N}\subset M_{N}(\mathbb{C})

But this can only be the embedding given by the diagonal matrices, so we are basically done. In order to finish, however, let us understand what the operator associated to an arbitrary vector fNf\in\mathbb{C}^{N} is. We can regard this vector as a function, f(i)=fif(i)=f_{i}, and so the action Tf(g)=fgT_{f}(g)=fg on the vectors of L2(X)=NL^{2}(X)=\mathbb{C}^{N} is by componentwise multiplication by the numbers f1,,fNf_{1},\ldots,f_{N}. But this is exactly the action of the diagonal matrix diag(f1,,fN)diag(f_{1},\ldots,f_{N}), and so we are led to the conclusion in the statement. ∎
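
Here is finally a numerical sketch of the above identification, in Python with numpy, with the sample function ff being our own assumption, checking both Tf(g)=fgT_{f}(g)=fg and the norm formula ||Tf||=||f||||T_{f}||=||f||_{\infty}:

import numpy as np

f = np.array([2.0, -1.0, 0.5, 3.0])   # a function f on X = {1,2,3,4}
g = np.array([1.0, 1.0, 2.0, 0.0])    # a vector g in L^2(X) = C^4
Tf = np.diag(f)                       # the multiplication operator T_f

print(np.allclose(Tf @ g, f * g))     # T_f(g) = fg, componentwise
print(np.isclose(np.linalg.norm(Tf, 2), np.max(np.abs(f))))   # ||T_f|| = ||f||_inf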

There are other things that can be said about the embedding L(X)B(L2(X))L^{\infty}(X)\subset B(L^{2}(X)), a key observation here, which is elementary to prove, being the fact that the image of L(X)L^{\infty}(X) is closed with respect to the weak topology, the one where TnTT_{n}\to T when TnxTxT_{n}x\to Tx for any xHx\in H. And with this meaning that L(X)L^{\infty}(X) is a so-called von Neumann algebra on L2(X)L^{2}(X). We will be back to this, on numerous occasions, in what follows.

2e. Exercises

As before with linear algebra, operator theory is a wide area of mathematics, and there are many interesting operators, and exercises about them. We first have:

Exercise 2.34.

Find an explicit orthonormal basis for the Hilbert space

H=L2[0,1]H=L^{2}[0,1]

by starting with the algebraic basis fn=xnf_{n}=x^{n} with nn\in\mathbb{N}, and applying Gram-Schmidt.

This is actually quite non-trivial, and in case you’re stuck with complicated computations, better look it up, preferably in the physics literature, physicists being well-known to adore such things, and then write a brief account of what you found.

Exercise 2.35.

Find all the 2×22\times 2 complex matrices

S=(abcd)S=\begin{pmatrix}a&b\\ c&d\end{pmatrix}

which are symmetries, S2=1S^{2}=1, and interpret them geometrically.

Here you can of course start with the real case first, SM2()S\in M_{2}(\mathbb{R}). Also, you can have a look at 3 dimensions too, real or complex, and beware of the computations here.

Exercise 2.36.

Prove that any positive operator T0T\geq 0 appears as

T=S2T=S^{2}

with SS self-adjoint, first in finite dimensions, then in general.

Here the discussion in finite dimensions involves positive eigenvalues and their square roots, which is something quite standard. In infinite dimensions things are a bit more complicated, because we do not yet have such eigenvalue technology, which will come in the next chapter; but you can of course try some other tricks.

Chapter 3 Spectral theorems

3a. Basic theory

We discuss in this chapter the diagonalization problem for the operators TB(H)T\in B(H), in analogy with the diagonalization problem for the usual matrices AMN()A\in M_{N}(\mathbb{C}). As a first observation, we can talk about eigenvalues and eigenvectors, as follows:

Definition 3.1.

Given an operator TB(H)T\in B(H), assuming that we have

Tx=λxTx=\lambda x

we say that xHx\in H is an eigenvector of TT, with eigenvalue λ\lambda\in\mathbb{C}.

We know many things about eigenvalues and eigenvectors, in the finite dimensional case. However, most of these will not extend to the infinite dimensional case, or at least not extend in a straightforward way, due to a number of reasons:

  1. (1)

Much of basic linear algebra is based on the fact that Tx=λxTx=\lambda x is equivalent to (Tλ)x=0(T-\lambda)x=0, so that λ\lambda is an eigenvalue when TλT-\lambda is not invertible. In the infinite dimensional setting TλT-\lambda might be injective and not surjective, or vice versa, or injective with dense image but with an unbounded inverse on that image, and so on.

  2. (2)

    Also, in linear algebra TλT-\lambda is not invertible when det(Tλ)=0\det(T-\lambda)=0, and with this leading to most of the advanced results about eigenvalues and eigenvectors. In infinite dimensions, however, it is impossible to construct a determinant function det:B(H)\det:B(H)\to\mathbb{C}, and this even for the diagonal operators on l2()l^{2}(\mathbb{N}).

Summarizing, we are in trouble with our extension program, and this right from the beginning. In order to have some theory started, however, let us forget about (2), which obviously leads nowhere, and focus on the difficulties in (1).


In order to cut short the discussion there, regarding the various properties of TλT-\lambda, we can just say that TλT-\lambda is either invertible with bounded inverse, the “good case”, or not. We are led in this way to the following definition:

Definition 3.2.

The spectrum of an operator TB(H)T\in B(H) is the set

σ(T)={λ|TλB(H)1}\sigma(T)=\left\{\lambda\in\mathbb{C}\Big{|}T-\lambda\not\in B(H)^{-1}\right\}

where B(H)1B(H)B(H)^{-1}\subset B(H) is the set of invertible operators.

As a basic example, in the finite dimensional case, H=NH=\mathbb{C}^{N}, the spectrum of a usual matrix AMN()A\in M_{N}(\mathbb{C}) is the collection of its eigenvalues, taken without multiplicities. We will see many other examples. In general, the spectrum has the following properties:

Proposition 3.3.

The spectrum of TB(H)T\in B(H) contains the eigenvalue set

ε(T)={λ|ker(Tλ){0}}\varepsilon(T)=\left\{\lambda\in\mathbb{C}\Big{|}\ker(T-\lambda)\neq\{0\}\right\}

and ε(T)σ(T)\varepsilon(T)\subset\sigma(T) is an equality in finite dimensions, but not in infinite dimensions.

Proof.

We have several assertions here, the idea being as follows:

(1) First of all, the eigenvalue set is indeed the one in the statement, because Tx=λxTx=\lambda x tells us precisely that TλT-\lambda must not be injective. The fact that we have ε(T)σ(T)\varepsilon(T)\subset\sigma(T) is clear as well, because if TλT-\lambda is not injective, it is not bijective.

(2) In finite dimensions we have ε(T)=σ(T)\varepsilon(T)=\sigma(T), because TλT-\lambda is injective if and only if it is bijective, with the boundedness of the inverse being automatic.

(3) In infinite dimensions we can assume H=l2()H=l^{2}(\mathbb{N}), and the shift operator S(ei)=ei+1S(e_{i})=e_{i+1} is injective but not surjective. Thus 0σ(S)ε(S)0\in\sigma(S)-\varepsilon(S). ∎

We will see more examples and counterexamples, and some general theory, in a moment. Philosophically speaking, the best way of thinking about all this is as follows:

– The numbers λσ(T)\lambda\notin\sigma(T) are good, because we can invert TλT-\lambda.

– The numbers λσ(T)ε(T)\lambda\in\sigma(T)-\varepsilon(T) are bad.

– The eigenvalues λε(T)\lambda\in\varepsilon(T) are evil.

Note that this is somewhat contrary to what happens in linear algebra, where the eigenvalues are highly valued, and cherished, and regarded as being the source of all good things on Earth. Welcome to operator theory, where some things are upside down.


Let us develop now some general theory for the spectrum, or perhaps for its complement, with the promise to come back to eigenvalues later. As a first result, we would like to prove that the spectra are non-empty. This is something tricky, and we will need:

Proposition 3.4.

The following happen:

  1. (1)

    ||T||<1(1T)1=1+T+T2+||T||<1\implies(1-T)^{-1}=1+T+T^{2}+\ldots

  2. (2)

    The set B(H)1B(H)^{-1} is open.

  3. (3)

    The map TT1T\to T^{-1} is differentiable.

Proof.

All these assertions are elementary, as follows:

(1) This follows as in the scalar case, the computation being as follows, provided that everything converges under the norm, which amounts to saying that ||T||<1||T||<1:

(1T)(1+T+T2+)\displaystyle(1-T)(1+T+T^{2}+\ldots) =\displaystyle= 1T+TT2+T2T3+\displaystyle 1-T+T-T^{2}+T^{2}-T^{3}+\ldots
=\displaystyle= 1\displaystyle 1

(2) Assuming TB(H)1T\in B(H)^{-1}, let us pick SB(H)S\in B(H) such that:

||TS||<1||T1||||T-S||<\frac{1}{||T^{-1}||}

We have then the following estimate:

||1T1S||\displaystyle||1-T^{-1}S|| =\displaystyle= ||T1(TS)||\displaystyle||T^{-1}(T-S)||
\displaystyle\leq ||T1||||TS||\displaystyle||T^{-1}||\cdot||T-S||
<\displaystyle< 1\displaystyle 1

Thus we have T1SB(H)1T^{-1}S\in B(H)^{-1}, and so SB(H)1S\in B(H)^{-1}, as desired.

(3) In the scalar case, the derivative of f(t)=t1f(t)=t^{-1} is f(t)=t2f^{\prime}(t)=-t^{-2}. In the present normed space setting the derivative is no longer a number, but rather a linear transformation, which can be found by developing f(T)=T1f(T)=T^{-1} at order 1, as follows:

(T+S)1\displaystyle(T+S)^{-1} =\displaystyle= ((1+ST1)T)1\displaystyle((1+ST^{-1})T)^{-1}
=\displaystyle= T1(1+ST1)1\displaystyle T^{-1}(1+ST^{-1})^{-1}
=\displaystyle= T1(1ST1+(ST1)2)\displaystyle T^{-1}(1-ST^{-1}+(ST^{-1})^{2}-\ldots)
\displaystyle\simeq T1(1ST1)\displaystyle T^{-1}(1-ST^{-1})
=\displaystyle= T1T1ST1\displaystyle T^{-1}-T^{-1}ST^{-1}

Thus f(T)=T1f(T)=T^{-1} is indeed differentiable, with derivative f(T)S=T1ST1f^{\prime}(T)S=-T^{-1}ST^{-1}. ∎
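As a quick numerical verification of (1) and (3), here is a numpy sketch, with the matrix rescaled so as to have ||T||<1||T||<1, and with all code conventions being ad-hoc, just for illustration:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))
T *= 0.5 / np.linalg.norm(T, 2)            # rescale, so that ||T|| = 1/2

# (1) Neumann series: (1 - T)^{-1} = 1 + T + T^2 + ...
series = sum(np.linalg.matrix_power(T, k) for k in range(60))
print(np.allclose(series, np.linalg.inv(np.eye(4) - T)))      # True

# (3) derivative of the inverse map: (A + S)^{-1} - A^{-1} ~ -A^{-1} S A^{-1}
A = np.eye(4) - T                           # some invertible operator
S = 1e-6 * rng.standard_normal((4, 4))     # a small perturbation
lhs = np.linalg.inv(A + S) - np.linalg.inv(A)
rhs = -np.linalg.inv(A) @ S @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))                # True, up to O(||S||^2)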

We can now formulate our first theorem about spectra, as follows:

Theorem 3.5.

The spectrum of a bounded operator TB(H)T\in B(H) is:

  1. (1)

    Compact.

  2. (2)

    Contained in the disc D0(||T||)D_{0}(||T||).

  3. (3)

    Non-empty.

Proof.

This can be proved by using Proposition 3.4, along with a bit of complex and functional analysis, for which we refer to Rudin [rud] and Lax [lax], as follows:

(1) In view of (2) below, it is enough to prove that σ(T)\sigma(T) is closed. But this follows from the following computation, with |ε||\varepsilon| being small:

λσ(T)\displaystyle\lambda\notin\sigma(T) \displaystyle\implies TλB(H)1\displaystyle T-\lambda\in B(H)^{-1}
\displaystyle\implies TλεB(H)1\displaystyle T-\lambda-\varepsilon\in B(H)^{-1}
\displaystyle\implies λ+εσ(T)\displaystyle\lambda+\varepsilon\notin\sigma(T)

(2) This follows from the following computation:

|λ|>||T||\displaystyle|\lambda|>||T|| \displaystyle\implies ||Tλ||<1\displaystyle\Big{|}\Big{|}\frac{T}{\lambda}\Big{|}\Big{|}<1
\displaystyle\implies 1TλB(H)1\displaystyle 1-\frac{T}{\lambda}\in B(H)^{-1}
\displaystyle\implies λTB(H)1\displaystyle\lambda-T\in B(H)^{-1}
\displaystyle\implies λσ(T)\displaystyle\lambda\notin\sigma(T)

(3) Assume by contradiction σ(T)=\sigma(T)=\emptyset. Given a linear form fB(H)f\in B(H)^{*}, consider the following map, which is well-defined, due to our assumption σ(T)=\sigma(T)=\emptyset:

φ:,λf((Tλ)1)\varphi:\mathbb{C}\to\mathbb{C}\quad,\quad\lambda\to f((T-\lambda)^{-1})

By using the fact that TT1T\to T^{-1} is differentiable, that we know from Proposition 3.4, we conclude that this map is differentiable, and so holomorphic. Also, we have:

λ\displaystyle\lambda\to\infty \displaystyle\implies Tλ\displaystyle T-\lambda\to\infty
\displaystyle\implies (Tλ)10\displaystyle(T-\lambda)^{-1}\to 0
\displaystyle\implies f((Tλ)1)0\displaystyle f((T-\lambda)^{-1})\to 0

Thus by the Liouville theorem we obtain φ=0\varphi=0. But this being valid for any linear form fB(H)f\in B(H)^{*}, by the Hahn-Banach theorem we obtain (Tλ)1=0(T-\lambda)^{-1}=0, which is a contradiction, as desired. ∎

Here is now a second basic result regarding the spectra, inspired from what happens in finite dimensions, for the usual complex matrices, and which shows that things do not necessarily extend without troubles to the infinite dimensional setting:

Theorem 3.6.

We have the following formula, valid for any operators S,TS,T:

σ(ST){0}=σ(TS){0}\sigma(ST)\cup\{0\}=\sigma(TS)\cup\{0\}

In finite dimensions we have σ(ST)=σ(TS)\sigma(ST)=\sigma(TS), but this fails in infinite dimensions.

Proof.

There are several assertions here, the idea being as follows:

(1) This is something that we know in finite dimensions, coming from the fact that the characteristic polynomials of the associated matrices A,BA,B coincide:

PAB=PBAP_{AB}=P_{BA}

Thus we obtain σ(ST)=σ(TS)\sigma(ST)=\sigma(TS) in this case, as claimed. Observe that this improves the general formula in the statement in two ways, first because we have no issues at 0, and second because what we obtain is actually an equality of sets with multiplicities.

(2) In general now, let us first prove the main assertion, stating that σ(ST),σ(TS)\sigma(ST),\sigma(TS) coincide outside 0. We first prove that we have the following implication:

1σ(ST)1σ(TS)1\notin\sigma(ST)\implies 1\notin\sigma(TS)

Assume indeed that 1ST1-ST is invertible, with inverse denoted RR:

R=(1ST)1R=(1-ST)^{-1}

We have then the following formulae, relating our variables R,S,TR,S,T:

RST=STR=R1RST=STR=R-1

By using RST=R1RST=R-1, we have the following computation:

(1+TRS)(1TS)\displaystyle(1+TRS)(1-TS) =\displaystyle= 1+TRSTSTRSTS\displaystyle 1+TRS-TS-TRSTS
=\displaystyle= 1+TRSTSTRS+TS\displaystyle 1+TRS-TS-TRS+TS
=\displaystyle= 1\displaystyle 1

A similar computation, using STR=R1STR=R-1, shows that we have:

(1TS)(1+TRS)=1(1-TS)(1+TRS)=1

Thus 1TS1-TS is invertible, with inverse 1+TRS1+TRS, which proves our claim. Now by multiplying by scalars, we deduce from this that for any λ{0}\lambda\in\mathbb{C}-\{0\} we have:

λσ(ST)λσ(TS)\lambda\notin\sigma(ST)\implies\lambda\notin\sigma(TS)

But this leads to the conclusion in the statement.

(3) Regarding now the counterexample to the formula σ(ST)=σ(TS)\sigma(ST)=\sigma(TS), in general, let us take SS to be the shift on H=l2()H=l^{2}(\mathbb{N}), given by the following formula:

S(ei)=ei+1S(e_{i})=e_{i+1}

As for TT, we can take it to be the adjoint of SS, which is the following operator:

S(ei)={ei1ifi>00ifi=0S^{*}(e_{i})=\begin{cases}e_{i-1}&{\rm if}\ i>0\\ 0&{\rm if}\ i=0\end{cases}

Let us compose now these two operators. In one sense, we have:

SS=10σ(SS)S^{*}S=1\implies 0\notin\sigma(S^{*}S)

In the other sense, however, the situation is different, as follows:

SS=Proj(e0)0σ(SS)SS^{*}=Proj(e_{0}^{\perp})\implies 0\in\sigma(SS^{*})

Thus, the spectra do not match on 0, and we have our counterexample, as desired. ∎
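As an illustration of the main formula above, here is a numpy verification of it, using rectangular matrices S,TS,T, whose products ST,TSST,TS have different sizes, and so can only have matching spectra outside 0; the code conventions here are ad-hoc, for illustration only:

import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((5, 3))
T = rng.standard_normal((3, 5))

# ST is 5x5 with rank <= 3, TS is 3x3: same nonzero eigenvalues
print(np.sort_complex(np.linalg.eigvals(S @ T)))   # the 3 values from TS, plus zeros
print(np.sort_complex(np.linalg.eigvals(T @ S)))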

3b. Spectral radius

Let us develop now some systematic theory for the computation of the spectra, based on what we know about the eigenvalues of the usual complex matrices. As a first result, which is well-known for the usual matrices, and extends well, we have:

Theorem 3.7.

We have the “polynomial functional calculus” formula

σ(P(T))=P(σ(T))\sigma(P(T))=P(\sigma(T))

valid for any polynomial P[X]P\in\mathbb{C}[X], and any operator TB(H)T\in B(H).

Proof.

We pick a scalar λ\lambda\in\mathbb{C}, and we decompose the polynomial PλP-\lambda:

P(X)λ=c(Xr1)(Xrn)P(X)-\lambda=c(X-r_{1})\ldots(X-r_{n})

We have then the following equivalences:

λσ(P(T))\displaystyle\lambda\notin\sigma(P(T)) \displaystyle\iff P(T)λB(H)1\displaystyle P(T)-\lambda\in B(H)^{-1}
\displaystyle\iff c(Tr1)(Trn)B(H)1\displaystyle c(T-r_{1})\ldots(T-r_{n})\in B(H)^{-1}
\displaystyle\iff Tr1,,TrnB(H)1\displaystyle T-r_{1},\ldots,T-r_{n}\in B(H)^{-1}
\displaystyle\iff r1,,rnσ(T)\displaystyle r_{1},\ldots,r_{n}\notin\sigma(T)
\displaystyle\iff λP(σ(T))\displaystyle\lambda\notin P(\sigma(T))

Thus, we are led to the formula in the statement. ∎
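Here is a quick numpy check of this polynomial functional calculus formula, for a random matrix and a sample polynomial, with the spectra sorted for comparison; as usual, this is just an illustrative sketch, with ad-hoc conventions:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
P = lambda x: 2 * x**3 - x + 4                  # P(X) = 2X^3 - X + 4

PT = 2 * np.linalg.matrix_power(T, 3) - T + 4 * np.eye(5)
lhs = np.sort_complex(np.linalg.eigvals(PT))    # sigma(P(T))
rhs = np.sort_complex(P(np.linalg.eigvals(T)))  # P(sigma(T))
print(np.allclose(lhs, rhs))                    # True, up to rounding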

The above result is something very useful, and generalizing it will be our next task. As a first ingredient here, assuming that AMN()A\in M_{N}(\mathbb{C}) is invertible, we have:

σ(A1)=σ(A)1\sigma(A^{-1})=\sigma(A)^{-1}

It is possible to extend this formula to the arbitrary operators, and we will do this in a moment. Before starting, however, we have to think in advance on how to unify this potential result, that we have in mind, with Theorem 3.7 itself.


What we have to do here is to find a class of functions generalizing both the polynomials P[X]P\in\mathbb{C}[X] and the inverse function xx1x\to x^{-1}, and the answer to this question is provided by the rational functions, which are as follows:

Definition 3.8.

A rational function f(X)f\in\mathbb{C}(X) is a quotient of polynomials:

f=PQf=\frac{P}{Q}

Assuming that P,QP,Q are prime to each other, we can regard ff as a usual function,

f:Xf:\mathbb{C}-X\to\mathbb{C}

with XX being the set of zeros of QQ, also called poles of ff.

Here the term “poles” comes from the fact that, if you want to imagine the graph of such a rational function ff, in two complex dimensions, what you get is some sort of tent, supported by poles of infinite height, situated at the zeros of QQ. For more on all this, and on complex analysis in general, we refer as usual to Rudin [rud]. Although a look at an abstract algebra book can be interesting as well.


Now that we have our class of functions, the next step consists in applying them to operators. Here we cannot expect f(T)f(T) to make sense for any ff and any TT, for instance because T1T^{-1} is defined only when TT is invertible. We are led in this way to:

Definition 3.9.

Given an operator TB(H)T\in B(H), and a rational function f=P/Qf=P/Q having poles outside σ(T)\sigma(T), we can construct the following operator,

f(T)=P(T)Q(T)1f(T)=P(T)Q(T)^{-1}

that we can denote as a usual fraction, as follows,

f(T)=P(T)Q(T)f(T)=\frac{P(T)}{Q(T)}

due to the fact that P(T),Q(T)P(T),Q(T) commute, so that the order is irrelevant.

To be more precise, f(T)f(T) is indeed well-defined, and the fraction notation is justified too. In more formal terms, we can say that we have a morphism of complex algebras as follows, with (X)T\mathbb{C}(X)^{T} standing for the rational functions having poles outside σ(T)\sigma(T):

(X)TB(H),ff(T)\mathbb{C}(X)^{T}\to B(H)\quad,\quad f\to f(T)

Summarizing, we have now a good class of functions, generalizing both the polynomials and the inverse map xx1x\to x^{-1}. We can now extend Theorem 3.7, as follows:

Theorem 3.10.

We have the “rational functional calculus” formula

σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T))

valid for any rational function f(X)f\in\mathbb{C}(X) having poles outside σ(T)\sigma(T).

Proof.

We pick a scalar λ\lambda\in\mathbb{C}, we write f=P/Qf=P/Q, and we set:

F=PλQF=P-\lambda Q

By using now Theorem 3.7, for this polynomial, we obtain:

λσ(f(T))\displaystyle\lambda\in\sigma(f(T)) \displaystyle\iff F(T)B(H)1\displaystyle F(T)\notin B(H)^{-1}
\displaystyle\iff 0σ(F(T))\displaystyle 0\in\sigma(F(T))
\displaystyle\iff 0F(σ(T))\displaystyle 0\in F(\sigma(T))
\displaystyle\iff μσ(T),F(μ)=0\displaystyle\exists\mu\in\sigma(T),F(\mu)=0
\displaystyle\iff λf(σ(T))\displaystyle\lambda\in f(\sigma(T))

Thus, we are led to the formula in the statement. ∎

As an application of the above methods, we can investigate certain special classes of operators, such as the self-adjoint ones, and the unitary ones. Let us start with:

Proposition 3.11.

The following happen:

  1. (1)

    We have σ(T)=σ(T)¯\sigma(T^{*})=\overline{\sigma(T)}, for any TB(H)T\in B(H).

  2. (2)

    If T=TT=T^{*} then X=σ(T)X=\sigma(T) satisfies X=X¯X=\overline{X}.

  3. (3)

    If U=U1U^{*}=U^{-1} then X=σ(U)X=\sigma(U) satisfies X1=X¯X^{-1}=\overline{X}.

Proof.

We have several assertions here, the idea being as follows:

(1) The spectrum of the adjoint operator TT^{*} can be computed as follows:

σ(T)\displaystyle\sigma(T^{*}) =\displaystyle= {λ|TλB(H)1}\displaystyle\left\{\lambda\in\mathbb{C}\Big{|}T^{*}-\lambda\notin B(H)^{-1}\right\}
=\displaystyle= {λ|Tλ¯B(H)1}\displaystyle\left\{\lambda\in\mathbb{C}\Big{|}T-\bar{\lambda}\notin B(H)^{-1}\right\}
=\displaystyle= σ(T)¯\displaystyle\overline{\sigma(T)}

(2) This is clear indeed from (1).

(3) For a unitary operator, U=U1U^{*}=U^{-1}, Theorem 3.10 and (1) give:

σ(U)1=σ(U1)=σ(U)=σ(U)¯\sigma(U)^{-1}=\sigma(U^{-1})=\sigma(U^{*})=\overline{\sigma(U)}

Thus, we are led to the conclusion in the statement. ∎

In analogy with what happens for the usual matrices, we would like to improve now (2,3) above, with results stating that the spectrum X=σ(T)X=\sigma(T) satisfies XX\subset\mathbb{R} for self-adjoints, and X𝕋X\subset\mathbb{T} for unitaries. This will be tricky. Let us start with:

Theorem 3.12.

The spectrum of a unitary operator

U=U1U^{*}=U^{-1}

is on the unit circle, σ(U)𝕋\sigma(U)\subset\mathbb{T}.

Proof.

Assuming U=U1U^{*}=U^{-1}, we have the following norm computation:

||U||=||UU||=1=1||U||=\sqrt{||UU^{*}||}=\sqrt{1}=1

Now if we denote by DD the unit disk, we obtain from this:

σ(U)D\sigma(U)\subset D

On the other hand, once again by using U=U1U^{*}=U^{-1}, we have as well:

||U1||=||U||=||U||=1||U^{-1}||=||U^{*}||=||U||=1

Thus, as before with DD being the unit disk in the complex plane, we have:

σ(U1)D\sigma(U^{-1})\subset D

Now by using Theorem 3.10, we obtain σ(U)DD1=𝕋\sigma(U)\subset D\cap D^{-1}=\mathbb{T}, as desired. ∎

We have as well a similar result for self-adjoints, as follows:

Theorem 3.13.

The spectrum of a self-adjoint operator

T=TT=T^{*}

consists of real numbers, σ(T)\sigma(T)\subset\mathbb{R}.

Proof.

The idea is that we can deduce the result from Theorem 3.12, by using the following remarkable rational function, depending on a parameter rr\in\mathbb{R}:

f(z)=z+irzirf(z)=\frac{z+ir}{z-ir}

Indeed, for r>>0r>>0 the operator f(T)f(T) is well-defined, and we have:

(T+irTir)=TirT+ir=(T+irTir)1\left(\frac{T+ir}{T-ir}\right)^{*}=\frac{T-ir}{T+ir}=\left(\frac{T+ir}{T-ir}\right)^{-1}

Thus f(T)f(T) is unitary, and by using Theorem 3.12 we obtain:

σ(T)\displaystyle\sigma(T) \displaystyle\subset f1(f(σ(T)))\displaystyle f^{-1}(f(\sigma(T)))
=\displaystyle= f1(σ(f(T)))\displaystyle f^{-1}(\sigma(f(T)))
\displaystyle\subset f1(𝕋)\displaystyle f^{-1}(\mathbb{T})
=\displaystyle= \displaystyle\mathbb{R}

Thus, we are led to the conclusion in the statement. ∎

As a theoretical remark, it is possible to deduce as well Theorem 3.12 from Theorem 3.13, by performing the above computation in the other sense. Indeed, by assuming that Theorem 3.13 holds indeed, and starting with a unitary UB(H)U\in B(H), we obtain:

σ(U)\displaystyle\sigma(U) \displaystyle\subset f(f1(σ(U)))\displaystyle f(f^{-1}(\sigma(U)))
=\displaystyle= f(σ(f1(U)))\displaystyle f(\sigma(f^{-1}(U)))
\displaystyle\subset f()\displaystyle f(\mathbb{R})
=\displaystyle= 𝕋\displaystyle\mathbb{T}

As a conclusion now, we have so far a beginning of spectral theory, with results allowing us to investigate the unitaries and the self-adjoints, and with the remark that these two classes of operators are related by a certain wizarding rational function, namely:

f(z)=z+irzirf(z)=\frac{z+ir}{z-ir}
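As a numerical verification of all this, here is a numpy sketch, constructing a self-adjoint matrix, whose eigenvalues are then real, and applying to it the above function, with the output being indeed unitary, with spectrum on the unit circle; the code conventions are ad-hoc, for illustration only:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A + A.conj().T                                    # T = T*
print(np.allclose(np.linalg.eigvals(T).imag, 0))      # True: sigma(T) in R

r = 10.0                                               # any r >> 0 does the job
U = (T + 1j * r * np.eye(4)) @ np.linalg.inv(T - 1j * r * np.eye(4))
print(np.allclose(U @ U.conj().T, np.eye(4)))         # True: f(T) is unitary
print(np.allclose(abs(np.linalg.eigvals(U)), 1))      # True: sigma(f(T)) in T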

Let us keep building on this, with more complex analysis involved. One key thing that we know about matrices, and which follows for instance by using the fact that the diagonalizable matrices are dense, is the following formula:

σ(eA)=eσ(A)\sigma(e^{A})=e^{\sigma(A)}

We would like to have such formulae for the general operators TB(H)T\in B(H), but this is something quite technical. Consider the rational calculus morphism from Definition 3.9, which is as follows, with the exponent standing for “having poles outside σ(T)\sigma(T)”:

(X)TB(H),ff(T)\mathbb{C}(X)^{T}\to B(H)\quad,\quad f\to f(T)

As mentioned before, the rational functions are holomorphic outside their poles, and this raises the question of extending this morphism, as follows:

Hol(σ(T))B(H),ff(T)Hol(\sigma(T))\to B(H)\quad,\quad f\to f(T)

Normally this can be done in several steps. Let us start with:

Proposition 3.14.

We can exponentiate any operator TB(H)T\in B(H), by setting:

eT=k=0Tkk!e^{T}=\sum_{k=0}^{\infty}\frac{T^{k}}{k!}

Similarly, we can define f(T)f(T), for any holomorphic function f:f:\mathbb{C}\to\mathbb{C}.

Proof.

We must prove that the series defining eTe^{T} converges, and this follows from:

||eT||k=0||T||kk!=e||T||||e^{T}||\leq\sum_{k=0}^{\infty}\frac{||T||^{k}}{k!}=e^{||T||}

The case of the arbitrary holomorphic functions f:f:\mathbb{C}\to\mathbb{C} is similar. ∎
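Here is a quick numerical check of the above series formula, in numpy, against the matrix exponential implemented in scipy; the truncation at 40 terms is of course an ad-hoc choice, good enough for comparison purposes:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))

E = np.zeros((4, 4))
term = np.eye(4)                     # current term T^k / k!
for k in range(1, 40):
    E = E + term
    term = term @ T / k
print(np.allclose(E, expm(T)))       # True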

In general, the holomorphic functions are not entire, and the above method won’t cover the rational functions f(X)Tf\in\mathbb{C}(X)^{T} that we want to generalize. Thus, we must use something else. And the answer here comes from the Cauchy formula:

f(t)=12πiγf(z)ztdzf(t)=\frac{1}{2\pi i}\int_{\gamma}\frac{f(z)}{z-t}\,dz

Indeed, given a rational function f(X)Tf\in\mathbb{C}(X)^{T}, the operator f(T)B(H)f(T)\in B(H), constructed in Definition 3.9, can be recaptured in an analytic way, as follows:

f(T)=12πiγf(z)zTdzf(T)=\frac{1}{2\pi i}\int_{\gamma}\frac{f(z)}{z-T}\,dz

Now given an arbitrary function fHol(σ(T))f\in Hol(\sigma(T)), we can define f(T)B(H)f(T)\in B(H) by exactly the same formula, and we obtain in this way the desired correspondence:

Hol(σ(T))B(H),ff(T)Hol(\sigma(T))\to B(H)\quad,\quad f\to f(T)

This was for the plan. In practice now, all this needs a bit of care, with many verifications needed, and with the technical remark that a winding number must be added to the above Cauchy formulae, for things to be correct. The result is as follows:

Theorem 3.15.

We have the “holomorphic functional calculus” formula

σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T))

valid for any holomorphic function fHol(σ(T))f\in Hol(\sigma(T)).

Proof.

This is something that we will not really need, for the purposes of the present book, which is more algebraic than analytic, but here is the general idea:

(1) As explained above, given a rational function f(X)Tf\in\mathbb{C}(X)^{T}, the corresponding operator f(T)B(H)f(T)\in B(H) can be recaptured in an analytic way, as follows:

f(T)=12πiγf(z)zTdzf(T)=\frac{1}{2\pi i}\int_{\gamma}\frac{f(z)}{z-T}\,dz

(2) Now given an arbitrary function fHol(σ(T))f\in Hol(\sigma(T)), we can define f(T)B(H)f(T)\in B(H) by exactly the same formula, and we obtain in this way the desired correspondence:

Hol(σ(T))B(H),ff(T)Hol(\sigma(T))\to B(H)\quad,\quad f\to f(T)

(3) In practice now, all this needs a bit of care, notably with the verification of the fact that the operator f(T)B(H)f(T)\in B(H) does not depend on γ\gamma, and with the technical remark that a winding number must be added to the above Cauchy formulae, for things to be correct. But this can be done via a standard study, keeping in mind the fact that in the case H=H=\mathbb{C}, where our operators are usual numbers, B(H)=B(H)=\mathbb{C}, what we want to prove is simply that the usual Cauchy formula holds.

(4) Now with this correspondence ff(T)f\to f(T) constructed, and so with the formula in the statement, namely σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)), making now sense, it remains to prove that this formula holds indeed. But this follows as well via a careful use of the Cauchy formula, or by using approximation by polynomials, or rational functions. ∎

As already said, the above result is important for advanced operator theory and applications, and we will not get further into this subject. We will be back, however, to all this in the special case of the normal operators, which is of particular interest for us.
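Although we will not need this in what follows, here is a numerical illustration of the above Cauchy formula for f(T)f(T), with f=expf=\exp, the contour integral being discretized in the obvious way; everything here, radius, number of quadrature points and so on, is an ad-hoc sketch:

import numpy as np
from scipy.linalg import expm, solve

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 4))
rho = 1.5 * max(abs(np.linalg.eigvals(T)))      # contour radius, enclosing sigma(T)

M = 500                                          # quadrature points on |z| = rho
z = rho * np.exp(2j * np.pi * np.arange(M) / M)
dz = 2j * np.pi * z / M                          # dz = iz dtheta
F = sum(np.exp(zk) * solve(zk * np.eye(4) - T, np.eye(4)) * dzk
        for zk, dzk in zip(z, dz)) / (2j * np.pi)

print(np.allclose(F, expm(T)))                   # True: f(T), with f = exp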


In order to formulate now our next result, we will need the following notion:

Definition 3.16.

Given an operator TB(H)T\in B(H), its spectral radius

ρ(T)[0,||T||]\rho(T)\in\big{[}0,||T||\big{]}

is the radius of the smallest disk centered at 0 containing σ(T)\sigma(T).

Here we have implicitly used two basic results from Theorem 3.5, namely the fact that the spectrum is non-empty, which gives ρ(T)0\rho(T)\geq 0, with the usual convention sup=\sup\emptyset=-\infty, and the fact that the spectrum is contained in the disk D0(||T||)D_{0}(||T||), which gives ρ(T)||T||\rho(T)\leq||T||. Now with this notion in hand, we have the following key result, improving our main result so far, namely σ(T)\sigma(T)\neq\emptyset, from Theorem 3.5:

Theorem 3.17.

The spectral radius of an operator TB(H)T\in B(H) is given by

ρ(T)=limn||Tn||1/n\rho(T)=\lim_{n\to\infty}||T^{n}||^{1/n}

and in this formula, we can replace the limit by an inf.

Proof.

We have several things to be proved, the idea being as follows:

(1) Our first claim is that the numbers un=||Tn||1/nu_{n}=||T^{n}||^{1/n} satisfy:

(n+m)un+mnun+mum(n+m)u_{n+m}\leq nu_{n}+mu_{m}

Indeed, we have the following estimate, using the Young inequality abap/p+bq/qab\leq a^{p}/p+b^{q}/q, with exponents p=(n+m)/np=(n+m)/n and q=(n+m)/mq=(n+m)/m:

un+m\displaystyle u_{n+m} =\displaystyle= ||Tn+m||1/(n+m)\displaystyle||T^{n+m}||^{1/(n+m)}
\displaystyle\leq ||Tn||1/(n+m)||Tm||1/(n+m)\displaystyle||T^{n}||^{1/(n+m)}||T^{m}||^{1/(n+m)}
\displaystyle\leq ||Tn||1/nnn+m+||Tm||1/mmn+m\displaystyle||T^{n}||^{1/n}\cdot\frac{n}{n+m}+||T^{m}||^{1/m}\cdot\frac{m}{n+m}
=\displaystyle= nun+mumn+m\displaystyle\frac{nu_{n}+mu_{m}}{n+m}

(2) Our second claim is that the second assertion holds, namely:

limn||Tn||1/n=infn||Tn||1/n\lim_{n\to\infty}||T^{n}||^{1/n}=\inf_{n}||T^{n}||^{1/n}

For this purpose, we just need the inequality found in (1). Indeed, fix m1m\geq 1, let n1n\geq 1, and write n=lm+rn=lm+r with 0rm10\leq r\leq m-1. By using twice uabubu_{ab}\leq u_{b}, we get:

un\displaystyle u_{n} \displaystyle\leq 1n(lmulm+rur)\displaystyle\frac{1}{n}(lmu_{lm}+ru_{r})
\displaystyle\leq 1n(lmum+ru1)\displaystyle\frac{1}{n}(lmu_{m}+ru_{1})
\displaystyle\leq um+rnu1\displaystyle u_{m}+\frac{r}{n}\,u_{1}

It follows that we have limsupnunum\lim\sup_{n}u_{n}\leq u_{m}, which proves our claim.

(3) Summarizing, we are left with proving the main formula, which is as follows, and with the remark that we already know that the sequence on the right converges:

ρ(T)=limn||Tn||1/n\rho(T)=\lim_{n\to\infty}||T^{n}||^{1/n}

In one sense, we can use the polynomial calculus formula σ(Tn)=σ(T)n\sigma(T^{n})=\sigma(T)^{n}. Indeed, this gives the following estimate, valid for any nn, as desired:

ρ(T)\displaystyle\rho(T) =\displaystyle= supλσ(T)|λ|\displaystyle\sup_{\lambda\in\sigma(T)}|\lambda|
=\displaystyle= supρσ(T)n|ρ|1/n\displaystyle\sup_{\rho\in\sigma(T)^{n}}|\rho|^{1/n}
=\displaystyle= supρσ(Tn)|ρ|1/n\displaystyle\sup_{\rho\in\sigma(T^{n})}|\rho|^{1/n}
=\displaystyle= ρ(Tn)1/n\displaystyle\rho(T^{n})^{1/n}
\displaystyle\leq ||Tn||1/n\displaystyle||T^{n}||^{1/n}

(4) For the reverse inequality, we fix a number ρ>ρ(T)\rho>\rho(T), and we want to prove that we have ρlimn||Tn||1/n\rho\geq\lim_{n\to\infty}||T^{n}||^{1/n}. By using the Cauchy formula, we have:

12πi|z|=ρznzTdz\displaystyle\frac{1}{2\pi i}\int_{|z|=\rho}\frac{z^{n}}{z-T}\,dz =\displaystyle= 12πi|z|=ρk=0znk1Tkdz\displaystyle\frac{1}{2\pi i}\int_{|z|=\rho}\sum_{k=0}^{\infty}z^{n-k-1}T^{k}\,dz
=\displaystyle= k=012πi(|z|=ρznk1dz)Tk\displaystyle\sum_{k=0}^{\infty}\frac{1}{2\pi i}\left(\int_{|z|=\rho}z^{n-k-1}dz\right)T^{k}
=\displaystyle= k=0δk,nTk\displaystyle\sum_{k=0}^{\infty}\delta_{k,n}T^{k}
=\displaystyle= Tn\displaystyle T^{n}

By applying the norm we obtain from this formula:

||Tn||\displaystyle||T^{n}|| \displaystyle\leq 12π|z|=ρ||znzT||dz\displaystyle\frac{1}{2\pi}\int_{|z|=\rho}\left|\left|\frac{z^{n}}{z-T}\right|\right|\,dz
\displaystyle\leq ρn+1sup|z|=ρ||1zT||\displaystyle\rho^{n+1}\cdot\sup_{|z|=\rho}\left|\left|\frac{1}{z-T}\right|\right|

Since the sup does not depend on nn, by taking nn-th roots, we obtain in the limit:

ρlimn||Tn||1/n\rho\geq\lim_{n\to\infty}||T^{n}||^{1/n}

Now recall that ρ\rho was by definition an arbitrary number satisfying ρ>ρ(T)\rho>\rho(T). Thus, we have obtained the following estimate, valid for any TB(H)T\in B(H):

ρ(T)limn||Tn||1/n\rho(T)\geq\lim_{n\to\infty}||T^{n}||^{1/n}

Thus, we are led to the conclusion in the statement. ∎
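Here is this formula at work, numerically, in numpy; we normalize the matrix so as to have ρ(T)=1\rho(T)=1, in order to avoid overflow when taking high powers, and the quantities ||Tn||1/n||T^{n}||^{1/n} are then seen to approach 1:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((5, 5))
T /= max(abs(np.linalg.eigvals(T)))             # normalize: rho(T) = 1

for n in [1, 10, 100, 1000]:
    un = np.linalg.norm(np.linalg.matrix_power(T, n), 2) ** (1 / n)
    print(n, un)                                 # tends to rho(T) = 1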

In the case of the normal elements, we have the following finer result:

Theorem 3.18.

The spectral radius of a normal element,

TT=TTTT^{*}=T^{*}T

is equal to its norm.

Proof.

We can proceed in two steps, as follows:

Step 1. In the case T=TT=T^{*} we have ||Tn||=||T||n||T^{n}||=||T||^{n} for any exponent of the form n=2kn=2^{k}, by using the formula ||TT||=||T||2||TT^{*}||=||T||^{2}, and by taking nn-th roots we get:

ρ(T)||T||\rho(T)\geq||T||

Thus, we are done with the self-adjoint case, with the result ρ(T)=||T||\rho(T)=||T||.

Step 2. In the general normal case TT=TTTT^{*}=T^{*}T we have Tn(Tn)=(TT)nT^{n}(T^{n})^{*}=(TT^{*})^{n}, and by using this, along with the result from Step 1, applied to TTTT^{*}, we obtain:

ρ(T)\displaystyle\rho(T) =\displaystyle= limn||Tn||1/n\displaystyle\lim_{n\to\infty}||T^{n}||^{1/n}
=\displaystyle= limn||Tn(Tn)||1/n\displaystyle\sqrt{\lim_{n\to\infty}||T^{n}(T^{n})^{*}||^{1/n}}
=\displaystyle= limn||(TT)n||1/n\displaystyle\sqrt{\lim_{n\to\infty}||(TT^{*})^{n}||^{1/n}}
=\displaystyle= ρ(TT)\displaystyle\sqrt{\rho(TT^{*})}
=\displaystyle= ||T||2\displaystyle\sqrt{||T||^{2}}
=\displaystyle= ||T||\displaystyle||T||

Thus, we are led to the conclusion in the statement. ∎

As a first comment, the spectral radius formula ρ(T)=||T||\rho(T)=||T|| does not hold in general, the simplest counterexample being the following non-normal matrix:

J=(0100)J=\begin{pmatrix}0&1\\ 0&0\end{pmatrix}
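Here is this counterexample doublechecked numerically, with the norm of JJ being its biggest singular value, namely 1, and its spectral radius being 0:

import numpy as np

J = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.linalg.norm(J, 2))                  # 1.0, the norm ||J||
print(max(abs(np.linalg.eigvals(J))))        # 0.0, the spectral radius rho(J)
print(np.allclose(J @ J.T, J.T @ J))         # False: J is not normal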

As another comment, we can combine the formula ρ(T)=||T||\rho(T)=||T|| for normal operators with the formula ||TT||=||T||2||TT^{*}||=||T||^{2}, and we are led to the following statement:

Theorem 3.19.

The norm of B(H)B(H) is given by

||T||=sup{λ|TTλB(H)1}||T||=\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}TT^{*}-\lambda\notin B(H)^{-1}\right\}}

and so is a purely algebraic quantity.

Proof.

We have the following computation, using the formula ||TT||=||T||2||TT^{*}||=||T||^{2}, then the spectral radius formula for TTTT^{*}, and finally the definition of the spectral radius:

||T||\displaystyle||T|| =\displaystyle= ||TT||\displaystyle\sqrt{||TT^{*}||}
=\displaystyle= ρ(TT)\displaystyle\sqrt{\rho(TT^{*})}
=\displaystyle= sup{λ|λσ(TT)}\displaystyle\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}\lambda\in\sigma(TT^{*})\right\}}
=\displaystyle= sup{λ|TTλB(H)1}\displaystyle\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}TT^{*}-\lambda\notin B(H)^{-1}\right\}}

Thus, we are led to the conclusion in the statement. ∎

The above result is quite interesting, philosophically speaking. We will be back to this, with further results and comments on B(H)B(H), and other algebras of the same type.

3c. Normal operators

By using Theorem 3.18 we can say a number of non-trivial things concerning the normal operators, commonly known as “spectral theorem for normal operators”. As a first result here, we can improve the polynomial functional calculus formula:

Theorem 3.20.

Given TB(H)T\in B(H) normal, we have a morphism of algebras

[X]B(H),PP(T)\mathbb{C}[X]\to B(H)\quad,\quad P\to P(T)

having the properties ||P(T)||=||P|σ(T)||||P(T)||=||P_{|\sigma(T)}||, and σ(P(T))=P(σ(T))\sigma(P(T))=P(\sigma(T)).

Proof.

This is an improvement of Theorem 3.7 in the normal case, with the extra assertion being the norm estimate. But the element P(T)P(T) being normal, we can apply to it the spectral radius formula for normal elements, and we obtain:

||P(T)||\displaystyle||P(T)|| =\displaystyle= ρ(P(T))\displaystyle\rho(P(T))
=\displaystyle= supλσ(P(T))|λ|\displaystyle\sup_{\lambda\in\sigma(P(T))}|\lambda|
=\displaystyle= supλP(σ(T))|λ|\displaystyle\sup_{\lambda\in P(\sigma(T))}|\lambda|
=\displaystyle= ||P|σ(T)||\displaystyle||P_{|\sigma(T)}||

Thus, we are led to the conclusions in the statement. ∎

We can improve as well the rational calculus formula, and the holomorphic calculus formula, in the same way. Importantly now, at a more advanced level, we have:

Theorem 3.21.

Given TB(H)T\in B(H) normal, we have a morphism of algebras

C(σ(T))B(H),ff(T)C(\sigma(T))\to B(H)\quad,\quad f\to f(T)

which is isometric, ||f(T)||=||f||||f(T)||=||f||, and has the property σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)).

Proof.

The idea here is to “complete” the morphism in Theorem 3.20, namely:

[X]B(H),PP(T)\mathbb{C}[X]\to B(H)\quad,\quad P\to P(T)

Indeed, we know from Theorem 3.20 that this morphism is continuous, and is in fact isometric, when regarding the polynomials P[X]P\in\mathbb{C}[X] as functions on σ(T)\sigma(T):

||P(T)||=||P|σ(T)||||P(T)||=||P_{|\sigma(T)}||

We conclude from this that we have a unique isometric extension, as follows:

C(σ(T))B(H),ff(T)C(\sigma(T))\to B(H)\quad,\quad f\to f(T)

It remains to prove σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)), and we can do this by double inclusion:

\subset” Given a continuous function fC(σ(T))f\in C(\sigma(T)), we must prove that we have:

λf(σ(T))λσ(f(T))\lambda\notin f(\sigma(T))\implies\lambda\notin\sigma(f(T))

For this purpose, consider the following function, which is well-defined:

1fλC(σ(T))\frac{1}{f-\lambda}\in C(\sigma(T))

We can therefore apply this function to TT, and we obtain:

(1fλ)(T)=1f(T)λ\left(\frac{1}{f-\lambda}\right)(T)=\frac{1}{f(T)-\lambda}

In particular f(T)λf(T)-\lambda is invertible, so λσ(f(T))\lambda\notin\sigma(f(T)), as desired.

\supset” Given a continuous function fC(σ(T))f\in C(\sigma(T)), we must prove that we have:

λf(σ(T))λσ(f(T))\lambda\in f(\sigma(T))\implies\lambda\in\sigma(f(T))

But this is the same as proving that we have:

μσ(T)f(μ)σ(f(T))\mu\in\sigma(T)\implies f(\mu)\in\sigma(f(T))

For this purpose, we approximate our function by polynomials, PnfP_{n}\to f, and we examine the following convergence, which follows from PnfP_{n}\to f:

Pn(T)Pn(μ)f(T)f(μ)P_{n}(T)-P_{n}(\mu)\to f(T)-f(\mu)

We know from polynomial functional calculus that we have:

Pn(μ)Pn(σ(T))=σ(Pn(T))P_{n}(\mu)\in P_{n}(\sigma(T))=\sigma(P_{n}(T))

Thus, the operators Pn(T)Pn(μ)P_{n}(T)-P_{n}(\mu) are not invertible. On the other hand, we know that the set formed by the invertible operators is open, so its complement is closed. Thus the limit f(T)f(μ)f(T)-f(\mu) is not invertible either, and so f(μ)σ(f(T))f(\mu)\in\sigma(f(T)), as desired. ∎
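In finite dimensions, and for self-adjoint matrices, where Theorem 3.21 is fully powerful, the morphism ff(T)f\to f(T) is something very concrete, given by f(T)=Uf(D)Uf(T)=Uf(D)U^{*}, with T=UDUT=UDU^{*} being the diagonalization of TT. Here is a numpy sketch of this, with f=cosf=\cos, and with the code conventions being ad-hoc:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A + A.conj().T                           # self-adjoint, sigma(T) real

d, U = np.linalg.eigh(T)                     # T = U diag(d) U*
fT = U @ np.diag(np.cos(d)) @ U.conj().T     # f(T), with f = cos

print(np.allclose(np.linalg.norm(fT, 2), max(abs(np.cos(d)))))   # ||f(T)|| = ||f||
print(np.allclose(np.sort(np.linalg.eigvalsh(fT)), np.sort(np.cos(d))))   # sigma(f(T)) = f(sigma(T))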

As an important comment, Theorem 3.21 is not exactly in final form, because it misses an important point, namely that our correspondence maps:

z¯T\bar{z}\to T^{*}

However, this is something non-trivial, and we will be back to this later. Observe however that Theorem 3.21 is fully powerful for the self-adjoint operators, T=TT=T^{*}, where the spectrum is real, and so where z=z¯z=\bar{z} on the spectrum. We will be back to this.


As a second result now, along the same lines, we can further extend Theorem 3.21 into a measurable functional calculus theorem, as follows:

Theorem 3.22.

Given TB(H)T\in B(H) normal, we have a morphism of algebras as follows, with LL^{\infty} standing for abstract measurable functions, or Borel functions,

L(σ(T))B(H),ff(T)L^{\infty}(\sigma(T))\to B(H)\quad,\quad f\to f(T)

which is isometric, ||f(T)||=||f||||f(T)||=||f||, and has the property σ(f(T))=f(σ(T))\sigma(f(T))=f(\sigma(T)).

Proof.

As before, the idea will be that of “completing” what we have. To be more precise, we can use the Riesz theorem and a polarization trick, as follows:

(1) Given a vector xHx\in H, consider the following functional:

C(σ(T)),g<g(T)x,x>C(\sigma(T))\to\mathbb{C}\quad,\quad g\to<g(T)x,x>

By the Riesz theorem, this functional must be the integration with respect to a certain measure μ\mu on the space σ(T)\sigma(T). Thus, we have a formula as follows:

<g(T)x,x>=σ(T)g(z)dμ(z)<g(T)x,x>=\int_{\sigma(T)}g(z)d\mu(z)

Now given an arbitrary Borel function fL(σ(T))f\in L^{\infty}(\sigma(T)), as in the statement, we can define a number <f(T)x,x><f(T)x,x>\in\mathbb{C}, by using exactly the same formula, namely:

<f(T)x,x>=σ(T)f(z)dμ(z)<f(T)x,x>=\int_{\sigma(T)}f(z)d\mu(z)

Thus, we have managed to define numbers <f(T)x,x><f(T)x,x>\in\mathbb{C}, for all vectors xHx\in H, and in addition we can recover these numbers as follows, with gnC(σ(T))g_{n}\in C(\sigma(T)):

<f(T)x,x>=limgnf<gn(T)x,x><f(T)x,x>=\lim_{g_{n}\to f}<g_{n}(T)x,x>

(2) In order to define now numbers <f(T)x,y><f(T)x,y>\in\mathbb{C}, for all vectors x,yHx,y\in H, we can use a polarization trick. Indeed, for any operator SB(H)S\in B(H) we have:

<S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x><S(x+y),x+y>=<Sx,x>+<Sy,y>+<Sx,y>+<Sy,x>

By replacing yiyy\to iy, we have as well the following formula:

<S(x+iy),x+iy>=<Sx,x>+<Sy,y>i<Sx,y>+i<Sy,x><S(x+iy),x+iy>=<Sx,x>+<Sy,y>-i<Sx,y>+i<Sy,x>

By multiplying this latter formula by ii, we obtain the following formula:

i<S(x+iy),x+iy>=i<Sx,x>+i<Sy,y>+<Sx,y><Sy,x>i<S(x+iy),x+iy>=i<Sx,x>+i<Sy,y>+<Sx,y>-<Sy,x>

Now by summing this latter formula with the first one, we obtain:

<S(x+y),x+y>+i<S(x+iy),x+iy>\displaystyle<S(x+y),x+y>+i<S(x+iy),x+iy> =\displaystyle= (1+i)[<Sx,x>+<Sy,y>]\displaystyle(1+i)[<Sx,x>+<Sy,y>]
+\displaystyle+ 2<Sx,y>\displaystyle 2<Sx,y>

(3) But with this, we can now finish. Indeed, by combining (1,2), given a Borel function fL(σ(T))f\in L^{\infty}(\sigma(T)), we can define numbers <f(T)x,y><f(T)x,y>\in\mathbb{C} for any x,yHx,y\in H, and it is routine to check, by using approximation by continuous functions gnfg_{n}\to f as in (1), that we obtain in this way an operator f(T)B(H)f(T)\in B(H), having all the desired properties. ∎

The same comments as before apply. Theorem 3.22 is not exactly in final form, because it misses an important point, namely that our correspondence maps:

z¯T\bar{z}\to T^{*}

However, this is something non-trivial, and we will be back to this later. Observe however that Theorem 3.22 is fully powerful for the self-adjoint operators, T=TT=T^{*}, where the spectrum is real, and so where z=z¯z=\bar{z} on the spectrum. We will be back to this.


As another comment, the above result and its proof provide us with more than a Borel functional calculus, because what we got is a certain measure on the spectrum σ(T)\sigma(T), along with a functional calculus for the LL^{\infty} functions with respect to this measure. We will be back to this later, and for the moment we will only need Theorem 3.22 as formulated, with L(σ(T))L^{\infty}(\sigma(T)) standing, a bit abusively, for the Borel functions on σ(T)\sigma(T).

3d. Diagonalization

We can now diagonalize the normal operators. We will do this in 3 steps, first for the self-adjoint operators, then for the families of commuting self-adjoint operators, and finally for the general normal operators, by using a trick of the following type:

T=Re(T)+iIm(T)T=Re(T)+iIm(T)

The diagonalization in infinite dimensions is more tricky than in finite dimensions, and instead of writing a formula of type T=UDUT=UDU^{*}, with U,DB(H)U,D\in B(H) being respectively unitary and diagonal, we will express our operator as T=UMUT=U^{*}MU, with U:HKU:H\to K being a certain unitary, and with MB(K)M\in B(K) being a certain diagonal operator.


This is indeed how the spectral theorem is best formulated, in view of applications. In practice, the explicit construction of U,MU,M, which will be actually rather part of the proof, is also needed. For the self-adjoint operators, the statement and proof are as follows:

Theorem 3.23.

Any self-adjoint operator TB(H)T\in B(H) can be diagonalized,

T=UMfUT=U^{*}M_{f}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to TT, with f:Xf:X\to\mathbb{R} being a certain function, once again associated to TT, and with

Mf(g)=fgM_{f}(g)=fg

being the usual multiplication operator by ff, on the Hilbert space L2(X)L^{2}(X).

Proof.

The construction of U,fU,f can be done in several steps, as follows:

(1) We first prove the result in the special case where our operator TT has a cyclic vector xHx\in H, with this meaning that the following holds:

span(Tkx|k)¯=H\overline{span\left(T^{k}x\Big{|}k\in\mathbb{N}\right)}=H

For this purpose, let us go back to the proof of Theorem 3.22. We will use the following formula from there, with μ\mu being the measure on X=σ(T)X=\sigma(T) associated to xx:

<g(T)x,x>=σ(T)g(z)dμ(z)<g(T)x,x>=\int_{\sigma(T)}g(z)d\mu(z)

Our claim is that we can define a unitary U:HL2(X)U:H\to L^{2}(X), first on the dense part spanned by the vectors TkxT^{k}x, by the following formula, and then by continuity:

U[g(T)x]=gU[g(T)x]=g

Indeed, the following computation shows that UU is well-defined, and isometric:

||g(T)x||2\displaystyle||g(T)x||^{2} =\displaystyle= <g(T)x,g(T)x>\displaystyle<g(T)x,g(T)x>
=\displaystyle= <g(T)g(T)x,x>\displaystyle<g(T)^{*}g(T)x,x>
=\displaystyle= <|g|2(T)x,x>\displaystyle<|g|^{2}(T)x,x>
=\displaystyle= σ(T)|g(z)|2dμ(z)\displaystyle\int_{\sigma(T)}|g(z)|^{2}d\mu(z)
=\displaystyle= ||g||22\displaystyle||g||_{2}^{2}

We can then extend UU by continuity into a unitary U:HL2(X)U:H\to L^{2}(X), as claimed. Now observe that we have the following formula:

UTUg\displaystyle UTU^{*}g =\displaystyle= U[Tg(T)x]\displaystyle U[Tg(T)x]
=\displaystyle= U[(zg)(T)x]\displaystyle U[(zg)(T)x]
=\displaystyle= zg\displaystyle zg

Thus our result is proved in the present case, with UU as above, and with f(z)=zf(z)=z.

(2) We discuss now the general case. Our first claim is that HH has a decomposition as follows, with each HiH_{i} being invariant under TT, and admitting a cyclic vector xix_{i}:

H=iHiH=\bigoplus_{i}H_{i}

Indeed, this is something elementary, the construction being by recursion in finite dimensions, in the obvious way, and by using the Zorn lemma in general. Now with this decomposition in hand, we can make a direct sum of the diagonalizations obtained in (1), for each of the restrictions T|HiT_{|H_{i}}, and we obtain the formula in the statement. ∎
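In finite dimensions, Theorem 3.23 amounts to the usual diagonalization of self-adjoint matrices, with X={1,…,N}X=\{1,\ldots,N\} with its counting measure, and with ff being the eigenvalue function. Here is a numpy sketch of this dictionary, with all code conventions being ad-hoc:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
T = A + A.conj().T                     # self-adjoint operator on H = C^4

f, V = np.linalg.eigh(T)               # eigenvalues f : X -> R, eigenvectors V
U = V.conj().T                         # the unitary U : H -> L^2(X)
Mf = np.diag(f)                        # the multiplication operator M_f

print(np.allclose(T, U.conj().T @ Mf @ U))    # True: T = U* M_f U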

We have the following technical generalization of the above result:

Theorem 3.24.

Any family of commuting self-adjoint operators TiB(H)T_{i}\in B(H) can be jointly diagonalized,

Ti=UMfiUT_{i}=U^{*}M_{f_{i}}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to {Ti}\{T_{i}\}, with fi:Xf_{i}:X\to\mathbb{R} being certain functions, once again associated to TiT_{i}, and with

Mfi(g)=figM_{f_{i}}(g)=f_{i}g

being the usual multiplication operator by fif_{i}, on the Hilbert space L2(X)L^{2}(X).

Proof.

This is similar to the proof of Theorem 3.23, by suitably modifying the measurable calculus formula, and the measure μ\mu itself, as to have this formula working for all the operators TiT_{i}. With this modification done, everything extends. ∎

In order to discuss now the case of the arbitrary normal operators, we will need:

Proposition 3.25.

Any operator TB(H)T\in B(H) can be written as

T=Re(T)+iIm(T)T=Re(T)+iIm(T)

with Re(T),Im(T)B(H)Re(T),Im(T)\in B(H) being self-adjoint, and this decomposition is unique.

Proof.

This is something elementary, the idea being as follows:

(1) As a first observation, in the case H=H=\mathbb{C} our operators are usual complex numbers, and the formula in the statement corresponds to the following basic fact:

z=Re(z)+iIm(z)z=Re(z)+iIm(z)

(2) In general now, we can use the same formulae for the real and imaginary part as in the complex number case, the decomposition formula being as follows:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

To be more precise, both the operators on the right are self-adjoint, and the summing formula holds indeed, and so we have our decomposition result, as desired.

(3) Regarding now the uniqueness, by linearity it is enough to show that R+iS=0R+iS=0 with R,SR,S both self-adjoint implies R=S=0R=S=0. But this follows by applying the adjoint to R+iS=0R+iS=0, which gives RiS=0R-iS=0, and so R=S=0R=S=0, as desired. ∎
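As usual, all this can be doublechecked numerically, in a few lines of numpy; this is just an illustrative sketch, with ad-hoc conventions:

import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

Re = (T + T.conj().T) / 2
Im = (T - T.conj().T) / 2j

print(np.allclose(Re, Re.conj().T))       # True: Re(T) is self-adjoint
print(np.allclose(Im, Im.conj().T))       # True: Im(T) is self-adjoint
print(np.allclose(T, Re + 1j * Im))       # True: T = Re(T) + i Im(T)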

As a comment here, the above result is just the “tip of the iceberg”, in what regards decomposition results for the operators TB(H)T\in B(H), in analogy with decomposition results for the complex numbers zz\in\mathbb{C}. As a sample result here, improving Proposition 3.25, we can write any operator TB(H)T\in B(H) as a linear combination of 4 positive operators, by writing both Re(T),Im(T)Re(T),Im(T) as differences of positive operators. More on this later.


Good news, after all these preliminaries, which I hope you enjoyed as much as I did, we can finally discuss the case of arbitrary normal operators. We have here the following result, generalizing what we know from chapter 1 about the normal matrices:

Theorem 3.26.

Any normal operator TB(H)T\in B(H) can be diagonalized,

T=UMfUT=U^{*}M_{f}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to TT, with f:Xf:X\to\mathbb{C} being a certain function, once again associated to TT, and with

Mf(g)=fgM_{f}(g)=fg

being the usual multiplication operator by ff, on the Hilbert space L2(X)L^{2}(X).

Proof.

This is our main diagonalization theorem, the idea being as follows:

(1) Consider the decomposition of TT into its real and imaginary parts, as constructed in the proof of Proposition 3.25, namely:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

We know that the real and imaginary parts are self-adjoint operators. Now since TT was assumed to be normal, TT=TTTT^{*}=T^{*}T, these real and imaginary parts commute:

[T+T2,TT2i]=0\left[\frac{T+T^{*}}{2}\,,\,\frac{T-T^{*}}{2i}\right]=0

Thus Theorem 3.24 applies to these real and imaginary parts, and gives the result.

(2) Alternatively, we can use methods similar to those that we used in chapter 1, in order to deal with the usual normal matrices, involving the special relation between TT and the operator TTTT^{*}, which is self-adjoint. We will leave this as an instructive exercise. ∎

This was for our series of diagonalization theorems. There is of course one more result here, regarding the families of commuting normal operators, as follows:

Theorem 3.27.

Any family of commuting normal operators TiB(H)T_{i}\in B(H) can be jointly diagonalized,

Ti=UMfiUT_{i}=U^{*}M_{f_{i}}U

with U:HL2(X)U:H\to L^{2}(X) being a unitary operator from HH to a certain L2L^{2} space associated to {Ti}\{T_{i}\}, with fi:Xf_{i}:X\to\mathbb{C} being certain functions, once again associated to TiT_{i}, and with

Mfi(g)=figM_{f_{i}}(g)=f_{i}g

being the usual multiplication operator by fif_{i}, on the Hilbert space L2(X)L^{2}(X).

Proof.

This is similar to the proof of Theorem 3.24 and Theorem 3.26, by combining the arguments there. To be more precise, this follows as Theorem 3.24, by using the decomposition trick from the proof of Theorem 3.26. ∎

With the above diagonalization results in hand, we can now “fix” the continuous and measurable functional calculus theorems, with a key complement, as follows:

Theorem 3.28.

Given a normal operator TB(H)T\in B(H), the following hold, for both the functional calculus and the measurable calculus morphisms:

  1. (1)

    These morphisms are *-morphisms.

  2. (2)

    The function z¯\bar{z} gets mapped to TT^{*}.

  3. (3)

    The functions Re(z),Im(z)Re(z),Im(z) get mapped to Re(T),Im(T)Re(T),Im(T).

  4. (4)

    The function |z|2|z|^{2} gets mapped to TT=TTTT^{*}=T^{*}T.

  5. (5)

    If ff is real, then f(T)f(T) is self-adjoint.

Proof.

These assertions are more or less equivalent, with (1) being the main one, which obviously implies everything else. But this assertion (1) follows from the diagonalization result for normal operators, from Theorem 3.26. ∎

This was for the spectral theory of arbitrary and normal operators, or at least for the basics of this theory. As a conclusion here, our main results are as follows:

  1. (1)

    Regarding the arbitrary operators, the main results here, or rather the most advanced results, are the holomorphic calculus formula from Theorem 3.15, and the spectral radius estimate from Theorem 3.17.

  2. (2)

    For the self-adjoint operators, the main results are the spectral radius formula from Theorem 3.18, the measurable calculus formula from Theorem 3.22, and the diagonalization result from Theorem 3.23.

  3. (3)

    For general normal operators, the main results are the spectral radius formula from Theorem 3.18, the measurable calculus formula from Theorem 3.22, complemented by Theorem 3.28, and the diagonalization result in Theorem 3.26.

There are of course many other things that can be said about the spectral theory of the bounded operators TB(H)T\in B(H), and on that of the unbounded operators too. As a complement, we recommend any good operator theory book, with the comment however that there is a bewildering choice here, depending on taste, and on what exactly you want to do with your operators TB(H)T\in B(H). In what concerns us, who are rather into general quantum mechanics, but with our operators being bounded, good choices are the functional analysis book of Lax [lax], or the operator algebra book of Blackadar [bla].

3e. Exercises

The main theoretical notion introduced in this chapter was that of the spectrum of an operator, and as a first exercise here, we have:

Exercise 3.29.

Prove that for the usual matrices A,BMN()A,B\in M_{N}(\mathbb{C}) we have

σ+(AB)=σ+(BA)\sigma^{+}(AB)=\sigma^{+}(BA)

where σ+\sigma^{+} denotes the set of eigenvalues, taken with multiplicities.

As a remark, we have seen in the above that σ(AB)=σ(BA)\sigma(AB)=\sigma(BA) holds outside {0}\{0\}, and the equality on {0}\{0\} holds as well, because ABAB is invertible if and only if BABA is invertible. However, in what regards the eigenvalues taken with multiplicities, things are more tricky here, and the answer should be somewhere inside your linear algebra knowledge.

Exercise 3.30.

Clarify, with examples and counterexamples, the relation between the eigenvalues of an operator TB(H)T\in B(H), and its spectrum σ(T)\sigma(T)\subset\mathbb{C}.

Here, as usual, the counterexamples could only come from the shift operator SS, on the space H=l2()H=l^{2}(\mathbb{N}). As a bonus exercise here, try computing the spectrum of SS.

Exercise 3.31.

Draw the picture of the following function, and of its inverse,

f(z)=z+irzirf(z)=\frac{z+ir}{z-ir}

with rr\in\mathbb{R}, and prove that for r>>0r>>0 and T=TT=T^{*}, the element f(T)f(T) is well-defined.

This is something that we used in the above, when computing spectra of self-adjoints and unitaries, and the problem is that of working out all the details.

Exercise 3.32.

Comment on the spectral radius theorem, stating that for a normal operator, TT=TTTT^{*}=T^{*}T, the spectral radius is equal to the norm,

ρ(T)=||T||\rho(T)=||T||

with examples and counterexamples, and simpler proofs as well, in various particular cases of interest, such as the finite dimensional one.

This is of course something a bit philosophical, but the spectral radius theorem being our key technical result so far, some further thinking on it is definitely a good thing.

Exercise 3.33.

Develop a theory of *-algebras AA for which the quantity

||a||=sup{λ|aaλA1}||a||=\sqrt{\sup\left\{\lambda\in\mathbb{C}\Big{|}aa^{*}-\lambda\notin A^{-1}\right\}}

defines a norm, for the elements aAa\in A.

As pointed out in the above, the spectral radius formula shows that for A=B(H)A=B(H) the norm is given by the above formula, and so there should be such a theory of “good” *-algebras, with A=B(H)A=B(H) as a main example. However, this is tricky.

Exercise 3.34.

Find and write down a proof for the spectral theorem for normal operators in the spirit of the proof for normal matrices from chapter 1, and vice versa.

To be more precise, the problem is that the proof of the spectral theorem for the usual matrices, from chapter 1, was using a certain kind of trick, while the proof of the spectral theorem for the arbitrary operators, given in this chapter, was using some other kind of trick. Thus, for fully understanding all this, working out more proofs, both for the usual matrices and for the arbitrary operators, is a useful thing.

Exercise 3.35.

Find and write down an enhancement of the proof given above for the spectral theorem, as for z¯T\bar{z}\to T^{*} to appear way before the end of the proof.

This is something a bit philosophical, and check here first the various comments made above, and maybe work out this as well in parallel with the previous exercise.

Chapter 4 Compact operators

4a. Polar decomposition

We have seen so far the basic theory of bounded operators, in the arbitrary, normal and self-adjoint cases, and in a few other cases of interest. In this chapter we discuss a number of more specialized questions, for the most dealing with the compact operators, which are particularly close, conceptually speaking, to the usual complex matrices.


We have in fact considerably many interesting things that we can talk about, in this final chapter on operator theory, and our choices will be as follows:


(1) Before anything, at the general level, we would like to understand the matrix and operator theory analogues of the various things that we know about the complex numbers zM1()z\in M_{1}(\mathbb{C}), such as zz¯=|z|2z\bar{z}=|z|^{2}, or z=reitz=re^{it} and so on. We will discuss this first.


(2) Then, motivated by advanced linear algebra, we will go on a lengthy discussion on the algebra of compact operators K(H)B(H)K(H)\subset B(H), which for many advanced operator theory purposes is the correct generalization of the matrix algebra MN()M_{N}(\mathbb{C}).


(3) Our discussion on the compact operators will feature as well some more specialized types of operators, F(H)B1(H)B2(H)K(H)F(H)\subset B_{1}(H)\subset B_{2}(H)\subset K(H), with F(H)F(H) being the finite rank ones, B1(H)B_{1}(H) being the trace class ones, and B2(H)B_{2}(H) being the Hilbert-Schmidt ones.


And that is pretty much it, all basic things that must be known. Of course this will be just the tip of the iceberg, and more of an introduction to modern operator theory.


Getting started now, we would first like to systematically develop the theory of positive operators, and then establish polar decomposition results for the operators TB(H)T\in B(H). We first have the following result, improving our knowledge from chapter 2:

Theorem 4.1.

For an operator TB(H)T\in B(H), the following are equivalent:

  1. (1)

    <Tx,x>0<Tx,x>\geq 0, for any xHx\in H.

  2. (2)

    TT is normal, and σ(T)[0,)\sigma(T)\subset[0,\infty).

  3. (3)

    T=S2T=S^{2}, for some SB(H)S\in B(H) satisfying S=SS=S^{*}.

  4. (4)

    T=RRT=R^{*}R, for some RB(H)R\in B(H).

If these conditions are satisfied, we call TT positive, and write T0T\geq 0.

Proof.

We have already seen some implications in chapter 2, but the best is to forget the few partial results that we know, and prove everything, as follows:

(1)(2)(1)\implies(2) Assuming <Tx,x>0<Tx,x>\geq 0, with S=TTS=T-T^{*} we have:

<Sx,x>\displaystyle<Sx,x> =\displaystyle= <Tx,x><Tx,x>\displaystyle<Tx,x>-<T^{*}x,x>
=\displaystyle= <Tx,x><x,Tx>\displaystyle<Tx,x>-<x,Tx>
=\displaystyle= <Tx,x><Tx,x>¯\displaystyle<Tx,x>-\overline{<Tx,x>}
=\displaystyle= 0\displaystyle 0

The next step is to use a polarization trick, as follows:

<Sx,y>\displaystyle<Sx,y> =\displaystyle= <S(x+y),x+y><Sx,x><Sy,y><Sy,x>\displaystyle<S(x+y),x+y>-<Sx,x>-<Sy,y>-<Sy,x>
=\displaystyle= <Sy,x>\displaystyle-<Sy,x>
=\displaystyle= <y,Sx>\displaystyle<y,Sx>
=\displaystyle= <Sx,y>¯\displaystyle\overline{<Sx,y>}

Thus we must have <Sx,y><Sx,y>\in\mathbb{R}, and with yiyy\to iy we obtain <Sx,y>i<Sx,y>\in i\mathbb{R} too, and so <Sx,y>=0<Sx,y>=0. Thus S=0S=0, which gives T=TT=T^{*}. Now since TT is self-adjoint, it is normal as claimed. Moreover, by self-adjointness, we have:

σ(T)\sigma(T)\subset\mathbb{R}

In order to prove now that we have indeed σ(T)[0,)\sigma(T)\subset[0,\infty), as claimed, we must invert T+λT+\lambda, for any λ>0\lambda>0. For this purpose, observe that we have:

<(T+λ)x,x>\displaystyle<(T+\lambda)x,x> =\displaystyle= <Tx,x>+<λx,x>\displaystyle<Tx,x>+<\lambda x,x>
\displaystyle\geq <λx,x>\displaystyle<\lambda x,x>
=\displaystyle= λ||x||2\displaystyle\lambda||x||^{2}

But this shows that T+λT+\lambda is injective. In order to prove now the surjectivity, and the boundedness of the inverse, observe first that we have:

Im(T+λ)\displaystyle Im(T+\lambda)^{\perp} =\displaystyle= ker(T+λ)\displaystyle\ker(T+\lambda)^{*}
=\displaystyle= ker(T+λ)\displaystyle\ker(T+\lambda)
=\displaystyle= {0}\displaystyle\{0\}

Thus Im(T+λ)Im(T+\lambda) is dense. On the other hand, observe that we have:

||(T+λ)x||2\displaystyle||(T+\lambda)x||^{2} =\displaystyle= <Tx+λx,Tx+λx>\displaystyle<Tx+\lambda x,Tx+\lambda x>
=\displaystyle= ||Tx||2+2λ<Tx,x>+λ2||x||2\displaystyle||Tx||^{2}+2\lambda<Tx,x>+\lambda^{2}||x||^{2}
\displaystyle\geq λ2||x||2\displaystyle\lambda^{2}||x||^{2}

Thus for any vector in the image yIm(T+λ)y\in Im(T+\lambda) we have:

||y||λ||(T+λ)1y||||y||\geq\lambda\big{|}\big{|}(T+\lambda)^{-1}y\big{|}\big{|}

As a conclusion to what we have so far, T+\lambda is injective, and invertible as a bounded operator from H onto its image, with the following norm bound:

||(T+λ)1||λ1||(T+\lambda)^{-1}||\leq\lambda^{-1}

But this shows that Im(T+λ)Im(T+\lambda) is complete, hence closed, and since we already knew that Im(T+λ)Im(T+\lambda) is dense, our operator T+λT+\lambda is surjective, and we are done.

(2)\implies(3) Since T is normal, and with spectrum contained in [0,\infty), we can use the continuous functional calculus formula for the normal operators from chapter 3, with the function f(x)=\sqrt{x}, so as to construct a square root S=\sqrt{T}.

(3)(4)(3)\implies(4) This is trivial, because we can set R=SR=S.

(4)(1)(4)\implies(1) This is clear, because we have the following computation:

<RRx,x>=<Rx,Rx>=||Rx||2<R^{*}Rx,x>=<Rx,Rx>=||Rx||^{2}

Thus, we have the equivalences in the statement. ∎
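Before going on, here is a quick numerical illustration of these equivalences, in finite dimensions. This is a minimal sketch, assuming numpy, with the positivity tested on random vectors, the spectrum computed by diagonalization, and the square root S=\sqrt{T} extracted by functional calculus:

```python
# Sanity check for Theorem 4.1, in finite dimensions: starting from an
# arbitrary matrix R, the operator T = R*R should be positive in all the
# equivalent senses (1-4). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
N = 5
R = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
T = R.conj().T @ R                          # condition (4): T = R*R

# (1): <Tx,x> >= 0, tested here on random vectors
for _ in range(100):
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    assert np.vdot(x, T @ x).real >= -1e-10

# (2): T is self-adjoint, hence normal, with spectrum in [0, infinity)
assert np.allclose(T, T.conj().T)
assert np.all(np.linalg.eigvalsh(T) >= -1e-10)

# (3): T = S^2, with S = sqrt(T) self-adjoint, via functional calculus
w, v = np.linalg.eigh(T)
S = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.conj().T
assert np.allclose(S, S.conj().T) and np.allclose(S @ S, T)
print("Theorem 4.1: all conditions verified")
```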

In analogy with what happens in finite dimensions, where among the positive matrices A0A\geq 0 we have the strictly positive ones, A>0A>0, given by the fact that the eigenvalues are strictly positive, we have as well a “strict” version of the above result, as follows:

Theorem 4.2.

For an operator TB(H)T\in B(H), the following are equivalent:

(1) T is positive and invertible.

(2) T is normal, and \sigma(T)\subset(0,\infty).

(3) T=S^{2}, for some S\in B(H) invertible, satisfying S=S^{*}.

(4) T=R^{*}R, for some R\in B(H) invertible.

If these conditions are satisfied, we call TT strictly positive, and write T>0T>0.

Proof.

Our claim is that the above conditions (1-4) are precisely the conditions (1-4) in Theorem 4.1, with the assumption “TT is invertible” added. Indeed:

(1) This is clear by definition.

(2) In the context of Theorem 4.1 (2), namely when TT is normal, and σ(T)[0,)\sigma(T)\subset[0,\infty), the invertibility of TT, which means 0σ(T)0\notin\sigma(T), gives σ(T)(0,)\sigma(T)\subset(0,\infty), as desired.

(3) In the context of Theorem 4.1 (3), namely when T=S2T=S^{2}, with S=SS=S^{*}, by using the basic properties of the functional calculus for normal operators, the invertibility of TT is equivalent to the invertibility of its square root S=TS=\sqrt{T}, as desired.

(4) In the context of Theorem 4.1 (4), namely when T=R^{*}R, the invertibility of T is equivalent to the invertibility of R. This can be either checked directly, or deduced via the equivalence (3)\iff(4) from Theorem 4.1, by using the above argument (3). ∎

As a subtlety now, we have the following complement to the above result:

Proposition 4.3.

For a strictly positive operator, T>0T>0, we have

<Tx,x>>0,x0<Tx,x>>0\quad,\quad\forall x\neq 0

but the converse of this fact is not true, unless we are in finite dimensions.

Proof.

We have several things to be proved, the idea being as follows:

(1) Regarding the main assertion, the inequality can be deduced as follows, by using the fact that the operator S=TS=\sqrt{T} is invertible, and in particular injective:

<Tx,x> = <S^{2}x,x> = <Sx,S^{*}x> = <Sx,Sx> = ||Sx||^{2} > 0

(2) In finite dimensions, assuming <Tx,x>>0<Tx,x>>0 for any x0x\neq 0, we know from Theorem 4.1 that we have T0T\geq 0. Thus we have σ(T)[0,)\sigma(T)\subset[0,\infty), and assuming by contradiction 0σ(T)0\in\sigma(T), we obtain that TT has λ=0\lambda=0 as eigenvalue, and the corresponding eigenvector x0x\neq 0 has the property <Tx,x>=0<Tx,x>=0, contradiction. Thus T>0T>0, as claimed.

(3) Regarding now the counterexample, consider the following operator on l2()l^{2}(\mathbb{N}):

T=\begin{pmatrix}1\\ &\frac{1}{2}\\ &&\frac{1}{3}\\ &&&\ddots\end{pmatrix}

This operator TT is well-defined and bounded, and we have <Tx,x>>0<Tx,x>>0 for any x0x\neq 0. However TT is not invertible, and so the converse does not hold, as stated. ∎
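As a numerical complement, here is a sketch, assuming numpy, of what happens with the above diagonal operator, truncated to N dimensions: the bottom of the spectrum goes to 0, so the inverses blow up in norm, and there can be no bounded inverse in infinite dimensions:

```python
# The counterexample from Proposition 4.3, truncated to N dimensions: the
# operator T = diag(1, 1/2, 1/3, ...) has <Tx,x> > 0 for any x != 0, but
# the bottom of its spectrum tends to 0. Illustrative only.
import numpy as np

for N in (10, 100, 1000):
    d = 1.0 / np.arange(1, N + 1)      # the eigenvalues 1, 1/2, 1/3, ...
    # min of spectrum -> 0, and ||T^{-1}|| = 1/min = N blows up with N
    print(N, d.min(), 1.0 / d.min())
```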

With this done, let us discuss now some decomposition results for the bounded operators TB(H)T\in B(H). We know that any zz\in\mathbb{C} can be written as follows, with a,ba,b\in\mathbb{R}:

z=a+ibz=a+ib

Also, we know that both the real and imaginary parts a,ba,b\in\mathbb{R}, and more generally any real number cc\in\mathbb{R}, can be written as follows, with r,s0r,s\geq 0:

c=rsc=r-s

Here are the operator theoretic generalizations of these results:

Proposition 4.4.

Given an operator TB(H)T\in B(H), the following happen:

(1) We can write T=A+iB, with A,B\in B(H) being self-adjoint.

(2) When T=T^{*}, we can write T=R-S, with R,S\in B(H) being positive.

(3) Thus, we can write any T as a linear combination of 4 positive elements.

Proof.

All this follows from basic spectral theory, as follows:

(1) This is something that we have already met in chapter 3, when proving the spectral theorem in its general form, the decomposition formula being as follows:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

(2) This follows from the measurable functional calculus. Indeed, assuming T=TT=T^{*} we have σ(T)\sigma(T)\subset\mathbb{R}, so we can use the following decomposition formula on \mathbb{R}:

1=χ[0,)+χ(,0)1=\chi_{[0,\infty)}+\chi_{(-\infty,0)}

To be more precise, let us multiply by zz, and rewrite this formula as follows:

z=χ[0,)zχ(,0)(z)z=\chi_{[0,\infty)}z-\chi_{(-\infty,0)}(-z)

Now by applying these measurable functions to T, we obtain a formula as follows, with both the operators T_{+},T_{-}\in B(H) being positive, as desired:

T=T+TT=T_{+}-T_{-}

(3) This follows indeed by combining the results in (1) and (2) above. ∎

Going ahead with our decomposition results, another basic thing that we know about complex numbers is that any zz\in\mathbb{C} appears as a real multiple of a unitary:

z=reitz=re^{it}

Finding the correct operator theoretic analogue of this is quite tricky, even for the usual matrices A\in M_{N}(\mathbb{C}). As a basic result here, we have:

Proposition 4.5.

Given an operator TB(H)T\in B(H), the following happen:

(1) When T=T^{*} and ||T||\leq 1, we can write T as an average of 2 unitaries:

T=\frac{U+V}{2}

(2) In the general T=T^{*} case, we can write T as a rescaled sum of unitaries:

T=\lambda(U+V)

(3) Thus, in general, we can write T as a rescaled sum of 4 unitaries.

Proof.

This follows from the results that we have, as follows:

(1) Assuming T=TT=T^{*} and ||T||1||T||\leq 1 we have 1T201-T^{2}\geq 0, and the decomposition that we are looking for is as follows, with both the components being unitaries:

T=T+i1T22+Ti1T22T=\frac{T+i\sqrt{1-T^{2}}}{2}+\frac{T-i\sqrt{1-T^{2}}}{2}

To be more precise, the square root can be extracted as in Theorem 4.1 (3), and the check of the unitarity of the components goes as follows:

(T+i\sqrt{1-T^{2}})(T-i\sqrt{1-T^{2}}) = T^{2}+(1-T^{2}) = 1

(2) This simply follows by applying (1) to the operator T/||T||T/||T||.

(3) Assuming first ||T||1||T||\leq 1, we know from Proposition 4.4 (1) that we can write T=A+iBT=A+iB, with A,BA,B being self-adjoint, and satisfying ||A||,||B||1||A||,||B||\leq 1. Now by applying (1) to both AA and BB, we obtain a decomposition of TT as follows:

T=U+V+W+X2T=\frac{U+V+W+X}{2}

In general, we can apply this to the operator T/||T||T/||T||, and we obtain the result. ∎
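Here is a numerical illustration of the assertion (1), in finite dimensions, as a sketch assuming numpy, with the square root \sqrt{1-T^{2}} extracted by diagonalization:

```python
# Proposition 4.5 (1), numerically: a self-adjoint contraction T is the
# average of the unitaries U, V = T +- i sqrt(1 - T^2). Illustrative only.
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N))
T = (A + A.T) / 2
T = T / (2 * np.linalg.norm(T, 2))          # self-adjoint, with ||T|| = 1/2

w, v = np.linalg.eigh(np.eye(N) - T @ T)    # 1 - T^2 >= 0
S = v @ np.diag(np.sqrt(w)) @ v.T           # S = sqrt(1 - T^2), commutes with T
U, V = T + 1j * S, T - 1j * S

assert np.allclose(U @ U.conj().T, np.eye(N))   # U unitary
assert np.allclose(V @ V.conj().T, np.eye(N))   # V unitary
assert np.allclose((U + V) / 2, T)              # T = (U + V)/2
print("T is an average of 2 unitaries")
```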

All this gets us into the multiplicative theory of the complex numbers, that we will attempt to generalize now. As a first construction, that we would like to generalize to the bounded operator setting, we have the construction of the modulus, as follows:

|z|=zz¯|z|=\sqrt{z\bar{z}}

The point now is that we can indeed generalize this construction, as follows:

Proposition 4.6.

Given an operator TB(H)T\in B(H), we can construct a positive operator |T|B(H)|T|\in B(H) as follows, by using the fact that TTT^{*}T is positive:

|T|=TT|T|=\sqrt{T^{*}T}

The square of this operator is then |T|2=TT|T|^{2}=T^{*}T. In the case H=H=\mathbb{C}, we obtain in this way the usual absolute value of the complex numbers:

|z|=zz¯|z|=\sqrt{z\bar{z}}

More generally, in the case where H=NH=\mathbb{C}^{N} is finite dimensional, we obtain in this way the usual moduli of the complex matrices AMN()A\in M_{N}(\mathbb{C}).

Proof.

We have several things to be proved, the idea being as follows:

(1) The first assertion follows from Theorem 4.1. Indeed, according to (4) there the operator TTT^{*}T is indeed positive, and then according to (2) there we can extract the square root of this latter positive operator, by applying to it the function z\sqrt{z}.

(2) By functional calculus we have then |T|2=TT|T|^{2}=T^{*}T, as desired.

(3) In the case H=H=\mathbb{C}, we obtain indeed the absolute value of complex numbers.

(4) In the case where the space HH is finite dimensional, H=NH=\mathbb{C}^{N}, we obtain indeed the usual moduli of the complex matrices AMN()A\in M_{N}(\mathbb{C}). ∎

As a comment here, it is possible to talk as well about TT\sqrt{TT^{*}}, which is in general different from TT\sqrt{T^{*}T}. Note that when TT is normal, no issue, because we have:

TT=TTTT=TTTT^{*}=T^{*}T\implies\sqrt{TT^{*}}=\sqrt{T^{*}T}

Regarding now the polar decomposition formula, let us start with a weak version of this statement, regarding the invertible operators, as follows:

Theorem 4.7.

We have the polar decomposition formula

T=UTTT=U\sqrt{T^{*}T}

with UU being a unitary, for any TB(H)T\in B(H) invertible.

Proof.

According to our definition of the modulus, |T|=TT|T|=\sqrt{T^{*}T}, we have:

<|T|x,|T|y> = <x,|T|^{2}y> = <x,T^{*}Ty> = <Tx,Ty>

Since T is invertible, so is |T|=\sqrt{T^{*}T}, and so the above computation shows that the following formula defines an isometry from H onto H, that is, a unitary operator U\in B(H):

U(|T|x)=TxU(|T|x)=Tx

But this formula shows that we have T=U|T|T=U|T|, as desired. ∎

Observe that we have uniqueness in the above result, in what regards the choice of the unitary UB(H)U\in B(H), due to the fact that we can write this unitary as follows:

U=T(TT)1U=T(\sqrt{T^{*}T})^{-1}

More generally now, we have the following result:

Theorem 4.8.

We have the polar decomposition formula

T=UTTT=U\sqrt{T^{*}T}

with UU being a partial isometry, for any TB(H)T\in B(H).

Proof.

As before, we have the following equality, for any two vectors x,yHx,y\in H:

<|T|x,|T|y>=<Tx,Ty><|T|x,|T|y>=<Tx,Ty>

We conclude that the following linear application is well-defined, and isometric:

U:Im|T|Im(T),|T|xTxU:Im|T|\to Im(T)\quad,\quad|T|x\to Tx

Now by continuity we can extend this isometry UU into an isometry between certain Hilbert subspaces of HH, as follows:

U:Im|T|¯Im(T)¯,|T|xTxU:\overline{Im|T|}\to\overline{Im(T)}\quad,\quad|T|x\to Tx

Moreover, we can further extend UU into a partial isometry U:HHU:H\to H, by setting Ux=0Ux=0, for any xIm|T|¯x\in\overline{Im|T|}^{\perp}, and with this convention, the result follows. ∎
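In finite dimensions the polar decomposition can be computed from the singular value decomposition, and here is a sketch of this, assuming numpy: writing T=WDV^{*} with W,V unitaries and D diagonal and positive, we have |T|=VDV^{*} and U=WV^{*}. The sketch also illustrates the above remark, that \sqrt{TT^{*}}=WDW^{*} is generically different from \sqrt{T^{*}T}:

```python
# Polar decomposition T = U|T| from the SVD, in finite dimensions: if
# T = W D V* then |T| = V D V* and U = W V*. Illustrative only.
import numpy as np

rng = np.random.default_rng(2)
N = 4
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

W, d, Vh = np.linalg.svd(T)                 # T = W diag(d) V*
absT = Vh.conj().T @ np.diag(d) @ Vh        # |T| = sqrt(T*T)
U = W @ Vh                                  # here unitary; partial isometry in general

assert np.allclose(U @ absT, T)                     # T = U|T|
assert np.allclose(absT @ absT, T.conj().T @ T)     # |T|^2 = T*T

other = W @ np.diag(d) @ W.conj().T                 # sqrt(TT*)
print("sqrt(TT*) == sqrt(T*T):", np.allclose(other, absT))   # generically False
```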

4b. Compact operators

We have seen so far the basic theory of the bounded operators, in the arbitrary, normal and self-adjoint cases, and in a few other cases of interest. We will keep building on this, with a number of more specialized results, regarding the finite rank operators and compact operators, and other special classes of related operators, namely the trace class operators, and the Hilbert-Schmidt operators. Let us start with a basic definition, as follows:

Definition 4.9.

An operator TB(H)T\in B(H) is said to be of finite rank if its image

Im(T)HIm(T)\subset H

is finite dimensional. The set of such operators is denoted F(H)F(H).

There are many interesting examples of finite rank operators, the most basic ones being the finite rank projections, on the finite dimensional subspaces K\subset H. Observe also that in the case where H is finite dimensional, any operator T\in B(H) is automatically of finite rank. In infinite dimensions this of course fails, but we have the following result:

Proposition 4.10.

The set of finite rank operators

F(H)B(H)F(H)\subset B(H)

is a two-sided *-ideal.

Proof.

We have several assertions to be proved, the idea being as follows:

(1) It is clear from definitions that F(H)F(H) is indeed a vector space, with this due to the following formulae, valid for any S,TB(H)S,T\in B(H), which are both clear:

dim(Im(S+T))dim(Im(S))+dim(Im(T))\dim(Im(S+T))\leq\dim(Im(S))+\dim(Im(T))
dim(Im(λT))=dim(Im(T))\dim(Im(\lambda T))=\dim(Im(T))

(2) Let us prove now that F(H)F(H) is stable under *. Given TF(H)T\in F(H), we can regard it as an invertible operator between finite dimensional Hilbert spaces, as follows:

T:(kerT)Im(T)T:(\ker T)^{\perp}\to Im(T)

We conclude from this that we have the following dimension equality:

dim((kerT))=dim(Im(T))\dim((\ker T)^{\perp})=\dim(Im(T))

Our claim now, in relation with our problem, is that we have equalities as follows:

\dim(Im(T^{*})) = \dim(\overline{Im(T^{*})}) = \dim((\ker T)^{\perp}) = \dim(Im(T))

Indeed, the third equality is the one above, and the second equality is something that we know too, from chapter 2. Now by combining these two equalities we deduce that Im(T)Im(T^{*}) is finite dimensional, and so the first equality holds as well. Thus, our equalities are proved, and this shows that we have TF(H)T^{*}\in F(H), as desired.

(3) Finally, regarding the ideal property, this follows from the following two formulae, valid for any S,TB(H)S,T\in B(H), which are once again clear from definitions:

dim(Im(ST))dim(Im(T))\dim(Im(ST))\leq\dim(Im(T))
dim(Im(TS))dim(Im(T))\dim(Im(TS))\leq\dim(Im(T))

Thus, we are led to the conclusion in the statement. ∎

Let us discuss now the compact operators, which will be the main topic of discussion, for the present chapter. These are best introduced as follows:

Definition 4.11.

An operator TB(H)T\in B(H) is said to be compact if the closed set

T(B1)¯H\overline{T(B_{1})}\subset H

is compact, where B1HB_{1}\subset H is the unit ball. The set of such operators is denoted K(H)K(H).

Equivalently, an operator T\in B(H) is compact when for any sequence \{x_{n}\}\subset B_{1}, or more generally for any bounded sequence \{x_{n}\}\subset H, the sequence \{T(x_{n})\} has a convergent subsequence. We will see later some further criteria of compactness.


In finite dimensions any operator is compact. In general, as a first observation, any finite rank operator is compact. We have in fact the following result:

Proposition 4.12.

Any finite rank operator is compact,

F(H)K(H)F(H)\subset K(H)

and the finite rank operators are dense inside the compact operators.

Proof.

The first assertion is clear, because if Im(T)Im(T) is finite dimensional, then the following subset is closed and bounded, and so it is compact:

T(B1)¯Im(T)\overline{T(B_{1})}\subset Im(T)

Regarding the second assertion, let us pick a compact operator TK(H)T\in K(H), and a number ε>0\varepsilon>0. By compactness of TT we can find a finite set SB1S\subset B_{1} such that:

T(B1)xSBε(Tx)T(B_{1})\subset\bigcup_{x\in S}B_{\varepsilon}(Tx)

Consider now the orthogonal projection PP onto the following finite dimensional space:

E=span(Tx|xS)E=span\left(Tx\Big{|}x\in S\right)

Since the set SS is finite, this space EE is finite dimensional, and so PP is of finite rank, PF(H)P\in F(H). Now observe that for any norm one yHy\in H and any xSx\in S we have:

||Ty-Tx||^{2} = ||Ty-PTx||^{2} = ||Ty-PTy+PTy-PTx||^{2} = ||Ty-PTy||^{2}+||PTx-PTy||^{2}

Now by picking xSx\in S such that the ball Bε(Tx)B_{\varepsilon}(Tx) covers the point TyTy, we conclude from this that we have the following estimate:

||TyPTy||||TyTx||ε||Ty-PTy||\leq||Ty-Tx||\leq\varepsilon

Thus we have ||TPT||ε||T-PT||\leq\varepsilon, which gives the density result. ∎
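Here is a numerical illustration of this density result, as a sketch assuming numpy, for a compact diagonal operator T=diag(\lambda_{n}) with \lambda_{n}\to 0, truncated to a large ambient dimension: the rank N truncation T_{N} satisfies ||T-T_{N}||=\sup_{n>N}\lambda_{n}\to 0:

```python
# Proposition 4.12, numerically: for the compact operator T = diag(lambda_n)
# with lambda_n = 1/n -> 0, the rank N truncations T_N converge to T in
# norm, with ||T - T_N|| = lambda_{N+1}. Illustrative only.
import numpy as np

M = 2000                                   # ambient dimension, standing in for infinity
lam = 1.0 / np.arange(1, M + 1)            # the eigenvalues, tending to 0
for N in (5, 50, 500):
    # T - T_N is diagonal, with entries lam[N], lam[N+1], ...
    print(N, lam[N:].max())                # the approximation error, going to 0
```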

Quite remarkably, the set of compact operators is closed, and we have:

Theorem 4.13.

The set of compact operators

K(H)B(H)K(H)\subset B(H)

is a closed two-sided *-ideal.

Proof.

We have several assertions here, the idea being as follows:

(1) It is clear from definitions that K(H)K(H) is indeed a vector space, with this due to the following formulae, valid for any S,TB(H)S,T\in B(H), which are both clear:

(S+T)(B1)S(B1)+T(B1)(S+T)(B_{1})\subset S(B_{1})+T(B_{1})
(λT)(B1)=|λ|T(B1)(\lambda T)(B_{1})=|\lambda|\cdot T(B_{1})

(2) In order to prove now that K(H)K(H) is closed, assume that a sequence TnK(H)T_{n}\in K(H) converges to TB(H)T\in B(H). Given ε>0\varepsilon>0, let us pick NN\in\mathbb{N} such that:

||TTN||ε||T-T_{N}||\leq\varepsilon

By compactness of TNT_{N} we can find a finite set SB1S\subset B_{1} such that:

TN(B1)xSBε(TNx)T_{N}(B_{1})\subset\bigcup_{x\in S}B_{\varepsilon}(T_{N}x)

We conclude that for any yB1y\in B_{1} there exists xSx\in S such that:

||Ty-Tx|| \leq ||Ty-T_{N}y||+||T_{N}y-T_{N}x||+||T_{N}x-Tx|| \leq \varepsilon+\varepsilon+\varepsilon = 3\varepsilon

Thus, we have an inclusion as follows, with SB1S\subset B_{1} being finite:

T(B1)xSB3ε(Tx)T(B_{1})\subset\bigcup_{x\in S}B_{3\varepsilon}(Tx)

But this shows that our limiting operator TT is compact, as desired.

(3) Regarding the fact that K(H)K(H) is stable under involution, this follows from Proposition 4.10, Proposition 4.12 and (2). Indeed, by using Proposition 4.12, given TK(H)T\in K(H) we can write it as a limit of finite rank operators, as follows:

T=limnTnT=\lim_{n\to\infty}T_{n}

Now by applying the adjoint, we obtain that we have as well:

T=limnTnT^{*}=\lim_{n\to\infty}T_{n}^{*}

We know from Proposition 4.10 that the operators TnT_{n}^{*} are of finite rank, and so compact by Proposition 4.12, and by using (2) we obtain that TT^{*} is compact too, as desired.

(4) Finally, regarding the ideal property, this follows from the following two formulae, valid for any S,TB(H)S,T\in B(H), which are once again clear from definitions:

(ST)(B1)=S(T(B1))(ST)(B_{1})=S(T(B_{1}))
(TS)(B1)||S||T(B1)(TS)(B_{1})\subset||S||\cdot T(B_{1})

Thus, we are led to the conclusion in the statement. ∎

Here is now a second key result regarding the compact operators:

Theorem 4.14.

A bounded operator TB(H)T\in B(H) is compact precisely when

Ten0Te_{n}\to 0

for any orthonormal system {en}H\{e_{n}\}\subset H.

Proof.

We have two implications to be proved, the idea being as follows:

\implies” Assume that TT is compact. By contradiction, assume Ten0Te_{n}\not\to 0. This means that there exists ε>0\varepsilon>0 and a subsequence satisfying ||Tenk||>ε||Te_{n_{k}}||>\varepsilon, and by replacing {en}\{e_{n}\} with this subsequence, we can assume that the following holds, with ε>0\varepsilon>0:

||Ten||>ε||Te_{n}||>\varepsilon

Since TT was assumed to be compact, and the sequence {en}\{e_{n}\} is bounded, a certain subsequence {Tenk}\{Te_{n_{k}}\} must converge. Thus, by replacing once again {en}\{e_{n}\} with a subsequence, we can assume that the following holds, with x0x\neq 0:

TenxTe_{n}\to x

But this is a contradiction, because we obtain in this way:

<x,x> = \lim_{n\to\infty}<Te_{n},x> = \lim_{n\to\infty}<e_{n},T^{*}x> = 0

Thus our assumption Ten0Te_{n}\not\to 0 was wrong, and we obtain the result.

\Longleftarrow” Assume Ten0Te_{n}\to 0, for any orthonormal system {en}H\{e_{n}\}\subset H. In order to prove that TT is compact, we use the various results established above, which show that this is the same as proving that TT is in the closure of the space of finite rank operators:

TF(H)¯T\in\overline{F(H)}

We do this by contradiction. So, assume that the above is wrong, and so that there exists ε>0\varepsilon>0 such that the following holds:

SF(H)||TS||>εS\in F(H)\implies||T-S||>\varepsilon

As a first observation, by using S=0S=0 we obtain ||T||>ε||T||>\varepsilon. Thus, we can find a norm one vector e1He_{1}\in H such that the following holds:

||Te1||>ε||Te_{1}||>\varepsilon

Our claim, which will bring the desired contradiction, is that we can construct by recurrence orthonormal vectors e_{1},\ldots,e_{n} such that the following holds, for any i:

||Tei||>ε||Te_{i}||>\varepsilon

Indeed, assume that we have constructed such vectors e1,,ene_{1},\ldots,e_{n}. Let EHE\subset H be the linear space spanned by these vectors, and let us set:

P=Proj(E)P=Proj(E)

Since the operator TPTP has finite rank, our assumption above shows that we have:

||TTP||>ε||T-TP||>\varepsilon

Thus, we can find a norm one vector x\in H such that the following holds:

||(TTP)x||>ε||(T-TP)x||>\varepsilon

We have then xEx\not\in E, and so we can consider the following nonzero vector:

y=(1P)xy=(1-P)x

With this nonzero vector y constructed in this way, let us now set:

en+1=y||y||e_{n+1}=\frac{y}{||y||}

This vector e_{n+1} is then orthogonal to E, has norm one, and since Ty=(T-TP)x, with ||y||\leq||x||=1, it satisfies:

||Te_{n+1}|| = ||y||^{-1}||Ty|| \geq ||y||^{-1}\varepsilon \geq \varepsilon

Thus we are done with our construction by recurrence, and this contradicts our assumption that Ten0Te_{n}\to 0, for any orthonormal system {en}H\{e_{n}\}\subset H, as desired. ∎

Summarizing, we have so far a number of results regarding the compact operators, in analogy with what we know about the usual complex matrices. Let us discuss now the spectral theory of the compact operators. We first have the following result:

Proposition 4.15.

Assuming that TB(H)T\in B(H), with dimH=\dim H=\infty, is compact and self-adjoint, the following happen:

(1) The eigenvalues of T form a sequence \lambda_{n}\to 0.

(2) All eigenvalues \lambda_{n}\neq 0 have finite multiplicity.

Proof.

We prove both the assertions at the same time. For this purpose, we fix a number ε>0\varepsilon>0, we consider all the eigenvalues satisfying |λ|ε|\lambda|\geq\varepsilon, and for each such eigenvalue we consider the corresponding eigenspace EλHE_{\lambda}\subset H. Let us set:

E=span(Eλ||λ|ε)E=span\left(E_{\lambda}\,\Big{|}\,|\lambda|\geq\varepsilon\right)

Our claim, which will prove both (1) and (2), is that this space E is finite dimensional. In order to prove this claim, we can proceed as follows:

(1) We know that we have EIm(T)E\subset Im(T). Our claim is that we have:

E¯Im(T)\bar{E}\subset Im(T)

Indeed, assume that we have a sequence g_{n}\in E which converges, g_{n}\to g\in\bar{E}. Since T maps E bijectively onto itself, we can write g_{n}=Tf_{n}, with f_{n}\in E. By definition of E, the following condition is satisfied:

hE||Th||ε||h||h\in E\implies||Th||\geq\varepsilon||h||

Now since the sequence {gn}\{g_{n}\} is Cauchy we obtain from this that the sequence {fn}\{f_{n}\} is Cauchy as well, and with fnff_{n}\to f we have TfnTfTf_{n}\to Tf, as desired.

(2) Consider now the projection PB(H)P\in B(H) onto the closure E¯\bar{E} of the above vector space EE. The composition PTPT is then as follows, surjective on its target:

PT:HE¯PT:H\to\bar{E}

On the other hand, since T is compact, so is PT. But a compact operator cannot be surjective onto an infinite dimensional closed subspace, by the open mapping theorem, and it follows from this that the space \bar{E} is finite dimensional. Thus E itself must be finite dimensional too, and as explained in the beginning of the proof, this gives (1) and (2), as desired. ∎

In order to construct now eigenvalues, we will need:

Proposition 4.16.

If TT is compact and self-adjoint, one of the numbers

||T||\ ,\ -||T||

must be an eigenvalue of TT.

Proof.

We know from the spectral theory of the self-adjoint operators that the spectral radius ||T||||T|| of our operator TT is attained, and so one of the numbers ||T||,||T||||T||,-||T|| must be in the spectrum. In order to prove now that one of these numbers must actually appear as an eigenvalue, we must use the compactness of TT, as follows:

(1) First, we can assume ||T||=1||T||=1. By functional calculus this implies ||T3||=1||T^{3}||=1 too, and so we can find a sequence of norm one vectors xnHx_{n}\in H such that:

|<T3xn,xn>|1|<T^{3}x_{n},x_{n}>|\to 1

By using our assumption T=TT=T^{*}, we can rewrite this formula as follows:

|<T2xn,Txn>|1|<T^{2}x_{n},Tx_{n}>|\to 1

Now since TT is compact, and {xn}\{x_{n}\} is bounded, we can assume, up to changing the sequence {xn}\{x_{n}\} to one of its subsequences, that the sequence TxnTx_{n} converges:

TxnyTx_{n}\to y

Thus, the convergence formula found above reformulates as follows, with y0y\neq 0:

|<Ty,y>|=1|<Ty,y>|=1

(2) Our claim now, which will finish the proof, is that this latter formula implies Ty=±yTy=\pm y. Indeed, by using Cauchy-Schwarz and ||T||=1||T||=1, we have:

|<Ty,y>|||Ty||||y||1|<Ty,y>|\leq||Ty||\cdot||y||\leq 1

We know that this must be an equality, so Ty,yTy,y must be proportional. But since TT is self-adjoint the proportionality factor must be ±1\pm 1, and so we obtain, as claimed:

Ty=±yTy=\pm y

Thus, we have constructed an eigenvector for λ=±1\lambda=\pm 1, as desired. ∎
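In finite dimensions this statement is something very familiar, namely the fact that the norm of a symmetric matrix is the biggest absolute value of an eigenvalue, and here is a quick numerical check of this, as a sketch assuming numpy:

```python
# Proposition 4.16, in finite dimensions: for a symmetric matrix, the norm
# is attained on the spectrum, so one of +-||T|| is an eigenvalue.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
T = (A + A.T) / 2
eig = np.linalg.eigvalsh(T)                          # sorted eigenvalues
print(np.isclose(max(-eig[0], eig[-1]), np.linalg.norm(T, 2)))   # True
```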

We can further build on the above results in the following way:

Proposition 4.17.

If TT is compact and self-adjoint, there is an orthogonal basis of HH made of eigenvectors of TT.

Proof.

We use Proposition 4.15. According to the results there, we can arrange the nonzero eigenvalues of TT, taken with multiplicities, into a sequence λn0\lambda_{n}\to 0. Let ynHy_{n}\in H be the corresponding eigenvectors, and consider the following space:

E=span(yn)¯E=\overline{span(y_{n})}

The result follows then from the following observations:

(1) Since we have T=TT=T^{*}, both EE and its orthogonal EE^{\perp} are invariant under TT.

(2) On the space EE, our operator TT is by definition diagonal.

(3) On the space EE^{\perp}, our claim is that we have T=0T=0. Indeed, assuming that the restriction S=TES=T_{E^{\perp}} is nonzero, we can apply Proposition 4.16 to this restriction, and we obtain an eigenvalue for SS, and so for TT, contradicting the maximality of EE. ∎

With the above results in hand, we can now formulate a first spectral theory result for compact operators, which closes the discussion in the self-adjoint case:

Theorem 4.18.

Assuming that TB(H)T\in B(H), with dimH=\dim H=\infty, is compact and self-adjoint, the following happen:

(1) The spectrum \sigma(T)\subset\mathbb{R} consists of a sequence \lambda_{n}\to 0.

(2) All spectral values \lambda\in\sigma(T)-\{0\} are eigenvalues.

(3) All eigenvalues \lambda\in\sigma(T)-\{0\} have finite multiplicity.

(4) There is an orthogonal basis of H made of eigenvectors of T.

Proof.

This follows from the various results established above:

(1) In view of Proposition 4.15 (1), this will follow from (2) below.

(2) Assume that λ0\lambda\neq 0 belongs to the spectrum σ(T)\sigma(T), but is not an eigenvalue. By using Proposition 4.17, let us pick an orthonormal basis {en}\{e_{n}\} of HH consisting of eigenvectors of TT, and then consider the following operator:

Sx=n<x,en>λnλenSx=\sum_{n}\frac{<x,e_{n}>}{\lambda_{n}-\lambda}\,e_{n}

Observe that S is bounded, because \lambda_{n}\to 0 and \lambda\neq 0 is not an eigenvalue, so that \inf_{n}|\lambda_{n}-\lambda|>0. But then S is an inverse for T-\lambda, and so we have \lambda\notin\sigma(T), as desired.

(3) This is something that we know, from Proposition 4.15 (2).

(4) This is something that we know too, from Proposition 4.17. ∎

Finally, we have the following result, regarding the general case:

Theorem 4.19.

The compact operators TB(H)T\in B(H), with dimH=\dim H=\infty, are the operators of the following form, with {en}\{e_{n}\}, {fn}\{f_{n}\} being orthonormal families, and with λn0\lambda_{n}\searrow 0:

T(x)=nλn<x,en>fnT(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

The numbers λn\lambda_{n}, called singular values of TT, are the eigenvalues of |T||T|. In fact, the polar decomposition of TT is given by T=U|T|T=U|T|, with

|T|(x)=nλn<x,en>en|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>e_{n}

and with UU being given by Uen=fnUe_{n}=f_{n}, and U=0U=0 on the complement of span(ei)span(e_{i}).

Proof.

This basically follows from Theorem 4.8 and Theorem 4.18, as follows:

(1) Given two orthonormal families {en}\{e_{n}\}, {fn}\{f_{n}\}, and a sequence of real numbers λn0\lambda_{n}\searrow 0, consider the linear operator given by the formula in the statement, namely:

T(x)=nλn<x,en>fnT(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

Our first claim is that TT is bounded. Indeed, when assuming |λn|ε|\lambda_{n}|\leq\varepsilon for any nn, which is something that we can do if we want to prove that TT is bounded, we have:

||T(x)||^{2} = \Big{|}\Big{|}\sum_{n}\lambda_{n}<x,e_{n}>f_{n}\Big{|}\Big{|}^{2} = \sum_{n}|\lambda_{n}|^{2}|<x,e_{n}>|^{2} \leq \varepsilon^{2}\sum_{n}|<x,e_{n}>|^{2} \leq \varepsilon^{2}||x||^{2}

(2) The next observation is that this operator is indeed compact, because it appears as the norm limit, TNTT_{N}\to T, of the following sequence of finite rank operators:

T_{N}(x)=\sum_{n\leq N}\lambda_{n}<x,e_{n}>f_{n}

(3) Regarding now the polar decomposition assertion, for the above operator, this follows once again from definitions. Indeed, the adjoint is given by:

T(x)=nλn<x,fn>enT^{*}(x)=\sum_{n}\lambda_{n}<x,f_{n}>e_{n}

Thus, when composing TT^{*} with TT, we obtain the following operator:

TT(x)=nλn2<x,en>enT^{*}T(x)=\sum_{n}\lambda_{n}^{2}<x,e_{n}>e_{n}

Now by extracting the square root, we obtain the formula in the statement, namely:

|T|(x)=nλn<x,en>en|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>e_{n}

(4) Conversely now, assume that TB(H)T\in B(H) is compact. Then TTT^{*}T, which is self-adjoint, must be compact as well, and so by Theorem 4.18 we have a formula as follows, with {en}\{e_{n}\} being a certain orthonormal family, and with λn0\lambda_{n}\searrow 0:

TT(x)=nλn2<x,en>enT^{*}T(x)=\sum_{n}\lambda_{n}^{2}<x,e_{n}>e_{n}

By extracting the square root we obtain the formula of |T||T| in the statement, and then by setting U(en)=fnU(e_{n})=f_{n} we obtain a second orthonormal family, {fn}\{f_{n}\}, such that:

T(x)=U|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

Thus, our compact operator TB(H)T\in B(H) appears indeed as in the statement. ∎
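Here is a numerical illustration of this singular value writing, in finite dimensions, as a sketch assuming numpy: the vectors e_{n}, f_{n} are the right and left singular vectors coming from the SVD, with the convention <a,b>=\sum_{i}a_{i}\bar{b}_{i} for the scalar product:

```python
# The singular value writing of Theorem 4.19, in finite dimensions, via the
# SVD: with e_n = conj(Vh[n]) and f_n = W[:,n], the coefficient <x,e_n> is
# the dot product Vh[n].x, and T(x) = sum_n lam_n <x,e_n> f_n.
import numpy as np

rng = np.random.default_rng(4)
N = 5
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W, lam, Vh = np.linalg.svd(T)               # T = W diag(lam) Vh

x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
Tx = sum(lam[n] * (Vh[n] @ x) * W[:, n] for n in range(N))
assert np.allclose(Tx, T @ x)
print("T(x) = sum of lambda_n <x,e_n> f_n verified")
```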

As a technical remark here, it is possible to slightly improve a part of the above statement. Consider indeed an operator of the following form, with {en}\{e_{n}\}, {fn}\{f_{n}\} being orthonormal families as before, and with λn0\lambda_{n}\to 0 being now complex numbers:

T(x)=nλn<x,en>fnT(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

Then the same proof as before shows that TT is compact, and that the polar decomposition of TT is given by T=U|T|T=U|T|, with the modulus |T||T| being as follows:

|T|(x)=n|λn|<x,en>en|T|(x)=\sum_{n}|\lambda_{n}|<x,e_{n}>e_{n}

As for the partial isometry UU, this is given by Uen=wnfnUe_{n}=w_{n}f_{n}, and U=0U=0 on the complement of span(ei)span(e_{i}), where wn𝕋w_{n}\in\mathbb{T} are such that λn=|λn|wn\lambda_{n}=|\lambda_{n}|w_{n}.

4c. Trace class operators

We have not talked so far about the trace of operators TB(H)T\in B(H), in analogy with the trace of the usual matrices MMN()M\in M_{N}(\mathbb{C}). This is because the trace can be finite or infinite, or even not well-defined, and we will discuss this now. Let us start with:

Proposition 4.20.

Given a positive operator TB(H)T\in B(H), the quantity

Tr(T)=n<Ten,en>[0,]Tr(T)=\sum_{n}<Te_{n},e_{n}>\in[0,\infty]

is independent of the choice of an orthonormal basis \{e_{n}\}.

Proof.

If {fn}\{f_{n}\} is another orthonormal basis, we have:

\sum_{n}<Tf_{n},f_{n}> = \sum_{n}<\sqrt{T}f_{n},\sqrt{T}f_{n}> = \sum_{n}||\sqrt{T}f_{n}||^{2} = \sum_{mn}|<\sqrt{T}f_{n},e_{m}>|^{2} = \sum_{mn}|<T^{1/4}f_{n},T^{1/4}e_{m}>|^{2}

Since this quantity is symmetric in e,fe,f, this gives the result. ∎

We can now introduce the trace class operators, as follows:

Definition 4.21.

An operator TB(H)T\in B(H) is said to be of trace class if:

Tr|T|<Tr|T|<\infty

The set of such operators, also called integrable, is denoted B1(H)B_{1}(H).

In finite dimensions, any operator is of course of trace class. In arbitrary dimension, finite or not, we first have the following result, regarding such operators:

Proposition 4.22.

Any finite rank operator is of trace class, and any trace class operator is compact, so that we have embeddings as follows:

F(H)B1(H)K(H)F(H)\subset B_{1}(H)\subset K(H)

Moreover, for any compact operator TK(H)T\in K(H) we have the formula

Tr|T|=nλnTr|T|=\sum_{n}\lambda_{n}

where λn0\lambda_{n}\geq 0 are the singular values, and so TB1(H)T\in B_{1}(H) precisely when nλn<\sum_{n}\lambda_{n}<\infty.

Proof.

We have several assertions here, the idea being as follows:

(1) If TT is of finite rank, it is clearly of trace class.

(2) In order to prove now the second assertion, assume first that T\geq 0 is of trace class. For any orthonormal basis \{e_{n}\} we have:

\sum_{n}||\sqrt{T}e_{n}||^{2} = \sum_{n}<Te_{n},e_{n}> \leq Tr(T) < \infty

But this shows that we have a convergence as follows:

Ten0\sqrt{T}e_{n}\to 0

Thus the operator T\sqrt{T} is compact. Now since the compact operators form an ideal, it follows that T=TTT=\sqrt{T}\cdot\sqrt{T} is compact as well, as desired.

(3) In order to prove now the second assertion in general, assume that TB(H)T\in B(H) is of trace class. Then |T||T| is also of trace class, and so compact by (2), and since we have T=U|T|T=U|T| by polar decomposition, it follows that TT is compact too.

(4) Finally, in order to prove the last assertion, assume that TT is compact. The singular value decomposition of |T||T|, from Theorem 4.19, is then as follows:

|T|(x)=nλn<x,en>en|T|(x)=\sum_{n}\lambda_{n}<x,e_{n}>e_{n}

But this gives the formula for Tr|T|Tr|T| in the statement, and proves the last assertion. ∎
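As a numerical illustration of the formula Tr|T|=\sum_{n}\lambda_{n}, here is a sketch, assuming numpy, with |T| computed by diagonalizing T^{*}T, and the singular values coming from the SVD:

```python
# The formula Tr|T| = sum of the singular values, from Proposition 4.22,
# checked numerically: |T| is computed by diagonalizing T*T. Illustrative only.
import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((6, 6))
w, v = np.linalg.eigh(T.T @ T)
absT = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.T     # |T| = sqrt(T*T)
lam = np.linalg.svd(T, compute_uv=False)                   # singular values
print(np.isclose(np.trace(absT), lam.sum()))               # True
```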

Here is a useful reformulation of the above result, or rather of the above result coupled with Theorem 4.19, without reference to compact operators:

Theorem 4.23.

The trace class operators are precisely the operators of the form

T(x)=\sum_{n}\lambda_{n}<x,e_{n}>f_{n}

with \{e_{n}\}, \{f_{n}\} being orthonormal systems, and with \lambda_{n}\searrow 0 being a sequence satisfying:

nλn<\sum_{n}\lambda_{n}<\infty

Moreover, for such an operator we have the following estimate:

|Tr(T)|Tr|T|=nλn|Tr(T)|\leq Tr|T|=\sum_{n}\lambda_{n}
Proof.

This follows indeed from Proposition 4.22, or rather from step (4) in the proof of Proposition 4.22, coupled with Theorem 4.19. ∎

Next, we have the following result, which comes as a continuation of Proposition 4.22, and is our central result here, regarding the trace class operators:

Theorem 4.24.

The space of trace class operators, which appears as an intermediate space between the finite rank operators and the compact operators,

F(H)B1(H)K(H)F(H)\subset B_{1}(H)\subset K(H)

is a two-sided *-ideal of K(H)K(H). The following is a Banach space norm on B1(H)B_{1}(H),

||T||1=Tr|T|||T||_{1}=Tr|T|

satisfying ||T||||T||1||T||\leq||T||_{1}, and for TB1(H)T\in B_{1}(H) and SB(H)S\in B(H) we have:

||ST||1||S||||T||1||ST||_{1}\leq||S||\cdot||T||_{1}

Also, the subspace F(H)F(H) is dense inside B1(H)B_{1}(H), with respect to this norm.

Proof.

There are several assertions here, the idea being as follows:

(1) In order to prove that B1(H)B_{1}(H) is a linear space, and that ||T||1=Tr|T|||T||_{1}=Tr|T| is a norm on it, the only non-trivial point is that of proving the following inequality:

Tr|S+T|Tr|S|+Tr|T|Tr|S+T|\leq Tr|S|+Tr|T|

For this purpose, consider the polar decompositions of these operators:

S=U|S|,T=V|T|,S+T=W|S+T|S=U|S|\quad,\quad T=V|T|\quad,\quad S+T=W|S+T|

Given an orthonormal basis {en}\{e_{n}\}, we have the following formula:

Tr|S+T| = \sum_{n}<|S+T|e_{n},e_{n}> = \sum_{n}<W^{*}(S+T)e_{n},e_{n}> = \sum_{n}<W^{*}U|S|e_{n},e_{n}>+\sum_{n}<W^{*}V|T|e_{n},e_{n}>

The point now is that the first sum can be estimated as follows:

\sum_{n}<W^{*}U|S|e_{n},e_{n}> = \sum_{n}<\sqrt{|S|}e_{n},\sqrt{|S|}U^{*}We_{n}> \leq \sum_{n}||\sqrt{|S|}e_{n}||\cdot||\sqrt{|S|}U^{*}We_{n}|| \leq \sqrt{\sum_{n}||\sqrt{|S|}e_{n}||^{2}}\cdot\sqrt{\sum_{n}||\sqrt{|S|}U^{*}We_{n}||^{2}}

In order to estimate the terms on the right, we can proceed as follows:

\sum_{n}||\sqrt{|S|}U^{*}We_{n}||^{2} = \sum_{n}<W^{*}U|S|U^{*}We_{n},e_{n}> = Tr(W^{*}U|S|U^{*}W) \leq Tr(U|S|U^{*}) \leq Tr(|S|)

The second sum in the above formula of Tr|S+T|Tr|S+T| can be estimated in the same way, and in the end we obtain, as desired:

Tr|S+T|Tr|S|+Tr|T|Tr|S+T|\leq Tr|S|+Tr|T|

(2) The estimate ||T||||T||1||T||\leq||T||_{1} can be established as follows:

||T|| = ||\,|T|\,|| = \sup_{||x||=1}<|T|x,x> \leq Tr|T|

(3) The fact that B1(H)B_{1}(H) is indeed a Banach space follows by constructing a limit for any Cauchy sequence, by using the singular value decomposition.

(4) The fact that B1(H)B_{1}(H) is indeed closed under the involution follows from:

Tr(T^{*}) = \sum_{n}<T^{*}e_{n},e_{n}> = \sum_{n}<e_{n},Te_{n}> = \overline{Tr(T)}

(5) In order to prove now the ideal property of B1(H)B_{1}(H), we use the standard fact, that we know from Proposition 4.5, that any bounded operator TB(H)T\in B(H) can be written as a linear combination of 4 unitary operators, as follows:

T=λ1U1+λ2U2+λ3U3+λ4U4T=\lambda_{1}U_{1}+\lambda_{2}U_{2}+\lambda_{3}U_{3}+\lambda_{4}U_{4}

Indeed, by taking the real and imaginary part we can first write T as a linear combination of 2 self-adjoint operators, and then by functional calculus each of these 2 self-adjoint operators can be written as a linear combination of 2 unitary operators.

(6) With this trick in hand, we can now prove the ideal property of B1(H)B_{1}(H). Indeed, it is enough to prove that we have:

TB1(H),UU(H)UT,TUB1(H)T\in B_{1}(H),U\in U(H)\implies UT,TU\in B_{1}(H)

But this latter result follows by using the polar decomposition theorem.

(7) With a bit more care, we obtain from this the estimate ||ST||1||S||||T||1||ST||_{1}\leq||S||\cdot||T||_{1} from the statement. As for the last assertion, this is clear as well. ∎
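Here is a quick numerical check of the two main estimates above, namely the triangle inequality for the trace norm and the inequality ||T||\leq||T||_{1}, as a sketch assuming numpy, with ||T||_{1} computed as the sum of the singular values:

```python
# Numerical check of the basic trace norm estimates in Theorem 4.24, with
# ||T||_1 computed as the sum of the singular values. Illustrative only.
import numpy as np

rng = np.random.default_rng(6)
tn = lambda T: np.linalg.svd(T, compute_uv=False).sum()    # trace norm
for _ in range(100):
    S, T = rng.standard_normal((2, 5, 5))
    assert tn(S + T) <= tn(S) + tn(T) + 1e-10              # triangle inequality
    assert np.linalg.norm(T, 2) <= tn(T) + 1e-10           # ||T|| <= ||T||_1
print("trace norm estimates verified")
```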

This was for the basic theory of the trace class operators. Much more can be said, and we refer here to the literature, such as Lax [lax]. In what concerns us, we will be back to these operators later in this book, in Part III, when discussing operator algebras.

4d. Hilbert-Schmidt operators

As a last topic of this chapter, let us discuss yet another important class of operators, namely the Hilbert-Schmidt ones. These operators, that we will need on several key occasions in what follows, when talking about operator algebras, are introduced as follows:

Definition 4.25.

An operator TB(H)T\in B(H) is said to be Hilbert-Schmidt if:

Tr(TT)<Tr(T^{*}T)<\infty

The set of such operators is denoted B2(H)B_{2}(H).

As before with other sets of operators, in finite dimensions we obtain in this way all the operators. In general, we have the following result, regarding such operators:

Theorem 4.26.

The space B2(H)B_{2}(H) of Hilbert-Schmidt operators, which appears as an intermediate space between the trace class operators and the compact operators,

F(H)B1(H)B2(H)K(H)F(H)\subset B_{1}(H)\subset B_{2}(H)\subset K(H)

is a two-sided *-ideal of K(H)K(H). This ideal has the property

S,TB2(H)STB1(H)S,T\in B_{2}(H)\implies ST\in B_{1}(H)

and conversely, each TB1(H)T\in B_{1}(H) appears as product of two operators in B2(H)B_{2}(H). In terms of the singular values (λn)(\lambda_{n}), the Hilbert-Schmidt operators are characterized by:

nλn2<\sum_{n}\lambda_{n}^{2}<\infty

Also, the following formula, whose output is finite by Cauchy-Schwarz,

<S,T>=Tr(ST)<S,T>=Tr(ST^{*})

defines a scalar product on B_{2}(H), making it a Hilbert space.

Proof.

All this is quite standard, from the results that we have already, and more specifically from the singular value decomposition theorem, and its applications. To be more precise, the proof of the various assertions goes as follows:

(1) First of all, the fact that the space of Hilbert-Schmidt operators B2(H)B_{2}(H) is stable under taking sums, and so is a vector space, follows from:

(S+T)^{*}(S+T) \leq (S+T)^{*}(S+T)+(S-T)^{*}(S-T) = (S^{*}+T^{*})(S+T)+(S^{*}-T^{*})(S-T) = 2(S^{*}S+T^{*}T)

Regarding now multiplicative properties, we can use here the following inequality:

(ST)(ST)=TSST||S||2TT(ST)^{*}(ST)=T^{*}S^{*}ST\leq||S||^{2}T^{*}T

Thus, the space B2(H)B_{2}(H) is a two-sided *-ideal of K(H)K(H), as claimed.

(2) In order to prove now that the product of any two Hilbert-Schmidt operators is a trace class operator, we can use the following polarization type formula, which is elementary:

S^{*}T = \frac{1}{4}\sum_{k=1}^{4}i^{k}(T+i^{k}S)^{*}(T+i^{k}S)

Indeed, each term on the right is of the form X^{*}X with X\in B_{2}(H), so it is of trace class, and therefore so is S^{*}T.

Conversely, given an arbitrary trace class operator TB1(H)T\in B_{1}(H), we have:

TB1(H)|T|B1(H)|T|B2(H)T\in B_{1}(H)\implies|T|\in B_{1}(H)\implies\sqrt{|T|}\in B_{2}(H)

Thus, by using the polar decomposition T=U|T|T=U|T|, we obtain the following decomposition for TT, with both components being Hilbert-Schmidt operators:

T=U|T|=U|T||T|T=U|T|=U\sqrt{|T|}\cdot\sqrt{|T|}

(3) The condition for the singular values is clear.

(4) The fact that we have a scalar product is clear as well.

(5) The proof of the completeness property is routine as well. ∎
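For matrices the Hilbert-Schmidt scalar product is nothing but the usual Frobenius scalar product, and here is a quick numerical check of this, and of the singular value characterization, as a sketch assuming numpy:

```python
# For matrices, <S,T> = Tr(ST*) is the Frobenius scalar product, and
# ||T||_2^2 = Tr(TT*) is the sum of the squared singular values.
import numpy as np

rng = np.random.default_rng(7)
S = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

assert np.isclose(np.trace(S @ T.conj().T), np.vdot(T, S))     # entrywise formula
lam = np.linalg.svd(T, compute_uv=False)
assert np.isclose(np.trace(T @ T.conj().T).real, (lam ** 2).sum())
print("Hilbert-Schmidt scalar product verified")
```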

We have as well the following key result, regarding the Hilbert-Schmidt operators:

Theorem 4.27.

We have the following formula,

Tr(ST)=Tr(TS)Tr(ST)=Tr(TS)

valid for any Hilbert-Schmidt operators S,T\in B_{2}(H).

Proof.

We can prove this in two steps, as follows:

(1) Assume first that |S||S| is trace class. Consider the polar decomposition S=U|S|S=U|S|, and choose an orthonormal basis {xi}\{x_{i}\} for the image of UU, suitably extended to an orthonormal basis of HH. We have then the following computation, as desired:

Tr(ST) = \sum_{i}<U|S|Tx_{i},x_{i}> = \sum_{i}<|S|TUU^{*}x_{i},U^{*}x_{i}> = Tr(|S|TU) = Tr(TU|S|) = Tr(TS)

(2) Assume now that we are in the general case, where SS is only assumed to be Hilbert-Schmidt. For any finite rank operator SS^{\prime} we have then:

|Tr(ST)-Tr(TS)| = |Tr((S-S^{\prime})T)-Tr(T(S-S^{\prime}))| \leq 2||S-S^{\prime}||_{2}\cdot||T||_{2}

Thus by choosing SS^{\prime} with ||SS||20||S-S^{\prime}||_{2}\to 0, we obtain the result. ∎

This was for the basic theory of bounded operators on a Hilbert space, TB(H)T\in B(H). In the remainder of this book we will be rather interested in the operator algebras AB(H)A\subset B(H) that these operators can form. This is of course related to operator theory, because we can, at least in theory, take A=<T>A=<T>, and then study TT via the properties of AA. Actually, this is something that we already did a few times, when doing spectral theory, and notably when talking about functional calculus for normal operators.


For further operator theory, however, nothing beats a good operator theory book, and various ad-hoc methods, depending on the type of operators involved, and especially, on what you want to do with them. As before, in relation with topics to be later discussed in this book, we recommend here the books of Lax [lax] and Blackadar [bla].


Let us mention as well that there is a lot of interesting theory regarding the unbounded operators T\in\mathcal{L}(H) too, which is something quite technical, and here once again, we warmly recommend a good operator theory book. In addition, we recommend as well a good PDE book, because most of the questions where unbounded operators appear have PDE formulations as well, which are extremely efficient.

4e. Exercises

There has been a lot of theory in this chapter, with some of the things not really explained in great detail, and we have several exercises about all this. First comes:

Exercise 4.28.

Try to find the best operator theoretic analogue of the formula

z=reitz=re^{it}

for the complex numbers, telling us that any number is a real multiple of a unitary.

As explained in the above, a weak analogue of this holds, stating that any operator is a linear combination of 4 unitaries. The problem is that of improving this.

Exercise 4.29.

Work out a few explicit examples of the polar decomposition formula

T=UTTT=U\sqrt{T^{*}T}

with, if possible, a non-trivial computation for the square root.

This is actually something quite tricky, even for the usual matrices. So, as a preliminary exercise here, have some fun with the 2×22\times 2 matrices.

Exercise 4.30.

Look up the various extra general properties of the sets of finite rank, trace class, Hilbert-Schmidt and compact operators,

F(H)B1(H)B2(H)K(H)F(H)\subset B_{1}(H)\subset B_{2}(H)\subset K(H)

coming in addition to what has been said above, about such operators.

This is of course quite vague, and, as good news, it is not indicated either if you should just come with a list of such properties, or with a list of such properties coming with complete proofs. Up to you here, and the more the better.

Part II Operator algebras


There was something in the air that night

The stars were bright, Fernando

They were shining there for you and me

For liberty, Fernando

Chapter 5 Operator algebras

5a. Normed algebras

We have seen that the study of the bounded operators TB(H)T\in B(H) often leads to the consideration of the algebras <T>B(H)<T>\subset B(H) generated by such operators, the idea being that the study of A=<T>A=<T> can lead to results about TT itself. In the remainder of this book we focus on the study of such algebras AB(H)A\subset B(H). Before anything, we should mention that there are countless ways of getting introduced to operator algebras, depending on motivations and taste, with the available books including:


(1) The old book of von Neumann [vn4], which started everything. This is a very classical book, with mathematical physics content, written at times when mathematics and physics were starting to part ways. A great book, still enjoyable nowadays.


(2) Various post-war treatises, such as Dixmier [dix], Kadison-Ringrose [kri], Strătilă-Zsidó [szs] and Takesaki [tak]. As a warning, however, these books are purely mathematical. Also, they sometimes avoid deep results of von Neumann and Connes.


(3) More recent books, including Arveson [arv], Blackadar [bla], Brown-Ozawa [boz], Connes [co3], Davidson [dav], Jones [jo6], Murphy [mur], Pedersen [ped] and Sakai [sak]. These are well-conceived one-volume books, written with various purposes in mind.


Our presentation below is inspired by Blackadar [bla], Connes [co3], Jones [jo6], but is yet another type of beast, often insisting on probabilistic aspects. But probably enough talking, more on this later, and let us get to work. We are interested in the study of the algebras of bounded operators AB(H)A\subset B(H). Let us start our discussion with the following broad definition, obtained by imposing the “minimal” set of reasonable axioms:

Definition 5.1.

An operator algebra is an algebra of bounded operators AB(H)A\subset B(H) which contains the unit, is closed under taking adjoints,

TATAT\in A\implies T^{*}\in A

and is closed as well under the norm.

Here, as in the previous chapters, HH is an arbitrary Hilbert space, with the case that we are mostly interested in being the separable one. By separable we mean having a countable orthonormal basis, {ei}iI\{e_{i}\}_{i\in I} with II countable, and such a space is of course unique. The simplest model is the space l2()l^{2}(\mathbb{N}), but in practice, we are particularly interested in the spaces of the form H=L2(X)H=L^{2}(X), which are separable too, but with the basis {ei}i\{e_{i}\}_{i\in\mathbb{N}} and the subsequent identification Hl2()H\simeq l^{2}(\mathbb{N}) being not necessarily very explicit.


Also as in the previous chapters, B(H)B(H) is the algebra of linear operators T:HHT:H\to H which are bounded, in the sense that the norm ||T||=sup||x||=1||Tx||||T||=\sup_{||x||=1}||Tx|| is finite. This algebra has an involution TTT\to T^{*}, with the adjoint operator TB(H)T^{*}\in B(H) being defined by the formula <Tx,y>=<x,Ty><Tx,y>=<x,T^{*}y>, and in the above definition, the assumption TATAT\in A\implies T^{*}\in A refers to this involution. Thus, AA must be a *-algebra.


As a first result now regarding the operator algebras, in relation with the normal operators, where most of the non-trivial results that we have so far are, we have:

Theorem 5.2.

The operator algebra <T>B(H)<T>\subset B(H) generated by a normal operator TB(H)T\in B(H) appears as an algebra of continuous functions,

<T>=C(σ(T))<T>=C(\sigma(T))

where σ(T)\sigma(T)\subset\mathbb{C} denotes as usual the spectrum of TT.

Proof.

This is an abstract reformulation of the continuous functional calculus theorem for the normal operators, that we know from chapter 3. Indeed, that theorem tells us that we have a continuous morphism of *-algebras, as follows:

C(σ(T))B(H),ff(T)C(\sigma(T))\to B(H)\quad,\quad f\to f(T)

Moreover, by the general properties of the continuous calculus, also established in chapter 3, this morphism is injective, and its image is the norm closed algebra <T><T> generated by T,TT,T^{*}. Thus, we obtain the isomorphism in the statement. ∎
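Here is a numerical illustration of this identification, in finite dimensions, as a sketch assuming numpy: for a normal matrix, the functional calculus f\to f(T) is computed by letting f act on the eigenvalues, and polynomials in z,\bar{z} correspond to polynomials in T,T^{*}:

```python
# Theorem 5.2 in finite dimensions: for a normal matrix T, the functional
# calculus is given by letting f act on the eigenvalues, and for instance
# f(z) = z^2 + conj(z) produces f(T) = T^2 + T*. Illustrative only.
import numpy as np

theta = 2 * np.pi * np.arange(5) / 5
T = np.diag(np.exp(1j * theta))            # a normal matrix, spectrum = 5th roots of 1
f = lambda z: z ** 2 + np.conj(z)          # a continuous function on sigma(T)

w, v = np.linalg.eig(T)
fT = v @ np.diag(f(w)) @ np.linalg.inv(v)  # f(T), by functional calculus
assert np.allclose(fT, T @ T + T.conj().T)
print("f(T) = T^2 + T* verified")
```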

The above result is very nice, and it is possible to further build on it, by using this time the spectral theorem for families of normal operators, as follows:

Theorem 5.3.

The operator algebra <Ti>B(H)<T_{i}>\subset B(H) generated by a family of normal operators TiB(H)T_{i}\in B(H) appears as an algebra of continuous functions,

<T_{i}>=C(X)

where X is a certain compact space associated to the family \{T_{i}\}. Equivalently, any commutative operator algebra A\subset B(H) is of the form A=C(X).

Proof.

We have two assertions here, the idea being as follows:

(1) Regarding the first assertion, this follows exactly as in the proof of Theorem 5.2, by using this time the spectral theorem for families of normal operators.

(2) As for the second assertion, this is clear from the first one, because any commutative algebra AB(H)A\subset B(H) is generated by its elements TAT\in A, which are all normal. ∎

All this is good to know, but Theorem 5.2 and Theorem 5.3 remain something quite heavy, based on the spectral theorem. We would like to present now an alternative proof for these results, which is rather elementary, and has the advantage of reconstructing the compact space XX directly from the knowledge of the algebra AA. We will need:

Theorem 5.4.

Given an operator TAB(H)T\in A\subset B(H), define its spectrum as:

σ(T)={λ|TλA1}\sigma(T)=\left\{\lambda\in\mathbb{C}\Big{|}T-\lambda\notin A^{-1}\right\}

The following spectral theory results hold, exactly as in the A=B(H)A=B(H) case:

(1) We have \sigma(ST)\cup\{0\}=\sigma(TS)\cup\{0\}.

(2) We have polynomial, rational and holomorphic calculus.

(3) As a consequence, the spectra are compact and non-empty.

(4) The spectra of unitaries (U^{*}=U^{-1}) and self-adjoints (T=T^{*}) are in \mathbb{T},\mathbb{R}.

(5) The spectral radius of normal elements (TT^{*}=T^{*}T) is given by \rho(T)=||T||.

In addition, assuming TABT\in A\subset B, the spectra of TT with respect to AA and to BB coincide.

Proof.

This is something that we know from the beginning of chapter 3, in the case A=B(H)A=B(H). In general the proof is similar, the idea being as follows:

(1) Regarding the assertions (1-5), which are of course formulated a bit informally, the proofs here are perfectly similar to those for the full operator algebra A=B(H)A=B(H). All this is standard material, and in fact, things in chapter 3 were written in such a way as for their extension now, to the general operator algebra setting, to be obvious.

(2) Regarding the last assertion, the inclusion σB(T)σA(T)\sigma_{B}(T)\subset\sigma_{A}(T) is clear. For the converse, assume TλB1T-\lambda\in B^{-1}, and consider the following self-adjoint element:

S=(Tλ)(Tλ)S=(T-\lambda)^{*}(T-\lambda)

The difference between the two spectra of SABS\in A\subset B is then given by:

σA(S)σB(S)={μσB(S)|(Sμ)1BA}\sigma_{A}(S)-\sigma_{B}(S)=\left\{\mu\in\mathbb{C}-\sigma_{B}(S)\Big{|}(S-\mu)^{-1}\in B-A\right\}

Thus this difference is an open subset of \mathbb{C}. On the other hand, S being self-adjoint, its two spectra are both real, and so is their difference. But a nonempty open subset of \mathbb{C} cannot be contained in \mathbb{R}, so the difference is empty. Thus the two spectra of S are equal, and in particular S is invertible in A, and so T-\lambda\in A^{-1}, as desired.

(3) As an observation, the last assertion applied with B=B(H)B=B(H) shows that the spectrum σ(T)\sigma(T) as constructed in the statement coincides with the spectrum σ(T)\sigma(T) as constructed and studied in chapter 3, so the fact that (1-5) hold indeed is no surprise.

(4) Finally, I can hear you screaming that I should have conceived this book differently, as a matter of not proving the same things twice. Good point, with my distinguished colleague Bourbaki saying the same, and in answer, wait for chapter 7 below, where we will prove exactly the same things a third time. We can discuss pedagogy at that time. ∎

We can now get back to the commutative algebras, and we have the following result, due to Gelfand, which provides an alternative to Theorem 5.2 and Theorem 5.3:

Theorem 5.5.

Any commutative operator algebra AB(H)A\subset B(H) is of the form

A=C(X)A=C(X)

with the “spectrum” XX of such an algebra being the space of characters χ:A\chi:A\to\mathbb{C}, with topology making continuous the evaluation maps evT:χχ(T)ev_{T}:\chi\to\chi(T).

Proof.

Given a commutative operator algebra AA, we can define XX as in the statement. Then XX is compact, and TevTT\to ev_{T} is a morphism of algebras, as follows:

ev:AC(X)ev:A\to C(X)

(1) We first prove that evev is involutive. We use the following formula, which is similar to the z=Re(z)+iIm(z)z=Re(z)+iIm(z) formula for the usual complex numbers:

T=T+T2+iTT2iT=\frac{T+T^{*}}{2}+i\cdot\frac{T-T^{*}}{2i}

Thus it is enough to prove the equality evT=evTev_{T^{*}}=ev_{T}^{*} for self-adjoint elements TT. But this is the same as proving that T=TT=T^{*} implies that evTev_{T} is a real function, which is in turn true, because evT(χ)=χ(T)ev_{T}(\chi)=\chi(T) is an element of σ(T)\sigma(T), contained in \mathbb{R}.

(2) Since AA is commutative, each element is normal, so evev is isometric:

||evT||=ρ(T)=||T||||ev_{T}||=\rho(T)=||T||

(3) It remains to prove that evev is surjective. But this follows from the Stone-Weierstrass theorem, because ev(A)ev(A) is a closed subalgebra of C(X)C(X), which separates the points. ∎

The above theorem of Gelfand is something very beautiful, and far-reaching. It is possible to further build on it, indefinitely high. We will be back to this.

5b. Von Neumann algebras

Instead of further building on the above results, which are already quite non-trivial, let us return to our modest status of apprentice operator algebraists, and declare ourselves rather unsatisfied with Definition 5.1, on the following intuitive grounds:

Thought 5.6.

Our assumption that AB(H)A\subset B(H) is norm closed is not satisfying, because we would like AA to be stable under polar decomposition, under taking spectral projections, and more generally, under measurable functional calculus.

Here all these “defects” are best visible in the context of Theorem 5.3, with the algebra A=C(X) found there, with X=\sigma(T), being obviously too small. In fact, Theorem 5.3 teaches us that, when looking for a fix, we should look for a weaker topology on B(H), such that the algebra A=<T> generated by a normal operator becomes A=L^{\infty}(X).


So, let us get now into this, topologies on B(H)B(H), and fine-tunings of Definition 5.1, based on them. The result that we will need, which is elementary, is as follows:

Proposition 5.7.

For a subalgebra AB(H)A\subset B(H), the following are equivalent:

  1. (1)

    AA is closed under the weak operator topology, making each of the linear maps T<Tx,y>T\to<Tx,y> continuous.

  2. (2)

    AA is closed under the strong operator topology, making each of the linear maps TTxT\to Tx continuous.

In the case where these conditions are satisfied, AA is closed under the norm topology.

Proof.

There are several statements here, the proof being as follows:

(1) It is clear that the norm topology is stronger than the strong operator topology, which is in turn stronger than the weak operator topology. At the level of the subsets S\subset B(H) which are closed, things get reversed, in the sense that weakly closed implies strongly closed, which in turn implies norm closed. Thus, we are left with proving that for any algebra A\subset B(H), strongly closed implies weakly closed.

(2) Consider the Hilbert space obtained by summing nn times HH with itself:

K=HHK=H\oplus\ldots\oplus H

The operators over KK can be regarded as being square matrices with entries in B(H)B(H), and in particular, we have a representation π:B(H)B(K)\pi:B(H)\to B(K), as follows:

π(T)=(TT)\pi(T)=\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}

Assume now that we are given an operator T\in\bar{A}, with the bar denoting the weak closure. For any x\in K we have then the following implications, with the first two bars denoting weak closures, and with the last step coming from the Hahn-Banach theorem, the weak and norm closures of a convex set being the same:

TA¯\displaystyle T\in\bar{A} \displaystyle\implies π(T)π(A)¯\displaystyle\pi(T)\in\overline{\pi(A)}
\displaystyle\implies π(T)xπ(A)x¯\displaystyle\pi(T)x\in\overline{\pi(A)x}
\displaystyle\implies π(T)xπ(A)x¯||.||\displaystyle\pi(T)x\in\overline{\pi(A)x}^{\,||.||}

Now observe that the last formula tells us that for any x=(x1,,xn)x=(x_{1},\ldots,x_{n}), and any ε>0\varepsilon>0, we can find SAS\in A such that the following holds, for any ii:

||SxiTxi||<ε||Sx_{i}-Tx_{i}||<\varepsilon

Thus TT belongs to the strong operator closure of AA, as desired. ∎

Observe that in the above the terminology is a bit confusing, because the norm topology is stronger than the strong operator topology. As a solution, we agree to call the norm topology “strong”, and the weak and strong operator topologies “weak”, whenever these two topologies coincide. With this convention made, the algebras AB(H)A\subset B(H) in Proposition 5.7 are those which are weakly closed. Thus, we can now formulate:

Definition 5.8.

A von Neumann algebra is an operator algebra

AB(H)A\subset B(H)

which is closed under the weak topology.

These algebras will be our main objects of study, in what follows. As basic examples, we have the algebra B(H)B(H) itself, then the singly generated algebras, A=<T>A=<T> with TB(H)T\in B(H), and then the multiply generated algebras, A=<Ti>A=<T_{i}> with TiB(H)T_{i}\in B(H). But for the moment, let us keep things simple, and build directly on Definition 5.8, by using basic functional analysis methods. We will need the following key result:

Theorem 5.9.

For an operator algebra AB(H)A\subset B(H), we have

A=A¯A^{\prime\prime}=\bar{A}

with AA^{\prime\prime} being the bicommutant inside B(H)B(H), and A¯\bar{A} being the weak closure.

Proof.

We can prove this by double inclusion, as follows:

\supset” Since any operator commutes with the operators that it commutes with, we have a trivial inclusion SSS\subset S^{\prime\prime}, valid for any set SB(H)S\subset B(H). In particular, we have:

AAA\subset A^{\prime\prime}

Our claim now is that the algebra AA^{\prime\prime} is closed, with respect to the strong operator topology. Indeed, assuming that we have TiTT_{i}\to T in this topology, we have:

TiA\displaystyle T_{i}\in A^{\prime\prime} \displaystyle\implies STi=TiS,SA\displaystyle ST_{i}=T_{i}S,\ \forall S\in A^{\prime}
\displaystyle\implies ST=TS,SA\displaystyle ST=TS,\ \forall S\in A^{\prime}
\displaystyle\implies T\in A^{\prime\prime}

Thus our claim is proved, and together with Proposition 5.7, which allows us to pass from the strong to the weak operator topology, this gives A¯A\bar{A}\subset A^{\prime\prime}, as desired.

\subset” Here we must prove that we have the following implication, valid for any TB(H)T\in B(H), with the bar denoting as usual the weak operator closure:

TATA¯T\in A^{\prime\prime}\implies T\in\bar{A}

For this purpose, we use the same amplification trick as in the proof of Proposition 5.7. Consider the Hilbert space obtained by summing nn times HH with itself:

K=HHK=H\oplus\ldots\oplus H

The operators over KK can be regarded as being square matrices with entries in B(H)B(H), and in particular, we have a representation π:B(H)B(K)\pi:B(H)\to B(K), as follows:

π(T)=(TT)\pi(T)=\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}

The idea will be that of doing the computations in this representation. First, in this representation, the image of our algebra AB(H)A\subset B(H) is given by:

π(A)={(TT)|TA}\pi(A)=\left\{\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}\Big{|}T\in A\right\}

We can compute the commutant of this image, exactly as in the usual scalar matrix case, and we obtain the following formula:

π(A)={(S11S1nSn1Snn)|SijA}\pi(A)^{\prime}=\left\{\begin{pmatrix}S_{11}&\ldots&S_{1n}\\ \vdots&&\vdots\\ S_{n1}&\ldots&S_{nn}\end{pmatrix}\Big{|}S_{ij}\in A^{\prime}\right\}

We conclude from this that, given an operator TAT\in A^{\prime\prime} as above, we have:

(TT)π(A)\begin{pmatrix}T\\ &\ddots\\ &&T\end{pmatrix}\in\pi(A)^{\prime\prime}

In other words, the conclusion of all this is that we have:

TAπ(T)π(A)T\in A^{\prime\prime}\implies\pi(T)\in\pi(A)^{\prime\prime}

Now given a vector xKx\in K, consider the orthogonal projection PB(K)P\in B(K) on the norm closure of the vector space π(A)xK\pi(A)x\subset K. Since the subspace π(A)xK\pi(A)x\subset K is invariant under the action of π(A)\pi(A), so is its norm closure inside KK, and we obtain from this:

Pπ(A)P\in\pi(A)^{\prime}

By combining this with what we found above, we conclude that we have:

TAπ(T)P=Pπ(T)T\in A^{\prime\prime}\implies\pi(T)P=P\pi(T)

Now since the algebra A is unital, we have Px=x, and so we obtain \pi(T)x=P\pi(T)x\in\overline{\pi(A)x}. Since this holds for any x\in K, we conclude that any operator T\in A^{\prime\prime} belongs to the strong operator closure of A. By using now Proposition 5.7, which allows us to pass from the strong to the weak operator closure, we conclude that we have:

AA¯A^{\prime\prime}\subset\bar{A}

Thus, we have the desired reverse inclusion, and this finishes the proof. ∎

Now by getting back to the von Neumann algebras, from Definition 5.8, we have the following result, which is a reformulation of Theorem 5.9, by using this notion:

Theorem 5.10.

For an operator algebra AB(H)A\subset B(H), the following are equivalent:

  1. (1)

    AA is weakly closed, so it is a von Neumann algebra.

  2. (2)

    AA equals its algebraic bicommutant AA^{\prime\prime}, taken inside B(H)B(H).

Proof.

This follows from the formula A=A¯A^{\prime\prime}=\bar{A} from Theorem 5.9, along with the trivial fact that the commutants are automatically weakly closed. ∎

The above statement, called bicommutant theorem, and due to von Neumann [vn1], is quite interesting, philosophically speaking. Among others, it shows that the von Neumann algebras are exactly the commutants of the self-adjoint sets of operators:

Proposition 5.11.

Given a subset SB(H)S\subset B(H) which is closed under *, the commutant

A=SA=S^{\prime}

is a von Neumann algebra. Any von Neumann algebra appears in this way.

Proof.

We have two assertions here, the idea being as follows:

(1) Given SB(H)S\subset B(H) satisfying S=SS=S^{*}, the commutant A=SA=S^{\prime} satisfies A=AA=A^{*}, and is also weakly closed. Thus, AA is a von Neumann algebra. Note that this follows as well from the following “tricommutant formula”, which follows from Theorem 5.10:

S=SS^{\prime\prime\prime}=S^{\prime}

(2) Given a von Neumann algebra AB(H)A\subset B(H), we can take S=AS=A^{\prime}. Then SS is closed under the involution, and we have S=AS^{\prime}=A, as desired. ∎

Observe that Proposition 5.11 can be regarded as yet another alternative definition for the von Neumann algebras, and this definition is probably the best one when talking about quantum mechanics, where the self-adjoint operators T:H\to H can be thought of as being “observables” of the system, and where the commutants A=S^{\prime} of the sets of such observables S=\{T_{i}\} are the algebras A\subset B(H) that we are interested in. All this would actually need some discussion about self-adjointness, and about boundedness too, but let us not get into this here, and stay mathematical, as before.


As another interesting consequence of Theorem 5.10, we have:

Proposition 5.12.

Given a von Neumann algebra AB(H)A\subset B(H), its center

Z(A)=AAZ(A)=A\cap A^{\prime}

regarded as an algebra Z(A)B(H)Z(A)\subset B(H), is a von Neumann algebra too.

Proof.

This follows from the fact that the commutants are weakly closed, that we know from the above, which shows that AB(H)A^{\prime}\subset B(H) is a von Neumann algebra. Thus, the intersection Z(A)=AAZ(A)=A\cap A^{\prime} must be a von Neumann algebra too, as claimed. ∎

In order to develop some general theory, let us start by investigating the finite dimensional case. Here the ambient algebra is B(H)=MN()B(H)=M_{N}(\mathbb{C}), any linear subspace AB(H)A\subset B(H) is automatically closed, for all 3 topologies in Proposition 5.7, and we have:

Theorem 5.13.

The *-algebras AMN()A\subset M_{N}(\mathbb{C}) are exactly the algebras of the form

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

depending on parameters kk\in\mathbb{N} and n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N} satisfying

n1++nk=Nn_{1}+\ldots+n_{k}=N

embedded into MN()M_{N}(\mathbb{C}) via the obvious block embedding, twisted by a unitary UUNU\in U_{N}.

Proof.

We have two assertions to be proved, the idea being as follows:

(1) Given numbers n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N} satisfying n1++nk=Nn_{1}+\ldots+n_{k}=N, we have indeed an obvious embedding of *-algebras, via matrix blocks, as follows:

Mn1()Mnk()MN()M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})\subset M_{N}(\mathbb{C})

In addition, we can twist this embedding by a unitary UUNU\in U_{N}, as follows:

MUMUM\to UMU^{*}

(2) In the other sense now, consider a *-algebra AMN()A\subset M_{N}(\mathbb{C}). It is elementary to prove that the center Z(A)=AAZ(A)=A\cap A^{\prime}, as an algebra, is of the following form:

Z(A)kZ(A)\simeq\mathbb{C}^{k}

Consider now the standard basis e1,,ekke_{1},\ldots,e_{k}\in\mathbb{C}^{k}, and let p1,,pkZ(A)p_{1},\ldots,p_{k}\in Z(A) be the images of these vectors via the above identification. In other words, these elements p1,,pkAp_{1},\ldots,p_{k}\in A are central minimal projections, summing up to 1:

p1++pk=1p_{1}+\ldots+p_{k}=1

The idea is then that this partition of the unity will eventually lead to the block decomposition of AA, as in the statement. We prove this in 4 steps, as follows:

Step 1. We first construct the matrix blocks, our claim here being that each of the following linear subspaces of AA are non-unital *-subalgebras of AA:

Ai=piApiA_{i}=p_{i}Ap_{i}

But this is clear, each A_{i} being stable under the various non-unital *-subalgebra operations, thanks to the projection equations p_{i}^{2}=p_{i}^{*}=p_{i}.

Step 2. We prove now that the above algebras AiAA_{i}\subset A are in a direct sum position, in the sense that we have a non-unital *-algebra sum decomposition, as follows:

A=A1AkA=A_{1}\oplus\ldots\oplus A_{k}

As with any direct sum question, we have two things to be proved here. First, by using the formula p1++pk=1p_{1}+\ldots+p_{k}=1 and the projection equations pi2=pi=pip_{i}^{2}=p_{i}^{*}=p_{i}, we conclude that we have the needed generation property, namely:

A1++Ak=AA_{1}+\ldots+A_{k}=A

As for the fact that the sum is indeed direct, this follows as well from the formula p1++pk=1p_{1}+\ldots+p_{k}=1, and from the projection equations pi2=pi=pip_{i}^{2}=p_{i}^{*}=p_{i}.

Step 3. Our claim now, which will finish the proof, is that each of the *-subalgebras Ai=piApiA_{i}=p_{i}Ap_{i} constructed above is a full matrix algebra. To be more precise here, with ni=rank(pi)n_{i}=rank(p_{i}), our claim is that we have isomorphisms, as follows:

AiMni()A_{i}\simeq M_{n_{i}}(\mathbb{C})

In order to prove this claim, recall that the projections piAp_{i}\in A were chosen central and minimal. Thus, the center of each of the algebras AiA_{i} reduces to the scalars:

Z(Ai)=Z(A_{i})=\mathbb{C}

But this shows, either via a direct computation, or via the bicommutant theorem, that each of the algebras A_{i} is a full matrix algebra, as claimed.

Step 4. We can now obtain the result, by putting together what we have. Indeed, by using the results from Step 2 and Step 3, we obtain an isomorphism as follows:

AMn1()Mnk()A\simeq M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

Moreover, a more careful look at the isomorphisms established in Step 3 shows that at the global level, that of the algebra AA itself, the above isomorphism simply comes by twisting the following standard multimatrix embedding, discussed in the beginning of the proof, (1) above, by a certain unitary matrix UUNU\in U_{N}:

Mn1()Mnk()MN()M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})\subset M_{N}(\mathbb{C})

Now by putting everything together, we obtain the result. ∎

In relation with the bicommutant theorem, we have the following result, which fully clarifies the situation, with a very explicit proof, in finite dimensions:

Proposition 5.14.

Consider a *-algebra AMN()A\subset M_{N}(\mathbb{C}), written as above:

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

The commutant of this algebra is then, with respect to the block decomposition used,

A=A^{\prime}=\mathbb{C}\oplus\ldots\oplus\mathbb{C}

and by taking one more time the commutant we obtain AA itself, A=AA=A^{\prime\prime}.

Proof.

Let us decompose indeed our algebra AA as in Theorem 5.13:

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

The center of each matrix algebra being reduced to the scalars, the commutant of this algebra is then as follows, with each copy of \mathbb{C} corresponding to a matrix block:

A=A^{\prime}=\mathbb{C}\oplus\ldots\oplus\mathbb{C}

By taking once again the commutant we obtain AA itself, and we are done. ∎
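
Since everything here happens in finite dimensions, Proposition 5.14 can also be checked by computer. Here is a minimal numerical sketch of this, assuming Python with numpy available, with the commutant computed as the null space of the linear system Sg=gS, over a spanning set of elements g. The function names, and the choice A=\mathbb{C}\oplus M_{2}(\mathbb{C}) inside M_{3}(\mathbb{C}), are just illustrations, not part of the text's formal development:

    import numpy as np

    def commutant(gens, N, tol=1e-10):
        # S is in the commutant iff Sg - gS = 0 for every generator g;
        # with row-major vec, vec(gS) = kron(g, I) vec(S) and
        # vec(Sg) = kron(I, g.T) vec(S), so this is a linear system
        M = np.vstack([np.kron(g, np.eye(N)) - np.kron(np.eye(N), g.T)
                       for g in gens])
        _, s, Vh = np.linalg.svd(M)
        s = np.concatenate([s, np.zeros(Vh.shape[0] - s.size)])
        return [v.reshape(N, N) for v in Vh[s < tol]]

    def unit(N, i, j):
        e = np.zeros((N, N)); e[i, j] = 1.0; return e

    # A = C + M_2(C) inside M_3(C), spanned by the matrix units below
    A = [unit(3, 0, 0)] + [unit(3, i, j) for i in (1, 2) for j in (1, 2)]
    Ap = commutant(A, 3)      # expect dimension 2, i.e. C + C
    App = commutant(Ap, 3)    # expect dimension 5, i.e. A itself
    print(len(Ap), len(App))  # 2 5

The printed dimensions 2 and 5 agree with A^{\prime}=\mathbb{C}\oplus\mathbb{C} and A^{\prime\prime}=A, as in Proposition 5.14.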

As another interesting application of Theorem 5.13, clarifying this time the relation with operator theory, in finite dimensions, we have the following result:

Theorem 5.15.

Given an operator TB(H)T\in B(H) in finite dimensions, H=NH=\mathbb{C}^{N}, the von Neumann algebra A=<T>A=<T> that it generates inside B(H)=MN()B(H)=M_{N}(\mathbb{C}) is

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

with the sizes of the blocks n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N} coming from the spectral theory of the associated matrix MMN()M\in M_{N}(\mathbb{C}). In the normal case TT=TTTT^{*}=T^{*}T, this decomposition comes from

T=UDUT=UDU^{*}

with DMN()D\in M_{N}(\mathbb{C}) diagonal, and with UUNU\in U_{N} unitary.

Proof.

This is something which is routine, by using the linear algebra and spectral theory developed in chapter 1, for the matrices MMN()M\in M_{N}(\mathbb{C}). To be more precise:

(1) The fact that A=<T>A=<T> decomposes into a direct sum of matrix algebras is something that we already know, coming from Theorem 5.13.

(2) By using standard linear algebra, we can compute the block sizes n1,,nkn_{1},\ldots,n_{k}\in\mathbb{N}, from the knowledge of the spectral theory of the associated matrix MMN()M\in M_{N}(\mathbb{C}).

(3) In the normal case, TT=TTTT^{*}=T^{*}T, we can simply invoke the spectral theorem, and by suitably changing the basis, we are led to the conclusion in the statement. ∎
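
As a quick numerical illustration of this, here is a sketch, again assuming numpy, which computes the linear dimension of <T> for a normal matrix with prescribed eigenvalue multiplicities, by looking at the span of the words in T,T^{*}. For a normal matrix, this dimension equals the number of distinct eigenvalues, in agreement with the above:

    import numpy as np

    rng = np.random.default_rng(0)

    # a normal 5x5 matrix with 3 distinct eigenvalues, multiplicities 2, 2, 1
    D = np.diag([1.0, 1.0, 2.0, 2.0, 3.0]).astype(complex)
    U, _ = np.linalg.qr(rng.standard_normal((5, 5))
                        + 1j * rng.standard_normal((5, 5)))
    T = U @ D @ U.conj().T

    # T being normal, the words in T, T* reduce to the monomials T^a (T*)^b
    words = [np.linalg.matrix_power(T, a)
             @ np.linalg.matrix_power(T.conj().T, b)
             for a in range(4) for b in range(4)]
    dim = np.linalg.matrix_rank(np.array([w.flatten() for w in words]))
    print(dim)  # 3 = number of distinct eigenvalues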

Let us get now to infinite dimensions, with Theorem 5.15 as our main source of inspiration. The same argument applies, provided that we are in the normal case, and we have the following result, summarizing our basic knowledge here:

Theorem 5.16.

Given a bounded operator TB(H)T\in B(H) which is normal, TT=TTTT^{*}=T^{*}T, the von Neumann algebra A=<T>A=<T> that it generates inside B(H)B(H) is

<T>=L(σ(T))<T>=L^{\infty}(\sigma(T))

with σ(T)\sigma(T)\subset\mathbb{C} being as usual its spectrum.

Proof.

The measurable functional calculus theorem for the normal operators tells us that we have a weakly continuous morphism of *-algebras, as follows:

L(σ(T))B(H),ff(T)L^{\infty}(\sigma(T))\to B(H)\quad,\quad f\to f(T)

Moreover, by the general properties of the measurable calculus, also established in chapter 3, this morphism is injective, and its image is the weakly closed algebra <T><T> generated by T,TT,T^{*}. Thus, we obtain the isomorphism in the statement. ∎

More generally now, along the same lines, we have the following result:

Theorem 5.17.

Given operators T_{i}\in B(H) which are normal, and which commute, the von Neumann algebra A=<T_{i}> that these operators generate inside B(H) is

<Ti>=L(X)<T_{i}>=L^{\infty}(X)

with XX being a certain measured space, associated to the family {Ti}\{T_{i}\}.

Proof.

This is once again routine, by using the spectral theory for the families of commuting normal operators TiB(H)T_{i}\in B(H) developed in chapter 3. ∎

As a fundamental consequence now of the above results, we have:

Theorem 5.18.

The commutative von Neumann algebras are the algebras

A=L(X)A=L^{\infty}(X)

with XX being a measured space.

Proof.

We have two assertions to be proved, the idea being as follows:

(1) In one sense, we must prove that given a measured space X, we can realize A=L^{\infty}(X) as a von Neumann algebra, on a certain Hilbert space H. But this is something that we know since chapter 2, the representation being as follows:

L(X)B(L2(X)),f(gfg)L^{\infty}(X)\subset B(L^{2}(X))\quad,\quad f\to(g\to fg)

(2) In the other sense, given a commutative von Neumann algebra AB(H)A\subset B(H), we must construct a certain measured space XX, and an identification A=L(X)A=L^{\infty}(X). But this follows from Theorem 5.17, because we can write our algebra as follows:

A=<Ti>A=<T_{i}>

To be more precise, A being commutative, any element T\in A is normal, so we can pick a generating family \{T_{i}\}\subset A, and then we have A=<T_{i}> as above, with T_{i}\in B(H) being commuting normal operators. Thus Theorem 5.17 applies, and gives the result.

(3) Alternatively, and more explicitly, we can deduce this from Theorem 5.16, applied to a suitable self-adjoint element T=T^{*}. Indeed, by using T=Re(T)+iIm(T), we conclude that any von Neumann algebra A\subset B(H) is generated by its self-adjoint elements T\in A. Moreover, by using measurable functional calculus, we conclude that A is linearly generated by its projections. But then, assuming A=\overline{span}\{p_{i}\}, with p_{i} being countably many projections, which is possible for instance when H is separable, we can set:

T=i=0pi3iT=\sum_{i=0}^{\infty}\frac{p_{i}}{3^{i}}

Then T=TT=T^{*}, and by functional calculus we have p0<T>p_{0}\in<T>, then p1<T>p_{1}\in<T>, and so on. Thus A=<T>A=<T>, and A=L(X)A=L^{\infty}(X) comes now via Theorem 5.16, as claimed. ∎
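
The trick at the end of this proof can be illustrated numerically as well. The following sketch, assuming numpy, takes commuting diagonal projections, for simplicity, so that measurable functional calculus amounts to entrywise thresholding, and recovers the projections p_{i} from T=\sum_{i}p_{i}/3^{i}:

    import numpy as np

    rng = np.random.default_rng(0)

    # three commuting projections, taken diagonal with 0/1 entries on C^8
    p = [np.diag(rng.integers(0, 2, size=8)).astype(float) for _ in range(3)]
    T = sum(pi / 3.0**i for i, pi in enumerate(p))

    # recover p_0, p_1, p_2 from T by successive thresholds: the tail
    # sum_{j>i} p_j/3^j is always strictly below the threshold 0.75/3^i
    q, R = [], np.diag(T).copy()
    for i in range(3):
        qi = (R >= 0.75 / 3.0**i).astype(float)
        q.append(np.diag(qi))
        R = R - qi / 3.0**i
    print(all(np.allclose(a, b) for a, b in zip(p, q)))  # True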

The above result is the foundation for all the advanced von Neumann algebra theory, that we will discuss in the remainder of this book, and there are many things that can be said about it. To start with, in relation with the general theory of the normed closed algebras, that we developed in the beginning of this chapter, we have:

Warning 5.19.

Although the von Neumann algebras are norm closed, the theory of norm closed algebras does not always apply well to them. For instance for A=L(X)A=L^{\infty}(X) Gelfand gives A=C(X^)A=C(\widehat{X}), with X^\widehat{X} being a certain technical compactification of XX.

In short, my advice would be: do not mix up the two theories that we will be developing in this book; try finding different rooms for them in your brain. At least at this stage of things, because later, do not worry, we will be playing with both.


Now forgetting about Gelfand, and taking Theorem 5.18 as such, tentative foundation for the theory that we want to develop, as a first consequence of this, we have:

Theorem 5.20.

Given a von Neumann algebra AB(H)A\subset B(H), we have

Z(A)=L(X)Z(A)=L^{\infty}(X)

with XX being a certain measured space.

Proof.

We know from Proposition 5.12 that the center Z(A)B(H)Z(A)\subset B(H) is a von Neumann algebra. Thus Theorem 5.18 applies, and gives the result. ∎

It is possible to further build on this, with a powerful decomposition result as follows, over the measured space XX constructed in Theorem 5.20:

A=XAxdxA=\int_{X}A_{x}\,dx

But more on this later, after developing the appropriate tools for this program, which is something non-trivial. Among others, before getting into such things, we will have to study the von Neumann algebras AA having trivial center, Z(A)=Z(A)=\mathbb{C}, called factors, which include the fibers AxA_{x} in the above decomposition result. More on this later.

5c. Random matrices

Our main results so far on the von Neumann algebras concern the finite dimensional case, where the algebra is of the form A=iMni()A=\oplus_{i}M_{n_{i}}(\mathbb{C}), and the commutative case, where the algebra is of the form A=L(X)A=L^{\infty}(X). In order to advance, we must solve:

Question 5.21.

What are the next simplest von Neumann algebras, generalizing at the same time the finite dimensional ones, A=iMni()A=\oplus_{i}M_{n_{i}}(\mathbb{C}), and the commutative ones, A=L(X)A=L^{\infty}(X), that we can use as input for our study?

In this formulation, our question is a no-brainer, the answer to it being that of looking at the direct integrals of matrix algebras, over an arbitrary measured space XX:

A=XMnx()dxA=\int_{X}M_{n_{x}}(\mathbb{C})dx

However, when thinking a bit, all this looks quite tricky, with most likely lots of technical functional analysis and measure theory involved. So, we will leave the investigation of such algebras, which are indeed quite basic, and called of type I, for later.


Never mind. Let us replace Question 5.21 with something more modest, as follows:

Question 5.22 (update).

What are the next simplest von Neumann algebras, generalizing at the same time the usual matrix algebras, A=MN()A=M_{N}(\mathbb{C}), and the commutative ones, A=L(X)A=L^{\infty}(X), that we can use as input for our study?

But here, what we have is again a no-brainer, because in relation to what has been said above, we just have to restrict attention to the “isotypic” case, where all fibers are isomorphic. And in this case our algebra is a random matrix algebra:

A=XMN()dxA=\int_{X}M_{N}(\mathbb{C})dx

Which looks quite nice, and so good news, we have our algebras. In practice now, although there is some functional analysis to be done with these algebras, the main questions regard the individual operators TAT\in A, called random matrices. Thus, we are basically back to good old operator theory. Let us begin our discussion with:

Definition 5.23.

A random matrix algebra is a von Neumann algebra of the following type, with XX being a probability space, and with NN\in\mathbb{N} being an integer:

A=MN(L(X))A=M_{N}(L^{\infty}(X))

In other words, AA appears as a tensor product, as follows,

A=MN()L(X)A=M_{N}(\mathbb{C})\otimes L^{\infty}(X)

of a matrix algebra and a commutative von Neumann algebra.

As a first observation, our algebra can be written as well as follows, with this latter convention being quite standard in the probability literature:

A=L(X,MN())A=L^{\infty}(X,M_{N}(\mathbb{C}))

In connection with the tensor product notation, which is often the most useful one for computations, we have as well the following possible writing, also used in probability:

A=L(X)MN()A=L^{\infty}(X)\otimes M_{N}(\mathbb{C})

Importantly now, each random matrix algebra AA is naturally endowed with a canonical von Neumann algebra trace tr:Atr:A\to\mathbb{C}, which appears as follows:

Proposition 5.24.

Given a random matrix algebra A=MN(L(X))A=M_{N}(L^{\infty}(X)), consider the linear form tr:Atr:A\to\mathbb{C} given by:

tr(T)=1Ni=1NXTiixdxtr(T)=\frac{1}{N}\sum_{i=1}^{N}\int_{X}T_{ii}^{x}dx

In tensor product notation, A=MN()L(X)A=M_{N}(\mathbb{C})\otimes L^{\infty}(X), we have then the formula

tr=1NTrXtr=\frac{1}{N}\,Tr\otimes\int_{X}

and this functional tr:Atr:A\to\mathbb{C} is a faithful positive unital trace.

Proof.

The first assertion, regarding the tensor product writing of tr, is clear from definitions. As for the second assertion, regarding the various properties of tr, these follow from the corresponding properties of the matrix trace and of the integration over X, all of which are stable under taking tensor products. ∎

As before, there is a discussion here in connection with the other possible writings of AA. With the probabilistic notation A=L(X,MN())A=L^{\infty}(X,M_{N}(\mathbb{C})), the trace appears as:

tr(T)=X1NTr(Tx)dxtr(T)=\int_{X}\frac{1}{N}\,Tr(T^{x})\,dx

Also, with the probabilistic tensor notation A=L(X)MN()A=L^{\infty}(X)\otimes M_{N}(\mathbb{C}), the trace appears exactly as in the second part of Proposition 5.24, with the order inverted:

tr=X1NTrtr=\int_{X}\otimes\,\,\frac{1}{N}\,Tr

To summarize, the random matrix algebras appear to be very basic objects, and the only difficulty, in the beginning, lies in getting familiar with the 4 possible notations for them. Or perhaps 5 possible notations, because we have A=XMN()dxA=\int_{X}M_{N}(\mathbb{C})dx as well.
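
In all these notations, the trace is easy to manipulate on a computer, with a random matrix stored as an array of samples T^{x}, over a discretization of X. Here is a minimal sketch of this, assuming numpy, checking the unitality and traciality of tr; the sample count S and all names are of course just illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    N, S = 4, 1000   # matrix size, and number of sampled points x in X

    # random matrices in M_N(L^inf(X)), stored as S samples of NxN matrices
    A = rng.standard_normal((S, N, N))
    B = rng.standard_normal((S, N, N))

    # tr = (1/N) Tr tensor int_X, with int_X estimated by averaging samples
    tr = lambda T: np.einsum('xii->x', T).mean() / N

    print(tr(np.broadcast_to(np.eye(N), (S, N, N))))  # tr(1) = 1.0
    print(np.isclose(tr(A @ B), tr(B @ A)))           # traciality, True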


Getting to work now, as already said, the main questions about random matrix algebras regard the individual operators TAT\in A, called random matrices. To be more precise, we are interested in computing the laws of such matrices, constructed according to:

Theorem 5.25.

Given an operator algebra AB(H)A\subset B(H) with a faithful trace tr:Atr:A\to\mathbb{C}, any normal element TAT\in A has a law, namely a probability measure μ\mu satisfying

tr(Tk)=zkdμ(z)tr(T^{k})=\int_{\mathbb{C}}z^{k}d\mu(z)

with the powers being with respect to colored exponents k=k=\circ\bullet\bullet\circ\ldots\,, defined via

a=1,a=a,a=aa^{\emptyset}=1\quad,\quad a^{\circ}=a\quad,\quad a^{\bullet}=a^{*}

and multiplicativity. This law is unique, and is supported by the spectrum σ(T)\sigma(T)\subset\mathbb{C}. In the non-normal case, TTTTTT^{*}\neq T^{*}T, such a law does not exist.

Proof.

We have two assertions here, the idea being as follows:

(1) In the normal case, TT=TTTT^{*}=T^{*}T, we know from Theorem 5.2, based on the continuous functional calculus theorem, that we have:

<T>=C(σ(T))<T>=C(\sigma(T))

Thus the functional f(T)tr(f(T))f(T)\to tr(f(T)) can be regarded as an integration functional on the algebra C(σ(T))C(\sigma(T)), and by the Riesz theorem, this latter functional must come from a probability measure μ\mu on the spectrum σ(T)\sigma(T), in the sense that we must have:

tr(f(T))=σ(T)f(z)dμ(z)tr(f(T))=\int_{\sigma(T)}f(z)d\mu(z)

We are therefore led to the conclusions in the statement, with the uniqueness assertion coming from the fact that the operators TkT^{k}, taken as usual with respect to colored integer exponents, k=k=\circ\bullet\bullet\circ\ldots , generate the whole operator algebra C(σ(T))C(\sigma(T)).

(2) In the non-normal case now, TTTTTT^{*}\neq T^{*}T, we must show that such a law does not exist. For this purpose, we can use a positivity trick, as follows:

TTTT0\displaystyle TT^{*}-T^{*}T\neq 0 \displaystyle\implies (TTTT)2>0\displaystyle(TT^{*}-T^{*}T)^{2}>0
\displaystyle\implies TTTTTTTTTTTT+TTTT>0\displaystyle TT^{*}TT^{*}-TT^{*}T^{*}T-T^{*}TTT^{*}+T^{*}TT^{*}T>0
\displaystyle\implies tr(TTTTTTTTTTTT+TTTT)>0\displaystyle tr(TT^{*}TT^{*}-TT^{*}T^{*}T-T^{*}TTT^{*}+T^{*}TT^{*}T)>0
\displaystyle\implies tr(TTTT+TTTT)>tr(TTTT+TTTT)\displaystyle tr(TT^{*}TT^{*}+T^{*}TT^{*}T)>tr(TT^{*}T^{*}T+T^{*}TTT^{*})
\displaystyle\implies tr(TTTT)>tr(TTTT)\displaystyle tr(TT^{*}TT^{*})>tr(TTT^{*}T^{*})

Now assuming that T has a law \mu\in\mathcal{P}(\mathbb{C}), in the sense that the moment formula in the statement holds, the above two different numbers would both have to appear by integrating |z|^{4} with respect to this law \mu, which is contradictory, as desired. ∎
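
The positivity trick above can also be tested numerically. For a generic matrix, which is not normal, the two moments from the end of the proof indeed differ, as the following sketch, assuming numpy, shows:

    import numpy as np

    rng = np.random.default_rng(0)
    T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Ts = T.conj().T
    tr = lambda A: (np.trace(A) / 4).real

    # both numbers would be the integral of |z|^4, if a law existed
    print(tr(T @ Ts @ T @ Ts))  # strictly bigger...
    print(tr(T @ T @ Ts @ Ts))  # ...than this one, for T not normal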

Back now to the random matrices, as a basic example, assume X=\{.\}, so that we are dealing with a usual scalar matrix, T\in M_{N}(\mathbb{C}), assumed to be normal. By changing the basis of \mathbb{C}^{N}, which won't affect our trace computations, we can assume that T is diagonal:

T(λ1λN)T\sim\begin{pmatrix}\lambda_{1}\\ &\ddots\\ &&\lambda_{N}\end{pmatrix}

But for such a diagonal matrix, we have the following formula:

tr(Tk)=1N(λ1k++λNk)tr(T^{k})=\frac{1}{N}(\lambda_{1}^{k}+\ldots+\lambda_{N}^{k})

Thus, the law of TT is the average of the Dirac masses at the eigenvalues:

μ=1N(δλ1++δλN)\mu=\frac{1}{N}\left(\delta_{\lambda_{1}}+\ldots+\delta_{\lambda_{N}}\right)
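
Before continuing, here is a quick numerical check of this formula, assuming numpy, comparing the trace moments of a self-adjoint matrix with the moments of the average of the Dirac masses at its eigenvalues:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    T = A + A.T                      # self-adjoint, hence normal
    lam = np.linalg.eigvalsh(T)

    for k in range(1, 6):
        m_tr = np.trace(np.linalg.matrix_power(T, k)) / 5
        m_mu = np.mean(lam**k)       # k-th moment of the spectral measure
        print(k, np.isclose(m_tr, m_mu))  # True at each k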

As a second example now, assume N=1N=1, and so TL(X)T\in L^{\infty}(X). In this case we obtain the usual law of TT, because the equation to be satisfied by μ\mu is:

Xφ(T)=φ(x)dμ(x)\int_{X}\varphi(T)=\int_{\mathbb{C}}\varphi(x)d\mu(x)

At a more advanced level, the main problem regarding the random matrices is that of computing the law of various classes of such matrices, coming in series:

Question 5.26.

What is the law of random matrices coming in series

TNMN(L(X))T_{N}\in M_{N}(L^{\infty}(X))

in the N>>0N>>0 regime?

The general strategy here, coming from physicists, is that of computing first the asymptotic law \mu^{0}, in the N\to\infty limit, and then looking for the higher order terms as well, so as to finally reach a series in N^{-1} giving the law of T_{N}, as follows:

μN=μ0+N1μ1+N2μ2+\mu_{N}=\mu^{0}+N^{-1}\mu^{1}+N^{-2}\mu^{2}+\ldots

As a basic example here, of particular interest are the random matrices having i.i.d. complex normal entries, under the constraint T=TT=T^{*}. Here the asymptotic law μ0\mu^{0} is the Wigner semicircle law on [2,2][-2,2]. We will discuss this in chapter 6 below, and in the meantime we can only recommend some reading, from the original papers of Marchenko-Pastur [mpa], Voiculescu [vo2], Wigner [wig], and from the books of Anderson-Guionnet-Zeitouni [agz], Mehta [meh], Nica-Speicher [nsp], Voiculescu-Dykema-Nica [vdn].
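
As a preview of chapter 6, the semicircle law is easy to observe numerically. Here is a sketch, assuming numpy and matplotlib are available, sampling one big self-adjoint Gaussian matrix, and histogramming its rescaled eigenvalues against the density \sqrt{4-x^{2}}/2\pi:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    N = 1000
    X = (rng.standard_normal((N, N))
         + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    H = (X + X.conj().T) / np.sqrt(2)         # self-adjoint, entry variance 1
    eig = np.linalg.eigvalsh(H / np.sqrt(N))  # rescaled spectrum, in [-2,2]

    x = np.linspace(-2, 2, 200)
    plt.hist(eig, bins=50, density=True)
    plt.plot(x, np.sqrt(4 - x**2) / (2 * np.pi))  # Wigner semicircle density
    plt.show()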

5d. Quantum spaces

Let us end this preliminary chapter on operator algebras with some philosophy, a bit à la Heisenberg. In relation with general “quantum space” goals, Theorem 5.18 is something very interesting, philosophically speaking, suggesting that we formulate:

Definition 5.27.

Given a von Neumann algebra AB(H)A\subset B(H), we write

A=L(X)A=L^{\infty}(X)

and call XX a quantum measured space.

As an example here, for the simplest noncommutative von Neumann algebra that we know, namely the usual matrix algebra A=MN()A=M_{N}(\mathbb{C}), the formula that we want to write is as follows, with MNM_{N} being a certain mysterious quantum space:

MN()=L(MN)M_{N}(\mathbb{C})=L^{\infty}(M_{N})

So, what can we say about this space MNM_{N}? As a first observation, this is a finite space, with its cardinality being defined and computed as follows:

|MN|=dimMN()=N2|M_{N}|=\dim_{\mathbb{C}}M_{N}(\mathbb{C})=N^{2}

Now since this is the same as the cardinality of the set {1,,N2}\{1,\ldots,N^{2}\}, we are led to the conclusion that we should have a twisting result as follows, with the twisting operation XXσX\to X^{\sigma} being something that destroys the points, but keeps the cardinality:

MN={1,,N2}σM_{N}=\{1,\ldots,N^{2}\}^{\sigma}

From an analytic viewpoint now, we would like to understand what is the integration over MNM_{N}, giving rise to the corresponding LL^{\infty} functions. And here, we can set:

MNA=tr(A)\int_{M_{N}}A=tr(A)

To be more precise, on the left we have the integral of an arbitrary function on MNM_{N}, which according to our conventions, should be a usual matrix:

AL(MN)=MN()A\in L^{\infty}(M_{N})=M_{N}(\mathbb{C})

As for the quantity on the right, the outcome of the computation, this can only be the trace of A. In addition, it is better to choose this trace to be normalized, by tr(1)=1, and this in order for our measure on M_{N} to have mass 1, as is ideal:

tr(A)=1NTr(A)tr(A)=\frac{1}{N}\,Tr(A)

We can say even more about this. Indeed, since the traces of positive matrices are positive, we are led to the following formula, to be taken with the above conventions, which shows that the measure on MNM_{N} that we constructed is a probability measure:

A>0MNA>0A>0\implies\int_{M_{N}}A>0

Before going further, let us record what we found, for future reference:

Theorem 5.28.

The quantum measured space MNM_{N} formally given by

MN()=L(MN)M_{N}(\mathbb{C})=L^{\infty}(M_{N})

has cardinality N2N^{2}, appears as a twist, in a purely algebraic sense,

MN={1,,N2}σM_{N}=\{1,\ldots,N^{2}\}^{\sigma}

and is a probability space, its uniform integration being given by

MNA=tr(A)\int_{M_{N}}A=tr(A)

where at right we have the normalized trace of matrices, tr=Tr/Ntr=Tr/N.

Proof.

This is something half-informal, mostly for fun, which basically follows from the above discussion, the details and missing details being as follows:

(1) In what regards the formula |MN|=N2|M_{N}|=N^{2}, coming by computing the complex vector space dimension, as explained above, this is obviously something rock-solid.

(2) Regarding twisting, we would like to have a formula as follows, with the operation AAσA\to A^{\sigma} being something that destroys the commutativity of the multiplication:

L(MN)=L(1,,N2)σL^{\infty}(M_{N})=L^{\infty}(1,\ldots,N^{2})^{\sigma}

In more familiar terms, with usual complex matrices on the left, and with a better-looking product of sets being used on the right, this formula reads:

MN()=L({1,,N}×{1,,N})σM_{N}(\mathbb{C})=L^{\infty}\Big{(}\{1,\ldots,N\}\times\{1,\ldots,N\}\Big{)}^{\sigma}

In order to establish this formula, consider the algebra on the right. As a complex vector space, this algebra has the standard basis {fij}\{f_{ij}\} formed by the Dirac masses at the points (i,j)(i,j), and the multiplicative structure of this algebra is given by:

f_{ij}f_{kl}=\delta_{ij,kl}f_{ij}

Now let us twist this multiplication, according to the formula eijekl=δjkeile_{ij}e_{kl}=\delta_{jk}e_{il}. We obtain in this way the usual combination formulae for the standard matrix units eij:ejeie_{ij}:e_{j}\to e_{i} of the algebra MN()M_{N}(\mathbb{C}), and so we have our twisting result, as claimed.

(3) In what regards the integration formula in the statement, with the conclusion that the underlying measure on MNM_{N} is a probability one, this is something that we fully explained before, and as for the result (1) above, it is something rock-solid.

(4) As a last technical comment, observe that the twisting operation performed in (2) destroys both the involution, and the trace of the algebra. This is something quite interesting, which cannot be fixed, and we will be back to it, later on. ∎

In order to advance now, based on the above result, the key point there is the construction and interpretation of the trace tr:MN()tr:M_{N}(\mathbb{C})\to\mathbb{C}, as an integration functional. But this leads us into the following natural, and quite puzzling question:

Question 5.29.

In the general context of Definition 5.27, where we formally wrote A=L(X)A=L^{\infty}(X), what is the underlying integration functional tr:Atr:A\to\mathbb{C}?

This is a quite subtle question, and there are several possible answers here. For instance, we would like the integration functional to have the following property:

tr(ab)=tr(ba)tr(ab)=tr(ba)

And the problem is that certain von Neumann algebras do not possess such traces. This is actually something quite advanced, that we do not know yet, but by anticipating a bit, we are in trouble, and we must modify Definition 5.27, as follows:

Definition 5.30 (update).

Given a von Neumann algebra AB(H)A\subset B(H), coming with a faithful positive unital trace tr:Atr:A\to\mathbb{C}, we write

A=L(X)A=L^{\infty}(X)

and call XX a quantum probability space. We also write the trace as tr=Xtr=\int_{X}, and call it integration with respect to the uniform measure on XX.

At the level of examples, past the classical probability spaces X, we know from Theorem 5.28 that the quantum space M_{N} is a finite quantum probability space. But this raises the question of understanding what the finite quantum probability spaces are, in general. For this purpose, we need to examine the finite dimensional von Neumann algebras. And the result here, extending Theorem 5.13, is as follows:

Theorem 5.31.

The finite dimensional von Neumann algebras AB(H)A\subset B(H) over an arbitrary Hilbert space HH are exactly the direct sums of matrix algebras,

A=Mn1()Mnk()A=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

embedded into B(H) by using a partition of unity of B(H) with orthogonal projections

1=P1++Pk1=P_{1}+\ldots+P_{k}

with the “factors” Mni()M_{n_{i}}(\mathbb{C}) being each embedded into the algebra PiB(H)PiP_{i}B(H)P_{i}.

Proof.

This is standard, as in the case AMN()A\subset M_{N}(\mathbb{C}). Consider the center of AA, which is a finite dimensional commutative von Neumann algebra, of the following form:

Z(A)=kZ(A)=\mathbb{C}^{k}

Now let PiP_{i} be the Dirac mass at i{1,,k}i\in\{1,\ldots,k\}. Then PiB(H)P_{i}\in B(H) is an orthogonal projection, and these projections form a partition of unity, as follows:

1=P1++Pk1=P_{1}+\ldots+P_{k}

With Ai=PiAPiA_{i}=P_{i}AP_{i}, we have then a non-unital *-algebra decomposition, as follows:

A=A1AkA=A_{1}\oplus\ldots\oplus A_{k}

On the other hand, it follows from the minimality of each of the projections PiZ(A)P_{i}\in Z(A) that we have unital *-algebra isomorphisms AiMni()A_{i}\simeq M_{n_{i}}(\mathbb{C}), and this gives the result. ∎

We can now deduce what the finite quantum measured spaces are, in the sense of the old Definition 5.27. Indeed, we must solve here the following equation:

L(X)=Mn1()Mnk()L^{\infty}(X)=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

Now since the disjoint unions of sets correspond to direct sums at the level of the associated algebras of functions, in the classical case, we can take the following formula as a definition for a disjoint union of sets, in the general, noncommutative case:

L(X1Xk)=L(X1)L(Xk)L^{\infty}(X_{1}\sqcup\ldots\sqcup X_{k})=L^{\infty}(X_{1})\oplus\ldots\oplus L^{\infty}(X_{k})

With this, and by remembering the definition of MNM_{N}, we are led to the conclusion that the solution to our quantum measured space equation above is as follows:

X=Mn1MnkX=M_{n_{1}}\sqcup\ldots\sqcup M_{n_{k}}

For fully solving our problem, in the spirit of the new Definition 5.30, we still have to discuss the traces on L(X)L^{\infty}(X). We are led in this way to the following statement:

Theorem 5.32.

The finite quantum measured spaces are the spaces

X=Mn1MnkX=M_{n_{1}}\sqcup\ldots\sqcup M_{n_{k}}

according to the following formula, for the associated algebras of functions:

L(X)=Mn1()Mnk()L^{\infty}(X)=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{k}}(\mathbb{C})

The cardinality |X||X| of such a space is the following number,

N=n12++nk2N=n_{1}^{2}+\ldots+n_{k}^{2}

and the possible traces are as follows, with λi>0\lambda_{i}>0 summing up to 11:

tr=λ1tr1λktrktr=\lambda_{1}tr_{1}\oplus\ldots\oplus\lambda_{k}tr_{k}

Among these traces, we have the canonical trace, appearing as

tr:L(X)(L(X))tr:L^{\infty}(X)\subset\mathcal{L}(L^{\infty}(X))\to\mathbb{C}

via the left regular representation, having weights λi=ni2/N\lambda_{i}=n_{i}^{2}/N.

Proof.

We have many assertions here, basically coming from the above discussion, with only the last one needing some explanations. Consider the left regular representation of our algebra A=L(X)A=L^{\infty}(X), which is given by the following formula:

π:A(A),π(a):bab\pi:A\subset\mathcal{L}(A)\quad,\quad\pi(a):b\to ab

We know that the algebra (A)\mathcal{L}(A) of linear operators T:AAT:A\to A is isomorphic to a matrix algebra, and more specifically to MN()M_{N}(\mathbb{C}), with N=|X|N=|X| being as before:

(A)MN()\mathcal{L}(A)\simeq M_{N}(\mathbb{C})

Thus, this algebra has a trace tr:(A)tr:\mathcal{L}(A)\to\mathbb{C}, and by composing this trace with the representation π\pi, we obtain a certain trace tr:Atr:A\to\mathbb{C}, that we can call “canonical”:

tr:A(A)tr:A\subset\mathcal{L}(A)\to\mathbb{C}

We can compute the weights of this trace by using a multimatrix basis of AA, formed by matrix units eabie_{ab}^{i}, with i{1,,k}i\in\{1,\ldots,k\} and with a,b{1,,ni}a,b\in\{1,\ldots,n_{i}\}, and we obtain:

λi=ni2N\lambda_{i}=\frac{n_{i}^{2}}{N}

Thus, we are led to the conclusion in the statement. ∎
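
The weights \lambda_{i}=n_{i}^{2}/N can be verified by hand on a small example, say A=\mathbb{C}\oplus M_{2}(\mathbb{C}), where N=5. The following sketch, assuming numpy, builds the left regular representation with respect to the basis of matrix units, which is orthonormal for <a,b>=Tr(a^{*}b), and computes the canonical trace of the two minimal central projections:

    import numpy as np

    def unit(i, j):
        e = np.zeros((3, 3)); e[i, j] = 1.0; return e

    # A = C + M_2(C) as block-diagonal 3x3 matrices, with 5 matrix units
    basis = [unit(0, 0)] + [unit(i, j) for i in (1, 2) for j in (1, 2)]

    # left regular representation, pi(a)_{uv} = <e_u, a e_v> = Tr(e_u^* a e_v)
    def pi(a):
        return np.array([[np.trace(eu.T @ a @ ev) for ev in basis]
                         for eu in basis])

    p1, p2 = unit(0, 0), unit(1, 1) + unit(2, 2)  # minimal central projections
    print(np.trace(pi(p1)) / 5, np.trace(pi(p2)) / 5)  # 0.2, 0.8 = n_i^2 / N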

We will be back to quantum spaces on several occasions, in what follows. In fact, the present book is as much on operator algebras as it is on quantum spaces, and this because these two points of view are both useful, and complementary to each other.

5e. Exercises

The theory in this chapter has been quite exciting, and we have already run into a number of difficult questions. As a basic exercise on all this, we have:

Exercise 5.33.

Find a simple proof for the von Neumann bicommutant theorem, in finite dimensions.

This is something quite subjective, and try not to cheat. That is, do not simply convert the amplification proof that we have in general, by using matrix algebras everywhere, and do not use the structure result for the finite dimensional algebras either.

Exercise 5.34.

Again in finite dimensions, H=NH=\mathbb{C}^{N}, compute explicitly the von Neumann algebra <T>B(H)<T>\subset B(H) generated by a single operator.

As mentioned above, in the normal case the answer is clear, by diagonalizing TT. The problem is that of understanding what happens when TT is not normal.

Exercise 5.35.

Try understanding what the law of the simplest non-normal operator,

J=(0100)J=\begin{pmatrix}0&1\\ 0&0\end{pmatrix}

acting on H=2H=\mathbb{C}^{2} should be. Look also at more general Jordan blocks.

There are many non-trivial computations here. We will be back to this.
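
To get started with this exercise, the colored moments of J can at least be computed mechanically, say with numpy, as below. The outcome already shows, in the spirit of Theorem 5.25, that no complex probability measure can produce them, the two candidates for \int|z|^{4}d\mu(z) being different:

    import numpy as np

    J = np.array([[0., 1.], [0., 0.]])
    tr = lambda A: np.trace(A) / 2

    print(tr(J @ J.T))            # 0.5, the moment of J J*
    print(tr(J.T @ J))            # 0.5, the moment of J* J
    print(tr(J @ J.T @ J @ J.T))  # 0.5, one candidate for the |z|^4 moment
    print(tr(J @ J @ J.T @ J.T))  # 0.0, the other candidate, different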

Exercise 5.36.

Develop a full theory of finite quantum spaces, by enlarging what has been said above, with various geometric topics, of your choice.

This is of course a bit vague, but some further thinking about all this is certainly useful, at this point, and this is what the exercise is about.

Chapter 6 Random matrices

6a. Random matrices

We have seen so far the basics of von Neumann algebras AB(H)A\subset B(H), with a look into some interesting ramifications too, concerning random matrices and quantum spaces. In what regards these ramifications, the situation is as follows:


(1) The random matrix algebras, A=MN(L(X))A=M_{N}(L^{\infty}(X)) acting on H=NL2(X)H=\mathbb{C}^{N}\otimes L^{2}(X), are the simplest von Neumann algebras, from a variety of viewpoints. The main problem regarding them is of operator theoretic nature, regarding the computation of the law of individual elements TAT\in A with respect to the random matrix trace tr:Atr:A\to\mathbb{C}.


(2) The quantum spaces are exciting abstract objects, obtained by looking at an arbitrary von Neumann algebra AB(H)A\subset B(H) coming with a trace tr:Atr:A\to\mathbb{C}, and formally writing the algebra as A=L(X)A=L^{\infty}(X), and its trace as tr=Xtr=\int_{X}. In this picture, XX is our quantum probability space, and X\int_{X} is the integration over it, or expectation.


All this is quite interesting, and we will further explore these two topics, random matrices and quantum spaces, with some basic theory for them, in this chapter and in the next one. As a first observation, these two topics are closely related, due to:

Fact 6.1.

A random matrix algebra can be written in the following way,

MN(L(X))\displaystyle M_{N}(L^{\infty}(X)) =\displaystyle= MN()L(X)\displaystyle M_{N}(\mathbb{C})\otimes L^{\infty}(X)
=\displaystyle= L(MN)L(X)\displaystyle L^{\infty}(M_{N})\otimes L^{\infty}(X)
=\displaystyle= L(MN×X)\displaystyle L^{\infty}(M_{N}\times X)

so the underlying quantum space is something very simple, Y=MN×XY=M_{N}\times X.

With this understood, the philosophical problem is now, what to do with our quantum spaces, be they of random matrix type Y=M_{N}\times X, or more general. Good question, and do not expect a simple answer to it. Indeed, quantum spaces are more or less the same thing as operator algebras, and from this perspective, our question becomes “what are the operator algebras, and what is to be done with them”, obviously difficult.


And it gets even worse, because when remembering that operator algebras are more or less the same thing as quantum mechanics, our question becomes something of type “what is quantum mechanics, and what is to be done with it”. So, modesty.


Getting back to Earth, now that we have our questions and philosophy, for the whole remainder of this book, let us get into random matrices. Quite remarkably, these provide us with an epsilon of answer to our philosophical questions, as follows:

Answer 6.2.

The simplest quantum spaces are those coming from random matrix algebras, which are as follows, with XX being a usual probability space,

Y=MN×XY=M_{N}\times X

and what is to be done with them is the computation of the law of individual elements, the random matrices TL(Y)=MN(L(X))T\in L^{\infty}(Y)=M_{N}(L^{\infty}(X)), in the N>>0N>>0 regime.

Which looks very nice, we have eventually reached some concrete questions, and it is time now for mathematics and computations. Getting started, we must first further build on the material from chapter 5. We recall from there that given a von Neumann algebra A\subset B(H) coming with a trace tr:A\to\mathbb{C}, any normal element T\in A has a law, which is the complex probability measure \mu\in\mathcal{P}(\mathbb{C}) given by the following formula:

tr(Tk)=zkdμ(z)tr(T^{k})=\int_{\mathbb{C}}z^{k}d\mu(z)

In the non-normal case, TTTTTT^{*}\neq T^{*}T, the law does not exist as a complex probability measure μ𝒫()\mu\in\mathcal{P}(\mathbb{C}), as also explained in chapter 5. However, we can trick a bit, and talk about the law of non-normal elements as well, in the following abstract way:

Definition 6.3.

Let AA be a von Neumann algebra, given with a trace tr:Atr:A\to\mathbb{C}.

  1. (1)

    The elements TAT\in A are called random variables.

  2. (2)

    The moments of such a variable are the numbers Mk(T)=tr(Tk)M_{k}(T)=tr(T^{k}).

  3. (3)

    The law of such a variable is the functional μ:Ptr(P(T))\mu:P\to tr(P(T)).

Here k=k=\circ\bullet\bullet\circ\ldots is by definition a colored integer, and the powers TkT^{k} are defined by multiplicativity and the usual formulae, namely:

T=1,T=T,T=TT^{\emptyset}=1\quad,\quad T^{\circ}=T\quad,\quad T^{\bullet}=T^{*}

As for the polynomial PP, this is a noncommuting *-polynomial in one variable:

P<X,X>P\in\mathbb{C}<X,X^{*}>

Observe that the law is uniquely determined by the moments, because:

P(X)=kλkXkμ(P)=kλkMk(T)P(X)=\sum_{k}\lambda_{k}X^{k}\implies\mu(P)=\sum_{k}\lambda_{k}M_{k}(T)

Generally speaking, the above definition, due to Voiculescu [vdn], is something quite abstract, but there is no other way of doing things, at least at this level of generality. However, in the special case where our variable TAT\in A is self-adjoint, or more generally normal, the theory simplifies, and we recover more familiar objects, as follows:

Theorem 6.4.

The law of a normal variable TAT\in A can be identified with the corresponding spectral measure μ𝒫()\mu\in\mathcal{P}(\mathbb{C}), according to the following formula,

tr(f(T))=σ(T)f(x)dμ(x)tr(f(T))=\int_{\sigma(T)}f(x)d\mu(x)

valid for any fL(σ(T))f\in L^{\infty}(\sigma(T)), coming from the measurable functional calculus. In the self-adjoint case the spectral measure is real, μ𝒫()\mu\in\mathcal{P}(\mathbb{R}).

Proof.

This is something that we know well, from chapter 5, coming from the spectral theorem for the normal operators, as developed in chapter 3. ∎

Getting back now to the random matrices, we have all we need, as general formalism, and we are ready for doing some computations. As a first observation, we have:

Theorem 6.5.

The laws of basic random matrices TMN(L(X))T\in M_{N}(L^{\infty}(X)) are as follows:

  1. (1)

    In the case N=1N=1 the random matrix is a usual random variable, TL(X)T\in L^{\infty}(X), automatically normal, and its law as defined above is the usual law.

  2. (2)

    In the case X={.}X=\{.\} the random matrix is a usual scalar matrix, TMN()T\in M_{N}(\mathbb{C}), and in the diagonalizable case, the law is μ=1N(δλ1++δλN)\mu=\frac{1}{N}\left(\delta_{\lambda_{1}}+\ldots+\delta_{\lambda_{N}}\right).

Proof.

This is something that we know, once again, from chapter 5, and which is elementary. Indeed, the first assertion follows from definitions, and the above discussion. As for the second assertion, this follows by diagonalizing the matrix. ∎

In general, what we have can only be a mixture of (1) and (2) above. Our plan will be that of discussing (1) in more detail, and then getting into the general case, or rather into the case of the most interesting random matrices, with inspiration from (2).

6b. Probability theory

So, let us set N=1. Here our algebra is A=L^{\infty}(X), an arbitrary commutative von Neumann algebra. The most interesting linear operators T\in A, which we will rather denote as complex functions f:X\to\mathbb{C}, and call random variables, as is customary, are the normal, or Gaussian variables, which are defined as follows:

Definition 6.6.

A variable f:Xf:X\to\mathbb{R} is called standard normal when its law is:

g1=12πex2/2dxg_{1}=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}dx

More generally, the normal law of parameter t>0t>0 is the following measure:

gt=12πtex2/2tdxg_{t}=\frac{1}{\sqrt{2\pi t}}e^{-x^{2}/2t}dx

These are also called Gaussian distributions, with “g” standing for Gauss.

Observe that these normal laws have indeed mass 1, as they should, as shown by a quick change of variable, and the Gauss formula, namely:

(ex2dx)2\displaystyle\left(\int_{\mathbb{R}}e^{-x^{2}}dx\right)^{2} =\displaystyle= ex2y2dxdy\displaystyle\int_{\mathbb{R}}\int_{\mathbb{R}}e^{-x^{2}-y^{2}}dxdy
=\displaystyle= 02π0er2rdrdt\displaystyle\int_{0}^{2\pi}\int_{0}^{\infty}e^{-r^{2}}rdrdt
=\displaystyle= 2π×12\displaystyle 2\pi\times\frac{1}{2}
=\displaystyle= π\displaystyle\pi

Let us start with some basic results regarding the normal laws. We first have:

Proposition 6.7.

The normal law gtg_{t} with t>0t>0 has the following properties:

  1. (1)

    The variance is V=tV=t.

  2. (2)

    The density is even, so the odd moments vanish.

  3. (3)

    The even moments are Mk=tk/2×k!!M_{k}=t^{k/2}\times k!!, with k!!=(k1)(k3)(k5)k!!=(k-1)(k-3)(k-5)\ldots\,.

  4. (4)

    Equivalently, the moments are Mk=πP2(k)t|π|M_{k}=\sum_{\pi\in P_{2}(k)}t^{|\pi|}, for any kk\in\mathbb{N}.

  5. (5)

    The Fourier transform Ff(x)=𝔼(eixf)F_{f}(x)=\mathbb{E}(e^{ixf}) is given by F(x)=etx2/2F(x)=e^{-tx^{2}/2}.

  6. (6)

    We have the convolution semigroup formula gsgt=gs+tg_{s}*g_{t}=g_{s+t}, for any s,t>0s,t>0.

Proof.

All this is very standard, with the various notations used in the statement being explained below, the idea being as follows:

(1) The normal law gtg_{t} being centered, its variance is the second moment, V=M2V=M_{2}. Thus the result follows from (3), proved below, which gives in particular:

M2=t2/2×2!!=tM_{2}=t^{2/2}\times 2!!=t

(2) This is indeed something self-explanatory.

(3) We have indeed the following computation, by partial integration:

Mk\displaystyle M_{k} =\displaystyle= 12πtxkex2/2tdx\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}x^{k}e^{-x^{2}/2t}dx
=\displaystyle= 12πt(txk1)(ex2/2t)dx\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}(tx^{k-1})\left(-e^{-x^{2}/2t}\right)^{\prime}dx
=\displaystyle= 12πtt(k1)xk2ex2/2tdx\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}t(k-1)x^{k-2}e^{-x^{2}/2t}dx
=\displaystyle= t(k1)×12πtxk2ex2/2tdx\displaystyle t(k-1)\times\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}x^{k-2}e^{-x^{2}/2t}dx
=\displaystyle= t(k1)Mk2\displaystyle t(k-1)M_{k-2}

The initial value being M0=1M_{0}=1, we obtain the result.

(4) We know from (2,3) that the moments of the normal law gtg_{t} satisfy the following recurrence formula, with the initial data M0=1,M1=0M_{0}=1,M_{1}=0:

Mk=t(k1)Mk2M_{k}=t(k-1)M_{k-2}

Now let us look at P2(k)P_{2}(k), the set of pairings of {1,,k}\{1,\ldots,k\}. In order to have such a pairing, we must pair 1 with a number chosen among 2,,k2,\ldots,k, and then come up with a pairing of the remaining k2k-2 numbers. Thus, the number Nk=|P2(k)|N_{k}=|P_{2}(k)| of such pairings is subject to the following recurrence formula, with initial data N0=1,N1=0N_{0}=1,N_{1}=0:

Nk=(k1)Nk2N_{k}=(k-1)N_{k-2}

But this solves our problem at t=1t=1, because in this case we obtain the following formula, with |.||.| standing as usual for the number of blocks of a partition:

Mk=Nk=|P2(k)|=πP2(k)1=πP2(k)1|π|M_{k}=N_{k}=|P_{2}(k)|=\sum_{\pi\in P_{2}(k)}1=\sum_{\pi\in P_{2}(k)}1^{|\pi|}

Now back to the general case, t>0t>0, our problem here is solved in fact too, because the number of blocks of a pairing πP2(k)\pi\in P_{2}(k) being constant, |π|=k/2|\pi|=k/2, we obtain:

Mk=tk/2Nk=πP2(k)tk/2=πP2(k)t|π|M_{k}=t^{k/2}N_{k}=\sum_{\pi\in P_{2}(k)}t^{k/2}=\sum_{\pi\in P_{2}(k)}t^{|\pi|}

(5) The Fourier transform formula can be established as follows:

F(x)\displaystyle F(x) =\displaystyle= 12πtey2/2t+ixydy\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}e^{-y^{2}/2t+ixy}dy
=\displaystyle= 12πte(y/2tt/2ix)2tx2/2dy\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}e^{-(y/\sqrt{2t}-\sqrt{t/2}ix)^{2}-tx^{2}/2}dy
=\displaystyle= 12πtez2tx2/22tdz\displaystyle\frac{1}{\sqrt{2\pi t}}\int_{\mathbb{R}}e^{-z^{2}-tx^{2}/2}\sqrt{2t}dz
=\displaystyle= 1πetx2/2ez2dz\displaystyle\frac{1}{\sqrt{\pi}}e^{-tx^{2}/2}\int_{\mathbb{R}}e^{-z^{2}}dz
=\displaystyle= etx2/2\displaystyle e^{-tx^{2}/2}

(6) This follows indeed from (5), because logFgt\log F_{g_{t}} is linear in tt. ∎
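
The moment formula M_{k}=t^{k/2}\times k!! is easy to confirm by simulation. Here is a minimal sketch of this, assuming numpy, with the double factorial implemented directly:

    import numpy as np

    rng = np.random.default_rng(0)
    t = 2.0
    f = np.sqrt(t) * rng.standard_normal(10**6)   # samples of g_t

    dfact = lambda k: np.prod(np.arange(k - 1, 0, -2))  # k!! = (k-1)(k-3)...
    for k in (2, 4, 6):
        print(k, np.mean(f**k), t**(k // 2) * dfact(k))  # close pairs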

We are now ready to establish the Central Limit Theorem (CLT), which is a key result, telling us why the normal laws appear a bit everywhere, in real life:

Theorem 6.8.

Given a sequence of real random variables f1,f2,f3,L(X)f_{1},f_{2},f_{3},\ldots\in L^{\infty}(X), which are i.i.d., centered, and with variance t>0t>0, we have

1ni=1nfigt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}f_{i}\sim g_{t}

with nn\to\infty, in moments.

Proof.

In terms of moments, the Fourier transform Ff(x)=𝔼(eixf)F_{f}(x)=\mathbb{E}(e^{ixf}) is given by:

Ff(x)=𝔼(k=0(ixf)kk!)=k=0ikMk(f)k!xkF_{f}(x)=\mathbb{E}\left(\sum_{k=0}^{\infty}\frac{(ixf)^{k}}{k!}\right)=\sum_{k=0}^{\infty}\frac{i^{k}M_{k}(f)}{k!}\,x^{k}

Thus, the Fourier transform of the variable in the statement is:

F(x)\displaystyle F(x) =\displaystyle= [Ff(xn)]n\displaystyle\left[F_{f}\left(\frac{x}{\sqrt{n}}\right)\right]^{n}
=\displaystyle= \left[1-\frac{tx^{2}}{2n}+O(n^{-3/2})\right]^{n}
\displaystyle\simeq [1tx22n]n\displaystyle\left[1-\frac{tx^{2}}{2n}\right]^{n}
\displaystyle\simeq etx2/2\displaystyle e^{-tx^{2}/2}

But this latter function being the Fourier transform of gtg_{t}, we obtain the result. ∎
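
Again, a simulation confirms this convergence quickly. The following sketch, assuming numpy, sums i.i.d. centered uniform variables, of variance t=1/3, and compares the second and fourth moments of the rescaled sum with those of g_{t}:

    import numpy as np

    rng = np.random.default_rng(0)
    n, S = 500, 20000   # terms per sum, and number of simulated sums

    f = rng.uniform(-1, 1, size=(S, n))   # i.i.d., centered, variance t = 1/3
    g = f.sum(axis=1) / np.sqrt(n)

    t = 1 / 3
    print(np.mean(g**2), t)           # ~ t
    print(np.mean(g**4), 3 * t**2)    # ~ 3t^2 = t^2 x 4!!, Gaussian moment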

Let us discuss as well the “discrete” counterpart of the above results, which we will also need a bit later, in relation with the random matrices. We have:

Definition 6.9.

The Poisson law of parameter 11 is the following measure,

p1=1ekδkk!p_{1}=\frac{1}{e}\sum_{k}\frac{\delta_{k}}{k!}

and the Poisson law of parameter t>0t>0 is the following measure,

pt=etktkk!δkp_{t}=e^{-t}\sum_{k}\frac{t^{k}}{k!}\,\delta_{k}

with the letter “p” standing for Poisson.

We will see in a moment why these laws appear everywhere, in discrete probability, the reasons behind this coming from the Poisson Limit Theorem (PLT). Getting started now, in analogy with the normal laws, the Poisson laws have the following properties:

Proposition 6.10.

The Poisson law ptp_{t} with t>0t>0 has the following properties:

  1. (1)

    The variance is V=tV=t.

  2. (2)

    The moments are Mk=πP(k)t|π|M_{k}=\sum_{\pi\in P(k)}t^{|\pi|}.

  3. (3)

    The Fourier transform is F(x)=exp((eix1)t)F(x)=\exp\left((e^{ix}-1)t\right).

  4. (4)

    We have the semigroup formula pspt=ps+tp_{s}*p_{t}=p_{s+t}, for any s,t>0s,t>0.

Proof.

We have four formulae to be proved, the idea being as follows:

(1) The variance is V=M2M12V=M_{2}-M_{1}^{2}, and by using the formulae M1=tM_{1}=t and M2=t+t2M_{2}=t+t^{2}, coming from (2), proved below, we obtain as desired, V=tV=t.

(2) This is something more tricky. Consider indeed the set P(k)P(k) of all partitions of {1,,k}\{1,\ldots,k\}. At t=1t=1, to start with, the formula that we want to prove is:

Mk=|P(k)|M_{k}=|P(k)|

We have the following recurrence formula for the moments of p1p_{1}:

Mk+1\displaystyle M_{k+1} =\displaystyle= 1es(s+1)k+1(s+1)!\displaystyle\frac{1}{e}\sum_{s}\frac{(s+1)^{k+1}}{(s+1)!}
=\displaystyle= 1essks!(1+1s)k\displaystyle\frac{1}{e}\sum_{s}\frac{s^{k}}{s!}\left(1+\frac{1}{s}\right)^{k}
=\displaystyle= 1essks!r(kr)sr\displaystyle\frac{1}{e}\sum_{s}\frac{s^{k}}{s!}\sum_{r}\binom{k}{r}s^{-r}
=\displaystyle= r(kr)1esskrs!\displaystyle\sum_{r}\binom{k}{r}\cdot\frac{1}{e}\sum_{s}\frac{s^{k-r}}{s!}
=\displaystyle= r(kr)Mkr\displaystyle\sum_{r}\binom{k}{r}M_{k-r}

Our claim is that the numbers Bk=|P(k)|B_{k}=|P(k)| satisfy the same recurrence formula. Indeed, since a partition of {1,,k+1}\{1,\ldots,k+1\} appears by choosing rr neighbors for 11, among the kk numbers available, and then partitioning the krk-r elements left, we have:

Bk+1=r(kr)BkrB_{k+1}=\sum_{r}\binom{k}{r}B_{k-r}

Thus we obtain by recurrence Mk=BkM_{k}=B_{k}, as desired. Regarding now the general case, t>0t>0, we can use here a similar method. We have the following recurrence formula for the moments of ptp_{t}, obtained by using the binomial formula:

Mk+1\displaystyle M_{k+1} =\displaystyle= etsts+1(s+1)k+1(s+1)!\displaystyle e^{-t}\sum_{s}\frac{t^{s+1}(s+1)^{k+1}}{(s+1)!}
=\displaystyle= etsts+1sks!(1+1s)k\displaystyle e^{-t}\sum_{s}\frac{t^{s+1}s^{k}}{s!}\left(1+\frac{1}{s}\right)^{k}
=\displaystyle= etsts+1sks!r(kr)sr\displaystyle e^{-t}\sum_{s}\frac{t^{s+1}s^{k}}{s!}\sum_{r}\binom{k}{r}s^{-r}
=\displaystyle= r(kr)etsts+1skrs!\displaystyle\sum_{r}\binom{k}{r}\cdot e^{-t}\sum_{s}\frac{t^{s+1}s^{k-r}}{s!}
=\displaystyle= tr(kr)Mkr\displaystyle t\sum_{r}\binom{k}{r}M_{k-r}

On the other hand, consider the numbers in the statement, Sk=πP(k)t|π|S_{k}=\sum_{\pi\in P(k)}t^{|\pi|}. As before, since a partition of {1,,k+1}\{1,\ldots,k+1\} appears by choosing rr neighbors for 11, among the kk numbers available, and then partitioning the krk-r elements left, we have:

Sk+1=tr(kr)SkrS_{k+1}=t\sum_{r}\binom{k}{r}S_{k-r}

Thus we obtain by recurrence Mk=SkM_{k}=S_{k}, as desired.

(3) The Fourier transform formula can be established as follows:

Fpt(x)\displaystyle F_{p_{t}}(x) =\displaystyle= etktkk!Fδk(x)\displaystyle e^{-t}\sum_{k}\frac{t^{k}}{k!}F_{\delta_{k}}(x)
=\displaystyle= etktkk!eikx\displaystyle e^{-t}\sum_{k}\frac{t^{k}}{k!}\,e^{ikx}
=\displaystyle= etk(eixt)kk!\displaystyle e^{-t}\sum_{k}\frac{(e^{ix}t)^{k}}{k!}
=\displaystyle= exp(t)exp(eixt)\displaystyle\exp(-t)\exp(e^{ix}t)
=\displaystyle= exp((eix1)t)\displaystyle\exp\left((e^{ix}-1)t\right)

(4) This follows from (3), because logFpt\log F_{p_{t}} is linear in tt. ∎
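For the reader wanting to double-check the above, here is a small numerical verification of the moment formula from (2), via the recurrence S_{k+1} = t∑_r C(k,r)S_{k-r} established in the proof, using only the Python standard library, with the truncation of the series being our choice:

```python
import math

# Moments of p_t: the partition recurrence S_{k+1} = t sum_r C(k,r) S_{k-r}
# versus the defining series M_k = e^{-t} sum_s t^s s^k / s!.
t = 1.5

def poisson_moment(k, terms=100):
    return math.exp(-t) * sum(t ** s * s ** k / math.factorial(s)
                              for s in range(terms))

S = [1.0]                                   # S_0 = 1, the empty partition
for k in range(8):
    S.append(t * sum(math.comb(k, r) * S[k - r] for r in range(k + 1)))

for k in range(9):
    print(k, round(S[k], 6), round(poisson_moment(k), 6))
```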

We are now ready to establish the Poisson Limit Theorem (PLT), as follows:

Theorem 6.11.

We have the following convergence, in moments,

((1tn)δ0+tnδ1)npt\left(\left(1-\frac{t}{n}\right)\delta_{0}+\frac{t}{n}\delta_{1}\right)^{*n}\to p_{t}

for any t>0t>0.

Proof.

Let us denote by μn\mu_{n} the Bernoulli measure appearing under the convolution sign. We have then the following computation:

Fδr(x)=eirx\displaystyle F_{\delta_{r}}(x)=e^{irx} \displaystyle\implies Fμn(x)=(1tn)+tneix\displaystyle F_{\mu_{n}}(x)=\left(1-\frac{t}{n}\right)+\frac{t}{n}e^{ix}
\displaystyle\implies Fμnn(x)=((1tn)+tneix)n\displaystyle F_{\mu_{n}^{*n}}(x)=\left(\left(1-\frac{t}{n}\right)+\frac{t}{n}e^{ix}\right)^{n}
\displaystyle\implies Fμnn(x)=(1+(eix1)tn)n\displaystyle F_{\mu_{n}^{*n}}(x)=\left(1+\frac{(e^{ix}-1)t}{n}\right)^{n}
\displaystyle\implies F(x)=exp((eix1)t)\displaystyle F(x)=\exp\left((e^{ix}-1)t\right)

Thus, we obtain the Fourier transform of ptp_{t}, as desired. ∎
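Numerically, the PLT can be watched at work as follows, by convolving the Bernoulli measure with itself, the code below being a minimal sketch, with numpy assumed, and with n being our choice:

```python
import numpy as np
from math import exp, factorial

# Convolve ((1 - t/n) delta_0 + (t/n) delta_1) with itself n times, and
# compare the resulting weights with the Poisson weights e^{-t} t^k / k!.
t, n = 1.0, 1000
bernoulli = np.array([1 - t / n, t / n])

mu = np.array([1.0])
for _ in range(n):
    mu = np.convolve(mu, bernoulli)

for k in range(6):
    print(k, round(mu[k], 6), round(exp(-t) * t ** k / factorial(k), 6))
```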

As a third and last topic from classical probability, let us discuss now the complex normal laws, which we will need too. To start with, we have the following definition:

Definition 6.12.

The complex Gaussian law of parameter t>0t>0 is

Gt=law(12(a+ib))G_{t}=law\left(\frac{1}{\sqrt{2}}(a+ib)\right)

where a,ba,b are independent, each following the law gtg_{t}.

As in the real case, these measures form convolution semigroups:

Proposition 6.13.

The complex Gaussian laws have the property

GsGt=Gs+tG_{s}*G_{t}=G_{s+t}

for any s,t>0s,t>0, and so they form a convolution semigroup.

Proof.

This follows indeed from the real result, namely gsgt=gs+tg_{s}*g_{t}=g_{s+t}, established above, simply by taking real and imaginary parts. ∎

We have the following complex analogue of the CLT:

Theorem 6.14 (CCLT).

Given complex random variables f1,f2,f3,L(X)f_{1},f_{2},f_{3},\ldots\in L^{\infty}(X) which are i.i.d., centered, and with variance t>0t>0, we have, with nn\to\infty, in moments,

1ni=1nfiGt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}f_{i}\sim G_{t}

where GtG_{t} is the complex Gaussian law of parameter tt.

Proof.

This follows indeed from the real CLT, established above, simply by taking the real and imaginary parts of all the variables involved. ∎

Regarding now the moments, we use the general formalism from Definition 6.3, involving colored integer exponents k=k=\circ\bullet\bullet\circ\ldots\,. We say that a pairing πP2(k)\pi\in P_{2}(k) is matching when it pairs \circ-\bullet symbols. With this convention, we have the following result:

Theorem 6.15.

The moments of the complex normal law are the numbers

Mk(Gt)=π𝒫2(k)t|π|M_{k}(G_{t})=\sum_{\pi\in\mathcal{P}_{2}(k)}t^{|\pi|}

where 𝒫2(k)\mathcal{P}_{2}(k) are the matching pairings of {1,,k}\{1,\ldots,k\}, and |.||.| is the number of blocks.

Proof.

This is something well-known, which can be established as follows:

(1) As a first observation, by using a standard dilation argument, it is enough to do this at t=1t=1. So, let us first recall from the above that the moments of the real Gaussian law g1g_{1}, with respect to integer exponents kk\in\mathbb{N}, are the following numbers:

mk=|P2(k)|m_{k}=|P_{2}(k)|

Numerically, we have the following formula, explained as well in the above:

mk={k!!(keven)0(kodd)m_{k}=\begin{cases}k!!&(k\ {\rm even})\\ 0&(k\ {\rm odd})\end{cases}

(2) We will show here that in what concerns the complex Gaussian law G1G_{1}, similar results hold. Numerically, we will prove that we have the following formula, where a colored integer k=k=\circ\bullet\bullet\circ\ldots is called uniform when it contains the same number of \circ and \bullet , and where |k||k|\in\mathbb{N} is the length of such a colored integer:

Mk={(|k|/2)!(kuniform)0(knotuniform)M_{k}=\begin{cases}(|k|/2)!&(k\ {\rm uniform})\\ 0&(k\ {\rm not\ uniform})\end{cases}

Now since the matching pairings π𝒫2(k)\pi\in\mathcal{P}_{2}(k) are counted by exactly the same numbers, and this for trivial reasons, we will obtain the formula in the statement, namely:

Mk=|𝒫2(k)|M_{k}=|\mathcal{P}_{2}(k)|

(3) This was for the plan. In practice now, we must compute the moments, with respect to colored integer exponents k=k=\circ\bullet\bullet\circ\ldots , of the variable in the statement:

c=12(a+ib)c=\frac{1}{\sqrt{2}}(a+ib)

As a first observation, in the case where such an exponent k=k=\circ\bullet\bullet\circ\ldots is not uniform in ,\circ,\bullet , a rotation argument shows that the corresponding moment of cc vanishes. To be more precise, the variable c=wcc^{\prime}=wc can be shown to be complex Gaussian too, for any ww\in\mathbb{C}, and from Mk(c)=Mk(c)M_{k}(c)=M_{k}(c^{\prime}) we obtain Mk(c)=0M_{k}(c)=0, in this case.

(4) In the uniform case now, where k=k=\circ\bullet\bullet\circ\ldots consists of pp copies of \circ and pp copies of \bullet , the corresponding moment can be computed as follows:

Mk\displaystyle M_{k} =\displaystyle= 12p(a2+b2)p\displaystyle\frac{1}{2^{p}}\int(a^{2}+b^{2})^{p}
=\displaystyle= 12ps(ps)a2sb2p2s\displaystyle\frac{1}{2^{p}}\sum_{s}\binom{p}{s}\int a^{2s}\int b^{2p-2s}
=\displaystyle= 12ps(ps)(2s)!!(2p2s)!!\displaystyle\frac{1}{2^{p}}\sum_{s}\binom{p}{s}(2s)!!(2p-2s)!!
=\displaystyle= 12psp!s!(ps)!(2s)!2ss!(2p2s)!2ps(ps)!\displaystyle\frac{1}{2^{p}}\sum_{s}\frac{p!}{s!(p-s)!}\cdot\frac{(2s)!}{2^{s}s!}\cdot\frac{(2p-2s)!}{2^{p-s}(p-s)!}
=\displaystyle= p!4ps(2ss)(2p2sps)\displaystyle\frac{p!}{4^{p}}\sum_{s}\binom{2s}{s}\binom{2p-2s}{p-s}

(5) In order to finish now the computation, let us recall that we have the following formula, coming from the generalized binomial formula, or from the Taylor formula:

11+t=k=0(2kk)(t4)k\frac{1}{\sqrt{1+t}}=\sum_{k=0}^{\infty}\binom{2k}{k}\left(\frac{-t}{4}\right)^{k}

By taking the square of this series, we obtain the following formula:

11+t\displaystyle\frac{1}{1+t} =\displaystyle= k,s(2kk)(2ss)(t4)k+s\displaystyle\sum_{k,s}\binom{2k}{k}\binom{2s}{s}\left(\frac{-t}{4}\right)^{k+s}
=\displaystyle= p(t4)ps(2ss)(2p2sps)\displaystyle\sum_{p}\left(\frac{-t}{4}\right)^{p}\sum_{s}\binom{2s}{s}\binom{2p-2s}{p-s}

Now by looking at the coefficient of tpt^{p} on both sides, we conclude that the sum on the right equals 4p4^{p}. Thus, we can finish the moment computation in (4), as follows:

Mp=p!4p×4p=p!M_{p}=\frac{p!}{4^{p}}\times 4^{p}=p!

(6) As a conclusion, if we denote by |k||k| the length of a colored integer k=k=\circ\bullet\bullet\circ\ldots , the moments of the variable cc in the statement are given by:

Mk={(|k|/2)!(kuniform)0(knotuniform)M_{k}=\begin{cases}(|k|/2)!&(k\ {\rm uniform})\\ 0&(k\ {\rm not\ uniform})\end{cases}

On the other hand, the numbers |𝒫2(k)||\mathcal{P}_{2}(k)| are given by exactly the same formula. Indeed, in order to have matching pairings of kk, our exponent k=k=\circ\bullet\bullet\circ\ldots must be uniform, consisting of pp copies of \circ and pp copies of \bullet, with p=|k|/2p=|k|/2. But then the matching pairings of kk correspond to the permutations of the \bullet symbols, as to be matched with \circ symbols, and so we have p!p! such matching pairings. Thus, we have the same formula as for the moments of cc, and we are led to the conclusion in the statement. ∎
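Again as a verification, and nothing more, the uniform moment formula M_k = (|k|/2)! t^{|k|/2} can be tested numerically as follows, with numpy assumed, and with the sample size being our choice:

```python
import numpy as np
from math import factorial

# c = (a + ib)/sqrt(2), with a, b independent g_t: the uniform word moments
# E( c^p conj(c)^p ) should equal p! t^p, and the non-uniform ones vanish.
rng = np.random.default_rng(1)
t, samples = 1.0, 1_000_000
a = rng.normal(0, np.sqrt(t), samples)
b = rng.normal(0, np.sqrt(t), samples)
c = (a + 1j * b) / np.sqrt(2)

for p in (1, 2, 3):
    print(p, round((c ** p * np.conj(c) ** p).mean().real, 2),
          factorial(p) * t ** p)
print("non-uniform:", (c ** 2 * np.conj(c)).mean())   # close to 0
```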

This was for the basic probability theory, which is in a certain sense advanced operator theory, inside the commutative von Neumann algebras, A=L(X)A=L^{\infty}(X). We will be back to this, with some further limiting theorems, in chapter 8 below.

6c. Wigner matrices

Let us exit now the classical world, that of the commutative von Neumann algebras A=L(X)A=L^{\infty}(X), and do as promised some random matrix theory. We recall that a random matrix algebra is a von Neumann algebra of type A=MN(L(X))A=M_{N}(L^{\infty}(X)), and that we are interested in the computation of the laws of the operators TAT\in A, called random matrices. Regarding the precise classes of random matrices that we are interested in, first we have the complex Gaussian matrices, which are constructed as follows:

Definition 6.16.

A complex Gaussian matrix is a random matrix of type

ZMN(L(X))Z\in M_{N}(L^{\infty}(X))

which has i.i.d. complex normal entries.

We will see that the above matrices have an interesting, and “central” combinatorics, among all kinds of random matrices, with the study of the other random matrices being usually obtained as a modification of the study of the Gaussian matrices.


As a somewhat surprising remark, using real normal variables in Definition 6.16, instead of the complex ones appearing there, leads nowhere. The correct real versions of the Gaussian matrices are the Wigner random matrices, constructed as follows:

Definition 6.17.

A Wigner matrix is a random matrix of type

ZMN(L(X))Z\in M_{N}(L^{\infty}(X))

which has i.i.d. complex normal entries, up to the constraint Z=ZZ=Z^{*}.

In other words, a Wigner matrix must be as follows, with the diagonal entries being real normal variables, aigta_{i}\sim g_{t}, for some t>0t>0, the upper diagonal entries being complex normal variables, bijGtb_{ij}\sim G_{t}, the lower diagonal entries being the conjugates of the upper diagonal entries, as indicated, and with all the variables ai,bija_{i},b_{ij} being independent:

Z=(a1b12b1Nb¯12a2aN1bN1,Nb¯1Nb¯N1,NaN)Z=\begin{pmatrix}a_{1}&b_{12}&\ldots&\ldots&b_{1N}\\ \bar{b}_{12}&a_{2}&\ddots&&\vdots\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ \vdots&&\ddots&a_{N-1}&b_{N-1,N}\\ \bar{b}_{1N}&\ldots&\ldots&\bar{b}_{N-1,N}&a_{N}\end{pmatrix}

As a comment here, for many concrete applications the Wigner matrices are in fact the central objects in random matrix theory, and in particular, they are often more important than the Gaussian matrices. In fact, these are the random matrices which were first considered and investigated, a long time ago, by Wigner himself [wig].


Finally, we will be interested as well in the complex Wishart matrices, which are the positive versions of the above random matrices, constructed as follows:

Definition 6.18.

A complex Wishart matrix is a random matrix of type

Z=YYMN(L(X))Z=YY^{*}\in M_{N}(L^{\infty}(X))

with YY being a complex Gaussian matrix.

As before with the Gaussian and Wigner matrices, there are many possible comments that can be made here, of technical or historical nature. First, using real Gaussian variables instead of complex ones leads to a less interesting combinatorics. Also, these matrices were introduced and studied by Marchenko-Pastur not long after Wigner, in [mpa], and so historically came second. Finally, in what regards their combinatorics and applications, these matrices quite often come first, before both the Gaussian and the Wigner ones, with all this being of course a matter of knowledge and taste.


Summarizing, we have three main types of random matrices, which can be somehow designated as “complex”, “real” and “positive”, and that we will study in what follows. Let us also mention that there are many other interesting classes of random matrices, usually appearing as modifications of the above. More on these later.


In order to compute the asymptotic laws of the above matrices, we will use the moment method. We have the following result, which will be our main tool here:

Theorem 6.19.

Given independent variables XiX_{i}, each following the complex normal law GtG_{t}, with t>0t>0 being a fixed parameter, we have the Wick formula

𝔼(Xi1k1Xisks)=ts/2#{π𝒫2(k)|πkeri}\mathbb{E}\left(X_{i_{1}}^{k_{1}}\ldots X_{i_{s}}^{k_{s}}\right)=t^{s/2}\#\left\{\pi\in\mathcal{P}_{2}(k)\Big{|}\pi\leq\ker i\right\}

where k=k1ksk=k_{1}\ldots k_{s} and i=i1isi=i_{1}\ldots i_{s}, for the joint moments of these variables.

Proof.

This is something well-known, and the basis for all possible computations with complex normal variables, which can be proved in two steps, as follows:

(1) Let us first discuss the case where we have a single complex normal variable XX, which amounts to taking Xi=XX_{i}=X for any ii in the formula in the statement. What we have to compute here are the moments of XX, with respect to colored integer exponents k=k=\circ\bullet\bullet\circ\ldots\,, and the formula in the statement tells us that these moments must be:

𝔼(Xk)=t|k|/2|𝒫2(k)|\mathbb{E}(X^{k})=t^{|k|/2}|\mathcal{P}_{2}(k)|

But this is something that we know well from the above, the idea being that at t=1t=1 this follows by doing some combinatorics and calculus, in analogy with the combinatorics and calculus from the real case, where the moment formula is identical, save for the matching pairings 𝒫2\mathcal{P}_{2} being replaced by the usual pairings P2P_{2}, and then that the general case t>0t>0 follows from this, by rescaling. Thus, we are done with this case.

(2) In general now, the point is that we obtain the formula in the statement. Indeed, when expanding the product Xi1k1XisksX_{i_{1}}^{k_{1}}\ldots X_{i_{s}}^{k_{s}} and rearranging the terms, we are left with doing a number of computations as in (1), and then making the product of the expectations that we found. But this amounts precisely to counting the partitions in the statement, with the condition πkeri\pi\leq\ker i there standing precisely for the fact that we are doing the various type (1) computations independently, and then making the product. ∎
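The Wick formula can also be verified by brute force, on small words, by enumerating the matching pairings below ker(i). Here is such a sketch, in Python with numpy, for the word X_1 conj(X_1) X_2 conj(X_2), with all conventions and parameters below being ours:

```python
import numpy as np

# Enumerate matching pairings of a colored word ('o' for X, 'b' for the
# conjugate), keep those below ker(i), and compare t^{s/2} times their
# count with a Monte Carlo estimate of the joint moment.
def matching_pairings(colors):
    def rec(points):
        if not points:
            yield []
            return
        first, rest = points[0], points[1:]
        for j, other in enumerate(rest):
            if colors[first] != colors[other]:        # pair 'o' with 'b'
                for p in rec(rest[:j] + rest[j + 1:]):
                    yield [(first, other)] + p
    return list(rec(tuple(range(len(colors)))))

t = 1.0
colors  = ['o', 'b', 'o', 'b']        # the word X_1 conj(X_1) X_2 conj(X_2)
indices = [1, 1, 2, 2]

count = sum(all(indices[a] == indices[b] for a, b in pi)
            for pi in matching_pairings(colors))
theory = t ** (len(colors) / 2) * count

rng = np.random.default_rng(2)
n = 500_000
X = {i: rng.normal(0, np.sqrt(t / 2), n) + 1j * rng.normal(0, np.sqrt(t / 2), n)
     for i in set(indices)}
prod = np.ones(n, dtype=complex)
for col, idx in zip(colors, indices):
    prod *= X[idx] if col == 'o' else np.conj(X[idx])
print(theory, round(prod.mean().real, 2))             # both close to 1
```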

Now by getting back to the Gaussian matrices, we have the following result, with 𝒩𝒞2(k)=𝒫2(k)NC(k)\mathcal{NC}_{2}(k)=\mathcal{P}_{2}(k)\cap NC(k) standing for the noncrossing pairings of a colored integer kk:

Theorem 6.20.

Given a sequence of Gaussian random matrices

ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X))

having independent GtG_{t} variables as entries, for some fixed t>0t>0, we have

Mk(ZNN)t|k|/2|𝒩𝒞2(k)|M_{k}\left(\frac{Z_{N}}{\sqrt{N}}\right)\simeq t^{|k|/2}|\mathcal{NC}_{2}(k)|

for any colored integer k=k=\circ\bullet\bullet\circ\ldots\,, in the NN\to\infty limit.

Proof.

This is something standard, which can be done as follows:

(1) We fix NN\in\mathbb{N}, and we let Z=ZNZ=Z_{N}. Let us first compute the trace of ZkZ^{k}. With k=k1ksk=k_{1}\ldots k_{s}, and with the convention (ij)=ij,(ij)=ji(ij)^{\circ}=ij,(ij)^{\bullet}=ji, we have:

Tr(Zk)\displaystyle Tr(Z^{k}) =\displaystyle= Tr(Zk1Zks)\displaystyle Tr(Z^{k_{1}}\ldots Z^{k_{s}})
=\displaystyle= i1=1Nis=1N(Zk1)i1i2(Zk2)i2i3(Zks)isi1\displaystyle\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}(Z^{k_{1}})_{i_{1}i_{2}}(Z^{k_{2}})_{i_{2}i_{3}}\ldots(Z^{k_{s}})_{i_{s}i_{1}}
=\displaystyle= i1=1Nis=1N(Z(i1i2)k1)k1(Z(i2i3)k2)k2(Z(isi1)ks)ks\displaystyle\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}(Z_{(i_{1}i_{2})^{k_{1}}})^{k_{1}}(Z_{(i_{2}i_{3})^{k_{2}}})^{k_{2}}\ldots(Z_{(i_{s}i_{1})^{k_{s}}})^{k_{s}}

(2) Next, we rescale our variable ZZ by a N\sqrt{N} factor, as in the statement, and we also replace the usual trace by its normalized version, tr=Tr/Ntr=Tr/N. Our formula becomes:

tr((ZN)k)=1Ns/2+1i1=1Nis=1N(Z(i1i2)k1)k1(Z(i2i3)k2)k2(Z(isi1)ks)kstr\left(\left(\frac{Z}{\sqrt{N}}\right)^{k}\right)=\frac{1}{N^{s/2+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}(Z_{(i_{1}i_{2})^{k_{1}}})^{k_{1}}(Z_{(i_{2}i_{3})^{k_{2}}})^{k_{2}}\ldots(Z_{(i_{s}i_{1})^{k_{s}}})^{k_{s}}

Thus, the moment that we are interested in is given by:

Mk(ZN)=1Ns/2+1i1=1Nis=1NX(Z(i1i2)k1)k1(Z(i2i3)k2)k2(Z(isi1)ks)ksM_{k}\left(\frac{Z}{\sqrt{N}}\right)=\frac{1}{N^{s/2+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}\int_{X}(Z_{(i_{1}i_{2})^{k_{1}}})^{k_{1}}(Z_{(i_{2}i_{3})^{k_{2}}})^{k_{2}}\ldots(Z_{(i_{s}i_{1})^{k_{s}}})^{k_{s}}

(3) Let us apply now the Wick formula, from Theorem 6.19. We conclude that the moment that we are interested in is given by the following formula:

Mk(ZN)\displaystyle M_{k}\left(\frac{Z}{\sqrt{N}}\right)
=\displaystyle= ts/2Ns/2+1i1=1Nis=1N#{π𝒫2(k)|πker((i1i2)k1,(i2i3)k2,,(isi1)ks)}\displaystyle\frac{t^{s/2}}{N^{s/2+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{s}=1}^{N}\#\left\{\pi\in\mathcal{P}_{2}(k)\Big{|}\pi\leq\ker\left((i_{1}i_{2})^{k_{1}},(i_{2}i_{3})^{k_{2}},\ldots,(i_{s}i_{1})^{k_{s}}\right)\right\}
=\displaystyle= ts/2π𝒫2(k)1Ns/2+1#{i{1,,N}s|πker((i1i2)k1,(i2i3)k2,,(isi1)ks)}\displaystyle t^{s/2}\sum_{\pi\in\mathcal{P}_{2}(k)}\frac{1}{N^{s/2+1}}\#\left\{i\in\{1,\ldots,N\}^{s}\Big{|}\pi\leq\ker\left((i_{1}i_{2})^{k_{1}},(i_{2}i_{3})^{k_{2}},\ldots,(i_{s}i_{1})^{k_{s}}\right)\right\}

(4) Our claim now is that in the NN\to\infty limit the combinatorics of the above sum simplifies, with only the noncrossing partitions contributing to the sum, and with each of them contributing precisely with a 1 factor, so that we will have, as desired:

Mk(ZN)\displaystyle M_{k}\left(\frac{Z}{\sqrt{N}}\right) =\displaystyle= ts/2π𝒫2(k)(δπNC2(k)+O(N1))\displaystyle t^{s/2}\sum_{\pi\in\mathcal{P}_{2}(k)}\Big{(}\delta_{\pi\in NC_{2}(k)}+O(N^{-1})\Big{)}
\displaystyle\simeq ts/2π𝒫2(k)δπNC2(k)\displaystyle t^{s/2}\sum_{\pi\in\mathcal{P}_{2}(k)}\delta_{\pi\in NC_{2}(k)}
=\displaystyle= ts/2|𝒩𝒞2(k)|\displaystyle t^{s/2}|\mathcal{NC}_{2}(k)|

(5) In order to prove this, the first observation is that when kk is not uniform, in the sense that it contains a different number of \circ, \bullet symbols, we have 𝒫2(k)=\mathcal{P}_{2}(k)=\emptyset, and so:

Mk(ZN)=ts/2|𝒩𝒞2(k)|=0M_{k}\left(\frac{Z}{\sqrt{N}}\right)=t^{s/2}|\mathcal{NC}_{2}(k)|=0

(6) Thus, we are left with the case where kk is uniform. Let us examine first the case where kk consists of an alternating sequence of \circ and \bullet symbols, as follows:

k=2pk=\underbrace{\circ\bullet\circ\bullet\ldots\ldots\circ\bullet}_{2p}

In this case it is convenient to relabel our multi-index i=(i1,,is)i=(i_{1},\ldots,i_{s}), with s=2ps=2p, in the form (j1,l1,j2,l2,,jp,lp)(j_{1},l_{1},j_{2},l_{2},\ldots,j_{p},l_{p}). With this done, our moment formula becomes:

Mk(ZN)=tpπ𝒫2(k)1Np+1#{j,l{1,,N}p|πker(j1l1,j2l1,j2l2,,j1lp)}M_{k}\left(\frac{Z}{\sqrt{N}}\right)=t^{p}\sum_{\pi\in\mathcal{P}_{2}(k)}\frac{1}{N^{p+1}}\#\left\{j,l\in\{1,\ldots,N\}^{p}\Big{|}\pi\leq\ker\left(j_{1}l_{1},j_{2}l_{1},j_{2}l_{2},\ldots,j_{1}l_{p}\right)\right\}

Now observe that, with kk being as above, we have an identification 𝒫2(k)Sp\mathcal{P}_{2}(k)\simeq S_{p}, obtained in the obvious way. With this done too, our moment formula becomes:

Mk(ZN)=tpπSp1Np+1#{j,l{1,,N}p|jr=jπ(r)+1,lr=lπ(r),r}M_{k}\left(\frac{Z}{\sqrt{N}}\right)=t^{p}\sum_{\pi\in S_{p}}\frac{1}{N^{p+1}}\#\left\{j,l\in\{1,\ldots,N\}^{p}\Big{|}j_{r}=j_{\pi(r)+1},l_{r}=l_{\pi(r)},\forall r\right\}

(7) We are now ready to do our asymptotic study, and prove the claim in (4). Let indeed γSp\gamma\in S_{p} be the full cycle, which is by definition the following permutation:

γ=(1 2p)\gamma=(1\,2\,\ldots\,p)

In terms of γ\gamma, the conditions jr=jπ(r)+1j_{r}=j_{\pi(r)+1} and lr=lπ(r)l_{r}=l_{\pi(r)} found above read:

γπkerj,πkerl\gamma\pi\leq\ker j\quad,\quad\pi\leq\ker l

Counting the number of free parameters in our moment formula, we obtain:

Mk(ZN)=tpNp+1πSpN|π|+|γπ|=tpπSpN|π|+|γπ|p1M_{k}\left(\frac{Z}{\sqrt{N}}\right)=\frac{t^{p}}{N^{p+1}}\sum_{\pi\in S_{p}}N^{|\pi|+|\gamma\pi|}=t^{p}\sum_{\pi\in S_{p}}N^{|\pi|+|\gamma\pi|-p-1}

(8) The point now is that the last exponent is well-known to be 0\leq 0, with equality precisely when the permutation πSp\pi\in S_{p} is geodesic, which in practice means that π\pi must come from a noncrossing partition. Thus we obtain, in the NN\to\infty limit, as desired:

Mk(ZN)tp|𝒩𝒞2(k)|M_{k}\left(\frac{Z}{\sqrt{N}}\right)\simeq t^{p}|\mathcal{NC}_{2}(k)|

This finishes the proof in the case of the exponents kk which are alternating, and the case where kk is an arbitrary uniform exponent is similar, by permuting everything. ∎

As a conclusion to this, we have obtained as asymptotic law for the Gaussian matrices a certain mysterious distribution, having as moments some numbers which are similar to the moments of the usual normal laws, but with the “underlying matching pairings being now replaced by underlying matching noncrossing pairings”. More on this later.
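Here is also a numerical illustration of Theorem 6.20, on two uniform words of length 4: the alternating word ∘•∘• has 2 noncrossing matching pairings, while ∘∘•• has only 1, and the normalized traces see the difference. A sketch with numpy, our parameters being illustrative:

```python
import numpy as np

# For Z with i.i.d. G_t entries, rescaled by sqrt(N): the normalized trace
# of Z Z* Z Z* should approach 2 t^2, that of Z Z Z* Z* should approach t^2.
rng = np.random.default_rng(3)
t, N, samples = 1.0, 400, 100

m_alt = m_blk = 0.0
for _ in range(samples):
    Z = (rng.normal(0, np.sqrt(t / 2), (N, N))
         + 1j * rng.normal(0, np.sqrt(t / 2), (N, N))) / np.sqrt(N)
    Zs = Z.conj().T
    m_alt += np.trace(Z @ Zs @ Z @ Zs).real / N / samples
    m_blk += np.trace(Z @ Z @ Zs @ Zs).real / N / samples

print("word obob:", round(m_alt, 2), "expected", 2 * t ** 2)
print("word oobb:", round(m_blk, 2), "expected", 1 * t ** 2)
```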


Regarding now the Wigner matrices, we have here the following result, coming as a consequence of Theorem 6.20, via some simple algebraic manipulations:

Theorem 6.21.

Given a sequence of Wigner random matrices

ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X))

having independent GtG_{t} variables as entries, with t>0t>0, up to ZN=ZNZ_{N}=Z_{N}^{*}, we have

Mk(ZNN)tk/2|NC2(k)|M_{k}\left(\frac{Z_{N}}{\sqrt{N}}\right)\simeq t^{k/2}|NC_{2}(k)|

for any integer kk\in\mathbb{N}, in the NN\to\infty limit.

Proof.

This can be deduced from a direct computation based on the Wick formula, similar to that from the proof of Theorem 6.20, but the best is to deduce this result from Theorem 6.20 itself. Indeed, we know from there that for Gaussian matrices YNMN(L(X))Y_{N}\in M_{N}(L^{\infty}(X)) we have the following formula, valid for any colored integer K=K=\circ\bullet\bullet\circ\ldots\,, in the NN\to\infty limit, with 𝒩𝒞2\mathcal{NC}_{2} standing for noncrossing matching pairings:

MK(YNN)t|K|/2|𝒩𝒞2(K)|M_{K}\left(\frac{Y_{N}}{\sqrt{N}}\right)\simeq t^{|K|/2}|\mathcal{NC}_{2}(K)|

By doing some combinatorics, we deduce from this that we have the following formula for the moments of the matrices Re(YN)Re(Y_{N}), with respect to usual exponents, kk\in\mathbb{N}:

Mk(Re(YN)N)\displaystyle M_{k}\left(\frac{Re(Y_{N})}{\sqrt{N}}\right) =\displaystyle= 2kMk(YNN+YNN)\displaystyle 2^{-k}\cdot M_{k}\left(\frac{Y_{N}}{\sqrt{N}}+\frac{Y_{N}^{*}}{\sqrt{N}}\right)
=\displaystyle= 2k|K|=kMK(YNN)\displaystyle 2^{-k}\sum_{|K|=k}M_{K}\left(\frac{Y_{N}}{\sqrt{N}}\right)
\displaystyle\simeq 2k|K|=ktk/2|𝒩𝒞2(K)|\displaystyle 2^{-k}\sum_{|K|=k}t^{k/2}|\mathcal{NC}_{2}(K)|
=\displaystyle= 2ktk/22k/2|NC2(k)|\displaystyle 2^{-k}\cdot t^{k/2}\cdot 2^{k/2}|NC_{2}(k)|
=\displaystyle= 2k/2tk/2|NC2(k)|\displaystyle 2^{-k/2}\cdot t^{k/2}|NC_{2}(k)|

Now since the matrices ZN=2Re(YN)Z_{N}=\sqrt{2}Re(Y_{N}) are of Wigner type, this gives the result. ∎

Summarizing, all this brings us into counting noncrossing pairings. So, let us start with some preliminaries here. We first have the following well-known result:

Theorem 6.22.

The Catalan numbers, which are by definition given by

Ck=|NC2(2k)|C_{k}=|NC_{2}(2k)|

satisfy the following recurrence formula, with initial data C0=C1=1C_{0}=C_{1}=1,

Ck+1=a+b=kCaCbC_{k+1}=\sum_{a+b=k}C_{a}C_{b}

their generating series f(z)=k0Ckzkf(z)=\sum_{k\geq 0}C_{k}z^{k} satisfies the equation

zf2f+1=0zf^{2}-f+1=0

and is given by the following explicit formula,

f(z)=114z2zf(z)=\frac{1-\sqrt{1-4z}}{2z}

and we have the following explicit formula for these numbers:

Ck=1k+1(2kk)C_{k}=\frac{1}{k+1}\binom{2k}{k}

Numerically, these numbers are 1,1,2,5,14,42,132,429,1430,4862,16796,1,1,2,5,14,42,132,429,1430,4862,16796,\ldots

Proof.

We must count the noncrossing pairings of {1,,2k}\{1,\ldots,2k\}. Now observe that such a pairing appears by pairing 1 with an even number, 2a+22a+2, and then inserting a noncrossing pairing of {2,,2a+1}\{2,\ldots,2a+1\}, and a noncrossing pairing of {2a+3,,2k}\{2a+3,\ldots,2k\}. We conclude that we have the following recurrence formula for the Catalan numbers:

Ck=a+b=k1CaCbC_{k}=\sum_{a+b=k-1}C_{a}C_{b}

In terms of the generating series f(z)=k0Ckzkf(z)=\sum_{k\geq 0}C_{k}z^{k}, this recurrence formula reads:

zf2\displaystyle zf^{2} =\displaystyle= a,b0CaCbza+b+1\displaystyle\sum_{a,b\geq 0}C_{a}C_{b}z^{a+b+1}
=\displaystyle= k1a+b=k1CaCbzk\displaystyle\sum_{k\geq 1}\sum_{a+b=k-1}C_{a}C_{b}z^{k}
=\displaystyle= k1Ckzk\displaystyle\sum_{k\geq 1}C_{k}z^{k}
=\displaystyle= f1\displaystyle f-1

Thus ff satisfies zf2f+1=0zf^{2}-f+1=0, and by solving this equation, and choosing the solution which is bounded at z=0z=0, we obtain the following formula:

f(z)=114z2zf(z)=\frac{1-\sqrt{1-4z}}{2z}

In order to finish, we use the generalized binomial formula, which gives:

1+t=12k=11k(2k2k1)(t4)k\sqrt{1+t}=1-2\sum_{k=1}^{\infty}\frac{1}{k}\binom{2k-2}{k-1}\left(\frac{-t}{4}\right)^{k}

Now back to our series ff, we obtain the following formula for it:

f(z)\displaystyle f(z) =\displaystyle= 114z2z\displaystyle\frac{1-\sqrt{1-4z}}{2z}
=\displaystyle= k=11k(2k2k1)zk1\displaystyle\sum_{k=1}^{\infty}\frac{1}{k}\binom{2k-2}{k-1}z^{k-1}
=\displaystyle= k=01k+1(2kk)zk\displaystyle\sum_{k=0}^{\infty}\frac{1}{k+1}\binom{2k}{k}z^{k}

It follows that the Catalan numbers are given by:

Ck=1k+1(2kk)C_{k}=\frac{1}{k+1}\binom{2k}{k}

Thus, we are led to the conclusion in the statement. ∎
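In code, the recurrence and the closed formula can be compared as follows, as a trivial sketch, using the Python standard library only:

```python
from math import comb

# Catalan numbers: the recurrence C_{k+1} = sum_{a+b=k} C_a C_b against
# the closed formula C_k = binom(2k,k) / (k+1).
C = [1]
for k in range(10):
    C.append(sum(C[a] * C[k - a] for a in range(k + 1)))

print(C)
print([comb(2 * k, k) // (k + 1) for k in range(11)])
# both: [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796]
```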

In order to recapture now the Wigner measure from its moments, we can use:

Proposition 6.23.

The Catalan numbers are the even moments of

γ1=12π4x2dx\gamma_{1}=\frac{1}{2\pi}\sqrt{4-x^{2}}dx

called standard semicircle law. As for the odd moments of γ1\gamma_{1}, these all vanish.

Proof.

The even moments of the semicircle law in the statement can be computed with the change of variable x=2costx=2\cos t, and we are led to the following formula:

M2k\displaystyle M_{2k} =\displaystyle= 1π024x2x2kdx\displaystyle\frac{1}{\pi}\int_{0}^{2}\sqrt{4-x^{2}}x^{2k}dx
=\displaystyle= 1π0π/244cos2t(2cost)2k2sintdt\displaystyle\frac{1}{\pi}\int_{0}^{\pi/2}\sqrt{4-4\cos^{2}t}\,(2\cos t)^{2k}2\sin t\,dt
=\displaystyle= 4k+1π0π/2cos2ktsin2tdt\displaystyle\frac{4^{k+1}}{\pi}\int_{0}^{\pi/2}\cos^{2k}t\sin^{2}t\,dt
=\displaystyle= 4k+1ππ2(2k)!!2!!(2k+3)!!\displaystyle\frac{4^{k+1}}{\pi}\cdot\frac{\pi}{2}\cdot\frac{(2k)!!2!!}{(2k+3)!!}
=\displaystyle= 24k(2k)!/2kk!2k+1(k+1)!\displaystyle 2\cdot 4^{k}\cdot\frac{(2k)!/2^{k}k!}{2^{k+1}(k+1)!}
=\displaystyle= Ck\displaystyle C_{k}

As for the odd moments, these all vanish, because the density of γ1\gamma_{1} is an even function. Thus, we are led to the conclusion in the statement. ∎

More generally, we have the following result, involving a parameter t>0t>0:

Proposition 6.24.

Given t>0t>0, the real measure having as even moments the numbers M2k=tkCkM_{2k}=t^{k}C_{k} and having all odd moments 0 is the measure

γt=12πt4tx2dx\gamma_{t}=\frac{1}{2\pi t}\sqrt{4t-x^{2}}dx

called Wigner semicircle law on [2t,2t][-2\sqrt{t},2\sqrt{t}].

Proof.

This follows indeed from Proposition 6.23, via a change of variables. ∎

Now by putting everything together, we obtain the Wigner theorem, as follows:

Theorem 6.25.

Given a sequence of Wigner random matrices

ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X))

which by definition have i.i.d. complex normal entries, up to ZN=ZNZ_{N}=Z_{N}^{*}, we have

ZNγtZ_{N}\sim\gamma_{t}

in the NN\to\infty limit, where γt=12πt4tx2dx\gamma_{t}=\frac{1}{2\pi t}\sqrt{4t-x^{2}}dx is the Wigner semicircle law.

Proof.

This follows indeed from all the above, and more specifically, by combining Theorem 6.21, Theorem 6.22 and Proposition 6.24. ∎
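As usual, the skeptical reader can test all this on the computer, for instance as follows, with numpy assumed, by diagonalizing one large Wigner matrix, the parameters being again our illustrative choices:

```python
import numpy as np
from math import comb

def catalan(p):
    return comb(2 * p, p) // (p + 1)

# Even moments of a rescaled Wigner matrix versus those of gamma_t,
# namely t^{k/2} C_{k/2}, the odd moments vanishing.
rng = np.random.default_rng(4)
t, N = 1.0, 2000

Y = (rng.normal(0, np.sqrt(t / 2), (N, N))
     + 1j * rng.normal(0, np.sqrt(t / 2), (N, N)))
Z = (Y + Y.conj().T) / np.sqrt(2)            # Z = Z*, Wigner of parameter t
eig = np.linalg.eigvalsh(Z / np.sqrt(N))

for k in range(1, 7):
    theory = t ** (k // 2) * catalan(k // 2) if k % 2 == 0 else 0
    print(k, round((eig ** k).mean(), 2), theory)
```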

Regarding now the complex Gaussian matrices, in view of this result, it is natural to think of the law found in Theorem 6.20 as being “circular”. But this is just a thought, and more on this later, in chapter 8 below, when doing free probability.

6d. Wishart matrices

Let us discuss now the Wishart matrices, which are the positive analogues of the Wigner matrices. Quite surprisingly, the computation here leads to the Catalan numbers, but not in the same way as for the Wigner matrices, the result being as follows:

Theorem 6.26.

Given a sequence of complex Wishart matrices

WN=YNYNMN(L(X))W_{N}=Y_{N}Y_{N}^{*}\in M_{N}(L^{\infty}(X))

with YNY_{N} being N×NN\times N complex Gaussian of parameter t>0t>0, we have

Mk(WNN)tkCkM_{k}\left(\frac{W_{N}}{N}\right)\simeq t^{k}C_{k}

for any exponent kk\in\mathbb{N}, in the NN\to\infty limit.

Proof.

There are several possible proofs for this result, as follows:

(1) A first method is by using the formula that we have in Theorem 6.20, for the Gaussian matrices YNY_{N}. Indeed, we know from there that we have the following formula, valid for any colored integer K=K=\circ\bullet\bullet\circ\ldots\,, in the NN\to\infty limit:

MK(YNN)t|K|/2|𝒩𝒞2(K)|M_{K}\left(\frac{Y_{N}}{\sqrt{N}}\right)\simeq t^{|K|/2}|\mathcal{NC}_{2}(K)|

With K=K=\circ\bullet\circ\bullet\ldots\,, alternating word of length 2k2k, with kk\in\mathbb{N}, this gives:

Mk(YNYNN)tk|𝒩𝒞2(K)|M_{k}\left(\frac{Y_{N}Y_{N}^{*}}{N}\right)\simeq t^{k}|\mathcal{NC}_{2}(K)|

Thus, in terms of the Wishart matrix WN=YNYNW_{N}=Y_{N}Y_{N}^{*} we have, for any kk\in\mathbb{N}:

Mk(WNN)tk|𝒩𝒞2(K)|M_{k}\left(\frac{W_{N}}{N}\right)\simeq t^{k}|\mathcal{NC}_{2}(K)|

The point now is that, by doing some combinatorics, we have:

|𝒩𝒞2(K)|=|NC2(2k)|=Ck|\mathcal{NC}_{2}(K)|=|NC_{2}(2k)|=C_{k}

Thus, we are led to the formula in the statement.

(2) A second method, that we will explain now as well, is by proving the result directly, starting from definitions. The matrix entries of our matrix W=WNW=W_{N} are given by:

Wij=r=1NYirY¯jrW_{ij}=\sum_{r=1}^{N}Y_{ir}\bar{Y}_{jr}

Thus, the normalized traces of powers of WW are given by the following formula:

tr(Wk)\displaystyle tr(W^{k}) =\displaystyle= 1Ni1=1Nik=1NWi1i2Wi2i3Wiki1\displaystyle\frac{1}{N}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}W_{i_{1}i_{2}}W_{i_{2}i_{3}}\ldots W_{i_{k}i_{1}}
=\displaystyle= 1Ni1=1Nik=1Nr1=1Nrk=1NYi1r1Y¯i2r1Yi2r2Y¯i3r2YikrkY¯i1rk\displaystyle\frac{1}{N}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}\sum_{r_{1}=1}^{N}\ldots\sum_{r_{k}=1}^{N}Y_{i_{1}r_{1}}\bar{Y}_{i_{2}r_{1}}Y_{i_{2}r_{2}}\bar{Y}_{i_{3}r_{2}}\ldots Y_{i_{k}r_{k}}\bar{Y}_{i_{1}r_{k}}

By rescaling now WW by a 1/N1/N factor, as in the statement, we obtain:

tr((WN)k)=1Nk+1i1=1Nik=1Nr1=1Nrk=1NYi1r1Y¯i2r1Yi2r2Y¯i3r2YikrkY¯i1rktr\left(\left(\frac{W}{N}\right)^{k}\right)=\frac{1}{N^{k+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}\sum_{r_{1}=1}^{N}\ldots\sum_{r_{k}=1}^{N}Y_{i_{1}r_{1}}\bar{Y}_{i_{2}r_{1}}Y_{i_{2}r_{2}}\bar{Y}_{i_{3}r_{2}}\ldots Y_{i_{k}r_{k}}\bar{Y}_{i_{1}r_{k}}

By using now the Wick rule, we obtain the following formula for the moments, with K=K=\circ\bullet\circ\bullet\ldots\,, alternating word of length 2k2k, and with I=(i1r1,i2r1,,ikrk,i1rk)I=(i_{1}r_{1},i_{2}r_{1},\ldots,i_{k}r_{k},i_{1}r_{k}):

Mk(WN)\displaystyle M_{k}\left(\frac{W}{N}\right) =\displaystyle= tkNk+1i1=1Nik=1Nr1=1Nrk=1N#{π𝒫2(K)|πker(I)}\displaystyle\frac{t^{k}}{N^{k+1}}\sum_{i_{1}=1}^{N}\ldots\sum_{i_{k}=1}^{N}\sum_{r_{1}=1}^{N}\ldots\sum_{r_{k}=1}^{N}\#\left\{\pi\in\mathcal{P}_{2}(K)\Big{|}\pi\leq\ker(I)\right\}
=\displaystyle= tkNk+1π𝒫2(K)#{i,r{1,,N}k|πker(I)}\displaystyle\frac{t^{k}}{N^{k+1}}\sum_{\pi\in\mathcal{P}_{2}(K)}\#\left\{i,r\in\{1,\ldots,N\}^{k}\Big{|}\pi\leq\ker(I)\right\}

In order to compute this quantity, we use the standard bijection 𝒫2(K)Sk\mathcal{P}_{2}(K)\simeq S_{k}. By identifying the pairings π𝒫2(K)\pi\in\mathcal{P}_{2}(K) with their counterparts πSk\pi\in S_{k}, we obtain:

Mk(WN)\displaystyle M_{k}\left(\frac{W}{N}\right) =\displaystyle= tkNk+1πSk#{i,r{1,,N}k|is=iπ(s)+1,rs=rπ(s),s}\displaystyle\frac{t^{k}}{N^{k+1}}\sum_{\pi\in S_{k}}\#\left\{i,r\in\{1,\ldots,N\}^{k}\Big{|}i_{s}=i_{\pi(s)+1},r_{s}=r_{\pi(s)},\forall s\right\}

Now let γSk\gamma\in S_{k} be the full cycle, which is by definition the following permutation:

γ=(1 2k)\gamma=(1\,2\,\ldots\,k)

The general factor in the product computed above is then 1 precisely when the following two conditions are simultaneously satisfied:

γπkeri,πkerr\gamma\pi\leq\ker i\quad,\quad\pi\leq\ker r

Counting the number of free parameters in our moment formula, we obtain:

Mk(WN)=tkπSkN|π|+|γπ|k1M_{k}\left(\frac{W}{N}\right)=t^{k}\sum_{\pi\in S_{k}}N^{|\pi|+|\gamma\pi|-k-1}

The point now is that the last exponent is well-known to be 0\leq 0, with equality precisely when the permutation πSk\pi\in S_{k} is geodesic, which in practice means that π\pi must come from a noncrossing partition. Thus we obtain, in the NN\to\infty limit:

Mk(WN)tkCkM_{k}\left(\frac{W}{N}\right)\simeq t^{k}C_{k}

Thus, we are led to the conclusion in the statement. ∎
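Numerically, the above theorem can be illustrated as follows, again as a rough sketch, with numpy assumed, and with the parameters being our choices:

```python
import numpy as np
from math import comb

# Moments of W_N / N, with W_N = Y_N Y_N^* complex Wishart of parameter t,
# against t^k C_k, via the eigenvalues of one large sample.
rng = np.random.default_rng(5)
t, N = 1.0, 2000

Y = (rng.normal(0, np.sqrt(t / 2), (N, N))
     + 1j * rng.normal(0, np.sqrt(t / 2), (N, N)))
eig = np.linalg.eigvalsh(Y @ Y.conj().T) / N

for k in range(1, 5):
    print(k, round((eig ** k).mean(), 2), t ** k * (comb(2 * k, k) // (k + 1)))
```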

As a consequence of the above result, we have a new look on the Catalan numbers, which is more adapted to our present Wishart matrix considerations, as follows:

Proposition 6.27.

The Catalan numbers Ck=|NC2(2k)|C_{k}=|NC_{2}(2k)| appear as well as

Ck=|NC(k)|C_{k}=|NC(k)|

where NC(k)NC(k) is the set of all noncrossing partitions of {1,,k}\{1,\ldots,k\}.

Proof.

This follows indeed from the proof of Theorem 6.26. Observe that we obtain as well a formula in terms of matching pairings of alternating colored integers. ∎

The direct explanation for the above formula, relating noncrossing partitions and pairings, comes from the following result, which is very useful, and good to know:

Proposition 6.28.

We have a bijection between noncrossing partitions and pairings

NC(k)NC2(2k)NC(k)\simeq NC_{2}(2k)

which is constructed as follows:

  1. (1)

    The application NC(k)NC2(2k)NC(k)\to NC_{2}(2k) is the “fattening” one, obtained by doubling all the legs, and doubling all the strings as well.

  2. (2)

    Its inverse NC2(2k)NC(k)NC_{2}(2k)\to NC(k) is the “shrinking” application, obtained by collapsing pairs of consecutive neighbors.

Proof.

The fact that the two operations in the statement are indeed inverse to each other is clear, by computing the corresponding two compositions, with the remark that the construction of the fattening operation requires the partitions to be noncrossing. ∎
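The fattening operation is easy to implement, and here is a quick sketch of it, with the encoding conventions, doubling each point i into 2i-1, 2i and reading each block cyclically, being ours:

```python
from itertools import combinations

# Fattening NC(k) -> NC_2(2k): a block {i1 < ... < im} becomes the pairs
# (2*i1, 2*i2 - 1), ..., (2*im, 2*i1 - 1), after doubling all the points.
def fatten(blocks):
    pairs = []
    for block in blocks:
        b = sorted(block)
        for i, j in zip(b, b[1:] + b[:1]):
            pairs.append(tuple(sorted((2 * i, 2 * j - 1))))
    return sorted(pairs)

def is_noncrossing(pairing):
    return not any(a < c < b < d or c < a < d < b
                   for (a, b), (c, d) in combinations(pairing, 2))

pi = [{1, 4}, {2, 3}, {5}]          # a noncrossing partition of {1,...,5}
fat = fatten(pi)
print(fat, is_noncrossing(fat))     # a noncrossing pairing of {1,...,10}
```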

Getting back now to probability, we are led to the question of finding the law having the Catalan numbers as moments, in the above way. The result here is as follows:

Proposition 6.29.

The real measure having the Catalan numbers as moments is

π1=12π4x11dx\pi_{1}=\frac{1}{2\pi}\sqrt{4x^{-1}-1}\,dx

called Marchenko-Pastur law of parameter 11.

Proof.

The moments of the law π1\pi_{1} in the statement can be computed with the change of variable x=4cos2tx=4\cos^{2}t, as follows:

Mk\displaystyle M_{k} =\displaystyle= 12π044x11xkdx\displaystyle\frac{1}{2\pi}\int_{0}^{4}\sqrt{4x^{-1}-1}\,x^{k}dx
=\displaystyle= 12π0π/2sintcost(4cos2t)k8costsintdt\displaystyle\frac{1}{2\pi}\int_{0}^{\pi/2}\frac{\sin t}{\cos t}\cdot(4\cos^{2}t)^{k}\cdot 8\cos t\sin t\,dt
=\displaystyle= 4k+1π0π/2cos2ktsin2tdt\displaystyle\frac{4^{k+1}}{\pi}\int_{0}^{\pi/2}\cos^{2k}t\sin^{2}t\,dt
=\displaystyle= 4k+1ππ2(2k)!!2!!(2k+3)!!\displaystyle\frac{4^{k+1}}{\pi}\cdot\frac{\pi}{2}\cdot\frac{(2k)!!2!!}{(2k+3)!!}
=\displaystyle= 24k(2k)!/2kk!2k+1(k+1)!\displaystyle 2\cdot 4^{k}\cdot\frac{(2k)!/2^{k}k!}{2^{k+1}(k+1)!}
=\displaystyle= Ck\displaystyle C_{k}

Thus, we are led to the conclusion in the statement. ∎

Now back to the Wishart matrices, we are led to the following result:

Theorem 6.30.

Given a sequence of complex Wishart matrices

WN=YNYNMN(L(X))W_{N}=Y_{N}Y_{N}^{*}\in M_{N}(L^{\infty}(X))

with YNY_{N} being N×NN\times N complex Gaussian of parameter t>0t>0, we have

WNtN12π4x11dx\frac{W_{N}}{tN}\sim\frac{1}{2\pi}\sqrt{4x^{-1}-1}\,dx

with NN\to\infty, with the limiting measure being the Marchenko-Pastur law π1\pi_{1}.

Proof.

This follows indeed from Theorem 6.26 and Proposition 6.29. ∎

As a comment now, while the above result is definitely something interesting at t=1t=1, at general t>0t>0 this looks more like a “fake” generalization of the t=1t=1 result, because the law π1\pi_{1} stays the same, modulo a trivial rescaling. The reasons behind this phenomenon are quite subtle, and skipping some discussion, the point is that Theorem 6.30 is indeed something “fake” at general t>0t>0, and the correct generalization of the t=1t=1 computation, involving more general classes of complex Wishart matrices, is as follows:

Theorem 6.31.

Given a sequence of general complex Wishart matrices

WN=YNYNMN(L(X))W_{N}=Y_{N}Y_{N}^{*}\in M_{N}(L^{\infty}(X))

with YNY_{N} being N×MN\times M complex Gaussian of parameter 11, we have

WNNmax(1t,0)δ0+4t(x1t)22πxdx\frac{W_{N}}{N}\sim\max(1-t,0)\delta_{0}+\frac{\sqrt{4t-(x-1-t)^{2}}}{2\pi x}\,dx

with M=tNM=tN\to\infty, with the limiting measure being the Marchenko-Pastur law πt\pi_{t}.

Proof.

This follows once again by using the moment method, the limiting moments in the M=tNM=tN\to\infty regime being as follows, after doing the combinatorics:

Mk(WNN)πNC(k)t|π|M_{k}\left(\frac{W_{N}}{N}\right)\simeq\sum_{\pi\in NC(k)}t^{|\pi|}

But these numbers are the moments of the Marchenko-Pastur law πt\pi_{t}, which in addition has the density given by the formula in the statement, and this gives the result. ∎
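Here is finally a numerical sketch for these general Wishart matrices, with numpy assumed, comparing the first three moments of W_N/N with the sums over NC(k) of t^{|π|}, worked out by hand for k ≤ 3:

```python
import numpy as np

# Rectangular N x M Gaussian matrix of parameter 1, with M = tN: the
# moments of W_N / N against sum over NC(k) of t^|pi|, which is
# t, t + t^2, t + 3t^2 + t^3 for k = 1, 2, 3.
rng = np.random.default_rng(6)
t, N = 2.0, 1500
M = int(t * N)

Y = (rng.normal(0, np.sqrt(0.5), (N, M))
     + 1j * rng.normal(0, np.sqrt(0.5), (N, M)))
eig = np.linalg.eigvalsh(Y @ Y.conj().T) / N

print(round(eig.mean(), 2), t)
print(round((eig ** 2).mean(), 2), t + t ** 2)
print(round((eig ** 3).mean(), 2), t + 3 * t ** 2 + t ** 3)
```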

As a philosophical conclusion now, we have 4 main laws in what we have been doing so far, namely the Gaussian laws gtg_{t}, the Poisson laws ptp_{t}, the Wigner laws γt\gamma_{t} and the Marchenko-Pastur laws πt\pi_{t}. These laws naturally form a diagram, as follows:

πt --- γt
 |      |
pt --- gt

We will see in chapter 8 that πt,γt\pi_{t},\gamma_{t} appear as “free analogues” of pt,gtp_{t},g_{t}, and that a full theory can be developed, with central limiting theorems for all 4 laws, convolution semigroup results for all 4 laws too, and Lie group type results for all 4 laws too. And also, we will be back to the random matrices as well, with further results about them.

6e. Exercises

There has been a lot of non-trivial combinatorics and calculus in this chapter, sometimes only briefly explained, and as an exercise on all this, we have:

Exercise 6.32.

Clarify all the details in connection with the Wigner and Marchenko-Pastur computations, first at t=1t=1, and then for general t>0t>0.

As before, these are things discussed in the above, but only briefly, this whole chapter having been just a modest introduction to the exciting subject of random matrices. In the hope that you will find some time, and do the exercise.

Chapter 7 Quantum spaces

7a. Gelfand theorem

We have seen that the von Neumann algebras AB(H)A\subset B(H) are interesting objects, and it is tempting to go ahead with a systematic study of such algebras. This is what Murray and von Neumann did, when first coming across such algebras, back in the 1930s, in their series of papers [mv1], [mv2], [mv3], [vn1], [vn2], [vn3]. In what concerns us, we will rather keep this material for later, and talk instead, in this chapter and in the next one, of things which are perhaps more basic, motivated by the following definition:

Definition 7.1.

Given a von Neumann algebra AB(H)A\subset B(H), coming with a faithful positive unital trace tr:Atr:A\to\mathbb{C}, we write

A=L(X)A=L^{\infty}(X)

and call XX a quantum probability space. We also write the trace as tr=Xtr=\int_{X}, and call it integration with respect to the uniform measure on XX.

Obviously, this is something exciting, and we have seen how some interesting theory can be developed along these lines in the simplest case, that of the random matrix algebras. Thus, all this needs a better understanding, before going ahead with the above-mentioned Murray-von Neumann theory. In order to get started, here are a few comments:


(1) Generally speaking, all this comes from the fact that the commutative von Neumann algebras are those of the form A=L(X)A=L^{\infty}(X), with XX being a measured space. Since in the finite measure case, μ(X)<\mu(X)<\infty, the integration can be regarded as being a faithful positive unital trace tr:L(X)tr:L^{\infty}(X)\to\mathbb{C}, we are basically led to Definition 7.1.


(2) Regarding our assumption μ(X)<\mu(X)<\infty, making the integration tr:Atr:A\to\mathbb{C} bounded, this is something advanced, coming from deep classification results of von Neumann and Connes, which roughly state that “modulo classical measure theory, the study of the quantum measured spaces XX basically reduces to the case μ(X)<\mu(X)<\infty”.


(3) Finally, the traciality of tr:Atr:A\to\mathbb{C} is something advanced too, again coming from those classification results of von Neumann and Connes, which in their more precise formulation state that “modulo classical measure theory, the study of the quantum measured spaces XX basically reduces to the case where μ(X)<\mu(X)<\infty, and X\int_{X} is a trace”.


In short, complicated all this, and you will have to trust me here. Moving ahead now, there is one more thing to be discussed in connection with Definition 7.1, and this is physics. Let me formulate here the question that you surely have in mind:

Question 7.2.

As physicists we already agreed, without clear evidence, that our operators T:HHT:H\to H should be bounded. But what about quantum spaces, is it a good idea to assume that these are as above, of finite mass, and with tracial integration?

Well, this is certainly an interesting question. In favor of my choice, I would argue that the mathematical physics of Jones [jo1], [jo2], [jo3], [jo5], [jo6] and Voiculescu [vo1], [vo2], [vdn] needs a trace tr:Atr:A\to\mathbb{C}, as above. And the same goes for certain theoretical physics continuations of the main work of Connes [co3], as for instance the basic theory of the Standard Model spectral triple of Chamseddine-Connes, whose free gauge group has tracial Haar integration. Needless to say, all this is quite subjective. But hey, question of theoretical physics you asked, answer of theoretical physics is what you get.


Hang on, we are not done yet. Now that we are convinced that Definition 7.1 is the correct one, be that on mathematical or physical grounds, let us look for examples. And here the situation is quite grim, because even in the classical case, we have:

Fact 7.3.

The measure on a classical measured space XX cannot come out of nowhere, and is usually a Haar measure, appearing by theorem. Thus, in our picture

AB(H)A\subset B(H)

both the Hilbert space H=L2(X)H=L^{2}(X) and the von Neumann algebra A=L(X)A=L^{\infty}(X) should appear by theorem, not by definition, contrary to what Definition 7.1 says.

To be more precise, in what regards the first assertion, this is certainly the case with simple objects like Lie groups, or spheres and other homogeneous spaces. Of course you might say that [0,1][0,1] with the uniform measure is a measured space, but isn’t [0,1][0,1] obtained by cutting the Lie group \mathbb{R}, with its Haar measure. And the same goes with [0,1][0,1] with an arbitrary measure f(x)dxf(x)dx, or with [0,1][0,1] being deformed into a curve, and so on, because that dxdx, or what is left from it, will always refer to the Haar measure of \mathbb{R}.


As for the second assertion, nothing much to comment here, mathematics has spoken. So, getting back now to Definition 7.1 as it is, looks like we have two dead bodies there, the Hilbert space HH and the operator algebra AA. So let us try to get rid of at least one of them. But which? In the lack of any obvious idea, let us turn to physics:

Question 7.4.

In quantum mechanics, which came first, the Hilbert space HH, or the operator algebra AA?

Unfortunately this question is as difficult as the one regarding the chicken and the egg. A look at what various physicists said on this matter, in a direct or indirect way, does not help much, and by the end of the day we are left with guidelines like “no one understands quantum mechanics” (Feynman), “shut up and compute” (Dirac) and so on. And all this, coming on top of what has already been said on Definition 7.1, of rather unclear nature, is probably too much. That is, the last drop, time to conclude:

Conclusion 7.5.

The theory of von Neumann algebras has the same peculiarity as quantum mechanics: it tends to self-destruct, when approached axiomatically.

And we will take this as good news, providing us with warm evidence that the theory of von Neumann algebras is indeed related to quantum mechanics. This is what matters, being on the right track, and difficulties and all the rest, we won’t be scared by them.


Back to business now, in practice, we must go back to chapter 5, and examine what we were saying right before introducing the von Neumann algebras. And at that time, we were talking about general operator algebras AB(H)A\subset B(H), closed with respect to the norm, but not necessarily with respect to the weak topology. But this suggests formulating the following definition, somewhat as a purely mathematical answer to Question 7.4:

Definition 7.6.

A CC^{*}-algebra is a complex algebra AA, given with:

  1. (1)

    A norm a||a||a\to||a||, making it into a Banach algebra.

  2. (2)

    An involution aaa\to a^{*}, related to the norm by the formula ||aa||=||a||2||aa^{*}||=||a||^{2}.

Here by Banach algebra we mean a complex algebra with a norm satisfying all the conditions for a vector space norm, along with ||ab||||a||||b||||ab||\leq||a||\cdot||b|| and ||1||=1||1||=1, and which is such that our algebra is complete, in the sense that the Cauchy sequences converge. As for the involution, this must be antilinear, antimultiplicative, and satisfying a=aa^{**}=a.


As basic examples, we have the operator algebra B(H)B(H), for any Hilbert space HH, and more generally, the norm closed *-subalgebras AB(H)A\subset B(H). It is possible to prove that any CC^{*}-algebra appears in this way, but this is a non-trivial result, called GNS theorem, and more on this later. Note in passing that this result tells us that there is no need to memorize the above axioms for the CC^{*}-algebras, because these are simply the obvious things that can be said about B(H)B(H), and its norm closed *-subalgebras AB(H)A\subset B(H).


As a second class of basic examples, which are of great interest for us, we have:

Proposition 7.7.

If XX is a compact space, the algebra C(X)C(X) of continuous functions f:Xf:X\to\mathbb{C} is a CC^{*}-algebra, with the usual norm and involution, namely:

||f||=supxX|f(x)|,f(x)=f(x)¯||f||=\sup_{x\in X}|f(x)|\quad,\quad f^{*}(x)=\overline{f(x)}

This algebra is commutative, in the sense that fg=gffg=gf, for any f,gC(X)f,g\in C(X).

Proof.

All this is clear from definitions. Observe that we have indeed:

||ff||=supxX|f(x)|2=||f||2||ff^{*}||=\sup_{x\in X}|f(x)|^{2}=||f||^{2}

Thus, the axioms are satisfied, and finally fg=gffg=gf is clear. ∎

In general, the CC^{*}-algebras can be thought of as being algebras of operators, over some Hilbert space which is not present. By using this philosophy, one can emulate spectral theory in this setting, with extensions of the various results from chapters 3,5:

Theorem 7.8.

Given an element aAa\in A of a CC^{*}-algebra, define its spectrum as:

σ(a)={λ|aλA1}\sigma(a)=\left\{\lambda\in\mathbb{C}\Big{|}a-\lambda\notin A^{-1}\right\}

The following spectral theory results hold, exactly as in the A=B(H)A=B(H) case:

  1. (1)

    We have σ(ab){0}=σ(ba){0}\sigma(ab)\cup\{0\}=\sigma(ba)\cup\{0\}.

  2. (2)

    We have polynomial, rational and holomorphic calculus.

  3. (3)

    As a consequence, the spectra are compact and non-empty.

  4. (4)

    The spectra of unitaries (u=u1)(u^{*}=u^{-1}) and self-adjoints (a=a)(a=a^{*}) are on 𝕋,\mathbb{T},\mathbb{R}.

  5. (5)

    The spectral radius of normal elements (aa=aa)(aa^{*}=a^{*}a) is given by ρ(a)=||a||\rho(a)=||a||.

In addition, assuming aABa\in A\subset B, the spectra of aa with respect to AA and to BB coincide.

Proof.

This is something that we know from chapter 3, in the case A=B(H)A=B(H), and then from chapter 5, in the case AB(H)A\subset B(H). In general, the proof is similar:

(1) Regarding the assertions (1-5), which are of course formulated a bit informally, the proofs here are perfectly similar to those for the full operator algebra A=B(H)A=B(H). All this is standard material, and in fact, things in chapter 3 were written in such a way that their extension now, to the general CC^{*}-algebra setting, is obvious.

(2) Regarding the last assertion, we know this from chapter 5 for ABB(H)A\subset B\subset B(H), and the proof in general is similar. Indeed, the inclusion σB(a)σA(a)\sigma_{B}(a)\subset\sigma_{A}(a) is clear. For the converse, assume aλB1a-\lambda\in B^{-1}, and consider the following self-adjoint element:

b=(aλ)(aλ)b=(a-\lambda)^{*}(a-\lambda)

The difference between the two spectra of bABb\in A\subset B is then given by:

σA(b)σB(b)={μσB(b)|(bμ)1BA}\sigma_{A}(b)-\sigma_{B}(b)=\left\{\mu\in\mathbb{C}-\sigma_{B}(b)\Big{|}(b-\mu)^{-1}\in B-A\right\}

Thus this difference is an open subset of \mathbb{C}. On the other hand bb being self-adjoint, its two spectra are both real, and so is their difference. Thus the two spectra of bb are equal, and in particular bb is invertible in AA, and so aλA1a-\lambda\in A^{-1}, as desired. ∎

We can now get back to the commutative CC^{*}-algebras, and we have the following result, due to Gelfand, which will be of crucial importance for us:

Theorem 7.9.

The commutative CC^{*}-algebras are exactly the algebras of the form

A=C(X)A=C(X)

with the “spectrum” XX of such an algebra being the space of characters χ:A\chi:A\to\mathbb{C}, with topology making continuous the evaluation maps eva:χχ(a)ev_{a}:\chi\to\chi(a).

Proof.

This is something that we basically know from chapter 5, but always good to talk about it again. Given a commutative CC^{*}-algebra AA, we can define XX as in the statement. Then XX is compact, and aevaa\to ev_{a} is a morphism of algebras, as follows:

ev:AC(X)ev:A\to C(X)

(1) We first prove that evev is involutive. We use the following formula, which is similar to the z=Re(z)+iIm(z)z=Re(z)+iIm(z) formula for the usual complex numbers:

a=a+a2+iaa2ia=\frac{a+a^{*}}{2}+i\cdot\frac{a-a^{*}}{2i}

Thus it is enough to prove the equality eva=evaev_{a^{*}}=ev_{a}^{*} for self-adjoint elements aa. But this is the same as proving that a=aa=a^{*} implies that evaev_{a} is a real function, which is in turn true, because eva(χ)=χ(a)ev_{a}(\chi)=\chi(a) is an element of σ(a)\sigma(a), contained in \mathbb{R}.

(2) Since AA is commutative, each element is normal, so evev is isometric:

||eva||=ρ(a)=||a||||ev_{a}||=\rho(a)=||a||

(3) It remains to prove that evev is surjective. But this follows from the Stone-Weierstrass theorem, because ev(A)ev(A) is a closed subalgebra of C(X)C(X), which separates the points. ∎

In view of the Gelfand theorem, we can formulate the following key definition:

Definition 7.10.

Given an arbitrary CC^{*}-algebra AA, we write

A=C(X)A=C(X)

and call XX a compact quantum space.

This might look like something informal, but it is not. Indeed, we can define the category of compact quantum spaces to be the category of the CC^{*}-algebras, with the arrows reversed. When AA is commutative, the above space XX exists indeed, as a Gelfand spectrum, X=Spec(A)X=Spec(A). In general, XX is something rather abstract, and our philosophy here will be that of studying of course AA, but formulating our results in terms of XX. For instance whenever we have a morphism Φ:AB\Phi:A\to B, we will write A=C(X),B=C(Y)A=C(X),B=C(Y), and rather speak of the corresponding morphism ϕ:YX\phi:Y\to X. And so on.


Technically speaking, we will see later that the above formalism has its limitations, and needs a fix. To be more precise, when looking at compact quantum spaces having a probability measure, there are more of them in the sense of Definition 7.10, than in the von Neumann algebra sense. Thus, all this needs a fix. But more on this later.


As a first concrete consequence of the Gelfand theorem, we have:

Proposition 7.11.

Assume that aAa\in A is normal, and let fC(σ(a))f\in C(\sigma(a)).

  1. (1)

    We can define f(a)Af(a)\in A, with ff(a)f\to f(a) being a morphism of CC^{*}-algebras.

  2. (2)

    We have the “continuous functional calculus” formula σ(f(a))=f(σ(a))\sigma(f(a))=f(\sigma(a)).

Proof.

Since aa is normal, the CC^{*}-algebra <a><a> that it generates is commutative, so if we denote by XX the space formed by the characters χ:<a>\chi:<a>\to\mathbb{C}, we have:

<a>=C(X)<a>=C(X)

Now since the map Xσ(a)X\to\sigma(a) given by evaluation at aa is bijective, we obtain:

<a>=C(σ(a))<a>=C(\sigma(a))

Thus, we are dealing with usual functions, and this gives all the assertions. ∎
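In finite dimensions the continuous functional calculus is something very concrete, namely applying f to the eigenvalues, in a spectral decomposition. Here is a sketch with numpy, for f(z) = √z applied to a positive matrix, our example being random:

```python
import numpy as np

# f(a) for a normal: diagonalize, apply f to the eigenvalues, conjugate
# back. With f = sqrt on a positive matrix, f(a) is the positive square
# root, and sigma(f(a)) = f(sigma(a)).
rng = np.random.default_rng(7)
c = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
a = c @ c.conj().T                         # positive, hence normal

vals, vecs = np.linalg.eigh(a)             # a = V D V*
vals = np.maximum(vals, 0.0)               # guard against round-off
f_a = vecs @ np.diag(np.sqrt(vals)) @ vecs.conj().T

print(np.allclose(f_a @ f_a, a))                            # f(a)^2 = a
print(np.allclose(np.linalg.eigvalsh(f_a), np.sqrt(vals)))  # spectra match
```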

As another consequence of the Gelfand theorem, we have:

Proposition 7.12.

For a normal element aAa\in A, the following are equivalent:

  1. (1)

    aa is positive, in the sense that σ(a)[0,)\sigma(a)\subset[0,\infty).

  2. (2)

    a=b2a=b^{2}, for some bAb\in A satisfying b=bb=b^{*}.

  3. (3)

    a=cca=cc^{*}, for some cAc\in A.

Proof.

This is very standard, exactly as in A=B(H)A=B(H) case, as follows:

(1)(2)(1)\implies(2) Since f(z)=zf(z)=\sqrt{z} is well-defined on σ(a)[0,)\sigma(a)\subset[0,\infty), we can set b=ab=\sqrt{a}.

(2)(3)(2)\implies(3) This is trivial, because we can set c=bc=b.

(3)(1)(3)\implies(1) We proceed by contradiction. By multiplying cc by a suitable element of <cc><cc^{*}>, we are led to the existence of an element d0d\neq 0 satisfying dd0-dd^{*}\geq 0. By writing now d=x+iyd=x+iy with x=x,y=yx=x^{*},y=y^{*} we have:

dd+dd=2(x2+y2)0dd^{*}+d^{*}d=2(x^{2}+y^{2})\geq 0

Thus dd0d^{*}d\geq 0, contradicting the fact that σ(dd),σ(dd)\sigma(dd^{*}),\sigma(d^{*}d) must coincide outside {0}\{0\}. ∎

Let us clarify now the relation between CC^{*}-algebras and von Neumann algebras. In order to do so, we need to prove a key result, called GNS representation theorem, stating that any CC^{*}-algebra appears as an operator algebra. As a first result, we have:

Proposition 7.13.

Let AA be a commutative CC^{*}-algebra, write A=C(X)A=C(X), with XX being a compact space, and let μ\mu be a positive measure on XX. We have then

AB(H)A\subset B(H)

where H=L2(X)H=L^{2}(X), with fAf\in A corresponding to the operator gfgg\to fg.

Proof.

Given a continuous function fC(X)f\in C(X), consider the operator Tf(g)=fgT_{f}(g)=fg, on H=L2(X)H=L^{2}(X). Observe that TfT_{f} is indeed well-defined, and bounded as well, because:

||fg||2=X|f(x)|2|g(x)|2dμ(x)||f||||g||2||fg||_{2}=\sqrt{\int_{X}|f(x)|^{2}|g(x)|^{2}d\mu(x)}\leq||f||_{\infty}||g||_{2}

The application fTff\to T_{f} being linear, involutive, continuous, and injective as well, we obtain in this way a CC^{*}-algebra embedding AB(H)A\subset B(H), as claimed. ∎
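
When XX is discretized into nn atoms of equal mass, L^{2}(X) becomes \mathbb{C}^{n} and T_{f} becomes the diagonal matrix with entries f(x), so the estimate in the proof becomes the equality ||T_{f}||=||f||_{\infty}. A one-line numerical check, the size 10 being an arbitrary choice:

import numpy as np

# Discrete model: X = n points of equal mass, L^2(X) = C^n, T_f = diag(f)
rng = np.random.default_rng(4)
f = rng.normal(size=10) + 1j * rng.normal(size=10)
Tf = np.diag(f)
print(np.isclose(np.linalg.norm(Tf, 2), np.abs(f).max()))   # True: ||T_f|| = ||f||_inf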

In order to prove the GNS representation theorem, we must extend the above construction, to the case where AA is not necessarily commutative. Let us start with:

Definition 7.14.

Consider a CC^{*}-algebra AA.

  1. (1)

    φ:A\varphi:A\to\mathbb{C} is called positive when a0φ(a)0a\geq 0\implies\varphi(a)\geq 0.

  2. (2)

    φ:A\varphi:A\to\mathbb{C} is called faithful and positive when a0,a0φ(a)>0a\geq 0,a\neq 0\implies\varphi(a)>0.

In the commutative case, A=C(X)A=C(X), the positive elements are the positive functions, f:X[0,)f:X\to[0,\infty). As for the positive linear forms φ:A\varphi:A\to\mathbb{C}, these appear as follows, with μ\mu being positive, and strictly positive if we want φ\varphi to be faithful and positive:

φ(f)=Xf(x)dμ(x)\varphi(f)=\int_{X}f(x)d\mu(x)

In general, the positive linear forms can be thought of as being integration functionals with respect to some underlying “positive measures”. We can use them as follows:

Proposition 7.15.

Let φ:A\varphi:A\to\mathbb{C} be a positive linear form.

  1. (1)

    <a,b>=φ(ab)<a,b>=\varphi(ab^{*}) defines a generalized scalar product on AA.

  2. (2)

    By separating and completing we obtain a Hilbert space HH.

  3. (3)

    π(a):bab\pi(a):b\to ab defines a representation π:AB(H)\pi:A\to B(H).

  4. (4)

    If φ\varphi is faithful in the above sense, then π\pi is faithful.

Proof.

Almost everything here is straightforward, as follows:

(1) This is clear from definitions, and from the basic properties of the positive elements a0a\geq 0, which can be established exactly as in the A=B(H)A=B(H) case.

(2) This is a standard procedure, which works for any scalar product, the idea being that of dividing by the vectors satisfying <x,x>=0<x,x>=0, then completing.

(3) All the verifications here are standard algebraic computations, in analogy with what we have seen many times, for multiplication operators, or group algebras.

(4) Assuming that we have a0a\neq 0, we have then π(aa)0\pi(aa^{*})\neq 0, which in turn implies by faithfulness that we have π(a)0\pi(a)\neq 0, which gives the result. ∎
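
In finite dimensions the GNS construction can be carried out very explicitly. Here is a sketch for A=M_{2}(\mathbb{C}), with the faithful positive form φ(a)=Tr(aρ), where ρ>0 is a density matrix: the Hilbert space is AA itself with the scalar product <a,b>=φ(ab^{*}), and π(a) is left multiplication, written as a 4x4 matrix in the basis of matrix units. All the concrete choices below, size and seed, are just for illustration:

import numpy as np

n = 2
rng = np.random.default_rng(2)

# Faithful state on A = M_n(C): phi(a) = Tr(a rho), with rho > 0, Tr(rho) = 1
m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = m @ m.conj().T + np.eye(n)
rho /= np.trace(rho).real
phi = lambda a: np.trace(a @ rho)

# GNS scalar product <a,b> = phi(ab*), on the basis of matrix units e_ij
basis = [np.outer(np.eye(n)[:, i], np.eye(n)[:, j]) for i in range(n) for j in range(n)]
gram = np.array([[phi(a @ b.conj().T) for b in basis] for a in basis])
print(np.all(np.linalg.eigvalsh(gram) > 0))   # True: the form is positive definite

# pi(a): b -> ab, as an n^2 x n^2 matrix; pi is multiplicative, hence a representation
def pi(a):
    return np.column_stack([(a @ b).reshape(-1) for b in basis])

a, b = rng.normal(size=(n, n)), rng.normal(size=(n, n))
print(np.allclose(pi(a @ b), pi(a) @ pi(b)))   # True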

In order to establish the embedding theorem, it remains to prove that any CC^{*}-algebra has a faithful positive linear form φ:A\varphi:A\to\mathbb{C}. This is something more technical:

Proposition 7.16.

Let AA be a CC^{*}-algebra.

  1. (1)

    Any positive linear form φ:A\varphi:A\to\mathbb{C} is continuous.

  2. (2)

    A linear form φ\varphi is positive iff there is a norm one hA+h\in A_{+} such that ||φ||=φ(h)||\varphi||=\varphi(h).

  3. (3)

    For any aAa\in A there exists a positive norm one form φ\varphi such that φ(aa)=||a||2\varphi(aa^{*})=||a||^{2}.

  4. (4)

    If AA is separable there is a faithful positive form φ:A\varphi:A\to\mathbb{C}.

Proof.

The proof here is quite technical, inspired by the existence proof for the probability measures on abstract compact spaces, the idea being as follows:

(1) This follows from Proposition 7.15, via the following estimate:

|φ(a)|||π(a)||φ(1)||a||φ(1)|\varphi(a)|\leq||\pi(a)||\varphi(1)\leq||a||\varphi(1)

(2) In one sense we can take h=1h=1. Conversely, let aA+a\in A_{+}, ||a||1||a||\leq 1. We have:

|φ(h)φ(a)|||φ||||ha||φ(h)|\varphi(h)-\varphi(a)|\leq||\varphi||\cdot||h-a||\leq\varphi(h)

Thus we have Re(φ(a))0Re(\varphi(a))\geq 0, and with a=1ha=1-h we obtain:

Re(φ(1h))0Re(\varphi(1-h))\geq 0

Thus Re(φ(1))||φ||Re(\varphi(1))\geq||\varphi||, and so φ(1)=||φ||\varphi(1)=||\varphi||, so we can assume h=1h=1. Now observe that for any self-adjoint element aa, and any tt\in\mathbb{R} we have, with φ(a)=x+iy\varphi(a)=x+iy:

\varphi(1)^{2}(1+t^{2}||a||^{2})\geq\varphi(1)^{2}||1+t^{2}a^{2}||=||\varphi||^{2}\cdot||1+ita||^{2}\geq|\varphi(1+ita)|^{2}=|\varphi(1)-ty+itx|^{2}\geq(\varphi(1)-ty)^{2}

By expanding, dividing by tt, and letting t0t\to 0 from above and from below, we obtain 2φ(1)y=02\varphi(1)y=0. Thus we have y=0y=0, and this finishes the proof of our remaining claim.

(3) We can set φ(λaa)=λ||a||2\varphi(\lambda aa^{*})=\lambda||a||^{2} on the linear space spanned by aaaa^{*}, then extend this functional by Hahn-Banach, to the whole AA. The positivity follows from (2).

(4) This is standard, by starting with a dense sequence (an)(a_{n}), and taking the Cesàro limit of the functionals constructed in (3). We have φ(aa)>0\varphi(aa^{*})>0, and we are done. ∎

With these ingredients in hand, we can now state and prove:

Theorem 7.17.

Any CC^{*}-algebra appears as a norm closed *-algebra of operators

AB(H)A\subset B(H)

over a certain Hilbert space HH. When AA is separable, HH can be taken to be separable.

Proof.

This result, called called GNS representation theorem after Gelfand, Naimark and Segal, follows indeed by combining Proposition 7.15 with Proposition 7.16. ∎

All this might seem quite surprising, and your first reaction would be to say: what have we been doing here, with our CC^{*}-algebra theory? We are now back to operator algebras AB(H)A\subset B(H), and everything that we did with CC^{*}-algebras, extending things that we knew about operator algebras AB(H)A\subset B(H), looks more like a waste of time.


Error. The axioms in Definition 7.6, coupled with the writing A=C(X)A=C(X) in Definition 7.10, are something powerful, because they do not involve any kind of L2L^{2} or LL^{\infty} functions on our quantum spaces XX. Thus, we can start hunting for such spaces, just by defining CC^{*}-algebras with generators and relations, then look for Haar measures on such spaces, and use the GNS construction in order to reach von Neumann algebras. Before getting into this, however, let us summarize the above discussion as follows:

Theorem 7.18.

We can talk about compact quantum measured spaces, as follows:

  1. (1)

    The category of compact quantum measured spaces (X,μ)(X,\mu) is the category of the CC^{*}-algebras with faithful traces (A,φ)(A,\varphi), with the arrows reversed.

  2. (2)

    In the case where we have a non-faithful trace φ\varphi, we can still talk about the corresponding space (X,μ)(X,\mu), by performing the GNS construction.

  3. (3)

    By taking the weak closure in the GNS representation, we obtain the von Neumann algebra A=L(X)A^{\prime\prime}=L^{\infty}(X), in the previous general measured space sense.

Proof.

All this follows from Theorem 7.17, and from the other things that we already know, with the whole result itself being something rather philosophical. ∎

7b. Tori, amenability

In the remainder of this chapter we explore the whole new world opened by the CC^{*}-algebra theory, with the study of several key examples. We will first discuss the group duals, also called noncommutative tori. Let us start with a well-known result:

Theorem 7.19.

The compact abelian groups GG are in correspondence with the discrete abelian groups Γ\Gamma, via Pontrjagin duality,

G=Γ^,Γ=G^G=\widehat{\Gamma}\quad,\quad\Gamma=\widehat{G}

with the dual of a locally compact group LL being the locally compact group L^\widehat{L} consisting of the continuous group characters χ:L𝕋\chi:L\to\mathbb{T}.

Proof.

This is something very standard, the idea being that, given a group LL as above, its continuous characters χ:L𝕋\chi:L\to\mathbb{T} form indeed a group, that we can call L^\widehat{L}. The correspondence LL^L\to\widehat{L} constructed in this way has then the following properties:

(1) We have \widehat{\mathbb{Z}}_{N}=\mathbb{Z}_{N}. This is the basic computation to be performed, before anything else, and it is something algebraic, with roots of unity.

(2) More generally, the dual of a finite abelian group G=N1××NkG=\mathbb{Z}_{N_{1}}\times\ldots\times\mathbb{Z}_{N_{k}} is the group GG itself. This comes indeed from (1) and from G×H^=G^×H^\widehat{G\times H}=\widehat{G}\times\widehat{H}.

(3) At the opposite end now, that of the locally compact groups which are not compact, nor discrete, the main example, which is standard, is ^=\widehat{\mathbb{R}}=\mathbb{R}.

(4) Getting now to what we are interested in, it follows from the definition of the correspondence LL^L\to\widehat{L} that when LL is compact L^\widehat{L} is discrete, and vice versa.

(5) Finally, in order to best understand this latter phenomenon, the best is to work out the main pair of examples, which are 𝕋^=\widehat{\mathbb{T}}=\mathbb{Z} and ^=𝕋\widehat{\mathbb{Z}}=\mathbb{T}. ∎

Our claim now is that, by using operator algebra theory, we can talk about the dual G=Γ^G=\widehat{\Gamma} of any discrete group Γ\Gamma. Let us start our discussion in the von Neumann algebra setting, where things are particularly simple. We have here:

Theorem 7.20.

Given a discrete group Γ\Gamma, we can construct its von Neumann algebra

L(Γ)B(l2(Γ))L(\Gamma)\subset B(l^{2}(\Gamma))

by using the left regular representation. This algebra has a faithful positive trace, tr(g)=δg,1tr(g)=\delta_{g,1}, and when Γ\Gamma is abelian we have an isomorphism of tracial von Neumann algebras

L(Γ)L(G)L(\Gamma)\simeq L^{\infty}(G)

given by a Fourier type transform, where G=Γ^G=\widehat{\Gamma} is the compact dual of Γ\Gamma.

Proof.

There are many assertions here, the idea being as follows:

(1) The first part is standard, with the left regular representation of Γ\Gamma working as expected, and being a unitary representation, as follows:

ΓB(l2(Γ)),π(g):hgh\Gamma\subset B(l^{2}(\Gamma))\quad,\quad\pi(g):h\to gh

(2) The positivity of the trace comes from the following alternative formula for it, with the equivalence with the definition in the statement being clear:

tr(T)=<T\delta_{1},\delta_{1}>

(3) The third part is standard as well, because when Γ\Gamma is abelian the algebra L(Γ)L(\Gamma) is commutative, and its spectral decomposition leads by delinearization to the group characters χ:Γ𝕋\chi:\Gamma\to\mathbb{T}, and so the dual group G=Γ^G=\widehat{\Gamma}, as indicated.

(4) Finally, the fact that our isomorphism transforms the trace of L(Γ)L(\Gamma) into the Haar integration functional of L(G)L^{\infty}(G) is clear. Moreover, the study of various examples shows that what we constructed is in fact the Fourier transform, in its various incarnations. ∎
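
For Γ=\mathbb{Z}_{N} everything in the above proof can be made fully explicit: the left regular representation is by cyclic shift matrices, the trace is tr(T)=<T\delta_{0},\delta_{0}>, and the Fourier transform is the usual DFT, which diagonalizes all the group elements at once. A numerical sketch, with numpy assumed:

import numpy as np

N = 5
# Left regular representation of Z_N: lambda(1) is the cyclic shift on l^2(Z_N)
shift = np.roll(np.eye(N), 1, axis=0)
lam = [np.linalg.matrix_power(shift, k) for k in range(N)]

# tr(g) = <lambda(g) delta_0, delta_0> = delta_{g,0}
print([int(l[0, 0]) for l in lam])                 # [1, 0, 0, 0, 0]

# The DFT matrix diagonalizes the shift, giving L(Z_N) = L^infty(dual of Z_N)
F = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N) / np.sqrt(N)
diag = F.conj().T @ lam[1] @ F
print(np.allclose(diag, np.diag(np.diag(diag))))   # True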

Getting back now to our quantum space questions, we have a beginning of answer, because based on the above, we can formulate the following definition:

Definition 7.21.

Given a discrete group Γ\Gamma, not necessarily abelian, we can construct its abstract dual G=Γ^G=\widehat{\Gamma} as a quantum measured space, via the following formula:

L(G)=L(Γ)L^{\infty}(G)=L(\Gamma)

In the case where Γ\Gamma happens to be abelian, this quantum space G=Γ^G=\widehat{\Gamma} is a classical space, namely the usual Pontrjagin dual of Γ\Gamma, endowed with its Haar measure.

Let us discuss now the same questions, in the CC^{*}-algebra setting. The situation here is more complicated than in the von Neumann algebra setting, as follows:

Proposition 7.22.

Associated to any discrete group Γ\Gamma are several group CC^{*}-algebras,

C(Γ)Cπ(Γ)Cred(Γ)C^{*}(\Gamma)\to C^{*}_{\pi}(\Gamma)\to C^{*}_{red}(\Gamma)

which are constructed as follows:

  1. (1)

    C(Γ)C^{*}(\Gamma) is the closure of the group algebra [Γ]\mathbb{C}[\Gamma], with involution g=g1g^{*}=g^{-1}, with respect to the maximal CC^{*}-seminorm on this *-algebra, which is a CC^{*}-norm.

  2. (2)

    Cred(Γ)C^{*}_{red}(\Gamma) is the norm closure of the group algebra [Γ]\mathbb{C}[\Gamma] in the left regular representation, on the Hilbert space l2(Γ)l^{2}(\Gamma), given by λ(g)(h)=gh\lambda(g)(h)=gh and linearity.

  3. (3)

    Cπ(Γ)C^{*}_{\pi}(\Gamma) can be any intermediate CC^{*}-algebra, but for best results, the indexing object π\pi must be a unitary group representation, satisfying πππ\pi\otimes\pi\subset\pi.

Proof.

This is something quite technical, with (2) being very similar to the von Neumann algebra construction from Theorem 7.20, with (1) being something new, with the norm property there coming from (2), and finally with (3) being an informal statement, that we will comment on later, once we know about compact quantum groups. ∎

When Γ\Gamma is finite, or abelian, or more generally amenable, all the above group algebras coincide. In the abelian case, that we are particularly interested in here, the precise result is as follows, complementing the LL^{\infty} analysis from Theorem 7.20:

Theorem 7.23.

When Γ\Gamma is abelian all its group CC^{*}-algebras coincide, and we have an isomorphism as follows, given by a Fourier type transform,

C(Γ)C(G)C^{*}(\Gamma)\simeq C(G)

where G=Γ^G=\widehat{\Gamma} is the compact dual of Γ\Gamma. Moreover, this isomorphism transforms the standard group algebra trace tr(g)=δg,1tr(g)=\delta_{g,1} into the Haar integration of GG.

Proof.

Since Γ\Gamma is abelian, any of its group CC^{*}-algebras A=Cπ(Γ)A=C^{*}_{\pi}(\Gamma) is commutative. Thus, we can apply the Gelfand theorem, and we obtain A=C(X)A=C(X), with X=Spec(A)X=Spec(A). But the spectrum X=Spec(A)X=Spec(A), consisting of the characters χ:A\chi:A\to\mathbb{C}, can be identified by delinearizing with the Pontrjagin dual G=Γ^G=\widehat{\Gamma}, and this gives the results. ∎
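
At the level of the algebra operations, the isomorphism in the above statement is the fact that the Fourier transform turns the convolution product of the group algebra into the pointwise product of functions on the dual. For Γ=\mathbb{Z}_{N} this is the classical circular convolution theorem, checked below with numpy's FFT; the size and seed are arbitrary:

import numpy as np

# C[Z_N] with convolution, versus C(dual of Z_N) = C^N with pointwise product
N = 6
rng = np.random.default_rng(5)
a = rng.normal(size=N) + 1j * rng.normal(size=N)
b = rng.normal(size=N) + 1j * rng.normal(size=N)

conv = np.array([sum(a[j] * b[(k - j) % N] for j in range(N)) for k in range(N)])
print(np.allclose(np.fft.fft(conv), np.fft.fft(a) * np.fft.fft(b)))   # True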

At a more advanced level now, we have the following result:

Theorem 7.24.

For a discrete group Γ=<g1,,gN>\Gamma=<g_{1},\ldots,g_{N}>, the following conditions are equivalent, and if they are satisfied, we say that Γ\Gamma is amenable:

  1. (1)

    The projection map C(Γ)Cred(Γ)C^{*}(\Gamma)\to C^{*}_{red}(\Gamma) is an isomorphism.

  2. (2)

    The morphism ε:C(Γ)\varepsilon:C^{*}(\Gamma)\to\mathbb{C} given by g1g\to 1 factorizes through Cred(Γ)C^{*}_{red}(\Gamma).

  3. (3)

    We have Nσ(Re(g1++gN))N\in\sigma(Re(g_{1}+\ldots+g_{N})), the spectrum being taken inside Cred(Γ)C^{*}_{red}(\Gamma).

The amenable groups include all finite groups, and all abelian groups. As a basic example of a non-amenable group, we have the free group FNF_{N}, with N2N\geq 2.

Proof.

There are several things to be proved, the idea being as follows:

(1) The implication (1)(2)(1)\implies(2) is trivial, and (2)(3)(2)\implies(3) comes from the following computation, which shows that NRe(g1++gN)N-Re(g_{1}+\ldots+g_{N}) is not invertible inside Cred(Γ)C^{*}_{red}(\Gamma):

\varepsilon\left[N-Re(g_{1}+\ldots+g_{N})\right]=N-Re\left[\varepsilon(g_{1})+\ldots+\varepsilon(g_{N})\right]=N-N=0

As for (3)(1)(3)\implies(1), this is something more advanced, that we will not need for the moment. We will be back to this later, directly in a more general setting.

(2) The fact that any finite group GG is amenable is clear, because all the group CC^{*}-algebras are equal to the usual group *-algebra [G]\mathbb{C}[G], in this case. As for the case of the abelian groups, these are all amenable as well, as shown by Theorem 7.23.

(3) It remains to prove that FNF_{N} with N2N\geq 2 is not amenable. By using F2FNF_{2}\subset F_{N}, it is enough to do this at N=2N=2. So, consider the free group F2=<g,h>F_{2}=<g,h>. In order to prove that F2F_{2} is not amenable, we use the implication (1)(3)(1)\implies(3), in contrapositive form. To be more precise, it is enough to show that 4 is not in the spectrum of the following operator:

T=λ(g)+λ(g1)+λ(h)+λ(h1)T=\lambda(g)+\lambda(g^{-1})+\lambda(h)+\lambda(h^{-1})

This is a sum of four terms, each of them acting via δwδew\delta_{w}\to\delta_{ew}, with ee being a certain length one word. Thus if w1w\neq 1 has length nn then T(δw)T(\delta_{w}) is a sum of four Dirac masses, three of them at words of length n+1n+1 and the remaining one at a length n1n-1 word. We can therefore decompose TT as a sum T++TT_{+}+T_{-}, where T+T_{+} adds and TT_{-} cuts:

T=T++TT=T_{+}+T_{-}

That is, if w1w\neq 1 is a word, say beginning with hh, then T±T_{\pm} act on δw\delta_{w} as follows:

T+(δw)=δgw+δg1w+δhw,T(δw)=δh1wT_{+}(\delta_{w})=\delta_{gw}+\delta_{g^{-1}w}+\delta_{hw}\quad,\quad T_{-}(\delta_{w})=\delta_{h^{-1}w}

It follows from definitions that we have T+=TT_{+}^{*}=T_{-}. We can use the following trick:

(T++T)2+(i(T+T))2=2(T+T+TT+)(T_{+}+T_{-})^{2}+\left(i(T_{+}-T_{-})\right)^{2}=2(T_{+}T_{-}+T_{-}T_{+})

Indeed, this gives (T++T)22(T+T+TT+)(T_{+}+T_{-})^{2}\leq 2(T_{+}T_{-}+T_{-}T_{+}), and we obtain in this way:

||T||2=||T++T||22||T+T+TT+||||T||^{2}=||T_{+}+T_{-}||^{2}\leq 2||T_{+}T_{-}+T_{-}T_{+}||

Let w1w\neq 1 be a word, say beginning with hh. We have then:

TT+(δw)=T(δgw+δg1w+δhw)=3δwT_{-}T_{+}(\delta_{w})=T_{-}(\delta_{gw}+\delta_{g^{-1}w}+\delta_{hw})=3\delta_{w}

The action of TT+T_{-}T_{+} on the remaining vector δ1\delta_{1} is computed as follows:

TT+(δ1)=T(δg+δg1+δh+δh1)=4δ1T_{-}T_{+}(\delta_{1})=T_{-}(\delta_{g}+\delta_{g^{-1}}+\delta_{h}+\delta_{h^{-1}})=4\delta_{1}

Summing up, with P:δwδ1P:\delta_{w}\to\delta_{1} being the projection onto δ1\mathbb{C}\delta_{1}, we have:

TT+=3+PT_{-}T_{+}=3+P

On the other hand we have T+T(δ1)=T+(0)=0T_{+}T_{-}(\delta_{1})=T_{+}(0)=0, so the subspace δ1\mathbb{C}\delta_{1} is invariant under the operator T+T+TT+T_{+}T_{-}+T_{-}T_{+}. We have the following norm estimate:

||T||22||T+T+TT+||2max{||3+P||,||(T+T+TT+)(1P)||}||T||^{2}\leq 2||T_{+}T_{-}+T_{-}T_{+}||\leq 2\cdot\max\left\{||3+P||,\,\,\,||(T_{+}T_{-}+T_{-}T_{+})(1-P)||\right\}

The norm of 3+P3+P is equal to 44, and the other norm is estimated as follows:

||(T_{+}T_{-}+T_{-}T_{+})(1-P)||\leq||T_{+}T_{-}||+||(3+P)(1-P)||=||T_{-}T_{+}||+3=7

Thus we have ||T||14<4||T||\leq\sqrt{14}<4, and this finishes the proof. ∎
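
The bound ||T||\leq\sqrt{14} is all that is needed above, but it is not optimal: by a theorem of Kesten the precise norm is ||T||=2\sqrt{3}\simeq 3.46. One can see this numerically by truncating l^{2}(F_{2}) to the ball of reduced words of length \leq L, which gives lower bounds for ||T|| converging to 2\sqrt{3}, and in particular staying below 4. A rough sketch, using numpy and scipy, with the cutoff L=10 being an arbitrary choice:

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import eigsh

# Reduced words in F_2 = <g,h> of length <= L; letters 0=g, 1=g^-1, 2=h, 3=h^-1,
# with the inverse of the letter x being x ^ 1
L = 10
level, words = [()], [()]
for _ in range(L):
    level = [w + (x,) for w in level for x in range(4) if not w or w[-1] != x ^ 1]
    words += level
index = {w: i for i, w in enumerate(words)}

# T = lambda(g) + lambda(g^-1) + lambda(h) + lambda(h^-1), cut down to the ball;
# lambda(x) maps delta_w to delta_{xw}, with reduction when w starts with x^-1
rows, cols = [], []
for w, i in index.items():
    for x in range(4):
        v = w[1:] if (w and w[0] == x ^ 1) else (x,) + w
        if v in index:
            rows.append(index[v]); cols.append(i)
n = len(words)
T = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()

# Top eigenvalue of the truncation: a lower bound for ||T||, tending to 2*sqrt(3)
print(eigsh(T, k=1, which='LA')[0][0], 2 * np.sqrt(3))   # both < 4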

7c. Quantum groups

The duals of discrete groups have several similarities with the compact groups, and our goal now will be that of unifying these two classes of compact quantum spaces. Let us start with the following definition, due to Woronowicz [wo1]:

Definition 7.25.

A Woronowicz algebra is a CC^{*}-algebra AA, given with a unitary matrix uMN(A)u\in M_{N}(A) whose coefficients generate AA, such that the formulae

Δ(uij)=kuikukj,ε(uij)=δij,S(uij)=uji\Delta(u_{ij})=\sum_{k}u_{ik}\otimes u_{kj}\quad,\quad\varepsilon(u_{ij})=\delta_{ij}\quad,\quad S(u_{ij})=u_{ji}^{*}

define morphisms of CC^{*}-algebras Δ:AAA\Delta:A\to A\otimes A, ε:A\varepsilon:A\to\mathbb{C}, S:AAoppS:A\to A^{opp}.

We say that AA is cocommutative when ΣΔ=Δ\Sigma\Delta=\Delta, where Σ(ab)=ba\Sigma(a\otimes b)=b\otimes a is the flip. We have the following result, which justifies the terminology and axioms:

Proposition 7.26.

The following are Woronowicz algebras:

  1. (1)

    C(G)C(G), with GUNG\subset U_{N} compact Lie group. Here the structural maps are:

    Δ(φ)=(g,h)φ(gh),ε(φ)=φ(1),S(φ)=gφ(g1)\Delta(\varphi)=(g,h)\to\varphi(gh)\quad,\quad\varepsilon(\varphi)=\varphi(1)\quad,\quad S(\varphi)=g\to\varphi(g^{-1})
  2. (2)

    C(Γ)C^{*}(\Gamma), with FNΓF_{N}\to\Gamma finitely generated group. Here the structural maps are:

    Δ(g)=gg,ε(g)=1,S(g)=g1\Delta(g)=g\otimes g\quad,\quad\varepsilon(g)=1\quad,\quad S(g)=g^{-1}

Moreover, we obtain in this way all the commutative/cocommutative algebras.

Proof.

In both cases, we have to exhibit a certain matrix uu. For the first assertion, we can use the matrix u=(uij)u=(u_{ij}) formed by matrix coordinates of GG, given by:

g=(u11(g)u1N(g)uN1(g)uNN(g))g=\begin{pmatrix}u_{11}(g)&\ldots&u_{1N}(g)\\ \vdots&&\vdots\\ u_{N1}(g)&\ldots&u_{NN}(g)\end{pmatrix}

As for the second assertion, here we can use the diagonal matrix formed by generators, u=diag(g1,,gN)u=diag(g_{1},\ldots,g_{N}). Finally, the last assertion follows from the Gelfand theorem, in the commutative case, and in the cocommutative case, we will be back to this later. ∎

In general now, the structural maps Δ,ε,S\Delta,\varepsilon,S have the following properties:

Proposition 7.27.

Let (A,u)(A,u) be a Woronowicz algebra.

  1. (1)

    Δ,ε\Delta,\varepsilon satisfy the usual axioms for a comultiplication and a counit, namely:

    (Δid)Δ=(idΔ)Δ(\Delta\otimes id)\Delta=(id\otimes\Delta)\Delta
    (εid)Δ=(idε)Δ=id(\varepsilon\otimes id)\Delta=(id\otimes\varepsilon)\Delta=id
  2. (2)

    SS satisfies the antipode axiom, on the *-subalgebra generated by entries of uu:

    m(Sid)Δ=m(idS)Δ=ε(.)1m(S\otimes id)\Delta=m(id\otimes S)\Delta=\varepsilon(.)1
  3. (3)

    In addition, the square of the antipode is the identity, S2=idS^{2}=id.

Proof.

When AA is commutative, by using Proposition 7.26 we can write:

Δ=mt,ε=ut,S=it\Delta=m^{t}\quad,\quad\varepsilon=u^{t}\quad,\quad S=i^{t}

The above 3 conditions come then by transposition from the basic 3 group theory conditions satisfied by m,u,im,u,i, which are as follows, with δ(g)=(g,g)\delta(g)=(g,g):

m(m×id)=m(id×m)m(m\times id)=m(id\times m)
m(id×u)=m(u×id)=idm(id\times u)=m(u\times id)=id
m(id×i)δ=m(i×id)δ=1m(id\times i)\delta=m(i\times id)\delta=1

Observe that S2=idS^{2}=id is satisfied as well, coming from i2=idi^{2}=id. In general now, all the formulae in the statement are satisfied on the generators uiju_{ij}, and so by linearity, multiplicativity and continuity they are satisfied everywhere, as desired. ∎
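
In the simplest commutative case, A=C(\mathbb{Z}_{N}), the maps in the above statement are Δ(f)(g,h)=f(g+h), ε(f)=f(0), S(f)(g)=f(-g), and the axioms can be checked by brute force, both sides of the coassociativity condition being the function (g,h,k)\to f(g+h+k). A quick numerical sketch, with the size and seed being arbitrary:

import numpy as np

# A = C(Z_N) = C^N, with Delta(f)(g,h) = f(g+h) and epsilon(f) = f(0)
N = 4
f = np.random.default_rng(6).normal(size=N)
D = lambda v: np.array([[v[(g + h) % N] for h in range(N)] for g in range(N)])
Df = D(f)

# Coassociativity: (Delta x id)Delta = (id x Delta)Delta, both giving f(g+h+k)
lhs = np.stack([D(Df[:, k]) for k in range(N)], axis=2)
rhs = np.stack([D(Df[g, :]) for g in range(N)], axis=0)
print(np.allclose(lhs, rhs))      # True
# Counit axiom: (epsilon x id)Delta(f) = f
print(np.allclose(Df[0, :], f))   # True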

In view of Proposition 7.26, we can formulate the following definition:

Definition 7.28.

Given a Woronowicz algebra AA, we formally write

A=C(G)=C(Γ)A=C(G)=C^{*}(\Gamma)

and call GG compact quantum group, and Γ\Gamma discrete quantum group.

When AA is both commutative and cocommutative, GG is a compact abelian group, Γ\Gamma is a discrete abelian group, and these groups are dual to each other:

G=Γ^,Γ=G^G=\widehat{\Gamma}\quad,\quad\Gamma=\widehat{G}

In general, we still agree to write G=Γ^,Γ=G^G=\widehat{\Gamma},\Gamma=\widehat{G}, in a formal sense. Finally, in relation with functoriality issues, let us complement Definitions 7.25 and 7.28 with:

Definition 7.29.

Given two Woronowicz algebras (A,u)(A,u) and (B,v)(B,v), we write

ABA\simeq B

and we identify as well the corresponding compact and discrete quantum groups, when we have an isomorphism of *-algebras <uij><vij><u_{ij}>\simeq<v_{ij}>, mapping uijviju_{ij}\to v_{ij}.

In order to develop now some theory, let us call corepresentation of AA any unitary matrix vMn(𝒜)v\in M_{n}(\mathcal{A}), with 𝒜=<uij>\mathcal{A}=<u_{ij}>, satisfying the same conditions as uu, namely:

Δ(vij)=kvikvkj,ε(vij)=δij,S(vij)=vji\Delta(v_{ij})=\sum_{k}v_{ik}\otimes v_{kj}\quad,\quad\varepsilon(v_{ij})=\delta_{ij}\quad,\quad S(v_{ij})=v_{ji}^{*}

These can be thought of as corresponding to the unitary representations of the underlying compact quantum group GG. Following Woronowicz [wo1], we have:

Theorem 7.30.

Any Woronowicz algebra has a unique Haar integration functional,

(Gid)Δ=(idG)Δ=G(.)1\left(\int_{G}\otimes id\right)\Delta=\left(id\otimes\int_{G}\right)\Delta=\int_{G}(.)1

which can be constructed by starting with any faithful positive form φA\varphi\in A^{*}, and setting

G=limn1nk=1nφk\int_{G}=\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\varphi^{*k}

where ϕψ=(ϕψ)Δ\phi*\psi=(\phi\otimes\psi)\Delta. Moreover, for any corepresentation vMn()Av\in M_{n}(\mathbb{C})\otimes A we have

(idG)v=P\left(id\otimes\int_{G}\right)v=P

where PP is the orthogonal projection onto Fix(v)={ξn|vξ=ξ}Fix(v)=\{\xi\in\mathbb{C}^{n}|v\xi=\xi\}.

Proof.

Following [wo1], this can be done in 3 steps, as follows:

(1) Given φA\varphi\in A^{*}, our claim is that the following limit converges, for any aAa\in A:

φa=limn1nk=1nφk(a)\int_{\varphi}a=\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\varphi^{*k}(a)

Indeed, by linearity we can assume that aa is a coefficient of a corepresentation, a=(τid)va=(\tau\otimes id)v. But in this case, an elementary computation shows that we have the following formula, where PφP_{\varphi} is the orthogonal projection onto the 11-eigenspace of (idφ)v(id\otimes\varphi)v:

(idφ)v=Pφ\left(id\otimes\int_{\varphi}\right)v=P_{\varphi}

(2) Since vξ=ξv\xi=\xi implies [(idφ)v]ξ=ξ[(id\otimes\varphi)v]\xi=\xi, we have PφPP_{\varphi}\geq P, where PP is the orthogonal projection onto the space Fix(v)={ξn|vξ=ξ}Fix(v)=\{\xi\in\mathbb{C}^{n}|v\xi=\xi\}. The point now is that when φA\varphi\in A^{*} is faithful, by using a positivity trick, one can prove that we have Pφ=PP_{\varphi}=P. Thus our linear form φ\int_{\varphi} is independent of φ\varphi, and is given on coefficients a=(τid)va=(\tau\otimes id)v by:

(idφ)v=P\left(id\otimes\int_{\varphi}\right)v=P

(3) With the above formula in hand, the left and right invariance of G=φ\int_{G}=\int_{\varphi} is clear on coefficients, and so in general, and this gives all the assertions. See [wo1]. ∎
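
The Cesàro construction in the above proof can be tested on a finite group, where states correspond to probability measures, the convolution is the usual one, and the Haar functional is the uniform average. A sketch for G=S_{3}, with the number of iterations being an arbitrary choice:

import numpy as np
from itertools import permutations

# Convolution of measures on S_3: (phi * psi)(x) = sum over gh = x of phi(g) psi(h)
G = list(permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}
mult = [[idx[tuple(g[k] for k in h)] for h in G] for g in G]   # index of g o h

def conv(phi, psi):
    out = np.zeros(len(G))
    for i in range(len(G)):
        for j in range(len(G)):
            out[mult[i][j]] += phi[i] * psi[j]
    return out

# Faithful state = strictly positive probability; Cesaro average of its powers
rng = np.random.default_rng(3)
phi = rng.random(len(G)) + 0.1
phi /= phi.sum()
power, cesaro = phi.copy(), np.zeros(len(G))
for _ in range(200):
    cesaro += power
    power = conv(power, phi)
print(np.round(cesaro / 200, 2))   # ~[0.17 0.17 0.17 0.17 0.17 0.17]: Haar measure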

As a main application, we can develop a Peter-Weyl type theory for the corepresentations of AA. Consider the dense *-subalgebra 𝒜A\mathcal{A}\subset A generated by the coefficients of the fundamental corepresentation uu, and endow it with the following scalar product:

<a,b>=Gab<a,b>=\int_{G}ab^{*}

With this convention, we have the following result, also from Woronowicz [wo1]:

Theorem 7.31.

We have the following Peter-Weyl type results:

  1. (1)

    Any corepresentation decomposes as a sum of irreducible corepresentations.

  2. (2)

    Each irreducible corepresentation appears inside a certain uku^{\otimes k}.

  3. (3)

    𝒜=vIrr(A)Mdim(v)()\mathcal{A}=\bigoplus_{v\in Irr(A)}M_{\dim(v)}(\mathbb{C}), the summands being pairwise orthogonal.

  4. (4)

    The characters of irreducible corepresentations form an orthonormal system.

Proof.

All these results are from [wo1], the idea being as follows:

(1) Given vMn(A)v\in M_{n}(A), its intertwiner algebra End(v)={TMn()|Tv=vT}End(v)=\{T\in M_{n}(\mathbb{C})|Tv=vT\} is a finite dimensional CC^{*}-algebra, and so decomposes as End(v)=Mn1()Mnr()End(v)=M_{n_{1}}(\mathbb{C})\oplus\ldots\oplus M_{n_{r}}(\mathbb{C}). But this gives a decomposition of type v=v1++vrv=v_{1}+\ldots+v_{r}, as desired.

(2) Consider indeed the Peter-Weyl corepresentations, uku^{\otimes k} with kk being a colored integer, defined by u=1u^{\otimes\emptyset}=1, u=uu^{\otimes\circ}=u, u=u¯u^{\otimes\bullet}=\bar{u} and multiplicativity. The coefficients of these corepresentations span the dense algebra 𝒜\mathcal{A}, and by using (1), this gives the result.

(3) Here the direct sum decomposition, which is technically a *-coalgebra isomorphism, follows from (2). As for the second assertion, this follows from the fact that (idG)v(id\otimes\int_{G})v is the orthogonal projection PvP_{v} onto the space Fix(v)Fix(v), for any corepresentation vv.

(4) Let us define indeed the character of vMn(A)v\in M_{n}(A) to be the matrix trace, χv=Tr(v)\chi_{v}=Tr(v). Since this character is a coefficient of vv, the orthogonality assertion follows from (3). As for the norm 1 claim, this follows once again from (idG)v=Pv(id\otimes\int_{G})v=P_{v}. ∎
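
For A=C(G) with GG a finite group, the corepresentations are the usual representations of GG, and the assertion (4) above is the classical orthogonality of characters. Here is a check for G=S_{3}, whose irreducible characters are the trivial one, the signature, and the character of the standard 2-dimensional representation, counting fixed points minus 1; numpy is assumed:

import numpy as np
from itertools import permutations

G = list(permutations(range(3)))
sign = lambda g: np.linalg.det(np.eye(3)[list(g)])   # signature, via permutation matrix
fix = lambda g: sum(g[i] == i for i in range(3))     # number of fixed points

chars = np.stack([
    np.ones(len(G)),                     # trivial character
    np.array([sign(g) for g in G]),      # signature
    np.array([fix(g) - 1.0 for g in G])  # standard 2-dimensional character
])

# Orthonormality with respect to the Haar state, the average over G
print(np.round(chars @ chars.T / len(G), 6))   # identity matrix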

We can now solve a problem that we left open before, namely:

Proposition 7.32.

The cocommutative Woronowicz algebras appear as the quotients

C(Γ)ACred(Γ)C^{*}(\Gamma)\to A\to C^{*}_{red}(\Gamma)

given by A=Cπ(Γ)A=C^{*}_{\pi}(\Gamma) with πππ\pi\otimes\pi\subset\pi, with Γ\Gamma being a discrete group.

Proof.

This follows from the Peter-Weyl theory, and clarifies a number of things said before, notably in Proposition 7.26. Indeed, for a cocommutative Woronowicz algebra the irreducible corepresentations are all 1-dimensional, and this gives the results. ∎

As another consequence of the above results, once again by following Woronowicz [wo1], we have the following statement, dealing with functional analysis aspects, and extending what we already knew about the CC^{*}-algebras of the usual discrete groups:

Theorem 7.33.

Let AfullA_{full} be the enveloping CC^{*}-algebra of 𝒜\mathcal{A}, and AredA_{red} be the quotient of AA by the null ideal of the Haar integration. The following are then equivalent:

  1. (1)

    The Haar functional of AfullA_{full} is faithful.

  2. (2)

    The projection map AfullAredA_{full}\to A_{red} is an isomorphism.

  3. (3)

    The counit map ε:Afull\varepsilon:A_{full}\to\mathbb{C} factorizes through AredA_{red}.

  4. (4)

    We have Nσ(Re(χu))N\in\sigma(Re(\chi_{u})), the spectrum being taken inside AredA_{red}.

If this is the case, we say that the underlying discrete quantum group Γ\Gamma is amenable.

Proof.

This is well-known in the group dual case, A=C(Γ)A=C^{*}(\Gamma), with Γ\Gamma being a usual discrete group. In general, the result follows by adapting the group dual case proof:

(1)(2)(1)\iff(2) This simply follows from the fact that the GNS construction for the algebra AfullA_{full} with respect to the Haar functional produces the algebra AredA_{red}.

(2)(3)(2)\iff(3) Here \implies is trivial, and conversely, a counit map ε:Ared\varepsilon:A_{red}\to\mathbb{C} produces an isomorphism AredAfullA_{red}\to A_{full}, via a formula of type (εid)Φ(\varepsilon\otimes id)\Phi. See [wo1].

(3)(4)(3)\iff(4) Here \implies is clear, coming from ε(NRe(χ(u)))=0\varepsilon(N-Re(\chi(u)))=0, and the converse can be proved by doing some functional analysis. Once again, we refer here to [wo1]. ∎

Let us discuss now some interesting examples. Following Wang [wan], we have:

Proposition 7.34.

The following universal algebras are Woronowicz algebras,

C(ON+)=C((uij)i,j=1,,N|u=u¯,ut=u1)C(O_{N}^{+})=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u=\bar{u},u^{t}=u^{-1}\right)
C(UN+)=C((uij)i,j=1,,N|u=u1,ut=u¯1)C(U_{N}^{+})=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u^{*}=u^{-1},u^{t}=\bar{u}^{-1}\right)

so the underlying spaces ON+,UN+O_{N}^{+},U_{N}^{+} are compact quantum groups.

Proof.

This follows from the elementary fact that if a matrix u=(uij)u=(u_{ij}) is orthogonal or biunitary, then so must be the following matrices:

uΔij=kuikukj,uεij=δij,uSij=ujiu^{\Delta}_{ij}=\sum_{k}u_{ik}\otimes u_{kj}\quad,\quad u^{\varepsilon}_{ij}=\delta_{ij}\quad,\quad u^{S}_{ij}=u_{ji}^{*}

Thus, we can indeed define morphisms Δ,ε,S\Delta,\varepsilon,S as in Definition 7.25, by using the universal properties of C(ON+)C(O_{N}^{+}), C(UN+)C(U_{N}^{+}), and this gives the result. ∎

There is a connection here with group duals, coming from:

Proposition 7.35.

Given a closed subgroup GUN+G\subset U_{N}^{+}, consider its “diagonal torus”, which is the closed subgroup TGT\subset G constructed as follows:

C(T)=C(G)/uij=0|ijC(T)=C(G)\Big{/}\left<u_{ij}=0\Big{|}\forall i\neq j\right>

This torus is then a group dual, T=Λ^T=\widehat{\Lambda}, where Λ=<g1,,gN>\Lambda=<g_{1},\ldots,g_{N}> is the discrete group generated by the elements gi=uiig_{i}=u_{ii}, which are unitaries inside C(T)C(T).

Proof.

Since uu is unitary, its diagonal entries gi=uiig_{i}=u_{ii} are unitaries inside C(T)C(T). Moreover, from Δ(uij)=kuikukj\Delta(u_{ij})=\sum_{k}u_{ik}\otimes u_{kj} we obtain, when passing inside the quotient:

Δ(gi)=gigi\Delta(g_{i})=g_{i}\otimes g_{i}

It follows that we have C(T)=C(Λ)C(T)=C^{*}(\Lambda), modulo identifying as usual the CC^{*}-completions of the various group algebras, and so that we have T=Λ^T=\widehat{\Lambda}, as claimed. ∎

With this notion in hand, we have the following result:

Theorem 7.36.

The diagonal tori of the basic rotation groups are as follows,

\begin{matrix}U_{N}&\subset&U_{N}^{+}\\ \cup&&\cup\\ O_{N}&\subset&O_{N}^{+}\end{matrix}\qquad:\qquad\begin{matrix}\mathbb{T}^{N}&\subset&\widehat{F_{N}}\\ \cup&&\cup\\ \mathbb{Z}_{2}^{N}&\subset&\widehat{\mathbb{Z}_{2}^{*N}}\end{matrix}

where FNF_{N} is the free group on NN generators, and * is a group-theoretical free product.

Proof.

This is clear indeed for UN+U_{N}^{+}, whose diagonal torus is by definition the dual of the free group FNF_{N}, and the other results can be obtained by imposing on the generators of FNF_{N} the relations defining the corresponding quantum groups. ∎

As a conclusion to all this, the CC^{*}-algebra theory suggests developing a theory of “noncommutative geometry”, covering both the classical and the free geometry, by using compact quantum groups. We will be back to this in chapter 8.

7d. Cuntz algebras

We would like to end this chapter with an interesting class of CC^{*}-algebras, discovered by Cuntz in [cun], and heavily used since then, for various technical purposes. These algebras are not obviously related to the quantum space program that we have been developing so far, and might even look like some sort of Devil’s invention, orthogonal to what is beautiful in operator algebras, but believe me, if you plan to do some serious operator algebra work, you will certainly run into them. Their definition is as follows:

Definition 7.37.

The Cuntz algebra OnO_{n} is the CC^{*}-algebra generated by isometries S1,,SnS_{1},\ldots,S_{n} satisfying the following condition:

S1S1++SnSn=1S_{1}S_{1}^{*}+\ldots+S_{n}S_{n}^{*}=1

That is, OnB(H)O_{n}\subset B(H) is generated by nn isometries whose ranges sum up to HH.

Observe that HH must be infinite dimensional, in order to have isometries as above. In what follows we will prove that OnO_{n} is independent of the choice of such isometries, and also that this algebra is simple. We will restrict attention to the case n=2n=2, the proof in general being similar. Let us start with some simple computations, as follows:

Proposition 7.38.

Given a word i=i1iki=i_{1}\ldots i_{k} with il{1,2}i_{l}\in\{1,2\}, we associate to it the element Si=Si1SikS_{i}=S_{i_{1}}\ldots S_{i_{k}} of the algebra O2O_{2}. Then SiS_{i} are isometries, and we have

SiSj=δij1S_{i}^{*}S_{j}=\delta_{ij}1

for any two words i,ji,j having the same length.

Proof.

We use the relations defining the algebra O2O_{2}, namely:

S1S1=S2S2=1,S1S1+S2S2=1S_{1}^{*}S_{1}=S_{2}^{*}S_{2}=1\quad,\quad S_{1}S_{1}^{*}+S_{2}S_{2}^{*}=1

The fact that the SiS_{i} are isometries is clear; here is the check for i=12i=12:

S12S12=(S1S2)(S1S2)=S2S1S1S2=S2S2=1S_{12}^{*}S_{12}=(S_{1}S_{2})^{*}(S_{1}S_{2})=S_{2}^{*}S_{1}^{*}S_{1}S_{2}=S_{2}^{*}S_{2}=1

Regarding the last assertion, by induction we just have to establish the formula there for the words of length 1. That is, we want to prove the following formulae:

S1S2=S2S1=0S_{1}^{*}S_{2}=S_{2}^{*}S_{1}=0

But these two formulae follow from the fact that the projections Pi=SiSiP_{i}=S_{i}S_{i}^{*} satisfy by definition P1+P2=1P_{1}+P_{2}=1. Indeed, we have the following computation:

P_{1}+P_{2}=1\implies P_{1}P_{2}=0\implies S_{1}S_{1}^{*}S_{2}S_{2}^{*}=0\implies S_{1}^{*}S_{2}=S_{1}^{*}S_{1}S_{1}^{*}S_{2}S_{2}^{*}S_{2}=0

Thus, we have the first formula, and the proof of the second one is similar. ∎
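
A concrete model for all this: on l^{2}(\mathbb{N}), take S_{1}:e_{n}\to e_{2n} and S_{2}:e_{n}\to e_{2n+1}, the even/odd splitting of the basis. The defining relations, and the rules above, can then be checked numerically; in the sketch below the truncation to dimension 2N is of course an artifact of the computation, with S_{i}^{*}S_{i}=1 holding exactly on the first N coordinates:

import numpy as np

# Truncated even/odd model: S1 e_n = e_{2n}, S2 e_n = e_{2n+1}, cut at dimension 2N
N = 8
S1 = np.zeros((2 * N, 2 * N)); S2 = np.zeros((2 * N, 2 * N))
for k in range(N):
    S1[2 * k, k] = 1.0
    S2[2 * k + 1, k] = 1.0

print(np.allclose((S1.T @ S1)[:N, :N], np.eye(N)))        # S1* S1 = 1
print(np.allclose((S2.T @ S2)[:N, :N], np.eye(N)))        # S2* S2 = 1
print(np.allclose(S1 @ S1.T + S2 @ S2.T, np.eye(2 * N)))  # S1 S1* + S2 S2* = 1
print(np.allclose((S1.T @ S2)[:N, :N], 0))                # S1* S2 = 0, as above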

We can use the formulae in Proposition 7.38 as follows:

Proposition 7.39.

Consider words in O2O_{2}, meaning products of S1,S1,S2,S2S_{1},S_{1}^{*},S_{2},S_{2}^{*}.

  1. (1)

Each word in O2O_{2} is of the form 0 or SiSjS_{i}S_{j}^{*}, for some words i,ji,j.

  2. (2)

    Words of type SiSjS_{i}S_{j}^{*} with l(i)=l(j)=kl(i)=l(j)=k form a system of 2k×2k2^{k}\times 2^{k} matrix units.

  3. (3)

    The algebra AkA_{k} generated by matrix units in (2) is a subalgebra of Ak+1A_{k+1}.

Proof.

Here the first two assertions follow from the formulae in Proposition 7.38, and for the last assertion, we can use the following formula:

SiSj=Si1Sj=Si(S1S1+S2S2)SjS_{i}S_{j}^{*}=S_{i}1S_{j}^{*}=S_{i}(S_{1}S_{1}^{*}+S_{2}S_{2}^{*})S_{j}^{*}

Thus, we obtain an embedding of algebras AkA_{k}, as in the statement. ∎

Observe now that the embedding constructed in (3) above is compatible with the matrix unit systems in (2). Consider indeed the following diagram:

\begin{matrix}A_{k+1}&\simeq&M_{2^{k+1}}(\mathbb{C})\\ \cup&&\cup\\ A_{k}&\simeq&M_{2^{k}}(\mathbb{C})\end{matrix}

With the notation eix,yj=eijexye_{ix,yj}=e_{ij}\otimes e_{xy}, the inclusion on the right is given by:

e_{ij}\to e_{i1,1j}+e_{i2,2j}=e_{ij}\otimes e_{11}+e_{ij}\otimes e_{22}=e_{ij}\otimes 1

Thus, with standard tensor product notations, the inclusion on the right is the canonical inclusion mm1m\to m\otimes 1, and so the above diagram becomes:

\begin{matrix}A_{k+1}&\simeq&M_{2}(\mathbb{C})^{\otimes k+1}\\ \cup&&\cup\\ A_{k}&\simeq&M_{2}(\mathbb{C})^{\otimes k}\end{matrix}

The passage from the algebra A=kAkM2()A=\cup_{k}A_{k}\simeq M_{2}(\mathbb{C})^{\otimes\infty} coming from this observation to the full algebra O2O_{2} that we are interested in can be done by using:

Proposition 7.40.

Each element X<S1,S2>O2X\in<S_{1},S_{2}>\subset O_{2} decomposes as a finite sum

X=i>0S1iXi+X0+i>0XiS1iX=\sum_{i>0}S_{1}^{*i}X_{-i}+X_{0}+\sum_{i>0}X_{i}S_{1}^{i}

where each XiX_{i} is in the union AA of algebras AkA_{k}.

Proof.

By linearity and by using Proposition 7.39 we may assume that XX is a nonzero word, say X=SiSjX=S_{i}S_{j}^{*}. In the case l(i)=l(j)l(i)=l(j) we can set X0=XX_{0}=X and we are done. Otherwise, we just have to add terms of the form 1=S1S11=S_{1}^{*}S_{1}, at left or at right. For instance X=S2X=S_{2} is equal to S2S1S1S_{2}S_{1}^{*}S_{1}, and we can take X1=S2S1A1X_{1}=S_{2}S_{1}^{*}\in A_{1}. ∎

We must show now that the decomposition X(Xi)X\to(X_{i}) found above is unique, and then prove that each map XXiX\to X_{i} has good continuity properties. The following formulae show that in both problems we may restrict attention to the case i=0i=0:

X_{i+1}=(XS_{1}^{*})_{i}\quad,\quad X_{-i-1}=(S_{1}X)_{i}

In order to solve these questions, we use the following fact:

Proposition 7.41.

If PP is a nonzero projection in 𝒪2=<S1,S2>O2\mathcal{O}_{2}=<S_{1},S_{2}>\subset O_{2}, its kk-th average, given by the formula

Q=l(i)=kSiPSiQ=\sum_{l(i)=k}S_{i}PS_{i}^{*}

is a nonzero projection in 𝒪2\mathcal{O}_{2} having the property that the linear subspace QAkQQA_{k}Q is isomorphic to a matrix algebra, and YQYQY\to QYQ is an isomorphism of AkA_{k} onto it.

Proof.

We know that the words of form SiSjS_{i}S_{j}^{*} with l(i)=l(j)=kl(i)=l(j)=k are a system of matrix units in AkA_{k}. We apply to them the map YQYQY\to QYQ, and we obtain:

QS_{i}S_{j}^{*}Q=\sum_{pq}S_{p}PS_{p}^{*}S_{i}S_{j}^{*}S_{q}PS_{q}^{*}=\sum_{pq}\delta_{ip}\delta_{jq}S_{p}P^{2}S_{q}^{*}=S_{i}PS_{j}^{*}

The output being a system of matrix units, YQYQY\to QYQ is an isomorphism from the algebra of matrices AkA_{k} to another algebra of matrices QAkQQA_{k}Q, and this gives the result. ∎

Thus any map YQYQY\to QYQ behaves well on the i=0i=0 part of the decomposition of XX. It remains to find PP such that YQYQY\to QYQ destroys all the i0i\neq 0 terms, and we have here:

Proposition 7.42.

Assuming X0AkX_{0}\in A_{k}, there is a nonzero projection PAP\in A such that QXQ=QX0QQXQ=QX_{0}Q, where QQ is the kk-th average of PP.

Proof.

We want YQYQY\to QYQ to map to zero all terms in the decomposition of XX, except for X0X_{0}. Let us call M1,,Mt𝒪2AM_{1},\ldots,M_{t}\in\mathcal{O}_{2}-A the terms to be destroyed. We want the following equalities to hold, with the sum over all pairs of length kk indices:

ijSiPSiMqSjPSj=0\sum_{ij}S_{i}PS_{i}^{*}M_{q}S_{j}PS_{j}^{*}=0

The simplest way is to look for PP such that all terms of all sums are 0:

SiPSiMqSjPSj=0S_{i}PS_{i}^{*}M_{q}S_{j}PS_{j}^{*}=0

By multiplying to the left by SiS_{i}^{*} and to the right by SjS_{j}, we want to have:

PSiMqSjP=0PS_{i}^{*}M_{q}S_{j}P=0

With Nz=SiMqSjN_{z}=S_{i}^{*}M_{q}S_{j}, where zz belongs to some new index set, we want to have:

PNzP=0PN_{z}P=0

Since Nz𝒪2AN_{z}\in\mathcal{O}_{2}-A, we can write Nz=SmzSnzN_{z}=S_{m_{z}}S_{n_{z}}^{*} with l(mz)l(nz)l(m_{z})\neq l(n_{z}), and we want:

PSmzSnzP=0PS_{m_{z}}S_{n_{z}}^{*}P=0

In order to do this, we can use projections of the form P=SrSrP=S_{r}S_{r}^{*}, with rr being a suitable word. We want:

SrSrSmzSnzSrSr=0S_{r}S_{r}^{*}S_{m_{z}}S_{n_{z}}^{*}S_{r}S_{r}^{*}=0

Let KK be the biggest length of all mz,nzm_{z},n_{z}. Assume that we have fixed rr, of length bigger than KK. If the above product is nonzero then both SrSmzS_{r}^{*}S_{m_{z}} and SnzSrS_{n_{z}}^{*}S_{r} must be nonzero, which gives the following equalities of words:

r1rl(mz)=mz,r1rl(nz)=nzr_{1}\ldots r_{l(m_{z})}=m_{z}\quad,\quad r_{1}\ldots r_{l(n_{z})}=n_{z}

Assuming that these equalities hold indeed, the above product reduces as follows:

SrSrl(r)Srl(mz)+1Srl(nz)+1Srl(r)SrS_{r}S_{r_{l(r)}}^{*}\ldots S_{r_{l(m_{z})+1}}^{*}S_{r_{l(n_{z})+1}}\ldots S_{r_{l(r)}}S_{r}^{*}

Now if this product is nonzero, the middle term must be nonzero:

Srl(r)Srl(mz)+1Srl(nz)+1Srl(r)0S_{r_{l(r)}}^{*}\ldots S_{r_{l(m_{z})+1}}^{*}S_{r_{l(n_{z})+1}}\ldots S_{r_{l(r)}}\neq 0

In order for this to hold, the indices starting from the middle to the right must be equal to the indices starting from the middle to the left. Thus rr must be periodic, of period |l(mz)l(nz)|>0|l(m_{z})-l(n_{z})|>0. But avoiding this is certainly possible, because we can take any aperiodic infinite word, and let rr be the sequence of its first MM letters, with MM big enough. ∎

We can now start solving our problems. We first have:

Proposition 7.43.

The decomposition of XX is unique, and we have

||Xi||||X||||X_{i}||\leq||X||

for any ii.

Proof.

It is enough to do this for i=0i=0. But this follows from the previous result, via the following sequence of equalities and inequalities:

||X0||=||QX0Q||=||QXQ||||X||||X_{0}||=||QX_{0}Q||=||QXQ||\leq||X||

Thus we got the inequality in the statement. As for the uniqueness part, this follows from the fact that X0QX0Q=QXQX_{0}\to QX_{0}Q=QXQ is an isomorphism. ∎

Remember now we want to prove that the Cuntz algebra O2O_{2} does not depend on the choice of the isometries S1,S2S_{1},S_{2}. In order to do so, let 𝒪¯2\overline{\mathcal{O}}_{2} be the completion of the *-algebra 𝒪2=<S1,S2>O2\mathcal{O}_{2}=<S_{1},S_{2}>\subset O_{2} with respect to the biggest CC^{*}-norm. We have:

Proposition 7.44.

We have the equivalence

X=0Xi=0,iX=0\iff X_{i}=0,\forall i

valid for any element X𝒪¯2X\in\overline{\mathcal{O}}_{2}.

Proof.

Assume Xi=0X_{i}=0 for any ii, and choose a sequence XkXX^{k}\to X with Xk𝒪2X^{k}\in\mathcal{O}_{2}. For λ𝕋\lambda\in\mathbb{T} we define a representation ρλ\rho_{\lambda} in the following way:

ρλ:SiλSi\rho_{\lambda}:S_{i}\to\lambda S_{i}

We have then ρλ(Y)=Y\rho_{\lambda}(Y)=Y for any element YAY\in A. We fix norm one vectors ξ,η\xi,\eta and we consider the following continuous functions f:𝕋f:\mathbb{T}\to\mathbb{C}:

fk(λ)=<ρλ(Xk)ξ,η>f^{k}(\lambda)=<\rho_{\lambda}(X^{k})\xi,\eta>

From XkXX^{k}\to X we get, with respect to the usual sup norm of C(𝕋)C(\mathbb{T}):

fkff^{k}\to f

Each Xk𝒪2X^{k}\in\mathcal{O}_{2} can be decomposed, and fkf^{k} is given by the following formula:

f^{k}(\lambda)=\sum_{i>0}\lambda^{-i}<S_{1}^{*i}X^{k}_{-i}\xi,\eta>+<X^{k}_{0}\xi,\eta>+\sum_{i>0}\lambda^{i}<X^{k}_{i}S_{1}^{i}\xi,\eta>

This is a Fourier type expansion of fkf^{k}, that we can write in the following way:

fk(λ)=j=ajkλjf^{k}(\lambda)=\sum_{j=-\infty}^{\infty}a_{j}^{k}\lambda^{j}

By using Proposition 7.43 we obtain that with kk\to\infty, we have:

|ajk|||Xjk||||Xj||=0|a_{j}^{k}|\leq||X_{j}^{k}||\to||X_{j}^{\infty}||=0

On the other hand we have ajkaja_{j}^{k}\to a_{j} with kk\to\infty. Thus all Fourier coefficients aja_{j} of ff are zero, so f=0f=0. With λ=1\lambda=1 this gives the following equality:

<Xξ,η>=0<X\xi,\eta>=0

This is true for arbitrary norm one vectors ξ,η\xi,\eta, so X=0X=0 and we are done. ∎

We can now formulate the Cuntz theorem, from [cun], as follows:

Theorem 7.45 (Cuntz).

Let S1,S2S_{1},S_{2} be isometries satisfying S1S1+S2S2=1S_{1}S_{1}^{*}+S_{2}S_{2}^{*}=1.

  1. (1)

    The CC^{*}-algebra O2O_{2} generated by S1,S2S_{1},S_{2} does not depend on the choice of S1,S2S_{1},S_{2}.

  2. (2)

    For any nonzero XO2X\in O_{2} there are A,BO2A,B\in O_{2} with AXB=1AXB=1.

  3. (3)

    In particular O2O_{2} is simple.

Proof.

This basically follows from the various results established above:

(1) Consider the canonical projection map π:O¯2O2\pi:\overline{O}_{2}\to O_{2}. We know that π\pi is surjective, and we will prove now that π\pi is injective. Indeed, if π(X)=0\pi(X)=0 then π(X)i=0\pi(X)_{i}=0 for any ii. But π(X)i\pi(X)_{i} is in the dense *-algebra AA, so it can be regarded as an element of O¯2\overline{O}_{2}, and with this identification, we have π(X)i=Xi\pi(X)_{i}=X_{i} in O¯2\overline{O}_{2}. Thus Xi=0X_{i}=0 for any ii, so X=0X=0. Thus π\pi is an isomorphism. On the other hand O¯2\overline{O}_{2} depends only on 𝒪2\mathcal{O}_{2}, and the above formulae in 𝒪2\mathcal{O}_{2}, for algebraic calculus and for decomposition of an arbitrary X𝒪2X\in\mathcal{O}_{2}, show that 𝒪2\mathcal{O}_{2} does not depend on the choice of S1,S2S_{1},S_{2}. Thus, we obtain the result.

(2) Choose a sequence XkXX^{k}\to X with Xk𝒪2X^{k}\in\mathcal{O}_{2}. We have the following formula:

(XX)0=limk(i>0XkiXki+Xk0Xk0+i>0S1iXkiXkiS1i)(X^{*}X)_{0}=\lim_{k\to\infty}\left(\sum_{i>0}X^{k*}_{-i}X^{k}_{-i}+X^{k*}_{0}X^{k}_{0}+\sum_{i>0}S_{1}^{*i}X^{k*}_{i}X^{k}_{i}S_{1}^{i}\right)

Thus X0X\neq 0 implies (XX)00(X^{*}X)_{0}\neq 0. By linearity we can assume that we have:

||(XX)0||=1||(X^{*}X)_{0}||=1

Now choose a positive element Y𝒪2Y\in\mathcal{O}_{2} which is close enough to XXX^{*}X:

||XXY||<ε||X^{*}X-Y||<\varepsilon

Since ZZ0Z\to Z_{0} is norm decreasing, we have the following estimate:

||Y0||>1ε||Y_{0}||>1-\varepsilon

We apply Proposition 7.42 to our positive element Y𝒪2Y\in\mathcal{O}_{2}. We obtain in this way a certain projection QQ such that QY0Q=QYQQY_{0}Q=QYQ belongs to a certain matrix algebra. We have QYQ>0QYQ>0, so we can diagonalize this latter element, as follows:

QYQ=λiRiQYQ=\sum\lambda_{i}R_{i}

Here λi\lambda_{i} are positive numbers and RiR_{i} are minimal projections in the matrix algebra. Now since ||QYQ||=||Y0||||QYQ||=||Y_{0}||, there must be an eigenvalue greater than 1ε1-\varepsilon:

λ0>1ε\lambda_{0}>1-\varepsilon

By linear algebra, we can pass from the minimal projection R0R_{0} corresponding to λ0\lambda_{0} to another minimal projection, via a partial isometry UU:

U^{*}U=R_{0}\quad,\quad UU^{*}=S_{1}^{k}S_{1}^{*k}

The element B=QUS1kB=QU^{*}S_{1}^{k} has norm 1\leq 1, and we get the following inequality:

||1-B^{*}X^{*}XB||\leq||1-B^{*}YB||+||B^{*}YB-B^{*}X^{*}XB||<||1-B^{*}YB||+\varepsilon

The last term can be computed by using the diagonalization of QYQQYQ, as follows:

B^{*}YB=S_{1}^{*k}UQYQU^{*}S_{1}^{k}=S_{1}^{*k}\left(\sum\lambda_{i}UR_{i}U^{*}\right)S_{1}^{k}=\lambda_{0}S_{1}^{*k}S_{1}^{k}S_{1}^{*k}S_{1}^{k}=\lambda_{0}

From λ0>1ε\lambda_{0}>1-\varepsilon we get ||1BYB||<ε||1-B^{*}YB||<\varepsilon, and we obtain the following estimate:

||1BXXB||<2ε||1-B^{*}X^{*}XB||<2\varepsilon

Thus BXXBB^{*}X^{*}XB is invertible, say with inverse CC, and we have (BX)X(BC)=1(B^{*}X^{*})X(BC)=1.

(3) This is clear from the formula AXB=1AXB=1 established in (2). ∎

7e. Exercises

We have seen many things in this chapter, and there are many potential exercises, on all this. We will however be short, and as our unique, key exercise, we have:

Exercise 7.46.

Work out the proof of the existence result for the Haar measure on a compact group GG, as a particular case of the result proved for quantum groups.

This is of course something very standard, the problem being that of eliminating algebras, linear forms and other functional analysis notions from the proof given for the quantum groups, so as to have in the end something talking about spaces, and measures on them.

Chapter 8 Geometric aspects

8a. Topology, K-theory

This chapter is a continuation of the previous one, meant to be a grand finale to the CC^{*}-algebra theory that we started to develop there, before getting back to more traditional von Neumann algebra material, following Murray, von Neumann and others. There are countless things to be said, and possible paths to be taken. En hommage to Connes, and his book [co3], which is probably the finest ever on CC^{*}-algebras, we will adopt a geometric viewpoint. To be more precise, we know that a CC^{*}-algebra is a beast of type A=C(X)A=C(X), with XX being a compact quantum space. So, it is the “geometry” of XX that we would like to talk about, everything else being rather of administrative nature.


Let us first look at the classical case, where XX is a usual compact space. You might say right away that this is the wrong way, and that what we need for doing geometry is a manifold. But my answer here is modesty, and no hurry. It is true that you cannot do much geometry with a compact space XX, but you can do some, and we have here, for instance:

Definition 8.1.

Given a compact space XX, its first KK-theory group K0(X)K_{0}(X) is the group of formal differences of complex vector bundles over XX.

This notion is quite interesting, and we can talk in fact about higher KK-theory groups Kn(X)K_{n}(X) as well, and all this is related to the homotopy groups πn(X)\pi_{n}(X) too. There are many non-trivial results on the subject, the end of the game being of course that of understanding the “shape” of XX, which you need to know a bit about before getting into serious geometry, in the case where XX happens to be a manifold.


As a question for us now, operator algebra theorists, we have:

Question 8.2.

Can we talk about the first KK-theory group K0(X)K_{0}(X) of a compact quantum space XX?

We will see that this is a quite subtle question. To be more precise, we will see that we can talk, in a quite straightforward way, of the group K0(A)K_{0}(A) of an arbitrary CC^{*}-algebra AA, which is constructed as to have K0(A)=K0(X)K_{0}(A)=K_{0}(X) in the commutative case, where A=C(X)A=C(X), with XX being a usual compact space. In the noncommutative case, however, K0(A)K_{0}(A) will sometimes depend on the choice of AA satisfying A=C(X)A=C(X), and so all this will eventually lead to a sort of dead end, and to a rather “no” answer to Question 8.2.


Getting started now, in order to talk about the first KK-theory group K0(A)K_{0}(A) of an arbitrary CC^{*}-algebra AA, we will need the following simple fact:

Proposition 8.3.

Given a CC^{*}-algebra AA, the finitely generated projective AA-modules EE appear via quotient maps f:AnEf:A^{n}\to E, so are of the form

E=pAnE=pA^{n}

with pMn(A)p\in M_{n}(A) being an idempotent. In the commutative case, A=C(X)A=C(X) with XX classical, these AA-modules consist of sections of the complex vector bundles over XX.

Proof.

Here the first assertion is clear from definitions, via some standard algebra, and the second assertion, which is the content of the Serre-Swan theorem, comes from definitions too, again via some standard algebra. ∎

With this in hand, let us go back to Definition 8.1. Given a compact space XX, it is now clear that its KK-theory group K0(X)K_{0}(X) can be recaptured from the knowledge of the associated CC^{*}-algebra A=C(X)A=C(X), and to be more precise we have K0(X)=K0(A)K_{0}(X)=K_{0}(A), when the first KK-theory group of an arbitrary CC^{*}-algebra is constructed as follows:

Definition 8.4.

The first KK-theory group of a CC^{*}-algebra AA is the group of formal differences

K0(A)={pq}K_{0}(A)=\big{\{}p-q\big{\}}

of equivalence classes of projections pMn(A)p\in M_{n}(A), with the equivalence being given by

pqu,uu=p,uu=qp\sim q\iff\exists u,uu^{*}=p,u^{*}u=q

and with the additive structure being the obvious one, by diagonal concatenation.

This is very nice, and as a first example, we have K0()=K_{0}(\mathbb{C})=\mathbb{Z}. More generally, as already mentioned above, it follows from Proposition 8.3 that in the commutative case, where A=C(X)A=C(X) with XX being a compact space, we have K0(A)=K0(X)K_{0}(A)=K_{0}(X). Observe also that we have, by definition, the following formula, valid for any nn\in\mathbb{N}:

K0(A)=K0(Mn(A))K_{0}(A)=K_{0}(M_{n}(A))

Some further elementary observations include the fact that K0K_{0} behaves well with respect to direct sums and inductive limits, and also that K0K_{0} is a homotopy invariant, and for details here, we refer to any introductory book on the subject, such as [bla].
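
As an illustration of the equivalence relation in Definition 8.4, for A=\mathbb{C} the projections p\in M_{n}(\mathbb{C}) are classified up to equivalence by their rank, which gives K_{0}(\mathbb{C})=\mathbb{Z}. The partial isometry uu implementing the equivalence of two same-rank projections can be built explicitly, as in the sketch below; the sizes and seeds are arbitrary, with numpy assumed:

import numpy as np

def projection_and_isometry(n, r, seed):
    # Rank r orthogonal projection p = v v*, with v an n x r isometry
    v, _ = np.linalg.qr(np.random.default_rng(seed).normal(size=(n, r)))
    return v @ v.T, v

p, vp = projection_and_isometry(6, 2, 0)
q, vq = projection_and_isometry(6, 2, 1)

# u = vp vq* satisfies uu* = p and u*u = q, so [p] = [q] in K_0
u = vp @ vq.T
print(np.allclose(u @ u.T, p), np.allclose(u.T @ u, q))   # True True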


In what concerns us, back to our Question 8.2, what has been said above is certainly not enough for investigating our question, and we need more examples. However, these examples are not easy to find, and for getting them, we need more theory. We have:

Definition 8.5.

The second KK-theory group K1(A)K_{1}(A) of a CC^{*}-algebra AA is the group of connected components of GL(A)GL_{\infty}(A), or equivalently of its unitary group, with

GLn(A)GLn+1(A),a(a001)GL_{n}(A)\subset GL_{n+1}(A)\quad,\quad a\to\begin{pmatrix}a&0\\ 0&1\end{pmatrix}

being the embeddings producing the inductive limit GL(A)GL_{\infty}(A).

Again, for a basic example we can take A=A=\mathbb{C}, and we have here K1()={1}K_{1}(\mathbb{C})=\{1\}, trivially. In fact, in the commutative case, where A=C(X)A=C(X), with XX being a usual compact space, it is possible to establish a formula of type K1(A)=K1(X)K_{1}(A)=K_{1}(X). Further elementary observations include the fact that K1K_{1} behaves well with respect to direct sums and with inductive limits, and also that K1K_{1} is a homotopy invariant.


Importantly, the first and second KK-theory groups are related, as follows:

Theorem 8.6.

Given a CC^{*}-algebra AA, we have isomorphisms as follows, with

SA={fC([0,1],A)|f(0)=0}SA=\left\{f\in C([0,1],A)\Big{|}f(0)=0\right\}

standing for the suspension operation for the CC^{*}-algebras:

  1. (1)

    K1(A)=K0(SA)K_{1}(A)=K_{0}(SA).

  2. (2)

    K0(A)=K1(SA)K_{0}(A)=K_{1}(SA).

Proof.

Here the isomorphism in (1) is something rather elementary, and the isomorphism in (2) is something more complicated. In both cases, the idea is to start first with the commutative case, where A=C(X)A=C(X) with XX being a compact space, and understand there the isomorphisms (1,2), called Bott periodicity isomorphisms. Then, with this understood, the extension to the general CC^{*}-algebra case is quite straightforward. ∎

The above result is quite interesting, making it clear that the groups K0,K1K_{0},K_{1} are of the same nature. In fact, it is possible to be a bit more abstract here, and talk in various clever ways about the higher KK-theory groups, Kn(A)K_{n}(A) with nn\in\mathbb{N}, of an arbitrary CC^{*}-algebra, with the result that these higher KK-theory groups are subject to Bott periodicity:

Kn(A)=Kn+2(A)K_{n}(A)=K_{n+2}(A)

However, in practice, this leads us back to Definition 8.4, Definition 8.5 and Theorem 8.6, with these statements containing in fact all we need to know, at n=0,1n=0,1.


Going ahead with examples, following Cuntz [cun] and related papers, we have:

Theorem 8.7.

The KK-theory groups of the Cuntz algebra OnO_{n} are given by

K0(On)=n1,K1(On)={1}K_{0}(O_{n})=\mathbb{Z}_{n-1}\quad,\quad K_{1}(O_{n})=\{1\}

with the equivalent projections Pi=SiSiP_{i}=S_{i}S_{i}^{*} standing for the standard generator of n1\mathbb{Z}_{n-1}.

Proof.

We recall that the Cuntz algebra OnO_{n} is generated by isometries S1,,SnS_{1},\ldots,S_{n} satisfying S1S1++SnSn=1S_{1}S_{1}^{*}+\ldots+S_{n}S_{n}^{*}=1. Since we have SiSi=1S_{i}^{*}S_{i}=1, with Pi=SiSiP_{i}=S_{i}S_{i}^{*}, we have:

P1Pn1P_{1}\sim\ldots\sim P_{n}\sim 1

On the other hand, we also know that we have P1++Pn=1P_{1}+\ldots+P_{n}=1, and the conclusion is that, in the first KK-theory group K0(On)K_{0}(O_{n}), the following equality happens:

n[1]=[1]n[1]=[1]

Thus (n1)[1]=0(n-1)[1]=0, and it is quite elementary to prove that k[1]=0k[1]=0 happens in fact precisely when kk is a multiple of n1n-1. Thus, we have a group embedding, as follows:

n1K0(On)\mathbb{Z}_{n-1}\subset K_{0}(O_{n})

The whole point now is that of proving that this group embedding is an isomorphism, which in practice amounts to proving that any projection in OnO_{n} is equivalent to a sum of the form P1++PkP_{1}+\ldots+P_{k}, with Pi=SiSiP_{i}=S_{i}S_{i}^{*} as above. This is something non-trivial, requiring the use of Bott periodicity, and the consideration of the second KK-theory group K1(On)K_{1}(O_{n}) as well, and for details here, we refer to Cuntz [cun] and related papers. ∎

The above result is very interesting, for various reasons. First, it shows that the structure of the first KK-theory groups K0(A)K_{0}(A) of arbitrary CC^{*}-algebras can be more complicated than that of the first KK-theory groups K0(X)K_{0}(X) of the usual compact spaces XX, with the group K0(A)K_{0}(A) being for instance not ordered, in the case A=OnA=O_{n}, and with this being the first in a series of no-go observations that can be formulated.


Second, and on a positive note now, what we have in Theorem 8.7 is a true noncommutative computation, dealing with an algebra which is rather of “free” type. The outcome of the computation is something nice and clear, suggesting that, modulo the small technical issues mentioned above, we are on our way to developing a nice theory, and that the answer to Question 8.2 might be “yes”. However, as bad news, we have:

Theorem 8.8.

There are discrete groups Γ\Gamma having the property that the projection

π:C(Γ)Cred(Γ)\pi:C^{*}(\Gamma)\to C^{*}_{red}(\Gamma)

is not an isomorphism, at the level of KK-theory groups.

Proof.

For constructing such a counterexample, the group Γ\Gamma must be definitely non-amenable, and the first thought goes to the free group F2F_{2}. But it is possible to prove that F2F_{2} is KK-amenable, in the sense that π\pi is an isomorphism at the KK-theory level. However, counterexamples do exist, such as the infinite groups Γ\Gamma having Kazhdan’s property (T)(T). Indeed, for such a group the associated Kazhdan projection pC(Γ)p\in C^{*}(\Gamma) defines a nonzero class in K0(C(Γ))K_{0}(C^{*}(\Gamma)), while mapping to the zero element 0K0(Cred(Γ))0\in K_{0}(C^{*}_{red}(\Gamma)), so we have our counterexample. ∎

As a conclusion to all this, which might seem a bit disappointing, we have:

Conclusion 8.9.

The answer to Question 8.2 is no.

Of course, the answer to Question 8.2 remains “yes” in many cases, the general idea being that, as long as we don’t get too far away from the classical case, the answer remains “yes”, so we can talk about the KK-theory groups of our compact quantum spaces XX, and also, about countless other invariants inspired from the classical theory. For a survey of what can be done here, including applications too, we refer to Connes’ book [co3].


In what concerns us, however, we will not take this path. For various reasons, coming from certain quantum physics beliefs, which can be informally summarized as “at sufficiently tiny scales, freeness rules”, we will rather be interested, in this book, in compact quantum spaces XX which are of “free” type, and we will only accept geometric invariants for them which are well-defined. And KK-theory, unfortunately, does not qualify.

8b. Free probability

As a solution to the difficulties met in the previous section, let us turn to probability. This is surely not geometry in a standard sense, but at a more advanced level it does become geometry. For instance, if you have a quantum manifold XX, and you want to talk about its Laplacian, or its Dirac operator, you will certainly need to know a bit about L2(X)L^{2}(X). And isn’t advanced measure theory the same thing as probability theory? Hope we agree on this.


Let us start our discussion with something that we know since chapter 5:

Definition 8.10.

Let AA be a CC^{*}-algebra, given with a trace tr:Atr:A\to\mathbb{C}.

  1. (1)

    The elements aAa\in A are called random variables.

  2. (2)

    The moments of such a variable are the numbers Mk(a)=tr(ak)M_{k}(a)=tr(a^{k}).

  3. (3)

    The law of such a variable is the functional μ:Ptr(P(a))\mu:P\to tr(P(a)).

Here the exponent k=k=\circ\bullet\bullet\circ\ldots is as before a colored integer, with the powers aka^{k} being defined by multiplicativity and the usual formulae, namely:

a=1,a=a,a=aa^{\emptyset}=1\quad,\quad a^{\circ}=a\quad,\quad a^{\bullet}=a^{*}

As for the polynomial PP, this is a noncommuting *-polynomial in one variable:

P<X,X>P\in\mathbb{C}<X,X^{*}>
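Before getting further into the theory, and as a side remark, here is a quick computer illustration of the above definition, as a minimal Python sketch, with our own conventions: we take A=M_N(\mathbb{C}) with its normalized trace, and compute a few colored moments of a random variable.

# A sketch illustrating Definition 8.10: take A = M_N(C), with its
# normalized trace tr = Tr/N, and compute colored moments M_k(a) = tr(a^k),
# with 'o' standing for the white symbol (a) and 'b' for the black one (a*).
import numpy as np

N = 4
rng = np.random.default_rng(0)
a = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

def tr(x):
    return np.trace(x) / N          # the normalized trace on M_N(C)

def moment(a, word):
    x = np.eye(N, dtype=complex)
    for c in word:
        x = x @ (a if c == 'o' else a.conj().T)
    return tr(x)

print(moment(a, 'o'))               # tr(a)
print(moment(a, 'ob'))              # tr(aa*), a positive number
print(moment(a, 'obbo'))            # a longer colored moment, tr(a a*a* a)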

Generally speaking, the above definition is something quite abstract, but there is no other way of doing things, at least at this level of generality. However, in the special case where our variable aAa\in A is self-adjoint, or more generally normal, we have:

Proposition 8.11.

The law of a normal variable aAa\in A can be identified with the corresponding spectral measure μ𝒫()\mu\in\mathcal{P}(\mathbb{C}), according to the following formula,

tr(f(a))=σ(a)f(x)dμ(x)tr(f(a))=\int_{\sigma(a)}f(x)d\mu(x)

valid for any fL(σ(a))f\in L^{\infty}(\sigma(a)), coming from the measurable functional calculus. In the self-adjoint case the spectral measure is real, μ𝒫()\mu\in\mathcal{P}(\mathbb{R}).

Proof.

This is something that we again know well, either from chapter 5, or simply from chapter 3, coming from the spectral theorem for normal operators. ∎

Let us discuss now independence, and its noncommutative versions. As a starting point, we have the following update of the classical notion of independence:

Definition 8.12.

We call two subalgebras B,CAB,C\subset A independent when the following condition is satisfied, for any xBx\in B and yCy\in C:

tr(xy)=tr(x)tr(y)tr(xy)=tr(x)tr(y)

Equivalently, the following condition must be satisfied, for any xBx\in B and yCy\in C:

tr(x)=tr(y)=0tr(xy)=0tr(x)=tr(y)=0\implies tr(xy)=0

Also, b,cAb,c\in A are called independent when B=<b>B=<b> and C=<c>C=<c> are independent.

It is possible to develop some theory here, but this leads to the usual CLT. As a much more interesting notion now, we have Voiculescu’s freeness [vo1]:

Definition 8.13.

Given a pair (A,tr)(A,tr), we call two subalgebras B,CAB,C\subset A free when the following condition is satisfied, for any xiBx_{i}\in B and yiCy_{i}\in C:

tr(xi)=tr(yi)=0tr(x1y1x2y2)=0tr(x_{i})=tr(y_{i})=0\implies tr(x_{1}y_{1}x_{2}y_{2}\ldots)=0

Also, b,cAb,c\in A are called free when B=<b>B=<b> and C=<c>C=<c> are free.

As a first observation, there is a certain lack of symmetry between Definition 8.12 and Definition 8.13, because the latter does not include an explicit formula for quantities of type tr(x1y1x2y2)tr(x_{1}y_{1}x_{2}y_{2}\ldots). But this can be done, the precise result being as follows:

Proposition 8.14.

If B,CAB,C\subset A are free, the restriction of trtr to <B,C><B,C> can be computed in terms of the restrictions of trtr to B,CB,C. To be more precise, we have

tr(x1y1x2y2)=P({tr(xi1xi2)}i,{tr(yj1yj2)}j)tr(x_{1}y_{1}x_{2}y_{2}\ldots)=P\Big{(}\{tr(x_{i_{1}}x_{i_{2}}\ldots)\}_{i},\{tr(y_{j_{1}}y_{j_{2}}\ldots)\}_{j}\Big{)}

where PP is a certain polynomial, depending on the length of x1y1x2y2x_{1}y_{1}x_{2}y_{2}\ldots\,, having as variables the traces of products xi1xi2x_{i_{1}}x_{i_{2}}\ldots and yj1yj2y_{j_{1}}y_{j_{2}}\ldots\,, with i1<i2<i_{1}<i_{2}<\ldots and j1<j2<j_{1}<j_{2}<\ldots

Proof.

With x=xtr(x)x^{\prime}=x-tr(x), we can start our computation as follows:

tr(x_{1}y_{1}x_{2}y_{2}\ldots)=tr\big{[}(x_{1}^{\prime}+tr(x_{1}))(y_{1}^{\prime}+tr(y_{1}))(x_{2}^{\prime}+tr(x_{2}))\ldots\big{]}
=tr(x_{1}^{\prime}y_{1}^{\prime}x_{2}^{\prime}y_{2}^{\prime}\ldots)+{\rm other\ terms}
={\rm other\ terms}

Thus, we are led to a kind of recurrence, and this gives the result. ∎
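As an illustration for this recurrence, here are its simplest instances, which are standard, and worth recording: for b,cAb,c\in A free, we have

tr(bc)=tr(b)tr(c)

tr(bcbc)=tr(b^{2})tr(c)^{2}+tr(b)^{2}tr(c^{2})-tr(b)^{2}tr(c)^{2}

Indeed, with b=b^{\prime}+tr(b) and c=c^{\prime}+tr(c), all the terms containing an alternating product of centered variables vanish, by freeness, and collecting the surviving terms gives the above two formulae.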

Let us discuss now some examples of independence and freeness. We first have the following result, from [vo1], which is something elementary:

Proposition 8.15.

Given two algebras (A,tr)(A,tr) and (B,tr)(B,tr), the following hold:

  1. (1)

    A,BA,B are independent inside their tensor product ABA\otimes B, endowed with its canonical tensor product trace, given on basic tensors by tr(ab)=tr(a)tr(b)tr(a\otimes b)=tr(a)tr(b).

  2. (2)

    A,BA,B are free inside their free product ABA*B, endowed with its canonical free product trace, given by the formulae in Proposition 8.14.

Proof.

Both the assertions are indeed clear from definitions, with just some standard discussion needed for (2), in connection with the free product trace. See [vo1]. ∎

More concretely now, we have the following result, also from Voiculescu [vo1]:

Proposition 8.16.

We have the following results, valid for group algebras:

  1. (1)

    L(Γ),L(Λ)L(\Gamma),L(\Lambda) are independent inside L(Γ×Λ)L(\Gamma\times\Lambda).

  2. (2)

    L(Γ),L(Λ)L(\Gamma),L(\Lambda) are free inside L(ΓΛ)L(\Gamma*\Lambda).

Proof.

In order to prove these results, we can use the general results in Proposition 8.15, along with the following two isomorphisms, which are both standard:

L(Γ×Λ)=L(Λ)L(Γ),L(ΓΛ)=L(Λ)L(Γ)L(\Gamma\times\Lambda)=L(\Lambda)\otimes L(\Gamma)\quad,\quad L(\Gamma*\Lambda)=L(\Lambda)*L(\Gamma)

Alternatively, we can check the independence and freeness formulae on group elements, which is something trivial, and then conclude by linearity. See [vo1]. ∎

We have already seen limiting theorems in classical probability, in chapter 6. In order to deal now with freeness, let us develop some tools. First, we have:

Proposition 8.17.

We have a well-defined operation \boxplus, given by

μaμb=μa+b\mu_{a}\boxplus\mu_{b}=\mu_{a+b}

with a,ba,b being free, called free convolution.

Proof.

We need to check here that if a,ba,b are free, then the distribution μa+b\mu_{a+b} depends only on the distributions μa,μb\mu_{a},\mu_{b}. But for this purpose, we can use the formula in Proposition 8.14. Indeed, by plugging in arbitrary powers of a,ba,b as variables xi,yjx_{i},y_{j}, we obtain a family of formulae of the following type, with PP being certain polynomials:

tr(ak1bl1ak2bl2)=P({tr(ak)}k,{tr(bl)}l)tr(a^{k_{1}}b^{l_{1}}a^{k_{2}}b^{l_{2}}\ldots)=P\Big{(}\{tr(a^{k})\}_{k},\{tr(b^{l})\}_{l}\Big{)}

Thus the moments of a+ba+b depend only on the moments of a,ba,b, and the same argument shows that the same holds for *-moments, and this gives the result. ∎

In order to advance now, we would need an analogue of the Fourier transform, or rather of the log of the Fourier transform. Quite remarkably, such a transform exists indeed, the precise result here, due to Voiculescu [vo1], being as follows:

Theorem 8.18.

Given a probability measure μ\mu, define its RR-transform as follows:

Gμ(ξ)=dμ(t)ξtGμ(Rμ(ξ)+1ξ)=ξG_{\mu}(\xi)=\int_{\mathbb{R}}\frac{d\mu(t)}{\xi-t}\implies G_{\mu}\left(\ R_{\mu}(\xi)+\frac{1}{\xi}\right)=\xi

The free convolution operation is then linearized by the RR-transform.

Proof.

This is something quite tricky, the idea being as follows:

(1) In order to model the free convolution, the best is to use creation operators on free Fock spaces, corresponding to the semigroup von Neumann algebras L(k)L(\mathbb{N}^{*k}). Indeed, we have some freeness here, a bit in the same way as in the free group algebras L(Fk)L(F_{k}).

(2) The point now, motivating this choice, is that the variables of type S+f(S)S^{*}+f(S), with SL()S\in L(\mathbb{N}) being the shift, and with f[X]f\in\mathbb{C}[X] being an arbitrary polynomial, are easily seen to model in moments all the possible distributions μ:[X]\mu:\mathbb{C}[X]\to\mathbb{C}.

(3) Now let f,g[X]f,g\in\mathbb{C}[X] and consider the variables S+f(S)S^{*}+f(S) and T+g(T)T^{*}+g(T), where S,TL()S,T\in L(\mathbb{N}*\mathbb{N}) are the shifts corresponding to the generators of \mathbb{N}*\mathbb{N}. These variables are free, and by using a 4545^{\circ} argument, their sum has the same law as S+(f+g)(S)S^{*}+(f+g)(S).

(4) Thus the operation μf\mu\to f linearizes the free convolution. We are therefore left with a computation inside L()L(\mathbb{N}), which is elementary, and whose conclusion is that Rμ=fR_{\mu}=f can be recaptured from μ\mu via the Cauchy transform GμG_{\mu}, as in the statement. ∎

With the above linearization technology in hand, we can now establish the following remarkable free analogue of the CLT, also due to Voiculescu [vo1]:

Theorem 8.19 (Free CLT).

Given self-adjoint variables x1,x2,x3,,x_{1},x_{2},x_{3},\ldots, which are f.i.d., that is, free and identically distributed, centered, with variance t>0t>0, we have, with nn\to\infty, in moments,

1ni=1nxiγt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_{i}\sim\gamma_{t}

where γt=12πt4tx2dx\gamma_{t}=\frac{1}{2\pi t}\sqrt{4t-x^{2}}dx is the Wigner semicircle law of parameter tt.

Proof.

We follow the same idea as in the proof of the CLT:

(1) At t=1t=1, the RR-transform of the variable in the statement can be computed by using the linearization property from Theorem 8.18, and is given by:

R(\xi)=\sqrt{n}\,R_{x}\left(\frac{\xi}{\sqrt{n}}\right)\simeq\xi

(2) On the other hand, some standard computations show that the Cauchy transform of the Wigner law γ1\gamma_{1} satisfies the following equation:

Gγ1(ξ+1ξ)=ξG_{\gamma_{1}}\left(\xi+\frac{1}{\xi}\right)=\xi

Thus, by using Theorem 8.18, we have the following formula:

Rγ1(ξ)=ξR_{\gamma_{1}}(\xi)=\xi

(3) We conclude that the laws in the statement have the same RR-transforms, and so they are equal. The passage to the general case, t>0t>0, is routine, by dilation. ∎
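As a side check of all this, here is a minimal numerical sketch, in Python, assuming numpy and scipy available, verifying the above Cauchy transform identity, and also the well-known fact that the even moments of γ1\gamma_{1} are the Catalan numbers:

# A sketch checking that G(z) = (z - sqrt(z^2-4))/2, the Cauchy transform
# of gamma_1, satisfies G(xi + 1/xi) = xi for 0 < xi < 1, so R(xi) = xi,
# and that the even moments of gamma_1 are the Catalan numbers.
import numpy as np
from scipy.integrate import quad
from math import comb

def G(z):
    return (z - np.sqrt(z**2 - 4)) / 2

xi = 0.3                                   # any 0 < xi < 1 works here
print(G(xi + 1/xi), "vs", xi)              # both equal to 0.3

for k in range(1, 5):                      # moments vs Catalan numbers
    m, _ = quad(lambda x: x**(2*k) * np.sqrt(4 - x**2) / (2*np.pi), -2, 2)
    print(round(m, 6), comb(2*k, k) // (k + 1))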

In the complex case now, we have a similar result, also from [vo1], as follows:

Theorem 8.20 (Free CCLT).

Given random variables x1,x2,x3,x_{1},x_{2},x_{3},\ldots which are f.i.d., centered, with variance t>0t>0, we have, with nn\to\infty, in moments,

1ni=1nxiΓt\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_{i}\sim\Gamma_{t}

where Γt=law((a+ib)/2)\Gamma_{t}=law\big{(}(a+ib)/\sqrt{2}\big{)}, with a,ba,b being free, each following the Wigner semicircle law γt\gamma_{t}, is the Voiculescu circular law of parameter tt.

Proof.

This follows indeed from the free CLT, established before, simply by taking real and imaginary parts of all the variables involved. ∎

Now that we are done with the basic results in continuous case, let us discuss as well the discrete case. We can establish a free version of the PLT, as follows:

Theorem 8.21 (Free PLT).

The following limit converges, for any t>0t>0,

limn((1tn)δ0+tnδ1)n\lim_{n\to\infty}\left(\left(1-\frac{t}{n}\right)\delta_{0}+\frac{t}{n}\delta_{1}\right)^{\boxplus n}

and we obtain the Marchenko-Pastur law of parameter tt,

πt=max(1t,0)δ0+4t(x1t)22πxdx\pi_{t}=\max(1-t,0)\delta_{0}+\frac{\sqrt{4t-(x-1-t)^{2}}}{2\pi x}\,dx

also called free Poisson law of parameter tt.

Proof.

Let μ\mu be the measure in the statement, appearing under the convolution sign. The Cauchy transform of this measure is elementary to compute, given by:

Gμ(ξ)=(1tn)1ξ+tn1ξ1G_{\mu}(\xi)=\left(1-\frac{t}{n}\right)\frac{1}{\xi}+\frac{t}{n}\cdot\frac{1}{\xi-1}

By using Theorem 8.18, we want to compute the following RR-transform:

R=Rμn(y)=nRμ(y)R=R_{\mu^{\boxplus n}}(y)=nR_{\mu}(y)

We know that the equation for this function RR is as follows:

(1tn)1y1+R/n+tn1y1+R/n1=y\left(1-\frac{t}{n}\right)\frac{1}{y^{-1}+R/n}+\frac{t}{n}\cdot\frac{1}{y^{-1}+R/n-1}=y

With nn\to\infty we obtain from this the following formula:

R=t1yR=\frac{t}{1-y}

But this is the RR-transform of πt\pi_{t}, as one can check via some calculus, so we are done. ∎
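Again, as a side remark, this law can be checked on a computer. Here is a minimal Python sketch, assuming scipy available, using the standard fact that the moments of πt\pi_{t} are Mk(πt)=jN(k,j)tjM_{k}(\pi_{t})=\sum_{j}N(k,j)t^{j}, with N(k,j)N(k,j) being the Narayana numbers, counting the noncrossing partitions of kk points into jj blocks:

# A sketch checking the moments of the Marchenko-Pastur law pi_t against
# the Narayana polynomial formula m_k = sum_j Narayana(k,j) t^j, which
# encodes the fact that all free cumulants of pi_t are equal to t.
import numpy as np
from scipy.integrate import quad
from math import comb

t = 2.0                                         # t >= 1, so no atom at 0
a, b = (1 - np.sqrt(t))**2, (1 + np.sqrt(t))**2

def density(x):
    return np.sqrt(4*t - (x - 1 - t)**2) / (2*np.pi*x)

def mp_moment(k):
    return sum(comb(k, j)*comb(k, j-1)//k * t**j for j in range(1, k+1))

for k in range(1, 6):
    m, _ = quad(lambda x: x**k * density(x), a, b)
    print(round(m, 6), mp_moment(k))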

As a first application now of all this, following Voiculescu [vo2], we have:

Theorem 8.22.

Given a sequence of complex Gaussian matrices ZNMN(L(X))Z_{N}\in M_{N}(L^{\infty}(X)), having independent GtG_{t} variables as entries, with t>0t>0, we have

ZNNΓt\frac{Z_{N}}{\sqrt{N}}\sim\Gamma_{t}

in the NN\to\infty limit, with the limiting measure being Voiculescu’s circular law.

Proof.

We know from chapter 6 that the asymptotic moments are:

Mk(ZNN)t|k|/2|𝒩𝒞2(k)|M_{k}\left(\frac{Z_{N}}{\sqrt{N}}\right)\simeq t^{|k|/2}|\mathcal{NC}_{2}(k)|

On the other hand, the free Fock space analysis done in the proof of Theorem 8.18 shows that we have, with the notations there, the following formulae:

S+Sγ1,S+TΓ1S+S^{*}\sim\gamma_{1}\quad,\quad S+T^{*}\sim\Gamma_{1}

By doing some combinatorics, this shows that an abstract noncommutative variable aAa\in A is circular, following the law Γt\Gamma_{t}, precisely when its moments are:

Mk(a)=t|k|/2|𝒩𝒞2(k)|M_{k}(a)=t^{|k|/2}|\mathcal{NC}_{2}(k)|

Thus, we are led to the conclusion in the statement. See [vo2]. ∎

Next in line, comes the main result of Voiculescu in [vo2], as follows:

Theorem 8.23.

Given a family of sequences of Wigner matrices,

ZiNMN(L(X)),iIZ^{i}_{N}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

with pairwise independent entries, each following the complex normal law GtG_{t}, with t>0t>0, up to the constraint ZNi=(ZNi)Z_{N}^{i}=(Z_{N}^{i})^{*}, the rescaled sequences of matrices

ZiNNMN(L(X)),iI\frac{Z^{i}_{N}}{\sqrt{N}}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

become with NN\to\infty semicircular, each following the Wigner law γt\gamma_{t}, and free.

Proof.

We can assume that we are dealing with 2 sequences of matrices, ZN,ZNZ_{N},Z_{N}^{\prime}. In order to prove the asymptotic freeness, consider the following matrix:

YN=12(ZN+iZN)Y_{N}=\frac{1}{\sqrt{2}}(Z_{N}+iZ_{N}^{\prime})

This is then a complex Gaussian matrix, so by using Theorem 8.22, we have:

YNNΓt\frac{Y_{N}}{\sqrt{N}}\sim\Gamma_{t}

We are therefore in the situation where (ZN+iZN)/N(Z_{N}+iZ_{N}^{\prime})/\sqrt{N}, which has asymptotically semicircular real and imaginary parts, converges to the distribution of a free combination of such variables. Thus ZN,ZNZ_{N},Z_{N}^{\prime} become asymptotically free, as desired. ∎
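Here is also a quick Monte Carlo illustration of this asymptotic freeness phenomenon, as a Python sketch, with our own normalizations: for free centered semicircular variables of variance 1 we have tr(a2b2)=1tr(a^{2}b^{2})=1 and tr(abab)=0tr(abab)=0, the latter being the telltale sign of freeness, and two independent rescaled Hermitian Gaussian matrices reproduce this.

# A Monte Carlo sketch of asymptotic freeness: two independent rescaled
# Hermitian Gaussian matrices A, B become free semicircular variables,
# so tr(A^2 B^2) should be close to 1, and tr(ABAB) close to 0.
import numpy as np

N = 1000
rng = np.random.default_rng(0)

def wigner(N):
    z = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    return (z + z.conj().T) / (2 * np.sqrt(N))   # Hermitian, semicircular

A, B = wigner(N), wigner(N)
tr = lambda x: np.trace(x).real / N
print(tr(A @ A @ B @ B))                         # close to 1
print(tr(A @ B @ A @ B))                         # close to 0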

Getting now to the complex case, we have a similar result here, as follows:

Theorem 8.24.

Given a family of sequences of complex Gaussian matrices,

ZiNMN(L(X)),iIZ^{i}_{N}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

with pairwise independent entries, each following the law GtG_{t}, with t>0t>0, the matrices

ZiNNMN(L(X)),iI\frac{Z^{i}_{N}}{\sqrt{N}}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

become with NN\to\infty circular, each following the Voiculescu law Γt\Gamma_{t}, and free.

Proof.

This follows indeed from Theorem 8.23, which applies to the real and imaginary parts of our complex Gaussian matrices, and gives the result. ∎

Finally, we have as well a similar result for the Wishart matrices, as follows:

Theorem 8.25.

Given a family of sequences of complex Wishart matrices,

ZiN=YiN(YiN)MN(L(X)),iIZ^{i}_{N}=Y^{i}_{N}(Y^{i}_{N})^{*}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

with each YiNY^{i}_{N} being a N×MN\times M matrix, with entries following the normal law G1G_{1}, and with all these entries being pairwise independent, the rescaled sequences of matrices

ZiNNMN(L(X)),iI\frac{Z^{i}_{N}}{N}\in M_{N}(L^{\infty}(X))\quad,\quad i\in I

become with M=tNM=tN\to\infty Marchenko-Pastur, each following the law πt\pi_{t}, and free.

Proof.

Here the first assertion is the Marchenko-Pastur theorem, from chapter 6, and the second assertion follows from Theorem 8.23, or from Theorem 8.24. ∎

Let us develop now some further limiting theorems, classical and free. We have the following definition, extending the Poisson limit theory developed before:

Definition 8.26.

Associated to any compactly supported positive measure ρ\rho on \mathbb{C} are the probability measures

pρ=limn((1cn)δ0+1nρ)n,πρ=limn((1cn)δ0+1nρ)np_{\rho}=\lim_{n\to\infty}\left(\left(1-\frac{c}{n}\right)\delta_{0}+\frac{1}{n}\rho\right)^{*n}\quad,\quad\pi_{\rho}=\lim_{n\to\infty}\left(\left(1-\frac{c}{n}\right)\delta_{0}+\frac{1}{n}\rho\right)^{\boxplus n}

where c=mass(ρ)c=mass(\rho), called compound Poisson and compound free Poisson laws.

In what follows we will be interested in the case where ρ\rho is discrete, as is for instance the case for ρ=tδ1\rho=t\delta_{1} with t>0t>0, which produces the Poisson and free Poisson laws. The following result allows one to detect compound Poisson/free Poisson laws:

Proposition 8.27.

For ρ=i=1sciδzi\rho=\sum_{i=1}^{s}c_{i}\delta_{z_{i}} with ci>0c_{i}>0 and ziz_{i}\in\mathbb{C}, we have

Fpρ(y)=exp(i=1sci(eiyzi1)),Rπρ(y)=i=1scizi1yziF_{p_{\rho}}(y)=\exp\left(\sum_{i=1}^{s}c_{i}(e^{iyz_{i}}-1)\right)\quad,\quad R_{\pi_{\rho}}(y)=\sum_{i=1}^{s}\frac{c_{i}z_{i}}{1-yz_{i}}

where F,RF,R denote respectively the Fourier transform, and Voiculescu’s RR-transform.

Proof.

Let μn\mu_{n} be the measure appearing in Definition 8.26. We have:

F_{\mu_{n}}(y)=\left(1-\frac{c}{n}\right)+\frac{1}{n}\sum_{i=1}^{s}c_{i}e^{iyz_{i}}
\implies F_{\mu_{n}^{*n}}(y)=\left(\left(1-\frac{c}{n}\right)+\frac{1}{n}\sum_{i=1}^{s}c_{i}e^{iyz_{i}}\right)^{n}
\implies F_{p_{\rho}}(y)=\exp\left(\sum_{i=1}^{s}c_{i}(e^{iyz_{i}}-1)\right)

In the free case we can use a similar method, and we obtain the above formula. ∎

We have the following result, providing an alternative to Definition 8.26, which will be our formulation here of the Compound Poisson Limit Theorem, classical and free:

Theorem 8.28 (CPLT).

For ρ=i=1sciδzi\rho=\sum_{i=1}^{s}c_{i}\delta_{z_{i}} with ci>0c_{i}>0 and ziz_{i}\in\mathbb{C}, we have

pρ/πρ=law(i=1sziαi)p_{\rho}/\pi_{\rho}={\rm law}\left(\sum_{i=1}^{s}z_{i}\alpha_{i}\right)

where the variables αi\alpha_{i} are Poisson/free Poisson(ci)(c_{i}), independent/free.

Proof.

This follows indeed from the fact that the Fourier/RR-transform of the variable in the statement is given by the formulae in Proposition 8.27. ∎

Following [bb+], [bbc], we will be interested here in the main examples of classical and free compound Poisson laws, which are constructed as follows:

Definition 8.29.

The Bessel and free Bessel laws are the compound Poisson laws

bst=ptεs,βst=πtεsb^{s}_{t}=p_{t\varepsilon_{s}}\quad,\quad\beta^{s}_{t}=\pi_{t\varepsilon_{s}}

where εs\varepsilon_{s} is the uniform measure on the ss-th roots of unity. In particular:

  1. (1)

    At s=1s=1 we obtain the usual Poisson and free Poisson laws, pt,πtp_{t},\pi_{t}.

  2. (2)

    At s=2s=2 we obtain the “real” Bessel and free Bessel laws, denoted bt,βtb_{t},\beta_{t}.

  3. (3)

    At s=s=\infty we obtain the “complex” Bessel and free Bessel laws, denoted Bt,𝔅tB_{t},\mathfrak{B}_{t}.

There is a lot of theory regarding these laws, and we refer here to [bb+], [bbc], where these laws were introduced. We will be back to these laws, in a moment.

8c. Algebraic manifolds

We are now ready, or almost, to develop some basic noncommutative geometry. The idea will be that of further building on the material from chapter 7, by enlarging the class of compact quantum groups studied there, with the consideration of quantum homogeneous spaces, X=G/HX=G/H, and with classical and free probability as our main tools.


But let us start with something intuitive, namely basic algebraic geometry. The simplest compact manifolds that we know are the spheres, and if we want to have free analogues of these spheres, there are not many choices here, and we have:

Definition 8.30.

We have compact quantum spaces, constructed as follows,

C(SN1,+)=C(x1,,xN|xi=xi,ixi2=1)C(S^{N-1}_{\mathbb{R},+})=C^{*}\left(x_{1},\ldots,x_{N}\Big{|}x_{i}=x_{i}^{*},\sum_{i}x_{i}^{2}=1\right)
C(SN1,+)=C(x1,,xN|ixixi=ixixi=1)C(S^{N-1}_{\mathbb{C},+})=C^{*}\left(x_{1},\ldots,x_{N}\Big{|}\sum_{i}x_{i}x_{i}^{*}=\sum_{i}x_{i}^{*}x_{i}=1\right)

called respectively free real sphere, and free complex sphere.

Observe that our spheres are indeed well-defined, due to the following estimate:

||xi||2=||xixi||||ixixi||=1||x_{i}||^{2}=||x_{i}x_{i}^{*}||\leq\left|\left|\sum_{i}x_{i}x_{i}^{*}\right|\right|=1

Given a compact quantum space XX, meaning as usual the abstract spectrum of a CC^{*}-algebra, we define its classical version to be the classical space XclassX_{class} obtained by dividing C(X)C(X) by its commutator ideal, then applying the Gelfand theorem:

C(Xclass)=C(X)/I,I=<[a,b]>C(X_{class})=C(X)/I\quad,\quad I=<[a,b]>

Observe that we have an embedding of compact quantum spaces XclassXX_{class}\subset X. In this situation, we also say that XX appears as a “liberation” of XclassX_{class}. We have:

Proposition 8.31.

We have embeddings of compact quantum spaces

S^{N-1}_ℂ ⊂ S^{N-1}_{ℂ,+}
    ∪             ∪
S^{N-1}_ℝ ⊂ S^{N-1}_{ℝ,+}

and the spaces on the right appear as liberations of the spaces of the left.

Proof.

In order to prove this, we must establish the following isomorphisms:

C(SN1)=Ccomm(x1,,xN|xi=xi,ixi2=1)C(S^{N-1}_{\mathbb{R}})=C^{*}_{comm}\left(x_{1},\ldots,x_{N}\Big{|}x_{i}=x_{i}^{*},\sum_{i}x_{i}^{2}=1\right)
C(SN1)=Ccomm(x1,,xN|ixixi=ixixi=1)C(S^{N-1}_{\mathbb{C}})=C^{*}_{comm}\left(x_{1},\ldots,x_{N}\Big{|}\sum_{i}x_{i}x_{i}^{*}=\sum_{i}x_{i}^{*}x_{i}=1\right)

But these isomorphisms are both clear, by using the Gelfand theorem. ∎

We can now introduce a broad class of compact quantum manifolds, as follows:

Definition 8.32.

A real algebraic submanifold XSN1,+X\subset S^{N-1}_{\mathbb{C},+} is a closed quantum space defined, at the level of the corresponding CC^{*}-algebra, by a formula of type

C(X)=C(SN1,+)/<fi(x1,,xN)=0>C(X)=C(S^{N-1}_{\mathbb{C},+})\Big{/}\Big{<}f_{i}(x_{1},\ldots,x_{N})=0\Big{>}

for certain noncommutative polynomials fi<X1,,XN>f_{i}\in\mathbb{C}<X_{1},\ldots,X_{N}>. We identify two such manifolds, XYX\simeq Y, when we have an isomorphism of *-algebras of coordinates

𝒞(X)𝒞(Y)\mathcal{C}(X)\simeq\mathcal{C}(Y)

mapping standard coordinates to standard coordinates.

In practice, while our assumption XSN1,+X\subset S^{N-1}_{\mathbb{C},+} is definitely something technical, we are not losing much when imposing it, and we have the following list of examples:

Proposition 8.33.

The following are algebraic submanifolds XSN1,+X\subset S^{N-1}_{\mathbb{C},+}:

  1. (1)

    The spheres SN1SN1,SN1,+SN1,+S^{N-1}_{\mathbb{R}}\subset S^{N-1}_{\mathbb{C}},S^{N-1}_{\mathbb{R},+}\subset S^{N-1}_{\mathbb{C},+}.

  2. (2)

    Any compact Lie group, GUnG\subset U_{n}, with N=n2N=n^{2}.

  3. (3)

    The duals Γ^\widehat{\Gamma} of finitely generated groups, Γ=<g1,,gN>\Gamma=<g_{1},\ldots,g_{N}>.

  4. (4)

    More generally, the closed subgroups GUn+G\subset U_{n}^{+}, with N=n2N=n^{2}.

Proof.

These facts are all well-known, the proofs being as follows:

(1) This is indeed true by definition of our various spheres.

(2) Given a closed subgroup GUnG\subset U_{n}, we have an embedding GSN1G\subset S^{N-1}_{\mathbb{C}}, with N=n2N=n^{2}, given in double indices by xij=uij/nx_{ij}=u_{ij}/\sqrt{n}, that we can further compose with the standard embedding SN1SN1,+S^{N-1}_{\mathbb{C}}\subset S^{N-1}_{\mathbb{C},+}. As for the fact that we obtain indeed a real algebraic manifold, this is standard too, coming either from Lie theory or from Tannakian duality.

(3) Given a group Γ=<g1,,gN>\Gamma=<g_{1},\ldots,g_{N}>, consider the variables xi=gi/Nx_{i}=g_{i}/\sqrt{N}. These variables satisfy then the quadratic relations ixixi=ixixi=1\sum_{i}x_{i}x_{i}^{*}=\sum_{i}x_{i}^{*}x_{i}=1 defining SN1,+S^{N-1}_{\mathbb{C},+}, and the algebraicity claim for the manifold Γ^SN1,+\widehat{\Gamma}\subset S^{N-1}_{\mathbb{C},+} is clear.

(4) Given a closed subgroup GUn+G\subset U_{n}^{+}, we have indeed an embedding GSN1,+G\subset S^{N-1}_{\mathbb{C},+}, with N=n2N=n^{2}, given by xij=uij/nx_{ij}=u_{ij}/\sqrt{n}. As for the fact that we obtain indeed a real algebraic manifold, this comes from the Tannakian duality results in [mal], [wo2]. ∎

Summarizing, what we have in Definition 8.32 is something quite fruitful, covering many interesting examples. In addition, all this is nice too at the axiomatic level, because the equivalence relation for our algebraic manifolds, as formulated in Definition 8.32, fixes in a quite clever way the functoriality issues of the Gelfand correspondence.


At the level of the general theory now, as a first tool that we can use, for the study of our manifolds, we have the following version of the Gelfand theorem:

Theorem 8.34.

Assuming that XSN1,+X\subset S^{N-1}_{\mathbb{C},+} is an algebraic manifold, given by

C(X)=C(SN1,+)/<fi(x1,,xN)=0>C(X)=C(S^{N-1}_{\mathbb{C},+})\Big{/}\Big{<}f_{i}(x_{1},\ldots,x_{N})=0\Big{>}

for certain noncommutative polynomials fi<X1,,XN>f_{i}\in\mathbb{C}<X_{1},\ldots,X_{N}>, we have

Xclass={xSN1|fi(x1,,xN)=0}X_{class}=\left\{x\in S^{N-1}_{\mathbb{C}}\Big{|}f_{i}(x_{1},\ldots,x_{N})=0\right\}

and XX itself appears as a liberation of XclassX_{class}.

Proof.

This is something that we know well for the spheres, from Proposition 8.31. In general, the proof is similar, coming from the Gelfand theorem. ∎

There are of course many other things that can be said about our manifolds, at the purely algebraic level. But in what follows we will be rather going towards analysis.

8d. Free geometry

We have now all the needed tools in our bag for developing “free geometry”. The idea will be that of going back to the free quantum groups from chapter 7, and further building on that material, with a beginning of free geometry. Let us start with:

Theorem 8.35.

The classical and free, real and complex quantum rotation groups can be complemented with quantum reflection groups, as follows,

K_N^+ ⊂ U_N^+          K_N ⊂ U_N
  ∪        ∪             ∪       ∪
H_N^+ ⊂ O_N^+          H_N ⊂ O_N

with HN=2SNH_{N}=\mathbb{Z}_{2}\wr S_{N} and KN=𝕋SNK_{N}=\mathbb{T}\wr S_{N} being the hyperoctahedral group and the full complex reflection group, and HN+=2SN+H_{N}^{+}=\mathbb{Z}_{2}\wr_{*}S_{N}^{+} and KN+=𝕋SN+K_{N}^{+}=\mathbb{T}\wr_{*}S_{N}^{+} being their free versions.

Proof.

This is something quite tricky, the idea being as follows:

(1) The first observation is that SNS_{N}, regarded as the group of permutations of the NN coordinate axes of N\mathbb{R}^{N}, is a group of orthogonal matrices, SNONS_{N}\subset O_{N}. The corresponding coordinate functions uij:SN{0,1}u_{ij}:S_{N}\to\{0,1\} form a matrix u=(uij)u=(u_{ij}) which is “magic”, in the sense that its entries are projections, summing up to 1 on each row and each column. In fact, by using the Gelfand theorem, we have the following presentation result:

C(SN)=Ccomm((uij)i,j=1,,N|u=magic)C(S_{N})=C^{*}_{comm}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u={\rm magic}\right)

(2) Based on the above, and following Wang’s paper [wan], we can construct the free analogue SN+S_{N}^{+} of the symmetric group SNS_{N} via the following formula:

C(SN+)=C((uij)i,j=1,,N|u=magic)C(S_{N}^{+})=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}u={\rm magic}\right)

Here the fact that we have indeed a Woronowicz algebra is standard, exactly as for the free rotation groups in chapter 7, because if a matrix u=(uij)u=(u_{ij}) is magic, then so are the matrices uΔ,uε,uSu^{\Delta},u^{\varepsilon},u^{S} constructed there, and this gives the existence of Δ,ε,S\Delta,\varepsilon,S.

(3) Consider now the group HNsUNH_{N}^{s}\subset U_{N} consisting of permutation-like matrices having as entries the ss-th roots of unity. This group decomposes as follows:

HNs=sSNH_{N}^{s}=\mathbb{Z}_{s}\wr S_{N}

It is straightforward then to construct a free analogue HNs+UN+H_{N}^{s+}\subset U_{N}^{+} of this group, for instance by formulating a definition as follows, with \wr_{*} being a free wreath product:

HNs+=sSN+H_{N}^{s+}=\mathbb{Z}_{s}\wr_{*}S_{N}^{+}

(4) In order to finish, besides the case s=1s=1, of particular interest are the cases s=2,s=2,\infty. Here the corresponding reflection groups are as follows:

HN=2SN,KN=𝕋SNH_{N}=\mathbb{Z}_{2}\wr S_{N}\quad,\quad K_{N}=\mathbb{T}\wr S_{N}

As for the corresponding quantum groups, these are denoted as follows:

HN+=2SN+,KN+=𝕋SN+H_{N}^{+}=\mathbb{Z}_{2}\wr_{*}S_{N}^{+}\quad,\quad K_{N}^{+}=\mathbb{T}\wr_{*}S_{N}^{+}

Thus, we are led to the conclusions in the statement. See [bb+], [bbc]. ∎
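As a quick illustration of the magic condition above, here is a minimal Python sketch, with our own conventions, checking that the coordinate functions on S3S_{3}, viewed as diagonal operators on l2(S3)l^{2}(S_{3}), form indeed a magic matrix:

# A sketch checking the magic condition for S_3: the coordinate functions
# u_ij(s) = 1 iff s(j) = i, viewed as diagonal matrices on l^2(S_3), are
# projections, summing up to 1 on each row and each column.
import numpy as np
from itertools import permutations

group = list(permutations(range(3)))     # the 6 elements of S_3

def u(i, j):
    return np.diag([1.0 if s[j] == i else 0.0 for s in group])

for i in range(3):
    for j in range(3):
        p = u(i, j)
        assert np.allclose(p @ p, p)     # each entry is a projection
    row = sum(u(i, j) for j in range(3))
    col = sum(u(j, i) for j in range(3))
    assert np.allclose(row, np.eye(6)) and np.allclose(col, np.eye(6))
print("u is magic")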

The point now is that we can add to the picture spheres and tori, as follows:

Fact 8.36.

The basic quantum groups can be complemented with spheres and tori,

𝕋_N^+ ⊂ S^{N-1}_{ℂ,+}          𝕋_N ⊂ S^{N-1}_ℂ
  ∪         ∪                    ∪        ∪
T_N^+ ⊂ S^{N-1}_{ℝ,+}          T_N ⊂ S^{N-1}_ℝ

with TN=2N,𝕋N=𝕋NT_{N}=\mathbb{Z}_{2}^{N},\mathbb{T}_{N}=\mathbb{T}^{N}, and with TN+,𝕋N+T_{N}^{+},\mathbb{T}_{N}^{+} standing for the duals of 2N,FN\mathbb{Z}_{2}^{*N},F_{N}.

Again, this is something quite tricky, and there is a long story with all this. We already know from chapter 7 that the diagonal subgroups of the rotation groups are the tori in the statement, but this is just an epsilon of what can be said, and this type of result can be extended as well to the reflection groups, and then we can make the spheres come into play too, with various results connecting them to the quantum groups, and to the tori.


Instead of getting into details here, let us formulate, again a bit informally:

Fact 8.37.

The various quantum manifolds that we have, namely spheres SS, tori TT, unitary groups UU, and reflection groups KK, arrange into 44 diagrams, as follows,

S −− T
 |      |
U −− K

with the arrows standing for various correspondences between (S,T,U,K)(S,T,U,K). These diagrams correspond to 44 main noncommutative geometries, real and complex, classical and free,

ℝ^N_+ ⊂ ℂ^N_+
  ∪       ∪
ℝ^N  ⊂ ℂ^N

with the remark that, technically speaking, N+\mathbb{R}^{N}_{+}, N+\mathbb{C}^{N}_{+} do not exist, as quantum spaces.

As before, things here are quite long and tricky, but we already have some good evidence for all this, so I guess you can just trust me. And if you are truly interested in all this, later, after finishing this book, you can check [bgo] and subsequent papers for details.


Summarizing, we have some beginning of theory. Now with this understood, let us try to integrate on our manifolds. In order to deal with quantum groups, we will need:

Definition 8.38.

The Tannakian category associated to a Woronowicz algebra (A,u)(A,u) is the collection CA=(CA(k,l))C_{A}=(C_{A}(k,l)) of vector spaces

CA(k,l)=Hom(uk,ul)C_{A}(k,l)=Hom(u^{\otimes k},u^{\otimes l})

where the corepresentations uku^{\otimes k} with k=k=\circ\bullet\bullet\circ\ldots colored integer, defined by

u=1,u=u,u=u¯u^{\otimes\emptyset}=1\quad,\quad u^{\otimes\circ}=u\quad,\quad u^{\otimes\bullet}=\bar{u}

and multiplicativity, ukl=ukulu^{\otimes kl}=u^{\otimes k}\otimes u^{\otimes l}, are the Peter-Weyl corepresentations.

As a key remark, the fact that uMN(A)u\in M_{N}(A) is biunitary translates into the following conditions, where R:NNR:\mathbb{C}\to\mathbb{C}^{N}\otimes\mathbb{C}^{N} is the linear map given by R(1)=ieieiR(1)=\sum_{i}e_{i}\otimes e_{i}:

RHom(1,uu¯),RHom(1,u¯u)R\in Hom(1,u\otimes\bar{u})\quad,\quad R\in Hom(1,\bar{u}\otimes u)
RHom(uu¯,1),RHom(u¯u,1)R^{*}\in Hom(u\otimes\bar{u},1)\quad,\quad R^{*}\in Hom(\bar{u}\otimes u,1)

We are therefore led to the following abstract definition, summarizing the main properties of the categories appearing from Woronowicz algebras:

Definition 8.39.

Let HH be a finite dimensional Hilbert space. A tensor category over HH is a collection C=(C(k,l))C=(C(k,l)) of subspaces

C(k,l)(Hk,Hl)C(k,l)\subset\mathcal{L}(H^{\otimes k},H^{\otimes l})

satisfying the following conditions:

  1. (1)

    S,TCS,T\in C implies STCS\otimes T\in C.

  2. (2)

    If S,TCS,T\in C are composable, then STCST\in C.

  3. (3)

    TCT\in C implies TCT^{*}\in C.

  4. (4)

    Each C(k,k)C(k,k) contains the identity operator.

  5. (5)

    C(,)C(\emptyset,\circ\bullet) and C(,)C(\emptyset,\bullet\circ) contain the operator R:1ieieiR:1\to\sum_{i}e_{i}\otimes e_{i}.

The point now is that conversely, we can associate a Woronowicz algebra to any tensor category in the sense of Definition 8.39, in the following way:

Proposition 8.40.

Given a tensor category C=(C(k,l))C=(C(k,l)) over N\mathbb{C}^{N}, as above,

AC=C((uij)i,j=1,,N|THom(uk,ul),k,l,TC(k,l))A_{C}=C^{*}\left((u_{ij})_{i,j=1,\ldots,N}\Big{|}T\in Hom(u^{\otimes k},u^{\otimes l}),\forall k,l,\forall T\in C(k,l)\right)

is a Woronowicz algebra.

Proof.

This is something standard, because the relations THom(uk,ul)T\in Hom(u^{\otimes k},u^{\otimes l}) determine a Hopf ideal, so they allow the construction of Δ,ε,S\Delta,\varepsilon,S as in chapter 7. ∎

With the above constructions in hand, we have the following result:

Theorem 8.41.

The Tannakian duality constructions

CAC,ACAC\to A_{C}\quad,\quad A\to C_{A}

are inverse to each other, modulo identifying full and reduced versions.

Proof.

The idea is that we have CCACC\subset C_{A_{C}}, for any category CC, and so we are left with proving that we have CACCC_{A_{C}}\subset C, for any category CC. But this follows from a long series of algebraic manipulations, and for details we refer to Malacarne [mal], and also to Woronowicz [wo2], where this result was first proved, by using other methods. ∎

In practice now, all this is quite abstract, and we will rather need Brauer type results, for the specific quantum groups that we are interested in. Let us start with:

Definition 8.42.

Let P(k,l)P(k,l) be the set of partitions between an upper colored integer kk, and a lower colored integer ll. A collection of subsets

D=k,lD(k,l)D=\bigsqcup_{k,l}D(k,l)

with D(k,l)P(k,l)D(k,l)\subset P(k,l) is called a category of partitions when it has the following properties:

  1. (1)

    Stability under the horizontal concatenation, (π,σ)[πσ](\pi,\sigma)\to[\pi\sigma].

  2. (2)

    Stability under vertical concatenation (π,σ)[σπ](\pi,\sigma)\to[^{\sigma}_{\pi}], with matching middle symbols.

  3. (3)

    Stability under the upside-down turning *, with switching of colors, \circ\leftrightarrow\bullet.

  4. (4)

Each set D(k,k)D(k,k) contains the identity partition ||||||\ldots||.

  5. (5)

The sets D(,)D(\emptyset,\circ\bullet) and D(,)D(\emptyset,\bullet\circ) both contain the semicircle \cap.

Observe the similarity with Definition 8.39. In fact Definition 8.42 is a delinearized version of Definition 8.39, the relation with the Tannakian categories coming from:

Proposition 8.43.

Given a partition πP(k,l)\pi\in P(k,l), consider the linear map

Tπ:(N)k(N)lT_{\pi}:(\mathbb{C}^{N})^{\otimes k}\to(\mathbb{C}^{N})^{\otimes l}

given by the following formula, where e1,,eNe_{1},\ldots,e_{N} is the standard basis of N\mathbb{C}^{N},

Tπ(ei1eik)=j1jlδπ(i1ikj1jl)ej1ejlT_{\pi}(e_{i_{1}}\otimes\ldots\otimes e_{i_{k}})=\sum_{j_{1}\ldots j_{l}}\delta_{\pi}\begin{pmatrix}i_{1}&\ldots&i_{k}\\ j_{1}&\ldots&j_{l}\end{pmatrix}e_{j_{1}}\otimes\ldots\otimes e_{j_{l}}

and with the Kronecker type symbols δπ{0,1}\delta_{\pi}\in\{0,1\} depending on whether the indices fit or not. The assignment πTπ\pi\to T_{\pi} is then categorical, in the sense that we have

TπTσ=T[πσ],TπTσ=Nc(π,σ)T[σπ],Tπ=TπT_{\pi}\otimes T_{\sigma}=T_{[\pi\sigma]}\quad,\quad T_{\pi}T_{\sigma}=N^{c(\pi,\sigma)}T_{[^{\sigma}_{\pi}]}\quad,\quad T_{\pi}^{*}=T_{\pi^{*}}

where c(π,σ)c(\pi,\sigma) are certain integers, coming from the erased components in the middle.

Proof.

The concatenation property follows from the following computation:

(T_{\pi}\otimes T_{\sigma})(e_{i_{1}}\otimes\ldots\otimes e_{i_{p}}\otimes e_{k_{1}}\otimes\ldots\otimes e_{k_{r}})
=\sum_{j_{1}\ldots j_{q}}\sum_{l_{1}\ldots l_{s}}\delta_{\pi}\begin{pmatrix}i_{1}&\ldots&i_{p}\\ j_{1}&\ldots&j_{q}\end{pmatrix}\delta_{\sigma}\begin{pmatrix}k_{1}&\ldots&k_{r}\\ l_{1}&\ldots&l_{s}\end{pmatrix}e_{j_{1}}\otimes\ldots\otimes e_{j_{q}}\otimes e_{l_{1}}\otimes\ldots\otimes e_{l_{s}}
=\sum_{j_{1}\ldots j_{q}}\sum_{l_{1}\ldots l_{s}}\delta_{[\pi\sigma]}\begin{pmatrix}i_{1}&\ldots&i_{p}&k_{1}&\ldots&k_{r}\\ j_{1}&\ldots&j_{q}&l_{1}&\ldots&l_{s}\end{pmatrix}e_{j_{1}}\otimes\ldots\otimes e_{j_{q}}\otimes e_{l_{1}}\otimes\ldots\otimes e_{l_{s}}
=T_{[\pi\sigma]}(e_{i_{1}}\otimes\ldots\otimes e_{i_{p}}\otimes e_{k_{1}}\otimes\ldots\otimes e_{k_{r}})

As for the other two formulae in the statement, their proofs are similar. ∎
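Such computations can also be done on a computer. Here is a minimal Python sketch, for plain, uncolored partitions, with our own encoding of the blocks, verifying the concatenation formula on a small example:

# A sketch implementation of the maps T_pi, with a partition between k
# upper and l lower points encoded as a list of blocks, the labels being
# 'u0','u1',... for the upper row and 'd0','d1',... for the lower row.
import numpy as np
from itertools import product

N = 2

def T(blocks, k, l):
    out = np.zeros((N**l, N**k))
    for i in product(range(N), repeat=k):
        for j in product(range(N), repeat=l):
            lab = {f'u{r}': i[r] for r in range(k)}
            lab.update({f'd{r}': j[r] for r in range(l)})
            if all(len({lab[p] for p in b}) == 1 for b in blocks):
                row = sum(j[r] * N**(l - 1 - r) for r in range(l))
                col = sum(i[r] * N**(k - 1 - r) for r in range(k))
                out[row, col] += 1
    return out

pi    = [{'u0', 'd0', 'd1'}]                 # 1 upper, 2 lower points
sigma = [{'u0', 'd0'}]                       # the identity partition |
# horizontal concatenation [pi sigma]: sigma's labels shifted past pi's
conc  = [{'u0', 'd0', 'd1'}, {'u1', 'd2'}]
lhs   = np.kron(T(pi, 1, 2), T(sigma, 1, 1))
rhs   = T(conc, 2, 3)
print(np.allclose(lhs, rhs))                 # True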

In relation with quantum groups, we have the following result, from [bsp]:

Theorem 8.44.

Each category of partitions D=(D(k,l))D=(D(k,l)) produces a family of compact quantum groups G=(GN)G=(G_{N}), one for each NN\in\mathbb{N}, via the following formula:

Hom(uk,ul)=span(Tπ|πD(k,l))Hom(u^{\otimes k},u^{\otimes l})=span\left(T_{\pi}\Big{|}\pi\in D(k,l)\right)

To be more precise, the spaces on the right form a Tannakian category, and so produce a certain closed subgroup GNUN+G_{N}\subset U_{N}^{+}, via the Tannakian duality correspondence.

Proof.

This follows indeed from Woronowicz’s Tannakian duality, in its “soft” form from Malacarne [mal], as explained in Theorem 8.41. Indeed, let us set:

C(k,l)=span(Tπ|πD(k,l))C(k,l)=span\left(T_{\pi}\Big{|}\pi\in D(k,l)\right)

By using the axioms in Definition 8.42, and the categorical properties of the operation πTπ\pi\to T_{\pi}, from Proposition 8.43, we deduce that C=(C(k,l))C=(C(k,l)) is a Tannakian category. Thus the Tannakian duality applies, and gives the result. ∎

Philosophically speaking, the quantum groups appearing as in Theorem 8.44 are the simplest, from the perspective of Tannakian duality, so let us formulate:

Definition 8.45.

A closed subgroup GUN+G\subset U_{N}^{+} is called easy when we have

Hom(uk,ul)=span(Tπ|πD(k,l))Hom(u^{\otimes k},u^{\otimes l})=span\left(T_{\pi}\Big{|}\pi\in D(k,l)\right)

for any colored integers k,lk,l, for a certain category of partitions DPD\subset P.

All this might seem a bit complicated, but we will see examples in a moment. Getting back now to integration questions, we have the following key result:

Theorem 8.46.

For an easy quantum group GUN+G\subset U_{N}^{+}, coming from a category of partitions D=(D(k,l))D=(D(k,l)), we have the Weingarten integration formula

Gui1j1e1uikjkek=π,σD(k)δπ(i)δσ(j)WkN(π,σ)\int_{G}u_{i_{1}j_{1}}^{e_{1}}\ldots u_{i_{k}j_{k}}^{e_{k}}=\sum_{\pi,\sigma\in D(k)}\delta_{\pi}(i)\delta_{\sigma}(j)W_{kN}(\pi,\sigma)

for any k=e1ekk=e_{1}\ldots e_{k} and any i,ji,j, where D(k)=D(,k)D(k)=D(\emptyset,k), δ\delta are usual Kronecker symbols, and WkN=GkN1W_{kN}=G_{kN}^{-1}, with GkN(π,σ)=N|πσ|G_{kN}(\pi,\sigma)=N^{|\pi\vee\sigma|}, where |.||.| is the number of blocks.

Proof.

We know from chapter 7 that the integrals in the statement form altogether the orthogonal projection PP onto the space Fix(uk)=span(D(k))Fix(u^{\otimes k})=span(D(k)). Let us set:

E(x)=πD(k)<x,Tπ>TπE(x)=\sum_{\pi\in D(k)}<x,T_{\pi}>T_{\pi}

By standard linear algebra, it follows that we have P=WEP=WE, where WW is the inverse on span(Tπ|πD(k))span(T_{\pi}|\pi\in D(k)) of the restriction of EE. But this restriction is the linear map given by GkNG_{kN}, and so WW is the linear map given by WkNW_{kN}, and this gives the result. ∎

All this is very nice. However, before enjoying the Weingarten formula, we still have to prove that our main quantum groups are easy. The result here is as follows:

Theorem 8.47.

The basic quantum unitary and reflection groups

K_N^+ ⊂ U_N^+          K_N ⊂ U_N
  ∪        ∪             ∪       ∪
H_N^+ ⊂ O_N^+          H_N ⊂ O_N

are all easy, the corresponding categories of partitions being as follows,

𝒩𝒞_even ⊃ 𝒩𝒞_2          𝒫_even ⊃ 𝒫_2
    ∩          ∩              ∩         ∩
NC_even ⊃ NC_2            P_even ⊃ P_2

with P,NCP,NC standing for partitions and noncrossing partitions, 2,even2,even standing for pairings, and partitions with even blocks, and with calligraphic standing for matching.

Proof.

The quantum group UN+U_{N}^{+} is defined via the following relations:

u=u1,ut=u¯1u^{*}=u^{-1}\quad,\quad u^{t}=\bar{u}^{-1}

Thus, the following operators must be in the associated Tannakian category:

Tπ,π=,T_{\pi}\quad,\quad\pi={\ }^{\,\cap}_{\circ\bullet}\ ,{\ }^{\,\cap}_{\bullet\circ}

We conclude that the associated Tannakian category is span(Tπ|πD)span(T_{\pi}|\pi\in D), with:

D=<{\ }^{\,\cap}_{\circ\bullet}\,\,,{\ }^{\,\cap}_{\bullet\circ}>=\mathcal{NC}_{2}

Thus, we have one result, and the other ones are similar. See [bb+], [bbc]. ∎
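As an illustration, for G=ONG=O_{N} and k=4k=4 the Weingarten machinery can now be enjoyed on a computer. Here is a minimal Python sketch, assuming scipy available: the category is P2P_{2}, there are 3 pairings of 4 points, any two distinct ones joining into a single block, and the formula computes for instance the integral of u114u_{11}^{4}:

# A sketch of the Weingarten formula for O_N at k = 4: with the three
# pairings of 4 points we have G(pi,pi) = N^2 and G(pi,sigma) = N
# otherwise, and int u_11^4 = sum of all entries of W = G^{-1}.
import numpy as np
from scipy.stats import ortho_group

N = 5
G = np.full((3, 3), float(N))
np.fill_diagonal(G, float(N)**2)
W = np.linalg.inv(G)
print(W.sum(), "vs", 3 / (N * (N + 2)))      # both equal to 3/(N(N+2))

# Monte Carlo check, over Haar random orthogonal matrices
samples = [ortho_group.rvs(N)[0, 0]**4 for _ in range(20000)]
print(np.mean(samples))                      # close to the above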

We are not ready yet for applications, because we still have to understand which assumptions on NN\in\mathbb{N} make the vectors TπT_{\pi} linearly independent. We will need:

Definition 8.48.

The Möbius function of any lattice, and so of PP, is given by

μ(π,σ)={1ifπ=σπτ<σμ(π,τ)ifπ<σ0ifπσ\mu(\pi,\sigma)=\begin{cases}1&{\rm if}\ \pi=\sigma\\ -\sum_{\pi\leq\tau<\sigma}\mu(\pi,\tau)&{\rm if}\ \pi<\sigma\\ 0&{\rm if}\ \pi\not\leq\sigma\end{cases}

with the construction being performed by recurrence.

The main interest in this function comes from the Möbius inversion formula:

f(σ)=πσg(π)g(σ)=πσμ(π,σ)f(π)f(\sigma)=\sum_{\pi\leq\sigma}g(\pi)\implies g(\sigma)=\sum_{\pi\leq\sigma}\mu(\pi,\sigma)f(\pi)

In linear algebra terms, the statement and proof of this formula are as follows:

Proposition 8.49.

The inverse of the adjacency matrix of PP, given by

Aπσ={1ifπσ0ifπσA_{\pi\sigma}=\begin{cases}1&{\rm if}\ \pi\leq\sigma\\ 0&{\rm if}\ \pi\not\leq\sigma\end{cases}

is the Möbius matrix of PP, given by Mπσ=μ(π,σ)M_{\pi\sigma}=\mu(\pi,\sigma).

Proof.

This is well-known, coming for instance from the fact that AA is upper triangular. Indeed, when inverting, we are led into the recurrence from Definition 8.48. ∎

Now back to our Gram and Weingarten matrix considerations, with WkN=GkN1W_{kN}=G_{kN}^{-1}, as in the statement of Theorem 8.46, we have the following result:

Proposition 8.50.

The Gram matrix is given by GkN=ALG_{kN}=AL, where

L(π,σ)={N(N1)(N|π|+1)ifσπ0otherwiseL(\pi,\sigma)=\begin{cases}N(N-1)\ldots(N-|\pi|+1)&{\rm if}\ \sigma\leq\pi\\ 0&{\rm otherwise}\end{cases}

and where A=M1A=M^{-1} is the adjacency matrix of P(k)P(k).

Proof.

We have indeed the following computation:

N^{|\pi\vee\sigma|}=\#\left\{i_{1},\ldots,i_{k}\in\{1,\ldots,N\}\Big{|}\ker i\geq\pi\vee\sigma\right\}
=\sum_{\tau\geq\pi\vee\sigma}\#\left\{i_{1},\ldots,i_{k}\in\{1,\ldots,N\}\Big{|}\ker i=\tau\right\}
=\sum_{\tau\geq\pi\vee\sigma}N(N-1)\ldots(N-|\tau|+1)

According to the definition of GkNG_{kN} and of A,LA,L, this formula reads:

(GkN)πσ=τπLτσ=τAπτLτσ=(AL)πσ(G_{kN})_{\pi\sigma}=\sum_{\tau\geq\pi}L_{\tau\sigma}=\sum_{\tau}A_{\pi\tau}L_{\tau\sigma}=(AL)_{\pi\sigma}

Thus, we obtain the formula in the statement. ∎

With the above result in hand, we can now formulate:

Theorem 8.51.

The determinant of the Gram matrix GkNG_{kN} is given by:

det(GkN)=πP(k)N!(N|π|)!\det(G_{kN})=\prod_{\pi\in P(k)}\frac{N!}{(N-|\pi|)!}

In particular, the vectors {Tπ|πP(k)}\left\{T_{\pi}|\pi\in P(k)\right\} are linearly independent for NkN\geq k.

Proof.

This is an old formula from the 60s, due to Lindström and others, having many things behind it. By using the formula in Proposition 8.50, we have:

det(GkN)=det(A)det(L)\det(G_{kN})=\det(A)\det(L)

Now if we order P(k)P(k) with respect to the number of blocks, and then lexicographically, AA becomes upper triangular, with 1 on the diagonal, and LL becomes lower triangular, with diagonal entries N(N1)(N|π|+1)N(N-1)\ldots(N-|\pi|+1). Thus det(A)=1\det(A)=1, the determinant det(L)\det(L) is the product of these diagonal entries, and we obtain the above formula. ∎
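Here is also a quick symbolic verification of this formula at k=3k=3, as a Python sketch, assuming sympy available, with the 5 partitions of 3 points being listed as 1|2|3, 12|3, 13|2, 23|1, 123:

# A symbolic sketch of the Gram determinant formula at k = 3: we hardcode
# the block counts |pi v sigma| for the 5 partitions of 3 points, and we
# check det(G) = N(N-1)(N-2) * (N(N-1))^3 * N = N^5 (N-1)^4 (N-2).
import sympy as sp

N = sp.symbols('N')
J = [[3, 2, 2, 2, 1],        # joins with 1|2|3
     [2, 2, 1, 1, 1],        # joins with 12|3
     [2, 1, 2, 1, 1],        # joins with 13|2
     [2, 1, 1, 2, 1],        # joins with 23|1
     [1, 1, 1, 1, 1]]        # joins with 123
G = sp.Matrix(5, 5, lambda i, j: N**J[i][j])
print(sp.simplify(G.det() - N**5 * (N - 1)**4 * (N - 2)))   # 0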

Now back to our quantum groups, let us start with:

Theorem 8.52.

For an easy quantum group G=(GN)G=(G_{N}), coming from a category of partitions D=(D(k,l))D=(D(k,l)), the asymptotic moments of the character χ=iuii\chi=\sum_{i}u_{ii} are

limNGNχk=|D(k)|\lim_{N\to\infty}\int_{G_{N}}\chi^{k}=|D(k)|

where D(k)=D(,k)D(k)=D(\emptyset,k), with the sequence of integrals on the left consisting of certain integers, and being stationary at least starting from the kk-th term.

Proof.

This is something elementary, which follows straight from Peter-Weyl theory, by using the linear independence result from Theorem 8.51. ∎

In practice now, for the basic rotation and reflection groups, we obtain:

Theorem 8.53.

The character laws for basic rotation and reflection groups are

𝔅_1   Γ_1          B_1   G_1
β_1   γ_1          b_1   g_1
(free case)        (classical case)

in the NN\to\infty limit, corresponding to the basic probabilistic limiting theorems, at t=1t=1.

Proof.

This follows indeed from Theorem 8.47 and Theorem 8.52, by using the known moment formulae for the laws in the statement, at t=1t=1. ∎
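For instance, for the ONO_{N} entry of the above diagram, the main character becomes standard Gaussian with NN\to\infty, and this can be seen on a computer, as a Monte Carlo sketch in Python, assuming scipy available:

# A Monte Carlo sketch for the O_N entry of the above diagram: the law of
# the main character chi = Tr(u) over O_N is asymptotically g_1, the
# standard Gaussian, and its mean and variance are already 0 and 1 here.
import numpy as np
from scipy.stats import ortho_group

N = 50
traces = np.array([np.trace(ortho_group.rvs(N)) for _ in range(5000)])
print(traces.mean(), traces.var())       # close to 0 and 1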

In the free case, the convergence can be shown to be stationary starting from N=4N=4. The “fix” comes by looking at truncated characters, constructed as follows:

χt=i=1[tN]uii\chi_{t}=\sum_{i=1}^{[tN]}u_{ii}

With this convention, we have the following final result on the subject, with the convergence being non-stationary at t<1t<1, in both the classical and free cases:

Theorem 8.54.

The truncated character laws for the basic quantum groups are

𝔅_t   Γ_t          B_t   G_t
β_t   γ_t          b_t   g_t
(free case)        (classical case)

in the NN\to\infty limit, corresponding to the basic probabilistic limiting theorems.

Proof.

We already know that the result holds at t=1t=1, and the proof at arbitrary t>0t>0 is once again based on easiness, but this time by using the Weingarten formula for the computation of the moments. We refer here to [bb+], [bbc], [bco], [bsp]. ∎

All this is very nice, as a beginning. Of course, still left for this chapter would be the extension of all this to the case of more general homogeneous spaces X=G/HX=G/H, and other free manifolds, in the sense of the free real and complex geometry axiomatized before.


But hey, we learned enough math in this chapter, time for a beer. We refer here to the 2010 paper [bgo], which started things with the computation for SN1,+S^{N-1}_{\mathbb{R},+}, and to the book [ba3], which explains what was found on the subject, in the 10s. And if interested in this, the hot topic, waiting for input from you, is that of the applications to quantum physics.

8e. Exercises

There has been a lot of exciting theory in this chapter, and as exercise, we have:

Exercise 8.55.

Prove that SN+S_{N}^{+} is easy, coming from the category of all noncrossing partitions NCNC, and compute the asymptotic law of the main character.

As bonus exercise, try as well the truncated characters. Also, don’t forget about SNS_{N}.

Part III Theory of factors


And the story tellers say

That the score brave souls inside

For many a lonely day sailed across the milky seas

Never looked back, never feared, never cried

Chapter 9 Functional analysis

9a. Kaplansky density

Welcome to this second half of the present book. We will get back here to a more normal pace, at least for most of the text to follow, our goal being to discuss the basics of the von Neumann algebra theory, due to Murray, von Neumann and Connes [co1], [co2], [mv1], [mv2], [mv3], [vn1], [vn2], [vn3], or at least the “basics of the basics”, the whole theory being quite complex, and then the most beautiful advanced theory which can be built on this, which is the subfactor theory of Jones [jo1], [jo2], [jo3], [jo4], [jo5], [jms], [jsu].


The material here will be in direct continuation of what we learned in chapter 5, namely the bicommutant theorem, the commutative case, finite dimensions, and a handful of other things. The idea will be that of building directly on that material, using the same basic techniques, namely functional analysis and operator theory.


As an important point, all this is related, but in a subtle way, to what we learned in chapters 6-8 too. To be more precise, what we will be doing in chapters 9-12 here will be more or less orthogonal to what we did in chapters 6-8. However, and here comes our point, chapters 13-16 below, following Jones, will stand as a direct continuation of what we did in chapters 6-8, with Jones’ subfactors being something more general than the random matrices and quantum groups from there.


Getting started, as a first objective we would like to have a better understanding of the precise difference between the norm closed *-algebras, or C^*-algebras, A\subset B(H), and the weakly closed such algebras, which are the von Neumann algebras, from a functional analytic viewpoint.
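As a basic example to keep in mind here, consider the algebra K(H) of compact operators, obtained as the norm closure of the algebra of finite rank operators. With H being assumed infinite dimensional and separable, this is a norm closed *-algebra which is not weakly closed, its weak closure being the whole B(H):

\overline{K(H)}^{\,weak}=B(H)

Indeed, by picking an increasing sequence of finite rank projections P_n\to1, any operator T\in B(H) appears as the weak limit of the finite rank operators P_nTP_n. Getting now to the generalities, we first have: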

Proposition 9.1.

The weak operator topology on B(H) is the topology having the following equivalent properties:

(1) It makes T\to<Tx,y> continuous, for any x,y\in H.

(2) It makes T_n\to T when <T_nx,y>\to<Tx,y>, for any x,y\in H.

(3) It has as subbase the sets U_T(x,y,\varepsilon)=\{S:|<(S-T)x,y>|<\varepsilon\}.

(4) It has as base the sets U_T(x_1,\ldots,x_n,y_1,\ldots,y_n,\varepsilon)=\{S:|<(S-T)x_i,y_i>|<\varepsilon,\forall i\}.

Proof.

The equivalences (1)\iff(2)\iff(3)\iff(4) all follow from definitions, with of course (1,2) referring to the coarsest topology making these things happen. ∎

Similarly, in what regards the strong operator topology, we have:

Proposition 9.2.

The strong operator topology on B(H) is the topology having the following equivalent properties:

(1) It makes T\to Tx continuous, for any x\in H.

(2) It makes T_n\to T when T_nx\to Tx, for any x\in H.

(3) It has as subbase the sets V_T(x,\varepsilon)=\{S:||(S-T)x||<\varepsilon\}.

(4) It has as base the sets V_T(x_1,\ldots,x_n,\varepsilon)=\{S:||(S-T)x_i||<\varepsilon,\forall i\}.

Proof.

Again, the equivalences (1)\iff(2)\iff(3)\iff(4) are all clear, with (1,2) referring to the coarsest topology making these things happen. ∎
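As an illustration of the difference between these two topologies, here is a standard example, on the Hilbert space H=l^2(\mathbb{N}), with orthonormal basis \{e_i\}: consider the shift operator S\in B(H), given by Se_i=e_{i+1}. By Cauchy-Schwarz, applied to the tail of the expansion of y, we have, for any x,y\in H:

<S^nx,y>\to0\quad,\quad||S^nx||=||x||

Thus S^n\to0 weakly, but not strongly. As for the adjoints, these satisfy ||(S^*)^nx||\to0, so (S^*)^n\to0 strongly, but not in norm, because ||(S^*)^n||=1.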

We know from chapter 5 that an operator algebra A\subset B(H) is weakly closed if and only if it is strongly closed. Here is a useful generalization of this fact:

Theorem 9.3.

Given a convex set C\subset B(H), its weak operator closure and strong operator closure coincide.

Proof.

Since the weak operator topology on B(H) is weaker by definition than the strong operator topology on B(H), we have, for any subset C\subset B(H):

\overline{C}^{\,strong}\subset\overline{C}^{\,weak}

Assuming now that C\subset B(H) is convex, we must prove that:

T\in\overline{C}^{\,weak}\implies T\in\overline{C}^{\,strong}

In order to do so, let us pick vectors x_1,\ldots,x_n\in H and \varepsilon>0. We let K=H^{\oplus n}, and we consider the standard embedding i:B(H)\subset B(K), given by:

iT(y_1,\ldots,y_n)=(Ty_1,\ldots,Ty_n)

With the notation x=(x_1,\ldots,x_n)\in K, we have then the following implications, which are all trivial:

T\in\overline{C}^{\,weak}\implies iT\in\overline{iC}^{\,weak}\implies iT(x)\in\overline{iC(x)}^{\,weak}

Now since the set C\subset B(H) was assumed to be convex, the set iC(x)\subset K is convex too, and since for convex sets the weak closure and the norm closure coincide, by the Hahn-Banach separation theorem, it follows that we have:

iT(x)\in\overline{iC(x)}^{\,||.||}

Thus, there exists an operator S\in C such that we have, for any i:

||Sx_i-Tx_i||<\varepsilon

But this shows that we have S\in V_T(x_1,\ldots,x_n,\varepsilon), and since x_1,\ldots,x_n\in H and \varepsilon>0 were arbitrary, by Proposition 9.2 it follows that we have T\in\overline{C}^{\,strong}, as desired. ∎
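As a comment here, in relation with the title of the present section, Theorem 9.3 applies in particular to the unit ball of an operator algebra A\subset B(H), which is indeed convex, due to the following trivial estimate, valid for any S,T\in A with ||S||,||T||\leq1, and any \lambda\in[0,1]:

||\lambda S+(1-\lambda)T||\leq\lambda||S||+(1-\lambda)||T||\leq1

This is precisely the type of convex set that the Kaplansky density theorem, which is the main result of the present section, is about.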

We will need as well the following standard result:

Proposition 9.4.

Given a vector space E\subset B(H), and a linear form f:E\to\mathbb{C}, the following conditions are equivalent:

(1) f is weakly continuous.

(2) f is strongly continuous.

(3) f(T)=\sum_{i=1}^n<Tx_i,y_i>, for certain vectors x_i,y_i\in H.

Proof.

This is something standard, using the same tools as those already used in chapter 5, namely basic functional analysis, and amplification tricks:

(1)\implies(2) Since the weak operator topology on B(H) is weaker than the strong operator topology on B(H), weakly continuous implies strongly continuous. To be more precise, assume T_n\to T strongly. Then T_n\to T weakly, and since f was assumed to be weakly continuous, we have f(T_n)\to f(T). Thus f is strongly continuous, as desired.

(2)\implies(3) Assume indeed that our linear form f:E\to\mathbb{C} is strongly continuous. In particular f is strongly continuous at 0, and Proposition 9.2 provides us with vectors x_1,\ldots,x_n\in H and a number \varepsilon>0 such that, with the notations there: