We discuss here in depth the structure of general operators on complex vector spaces. Unlike the previous note, we free ourselves from inner products on finite-dimensional spaces, as they do not help much here. All the results in this note assume a vector space $V \neq \{0\}$ and $F = \mathbf{R}$ or $\mathbf{C}$.

1. Generalized Eigenvectors and Nilpotent Operators
1.1. Generalized eigenvectors
1.2. Nilpotent operators
2. Decomposition of an Operator
2.1. Multiplicity of an eigenvalue
2.2. Block diagonal matrix
2.3. Square roots
3. Characteristic and Minimal Polynomial
3.1. The Cayley-Hamilton theorem
3.2. Minimal Polynomial
4. Jordan Form

1. Generalized Eigenvectors and Nilpotent Operators

By the terminology in Axler's book, the structure of an operator refers heuristically to a decomposition of the vector space that the operator and its powers act on. We will discuss in this section how a vector space can be divided into invariant subspaces spanned by generalized eigenvectors. To get started we need three theorems that set the arena for relating an operator and its powers to important subspaces and to generalized eigenvectors. For an operator $T \in \mathcal{L}(V)$, the first theorem tells us that the null space keeps growing as the power of $T$ increases.
Theorem 1. For $T \in \mathcal{L}(V)$,
$$\{0\} = \operatorname{null} T^0 \subseteq \operatorname{null} T^1 \subseteq \cdots \subseteq \operatorname{null} T^n \subseteq \operatorname{null} T^{n+1} \subseteq \cdots$$
Proof: it suffices to show $\operatorname{null} T^m \subseteq \operatorname{null} T^{m+1}$ for every nonnegative integer $m$. Let $v \in \operatorname{null} T^m$; then $T^{m+1} v = T(T^m v) = T(0) = 0$. Hence $\operatorname{null} T^m \subseteq \operatorname{null} T^{m+1}$, as desired.

The subset symbol $\subseteq$ allows equality. The following theorem says that the equality, once it occurs between two adjacent null spaces, propagates to all higher powers.
Theorem 2. Suppose $T \in \mathcal{L}(V)$ and $m$ is a nonnegative integer such that $\operatorname{null} T^m = \operatorname{null} T^{m+1}$. Then
$$\operatorname{null} T^m = \operatorname{null} T^{m+1} = \operatorname{null} T^{m+2} = \cdots$$
Proof: we need to show $\operatorname{null} T^{m+k} = \operatorname{null} T^{m+k+1}$ for every nonnegative integer $k$. Since we already know $\operatorname{null} T^{m+k} \subseteq \operatorname{null} T^{m+k+1}$ from Theorem 1, it remains to show $\operatorname{null} T^{m+k+1} \subseteq \operatorname{null} T^{m+k}$. Let $v \in \operatorname{null} T^{m+k+1}$, so $T^{m+1}(T^k v) = T^{m+k+1} v = 0$. Hence $T^k v \in \operatorname{null} T^{m+1} = \operatorname{null} T^m$. Thus $T^{m+k} v = T^m(T^k v) = 0$, i.e., $v \in \operatorname{null} T^{m+k}$. So $\operatorname{null} T^{m+k+1} \subseteq \operatorname{null} T^{m+k}$, as desired.

In the proof above, one might be tempted to deduce $T^{m+k+1} v = T^{m+1}(T^k v) = T^m(T^k v) = 0$ directly from $v \in \operatorname{null} T^{m+k+1}$, but this has a logical gap: $\operatorname{null} T^m = \operatorname{null} T^{m+1}$ does not imply $T^m = T^{m+1}$ as operators. With Theorem 2 we can take a step further and show that the consecutive equalities always hold from $m = \dim V$ on.
Theorem 3. Suppose $T \in \mathcal{L}(V)$ and let $n = \dim V$. Then
$$\operatorname{null} T^n = \operatorname{null} T^{n+1} = \operatorname{null} T^{n+2} = \cdots$$
Proof: Suppose $\operatorname{null} T^n \neq \operatorname{null} T^{n+1}$. Then for every $m < n$ we must also have $\operatorname{null} T^m \neq \operatorname{null} T^{m+1}$, for otherwise the equality would propagate by Theorem 2. Hence, by Theorem 1,
$$\{0\} = \operatorname{null} T^0 \subsetneq \operatorname{null} T^1 \subsetneq \cdots \subsetneq \operatorname{null} T^n \subsetneq \operatorname{null} T^{n+1}.$$
Each strict inclusion raises the dimension by at least one, so $\dim \operatorname{null} T^{k+1} \geq \dim \operatorname{null} T^k + 1$, which implies $\dim \operatorname{null} T^{n+1} \geq n + 1 > \dim V$. This is impossible because $\dim V = \dim \operatorname{null} T^{n+1} + \dim \operatorname{range} T^{n+1}$ and hence $\dim \operatorname{null} T^{n+1} \leq \dim V$. The contradiction shows $\operatorname{null} T^n = \operatorname{null} T^{n+1}$.

The proof above used the fundamental theorem of linear maps, $\dim V = \dim \operatorname{null} T + \dim \operatorname{range} T$, which does NOT always give $V = \operatorname{null} T \oplus \operatorname{range} T$. To see this, let $T \in \mathcal{L}(\mathbf{R}^3)$ with $T(x_1, x_2, x_3) = (x_2, x_3, 0)$. Then $\operatorname{null} T = \{(x_1, 0, 0) : x_1 \in \mathbf{R}\}$ and $\operatorname{range} T = \{(x_1, x_2, 0) : x_1, x_2 \in \mathbf{R}\}$. While $\dim \operatorname{null} T + \dim \operatorname{range} T = 1 + 2 = \dim V$, we have $\operatorname{null} T \cap \operatorname{range} T = \{(x_1, 0, 0) : x_1 \in \mathbf{R}\} \neq \{0\}$. Thus $\mathbf{R}^3 \neq \operatorname{null} T \oplus \operatorname{range} T$. As a matter of fact, even $\mathbf{R}^3 \neq \operatorname{null} T + \operatorname{range} T$, since the sum is only $\{(x_1, x_2, 0)\}$. Fortunately, Theorem 4 below is a useful complement to the fundamental theorem of linear maps.
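Before stating it, here is a quick numerical check of the example above, a minimal numpy sketch (the matrix `T` below is the matrix of $T(x_1, x_2, x_3) = (x_2, x_3, 0)$ in the standard basis):

```python
import numpy as np

# Matrix of T(x1, x2, x3) = (x2, x3, 0) with respect to the standard basis.
T = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])

print(np.linalg.matrix_rank(T))          # 2, so dim null T = 1 and 1 + 2 = dim V
print(T @ np.array([0., 1., 0.]))        # T e2 = e1, so e1 lies in range T;
                                         # e1 also spans null T, so the sum is not direct
print(np.linalg.matrix_power(T, 3))      # T^3 = 0, so null T^3 = R^3, range T^3 = {0},
                                         # and V = null T^3 (+) range T^3 holds trivially
```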
Theorem 4. Suppose $T \in \mathcal{L}(V)$ and let $n = \dim V$. Then
$$V = \operatorname{null} T^n \oplus \operatorname{range} T^n.$$
Proof: we prove Theorem 4 in two steps. The first step shows $\operatorname{null} T^n \cap \operatorname{range} T^n = \{0\}$; the second shows $\dim \operatorname{null} T^n + \dim \operatorname{range} T^n = n$ and uses $\dim(A \oplus B) = \dim A + \dim B$. Let $v \in \operatorname{null} T^n \cap \operatorname{range} T^n$; then $T^n v = 0$ and there exists $u \in V$ such that $T^n u = v$. So $T^{2n} u = T^n v = 0$. Note that $u \in \operatorname{null} T^{2n} = \operatorname{null} T^n$ by Theorem 3, so $T^n u = 0$. Thus $v = 0$. Now that $\operatorname{null} T^n + \operatorname{range} T^n = \operatorname{null} T^n \oplus \operatorname{range} T^n$, we have
$$\dim(\operatorname{null} T^n \oplus \operatorname{range} T^n) = \dim \operatorname{null} T^n + \dim \operatorname{range} T^n = \dim V = n,$$
where we used the fundamental theorem of linear maps again. The equation above implies $V = \operatorname{null} T^n \oplus \operatorname{range} T^n$, as desired.

1.1. Generalized eigenvectors

The null-range division is arguably the simplest decomposition produced by applying an operator and its powers. Another nice decomposition of a vector space is a direct sum of one-dimensional subspaces spanned by eigenvectors. Unfortunately, some operators do not have enough eigenvectors for such a decomposition. To see how special the decomposition is, let $T v_i = \lambda_i v_i$ for distinct $\lambda_i$ and $i = 1, \ldots, m$. With $U_i = \{v : v = z v_i,\ z \in \mathbf{C}\}$, we have $V = U_1 \oplus U_2 \oplus \cdots \oplus U_m$ if and only if $V$ has a basis of eigenvectors of $T$. According to the diagonalizability conditions, this happens if and only if $V = E(\lambda_1, T) \oplus \cdots \oplus E(\lambda_m, T)$. By the complex spectral theorem, the equation above always holds for normal operators on complex inner product spaces, but it does NOT hold for general operators. Fortunately, by defining the concept of generalized eigenvectors, we will see that any complex vector space is a direct sum of generalized eigenspaces.
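To see the shortage of eigenvectors concretely, here is a small numpy sketch with a hypothetical defective operator: the $2 \times 2$ matrix below has the single eigenvalue $1$, and its eigenvectors span only a one-dimensional subspace.

```python
import numpy as np

T = np.array([[1., 1.],
              [0., 1.]])
eigenvalues, eigenvectors = np.linalg.eig(T)
print(eigenvalues)                            # [1. 1.]: one eigenvalue, repeated
print(np.linalg.matrix_rank(eigenvectors))    # 1: the eigenvector columns are parallel,
                                              # so C^2 has no basis of eigenvectors of T
```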
Definition 1. Suppose $T \in \mathcal{L}(V)$ and $\lambda$ is an eigenvalue of $T$. A vector $v \in V$ is called a generalized eigenvector (of rank $j$) of $T$ corresponding to $\lambda$ if $v \neq 0$ and
$$(T - \lambda I)^j v = 0$$
for some positive integer $j$, while $(T - \lambda I)^{j-1} v \neq 0$.
Definition 2. Suppose $T \in \mathcal{L}(V)$ and $\lambda \in F$. The generalized eigenspace of $T$ corresponding to $\lambda$, denoted $G(\lambda, T)$, is defined to be the set of all generalized eigenvectors of $T$ corresponding to $\lambda$, along with the $0$ vector.
We DO NOT define a separate notion of "generalized eigenvalue" here. From Definition 1, $(T - \lambda I)^j$ is not injective, and neither is $T - \lambda I$, since the nonzero vector $(T - \lambda I)^{j-1} v$ lies in $\operatorname{null}(T - \lambda I)$. Because $V$ is finite-dimensional, a non-injective $T - \lambda I$ means $\lambda$ is simply an eigenvalue of $T$. From Definition 1 and Definition 2, we know that every vector in $G(\lambda, T)$ lies in $\operatorname{null}(T - \lambda I)^j$ for some $j$, where different vectors and different $\lambda$ might require different $j$. However, the next result shows that we can unify the exponent and define generalized eigenspaces for all eigenvalues at once.
Theorem 5. Suppose $T \in \mathcal{L}(V)$ and $\lambda \in F$. Then $G(\lambda, T) = \operatorname{null}(T - \lambda I)^{\dim V}$.
Proof: Suppose $v \in \operatorname{null}(T - \lambda I)^{\dim V}$; then $v \in G(\lambda, T)$ by Definition 2. Therefore $\operatorname{null}(T - \lambda I)^{\dim V} \subseteq G(\lambda, T)$. Conversely, suppose $v \in G(\lambda, T)$; then there exists a positive integer $j$ such that $(T - \lambda I)^j v = 0$. If $j \leq \dim V$, then $\operatorname{null}(T - \lambda I)^j \subseteq \operatorname{null}(T - \lambda I)^{\dim V}$ by Theorem 1. If $j > \dim V$, then $\operatorname{null}(T - \lambda I)^j = \operatorname{null}(T - \lambda I)^{\dim V}$ by Theorem 3. In either case $v \in \operatorname{null}(T - \lambda I)^{\dim V}$, so $G(\lambda, T) \subseteq \operatorname{null}(T - \lambda I)^{\dim V}$.
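A quick numerical illustration of Theorem 5 and of rank-$j$ generalized eigenvectors, as a numpy sketch using a hypothetical $2 \times 2$ matrix with the single eigenvalue $2$:

```python
import numpy as np

T = np.array([[2., 1.],
              [0., 2.]])
I = np.eye(2)
v = np.array([0., 1.])                          # candidate generalized eigenvector
print((T - 2*I) @ v)                            # [1. 0.] != 0: v is not an eigenvector
print(np.linalg.matrix_power(T - 2*I, 2) @ v)   # [0. 0.]: v has rank 2
# dim null (T - 2I) = 1, but dim null (T - 2I)^{dim V} = 2, so G(2, T) = C^2.
print(2 - np.linalg.matrix_rank(T - 2*I),
      2 - np.linalg.matrix_rank(np.linalg.matrix_power(T - 2*I, 2)))
```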
Eigenvectors corresponding to distinct eigenvalues are linearly independent, and the same holds for generalized eigenvectors.

Theorem 6. Let $T \in \mathcal{L}(V)$. Suppose $\lambda_1, \ldots, \lambda_m$ are distinct eigenvalues of $T$ and $v_1, \ldots, v_m$ are corresponding generalized eigenvectors. Then $v_1, \ldots, v_m$ is linearly independent.
1.2. Nilpotent operators

We end this section by introducing nilpotent operators. The Latin word nil means zero, and thus nilpotent literally means zero power.
Definition 3. An operator is called nilpotent if some power of it equals $0$.
For a nilpotent operator, we never need to raise the power beyond $\dim V$ to reach zero, as the theorem below shows.
Theorem 7. Suppose $N \in \mathcal{L}(V)$ is nilpotent. Then $N^{\dim V} = 0$.
Proof: by the definition of a nilpotent operator, every $v \in V$ satisfies $N^j v = 0$ for some $j$, so the generalized eigenspace $G(0, N)$ is all of $V$. From Theorem 5, $G(0, N) = \operatorname{null} N^{\dim V} = V$, which means $N^{\dim V} = 0$.
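A minimal numerical check of Theorem 7, assuming a hypothetical strictly upper-triangular (hence nilpotent) matrix:

```python
import numpy as np

N = np.array([[0., 2., 5.],
              [0., 0., 3.],
              [0., 0., 0.]])
print(np.linalg.matrix_power(N, 2))   # not yet zero
print(np.linalg.matrix_power(N, 3))   # the zero matrix: N^{dim V} = 0
```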
Moreover, every nilpotent operator has a matrix whose entries on and below the diagonal are all zero, with respect to a suitable basis, as we now prove.

Theorem 8. Suppose $N$ is a nilpotent operator on $V$. Then there is a basis of $V$ with respect to which the matrix of $N$ has the form
$$\begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{pmatrix},$$
where all entries on and below the diagonal are $0$'s.
Proof: Because of Theorem 7, we can build a desired basis as follows. First, choose a basis of $\operatorname{null} N$. By Theorem 1, we can extend it to a basis of $\operatorname{null} N^2$, and we repeat the extension until we reach a basis of $\operatorname{null} N^{\dim V} = V$. Now write down $\mathcal{M}(N)$ with respect to this basis. The first several columns, corresponding to the basis vectors of $\operatorname{null} N$, consist entirely of zeros. After these zero columns come the columns corresponding to the vectors that extend the basis to $\operatorname{null} N^2$. If $v$ is one of those vectors, then $N^2 v = N(N v) = 0$, so $N v \in \operatorname{null} N$. Thus applying $N$ to $v$ gives a linear combination of the basis vectors of $\operatorname{null} N$, which tells us that this second set of columns can have nonzero entries only in the rows corresponding to basis vectors of $\operatorname{null} N$, and such rows lie strictly above the diagonal. Continuing this process, we eventually get a matrix whose nonzero entries all lie above the diagonal.

2. Decomposition of an Operator

As we discussed in the previous section, a general operator on a complex vector space might not have enough ordinary eigenvectors to dissect the space into one-dimensional invariant subspaces. But complex vector spaces can always be written as a direct sum of generalized eigenspaces. The following result shows that those generalized eigenspaces are invariant under the associated operator as well.
Theorem 9. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$. Then
(a) $V = G(\lambda_1, T) \oplus \cdots \oplus G(\lambda_m, T)$;
(b) each $G(\lambda_j, T)$ is invariant under $T$;
(c) each $(T - \lambda_j I)|_{G(\lambda_j, T)}$ is nilpotent.
Proof: We are not going to prove (a) here, as the proof given in Axler's book was not very clear to the author. One might use the Lemme des Noyaux (the kernel lemma) to prove it; because we have not introduced the concept of a kernel so far, that proof is skipped. To prove (b), recall from Theorem 5 that $G(\lambda_j, T) = \operatorname{null}(T - \lambda_j I)^{\dim V}$, and notice that $(T - \lambda_j I)^{\dim V}$ is a polynomial in $T$. So to prove (b) we first prove the following lemma:
Lemma 1. Suppose $T \in \mathcal{L}(V)$ and $p \in \mathcal{P}(F)$. Then $\operatorname{null} p(T)$ and $\operatorname{range} p(T)$ are invariant under $T$.
Suppose $v \in \operatorname{null} p(T)$, i.e., $p(T) v = 0$. Then $p(T)(T v) = T p(T) v = T(0) = 0$, so $T v \in \operatorname{null} p(T)$ as well. Also, if $v \in \operatorname{range} p(T)$, then there exists $u \in V$ such that $v = p(T) u$. Then $T v = T p(T) u = p(T)(T u)$, so $T v \in \operatorname{range} p(T)$, being the image of $T u \in V$ under $p(T)$. In summary, $\operatorname{null} p(T)$ and $\operatorname{range} p(T)$ are invariant under $T$. Now take $p(T) = (T - \lambda_j I)^{\dim V}$; then $\operatorname{null}(T - \lambda_j I)^{\dim V} = G(\lambda_j, T)$ is invariant under $T$ because of Lemma 1, proving (b). Finally, (c) holds because the operator $(T - \lambda_j I)|_{G(\lambda_j, T)}$ lives on $G(\lambda_j, T)$, and $(T - \lambda_j I)^{\dim V} v = 0$ for every $v \in \operatorname{null}(T - \lambda_j I)^{\dim V} = G(\lambda_j, T)$.
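Here is a sketch of Theorem 9(a) with sympy, for a hypothetical operator with eigenvalues $2$ (on a 2-dimensional generalized eigenspace) and $3$ (on a 1-dimensional one):

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
n = A.shape[0]
G2 = ((A - 2*sp.eye(n))**n).nullspace()     # basis of G(2, T) = null (T - 2I)^3
G3 = ((A - 3*sp.eye(n))**n).nullspace()     # basis of G(3, T) = null (T - 3I)^3
print(len(G2), len(G3))                     # 2 1: the dimensions sum to dim V = 3
print(sp.Matrix.hstack(*(G2 + G3)).rank())  # 3: the combined list is a basis of C^3
```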
Now that we have Theorem 9(a), we can collect a basis from every generalized eigenspace to make a basis of any complex vector space, i.e.,

Theorem 10. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Then there is a basis of $V$ consisting of generalized eigenvectors of $T$.
2.1. Multiplicity of an eigenvalue

Because of Theorem 9(a), we can also define the multiplicity of an eigenvalue, and the sum of the multiplicities of all the eigenvalues of an operator $T \in \mathcal{L}(V)$ equals $\dim V$.
Definition 4.
- Suppose $T \in \mathcal{L}(V)$. The multiplicity of an eigenvalue $\lambda$ of $T$ is defined to be the dimension of the corresponding generalized eigenspace $G(\lambda, T)$.
- In other words, the multiplicity of an eigenvalue $\lambda$ of $T$ equals $\dim \operatorname{null}(T - \lambda I)^{\dim V}$.
Theorem 11. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Then the sum of the multiplicities of all the eigenvalues of $T$ equals $\dim V$.
The multiplicity defined above is also called the algebraic multiplicity in some books. The term geometric multiplicity is also used, but it refers to the dimension of the corresponding eigenspace. In other words,
$$\text{geometric multiplicity of } \lambda = \dim \operatorname{null}(T - \lambda I),$$
$$\text{algebraic multiplicity of } \lambda = \dim \operatorname{null}(T - \lambda I)^{\dim V}.$$
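A numpy sketch contrasting the two multiplicities, for a hypothetical matrix with the single eigenvalue $2$:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 2.]])
n = A.shape[0]
I = np.eye(n)
geometric = n - np.linalg.matrix_rank(A - 2*I)
algebraic = n - np.linalg.matrix_rank(np.linalg.matrix_power(A - 2*I, n))
print(geometric, algebraic)   # 2 3: two independent eigenvectors, but G(2, A) = C^3
```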
2.2. Block diagonal matrix

What do the matrices of operators look like with respect to bases of generalized eigenvectors? To answer this question we first introduce the concept of a block diagonal matrix:

Definition 5. A block diagonal matrix is a square matrix of the form
$$\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix},$$
where $A_1, \ldots, A_m$ are square matrices lying along the diagonal and all the other entries of the matrix equal $0$.
As an example, the matrix below is a block diagonal matrix:
$$A = \begin{pmatrix} 4 & 0 & 0 & 0 & 0 \\ 0 & 2 & -3 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} A_1 & & 0 \\ & A_2 & \\ 0 & & A_3 \end{pmatrix}.$$
The following result shows how a block diagonal matrix can be related to the multiplicities of distinct eigenvalues through upper-triangular blocks.
Theorem 12. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$, with multiplicities $d_1, \ldots, d_m$. Then there is a basis of $V$ with respect to which $T$ has a block diagonal matrix of the form
$$\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix},$$
where each $A_j$ is a $d_j$-by-$d_j$ upper-triangular matrix of the form
$$A_j = \begin{pmatrix} \lambda_j & & * \\ & \ddots & \\ 0 & & \lambda_j \end{pmatrix}.$$
Proof: To see this, we first use Theorem 9(a) to dissect the given complex vector space into generalized eigenspaces. For each eigenvalue $\lambda_j$, Theorem 9(c) says $(T - \lambda_j I)|_{G(\lambda_j, T)}$ is nilpotent, so by Theorem 8 it has a matrix of the zero-diagonal upper-triangular form with respect to a suitable basis of $G(\lambda_j, T)$. Furthermore, the matrix of $T|_{G(\lambda_j, T)} = (T - \lambda_j I)|_{G(\lambda_j, T)} + \lambda_j I|_{G(\lambda_j, T)}$ with respect to the same basis has $\lambda_j$ everywhere on its diagonal. We can write similar matrices for the other eigenvalues with respect to bases of their generalized eigenspaces. Because putting the bases of the $G(\lambda_j, T)$ together gives a basis of $V$, the matrix of $T$ with respect to this basis has the blocks $\mathcal{M}(T|_{G(\lambda_j, T)})$ on its diagonal, as desired.

The example matrix $A$ above has upper-triangular diagonal blocks, which indicates that its eigenvalues are $4, 2, 1$, with multiplicities $1, 2, 2$.
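We can confirm those multiplicities numerically; a numpy sketch of $\dim \operatorname{null}(A - \lambda I)^{\dim V}$ for the example matrix $A$:

```python
import numpy as np

A = np.array([[4., 0.,  0., 0., 0.],
              [0., 2., -3., 0., 0.],
              [0., 0.,  2., 0., 0.],
              [0., 0.,  0., 1., 7.],
              [0., 0.,  0., 0., 1.]])
n = A.shape[0]
for lam in (4., 2., 1.):
    M = np.linalg.matrix_power(A - lam*np.eye(n), n)
    print(lam, n - np.linalg.matrix_rank(M))   # multiplicities 1, 2, 2
```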
2.3. Square roots

Not every operator on a complex vector space has a square root, but we will see that $I + N$ always has a square root when $N$ is nilpotent. Notice that the following lemma applies to both complex and real vector spaces.

Lemma 2. Suppose $N \in \mathcal{L}(V)$ is nilpotent. Then $I + N$ has a square root.
Proof: We first consider the Taylor expansion $\sqrt{1 + x} = 1 + a_1 x + a_2 x^2 + \cdots$. We do not care about the exact values of the coefficients $a_j$, but guess that a square root of $I + N$ has the form $I + a_1 N + a_2 N^2 + \cdots$. Because $N^m = 0$ for some integer $m$, the series above ends at $a_{m-1} N^{m-1}$. With this guess, we should have
$$I + N = (I + a_1 N + a_2 N^2 + \cdots + a_{m-1} N^{m-1})^2 = I + 2 a_1 N + (2 a_2 + a_1^2) N^2 + (2 a_3 + 2 a_1 a_2) N^3 + \cdots + (2 a_{m-1} + \text{terms involving } a_1, \ldots, a_{m-2}) N^{m-1}.$$
Matching the terms between the two sides gives $a_1 = 1/2$, $2 a_2 + a_1^2 = 0$, and so on. Again, we do not care about the specific values of the coefficients. What matters is that we can always solve for a set of $a_j$ satisfying $I + N = (I + a_1 N + a_2 N^2 + \cdots + a_{m-1} N^{m-1})^2$.
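A numpy sketch of the construction for a hypothetical nilpotent $N$ with $N^3 = 0$; the truncated series is $I + N/2 - N^2/8$, using the Taylor coefficients of $\sqrt{1 + x}$:

```python
import numpy as np

N = np.array([[0., 4., 1.],
              [0., 0., 2.],
              [0., 0., 0.]])
I = np.eye(3)
R = I + N/2 - (N @ N)/8           # higher-order terms vanish since N^3 = 0
print(np.allclose(R @ R, I + N))  # True: R is a square root of I + N
```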
Lemma 2 applies to both real and complex vector spaces. The following lemma about square roots, however, only applies to complex vector spaces.

Lemma 3. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$ is invertible. Then $T$ has a square root.
Proof: Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$. On each generalized eigenspace $G(\lambda_j, T)$, the operator $(T - \lambda_j I)|_{G(\lambda_j, T)}$ is nilpotent by Theorem 9(c). Let $N_j = (T - \lambda_j I)|_{G(\lambda_j, T)}$; then $T|_{G(\lambda_j, T)} = N_j + \lambda_j I = \lambda_j(I + N_j/\lambda_j)$, where $\lambda_j \neq 0$ because $T$ is invertible. Clearly $N_j/\lambda_j$ is also nilpotent, so $I + N_j/\lambda_j$ has a square root by Lemma 2. Thus a square root of $T|_{G(\lambda_j, T)}$ is $R_j = \sqrt{\lambda_j}\,\sqrt{I + N_j/\lambda_j}$, where $\sqrt{\lambda_j}$ is either complex square root of $\lambda_j$. Every $v \in V$ can be written as $v = u_1 + u_2 + \cdots + u_m$ with $u_j \in G(\lambda_j, T)$. If we define $R$ by
$$R v = R_1 u_1 + R_2 u_2 + \cdots + R_m u_m,$$
then
$$R^2 v = R_1^2 u_1 + R_2^2 u_2 + \cdots + R_m^2 u_m = \sum_{j=1}^m T|_{G(\lambda_j, T)}\, u_j = T v,$$
as desired.
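For a numerical check of Lemma 3 we can lean on scipy's matrix square root (a sketch; `scipy.linalg.sqrtm` computes a principal square root rather than the eigenspace-by-eigenspace construction above):

```python
import numpy as np
from scipy.linalg import sqrtm

T = np.array([[0., -1.],
              [1.,  0.]])       # rotation by 90 degrees: invertible, not positive
R = sqrtm(T)                    # here, the rotation by 45 degrees
print(np.allclose(R @ R, T))    # True: R is a square root of T
```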
By imitating the techniques in this section, you should be able to prove that if $V$ is a complex vector space and $T \in \mathcal{L}(V)$ is invertible, then $T$ has a $k$th root for every positive integer $k$.

3. Characteristic and Minimal Polynomial

We will prove the Cayley-Hamilton theorem in this section, and build on it to present important results about minimal polynomials of operators. The theorem does not have much application in quantum theory, but it is a big deal in control theory (check this video about reachability and controllability by Prof. Steve Brunton). Mathematically, the theorem provides a way to simplify the calculation of matrix exponentials, as we will see later.

3.1. The Cayley-Hamilton theorem

Before we prove the Cayley-Hamilton theorem, we need to define the characteristic polynomial.

Definition 6. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Let $\lambda_1, \ldots, \lambda_m$ denote the distinct eigenvalues of $T$, with multiplicities $d_1, \ldots, d_m$. The polynomial
$$(z - \lambda_1)^{d_1} \cdots (z - \lambda_m)^{d_m}$$
is called the characteristic polynomial of $T$.
With this, we state the theorem as follows.
Theorem 13. Suppose $V$ is a complex vector space and $T \in \mathcal{L}(V)$. Let $q$ denote the characteristic polynomial of $T$. Then $q(T) = 0$.
Proof: First, we dissect $V$ into a direct sum of generalized eigenspaces associated with the distinct eigenvalues $\lambda_1, \ldots, \lambda_m$, whose multiplicities are $d_1, \ldots, d_m$. The characteristic polynomial of $T$, evaluated at $T$, is
$$q(T) = (T - \lambda_1 I)^{d_1} \cdots (T - \lambda_m I)^{d_m}.$$
From Theorem 9(c) we know $(T - \lambda_j I)|_{G(\lambda_j, T)}$ is nilpotent. Also, for a nilpotent operator $N$ on a vector space of dimension $d$, we have $N^d = 0$ by Theorem 7. Since $\dim G(\lambda_j, T) = d_j$, we get $\big((T - \lambda_j I)|_{G(\lambda_j, T)}\big)^{d_j} = 0$. To show that $q(T)$ equals zero, it suffices to show $q(T)|_{G(\lambda_j, T)} = 0$ for each $j$. Let $v \in G(\lambda_j, T)$; then
$$q(T) v = (T - \lambda_1 I)^{d_1} \cdots (T - \lambda_j I)^{d_j} \cdots (T - \lambda_m I)^{d_m} v = (T - \lambda_1 I)^{d_1} \cdots (T - \lambda_m I)^{d_m} (T - \lambda_j I)^{d_j} v.$$
The second equality holds because the operators all commute, i.e., $(T - \lambda_i I)(T - \lambda_j I) = (T - \lambda_j I)(T - \lambda_i I)$. Therefore, we can move $(T - \lambda_j I)^{d_j}$ to the far right, where $(T - \lambda_j I)^{d_j} v = 0$. That is, $q(T)|_{G(\lambda_j, T)} = 0$, as desired.

As an example of applying Theorem 13, consider the calculation of $\exp(A t)$, with $A$ an operator or a matrix. Expanding $\exp(A t)$ gives
$$\exp(A t) = I + A t + \frac{1}{2!} A^2 t^2 + \frac{1}{3!} A^3 t^3 + \cdots$$
Let $n = \dim V$, so the characteristic polynomial of $A$ has degree $n$; then Theorem 13 lets us write
$$A^n = c_0 I + c_1 A + c_2 A^2 + \cdots + c_{n-1} A^{n-1}.$$
Repeatedly substituting this relation into the series turns the infinite sum into a finite one,
$$\exp(A t) = p_0(t) I + p_1(t) A + p_2(t) A^2 + \cdots + p_{n-1}(t) A^{n-1}$$
for appropriate functions $p_0, \ldots, p_{n-1}$. This conversion may allow simpler calculation.
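Returning to Theorem 13 itself, here is a numpy sketch verifying $q(A) = 0$ on a hypothetical symmetric $3 \times 3$ matrix; `np.poly` returns the monic characteristic polynomial coefficients (highest degree first), and we evaluate $q(A)$ with a matrix Horner scheme:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
coefficients = np.poly(A)      # monic characteristic polynomial of A
Q = np.zeros_like(A)
for c in coefficients:         # Horner: Q <- Q A + c I
    Q = Q @ A + c*np.eye(3)
print(np.allclose(Q, 0))       # True: q(A) = 0
```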
3.2. Minimal Polynomial

The definition of the minimal polynomial of an operator depends on the concept of a monic polynomial:

Definition 7. A monic polynomial is a polynomial whose highest-degree coefficient equals $1$.
As an example, the polynomial $z^5 + 7 z^3 + z + 1$ is a monic polynomial of degree $5$. We now give the definition of the minimal polynomial and then prove that it is the unique monic polynomial of smallest degree that annihilates a given operator.
Definition 8. Suppose $T \in \mathcal{L}(V)$. Then the minimal polynomial of $T$ is the unique monic polynomial $p$ of smallest degree such that $p(T) = 0$.
As promised, we now prove:
Lemma 4. Suppose $T \in \mathcal{L}(V)$. Then there is a unique monic polynomial $p$ of smallest degree such that $p(T) = 0$.
Proof: We first prove that such a monic polynomial exists. Let $n = \dim V$; then $\dim \mathcal{L}(V) = n^2$, and the list
$$I, T, T^2, \ldots, T^{n^2}$$
is not linearly independent, as it has length $n^2 + 1$. Let $m$ be the smallest integer such that $I, T, \ldots, T^m$ is linearly dependent. By the linear dependence lemma, $T^m$ can be expressed as a linear combination of $I, T, \ldots, T^{m-1}$, i.e.,
$$a_0 I + a_1 T + \cdots + a_{m-1} T^{m-1} + T^m = 0.$$
Let $p(z) = a_0 + a_1 z + \cdots + a_{m-1} z^{m-1} + z^m$; then $p(T) = 0$. To prove that such a monic polynomial is unique, suppose $q$ is another monic polynomial of smallest degree with $q(T) = 0$; its degree must be $m$, since $I, T, \ldots, T^{m-1}$ is linearly independent and hence no polynomial of degree lower than $m$ annihilates $T$. Then $(p - q)(T) = 0$ and $\deg(p - q) < m$, which by the same independence forces $p - q = 0$, i.e., $p = q$.
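The existence proof is directly executable: keep appending powers of $T$ until the list becomes linearly dependent. A numpy sketch, using a hypothetical matrix whose characteristic polynomial is $(z - 5)^3$ but whose minimal polynomial turns out to be $(z - 5)^2$:

```python
import numpy as np

A = np.array([[5., 1., 0.],
              [0., 5., 0.],
              [0., 0., 5.]])
n = A.shape[0]

powers = [np.eye(n).flatten()]                 # vec(I), vec(A), vec(A^2), ...
for m in range(1, n*n + 1):
    target = np.linalg.matrix_power(A, m).flatten()
    basis = np.column_stack(powers)
    coeffs, *_ = np.linalg.lstsq(basis, target, rcond=None)
    if np.allclose(basis @ coeffs, target):    # A^m depends on lower powers
        print(m, coeffs)  # 2 [-25. 10.]: A^2 = -25 I + 10 A, so p(z) = (z - 5)^2
        break
    powers.append(target)
```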
The following results tell us more about the structure of the minimal polynomial; we omit their proofs here.

Lemma 5. Suppose $T \in \mathcal{L}(V)$ and $q \in \mathcal{P}(F)$. Then $q(T) = 0$ if and only if $q$ is a polynomial multiple of the minimal polynomial $p$ of $T$. In other words, there exists $s \in \mathcal{P}(F)$ such that $q = p s$.
Lemma 6. Suppose $F = \mathbf{C}$ and $T \in \mathcal{L}(V)$. Then the characteristic polynomial of $T$ is a polynomial multiple of the minimal polynomial of $T$.
From Definition 6, we know the zeros of the characteristic polynomial of an operator are exactly its eigenvalues. Now we can show that the minimal polynomial has the same zeros, possibly with different multiplicities.
Lemma 7. Let $T \in \mathcal{L}(V)$. Then the zeros of the minimal polynomial of $T$ are precisely the eigenvalues of $T$.
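Continuing the example from the sketch above: the minimal polynomial $(z - 5)^2$ has the single zero $5$, which is indeed the only eigenvalue, while the characteristic polynomial $(z - 5)^3$ carries a larger multiplicity.

```python
import numpy as np

A = np.array([[5., 1., 0.],
              [0., 5., 0.],
              [0., 0., 5.]])
I = np.eye(3)
print(np.allclose((A - 5*I) @ (A - 5*I), 0))   # True: (z - 5)^2 annihilates A
print(np.allclose(A - 5*I, 0))                 # False: (z - 5) alone does not
print(np.linalg.eigvals(A))                    # [5. 5. 5.]: eigenvalues match the zero 5
```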
4. Jordan Form

In Section 2.2 we showed that there is a basis of a complex vector space $V$ that makes the matrix of an operator $T$ a nice block diagonal, upper-triangular matrix. Fortunately, we can do even better by writing down $\mathcal{M}(T)$ with respect to a Jordan basis.
Definition 9. Suppose $T \in \mathcal{L}(V)$. A basis of $V$ is called a Jordan basis for $T$ if with respect to this basis $T$ has a block diagonal matrix
$$\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_p \end{pmatrix},$$
where each $A_j$ is an upper-triangular matrix of the form
$$A_j = \begin{pmatrix} \lambda_j & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_j \end{pmatrix}.$$
The following result indicates that a Jordan basis always exists for operators on complex vector spaces.
Theorem 14. Suppose $V$ is a complex vector space. If $T \in \mathcal{L}(V)$, then there is a basis of $V$ that is a Jordan basis for $T$.
We will skip the proof here. Interested readers might find the lemma below useful for proving Theorem 14.
Lemma 8. Suppose $N \in \mathcal{L}(V)$ is nilpotent. Then there exist vectors $v_1, \ldots, v_n \in V$ and nonnegative integers $m_1, \ldots, m_n$ such that
(a) $N^{m_1} v_1, \ldots, N v_1, v_1, \ldots, N^{m_n} v_n, \ldots, N v_n, v_n$ is a basis of $V$;
(b) $N^{m_1 + 1} v_1 = \cdots = N^{m_n + 1} v_n = 0$.
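Although we skip the proof, Jordan forms are easy to experiment with; a sympy sketch (`Matrix.jordan_form` returns a matrix $P$ whose columns form a Jordan basis, together with the Jordan form $J$):

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 3]])
P, J = A.jordan_form()     # A = P J P^{-1}
sp.pprint(J)               # a 2x2 Jordan block for 2 and a 1x1 block for 3
print(A == P*J*P.inv())    # True
```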