Proof: we prove Theorem 1 directly. We need to show null T^m ⊂ null T^{m+1} for any nonnegative integer m. Let v ∈ null T^m; then T^{m+1}v = T(T^m v) = T(0) = 0, so v ∈ null T^{m+1}. Hence null T^m ⊂ null T^{m+1}, as desired. □

The subset symbol "⊂" allows equality. The following theorem says that if equality holds between two adjacent null spaces, it propagates to all consecutive powers.
Theorem 2. Suppose T ∈ L(V) and m is a nonnegative integer such that null T^m = null T^{m+1}. Then

null T^m = null T^{m+1} = null T^{m+2} = ⋯.
Proof: we show null T^{m+k} = null T^{m+k+1} for every nonnegative integer k. Theorem 1 already gives null T^{m+k} ⊂ null T^{m+k+1}, so it remains to show null T^{m+k+1} ⊂ null T^{m+k}. Let v ∈ null T^{m+k+1}. Then T^{m+1}(T^k v) = T^{m+k+1}v = 0, hence T^k v ∈ null T^{m+1} = null T^m. Thus T^{m+k}v = T^m(T^k v) = 0, i.e., v ∈ null T^{m+k}. So null T^{m+k+1} ⊂ null T^{m+k}, as desired. □

In the proof above, one might be tempted to deduce T^{m+k+1}v = T^{m+1}(T^k v) = T^m(T^k v) = 0 directly from v ∈ null T^{m+k+1}, but that contains a minor logical leap: null T^m = null T^{m+1} does not necessarily imply T^m = T^{m+1}. With Theorem 2 we can take a step further and show that the consecutive equalities always hold once m = dim V.
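Before stating that result, here is a quick numerical illustration of the stabilizing chain of null spaces. This is a minimal NumPy sketch; the matrix T is our own illustrative choice, not from the original text.

```python
import numpy as np

# Hypothetical example: T kills e1, shifts e2 -> e1, and scales e3,
# so its null spaces grow for a while and then stabilize.
T = np.array([[0., 1., 0.],
              [0., 0., 0.],
              [0., 0., 5.]])
n = T.shape[0]

def nullity(M):
    # dim null M = dim V - dim range M (fundamental theorem of linear maps)
    return M.shape[0] - np.linalg.matrix_rank(M)

for k in range(n + 3):
    print(k, nullity(np.linalg.matrix_power(T, k)))
# Output pairs: (0, 0), (1, 1), (2, 2), (3, 2), (4, 2), (5, 2) -- once two
# consecutive nullities agree, all later ones do, as Theorems 1 and 2 predict.
```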
Theorem 3. Suppose T ∈ L(V) and let n = dim V. Then

null T^n = null T^{n+1} = null T^{n+2} = ⋯.
Proof: Suppose null T^n ≠ null T^{n+1}. Then by Theorem 2 we cannot have null T^m = null T^{m+1} for any m < n either, since such an equality would propagate to all higher powers. Hence, combining with Theorem 1,

{0} = null T^0 ⊊ null T^1 ⊊ ⋯ ⊊ null T^n ⊊ null T^{n+1}.

Each strict inclusion forces dim null T^{k+1} ≥ dim null T^k + 1, which implies dim null T^{n+1} ≥ n + 1 > dim V. This is impossible, since dim V = dim null T^{n+1} + dim range T^{n+1} gives dim null T^{n+1} ≤ dim V. The contradiction shows null T^n = null T^{n+1}, and Theorem 2 finishes the proof. □

The proof above used the fundamental theorem of linear maps, dim V = dim null T + dim range T, which does NOT always yield V = null T ⊕ range T. To see this, let T ∈ L(R^3) with T(x_1, x_2, x_3) = (x_2, x_3, 0). Then null T = {(x_1, 0, 0) : x_1 ∈ R} and range T = {(x_1, x_2, 0) : x_1, x_2 ∈ R}. While dim null T + dim range T = 1 + 2 = dim V, we have null T ∩ range T = {(x_1, 0, 0) : x_1 ∈ R} ≠ {0}. Thus R^3 ≠ null T ⊕ range T; as a matter of fact, R^3 ≠ null T + range T. Fortunately, the following theorem is a useful complement to the fundamental theorem of linear maps.
Theorem 4. Suppose T ∈ L(V) and let n = dim V. Then

V = null T^n ⊕ range T^n.
Proof: we prove Theorem 4 in two steps: first show null T^n ∩ range T^n = {0}, and then show dim null T^n + dim range T^n = n, using dim(A ⊕ B) = dim A + dim B. Let v ∈ null T^n ∩ range T^n. Then T^n v = 0 and there exists u ∈ V such that T^n u = v. So T^{2n}u = T^n v = 0, i.e., u ∈ null T^{2n} = null T^n by Theorem 3, and hence v = T^n u = 0. Now that null T^n + range T^n = null T^n ⊕ range T^n, we have

dim(null T^n ⊕ range T^n) = dim null T^n + dim range T^n = dim V = n,

where we used the fundamental theorem of linear maps again. The equation above implies V = null T^n ⊕ range T^n, as desired. □

1.1. Generalized eigenvectors

The null-range decomposition is arguably the simplest one produced by applying an operator and its powers. Another nice decomposition of a vector space is a direct sum of one-dimensional subspaces spanned by eigenvectors. Unfortunately, some operators do not have enough eigenvectors for such a decomposition. To see how rare the decomposition is, let T v_i = 𝜆_i v_i for distinct 𝜆_i, i = 1, …, m, and let U_i = {z v_i : z ∈ C}. Then

V = U_1 ⊕ U_2 ⊕ ⋯ ⊕ U_m

if and only if V has a basis of eigenvectors of T. According to the diagonalizability conditions, this happens if and only if V = E(𝜆_1, T) ⊕ ⋯ ⊕ E(𝜆_m, T). By the complex spectral theorem, the equation above always holds for normal operators on complex inner product spaces, but it does NOT hold for general operators. Fortunately, by defining the concept of generalized eigenvectors, we will see that every complex vector space is a direct sum of generalized eigenspaces.
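As a concrete illustration of this scarcity (a minimal NumPy sketch; the matrix is our own choice, not from the text), consider the simplest operator without enough eigenvectors:

```python
import numpy as np

N = np.array([[0., 1.],
              [0., 0.]])   # the classic 2x2 "shift": its only eigenvalue is 0
n = N.shape[0]

def nullity(M):
    return M.shape[0] - np.linalg.matrix_rank(M)

print(np.linalg.eigvals(N))   # [0. 0.]: a repeated eigenvalue 0
print(nullity(N))             # 1: the eigenspace E(0, N) is only 1-dimensional,
                              #    so C^2 has no basis of eigenvectors of N
print(nullity(N @ N))         # 2: but null N^2 = C^2, so every nonzero vector
                              #    turns out to be a *generalized* eigenvector

# Note also null N = range N = span{e1}, so this N is another example where
# V is not null N (+) range N, while Theorem 4 holds via null N^2 = C^2.
```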
Definition 1. Suppose T ∈ L(V) and 𝜆 is an eigenvalue of T. A vector v ∈ V is called a generalized eigenvector (of rank j) of T corresponding to 𝜆 if v ≠ 0 and

(T − 𝜆I)^j v = 0

for some positive integer j, while (T − 𝜆I)^{j−1} v ≠ 0.
Definition 2. Suppose T ∈ L(V) and 𝜆 ∈ F. The generalized eigenspace of T corresponding to 𝜆, denoted G(𝜆, T), is defined to be the set of all generalized eigenvectors of T corresponding to 𝜆, along with the 0 vector.
We DO NOT define a "generalized eigenvalue" here. From Definition 1, (T − 𝜆I)^j is not injective, and neither is T − 𝜆I, since the nonzero vector (T − 𝜆I)^{j−1}v lies in null(T − 𝜆I). Because V is finite-dimensional, this non-injectivity means that 𝜆 is just an ordinary eigenvalue of T. From Definitions 1 and 2, we know that G(𝜆, T) = null(T − 𝜆I)^j, where different 𝜆 might correspond to different j. However, the next result shows that we can unify the exponent and define the generalized eigenspaces of all possible eigenvalues with a single power.
Theorem 5. Suppose T ∈ L(V) and 𝜆 ∈ F. Then G(𝜆, T) = null(T − 𝜆I)^{dim V}.
Proof: Suppose v ∈ null(T − 𝜆I)^{dim V}. Then v ∈ G(𝜆, T) as indicated by Definition 2, so null(T − 𝜆I)^{dim V} ⊂ G(𝜆, T). Conversely, suppose v ∈ G(𝜆, T); then there exists a positive integer j such that (T − 𝜆I)^j v = 0. If j ≤ dim V, then null(T − 𝜆I)^j ⊂ null(T − 𝜆I)^{dim V} by Theorem 1. If j > dim V, then null(T − 𝜆I)^j = null(T − 𝜆I)^{dim V} by Theorem 3. Hence G(𝜆, T) ⊂ null(T − 𝜆I)^{dim V} in either case. □

Eigenvectors corresponding to distinct eigenvalues are linearly independent; the same is true for generalized eigenvectors.
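Theorem 5 is easy to check numerically. The sketch below (our own example matrix, not from the text) computes dim G(𝜆, A) as the nullity of (A − 𝜆I)^{dim V}:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 7.]])   # eigenvalues: 2 (defective) and 7
n = A.shape[0]

def gen_eigenspace_dim(A, lam):
    # dim G(lam, A) = dim null (A - lam*I)^(dim V), per Theorem 5
    M = np.linalg.matrix_power(A - lam * np.eye(n), n)
    return n - np.linalg.matrix_rank(M)

print(gen_eigenspace_dim(A, 2.0))   # 2 (even though E(2, A) is 1-dimensional)
print(gen_eigenspace_dim(A, 7.0))   # 1
print(gen_eigenspace_dim(A, 3.0))   # 0: 3 is not an eigenvalue, so G(3, A) = {0}
```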
Theorem 6. Let T ∈ L(V). Suppose 𝜆_1, …, 𝜆_m are distinct eigenvalues of T and v_1, …, v_m are corresponding generalized eigenvectors. Then v_1, …, v_m is linearly independent.
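Continuing the sketch above (same hypothetical matrix), we can check Theorem 6 by stacking one generalized eigenvector per distinct eigenvalue and confirming full rank:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 7.]])

v1 = np.array([0., 1., 0.])   # (A - 2I)v1 != 0 but (A - 2I)^2 v1 = 0:
                              # a rank-2 generalized eigenvector for lambda = 2
v2 = np.array([0., 0., 1.])   # an ordinary eigenvector for lambda = 7

# Full column rank means v1, v2 are linearly independent (Theorem 6).
print(np.linalg.matrix_rank(np.column_stack([v1, v2])))   # 2
```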
1.2. Nilpotent operators

We end this section by introducing nilpotent operators. The Latin word nil means "nothing" or "zero", and potent means "power"; thus nilpotent literally means "zero when raised to some power".
Definition 3. An operator is called nilpotent if some power of it equals 0.
For a nilpotent operator, we never need to raise its power higher than dim V to turn it into zero, as the theorem below indicates.
Theorem 7. Suppose N ∈ L(V) is nilpotent. Then N^{dim V} = 0.
Proof: since N is nilpotent, N^j = 0 for some positive integer j, so every vector of V belongs to G(0, N); that is, G(0, N) = V. From Theorem 5, G(0, N) = null N^{dim V} = V, indicating that N^{dim V} = 0. □

With respect to a suitable basis, every nilpotent operator has a matrix whose entries on and below the diagonal are all zero, as we now prove.
Theorem 8. Suppose N is a nilpotent operator on V. Then there is a basis of V with respect to which the matrix of N has the form

$$\begin{pmatrix} 0 & & * \\ & \ddots & \\ 0 & & 0 \end{pmatrix},$$

where all entries on and below the diagonal are 0's.
Proof: because of Theorem 7, we can build a desired basis as follows. First, find a basis of null N. By Theorem 1, extend it to a basis of null N^2, and keep extending until we reach a basis of null N^{dim V} = V. Now write M(N) with respect to this basis. The first several columns, which correspond to the basis vectors of null N, consist entirely of zeros. After these "zero" columns, we reach the second set of columns, corresponding to the vectors that extend the basis of null N to a basis of null N^2. If v is one of these vectors, then N^2 v = N(Nv) = 0, so Nv ∈ null N. Thus applying N to v gives a linear combination of the basis vectors of null N, which means the second set of columns can have nonzero entries only in the rows corresponding to basis vectors of null N, and those rows all lie above the diagonal. Continuing this process, we eventually obtain a matrix whose nonzero entries all lie above the diagonal. □

2. Decomposition of an Operator

As we discussed in the previous section, a general operator on a complex vector space might not have enough ordinary eigenvectors to dissect the space into one-dimensional invariant subspaces. But it is promised that complex vector spaces can always be represented as a direct sum of generalized eigenspaces. The following results show that those generalized eigenspaces are invariant under the associated operator as well.
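This invariance can be previewed numerically. In the hypothetical NumPy sketch below (same example matrix as before), we extract a basis of a generalized eigenspace from an SVD and check that A maps its span into itself:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 7.]])
n = A.shape[0]

M = np.linalg.matrix_power(A - 2.0 * np.eye(n), n)   # (A - 2I)^n
rank = np.linalg.matrix_rank(M)
B = np.linalg.svd(M)[2][rank:].conj().T   # columns: basis of null M = G(2, A)

# A @ B stays inside span(B) iff appending it does not increase the rank.
print(np.linalg.matrix_rank(B))                        # 2
print(np.linalg.matrix_rank(np.hstack([B, A @ B])))    # still 2: G(2, A) is invariant
```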
Theorem 9. Suppose V is a complex vector space and T ∈ L(V). Let 𝜆_1, …, 𝜆_m be the distinct eigenvalues of T. Then
(a) V = G(𝜆_1, T) ⊕ ⋯ ⊕ G(𝜆_m, T);
(b) each G(𝜆_j, T) is invariant under T;
(c) each (T − 𝜆_j I)|_{G(𝜆_j, T)} is nilpotent.
Proof: we are not going to prove (a) here, as the proof given in Axler's book was not very clear to the author. One might use the Lemme des Noyaux to prove it; because we have not introduced the concept of a kernel so far, that proof is skipped. To prove (b), note from Theorem 5 that G(𝜆_j, T) = null(T − 𝜆_j I)^{dim V}, and that (T − 𝜆_j I)^{dim V} is a polynomial in the operator T. So to prove (b) we first prove the following lemma:
Lemma 1. Suppose T ∈ L(V) and p ∈ P(F). Then null p(T) and range p(T) are invariant under T.
Proof: suppose v ∈ null p(T), so that p(T)v = 0. Then p(T)(Tv) = T(p(T)v) = 0, indicating that Tv ∈ null p(T) as well. Next, if v ∈ range p(T), there exists u ∈ V such that v = p(T)u. Similarly, Tv = T(p(T)u) = p(T)(Tu), so Tv ∈ range p(T), being the image of Tu ∈ V under p(T). In summary, null p(T) and range p(T) are invariant under T. □

Now, taking p(z) = (z − 𝜆_j)^{dim V} in Lemma 1 shows that null(T − 𝜆_j I)^{dim V} = G(𝜆_j, T) is invariant under T, which proves (b). Finally, (c) must be true because the operator (T − 𝜆_j I)|_{G(𝜆_j, T)} lives on G(𝜆_j, T), and (T − 𝜆_j I)^{dim V} v = 0 for every v ∈ null(T − 𝜆_j I)^{dim V} = G(𝜆_j, T). □

Now that we have Theorem 9(a), we can put together bases of the generalized eigenspaces to form a basis of the whole complex vector space, i.e.,
Theorem 10. Suppose V is a complex vector space and T ∈ L(V). Then there is a basis of V consisting of generalized eigenvectors of T.
2.1. Multiplicity of an eigenvalue

Because of Theorem 9(a), we can also define the multiplicity of an eigenvalue; the sum of the multiplicities of all the eigenvalues of an operator T ∈ L(V) equals dim V.
Definition 4.
- Suppose T ∈ L(V). The multiplicity of an eigenvalue 𝜆 of T is defined to be the dimension of the corresponding generalized eigenspace G(𝜆, T).
- In other words, the multiplicity of an eigenvalue 𝜆 of T equals dim null(T − 𝜆I)^{dim V}.
Theorem 11. Suppose V is a complex vector space and T ∈ L(V). Then the sum of the multiplicities of all the eigenvalues of T equals dim V.
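A hedged numerical check of Theorem 11 (the example matrix is our own choice); it also computes dim null(T − 𝜆I) for comparison, anticipating the terminology introduced next:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 7.]])
n = A.shape[0]

def nullity(M):
    return M.shape[0] - np.linalg.matrix_rank(M)

total = 0
for lam in (2.0, 7.0):                          # the distinct eigenvalues
    geometric = nullity(A - lam * np.eye(n))                             # dim E(lam, A)
    algebraic = nullity(np.linalg.matrix_power(A - lam * np.eye(n), n))  # dim G(lam, A)
    print(lam, geometric, algebraic)            # 2.0 1 2  and  7.0 1 1
    total += algebraic
print(total == n)                               # True: multiplicities sum to dim V
```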
The multiplicity defined above is also called the algebraic multiplicity in some books. The term geometric multiplicity is also used, but it refers to the dimension of the corresponding (ordinary) eigenspace. In other words,

geometric multiplicity of 𝜆 = dim null(T − 𝜆I),
algebraic multiplicity of 𝜆 = dim null(T − 𝜆I)^{dim V}.

2.2. Block diagonal matrix

What do the matrices of operators look like with respect to bases of generalized eigenvectors? To answer this question, we first introduce the concept of a block diagonal matrix:
Definition 5. A block diagonal matrix is a square matrix of the form

$$\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix},$$

where A_1, …, A_m are square matrices lying along the diagonal and all the other entries of the matrix equal 0.
As an example, the matrix below is a block diagonal matrix:

$$A = \begin{pmatrix} 4 & 0 & 0 & 0 & 0 \\ 0 & 2 & -3 & 0 & 0 \\ 0 & 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 1 & 7 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} A_1 & & 0 \\ & A_2 & \\ 0 & & A_3 \end{pmatrix}, \tag{1}$$

where $A_1 = \begin{pmatrix} 4 \end{pmatrix}$, $A_2 = \begin{pmatrix} 2 & -3 \\ 0 & 2 \end{pmatrix}$, and $A_3 = \begin{pmatrix} 1 & 7 \\ 0 & 1 \end{pmatrix}$. The following result shows how a block diagonal matrix can be related to the multiplicities of distinct eigenvalues through upper-triangular blocks.
Theorem 12. Suppose V is a complex vector space and T ∈ L(V). Let 𝜆_1, …, 𝜆_m be the distinct eigenvalues of T, with multiplicities d_1, …, d_m. Then there is a basis of V with respect to which T has a block diagonal matrix of the form

$$\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix},$$

where each A_j is a d_j-by-d_j upper-triangular matrix of the form

$$A_j = \begin{pmatrix} \lambda_j & & * \\ & \ddots & \\ 0 & & \lambda_j \end{pmatrix}.$$
Proof: we first use Theorem 9(a) to dissect the given complex vector space into generalized eigenspaces. For each eigenvalue 𝜆_j, the equality G(𝜆_j, T) = null(T − 𝜆_j I)^{dim V} indicates that (T − 𝜆_j I)|_{G(𝜆_j, T)} is nilpotent. By Theorem 8, it therefore has a matrix of the zero-diagonal upper-triangular form with respect to a suitable basis of G(𝜆_j, T). Furthermore, the matrix of T|_{G(𝜆_j, T)} = (T − 𝜆_j I)|_{G(𝜆_j, T)} + 𝜆_j I|_{G(𝜆_j, T)} has 𝜆_j down its diagonal with respect to the same basis. We can write similar matrices for the other eigenvalues with respect to bases of their generalized eigenspaces. Because putting the bases of the G(𝜆_j, T) together gives a basis of V, the matrix of T with respect to this basis has the blocks M(T|_{G(𝜆_j, T)}) along its diagonal, as desired. □

The matrix in Eqn. (1) has upper-triangular diagonal blocks, which indicates that the eigenvalues of the corresponding operator are 4, 2, 1, with multiplicities 1, 2, 2.

2.3. Square roots

Not every operator on a complex vector space has a square root, but we will see that I + N always has a square root when N is nilpotent. Notice that the following lemma applies to both complex and real vector spaces.
Lemma 2. Suppose N ∈ L(V) is nilpotent. Then I + N has a square root.
Proof: we first consider the Taylor expansion √(1 + x) = 1 + a_1 x + a_2 x^2 + ⋯. We do not care about the exact values of the coefficients a_j, but this suggests that the square root of I + N has the form I + a_1 N + a_2 N^2 + ⋯. Because N^m = 0 for some positive integer m, the series terminates at a_{m−1} N^{m−1}. With this guess, we should have

$$\begin{aligned} I + N &= (I + a_1 N + a_2 N^2 + \cdots + a_{m-1} N^{m-1})^2 \\ &= I + 2a_1 N + (2a_2 + a_1^2) N^2 + (2a_3 + 2a_1 a_2) N^3 + \cdots \\ &\quad + (2a_{m-1} + \text{terms involving } a_1, \ldots, a_{m-2}) N^{m-1}. \end{aligned}$$
By matching the terms on the two sides above, we get a_1 = 1/2, 2a_2 + a_1^2 = 0, and so on. Again, the specific values of the coefficients do not matter; what matters is that we can always find a set of a_j satisfying I + N = (I + a_1 N + a_2 N^2 + ⋯ + a_{m−1} N^{m−1})^2. □

Lemma 2 applies to both real and complex vector spaces. The following lemma about square roots, however, only applies to complex vector spaces.
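Before stating it, here is a sketch of the coefficient matching just described (our own code, with an illustrative nilpotent N); it solves for the a_j and verifies that the resulting operator squares to I + N:

```python
import numpy as np

def sqrt_coeffs(m):
    """Coefficients a_0, ..., a_{m-1} with (sum_j a_j x^j)^2 = 1 + x mod x^m."""
    a = [1.0]                                   # a_0 = 1
    for k in range(1, m):
        target = 1.0 if k == 1 else 0.0         # coefficients of 1 + x
        cross = sum(a[i] * a[k - i] for i in range(1, k))
        a.append((target - cross) / (2.0 * a[0]))
    return a                                    # a_1 = 1/2, a_2 = -1/8, ...

N = np.array([[0., 3., 1.],
              [0., 0., 2.],
              [0., 0., 0.]])                    # nilpotent: N^3 = 0
m = 3
a = sqrt_coeffs(m)
R = sum(a[j] * np.linalg.matrix_power(N, j) for j in range(m))
print(np.allclose(R @ R, np.eye(3) + N))        # True: R is a square root of I + N
```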
Lemma 3. Suppose V is a complex vector space and T ∈ L(V) is invertible. Then T has a square root.
Proof: let 𝜆_1, …, 𝜆_m be the distinct eigenvalues of T. On each generalized eigenspace G(𝜆_j, T), the operator N_j = (T − 𝜆_j I)|_{G(𝜆_j, T)} is nilpotent by Theorem 9(c), and T|_{G(𝜆_j, T)} = 𝜆_j I + N_j = 𝜆_j(I + N_j/𝜆_j). Here 𝜆_j ≠ 0 because T is invertible, and N_j/𝜆_j is clearly nilpotent, so I + N_j/𝜆_j has a square root by Lemma 2. Since every nonzero complex number has a square root, T|_{G(𝜆_j, T)} has a square root R_j = √𝜆_j (I + N_j/𝜆_j)^{1/2}, where √𝜆_j is any complex square root of 𝜆_j. Now every v ∈ V can be written as

v = u_1 + u_2 + ⋯ + u_m, with u_j ∈ G(𝜆_j, T).

If we define R by

Rv = R_1 u_1 + R_2 u_2 + ⋯ + R_m u_m,

then, since each R_j u_j stays in G(𝜆_j, T),

R^2 v = R_1^2 u_1 + R_2^2 u_2 + ⋯ + R_m^2 u_m = Σ_{j=1}^m T|_{G(𝜆_j, T)} u_j = Tv,

as desired. □

By imitating the techniques in this section, you should be able to prove that if V is a complex vector space and T ∈ L(V) is invertible, then T has a kth root for every positive integer k.

3. Characteristic and Minimal Polynomial

We will prove the Cayley-Hamilton theorem in this section, and from it derive important results about the minimal polynomial of an operator. The theorem does not have much application in quantum theory, but it is a big deal in control theory (check this video about reachability and controllability by Prof. Steve Brunton). Mathematically, the theorem provides a way to simplify the calculation of the matrix exponential, as we will see later.

3.1. The Cayley-Hamilton theorem

Before we prove the Cayley-Hamilton theorem, we need to define the concept of the characteristic polynomial.
Definition 6. Suppose V is a complex vector space and T ∈ L(V). Let 𝜆_1, …, 𝜆_m denote the distinct eigenvalues of T, with multiplicities d_1, …, d_m. The polynomial

(z − 𝜆_1)^{d_1} ⋯ (z − 𝜆_m)^{d_m}

is called the characteristic polynomial of T.
With this, we can state the theorem as follows.
Theorem 13. Suppose V is a complex vector space and T ∈ L(V). Let q denote the characteristic polynomial of T. Then q(T) = 0.
Proof: first, dissect V into a direct sum of the generalized eigenspaces associated with the distinct eigenvalues 𝜆_1, …, 𝜆_m, with multiplicities d_1, …, d_m. The characteristic polynomial of T then gives

$$q(T) = (T - \lambda_1 I)^{d_1} \cdots (T - \lambda_m I)^{d_m}. \tag{2}$$

From Theorem 9(c) we know (T − 𝜆_j I)|_{G(𝜆_j, T)} is nilpotent, and a nilpotent operator N on a vector space of dimension d satisfies N^d = 0 by Theorem 7. Since dim G(𝜆_j, T) = d_j, we have (T − 𝜆_j I)^{d_j}|_{G(𝜆_j, T)} = 0. To show that q(T) in Eqn. (2) equals zero, it suffices to show q(T)|_{G(𝜆_j, T)} = 0 for each j. Let v ∈ G(𝜆_j, T); then

$$q(T)v = (T - \lambda_1 I)^{d_1} \cdots (T - \lambda_j I)^{d_j} \cdots (T - \lambda_m I)^{d_m} v = \Big[\prod_{i \neq j} (T - \lambda_i I)^{d_i}\Big] (T - \lambda_j I)^{d_j} v. \tag{3}$$

The second equality in Eqn. (3) holds because the factors all commute, i.e., (T − 𝜆_i I)(T − 𝜆_j I) = (T − 𝜆_j I)(T − 𝜆_i I), so we can move (T − 𝜆_j I)^{d_j} to the far right, where it gives (T − 𝜆_j I)^{d_j} v = 0. That is, q(T)|_{G(𝜆_j, T)} = 0, as desired. □

As an example of applying Theorem 13, consider the calculation of exp(At), with A an operator or a matrix. Expanding exp(At) gives

$$\exp(At) = I + At + \frac{1}{2!}A^2 t^2 + \frac{1}{3!}A^3 t^3 + \cdots. \tag{4}$$

Let n = dim V, so that the characteristic polynomial q(A) = (A − 𝜆_1 I)^{d_1} ⋯ (A − 𝜆_m I)^{d_m} has degree d_1 + ⋯ + d_m = n. Theorem 13 then lets us write

$$A^n = c_0 I + c_1 A + c_2 A^2 + \cdots + c_{n-1} A^{n-1} \tag{5}$$

for appropriate scalars c_0, …, c_{n−1}; recursively, every higher power of A is also a linear combination of I, A, …, A^{n−1}. Substituting Eqn. (5) into Eqn. (4) turns the infinite series into a finite one,

$$\exp(At) = p_0(t) I + p_1(t) A + p_2(t) A^2 + \cdots + p_{n-1}(t) A^{n-1}, \tag{6}$$

for appropriate functions p_0, …, p_{n−1}. This conversion may allow a simpler calculation.
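The reduction behind Eqns. (5) and (6) is easy to demonstrate numerically. In this hypothetical NumPy sketch (our own example matrix), np.poly supplies the characteristic polynomial's coefficients, and polynomial division reduces a high power of A to degree below n:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 7.]])
n = A.shape[0]

q = np.poly(A)   # characteristic polynomial coefficients, highest degree first

def polyvalm(coeffs, A):
    """Evaluate a polynomial (highest-degree coefficient first) at a matrix."""
    out = np.zeros_like(A)
    for c in coeffs:                 # Horner's rule with matrix arguments
        out = out @ A + c * np.eye(len(A))
    return out

print(np.allclose(polyvalm(q, A), 0.0))   # True: q(A) = 0, i.e. Theorem 13

# Reduce A^9 to a polynomial in A of degree < n: the remainder of x^9
# divided by q(x) evaluates to the same matrix as A^9 itself.
x9 = np.zeros(10); x9[0] = 1.0            # coefficients of x^9
r = np.polydiv(x9, q)[1]                  # remainder, degree < n
print(np.allclose(polyvalm(r, A), np.linalg.matrix_power(A, 9)))   # True
```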
3.2. Minimal Polynomial

The definition of the minimal polynomial of an operator depends on the concept of a monic polynomial, given below.
Definition 7. A monic polynomial is a polynomial whose highest-degree coefficient equals 1.
As an example, z^5 + 7z^3 + z + 1 is a monic polynomial of degree 5. We now give the definition of the minimal polynomial and then prove that such a monic polynomial exists and is unique for a given operator.
Definition 8. Suppose T ∈ L(V). Then the minimal polynomial of T is the unique monic polynomial p of smallest degree such that p(T) = 0.
As promised, we now prove:
Lemma 4. Suppose T ∈ L(V). Then there is a unique monic polynomial p of smallest degree such that p(T) = 0.
Proof: we first prove that such a monic polynomial exists. Let n = dim V; then dim L(V) = n^2, so the list

I, T, T^2, …, T^{n^2}

is not linearly independent, since it has length n^2 + 1. Let m be the smallest integer such that I, T, …, T^m is linearly dependent. By the linear dependence lemma, T^m can be expressed as a linear combination of I, T, …, T^{m−1}, i.e.,

a_0 I + a_1 T + ⋯ + a_{m−1} T^{m−1} + T^m = 0.

Let p(z) = a_0 + a_1 z + ⋯ + a_{m−1} z^{m−1} + z^m; then p(T) = 0. To prove uniqueness, suppose q is another monic polynomial of smallest degree with q(T) = 0. Its degree must also be m, since a polynomial of degree lower than m cannot vanish at T. Then (p − q)(T) = 0 with deg(p − q) < m; because I, T, …, T^{m−1} is linearly independent, all coefficients of p − q vanish, so p = q. □

The following results tell us more about the inner structure of the minimal polynomial, but we will omit their proofs here.
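The existence proof is constructive enough to run. The sketch below (our own code; the matrix is chosen so the minimal and characteristic polynomials differ) finds the smallest m with I, A, …, A^m linearly dependent and reads off the monic coefficients:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 0., 2.]])   # characteristic poly (x-2)^3, minimal poly (x-2)^2
n = A.shape[0]

# vec(A^k) for k = 0, 1, ..., n^2: columns of a growing matrix.
powers = [np.linalg.matrix_power(A, k).ravel() for k in range(n * n + 1)]

for m in range(1, n * n + 1):
    prev = np.column_stack(powers[:m])          # vec(I), ..., vec(A^{m-1})
    curr = np.column_stack(powers[:m + 1])
    if np.linalg.matrix_rank(curr) == np.linalg.matrix_rank(prev):
        # A^m is a linear combination of lower powers: solve for it.
        a, *_ = np.linalg.lstsq(prev, powers[m], rcond=None)
        break

p = np.concatenate([[1.0], -a[::-1]])   # monic coefficients, highest degree first
print(p)              # [ 1. -4.  4.]  i.e.  p(x) = x^2 - 4x + 4 = (x - 2)^2
print(np.roots(p))    # [2. 2.]: its zeros are eigenvalues of A, as Lemma 7 states
```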
Lemma 5. Suppose T ∈ L(V) and q ∈ P(F). Then q(T) = 0 if and only if q is a polynomial multiple of the minimal polynomial p of T; in other words, there exists s ∈ P(F) such that q = ps.
Lemma 6. Suppose F = C and T ∈ L(V). Then the characteristic polynomial of T is a polynomial multiple of the minimal polynomial of T.
From Definition 6, the zeros of the characteristic polynomial of an operator are precisely its eigenvalues. The minimal polynomial turns out to have the same zeros, though possibly with different multiplicities.
Lemma 7. Let T ∈ L(V). Then the zeros of the minimal polynomial of T are precisely the eigenvalues of T.
4. Jordan Form

In Section 2.2 we showed that there is a basis of a complex vector space V with respect to which the matrix of an operator T is a nice block upper-triangular matrix. Fortunately, we can do even better by writing M(T) with respect to a Jordan basis.
Definition 9. Suppose T ∈ L(V). A basis of V is called a Jordan basis for T if, with respect to this basis, T has a block diagonal matrix

$$\begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_p \end{pmatrix},$$

where each A_j is an upper-triangular matrix of the form

$$A_j = \begin{pmatrix} \lambda_j & 1 & & 0 \\ & \lambda_j & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_j \end{pmatrix}.$$
The following result indicates that a Jordan basis always exists for operators on complex vector spaces.
Theorem 14. Suppose V is a complex vector space. If T ∈ L(V), then there is a basis of V that is a Jordan basis for T.
We will skip the proof here. Interested readers might find the lemma below useful for proving Theorem 14.
Lemma 8. Suppose N ∈ L(V) is nilpotent. Then there exist vectors v_1, …, v_n ∈ V and nonnegative integers m_1, …, m_n such that
(a) N^{m_1} v_1, …, N v_1, v_1, …, N^{m_n} v_n, …, N v_n, v_n is a basis of V;
(b) N^{m_1 + 1} v_1 = ⋯ = N^{m_n + 1} v_n = 0.
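For experimentation, SymPy can produce a Jordan basis directly. This is a hedged illustration with an outside tool (not part of the original text): jordan_form returns P and J with A = P J P^{-1}, and the columns of P form a Jordan basis in the sense of Definition 9.

```python
from sympy import Matrix

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 7]])

P, J = A.jordan_form()        # columns of P: a Jordan basis for A
print(J)                      # Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 7]])
print(A == P * J * P.inv())   # True
```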