Theorem1Suppose V is finite-dimensional and T∈L(V). Then the following are equivalent: a. T is invertible, b. T is injective, and c. T is surjective.
Proof: It is obvious from a to b and c as invertibility of a map gives injectivity and surjectivity. Suppose T∈L(V) is injective, then null T={0} and dim nullT=0, resulting in dim V=dim rangeT because of the fundamental theorem of linear maps. So b allows c, which in turn results in a. Now if T is surjective. We have rangeT=V, indicating dim range T=dim V again.□With Theorem 1, we also know that for T∈L(V), T not being invertible resulting in T not being injective and surjective, and vice versa. Now that we refreshed the knowledge of operator, we shall focus on the invariant subspace in the following sections from which the concept of eigenvalue and eigenvector are derived.2. From Invariant Subspace to EigenspaceWe first give the definition of invariant subspace, and we will see the subspace spanned by single eigenvector is a special case of invariant subspace.
Definition1Let U⊂V be a subspace of V and T∈L(V), then we call U an invariant subspace when ∀u∈U, Tu∈U.
If we let v∈V, then we can obtain an one-dimensional subspace as U=span(v)={w:w=𝜆v,𝜆∈F}. Suppose U is invariant under T∈L(V), then Tw=w′, which is only possible if w′=𝜆w with 𝜆∈F. As a result, we have Tw=𝜆w for w∈U, and we call 𝜆 and w here eigenvalue and eigenvector, respectively. In the following, we give the formal definitions of these two concepts.
Definition2Suppose T∈L(V), and 𝜆∈F is called an eigenvalue of T if there exists v∈V such that v≠0 and Tv=𝜆v.
Definition3Suppose T∈L(V) and 𝜆∈F, v∈V is called an eigenvector of T if v≠0 and Tv=𝜆v.
There are two things worth noting here: a) there is no constriction for 𝜆=0. because we demand v≠0 in Definition 2, T∈L(V) does NOT always has 0 to be one of its eigenvalues. Especially, we do not have 𝜆=0 if T is injective (e.g. nullT={0}).b) zero vector can NOT be eigenvector.It is not unusual to have linear independent vectors corresponding the same eigenvalue. Suppose we have v1,...vm being eigenvectors of T and they all correspond to eigenvalue 𝜆, then we call the subspace U=span(v1,...,vm) the eigenspace of T corrsponding to 𝜆, denoted as E(𝜆,T). This is an empirical definition of eigenspace. To make if formal, i.e., written in terminology of linear algebra, let us first introduce a special class of operator, T-𝜆I∈L(V). I here is the identity operator on V. We think about it here because for an eigenvector of T, we have(T-𝜆I)v=Tv-𝜆v=0Thus, for T∈L(V) having eigenvalue and eigenvector is equivalent to null(T-𝜆I)≠{0} for some 𝜆∈F. Furthermore, from Theorem 1 we have the following
Theorem2For finite-dimensional vector space V, T∈L(V), and 𝜆∈F, we have the following statements equivalent: a) 𝜆 is eigenvalue of T, b) T-𝜆I is not invertible, c) T-𝜆I is not injective, and d) T-𝜆I is not surjective.
With the introduction above, we are ready to give the definition of eigenspace as
Definition4Suppose T∈L(V) and 𝜆∈F. The eigenspace of T corresponding 𝜆 is defined byE(𝜆,T)= null(T-𝜆I)
Now that we have eigen-value/vector/space defined, we need to answer the question of their existence as indicated in the title of the following subsection.2.1. But, does T have eigenvalues?The following theorem guarantees the existence of eigenvalues in complex vector space.
Theorem3Every operator T∈L(Cn) with n<∞ has its eigenvalue and corresponding eigenvectors.
Proof: Suppose V is a complex vector space, and dimV=n. Let v∈V, then we know thatv, Tv, T2v,...Tnvmust not be linear independent as the list length, n+1>dimV. Therefore,0=a0v+a1Tv+a2T2v+⋯+anTnvhas at least one of the complex coefficients ai being nonzero. According to the fundamental theorem of algebra, we havea
0
=a0v+a1Tv+a2T2v+⋯+anTnv
=(a0I+a1T+a2T2+⋯+anTn)v
=c(T-𝜆1I)(T-𝜆1I)⋯(T-𝜆mI)v
here m≤n as some complex coefficient migth be zero. Thus, T-𝜆jI is not injective for at least one of 𝜆j, which equivalent to one of 𝜆j is eigenvalue of T.□The proof above uses the fundamental theorem of algebra to show the existence of eigenvalue, indicating that polynomial of operator, p(T), could be very useful. So we introduce properties of p(T) in the following section.2.2. Polynomial of operatorThe polynomial in the proof above is a polynomial of operators. You will find such a representation pervasive in the literature of quantum computing/algorithm. So it's worth our time to know their interesting properties.Let us first discuss p(T) with a degree of two. For T∈L(V), we call the operator idempotent if T2=T. Idempotent operators on Hilbert space is an vibrant topic in the community of quantum computing, but we will save the excitement until we hit the topic. From the perspective of linear algebra, arguably the most interesting property of idempotent operation is
Theorem4Suppose P∈L(V) and P2=P, we have V=null P⊕ range P.
Proof: To see these, we first notice that for v∈V, we can rewrite it asv =(I-P)v+PvClearly Pv∈Range P, and P(I-P)v=0, so (I-P)v∈null P.Thus, we have V=null P + range P. To prove it is a direct sum. Suppose u∈null P∩range P, then there is v∈V such that Pv=u and Pu=0. Notice that u=Pv=P2v=Pu=0, we have null P∩range P={0}, which results in V=null P⊕ range P.□The proof for Theorem 3 also indicates that the polynomial of operator can help to find eigenvalues of the operator. Indeed,
Theorem5for an arbitrary polynomial of operator, p(T), if there exists a nonzero vector v∈V such that p(T)v=0, then every zero of p is an eigenvalue of T.
Proof: Let 𝜆 to be one of the zeros for p. Then we can write p(T)=(T-𝜆I)q(T) with q(T)≠0. Suppose 𝜆 is not an eigenvalue of T, then T-𝜆I is injective according to Theorem 2. But p(T)v=(T-𝜆I)q(T)v=0, indicating that q(T)v∈null(T-𝜆I), creating a contradiction. So T-𝜆I must be injective, and 𝜆 must be an eigenvalue□In quantum computing, we sometimes need to consider polynomials of composite operators, e.g, p(ABC). Suppose S is an invertible operator on V, then we can show that
p(STS-1)=Sp(T)S-1 for T,S∈L(V).
3. Eigen- in Matrix representationFrom the previous discussion of the presentation of linear map, if dimV=n, and T∈L(V), then matrix of the linear map M(T) is a n-by-n matrices with respect to a basis of V. In the introductory class of linear algebra, it is emphasized that eigenvalues of a linear map are the diagonal values of its corresponding upper-triangular matrix. Using the language of basis and dimensions, a vector space has various distince bases, and an operator can only be represented by an upper-triangular matrix, if there is one, when we choose the correct basis. Since every operator on complex vector space has its eigenthe values, it should not be a surprise that every such operator can be represented by an upper-triangular matrix.
Theorem6Suppose V is a finite-dimensional complex vector space and T∈L(V). Then T has an upper-triangular matrix with respect to some basis of V.
It is noteworthy that Theorem 6 does not imply that diagonal values of the upper-triangular matrix are always nonzero. It surely can be zero as 0 is a valid value for eigenvalue. On the other hand, if we have zero diagonal values, then the upper-triangular matrix is no longer invertible.Based on our discussion of presentation of linear map, we can deduce that for a linear map, if it has a upper-triangular matrix, its application to basis vector vj of vector space results in linear combination of v1,v2,...vj-1,vj. The means that Tvj∈span(v1,v2,...vj). Because vj∈span(v1,v2,...,vj), we should have that span(v1,v2,...,vj) is invariant under T. With this, we can write out the conditions for an operator to have upper-triangular matrix.
Theorem7Suppose T∈L(V) and v1,...,vn is a basis of V. Then the following conditions are equivalent: a) the matrix of T w.r.t v1,...,vn is upper triangular b) Tvj∈∈span(v1,v2,...vj) for each j=1,...,n, and c) span(v1,v2,...vj) is invariant under T for each j=1,...,n.
While the author admits that Theorem 7 is obvious logically based on our pevious dicussion, it might not be useful for solving real-world problems. In quantum-mechanical calculations, especially, people care more about finding out whether an operator on Hilbert space can be diagonalized. That is, can we find some basis of subspaces of Hilbert space, so the operators have matrices that contain nonzero entries only at their diagonals? To know if an operator is diagonalizable, we check it with the following conditions.
Theorem8Suppose V is finite-dimensional and T∈L(V). Let its eigenvalues 𝜆1,...,𝜆m be distinct. Then T is diagonalizable is equivalent to the following statements: a) V has a basis consisting only of eigenvectors of T b) suppose dim V=n, then there exist n one-dimensional subspaces Ui of V, each invariant under T, such taht V=U1⊕U2⊕⋯⊕Un c) V=E(𝜆1,T)⊕⋯⊕E(𝜆m,T) d) dim V = dim E(𝜆1,T)+dim E(𝜆2,T)+⋯+dim E(𝜆m,T)
Theorem 8 indicates that diagonalizability of operator is related relationships between vector space and its eigenspaces. As a matter of fact, an operator is diagonalizable if it has correct number of eigenvalues.
Theorem9If T∈L(V) has dim V distinct eigenvalues, then T is diagonalizable.
3.1. Why are diagonal matrices important?Now that we have discussed the relationship between diagonal matrices and eigenvalues of operators, we can try to summarize the importance of diagonal matrices in quantum-mechanical calculations. As the starter, we first notice that people use eigenvectors to present states of quantum systems after measurements. Measurements here are physical realization of linear operators we discussed so far. Similarly, specific physical properties of a quantum system(e.g. energies) take the values of eigenvalues of such operators. Thus, for a diagonalizable operator, upon reading out the result(eigenvalue) of a measurement, we know the quantum system of interest must be in one of the states represented by eigenvectors. In some literature, states corresponding to eigenvectors of a diagonal matrix are referred as "pure state". Imagine that we have an operator that only has its upper-triangular matrix, upon the measurement the resulted state is a linear combination of multiple basis vectors, and thus has no "well-defined" physical properties. In quantum computing, we demand logic units of computation (i.e. qubits) have well-defined physical properties. Most of the time, this means the system treated as a qubit should have two energy levels, and/or distinct spin states. The reason is obvious: we do not want fuzzy results coming out of our computation!As an example, suppose we have a qubit made of a two-state system(e.g. superconducting qubit), and we measure the system state after apply to it the Z-gate. Z-gate is an operator that has the following diagonal matrixZ=a
1
0
0
-1
The matrix above is with respect to two basis vectors, v1 and v2, that span the vector space containing all possible quantum states of the system. The two states corresponding to the two basis vectors will be used in calculation (think of these two states as binary numbers 0 and 1). From the matrix, it's transparent that Z-gate has two eigenvalues, 1 and -1. Let Zv1=v1 and Zv2=-v2, then we know that Z-gate has no effect on state of v1 while turn v2 into -v2(or add a global phase in the language of wavefunction). Thus, being diagonalizable here means that the effects of Z-gate is totally known to us, and we can safely use the gate in our quantum circuit to do complicated tasks.