This is the last note on preliminary linear algebra. Readers who have been following along may notice that much of the "normal" material about matrices has not been stressed, or even mentioned, so far in this series. To finish strong, we will focus on the determinant and trace of matrices in this last note. This first-operator-then-matrix approach is what Axler's book is best known for. To the author, such a method helps cultivate your intuition about linear maps and operators. If you can prove things without referring to matrices, you are doing the "hard part" of the work to understand linear algebra. Once you survive the "hard" work, manipulation of matrices might be more transparent. As in previous notes, we still assume the vector space V is nonzero and finite-dimensional over the field F=C (with remarks on F=R handled via complexification where needed).

Contents:
1. Trace
1.1. Change of Basis
1.2. Trace: A connection between matrices and operators
2. Determinant
2.1. Determinant of an operator
2.2. Determinant of a matrix
2.3. Volume
2.3.1. Polar coordinates
2.3.2. Spherical coordinates
3. Where to go from here?

1. Trace

As a starter, we need to revisit the way a matrix is defined. For any linear map T∈L(V,W), we need one basis of V and another basis of W to define the matrix that represents T. Let v1,...,vn and w1,...,wm be bases of V and W, respectively. Then the matrix of T with respect to the two bases is M(T,(v1,...,vn),(w1,...,wm)), which has shape m-by-n. We will make the bases explicit in the notation for matrices, as it helps us prove some results in the subsection below.

1.1. Change of Basis

Suppose v1,...,vn is a basis of V. The identity operator I∈L(V) satisfies Ivi=vi, so M(I,(v1,...,vn),(v1,...,vn)) is the n-by-n matrix with all diagonal entries equal to unity. Now let u1,...,un be another basis of V. Then M(I,(v1,...,vn),(u1,...,un)) is the matrix that converts the coordinates of every vector v∈V from the basis v1,...,vn to the basis u1,...,un. Change of basis has extensive use in physics, as it is often a way to simplify a problem. Here we are interested in using M(I,(v1,...,vn),(u1,...,un)) to relate the trace of operators to the trace of the corresponding matrices. For an operator T∈L(V), we have discussed its matrix M(T,(v1,...,vn)), i.e., we use only one basis to represent the matrix. If we use two different bases to represent matrices of operators, we can rewrite the rule of matrix multiplication as the following
Lemma 1. Suppose u1,…,un and v1,…,vn and w1,…,wn are all bases of V. Suppose S,T∈L(V). Then
M(ST,(u1,…,un),(w1,…,wn)) = M(S,(v1,…,vn),(w1,…,wn)) M(T,(u1,…,un),(v1,…,vn)).   (1)
Read from right to left, the right-hand side of eqn. (1) applies T while passing from the basis {ui} to {vi}, and then applies S while passing from {vi} to {wi}; the composite is exactly ST expressed from {ui} to {wi}. With Lemma 1, we can also prove that M(I,(v1,...,vn),(u1,...,un)) is invertible.
Lemma 2. Suppose u1,…,un and v1,…,vn are bases of V. Then the matrices M(I,(u1,…,un),(v1,…,vn)) and M(I,(v1,…,vn),(u1,…,un)) are invertible, and each is the inverse of the other.
Proof: We replace wi with ui and S,T with I in eqn. (1) to get
I = M(I,(u1,…,un)) = M(I,(v1,…,vn),(u1,…,un)) M(I,(u1,…,un),(v1,…,vn)) = AB.   (2)
Interchanging vi with ui in eqn. (2) gives
I = M(I,(v1,…,vn)) = M(I,(u1,…,un),(v1,…,vn)) M(I,(v1,…,vn),(u1,…,un)) = BA,   (3)
i.e., I = AB = BA, so M(I,(u1,…,un),(v1,…,vn)) is the inverse of M(I,(v1,…,vn),(u1,…,un)).□
Let A = M(I,(v1,…,vn),(u1,…,un)); the proof above shows that A⁻¹ = M(I,(u1,…,un),(v1,…,vn)). This inverse relationship is easy to verify numerically, as in the sketch below.
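As a quick illustration (my own numpy sketch, not from the notes), take the columns of two random matrices as the bases u1,…,un and v1,…,vn of R³, build both change-of-basis matrices by solving linear systems, and check that they are inverse to each other:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((3, 3))  # columns: the basis u1, u2, u3 (generically independent)
V = rng.standard_normal((3, 3))  # columns: the basis v1, v2, v3

# Column j of M(I,(v1,v2,v3),(u1,u2,u3)) holds the coordinates of v_j in the u-basis,
# i.e., it solves U @ col = v_j.
A = np.linalg.solve(U, V)  # M(I,(v),(u))
B = np.linalg.solve(V, U)  # M(I,(u),(v))

print(np.round(A @ B, 12))  # identity matrix: each is the inverse of the other
```

The following formula shows how the matrix of an operator changes when we change bases of the vector space.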
Lemma 3. Suppose T∈L(V). Let u1,…,un and v1,…,vn be bases of V. Let A = M(I,(u1,…,un),(v1,…,vn)). Then
M(T,(u1,…,un)) = A⁻¹ M(T,(v1,…,vn)) A.   (4)
Proof: Based on Lemma 1, if we replace wj with uj and replace S with I, we get
M(T,(u1,…,un)) = A⁻¹ M(T,(u1,…,un),(v1,…,vn)).   (5)
Use Lemma 1 again, but replace wj with vj this time; also replace T with I and S with T, getting
M(T,(u1,…,un),(v1,…,vn)) = M(T,(v1,…,vn)) A.   (6)
Substituting (6) into (5) gives the desired result.□

1.2. Trace: A connection between matrices and operators

We first give the definitions of the trace of an operator and the trace of a matrix separately. While the two definitions look different, we will prove that they are essentially the same thing. Along the way, we also show that the trace of a matrix is invariant under a change of basis.
Definition 1. Suppose T∈L(V).
- If F=C, then the trace of T is the sum of the eigenvalues of T, with each eigenvalue repeated according to its multiplicity.
- If F=R, then the trace of T is the sum of the eigenvalues of TC, with each eigenvalue repeated according to its multiplicity.
The trace of T is denoted by trace T.
The TC above is the complexification of the operator T acting on a real vector space. For the sake of completeness, we define the complexification of a real vector space and of an operator as follows:
Definition 2. Suppose V is a real vector space.
- The complexification of V, denoted VC, equals V×V. An element of VC is an ordered pair (u,v), where u,v∈V, but we will write this as u+iv.
- Addition on VC is defined by (u1+iv1)+(u2+iv2) = (u1+u2)+i(v1+v2) for u1,v1,u2,v2∈V.
- Complex scalar multiplication on VC is defined by (a+bi)(u+iv) = (au-bv)+i(av+bu) for a,b∈R and u,v∈V.
Definition 3. Suppose V is a real vector space and T∈L(V). The complexification of T, denoted TC, is the operator TC∈L(VC) defined by TC(u+iv) = Tu+iTv for u,v∈V.
The matrix of TC (with respect to a basis of V, viewed as a basis of VC) equals the matrix of T. When 𝜆 is a real eigenvalue of T, it is also an eigenvalue of TC. Moreover, suppose V is a real vector space, T∈L(V), and 𝜆∈C; then 𝜆 is an eigenvalue of TC if and only if its complex conjugate 𝜆̄ is an eigenvalue of TC. Recall that the characteristic polynomial of an operator is (z-𝜆1)^{d1}(z-𝜆2)^{d2}⋯(z-𝜆m)^{dm}, where the 𝜆i are the distinct eigenvalues and the di their multiplicities (the dimensions of the generalized eigenspaces). Expanding this polynomial gives
z^n - (d1𝜆1 + ⋯ + dm𝜆m) z^{n-1} + ⋯ + (-1)^n 𝜆1^{d1}⋯𝜆m^{dm},   (7)
which results in the following lemma:
Lemma 4. Suppose T∈L(V). Let n = dim V. Then trace T equals the negative of the coefficient of z^{n-1} in the characteristic polynomial of T.
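Lemma 4 is easy to see numerically with numpy (a sketch of mine, not part of the original notes): np.poly returns the coefficients of the monic characteristic polynomial of a matrix, and its z^{n-1} coefficient is the negative of the eigenvalue sum.

```python
import numpy as np

T = np.random.default_rng(1).standard_normal((5, 5))
eigs = np.linalg.eigvals(T)   # eigenvalues, repeated according to multiplicity
coeffs = np.poly(T)           # coefficients of the monic characteristic polynomial

print(eigs.sum())             # trace T per Definition 1 (imaginary parts cancel)
print(-coeffs[1])             # negative of the z^{n-1} coefficient (Lemma 4)
```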
For the trace of a matrix, we define it as follows.
Definition 4. The trace of a square matrix A, denoted trace A, is defined to be the sum of the diagonal entries of A.
Lemma 5. If A and B are square matrices of the same size, then trace(AB) = trace(BA).
Proof: Let A and B be two n-by-n matrices. The jth diagonal entry of AB is ∑_{k=1}^{n} A_{j,k} B_{k,j}, so the trace of the product is
trace(AB) = ∑_{j=1}^{n} ∑_{k=1}^{n} A_{j,k} B_{k,j} = ∑_{k=1}^{n} ∑_{j=1}^{n} B_{k,j} A_{j,k} = trace(BA).□
To prove that the trace of an operator equals the trace of its matrix, we first need to show that the trace of a matrix does not depend on the choice of basis, i.e.,
Lemma 6. Let T∈L(V). Suppose u1,…,un and v1,…,vn are bases of V. Then
trace M(T,(u1,…,un)) = trace M(T,(v1,…,vn)).
Proof: From Lemma 3, M(T,(u1,…,un)) = A⁻¹ M(T,(v1,…,vn)) A with A = M(I,(u1,…,un),(v1,…,vn)). Applying trace(AB) = trace(BA) from Lemma 5,
trace(A⁻¹ M(T,(v1,…,vn)) A) = trace(M(T,(v1,…,vn)) A A⁻¹) = trace M(T,(v1,…,vn)),
which gives trace M(T,(u1,…,un)) = trace M(T,(v1,…,vn)) as desired.□
Lemmas 5 and 6 are easy to check numerically, as in the sketch below. With them in hand, we are ready to prove
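Here is a minimal numpy sketch (mine, not from the notes) that conjugates a matrix by a random invertible change-of-basis matrix and compares traces:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M_v = rng.standard_normal((n, n))  # matrix of T in the basis v1,...,vn
A = rng.standard_normal((n, n))    # a change-of-basis matrix; generically invertible

# Lemma 5: trace(AB) = trace(BA), even though AB != BA in general
B = rng.standard_normal((n, n))
print(np.trace(A @ B), np.trace(B @ A))

# Lemma 6: the trace survives the change of basis of Lemma 3
M_u = np.linalg.inv(A) @ M_v @ A   # matrix of the same operator in the basis u1,...,un
print(np.trace(M_u), np.trace(M_v))
```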
Theorem 1. Suppose T∈L(V). Then trace T = trace M(T).
Proof: Because the trace of a matrix is invariant under the choice of basis (Lemma 6), we only need to show that trace T equals trace M(T) for some basis. Recall that an operator on a complex vector space has a block diagonal matrix with respect to a basis of generalized eigenvectors, and this matrix carries the eigenvalues of T on its diagonal, each repeated according to its multiplicity; hence trace T = trace M(T). If V is a real vector space, then applying the complex case to the complexification TC gives the desired result.□
Based on Theorem 1 we can show that trace is additive:
Lemma 7. Suppose S,T∈L(V). Then trace(S+T) = trace S + trace T.
Proof:
trace(S+T)
=traceM(S+T)
=trace(M(S)+M(T))
=traceM(S)+traceM(T)
=traceS+traceT
where the first and last equalities come from Theorem 1, and the third comes from Definition 4. □ The following result has found extensive use in quantum theory, especially in its infinite-dimensional generalization.
Lemma 8. There do not exist operators S,T∈L(V) such that ST-TS=I.
Proof: Suppose such S and T existed. Taking traces and using Lemmas 5 and 7, trace(ST-TS) = trace(ST) - trace(TS) = 0, while trace I = dim V ≥ 1, a contradiction.□
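The trace argument behind Lemma 8 is easy to watch in action (a numpy sketch of mine): the commutator of any two matrices has trace zero, while the identity on R⁴ has trace 4.

```python
import numpy as np

rng = np.random.default_rng(3)
S = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))

comm = S @ T - T @ S
print(np.trace(comm))        # ~0 up to rounding, for any S and T
print(np.trace(np.eye(4)))   # 4.0, so ST - TS = I is impossible
```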
2. Determinant

2.1. Determinant of an operator

The definition of the determinant below mimics the definition of the trace; the only difference is that the sum of eigenvalues is replaced with the product.
Definition 5. Suppose T∈L(V).
- If F=C, then the determinant of T is the product of the eigenvalues of T, with each eigenvalue repeated according to its multiplicity.
- If F=R, then the determinant of T is the product of the eigenvalues of TC, with each eigenvalue repeated according to its multiplicity.
The determinant of T is denoted by det T.
If 𝜆1,…,𝜆m are the distinct eigenvalues of T (or of TC if V is a real vector space) with multiplicities d1,…,dm, then the definition above implies
det T = 𝜆1^{d1}⋯𝜆m^{dm}.
From the expansion (7) of the characteristic polynomial, we have
Lemma 9. Suppose T∈L(V). Then the characteristic polynomial of T can be written as
z^n - (trace T) z^{n-1} + ⋯ + (-1)^n (det T).
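The coefficients in Lemma 9 can be checked with numpy (my sketch, not from the notes): the second coefficient returned by np.poly is -trace T and the constant term is (-1)^n det T.

```python
import numpy as np

T = np.random.default_rng(4).standard_normal((4, 4))
n = T.shape[0]
coeffs = np.poly(T)   # [1, -trace T, ..., (-1)^n det T]

print(coeffs[1], -np.trace(T))
print(coeffs[-1], (-1) ** n * np.linalg.det(T))
```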
We now show the following lemma can be proved easily using Definition 5.
Lemma 10. An operator on V is invertible if and only if its determinant is nonzero.
Proof: Suppose V is a complex vector space. An operator T∈L(V) is invertible if and only if 0 is not an eigenvalue of T, which by Definition 5 happens if and only if det T ≠ 0. If V is a real vector space, T is again invertible if and only if 0 is not an eigenvalue of T, which happens if and only if 0 is not an eigenvalue of TC, i.e., if and only if det T ≠ 0.□
The definition of the determinant also gives a new way to express the characteristic polynomial of an operator, as shown below.
Lemma 11. Suppose T∈L(V). Then the characteristic polynomial of T equals det(zI-T).
Proof: First we notice that
-(T-𝜆I) = (zI-T) - (z-𝜆)I.
So 𝜆 is an eigenvalue of T exactly when z-𝜆 is an eigenvalue of zI-T. The equality also implies that
dim null (T-𝜆I)^{dim V} = dim null ((zI-T)-(z-𝜆)I)^{dim V},
i.e., the multiplicity of 𝜆 as an eigenvalue of T equals the multiplicity of z-𝜆 as an eigenvalue of zI-T. Let 𝜆1,...,𝜆m be the eigenvalues of T with corresponding multiplicities d1,...,dm. Then det(zI-T) = (z-𝜆1)^{d1}⋯(z-𝜆m)^{dm}, where the right-hand side is exactly the characteristic polynomial of T. If V is a real vector space, applying the complex case to TC gives the same result.□
Finally, as a special kind of operator, we have the following result regarding the determinant of isometries.
Lemma 12. Suppose V is an inner product space and S∈L(V) is an isometry. Then |det S| = 1.
This result is immediate on complex vector spaces, because every eigenvalue of an isometry has absolute value one. On real vector spaces, the definition of the determinant gives det S = det SC, and thus |det S| = 1, completing the proof.

2.2. Determinant of a matrix

To define the determinant of an arbitrary square matrix, we first introduce the concept of a permutation and its sign.
Definition 6.
- A permutation of (1,…,n) is a list (m1,…,mn) that contains each of the numbers 1,…,n exactly once.
- The set of all permutations of (1,…,n) is denoted perm n.
Definition 7.
- The sign of a permutation (m1,…,mn) is defined to be 1 if the number of pairs of integers (j,k) with 1≤j<k≤n such that j appears after k in the list (m1,…,mn) is even, and -1 if the number of such pairs is odd.
- In other words, the sign of a permutation equals 1 if the natural order has been changed an even number of times and equals -1 if the natural order has been changed an odd number of times.
Lemma 13. Interchanging two entries in a permutation multiplies the sign of the permutation by -1.
The determinant of a matrix is now defined as follows.
Definition 8. Suppose A is an n-by-n matrix

A = ( A_{1,1} … A_{1,n}
         ⋮           ⋮
      A_{n,1} … A_{n,n} ).

The determinant of A, denoted det A, is defined by
det A = ∑_{(m1,…,mn)∈perm n} (sign(m1,…,mn)) A_{m1,1}⋯A_{mn,n}.   (8)
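Definition 8 can be implemented directly, which makes a good (if wildly inefficient) sanity check against a library determinant. The sketch below is mine; the helper names are hypothetical.

```python
import itertools
import math
import numpy as np

def sign(perm):
    """Sign of a permutation, via the inversion count of Definition 7."""
    inversions = sum(
        1
        for j in range(len(perm))
        for k in range(j + 1, len(perm))
        if perm[j] > perm[k]
    )
    return -1 if inversions % 2 else 1

def det_by_permutations(A):
    """Determinant straight from Definition 8: a signed sum over all n! permutations."""
    n = A.shape[0]
    return sum(
        sign(m) * math.prod(A[m[j], j] for j in range(n))
        for m in itertools.permutations(range(n))
    )

A = np.random.default_rng(5).standard_normal((5, 5))
print(det_by_permutations(A), np.linalg.det(A))  # agree up to rounding
```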
Notice that each product A_{m1,1}⋯A_{mn,n} in the definition has its second subscripts ranging from 1 to n, i.e., each term is a product of n entries, one from each column. Also, perm n contains n! permutations, which is why the brute-force sum above is impractical beyond small n. Based on Definition 7 we can prove that
Lemma 14. Suppose A is a square matrix and B is the matrix obtained from A by interchanging two columns. Then det A = -det B.
Proof: In eqn. (8), the set of products A_{m1,1}⋯A_{mn,n} remains the same after interchanging two columns, but two numbers interchange their places in each list (m1,...,mn). Thus each term in the sum (8) is multiplied by -1 by Lemma 13, giving det A = -det B.□
If the two interchanged columns are identical, then we get the following.
Lemma 15. If A is a square matrix that has two equal columns, then det A = 0.
Proof: From the proof of Lemma 14, we have det A = -det A, which forces det A = 0.□
The following lemma shows what happens if we permute multiple columns of a matrix. Here A_{⋅,i} denotes the ith column of A.
Lemma 16. Suppose A = ( A_{⋅,1} … A_{⋅,n} ) is an n-by-n matrix and (m1,…,mn) is a permutation. Then
det ( A_{⋅,m1} … A_{⋅,mn} ) = (sign(m1,…,mn)) det A.   (9)
Proof: Converting A to ( A_{⋅,m1} … A_{⋅,mn} ) can be accomplished by a series of two-column interchanges, each of which multiplies det A by -1. Because sign(m1,…,mn) is 1 for an even number of interchanges and -1 for an odd number, eqn. (9) must hold.□
Unlike the trace, the determinant of a matrix is multiplicative.
Lemma 17. Suppose A and B are square matrices of the same size. Then
det(AB) = det(BA) = (det A)(det B).
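A one-line numerical check of Lemma 17 (my sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))  # equal up to rounding
```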
But just like the trace, the determinant of the matrix of an operator does not depend on the choice of basis either, i.e.,
Lemma 18. Let T∈L(V). Suppose u1,…,un and v1,…,vn are bases of V. Then
det M(T,(u1,…,un)) = det M(T,(v1,…,vn)),
which can be proved through the same steps used for proving Lemma 6. Now that we have Lemma 18, we repeat the steps in the proof of Theorem 1 to prove
Theorem 2. det T = det M(T) for T∈L(V).
2.3. Volume

The determinant has one important application in undergrad-level mathematics: computing volumes and integrals. In this section we first lay out the critical tools and then investigate volumes and integration through linear algebra. Recall that if V is an inner product space and T∈L(V), then T*T is a positive operator and hence has a unique positive square root √(T*T). By the polar decomposition, there is an isometry S∈L(V) such that T = S√(T*T), thus
|det T| = |det S| det √(T*T),   (10)
where det √(T*T) needs no absolute value because the eigenvalues of the positive square root are nonnegative. Eqn. (10) and Lemma 12 lead to the following
Lemma 19. Suppose V is an inner product space and T∈L(V). Then |det T| = det √(T*T).
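On Rⁿ with the standard inner product, T* is the transpose, and det √(T*T) = √(det(T*T)) because the determinant of the positive square root is the square root of the determinant. That gives a quick numpy check of Lemma 19 (my sketch, not from the notes):

```python
import numpy as np

T = np.random.default_rng(7).standard_normal((4, 4))
print(abs(np.linalg.det(T)))             # |det T|
print(np.sqrt(np.linalg.det(T.T @ T)))   # det of the positive square root of T*T
```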
We now study the relation between the determinant and volume by calculating volumes of arbitrary subsets of real vector spaces. An arbitrary subset can take any geometric shape, which makes the calculation hard. We relieve this issue by covering the set with small "boxes" whose volume is well defined and easy to calculate. A box is defined as the following
Definition 9. A box in Rn is a set of the form
{(y1,…,yn)∈Rn : xj < yj < xj+rj for j=1,…,n},
where r1,…,rn are positive numbers and (x1,…,xn)∈Rn. The numbers r1,…,rn are called the side lengths of the box.
and its volume is defined as
Definition 10. The volume of a box B in Rn with side lengths r1,…,rn is defined to be r1⋯rn and is denoted by volume B.
For an arbitrary set 𝛺, we can cover it by a union of many small boxes, as indicated in the definition below:
Definition 11. Suppose Ω⊂Rn. Then the volume of Ω, denoted volume Ω, is defined to be the infimum of volume B1 + volume B2 + ⋯, where the infimum is taken over all sequences B1,B2,… of boxes in Rn whose union contains Ω.
Since we seek a relation between volume 𝛺 and an operator T acting on it, we write T(𝛺) for the set {Tx : x∈𝛺}. Let T first be a positive operator; then volume 𝛺 and volume T(𝛺) are related through the result below:
Theorem 3. Suppose T∈L(Rn) is a positive operator and Ω⊂Rn. Then volume T(Ω) = (det T)(volume Ω).
Proof: By the Real Spectral Theorem, there exist an orthonormal basis e1,…,en of Rn and nonnegative numbers 𝜆1,…,𝜆n such that Tej = 𝜆jej for j=1,…,n. That is, T stretches the jth vector of an orthonormal basis by a factor of 𝜆j. Because volume behaves the same with respect to every orthonormal basis, volume T(Ω) equals volume Ω multiplied by the factor 𝜆1⋯𝜆n = det T. □ For T an isometry, we have instead
Lemma 20. Suppose T∈L(Rn) is an isometry and Ω⊂Rn. Then volume T(Ω) = volume Ω.
Now we can prove that an arbitrary operator T∈L(Rn) changes volume by a factor of |det T|:
Theorem 4. Suppose T∈L(Rn) and Ω⊂Rn. Then volume T(Ω) = |det T|(volume Ω).
Proof: By the polar decomposition, there is an isometry S∈L(Rn) such that T = S√(T*T). If Ω⊂Rn, then T(Ω) = S(√(T*T)(Ω)). Thus
volume T(Ω) = volume S(√(T*T)(Ω))
            = volume √(T*T)(Ω)
            = (det √(T*T))(volume Ω)
            = |det T|(volume Ω),
where the second equality uses Lemma 20, the third uses Theorem 3, and the last uses Lemma 19.□
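Theorem 4 can be watched numerically with a Monte Carlo estimate (a sketch of mine, assuming T is invertible): the image of the unit square under T has volume |det T|, and a point z lies in T(Ω) exactly when T⁻¹z lies in Ω.

```python
import numpy as np

rng = np.random.default_rng(8)
T = rng.standard_normal((2, 2))          # a generic, hence invertible, operator on R^2
T_inv = np.linalg.inv(T)

# Bound T(unit square) by the box spanned by the images of the four corners
corners = T @ np.array([[0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0]])
lo, hi = corners.min(axis=1), corners.max(axis=1)

N = 200_000
z = rng.uniform(lo, hi, size=(N, 2))     # uniform samples in the bounding box
x = T_inv @ z.T                          # preimages of the samples
inside = np.all((x >= 0.0) & (x <= 1.0), axis=0)

estimate = inside.mean() * np.prod(hi - lo)  # Monte Carlo volume of T(unit square)
print(estimate, abs(np.linalg.det(T)))       # close for large N
```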
We finish this section by discussing how the determinant appears in multivariable integration. The notions of differentiability and the derivative are given as follows:
Definition 12. Suppose Ω is an open subset of Rn and 𝜎 is a function from Ω to Rn. For x∈Ω, the function 𝜎 is called differentiable at x if there exists an operator T∈L(Rn) such that
lim_{y→0} ‖𝜎(x+y) - 𝜎(x) - Ty‖ / ‖y‖ = 0.   (11)
If 𝜎 is differentiable at x, then the unique operator T∈L(Rn) satisfying the equation above is called the derivative of 𝜎 at x and is denoted by 𝜎′(x).
If the notation 𝜎′(x) of Definition 12 is used, then eqn. (11) implies that the derivative at x∈Ω satisfies
𝜎(x+y) ≈ 𝜎(x) + (𝜎′(x))(y),   (12)
where y∈Rn; some textbooks write 𝛥x in place of y. Definition 12 also determines the form of M(𝜎′(x)), as we shall see now. Suppose Ω is an open subset of Rn and 𝜎 is a function from Ω to Rn. We can write
𝜎(x) = (𝜎1(x),…,𝜎n(x)),
where each 𝜎j is a function from 𝛺 to R. Let ei = (0,...,1,...,0) denote the ith standard basis vector, so that y = y1e1 + ⋯ + ynen. Because 𝜎′(x)∈L(Rn), eqn. (12) gives
(𝜎′(x))(yiei) = yi(𝜎′(x))(ei) ≈ (𝜎1(x+yiei)-𝜎1(x), …, 𝜎n(x+yiei)-𝜎n(x)),
and hence
(𝜎′(x))(ei) ≈ ( (𝜎1(x+yiei)-𝜎1(x))/yi, …, (𝜎n(x+yiei)-𝜎n(x))/yi ).   (13)
In the limit yi→0, Di𝜎j = lim_{yi→0} (𝜎j(x+yiei)-𝜎j(x))/yi is the partial derivative of 𝜎j with respect to the ith coordinate, and eqn. (13) gives M(𝜎′(x)) as

M(𝜎′(x)) = ( D1𝜎1(x) … Dn𝜎1(x)
                ⋮              ⋮
             D1𝜎n(x) … Dn𝜎n(x) ),

whose ith column holds the coordinates of (𝜎′(x))(ei), just as discussed in our earlier note on matrix representations. Now we can state the change of variables integration formula. Some additional mild hypotheses are needed for f and 𝜎′ (such as continuity or measurability), so we will not worry about a rigorous proof of the following theorem.
Theorem 5. Suppose Ω is an open subset of Rn and 𝜎:Ω→Rn is differentiable at every point of Ω. If f is a real-valued function defined on 𝜎(Ω), then
∫_{𝜎(Ω)} f(y) dy = ∫_Ω f(𝜎(x)) |det 𝜎′(x)| dx.
Theorem 5 is called a change of variables formula because you can think of y = 𝜎(x) as a change of variables. The theorem should not be a surprise either, since we already showed in Theorem 4 that volume T(Ω) = |det T|(volume Ω). The key point when making a change of variables is that the factor |det 𝜎′(x)| must be included when making the substitution y = 𝜎(x). We finish by illustrating this point with two important examples.

2.3.1. Polar coordinates

Define 𝜎:R2→R2 by
𝜎(r,𝜃) = (r cos𝜃, r sin𝜃),
where 𝜎1 = r cos𝜃, 𝜎2 = r sin𝜃, and y = (𝛥r,𝛥𝜃) in the terminology introduced above. Then

M(𝜎′(r,𝜃)) = ( cos𝜃   -r sin𝜃
               sin𝜃    r cos𝜃 ).

The determinant of this matrix equals r, explaining why a factor of r is needed when computing an integral in polar coordinates. For example, the extra factor of r appears when converting the integral of f over the unit disk in R2 to polar coordinates:
∫_{-1}^{1} ∫_{-√(1-x²)}^{√(1-x²)} f(x,y) dy dx = ∫_{0}^{2𝜋} ∫_{0}^{1} f(r cos𝜃, r sin𝜃) r dr d𝜃.

2.3.2. Spherical coordinates

Define 𝜎:R3→R3 by
𝜎(𝜌,𝜑,𝜃) = (𝜌 sin𝜑 cos𝜃, 𝜌 sin𝜑 sin𝜃, 𝜌 cos𝜑).
For this choice of 𝜎, the matrix of partial derivatives is

M(𝜎′(𝜌,𝜑,𝜃)) = ( sin𝜑 cos𝜃   𝜌 cos𝜑 cos𝜃   -𝜌 sin𝜑 sin𝜃
                 sin𝜑 sin𝜃   𝜌 cos𝜑 sin𝜃    𝜌 sin𝜑 cos𝜃
                 cos𝜑        -𝜌 sin𝜑         0            ).

The determinant of this matrix equals 𝜌² sin𝜑, explaining why a factor of 𝜌² sin𝜑 is needed when computing an integral in spherical coordinates.
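Both Jacobian determinants can be verified symbolically; here is a minimal sympy sketch (my addition, not part of the original notes):

```python
import sympy as sp

r, rho, phi, theta = sp.symbols('r rho phi theta', positive=True)

# Polar: sigma(r, theta) = (r cos(theta), r sin(theta))
polar = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])
print(sp.simplify(polar.jacobian([r, theta]).det()))             # r

# Spherical: sigma(rho, phi, theta)
spherical = sp.Matrix([
    rho * sp.sin(phi) * sp.cos(theta),
    rho * sp.sin(phi) * sp.sin(theta),
    rho * sp.cos(phi),
])
print(sp.simplify(spherical.jacobian([rho, phi, theta]).det()))  # rho**2*sin(phi)
```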
3. Where to go from here?

This ends my notes on preliminary linear algebra. Anything beyond what has been shown in this series can be termed an "advanced" aspect of linear algebra. For readers more interested in applied mathematics, Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach" could be your next step. People love the book because it discusses interesting physics problems (e.g., Maxwell's equations) by setting up tools from linear algebra. For self-learners, the book is also a good choice, as it is self-contained and accompanied by a solution manual. If one demands a broad scope of applications of linear algebra, then Brunton's "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" will likely satisfy one's curiosity. The author is not a pure-math person, so any recommendation from me for a graduate-level linear algebra book is not well justified, but the Springer graduate textbook seems to be a decent one for smart minds seeking to understand the elements introduced here in greater detail.

As preluded in my first note on linear algebra, this series serves as a solid prequel to my journey into quantum computing and quantum information theory. My current life has nothing to do with these two subjects, but the impetus of nudging myself toward the frontier of humanity has sustained over the years. Needless to say, I believe (maybe religiously) that many problems, whether scientifically challenging or socially controversial, can be solved by running quantum algorithms on quantum computers. When such achievements happen, I want to be there among the people who dream big and move quickly. Sooner or later, I will write new notes with interesting stories to tell about quantum algorithms and quantum information theory. So please be patient, my dear readers, and thank you for following along with me so far.