This is the last note on preliminary linear algebra. Readers who have been following along might notice that much of the "normal" stuff about matrices has not been stressed, or even mentioned, so far in this series. To finish strong, we will focus on the determinant and trace of matrices in this last note. This first-operator-then-matrix approach is what Axler's book is best known for. To the author, such a method helps cultivate your intuition about linear maps and operators. If you can prove things without referring to matrices, you are doing the "hard part" of the work to understand linear algebra. Once you survive the "hard" work, manipulation of matrices might be more transparent. Like previous notes, we still assume the vector space $V$ is nonzero and finite-dimensional over the field $\mathbf{F} = \mathbf{C}$.

1. Trace
 1.1. Change of Basis
 1.2. Trace: A connection between matrices and operators
2. Determinant
 2.1. Determinant of an operator
 2.2. Determinant of a matrix
 2.3. Volume
  2.3.1. Polar coordinates
  2.3.2. Spherical coordinates
3. Where to go from here?

1. Trace

As a starter, we need to revisit the way a matrix is defined. For any operator $T \in \mathcal{L}(V, W)$, we need one basis of $V$ and another basis of $W$ to define the matrix that represents $T$. Let $v_1, \dots, v_n$ and $w_1, \dots, w_m$ be bases of $V$ and $W$, respectively. Then the matrix of $T$ with respect to the two bases is $\mathcal{M}(T, (v_1, \dots, v_n), (w_1, \dots, w_m))$, which has shape $m$-by-$n$. We make the basis information explicit in the notation for matrices, as it helps us prove some results in the subsection below.

1.1. Change of Basis

Suppose $v_1, \dots, v_n$ is a basis of $V$, and note that the identity operator $I \in \mathcal{L}(V)$ satisfies $I v_i = v_i$. So $\mathcal{M}(I, (v_1, \dots, v_n), (v_1, \dots, v_n))$ is an $n$-by-$n$ matrix whose diagonal entries are all unity. Now let $u_1, \dots, u_n$ be another basis of $V$; then $\mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n))$ is the matrix that changes the basis of every vector $v \in V$ from $v_1, \dots, v_n$ to $u_1, \dots, u_n$. Change of basis has extensive use in physics, as it is usually a way to simplify a problem. Here we are interested in using $\mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n))$ to relate the trace of operators to the trace of the corresponding matrices. For an operator $T \in \mathcal{L}(V)$, we have discussed its matrix $\mathcal{M}(T, (v_1, \dots, v_n))$, i.e., we use only one basis to represent the matrix. If we use two different bases to represent matrices of operators, we can rewrite the rule of matrix multiplication as the following
Lemma 1. Suppose $u_1, \dots, u_n$ and $v_1, \dots, v_n$ and $w_1, \dots, w_n$ are all bases of $V$. Suppose $S, T \in \mathcal{L}(V)$. Then
$$\mathcal{M}(ST, (u_1, \dots, u_n), (w_1, \dots, w_n)) = \mathcal{M}(S, (v_1, \dots, v_n), (w_1, \dots, w_n)) \, \mathcal{M}(T, (u_1, \dots, u_n), (v_1, \dots, v_n)). \tag{1}$$
Applying $ST$ to any vector $v \in V$, the right-hand side of eqn. (1) changes the basis of $v$ first from $\{u_i\}$ to $\{v_i\}$, and then from $\{v_i\}$ to $\{w_i\}$. With Lemma 1, we can also prove that $\mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n))$ is invertible.
Lemma 2. Suppose $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of $V$. Then the matrices $\mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$ and $\mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n))$ are invertible, and each is the inverse of the other.
Proof: We replace $w_i$ with $u_i$ and $S, T$ with $I$ in eqn. (1) to give
$$I = \mathcal{M}(I, (u_1, \dots, u_n)) = \mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n)) \, \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n)) = AB. \tag{2}$$
Interchanging $v_i$ with $u_i$ in eqn. (2) gives
$$I = \mathcal{M}(I, (v_1, \dots, v_n)) = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n)) \, \mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n)) = BA. \tag{3}$$
That is, $I = AB = BA$, so $\mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$ is the inverse of $\mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n))$. Letting $A = \mathcal{M}(I, (v_1, \dots, v_n), (u_1, \dots, u_n))$, the proof above indicates that $A^{-1} = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$. The following formula shows how the matrix of an operator changes when we change bases of the vector space.
Lemma 3. Suppose $T \in \mathcal{L}(V)$. Let $u_1, \dots, u_n$ and $v_1, \dots, v_n$ be bases of $V$. Let $A = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$. Then
$$\mathcal{M}(T, (u_1, \dots, u_n)) = A^{-1} \, \mathcal{M}(T, (v_1, \dots, v_n)) \, A. \tag{4}$$
Proof: Based on Lemma 1, if we replace $w_j$ with $u_j$ and replace $S$ with $I$, we get
$$\mathcal{M}(T, (u_1, \dots, u_n)) = A^{-1} \, \mathcal{M}(T, (u_1, \dots, u_n), (v_1, \dots, v_n)). \tag{5}$$
Use Lemma 1 again, but replace $w_j$ with $v_j$ this time. Also replace $T$ with $I$ and replace $S$ with $T$, getting
$$\mathcal{M}(T, (u_1, \dots, u_n), (v_1, \dots, v_n)) = \mathcal{M}(T, (v_1, \dots, v_n)) \, A. \tag{6}$$
Substituting (6) into (5) gives the desired result.
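As a quick numerical sanity check of Lemmas 2 and 3, here is a minimal sketch in NumPy (the random bases, the operator, and all variable names are invented for illustration). Columns of the arrays hold basis vectors written in standard coordinates, so the change-of-basis matrices come out of a linear solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Columns of U and V hold two random bases of R^3, in standard coordinates.
U = rng.normal(size=(n, n))
V = rng.normal(size=(n, n))

# M(I, (u...), (v...)) re-expresses u-coordinates as v-coordinates.
A = np.linalg.solve(V, U)          # A = V^{-1} U
A_inv = np.linalg.solve(U, V)      # M(I, (v...), (u...)) = U^{-1} V

# Lemma 2: the two change-of-basis matrices are mutual inverses.
assert np.allclose(A @ A_inv, np.eye(n))

# A random operator T, given by its matrix in standard coordinates.
T = rng.normal(size=(n, n))
M_u = np.linalg.solve(U, T @ U)    # M(T, (u...))
M_v = np.linalg.solve(V, T @ V)    # M(T, (v...))

# Lemma 3: M(T, (u...)) = A^{-1} M(T, (v...)) A.
assert np.allclose(M_u, A_inv @ M_v @ A)
```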
1.2. Trace: A connection between matrices and operators

We will first give the definitions of the traces of operators and matrices separately. While their definitions look different, we will prove that they are essentially the same thing. Also, the trace of a matrix is invariant under a change of basis.

Definition 1. Suppose $T \in \mathcal{L}(V)$.
- If $\mathbf{F} = \mathbf{C}$, then the trace of $T$ is the sum of the eigenvalues of $T$, with each eigenvalue repeated according to its multiplicity.
- If $\mathbf{F} = \mathbf{R}$, then the trace of $T$ is the sum of the eigenvalues of $T_{\mathbf{C}}$, with each eigenvalue repeated according to its multiplicity.
The trace of $T$ is denoted by $\operatorname{trace} T$.
The $T_{\mathbf{C}}$ above is the complexification of the operator $T$ acting on a real vector space. For the sake of completeness, we define the complexification of a real vector space and $T_{\mathbf{C}}$ as follows:
Definition 2. Suppose $V$ is a real vector space.
- The complexification of $V$, denoted $V_{\mathbf{C}}$, equals $V \times V$. An element of $V_{\mathbf{C}}$ is an ordered pair $(u, v)$, where $u, v \in V$, but we will write this as $u + iv$.
- Addition on $V_{\mathbf{C}}$ is defined by $(u_1 + iv_1) + (u_2 + iv_2) = (u_1 + u_2) + i(v_1 + v_2)$ for $u_1, v_1, u_2, v_2 \in V$.
- Complex scalar multiplication on $V_{\mathbf{C}}$ is defined by $(a + bi)(u + iv) = (au - bv) + i(av + bu)$ for $a, b \in \mathbf{R}$ and $u, v \in V$.
Definition 3. Suppose $V$ is a real vector space and $T \in \mathcal{L}(V)$. The complexification of $T$, denoted $T_{\mathbf{C}}$, is the operator $T_{\mathbf{C}} \in \mathcal{L}(V_{\mathbf{C}})$ defined by $T_{\mathbf{C}}(u + iv) = Tu + iTv$ for $u, v \in V$.
With respect to a basis of $V$, the matrix of $T_{\mathbf{C}}$ equals the matrix of $T$. If $\lambda$ is a real eigenvalue of $T$, then it is also an eigenvalue of $T_{\mathbf{C}}$. Moreover, suppose $V$ is a real vector space, $T \in \mathcal{L}(V)$, and $\lambda \in \mathbf{C}$. Then $\lambda$ is an eigenvalue of $T_{\mathbf{C}}$ if and only if $\bar{\lambda}$ is an eigenvalue of $T_{\mathbf{C}}$. Recall that the characteristic polynomial of an operator is $(z - \lambda_1)^{d_1} (z - \lambda_2)^{d_2} \cdots (z - \lambda_m)^{d_m}$, where the $\lambda_i$ are the eigenvalues and the $d_i$ their corresponding multiplicities (dimensions of the generalized eigenspaces). We expand such a polynomial to get
$$z^n - (\underbrace{\lambda_1 + \cdots + \lambda_1}_{d_1 \text{ times}} + \cdots + \underbrace{\lambda_m + \cdots + \lambda_m}_{d_m \text{ times}}) z^{n-1} + \cdots + (-1)^n (\lambda_1^{d_1} \cdots \lambda_m^{d_m}), \tag{7}$$
which results in the following lemma:
Lemma 4. Suppose $T \in \mathcal{L}(V)$. Let $n = \dim V$. Then $\operatorname{trace} T$ equals the negative of the coefficient of $z^{n-1}$ in the characteristic polynomial of $T$.
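To see Lemma 4 in action numerically, here is a minimal sketch assuming NumPy, with a random matrix as a stand-in operator: `np.poly` returns the coefficients of a matrix's characteristic polynomial, and the negative of the $z^{n-1}$ coefficient matches the trace.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))

# np.poly gives characteristic-polynomial coefficients [1, c_{n-1}, ..., c_0]
# for z^n + c_{n-1} z^{n-1} + ... + c_0.
coeffs = np.poly(A)

# Lemma 4: trace T = -(coefficient of z^{n-1}).
assert np.isclose(-coeffs[1], np.trace(A))
```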
For the trace of a matrix, we define it as the following
Definition 4. The trace of a square matrix $A$, denoted $\operatorname{trace} A$, is defined to be the sum of the diagonal entries of $A$.
With Definition 4, we can easily show that
Lemma 5. If $A$ and $B$ are square matrices of the same size, then
$$\operatorname{trace}(AB) = \operatorname{trace}(BA).$$
Proof: Let $A$ and $B$ be two $n$-by-$n$ matrices; then the $j$th diagonal entry of $AB$ is $\sum_{k=1}^{n} A_{j,k} B_{k,j}$. So the trace of the product is
$$\operatorname{trace}(AB) = \sum_{j=1}^{n} \sum_{k=1}^{n} A_{j,k} B_{k,j} = \sum_{k=1}^{n} \sum_{j=1}^{n} B_{k,j} A_{j,k} = \operatorname{trace}(BA).$$
To prove that the trace of an operator equals the trace of its matrix, we first need to show that the trace of a matrix does not depend on the choice of basis, i.e.,
Lemma 6. Let $T \in \mathcal{L}(V)$. Suppose $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of $V$. Then
$$\operatorname{trace} \mathcal{M}(T, (u_1, \dots, u_n)) = \operatorname{trace} \mathcal{M}(T, (v_1, \dots, v_n)).$$
Proof: From Lemma 3, $\mathcal{M}(T, (u_1, \dots, u_n)) = A^{-1} \mathcal{M}(T, (v_1, \dots, v_n)) A$ with $A = \mathcal{M}(I, (u_1, \dots, u_n), (v_1, \dots, v_n))$, so
$$\operatorname{trace}\left(A^{-1} \mathcal{M}(T, (v_1, \dots, v_n)) A\right) = \operatorname{trace}\left(\mathcal{M}(T, (v_1, \dots, v_n)) A A^{-1}\right) = \operatorname{trace} \mathcal{M}(T, (v_1, \dots, v_n)),$$
where the first equality uses Lemma 5 to swap the factor $A^{-1}$ past the product $\mathcal{M}(T, (v_1, \dots, v_n)) A$. This gives $\operatorname{trace} \mathcal{M}(T, (u_1, \dots, u_n)) = \operatorname{trace} \mathcal{M}(T, (v_1, \dots, v_n))$ as desired. We are now ready to prove
Theorem 1. Suppose $T \in \mathcal{L}(V)$. Then $\operatorname{trace} T = \operatorname{trace} \mathcal{M}(T)$.
Proof: Because the trace of a matrix is invariant under the choice of basis (Lemma 6), we only need to show that $\operatorname{trace} T$ equals $\operatorname{trace} \mathcal{M}(T)$ for some basis. Recall that an operator on a complex vector space has a block diagonal matrix with respect to a basis of generalized eigenvectors. That block diagonal matrix has the eigenvalues on its diagonal, with each eigenvalue $\lambda_i$ appearing $d_i$ times, resulting in $\operatorname{trace} T = \operatorname{trace} \mathcal{M}(T)$. If $V$ is a real vector space, then applying the complex case to the complexification $T_{\mathbf{C}}$ gives the desired result.
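Theorem 1 and Lemma 6 are easy to check numerically. The short sketch below (a NumPy illustration with made-up matrices) compares the sum of the eigenvalues to the sum of the diagonal entries, and confirms that the trace survives a random change of basis.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))

# Theorem 1: the sum of the eigenvalues equals the sum of the diagonal entries.
assert np.isclose(np.sum(np.linalg.eigvals(M)).real, np.trace(M))

# Lemma 6: the trace is invariant under a change of basis M -> A^{-1} M A.
A = rng.normal(size=(5, 5))
assert np.isclose(np.trace(np.linalg.solve(A, M) @ A), np.trace(M))
```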
Based on Theorem 1, we can show that trace is additive:
Lemma 7. Suppose $S, T \in \mathcal{L}(V)$. Then $\operatorname{trace}(S + T) = \operatorname{trace} S + \operatorname{trace} T$.
Proof:
$$\operatorname{trace}(S + T) = \operatorname{trace} \mathcal{M}(S + T) = \operatorname{trace}\left(\mathcal{M}(S) + \mathcal{M}(T)\right) = \operatorname{trace} \mathcal{M}(S) + \operatorname{trace} \mathcal{M}(T) = \operatorname{trace} S + \operatorname{trace} T,$$
where the third equality comes from Definition 4, since the diagonal entries add entrywise. The following result has found extensive use in quantum theory, especially in its infinite-dimensional generalization.
Lemma 8. There do not exist operators $S, T \in \mathcal{L}(V)$ such that $ST - TS = I$.
Proof: From Lemma 5, we have
$$\operatorname{trace}(ST - TS) = \operatorname{trace}(ST) - \operatorname{trace}(TS) = \operatorname{trace} \mathcal{M}(ST) - \operatorname{trace} \mathcal{M}(TS) = \operatorname{trace}(\mathcal{M}(S)\mathcal{M}(T)) - \operatorname{trace}(\mathcal{M}(T)\mathcal{M}(S)) = 0.$$
But $\operatorname{trace} I = \dim V \neq 0$, so $ST - TS$ can never equal the identity.
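The finite-dimensional obstruction is simply that every commutator is traceless, which a quick NumPy experiment (random matrices, invented for illustration) makes concrete. This is one reason the canonical commutation relation of quantum mechanics, $[x, p] = i\hbar I$, can only be realized by infinite-dimensional operators.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
S, T = rng.normal(size=(n, n)), rng.normal(size=(n, n))

# Lemma 5 forces every commutator to be traceless...
comm = S @ T - T @ S
assert np.isclose(np.trace(comm), 0.0)

# ...while the identity has trace n != 0, so ST - TS = I is impossible.
print(np.trace(np.eye(n)))  # 4.0
```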
2. Determinant

2.1. Determinant of an operator

The definition of the determinant below mimics the definition of trace. The only difference is that the sum of eigenvalues is replaced with the product.

Definition 5. Suppose $T \in \mathcal{L}(V)$.
- If $\mathbf{F} = \mathbf{C}$, then the determinant of $T$ is the product of the eigenvalues of $T$, with each eigenvalue repeated according to its multiplicity.
- If $\mathbf{F} = \mathbf{R}$, then the determinant of $T$ is the product of the eigenvalues of $T_{\mathbf{C}}$, with each eigenvalue repeated according to its multiplicity.
The determinant of $T$ is denoted by $\det T$.
If $\lambda_1, \dots, \lambda_m$ are the distinct eigenvalues of $T$ (or of $T_{\mathbf{C}}$ if $V$ is a real vector space) with multiplicities $d_1, \dots, d_m$, then the definition above implies
$$\det T = \lambda_1^{d_1} \cdots \lambda_m^{d_m}.$$
From the expansion of the characteristic polynomial at (7), we have
Lemma 9. Suppose $T \in \mathcal{L}(V)$. Then the characteristic polynomial of $T$ can be written as
$$z^n - (\operatorname{trace} T) z^{n-1} + \cdots + (-1)^n (\det T).$$
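Continuing the numerical theme, a short check (NumPy, with a random matrix and taking for granted, as Theorem 2 below will justify, that the determinant of the matrix equals the determinant of the operator) confirms both ends of Lemma 9: the product of the eigenvalues matches the determinant, and the constant coefficient of the characteristic polynomial is $(-1)^n \det T$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.normal(size=(n, n))

# Definition 5: det T is the product of the eigenvalues.
assert np.isclose(np.prod(np.linalg.eigvals(A)).real, np.linalg.det(A))

# Lemma 9: the constant coefficient of the characteristic polynomial
# equals (-1)^n det T.
coeffs = np.poly(A)
assert np.isclose(coeffs[-1], (-1) ** n * np.linalg.det(A))
```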
We now show that the following lemma can be proved easily using Definition 5.
Lemma 10. An operator on $V$ is invertible if and only if its determinant is nonzero.
Proof: Suppose $V$ is a complex vector space. An operator $T \in \mathcal{L}(V)$ is invertible if and only if $0$ is not one of its eigenvalues, which by Definition 5 happens exactly when the determinant is nonzero. If $V$ is a real vector space, we again have that $T$ is invertible if and only if $0$ is not an eigenvalue of $T$, which happens if and only if $0$ is not an eigenvalue of $T_{\mathbf{C}}$. Thus $\det T \neq 0$ exactly when $T$ is invertible.
The definition of the determinant also gives a new way to define the characteristic polynomial of operators, as shown below.
Lemma 11. Suppose $T \in \mathcal{L}(V)$. Then the characteristic polynomial of $T$ equals $\det(zI - T)$.
Proof: First we notice that
$$-(T - \lambda I) = (zI - T) - (z - \lambda) I.$$
So $\lambda$ being an eigenvalue of $T$ indicates that $z - \lambda$ is an eigenvalue of $zI - T$. The equality also implies that
$$\dim \operatorname{null} (T - \lambda I)^{\dim V} = \dim \operatorname{null} \left((zI - T) - (z - \lambda) I\right)^{\dim V},$$
i.e., the multiplicity of $\lambda$ as an eigenvalue of $T$ equals the multiplicity of $z - \lambda$ as an eigenvalue of $zI - T$. Let $\lambda_1, \dots, \lambda_m$ be the eigenvalues of $T$, with $d_1, \dots, d_m$ the corresponding multiplicities. Then $\det(zI - T) = (z - \lambda_1)^{d_1} \cdots (z - \lambda_m)^{d_m}$, where the right-hand side is exactly the characteristic polynomial of $T$. If $V$ is a real vector space, we apply the complex case to $T_{\mathbf{C}}$ to give the same result.
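Lemma 11 can also be spot-checked numerically: evaluating $\det(zI - T)$ at a few sample points should reproduce the characteristic polynomial computed from the eigenvalues. Again a NumPy sketch with an arbitrary matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.normal(size=(n, n))
char_poly = np.poly(A)  # coefficients of the characteristic polynomial

# Lemma 11: det(zI - T) agrees with the characteristic polynomial at any z.
for z in (0.0, 1.5, -2.0 + 0.5j):
    lhs = np.linalg.det(z * np.eye(n) - A)
    assert np.isclose(lhs, np.polyval(char_poly, z))
```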
Finally, as a special kind of operator, we have the following result regarding the determinant of isometries.

Lemma 12. Suppose $V$ is an inner product space and $S \in \mathcal{L}(V)$ is an isometry. Then $|\det S| = 1$.
This result should be obvious on complex vector spaces, as the eigenvalues of an isometry all have absolute value one. On real vector spaces, we have $\det S = \det S_{\mathbf{C}}$ by the definition of the determinant, and thus $|\det S| = 1$, completing the proof.

2.2. Determinant of a matrix

To define the determinant of an arbitrary matrix, we first introduce the concept of a permutation and its sign.
Definition 6.
- A permutation of $(1, \dots, n)$ is a list $(m_1, \dots, m_n)$ that contains each of the numbers $1, \dots, n$ exactly once.
- The set of all permutations of $(1, \dots, n)$ is denoted $\operatorname{perm} n$.
Definition 7.
- The sign of a permutation $(m_1, \dots, m_n)$ is defined to be $1$ if the number of pairs of integers $(j, k)$ with $1 \le j < k \le n$ such that $j$ appears after $k$ in the list $(m_1, \dots, m_n)$ is even, and $-1$ if the number of such pairs is odd.
- In other words, the sign of a permutation equals $1$ if the natural order has been changed an even number of times and equals $-1$ if the natural order has been changed an odd number of times.
With Definition 6 and Definition 7, it's possible to prove that
Lemma 13. Interchanging two entries in a permutation multiplies the sign of the permutation by $-1$.
The determinant of a matrix is now defined as
Definition 8. Suppose $A$ is an $n$-by-$n$ matrix
$$A = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} \\ \vdots & & \vdots \\ A_{n,1} & \cdots & A_{n,n} \end{pmatrix}.$$
The determinant of $A$, denoted $\det A$, is defined by
$$\det A = \sum_{(m_1, \dots, m_n) \in \operatorname{perm} n} \left(\operatorname{sign}(m_1, \dots, m_n)\right) A_{m_1, 1} \cdots A_{m_n, n}. \tag{8}$$
Notice that the product $A_{m_1,1} \cdots A_{m_n,n}$ in the definition above has its second subscripts ranging from $1$ to $n$, i.e., it is a product of $n$ entries, one from each column. Also, the set $\operatorname{perm} n$ contains $n!$ permutations.
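Definition 8 translates directly into (very inefficient, $O(n \cdot n!)$) code. The sketch below is a hypothetical reference implementation in Python: it computes the sign of each permutation by counting inversions, sums the signed products, and matches `np.linalg.det` on a random matrix.

```python
import numpy as np
from itertools import permutations
from math import prod

def sign(perm):
    """Sign of a permutation: +1 for an even number of inversions, -1 for odd."""
    inversions = sum(1 for j in range(len(perm)) for k in range(j + 1, len(perm))
                     if perm[j] > perm[k])
    return 1 if inversions % 2 == 0 else -1

def det_by_permutations(A):
    """Determinant via Definition 8: sum over all permutations of signed products."""
    n = A.shape[0]
    return sum(sign(m) * prod(A[m[j], j] for j in range(n))
               for m in permutations(range(n)))

A = np.random.default_rng(6).normal(size=(5, 5))
assert np.isclose(det_by_permutations(A), np.linalg.det(A))
```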
Based on Definition 8 and Lemma 13, we can prove that

Lemma 14. Suppose $A$ is a square matrix and $B$ is the matrix obtained from $A$ by interchanging two columns. Then
$$\det A = -\det B.$$
Proof: From eqn. (8), the set of products $A_{m_1,1} \cdots A_{m_n,n}$ remains the same after interchanging two columns, but two numbers interchange their places in each list $(m_1, \dots, m_n)$. Thus, each term in the sum of (8) is multiplied by $-1$ due to Lemma 13, resulting in $\det A = -\det B$. If the two interchanged columns are identical, then
Lemma 15. If $A$ is a square matrix that has two equal columns, then $\det A = 0$.
Proof: From the proof of Lemma 14, we have $\det A = -\det A$, indicating $\det A = 0$. The following lemma shows what happens if we permute multiple columns of a matrix. Note here that $A_{\cdot, i}$ is the $i$th column of the matrix $A$.
Lemma 16. Suppose $A = (A_{\cdot,1} \ \cdots \ A_{\cdot,n})$ is an $n$-by-$n$ matrix and $(m_1, \dots, m_n)$ is a permutation. Then
$$\det (A_{\cdot,m_1} \ \cdots \ A_{\cdot,m_n}) = (\operatorname{sign}(m_1, \dots, m_n)) \det A. \tag{9}$$
Proof: Converting $A$ to $(A_{\cdot,m_1} \ \cdots \ A_{\cdot,m_n})$ can be accomplished by a series of two-column interchanges. Each step multiplies $\det A$ by $-1$. Because $\operatorname{sign}(m_1, \dots, m_n)$ is $1$ for an even number of interchanges and $-1$ for an odd number, eqn. (9) must be true.
Unlike the trace of a matrix, the determinant of a matrix is multiplicative.
Lemma 17. Suppose $A$ and $B$ are square matrices of the same size. Then
$$\det(AB) = \det(BA) = (\det A)(\det B).$$
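The proof is omitted here, but multiplicativity is easy to check numerically, as in this short NumPy snippet with arbitrary random matrices.

```python
import numpy as np

rng = np.random.default_rng(7)
A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

# Lemma 17: det(AB) = det(BA) = det(A) det(B).
d = np.linalg.det
assert np.isclose(d(A @ B), d(B @ A))
assert np.isclose(d(A @ B), d(A) * d(B))
```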
But just like the trace of a given matrix, the determinant does not depend on the basis either, i.e.,
Lemma 18. Let $T \in \mathcal{L}(V)$. Suppose $u_1, \dots, u_n$ and $v_1, \dots, v_n$ are bases of $V$. Then
$$\det \mathcal{M}(T, (u_1, \dots, u_n)) = \det \mathcal{M}(T, (v_1, \dots, v_n)),$$
which can be proved through the same steps used for proving Lemma 6. Now that we have Lemma 18, we repeat the steps in the proof of Theorem 1 to prove
Theorem 2. $\det T = \det \mathcal{M}(T)$ for $T \in \mathcal{L}(V)$.
2.3. Volume

The determinant has one important application in undergrad-level mathematics, namely in computing volumes and integrals. In this section we first lay out the critical tools and then investigate volumes and integration through linear algebra. Recall that if $V$ is an inner product space and $T \in \mathcal{L}(V)$, then $T^* T$ is a positive operator and hence has a unique positive square root $\sqrt{T^* T}$. By the polar decomposition, there is an isometry $S \in \mathcal{L}(V)$ such that $T = S \sqrt{T^* T}$, thus
$$|\det T| = |\det S| \det \sqrt{T^* T}, \tag{10}$$
because the eigenvalues of the square root are nonnegative. Eqn. (10) and Lemma 12 lead to the following
Lemma 19. Suppose $V$ is an inner product space and $T \in \mathcal{L}(V)$. Then
$$|\det T| = \det \sqrt{T^* T}.$$
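Since the eigenvalues of $\sqrt{T^* T}$ are exactly the singular values of $T$, Lemma 19 says $|\det T|$ is the product of the singular values. The one-liner below (NumPy, random matrix for illustration) confirms this.

```python
import numpy as np

rng = np.random.default_rng(8)
T = rng.normal(size=(4, 4))

# Lemma 19: |det T| = det sqrt(T*T) = product of the singular values of T.
assert np.isclose(abs(np.linalg.det(T)), np.prod(np.linalg.svd(T, compute_uv=False)))
```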
We now study the relation between determinant and volume by calculating the volumes of arbitrary subsets of real vector spaces. An arbitrary subset can take any geometric shape, which makes its volume hard to calculate directly. We relieve this issue by covering the subset with small "boxes" whose volumes are well defined and easy to compute. A box is defined as the following
Definition 9. A box in $\mathbf{R}^n$ is a set of the form
$$\{(y_1, \dots, y_n) \in \mathbf{R}^n : x_j < y_j < x_j + r_j \text{ for } j = 1, \dots, n\},$$
where $r_1, \dots, r_n$ are positive numbers and $(x_1, \dots, x_n) \in \mathbf{R}^n$. The numbers $r_1, \dots, r_n$ are called the side lengths of the box.
and its volume is defined as
Definition 10. The volume of a box $B$ in $\mathbf{R}^n$ with side lengths $r_1, \dots, r_n$ is defined to be $r_1 \cdots r_n$ and is denoted by $\operatorname{volume} B$.
For an arbitrary set $\Omega$, we can enclose it in a union of many small boxes, as indicated in the definition below:
Definition 11. Suppose $\Omega \subseteq \mathbf{R}^n$. Then the volume of $\Omega$, denoted $\operatorname{volume} \Omega$, is defined to be the infimum of
$$\operatorname{volume} B_1 + \operatorname{volume} B_2 + \cdots,$$
where the infimum is taken over all sequences $B_1, B_2, \dots$ of boxes in $\mathbf{R}^n$ whose union contains $\Omega$.
Since we seek a relation between $\operatorname{volume} \Omega$ and an acting operator $T$, we use the notation $T(\Omega)$ to represent the set $\{Tx : x \in \Omega\}$. Let $T$ first be a positive operator; then $\operatorname{volume} \Omega$ and $\operatorname{volume} T(\Omega)$ are related through the result below:
Theorem 3. Suppose $T \in \mathcal{L}(\mathbf{R}^n)$ is a positive operator and $\Omega \subseteq \mathbf{R}^n$. Then
$$\operatorname{volume} T(\Omega) = (\det T)(\operatorname{volume} \Omega).$$
Proof: Consider an arbitrary positive operator $T \in \mathcal{L}(\mathbf{R}^n)$. By the Real Spectral Theorem, there exist an orthonormal basis $e_1, \dots, e_n$ of $\mathbf{R}^n$ and nonnegative numbers $\lambda_1, \dots, \lambda_n$ such that $T e_j = \lambda_j e_j$ for $j = 1, \dots, n$. That is, $T$ stretches the $j$th basis vector of an orthonormal basis by a factor of $\lambda_j$, so it stretches a box with sides along that basis by factors $\lambda_1, \dots, \lambda_n$. Because volume behaves the same with respect to every orthonormal basis, $\operatorname{volume} T(\Omega)$ equals $\operatorname{volume} \Omega$ multiplied by a factor of $\lambda_1 \cdots \lambda_n = \det T$.
For $T$ an isometry, we have instead
Lemma 20. Suppose $T \in \mathcal{L}(\mathbf{R}^n)$ is an isometry and $\Omega \subseteq \mathbf{R}^n$. Then
$$\operatorname{volume} T(\Omega) = \operatorname{volume} \Omega.$$
Now we can prove that an arbitrary operator $T \in \mathcal{L}(\mathbf{R}^n)$ changes volume by a factor of $|\det T|$:
Theorem 4. Suppose $T \in \mathcal{L}(\mathbf{R}^n)$ and $\Omega \subseteq \mathbf{R}^n$. Then
$$\operatorname{volume} T(\Omega) = |\det T| (\operatorname{volume} \Omega).$$
Proof: By the Polar Decomposition, there is an isometry $S \in \mathcal{L}(\mathbf{R}^n)$ such that
$$T = S \sqrt{T^* T}.$$
If $\Omega \subseteq \mathbf{R}^n$, then $T(\Omega) = S\left(\sqrt{T^* T}(\Omega)\right)$. Thus
$$\operatorname{volume} T(\Omega) = \operatorname{volume} S\left(\sqrt{T^* T}(\Omega)\right) = \operatorname{volume} \sqrt{T^* T}(\Omega) = \left(\det \sqrt{T^* T}\right)(\operatorname{volume} \Omega) = |\det T| (\operatorname{volume} \Omega),$$
where the second equality uses Lemma 20, the third uses Theorem 3, and the last uses Lemma 19.
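Here is a crude Monte Carlo illustration of Theorem 4, a sketch with invented parameters: we estimate the area of the image of the unit square under a random $2 \times 2$ map by sampling its bounding box and testing membership through the inverse map, then compare with $|\det T|$.

```python
import numpy as np

rng = np.random.default_rng(9)
T = rng.normal(size=(2, 2))

# Image of the unit square [0,1]^2 under T is a parallelogram of area |det T|.
corners = T @ np.array([[0, 1, 0, 1], [0, 0, 1, 1]], dtype=float)
lo, hi = corners.min(axis=1), corners.max(axis=1)

# Sample the bounding box; a point y lies in T(Omega) iff T^{-1} y is in [0,1]^2.
y = rng.uniform(lo, hi, size=(200_000, 2))
x = np.linalg.solve(T, y.T).T
inside = np.all((x >= 0) & (x <= 1), axis=1)
estimate = inside.mean() * np.prod(hi - lo)

print(estimate, abs(np.linalg.det(T)))  # the two numbers should nearly agree
```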
We finish this section by discussing how the determinant appears in multivariable integration. The notions of differentiability and derivative are given as follows:

Definition 12. Suppose $\Omega$ is an open subset of $\mathbf{R}^n$ and $\sigma$ is a function from $\Omega$ to $\mathbf{R}^n$. For $x \in \Omega$, the function $\sigma$ is called differentiable at $x$ if there exists an operator $T \in \mathcal{L}(\mathbf{R}^n)$ such that
$$\lim_{y \to 0} \frac{\|\sigma(x + y) - \sigma(x) - Ty\|}{\|y\|} = 0. \tag{11}$$
If $\sigma$ is differentiable at $x$, then the unique operator $T \in \mathcal{L}(\mathbf{R}^n)$ satisfying the equation above is called the derivative of $\sigma$ at $x$ and is denoted by $\sigma'(x)$.
With the notation $\sigma'(x)$ from Definition 12, eqn. (11) implies that the derivative at $x \in \mathbf{R}^n$ satisfies
$$\sigma(x + y) \approx \sigma(x) + (\sigma'(x))(y), \tag{12}$$
where $y \in \mathbf{R}^n$ as well. In some textbooks, $\Delta x$ is used in place of $y$. Definition 12 also implies the form of $\mathcal{M}(\sigma'(x))$, as we shall see now. Suppose $\Omega$ is an open subset of $\mathbf{R}^n$ and $\sigma$ is a function from $\Omega$ to $\mathbf{R}^n$. We can write
$$\sigma(x) = (\sigma_1(x), \dots, \sigma_n(x)),$$
where each $\sigma_j$ is a function from $\Omega$ to $\mathbf{R}$. Let $\hat{y}_i = (0, \dots, 1, \dots, 0)$ be the $i$th standard basis vector; then we can write $y$ as
$$y = y_1 \hat{y}_1 + \cdots + y_n \hat{y}_n.$$
Because $\sigma'(x) \in \mathcal{L}(\mathbf{R}^n)$, based on eqn. (12) we must have
$$(\sigma'(x))(y_i \hat{y}_i) = y_i (\sigma'(x))(\hat{y}_i) \approx \left(\sigma_1(x + y_i \hat{y}_i) - \sigma_1(x), \dots, \sigma_n(x + y_i \hat{y}_i) - \sigma_n(x)\right),$$
i.e.,
$$(\sigma'(x))(\hat{y}_i) \approx \left(\frac{\sigma_1(x + y_i \hat{y}_i) - \sigma_1(x)}{y_i}, \dots, \frac{\sigma_n(x + y_i \hat{y}_i) - \sigma_n(x)}{y_i}\right). \tag{13}$$
In the limit $y_i \to 0$, we can write $D_i \sigma_j = \lim_{y_i \to 0} \frac{\sigma_j(x + y_i \hat{y}_i) - \sigma_j(x)}{y_i}$ as the partial derivative of $\sigma_j$ with respect to the $i$th coordinate; then eqn. (13) gives $\mathcal{M}(\sigma'(x))$ as the following:
$$\mathcal{M}(\sigma'(x)) = \begin{pmatrix} D_1 \sigma_1(x) & \cdots & D_n \sigma_1(x) \\ \vdots & & \vdots \\ D_1 \sigma_n(x) & \cdots & D_n \sigma_n(x) \end{pmatrix},$$
where the $i$th column holds the coordinates of $(\sigma'(x))(\hat{y}_i)$, just like what was discussed in our notes on matrix representations. Now we can state the change of variables integration formula. Some additional mild hypotheses are needed for $f$ and $\sigma$ (such as continuity or measurability), so we will not worry about a rigorous proof of the following theorem.
Theorem 5. Suppose $\Omega$ is an open subset of $\mathbf{R}^n$ and $\sigma : \Omega \to \mathbf{R}^n$ is differentiable at every point of $\Omega$. If $f$ is a real-valued function defined on $\sigma(\Omega)$, then
$$\int_{\sigma(\Omega)} f(y) \, dy = \int_{\Omega} f(\sigma(x)) \, |\det \sigma'(x)| \, dx.$$
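Before the worked examples below, here is a rough numerical sanity check of Theorem 5, a sketch with an arbitrarily chosen integrand: integrating $f(x, y) = x^2 + y^2$ over the unit disk directly on a Cartesian grid should agree with the polar integral once the factor $r = |\det \sigma'|$ is included.

```python
import numpy as np

f = lambda x, y: x**2 + y**2

# Direct Cartesian integration over the unit disk.
xs = np.linspace(-1, 1, 2001)
X, Y = np.meshgrid(xs, xs)
mask = X**2 + Y**2 <= 1
dx = xs[1] - xs[0]
cartesian = np.sum(f(X, Y)[mask]) * dx * dx

# Polar integration with the Jacobian factor r.
rs = np.linspace(0, 1, 1001)
ts = np.linspace(0, 2 * np.pi, 1001)
R, TH = np.meshgrid(rs, ts)
dr, dt = rs[1] - rs[0], ts[1] - ts[0]
polar = np.sum(f(R * np.cos(TH), R * np.sin(TH)) * R) * dr * dt

print(cartesian, polar, np.pi / 2)  # all three should be close to pi/2
```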
Theorem 5 is called a change of variables formula because you can think of $y = \sigma(x)$ as a change of variables. The theorem should also not be a surprise to the reader, as we have already shown in Theorem 4 that $\operatorname{volume} T(\Omega) = |\det T| (\operatorname{volume} \Omega)$. The key point when making a change of variables is that the factor of $|\det \sigma'(x)|$ must be included when making the substitution $y = \sigma(x)$. We finish up by illustrating this point with two important examples.

2.3.1. Polar coordinates

Define $\sigma : \mathbf{R}^2 \to \mathbf{R}^2$ by
$$\sigma(r, \theta) = (r \cos\theta, r \sin\theta),$$
where $\sigma_1 = r \cos\theta$, $\sigma_2 = r \sin\theta$, and $y = (\Delta r, \Delta\theta)$ in the terminology introduced above. Then
$$\mathcal{M}(\sigma'(r, \theta)) = \begin{pmatrix} \cos\theta & -r \sin\theta \\ \sin\theta & r \cos\theta \end{pmatrix}.$$
The determinant of the matrix above equals $r$, thus explaining why a factor of $r$ is needed when computing an integral in polar coordinates. For example, the extra factor of $r$ appears when converting to polar coordinates the integral of $f$ over a disk in $\mathbf{R}^2$:
$$\int_{-1}^{1} \int_{-\sqrt{1 - x^2}}^{\sqrt{1 - x^2}} f(x, y) \, dy \, dx = \int_{0}^{2\pi} \int_{0}^{1} f(r \cos\theta, r \sin\theta) \, r \, dr \, d\theta.$$

2.3.2. Spherical coordinates

Define $\sigma : \mathbf{R}^3 \to \mathbf{R}^3$ by
$$\sigma(\rho, \varphi, \theta) = (\rho \sin\varphi \cos\theta, \rho \sin\varphi \sin\theta, \rho \cos\varphi).$$
For this choice of $\sigma$, the matrix of partial derivatives is
$$\mathcal{M}(\sigma'(\rho, \varphi, \theta)) = \begin{pmatrix} \sin\varphi \cos\theta & \rho \cos\varphi \cos\theta & -\rho \sin\varphi \sin\theta \\ \sin\varphi \sin\theta & \rho \cos\varphi \sin\theta & \rho \sin\varphi \cos\theta \\ \cos\varphi & -\rho \sin\varphi & 0 \end{pmatrix}.$$
The determinant of the matrix above equals $\rho^2 \sin\varphi$, thus explaining why a factor of $\rho^2 \sin\varphi$ is needed when computing an integral in spherical coordinates.

3. Where to go from here?

This ends my notes on preliminary linear algebra. Anything beyond what has been shown in this series can be termed an "advanced" aspect of linear algebra. For people more interested in applied mathematics, Hubbard's "Vector Calculus, Linear Algebra, and Differential Forms: A Unified Approach" could be your next step. People love the book because it discusses interesting physics problems (e.g., Maxwell's equations) by setting up tools from linear algebra. For self-learners, the book is also a good choice, as it is self-contained and accompanied by a solution manual. If one demands a broad scope of applications of linear algebra, then Brunton's "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" will likely satisfy one's curiosity. The author is not a pure-math person, so any recommendation from me on grad-level linear algebra books is not well justified. But the Springer graduate textbook seems to be a decent one for smart minds seeking to understand the elements introduced here in greater detail.

As preluded in my first note on linear algebra, this series serves as a solid prequel to my journey into quantum computing and quantum information theory. My current life has nothing to do with these two subjects, but the impetus of nudging myself toward the frontier of humanity has sustained over the years. Needless to say, I believe (maybe religiously) that many problems, whether scientifically challenging or socially controversial, can be solved by running quantum algorithms on quantum computers. When such achievements happen, I want to be there among the people who dream big and move quick. Sooner or later, I will write new notes with interesting stories to tell about quantum algorithms and quantum information theory. So please be patient, my dear readers, and thank you for following along with me so far.