An inner product space is a vector space equipped with an inner product. In much of the literature, the terms inner product and dot product are used interchangeably, although mathematicians point out that the difference shows up when moving between real and complex vector spaces. This note introduces inner product spaces themselves; the next note will discuss operators on inner product spaces in more detail.

1. Inner Products and Norms
1.1. Three important theorems
2. Orthonormal Bases
2.1. Riesz Representation Theorem
3. Orthogonal Complements
3.1. Minimization problem

1. Inner Products and Norms

To make our definition of the inner product rigorous, let us first define the dot product. Note that the dot product here is defined on the real vector space Rn.
Definition 1. For x,y∈Rn, the dot product of the two is defined as
$$x \cdot y = \sum_{i=1}^{n} x_i y_i$$
where x=(x1,…,xn) and y=(y1,…,yn).
With Definition 1, if we let x=y, then $x \cdot x = \|x\|^2 = \sum_i x_i^2$. Here ||x|| is called the norm of x, and it represents the Euclidean length of the vector x with its initial point at the origin. An inner product is a generalization of the dot product to complex vector spaces. For z∈Cn, the norm is defined by
$$\|z\|^2 = \sum_{i=1}^{n} z_i \bar{z}_i \qquad (1)$$
where $\bar{z}_i$ is the conjugate of the complex scalar zi. The norm definition above suggests that the inner product of w∈Cn with v∈Cn should equal
$$w_1\bar{v}_1 + \cdots + w_n\bar{v}_n.$$
On the other hand, the inner product of v with w should then be
$$v_1\bar{w}_1 + \cdots + v_n\bar{w}_n,$$
the complex conjugate of the inner product of w with v. To satisfy these constraints, we give the following definition of the inner product on both real and complex vector spaces.
Definition 2. An inner product on V takes each ordered pair (u,v) of elements of V to a number ⟨u,v⟩∈F.
It is worth noting here that the notation ⟨u,v⟩ is pervasive in quantum mechanics. Some texts prefer (u,v) for denoting the inner product, but we will stick with the former, as it closely resembles Dirac notation. For λ∈F and u,v,w∈V, the following properties of the inner product hold:
a) definiteness: ⟨v,v⟩=0 if and only if v=0;
b) positivity: ⟨v,v⟩≥0 for all v∈V;
c) additivity in the first slot: ⟨u+v,w⟩=⟨u,w⟩+⟨v,w⟩ for all u,v,w∈V;
d) homogeneity in the first slot: ⟨λu,v⟩=λ⟨u,v⟩ for all λ∈F and all u,v∈V;
e) conjugate symmetry: $\langle u,v\rangle = \overline{\langle v,u\rangle}$ for all u,v∈V;
f) ⟨0,u⟩=0 and ⟨u,0⟩=0 for every u∈V;
g) additivity in the second slot: ⟨u,v+w⟩=⟨u,v⟩+⟨u,w⟩ for all u,v,w∈V;
h) conjugate homogeneity in the second slot: $\langle u,\lambda v\rangle = \bar{\lambda}\langle u,v\rangle$ for all λ∈F and u,v∈V.
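To make the convention concrete, here is a minimal numerical sketch, assuming Python with NumPy (this note does not otherwise use code). One caveat: np.vdot conjugates its first argument, so matching our convention, which carries the conjugate in the second slot, requires swapping the arguments.

```python
import numpy as np

def inner(u, v):
    """<u, v> with the conjugate on the second slot, as in the text.
    np.vdot(a, b) computes sum(conj(a) * b), conjugating its first
    argument, so we swap the arguments to match our convention."""
    return np.vdot(v, u)

u = np.array([1 + 2j, 3 - 1j])
v = np.array([2 - 1j, 1j])
lam = 2 - 3j

# e) conjugate symmetry: <u, v> = conj(<v, u>)
assert np.isclose(inner(u, v), np.conj(inner(v, u)))
# d) homogeneity in the first slot: <lam*u, v> = lam * <u, v>
assert np.isclose(inner(lam * u, v), lam * inner(u, v))
# h) conjugate homogeneity in the second slot: <u, lam*v> = conj(lam) * <u, v>
assert np.isclose(inner(u, lam * v), np.conj(lam) * inner(u, v))
# b) positivity: <v, v> is real and non-negative
assert inner(v, v).real >= 0 and np.isclose(inner(v, v).imag, 0)
```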
The definition of the inner product gives rise to two other definitions: norms and orthogonality. Eqn.(1) already defines a norm by writing out a sum of products of a complex vector and its conjugate. The same definition can be rewritten in a more concise form:
Definition 3. The norm of v∈V is defined as $\|v\| = \sqrt{\langle v,v\rangle}$.
For v,u∈V, we can define orthogonality between the two as follows:
Definition 4. Two vectors v,u∈V are called orthogonal if ⟨v,u⟩=0.
The concise definitions above should make the importance of the inner product self-evident. In the next subsection we show how important theorems can be proved easily using inner products of vectors.

1.1. Three important theorems

We will introduce in this section three important theorems: the triangle inequality, the Cauchy–Schwarz inequality, and the parallelogram equality, and we will see the power of using inner products in their proofs. As a starter, let us first state the triangle inequality in terms of norms of vectors.
Theorem 1. For v,u∈V, we have ||v+u|| ≤ ||u|| + ||v||.
Proof: $\|v+u\|^2 = \langle v+u,v+u\rangle = \|v\|^2 + \|u\|^2 + \langle v,u\rangle + \langle u,v\rangle$. Using the conjugate symmetry of the inner product gives $\|v+u\|^2 = \|v\|^2 + \|u\|^2 + 2\,\mathrm{Re}\langle v,u\rangle \le \|v\|^2 + \|u\|^2 + 2\|u\|\|v\| = (\|u\|+\|v\|)^2$, where the inequality holds because $\mathrm{Re}\langle v,u\rangle \le |\langle v,u\rangle| \le \|u\|\|v\|$ by the Cauchy–Schwarz inequality proved below. Taking the square root of both sides gives the desired result. □

We next prove the parallelogram equality, which usually makes its appearance in basic geometry textbooks.

Theorem 2. For u,v∈V, we have $\|u+v\|^2 + \|u-v\|^2 = 2(\|u\|^2 + \|v\|^2)$.

Proof: Expand both norms as inner products. The cross terms ⟨u,v⟩+⟨v,u⟩ appear with opposite signs in the two expansions and cancel, leaving $2\|u\|^2 + 2\|v\|^2$. □
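Both results are easy to sanity-check numerically; here is a minimal sketch, assuming NumPy and random complex vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# triangle inequality: ||u + v|| <= ||u|| + ||v||
assert np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v)

# parallelogram equality: ||u+v||^2 + ||u-v||^2 = 2(||u||^2 + ||v||^2)
lhs = np.linalg.norm(u + v) ** 2 + np.linalg.norm(u - v) ** 2
rhs = 2 * (np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2)
assert np.isclose(lhs, rhs)
```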
We end this section by proving arguably the most important theorem for inner product spaces: the Cauchy–Schwarz inequality. To do so, we first introduce an orthogonal decomposition of vectors in inner product spaces:
Theorem 3. Suppose u,v∈V, with v≠0. Set $c = \frac{\langle u,v\rangle}{\|v\|^2}$ and $w = u - \frac{\langle u,v\rangle}{\|v\|^2}v$. Then ⟨w,v⟩=0 and u=cv+w.
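A small numerical illustration of Theorem 3, again assuming NumPy, with the inner helper conjugating the second slot as in Definition 2:

```python
import numpy as np

def inner(u, v):
    # <u, v> with the conjugate on the second slot
    return np.vdot(v, u)

rng = np.random.default_rng(1)
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

c = inner(u, v) / inner(v, v)   # c = <u,v> / ||v||^2
w = u - c * v                   # remainder orthogonal to v

assert np.isclose(inner(w, v), 0)    # <w, v> = 0
assert np.allclose(u, c * v + w)     # u = cv + w
# this decomposition immediately bounds |<u,v>| (Cauchy-Schwarz, proved next)
assert abs(inner(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v)
```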
Theorem 3 should make sense to the reader, as $\frac{\langle u,v\rangle}{\|v\|^2}v$ is the projection of u onto the direction of v. Now we are ready to prove the Cauchy–Schwarz inequality.
Theorem 4. Suppose u,v∈V. Then
$$|\langle u,v\rangle| \le \|u\| \cdot \|v\|.$$
This inequality is an equality if and only if one of u,v is a scalar multiple of the other.
Proof: By Theorem 3 we can write $u = \frac{\langle u,v\rangle}{\|v\|^2}v + w$ with ⟨w,v⟩=0. Taking the squared norm of both sides gives $\|u\|^2 = \frac{|\langle u,v\rangle|^2}{\|v\|^2} + \|w\|^2$, and hence $\|u\|^2\|v\|^2 = |\langle u,v\rangle|^2 + \|v\|^2\|w\|^2$. Because $\|v\|^2\|w\|^2 \ge 0$, we have $|\langle u,v\rangle| \le \|u\|\cdot\|v\|$. Also, $|\langle u,v\rangle| = \|u\|\cdot\|v\|$ means $\|u\|^2 = \frac{|\langle u,v\rangle|^2}{\|v\|^2}$, i.e., ||w||=0 and u=cv. Therefore the equality is satisfied if and only if one of u,v is a scalar multiple of the other. □
2. Orthonormal Bases

We discussed in the note on bases and dimensions that a basis of a vector space is a list of linearly independent vectors that spans the space. Now that we have the definition of orthogonality in our pocket, we can take a step further and demand that the basis vectors be mutually orthogonal with norms equal to 1. A basis satisfying these constraints is called an orthonormal basis, the basis most used in quantum-mechanical calculations. That we can always make this demand is guaranteed by the following proposition:

Proposition 1. Every finite-dimensional inner product space has an orthonormal basis.
In this section, we will first give the definition of orthonormal bases and then discuss how to obtain them from a list of linearly independent vectors through the Gram–Schmidt procedure. Let us make the definition of an orthonormal basis formal:
Definition 5. A basis e1,e2,…,en of V is called orthonormal if
$$\langle e_j, e_k\rangle = \begin{cases} 1 & \text{if } j=k \\ 0 & \text{if } j\neq k. \end{cases}$$
It turns out that an arbitrary vector v∈V can be expressed in terms of an orthonormal basis as
$$v = \sum_{i=1}^{n} \langle v, e_i\rangle e_i \qquad (2)$$
where n=dim V. To see this, suppose v=a1e1+⋯+anen; taking the inner product of both sides with ei and using Definition 5 gives ai=⟨v,ei⟩. As one might expect, the norm of v is then given by
$$\|v\|^2 = \sum_{i=1}^{n} |\langle v, e_i\rangle|^2.$$
To prove this, let ai=⟨v,ei⟩ and v=a1e1+a2e2+⋯+anen. Then $\|v\|^2 = \langle v,v\rangle = a_1\bar{a}_1 + a_2\bar{a}_2 + \cdots + a_n\bar{a}_n$ due to Definition 5 and properties d) and h) of the inner product. It is now obvious that an orthonormal basis, empowered by the inner product, is a natural choice for inner product spaces. Furthermore, if we have a list of linearly independent vectors, we can convert it into a list of orthonormal vectors using the Gram–Schmidt procedure:
Theorem 5. Suppose v1,…,vm is a linearly independent list of vectors in V. Let $e_1 = v_1/\|v_1\|$. For j=2,…,m, define ej inductively by
$$e_j = \frac{v_j - \langle v_j,e_1\rangle e_1 - \cdots - \langle v_j,e_{j-1}\rangle e_{j-1}}{\|v_j - \langle v_j,e_1\rangle e_1 - \cdots - \langle v_j,e_{j-1}\rangle e_{j-1}\|}.$$
Then e1,…,em is an orthonormal list of vectors in V such that
$$\mathrm{span}(v_1,\ldots,v_j) = \mathrm{span}(e_1,\ldots,e_j)$$
for j=1,…,m.
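Theorem 5 translates directly into a short routine. The sketch below assumes NumPy, and the function name gram_schmidt is ours; it also checks the expansion from Eqn.(2).

```python
import numpy as np

def inner(u, v):
    # <u, v> with the conjugate on the second slot
    return np.vdot(v, u)

def gram_schmidt(vs):
    """Orthonormalize a linearly independent list of vectors (Theorem 5)."""
    es = []
    for v in vs:
        # subtract the components along the already-built e_1, ..., e_{j-1}
        for e in es:
            v = v - inner(v, e) * e
        es.append(v / np.linalg.norm(v))
    return es

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs)

# orthonormality: <e_j, e_k> = 1 if j == k else 0 (Definition 5)
for j, ej in enumerate(es):
    for k, ek in enumerate(es):
        assert np.isclose(inner(ej, ek), 1.0 if j == k else 0.0)

# expansion of an arbitrary vector in the orthonormal basis, Eqn.(2)
v = np.array([2.0, -1.0, 3.0])
assert np.allclose(v, sum(inner(v, e) * e for e in es))
```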
A common piece of wishful thinking is that the matrix of an operator with respect to an orthonormal basis is diagonal. This is usually not the case. But the next theorem shows that if an operator has an upper-triangular matrix with respect to some basis, it also has an upper-triangular matrix with respect to an orthonormal basis. Note that the following theorem holds for both real and complex vector spaces.
Theorem 6. Suppose T∈L(V). If T has an upper-triangular matrix with respect to some basis of V, then T has an upper-triangular matrix with respect to some orthonormal basis of V.
Proof: Let v1,…,vn be a basis of V with respect to which T has an upper-triangular matrix, and obtain an orthonormal basis e1,…,en from it via the Gram–Schmidt procedure in Theorem 5. By condition c) for having an upper-triangular matrix (see the previous note), span(v1,…,vj) is invariant under T for each j. Because span(v1,…,vj)=span(e1,…,ej) for each j, every span(e1,…,ej) is also invariant under T, so T has an upper-triangular matrix with respect to the orthonormal basis. □

Theorem 6 has many applications, one of which is helping to prove Schur's theorem:
Theorem 7. Suppose V is a finite-dimensional complex vector space and T∈L(V). Then T has an upper-triangular matrix with respect to some orthonormal basis of V.
Proof: Recall the theorem from a previous note that every operator on a finite-dimensional complex vector space has an upper-triangular matrix with respect to some basis; applying Theorem 6 then proves Theorem 7. □
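Theorem 7 is exactly what numerical libraries compute as the complex Schur decomposition A = QTQ*, where the columns of the unitary Q form the orthonormal basis and T is upper-triangular. A minimal sketch, assuming SciPy:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# complex Schur form: A = Q T Q^*, Q unitary, T upper-triangular
T, Q = schur(A, output='complex')

assert np.allclose(A, Q @ T @ Q.conj().T)        # A = Q T Q^*
assert np.allclose(Q.conj().T @ Q, np.eye(4))    # columns of Q are orthonormal
assert np.allclose(T, np.triu(T))                # T is upper-triangular
```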
2.1. Riesz Representation Theorem

Theorems 6 and 7 describe what operator matrices can look like with respect to an orthonormal basis, and we will discuss operators on inner product spaces further in the next note. For now, let us confine our scope to one special kind of operator, the linear functional. We briefly discussed it in our note on the duality of vector spaces; its formal definition is the following:

Definition 6. A linear functional on V is a linear map from V to F. In other words, a linear functional is an element of the dual space of V, L(V,F).
For a fixed u∈V, the map taking v to ⟨v,u⟩ is a linear functional. The following theorem (the Riesz representation theorem) shows that every linear functional on an inner product space is essentially an inner product.
Theorem 8. Suppose V is finite-dimensional and φ is a linear functional on V. Then there is a unique vector u∈V such that
$$\varphi(v) = \langle v,u\rangle$$
for every v∈V.
Proof: We include the proof here because it shows some tricks that will be used repeatedly in our quantum computing/information notes. Suppose e1,e2,…,en is an orthonormal basis of V. We then have
$$v = \langle v,e_1\rangle e_1 + \cdots + \langle v,e_n\rangle e_n$$
and
$$\varphi(v) = \langle v,e_1\rangle\varphi(e_1) + \cdots + \langle v,e_n\rangle\varphi(e_n) = \langle v,\overline{\varphi(e_1)}\,e_1\rangle + \cdots + \langle v,\overline{\varphi(e_n)}\,e_n\rangle = \langle v,\overline{\varphi(e_1)}\,e_1 + \cdots + \overline{\varphi(e_n)}\,e_n\rangle,$$
where the first equality is from the linearity of the functional, and the second is due to property h) of the inner product. Because the $\overline{\varphi(e_i)}$ are scalars, we recover the form φ(v)=⟨v,u⟩ if we let
$$u = \overline{\varphi(e_1)}\,e_1 + \cdots + \overline{\varphi(e_n)}\,e_n. \qquad (3)$$
To prove the uniqueness of u, suppose we have two vectors, u1 and u2, that represent φ in this way. Then φ(v)=⟨v,u1⟩=⟨v,u2⟩ for every v∈V, so 0=⟨v,u1⟩−⟨v,u2⟩=⟨v,u1−u2⟩ for every v∈V. Taking v=u1−u2 gives ||u1−u2||²=0, which indicates that u1=u2. □

The u in Theorem 8 is also called the Riesz representation of the linear functional φ. More importantly, Eqn.(3) gives a way to calculate the Riesz representation. While Eqn.(3) seems to depend on the choice of orthonormal basis, we emphasize here that the Riesz representation is determined only by its associated linear functional, and the right-hand side of the equation remains the same regardless of the choice of {ei}.
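Eqn.(3) is straightforward to implement. The sketch below assumes NumPy; the functional phi is a hypothetical example chosen for illustration.

```python
import numpy as np

def inner(u, v):
    # <u, v> with the conjugate on the second slot
    return np.vdot(v, u)

# an example linear functional on C^3: phi(v) = a . v for a fixed row a
a = np.array([1 + 1j, 2.0, -1j])
phi = lambda v: np.sum(a * v)

# the standard basis of C^3 is already orthonormal
es = [np.eye(3, dtype=complex)[i] for i in range(3)]

# Eqn.(3): u = conj(phi(e_1)) e_1 + ... + conj(phi(e_n)) e_n
u = sum(np.conj(phi(e)) * e for e in es)

rng = np.random.default_rng(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
assert np.isclose(phi(v), inner(v, u))   # phi(v) = <v, u>
```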
3. Orthogonal Complements

As an extension to Definition 4, we can define the concept of the orthogonal complement for subsets of an inner product space.

Definition 7. If U is a subset of V, then the orthogonal complement of U, denoted U⟂, is the set of all vectors in V that are orthogonal to every vector in U:
$$U^\perp = \{v \in V : \langle v,u\rangle = 0 \text{ for every } u \in U\}.$$
With Definition 7, the following theorem holds as expected:
Theorem 9. Suppose V is finite-dimensional and U is a subspace of V. Then
a) dim U⟂ = dim V − dim U;
b) V = U ⊕ U⟂;
c) U = (U⟂)⟂.
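Part a) can be sanity-checked numerically. The sketch below assumes SciPy and represents U as the column space of a real matrix A, so that U⟂ is the null space of the transpose of A.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2))   # columns of A span U inside R^5

# v is orthogonal to every column of A  <=>  A^T v = 0, so U-perp = null(A^T)
B = null_space(A.T)               # columns of B: orthonormal basis of U-perp

dim_U = np.linalg.matrix_rank(A)
assert B.shape[1] == 5 - dim_U    # dim U-perp = dim V - dim U
assert np.allclose(A.T @ B, 0)    # each basis vector of U-perp is orthogonal to U
```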
The definition of the orthogonal complement gives rise to the orthogonal projection operator, which projects a vector onto a specific subspace.
Definition 8. Suppose U is a finite-dimensional subspace of V. The orthogonal projection of V onto U is the operator PU∈L(V) defined as follows: for v∈V, write v=u+w, where u∈U and w∈U⟂. Then PUv=u.
As an example, suppose v∈V and U=span(v). Then for u∈V, $P_U u = \frac{\langle u,v\rangle}{\|v\|^2}v$. Note here that $\frac{\langle u,v\rangle}{\|v\|^2}v$ and $u - \frac{\langle u,v\rangle}{\|v\|^2}v$ form the orthogonal decomposition of u from Theorem 3, and $\frac{\langle u,v\rangle}{\|v\|^2}v$ is a scalar multiple of v. Thus, $u - \frac{\langle u,v\rangle}{\|v\|^2}v \in U^\perp$. Before we discuss the minimization problem using orthogonal projection, let us simply list its properties:
Suppose U is a finite-dimensional subspace of V and v∈V. Then
(a) PU∈L(V);
(b) PUu=u for every u∈U;
(c) PUw=0 for every w∈U⟂;
(d) range PU=U;
(e) null PU=U⟂;
(f) v−PUv∈U⟂;
(g) PU²=PU;
(h) ||PUv|| ≤ ||v||;
(i) for every orthonormal basis e1,…,em of U, PUv=⟨v,e1⟩e1+⋯+⟨v,em⟩em.
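Property (i) gives a direct recipe for computing PU: orthonormalize a basis of U and sum the projections onto each basis vector. A minimal sketch, assuming NumPy (project_onto is our name for the hypothetical helper):

```python
import numpy as np

def gram_schmidt(vs):
    # orthonormalize a linearly independent list (Theorem 5)
    es = []
    for v in vs:
        for e in es:
            v = v - np.vdot(e, v) * e
        es.append(v / np.linalg.norm(v))
    return es

def project_onto(v, basis_of_U):
    # property (i): P_U v = <v,e_1> e_1 + ... + <v,e_m> e_m
    es = gram_schmidt(basis_of_U)
    return sum(np.vdot(e, v) * e for e in es)

U = [np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
v = np.array([1.0, 2.0, 3.0])
p = project_onto(v, U)

assert np.allclose(project_onto(p, U), p)        # (g) P_U^2 = P_U
assert np.linalg.norm(p) <= np.linalg.norm(v)    # (h) ||P_U v|| <= ||v||
# (f) v - P_U v is orthogonal to U
assert all(np.isclose(np.vdot(u, v - p), 0) for u in U)
```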
3.1. Minimization problem

Given a subspace U⊂V and a vector v∈V, a natural question is: how can one find u∈U such that the norm ||v−u|| is minimized? The following proposition gives the answer to the question:
Proposition 2. Suppose U⊂V is a finite-dimensional subspace of V. For v∈V and u∈U, we have
$$\|v-u\| \ge \|v - P_U v\|,$$
where the equality is satisfied if and only if u=PUv.
As an example of using Proposition 2 (taken from Axler's book), let us find a real function u(x) such that the value of
$$\int_{-\pi}^{\pi} |\sin x - u(x)|^2 \, dx \qquad (4)$$
is minimized. On the vector space of real functions defined on [−π,π], we can define the associated inner product
$$\langle f, g\rangle = \int_{-\pi}^{\pi} f(x)\,g(x) \, dx. \qquad (5)$$
If we let v(x)=sin x, then Eqn.(4) is just ⟨v−u,v−u⟩=||v−u||². With this observation, the original problem becomes finding a u(x) that minimizes ||v−u||. According to Proposition 2, such a function must be PUv. Since there is no restriction on how to choose U, we can let U be the space of polynomials with real coefficients and degree at most 5, i.e., P5(R). A basis of P5(R) is 1, x, x², x³, x⁴, x⁵, which we make orthonormal through the Gram–Schmidt procedure. Next, we can use Eqn.(5) and property (i) of the orthogonal projection to calculate u(x) as
$$u(x) = 0.987862\,x - 0.155271\,x^3 + 0.00564312\,x^5.$$
The following plot shows how close u(x) (green) and sin x (blue) are within [−π,π]. It is worth noting that such closeness is not achieved through a Taylor expansion! And finally, ladies and gentlemen, we have a colored plot in our black-and-white notes on Linear Algebra!
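For readers who want to reproduce u(x), here is a minimal sketch assuming NumPy, approximating the integrals in Eqn.(5) by the trapezoidal rule on a fine grid:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 200001)

def inner(f, g):
    # Eqn.(5): <f, g> = integral of f*g over [-pi, pi] (trapezoidal rule;
    # np.trapezoid is NumPy >= 2.0, use np.trapz on older versions)
    return np.trapezoid(f * g, x)

# basis 1, x, ..., x^5 of P_5(R), sampled on the grid
vs = [x**k for k in range(6)]

# Gram-Schmidt (Theorem 5) with the integral inner product
es = []
for v in vs:
    for e in es:
        v = v - inner(v, e) * e
    es.append(v / np.sqrt(inner(v, v)))

# property (i): u = P_U sin = sum_i <sin, e_i> e_i
sin_x = np.sin(x)
u = sum(inner(sin_x, e) * e for e in es)

# read off the polynomial coefficients of u (highest degree first);
# prints approximately [0.00564312, 0, -0.155271, 0, 0.987862, 0]
print(np.polyfit(x, u, 5))
```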