An inner product space is a vector space equipped with an inner product. In much of the literature the terms inner product and dot product are used interchangeably, although the distinction matters once one moves from real to complex vector spaces. This note introduces inner product spaces themselves; the next note will discuss operators on inner product spaces in more detail.

1. Inner Products and Norms
1.1. Three important theorems
2. Orthonormal Bases
2.1. Riesz Representation Theorem
3. Orthogonal Complements
3.1. Minimization problem

1. Inner Products and Norms

To make our definition of inner product rigorous, let us first define the dot product. Note that the dot product here is defined on the real vector spaces $\mathbf{R}^n$.
Definition 1. For $x, y \in \mathbf{R}^n$, the dot product of the two is defined as
$$x \cdot y = \sum_{i=1}^{n} x_i y_i$$
where $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$.
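As a quick numerical illustration, here is a minimal sketch using numpy (the vectors are chosen arbitrarily):

```python
import numpy as np

# Two arbitrary vectors in R^3
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 0.5])

# Dot product: the sum of componentwise products
dot_xy = np.sum(x * y)   # equivalent to np.dot(x, y)
print(dot_xy)            # 1*4 + 2*(-1) + 3*0.5 = 3.5
```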
With Definition 1, if we let $x = y$, then $x \cdot x = \|x\|^2 = \sum_{i=1}^{n} x_i^2$. Here $\|x\|$ is called the norm of $x$, and it represents the Euclidean length of the vector $x$ with its initial point at the origin. An inner product is a generalization of the dot product to complex vector spaces. For $z \in \mathbf{C}^n$, its norm is defined as
$$\|z\| = \sqrt{\sum_{i=1}^{n} z_i \bar{z}_i} \tag{1}$$
where $\bar{z}_i$ is the conjugate of the complex scalar $z_i$. The norm definition above suggests that the inner product of $w \in \mathbf{C}^n$ with $v \in \mathbf{C}^n$ should be equal to
$$w_1 \bar{v}_1 + \cdots + w_n \bar{v}_n.$$
On the other hand, the inner product of $v$ with $w$ should then be
$$v_1 \bar{w}_1 + \cdots + v_n \bar{w}_n,$$
the complex conjugate of the inner product of $w$ with $v$. To satisfy these constraints we give the following definition of the inner product on both real and complex vector spaces.
Definition 2. An inner product on $V$ is a function that takes each ordered pair $(u, v)$ of elements of $V$ to a number $\langle u, v \rangle \in \mathbf{F}$.
It is worth noting here that the notation $\langle u, v \rangle$ is pervasive in quantum mechanics. Some of the literature prefers $(u, v)$ for denoting the inner product, but we will stick with the former as it closely resembles Dirac's notation. For $\lambda \in \mathbf{F}$ and $u, v \in V$, the inner product has the following properties:
a) definiteness: $\langle v, v \rangle = 0$ if and only if $v = 0$.
b) positivity: $\langle v, v \rangle \geq 0$ for all $v \in V$.
c) additivity in the first slot: $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ for all $u, v, w \in V$.
d) homogeneity in the first slot: $\langle \lambda u, v \rangle = \lambda \langle u, v \rangle$ for all $\lambda \in \mathbf{F}$ and all $u, v \in V$.
e) conjugate symmetry: $\langle u, v \rangle = \overline{\langle v, u \rangle}$ for all $u, v \in V$.
f) $\langle 0, u \rangle = 0$ and $\langle u, 0 \rangle = 0$ for every $u \in V$.
g) additivity in the second slot: $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$ for all $u, v, w \in V$.
h) conjugate homogeneity in the second slot: $\langle u, \lambda v \rangle = \bar{\lambda} \langle u, v \rangle$ for all $\lambda \in \mathbf{F}$ and $u, v \in V$.
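These properties are easy to check numerically. Below is a minimal sketch, assuming the standard inner product on $\mathbf{C}^n$; the vectors and the scalar are arbitrary:

```python
import numpy as np

def inner(w, v):
    # Standard inner product on C^n: <w, v> = sum_i w_i * conj(v_i)
    return np.sum(w * np.conj(v))

w = np.array([1 + 2j, 3 - 1j])
v = np.array([2 - 1j, -1 + 4j])
lam = 0.5 - 2j

# e) conjugate symmetry: <w, v> = conj(<v, w>)
print(np.isclose(inner(w, v), np.conj(inner(v, w))))               # True

# d) homogeneity in the first slot: <lam w, v> = lam <w, v>
print(np.isclose(inner(lam * w, v), lam * inner(w, v)))            # True

# h) conjugate homogeneity in the second slot: <w, lam v> = conj(lam) <w, v>
print(np.isclose(inner(w, lam * v), np.conj(lam) * inner(w, v)))   # True
```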
The definition of the inner product gives rise to two other definitions: norms and orthogonality. Eqn.(1) already defines a norm by writing out a sum of products of a complex vector's components and their conjugates. The same definition can be rewritten in a more concise form:
Definition 3. The norm of $v \in V$ is defined as $\|v\| = \sqrt{\langle v, v \rangle}$.
For $v, u \in V$, we can define the orthogonality between the two as follows:
Definition 4. Two vectors $v, u \in V$ are called orthogonal if $\langle v, u \rangle = 0$.
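Note that orthogonality in a complex vector space depends on the conjugate in the inner product. A small sketch, with vectors chosen purely for illustration:

```python
import numpy as np

w = np.array([1.0, 1j])
v = np.array([1j, 1.0])

# <w, v> = 1*conj(1j) + 1j*conj(1) = -1j + 1j = 0
print(np.sum(w * np.conj(v)))   # 0j, so w and v are orthogonal
```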
The concise definitions above should make the importance of the inner product self-evident. In the next subsection we show how important theorems can be proved easily using inner products of vectors.

1.1. Three important theorems

We will introduce in this section three important theorems: the triangle inequality, the Cauchy–Schwarz inequality, and the parallelogram equality. We will see the power of inner products in the proofs. As a starter, let us state the triangle inequality in terms of norms of vectors:
Theorem 1. For $v, u \in V$, we have $\|v + u\| \leq \|u\| + \|v\|$.
Proof: $\|v + u\|^2 = \langle v + u, v + u \rangle = \|v\|^2 + \|u\|^2 + \langle v, u \rangle + \langle u, v \rangle$. Using the conjugate symmetry of the inner product gives $\|v + u\|^2 = \|v\|^2 + \|u\|^2 + 2\,\mathrm{Re}\langle v, u \rangle \leq \|v\|^2 + \|u\|^2 + 2\|u\|\|v\| = (\|u\| + \|v\|)^2$, where the inequality step uses the Cauchy–Schwarz inequality (Theorem 4 below). Taking the square root of both sides gives the desired result. We next prove the parallelogram equality, which usually makes its appearance in basic geometry textbooks.
Theorem 2. Suppose $u, v \in V$. Then
$$\|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2).$$
Proof:
$$\begin{aligned}
\|u + v\|^2 + \|u - v\|^2 &= \langle u + v, u + v \rangle + \langle u - v, u - v \rangle \\
&= \|u\|^2 + \|v\|^2 + \langle u, v \rangle + \langle v, u \rangle + \|u\|^2 + \|v\|^2 - \langle u, v \rangle - \langle v, u \rangle \\
&= 2(\|u\|^2 + \|v\|^2).
\end{aligned}$$
We end this section by proving arguably the most important theorem for inner product spaces: the Cauchy–Schwarz inequality. To do so, we first need an orthogonal decomposition of a vector in an inner product space:
Theorem 3. Suppose $u, v \in V$, with $v \neq 0$. Set $c = \frac{\langle u, v \rangle}{\|v\|^2}$ and $w = u - \frac{\langle u, v \rangle}{\|v\|^2} v$. Then $\langle w, v \rangle = 0$ and $u = cv + w$.
Theorem 3 should make sense to the reader, as $\frac{\langle u, v \rangle}{\|v\|}$ is the length of the projection of $u$ along the direction of $v$; a quick numerical check of the decomposition is sketched below.
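A minimal sketch verifying Theorem 3 (real vectors chosen arbitrarily; the same code works for complex vectors):

```python
import numpy as np

def inner(u, v):
    return np.sum(u * np.conj(v))

u = np.array([3.0, 1.0, -2.0])
v = np.array([1.0, 1.0, 1.0])

c = inner(u, v) / inner(v, v)   # c = <u, v> / ||v||^2
w = u - c * v                   # w = u - c v

print(np.isclose(inner(w, v), 0.0))   # True: w is orthogonal to v
print(np.allclose(u, c * v + w))      # True: u = c v + w
```

With this decomposition in hand, we are ready to prove the Cauchy–Schwarz inequality.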
Theorem 4. Suppose $u, v \in V$. Then
$$|\langle u, v \rangle| \leq \|u\| \, \|v\|.$$
This inequality is an equality if and only if one of $u, v$ is a scalar multiple of the other.
Proof: We can use Theorem 3 to decompose $u = \frac{\langle u, v \rangle}{\|v\|^2} v + w$ with $\langle w, v \rangle = 0$. Taking the norm squared of both sides and using the orthogonality of the two terms gives $\|u\|^2 = \frac{|\langle u, v \rangle|^2}{\|v\|^2} + \|w\|^2$, i.e., $\|u\|^2 \|v\|^2 = |\langle u, v \rangle|^2 + \|v\|^2 \|w\|^2$. Because $\|v\|^2 \|w\|^2 \geq 0$, we have $|\langle u, v \rangle| \leq \|u\| \|v\|$. Also, $|\langle u, v \rangle| = \|u\| \|v\|$ means $\|u\|^2 = \frac{|\langle u, v \rangle|^2}{\|v\|^2}$, i.e., $\|w\| = 0$ and $u = \frac{\langle u, v \rangle}{\|v\|^2} v$. Therefore the equality holds if and only if one of $u, v$ is a scalar multiple of the other.

2. Orthonormal Bases

We discussed in the note on bases and dimensions that a basis of a vector space is a list of linearly independent vectors that spans the space. Now that we have the definition of orthogonality in our pocket, we can take a step further and require the basis vectors to be mutually orthogonal with norm 1. A basis satisfying these constraints is called an orthonormal basis, the basis most used in quantum-mechanical calculations. The choice of an orthonormal basis is always available thanks to the following proposition:
Proposition 1. Every finite-dimensional inner product space has an orthonormal basis.
In this section, we will first give the definition of an orthonormal basis and then discuss how to obtain one from a list of linearly independent vectors through the Gram–Schmidt procedure. Let us make the definition of an orthonormal basis formal:
Definition 5. A basis $e_1, e_2, \dots, e_n$ of $V$ is called orthonormal if
$$\langle e_j, e_k \rangle = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \neq k. \end{cases}$$
It turns out that an arbitrary vector $v \in V$ can be expressed in terms of an orthonormal basis as
$$v = \sum_{i=1}^{n} \langle v, e_i \rangle e_i \tag{2}$$
where $n = \dim V$. To see this, suppose $v = a_1 e_1 + \cdots + a_n e_n$; then $a_i = \langle v, e_i \rangle$ by Definition 5. As one might expect, the norm of $v$ is then
$$\|v\| = \sqrt{\sum_{i=1}^{n} |\langle v, e_i \rangle|^2}.$$
To prove this, let $a_i = \langle v, e_i \rangle$ and $v = a_1 e_1 + a_2 e_2 + \cdots + a_n e_n$, so that $\|v\|^2 = \langle v, v \rangle = a_1 \bar{a}_1 + a_2 \bar{a}_2 + \cdots + a_n \bar{a}_n$ by Definition 5 and property (h) of the inner product. It is now obvious that an orthonormal basis, empowered by the inner product, is a natural choice for inner product spaces. Furthermore, if we have a list of linearly independent vectors, we can convert it into a list of orthonormal vectors using the Gram–Schmidt procedure (a code sketch follows the theorem):
Theorem 5. Suppose $v_1, \dots, v_m$ is a linearly independent list of vectors in $V$. Let $e_1 = v_1 / \|v_1\|$. For $j = 2, \dots, m$, define $e_j$ inductively by
$$e_j = \frac{v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}}{\|v_j - \langle v_j, e_1 \rangle e_1 - \cdots - \langle v_j, e_{j-1} \rangle e_{j-1}\|}.$$
Then $e_1, \dots, e_m$ is an orthonormal list of vectors in $V$ such that
$$\mathrm{span}(v_1, \dots, v_j) = \mathrm{span}(e_1, \dots, e_j)$$
for $j = 1, \dots, m$.
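The procedure translates directly into code. Here is a minimal sketch (the test vectors are arbitrary); it also checks the expansion formula (2):

```python
import numpy as np

def inner(u, v):
    return np.sum(u * np.conj(v))

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors (Theorem 5)."""
    es = []
    for v in vectors:
        # Subtract the projections onto the previously built e_1, ..., e_{j-1}
        for e in es:
            v = v - inner(v, e) * e
        es.append(v / np.linalg.norm(v))
    return es

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(vs)

# Orthonormality: <e_j, e_k> should be 1 if j == k and 0 otherwise
G = np.array([[inner(ej, ek) for ek in es] for ej in es])
print(np.allclose(G, np.eye(3)))   # True

# Expansion formula (2): v = sum_i <v, e_i> e_i
v = np.array([2.0, -1.0, 3.0])
print(np.allclose(v, sum(inner(v, e) * e for e in es)))   # True
```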
A common piece of wishful thinking is that the matrix of an operator with respect to an orthonormal basis is diagonal. This is usually not the case. But the next theorem shows that if an operator has an upper-triangular matrix with respect to some basis, it also has an upper-triangular matrix with respect to some orthonormal basis. Note that the following theorem holds for both real and complex vector spaces.
Theorem 6. Suppose $T \in \mathcal{L}(V)$. If $T$ has an upper-triangular matrix with respect to some basis of $V$, then $T$ has an upper-triangular matrix with respect to some orthonormal basis of $V$.
Proof: Let $v_1, \dots, v_n$ be a basis of $V$ with respect to which $T$ has an upper-triangular matrix, and let $e_1, \dots, e_n$ be the orthonormal basis obtained from it by the Gram–Schmidt procedure in Theorem 5. By the condition c) for having an upper-triangular matrix (from our previous note), each $\mathrm{span}(v_1, \dots, v_j)$ is invariant under $T$. Because $\mathrm{span}(e_1, \dots, e_j) = \mathrm{span}(v_1, \dots, v_j)$ for each $j$, each $\mathrm{span}(e_1, \dots, e_j)$ is invariant under $T$ as well, so $T$ has an upper-triangular matrix with respect to the orthonormal basis. Theorem 6 has many applications, one of which is helping to prove Schur's theorem:
Theorem 7. Suppose $V$ is a finite-dimensional complex vector space and $T \in \mathcal{L}(V)$. Then $T$ has an upper-triangular matrix with respect to some orthonormal basis of $V$.
Proof: Recall from the previous notes that every operator on a finite-dimensional complex vector space has an upper-triangular matrix with respect to some basis; applying Theorem 6 then proves Theorem 7.

2.1. Riesz Representation Theorem

Theorem 6 and Theorem 7 tell us what operator matrices can look like with respect to an orthonormal basis, and we will discuss operators on inner product spaces in more detail in the next note. For now, let us confine our scope to one special kind of linear map, the linear functional. We briefly discussed it in our note on the duality of vector spaces; its formal definition is the following:
Definition 6. A linear functional on $V$ is a linear map from $V$ to $\mathbf{F}$. In other words, a linear functional is an element of the dual space of $V$, $\mathcal{L}(V, \mathbf{F})$.
For a fixed $u \in V$, the map $v \mapsto \langle v, u \rangle$ is a linear functional. The following theorem (the Riesz representation theorem) shows that every linear functional on a finite-dimensional inner product space is essentially an inner product.
Theorem 8. Suppose $V$ is finite-dimensional and $\varphi$ is a linear functional on $V$. Then there is a unique vector $u \in V$ such that
$$\varphi(v) = \langle v, u \rangle$$
for every $v \in V$.
Proof: We include the proof here because it shows some tricks that will be used repeatedly in our quantum computing/information theory notes. Suppose $e_1, e_2, \dots, e_n$ is an orthonormal basis of $V$. We have
$$v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_n \rangle e_n$$
and
$$\varphi(v) = \langle v, e_1 \rangle \varphi(e_1) + \cdots + \langle v, e_n \rangle \varphi(e_n) = \langle v, \overline{\varphi(e_1)} e_1 \rangle + \cdots + \langle v, \overline{\varphi(e_n)} e_n \rangle = \langle v, \overline{\varphi(e_1)} e_1 + \cdots + \overline{\varphi(e_n)} e_n \rangle,$$
where the first equality follows from the expansion (2) and the linearity of the functional, the second from property h) of the inner product, and the third from property g). Because the $\overline{\varphi(e_i)}$ are scalars, we recover the form $\varphi(v) = \langle v, u \rangle$ if we let
$$u = \overline{\varphi(e_1)} e_1 + \cdots + \overline{\varphi(e_n)} e_n. \tag{3}$$
To prove the uniqueness of $u$, suppose we have two vectors, $u_1$ and $u_2$, that both represent $\varphi$ through the inner product. Then we have
$$\varphi(v) = \langle v, u_1 \rangle = \langle v, u_2 \rangle,$$
so $0 = \langle v, u_1 \rangle - \langle v, u_2 \rangle = \langle v, u_1 - u_2 \rangle$ for every $v \in V$; taking $v = u_1 - u_2$ shows that $u_1 = u_2$. The $u$ in Theorem 8 is also called the Riesz representation of the linear functional $\varphi$. More importantly, Eqn.(3) gives a way to calculate the Riesz representation. While Eqn.(3) seems to depend on the choice of orthonormal basis, we emphasize here that the Riesz representation is determined solely by its associated linear functional, and the right-hand side of the equation remains the same regardless of the choice of $\{e_i\}$.
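Eqn.(3) is easy to use in practice. A minimal sketch follows; the functional $\varphi$ below is an arbitrary example on $\mathbf{R}^3$, and we take the standard basis as the orthonormal basis:

```python
import numpy as np

def inner(v, u):
    return np.sum(v * np.conj(u))

# An arbitrary linear functional on R^3, for illustration only
def phi(v):
    return 2.0 * v[0] - v[1] + 3.0 * v[2]

# Standard (orthonormal) basis of R^3
es = [np.eye(3)[i] for i in range(3)]

# Riesz representation, Eqn.(3): u = conj(phi(e_1)) e_1 + ... + conj(phi(e_n)) e_n
u = sum(np.conj(phi(e)) * e for e in es)

v = np.array([1.0, 4.0, -2.0])
print(np.isclose(phi(v), inner(v, u)))   # True: phi(v) = <v, u>
```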
3. Orthogonal Complements

As an extension of Definition 4, we can define the concept of orthogonal complement for subspaces of an inner product space.

Definition 7. If $U$ is a subset of $V$, then the orthogonal complement of $U$, denoted $U^{\perp}$, is the set of all vectors in $V$ that are orthogonal to every vector in $U$:
$$U^{\perp} = \{ v \in V : \langle v, u \rangle = 0 \text{ for every } u \in U \}.$$
With Definition 7, the following propositions and theorems hold as expected:
Theorem 9. Suppose $V$ is finite-dimensional and $U$ is a subspace of $V$. Then
$$\dim U^{\perp} = \dim V - \dim U, \qquad V = U \oplus U^{\perp}, \qquad U = (U^{\perp})^{\perp}.$$
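A numerical sketch of the dimension formula: the subspace $U$ below is spanned by two arbitrary vectors in $\mathbf{R}^4$, and a basis of $U^{\perp}$ is read off from the singular value decomposition:

```python
import numpy as np

# Rows of A span a subspace U of R^4
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 0.0]])

_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))        # dim U
U_perp = Vt[rank:]                   # rows spanning U-perp (the null space of A)

print(rank, U_perp.shape[0])         # 2 2  -> dim U-perp = dim V - dim U = 4 - 2
print(np.allclose(A @ U_perp.T, 0))  # True: every vector in U-perp is orthogonal to U
```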
The definition of the orthogonal complement gives rise to the orthogonal projection operator, which projects a vector onto a specific subspace.
Definition 8. Suppose $U$ is a finite-dimensional subspace of $V$. The orthogonal projection of $V$ onto $U$ is the operator $P_U \in \mathcal{L}(V)$ defined as follows: for $v \in V$, write $v = u + w$, where $u \in U$ and $w \in U^{\perp}$. Then $P_U v = u$.
As an example, suppose $v \in V$ and $U = \mathrm{span}(v)$. Then for $u \in V$, $P_U u = \frac{\langle u, v \rangle}{\|v\|^2} v$. Note here that $\frac{\langle u, v \rangle}{\|v\|^2} v$ and $u - \frac{\langle u, v \rangle}{\|v\|^2} v$ form the orthogonal decomposition of $u$ (Theorem 3): the first term is a scalar multiple of $v$ and thus lies in $U$, while $u - \frac{\langle u, v \rangle}{\|v\|^2} v \in U^{\perp}$. Before we discuss the minimization problem using the orthogonal projection, let us simply list its properties:
Suppose $U$ is a finite-dimensional subspace of $V$ and $v \in V$. Then
(a) $P_U \in \mathcal{L}(V)$;
(b) $P_U u = u$ for every $u \in U$;
(c) $P_U w = 0$ for every $w \in U^{\perp}$;
(d) $\mathrm{range}\, P_U = U$;
(e) $\mathrm{null}\, P_U = U^{\perp}$;
(f) $v - P_U v \in U^{\perp}$;
(g) $P_U^2 = P_U$;
(h) $\|P_U v\| \leq \|v\|$;
(i) for every orthonormal basis $e_1, \dots, e_m$ of $U$,
$$P_U v = \langle v, e_1 \rangle e_1 + \cdots + \langle v, e_m \rangle e_m.$$
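Several of these properties can be checked numerically, using property (i) as the implementation. A minimal sketch, taking $U$ to be the $xy$-plane in $\mathbf{R}^3$ for concreteness:

```python
import numpy as np

# Orthonormal basis of U (the xy-plane in R^3)
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

def P_U(v):
    # Property (i): P_U v = <v, e_1> e_1 + ... + <v, e_m> e_m
    return np.dot(v, e1) * e1 + np.dot(v, e2) * e2

v = np.array([3.0, -2.0, 5.0])
print(P_U(v))                                          # [ 3. -2.  0.]
print(np.allclose(P_U(P_U(v)), P_U(v)))                # (g) P_U^2 = P_U
print(np.dot(v - P_U(v), e1), np.dot(v - P_U(v), e2))  # (f) both 0: v - P_U v is in U-perp
print(np.linalg.norm(P_U(v)) <= np.linalg.norm(v))     # (h) True
```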
3.1. Minimization problem

Given a subspace $U \subseteq V$ and a vector $v \in V$, a natural question is: how can one find $u \in U$ such that the norm $\|v - u\|$ is minimized? The following proposition gives the answer:
Proposition 2. Suppose $U$ is a finite-dimensional subspace of $V$, $v \in V$, and $u \in U$. Then
$$\|v - P_U v\| \leq \|v - u\|,$$
where the equality is satisfied if and only if $u = P_U v$.
As an example of using Proposition 2 (taken from Axler's book), let us find a real function $u(x)$ such that the value of
$$\int_{-\pi}^{\pi} |\sin x - u(x)|^2 \, dx \tag{4}$$
is minimized. On the vector space of real functions defined on $[-\pi, \pi]$, we can define an inner product as
$$\langle f(x), g(x) \rangle = \int_{-\pi}^{\pi} f(x) g(x) \, dx. \tag{5}$$
If we let $v(x) = \sin x$, then Eqn.(4) is just $\langle v(x) - u(x), v(x) - u(x) \rangle = \|v - u\|^2$. With this observation, the original problem becomes finding a $u(x)$ that minimizes $\|v - u\|$. According to Proposition 2, such a function must be $P_U v$. Since there is no restriction on how to choose $U$, we can let $U$ be the space of polynomials with real coefficients and degree at most 5 (i.e., $\mathcal{P}_5(\mathbf{R})$). A basis of $\mathcal{P}_5(\mathbf{R})$ is $1, x, x^2, x^3, x^4, x^5$, which we make orthonormal through the Gram–Schmidt procedure. Next, we can use Eqn.(5) and property (i) of the orthogonal projection to calculate $u(x)$ as
$$u(x) = 0.987862 x - 0.155271 x^3 + 0.00564312 x^5.$$
The following plot shows how close $u(x)$ (green) and $\sin x$ (blue) are within $[-\pi, \pi]$. It is worth noting that such closeness is not achieved through a Taylor expansion! And finally, ladies and gentlemen, we have a colored plot in our black-and-white notes on Linear Algebra!
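The computation itself fits in a few lines. Below is a minimal sketch, assuming numpy and scipy are available; Gram–Schmidt is run on the monomial basis under the inner product of Eqn.(5):

```python
import numpy as np
from numpy.polynomial import Polynomial as P
from scipy.integrate import quad

# Inner product of Eqn.(5) for polynomials, computed exactly via the antiderivative
def inner_poly(f, g):
    h = (f * g).integ()
    return h(np.pi) - h(-np.pi)

# Gram-Schmidt (Theorem 5) on the basis 1, x, ..., x^5 of P_5(R)
es = []
for k in range(6):
    v = P([0.0] * k + [1.0])            # the monomial x^k
    for e in es:
        v = v - inner_poly(v, e) * e
    es.append(v / np.sqrt(inner_poly(v, v)))

# Property (i): u = P_U(sin) = sum_i <sin, e_i> e_i, with <sin, e_i> by quadrature
cs = [quad(lambda x, e=e: np.sin(x) * e(x), -np.pi, np.pi)[0] for e in es]
u = sum(c * e for c, e in zip(cs, es))
print(u.coef)   # ~ [0, 0.987862, 0, -0.155271, 0, 0.00564312]
```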