In this lecture, we will work through a proof of this part of the Symmetric Matrix Theorem from Lecture 41:
Theorem. If \( A \) is a symmetric matrix, then \( A \) is orthogonally diagonalizable.
When working with large matrices, it is often helpful to divide them into blocks. For example, if \( M \) is an \( m\times n\) matrix, and \( p \) and \(q\) are integers with \( 1 \le p \lt m \) and \( 1 \le q \lt n \), we can write \( M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \), where \( A \) is \( p \times q \), \( B \) is \( p \times (n-q) \), \( C \) is \( (m-p) \times q \), and \( D \) is \( (m-p) \times (n-q) \).
If \( M_1 = \begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix} \) and \( M_2 = \begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix} \) are block matrices whose blocks have compatible shapes, so that each of the block-wise products below is defined, then \[ M_1 M_2 = \begin{bmatrix} A_1 A_2 + B_1 C_2 & A_1 B_2 + B_1 D_2 \\ C_1 A_2 + D_1 C_2 & C_1 B_2 + D_1 D_2 \end{bmatrix}. \]
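As a quick numerical illustration, here is a NumPy sketch of this formula; the matrix sizes and the split points \( p \), \( q \), and \( r \) are arbitrary choices made for the example. Note that the columns of \( M_1 \) and the rows of \( M_2 \) must be split at the same index \( q \) for the block-wise products to be defined.

```python
import numpy as np

# Sizes here are arbitrary illustrative choices: M1 is 4x5, M2 is 5x3.
# M1's columns are split at q, so M2's rows must also split at q;
# M2's columns may be split anywhere (here at r).
rng = np.random.default_rng(0)
M1 = rng.standard_normal((4, 5))
M2 = rng.standard_normal((5, 3))
p, q, r = 2, 2, 2

A1, B1 = M1[:p, :q], M1[:p, q:]
C1, D1 = M1[p:, :q], M1[p:, q:]
A2, B2 = M2[:q, :r], M2[:q, r:]
C2, D2 = M2[q:, :r], M2[q:, r:]

# Assemble the product block by block, as in the formula above.
blockwise = np.block([
    [A1 @ A2 + B1 @ C2, A1 @ B2 + B1 @ D2],
    [C1 @ A2 + D1 @ C2, C1 @ B2 + D1 @ D2],
])
assert np.allclose(blockwise, M1 @ M2)
```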
The first step of our proof of the Symmetric Matrix Theorem is to show that any symmetric \( n \times n \) matrix has \( n \) real eigenvalues, counting multiplicities. Since the characteristic polynomial of \( A \) has degree \( n \), the Fundamental Theorem of Algebra tells us that \( A \) has \( n \) complex eigenvalues. In order to show that these eigenvalues must all be real, we need to establish some terminology and notation regarding complex numbers and matrices with complex entries.
Definition. Let \( z = x+iy \) be a complex number. The conjugate of \( z \), written \( \overline{z} \), is \( \overline{z} = x-iy \).
Definition. Let \( A \) be a matrix with possibly complex entries. The adjoint of \( A \), written \( A^* \), is the transpose of its conjugate: \( A^* = \overline{A}^T \).
Definition. Let \( \bbm v, \bbm w \in \mathbb C^n \) be vectors with entries \( v_i \) and \( w_i \), respectively. The inner product of \( \bbm v \) and \( \bbm w\) is \( \langle \bbm v, \bbm w \rangle = \bbm v^* \bbm w = \overline{v_1}w_1 + \cdots + \overline{v_n}w_n \).
Note that when \( \bbm v\) and \( \bbm w\) have all real entries, this is the same dot product that we defined in Lecture 37. This inner product has many of the same properties as that dot product, including the fact that \( \| \bbm v \|^2 = \langle \bbm v, \bbm v \rangle \).
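For readers who wish to experiment, here is a small NumPy sketch of these definitions. The adjoint is the conjugate transpose, and `np.vdot` conjugates its first argument, matching the inner product defined above.

```python
import numpy as np

# The adjoint A* is the conjugate transpose; in NumPy that is A.conj().T.
A = np.array([[1 + 2j, 3],
              [0, 4 - 1j]])
A_star = A.conj().T

# The inner product <v, w> = v* w; np.vdot conjugates its first argument.
v = np.array([1 + 1j, 2j])
w = np.array([3, 1 - 1j])
assert np.isclose(np.vdot(v, w), v.conj() @ w)

# <v, v> recovers the squared norm, as noted above.
assert np.isclose(np.vdot(v, v).real, np.linalg.norm(v) ** 2)
```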
Theorem. If \( A \) is a real, symmetric \(n\times n\) matrix, then \( A \) has \( n \) real eigenvalues, counting multiplicity.
Proof. Since \( A \) has \( n \) complex eigenvalues, counting multiplicity, we need only show that every eigenvalue of \( A \) is real. Let \( \lambda \) be an eigenvalue of \( A \) with associated eigenvector \( \bbm v \ne \bbm 0 \); a priori, \( \lambda \) and the entries of \( \bbm v \) may be complex. Since \( A \) is real and symmetric, \( A^* = \overline{A}^T = A \).
We compute \[ \| A\bbm v \|^2 = \langle A\bbm v, A\bbm v \rangle = (A\bbm v)^* A\bbm v = \bbm v^* A^* A \bbm v = \bbm v^* (A^2 \bbm v). \]
Since \( \bbm v \) is an eigenvector, we have \( A^2 \bbm v = \lambda^2 \bbm v \). Now, \[ \| A\bbm v \|^2 = \bbm v^* (\lambda^2 \bbm v) = \lambda^2 \langle \bbm v, \bbm v \rangle = \lambda^2 \| \bbm v \|^2. \]
Since \( \bbm v \ne \bbm 0 \), we may divide to obtain \( \lambda^2 = \frac{\| A\bbm v \|^2}{\| \bbm v \|^2} \), so \( \lambda^2 \) is a nonnegative real number. Writing \( \lambda = a + bi \), we have \( \lambda^2 = a^2 - b^2 + 2abi \), so \( ab = 0 \); and if \( a = 0 \) with \( b \ne 0 \), then \( \lambda^2 = -b^2 \lt 0 \), a contradiction. Hence \( b = 0 \), and \( \lambda \) is real. \( \Box \)
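This theorem is easy to check numerically. In the sketch below (the matrix size and random seed are arbitrary choices), we symmetrize a random matrix and confirm that the general, complex-capable eigenvalue solver returns eigenvalues with zero imaginary part.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2          # symmetrize a random matrix

eigvals = np.linalg.eigvals(A)      # general solver, allows complex output
assert np.allclose(eigvals.imag, 0) # but all eigenvalues come out real

# np.linalg.eigvalsh exploits symmetry and returns real values directly,
# sorted in ascending order.
assert np.allclose(np.sort(eigvals.real), np.linalg.eigvalsh(A))
```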
Now we work to complete the main proof.
Theorem. If \( A \) is a symmetric matrix, then \( A \) is orthogonally diagonalizable.
Proof. We proceed by induction on \( n \). Any \( 1 \times 1 \) matrix is orthogonally diagonalizable: take \( U = [1] \). Now suppose that every \( (n-1)\times (n-1) \) symmetric matrix is orthogonally diagonalizable, and let \( A \) be an \( n \times n \) symmetric matrix. By the previous theorem, \( A \) has a real eigenvalue \( \lambda \); let \( \bbm v \) be an associated eigenvector, normalized if necessary so that \( \| \bbm v \| = 1 \).
Let \( H = {\rm Span}\{ \bbm v \} \) and consider \( H^\perp \). Use the Gram-Schmidt process to find an orthonormal basis \( {\cal B} = \{ \bbm u_1, \bbm u_2, \ldots, \bbm u_{n-1} \} \) for \( H^\perp \). Let \( B \) be the \( n \times (n-1) \) matrix whose columns are the vectors in \( \cal B \).
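In code, one convenient way to produce such a matrix \( B \) is a complete QR factorization rather than an explicit Gram-Schmidt loop; both yield an orthonormal basis of \( H^\perp \). The helper name `perp_basis` below is our own, introduced just for this sketch.

```python
import numpy as np

def perp_basis(v):
    """Return an n x (n-1) matrix with orthonormal columns spanning
    the orthogonal complement of Span{v}."""
    v = v / np.linalg.norm(v)
    # A complete QR factorization of the single column v extends it to
    # an orthonormal basis of R^n; the first column of Q is +/- v, so
    # the remaining n-1 columns span the orthogonal complement.
    Q, _ = np.linalg.qr(v.reshape(-1, 1), mode='complete')
    return Q[:, 1:]

v = np.array([1.0, 2.0, 2.0])
B = perp_basis(v)
assert np.allclose(B.T @ B, np.eye(2))   # orthonormal columns
assert np.allclose(v @ B, 0)             # columns orthogonal to v
```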
Consider the matrix \( B^TAB \). Since \( A \) is symmetric, \( (B^TAB)^T = B^T A^T B = B^TAB \), so this \( (n-1)\times (n-1) \) matrix is symmetric and therefore orthogonally diagonalizable by our induction hypothesis. Write \( B^T AB = PDP^T \), where \( D \) is diagonal and \( P \) is orthogonal.
Now consider the \( n \times (n-1) \) matrix \( K = BP \). We have \( K^T K = P^T B^T BP = P^T P = I \), since both \( B \) and \( P \) have orthonormal columns. Thus, \( K \) also has orthonormal columns. Let \( U = [\bbm v\ \ K] \) be the \( n\times n\) matrix obtained by attaching the vector \( \bbm v\) onto \( K \).
Since \( \bbm v \) is a unit vector and \( K \) has orthonormal columns, to see why \( U \) is orthogonal we need only verify that \( \bbm v \) is orthogonal to the columns of \( K \). We have \( \bbm v^T K = (\bbm v^T B)P \). By the construction of \( B \), we know that \( \bbm v \) is orthogonal to the columns of \( B \), so \( \bbm v^T B = \bbm 0^T \). Thus \( \bbm v^T K \) is the zero row vector, and so \( \bbm v \) is orthogonal to the columns of \( K \).
Finally, using block multiplication, we have \[ U^T A U = \begin{bmatrix} \bbm v^T \\ K^T \end{bmatrix} A [\bbm v\ \ K] = \begin{bmatrix} \bbm v^T A \bbm v & \bbm v^T AK \\ K^T A \bbm v & K^TAK \end{bmatrix}. \]
Now, \( \bbm v^T A \bbm v = \bbm v^T (\lambda \bbm v) = \lambda \| \bbm v \|^2 = \lambda \), and \( K^T A \bbm v = K^T (\lambda \bbm v) = \lambda K^T \bbm v = \bbm 0 \), since the columns of \( K \) are orthogonal to \( \bbm v \). Because \( A \) is symmetric, \( \bbm v^T A K = (K^T A \bbm v)^T = \bbm 0^T \). Finally, since \( K = BP \) and \( B^TAB = PDP^T \), we have \( K^T A K = P^T B^T A B P = P^T (PDP^T) P = D \).
Thus we have \( U^T A U = \begin{bmatrix} \lambda & 0 \\ 0 & D \end{bmatrix} \). Since \( U \) is orthogonal, \( U^{-1} = U^T \). Write \( \Delta = \begin{bmatrix} \lambda & 0 \\ 0 & D \end{bmatrix} \) so that \( A = U\Delta U^T \) is an orthogonal diagonalization of \( A \). \( \Box \)
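The whole inductive construction can be traced in code. The sketch below mirrors the proof step by step, borrowing `np.linalg.eigh` only to produce one real eigenpair at each level; the function name `orth_diag` is our own, and this is an illustration of the proof rather than a practical algorithm (in practice one would call `np.linalg.eigh` directly).

```python
import numpy as np

def orth_diag(A):
    """Orthogonally diagonalize a real symmetric matrix A by the
    inductive construction in the proof: returns (U, Delta) with
    A = U @ Delta @ U.T, U orthogonal and Delta diagonal."""
    n = A.shape[0]
    if n == 1:
        # Base case: a 1x1 matrix is already diagonal.
        return np.eye(1), A.copy()
    # One real eigenpair (lam, v), with v a unit vector.
    eigvals, eigvecs = np.linalg.eigh(A)
    lam, v = eigvals[0], eigvecs[:, 0]
    # Orthonormal basis B of H-perp via a complete QR factorization:
    # the last n-1 columns of Q are orthogonal to the first, which is +/- v.
    Q, _ = np.linalg.qr(v.reshape(-1, 1), mode='complete')
    B = Q[:, 1:]
    # Induction step on the (n-1) x (n-1) symmetric matrix B^T A B.
    P, D = orth_diag(B.T @ A @ B)
    K = B @ P                       # K = BP has orthonormal columns
    U = np.column_stack([v, K])     # U = [v  K] is orthogonal
    Delta = np.zeros((n, n))
    Delta[0, 0] = lam
    Delta[1:, 1:] = D
    return U, Delta

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                   # a random symmetric test matrix
U, Delta = orth_diag(A)
assert np.allclose(U.T @ U, np.eye(4))     # U is orthogonal
assert np.allclose(U @ Delta @ U.T, A)     # A = U Delta U^T
```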