Lecture 42 - The Singular Value Decomposition

Learning Objective

Construct the singular value decomposition \( A = U\Sigma V^T \) of an arbitrary \( m\times n \) matrix.

Decomposing Any Matrix

We have seen that, when \( A \) is a square matrix, we can sometimes diagonalize \( A \) by writing \( A = PDP^{-1} \) for a diagonal matrix \( D \) and invertible matrix \( P \). We have also seen that symmetric matrices are always diagonalizable and can, in fact, be diagonalized in a special way. However, not every square matrix is diagonalizable, and certainly not every matrix is square!

In this lecture, we aim to construct a decomposition of any matrix \( A \) into \( A = QDP^{-1} \), where \( D \) is the same shape as \( A \) with nonzero entries only on its diagonal, and \( Q \) and \( P \) are orthogonal.
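
As a preview, such a decomposition can be computed numerically. Here is a minimal sketch using NumPy's built-in np.linalg.svd (the small non-square example matrix is our own):

    import numpy as np

    A = np.array([[1., 2.], [3., 4.], [5., 6.]])  # a 3x2 matrix, not square
    U, s, Vt = np.linalg.svd(A)    # U is 3x3, Vt is 2x2, s holds the singular values
    Sigma = np.zeros(A.shape)
    Sigma[:len(s), :len(s)] = np.diag(s)   # place the singular values on the diagonal
    print(np.allclose(A, U @ Sigma @ Vt))  # True: A = U Sigma V^T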

Singular Values

If \( A \) is not a square matrix, then \( A \) cannot have eigenvectors or eigenvalues. Instead, we examine the \( n\times n \) matrix \( A^T A \). This matrix is symmetric, since \( (A^T A)^T = A^T A \), and is therefore orthogonally diagonalizable.

Let \( \{ \bbm v_1, \bbm v_2, \ldots, \bbm v_n \} \) be an orthonormal basis for \( \mathbb R^n \) consisting of eigenvectors of \( A^T A \) with corresponding eigenvalues \( \lambda_1, \lambda_2, \ldots, \lambda_n \). Then, \[ \| A\bbm v_i \|^2 = A\bbm v_i \cdot A\bbm v_i = (A\bbm v_i)^T (A\bbm v_i) = \bbm v_i^T A^T A \bbm v_i = \bbm v_i^T \lambda_i \bbm v_i = \lambda_i (\bbm v_i \cdot \bbm v_i) = \lambda_i, \] since each \( \bbm v_i \) is a unit vector. In particular, note that each eigenvalue \( \lambda_i \) is nonnegative, as it equals the squared length \( \| A\bbm v_i \|^2 \). This motivates the following definition.

Definition. Let \( A \) be an \( m\times n \) matrix, and write \( \lambda_i \) for the (not necessarily distinct) eigenvalues of the \( n\times n \) matrix \( A^T A \). The singular values of \( A \) are \( \sigma_i = \sqrt{\lambda_i} \).

We know that \( \sigma_i^2 = \lambda_i = \| A\bbm v_i \|^2\), so an equivalent definition of singular values is \( \sigma_i = \| A\bbm v_i \| \).
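
This equivalence is easy to check numerically. The sketch below (the example matrix is our own) diagonalizes \( A^T A \) with np.linalg.eigh and compares \( \sqrt{\lambda_i} \) against \( \| A\bbm v_i \| \):

    import numpy as np

    A = np.array([[1., 2.], [3., 4.], [5., 6.]])
    lam, V = np.linalg.eigh(A.T @ A)  # eigenvalues (ascending) and orthonormal eigenvectors
    for i in range(A.shape[1]):
        # sigma_i computed two ways: sqrt(lambda_i) and ||A v_i||
        print(np.sqrt(lam[i]), np.linalg.norm(A @ V[:, i]))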

An Orthogonal Basis for \( \Col A \)

We can use singular values to construct an orthogonal basis for the column space of any matrix.

Theorem (Orthogonal Basis for the Column Space). Let \( A \) be an \( m\times n \) matrix, and let \( {\cal B} = \{ \bbm v_1, \bbm v_2, \ldots, \bbm v_n \} \) be an orthonormal basis for \( \mathbb R^n \) consisting of eigenvectors of \( A^T A \), numbered so that the corresponding eigenvalues are written in decreasing order: \( \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \). Suppose that \( A \) has \( r \) nonzero singular values. Then \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) is an orthogonal basis for \( \Col A \).

Note that some of the eigenvalues of \( A^T A \) may be zero, but they are all nonnegative. So, any zero values will appear at the end of the ordered list \( \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \). By saying "\( A \) has \( r \) nonzero singular values," we are saying \( \lambda_{r+1} = \lambda_{r+2} = \cdots = \lambda_n = 0 \).

Proof of the Orthogonal Basis for the Column Space Theorem. We have three things to prove:

  1. \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) spans \( \Col A \)
  2. \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) is orthogonal
  3. \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) is linearly independent

For (1), let \( \bbm y \in \Col A \). Then \( \bbm y = A\bbm x \) for some \( \bbm x \in \mathbb R^n \). Since \( {\cal B} = \{ \bbm v_1, \bbm v_2, \ldots, \bbm v_n \} \) is a basis for \( \mathbb R^n \), we can write \( c_i \) for the coordinates \( [\bbm x]_{\cal B} \), so that \( \bbm x = c_1 \bbm v_1 + \cdots + c_n \bbm v_n \).

Since \( \sigma_i = \| A\bbm v_i \| = 0 \) for \( i > r \), we have \( A\bbm v_i = \bbm 0 \) for each such \( i \). Now, \[ \begin{eqnarray*} \bbm y = A\bbm x & = & A(c_1 \bbm v_1 + \cdots + c_n \bbm v_n) \\ & = & c_1 A \bbm v_1 + c_2 A \bbm v_2 + \cdots + c_r A \bbm v_r + c_{r+1} A\bbm v_{r+1} + \cdots + c_n A\bbm v_n \\ & = & c_1 A \bbm v_1 + c_2 A \bbm v_2 + \cdots + c_r A \bbm v_r + \bbm 0 + \cdots + \bbm 0 \\ & = & c_1 (A \bbm v_1) + c_2 (A \bbm v_2) + \cdots + c_r (A \bbm v_r) \end{eqnarray*} \] Thus, \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) spans \( \Col A \).

For (2), let \( 1 \le i, j \le r \) with \( i \ne j \) and consider \( A\bbm v_i \cdot A\bbm v_j \). Recall that \( {\cal B} = \{ \bbm v_1, \bbm v_2, \ldots, \bbm v_n \} \) is an orthonormal basis, so \( \bbm v_i \cdot \bbm v_j = 0\). Now, \[ (A\bbm v_i)\cdot (A\bbm v_j) = (A\bbm v_i)^T (A\bbm v_j) = \bbm v_i^T A^T A \bbm v_j = \bbm v_i^T \lambda_j \bbm v_j = \lambda_j(\bbm v_i \cdot \bbm v_j) = 0. \] Thus, \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) is orthogonal.

For (3), note that for each \( 1 \le i \le r \), the singular value \( \sigma_i = \| A \bbm v_i \| \) is nonzero, so \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) is an orthogonal set of nonzero vectors. We proved in Lecture 38 that such a set must be linearly independent. \( \Box \)

The Singular Value Decomposition

Given an \( m\times n\) matrix \( A \) with nonzero singular values \( \sigma_1,\ldots,\sigma_r \), the decomposition of \( A \) involves a "quasi-diagonal" matrix \( \Sigma = \begin{bmatrix} \Delta & 0 \\ 0 & 0 \end{bmatrix} \), written in block form where \( \Sigma \) is \( m\times n \) and \( \Delta \) is the \( r\times r\) diagonal matrix whose diagonal entries are the \( \sigma_i \).
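
Building \( \Sigma \) is purely mechanical; a small helper (make_sigma is our own name) makes the block structure explicit:

    import numpy as np

    def make_sigma(m, n, sigmas):
        # Embed the r nonzero singular values on the diagonal of an
        # m x n zero matrix: the block form [[Delta, 0], [0, 0]].
        Sigma = np.zeros((m, n))
        r = len(sigmas)
        Sigma[:r, :r] = np.diag(sigmas)
        return Sigma

    print(make_sigma(4, 3, [np.sqrt(96), np.sqrt(24)]))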

The Singular Value Decomposition Theorem. Let \( A \) be an \( m\times n \) matrix with nonzero singular values \( \sigma_1,\ldots,\sigma_r \). Let \( \Sigma \) be the \( m\times n\) matrix whose row \( i\), column \( i \) entry is \( \sigma_i \) and all other entries are zero. There exist an \( m\times m\) orthogonal matrix \( U \) and an \( n\times n\) orthogonal matrix \( V \) for which \( A = U\Sigma V^T \).

Proof. Since \( A^T A \) is symmetric, it is orthogonally diagonalizable by the Symmetric Matrix Theorem. Let \( {\cal B} = \{ \bbm v_1, \bbm v_2, \ldots, \bbm v_n \} \) be an orthonormal basis for \(\mathbb R^n \) consisting of eigenvectors of \(A^T A\) corresponding to eigenvalues \(\lambda_i\), written in decreasing order. Then \(\sigma_i=\sqrt{\lambda_i}\) and \( \{ A \bbm v_1, A \bbm v_2, \ldots, A\bbm v_r \} \) is an orthogonal basis for \( \Col A \).

Since \( A \) has \( m \) rows, \( \Col A \) is a subspace of \( \mathbb R^m \). Normalize each \( A\bbm v_i \) to obtain an orthonormal basis for this subspace: \[ \bbm u_i = \frac{1}{\| A\bbm v_i \|} A\bbm v_i = \frac{1}{\sigma_i} A\bbm v_i. \] So, we have \( A \bbm v_i = \sigma_i \bbm u_i \).

Since in general we may have \( r \lt m \), we will need to extend \( \{ \bbm u_1,\ldots, \bbm u_r \} \) to an orthonormal basis for all of \( \mathbb R^m \). One way to do this is to form the matrix \( C = [\bbm u_1\ \ \bbm u_2\ \ \cdots\ \ \bbm u_r]\). Since \( (\Col C)^\perp = \Nul C^T \), finding an orthonormal basis for \( \Nul C^T \) extends \( \{ \bbm u_1,\ldots, \bbm u_r \} \) to a full basis for \( \mathbb R^m \). Write \( \{ \bbm u_1,\ldots, \bbm u_m \} \) for the resulting orthonormal basis for \( \mathbb R^m \).
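
In code, this extension step can be delegated to a library routine. A sketch, assuming SciPy is available (scipy.linalg.null_space returns an orthonormal basis for the null space, though any method of producing one works):

    import numpy as np
    from scipy.linalg import null_space

    def extend_to_orthonormal_basis(C):
        # The columns of C are the orthonormal vectors u_1, ..., u_r in R^m.
        # null_space(C.T) gives an orthonormal basis for Nul C^T = (Col C)^perp,
        # so stacking the two yields an orthonormal basis for all of R^m.
        return np.hstack([C, null_space(C.T)])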

Let \( U \) be the matrix whose columns are the \( \bbm u_i \), and let \( V \) be the matrix whose columns are the \( \bbm v_i \). We will now prove that \( AV = U\Sigma \).

First, since \( A\bbm v_i = \sigma_i \bbm u_i \) for \( i \le r \) and \( A\bbm v_i = \bbm 0 \) for \( i > r \) (because \( \| A\bbm v_i \| = \sigma_i = 0 \) there), we have \[ AV = [A\bbm v_1\ \ A\bbm v_2\ \cdots\ A\bbm v_r\ \ A\bbm v_{r+1}\ \cdots\ A\bbm v_n] = [A\bbm v_1\ \ A\bbm v_2\ \cdots\ A\bbm v_r\ \ \bbm 0\ \cdots\ \bbm 0] = [\sigma_1 \bbm u_1\ \ \sigma_2 \bbm u_2\ \cdots\ \sigma_r \bbm u_r\ \ \bbm 0\ \cdots\ \bbm 0]. \]

Also, we have \[ U\Sigma = [\bbm u_1\ \ \bbm u_2\ \cdots\ \bbm u_m] \begin{bmatrix} \Delta & 0 \\ 0 & 0 \end{bmatrix} = [\sigma_1 \bbm u_1\ \ \sigma_2 \bbm u_2\ \cdots\ \sigma_r \bbm u_r\ \ \bbm 0\ \cdots\ \bbm 0]. \]

Thus \( AV = U\Sigma \), and since \( V \) is orthogonal, \( V^{-1} = V^T \), so \( A = U\Sigma V^{-1} = U\Sigma V^T \), completing the proof. \( \Box \)

Here is the process for constructing the singular value decomposition of a matrix \( A \); a code sketch follows the list:

  1. Find an orthogonal diagonalization of \( A^T A \).
  2. Construct \( V \) from the eigenvectors \( \bbm v_i \) and \( \Sigma \) from the singular values \( \sigma_i = \sqrt{\lambda_i} \).
  3. For each nonzero singular value \( \sigma_i \), compute \( \bbm u_i = \frac{1}{\sigma_i} A\bbm v_i \). Extend \( \{ \bbm u_1,\ldots, \bbm u_r \} \), if necessary, to a full orthonormal basis for \( \mathbb R^m \). The vectors \( \bbm u_i \) form the columns of \( U \).
  4. Verify that \( A = U\Sigma V^T \).
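
Assuming we may lean on np.linalg.eigh for the diagonalization and scipy.linalg.null_space for the basis-extension step, the four steps translate directly into Python (svd_by_hand is our own name, not a library routine):

    import numpy as np
    from scipy.linalg import null_space

    def svd_by_hand(A, tol=1e-10):
        m, n = A.shape
        # Step 1: orthogonally diagonalize A^T A (eigh returns ascending eigenvalues).
        lam, P = np.linalg.eigh(A.T @ A)
        lam, P = lam[::-1], P[:, ::-1]            # reorder to decreasing eigenvalues
        # Step 2: V from the eigenvectors, Sigma from sigma_i = sqrt(lambda_i).
        V = P
        sigmas = np.sqrt(np.clip(lam, 0, None))   # clip tiny negatives from roundoff
        r = int(np.sum(sigmas > tol))             # number of nonzero singular values
        Sigma = np.zeros((m, n))
        Sigma[:r, :r] = np.diag(sigmas[:r])
        # Step 3: u_i = (1/sigma_i) A v_i, extended to an orthonormal basis of R^m.
        U = A @ V[:, :r] / sigmas[:r]
        if r < m:
            U = np.hstack([U, null_space(U.T)])
        return U, Sigma, V

    # Step 4: verify on a small example matrix of our own.
    A = np.array([[1., 2.], [3., 4.], [5., 6.]])
    U, Sigma, V = svd_by_hand(A)
    print(np.allclose(A, U @ Sigma @ V.T))        # True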

An Example

Let \( A = \begin{bmatrix} -2\sqrt 3 & 0 & 0 \\ -2 & 0 & 0 \\ -\sqrt 2 & 6 & -6 \\ \sqrt 6 & 2\sqrt 3 & -2\sqrt 3 \end{bmatrix} \). Construct the singular value decomposition for \( A \).

Step 1: Find an orthogonal diagonalization of \( A^T A \).

We compute \( A^T A = \begin{bmatrix} 24 & 0 & 0 \\ 0 & 48 & -48 \\ 0 & -48 & 48 \end{bmatrix} \). We find that \( A^T A \) has three eigenvalues: \( \lambda_1 = 96 \), \( \lambda_2 = 24 \), and \( \lambda_3 = 0 \). Corresponding eigenvectors are \( \bbm p_1 = \vecthree 0{-1}1 \), \( \bbm p_2 = \vecthree 100 \), and \( \bbm p_3 = \vecthree 011 \). Since these are eigenvectors for distinct eigenvalues, they are mutually orthogonal, so we need only normalize them.
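
These computations can be double-checked numerically; a sketch using the matrix above:

    import numpy as np

    s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
    A = np.array([[-2*s3, 0, 0],
                  [-2, 0, 0],
                  [-s2, 6, -6],
                  [s6, 2*s3, -2*s3]])
    print(A.T @ A)                    # [[24, 0, 0], [0, 48, -48], [0, -48, 48]]
    lam, P = np.linalg.eigh(A.T @ A)  # eigenvalues in ascending order
    print(lam[::-1])                  # approximately [96, 24, 0]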

Step 2: Construct \( V \) and \( \Sigma \).

The orthonormal eigenvectors form the columns of \( V \): \[ V = \begin{bmatrix} 0 & 1 & 0 \\ -1/\sqrt 2 & 0 & 1/\sqrt 2 \\ 1/\sqrt 2 & 0 & 1/\sqrt 2 \end{bmatrix}. \]

The singular values are \( \sigma_1 = \sqrt{96}\), \( \sigma_2 = \sqrt{24}\), and \( \sigma_3 = 0 \). We construct \[ \Sigma = \begin{bmatrix} \sqrt{96} & 0 & 0 \\ 0 & \sqrt{24} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]

Step 3: Construct \( U \).

For the nonzero singular values, we use the formula \( \bbm u_i = \frac{1}{\sigma_i} A\bbm v_i \): \[ \bbm u_1 = \frac{1}{\sigma_1} A \bbm v_1 = \frac{1}{\sqrt{96}} A \vecthree 0{-1/\sqrt{2}}{1/\sqrt 2} = \vecfour 00{-\sqrt 3/2}{-1/2} \] \[ \bbm u_2 = \frac{1}{\sigma_2} A \bbm v_2 = \frac{1}{\sqrt{24}} A \vecthree 100 = \vecfour {-1/\sqrt 2}{-1/\sqrt 6}{-1/(2\sqrt 3)}{1/2}. \]

To extend \( \{ \bbm u_1, \bbm u_2 \} \) to a basis for \( \mathbb R^4 \), we construct \( C = [\bbm u_1\ \ \bbm u_2] \) and find an orthonormal basis for \( \Nul C^T \). We row-reduce \( C^T \) to solve the equation \( C^T \bbm x = \bbm 0 \): \[ C^T = \begin{bmatrix} 0 & 0 & -\sqrt 3/2 & -1/2 \\ {-1/\sqrt 2}&{-1/\sqrt 6}&{-1/(2\sqrt 3)}&{1/2} \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 1/\sqrt 3 & 0 & -2\sqrt 2/3 \\ 0 & 0 & 1 & 1/\sqrt 3 \end{bmatrix}. \]

This gives a basis \( \left\{ \vecfour {-1/\sqrt 3} 1 0 0, \vecfour {2\sqrt 2/3} 0 {-1/\sqrt 3} 1 \right\} \) for \( \Nul C^T \). Now apply the Gram-Schmidt process to construct \( \bbm u_3 \) and \( \bbm u_4 \), the final two columns of \( U \). We have \[ U = \begin{bmatrix} 0 & -1/\sqrt 2 & -1/2 & 1/2 \\ 0 & -1/\sqrt 6 & \sqrt 3/2 & 1/(2\sqrt 3) \\ -\sqrt 3/2 & -1/(2\sqrt 3) & 0 & -1/\sqrt 6 \\ -1/2 & 1/2 & 0 & 1/\sqrt 2 \end{bmatrix}. \]
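
The Gram-Schmidt computation itself can be checked with a few lines of NumPy:

    import numpy as np

    s2, s3 = np.sqrt(2), np.sqrt(3)
    w1 = np.array([-1/s3, 1, 0, 0])
    w2 = np.array([2*s2/3, 0, -1/s3, 1])
    u3 = w1 / np.linalg.norm(w1)
    w2 = w2 - (w2 @ u3) * u3          # remove the component of w2 along u3
    u4 = w2 / np.linalg.norm(w2)
    print(u3)   # approximately (-1/2, sqrt(3)/2, 0, 0)
    print(u4)   # approximately (1/2, 1/(2 sqrt 3), -1/sqrt 6, 1/sqrt 2)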

Step 4: Verify that \( A = U\Sigma V^T \).

Multiplying out the matrices constructed above confirms that \( U\Sigma V^T = A \). \( \Box \)
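
The verification is easy to carry out numerically; a sketch assembling \( U \), \( \Sigma \), and \( V \) exactly as computed above:

    import numpy as np

    s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
    A = np.array([[-2*s3, 0, 0], [-2, 0, 0], [-s2, 6, -6], [s6, 2*s3, -2*s3]])
    U = np.array([[0, -1/s2, -1/2, 1/2],
                  [0, -1/s6, s3/2, 1/(2*s3)],
                  [-s3/2, -1/(2*s3), 0, -1/s6],
                  [-1/2, 1/2, 0, 1/s2]])
    Sigma = np.array([[np.sqrt(96), 0, 0],
                      [0, np.sqrt(24), 0],
                      [0, 0, 0],
                      [0, 0, 0]])
    V = np.array([[0, 1, 0], [-1/s2, 0, 1/s2], [1/s2, 0, 1/s2]])
    print(np.allclose(A, U @ Sigma @ V.T))  # True
    # U and V are orthogonal: U^T U = I and V^T V = I.
    print(np.allclose(U.T @ U, np.eye(4)), np.allclose(V.T @ V, np.eye(3)))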

In the final lecture, we will learn how the Singular Value Decomposition can be applied to image compression.
