Where does the SVD come from?

This short post addresses the title question, a common query from my students. First, we will explain the singular value decomposition (SVD) using the definition as a starting point, the way it is introduced in lectures. We will then go through a derivation to arrive at the definition by analogy with eigenvalues and eigenvectors for a square matrix.

Fait accompli

In my Maths Data Book, the SVD of a real $m$ by $n$ matrix $\bf{A}$ is defined as, $$ \newcommand{\tra}[1]{\bf{#1}^\mathrm{t}} \newcommand{\Qtwo}{{\bf V}} \newcommand{\Qtt}{{\bf V}^\mathrm{t}} \newcommand{\Qone}{{\bf U}} \newcommand{\Rn}{\mathbb{R}^n} \newcommand{\Rm}{\mathbb{R}^m} \newcommand{\Sg}{{\bf \Sigma}} {\bf{A}} = {\Qone \Sg \Qtt} \ , \tag{I} $$ where:

$\Qone$ contains the $m$ unit eigenvectors of $\bf{A}\tra{A}$ as columns;
$\Qtwo$ contains the $n$ unit eigenvectors of $\tra{A}\bf{A}$ as columns;
$\Sg$ is an $m$ by $n$ diagonal matrix containing the square roots of the eigenvalues shared between $\tra{A}\bf{A}$ and $\bf{A}\tra{A}$, conventionally in descending order.

If we consider $\bf{A}$ as a linear transformation from $\Rn$ to $\Rm$, then we can see the SVD as a decomposition into three separate transformations: $\Qtwo$ is a reflection or rotation in $\Rn$, $\Sg$ is a scaling along each coordinate direction, and $\Qone$ is a reflection or rotation in $\Rm$.

The SVD also reveals orthonormal bases for the four fundamental subspaces of $\bf{A}$. Because eigenvectors are by definition orthogonal, we can form a vector space in $\Rn$ by using as many columns of $\Qtwo$ as we desire dimensions, and similarly for $\Rm$ with the columns of $\Qone$. If the rank of $\bf{A}$ is $r$, we need $r$ vectors for each of the column and row space, $n-r$ for the null space, and $m-r$ for the left null space.

It turns out that exactly $r$ of the singular values are non-zero, allowing a two-way mapping between $\Rn$ and $\Rm$ of the first $r$ columns of $\Qtwo$ and $\Qone$. These are bases for the row and column spaces respectively.

If we multiply out the decomposition, Eqn. $\rm(I)$, the remaining $n-r$ columns of $\Qtwo$, corresponding to zero singular values, are mapped to $\vec{0}$. That is, they are solutions to ${\bf{A}}\vec{x} = \vec{0}$ and and a basis for the null space. Taking the transpose of the definition, we also see that the last $m-r$ columns of $\Qone$ are mapped to $\vec{0}$, are solutions of ${\tra{A}}\vec{b}=\vec{0}$, and hence a basis for the left null space.

Ab initio

This is all very well, but how did we get here? We will now take a different perspective, that links back to eigenvalues and illustrates the significance of $\bf{A}\tra{A}$.

An eigenvector $\vec{a}$ of the matrix $\bf{C}$ maps to itself scaled by the eigenvalue $\lambda$, or $$ {\bf{C}}\vec{a} = \lambda \vec{a} \ . $$ Eigenvectors and eigenvalues are neat, but they only work for square matrices mapping within the same $\Rn$. For an $m$ by $n$ rectangular matrix, we need two sets of special unit vectors $\vec{v}$ and $\vec{u}$ in $\Rn$ and $\Rm$, but we can get close to the spirit of eigenvectors by defining, $$ {{\bf{A}}\vec{v}} = \sigma \vec{u} \ , \tag{II} $$ where $\sigma$ is a scaling factor. Stacking special vectors column-wise into matrices $\bf V$ and $\bf U$, $$ {\bf{A}\bf{V}} = {\bf{U} \Sg }\ , $$ where $\Sg$ is a diagonal matrix of $\sigma$ values for each $\vec{v}$, $\vec{u}$ pair, post-multiplying to scale the columns of $\bf{U}$.

If the special vectors are unit, then $\bf U$ and $\bf V$ are orthonormal, $$ {\bf{U}}^{-1} = {\tra{U}} \ , \quad {\bf{V}}^{-1} = {\tra{V}} \ , \quad \Rightarrow \quad {{\bf{U}}\tra{U}} = 1 \ , \quad {{\bf{V}}\tra{V}} = 1 $$ and we can rearrange to get to our original definition of Eqn. $\rm (I)$, ${\bf{A}} = {\bf U\Sg\tra{V}} $. Then after simplifying, $$ \begin{align} {\bf{A}\tra{A}} &= {\bf U\Sg\tra{V}V\tra{\Sg}\tra{U}} = {\bf U} ({\Sg\tra{\Sg}}) {\tra{U}} \\ {\tra{A}\bf{A}} &= {\bf V\tra{\Sg}\tra{U}U\Sg\tra{V}} = {\bf V} ({\tra{\Sg}\Sg}) {\tra{V}} \end{align} $$ These equations are diagonalised forms of the square matrices $\bf{A}\tra{A}$ and $\tra{A}\bf{A}$. Our special matrices $\vec{u}$ are in fact the left-singular vectors, eigenvectors of $\bf{A}\tra{A}$. Our special matrices $\vec{v}$ are the right-singular vectors, eigenvectors of $\tra{A}\bf{A}$. Because $\Sg$ is a diagonal matrix, both $\bf \Sg\tra{\Sg}$ and $\tra{\Sg}\Sg$ are also diagonal with elements $\sigma^2$, hence why we have to square root the eigenvalues of $\bf{A}\tra{A}$.

The point is, Eqn. $\rm (II)$ implies our original definition in Eqn. $\rm (I)$.

2022-04-04

#maths