Where does the SVD come from?
This short post addresses the title question, a common query from my students. First, we will explain the singular value decomposition (SVD) using the definition as a starting point, the way it is introduced in lectures. We will then go through a derivation to arrive at the definition by analogy with eigenvalues and eigenvectors for a square matrix.
Fait accompli
In my Maths Data Book, the SVD of a real m by n matrix A is defined as, \newcommand{\tra}[1]{\bf{#1}^\mathrm{t}} \newcommand{\Qtwo}{{\bf V}} \newcommand{\Qtt}{{\bf V}^\mathrm{t}} \newcommand{\Qone}{{\bf U}} \newcommand{\Rn}{\mathbb{R}^n} \newcommand{\Rm}{\mathbb{R}^m} \newcommand{\Sg}{{\bf \Sigma}} {\bf{A}} = {\Qone \Sg \Qtt} \ , \tag{I} where:
- \Qone contains the m unit eigenvectors of \bf{A}\tra{A} as columns;
- \Qtwo contains the n unit eigenvectors of \tra{A}\bf{A} as columns;
- \Sg is an m by n diagonal matrix containing the square roots of the eigenvalues shared between \tra{A}\bf{A} and \bf{A}\tra{A}, conventionally in descending order.
If we consider \bf{A} as a linear transformation from \Rn to \Rm, then we can see the SVD as a decomposition into three separate transformations: \Qtt is a reflection or rotation in \Rn, \Sg is a scaling along each coordinate direction, and \Qone is a reflection or rotation in \Rm.
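We can watch the three transformations act one at a time. A sketch, again assuming NumPy, applying \tra{V}, then the scaling, then \bf U to an arbitrary vector:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U is 3x2

x = rng.standard_normal(2)

step1 = Vt @ x     # reflection/rotation in R^n (here R^2)
step2 = s * step1  # scale each coordinate by its singular value
step3 = U @ step2  # reflection/rotation into R^m (here R^3)

assert np.allclose(A @ x, step3)
```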
The SVD also reveals orthonormal bases for the four fundamental subspaces of \bf{A}. Because \tra{A}\bf{A} and \bf{A}\tra{A} are symmetric, their eigenvectors can be chosen mutually orthogonal, so we can form an orthonormal basis for a subspace of \Rn by taking as many columns of \Qtwo as we desire dimensions, and similarly for \Rm with the columns of \Qone. If the rank of \bf{A} is r, we need r vectors for each of the column and row spaces, n-r for the null space, and m-r for the left null space.
It turns out that exactly r of the singular values are non-zero, giving a two-way mapping between the first r columns of \Qtwo in \Rn and the first r columns of \Qone in \Rm. These are bases for the row and column spaces respectively.
If we multiply out the decomposition, Eqn. \rm (I), the remaining n-r columns of \Qtwo, corresponding to zero singular values, are mapped to \vec{0}. That is, they are solutions to {\bf{A}}\vec{x} = \vec{0} and a basis for the null space. Taking the transpose of the definition, we also see that the last m-r columns of \Qone are mapped to \vec{0}, are solutions of {\tra{A}}\vec{b}=\vec{0}, and hence a basis for the left null space.
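All four subspaces can be read off the SVD of a rank-deficient example. A sketch, assuming NumPy, with a 4 by 3 matrix of rank 2 built from a low-rank product:

```python
import numpy as np

rng = np.random.default_rng(2)
# Rank-2 matrix of shape 4x3, so r=2, n-r=1, m-r=2
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))               # numerical rank
assert r == 2

row_space  = Vt[:r].T                    # first r columns of V
null_space = Vt[r:].T                    # last n-r columns of V
col_space  = U[:, :r]                    # first r columns of U
left_null  = U[:, r:]                    # last m-r columns of U

assert np.allclose(A @ null_space, 0)    # A x = 0 for null-space vectors
assert np.allclose(A.T @ left_null, 0)   # A^t b = 0 for left-null vectors
```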
Ab initio
This is all very well, but how did we get here? We will now take a different perspective, that links back to eigenvalues and illustrates the significance of \bf{A}\tra{A}.
An eigenvector \vec{a} of the matrix \bf{C} maps to itself scaled by the eigenvalue \lambda, or {\bf{C}}\vec{a} = \lambda \vec{a} \ . Eigenvectors and eigenvalues are neat, but they only work for square matrices mapping within the same \Rn. For an m by n rectangular matrix, we need two sets of special unit vectors \vec{v} and \vec{u} in \Rn and \Rm, but we can get close to the spirit of eigenvectors by defining, {{\bf{A}}\vec{v}} = \sigma \vec{u} \ , \tag{II} where \sigma is a scaling factor. Stacking special vectors column-wise into matrices \bf V and \bf U, {\bf{A}\bf{V}} = {\bf{U} \Sg }\ , where \Sg is a diagonal matrix of \sigma values for each \vec{v}, \vec{u} pair, post-multiplying to scale the columns of \bf{U}.
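The stacked form {\bf{A}\bf{V}} = {\bf{U}\Sg} can also be checked numerically; a sketch assuming NumPy, where scaling the columns of \bf U plays the role of post-multiplying by \Sg:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Eqn (II), stacked column-wise: A V = U Sigma.
# U * s broadcasts s across the columns, scaling column j of U by sigma_j.
assert np.allclose(A @ V, U * s)
```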
If the special vectors are unit and mutually orthogonal (which, as we are about to see, they can always be chosen to be), then \bf U and \bf V are orthogonal matrices, {\bf{U}}^{-1} = {\tra{U}} \ , \quad {\bf{V}}^{-1} = {\tra{V}} \ , \quad \Rightarrow \quad {{\bf{U}}\tra{U}} = {\bf I} \ , \quad {{\bf{V}}\tra{V}} = {\bf I} \ , and we can rearrange to get to our original definition of Eqn. \rm (I), {\bf{A}} = {\bf U\Sg\tra{V}} . Then, using these orthogonality relations to simplify, \begin{align} {\bf{A}\tra{A}} &= {\bf U\Sg\tra{V}V\tra{\Sg}\tra{U}} = {\bf U} ({\Sg\tra{\Sg}}) {\tra{U}} \\ {\tra{A}\bf{A}} &= {\bf V\tra{\Sg}\tra{U}U\Sg\tra{V}} = {\bf V} ({\tra{\Sg}\Sg}) {\tra{V}} \end{align} These equations are diagonalised forms of the square matrices \bf{A}\tra{A} and \tra{A}\bf{A}. Our special vectors \vec{u} are in fact the left-singular vectors, eigenvectors of \bf{A}\tra{A}; our special vectors \vec{v} are the right-singular vectors, eigenvectors of \tra{A}\bf{A}. Since both matrices are symmetric, their eigenvectors can indeed be chosen orthonormal, as we assumed above. Because \Sg is diagonal, both \Sg\tra{\Sg} and \tra{\Sg}\Sg are also diagonal with elements \sigma^2, which is why we take the square roots of the eigenvalues of \bf{A}\tra{A} when forming \Sg.
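These two diagonalisations can be confirmed directly; a sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)
V = Vt.T

# A A^t = U (Sigma Sigma^t) U^t  and  A^t A = V (Sigma^t Sigma) V^t
assert np.allclose(A @ A.T, U @ (Sigma @ Sigma.T) @ U.T)
assert np.allclose(A.T @ A, V @ (Sigma.T @ Sigma) @ V.T)

# Both products are diagonal, with entries sigma^2
assert np.allclose(np.diag(Sigma.T @ Sigma), s**2)
```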
The point is, Eqn. \rm (II) implies our original definition in Eqn. \rm (I).