Singular Value Decomposition

Let \(X\in\mathbb R^{n\times d}\) and define two matrices \(H=X^\top X \in \mathbb R^{d\times d}\) and \(K=XX^\top\in\mathbb R^{n\times n}\). These matrices lie in different spaces, but they share a special connection.

Let us, for simplicity of notation, also denote \(S:=\text{range}(X^\top)\) and \(T:=\text{range}(X)\). By the fundamental theorem of linear algebra, we have \(\text{null}(X)=S^\perp\) and \(\text{null}(X^\top)=T^\perp\).

First off, they are both positive semi-definite, since \(v^\top Hv = v^\top X^\top Xv = (Xv)^\top (Xv) = \|Xv\|^2 \geq 0\), and similarly \(w^\top Kw = \|X^\top w\|^2 \geq 0\). Next, they both have the same rank as \(X\), i.e., \(r:=\text{rank}(X)=\text{rank}(H)=\text{rank}(K)\). We showed this earlier by proving that \(\text{range}(X^\top)=\text{range}(H)\), and similarly \(\text{range}(X)=\text{range}(K)\).
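
As a quick numerical sanity check, here is a minimal NumPy sketch; the dimensions and the rank-\(2\) construction are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
# A rank-2 matrix, built as a product of thin factors (sizes are arbitrary).
X = rng.standard_normal((n, 2)) @ rng.standard_normal((2, d))

H = X.T @ X   # d x d
K = X @ X.T   # n x n

# Both are positive semi-definite: all eigenvalues are (numerically) >= 0.
print(np.linalg.eigvalsh(H).min() >= -1e-10, np.linalg.eigvalsh(K).min() >= -1e-10)

# X, H, and K all have the same rank (here, 2).
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(H), np.linalg.matrix_rank(K))
```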

Symmetric positive semi-definite matrices have some nice properties. By the spectral theorem, their symmetry gives them a full orthonormal set of eigenvectors, and all their eigenvalues are non-negative. And they have exactly as many non-zero eigenvalues (counted with multiplicity) as their rank.

We say \((\lambda,v)\in\mathbb R\times\mathbb R^k\) is an eigenpair of \(A\in\mathbb R^{k\times k}\) if \(Av=\lambda v\) and \(v\neq 0\).

Claim: If \((\sigma^2,v)\) is an eigenpair of \(H\) with \(\sigma^2\neq 0\), then \((\sigma^2, Xv)\) is an eigenpair of \(K\). Indeed, \(K(Xv)=XX^\top Xv = X(Hv)=\sigma^2 Xv\), and \(Xv\neq 0\) since \(\|Xv\|^2 = v^\top Hv = \sigma^2\|v\|^2 > 0\).
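
The claim is easy to check numerically; a small NumPy sketch, using the top eigenpair of \(H\) (the random \(5\times 3\) matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
H, K = X.T @ X, X @ X.T

# np.linalg.eigh returns eigenvalues in ascending order; take the top pair.
evals, evecs = np.linalg.eigh(H)
lam, v = evals[-1], evecs[:, -1]

# (lam, X v) is then an eigenpair of K.
print(np.allclose(K @ (X @ v), lam * (X @ v)))  # True
```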

Let \(\{v_i\}_{i=1}^r\) be unit-norm eigenvectors of \(H\) with non-zero eigenvalues \(\sigma_i^2\) respectively, sorted in descending order of the eigenvalues, i.e., \(\sigma_1>\sigma_2>\cdots>\sigma_r>0\). Further, let \(\{v_i\}_{i=r+1}^d\) be the remaining eigenvectors of \(H\) with eigenvalue \(0\).¹

For \(1\leq i\leq r\), define \(u_i = \frac{Xv_i}{\sigma_i}\). These are unit-norm eigenvectors of \(K\), since \(Ku_i = \frac{XX^\top Xv_i}{\sigma_i} = \frac{X(\sigma_i^2 v_i)}{\sigma_i} = \sigma_i^2 u_i\) and \(\|u_i\|^2 = \frac{v_i^\top H v_i}{\sigma_i^2} = 1\). Let \(\{u_i\}_{i=r+1}^{n}\) be the remaining unit-norm eigenvectors of \(K\), with eigenvalue \(0\).
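
The following sketch builds the \(u_i\) column-by-column as \(Xv_i/\sigma_i\) and verifies both properties (again with an arbitrary random matrix, which has full column rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))          # full column rank, so r = 3
H, K = X.T @ X, X @ X.T

evals, V = np.linalg.eigh(H)
order = np.argsort(evals)[::-1]          # sort eigenpairs in descending order
sig2, V = evals[order], V[:, order]
r = 3
sig = np.sqrt(sig2[:r])

U_r = (X @ V[:, :r]) / sig               # column i is X v_i / sigma_i
print(np.allclose(U_r.T @ U_r, np.eye(r)))   # orthonormal columns
print(np.allclose(K @ U_r, U_r * sig2[:r]))  # eigenvectors of K
```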

Let \(V_r:=[v_1\mid v_2\mid \ldots\mid v_r]\in\mathbb R^{d\times r}\) and let \(V_{-r}:=[v_{r+1}\mid \ldots \mid v_d]\in\mathbb R^{d\times (d-r)}\), and similarly define \(U_r\in\mathbb R^{n\times r}\) and \(U_{-r}\in\mathbb R^{n\times (n-r)}\).

Note that \(V_r^\top V_r=U_r^\top U_r = I_r\), and similarly \(V_{-r}^\top V_{-r}=I_{d-r}\) and \(U_{-r}^\top U_{-r} = I_{n-r}\). Moreover, eigenvectors of a symmetric matrix corresponding to distinct eigenvalues are orthogonal, so \(V_r^\top V_{-r}=0\) and \(U_r^\top U_{-r}=0\).

We denote by \(\Pi_S\) the orthogonal projection onto a subspace \(S\), i.e., the idempotent, symmetric linear map whose range is \(S\).

Claim: \(\text{range}(V_r) = S\), and \(\text{range}(V_{-r})=S^\perp\), and \(\text{range}(U_r)=T\), and \(\text{range}(U_{-r})=T^\perp.\) Furthermore, \(V_r V_r^\top=\Pi_S\), and \(U_r U_r^\top=\Pi_T\), and \(V_{-r} V_{-r}^\top=\Pi_{S^\perp}\), and \(U_{-r} U_{-r}^\top=\Pi_{T^\perp}\).
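
A sketch of the projection claims, assuming a rank-deficient \(X\) so that the split into \(V_r\) and \(V_{-r}\) is non-trivial:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 6, 4, 2
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))  # rank 2

evals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(evals)[::-1]
V = V[:, order]
V_r, V_minus = V[:, :r], V[:, r:]

# V_r V_r^T acts as the orthogonal projection onto S = range(X^T):
# X w depends only on the component of w in S, and V_{-r} spans null(X).
w = rng.standard_normal(d)
print(np.allclose(X @ (V_r @ V_r.T @ w), X @ w))  # True
print(np.allclose(X @ V_minus, 0))                # True
```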

Observe that \(XV_r = U_r\Sigma_r\) and \(XV=U\Sigma\), where \(\Sigma_r = \begin{bmatrix} \sigma_1 \\\\ &\ddots\\\\ &&\sigma_r\end{bmatrix}\in\mathbb R^{r\times r}\), and \(\Sigma = \begin{bmatrix}\Sigma_r & 0_{r\times (d-r)}\\\\ 0_{(n-r)\times r} & 0_{(n-r)\times (d-r)}\end{bmatrix}\in\mathbb R^{n\times d},\) and \(V=[V_r\mid V_{-r}]\in\mathbb R^{d\times d}\) and \(U=[U_r\mid U_{-r}]\in\mathbb R^{n\times n}\). The first identity just restates \(Xv_i=\sigma_i u_i\) for \(1\leq i\leq r\); the second then follows because \(XV_{-r}=0\), since the columns of \(V_{-r}\) span \(\text{null}(X)\).

This yields what is called the singular value decomposition (SVD) \[X = U\Sigma V^\top = U_r \Sigma_r V_r^\top.\] The first decomposition, in terms of \((U,\Sigma,V)\), is called the full SVD, whereas the second one is called the economy (or thin) SVD.
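
In NumPy, both variants are available via np.linalg.svd through the full_matrices flag; a minimal sketch (the \(5\times 3\) test matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 3))

# Full SVD: U is 5x5, Vt is 3x3, and Sigma must be padded to 5x3.
U, s, Vt = np.linalg.svd(X, full_matrices=True)
Sigma = np.zeros((5, 3))
np.fill_diagonal(Sigma, s)
print(np.allclose(U @ Sigma @ Vt, X))            # True

# Economy SVD: U_r is 5x3 and Sigma_r is the 3x3 diagonal of singular values.
U_r, s, Vt_r = np.linalg.svd(X, full_matrices=False)
print(np.allclose(U_r @ np.diag(s) @ Vt_r, X))   # True
```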

This is closely related to the (economy and full) eigendecomposition of \(H\) and \(K\) \[H=V_r\Sigma_r^2 V_r^\top = V\Sigma^\top\Sigma V^\top \qquad K = U_r \Sigma_r^2 U_r^\top=U\Sigma\Sigma^\top U^\top.\]
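
Equivalently, the squared singular values are the eigenvalues of \(H\) and the non-zero eigenvalues of \(K\), which is easy to confirm numerically (same arbitrary test matrix as above):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 3))
s = np.linalg.svd(X, compute_uv=False)           # singular values, descending

# sigma_i^2 are the eigenvalues of H (all of them, since X has rank 3)
# and the non-zero eigenvalues of K (its remaining two are ~ 0).
print(np.allclose(np.linalg.eigvalsh(X.T @ X)[::-1], s**2))      # True
print(np.allclose(np.linalg.eigvalsh(X @ X.T)[::-1][:3], s**2))  # True
```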

The SVD gives a nice interpretation of the action of the linear map \(X\) as a sequence of \(3\) operations: rotation \(\to\) scaling \(\to\) rotation. Concretely, \(Xw = U(\Sigma(V^\top w))\): an orthogonal change of basis by \(V^\top\), followed by axis-aligned scaling by the singular values, followed by another orthogonal map \(U\).

Pseudo-inverse

The pseudo-inverse is expressed in a straightforward manner via the SVD. We have \[X^\dagger = V_r \Sigma_r^{-1} U_r^\top=V\Sigma^\dagger U^\top \in \mathbb R^{d\times n}\] where \(\Sigma^\dagger = \begin{bmatrix}\Sigma_r^{-1} & 0_{r\times (n-r)}\\\\ 0_{(d-r)\times r} & 0_{(d-r)\times (n-r)}\end{bmatrix}\in\mathbb R^{d\times n}.\) Note that \(X^\dagger X=V_rV_r^\top=\Pi_S\) and \(XX^\dagger=U_rU_r^\top=\Pi_T\), and that \(X^\dagger=X^{-1}\) when \(X\) is square and invertible.
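
A sketch comparing this formula against np.linalg.pinv, and checking the projection identities \(X^\dagger X=\Pi_S\) and \(XX^\dagger=\Pi_T\) (the rank-2 test matrix is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, r = 6, 4, 2
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))  # rank 2

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_r, s_r, Vt_r = U[:, :r], s[:r], Vt[:r, :]      # drop the ~zero singular values

X_pinv = Vt_r.T @ np.diag(1.0 / s_r) @ U_r.T     # V_r Sigma_r^{-1} U_r^T
print(np.allclose(X_pinv, np.linalg.pinv(X)))    # True

# X^dagger X and X X^dagger are the projections onto S and T.
print(np.allclose(X_pinv @ X, Vt_r.T @ Vt_r))    # Pi_S
print(np.allclose(X @ X_pinv, U_r @ U_r.T))      # Pi_T
```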

Footnotes

  1. The case of repeated eigenvalues can be handled too: if \(\sigma_i=\sigma_{i+1}\), we can still pick orthogonal eigenvectors \(v_i, v_{i+1}\). In fact, \(\text{span}(\{v_i,v_{i+1}\})\) is then an eigenspace of the matrix. The dimension of an eigenspace is called the geometric multiplicity of its eigenvalue. For symmetric matrices, the geometric multiplicity of each eigenvalue equals its algebraic multiplicity, which is the order of the root \(\sigma_i^2\) in the characteristic equation \(\text{det}(H-\lambda I)=0\).↩︎