\[ \def\range{\text{range}} \def\Real{\mathbb{R}} \def\null{\text{null}} \def\zero{0} \def\one{\mathbf{1}} \def\tran{^\top} \def\pinv{^\dagger} \def\inv{^{-1}} \def\norm#1{\left\|#1\right\|} \def\inner#1{\left<#1\right>} \def\set#1{\left\{#1\right\}} \def\abs#1{\left|#1\right|} \def\round#1{\left(#1\right)} \]
Learning without noise
Introducing noise forces us to reason with probability, but much can already be understood in the absence of noise.
Consider data drawn from a noiseless linear relationship \[y_i = \phi_i^\top \theta_\star\] for some \(\theta_\star\in\mathbb{R}^p\) and \(\phi_i=\phi(x_i)\), where \(\phi:\mathbb{R}^d\to\mathbb{R}^p\) is a feature map. The goal is to learn a predictor of the form \(f_\theta(x)=\phi(x)^\top\theta\) from this data.
Let us write the above in matrix-vector form \[ y = \Phi \theta_\star \] where \(y=(y_i)_{i=1}^n\in\mathbb{R}^n\) collects the targets \(y_i\) and \(\Phi\in\mathbb{R}^{n\times p}\) is the matrix whose \(n\) rows are \(\phi_i^\top\), with \(\phi_i\in\mathbb{R}^p\).
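As a minimal sketch of this setup, the snippet below stacks feature vectors into \(\Phi\) and generates noiseless targets. The quadratic feature map `phi`, the dimensions, and the value of `theta_star` are illustrative assumptions, not part of the original text.

```python
import numpy as np

def phi(x):
    # Hypothetical feature map from R^1 to R^3: (1, x, x^2).
    return np.array([1.0, x[0], x[0] ** 2])

rng = np.random.default_rng(0)
n, d = 5, 1
X = rng.normal(size=(n, d))             # inputs x_1, ..., x_n
theta_star = np.array([0.5, -1.0, 2.0])  # assumed ground-truth parameter

# Each row of Phi is phi(x_i)^T, so Phi has shape (n, p).
Phi = np.stack([phi(x) for x in X])

# Noiseless targets: y_i = phi_i^T theta_star, i.e. y = Phi theta_star.
y = Phi @ theta_star
```

Row `i` of `Phi` is the feature vector of the `i`-th input, so `y[i]` equals `phi(X[i]) @ theta_star`.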
At this point let us pause to ask a few questions:
- When does a solution exist to the linear system of equations above?
- If a solution doesn’t exist, is there a natural approximate solution?
- If a solution does exist, is it unique?
- If the solution is not unique, how do the functions \(f_{\theta_1}\) and \(f_{\theta_2}\) differ for two solutions \(\theta_1\) and \(\theta_2\) of the above system?
Existence
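The existence question is easy to answer. A solution exists if and only if \(y\in\range(\Phi)\). If, on the other hand, \(y\notin\range(\Phi)\), then the system of equations has no solution. However, we can always solve \(\Pi_T y = \Phi\theta\), where \(\Pi_T\) is the orthogonal projection onto \(T=\range(\Phi)\), because \(\Pi_T y\in\range(\Phi)\) by construction. The solutions of this projected system are exactly the least-squares solutions, i.e., the minimizers of \(\norm{y-\Phi\theta}\), which gives the natural approximate solution.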
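The existence test and the projected system can be sketched numerically. The example below assumes a small hand-picked \(\Phi\) and forms the projection as \(\Pi_T=\Phi\Phi^\dagger\), a standard identity for the orthogonal projection onto \(\range(\Phi)\). The helper `in_range` and its tolerance are hypothetical names introduced only for illustration.

```python
import numpy as np

Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])            # n = 3, p = 2

# Orthogonal projection onto T = range(Phi).
Pi_T = Phi @ np.linalg.pinv(Phi)

def in_range(y, tol=1e-10):
    # y lies in range(Phi) exactly when projecting leaves it unchanged.
    return np.linalg.norm(Pi_T @ y - y) <= tol * max(1.0, np.linalg.norm(y))

y_in = Phi @ np.array([2.0, -1.0])      # in range(Phi) by construction
y_out = np.array([1.0, 0.0, 0.0])       # violates y[2] = y[0] + y[1]

# Even when y is outside the range, the projected system is solvable.
theta = np.linalg.pinv(Phi) @ y_out
residual = np.linalg.norm(Phi @ theta - Pi_T @ y_out)
```

Here `in_range(y_in)` holds and `in_range(y_out)` does not, while `residual` is zero up to floating-point error, so `theta` solves the projected system even though the original system has no solution.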
Uniqueness
There is a unique \(\hat\theta\in\range(\Phi\tran)\) that solves the equation \(\Pi_T y=\Phi \theta\). This is exactly the solution expressed via the pseudoinverse: \[\hat\theta = \Phi\pinv y\] If \(\null(\Phi)\neq\set{0_p}\), i.e., \(\Phi\) has a non-trivial null space, then \(\theta=\hat\theta+\theta_2\) is also a solution for any \(\theta_2\in\null(\Phi)\), since \(\Phi\theta_2=0\); when \(y\in\range(\Phi)\), these are solutions of the original system \(y=\Phi\theta\). Because \(\range(\Phi\tran)\) is the orthogonal complement of \(\null(\Phi)\), the vectors \(\hat\theta\) and \(\theta_2\) are orthogonal, and the Pythagorean theorem tells us that \[\norm{\theta}^2 = \norm{\hat\theta}^2+\norm{\theta_2}^2\] whereby \(\hat\theta\) is the minimum-norm solution of \(\Pi_T y=\Phi\theta\).
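The non-uniqueness and the minimum-norm property can be checked on a small underdetermined system. This is a sketch under assumed values: the matrix `Phi`, the target `y`, and the null-space coefficient `3.0` are arbitrary choices. The null-space basis is read off from the SVD, which is one common way to obtain it.

```python
import numpy as np

Phi = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])       # n = 2, p = 3, so null(Phi) is non-trivial
y = np.array([1.0, 2.0])                # lies in range(Phi), since Phi has full row rank

# Minimum-norm solution via the pseudoinverse.
theta_hat = np.linalg.pinv(Phi) @ y

# Rows of Vt beyond the rank span null(Phi).
_, s, Vt = np.linalg.svd(Phi)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:].T                # shape (p, p - rank)

# Any null-space component yields another solution.
theta_2 = null_basis @ np.array([3.0])
theta = theta_hat + theta_2

# Pythagorean identity: the cross term vanishes by orthogonality.
lhs = np.linalg.norm(theta) ** 2
rhs = np.linalg.norm(theta_hat) ** 2 + np.linalg.norm(theta_2) ** 2
```

Both `theta_hat` and `theta` reproduce `y`, yet `theta` has strictly larger norm, and `lhs` matches `rhs` up to rounding. NumPy's `np.linalg.lstsq` also returns the minimum-norm solution for rank-deficient or underdetermined systems, so it agrees with `theta_hat` here.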