Learning without noise

Introducing noise forces us to reason probabilistically. Yet much can already be understood in the absence of noise.

Consider data generated by a linear relationship, \[y_i = \phi_i^\top \theta_\star,\quad i=1,\dots,n,\] for some \(\theta_\star\in\mathbb{R}^p\), where \(\phi_i=\phi(x_i)\) and \(\phi:\mathbb{R}^d\to\mathbb{R}^p\) is a feature map. Consider the problem of fitting the model \(f_\theta(x)=\phi(x)^\top\theta\) to the data \((x_i,y_i)_{i=1}^n\).

Let us write the above in matrix-vector form, \[ y = \Phi \theta_\star, \] where \(y=(y_i)_{i=1}^n\in\mathbb{R}^n\) collects the targets and \(\Phi\in\mathbb{R}^{n\times p}\) is the matrix whose \(n\) rows are the transposed feature vectors \(\phi_i^\top\), with \(\phi_i\in\mathbb{R}^p\).
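As a concrete illustration of the matrix-vector form, the following NumPy sketch builds \(\Phi\) row by row and checks that \(\Phi\theta_\star\) reproduces the per-sample targets \(y_i=\phi_i^\top\theta_\star\). The quadratic feature map `phi`, the value of `theta_star` and the random inputs are hypothetical choices made only for this example (here \(d=1\), \(p=3\)).

```python
import numpy as np

def phi(x):
    # Hypothetical feature map phi: R -> R^3 (quadratic polynomial features),
    # chosen only for illustration; here d = 1 and p = 3.
    return np.array([1.0, x, x**2])

rng = np.random.default_rng(0)
n, p = 5, 3
theta_star = np.array([0.5, -1.0, 2.0])   # assumed ground-truth parameter
xs = rng.uniform(-1.0, 1.0, size=n)       # inputs x_1, ..., x_n

# Stack the feature vectors phi_i^T as the n rows of Phi (shape n x p).
Phi = np.stack([phi(x) for x in xs])

# Noiseless targets: y = Phi @ theta_star, computed in one matrix-vector product.
y = Phi @ theta_star

# The same targets computed sample by sample: y_i = phi_i^T theta_star.
y_rowwise = np.array([phi(x) @ theta_star for x in xs])

print(Phi.shape, np.allclose(y, y_rowwise))  # (5, 3) True
```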

At this point let us pause to ask a few questions:

  • When does a solution exist to the linear system of equations above?
    • If a solution doesn’t exist, is there a natural approximate solution?
    • If a solution does exist, is it unique?
      • If the solution is not unique, how do the functions \(f_{\theta_1}\) and \(f_{\theta_2}\) differ for two solutions \(\theta_1\) and \(\theta_2\) of the above system?

Existence

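The existence question is easy to answer: a solution exists if and only if \(y\in\range(\Phi)\). If instead \(y\notin\range(\Phi)\), the system of equations has no solution. However, we can always solve \(\Pi_T y = \Phi\theta\), where \(\Pi_T\) is the orthogonal projection onto \(T=\range(\Phi)\). This system is consistent because \(\Pi_T y\in T\), and its solutions are exactly the least-squares solutions, i.e., the minimizers of \(\norm{y-\Phi\theta}^2\).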
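The existence test and the projected system can be checked numerically. The sketch below assumes NumPy and forms the projector as \(\Pi_T=\Phi\Phi\pinv\); the helper names `projection_onto_range` and `has_exact_solution`, the tolerance, and the small rank-2 matrix are hypothetical choices for illustration. It also checks that a least-squares solution \(\theta\) satisfies \(\Phi\theta=\Pi_T y\).

```python
import numpy as np

def projection_onto_range(Phi):
    # Orthogonal projector onto T = range(Phi), computed as Phi Phi^+.
    return Phi @ np.linalg.pinv(Phi)

def has_exact_solution(Phi, y, tol=1e-10):
    # y lies in range(Phi) iff projecting it onto T leaves it (numerically) unchanged.
    P = projection_onto_range(Phi)
    return np.linalg.norm(y - P @ y) <= tol * max(1.0, np.linalg.norm(y))

# Hypothetical example: Phi has rank 2, so range(Phi) is a plane in R^3.
Phi = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]])
y_in = Phi @ np.array([2.0, -1.0])   # lies in range(Phi) by construction
y_out = np.array([1.0, 0.0, 0.0])    # does not lie in range(Phi)

print(has_exact_solution(Phi, y_in))   # True
print(has_exact_solution(Phi, y_out))  # False

# Pi_T y_out lies in T, so Pi_T y_out = Phi theta is solvable. lstsq returns a
# minimizer of ||y_out - Phi theta||, and Phi theta coincides with Pi_T y_out.
theta, *_ = np.linalg.lstsq(Phi, y_out, rcond=None)
print(np.allclose(Phi @ theta, projection_onto_range(Phi) @ y_out))  # True
```

Forming \(\Phi\Phi\pinv\) explicitly is convenient for a small example; for large problems one would typically avoid building the \(n\times n\) projector and work with a QR or SVD factorization instead.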

Uniqueness

There is a unique \(\hat\theta\in\range(\Phi\tran)\) that solves the equation \(\Pi_T y=\Phi \theta\). This is exactly the solution given by the pseudoinverse¹: \[\hat\theta = \Phi\pinv y.\] If \(\null(\Phi)\neq\set{0_p}\), i.e., \(\Phi\) has a non-trivial null space, then \(\theta=\hat\theta+\theta_2\) is also a solution for every \(\theta_2\in\null(\Phi)\), since \(\Phi\theta_2=0\); when \(y\in\range(\Phi)\), these are solutions of the original system \(y=\Phi\theta\). Because \(\range(\Phi\tran)\) is the orthogonal complement of \(\null(\Phi)\), the vectors \(\hat\theta\) and \(\theta_2\) are orthogonal, and the Pythagorean theorem gives \[\norm{\theta}^2 = \norm{\hat\theta}^2+\norm{\theta_2}^2 \geq \norm{\hat\theta}^2,\] so \(\hat\theta\) is the minimum-norm solution of \(\Pi_T y=\Phi\theta\).

Footnotes

  1. See notes on the pseudoinverse here.