(Finite dimensional) Linear Algebra
Vector space: A space \(\mathbb V\) is a set consisting of elements, called vectors, that is closed under the unary operation of scaling and the binary operation of addition, i.e.,
- Scaling: if \(v\in\mathbb V\) then \(\alpha v\in \mathbb V\), where \(\alpha\) is a scalar that belongs to some prespecified field, and
- Addition: if \(v_1,v_2\in\mathbb V\) then \(v_1 + v_2\in\mathbb V\).
There is a unique special element in every vector space, called zero and denoted \(0_{\mathbb V}\), which satisfies \(v+0_{\mathbb V}=v\) for all \(v\in \mathbb V\).
Norm: A norm is a unary real-valued function over a vector space that satisfies three key properties:
- Positive definiteness: \(|\!|x|\!|\geq 0\) with equality if and only if \(x=0_{\mathbb V}\),
- Homogeneity: \(|\!|\alpha x|\!| = |\alpha|\cdot|\!|x|\!|\) for any scalar \(\alpha\), and
- Triangle inequality: \(|\!|x+y|\!|\leq |\!|x|\!| + |\!|y|\!|\); for the Euclidean norm, equality holds if and only if \(y=\alpha x\) for some scalar \(\alpha\geq 0\) or \(x=0_{\mathbb V}\).
The norm is usually denoted as \(|\!|v|\!|\).
For example, the general \(\ell_p\) norm on \(\mathbb R^n\) is \(|\!|x|\!|_p:=\left(|x_1|^p + |x_2|^p + \ldots + |x_n|^p\right)^{1/p}\) for \(p\geq 1\).
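As a quick numerical check (a minimal NumPy sketch; the vector `x` and the exponents `p` are arbitrary choices), the formula above agrees with `np.linalg.norm`:

```python
import numpy as np

x = np.array([3.0, -4.0, 12.0])

# Compare a direct implementation of the ell_p norm against NumPy's built-in.
for p in [1, 2, 3]:
    direct = np.sum(np.abs(x) ** p) ** (1.0 / p)
    built_in = np.linalg.norm(x, ord=p)
    assert np.isclose(direct, built_in)
```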
A Banach space is a vector space endowed with a norm under which it is complete, i.e., every Cauchy sequence converges; in finite dimensions every normed vector space is complete, so this requirement is automatic here.
A Hilbert space is a special type of vector space that is endowed with yet another binary operation, called the inner-product, which must satisfy the following properties:
- (Conjugate) symmetry: \(\left< u,v\right> = \overline{\left< v,u\right>}\), where \(\overline{z}\) denotes the complex conjugate of a complex number \(z\in\mathbb C\),
- Linearity in the first argument: \(\left< \alpha u_1+\beta u_2,v\right>=\alpha \left< u_1,v\right>+\beta\left<u_2,v\right>\), and
- Positive definiteness: \(\left<x,x\right>>0\) if \(x\neq 0_{\mathbb V}\).
A Hilbert space is also a Banach space because \(\sqrt{\left<x,x\right>}\) is a norm, thanks to the positive definiteness of the inner-product. A norm defined in such a manner via an inner-product is called an induced norm.
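A small sketch (assuming NumPy; the random vectors are arbitrary) illustrating that the standard inner product on \(\mathbb R^n\) induces the Euclidean norm, with spot-checks of two norm axioms:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

# The induced norm sqrt(<x, x>) coincides with the Euclidean norm.
norm = lambda v: np.sqrt(v @ v)
assert np.isclose(norm(x), np.linalg.norm(x))

# Spot-check two norm axioms on this pair of vectors.
assert norm(x + y) <= norm(x) + norm(y)          # triangle inequality
assert np.isclose(norm(3.0 * x), 3.0 * norm(x))  # homogeneity
```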
Recall that a set \(S\subseteq\mathbb{R}^n\) is a subspace if for any \(u,v\in S,\) the element \(\alpha u+ \beta v\in S\) for all \(\alpha,\beta\in\mathbb{R}\). For every subspace, we can define a unique subspace \(S^\perp\subseteq \mathbb{R}^n\), called its orthogonal complement consisting of all vectors orthogonal to every element in \(S\). \[S^\perp := \{u\mid v^\top u = 0, \text{for all } v\in S\}.\]
Note that \[\text{dim}(S) + \text{dim}(S^\perp) = n.\]
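One way to see this numerically (a sketch assuming NumPy; the subspace is spanned by random columns): the full SVD of a matrix whose columns span \(S\) delivers orthonormal bases for both \(S\) and \(S^\perp\), and their sizes add up to \(n\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
T = rng.standard_normal((n, k))   # columns span a subspace S of R^n

U, s, _ = np.linalg.svd(T)        # full SVD: U is an n x n orthogonal matrix
r = int(np.sum(s > 1e-10))        # dim(S) = rank(T)
basis_S, basis_Sperp = U[:, :r], U[:, r:]

# Each basis vector of S^perp is orthogonal to each basis vector of S,
# and dim(S) + dim(S^perp) = n.
assert np.allclose(basis_S.T @ basis_Sperp, 0)
assert basis_S.shape[1] + basis_Sperp.shape[1] == n
```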
A map \(A:\mathbb U\to\mathbb V\) between two Hilbert spaces \(\mathbb U, \mathbb V\) is said to be linear if \(A(\alpha u_1+\beta u_2) = \alpha A(u_1) + \beta A(u_2)\) for all vectors \(u_1,u_2\in\mathbb U\) and all scalars \(\alpha,\beta\).
We will focus on the case when \(\mathbb U=\mathbb R^d\) and \(\mathbb V=\mathbb R^n\). Every linear map \(A:\mathbb R^d\to\mathbb R^n\) can be represented as a matrix of size \((n,d)\). This is because any vector \(v\in\mathbb {R}^d\) can be written as \(\sum_i v_i \mathbf e_i\), where \(\mathbf e_i\) is the \(i^{\text{th}}\) column of \(I_d\) and \(v_i\) is the \(i^{\rm th}\) coordinate of \(v\). Hence \(A(v) = \sum_{i=1}^d v_i A(\mathbf e_i)\). Consider the matrix \[X = \begin{bmatrix} A(\mathbf{e}_1) & A(\mathbf{e}_2) & \ldots & A(\mathbf{e}_d)\end{bmatrix}\in\mathbb{R}^{n\times d}\] where each column \(A(\mathbf{e}_i)\in\mathbb R^n\). Hence instead of using the map \(A\), we simply use its matrix representation \(X\); we do this for all linear maps between finite-dimensional Euclidean spaces.
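To make this concrete, here is a minimal sketch (assuming NumPy; the map `A` below is a hypothetical linear map chosen for illustration) that builds the matrix representation column by column:

```python
import numpy as np

# A hypothetical linear map A: R^3 -> R^2.
def A(v):
    return np.array([2.0 * v[0] - v[2], v[1] + 3.0 * v[2]])

d = 3
# Column i of X is A applied to the i-th standard basis vector e_i.
X = np.column_stack([A(e) for e in np.eye(d)])

v = np.array([1.0, -2.0, 0.5])
assert np.allclose(X @ v, A(v))   # the matrix reproduces the map
```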
Recall that for a linear map \(X\in\mathbb{R}^{n\times d}\), there are four subspaces of interest (illustrated numerically after this list):
- \(\text{null}(X) = \{v\in\mathbb{R}^d\mid Xv=0\}\subseteq\mathbb{R}^d\)
- \(\text{range}(X) = \{Xv\in\mathbb{R}^n\mid v\in\mathbb{R}^d\}\subseteq\mathbb{R}^n\)
- \(\text{null}(X^\top)\subseteq\mathbb{R}^n\)
- \(\text{range}(X^\top)\subseteq\mathbb{R}^d\)
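Orthonormal bases for all four subspaces can be read off from the full SVD of \(X\) (a sketch assuming NumPy; the matrix is random):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.standard_normal((n, d))

U, s, Vt = np.linalg.svd(X)       # full SVD: U is n x n, Vt is d x d
r = int(np.sum(s > 1e-10))        # rank of X

range_X  = U[:, :r]               # basis of range(X)   in R^n
null_Xt  = U[:, r:]               # basis of null(X^T)  in R^n
range_Xt = Vt[:r].T               # basis of range(X^T) in R^d
null_X   = Vt[r:].T               # basis of null(X)    in R^d

assert np.allclose(X @ null_X, 0)
assert np.allclose(X.T @ null_Xt, 0)
```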
The span of a set \(\mathfrak T\subseteq\mathbb{R}^n\), denoted \(\text{span}(\mathfrak T)\), is the set consisting of all linear combinations of elements in \(\mathfrak T\). Suppose \(|\mathfrak T|=k<\infty\); then we can form the matrix \(T=[t_1 \ t_2\ \ldots \ t_k]\in\mathbb R^{n\times k}\) and write \[\text{span}(\mathfrak T)=\text{range}(T).\]
A set of vectors \(\{t_i\}_{i=1}^k\), where \(t_i\in\mathbb R^n\), is said to be linearly independent if the equation \[\sum_{i=1}^k \alpha_i t_i = 0_n\] has a unique solution: \(\alpha_i=0\) for all \(1\leq i\leq k\).
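In practice, linear independence can be tested by stacking the vectors as columns and checking whether the rank equals \(k\) (a sketch assuming NumPy; the vectors are a made-up example):

```python
import numpy as np

# t3 = t1 + t2, so this set is linearly dependent.
t1, t2, t3 = np.array([1.0, 0, 0]), np.array([0, 1.0, 0]), np.array([1.0, 1.0, 0])
T = np.column_stack([t1, t2, t3])
print(np.linalg.matrix_rank(T))   # prints 2 (< 3), confirming dependence
```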
A vector \(x\neq \mathbf{0}_n\) is linearly dependent on a set \(\mathfrak T=\{t_1, t_2,\ldots, t_k\}\subseteq\mathbb R^n\) if there exist scalars \(\{\alpha_i\}_{i=1}^k\) not all equal to \(0\), such that \(x=\alpha_1 t_1+\alpha_2 t_2 + \ldots + \alpha_k t_k\).
A set \(S_0\) is a basis of the subspace \(S\) if \(S_0\) is linearly independent and \(\text{span}(S_0)=S\).
The canonical norm of a vector \(x\in \mathbb{R}^n\) is defined as \[|\!|x|\!|:=\sqrt{x^\top x} = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}.\] This norm is rotationally invariant, i.e., if we rotate the coordinate system, then the norm remains the same. Hence this norm is the length of the vector. Other norms, such as the \(\ell_p\) norms for \(p\neq 2\), do not share this rotational invariance.
The dimension of a subspace is the number of elements in a basis of the subspace (every basis of a given subspace has the same size).
The rank of the matrix \(X\), denoted \(\text{rank}(X)\), is the number of linearly independent columns in \(X\). This is exactly equal to the dimension of \(\text{range}(X^\top)\); the proof of this fact (row rank equals column rank) appears below.
Claim: (Fundamental theorem of linear algebra) \(\text{null}(X)^{\perp} = \text{range}(X^\top)\).
Proof: Let \(u\in \text{null}(X)\), and let \(v\in \text{range}(X^\top)\). Then there exists a \(w\in\mathbb R^n\) such that \(v=X^\top w\). Now observe that \(u^\top v=u^\top X^\top w = (Xu)^\top w=0\) because \(Xu=0\); hence \(\text{range}(X^\top)\subseteq\text{null}(X)^{\perp}\). Conversely, if \(u\perp\text{range}(X^\top)\), then in particular \(u^\top X^\top (Xu)=|\!|Xu|\!|^2=0\), so \(u\in\text{null}(X)\); that is, \(\text{range}(X^\top)^\perp\subseteq\text{null}(X)\), and taking orthogonal complements gives \(\text{null}(X)^{\perp}\subseteq\text{range}(X^\top)\).
\(\square\)
Rank-nullity theorem: Combining the above result with the fact that \(\text{dim}(\text{null}(X))+\text{dim}(\text{null}(X)^\perp)=d\) (see the note on orthogonal complements above) yields \[\text{rank}(X)+\text{nullity}(X)=\text{dim}\left(\text{dom}(X)\right)=d,\] where \(\text{nullity}(X)=\text{dim}\left(\text{null}(X)\right)\) and \(\text{rank}(X)=\text{dim}(\text{range}(X^\top))\).
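A concrete check of rank-nullity (a sketch assuming NumPy; the matrix is constructed so that its third row is the sum of the first two, giving rank 2):

```python
import numpy as np

X = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, 3.0],
              [1.0, 1.0, 3.0, 4.0]])   # 3 x 4, rank 2

n, d = X.shape
_, s, Vt = np.linalg.svd(X)
r = int(np.sum(s > 1e-10))              # rank(X) = 2
null_basis = Vt[r:].T                   # last d - r right singular vectors span null(X)

assert np.allclose(X @ null_basis, 0)   # these directions are annihilated by X
assert r + null_basis.shape[1] == d     # rank + nullity = dim of the domain
```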
Claim: \(\text{range}(X^\top X) = \text{range}(X^\top)\).
Proof: Showing \(\text{range}(X^\top X) \subseteq \text{range}(X^\top)\) is trivial. Let us show that \(\text{range}(X^\top X) \supseteq \text{range}(X^\top)\). Let \(v\in \text{range}(X^\top)\). Then there exists a \(u\in\mathbb R^n\) such that \(v=X^\top u\). We can decompose \(u = u_1 + u_2\) where \(u_1\in\text{null}(X^\top)\) and \(u_2\in\text{range}(X)\) (these subspaces are orthogonal complements in \(\mathbb R^n\) by the fundamental theorem applied to \(X^\top\)), whereby \(u_2=X w\) for some \(w\in\mathbb R^d\). This means that \(v=X^\top u_1 + X^\top u_2 = 0_d + X^\top X w=X^\top Xw\), whereby \(v\in\text{range}(X^\top X)\). \(\square\)
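Since a subspace of \(\mathbb R^d\) is determined by its orthogonal projection matrix, the claim can be verified numerically by comparing projectors (a sketch assuming NumPy; the matrix is random):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

def projector(M):
    """Orthogonal projector onto range(M)."""
    U, s, _ = np.linalg.svd(M)
    B = U[:, : int(np.sum(s > 1e-10))]   # orthonormal basis of range(M)
    return B @ B.T

# range(X^T X) and range(X^T) have the same projector, hence are equal.
assert np.allclose(projector(X.T @ X), projector(X.T))
```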
Claim: \(\text{rank}(X)=\text{rank}(X^\top)\), i.e., row rank equals column rank (here \(\text{rank}(X^\top)=\text{dim}(\text{range}(X))\)).
Proof: Suppose \(r=\text{rank}(X)=\text{dim}(\text{range}(X^\top))\), so there exist \(r\) linearly independent vectors \(\{u_i\}_{i=1}^r\) that span \(\text{range}(X^\top)\). Observe that \(\{Xu_i\}_{i=1}^r\) are linearly independent in \(\text{range}(X)\): if \(\sum_{i=1}^r\beta_i Xu_i=0\), then \(\sum_{i=1}^r\beta_i u_i\in\text{null}(X)\cap\text{range}(X^\top)=\{0_d\}\), forcing every \(\beta_i=0\). Hence \(\text{dim}(\text{range}(X))=\text{rank}(X^\top)\geq r\).
Now consider a non-zero vector \(v\in\text{range}(X)\). Applying the previous claim to \(X^\top\) gives \(\text{range}(X)=\text{range}(XX^\top)\), so there exists a \(w\in\text{range}(X^\top)\) such that \(v=Xw\). Note that \(w=\sum_{j=1}^r \beta_j u_j\) for some \(\{\beta_j\}_{j=1}^r\) since \(\{u_j\}_{j=1}^r\) span \(\text{range}(X^\top)\). Hence \(v=\sum_{j=1}^r \beta_j Xu_j\). Since \(v\) was arbitrary, we have shown that \(\{Xu_j\}_{j=1}^r\) span \(\text{range}(X)\), which proves that \(\text{dim}(\text{range}(X))=\text{rank}(X^\top)\leq r\). Together with the lower bound, this proves the claim.
\(\square\)
For a map \(A:\Omega\to\Gamma\) and a subset \(T\subseteq\Omega\), we denote by \(A\big|_{T}\) the restriction of \(A\) to \(T\), i.e., if \(B=A\big|_{T}\), then \[B: T\to\Gamma,\quad \text{with}\quad B(u)= A(u)\quad\text{for all}\quad u\in T.\]
Let \(S\) denote \(\text{range}(X^\top)\subseteq\mathbb R^d\).
Claim: \(M:=X^\top X\big|_{S}:S\to S\) is an invertible map. (Note here that the codomain has also been restricted.)
Proof: For invertibility, we need to show that \(M\) is surjective and injective. Since \(M\) maps the finite-dimensional space \(S\) to itself, injectivity implies surjectivity, so it suffices to prove injectivity. Let \(u,v\in S\) be such that \(Mu=Mv\). This means \(X^\top X(u-v)=0\). Let \(w=X(u-v)\in\text{range}(X)=\text{null}(X^\top)^\perp\); then \(X^\top w=0\) places \(w\) in \(\text{null}(X^\top)\) as well, forcing \(w=0_n\). Similarly, since \((u-v)\in\text{range}(X^\top)=\text{null}(X)^\perp\), the equation \(X(u-v)=0\) forces \(u-v=0_d\), which proves the injectivity.
\(\square\)
Pseudo-inverse
Let \(S:=\text{range}(X^\top)\) and \(T:=\text{range}(X)\), and define \(P:S\to T\) to be the restriction \(P:=X\big|_S\), i.e., \(Pu = Xu\) for all \(u\in S\).
We can show that \(P\) is an invertible map, by an argument similar to the one for \(M\) above. Furthermore, we can show that \(X=P\,\Pi_S\), where \(\Pi_S:\mathbb R^d\to S\) denotes the orthogonal projection onto \(S\). The pseudo-inverse is defined as the map \(X^\dagger:\mathbb R^n\to\mathbb R^d\) given by \[X^\dagger := P^{-1} \Pi_T.\] Note here that \(\Pi_T:\mathbb R^n \to T\) is the orthogonal projection onto \(T\) and \(P^{-1}: T\to S\subseteq\mathbb R^d\).
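NumPy exposes the pseudo-inverse as `np.linalg.pinv` (computed via the SVD). A sketch (random matrix) checking the behavior the definition predicts: \(X^\dagger X=\Pi_S\) and \(XX^\dagger=\Pi_T\) are orthogonal projectors, and \(X^\dagger\) inverts \(X\) on \(S\):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X_dag = np.linalg.pinv(X)

Pi_S, Pi_T = X_dag @ X, X @ X_dag       # projectors onto S and T
for Pi in (Pi_S, Pi_T):
    assert np.allclose(Pi @ Pi, Pi)     # idempotent
    assert np.allclose(Pi.T, Pi)        # symmetric

assert np.allclose(X @ X_dag @ X, X)    # X^dag inverts X on S
```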
More details on how \(X^\dagger\) can be computed are provided in the notes on the singular value decomposition (SVD).