A: Linear Algebra
A1
Inner Product
Consider two vectors \(\mathbf{x}\in\mathbb{R}^n\) and \(\mathbf{y}\in\mathbb{R}^n\). We define the inner product, \(\langle\cdot,\cdot\rangle: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}\), as
\[\langle\mathbf{x},\mathbf{y}\rangle = \mathbf{x}^\text{T}\mathbf{y} = \sum_{i=1}^n x_i y_i\]
For any scalar \(a\in\mathbb{R}\) and any vector \(\mathbf{z}\in\mathbb{R}^n\), the inner product has the following properties:
Linearity
\[\begin{split}\begin{align*} \langle a \mathbf{x},\mathbf{y}\rangle &= \langle\mathbf{x},a\mathbf{y}\rangle = a \langle\mathbf{x},\mathbf{y}\rangle\\ \langle \mathbf{x} + \mathbf{y}, \mathbf{z}\rangle &= \langle\mathbf{x},\mathbf{z}\rangle + \langle\mathbf{y},\mathbf{z}\rangle \end{align*}\end{split}\]
Conjugate symmetry (plain symmetry for real vectors)
\[\langle\mathbf{x},\mathbf{y}\rangle = \langle\mathbf{y},\mathbf{x}\rangle\]
Positive definiteness
\[\langle\mathbf{x},\mathbf{x}\rangle \ge 0\]
with equality if and only if \(\mathbf{x} = \mathbf{0}\).
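The three properties can be checked numerically. The following is a minimal sketch, assuming the standard inner product on \(\mathbb{R}^n\) (the sum of element-wise products); the helper name `inner`, the random seed and the vector length are illustrative choices, not part of the text above.

```python
import numpy as np


def inner(x, y):
    """Standard inner product on R^n: the sum of element-wise products."""
    return float(np.sum(np.asarray(x) * np.asarray(y)))


rng = np.random.default_rng(0)          # fixed seed, chosen arbitrarily
x, y, z = rng.standard_normal((3, 4))   # three random vectors in R^4
a = 2.5

# Linearity: scaling either argument, and additivity in the first argument.
assert np.isclose(inner(a * x, y), a * inner(x, y))
assert np.isclose(inner(x, a * y), a * inner(x, y))
assert np.isclose(inner(x + y, z), inner(x, z) + inner(y, z))

# Symmetry.
assert np.isclose(inner(x, y), inner(y, x))

# Positive definiteness: strictly positive unless the vector is zero.
assert inner(x, x) > 0
assert inner(np.zeros(4), np.zeros(4)) == 0.0
```

A check on random vectors is only a sanity test, not a proof, but it is a quick way to catch a mistaken definition.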
A2
Vector Norm
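Consider a vector \(\mathbf{z}\in\mathbb{R}^n\). For \(p \ge 1\), the vector \(p\)-norm is defined by
\[||\mathbf{z}||_p = \left(\sum_{i=1}^n |z_i|^p\right)^{1/p}\]
The 2-norm is induced by the inner product, \(||\mathbf{z}||_2 = \sqrt{\langle\mathbf{z},\mathbf{z}\rangle}\), and is often referred to as the Euclidean norm. Other popular settings of \(p\) are
\[\begin{split}\begin{align*} ||\mathbf{z}||_1 &= \sum_{i=1}^n |z_i|\\ ||\mathbf{z}||_\infty &= \max_{1\le i\le n} |z_i| \end{align*}\end{split}\]
If \(p\) is not stated, it is taken to be 2, i.e. \(||\mathbf{z}|| = ||\mathbf{z}||_2\).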
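As an illustration, the \(p\)-norm can be written out directly and compared with NumPy's built-in `np.linalg.norm`. This is a sketch only: the function name `p_norm` and the example vector are assumptions made for the example.

```python
import numpy as np


def p_norm(z, p=2):
    """p-norm of a vector for p >= 1; p = np.inf gives the largest |z_i|."""
    z = np.abs(np.asarray(z, dtype=float))
    if np.isinf(p):
        return float(z.max())
    return float(np.sum(z ** p) ** (1.0 / p))


z = np.array([3.0, -4.0])

print(p_norm(z, 1))        # 7.0
print(p_norm(z, 2))        # 5.0
print(p_norm(z, np.inf))   # 4.0

# The hand-written version agrees with NumPy's implementation.
for p in (1, 2, 3, np.inf):
    assert np.isclose(p_norm(z, p), np.linalg.norm(z, ord=p))
```

Leaving `p` at its default reproduces the convention that an unsubscripted norm means the 2-norm.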
A3
Cauchy-Schwarz Inequality
The Cauchy-Schwarz inequality states that for \(\mathbf{x}\in\mathbb{R}^n\) and \(\mathbf{y}\in\mathbb{R}^n\), it is true that
\[|\langle\mathbf{x},\mathbf{y}\rangle| \le ||\mathbf{x}||\,||\mathbf{y}||\]
Proof.
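If \(\mathbf{y} = \mathbf{0}\), both sides are zero and the inequality holds trivially, so assume \(\mathbf{y} \ne \mathbf{0}\). Let \(\mathbf{z} = \mathbf{x} - \frac{\langle\mathbf{x},\mathbf{y}\rangle}{\langle\mathbf{y},\mathbf{y}\rangle}\mathbf{y}\); then by linearity of the inner product, we have
\[\langle\mathbf{z},\mathbf{y}\rangle = \langle\mathbf{x},\mathbf{y}\rangle - \frac{\langle\mathbf{x},\mathbf{y}\rangle}{\langle\mathbf{y},\mathbf{y}\rangle}\langle\mathbf{y},\mathbf{y}\rangle = 0\]
which implies that \(\mathbf{z}\) is orthogonal to \(\mathbf{y}\). We can then apply Pythagoras’ theorem to
\[\mathbf{x} = \frac{\langle\mathbf{x},\mathbf{y}\rangle}{\langle\mathbf{y},\mathbf{y}\rangle}\mathbf{y} + \mathbf{z}\]
which gives
\[||\mathbf{x}||^2 = \frac{|\langle\mathbf{x},\mathbf{y}\rangle|^2}{||\mathbf{y}||^4}||\mathbf{y}||^2 + ||\mathbf{z}||^2 = \frac{|\langle\mathbf{x},\mathbf{y}\rangle|^2}{||\mathbf{y}||^2} + ||\mathbf{z}||^2 \ge \frac{|\langle\mathbf{x},\mathbf{y}\rangle|^2}{||\mathbf{y}||^2}\]
Multiplying both sides by \(||\mathbf{y}||^2\) and taking the square root yields the inequality.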
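The construction used in the proof can be traced numerically. The sketch below assumes the standard inner product (`np.dot`) and a non-zero \(\mathbf{y}\); the helper name `orthogonal_residual` and the random test vectors are hypothetical, introduced only for this example.

```python
import numpy as np


def orthogonal_residual(x, y):
    """Return z = x - (<x, y> / <y, y>) y, the part of x orthogonal to y.

    Assumes y is not the zero vector.
    """
    coef = np.dot(x, y) / np.dot(y, y)
    return x - coef * y


rng = np.random.default_rng(1)          # fixed seed, chosen arbitrarily
x, y = rng.standard_normal((2, 5))      # two random vectors in R^5
z = orthogonal_residual(x, y)
proj = x - z                            # the component of x along y

# z is orthogonal to y (up to floating-point rounding).
assert np.isclose(np.dot(z, y), 0.0)

# Pythagoras: ||x||^2 = ||proj||^2 + ||z||^2.
assert np.isclose(np.dot(x, x), np.dot(proj, proj) + np.dot(z, z))

# Cauchy-Schwarz: |<x, y>| <= ||x|| ||y||.
assert abs(np.dot(x, y)) <= np.linalg.norm(x) * np.linalg.norm(y)
```

Equality in the last line would require \(\mathbf{z} = \mathbf{0}\), that is, \(\mathbf{x}\) being a scalar multiple of \(\mathbf{y}\).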
A4
Triangle Inequality
The triangle inequality states that for \(\mathbf{x}\in\mathbb{R}^n\) and \(\mathbf{y}\in\mathbb{R}^n\) we have
\[||\mathbf{x} + \mathbf{y}|| \le ||\mathbf{x}|| + ||\mathbf{y}||\]
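Proof.
Expanding the squared norm using the linearity and symmetry of the inner product, and then applying the Cauchy-Schwarz inequality, we have
\[\begin{split}\begin{align*} ||\mathbf{x} + \mathbf{y}||^2 &= \langle\mathbf{x} + \mathbf{y},\mathbf{x} + \mathbf{y}\rangle\\ &= ||\mathbf{x}||^2 + 2\langle\mathbf{x},\mathbf{y}\rangle + ||\mathbf{y}||^2\\ &\le ||\mathbf{x}||^2 + 2||\mathbf{x}||\,||\mathbf{y}|| + ||\mathbf{y}||^2\\ &= \left(||\mathbf{x}|| + ||\mathbf{y}||\right)^2 \end{align*}\end{split}\]
Taking the square root of both sides gives the result.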
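A quick numerical sanity check of the triangle inequality for the 2-norm might look as follows. The helper `triangle_gap`, the seed and the number of trials are illustrative assumptions.

```python
import numpy as np


def triangle_gap(x, y):
    """Return ||x|| + ||y|| - ||x + y||, which should never be negative."""
    return np.linalg.norm(x) + np.linalg.norm(y) - np.linalg.norm(x + y)


rng = np.random.default_rng(2)  # fixed seed, chosen arbitrarily

# The gap is non-negative for many random pairs of vectors in R^3
# (a small tolerance allows for floating-point rounding).
gaps = [triangle_gap(*rng.standard_normal((2, 3))) for _ in range(1000)]
assert min(gaps) >= -1e-12

# The gap closes when y is a non-negative multiple of x.
x = np.array([1.0, 2.0, 2.0])
assert np.isclose(triangle_gap(x, 2.0 * x), 0.0)
```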
A5
Matrix Norm
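Consider the matrix \(\mathbf{A}\in\mathbb{R}^{n\times m}\) with elements \(a_{ij}\); then the element-wise matrix norm is defined as
\[||\mathbf{A}||_p = \left(\sum_{i=1}^n\sum_{j=1}^m |a_{ij}|^p\right)^{1/p}\]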
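Stating the most common use cases of \(p\) (namely \(p = 1\), \(p = 2\) and \(p = \infty\)), we have
\[\begin{split}\begin{align*} ||\mathbf{A}||_1 &= \sum_{i=1}^n\sum_{j=1}^m |a_{ij}|\\ ||\mathbf{A}||_\mathcal{F} &= \sqrt{\sum_{i=1}^n\sum_{j=1}^m |a_{ij}|^2} = \sqrt{\text{tr}(\mathbf{A}^\text{T}\mathbf{A})} = \sqrt{\sum_{i=1}^m \lambda_i}\\ ||\mathbf{A}||_\infty &= \max_{i,j} |a_{ij}| \end{align*}\end{split}\]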
where \(||\cdot||_\mathcal{F}\) is known as the Frobenius norm and \(\mathbf{A}^\text{T}\mathbf{A} = \mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^\text{T}\) is the eigen-decomposition of \(\mathbf{A}^\text{T}\mathbf{A}\). The columns of \(\mathbf{V}=[\mathbf{v}_1,...,\mathbf{v}_m]\) are the eigenvectors and the diagonal entries of \(\boldsymbol{\Lambda} = \text{diag}(\lambda_1,...,\lambda_m)\) are the corresponding eigenvalues, ordered so that \(\lambda_1 \ge \lambda_2\ge...\ge\lambda_m\). The eigenvalues are non-negative because \(\mathbf{A}^\text{T}\mathbf{A}\) is positive semi-definite.
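The link between the Frobenius norm, the trace of \(\mathbf{A}^\text{T}\mathbf{A}\) and its eigenvalues can be verified numerically. The sketch below is only illustrative: the function name `frobenius_three_ways` and the example matrix are assumptions made for the example.

```python
import numpy as np


def frobenius_three_ways(A):
    """Compute the Frobenius norm element-wise, via the trace, and via eigenvalues."""
    A = np.asarray(A, dtype=float)
    gram = A.T @ A                                  # A^T A, symmetric m x m
    elementwise = np.sqrt(np.sum(np.abs(A) ** 2))
    via_trace = np.sqrt(np.trace(gram))
    via_eigs = np.sqrt(np.sum(np.linalg.eigvalsh(gram)))
    return elementwise, via_trace, via_eigs


A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])  # example matrix with n = 3, m = 2

elementwise, via_trace, via_eigs = frobenius_three_ways(A)

# All three expressions agree with each other and with NumPy's built-in.
assert np.isclose(elementwise, via_trace)
assert np.isclose(elementwise, via_eigs)
assert np.isclose(elementwise, np.linalg.norm(A, "fro"))
```

`np.linalg.eigvalsh` is used because \(\mathbf{A}^\text{T}\mathbf{A}\) is symmetric, which guarantees real eigenvalues.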
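As an alternative to the element-wise matrix norms, there is also the induced (or operator) norm. For \(\mathbf{x}\in\mathbb{R}^m\), the norm induced by the vector 2-norm is
\[\begin{split}\begin{align*} ||\mathbf{A}||_2 &= \max_{||\mathbf{x}|| = 1} ||\mathbf{A}\mathbf{x}||\\ &= \max_{||\mathbf{x}|| = 1} \sqrt{\mathbf{x}^\text{T}\mathbf{A}^\text{T}\mathbf{A}\mathbf{x}}\\ &= \max_{||\mathbf{x}|| = 1} \sqrt{\mathbf{x}^\text{T}\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^\text{T}\mathbf{x}}\\ &= \max_{||\mathbf{y}|| = 1} \sqrt{\mathbf{y}^\text{T}\boldsymbol{\Lambda}\mathbf{y}}\\ &= \max_{||\mathbf{y}|| = 1} \sqrt{\sum_{i=1}^m \lambda_i y_i^2}\\ &= \sqrt{\lambda_1} \end{align*}\end{split}\]
where we again use the eigen-decomposition and let \(\mathbf{y} = \mathbf{V}^\text{T}\mathbf{x}\). Because \(\mathbf{V}\) is orthogonal, \(||\mathbf{y}|| = ||\mathbf{x}|| = 1\). Under this constraint the maximum is attained at \(y_1 = 1\) and \(y_i = 0\) for \(i > 1\), which places all of the weight on the largest eigenvalue \(\lambda_1\).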
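The induced 2-norm can be checked against NumPy in the same way. This is a minimal sketch under the assumption that the norm is induced by the vector 2-norm; the function name `spectral_norm`, the seed and the matrix size are hypothetical choices. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so the largest one is the last entry rather than the first.

```python
import numpy as np


def spectral_norm(A):
    """Return sqrt(lambda_1) and the matching eigenvector v_1 of A^T A."""
    eigvals, V = np.linalg.eigh(A.T @ A)   # ascending eigenvalues
    lam1 = max(eigvals[-1], 0.0)           # guard against tiny negative rounding
    return float(np.sqrt(lam1)), V[:, -1]


rng = np.random.default_rng(3)             # fixed seed, chosen arbitrarily
A = rng.standard_normal((4, 3))            # n = 4, m = 3

norm2, v1 = spectral_norm(A)

# sqrt(lambda_1) matches NumPy's induced 2-norm.
assert np.isclose(norm2, np.linalg.norm(A, 2))

# The maximum is attained at x = v_1 (equivalently y_1 = 1, y_i = 0 otherwise).
assert np.isclose(np.linalg.norm(A @ v1), norm2)

# No random unit vector does better.
X = rng.standard_normal((3, 1000))
X /= np.linalg.norm(X, axis=0)             # normalise each column to unit length
assert np.all(np.linalg.norm(A @ X, axis=0) <= norm2 + 1e-12)
```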