Dimensionality reduction
- Each point can be expressed as: \(x = a_1u_1 + a_2u_2 + \dots + a_du_d\)
- The \(u_i\) are orthonormal basis vectors: \(u_i \cdot u_k = 0\) for \(i \neq k\) and \(u_i \cdot u_i = 1\)
- \(a = (a_1, a_2, \dots, a_d)^T\) is the coordinate vector of \(x\) in the new basis
- The \(u_i\) vectors can be collected as the columns of the matrix \(U\)
- \(x = Ua\)
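As a concrete illustration, a minimal numpy sketch (the rotation matrix and the point are assumed examples, not from the notes): since \(U\) is orthogonal, \(a = U^Tx\) gives the new coordinates and \(x = Ua\) reconstructs the point.

```python
import numpy as np

# Columns of U form an orthonormal basis (here: a 45-degree rotation).
U = np.array([[np.cos(np.pi/4), -np.sin(np.pi/4)],
              [np.sin(np.pi/4),  np.cos(np.pi/4)]])

x = np.array([3.0, 1.0])

# Coordinates of x in the new basis: a = U^T x (since U is orthogonal).
a = U.T @ x

# Reconstruction: x = U a
assert np.allclose(U @ a, x)
```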
PCA (Principal component analysis)
- Finds an \(r\)-dimensional basis that best captures the variance in the data
Best line approximation
- Assume the data is centered, so \(\mu = 0\)
- \(x'_i = \left(\frac{u^Tx_i}{u^Tu}\right)u = (u^Tx_i)u = a_iu\) is the projection of \(x_i\) onto the unit vector \(u\)
- \(a_i = u^Tx_i\) is the coordinate of \(x'_i\) along \(u\)
- since \(\mu = 0\), the mean coordinate along \(u\) is also 0
- the variance of the projected points along \(u\) is \(\sigma^2_u = u^T\Sigma u\)
- \(\Sigma\) is the covariance matrix for the centered data D
- a Lagrange multiplier is used to maximize the variance subject to the unit-norm constraint \(u^Tu = 1\)
- \(\max\limits_u J(u) = u^T \Sigma u - \alpha(u^Tu-1)\)
- Taking the derivative with respect to \(u\) and setting it to zero gives \(\Sigma u = \alpha u\), so \(u\) is an eigenvector of \(\Sigma\); the variance \(u^T\Sigma u = \alpha\) is maximized by the dominant eigenvector \(u_1\), with eigenvalue \(\lambda_1\)
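A small numpy sketch (the synthetic data is an assumption for illustration) confirming that the dominant eigenvector of \(\Sigma\) maximizes the projected variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X -= X.mean(axis=0)                      # center: mu = 0
Sigma = X.T @ X / len(X)                 # covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma) # eigenvalues in ascending order
u1 = eigvecs[:, -1]                      # dominant eigenvector

# Projected variance along u1 equals the largest eigenvalue lambda_1.
assert np.isclose(u1 @ Sigma @ u1, eigvals[-1])

# Any other unit vector gives no larger projected variance.
theta = rng.uniform(0, np.pi)
u = np.array([np.cos(theta), np.sin(theta)])
assert u @ Sigma @ u <= eigvals[-1] + 1e-12
```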
Minimum squared error
- \(MSE(u) = \frac{1}{n}\displaystyle\sum\limits^{n}_{i=1} \| \epsilon_i \|^2 = \frac{1}{n}\displaystyle\sum\limits^{n}_{i=1} \|x_i\|^2 - u^T\Sigma u = var(D) - u^T\Sigma u\)
- total variance of centered data is \(var(D) = \frac{1}{n}\displaystyle\sum\limits^n_{i=1} \|x_i\|^2\)
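A sketch (again with assumed synthetic data) verifying the identity \(MSE(u) = var(D) - u^T\Sigma u\) numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)

u = np.array([1.0, 0.0, 0.0])            # any unit vector

# Residuals of projecting each x_i onto the line spanned by u.
proj = np.outer(X @ u, u)                # x'_i = (u^T x_i) u
eps = X - proj
mse = (eps ** 2).sum(axis=1).mean()

var_D = (X ** 2).sum(axis=1).mean()      # total variance of centered data
assert np.isclose(mse, var_D - u @ Sigma @ u)
```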
Best 2-dimensional approximation
- \(\sigma^2_v = v^T \Sigma v\)
- \(\Sigma v = \alpha v\)
- the second principal component is \(v = u_2\)
- \(u\) and \(v\) are both eigenvectors of \(\Sigma\), with eigenvalues \(\lambda_1 \geq \lambda_2\)
Total Projected Variance
- \(var(A) = u_1^T\Sigma u_1 + u_2^T \Sigma u_2 = \lambda_1 + \lambda_2\)
Mean squared error
- \(MSE = var(D) - var(A)\)
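A sketch (assumed synthetic data) checking both \(var(A) = \lambda_1 + \lambda_2\) and \(MSE = var(D) - var(A)\) for the best 2-dimensional subspace:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)

eigvals, eigvecs = np.linalg.eigh(Sigma)
U2 = eigvecs[:, -2:]                      # top-2 eigenvectors u_1, u_2

A = X @ U2                                # projected coordinates
var_A = (A ** 2).sum(axis=1).mean()
assert np.isclose(var_A, eigvals[-1] + eigvals[-2])

var_D = (X ** 2).sum(axis=1).mean()
eps = X - A @ U2.T                        # reconstruction error
assert np.isclose((eps ** 2).sum(axis=1).mean(), var_D - var_A)
```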
Best r-dimensional approximation
Total projected variance
- \(U_r\) is the \(d \times r\) matrix whose columns are the top \(r\) eigenvectors \(u_1, \dots, u_r\)
- \(var(A) = \displaystyle\sum\limits^{r}_{i=1}\lambda_i\)
Mean squared error
- \(var(D) - \displaystyle\sum\limits^{r}_{i=1}\lambda_i\)
Total variance
- \(var(D) = \displaystyle\sum\limits^{d}_{i=1}\lambda_i\)
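A quick numerical check (assumed synthetic data) that the eigenvalues of \(\Sigma\) sum to the total variance:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)

eigvals = np.linalg.eigvalsh(Sigma)
var_D = (X ** 2).sum(axis=1).mean()
assert np.isclose(eigvals.sum(), var_D)   # var(D) = sum_i lambda_i
```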
Choosing the dimensionality
- \(f(r) = \frac{\sum^r_{i=1} \lambda_i}{var(D)}\) is the fraction of total variance captured by the first \(r\) components
- choose the smallest \(r\) for which \(f(r)\) reaches a given threshold, as in the sketch below
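A minimal sketch of this selection rule; the 0.9 threshold is an assumed example:

```python
import numpy as np

def choose_r(eigvals, threshold=0.9):
    """Smallest r whose top-r eigenvalues capture `threshold` of var(D)."""
    lam = np.sort(eigvals)[::-1]          # descending: lambda_1 >= lambda_2 ...
    f = np.cumsum(lam) / lam.sum()        # f(r) for r = 1..d
    return int(np.searchsorted(f, threshold) + 1)

# Usage: r = choose_r(np.linalg.eigvalsh(Sigma), threshold=0.9)
```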
Geometry of PCA
- Principal components are always orthogonal to one another
Kernel Principal Component Analysis
- Finds linear combinations in the high-dimensional feature space obtained from a non-linear transformation of the input dimensions
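A minimal kernel PCA sketch, assuming an RBF kernel with an illustrative parameter `gamma`: instead of \(\Sigma\), eigendecompose the centered kernel matrix, whose top eigenvectors give the coordinates along the kernel principal components.

```python
import numpy as np

def rbf_kernel_pca(X, gamma=1.0, r=2):
    """Kernel PCA with an RBF kernel (illustrative sketch)."""
    # Pairwise squared distances -> RBF kernel matrix K.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)

    # Center K in feature space: Kc = (I - 1/n) K (I - 1/n).
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J

    # Top-r eigenpairs of the centered kernel matrix.
    mu, V = np.linalg.eigh(Kc)
    mu, V = mu[::-1][:r], V[:, ::-1][:, :r]

    # Coordinates of each point along the kernel principal components.
    return V * np.sqrt(np.maximum(mu, 0))
```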
Singular Value Decomposition
- \(\Sigma = V \Delta V^T\) (for a symmetric matrix the SVD coincides with the eigendecomposition)
- \(X = U \Delta V^T\) is the general form of the SVD of a matrix \(X\)
- for the centered data matrix: \(D = L \Delta R^T\)
- \(L\) is orthogonal \(n \times n\), \(R\) is orthogonal \(d \times d\), and \(\Delta\) is the \(n \times d\) diagonal matrix of singular values
- the diagonal entries \(\delta_i\) of \(\Delta\) are the singular values of \(D\) (not its eigenvalues)
- the columns of \(R\) are the eigenvectors of \(D^TD\) (and hence of \(\Sigma\))
- the columns of \(L\) are the eigenvectors of \(DD^T\)
- \(Dr_i = \delta_i l_i\)
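A numpy check of these relations (random data assumed; the reduced SVD is used so shapes stay small):

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.normal(size=(6, 3))

# Reduced SVD: D = L Delta R^T with singular values delta_i.
L, delta, Rt = np.linalg.svd(D, full_matrices=False)
R = Rt.T

# Columns of R are eigenvectors of D^T D with eigenvalues delta_i^2.
assert np.allclose(D.T @ D @ R, R * delta**2)

# D r_i = delta_i l_i for each singular pair.
for i in range(len(delta)):
    assert np.allclose(D @ R[:, i], delta[i] * L[:, i])
```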
Connection between SVD and PCA
- \(n\lambda_i = \delta_i^2\)
- \(D^TD = n\Sigma\)
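A final check (assuming centered data, so \(\Sigma = \frac{1}{n}D^TD\)) that \(n\lambda_i = \delta_i^2\):

```python
import numpy as np

rng = np.random.default_rng(5)
D = rng.normal(size=(50, 4))
D -= D.mean(axis=0)                       # center, so Sigma = D^T D / n

Sigma = D.T @ D / len(D)
lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]   # descending eigenvalues
delta = np.linalg.svd(D, compute_uv=False)       # descending singular values

assert np.allclose(len(D) * lam, delta ** 2)     # n * lambda_i = delta_i^2
```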