Dimensionality reduction
- Each point can be expressed as: \(x = a_1u_1 + a_2u_2 + \dots + a_du_d\)
- The \(u_i\) are orthonormal basis vectors: \(u_i \cdot u_k = 0\) for \(i \neq k\) and \(u_i \cdot u_i = 1\)
- \(a = (a_1, a_2, \dots, a_d)^T\) is the coordinate vector of \(x\) in the new basis
- The \(u_i\) vectors can be collected as the columns of the matrix \(U\)
- \(x = Ua\)
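As a concrete illustration, a minimal numpy sketch (the rotation matrix and the point are assumed examples, not from the notes): since \(U\) is orthogonal, \(a = U^Tx\) gives the new coordinates and \(x = Ua\) reconstructs the point.

```python
import numpy as np

# Columns of U form an orthonormal basis (here: a 45-degree rotation).
U = np.array([[np.cos(np.pi/4), -np.sin(np.pi/4)],
              [np.sin(np.pi/4),  np.cos(np.pi/4)]])

x = np.array([3.0, 1.0])

# Coordinates of x in the new basis: a = U^T x (since U is orthogonal).
a = U.T @ x

# Reconstruction: x = U a
assert np.allclose(U @ a, x)
```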
PCA (Principal component analysis)
- Finds an \(r\)-dimensional basis that best captures the variance in the data
Best line approximation
- Assume the data is centered, so \(\mu = 0\)
- \(x'_i = \left(\frac{u^Tx_i}{u^Tu}\right)u = (u^Tx_i)u = a_iu\) is the projection of \(x_i\) onto the unit vector \(u\)
- \(a_i = u^Tx_i\) is the coordinate of \(x'_i\) along \(u\)
- since \(\mu = 0\), the mean coordinate along \(u\) is also 0
- the variance of the projected points along \(u\) is \(\sigma^2_u = u^T\Sigma u\)
- \(\Sigma\) is the covariance matrix for the centered data D
- a Lagrange multiplier is used to maximize the variance subject to the unit-norm constraint \(u^Tu = 1\)
- \(\max\limits_u J(u) = u^T \Sigma u - \alpha(u^Tu-1)\)
- Taking the derivative with respect to \(u\) and setting it to zero gives \(\Sigma u = \alpha u\), so \(u\) is an eigenvector of \(\Sigma\); the variance \(u^T\Sigma u = \alpha\) is maximized by the dominant eigenvector \(u_1\), with eigenvalue \(\lambda_1\)
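A small numpy sketch (the synthetic data is an assumption for illustration) confirming that the dominant eigenvector of \(\Sigma\) maximizes the projected variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X -= X.mean(axis=0)                      # center: mu = 0
Sigma = X.T @ X / len(X)                 # covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma) # eigenvalues in ascending order
u1 = eigvecs[:, -1]                      # dominant eigenvector

# Projected variance along u1 equals the largest eigenvalue lambda_1.
assert np.isclose(u1 @ Sigma @ u1, eigvals[-1])

# Any other unit vector gives no larger projected variance.
theta = rng.uniform(0, np.pi)
u = np.array([np.cos(theta), np.sin(theta)])
assert u @ Sigma @ u <= eigvals[-1] + 1e-12
```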
Minimum squared error
- \(MSE(u) = \frac{1}{n}\displaystyle\sum\limits^{n}_{i=1} \| \epsilon_i \|^2 = \frac{1}{n}\displaystyle\sum\limits^{n}_{i=1} \|x_i\|^2 - u^T\Sigma u = var(D) - u^T\Sigma u\)
- total variance of centered data is \(var(D) = \frac{1}{n}\displaystyle\sum\limits^n_{i=1} \|x_i\|^2\)
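A sketch (again with assumed synthetic data) verifying the identity \(MSE(u) = var(D) - u^T\Sigma u\) numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)

u = np.array([1.0, 0.0, 0.0])            # any unit vector

# Residuals of projecting each x_i onto the line spanned by u.
proj = np.outer(X @ u, u)                # x'_i = (u^T x_i) u
eps = X - proj
mse = (eps ** 2).sum(axis=1).mean()

var_D = (X ** 2).sum(axis=1).mean()      # total variance of centered data
assert np.isclose(mse, var_D - u @ Sigma @ u)
```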
Best 2-dimensional approximation
- \(\sigma^2_v = v^T \Sigma v\)
- \(\Sigma v = \alpha v\)
- the second principal component is \(v = u_2\)
- \(u\) and \(v\) are both eigenvectors of \(\Sigma\), with eigenvalues \(\lambda_1 \geq \lambda_2\)
Total Projected Variance
- \(var(A) = u_1^T\Sigma u_1 + u_2^T \Sigma u_2 = \lambda_1 + \lambda_2\)
Mean squared error
- \(MSE = var(D) - var(A)\)
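A sketch (assumed synthetic data) checking both \(var(A) = \lambda_1 + \lambda_2\) and \(MSE = var(D) - var(A)\) for the best 2-dimensional subspace:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 4))
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)

eigvals, eigvecs = np.linalg.eigh(Sigma)
U2 = eigvecs[:, -2:]                      # top-2 eigenvectors u_1, u_2

A = X @ U2                                # projected coordinates
var_A = (A ** 2).sum(axis=1).mean()
assert np.isclose(var_A, eigvals[-1] + eigvals[-2])

var_D = (X ** 2).sum(axis=1).mean()
eps = X - A @ U2.T                        # reconstruction error
assert np.isclose((eps ** 2).sum(axis=1).mean(), var_D - var_A)
```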
Best r-dimensional approximation
Total projected variance
- \(U_r\) is the \(d \times r\) matrix whose columns are the top \(r\) eigenvectors \(u_1, \dots, u_r\)
- \(var(A) = \displaystyle\sum\limits^{r}_{i=1}\lambda_i\)
Mean squared error
- \(var(D) - \displaystyle\sum\limits^{r}_{i=1}\lambda_i\)
Total variance
- \(var(D) = \displaystyle\sum\limits^{d}_{i=1}\lambda_i\)
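A quick numerical check (assumed synthetic data) that the eigenvalues of \(\Sigma\) sum to the total variance:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)

eigvals = np.linalg.eigvalsh(Sigma)
var_D = (X ** 2).sum(axis=1).mean()
assert np.isclose(eigvals.sum(), var_D)   # var(D) = sum_i lambda_i
```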
Choosing the dimensionality
- \(f(r) = \frac{\sum^r_{i=1} \lambda_i}{var(D)}\) is the fraction of total variance captured by the first \(r\) components
- choose the smallest \(r\) for which \(f(r)\) reaches a given threshold, as in the sketch below
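A minimal sketch of this selection rule; the 0.9 threshold is an assumed example:

```python
import numpy as np

def choose_r(eigvals, threshold=0.9):
    """Smallest r whose top-r eigenvalues capture `threshold` of var(D)."""
    lam = np.sort(eigvals)[::-1]          # descending: lambda_1 >= lambda_2 ...
    f = np.cumsum(lam) / lam.sum()        # f(r) for r = 1..d
    return int(np.searchsorted(f, threshold) + 1)

# Usage: r = choose_r(np.linalg.eigvalsh(Sigma), threshold=0.9)
```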
Geometry of PCA
- Principal components are always orthogonal to one another
Kernel Principal Component Analysis
- Finds linear combinations in the high-dimensional feature space obtained from a non-linear transformation of the input dimensions
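A minimal kernel PCA sketch, assuming an RBF kernel with an illustrative parameter `gamma`: instead of \(\Sigma\), eigendecompose the centered kernel matrix, whose top eigenvectors give the coordinates along the kernel principal components.

```python
import numpy as np

def rbf_kernel_pca(X, gamma=1.0, r=2):
    """Kernel PCA with an RBF kernel (illustrative sketch)."""
    # Pairwise squared distances -> RBF kernel matrix K.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)

    # Center K in feature space: Kc = (I - 1/n) K (I - 1/n).
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J

    # Top-r eigenpairs of the centered kernel matrix.
    mu, V = np.linalg.eigh(Kc)
    mu, V = mu[::-1][:r], V[:, ::-1][:, :r]

    # Coordinates of each point along the kernel principal components.
    return V * np.sqrt(np.maximum(mu, 0))
```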
Singular Value Decomposition
- \(\Sigma = V \Delta V^T\) (for a symmetric matrix the SVD coincides with the eigendecomposition)
- \(X = U \Delta V^T\) is the general form of the SVD of a matrix \(X\)
- for the centered data matrix: \(D = L \Delta R^T\)
- \(L\) is orthogonal \(n \times n\), \(R\) is orthogonal \(d \times d\), and \(\Delta\) is the \(n \times d\) diagonal matrix of singular values
- the diagonal entries \(\delta_i\) of \(\Delta\) are the singular values of \(D\) (not its eigenvalues)
- the columns of \(R\) are the eigenvectors of \(D^TD\) (and hence of \(\Sigma\))
- the columns of \(L\) are the eigenvectors of \(DD^T\)
- \(Dr_i = \delta_i l_i\)
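A numpy check of these relations (random data assumed; the reduced SVD is used so shapes stay small):

```python
import numpy as np

rng = np.random.default_rng(4)
D = rng.normal(size=(6, 3))

# Reduced SVD: D = L Delta R^T with singular values delta_i.
L, delta, Rt = np.linalg.svd(D, full_matrices=False)
R = Rt.T

# Columns of R are eigenvectors of D^T D with eigenvalues delta_i^2.
assert np.allclose(D.T @ D @ R, R * delta**2)

# D r_i = delta_i l_i for each singular pair.
for i in range(len(delta)):
    assert np.allclose(D @ R[:, i], delta[i] * L[:, i])
```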
Connection between SVD and PCA
- \(n\lambda_i = \delta_i^2\)
- \(D^TD = n\Sigma\)
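A final check (assuming centered data, so \(\Sigma = \frac{1}{n}D^TD\)) that \(n\lambda_i = \delta_i^2\):

```python
import numpy as np

rng = np.random.default_rng(5)
D = rng.normal(size=(50, 4))
D -= D.mean(axis=0)                       # center, so Sigma = D^T D / n

Sigma = D.T @ D / len(D)
lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]   # descending eigenvalues
delta = np.linalg.svd(D, compute_uv=False)       # descending singular values

assert np.allclose(len(D) * lam, delta ** 2)     # n * lambda_i = delta_i^2
```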