Dimensionality reduction
- Analysis in high-dimensional spaces is hard
Feature selection
- Select a few of the existing dimensions
- Feature == dimension
- Alternative (feature extraction): create a new feature from several other features
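
A minimal sketch of the two ideas above, assuming NumPy is available. The toy data, the column meanings (height, weight, a constant column), the helper name `select_top_variance`, and the choice of variance as the selection criterion are all hypothetical and only for illustration; the notes do not prescribe a particular criterion.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 3 features (height_cm, weight_kg, constant).
X = np.array([
    [170.0, 65.0, 1.0],
    [180.0, 80.0, 1.0],
    [160.0, 50.0, 1.0],
    [175.0, 72.0, 1.0],
    [165.0, 58.0, 1.0],
])

def select_top_variance(X, k):
    """Feature selection: keep the k original columns with the largest variance."""
    variances = X.var(axis=0)
    # Indices of the k highest-variance columns, returned in original column order.
    keep = np.sort(np.argsort(variances)[::-1][:k])
    return X[:, keep], keep

# Selecting: the constant column carries no information and is dropped.
X_selected, kept = select_top_variance(X, k=2)

# Creating a new feature from several others: body-mass index
# computed from the height and weight columns.
height_m = X[:, 0] / 100.0
bmi = X[:, 1] / height_m ** 2
```

Selection keeps a subset of the original columns unchanged, while the second step replaces two columns with a single derived one. PCA, described next, is of the second kind, restricted to linear combinations.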
Principal component analysis (PCA)
- Remove linearly dependent dimensions from data
- Gives the intrinsic dimensionality of the data
- Each principal component is a linear combination of the original dimensions
- Maximize the variance of the projected data
- Minimize the sum of squared errors (SSE)
- The mean point is the point that minimizes the sum of squared errors
- The first principal component is the line through the mean that maximizes variance and minimizes the sum of squared errors
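
The points above can be checked numerically. The following is a rough sketch, assuming NumPy and computing PCA through the singular value decomposition of the mean-centered data (one common way to do it, not necessarily the one intended in these notes). The synthetic data, the helper names `pca` and `sse_to_point`, and the tolerance used to count non-zero variances are assumptions made for illustration.

```python
import numpy as np

def pca(X):
    """Minimal PCA sketch: SVD of the mean-centered data."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S ** 2 / len(X)  # variance of the data along each component
    return mean, Vt, variances   # rows of Vt are the principal components

# Synthetic data: the third column is a linear combination of the first two,
# so the data is 3-dimensional on paper but intrinsically 2-dimensional.
rng = np.random.default_rng(0)
a = rng.normal(scale=3.0, size=200)
b = rng.normal(scale=1.0, size=200)
X = np.column_stack([a, b, a + b])

mean, components, variances = pca(X)

# Intrinsic dimensionality: number of components with non-negligible variance.
intrinsic_dim = int(np.sum(variances > 1e-10 * variances[0]))

def sse_to_point(X, p):
    """Sum of squared distances from every sample to the point p."""
    return float(np.sum((X - p) ** 2))

# The mean minimizes the SSE: any other point gives a larger value.
sse_mean = sse_to_point(X, mean)
sse_other = sse_to_point(X, mean + np.array([0.5, -0.2, 0.1]))

# First principal component: a unit vector w, i.e. a linear combination
# of the original dimensions.
w = components[0]
Xc = X - mean
proj = Xc @ w                       # coordinates along the line through the mean
var_along_w = float(proj.var())     # variance captured by the line
residual = Xc - np.outer(proj, w)   # what the line fails to reconstruct
sse_line = float(np.sum(residual ** 2))
total = float(np.sum(Xc ** 2))      # total squared deviation from the mean

# Any other unit direction captures no more variance than w.
d = rng.normal(size=3)
d /= np.linalg.norm(d)
var_along_d = float((Xc @ d).var())
```

Because `total` equals `len(X) * var_along_w + sse_line` and `total` is fixed by the data, choosing the line that maximizes the projected variance is the same as choosing the line that minimizes the sum of squared errors.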