Linear discriminant analysis (LDA)
optimal linear discriminant
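- LDA maximizes the separation between the projected class means while keeping the scatter of the projected points within each class small
- \(s^2_i = \sum\limits_{x_j \in D_i}(a_j-m_i)^2 = n_i\sigma^2_i\) is the scatter of class \(i\), where \(a_j = w^Tx_j\) is the projection of \(x_j\) onto \(w\) and \(m_i = w^T\mu_i\) is the projected class mean
- \(n_i = |D_i|\) is the size of class \(i\)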
- \(\max\limits_w J(w) = \frac{(m_1-m_2)^2}{s^2_1+s^2_2}\)
- the goal of LDA is to find the vector \(w\) that maximizes \(J(w)\). A large \(J(w)\) requires a large separation between the projected means \(m_1\) and \(m_2\) and a small total scatter \(s^2_1+s^2_2\) of the two classes
- \((m_1-m_2)^2 = w^TBw\)
- \(B = (\mu_1-\mu_2)(\mu_1-\mu_2)^T\) is a \(d \times d\) rank-one matrix (the between-class scatter matrix)
- \(s^2_1 = w^TS_1w\)
- \(s^2_2 = w^TS_2w\)
- \(S_i = n_i\Sigma_i\)
- \(\max\limits_w J(w) = \frac{w^TBw}{w^TSw}\), where \(S = S_1 + S_2\) is the within-class scatter matrix
- setting the derivative of \(J(w)\) with respect to \(w\) to zero gives \(Bw = \lambda Sw\), with \(\lambda = J(w)\)
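- if \(S\) is nonsingular, \(S^{-1}Bw = \lambda w\), so the best \(w\) is the eigenvector of \(S^{-1}B\) with the largest eigenvalue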
- computing S takes \(O(nd^2)\) time
- computing the dominant eigenvalue-eigenvector pair takes \(O(d^3)\) time
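
The steps above can be sketched in NumPy. This is only a minimal illustration, assuming two classes given as arrays `D1` and `D2` with one point per row and a nonsingular \(S\); the function names `lda_direction` and `fisher_objective` are hypothetical and not taken from any library.

```python
import numpy as np

def lda_direction(D1, D2):
    """Return (w, lam): the unit vector maximizing J(w) and the matching eigenvalue.

    Assumes D1 and D2 have shape (n_i, d) and that S = S_1 + S_2 is nonsingular.
    """
    mu1, mu2 = D1.mean(axis=0), D2.mean(axis=0)
    diff = (mu1 - mu2).reshape(-1, 1)
    B = diff @ diff.T                    # between-class scatter, d x d, rank one
    Z1, Z2 = D1 - mu1, D2 - mu2          # centered classes
    S = Z1.T @ Z1 + Z2.T @ Z2            # S_1 + S_2 with S_i = n_i * Sigma_i, O(n d^2)
    # Eigen-decomposition of S^{-1} B, O(d^3); solve() avoids forming the inverse.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S, B))
    k = np.argmax(eigvals.real)          # dominant eigenvalue
    w = eigvecs[:, k].real
    return w / np.linalg.norm(w), eigvals[k].real

def fisher_objective(w, D1, D2):
    """Evaluate J(w) = (m_1 - m_2)^2 / (s_1^2 + s_2^2) on the projected points."""
    a1, a2 = D1 @ w, D2 @ w              # projections a_j = w^T x_j
    m1, m2 = a1.mean(), a2.mean()        # projected means
    s1 = ((a1 - m1) ** 2).sum()          # scatter of class 1
    s2 = ((a2 - m2) ** 2).sum()          # scatter of class 2
    return (m1 - m2) ** 2 / (s1 + s2)
```

Because \(B\) has rank one, \(Bw\) always points along \(\mu_1-\mu_2\), so the solution is proportional to \(S^{-1}(\mu_1-\mu_2)\). In the two-class case a single linear solve would therefore give the same direction without an eigen-decomposition.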