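The goal of LDA is to find the vector \(w\) that maximizes the Fisher criterion \(J(w) = \frac{(m_1-m_2)^2}{s^2_1+s^2_2}\), where \(m_1 = w^T\mu_1\) and \(m_2 = w^T\mu_2\) are the class means projected onto \(w\). Maximizing \(J(w)\) maximizes the separation between \(m_1\) and \(m_2\) while minimizing the total scatter \(s^2_1+s^2_2\) of the two projected classes.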
\((m_1-m_2)^2 = w^TBw\)
where \(B = (\mu_1-\mu_2)(\mu_1-\mu_2)^T\) is the \(d \times d\), rank-one between-class scatter matrix.
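A quick numerical check of the numerator identity, offered as a sketch: it builds \(B\) as an outer product and confirms that \((m_1-m_2)^2 = w^TBw\) and that \(B\) has rank one. The toy Gaussian data and the direction `w` are hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical toy data: two classes of 2-D points (not from the notes)
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(40, 2))

mu1 = X1.mean(axis=0)
mu2 = X2.mean(axis=0)

# Between-class scatter: outer product of the mean difference (d x d)
diff = (mu1 - mu2).reshape(-1, 1)
B = diff @ diff.T

# An arbitrary unit direction w (hypothetical)
w = np.array([1.0, 0.5])
w /= np.linalg.norm(w)

# Projected class means m_i = w^T mu_i
m1 = w @ mu1
m2 = w @ mu2

print(np.isclose((m1 - m2) ** 2, w @ B @ w))  # True
print(np.linalg.matrix_rank(B))                # 1
```

The identity holds for any \(w\), since \((m_1-m_2)^2 = (w^T(\mu_1-\mu_2))^2\).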
\(s^2_1 = w^TS_1w\)
\(s^2_2 = w^TS_2w\)
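\(S_i = n_i\Sigma_i\) is the scatter matrix of class \(i\), which has \(n_i\) points and covariance \(\Sigma_i\); \(S = S_1 + S_2\) is the within-class scatter matrix, so \(s^2_1 + s^2_2 = w^TSw\).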
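The denominator identity can be checked the same way. This sketch assumes \(\Sigma_i\) is the covariance normalized by \(n_i\) (not \(n_i - 1\)), which is what makes \(S_i = n_i\Sigma_i\) equal the sum of centered outer products; the data and `w` are hypothetical.

```python
import numpy as np

# Hypothetical toy data for one class (not from the notes)
rng = np.random.default_rng(1)
X1 = rng.normal(size=(30, 3))
n1 = X1.shape[0]
mu1 = X1.mean(axis=0)

# Sigma_1 normalized by n_1 (bias=True), so S_1 = n_1 * Sigma_1
Sigma1 = np.cov(X1, rowvar=False, bias=True)
S1 = n1 * Sigma1

# The same matrix written as a sum of centered outer products
Z1 = X1 - mu1
assert np.allclose(S1, Z1.T @ Z1)

w = np.array([0.2, -1.0, 0.7])  # arbitrary direction (hypothetical)
m1 = w @ mu1

# Scatter of the projected points around the projected mean
s1_sq = np.sum((X1 @ w - m1) ** 2)
print(np.isclose(s1_sq, w @ S1 @ w))  # True
```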
\(\max\limits_w J(w) = \frac{w^TBw}{w^TSw}\)
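A small sketch of the criterion in matrix form. The helper name `fisher_criterion` and the toy data are hypothetical; the check shows that \(J\) is unchanged when \(w\) is rescaled, so only the direction of \(w\) matters.

```python
import numpy as np

def fisher_criterion(w, B, S):
    """J(w) = (w^T B w) / (w^T S w), the ratio maximized by LDA."""
    return (w @ B @ w) / (w @ S @ w)

# Hypothetical two-class toy data (not from the notes)
rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
X2 = rng.normal(loc=[2.0, 2.0], size=(50, 2))
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

diff = mu1 - mu2
B = np.outer(diff, diff)            # between-class scatter
Z1, Z2 = X1 - mu1, X2 - mu2
S = Z1.T @ Z1 + Z2.T @ Z2           # within-class scatter S_1 + S_2

w = np.array([1.0, 1.0])
# Numerator and denominator both scale by c^2, so J(c w) = J(w)
print(np.isclose(fisher_criterion(w, B, S), fisher_criterion(5.0 * w, B, S)))  # True
```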
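Setting the gradient of \(J(w)\) to zero gives the generalized eigenvalue problem \(Bw = \lambda Sw\), where \(\lambda = J(w)\).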
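If \(S\) is nonsingular, this becomes the ordinary eigenvalue problem \(S^{-1}Bw = \lambda w\), so the optimal \(w\) is the eigenvector of \(S^{-1}B\) with the largest eigenvalue.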
Computing \(S\) takes \(O(nd^2)\) time, where \(n = n_1 + n_2\).
Computing the dominant eigenvalue-eigenvector pair of \(S^{-1}B\) takes \(O(d^3)\) time, for a total cost of \(O(d^3 + nd^2)\).
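A minimal NumPy sketch of the full two-class procedure, with comments marking where the \(O(nd^2)\) and \(O(d^3)\) terms arise. The function name `lda_direction` and the toy data are hypothetical; the sketch forms \(S^{-1}B\) with `np.linalg.solve` rather than an explicit inverse, and takes the eigenvector from `np.linalg.eig` with the largest real eigenvalue.

```python
import numpy as np

def lda_direction(X1, X2):
    """Two-class LDA: unit eigenvector of S^{-1} B with the largest eigenvalue."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

    # Between-class scatter B (d x d, rank one)
    diff = mu1 - mu2
    B = np.outer(diff, diff)

    # Within-class scatter S = S_1 + S_2; the centered products cost O(n d^2)
    Z1, Z2 = X1 - mu1, X2 - mu2
    S = Z1.T @ Z1 + Z2.T @ Z2

    # Form S^{-1} B by solving S M = B (no explicit inverse); O(d^3)
    M = np.linalg.solve(S, B)

    # Dominant eigenpair of S^{-1} B; a dense eigensolver is also O(d^3)
    eigvals, eigvecs = np.linalg.eig(M)
    k = np.argmax(eigvals.real)
    w = eigvecs[:, k].real
    return w / np.linalg.norm(w), eigvals[k].real

# Usage on hypothetical toy data
rng = np.random.default_rng(3)
X1 = rng.normal(loc=[0.0, 0.0, 0.0], size=(60, 3))
X2 = rng.normal(loc=[2.0, 1.0, 0.0], size=(80, 3))
w, lam = lda_direction(X1, X2)
```

Because \(B\) has rank one, \(S^{-1}B\) has a single nonzero eigenvalue, and its eigenvector is proportional to \(S^{-1}(\mu_1-\mu_2)\); one linear solve therefore gives the same direction, although it is still \(O(d^3)\). The returned eigenvalue equals \(J(w)\) at the optimum.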