Support vector machines (SVMs)
- Linear classifier
- hyperplane that separates the two classes (can be generalized to more than 2 classes)
- LDA is a linear classifier as well
Objective function
Find the optimal hyperplane that separates two classes
- optimal: maximum margin of separation between two classes
Hyperplane
- set of all points that satisfy \(w^Tx + b = 0\)
- \(h(\vec x) \equiv w^Tx + b\)
- \(x, w \in \mathbb{R}^d\)
- \(b \in \mathbb R\)
- a (d-1)-dimensional hyperplane divides a d-dimensional space into 2 half-spaces
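A quick numeric illustration of the half-spaces (a sketch in NumPy; the particular \(w\), \(b\), and test points are made up):

```python
import numpy as np

# Hyperplane h(x) = w^T x + b = 0 in d = 2 dimensions:
# a (d-1)-dimensional set (a line) splitting the plane into 2 half-spaces
w = np.array([1.0, -1.0])
b = 0.5

def h(x):
    """Signed value of the hyperplane function at x."""
    return w @ x + b

print(h(np.array([2.0, 0.0])))   # > 0: one half-space
print(h(np.array([0.0, 2.0])))   # < 0: the other half-space
print(h(np.array([0.0, 0.5])))   # = 0: on the hyperplane itself
```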
Example
- there exist cases where the two classes are perfectly separated
- this implies infinitely many separating hyperplanes (solutions)
- the margin is the smallest distance from the dividing hyperplane to any point
- SVM picks the hyperplane that maximizes the margin
Summary
- determine w
- determine the separation (margin)
Proofs on properties of hyperplane
- \(h(x_1) = w^Tx_1 + b = 0\)
- \(h(x_2) = w^Tx_2 + b = 0\)
- \(w^T(x_1-x_2) = 0\)
- \(w\) is orthogonal to every vector lying in the hyperplane \(h(x) = 0\)
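A small numeric check of this orthogonality (a sketch; the two on-plane points are arbitrary choices):

```python
import numpy as np

w = np.array([1.0, -1.0])
b = 0.5

# Two points satisfying h(x) = w^T x + b = 0
x1 = np.array([0.0, 0.5])    # h(x1) = 0
x2 = np.array([3.0, 3.5])    # h(x2) = 0

# w^T (x1 - x2) = 0: w is orthogonal to any direction within the hyperplane
print(w @ (x1 - x2))  # 0.0
```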
Margin
- decompose \(\vec x = \vec x_p + r \cdot \frac{\vec w}{\|\vec w\|}\), where \(\vec x_p\) is the projection of \(\vec x\) onto the hyperplane
- \(h(\vec x) = h(\vec x_p + r \cdot \frac{\vec w}{\|\vec w\|}) = h(\vec x_p) + r \frac{w^T w}{\|\vec w\|} = r\|\vec w\|\), since \(h(\vec x_p) = 0\)
- \(r = \frac{h(x)}{\|w\|} = \frac{w^Tx + b}{\|w\|}\): the signed distance from any point to the hyperplane (see the sketch after this list)
- \(r_0 = \frac{b}{\|w\|}\) the distance from origin to hyperplane (offset)
- SVM maximizes this distance to the closest training point
- scale \(w\) and \(b\) so that the smallest value of \(y_i h(x_i)\) equals 1
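The distance formula as code (a sketch; \(w\) and \(b\) are again arbitrary):

```python
import numpy as np

w = np.array([1.0, -1.0])
b = 0.5

def signed_distance(x):
    """r = (w^T x + b) / ||w||: signed distance from x to the hyperplane."""
    return (w @ x + b) / np.linalg.norm(w)

print(signed_distance(np.array([2.0, 0.0])))  # positive side of the plane
print(signed_distance(np.zeros(2)))           # r0 = b/||w||, the origin offset
```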
Objective function of SVM
- \(\text{margin} = \min_i \frac{y_i h(x_i)}{\|w\|} \xrightarrow{\text{scale}} \frac{1}{\|w\|}\)
- \(s \cdot h(x) = s w^Tx + s b = s \cdot 0 = 0\): rescaling \(w\) and \(b\) by \(s > 0\) leaves the hyperplane unchanged
- after scaling, the closest point to the dividing hyperplane satisfies \(y_i h(x_i) = 1\) (functional margin of 1)
- constraints: \(y_i(w^Tx_i + b) \ge 1\) for all \(i\)
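Putting the scaled margin and the constraints together gives the standard hard-margin primal problem (standard form, consistent with the notation above):

\[
\begin{aligned}
\min_{w,\,b} \quad & \tfrac{1}{2}\|w\|^2 \\
\text{s.t.} \quad & y_i(w^Tx_i + b) \ge 1, \quad i = 1, \dots, n
\end{aligned}
\]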
Solve the objective function
- introduce a Lagrange multiplier \(\alpha_i\) for each constraint
- \(y_i(w^Tx_i+b)-1 \ge 0\)
- \(w\) is just a linear combination of the support vectors: \(w = \sum_i \alpha_i y_i x_i\), with \(\alpha_i > 0\) only for the support vectors
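A sketch of the step behind that claim (the standard hard-margin Lagrangian, filled in here rather than taken verbatim from the notes):

\[
L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w^Tx_i + b) - 1 \right]
\]
\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0
\]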
MIT lecture
- decision boundaries (hyperplane)
- Widest street (largest margin)
- project the unknown vector \(\vec u\) onto \(\vec w\), which is orthogonal to the street
- decision rule: \(\vec w \cdot \vec u \ge c\)
- equivalently, with \(b = -c\): if \(\vec w \cdot \vec u + b \ge 0\), then it is a positive sample
- \(\vec w \cdot \vec x_+ + b \ge 1\) for positive samples
- \(\vec w \cdot \vec x_- + b \le -1\) for negative samples
- \(y_i = +1\) for \(+\) samples
- \(y_i = -1\) for \(-\) samples
- combining both constraints: \(y_i(\vec w \cdot \vec x_i + b) - 1 \ge 0\), with equality for points on the gutter (the support vectors)
- WIDTH \(= (\vec x_+ - \vec x_-) \cdot \frac{\vec w}{\|\vec w\|} = \frac{2}{\|\vec w\|}\), using \(\vec w \cdot \vec x_+ = 1 - b\) and \(\vec w \cdot \vec x_- = -1 - b\)
- maximize \(\frac{2}{\|w\|}\), equivalently maximize \(\frac{1}{\|w\|}\), equivalently minimize \(\frac{1}{2}\|w\|^2\) (the squared form is mathematically convenient)
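A minimal end-to-end check (a sketch using scikit-learn's SVC with a linear kernel and a large C to approximate the hard margin; the toy data are made up):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (made up for illustration)
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # + class
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.5]])  # - class
y = np.array([+1, +1, +1, -1, -1, -1])

# Large C approximates the hard-margin SVM (essentially no slack allowed)
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Margin width from the notes: WIDTH = 2 / ||w||
print("w =", w, "b =", b)
print("width = 2/||w|| =", 2 / np.linalg.norm(w))

# Support vectors should sit on the gutters: y_i (w.x_i + b) = 1
for x_i, y_i in zip(clf.support_vectors_, y[clf.support_]):
    print("y*(w.x+b) =", y_i * (w @ x_i + b))  # ~1 for each support vector
```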