lec13

Support vector machines (SVMs)

  • Linear classifier
  • hyperplane that separates the two classes (can be generalized to more than 2 classes)
  • LDA is a linear classifier as well

Objective function

Find the optimal hyperplane that separates two classes

  • optimal: maximum margin of separation between two classes

Hyperplane

  • set of all points that satisfy \(w^Tx + b = 0\)
  • \(h(x) \equiv w^Tx + b\); the hyperplane is the set where \(h(x) = 0\)
  • \(x, w \in \mathbb{R}^d\)
  • \(b \in \mathbb R\)
  • a \((d-1)\)-dimensional hyperplane divides the \(d\)-dimensional space into 2 half spaces (numeric sketch below)
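
A minimal numeric sketch of the half-space idea (the values of \(w\), \(b\), and the test points are made up for illustration):

    import numpy as np

    w = np.array([2.0, -1.0])   # normal vector of the hyperplane (made-up values)
    b = 0.5                     # offset (made-up value)

    def h(x):
        # h(x) = w^T x + b; the hyperplane is the set of points where h(x) == 0
        return w @ x + b

    # the sign of h(x) says which half-space x falls in
    print(h(np.array([1.0, 1.0])))    # 1.5  -> positive half-space
    print(h(np.array([-1.0, 1.0])))   # -2.5 -> negative half-space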

example

  • Suppose the two classes can be perfectly separated by a hyperplane
  • Then infinitely many separating hyperplanes exist, so a criterion is needed to pick one (toy check below)
  • the margin is the smallest distance from the dividing hyperplane to any point
  • SVM picks the hyperplane that maximizes this margin
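
A toy check that separable data admits many separating hyperplanes (the data and both \((w, b)\) candidates are invented):

    import numpy as np

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
    y = np.array([1, 1, -1, -1])

    # two different hyperplanes; both classify every point correctly,
    # so "separates the data" alone does not pin down a unique solution
    for w, b in [(np.array([1.0, 1.0]), 0.0), (np.array([1.0, 0.5]), -0.2)]:
        print(np.all(y * (X @ w + b) > 0))   # True for both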

Summary

  • determine w
  • determine the separation (margin \(\rho\))

Proofs on properties of hyperplane

  • \(h(x_1) = w^Tx_1 + b = 0\)
  • \(h(x_2) = w^Tx_2 + b = 0\)
  • \(w^T(x_1-x_2) = 0\)
  • \(w\) is orthogonal to every vector lying in the hyperplane \(h(x) = 0\) (numeric check below)
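
A quick numeric check of this claim (the hyperplane and the construction of the two on-plane points are illustrative):

    import numpy as np

    w = np.array([2.0, -1.0])   # made-up hyperplane: 2*x1 - x2 + 0.5 = 0
    b = 0.5

    def on_plane(x1):
        # solve w^T x + b = 0 for the second coordinate
        return np.array([x1, 2.0 * x1 + 0.5])

    p1, p2 = on_plane(0.0), on_plane(3.0)
    print(w @ (p1 - p2))   # 0.0 -> w is orthogonal to any vector in the plane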

Margin

  • write \(\vec x = \vec x_p + \vec r\), where \(\vec x_p\) is the projection of \(\vec x\) onto the hyperplane
  • \(h(\vec x) = h(\vec x_p + r \cdot \frac{\vec w}{\|w\|}) = w^T \vec x_p + b + r\,\frac{w^T w}{\|w\|} = r\,\|w\|\), since \(h(\vec x_p) = 0\) and \(w^T w = \|w\|^2\)
  • \(r = \frac{h(x)}{\|w\|} = \frac{w^Tx + b}{\|w\|}\): signed distance from any point to the hyperplane
  • \(r_0 = \frac{b}{\|w\|}\): the signed distance from the origin to the hyperplane (offset)
  • SVM maximizes the margin, i.e. the distance \(r\) of the closest point to the hyperplane (numeric sketch below)
  • scale \(w, b\) so that the smallest distance corresponds to \(y_i h(x_i) = 1\)
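
A sketch of the signed-distance formula \(r = \frac{h(x)}{\|w\|}\) and the resulting margin; all numbers are illustrative:

    import numpy as np

    w = np.array([1.0, 1.0])
    b = -1.0
    X = np.array([[2.0, 2.0], [0.0, 0.0], [3.0, 1.0]])
    y = np.array([1, -1, 1])

    r = (X @ w + b) / np.linalg.norm(w)   # signed distance of each point
    print(r)                              # [ 2.12..., -0.71...,  2.12...]
    print(np.min(y * r))                  # margin = smallest y_i * r_i = 0.71...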

Objective function of SVM

  • margin: \(\rho = \min_i \frac{y_i\, h(x_i)}{\|w\|} \xrightarrow{\text{scale}} \frac{1}{\|w\|}\)
  • scaling leaves the hyperplane unchanged: \(s\,h(x) = s\,w^Tx + s\,b = s \cdot 0 = 0\) for any \(s > 0\)
  • after scaling, the closest point satisfies \(y_i h(x_i) = 1\) (its distance to the hyperplane is \(\frac{1}{\|w\|}\))
  • constraints: every point must satisfy \(y_i(w^Tx_i + b) \ge 1\) (primal problem written out below)
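
Putting the pieces together, the hard-margin primal problem this section builds toward (standard form):

\[ \min_{w,\,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 \;\; \forall i \]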

Solve the objective function

  • introduce a Lagrange multiplier \(\alpha_i \ge 0\) for each constraint
  • \(y_i(w^Tx_i+b)-1 \ge 0\)
  • at the optimum, \(w\) is just a linear combination of the support vectors (derivation below)
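
The standard Lagrangian steps behind these bullets (hard-margin case):

\[ L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i(w^T x_i + b) - 1 \right], \qquad \alpha_i \ge 0 \]

Setting the gradients to zero:

\[ \frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0 \]

Only points with \(\alpha_i > 0\) (the support vectors) contribute to \(w\).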

MIT lecture

  • decision boundaries (hyperplane)
  • Widest street (largest margin)
  • project the unknown vector \(\vec u\) onto \(\vec w\), which is orthogonal to the street
  • \(\vec w \cdot \vec u \ge c\)
  • if \(\vec w \cdot \vec u + b \ge 0\) (with \(b = -c\)), then it is a positive sample
  • \(\vec w \cdot \vec x_+ + b \ge 1\) for positive samples
  • \(\vec w \cdot \vec x_- + b \le -1\) for negative samples
  • \(y_i = +1\) for positive samples
  • \(y_i = -1\) for negative samples
  • \(y_i(\vec w \cdot \vec x_i + b) - 1 \ge 0\), with equality for the points in the gutter (the support vectors)
  • WIDTH = \((\vec x_+ - \vec x_-) \cdot \frac{\vec w}{\|w\|} = \frac{2}{\|w\|}\)
  • maximize \(\frac{2}{\|w\|}\), i.e. maximize \(\frac{1}{\|w\|}\), i.e. minimize \(\|w\|\), which is equivalent to minimizing \(\frac{1}{2}\|w\|^2\) (end-to-end sketch below)
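
A small end-to-end sketch using scikit-learn's linear SVC (the toy data is invented) to check the quantities above: \(w\), \(b\), the support vectors, and the street width \(\frac{2}{\|w\|}\); a large C approximates the hard-margin problem:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
    y = np.array([1, 1, -1, -1])

    # large C approximates the hard-margin SVM on separable data
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    w = clf.coef_[0]
    b = clf.intercept_[0]
    print(w, b)
    print(clf.support_vectors_)       # only these points determine w
    print(2.0 / np.linalg.norm(w))    # width of the "street"

    # decision rule: sign(w . u + b)
    print(clf.predict([[1.0, 0.0]]))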