lec12 activity

a-c

#+attr_latex: :width \textwidth
[[file:1a.pdf]]

d

The hyperplane \((1,3)\,x - 11 = 0\) splits the space into two half-spaces:

\(\begin{cases} (1,3)\,x - 11 < 0 & \text{class } -1 \\ (1,3)\,x - 11 > 0 & \text{class } +1 \end{cases}\)

e.

The above hyperplane splits all points correctly; see the Python results below.

#+begin_src python :results value
d = [[3, 1, -1],
     [4, 1, -1],
     [4, 2, -1],
     [5, 1, -1],
     [0, 5, 1],
     [1, 5, 1],
     [1, 4, 1],
     [2, 4, 1]]

def h(x):
    # Hyperplane function h(x) = (1,3)^T x - 11
    return 1 * x[0] + 3 * x[1] - 11

def half_space():
    # Label each point "+" or "-" by the sign of h(x)
    return ["+" if h(x) > 0 else "-" for x in d]

return [[i, s] for i, s in enumerate(half_space(), start=1)]
#+end_src

#+RESULTS:
| 1 | - |
| 2 | - |
| 3 | - |
| 4 | - |
| 5 | + |
| 6 | + |
| 7 | + |
| 8 | + |

f.

\(\sigma = \frac{yh(x)}{\|w\|}\)

#+begin_src python :results value
import math

# ||w|| for the weight vector w = (1, 3)
w_norm = math.sqrt(1**2 + 3**2)

d = [[3, 1, -1],
     [4, 1, -1],
     [4, 2, -1],
     [5, 1, -1],
     [0, 5, 1],
     [1, 5, 1],
     [1, 4, 1],
     [2, 4, 1]]

def h(x):
    return 1 * x[0] + 3 * x[1] - 11

# Signed distance sigma = y * h(x) / ||w|| for each point
distances = [h(x) * x[-1] / w_norm for x in d]

return [[i, dist] for i, dist in enumerate(distances, start=1)]
#+end_src

g.

The closest point to the hyperplane is point 3, at a distance of about 0.32 (\(= 1/\|w\|\)). This does not match the analysis above, meaning that this hyperplane does not maximize the margin. It is, however, in canonical form, since the closest point satisfies \(y\,h(x) = 1\) (verified in the sketch below).
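
As a quick check of the canonical-form condition (a minimal sketch reusing the dataset and \(h\) from above; a canonical hyperplane has \(\min_i y_i\,h(x_i) = 1\)):

#+begin_src python :results value
d = [[3, 1, -1], [4, 1, -1], [4, 2, -1], [5, 1, -1],
     [0, 5, 1], [1, 5, 1], [1, 4, 1], [2, 4, 1]]

def h(x):
    return 1 * x[0] + 3 * x[1] - 11

# Canonical form requires min_i y_i * h(x_i) == 1,
# attained by the point(s) closest to the hyperplane
return min(x[-1] * h(x) for x in d)
#+end_src

This returns 1 (attained at point 3), consistent with the conclusion above.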

2.

a.

\(\min L = \frac{1}{2}\|w\|^2 - \displaystyle \sum \limits^n_{i=1}\alpha_i(y_i(w^Tx_i + b)-1)\)

b.

  • w is the weight vector, orthogonal to the hyperplane
  • \(x_i\) are the data points in the dataset
  • \(y_i\) are the class labels (\(+1\) or \(-1\))
  • \(\alpha\) is the set of Lagrange multipliers: scalar values that, signed by the class labels, sum to zero, i.e. \(\sum^n_{i=1}\alpha_i y_i = 0\). Additionally, each \(\alpha_i\) satisfies the constraints \(\alpha_i(y_i(w^Tx_i + b)-1) = 0\) and \(\alpha_i \ge 0\) (see the sketch after this list)
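
To make these definitions concrete, here is a minimal sketch evaluating the Lagrangian from part a on the dataset from problem 1; the values of w, b, and \(\alpha\) are hypothetical, chosen for illustration only:

#+begin_src python :results value
d = [[3, 1, -1], [4, 1, -1], [4, 2, -1], [5, 1, -1],
     [0, 5, 1], [1, 5, 1], [1, 4, 1], [2, 4, 1]]

# Hypothetical values for illustration only
w, b = [1, 3], -11
alphas = [0.0] * len(d)

def lagrangian(w, b, alphas):
    # L = (1/2)||w||^2 - sum_i alpha_i * (y_i * (w^T x_i + b) - 1)
    norm_sq = sum(wi ** 2 for wi in w)
    penalty = sum(a * (x[-1] * (w[0] * x[0] + w[1] * x[1] + b) - 1)
                  for a, x in zip(alphas, d))
    return 0.5 * norm_sq - penalty

return lagrangian(w, b, alphas)
#+end_src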

c.

  • Minimizing the first term \(\frac{1}{2}\|w\|^2\) is equivalent to maximizing the margin \(\frac{1}{\|w\|}\) of the hyperplane. The goal of the objective function is to maximize this margin subject to the constraints \(y_i(w^Tx_i + b) \ge 1\) for all \(x_i \in D\) (checked numerically in the sketch after this list).
  • The second term is the sum, over all constraints, of each Lagrange multiplier multiplied by its constraint equation.
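
The constraint can be verified numerically for the hyperplane from problem 1 (a sketch with \(w = (1,3)\), \(b = -11\)):

#+begin_src python :results value
d = [[3, 1, -1], [4, 1, -1], [4, 2, -1], [5, 1, -1],
     [0, 5, 1], [1, 5, 1], [1, 4, 1], [2, 4, 1]]

w, b = [1, 3], -11

# Check y_i * (w^T x_i + b) >= 1 for every point in D
return all(x[-1] * (w[0] * x[0] + w[1] * x[1] + b) >= 1 for x in d)
#+end_src

This returns =True=: every point satisfies the constraint, with equality at point 3.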

d.

  • y: known, given
  • x: known, given
  • w: unknown
  • b: unknown
  • \(\alpha\): unknown

3.

a.

  • \(\underset{\alpha}\max \quad L_{dual} = \displaystyle \sum\limits^n_{i=1}\alpha_i - \frac{1}{2}\displaystyle \sum\limits^n_{i=1}\displaystyle \sum\limits^n_{j=1}\alpha_i\alpha_jy_iy_jx^T_ix_j\)
  • subject to the linear constraints \(\alpha_i \ge 0\ \forall i\) and \(\displaystyle \sum \limits^n_{i=1} \alpha_i y_i = 0\) (see the sketch after this list)
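
To make the objective concrete, a minimal sketch evaluating \(L_{dual}\) on the dataset from problem 1; the \(\alpha\) vector is hypothetical, chosen only to satisfy the two linear constraints, not optimal:

#+begin_src python :results value
d = [[3, 1, -1], [4, 1, -1], [4, 2, -1], [5, 1, -1],
     [0, 5, 1], [1, 5, 1], [1, 4, 1], [2, 4, 1]]
X = [x[:2] for x in d]
y = [x[-1] for x in d]
n = len(d)

# Hypothetical alphas: alpha_i >= 0 and sum_i alpha_i * y_i = 0
alphas = [0.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.25, 0.0]

def dual(alphas):
    # L_dual = sum_i alpha_i - (1/2) sum_i sum_j alpha_i alpha_j y_i y_j x_i^T x_j
    dot = lambda a, b: a[0] * b[0] + a[1] * b[1]
    quad = sum(alphas[i] * alphas[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alphas) - 0.5 * quad

return dual(alphas)
#+end_src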

b.

  • \(y_i, y_j\) known
  • \(x_i^T, x_j\) known
  • \(\alpha_i\) is not known, but can be determined with quadratic optimization

c.

The \(\alpha_i\alpha_j\) term implies that the objective is quadratic in \(\alpha\) and thus can be solved using quadratic programming (optimization) techniques. Specifically, the double sum over the \(\alpha_i\alpha_j\) terms is a quadratic form \(\frac{1}{2}\boldsymbol{\alpha}^TQ\boldsymbol{\alpha}\) with \(Q_{ij} = y_iy_jx^T_ix_j\), which is a standard multivariate quadratic.
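
For example, a minimal sketch of solving this dual with an off-the-shelf optimizer (=scipy.optimize.minimize= with the SLSQP method; using SciPy here is an assumption for illustration, not something specified in the activity):

#+begin_src python :results value
import numpy as np
from scipy.optimize import minimize

d = np.array([[3, 1, -1], [4, 1, -1], [4, 2, -1], [5, 1, -1],
              [0, 5, 1], [1, 5, 1], [1, 4, 1], [2, 4, 1]], dtype=float)
X, y = d[:, :2], d[:, 2]
n = len(y)

# Q_ij = y_i * y_j * x_i^T x_j, so L_dual(a) = sum(a) - (1/2) a^T Q a
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def neg_dual(a):
    # Minimize the negative dual, i.e. maximize the dual
    return -(a.sum() - 0.5 * a @ Q @ a)

res = minimize(neg_dual,
               x0=np.zeros(n),
               method="SLSQP",
               bounds=[(0, None)] * n,                  # alpha_i >= 0
               constraints=[{"type": "eq",
                             "fun": lambda a: a @ y}])  # sum_i alpha_i y_i = 0

alphas = res.x.round(3)
# Stationarity of the primal Lagrangian gives w = sum_i alpha_i y_i x_i
w = (alphas * y) @ X
return [list(alphas), list(w.round(3))]
#+end_src

The nonzero \(\alpha_i\) in the result mark the support vectors; \(w\) is then recovered from them via \(w = \sum_i \alpha_i y_i x_i\), which follows from setting \(\partial L / \partial w = 0\) in the primal Lagrangian.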