a.
- \(\begin{pmatrix}1 \\ 3 \\ -11 \end{pmatrix}^Tx\)
- \(w = (\vec w, -11)\)
- \(x = (\vec x, 1)\)
b.
- \(\begin{pmatrix}1 \\ 5\end{pmatrix} \rightarrow \begin{pmatrix}1 \\ 5 \\ 1\end{pmatrix}\)
- \(\begin{pmatrix}4 \\ 1\end{pmatrix} \rightarrow \begin{pmatrix}4 \\ 1 \\ 1\end{pmatrix}\)
c.
- \((1, 3, -11) \cdot (1, 5, 1) = 1 + 15 - 11 = 5\)
- \((1, 3, -11) \cdot (4, 1, 1) = 4 + 3 - 11 = -4\)
d.
- \(\xi_1 = \max(0,\ 1 - 5) = \max(0, -4) = 0\)
- \(\xi_2 = \max(0,\ 1 - 4) = \max(0, -3) = 0\) (see the numeric check below)
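As a quick sanity check of parts b–d, a minimal numpy sketch; the labels \(y = (+1, -1)\) for the two points are an assumption implied by the slack computations above:

```python
import numpy as np

w = np.array([1.0, 3.0, -11.0])            # (w_vec, b) in homogeneous form
X = np.array([[1.0, 5.0, 1.0],             # (x, 1) for the point (1, 5)
              [4.0, 1.0, 1.0]])            # (x, 1) for the point (4, 1)
y = np.array([1.0, -1.0])                  # assumed class labels

scores = X @ w                             # part c: [5, -4]
slack = np.maximum(0.0, 1.0 - y * scores)  # part d: [0, 0]
print(scores, slack)
```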
2.
a.
- primal = \(\min L = \frac{1}{2}\|w\|^2 - \displaystyle \sum \limits^n_{i=1}\alpha_i(y_i(w^Tx_i + b)-1)\)
- dual = \(\underset{\alpha}\max \quad L_{dual} = \displaystyle \sum\limits^n_{i=1}\alpha_i - \frac{1}{2}\displaystyle \sum\limits^n_{i=1}\displaystyle \sum\limits^n_{j=1}\alpha_i\alpha_jy_iy_jx^T_ix_j\)
- w is the weight vector, it has d dimensions
- \(x_i\) is a data point, it has d dimensions
- \(y_i\) is the class label, has 1 dimension
- \(\alpha_i, b\) are scalars
- \(\alpha\) is the vector of Lagrange multipliers, one scalar per training point; their signed sum is zero, i.e. \(\sum\limits^n_{i=1}\alpha_iy_i = 0\). Additionally each \(\alpha_i\) satisfies the KKT conditions \(\alpha_i(y_i(w^Tx_i + b)-1) = 0\) and \(\alpha_i \ge 0\) (the dual above follows from the stationarity conditions sketched below)
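For reference, a short derivation sketch connecting the two forms: setting the gradients of the primal Lagrangian to zero gives

\[
\frac{\partial L}{\partial w} = w - \sum^n_{i=1}\alpha_iy_ix_i = 0 \;\Rightarrow\; w = \sum^n_{i=1}\alpha_iy_ix_i,
\qquad
\frac{\partial L}{\partial b} = -\sum^n_{i=1}\alpha_iy_i = 0,
\]

and substituting \(w\) back into \(L\) (the \(b\) term drops out because \(\sum_i\alpha_iy_i = 0\)) yields \(L_{dual}\).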
b.
The dual can be solved with standard quadratic optimization over \(\alpha\), while the primal is optimized directly over \(w\) and \(b\) under the raw constraints \(y_i(w^Tx_i + b) \ge 1\). The primal is easier to compute, but the dual depends on the data only through the inner products \(x^T_ix_j\), so the final computation that classifies points can be made more efficient and extended via the kernel trick.
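To illustrate that last point, a hypothetical sketch (function names are mine, not from the assignment) of a dual-form classifier that touches the data only through kernel evaluations:

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian RBF kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(x, support_x, support_y, alpha, b, kernel=rbf_kernel):
    """Dual-form classifier: sign(sum_i alpha_i y_i k(x_i, x) + b).

    Only kernel evaluations against the support vectors (alpha_i > 0)
    are needed -- the weight vector w is never formed explicitly.
    """
    s = sum(a * yi * kernel(xi, x)
            for a, yi, xi in zip(alpha, support_y, support_x) if a > 0)
    return np.sign(s + b)
```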
3.
a.
\(L_{primal} = \frac{1}{2}\|w\|^2 + C\sum\limits^n_{i=1}\xi_i - \sum\limits^n_{i=1}\alpha_i(y_i(w^Tx_i + b) - 1 + \xi_i) - \sum\limits^n_{i=1}\beta_i\xi_i\)
\(L_{dual} = \displaystyle \sum\limits^n_{i=1}\alpha_i - \frac{1}{2}\displaystyle \sum\limits^n_{i=1}\displaystyle \sum\limits^n_{j=1}\alpha_i\alpha_jy_iy_jx^T_ix_j\)
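The dual keeps the same form as in problem 2 because the stationarity condition for \(\xi_i\) eliminates both \(\xi_i\) and \(\beta_i\):

\[
\frac{\partial L_{primal}}{\partial \xi_i} = C - \alpha_i - \beta_i = 0
\;\Rightarrow\; \beta_i = C - \alpha_i \ge 0
\;\Rightarrow\; 0 \le \alpha_i \le C,
\]

so the only change relative to the hard-margin dual is the box constraint \(0 \le \alpha_i \le C\) (together with \(\sum^n_{i=1}\alpha_iy_i = 0\)).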
b.
- The soft-margin (linearly non-separable) case introduces slack variables \(\xi_i\), which indicate how much each point violates the separability condition (being at least \(\frac{1}{\|w\|}\) away from the hyperplane, on the correct side)
- C is the regularization constant that controls the balance between maximizing the margin and minimizing the loss (minimizing the total slack)
- \(\beta_i\) are a second set of Lagrange multipliers (for the constraints \(\xi_i \ge 0\)) that get eliminated from the dual formulation via \(C - \alpha_i - \beta_i = 0\), as derived under part a
4.
a.
\begin{minipage}{0.7\linewidth}
\begin{algorithm}[H]
\caption{gradientAscent}
\SetKwInOut{Input}{Input}
\Input{$x_{train}$, $y_{train}$, $C$, $\epsilon$ (cutoff)}
\ForEach{$x_i \in x_{train}$}{
  $x_i \gets \begin{pmatrix} x_i \\ 1 \end{pmatrix}$ \tcp*{map to $\mathbb{R}^{d+1}$}
}
\For{$k = 1, \ldots, n$}{
  $s_k \gets \frac{1}{x_k^Tx_k}$ \tcp*{step size for the $k$th coordinate}
}
$t \gets 0$\;
$\alpha_0 \gets (0, \ldots, 0)^T$\;
\Repeat{$\|\alpha_t - \alpha_{t-1}\| \le \epsilon$}{
  $\alpha \gets \alpha_t$\;
  \For{$k = 1$ \KwTo $n$}{
    \tcp{update the $k$th component of $\alpha$ along the dual gradient}
    $\alpha_k \gets \alpha_k + s_k\left(1 - y_k\sum^n_{i=1}\alpha_iy_ix^T_ix_k\right)$\;
    \If{$\alpha_k < 0$}{$\alpha_k \gets 0$}
    \If{$\alpha_k > C$}{$\alpha_k \gets C$}
  }
  $\alpha_{t+1} \gets \alpha$\;
  $t \gets t + 1$\;
}
\Return{$\alpha$}
\end{algorithm}
\end{minipage}
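A rough Python translation of the pseudocode above, as a sketch rather than a reference implementation; it assumes \(x_{train}\) is an \(n \times d\) array, \(y_{train}\) holds \(\pm 1\) labels, and adds a `max_iter` safeguard:

```python
import numpy as np

def gradient_ascent(x_train, y_train, C, eps, max_iter=1000):
    """Coordinate-wise gradient ascent on the SVM dual (sketch of gradientAscent)."""
    # map each point to R^{d+1} so the bias is folded into the weights
    X = np.hstack([np.asarray(x_train, float), np.ones((len(x_train), 1))])
    y = np.asarray(y_train, dtype=float)
    n = len(X)
    K = X @ X.T                            # all inner products x_i^T x_k
    step = 1.0 / np.diag(K)                # s_k = 1 / (x_k^T x_k)
    alpha = np.zeros(n)
    for _ in range(max_iter):
        alpha_prev = alpha.copy()
        for k in range(n):
            # gradient of the dual objective w.r.t. alpha_k
            grad_k = 1.0 - y[k] * np.sum(alpha * y * K[:, k])
            # take a step, then clip back into the box [0, C]
            alpha[k] = np.clip(alpha[k] + step[k] * grad_k, 0.0, C)
        if np.linalg.norm(alpha - alpha_prev) <= eps:
            break
    return alpha
```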
b.
\begin{minipage}{0.7\linewidth}
\begin{algorithm}[H]
\caption{learnSVM}
\SetKwInOut{Input}{Input}
\Input{$x_{train}$, $y_{train}$, $C$, $\epsilon$ (cutoff)}
$\alpha \gets$ \textit{gradientAscent}($x_{train}$, $y_{train}$, $C$, $\epsilon$)\;
$w \gets (0, \ldots, 0)^T$ \tcp*{$w \in \mathbb{R}^d$}
\For{$(\alpha_i, y_i, x_i)$ in $(\alpha, y_{train}, x_{train})$}{
  $w \gets w + \alpha_iy_ix_i$\;
}
$b \gets 0$\; $m \gets 0$\;
\For{$(\alpha_i, y_i, x_i)$ in $(\alpha, y_{train}, x_{train})$}{
  \If{$\alpha_i > 0$}{
    \tcp{average $y_i - w^Tx_i$ over the support vectors}
    $b \gets b + y_i - w^Tx_i$\;
    $m \gets m + 1$\;
  }
}
$b \gets b / m$\;
$w \gets (w, b)$ \tcp*{append the bias, so $w \in \mathbb{R}^{d+1}$}
\Return{$w$}
\end{algorithm}
\end{minipage}
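Continuing the sketch, a corresponding Python version of learnSVM; it reuses `gradient_ascent` from the previous block and treats points with \(\alpha_i > 0\) as support vectors:

```python
import numpy as np

def learn_svm(x_train, y_train, C, eps):
    """Recover (w, b) from the dual solution (sketch of learnSVM above)."""
    alpha = gradient_ascent(x_train, y_train, C, eps)  # previous sketch
    X = np.asarray(x_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    w = (alpha * y) @ X                     # w = sum_i alpha_i y_i x_i
    sv = alpha > 0                          # support vectors
    b = np.mean(y[sv] - X[sv] @ w)          # average y_i - w^T x_i over SVs
    return np.append(w, b)                  # (w, b) in R^{d+1}
```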
c.
\begin{minipage}{0.7\linewidth}
\begin{algorithm}[H]
\caption{testSVM}
\SetKwInOut{Input}{Input}
\Input{$x_{test}$, $w$}
$y_{test} \gets ()$\;
\ForEach{$x_i$ in $x_{test}$}{
  \tcp{$w = (\vec w, b)$, so $w^T(x_i, 1) = \vec w^Tx_i + b$}
  $y_{test} \gets \left(y_{test},\ \operatorname{sign}\left(w^T\begin{pmatrix} x_i \\ 1 \end{pmatrix}\right)\right)$\;
}
\Return{$y_{test}$}
\end{algorithm}
\end{minipage}
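And a matching sketch for testSVM, plus a tiny usage example on the two points from problem 1 (labels assumed, as before); it reuses `learn_svm` from the previous block:

```python
import numpy as np

def test_svm(x_test, w):
    """Classify points with the homogeneous weight vector w = (w_vec, b)."""
    X = np.hstack([np.asarray(x_test, float), np.ones((len(x_test), 1))])
    return np.sign(X @ w)                   # labels in {-1, +1}

if __name__ == "__main__":
    x_train = np.array([[1.0, 5.0], [4.0, 1.0]])
    y_train = np.array([1.0, -1.0])         # assumed labels
    w = learn_svm(x_train, y_train, C=10.0, eps=1e-4)
    print(test_svm(x_train, w))             # should reproduce y_train
```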