lec16

1.

\begin{align}
\(K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) &= x_i \cdot x_j + (x_i\cdot x_j)^2 \\ & = a_ia_j+b_ib_j + a_i^2a_j^2 + b_i^2b_j^2 + 2a_ib_ia_jb_j \)
\end{align}

\(\phi(x) = a, b, a^2, b^2, \sqrt{2}ab\)

2.

a.

colour != Red and colour != yellow -> Don't buy
colour != Red and colour == yellow and make !=  ferrari -> Don't buy
colour != Red and colour == yellow and make =  ferrari -> buy
colour != Red and colour == yellow and make =  ferrari -> buy
colour == Red and Model > 2010 -> Buy
colour == Red and Model < 2010 and Mileage < 50000KM -> buy
colour == Red and Model < 2010 and Mileage > 50000KM -> Don't buy

b.

Decision trees are not always the ideal classifier, they work by drawing "boxes" around particular regions, but none of the boxes can overlap. Distributions are often not "box" shaped, so the drawing method of descisions trees are not ideal for these distributions.

3.

a.

(X(5), Y(6), Z(5), XY(4), XZ(2), XC(3), YZ(3), ZC(2), YC(6) XYZ(1), XYC(3), YZC(2), XYZC(1))

b.

\((X, Y, Z, XY, YZ, XC, YC, XYC\))

c.

(XYC, XC, YC)

d.

conf (XY-> C) = P(C|XY) = support(XYC)/support(XY) = 3/4 conf (Y-> C) = P(C|Y) = support(YC)/support(Y) = 4/6 conf (X-> C) = P(C|X) = support(XC)/support(X) = 3/5 (Z1 → C0) is implied here as the same as (XY → C)

e.

conf (XY-> C) = P(C|XY) = support(XYC)/support(XY) = 3/4 conf (Y-> C) = P(C|Y) = support(YC)/support(Y) = 4/6 conf (X-> C) = P(C|X) = support(XC)/support(X) = 3/5

f.

1 (3/4 confidence)

g.

There is no rule, so you cannot classify it accurately (with rules). This is a failing of the minimum support requirement. The itemsets that have a class of 0 do tend to include more zeros, so I would learn towards classifing (0,0,0) as class 0.

4.

a. 4, the margin is maximized without caring much about the number of slack terms (C is close to zero) b. 3, the margin is maximized while reducing the number of slack terms (fewer support vectors) c. 2, kernel is quadradic shape (u shape) d. 6, variance is smaller and 6's data points are closer together e. 1, variance is larger and 1's data points are farther apart (circle is bigger)