lec12 activity

a, b.

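  • Calculate the squared Euclidean distance to each training row (the square root is not required because it does not change the ranking)
  • Keep the k nearest rows and take the mode of their class labels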
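
import pandas as pd

# Training data: three features (X, Y, Z) and a class label C.
d = [[1,1,1,0],
    [2,1,3,0],
    [2,4,3,0],
    [4,1,4,0],
    [0,2,2,1],
    [1,4,1,1],
    [2,2,4,1],
    [2,4,5,1]]
d = pd.DataFrame(d, columns=["X", "Y", "Z", "C"])

def distance(xi, xj):
    # Squared difference along a single axis.
    return (xi - xj) ** 2

def L(xi, yi, zi, xj, yj, zj):
    # Squared Euclidean distance between (xi, yi, zi) and (xj, yj, zj).
    return distance(xi, xj) + distance(yi, yj) + distance(zi, zj)

# Distance from every training row to the two query points (1, 2, 1) and (3, 2, 4).
d["distance1"] = d.apply(lambda row: L(1, 2, 1, row["X"], row["Y"], row["Z"]), axis=1)
d["distance2"] = d.apply(lambda row: L(3, 2, 4, row["X"], row["Y"], row["Z"]), axis=1)

# k=1: label of the nearest row. k=3: most common label among the three nearest rows.
# .iloc[0] extracts a scalar; calling int() on a Series directly is deprecated in pandas.
k11 = int(d.sort_values(by="distance1").head(1)["C"].iloc[0])
k12 = int(d.sort_values(by="distance2").head(1)["C"].iloc[0])
k21 = int(d.sort_values(by="distance1").head(3)["C"].mode().iloc[0])
k22 = int(d.sort_values(by="distance2").head(3)["C"].mode().iloc[0])

l = [["ID", "K=1", "K=3"],
     [1, k11, k21],
     [2, k12, k22]]
# A bare `return` is only valid inside a function, so print the rows instead.
for row in l:
    print(*row)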

| ID | K=1 | K=3 |
|----|-----|-----|
| 1  | 0   | 1   |
| 2  | 1   | 0   |

c.

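The predicted class flips between k=1 and k=3 for both query points. In each case the single nearest neighbour belongs to one class, but the second and third nearest neighbours both belong to the other class, so the majority vote at k=3 overrules the nearest neighbour.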
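
To make the flip visible, here is a minimal sketch that lists the labels of the three nearest neighbours for each query point. It uses only the standard library instead of pandas, and the helper names (`squared_distance`, `nearest_labels`, `knn_predict`) are illustrative, not part of the original activity code.

```python
from collections import Counter

# Training rows (X, Y, Z, C), copied from the activity data.
ROWS = [
    (1, 1, 1, 0), (2, 1, 3, 0), (2, 4, 3, 0), (4, 1, 4, 0),
    (0, 2, 2, 1), (1, 4, 1, 1), (2, 2, 4, 1), (2, 4, 5, 1),
]

def squared_distance(p, q):
    """Squared Euclidean distance; the square root would not change the ranking."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_labels(query, k):
    """Class labels of the k training rows closest to `query`, nearest first."""
    ranked = sorted(ROWS, key=lambda row: squared_distance(query, row[:3]))
    return [row[3] for row in ranked[:k]]

def knn_predict(query, k):
    """Majority vote over the k nearest labels."""
    return Counter(nearest_labels(query, k)).most_common(1)[0][0]

for query in [(1, 2, 1), (3, 2, 4)]:
    print(query, nearest_labels(query, 3), knn_predict(query, 1), knn_predict(query, 3))
```

For (1, 2, 1) the three nearest labels are 0, 1, 1, and for (3, 2, 4) they are 1, 0, 0, which is why the vote changes once k grows from 1 to 3.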

2.

a.

P(C=0) = P(C=1) = 1/2

b.

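|             | C = 0 | C = 1 |
|-------------|-------|-------|
| X=1 Y=1 Z=1 | 1/4   | 0     |
| X=0 Y=1 Z=1 | 0     | 0     |
| X=1 Y=0 Z=1 | 1/4   | 0     |
| X=0 Y=0 Z=1 | 1/4   | 1/2   |
| X=1 Y=0 Z=0 | 0     | 1/2   |
| X=0 Y=0 Z=0 | 0     | 0     |
| X=0 Y=1 Z=0 | 1/4   | 0     |
| X=1 Y=1 Z=0 | 0     | 0     |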
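
The table entries are relative frequencies within each class. A minimal counting sketch follows. Note the assumption: the original dataset is not reproduced in these notes, so `DATA` is a hypothetical set of eight rows chosen to be consistent with the table (four rows per class).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# ASSUMPTION: hypothetical rows (X, Y, Z, C) consistent with the table;
# the real dataset is not shown in these notes.
DATA = [
    (1, 1, 1, 0), (1, 0, 1, 0), (0, 0, 1, 0), (0, 1, 0, 0),
    (0, 0, 1, 1), (0, 0, 1, 1), (1, 0, 0, 1), (1, 0, 0, 1),
]

def joint_conditional(data):
    """Return {(x, y, z, c): P(X=x, Y=y, Z=z | C=c)} estimated by counting."""
    class_sizes = Counter(row[3] for row in data)
    counts = Counter(data)
    return {
        (x, y, z, c): Fraction(counts[(x, y, z, c)], class_sizes[c])
        for x, y, z in product((0, 1), repeat=3)
        for c in (0, 1)
    }

table = joint_conditional(DATA)
for x, y, z in product((0, 1), repeat=3):
    print(f"X={x} Y={y} Z={z}", table[(x, y, z, 0)], table[(x, y, z, 1)])
```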

c.

  • \(P(C = 0|r) = 0\): r does not appear in the dataset, so its count \(n_i(v)\) is zero and the corresponding table entry is 0
  • \(P(C = 1|r) = 0\): r does not appear in the dataset, so its count \(n_i(v)\) is zero and the corresponding table entry is 0

d.

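|           | C=0 | C=1 |
|-----------|-----|-----|
| p(X=0\|C) | 1/2 | 1/2 |
| p(X=1\|C) | 1/2 | 1/2 |

|           | C=0 | C=1 |
|-----------|-----|-----|
| p(Y=0\|C) | 1/2 | 1   |
| p(Y=1\|C) | 1/2 | 0   |

|           | C=0 | C=1 |
|-----------|-----|-----|
| p(Z=0\|C) | 1/4 | 1/2 |
| p(Z=1\|C) | 3/4 | 1/2 |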
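
Each entry is the fraction of rows in a class that take the given feature value. A short sketch of that count follows. As an assumption, `DATA` is a hypothetical eight-row dataset consistent with the tables in these notes, since the original rows are not shown.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical rows (X, Y, Z, C); the real dataset is not shown here.
DATA = [
    (1, 1, 1, 0), (1, 0, 1, 0), (0, 0, 1, 0), (0, 1, 0, 0),
    (0, 0, 1, 1), (0, 0, 1, 1), (1, 0, 0, 1), (1, 0, 0, 1),
]
FEATURES = ("X", "Y", "Z")

def conditional(data, feature_index, value, c):
    """P(feature = value | C = c) as a relative frequency within class c."""
    in_class = [row for row in data if row[3] == c]
    matches = sum(1 for row in in_class if row[feature_index] == value)
    return Fraction(matches, len(in_class))

for i, name in enumerate(FEATURES):
    for value in (0, 1):
        print(f"p({name}={value}|C)",
              conditional(DATA, i, value, 0),
              conditional(DATA, i, value, 1))
```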

e.

  • \(P(Y|C)\) requires a Laplace correction because \(P(Y = 1|C = 1) = 0\) in the table; a single zero factor would force the whole naive Bayes product for \(C = 1\) to zero.

f.

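|           | C=0 | C=1 |
|-----------|-----|-----|
| p(X=0\|C) | 1/2 | 1/2 |
| p(X=1\|C) | 1/2 | 1/2 |

|           | C=0 | C=1 |
|-----------|-----|-----|
| p(Y=0\|C) | 1/2 | 5/6 |
| p(Y=1\|C) | 1/2 | 1/6 |

|           | C=0 | C=1 |
|-----------|-----|-----|
| p(Z=0\|C) | 1/4 | 1/2 |
| p(Z=1\|C) | 3/4 | 1/2 |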
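
The corrected Y column for C=1 matches add-one smoothing, (count + 1) / (class size + number of values). A small sketch, assuming class C=1 has 4 rows that all have Y=0 (a count inferred from the values 5/6 and 1/6, not stated in the notes); the function name `laplace` is illustrative.

```python
from fractions import Fraction

def laplace(count, class_size, n_values, alpha=1):
    """Add-alpha estimate: (count + alpha) / (class_size + alpha * n_values)."""
    return Fraction(count + alpha, class_size + alpha * n_values)

# ASSUMPTION: class C=1 has 4 rows, all with Y=0; Y takes 2 possible values.
print(laplace(4, 4, 2))  # P(Y=0 | C=1)
print(laplace(0, 4, 2))  # P(Y=1 | C=1)
```

The two estimates still sum to 1, and neither is zero, so the product in the naive Bayes step can no longer collapse to zero for C=1.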

g.

|             | C = 0             | C = 1             |
|-------------|-------------------|-------------------|
| X=1 Y=1 Z=1 | 1/2 * 1/2 * 3/4   | 1/2 * 1/6 * 1/2   |
| X=0 Y=1 Z=1 | 1/2 * 1/2 * 3/4   | 1/2 * 1/6 * 1/2   |
| X=1 Y=0 Z=1 | 1/2 * 1/2 * 3/4   | 1/2 * 5/6 * 1/2   |
| X=0 Y=0 Z=1 | 1/2 * 1/2 * 3/4   | 1/2 * 5/6 * 1/2   |
| X=1 Y=0 Z=0 | 1/2 * 1/2 * 1/4   | 1/2 * 5/6 * 1/2   |
| X=0 Y=0 Z=0 | 1/2 * 1/2 * 1/4   | 1/2 * 5/6 * 1/2   |
| X=0 Y=1 Z=0 | 1/2 * 1/2 * 1/4   | 1/2 * 1/6 * 1/2   |
| X=1 Y=1 Z=0 | 1/2 * 1/2 * 1/4   | 1/2 * 1/6 * 1/2   |

h.

  • \(P(C = 0|r) \propto 1/2 \cdot 1/2 \cdot 1/4 \cdot 1/2 = 1/32\)
  • \(P(C = 1|r) \propto 1/2 \cdot 5/6 \cdot 1/2 \cdot 1/2 = 5/48\)
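
As a cross-check, the sketch below multiplies the prior by the per-feature conditionals taken straight from the Laplace-corrected tables in part f, then normalises. Two assumptions: r is taken to be X=0, Y=0, Z=0 (the only unseen combination with Y=0 and Z=0), and the names `PRIOR`, `COND`, `scores` and `posterior` are illustrative.

```python
from fractions import Fraction as F

PRIOR = {0: F(1, 2), 1: F(1, 2)}

# P(feature=value | C) from the part f tables: (feature, value) -> {class: prob}.
COND = {
    ("X", 0): {0: F(1, 2), 1: F(1, 2)},
    ("X", 1): {0: F(1, 2), 1: F(1, 2)},
    ("Y", 0): {0: F(1, 2), 1: F(5, 6)},
    ("Y", 1): {0: F(1, 2), 1: F(1, 6)},
    ("Z", 0): {0: F(1, 4), 1: F(1, 2)},
    ("Z", 1): {0: F(3, 4), 1: F(1, 2)},
}

def scores(record):
    """Unnormalised naive Bayes scores: P(C=c) times each P(feature=value | C=c)."""
    result = {}
    for c, prior in PRIOR.items():
        score = prior
        for feature, value in record.items():
            score *= COND[(feature, value)][c]
        result[c] = score
    return result

def posterior(record):
    """Scores rescaled so that they sum to 1 over the classes."""
    s = scores(record)
    total = sum(s.values())
    return {c: v / total for c, v in s.items()}

# ASSUMPTION: r is the record X=0, Y=0, Z=0.
r = {"X": 0, "Y": 0, "Z": 0}
print(scores(r))
print(posterior(r))
```

After normalising, the posteriors are 3/13 for C=0 and 10/13 for C=1, so r would be assigned to C=1.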