lecture 21

a.

D = [1, 2, 3, 3, 3, 4, 6, 7, 8, 8, 8, 9]
unique = sorted(set(D))  # sorted, so the CDF below accumulates in value order
counts = {}
for i in unique:
    count = 0
    for k in D:
        if i == k:
            count += 1
    counts[i] = count

# relative frequency of each unique value
rel_freq = []
for key, value in counts.items():
    rel_freq.append(value / len(D))

# running sum of the relative frequencies gives the empirical CDF
total = 0
CDF = []
for i in rel_freq:
    CDF.append(i + total)
    total += i
CDF
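The counting and the running sum can also be done with the standard library; a minimal equivalent sketch using collections.Counter and itertools.accumulate (not part of the original solution, just a cross-check):

from collections import Counter
from itertools import accumulate

counter = Counter(D)                       # value -> count
xs = sorted(counter)                       # unique values in order
rel = [counter[x] / len(D) for x in xs]    # relative frequencies
cdf = list(accumulate(rel))                # running sum = empirical CDF
assert [round(a, 9) for a in cdf] == [round(b, 9) for b in CDF]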

b.

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.step(unique, CDF, where='post')  # the ECDF is right-continuous, so step after each x
plt.show()
![](./.ob-jupyter/a4122c9ddc64a86253e2b5f7ff7e3ae46d02de51.png)
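An equivalent way to draw the ECDF, assuming numpy is available, is to step through the sorted sample against \(i/n\); repeated values simply stack into taller jumps:

import numpy as np
import matplotlib.pyplot as plt

xs = np.sort(D)                          # sorted sample
ys = np.arange(1, len(D) + 1) / len(D)   # i/n for i = 1..n
fig, ax = plt.subplots()
ax.step(xs, ys, where='post')
plt.show()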

c.

\(f(x,h) = \frac{F(x+h/2)- F(x-h/2)}{h}\)
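As a quick sanity check of this formula on the data from part a, take \(x = 3\) and \(h = 1\):

\begin{equation*}
f(3, 1) = \frac{F(3.5) - F(2.5)}{1} = \frac{5/12 - 2/12}{1} = \frac{3}{12} = 0.25,
\end{equation*}

which is exactly the relative frequency of the value 3 in \(D\).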

F = dict(zip(list(unique), CDF))
# pad F so every key the finite differences below will look up exists (quick hack)
F[0] = 0
F[5] = F[4]
F[10] = F[9]
F[11] = F[9]
F[12] = F[9]
F[-1] = F[0]
F[-2] = F[0]
F[-3] = F[0]
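The padding is only needed because the dict has integer keys; an alternative sketch (not part of the original solution) evaluates the ECDF at any real \(x\) with a binary search over the sorted unique values and CDF from part a:

from bisect import bisect_right

def F_cont(x):
    # index of the last step at or below x; no steps at or below x means F(x) = 0
    i = bisect_right(unique, x)
    return 0 if i == 0 else CDF[i - 1]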

from math import floor
def f(x, h):
    # central finite difference of the ECDF: (F(x + h/2) - F(x - h/2)) / h
    return (F[floor(x + h/2)] - F[floor(x - h/2)]) / h

h_1 = [f(x, 1) for x in range(0,11)]
h_1
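For integer \(x\) and \(h = 1\), \(F(\lfloor x + \tfrac{1}{2} \rfloor) - F(\lfloor x - \tfrac{1}{2} \rfloor)\) collapses to \(F(x) - F(x-1)\), so the nonzero entries of h_1 should reproduce the relative frequencies from part a; a quick check:

from math import isclose
nonzero = [v for v in h_1 if v > 0]
assert len(nonzero) == len(rel_freq)
assert all(isclose(a, b) for a, b in zip(nonzero, rel_freq))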

d.

h_5 = [f(x, 5) for x in range(0,11)]
h_5

e.

fig, (ax1, ax2) = plt.subplots(2)
ax1.step(range(0, 11), h_1)  # h = 1: one spike per observed value
ax2.step(range(0, 11), h_5)  # h = 5: wider bandwidth, smoother estimate
plt.show()

f.

I would expect the above plot to dip into the negative, as it is effectively a visualization of the changes in the step (empirical cumulative distribution) kernel.

2.

  • CBAD
  • the larger the bandwidth \(h\), the smoother the estimated density (see the sketch below)
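A small self-contained sketch illustrating this on the data from problem 1, reusing the finite-difference estimator but evaluating the ECDF directly instead of through the padded dict (the bandwidths and grid are arbitrary choices for illustration):

import numpy as np
import matplotlib.pyplot as plt
from bisect import bisect_right

D = [1, 2, 3, 3, 3, 4, 6, 7, 8, 8, 8, 9]
keys = sorted(set(D))
cdf_vals = np.cumsum([D.count(k) / len(D) for k in keys])

def F_at(x):
    i = bisect_right(keys, x)
    return 0.0 if i == 0 else cdf_vals[i - 1]

grid = np.linspace(0, 10, 200)
fig, ax = plt.subplots()
for h in (1, 3, 5):  # increasing bandwidth -> smoother curve
    ax.plot(grid, [(F_at(x + h/2) - F_at(x - h/2)) / h for x in grid], label=f"h={h}")
ax.legend()
plt.show()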

3.

a.

\begin{equation*}
\nabla f(x) = \frac{1}{nh^{d+2}}\sum \limits^n_{i=1}K\left(\frac{x-x_i}{h}\right)(x_i-x)
\end{equation*}
  • \(K(\frac{x-x_i}{h})\) is the kernel term (a scalar weight). In the Gaussian case it decays as the distance between \(x\) and \(x_i\) grows, so nearby points get larger weights.
  • \((x_i-x)\) is a vector pointing from \(x\) toward the sample point \(x_i\). The weighted sum of all these vectors (times the constant out front) is the gradient; a sketch follows this list.
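A minimal sketch of evaluating this sum with a Gaussian kernel \(K(u) = e^{-\|u\|^2/2}\); the toy sample, bandwidth, and query point are made up for illustration:

import numpy as np

def grad_f(x, X, h):
    # sum_i K((x - x_i)/h) * (x_i - x), scaled by 1 / (n * h**(d+2))
    n, d = X.shape
    diffs = X - x                                         # rows are (x_i - x)
    w = np.exp(-np.sum(((x - X) / h) ** 2, axis=1) / 2)   # kernel weights
    return (w[:, None] * diffs).sum(axis=0) / (n * h ** (d + 2))

X = np.array([[0.0, 0.0], [1.0, 0.5], [4.0, 4.0]])  # toy sample
print(grad_f(np.array([0.5, 0.5]), X, h=1.0))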

b.

Because A is closer to the marked location, it has a greater effect on the direction of the gradient: the kernel weight shrinks as distance grows, so nearer points dominate the weighted sum. For example, with a Gaussian kernel and \(h = 1\), a point at distance 0.5 gets weight \(e^{-0.125} \approx 0.88\), while a point at distance 2 gets only \(e^{-2} \approx 0.14\).

c.

\begin{equation*}
x^{t+1} = x^t + \delta \frac{\nabla f(x)}{\|\nabla f(x)\|}
\end{equation*}
  • Since A has a greater effect, the gradient vector points toward the peak near A rather than the peak near B. Because \(x^t\) moves in the direction of the gradient (by a step of size \(\delta\)) to reach \(x^{t+1}\), the iterates climb toward the peak near A; a sketch of the update follows.
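A sketch of this normalized gradient-ascent step, reusing grad_f from part a (repeated here so the snippet stands alone); the step size, iteration count, and stopping tolerance are arbitrary choices, not given in the problem, and with a fixed \(\delta\) the iterates hover near the mode rather than converging exactly:

import numpy as np

def grad_f(x, X, h):
    n, d = X.shape
    w = np.exp(-np.sum(((x - X) / h) ** 2, axis=1) / 2)
    return (w[:, None] * (X - x)).sum(axis=0) / (n * h ** (d + 2))

def ascend(x, X, h=1.0, delta=0.1, steps=100):
    # x^{t+1} = x^t + delta * grad / ||grad||
    for _ in range(steps):
        g = grad_f(x, X, h)
        norm = np.linalg.norm(g)
        if norm < 1e-12:  # numerically at a stationary point
            break
        x = x + delta * g / norm
    return x

X = np.array([[0.0, 0.0], [1.0, 0.5], [4.0, 4.0]])  # toy sample
print(ascend(np.array([0.5, 0.5]), X))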