exam2

formulas and tidbits

  • possible clusters (brute force) = \(k^n\)
  • Bayes: \(P(A|B) = \frac{P(A) \cdot P(B|A)}{P(B)}\)
  • Lance-Williams: \(d_{(ij)k} = \alpha_i d_{ik} + \alpha_j d_{jk} + \beta d_{ij} + \gamma|d_{ik} - d_{jk}|\), where each \(d\) term is the distance between the corresponding pair of clusters (used to update distances after merging clusters \(i\) and \(j\))
  • average distance between clusters \(C_i\) and \(C_j\) = $$\frac{\sum_{x\in C_i}\sum_{y\in C_j}d(x,y)}{n_i n_j}$$ i.e., the average over all pairs of points with one point taken from each cluster
  • empirical cumulative distribution: at a value x, the fraction of values in the list that are less than or equal to x (can be used as a step kernel).
  • Kernel density \(\hat f_k(x, h) = \frac{K(x+h/2) - K(x-h/2)}{h}\) where \(K\) is a cumulative kernel such as the empirical cumulative distribution above (see the sketch after this list)
  • affinity matrix = a matrix of pairwise similarities between data points (or clusters); entries closer to one mean "more similar"
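
A quick sketch (helper names are my own, not from class) of the two bullets above: the empirical CDF used as a step kernel, and the density estimate as the CDF difference across a window of width h.

```python
import numpy as np

def ecdf(data, x):
    """Empirical cumulative distribution: fraction of values <= x."""
    return np.mean(np.asarray(data) <= x)

def density_estimate(data, x, h):
    """Step-kernel density estimate: ECDF difference across a window of width h."""
    return (ecdf(data, x + h / 2) - ecdf(data, x - h / 2)) / h

data = [1.0, 1.2, 1.3, 2.8, 3.0, 3.1]
print(density_estimate(data, 1.2, h=0.5))  # high density inside the first group
print(density_estimate(data, 2.0, h=0.5))  # ~0 in the gap between groups
```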

Algorithms

K-means

  • given number of clusters
  • randomly initialize each centroid/cluster mean
  • assign each data point to its closest centroid
  • reassign any centroid with no assigned points to a random data point
  • recalculate each mean as the average of the points assigned to its cluster
  • stop when means don't change
  • Time complexity: \(O(n^{dk+1})\); a sketch of the iterative algorithm is below
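
A minimal numpy sketch of the steps above (initializing centroids at randomly chosen data points is my own choice; random values work too):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(max_iter):
        # assign each point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) == 0:
                new_centroids[j] = X[rng.integers(len(X))]  # empty cluster -> random point
            else:
                new_centroids[j] = members.mean(axis=0)     # recompute the mean
        if np.allclose(new_centroids, centroids):            # stop when means don't change
            return labels, centroids
        centroids = new_centroids
    return labels, centroids

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
labels, centroids = kmeans(X, k=2)
```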

EM algorithm

  • split into two steps (expectation, maximization)
  • optimize the parameters that maximize the log likelihood function
  • in class this was framed with Bayes' rule: the E-step computes the posterior probability of each cluster given a point, \(P(C_i \mid x_j)\), assuming the current cluster parameters, and the M-step re-estimates those parameters from the posteriors
  • Time complexity depends on the likelihood function (sketch for a Gaussian mixture below)
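
A minimal sketch of EM for a two-component 1-D Gaussian mixture (all names are mine; the E-step is just Bayes' rule under the current parameters):

```python
import numpy as np

def gaussian(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm(x, n_iter=50):
    mu = np.array([x.min(), x.max()])     # crude initial means
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])             # mixing weights
    for _ in range(n_iter):
        # E-step: posterior P(cluster | point) via Bayes' rule
        lik = np.stack([pi[j] * gaussian(x, mu[j], var[j]) for j in range(2)], axis=1)
        resp = lik / lik.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters to increase the expected log likelihood
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return mu, var, pi

x = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(6, 1, 100)])
print(em_gmm(x))
```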

DBSCAN

  • \(\epsilon\) is max distance a "neighbor" is allowed to be (reachable)
  • core points have at least min_points neighbors within \(\epsilon\)
  • border points are "reachable" from core points, i.e. there is a core point that has the border point as a neighbor
  • noise points are neither core points nor reachable from any core point
  • the algorithm finds all core points, then the border points reachable from them, and discards everything else as noise (sketch below)
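
A minimal sketch that just labels each point core / border / noise from the definitions above (a full DBSCAN would then grow clusters by connecting core points that are within \(\epsilon\) of each other):

```python
import numpy as np

def classify_points(X, eps, min_points):
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = dists <= eps                       # which points are within eps of which
    is_core = neighbors.sum(axis=1) >= min_points  # core: enough neighbors
    labels = []
    for i in range(len(X)):
        if is_core[i]:
            labels.append("core")
        elif neighbors[i][is_core].any():          # border: some core point has it as a neighbor
            labels.append("border")
        else:
            labels.append("noise")                 # neither core nor reachable
    return labels

X = np.vstack([np.random.randn(30, 2), np.array([[10.0, 10.0]])])
print(classify_points(X, eps=1.0, min_points=4))
```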

Hierarchical clustering

  • single link = min distance between clusters
  • complete link = max distance between clusters
  • the closest pair of clusters is merged into a new cluster, and merged clusters are treated as single points in later merges
  • visualized using a dendrogram (see the SciPy sketch below)
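
For reference, a minimal sketch using SciPy's agglomerative clustering (assuming scipy and matplotlib are installed; `linkage` and `dendrogram` are the actual SciPy functions):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.vstack([np.random.randn(10, 2), np.random.randn(10, 2) + 4])
Z_single = linkage(X, method="single")      # merge by minimum inter-cluster distance
Z_complete = linkage(X, method="complete")  # merge by maximum inter-cluster distance
dendrogram(Z_complete)                      # draw the merge tree
plt.show()
```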

DENCLUE

  • Gradient function: $$\nabla f(x) = \frac{1}{nh^{d+2}}\sum \limits^n_{i=1}K\left(\frac{x-x_i}{h}\right)(x_i-x)$$
  • hill climbs each point to a local maximum of the density estimate (a "density attractor"), then assigns points that climb to the same attractor to the same cluster (sketch below)
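
A minimal sketch of the gradient formula and the hill-climbing step (the Gaussian kernel and fixed step size are my simplifications; the real algorithm uses a smarter update and a density threshold for noise):

```python
import numpy as np

def gaussian_kernel(z):
    """K(z) = exp(-||z||^2 / 2) / (2*pi)^(d/2), applied row-wise."""
    d = z.shape[-1]
    return np.exp(-0.5 * np.sum(z ** 2, axis=-1)) / (2 * np.pi) ** (d / 2)

def density_gradient(x, X, h):
    """The gradient formula above: (1 / (n h^(d+2))) * sum_i K((x - x_i)/h) (x_i - x)."""
    n, d = X.shape
    k = gaussian_kernel((x - X) / h)               # kernel weight of each data point
    return (k[:, None] * (X - x)).sum(axis=0) / (n * h ** (d + 2))

def hill_climb(x, X, h, step=1.0, n_steps=200):
    """Follow the density gradient uphill toward a local maximum (attractor)."""
    for _ in range(n_steps):
        x = x + step * density_gradient(x, X, h)
    return x

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
print(hill_climb(np.array([1.0, 1.0]), X, h=1.0))  # should drift toward the nearby mode
```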

Spectral clustering

  • compute affinity matrix A
  • Compute Degree matrix
    • diagonal matrix
    • each entry is the sum of weights of all edges incident with node
  • Compute L (differs with method)
    • Average weight: \(L = A\)
    • Ratio cut: \(L = D - A\) (this Laplacian is also what the normalized cuts below are built from)
    • Normalized asymmetric cut: \(L^a = D^{-1}L\)
    • Normalized symmetric cut: \(L^s = D^{-1/2}LD^{-1/2}\)
  • compute the k (number of clusters) smallest eigenvalues/eigenvectors of L
  • compute matrix Y from the eigenvector matrix U using \(y_i = \frac{1}{\sqrt{\sum^k_{j=1}u^2_{n-j+1,\,i}}}\vec{u_i}^T\), i.e. normalize each row of U to unit length
  • find clusters by running k-means on Y (end-to-end sketch below)
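
A minimal end-to-end sketch of the ratio-cut version (the Gaussian affinity with a sigma parameter is my own choice of similarity; swap in \(D^{-1}L\) or \(D^{-1/2}LD^{-1/2}\) for the normalized variants):

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2 * sigma ** 2))                # affinity matrix
    np.fill_diagonal(A, 0)
    D = np.diag(A.sum(axis=1))                        # degree matrix
    L = D - A                                         # ratio-cut Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = eigvecs[:, :k]                                # k smallest eigenvectors
    Y = U / np.linalg.norm(U, axis=1, keepdims=True)  # normalize each row
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 6])
print(spectral_clustering(X, k=2))
```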

CLIQUE

  • cuts data space into a grid of cells
  • density threshold determines if cell is "dense"
  • adjoining dense cells are considered the same cluster
  • best for axis aligned data
  • sensitive to bin boundaries (grid/density sketch below)
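
A minimal sketch of just the grid/density step (n_bins and density_threshold are illustrative parameters; full CLIQUE then merges adjoining dense cells into clusters and searches subspaces):

```python
import numpy as np
from collections import Counter

def dense_cells(X, n_bins=10, density_threshold=5):
    """Cut each axis into n_bins intervals and return cells holding enough points."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    # integer grid coordinates of every point (one cell index per dimension)
    cells = np.floor((X - mins) / (maxs - mins + 1e-12) * n_bins).astype(int)
    counts = Counter(map(tuple, cells))
    return {cell for cell, count in counts.items() if count >= density_threshold}

X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 5])
print(dense_cells(X, n_bins=8, density_threshold=10))
```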

Projective k-means (PCA)

  • good for data that is not axis aligned
  • PCA is the hard part (a minimal PCA-via-SVD sketch is below)
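
A minimal sketch of PCA via the SVD of the centered data (my own helper; projective k-means would run something like this per cluster inside the k-means loop):

```python
import numpy as np

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                       # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # top principal directions
    projected = Xc @ components.T                 # coordinates in the PCA subspace
    return components, projected

X = np.random.randn(100, 3) @ np.array([[2, 0, 0], [0, 1, 0], [0, 0, 0.1]])
components, projected = pca(X, n_components=2)
```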

Random-projections

  • take a random sample of points
  • test every dimension with the following criterion:
  • \(A = \{d_k \mid \|x_p^k - x^k\| \le w \;\; \forall x \in s_j\}\)
  • a dimension \(d_k\) is "good" if no sampled point is farther than \(w\) from the pivot point \(x_p\) along that dimension (sketch below)
  • good for data that is axis aligned
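
A minimal sketch of that dimension test (pivot, sample, and w play the roles of \(x_p\), \(s_j\), and \(w\) in the set definition above):

```python
import numpy as np

def good_dimensions(pivot, sample, w):
    """Indices of dimensions where every sampled point stays within w of the pivot."""
    diffs = np.abs(sample - pivot)                # per-dimension distance to the pivot
    return [k for k in range(sample.shape[1]) if (diffs[:, k] <= w).all()]

sample = np.random.randn(20, 3) * np.array([0.1, 5.0, 0.1])  # dims 0 and 2 are tight
pivot = sample[0]
print(good_dimensions(pivot, sample, w=1.0))      # likely [0, 2]
```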

Regression

  • Linear regression: \(w = (D^TD)^{-1}D^Ty\)
  • Ridge regression: \(w = (X^TX + \alpha I)^{-1}X^Ty\)
  • SSE: \(\sum(Y-\hat Y)^2\) where \(\hat Y\) is the prediction, i.e. the data matrix multiplied by the \(w\) found in regression (plus the bias \(b\) if one was fit)
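
A minimal sketch of the closed-form solutions and SSE (variable names are mine; the bias is handled by appending a column of ones to the data matrix):

```python
import numpy as np

def linear_regression(D, y):
    """Ordinary least squares: w = (D^T D)^{-1} D^T y."""
    return np.linalg.solve(D.T @ D, D.T @ y)

def ridge_regression(X, y, alpha):
    """Ridge: w = (X^T X + alpha * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def sse(y, y_hat):
    """Sum of squared errors between targets and predictions."""
    return np.sum((y - y_hat) ** 2)

# tiny usage example; the last column of ones gives the bias term b
X = np.c_[np.linspace(0, 1, 20), np.ones(20)]
y = 3 * X[:, 0] + 1 + np.random.normal(0, 0.1, 20)
w = ridge_regression(X, y, alpha=0.1)
print(w, sse(y, X @ w))
```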