lec20

Dendrogram

  • hierarchical cluster tree
    • agglomerative (merging)
    • divisive (splitting): based on a minimum cut of the similarity graph

Basic agglomerative clustering

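  • put each point into a separate cluster
  • merge clusters until all points are in the same cluster
  • at each step, the closest pair of clusters is merged
  • finding the closest pair: O(n^2) for any distance in high dimensions, O(n log n) in low dimensions (with a k-d tree)
  • a min-heap can be built in time linear in its size, but it holds O(n^2) pairwise distances, so building it takes O(n^2)
  • after each merge, recomputing the distances and updating the heap takes O(n log n) time
  • total complexity: O(n^2) initialization plus n merges
  • O(n^2) + n · O(n log n) = O(n^2 log n)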
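
The steps above can be sketched in Python as follows. This is a minimal illustration, not an optimized implementation. It assumes points are tuples of numbers, uses single-link distance as the merge criterion, and the function name agglomerative and the merge-history format are made up for this example. It recomputes cluster distances by brute force after each merge, so it does not reach the O(n^2 log n) bound. Stale heap entries are skipped rather than deleted.

```python
import heapq
import math


def agglomerative(points):
    """Merge the closest pair of clusters until one cluster remains.

    Returns the merge history as (distance, id_a, id_b, new_id) tuples,
    which is the information a dendrogram draws.
    """
    # Each point starts in its own cluster, keyed by an integer id.
    clusters = {i: [p] for i, p in enumerate(points)}

    def single_link(a, b):
        # Cluster distance: minimum distance between any two member points.
        return min(math.dist(p, q) for p in a for q in b)

    # Initialization: all O(n^2) pairwise distances go into a min-heap.
    n = len(points)
    heap = [(math.dist(points[i], points[j]), i, j)
            for i in range(n) for j in range(i + 1, n)]
    heapq.heapify(heap)  # linear in the number of entries, i.e. O(n^2)

    merges = []
    next_id = n
    while len(clusters) > 1:
        d, a, b = heapq.heappop(heap)
        if a not in clusters or b not in clusters:
            continue  # stale entry: one side was already merged away
        merged = clusters.pop(a) + clusters.pop(b)
        # Recompute distances from the new cluster to every surviving one.
        for other, members in clusters.items():
            heapq.heappush(heap, (single_link(merged, members), other, next_id))
        clusters[next_id] = merged
        merges.append((d, a, b, next_id))
        next_id += 1
    return merges
```

Calling agglomerative([(0, 0), (1, 0), (5, 0), (7, 0)]) returns three merges: the two left points first, then the two right points, then the two resulting clusters.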

Defining the closest pair of clusters

Single link/min distance

  • minimum distance between any two points, one from each cluster

Complete link/max distance

  • maximum distance between any two points, one from each cluster

Group average/mean distance

  • mean distance over all pairs of points, one from each cluster (not robust)

Distance between the means

  • distance between the means (centroids) of the points in each cluster
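
The cluster distances defined above can be written as short Python functions. This is a sketch under the assumption that a cluster is a list of coordinate tuples and that point distance is Euclidean. The function names are chosen for this example only.

```python
import math
from itertools import product


def single_link(a, b):
    # Minimum distance between any two points, one from each cluster.
    return min(math.dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    # Maximum distance between any two points, one from each cluster.
    return max(math.dist(p, q) for p, q in product(a, b))


def group_average(a, b):
    # Mean distance over all cross-cluster pairs of points.
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_distance(a, b):
    # Distance between the coordinate-wise means of the two clusters.
    def mean(cluster):
        return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))
    return math.dist(mean(a), mean(b))


a = [(0, 0), (2, 0)]
b = [(5, 0), (9, 0)]
print(single_link(a, b), complete_link(a, b),
      group_average(a, b), centroid_distance(a, b))
```

For these two clusters the single-link distance is 3, the complete-link distance is 9, and both the group average and the centroid distance are 6.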

Lance-Williams formula for cluster distance

  • consolidates the cluster distances above into a single update formula
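
For reference, the usual statement of the formula is given below in LaTeX. The symbols are assumptions introduced here, not taken from the notes: clusters i and j have just been merged, k is any other cluster, and n_i, n_j are cluster sizes. The centroid coefficients hold when d is the squared Euclidean distance.

```latex
% Distance from cluster k to the cluster formed by merging i and j
\[
d(k,\, i \cup j) = \alpha_i\, d(k, i) + \alpha_j\, d(k, j)
                 + \beta\, d(i, j) + \gamma\, \lvert d(k, i) - d(k, j) \rvert
\]

% Coefficient choices that recover each cluster distance
\[
\begin{array}{lcccc}
\text{method} & \alpha_i & \alpha_j & \beta & \gamma \\ \hline
\text{single link}   & 1/2 & 1/2 & 0 & -1/2 \\
\text{complete link} & 1/2 & 1/2 & 0 & +1/2 \\
\text{group average} & \frac{n_i}{n_i + n_j} & \frac{n_j}{n_i + n_j} & 0 & 0 \\
\text{centroid}      & \frac{n_i}{n_i + n_j} & \frac{n_j}{n_i + n_j}
                     & -\frac{n_i n_j}{(n_i + n_j)^2} & 0
\end{array}
\]
```

As a check on the single-link row: (a + b)/2 - |a - b|/2 equals min(a, b), so the update returns the smaller of d(k, i) and d(k, j).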

Density based clustering

  • Euclidean distance is not ideal for some clusterings
  • examples: non-convex, shape-based, or density-based clusters