lec3

Correlation

  • Falls between -1 and 1, just like \(\cos \theta\): the correlation is the cosine of the angle between the mean-centered data vectors
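The cosine analogy can be checked directly: Pearson's correlation equals \(\cos\theta\) for the angle \(\theta\) between the mean-centered vectors of \(x\) and \(y\). This is a minimal sketch in plain Python with no libraries. The helper name `pearson_r` and the sample data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation, computed as the cosine of the angle between
    the mean-centered data vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cx = [x - mx for x in xs]  # centered x
    cy = [y - my for y in ys]  # centered y
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return dot / (norm_x * norm_y)  # cos(theta) between cx and cy

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ≈ 1.0 (perfect positive linear relation)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # ≈ -1.0 (perfect negative linear relation)
```

Because the result is a cosine, the Cauchy-Schwarz inequality guarantees it stays in \([-1, 1]\).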

Covariance matrix

  • \(\Sigma = \begin{pmatrix} \sigma^2_x & \sigma_{xy} \\ \sigma_{xy} & \sigma^2_y \end{pmatrix}\)
  • \(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\)
  • \(\rho_{xx} = 1\)
  • \(\det(\Sigma) = \sigma^2_x\sigma^2_y - \sigma^2_{xy} = \sigma^2_x\sigma^2_y(1 - \rho^2_{xy}) \ge 0\)
  • Positive semi-definite matrix
  • Symmetric: \(\Sigma^T = \Sigma\)
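This is a minimal sketch of the 2 × 2 case in plain Python. It assumes the population convention, which divides by \(n\); the sample covariance divides by \(n - 1\) instead. The helper name `covariance_matrix_2d` and the data are hypothetical. The sketch checks symmetry, computes \(\rho_{xy}\), and confirms that \(\det(\Sigma) = \sigma^2_x\sigma^2_y(1 - \rho^2_{xy})\). That identity is why the determinant cannot be negative: \(|\rho_{xy}| \le 1\).

```python
import math

def covariance_matrix_2d(xs, ys):
    """Population covariance matrix [[var_x, cov_xy], [cov_xy, var_y]]
    (divides by n; use n - 1 for the sample estimate)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return [[var_x, cov_xy], [cov_xy, var_y]]

# Hypothetical paired observations.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
S = covariance_matrix_2d(xs, ys)

rho = S[0][1] / math.sqrt(S[0][0] * S[1][1])  # correlation coefficient
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]   # equals var_x * var_y * (1 - rho^2)

print(S)
print("symmetric:", S[0][1] == S[1][0])
print("rho:", rho)
print("det >= 0:", det >= 0)
```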

Covariance matrix in higher dimensions

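  • \(n\) points in \(\mathbb{R}^d\)
  • Stored as an \(n \times d\) data matrix (one row per point)
  • Mean vector \(\vec{\mu} = (\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_d})\)
  • Each point \(x \in \mathbb{R}^d\)
  • \(x^T \Sigma x \ge 0\) for every \(x \in \mathbb{R}^d\) (a \(d \times 1\) column vector), so \(\Sigma\) is a positive semi-definite matrix
  • Eigenvalues are all real (because \(\Sigma\) is symmetric) and non-negative (because it is positive semi-definite)
  • The data form an ellipse (an ellipsoid in higher dimensions) whose axes lie along the eigenvectors of \(\Sigma\); the spread along eigenvector \(i\) is \(\sqrt{\lambda_i}\)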
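The properties listed above can be checked numerically. This sketch uses plain Python: the data are stored as a list of \(n\) rows of length \(d\), and the population convention (divide by \(n\)) is used. It does three things:

- builds \(\Sigma\);
- confirms that \(x^T \Sigma x\) equals the variance of the data projected onto \(x\), which is why the quadratic form can never be negative;
- for the 2-D case only, uses the closed-form eigendecomposition of a symmetric 2 × 2 matrix to find the axes of the data ellipse.

All function names and the synthetic data are hypothetical.

```python
import math
import random

def covariance_matrix(X):
    """d x d population covariance matrix of an n x d data matrix
    (list of n rows, each of length d); divides by n."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]  # mean vector
    C = [[row[j] - mu[j] for j in range(d)] for row in X]  # centered rows
    return [[sum(C[k][i] * C[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]

def quadratic_form(v, S):
    """v^T S v for a d-vector v and a d x d matrix S."""
    d = len(v)
    return sum(v[i] * S[i][j] * v[j] for i in range(d) for j in range(d))

def projected_variance(X, v):
    """Variance (divide by n) of the scalar projections x_k . v."""
    p = [sum(a * b for a, b in zip(row, v)) for row in X]
    m = sum(p) / len(p)
    return sum((q - m) ** 2 for q in p) / len(p)

def eig_sym_2x2(S):
    """Eigenvalues (largest first) and the unit eigenvector of the largest
    one, for a symmetric 2 x 2 matrix [[a, b], [b, c]]."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    mid = (a + c) / 2
    r = math.hypot((a - c) / 2, b)  # real: the radicand is a sum of squares
    lam1, lam2 = mid + r, mid - r
    if b != 0:
        vx, vy = b, lam1 - a        # solves (S - lam1 I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam1, lam2, (vx / norm, vy / norm)

random.seed(0)
# Hypothetical correlated 2-D data: y depends linearly on x plus noise.
X = []
for _ in range(500):
    x = random.gauss(0, 1)
    X.append([x, 0.8 * x + 0.6 * random.gauss(0, 1)])

S = covariance_matrix(X)
lam1, lam2, major_axis = eig_sym_2x2(S)

# v^T S v equals the variance of the data projected onto v, so it is >= 0.
for _ in range(5):
    v = [random.uniform(-1, 1), random.uniform(-1, 1)]
    assert abs(quadratic_form(v, S) - projected_variance(X, v)) < 1e-9

print("eigenvalues:", lam1, lam2)  # both real and >= 0
print("major axis of the data ellipse:", major_axis)
```

For real work, a library routine for symmetric matrices such as NumPy's `numpy.linalg.eigh` would replace the hand-written 2 × 2 formula.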

Normal Distribution (1d)

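  • Central limit theorem: the normalized sum of many independent, identically distributed random variables (with finite variance) approaches the normal distribution
  • Unimodal and symmetric (mean = median = mode)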
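This is a quick simulation of the central limit theorem, offered as a sketch. Each sample is the standardized sum of 30 independent Uniform(0, 1) draws; the choices of 30 terms and 20,000 samples are arbitrary. The fraction of samples within one standard deviation of the mean is compared with the \(N(0, 1)\) value from Python's `statistics.NormalDist`.

```python
import random
from statistics import NormalDist, fmean, pstdev

def standardized_uniform_sum(m):
    """Sum of m i.i.d. Uniform(0, 1) draws, shifted and scaled to mean 0 and
    variance 1 (each term has mean 1/2 and variance 1/12)."""
    s = sum(random.random() for _ in range(m))
    return (s - m / 2) / (m / 12) ** 0.5

random.seed(1)
z = [standardized_uniform_sum(30) for _ in range(20000)]

within_one = sum(abs(v) <= 1 for v in z) / len(z)
target = NormalDist().cdf(1) - NormalDist().cdf(-1)  # about 0.6827 for N(0, 1)

print("sample mean:", fmean(z))
print("sample std:", pstdev(z))
print("fraction within 1 sd:", within_one, "normal:", target)
```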

Normal Distribution in higher dimensions

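  • \(\vec{\mu} = (\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_d})\)
  • \(\Sigma = \begin{pmatrix} \sigma^2_{x_1} & \sigma_{x_1x_2} & \cdots & \sigma_{x_1x_d} \\ \sigma_{x_1x_2} & \sigma^2_{x_2} & \cdots & \sigma_{x_2x_d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{x_1x_d} & \sigma_{x_2x_d} & \cdots & \sigma^2_{x_d} \end{pmatrix}\)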
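  • Squared Mahalanobis distance: \((x-\mu)^T \Sigma^{-1} (x-\mu)\). It measures how far \(x\) is from \(\mu\) relative to the spread and correlation of the data, and reduces to the squared Euclidean distance when \(\Sigma = I\)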
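This is a minimal 2-D sketch of the squared Mahalanobis distance, using the closed-form inverse of a 2 × 2 matrix. `mahalanobis_sq_2d` is a hypothetical helper, and the covariance values are made up for illustration. It makes two points:

- with \(\Sigma = I\), the formula reduces to the squared Euclidean distance;
- with correlated data, a point lying along the correlation is much "closer" to \(\mu\) than a point at the same Euclidean distance lying against it.

```python
import math

def mahalanobis_sq_2d(x, mu, S):
    """Squared Mahalanobis distance (x - mu)^T S^{-1} (x - mu) in 2-D,
    using the closed-form inverse of a 2 x 2 covariance matrix S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("S must be positive definite to be invertible")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # S^{-1}
    u = [x[0] - mu[0], x[1] - mu[1]]                  # x - mu
    return sum(u[i] * inv[i][j] * u[j] for i in range(2) for j in range(2))

mu = [0.0, 0.0]

# With S = I the result is the squared Euclidean distance.
print(mahalanobis_sq_2d([3.0, 4.0], mu, [[1.0, 0.0], [0.0, 1.0]]))  # 25.0

# Correlated data: two points at the same Euclidean distance from mu.
S = [[1.0, 0.8], [0.8, 1.0]]
along = mahalanobis_sq_2d([1.0, 1.0], mu, S)     # along the correlation
against = mahalanobis_sq_2d([1.0, -1.0], mu, S)  # against the correlation
print(along, against, math.sqrt(along), math.sqrt(against))
```

For this \(\Sigma\), the two squared distances are \(10/9\) and \(10\), up to rounding. Taking the square root gives the Mahalanobis distance itself.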

Categorical analysis

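  • Data are not numerical (each observation is a category label)
  • \(P(R) = \frac{n_R}{n}\), where \(n_R\) is the number of observations in category \(R\) and \(n\) is the total number of observations
  • A histogram (a bar chart of category counts) can be used to look at categorical data
  • Categories are unordered, so the shape of the histogram does not matter: the bars can be rearranged without changing the information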
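This is a small sketch of \(P(R) = n_R / n\) using `collections.Counter` on hypothetical colour labels, followed by a text bar chart. The bars are printed in alphabetical order only for convenience; any order carries the same information.

```python
from collections import Counter

# Hypothetical categorical observations (colour labels).
data = ["R", "G", "B", "R", "R", "G", "B", "R"]

counts = Counter(data)                          # n_c for each category c
n = len(data)
probs = {c: k / n for c, k in counts.items()}   # P(c) = n_c / n

# Text "histogram": the category order is arbitrary.
for c in sorted(counts):
    print(f"{c}: {'#' * counts[c]}  P = {probs[c]:.3f}")
```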