Correlation
- Falls between -1 and 1 just like \(\cos \theta\)
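A minimal NumPy sketch of this fact, using made-up sample values: the Pearson correlation of two samples equals the cosine of the angle between their mean-centered vectors.

```python
import numpy as np

# Hypothetical data: two samples of equal length.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Mean-center both samples.
xc = x - x.mean()
yc = y - y.mean()

# cos(theta) between the centered vectors ...
cos_theta = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# ... matches the Pearson correlation coefficient.
rho = np.corrcoef(x, y)[0, 1]
print(cos_theta, rho)  # both lie in [-1, 1] and agree
```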
Covariance matrix
- \(\Sigma = \begin{pmatrix} \sigma^2_x & \sigma_{xy} \\ \sigma_{xy} & \sigma^2_y \end{pmatrix}\)
- \(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\)
- \(\rho_{xx} = 1\)
- \(\det(\Sigma) \ge 0\)
- Positive semi-definite matrix (see the sketch after this list)
- Symmetric \(\Sigma^T = \Sigma\)
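A sketch of these properties with NumPy and made-up paired samples: it builds the \(2 \times 2\) covariance matrix, computes \(\rho_{xy}\), and checks the determinant and symmetry claims.

```python
import numpy as np

# Hypothetical paired samples (x_i, y_i).
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# 2x2 covariance matrix: [[var_x, cov_xy], [cov_xy, var_y]].
Sigma = np.cov(x, y)

# rho_xy = cov_xy / (sigma_x * sigma_y); rho_xx = 1 by construction.
rho_xy = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])

print(Sigma)
print(rho_xy)
print(np.linalg.det(Sigma) >= 0)    # determinant is non-negative
print(np.allclose(Sigma, Sigma.T))  # symmetric
```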
Covariance matrix in higher dimensions
- \(n\) points in \(\mathbb{R}^d\)
- Data stored as an \(n \times d\) matrix
- \(\vec{\mu} = (\mu_{x1}, \mu_{x2}, ...)\)
- \(x \in \mathbb{R}^d\)
- \(x^T \Sigma x \ge 0\) for any \(d \times 1\) vector \(x\), so \(\Sigma\) is a positive semi-definite matrix
- Eigenvalues are all non-negative, real
- Ellipse of the data becomes aligned with the eigenvector axes (sketched below)
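A sketch of the higher-dimensional case, assuming NumPy and a synthetic \(n \times d\) data matrix: it computes \(\vec{\mu}\) and \(\Sigma\), checks positive semi-definiteness against a random vector, and shows that the eigenvalues are real and non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: n points in R^d stored as an n-by-d matrix.
n, d = 500, 3
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # correlated columns

mu = X.mean(axis=0)              # mean vector (mu_x1, mu_x2, ...)
Sigma = np.cov(X, rowvar=False)  # d-by-d covariance matrix

# Positive semi-definite: v^T Sigma v >= 0 for any d-by-1 vector v.
v = rng.normal(size=d)
print(v @ Sigma @ v >= 0)

# Eigenvalues are real and non-negative; the eigenvectors give the
# axes of the data ellipsoid (its principal directions).
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)
```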
Normal Distribution (1d)
- Central limit theorem (a normalized sum of independent samples approaches the normal distribution; see the sketch below)
- Unimodal (mean = mode)
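A minimal sketch of the central limit theorem, assuming NumPy and uniform draws (any i.i.d. distribution with finite variance would do):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum k i.i.d. uniform(0, 1) draws, many times over.
k = 50
samples = rng.uniform(size=(10_000, k)).sum(axis=1)

# Standardize: uniform(0, 1) has mean 1/2 and variance 1/12.
z = (samples - k * 0.5) / np.sqrt(k / 12)

print(z.mean(), z.std())  # close to 0 and 1
# A histogram of z shows the familiar unimodal bell shape
# (mean = mode for a normal distribution).
```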
Normal Distribution in higher dimensions
- \(\vec{\mu} = (\mu_{x1}, \mu_{x2}, ...)\)
- \(\Sigma = \begin{pmatrix} \sigma^2_{x_1} & \sigma_{x_1 x_2} & \cdots & \sigma_{x_1 x_d} \\ \sigma_{x_2 x_1} & \sigma^2_{x_2} & & \vdots \\ \vdots & & \ddots & \\ \sigma_{x_d x_1} & \cdots & & \sigma^2_{x_d} \end{pmatrix}\)
- Squared Mahalanobis distance: \((x-\mu)^T \Sigma^{-1} (x-\mu)\) (computed in the sketch below)
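A short sketch of the squared Mahalanobis distance with a hypothetical 2-D mean vector and covariance matrix; `np.linalg.solve` is used instead of forming \(\Sigma^{-1}\) explicitly.

```python
import numpy as np

# Hypothetical 2-D example: mean vector and covariance matrix.
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = np.array([2.5, 1.0])

# Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu).
diff = x - mu
d2 = diff @ np.linalg.solve(Sigma, diff)  # solves Sigma z = diff
print(d2, np.sqrt(d2))
```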
Categorical analysis
- Data is not numerical
- \(P(R) = \frac{n_R}{n}\) for a category \(R\) observed \(n_R\) times out of \(n\)
- A histogram (bar chart) can be used to look at categorical data (see the sketch below)
- Categories are unordered, so the shape of the histogram carries no meaning
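A minimal sketch of the proportions \(P(R) = \frac{n_R}{n}\), using a hypothetical list of labels and `collections.Counter`:

```python
from collections import Counter

# Hypothetical categorical sample (unordered labels).
data = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(data)
n = len(data)

# P(R) = n_R / n for each category R; plotting these counts as bars
# gives the histogram view, but the bar order carries no meaning.
for category, n_r in counts.items():
    print(category, n_r / n)
```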