Correlation
-
Falls between -1 and 1 just like \(\cos \theta\)
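-
A minimal sketch of the \(\cos \theta\) comparison, assuming Pearson correlation: once the means are subtracted, the correlation is the cosine of the angle between the two centered data vectors. The helper name `correlation` and the sample values are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation, computed as the cosine between centered vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Center both vectors so the angle is measured around the means.
    cx = [x - mean_x for x in xs]
    cy = [y - mean_y for y in ys]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    return dot / (norm_x * norm_y)

xs = [1.0, 2.0, 3.0, 4.0]
r_up = correlation(xs, [2.0, 4.0, 6.0, 8.0])     # vectors point the same way
r_down = correlation(xs, [8.0, 6.0, 4.0, 2.0])   # vectors point opposite ways
r_mixed = correlation(xs, [1.0, 3.0, 2.0, 5.0])  # somewhere in between
```

The first two cases sit at the ends of the range (angle 0 and angle \(\pi\)); any other data lands strictly between them.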
Covariance matrix
-
\(\Sigma = \begin{pmatrix} \sigma^2_x & \sigma_{xy} \\ \sigma_{xy} & \sigma^2_y \end{pmatrix}\)
-
\(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\)
-
\(\rho_{xx} = 1\)
-
\(\det(\Sigma) \ge 0\)
-
Positive semi-definite matrix
-
Symmetric \(\Sigma^T = \Sigma\)
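-
The properties above can be checked on a small example. This is only a sketch: the data values are invented, and `covariance` is a hypothetical helper that uses the population form (divide by n), which is an assumption since the notes do not say which form is meant.

```python
import math

def covariance(xs, ys):
    """Population covariance: the mean product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 6.0]

var_x = covariance(xs, xs)    # sigma_x^2
var_y = covariance(ys, ys)    # sigma_y^2
cov_xy = covariance(xs, ys)   # sigma_xy

# The 2 by 2 covariance matrix, symmetric by construction.
sigma = [[var_x, cov_xy],
         [cov_xy, var_y]]

rho_xy = cov_xy / math.sqrt(var_x * var_y)   # correlation of x with y
rho_xx = var_x / math.sqrt(var_x * var_x)    # correlation of x with itself
det_sigma = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
```

Here \(\det(\Sigma) = \sigma^2_x\sigma^2_y(1 - \rho_{xy}^2)\), which is why it cannot be negative when \(\rho_{xy}\) lies between -1 and 1.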
Covariance matrix in higher dimensions
-
n points in \(\mathbb{R}^d\)
-
Data forms an n by d matrix (one row per point)
-
\(\vec{\mu} = (\mu_{x_1}, \mu_{x_2}, \dots, \mu_{x_d})\)
-
\(x \in \mathbb{R}^d\)
-
If \(x^T\Sigma x \ge 0\) for every d by 1 vector x, then \(\Sigma\) is a positive semi-definite matrix
-
Eigenvalues are all real and non-negative
-
Ellipse of the data is aligned with the eigenvector axes
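-
A rough sketch of the higher-dimensional case, using only the standard library. Everything named here is an assumption for illustration: the random data, the population (divide by n) covariance, and the helpers `mean_vector`, `covariance_matrix`, `quadratic_form` and `dominant_eigen`. Power iteration is swapped in as a simple way to approximate the largest eigenvalue; it is not the only way to get eigenvectors.

```python
import math
import random

def mean_vector(data):
    """Column means of an n by d data matrix (a list of rows)."""
    n = len(data)
    d = len(data[0])
    return [sum(row[j] for row in data) / n for j in range(d)]

def covariance_matrix(data):
    """d by d population covariance matrix of an n by d data matrix."""
    n = len(data)
    mu = mean_vector(data)
    d = len(mu)
    centered = [[row[j] - mu[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]

def quadratic_form(sigma, x):
    """Compute x^T Sigma x for a vector x of length d."""
    d = len(x)
    return sum(x[i] * sigma[i][j] * x[j] for i in range(d) for j in range(d))

def dominant_eigen(sigma, steps=500):
    """Power iteration: approximate the largest eigenvalue and its eigenvector."""
    d = len(sigma)
    v = [1.0] * d
    for _ in range(steps):
        w = [sum(sigma[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    # Rayleigh quotient; v has unit length, so no division is needed.
    return quadratic_form(sigma, v), v

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# n = 200 made-up points in R^3, spread out most along the second coordinate.
data = [[rng.gauss(0, 1), rng.gauss(0, 2), rng.gauss(0, 0.5)]
        for _ in range(200)]

mu = mean_vector(data)
sigma = covariance_matrix(data)

# x^T Sigma x should not be negative, whatever x we try.
forms = [quadratic_form(sigma, [rng.uniform(-1, 1) for _ in range(3)])
         for _ in range(1000)]
min_form = min(forms)

# The dominant eigenvector points along the long axis of the data ellipse.
top_value, top_vector = dominant_eigen(sigma)
```

Because the data was generated with the largest spread in the second coordinate, `top_vector` should point mostly along that axis, and `top_value` approximates the variance in that direction.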
Normal Distribution (1d)
-
Central limit theorem (the normalized sum of many independent draws will fit the normal distribution)
-
Unimodal (mean = mode)
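-
A quick simulation of the central limit theorem, as a sketch only. The choice of uniform(0, 1) draws, the count of 30 draws per sum, and the helper `normalized_sum` are all assumptions made for the example.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

def normalized_sum(k):
    """Sum k uniform(0, 1) draws, then subtract the mean and divide by the std."""
    total = sum(rng.random() for _ in range(k))
    mean = k * 0.5                 # each uniform(0, 1) draw has mean 1/2
    std = math.sqrt(k / 12.0)      # each uniform(0, 1) draw has variance 1/12
    return (total - mean) / std

samples = [normalized_sum(30) for _ in range(5000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
# A standard normal has about 68% of its mass within one standard deviation.
within_one = sum(1 for s in samples if abs(s) <= 1.0) / len(samples)
```

Even though a single uniform draw is flat rather than bell-shaped, the normalized sums should come out with mean near 0, variance near 1, and roughly 68% of values within one standard deviation.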
Normal Distribution in higher dimensions
-
\(\vec{\mu} = (\mu_{x_1}, \mu_{x_2}, \dots, \mu_{x_d})\)
-
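\(\Sigma = \begin{pmatrix} \sigma^2_{x_1} & \sigma_{x_1x_2} & \cdots & \sigma_{x_1x_d} \\ \sigma_{x_1x_2} & \sigma^2_{x_2} & \cdots & \sigma_{x_2x_d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{x_1x_d} & \sigma_{x_2x_d} & \cdots & \sigma^2_{x_d} \end{pmatrix}\)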
-
Squared Mahalanobis distance: \((x-\mu)^T \Sigma^{-1} (x-\mu)\)
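-
A small sketch of the quadratic form \((x-\mu)^T \Sigma^{-1} (x-\mu)\), restricted to two dimensions so the inverse can be written out by hand. The helper `mahalanobis_squared`, the mean, and the covariance values are invented for the example.

```python
def mahalanobis_squared(x, mu, sigma):
    """(x - mu)^T Sigma^{-1} (x - mu) for a 2 by 2 covariance matrix."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det == 0:
        raise ValueError("Sigma is singular, so it has no inverse")
    # Closed-form inverse of a 2 by 2 matrix.
    inv = [[d / det, -b / det],
           [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))

mu = [0.0, 0.0]
sigma = [[4.0, 0.0],
         [0.0, 1.0]]   # standard deviation 2 along x_1, 1 along x_2

# Both points are Euclidean distance 2 from the mean ...
d_wide = mahalanobis_squared([2.0, 0.0], mu, sigma)
d_narrow = mahalanobis_squared([0.0, 2.0], mu, sigma)
# ... but the second is further out relative to the spread of the data.

# With the identity as Sigma, the result is the squared Euclidean distance.
identity_case = mahalanobis_squared([3.0, 4.0], mu, [[1.0, 0.0], [0.0, 1.0]])
```

The point along the wide axis scores 1 and the point along the narrow axis scores 4, so the measure rescales distance by how much the data varies in each direction.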
Categorical analysis
-
Data is not numerical
-
\(P(R) = \frac{n_R}{n}\), where \(n_R\) is the number of observations in category R
-
Histogram can be used to look at categorical data
-
Categories are unordered, so the shape of the histogram does not matter
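-
A minimal sketch of the relative-frequency estimate \(\frac{n_R}{n}\) for categorical data. The list of colors is hypothetical data, and the text histogram is just one crude way to look at the counts.

```python
from collections import Counter

# Hypothetical non-numerical observations.
colors = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

counts = Counter(colors)   # n_R for each category R
n = len(colors)
probabilities = {category: count / n for category, count in counts.items()}

# A crude text histogram; the order of the bars carries no meaning.
for category, count in counts.items():
    print(f"{category:<6}{'#' * count}")
```

Reordering the categories changes how the bars look but not any of the estimated probabilities.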