Correlation
- Falls between -1 and 1 just like \(\cos \theta\)
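A minimal NumPy sketch of this fact, using made-up sample values: the Pearson correlation of two samples equals the cosine of the angle between their mean-centered vectors.

```python
import numpy as np

# Hypothetical data: two samples of equal length.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Mean-center both samples.
xc = x - x.mean()
yc = y - y.mean()

# cos(theta) between the centered vectors ...
cos_theta = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# ... matches the Pearson correlation coefficient.
rho = np.corrcoef(x, y)[0, 1]
print(cos_theta, rho)  # both lie in [-1, 1] and agree
```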
Covariance matrix
- \(\Sigma = \begin{pmatrix} \sigma^2_x & \sigma_{xy} \\ \sigma_{xy} & \sigma^2_y \end{pmatrix}\)
- \(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\)
- \(\rho_{xx} = 1\)
- \(\det(\Sigma) \ge 0\)
- Positive semi-definite matrix (see the sketch after this list)
- Symmetric \(\Sigma^T = \Sigma\)
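A sketch of these properties with NumPy and made-up paired samples: it builds the \(2 \times 2\) covariance matrix, computes \(\rho_{xy}\), and checks the determinant and symmetry claims.

```python
import numpy as np

# Hypothetical paired samples (x_i, y_i).
x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

# 2x2 covariance matrix: [[var_x, cov_xy], [cov_xy, var_y]].
Sigma = np.cov(x, y)

# rho_xy = cov_xy / (sigma_x * sigma_y); rho_xx = 1 by construction.
rho_xy = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])

print(Sigma)
print(rho_xy)
print(np.linalg.det(Sigma) >= 0)    # determinant is non-negative
print(np.allclose(Sigma, Sigma.T))  # symmetric
```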
Covariance matrix in higher dimensions
- \(n\) points in \(\mathbb{R}^d\)
- Data stored as an \(n \times d\) matrix
- \(\vec{\mu} = (\mu_{x1}, \mu_{x2}, ...)\)
- \(x \in \mathbb{R}^d\)
- \(x^T \Sigma x \ge 0\) for any \(d \times 1\) vector \(x\), so \(\Sigma\) is a positive semi-definite matrix
- Eigenvalues are all non-negative, real
- Ellipse of the data becomes aligned with the eigenvector axes (sketched below)
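A sketch of the higher-dimensional case, assuming NumPy and a synthetic \(n \times d\) data matrix: it computes \(\vec{\mu}\) and \(\Sigma\), checks positive semi-definiteness against a random vector, and shows that the eigenvalues are real and non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: n points in R^d stored as an n-by-d matrix.
n, d = 500, 3
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # correlated columns

mu = X.mean(axis=0)              # mean vector (mu_x1, mu_x2, ...)
Sigma = np.cov(X, rowvar=False)  # d-by-d covariance matrix

# Positive semi-definite: v^T Sigma v >= 0 for any d-by-1 vector v.
v = rng.normal(size=d)
print(v @ Sigma @ v >= 0)

# Eigenvalues are real and non-negative; the eigenvectors give the
# axes of the data ellipsoid (its principal directions).
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)
```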
Normal Distribution (1d)
- Central limit theorem (a normalized sum of independent samples approaches the normal distribution; see the sketch below)
- Unimodal (mean = mode)
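A minimal sketch of the central limit theorem, assuming NumPy and uniform draws (any i.i.d. distribution with finite variance would do):

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum k i.i.d. uniform(0, 1) draws, many times over.
k = 50
samples = rng.uniform(size=(10_000, k)).sum(axis=1)

# Standardize: uniform(0, 1) has mean 1/2 and variance 1/12.
z = (samples - k * 0.5) / np.sqrt(k / 12)

print(z.mean(), z.std())  # close to 0 and 1
# A histogram of z shows the familiar unimodal bell shape
# (mean = mode for a normal distribution).
```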
Normal Distribution in higher dimensions
- \(\vec{\mu} = (\mu_{x1}, \mu_{x2}, ...)\)
- \(\Sigma = \begin{pmatrix} \sigma^2_{x_1} & \sigma_{x_1 x_2} & \cdots & \sigma_{x_1 x_d} \\ \sigma_{x_2 x_1} & \sigma^2_{x_2} & & \vdots \\ \vdots & & \ddots & \\ \sigma_{x_d x_1} & \cdots & & \sigma^2_{x_d} \end{pmatrix}\)
- Squared Mahalanobis distance: \((x-\mu)^T \Sigma^{-1} (x-\mu)\) (computed in the sketch below)
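A short sketch of the squared Mahalanobis distance with a hypothetical 2-D mean vector and covariance matrix; `np.linalg.solve` is used instead of forming \(\Sigma^{-1}\) explicitly.

```python
import numpy as np

# Hypothetical 2-D example: mean vector and covariance matrix.
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

x = np.array([2.5, 1.0])

# Squared Mahalanobis distance: (x - mu)^T Sigma^{-1} (x - mu).
diff = x - mu
d2 = diff @ np.linalg.solve(Sigma, diff)  # solves Sigma z = diff
print(d2, np.sqrt(d2))
```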
Categorical analysis
- Data is not numerical
- \(P(R) = \frac{n_R}{n}\) for a category \(R\) observed \(n_R\) times out of \(n\)
- A histogram (bar chart) can be used to look at categorical data (see the sketch below)
- Categories are unordered, so the shape of the histogram carries no meaning
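A minimal sketch of the proportions \(P(R) = \frac{n_R}{n}\), using a hypothetical list of labels and `collections.Counter`:

```python
from collections import Counter

# Hypothetical categorical sample (unordered labels).
data = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(data)
n = len(data)

# P(R) = n_R / n for each category R; plotting these counts as bars
# gives the histogram view, but the bar order carries no meaning.
for category, n_r in counts.items():
    print(category, n_r / n)
```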