Data matrix

Data can be represented by an \(n \times d\) matrix

print('hi world')

Attributes

Numerica Attributes

Categorical Attributes

Distance and Angle

Distance

\(||a-b||\)

Angle

\(\cos \theta = \frac{a^Tb}{||a||}\)

Orthoganality

If dot product is 0 then \(\theta\) is \(90^\circ\)

Orthogonal projection

The projection of b on a is:

\(p = b_{||} = c_a = (\frac{a^Tb}{a^Ta})a\)

Mean and total Variance

D is a matrix

Mean

\(mean(D) = \mu = \(\frac{1}{n}\sum^n_{i=1}x_i\)

Total variance

\(var(D) = \frac{1}{n}\sum\limits^n_{i=1}\delta(x_i, \mu)^2 = \frac{1}{n} \sum\limits^n_{i=1}||x_i - \mu||^2\)
\(\frac{1}{n}(\sum\limits^n_{i=1}||x_i||^2) - ||\mu||^2\)

Centered Data matrix

Centered data matrix is created by subtracting the mean from every value in the vector

Linear independence and Dimensionality

Row and Column Space

column space is the set of linear combinations in the rows
Row space is the set of linear combinations in the rows, column space of \(D^T\)

Linear Independence

linear dependent vectors can be written as a linear combination of other vectors in the matrix.

Dimension and Rank

Dimensionality = number of columns
Rank is the number of linearly independent columns or rows which is determined by converted the matrix to echelon form and counting the non-zero columns or rows. This is also reffered to as the dimensionality of the column space.

Probabilistic View

Each numeric attribute \(X\) is a random variable: a function that assigns a real number to each outcome of an experiment
X is a function, \(X:\mathcal{O} \rightarrow \mathbb{R}\)
\(\mathcal{O} \)is the domain of \(X\)k

Probability mass function

If \(X\) is discrete:
\(f(x) = P(X=x) \text{ for all } x \in \mathbb{R}\)
probablility that \(X = x\)

Probability density function

if \(X\) is continuous, then \(P(X)=x) = 0\) for all \(x \in \mathbb{R}\)
probability is so spread over range that probablility can only be measured over intervals \([a,b]\subset\mathbb{R}\) rather than at specific points.
\(P(X\in[a,b]) = \displaystyle\int\limits^b_af(x)dx\)
\(f(x) \ge 0\) for all x \(\in \mathbb{R}\)
\(\displaystyle\int\limits^\infty_{-\infty}f(x)dx = 1\)
ratio of the probability mass to the width of the interval (width given in \(\epsilon\))

Cumulative Distribution Function

CDF \(F:\mathbb{R} \rightarrow [0,1]\)
\(F(x) = P(X\le x)\) for all \(-\infty < x < \infty\)
Discrete CDF: \(F(x) = P(X \le x) = \displaystyle{\sum\limits_{u\le x}}f(u)\)
Continuous CDF: \(F(x) = P(X \le x) = \displaystyle\int\limits^x_{-\infty}f(u)du\)

Bivariate Random Variables

Analyze two attributes together as a bivariate random variable
\(\boldsymbol{X} = \begin{pmatrix}X_1 \\ X_2\end{pmatrix}\)
\(X:\mathcal{O}\rightarrow \mathbb{R}^2\)
Assigns each outcome a pair of real numbers, a 2 dimensional vector \(\begin{pmatrix}x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2\)

Joint Probability Density Function

\(\displaystyle{\sum\limits_x}f(x) = \displaystyle{\sum\limits_{x_1}}\displaystyle{\sum\limits_{x_2}} f(x_1, x_2) = 1\)

Joint Probability Density Function

\(P(\boldsymbol{x} \in W) = \displaystyle\int\displaystyle\int\limits_{x \in W}f(x)dx = \displaystyle\int\displaystyle\int\limits_{(x_1,x_2)^T\in W}f(x_1,x_2)dx_1dx_2\)

chapter1

Data matrix

Attributes

Numerica Attributes

Categorical Attributes

Distance and Angle

Distance

Angle

Orthoganality

Orthogonal projection

Mean and total Variance

Mean

Total variance

Centered Data matrix

Linear independence and Dimensionality

Row and Column Space

Linear Independence

Dimension and Rank

Probabilistic View

Probability mass function

Probability density function

Cumulative Distribution Function

Bivariate Random Variables

Joint Probability Density Function

Joint Probability Density Function