Chapter 1

Data matrix

Data can be represented by an \(n \times d\) matrix \(D\), with one row per data point and one column per attribute.

Attributes

Numeric Attributes

Categorical Attributes

Distance and Angle

Distance

  • \(\delta(a, b) = ||a-b||\)

Angle

  • \(\cos \theta = \frac{a^Tb}{||a|| \, ||b||}\)

Orthogonality

  • If the dot product \(a^Tb\) is 0, then \(\cos \theta = 0\) and \(\theta\) is \(90^\circ\): the vectors are orthogonal
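
A minimal sketch of the distance, angle, and orthogonality checks in plain Python. The helper names (`dot`, `norm`, `distance`, `cos_angle`) and the sample vectors are illustrative assumptions, not part of the notes. The cosine is normalised by the norms of both vectors.

```python
import math

def dot(a, b):
    """Dot product a^T b of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean norm ||a||."""
    return math.sqrt(dot(a, a))

def distance(a, b):
    """Euclidean distance ||a - b||."""
    return norm([x - y for x, y in zip(a, b)])

def cos_angle(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (norm(a) * norm(b))

# Sample vectors chosen only for illustration.
a = [3.0, 4.0]
b = [4.0, 3.0]
c = [-4.0, 3.0]

dist_ab = distance(a, b)
cos_ab = cos_angle(a, b)
a_orthogonal_c = dot(a, c) == 0  # zero dot product means a 90 degree angle
```

Exact comparison with 0 works here because the sample values are small integers; with general floating-point data a tolerance such as `abs(dot(a, c)) < 1e-9` is usually safer.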

Orthogonal projection

The projection of b on a is:

  • \(p = b_{||} = ca = (\frac{a^Tb}{a^Ta})a\), where the scalar \(c = \frac{a^Tb}{a^Ta}\)
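
The projection can be sketched in a few lines of Python. The function name `project` and the sample vectors are assumptions made for illustration. The residual `b - p` should be orthogonal to `a`, which the last line checks.

```python
def dot(a, b):
    """Dot product a^T b of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def project(b, a):
    """Orthogonal projection of b onto the non-zero vector a."""
    c = dot(a, b) / dot(a, a)  # scalar multiple of a
    return [c * x for x in a]

# Sample vectors chosen only for illustration.
a = [2.0, 0.0]
b = [3.0, 4.0]

p = project(b, a)
residual = [bi - pi for bi, pi in zip(b, p)]
residual_dot_a = dot(residual, a)  # zero when the residual is orthogonal to a
```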

Mean and total Variance

\(D\) is the \(n \times d\) data matrix whose rows are the points \(x_1, \dots, x_n\).

Mean

\(mean(D) = \mu = \frac{1}{n}\sum^n_{i=1}x_i\)

Total variance

  • \(var(D) = \frac{1}{n}\sum\limits^n_{i=1}\delta(x_i, \mu)^2 = \frac{1}{n} \sum\limits^n_{i=1}||x_i - \mu||^2\)
  • Equivalently, \(var(D) = \frac{1}{n}(\sum\limits^n_{i=1}||x_i||^2) - ||\mu||^2\)
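
As a rough check that the two expressions for the total variance agree, the sketch below computes both on a tiny data matrix. The function names and the sample matrix `D` are assumptions for illustration only; rows are points and columns are attributes.

```python
def mean(D):
    """Mean vector of the rows of D."""
    n = len(D)
    d = len(D[0])
    return [sum(row[j] for row in D) / n for j in range(d)]

def total_variance(D):
    """Average squared distance of each point from the mean."""
    mu = mean(D)
    n = len(D)
    return sum(sum((x - m) ** 2 for x, m in zip(row, mu)) for row in D) / n

def total_variance_alt(D):
    """Average squared norm of the points minus the squared norm of the mean."""
    mu = mean(D)
    n = len(D)
    mean_sq_norm = sum(sum(x * x for x in row) for row in D) / n
    return mean_sq_norm - sum(m * m for m in mu)

# Sample data: 3 points in 2 dimensions, chosen only for illustration.
D = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

mu = mean(D)
var_direct = total_variance(D)
var_alt = total_variance_alt(D)
```

The second form needs only one pass over the data, but it subtracts two potentially large numbers, so the direct form tends to be numerically safer.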

Centered Data matrix

  • The centered data matrix is created by subtracting the mean \(\mu\) from every point \(x_i\), that is, from every row of \(D\); the centered data then has mean 0
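
A minimal sketch of centering, assuming rows are points and columns are attributes. The names `center` and `Z` and the sample matrix are illustrative assumptions.

```python
def mean(D):
    """Mean vector of the rows of D."""
    n = len(D)
    return [sum(row[j] for row in D) / n for j in range(len(D[0]))]

def center(D):
    """Subtract the mean vector from every row of D."""
    mu = mean(D)
    return [[x - m for x, m in zip(row, mu)] for row in D]

# Sample data chosen only for illustration.
D = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

Z = center(D)
mean_of_Z = mean(Z)  # should be the zero vector
```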

Linear independence and Dimensionality

Row and Column Space

  • Column space is the set of all linear combinations of the columns of \(D\)
  • Row space is the set of all linear combinations of the rows of \(D\), which is the column space of \(D^T\)

Linear Independence

A set of vectors is linearly dependent if at least one of them can be written as a linear combination of the others; otherwise the vectors are linearly independent.

Dimension and Rank

  • Dimensionality = number of columns \(d\)
  • Rank is the number of linearly independent columns (equivalently, rows). It can be determined by converting the matrix to echelon form and counting the non-zero rows. This is also referred to as the dimensionality of the column space.
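
The echelon-form procedure can be sketched with plain Gaussian elimination. This is one possible implementation, not the only way to compute rank; the function name `rank` and the sample matrix are assumptions. `fractions.Fraction` is used so that the zero tests are exact for integer or rational input.

```python
from fractions import Fraction

def rank(matrix):
    """Rank via row reduction to echelon form, counting the pivot rows."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a non-zero entry in this column.
        pivot = next(
            (r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Eliminate the entries below the pivot.
        for r in range(pivot_row + 1, n_rows):
            factor = rows[r][col] / rows[pivot_row][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return pivot_row  # number of non-zero rows in the echelon form

# The second row is twice the first, so only two rows are independent.
A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
rank_A = rank(A)
```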

Probabilistic View

  • Each numeric attribute \(X\) is a random variable: a function that assigns a real number to each outcome of an experiment
  • X is a function, \(X:\mathcal{O} \rightarrow \mathbb{R}\)
  • \(\mathcal{O}\) is the domain of \(X\): the set of all possible outcomes of the experiment

Probability mass function

  • If \(X\) is discrete:
  • \(f(x) = P(X=x) \text{ for all } x \in \mathbb{R}\)
  • \(f(x)\) is the probability that \(X = x\)
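
To make the random variable and its mass function concrete, the sketch below uses a hypothetical experiment that is not in the notes: rolling two fair dice, with \(X\) defined as the sum of the faces. Every outcome is assumed equally likely.

```python
from fractions import Fraction
from itertools import product

# Outcome space O: all ordered pairs of faces of two fair dice (an assumption).
outcomes = list(product(range(1, 7), repeat=2))

def X(outcome):
    """Random variable: maps an outcome to a real number (the sum of the faces)."""
    return sum(outcome)

def pmf(x):
    """f(x) = P(X = x), assuming all outcomes are equally likely."""
    favourable = sum(1 for o in outcomes if X(o) == x)
    return Fraction(favourable, len(outcomes))

p_seven = pmf(7)
p_one = pmf(1)  # impossible value, so the mass is 0
total_mass = sum(pmf(x) for x in range(2, 13))
```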

Probability density function

  • If \(X\) is continuous, then \(P(X=x) = 0\) for all \(x \in \mathbb{R}\)
  • Probability is spread so thinly over the range that it can only be measured over intervals \([a,b]\subset\mathbb{R}\) rather than at specific points.
  • \(P(X\in[a,b]) = \displaystyle\int\limits^b_af(x)dx\)
  • \(f(x) \ge 0\) for all \(x \in \mathbb{R}\)
  • \(\displaystyle\int\limits^\infty_{-\infty}f(x)dx = 1\)
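  • The density \(f(x)\) is the ratio of the probability mass to the width of a small interval around \(x\): \(f(x) \approx \frac{P(X \in [x-\epsilon, x+\epsilon])}{2\epsilon}\) for small \(\epsilon > 0\)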
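
A numerical sketch of these properties, assuming a standard normal density as the example (the notes do not fix a particular \(f\)). The integral is approximated with a simple midpoint rule; `prob_interval` and the step count are illustrative choices.

```python
import math

def f(x):
    """Standard normal density, used here only as an example of a PDF."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(pdf, a, b, steps=10_000):
    """Approximate P(X in [a, b]) with the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# Probability mass on [-1, 1].
p_within_one = prob_interval(f, -1.0, 1.0)

# Mass of a small interval around x divided by its width 2*eps approximates f(x).
eps = 1e-3
density_ratio = prob_interval(f, 0.5 - eps, 0.5 + eps) / (2 * eps)
```

Shrinking `eps` moves `density_ratio` closer to `f(0.5)`, which is the sense in which the density is a ratio of mass to width.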

Cumulative Distribution Function

  • CDF \(F:\mathbb{R} \rightarrow [0,1]\)
  • \(F(x) = P(X\le x)\) for all \(-\infty < x < \infty\)
  • Discrete CDF: \(F(x) = P(X \le x) = \displaystyle{\sum\limits_{u\le x}}f(u)\)
  • Continuous CDF: \(F(x) = P(X \le x) = \displaystyle\int\limits^x_{-\infty}f(u)du\)
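
Both forms of the CDF can be sketched side by side. The examples are assumptions chosen for illustration: a fair six-sided die for the discrete case and an exponential density with rate 1 for the continuous case, integrated with a midpoint rule.

```python
import math
from fractions import Fraction

def die_pmf(u):
    """PMF of a fair six-sided die (example assumption)."""
    return Fraction(1, 6) if u in range(1, 7) else Fraction(0)

def discrete_cdf(x):
    """F(x) = sum of f(u) over all support values u <= x."""
    return sum(die_pmf(u) for u in range(1, 7) if u <= x)

def exp_pdf(u):
    """Exponential density with rate 1 (example assumption)."""
    return math.exp(-u) if u >= 0 else 0.0

def continuous_cdf(x, steps=10_000):
    """F(x) = integral of f(u) du up to x; the density is 0 below 0."""
    if x <= 0:
        return 0.0
    width = x / steps
    return sum(exp_pdf((i + 0.5) * width) for i in range(steps)) * width

F_die_3 = discrete_cdf(3)
F_die_3_5 = discrete_cdf(3.5)  # the CDF is flat between support points
F_exp_2 = continuous_cdf(2.0)
```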

Bivariate Random Variables

  • Analyze two attributes together as a bivariate random variable
  • \(\boldsymbol{X} = \begin{pmatrix}X_1 \\ X_2\end{pmatrix}\)
  • \(X:\mathcal{O}\rightarrow \mathbb{R}^2\)
  • Assigns each outcome a pair of real numbers, a 2 dimensional vector \(\begin{pmatrix}x_1 \\ x_2 \end{pmatrix} \in \mathbb{R}^2\)

Joint Probability Mass Function

  • \(\displaystyle{\sum\limits_x}f(x) = \displaystyle{\sum\limits_{x_1}}\displaystyle{\sum\limits_{x_2}} f(x_1, x_2) = 1\)
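
The double sum can be checked on a small discrete example. The experiment here is a hypothetical one: two fair coin tosses, with \(X_1\) indicating heads on the first toss and \(X_2\) counting the total number of heads. All names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Outcome space: two tosses of a fair coin (an assumption for illustration).
outcomes = list(product("HT", repeat=2))

def X(outcome):
    """Bivariate random variable mapping an outcome to a pair (x1, x2)."""
    x1 = 1 if outcome[0] == "H" else 0  # heads on the first toss
    x2 = outcome.count("H")             # total number of heads
    return (x1, x2)

# Joint mass f(x1, x2): fraction of equally likely outcomes mapped to (x1, x2).
counts = Counter(X(o) for o in outcomes)
joint_pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

total_mass = sum(joint_pmf.values())  # sums over every (x1, x2) with non-zero mass
```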

Joint Probability Density Function

  • \(P(\boldsymbol{X} \in W) = \displaystyle\int\displaystyle\int\limits_{x \in W}f(x)dx = \displaystyle\int\displaystyle\int\limits_{(x_1,x_2)^T\in W}f(x_1,x_2)dx_1dx_2\), where \(W \subset \mathbb{R}^2\)
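
A rough numerical sketch of the double integral over a region \(W\). The density (uniform on the unit square) and the region (the part of the unit disk inside that square) are assumptions picked because the exact answer, \(\pi/4\), is known. The grid-based midpoint rule is a simple approximation, not an efficient integrator.

```python
import math

def joint_pdf(x1, x2):
    """Uniform density on the unit square (example assumption)."""
    inside = 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0
    return 1.0 if inside else 0.0

def in_W(x1, x2):
    """Region W: points within distance 1 of the origin."""
    return x1 * x1 + x2 * x2 <= 1.0

def prob_region(pdf, region, lo=0.0, hi=1.0, steps=400):
    """Approximate the integral of pdf over region on the box [lo, hi]^2."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x1 = lo + (i + 0.5) * h
        for j in range(steps):
            x2 = lo + (j + 0.5) * h
            if region(x1, x2):
                total += pdf(x1, x2)
    return total * h * h  # each grid cell has area h * h

p_W = prob_region(joint_pdf, in_W)
```

The bounding box `[lo, hi]^2` must cover the part of \(W\) where the density is non-zero; here that is the unit square.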
