lec12

Classification

  • supervised learning
  • training pairs \((\vec{x}_i, y_i)\)
  • \(\vec{x}_i \in \mathbb R^d\)
  • \(y_i \in \{1, 2, \dots, k\}\)
  • model \(m: \vec x \rightarrow \hat y\)
  • build a model that classifies
  • testing data vs. training data (build a model from the training data that can classify the test data)
  • split the original data into training and test sets
  • compare the model's predictions with the test data labels to measure accuracy (see the sketch after this list)
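
A minimal sketch of this split-and-evaluate workflow. The use of scikit-learn, the iris dataset, the decision-tree model, and the 80/20 split are all illustrative assumptions, not part of the lecture.

  # Sketch: split data, train on one part, measure accuracy on the other.
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier
  from sklearn.metrics import accuracy_score

  X, y = load_iris(return_X_y=True)   # stand-in for any (x_i, y_i) pairs

  # Split the original data into training and test sets (80/20 is illustrative).
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

  model = DecisionTreeClassifier().fit(X_train, y_train)  # build model from training data
  y_hat = model.predict(X_test)                           # classify the test points

  # Compare the model's predictions with the test data labels.
  print("test accuracy:", accuracy_score(y_test, y_hat))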

Classes of models

  • more complex models can capture more trends
  • \(f(\vec{x}) \rightarrow \hat y\)
  • f is linear (hyperplane)
  • some trends are not linearly classifiable
  • too much complexity leads to overfitting
  • high accuracy on the training set, but can't accurately describe the test set
  • high complexity models overfit
  • low complexity models underfit
  • goal is to find the middle ground (see the sketch below)
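
One hedged way to see this trade-off empirically: sweep a complexity knob and compare training vs. test accuracy. The synthetic dataset, the use of tree depth as the knob, and the specific depths tried are illustrative assumptions.

  # Sketch: model complexity vs. train/test accuracy (assumes scikit-learn).
  from sklearn.datasets import make_classification
  from sklearn.model_selection import train_test_split
  from sklearn.tree import DecisionTreeClassifier

  X, y = make_classification(n_samples=500, n_features=10, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  for depth in (1, 3, 5, 10, None):   # None = grow until leaves are pure (most complex)
      m = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
      print(depth, m.score(X_train, y_train), m.score(X_test, y_test))
  # Typical pattern: training accuracy keeps climbing with depth (overfitting),
  # while test accuracy peaks at some middle ground (underfitting at low depth).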

Example

  • classes = risk levels (high, medium, low)
  • dimensions = (age, car type, location, etc.)
  • goal is to classify each data point according to risk
  • ultimately a partition of the data into classes (labels) in reduced dimensions (discriminative)

Decision trees

Example

  age | car type | risk
  ----+----------+-----
  25  | Sports   | L
  20  | SUV      | H
  • decisions can be numeric or categorical
  • example of a decision: Age \(\le\) 25
  • each decision is binary (yes or no)
  • leaf nodes represent classes
  • goal is to build a tree that represents the model (a small sketch follows this list)
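
A minimal sketch of how such a tree could be represented and evaluated in code. The nested-dict layout, attribute names, and lambda tests are assumptions for illustration, not the lecture's notation.

  # Sketch: a tiny decision tree over the example table (node layout is an assumption).
  tree = {
      "test": lambda row: row["age"] <= 25,            # numeric binary decision: Age <= 25
      "yes": {
          "test": lambda row: row["car"] == "Sports",  # categorical binary decision
          "yes": "L",                                  # leaf nodes hold class labels
          "no": "H",
      },
      "no": "L",
  }

  def classify(node, row):
      # Walk the tree, taking the yes/no branch at each decision, until a leaf.
      while isinstance(node, dict):
          node = node["yes"] if node["test"](row) else node["no"]
      return node

  print(classify(tree, {"age": 25, "car": "Sports"}))  # -> "L"
  print(classify(tree, {"age": 20, "car": "SUV"}))     # -> "H"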

Construction

  • if the stopping criterion is met, choose the majority class in D to make a leaf
  • otherwise, for each attribute \(A_i\), compute the discrimination score for \(A_i\)
  • (numerical) for each value \(v_j\), evaluate the split \(A \le v_j\)
  • (categorical) for each value \(v_j\), evaluate the split \(A \in v_j\)
  • partition the data into the tree accordingly
  • cutoff = purity = \(\frac{n_{\text{in tree}}}{n_{\text{out tree}}}\)
  • calculate the score \(f(D, D_{yes}, D_{no})\)
  • entropy measures the amount of disorder (high entropy = high disorder); a construction sketch follows this list
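
A hedged sketch of the construction loop above, for numeric attributes only. The depth-based stopping criterion and the size-weighted purity used as the discrimination score are stand-in assumptions for the course's criterion and score \(f(D, D_{yes}, D_{no})\).

  # Sketch: recursive decision-tree construction (criterion and score are assumptions).
  from collections import Counter

  def majority(D):
      return Counter(y for _, y in D).most_common(1)[0][0]

  def purity(D):
      # Fraction of D in its majority class (1.0 = perfectly pure).
      return Counter(y for _, y in D).most_common(1)[0][1] / len(D) if D else 0.0

  def score(D_yes, D_no):
      # Size-weighted purity of the two partitions; higher is better.
      n = len(D_yes) + len(D_no)
      return (len(D_yes) * purity(D_yes) + len(D_no) * purity(D_no)) / n

  def build(D, attrs, depth=0, max_depth=3):
      # Stopping criterion met: make a leaf holding the majority class in D.
      if depth == max_depth or len({y for _, y in D}) == 1:
          return majority(D)
      # (numerical) for every attribute A and observed value v_j, evaluate A <= v_j.
      splits = [(a, v) for a in attrs for v in {x[a] for x, _ in D}]
      a, v = max(splits, key=lambda s: score([d for d in D if d[0][s[0]] <= s[1]],
                                             [d for d in D if d[0][s[0]] > s[1]]))
      D_yes = [d for d in D if d[0][a] <= v]
      D_no = [d for d in D if d[0][a] > v]
      if not D_yes or not D_no:   # degenerate split: stop and make a leaf
          return majority(D)
      return {"split": (a, v),
              "yes": build(D_yes, attrs, depth + 1, max_depth),
              "no": build(D_no, attrs, depth + 1, max_depth)}

  # e.g. build([({"age": 25}, "L"), ({"age": 20}, "H")], ["age"])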

Shannon entropy

  • \(H(D) = -\sum_{i=1}^{k} p_i \log(p_i)\), where \(p_i\) is the purity (fraction) of class \(i\) in \(D\) (see the sketch below)
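
A small sketch computing \(H(D)\) from class counts. The base-2 logarithm is an assumption (any base works up to a constant factor).

  # Sketch: Shannon entropy of a labelled dataset (log base 2 assumed).
  from collections import Counter
  from math import log2

  def entropy(labels):
      n = len(labels)
      # p_i = fraction of points in class i; classes with count 0 never appear,
      # matching the convention 0 * log(0) = 0.
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  print(entropy(["L", "H"]))            # two equally likely classes -> 1.0 (max disorder)
  print(entropy(["L", "L", "L", "L"]))  # pure node -> -0.0 (no disorder)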