Lecture 25

a.

It looks as if the coefficient of \(A_2\) is around 2, while the coefficient of \(A_1\) is around 1, giving an equation of the form \(1A_1 + 2A_2 = Y\).
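As a quick sanity check of this eyeballed fit (a sketch, assuming the first two columns of the data matrix used in part b are the \(A_1\) and \(A_2\) values):

import numpy as np
# Rows of [A1, A2] values and the observed targets from part b
A = np.array([[1, 2], [2, 2], [3, 4], [1, 4]])
Y = np.array([5.1, 6.3, 10.8, 8.8])
print(A @ np.array([1, 2]))  # -> [5 6 11 9], close to Y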

b.

  • \(w = (D^TD)^{-1}D^Ty\)
import numpy as np

# Data matrix D with a column of ones appended for the bias term, and targets Y
D = np.array([[1, 2, 1],
              [2, 2, 1],
              [3, 4, 1],
              [1, 4, 1]])
Y = np.array([5.1, 6.3, 10.8, 8.8])

# Ordinary least squares via the normal equations
DT = D.T
DI = np.linalg.inv(DT @ D)
W = DI @ DT @ Y
W1 = W
w = W[:-1]  # weights
b = W[-1]   # bias
f"w={w}, b={b}"

# Sum of squared errors of a fit W on the data D
def SSE(D, W):
    yhat = D @ W
    return np.sum((Y - yhat) ** 2)

SSE(D, W1)
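As a cross-check (a sketch, not part of the original solution), NumPy's least-squares solver should give the same coefficients:

# Should match W1 up to floating-point error
W_check, *_ = np.linalg.lstsq(D, Y, rcond=None)
print(W_check)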

c.

  • \(w = (D^TD + \alpha I)^{-1}D^Ty\)
# Ridge regression: add an L2 penalty alpha * I to the normal equations
alpha = 1
DT = D.T
DI = np.linalg.inv(DT @ D + alpha * np.identity(3))
W = DI @ DT @ Y
W2 = W
w = W[:-1]  # weights
b = W[-1]   # bias
f"w={w}, b={b}"
SSE(D, W2)
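An equivalent form (a numerical-stability note, not part of the original solution) solves the regularized normal equations directly instead of forming an explicit inverse:

# Solving the linear system is generally better conditioned than inverting
W_check = np.linalg.solve(DT @ D + alpha * np.identity(3), DT @ Y)
print(W_check)  # should match W2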

d.

# New point with the bias entry appended; predict with both models
p = [2, 5, 1]
# Predictions from the part (b) and part (c) models
[["b", np.dot(p, W1)], ["c", np.dot(p, W2)]]

e.

The SSE is smaller than that of the linear regression model, which indicates that this model has less error.

2.

a.

  • we'll write the predictions as \(\hat y = Xw\), where each row gives \(\hat y_i = w^Tx_i\)
  • X is the matrix whose rows are the data points \(x_i\)
  • then we can say \(J(w) = \|y-\hat y\|^2 = (y-\hat y)^T(y - \hat y)\)
  • \(J(w) = (Xw-y)^T(Xw - y)\)
  • \(\frac{dJ}{dw} = 2X^TXw - 2X^Ty = 0\)
  • \(X^TXw = X^Ty\)
  • \(w = (X^TX)^{-1}X^Ty\)
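A quick numerical check of this closed form on synthetic data (illustrative only; the data and variable names below are assumptions, not from the original solution):

import numpy as np
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                     # synthetic design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=20)
w_closed = np.linalg.inv(X.T @ X) @ X.T @ y      # w = (X^T X)^{-1} X^T y
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)  # library least-squares fit
print(np.allclose(w_closed, w_lstsq))            # True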

b.

logistic regression can classify a dataset by estimating the probability that each point belongs to a discrete class. If the classes are binary, then we can do binary classification this way. Logistic regression is well suited for this because the logistic (sigmoid) function "clamps" its output toward the ends of the (0, 1) range, so most points are assigned a probability close to one class or the other.
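A minimal sketch of this idea (the weights and data below are illustrative assumptions, not from the original): the sigmoid maps scores \(w^Tx\) to probabilities in (0, 1), and thresholding at 0.5 gives the binary label.

import numpy as np

def sigmoid(z):
    # logistic function, squashes any real score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0])        # illustrative weights
X = np.array([[3.0, 1.0],
              [-2.0, 0.5],
              [0.1, 0.0]])
p = sigmoid(X @ w)               # probability of the positive class
labels = (p >= 0.5).astype(int)  # binary classification by thresholding
print(p, labels)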