ML

Linear Regression & Logistic Regression

LR, GD, SGD

Posted by freeCookie🍪 on September 24, 2019

Linear Regression

Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, it assumes that y can be calculated from a linear combination of the input variables (x). Both the input values (x) and the output value (y) are numeric.

Cost function:

  • Mean squared error (MSE)
    • The average squared difference between the estimated values and the actual values (see the sketch below).
    • The smaller, the better.
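
A minimal NumPy sketch of MSE (the function name and sample numbers are illustrative, not from the post):

```python
import numpy as np

def mse(y_true, y_pred):
    """Average squared difference between the estimated and actual values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

# The closer the predictions, the smaller the error.
print(mse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # ~0.02
print(mse([1.0, 2.0, 3.0], [2.0, 0.0, 5.0]))  # 3.0
```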

Training:

Difference between GD and SGD

GD for ML

  • Gradient descent - Least Mean Squares
    • Gradient descent is an optimization algorithm often used to find the weights or coefficients of machine learning models, such as artificial neural networks and regression (see the sketch after this list)
    • Batch gradient descent
      • Looks at every example in the entire training set on every step
      • For linear regression the cost is convex, so it converges to the single global optimum
    • Stochastic gradient descent:
      • Updates the weights with every sample and converges very fast
      • May keep oscillating around a (local) minimum - slowly decreasing the learning rate toward 0 helps it settle
      • Randomly shuffle the data before looping over it, so the noise from the update order works in our favor
    • Mini-batch gradient descent:
      • A compromise between the two above
  • The normal equations
    • theta = (X^T X)^-1 X^T y
  • Probabilistic interpretation
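
A minimal sketch contrasting batch GD, SGD (with shuffling), and the normal equations on synthetic data; the data, learning rates, and iteration counts here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]   # bias column + 2 features
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + 0.1 * rng.normal(size=100)

# Batch gradient descent: each step uses the entire training set.
theta_bgd = np.zeros(3)
for _ in range(500):
    grad = X.T @ (X @ theta_bgd - y) / len(y)   # gradient of the MSE cost
    theta_bgd -= 0.1 * grad

# Stochastic gradient descent: shuffle, then update on every single sample.
theta_sgd = np.zeros(3)
for epoch in range(20):
    for i in rng.permutation(len(y)):
        grad_i = (X[i] @ theta_sgd - y[i]) * X[i]
        theta_sgd -= 0.01 * grad_i

# Normal equations: closed-form solution theta = (X^T X)^-1 X^T y.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(theta_bgd, theta_sgd, theta_ne)   # all should be close to true_theta
```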

Logistic Regression

Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable.

Logistic function / sigmoid function: g(x) = 1 / (1 + e^-x)
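
A quick NumPy sketch of the sigmoid (the sample inputs are illustrative): it squashes any real input into (0, 1), with g(0) = 0.5.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```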

Cost function:

  • Cross Entropy
    • -(y·log(p) + (1 - y)·log(1 - p))
    • Measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label.

Training: GD, Newton’s method
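
A minimal sketch of logistic regression trained with batch gradient descent on the cross-entropy loss; the synthetic data, learning rate, and iteration count are illustrative assumptions (Newton's method is not shown):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
X = np.c_[np.ones(200), rng.normal(size=(200, 2))]   # bias column + 2 features
true_w = np.array([0.5, 2.0, -1.5])
y = (rng.random(200) < sigmoid(X @ true_w)).astype(float)   # binary labels

w = np.zeros(3)
for _ in range(2000):
    p = sigmoid(X @ w)                     # predicted probabilities
    grad = X.T @ (p - y) / len(y)          # gradient of the cross-entropy loss
    w -= 0.5 * grad

p = sigmoid(X @ w)
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(w, cross_entropy)                    # w should be roughly close to true_w
```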

Some Questions

  1. What is linear regression? What is logistic regression? What is the difference?

  2. How to find the best parameters for linear regression? / How to optimize the model?

  3. Please write down the closed-form solution for linear regression.

  4. What are Stochastic Gradient Descent and Mini-batch Gradient Descent? What are their advantages and disadvantages?

  5. What is mean squared error? What is Cross Entropy? What is the difference? Also, please write down the formulas of these two cost functions.

    When to Use Cross Entropy Instead of MSE

    Cross-entropy is a better measure than MSE for classification because it punishes confident misclassifications heavily: the loss grows without bound as the predicted probability diverges from the true label. MSE doesn't punish misclassifications enough, but it is the right loss for regression, where we care directly about the distance between the predicted and actual values (see the sketch below).
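
    A minimal sketch (illustrative numbers) of how the two losses treat a single prediction when the true label is 1:

```python
import numpy as np

def cross_entropy(y, p):
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def squared_error(y, p):
    return (y - p) ** 2

for p in (0.9, 0.5, 0.1, 0.01):
    print(p, squared_error(1, p), cross_entropy(1, p))
# Squared error never exceeds 1, but cross-entropy grows without bound
# as p -> 0, so confident misclassifications are punished much harder.
```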

  6. What is Softmax? What is the relationship between Softmax and Logistic Regression?

    Softmax regression and Logistic regression

  7. Explain and derive the SGD updates for Logistic Regression and Softmax.

  8. Can the global optimum be reached by SGD? Why or why not?

  9. What is the Sigmoid function? What are its characteristics? What are the advantages and disadvantages of the Sigmoid function?

    Quora - What is Sigmoid?