Overfitting & Underfitting
Overfitting and Underfitting in ML
What is overfitting?
- Overfitting: The model fits a particular training set too closely and cannot generalize from the training data to unseen data. (low bias, high variance)
- Underfitting: The model fails to learn the training data well. (high bias, low variance)
How can you tell whether your model is overfitting?
- The model performs noticeably better on the training set than on the test set.
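A quick way to run this check is to fit a model and compare its training and test accuracy directly. Below is a minimal sketch assuming scikit-learn; the synthetic dataset and the choice of a decision tree are illustrative, not tied to any particular project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (illustrative only).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# An unconstrained decision tree tends to memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print(f"train accuracy: {model.score(X_train, y_train):.3f}")
print(f"test accuracy:  {model.score(X_test, y_test):.3f}")
# A large gap between the two numbers is the classic symptom of overfitting.
```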
How to prevent overfitting?
- Cross-validation: Use the initial training data to generate multiple mini train-validation splits, and use these splits to tune the model.
- k-fold cross-validation: Partition the data into k subsets, called folds, then iteratively train the algorithm on k-1 folds while using the remaining fold as the validation set (see the sketch after this list)
- This lets you keep your test set as a truly unseen dataset for selecting your final model.
- Train with more data
- Remove features
- Early stopping
- Regularization: L1, L2, Dropout
- Ensemble learning
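The k-fold procedure described above can be sketched with scikit-learn roughly as follows; the dataset, model, and number of folds are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 5 folds: each round trains on 4 folds and validates on the held-out fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)

print("fold accuracies:", scores.round(3))
print(f"mean CV accuracy: {scores.mean():.3f}")
# The real test set stays untouched until the final model has been chosen.
```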
Bias vs. Variance
Understanding Bias and Variance
Bias and variance are two major sources of error.
- Bias: the algorithm’s error rate on the training set
- Variance: how much worse the algorithm does on the dev (or test) set than the training set
Bias decreases and variance increases as model complexity grows: as more and more parameters are added, the model becomes more complex, bias steadily falls, and variance becomes the primary concern.
Comparing to optimal error rate
- Optimal error rate (“unavoidable bias”): the “unavoidable” part of a learning algorithm’s bias
- Bias = Optimal error rate (“unavoidable bias”) + Avoidable bias
- Avoidable bias: the difference between the training error and the optimal error rate
- If this number is negative, you are doing better on the training set than the optimal error rate, which means you are overfitting: the algorithm has over-memorized the training set. You should focus on reducing variance rather than bias.
- Variance: difference between the dev error and the training error
- All variance is “avoidable”
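A small worked example of this decomposition, with made-up error rates purely for illustration:

```python
# Made-up error rates, purely for illustration.
optimal_error = 0.02   # "unavoidable bias" (e.g., an estimate of human-level error)
train_error   = 0.08   # error on the training set
dev_error     = 0.15   # error on the dev set

avoidable_bias = train_error - optimal_error   # 0.06 -> a bias problem worth attacking
variance       = dev_error - train_error       # 0.07 -> also a variance problem

print(f"avoidable bias: {avoidable_bias:.2f}, variance: {variance:.2f}")
# If avoidable_bias were negative, the model is over-memorizing the training set,
# so the focus should shift to reducing variance instead.
```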
Bias-Variance Tradeoff
Some changes reduce bias at the cost of increasing variance, and vice versa. This creates a “tradeoff” between bias and variance (see the sketch after the list below).
- High avoidable bias: increase the size of your model
- Increasing the model size generally reduces bias, but it might also increase variance and the risk of overfitting.
- Should add a well-designed regularization method, such as L2 regularization or dropout
- High variance: add data to the training set
- You can also tune the regularization method
- If you think some of your data provides no benefit, you can just leave that data out for computational reasons
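One way to see the tradeoff is to sweep a single complexity knob and watch the training and validation scores move apart and then drop together. The sketch below uses scikit-learn's validation_curve with Ridge regression on synthetic data; every choice here (model, data, alpha range) is an assumption made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import validation_curve

X, y = make_regression(n_samples=300, n_features=50, noise=10.0, random_state=0)

# Small alpha = weakly constrained fit (variance risk); large alpha = heavily constrained (bias risk).
alphas = np.logspace(-3, 3, 7)
train_scores, val_scores = validation_curve(
    Ridge(), X, y, param_name="alpha", param_range=alphas, cv=5, scoring="r2"
)

for a, tr, va in zip(alphas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"alpha={a:9.3f}  train R2={tr:.3f}  val R2={va:.3f}")
# A big train/val gap at small alpha signals variance; low scores everywhere at
# large alpha signal bias.
```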
Techniques for reducing avoidable bias
- Increase the model size -> might increase variance, but this can usually be offset with regularization (see the sketch after this list)
- Modify input features based on insights from error analysis -> same as above
- Reduce or eliminate regularization -> increases variance
- Modify model architecture -> affects both bias and variance
- Add more training data -> usually has no significant effect on bias, but it helps with variance problems
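A sketch of the first technique above: grow the model while keeping a little L2 regularization in place, and track training vs. dev error. The data, model family, and layer sizes below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=15, random_state=0)
X_train, X_dev, y_train, y_dev = train_test_split(X, y, test_size=0.25, random_state=0)

# Progressively larger networks, each with a little L2 regularization (alpha).
for hidden in [(8,), (64,), (256, 128)]:
    model = MLPClassifier(hidden_layer_sizes=hidden, alpha=1e-4,
                          max_iter=1000, random_state=0).fit(X_train, y_train)
    print(f"{str(hidden):>10}  train err={1 - model.score(X_train, y_train):.3f}  "
          f"dev err={1 - model.score(X_dev, y_dev):.3f}")
# Training error (avoidable bias) should fall as capacity grows; keep an eye on
# the dev error for creeping variance.
```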
Techniques for reducing variance
- Add more training data
- Add regularization (L2 regularization, L1 regularization, dropout) -> increases bias (see the sketch after this list)
- Add early stopping -> also increases bias
- Feature selection to decrease number/type of input features
- Decrease the model size: Use with caution
- The advantage of reducing the model size is that it lowers your computational cost and thus speeds up training
- Modify input features based on insights from error analysis
- Modify model architecture so that it fits the problem better -> affects both bias and variance
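A sketch that combines three of these variance-reduction tools (L2 regularization, dropout, and early stopping), assuming TensorFlow/Keras is available; the architecture, synthetic data, and hyperparameters are illustrative.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Synthetic data standing in for a real train/dev split (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20)).astype("float32")
y = (X[:, :5].sum(axis=1) > 0).astype("float32")
X_train, X_dev, y_train, y_dev = X[:1600], X[1600:], y[:1600], y[1600:]

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu", kernel_regularizer=regularizers.L2(1e-4)),
    layers.Dropout(0.5),   # randomly zeroes units during training
    layers.Dense(128, activation="relu", kernel_regularizer=regularizers.L2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once dev loss stops improving and roll back to the best weights seen.
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                            restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_dev, y_dev),
          epochs=100, batch_size=64, callbacks=[early_stop], verbose=0)

print("dev accuracy:", round(model.evaluate(X_dev, y_dev, verbose=0)[1], 3))
```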