ML

Overfitting, Underfitting, Bias, Variance

Posted by freeCookie🍪 on September 23, 2019

Overfitting & Underfitting

Overfitting in ML

Overfitting and Underfitting in ML

What is overfitting?

  • Overfitting: the model corresponds too closely to a particular set of data and cannot generalize from the training data to unseen data. (low bias, high variance)
  • Underfitting: the model fails to learn the underlying pattern in the training data. (high bias, low variance)

How can you tell whether your model is overfitting?

  • The model performs noticeably better on the training set than on the test set.
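
A quick way to check this in practice is to compare training and test scores directly. The sketch below is a minimal illustration with scikit-learn; the dataset and the decision-tree model are placeholder choices, not something prescribed by this post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data and model; any estimator with fit/score works the same way.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# An unconstrained decision tree tends to memorize the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
# A training score far above the test score is a sign of overfitting.
```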

How to prevent overfitting?

  • Cross-validation: use the initial training data to generate multiple mini train-test splits, and use these splits to tune the model.
    • k-fold cross-validation: partition the data into k subsets, called folds, then iteratively train the algorithm on k-1 folds while using the remaining fold as the validation set (see the sketch after this list).
    • This lets you keep your test set as a truly unseen dataset for selecting your final model.
  • Train with more data
  • Remove features
  • Early stopping
  • Regularization: L1, L2, Dropout
  • Ensemble learning
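
A minimal k-fold cross-validation sketch with scikit-learn (the dataset, classifier, and k=5 are arbitrary assumptions for illustration): each fold takes a turn as the validation set while the other four are used for training.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 5-fold CV: train on 4 folds, validate on the remaining fold, 5 times.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=5000)

scores = cross_val_score(model, X, y, cv=cv)
print("fold accuracies:", scores)
print("mean CV accuracy:", scores.mean())
# The true test set stays untouched; use it only once, on the final model.
```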

Bias vs. Variance

Machine Learning Yearning

Understanding Bias and Variance

Bias and variance are two major sources of error.

  • Bias: the algorithm’s error rate on the training set
  • Variance: how much worse the algorithm does on the dev (or test) set than the training set

Bias falls and variance rises with model complexity: as more and more parameters are added, the model becomes more complex, bias steadily decreases, and variance becomes the primary concern.
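
To make this concrete, here is a small sketch (synthetic data and a polynomial-regression setup assumed for illustration, not taken from the post) that fits polynomials of increasing degree: a low degree underfits (high training and test error, i.e. high bias), while a very high degree overfits (low training error but worse test error, i.e. high variance).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)  # noisy sine wave
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    # degree 1 underfits (both errors high); degree 15 overfits (train << test).
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```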

Comparing with the optimal error rate

  • Optimal error rate (“unavoidable bias”): the “unavoidable” part of a learning algorithm’s bias
    • Bias = Optimal error rate (“unavoidable bias”) + Avoidable bias
  • Avoidable bias: the difference between the training error and the optimal error rate
    • If this number is negative, your training error is lower than the optimal error rate. This means you are overfitting the training set: the algorithm has over-memorized it, and you should focus on reducing variance rather than bias.
  • Variance: the difference between the dev error and the training error (a short worked example follows this list)
    • All variance is “avoidable”
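
A tiny worked example with made-up error rates (the numbers are purely illustrative, not from the post): if the optimal error is 14%, the training error is 15%, and the dev error is 30%, then the avoidable bias is only 1% while the variance is 15%, so variance-reduction techniques should come first.

```python
# Illustrative (made-up) error rates, expressed as fractions.
optimal_error = 0.14   # "unavoidable bias", e.g. irreducible noise
train_error = 0.15
dev_error = 0.30

avoidable_bias = train_error - optimal_error   # ~0.01 -> bias is nearly optimal
variance = dev_error - train_error             # 0.15  -> variance dominates

print(f"avoidable bias: {avoidable_bias:.2%}, variance: {variance:.2%}")
# Here variance >> avoidable bias, so focus on reducing variance first.
```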

Bias-Variance Tradeoff

Some changes reduce bias at the cost of increasing variance, and vice versa. This creates a “trade-off” between bias and variance.

  • High avoidable bias: increase the size of your model
    • Increasing the model size generally reduces bias, but it might also increase variance and the risk of overfitting.
    • You should pair this with a well-designed regularization method, such as L2 regularization or dropout
  • High variance: add data to the training set (see the learning-curve sketch after this list)
    • You can also tune the regularization method
    • If you think some of the data provides no benefit, you can leave it out for computational reasons
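
One way to see whether more data would help is a learning curve. The sketch below uses scikit-learn's learning_curve on a placeholder dataset and model (both arbitrary assumptions): if the training score stays high while the validation score climbs toward it as the training set grows, the gap, i.e. the variance, is shrinking with more data.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Train on progressively larger subsets and cross-validate each time.
train_sizes, train_scores, val_scores = learning_curve(
    SVC(kernel="rbf", gamma=0.001), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    # The train/validation gap (variance) typically narrows as n grows.
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}")
```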

Techniques for reducing avoidable bias

  • Increase the model size -> might increase variance, but this can usually be countered with regularization
  • Modify input features based on insights from error analysis -> same as above
  • Reduce or eliminate regularization -> increases variance
  • Modify model architecture -> affects both bias and variance
  • Add more training data -> usually has no significant effect on bias but helps with variance problems

Techniques for reducing variance

  • Add more training data
  • Add regularization (L2 regularization, L1 regularization, dropout) -> increases bias (see the Keras sketch after this list)
  • Add early stopping -> also increases bias
  • Feature selection to decrease number/type of input features
  • Decrease the model size: Use with caution
    • The main advantage of reducing the model size is lowering computational cost and thus speeding up training
  • Modify input features based on insights from error analysis
  • Modify model architecture -> so that it fits the problem better; affects both bias and variance
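
As a minimal sketch of three of these variance-reduction techniques together, L2 weight regularization, dropout, and early stopping, here is a small Keras model. The dataset, architecture, and hyperparameters are arbitrary assumptions for illustration, not something prescribed by the post.

```python
import tensorflow as tf

# Placeholder data: MNIST digits, flattened and scaled to [0, 1].
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 penalty
    tf.keras.layers.Dropout(0.5),                                              # dropout
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Early stopping: stop once validation loss stops improving, keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True)

model.fit(x_train, y_train, validation_split=0.2,
          epochs=50, batch_size=128, callbacks=[early_stop])
```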