Bias
• Bias is error caused by overly simplistic or wrong assumptions built into the model.
• A high-bias model is too simple — it misses patterns in the data.
• This leads to underfitting.
Example: Trying to fit a straight line to a complicated curve.
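The straight-line example can be made concrete with a short sketch (assuming scikit-learn and NumPy are available; the data and variable names are illustrative only): a linear model fitted to points sampled from a sine curve keeps a large training error, because no amount of data lets a straight line follow the curve.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)   # noisy points on a sine curve

line = LinearRegression().fit(X, y)
print("training MSE of the straight line:", mean_squared_error(y, line.predict(X)))
# The error stays large even on the training data: the model is too simple (high bias).
```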
⸻
Variance
• Variance means error due to excessive sensitivity to small fluctuations in the training data.
• A high-variance model is too complex — it memorizes noise in the data.
• This leads to overfitting.
Example: Drawing a crazy squiggly line that passes through every point exactly.
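A matching sketch of the squiggly-line case (same hedges as above: illustrative data, scikit-learn assumed): a high-degree polynomial fitted to a handful of noisy points chases the noise, so its training error is tiny while its error on fresh points is much larger.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X_train = np.sort(rng.uniform(0, 1, 15)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(0, 0.2, 15)
X_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_test = np.sin(2 * np.pi * X_test).ravel()

# A degree-12 polynomial has enough flexibility to chase the noise in 15 points.
squiggle = make_pipeline(PolynomialFeatures(degree=12), LinearRegression())
squiggle.fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, squiggle.predict(X_train)))  # tiny
print("test MSE: ", mean_squared_error(y_test, squiggle.predict(X_test)))    # much larger
```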
Graphical illustration of bias and variance: [figure not included in this text version]
How to reduce bias (to prevent underfitting)
Increase Model Complexity
• Use more complex models, such as deep neural networks, or add more layers and neurons to existing models.
• Use models that capture non-linear relationships (e.g., decision trees, random forests, or support vector machines with non-linear kernels).
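A hedged sketch of the bullets above (model choices and data are placeholders, not a recommendation): on curved data, swapping a linear model for one that can represent non-linear relationships, such as a random forest, noticeably reduces the error that comes from bias.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, (500, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("linear model", LinearRegression()),
                    ("random forest", RandomForestRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    print(name, "test MSE:", mean_squared_error(y_te, model.predict(X_te)))
# The non-linear model follows the curve and ends up with a much lower error.
```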
Feature Engineering
• Add more relevant features that may capture the underlying patterns in the data.
• Transform features to better represent the data (e.g., polynomial features, interaction terms).
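To illustrate polynomial and interaction terms (a sketch with made-up data; assumes scikit-learn): a plain linear model cannot learn a target that depends on the product of two features, but the same model trained on expanded features can.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, (300, 2))
y = X[:, 0] * X[:, 1]            # target depends only on an interaction term

plain = LinearRegression().fit(X, y)
print("R^2 on raw features:", plain.score(X, y))                 # near 0

# degree=2 adds x1^2, x2^2 and the interaction x1*x2 as new features
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
expanded = LinearRegression().fit(X_poly, y)
print("R^2 with interaction terms:", expanded.score(X_poly, y))  # near 1
```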
Increase Training Time
• Train the model for more epochs, especially for deep learning models, to allow them to learn more complex patterns.
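A small sketch of the effect of training longer (illustrative numbers; assumes scikit-learn, where MLPRegressor's max_iter counts full passes over the data, i.e. epochs, for the 'adam' and 'sgd' solvers):

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, (1000, 1))
y = np.sin(X).ravel()

for epochs in (20, 200, 2000):
    # With the default 'adam' solver, max_iter is the number of epochs.
    mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=epochs, random_state=0)
    mlp.fit(X, y)
    print(epochs, "epochs -> training MSE:", mean_squared_error(y, mlp.predict(X)))
# Too few epochs leaves the network underfit; training longer lets it learn the curve.
```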
How to reduce variance (to prevent overfitting)
Simplify the Model
• Use a simpler model with fewer parameters to avoid overfitting the training data.
• Reduce the number of layers or units in a neural network, or prune a decision tree.
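A sketch of the pruning idea with a decision tree (hypothetical synthetic data; assumes scikit-learn): capping the depth trades a little training accuracy for a smaller gap between training and test performance.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):   # None = grow the tree until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train acc {tree.score(X_tr, y_tr):.2f}, "
          f"test acc {tree.score(X_te, y_te):.2f}")
# The unrestricted tree memorizes the training set; the shallow tree gives up a little
# training accuracy but typically closes the train/test gap.
```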
Increase Training Data
• Collect more data to give the model more examples, helping it generalize better to unseen data.
• Augment the existing dataset with transformations (for images, techniques like rotation, flipping, and cropping effectively create new training examples).
Use Cross-Validation
• Evaluate the model on several train/validation splits (e.g., k-fold cross-validation) so the performance estimate does not depend on a single split and overfitting is easier to detect.
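A sketch of evaluating on repeated splits rather than a single one (assumes scikit-learn; the model and dataset are placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

# Five different train/validation splits instead of a single one.
scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("accuracy per fold:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```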
Regularization Techniques
• Use L1/L2 Regularization: Adds a penalty term to the loss function, discouraging the model from fitting noise.
• Use Dropout (for neural networks): Randomly drop neurons during training to prevent the model from becoming too dependent on specific pathways.
• Use Early Stopping: Stop training when performance on a validation set starts to degrade, indicating overfitting.
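A minimal Keras sketch combining the three techniques above (assumes TensorFlow is installed; the architecture, synthetic data, and hyperparameters are illustrative, not a recommendation): an L2 penalty on the weights, dropout between layers, and early stopping on a validation split.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 20)).astype("float32")
y = (X[:, :5].sum(axis=1) > 0).astype("float32")   # synthetic binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-3)),  # L2 penalty
    tf.keras.layers.Dropout(0.5),                                              # dropout
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop as soon as the validation loss stops improving, and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=100, callbacks=[early_stop], verbose=0)
```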