Bias

• Bias is error caused by wrong or overly simplistic assumptions in the model.
• A high-bias model is too simple — it misses patterns in the data.
• This leads to underfitting.

Example: Trying to fit a straight line to a complicated curve.
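
As a quick illustration (a NumPy/scikit-learn sketch with made-up sine data, not from the text above), a straight line fit to points sampled from a curve leaves most of the structure unexplained:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# A straight line cannot follow the sine wave: high bias, underfitting.
line = LinearRegression().fit(X, y)
print("R^2 of the linear fit:", line.score(X, y))  # well below 1.0
```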

Variance

• Variance means error due to excessive sensitivity to the training data.
• A high-variance model is too complex — it memorizes noise in the data.
• This leads to overfitting.

Example: Drawing a wildly squiggly line that passes through every training point exactly.
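
To see this concretely (an illustrative NumPy sketch), interpolating a degree n-1 polynomial through n noisy points gives essentially zero training error but wild behavior between the points:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 8))
y = np.sin(x) + rng.normal(0, 0.1, 8)

# A degree-7 polynomial through 8 points: near-zero training error...
coeffs = np.polyfit(x, y, deg=len(x) - 1)
print("max training error:", np.max(np.abs(np.polyval(coeffs, x) - y)))

# ...but it typically swings far from the true curve between the points.
grid = np.linspace(0, 6, 200)
print("max deviation from sin(x):",
      np.max(np.abs(np.polyval(coeffs, grid) - np.sin(grid))))
```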

[Figure: Graphical illustration of bias and variance]

How to reduce bias (to prevent underfitting)

Increase Model Complexity
• Use more complex models, such as deep neural networks, or add more layers and neurons to existing models.

• Use models that capture non-linear relationships (e.g., decision trees, random forests, or support vector machines with non-linear kernels).
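
As a rough sketch of these bullets (scikit-learn, on synthetic curved data; the models here are examples, not recommendations), swapping a straight-line model for an SVR with a non-linear RBF kernel closes most of the gap:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# The RBF kernel captures the non-linear shape the straight line misses.
print("linear R^2: ", LinearRegression().fit(X, y).score(X, y))
print("RBF SVR R^2:", SVR(kernel="rbf").fit(X, y).score(X, y))
```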

Feature Engineering

• Add more relevant features that may capture the underlying patterns in the data.

• Transform features to better represent the data (e.g., polynomial features, interaction terms).
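
For example (a scikit-learn sketch on the same kind of synthetic data; degree 5 is an arbitrary choice), polynomial features let an otherwise linear model bend with the data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 6, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# Expanding x into [x, x^2, ..., x^5] gives the linear model curvature.
model = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
print("degree-5 R^2:", model.fit(X, y).score(X, y))
```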

Increase Training Time

• Train for more epochs, especially with deep learning models, to allow them to learn more complex patterns.
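
A bare-bones NumPy sketch of the idea (the learning rate, epoch count, and data are arbitrary): the training loss keeps falling as gradient descent runs for more epochs, so stopping too early leaves the model underfit:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 200)

# Plain gradient descent on mean squared error.
w, lr = np.zeros(3), 0.1
for epoch in range(501):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= lr * grad
    if epoch % 100 == 0:  # loss drops with more epochs, then plateaus
        print(epoch, np.mean((X @ w - y) ** 2))
```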

How to reduce variance (to prevent overfitting)

Simplify the Model

• Use a simpler model with fewer parameters to avoid overfitting the training data.

• Reduce the number of layers or units in a neural network, or prune a decision tree.
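
For instance (a scikit-learn sketch; the depth cap of 4 and the synthetic data are illustrative), limiting a decision tree's depth trades a little training accuracy for better generalization:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes training noise; a depth cap usually
# hurts the training score a little and helps the test score.
for depth in (None, 4):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```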

Increase Training Data

• Collect more data to give the model more examples, helping it generalize better to unseen data.
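
One way to check whether more data would help (a scikit-learn sketch on synthetic data) is a learning curve, which reports the cross-validated score as the training set grows:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)

# Validation scores typically climb as the model sees more examples.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeRegressor(random_state=0), X, y,
    train_sizes=[0.1, 0.3, 0.6, 1.0], cv=5)
for n, val in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n} samples -> mean CV score {val:.2f}")
```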

Regularization and Related Techniques

• Augment the existing dataset with label-preserving transformations (for images, techniques like rotation, flipping, and cropping); this effectively gives the model more training examples.

• Use cross-validation: split the data into training and validation sets multiple times to get a reliable estimate of generalization error and catch overfitting early.

• Use L1/L2 Regularization: Add a penalty term to the loss function, discouraging the model from fitting noise (see the combined sketch after this list).

• Use Dropout (for neural networks): Randomly drop neurons during training to prevent the model from becoming too dependent on specific pathways.

• Use Early Stopping: Stop training when performance on a validation set starts to degrade, indicating overfitting.
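
The last three bullets fit naturally into one training loop. Below is a minimal PyTorch sketch (the layer sizes, dropout rate, weight_decay value, and patience are illustrative choices, and the data is synthetic):

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(400, 10)
y = 2 * X[:, :1] + 0.1 * torch.randn(400, 1)
X_tr, y_tr, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

model = nn.Sequential(
    nn.Linear(10, 64), nn.ReLU(),
    nn.Dropout(p=0.5),  # dropout: randomly zero activations during training
    nn.Linear(64, 1),
)
# weight_decay is the L2 penalty coefficient on the weights
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(500):
    model.train()
    opt.zero_grad()
    loss = loss_fn(model(X_tr), y_tr)
    loss.backward()
    opt.step()

    model.eval()  # disables dropout for evaluation
    with torch.no_grad():
        val = loss_fn(model(X_val), y_val).item()
    if val < best_val - 1e-4:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping: validation stopped improving
            break
print(f"stopped at epoch {epoch}, best validation loss {best_val:.4f}")
```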