Regularization is a technique that helps machine learning (ML) models achieve an optimal fit to the training data. The fit describes how closely a model conforms to the training data points and determines how well it generalizes to new data. Overfitting occurs when a model conforms too closely to the training data, capturing outliers or noise that do not reflect the true properties of the data; the resulting model responds poorly to future data. Underfitting is the opposite: the model does not sufficiently account for the variance in the training data. An underfit model is too simple and does not learn enough from the training data to make accurate predictions on new data.
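To make the contrast concrete, here is a minimal sketch (the use of scikit-learn and synthetic data is an illustrative assumption, not something from the text above) that treats polynomial degree as a stand-in for model complexity: a degree-1 model underfits, while a very high-degree model memorizes the training noise and degrades on held-out data.

```python
# Minimal sketch: underfitting vs. overfitting on noisy data, using
# polynomial degree as a stand-in for model complexity (scikit-learn assumed).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-1, 1, size=(60, 1)), axis=0)
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=60)   # noisy sine curve
X_train, y_train = X[::2], y[::2]                          # even rows: train
X_test, y_test = X[1::2], y[1::2]                          # odd rows: held out

for degree in (1, 4, 25):   # too simple, reasonable, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:>2}: "
          f"train MSE={mean_squared_error(y_train, model.predict(X_train)):.3f}, "
          f"test MSE={mean_squared_error(y_test, model.predict(X_test)):.3f}")
```

The degree-25 model typically shows the signature of overfitting: near-zero training error alongside much larger error on the held-out points.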
When a model is fit, a loss function is calculated that measures the difference between the model's predictions and the actual values in the training data. A proper fit minimizes the loss function. In driving the loss down, however, a model can grow increasingly complex by taking on more and larger parameters, which leads to overfitting. Regularization adds a penalty term to the loss function, limiting the model's complexity by penalizing parameter values that push the model beyond an optimal fit. The tuning parameter Lambda modulates how strongly this regularization takes effect.
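In symbols, one common form of this idea is the L2 (ridge) penalty; the notation below is an illustrative sketch, with L(θ) standing for the original loss, θ_j for the model's parameters, and λ for the regularization rate discussed next:

```latex
% Illustrative regularized objective with an L2 (ridge) penalty.
% L(\theta): original loss; \lambda: regularization rate; \theta_j: parameters.
L_{\mathrm{reg}}(\theta) = L(\theta) + \lambda \sum_{j} \theta_j^{2}
```

Other penalties follow the same pattern; L1 (lasso) regularization, for instance, sums the absolute values of the parameters instead of their squares.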
Lambda, also referred to as the regularization rate, is a scalar applied to the regularization term to adjust how much regularization affects the model. A larger Lambda value produces a stronger regularization effect. Lambda must be chosen carefully to produce the most accurate model: too high a value over-regularizes the model and can cause underfitting, while too low a value lets the model grow too complex and overfit.
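As a minimal sketch of this trade-off (assuming scikit-learn, where ridge regression exposes the regularization rate as the `alpha` parameter of `Ridge`, and using synthetic data), the following fits the same model with a low, moderate, and high Lambda and reports held-out error alongside the size of the learned coefficients, which shrink as Lambda grows:

```python
# Minimal sketch: effect of the regularization rate (Lambda) on a ridge model.
# Assumes scikit-learn; Ridge exposes Lambda as its `alpha` parameter.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)  # noisy linear data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for lam in (0.01, 1.0, 100.0):          # too low, moderate, too high
    model = Ridge(alpha=lam).fit(X_train, y_train)
    err = mean_squared_error(y_test, model.predict(X_test))
    print(f"lambda={lam:>6}: test MSE={err:.3f}, "
          f"coefficient norm={np.linalg.norm(model.coef_):.3f}")
```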
Model developers should select the Lambda value at which the model performs best on held-out data rather than on the training data alone, since training error always favors weaker regularization. The ideal Lambda value can be narrowed down through a process called cross-validation: several models are trained using a range of Lambda values, their performance is compared across the held-out folds, and the best value from the range is identified. Lambda can be further honed by cross-validating a new set of models with values surrounding the best value from the first pass. This process can be repeated until varying Lambda no longer improves performance.
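A minimal sketch of this coarse-to-fine search, assuming scikit-learn's RidgeCV (which cross-validates ridge regression over a grid of candidate Lambda values, exposed as `alphas`) and synthetic data:

```python
# Minimal sketch: narrowing Lambda via cross-validation, coarse grid first,
# then a finer grid around the best value. Assumes scikit-learn's RidgeCV.
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200)

# Pass 1: coarse, log-spaced range of candidate Lambda values.
coarse = np.logspace(-3, 3, 7)
best = RidgeCV(alphas=coarse, cv=5).fit(X, y).alpha_

# Pass 2: finer grid surrounding the best value from the first pass.
fine = np.linspace(best / 10, best * 10, 25)
best_refined = RidgeCV(alphas=fine, cv=5).fit(X, y).alpha_
print(f"coarse best lambda: {best}, refined: {best_refined:.4f}")
```

Each pass narrows the search window; the loop ends when refining the grid no longer changes the cross-validated performance meaningfully.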