In data science, overfitting can pose a major challenge to a model. It happens when “the algorithm, unfortunately, cannot perform accurately against unseen data, defeating its purpose” (IBM Cloud Education, 2021, para. 1). For interpolating models, however, the “‘double-descent’ curve subsumes the textbook U-shaped bias–variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance” (Belkin et al., 2019, p. 15849). Therefore, overfitting is not equally problematic for all models, since some can benefit from double descent (Provost and Fawcett, 2013). In the classical sense, there are three reasons why overfitting is an issue to consider for all models: the introduction of noise, high variance, and low bias. Firstly, models affected by overfitting can begin learning irrelevant information, that is, noise in the training data (Bilbao and Bilbao, 2017). Secondly, any model can acquire a high level of variance as a result (Le et al., 2018). Thirdly, this rise in variance is accompanied by a fall in bias, which moves the model further away from the ‘sweet spot’ of the trade-off (Rocks and Mehta, 2022). In other words, these reasons can affect all models to varying extents.
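The classical symptoms described above can be made concrete with a small experiment. The following is a minimal sketch rather than a method taken from the cited sources: the k-nearest-neighbour regressor, the sine target, the noise level, the sample sizes, and the names `make_data`, `knn_predict`, `mse`, and `compare` are all assumptions chosen for illustration. It compares a model that memorizes the training data (k = 1) with a smoother one (k = 9).

```python
import math
import random


def make_data(n, noise_sd, rng):
    """Sample n points from y = sin(x) + Gaussian noise on [0, 2*pi] (assumed toy task)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(x, train_xs, train_ys, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))[:k]
    return sum(train_ys[i] for i in nearest) / k


def mse(xs, ys, train_xs, train_ys, k):
    """Mean squared error of the k-NN regressor on the points (xs, ys)."""
    errors = [(knn_predict(x, train_xs, train_ys, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def compare(ks=(1, 9), n_train=200, n_test=500, noise_sd=0.5, seed=0):
    """Return {k: (train_mse, test_mse)} for each neighbourhood size k."""
    rng = random.Random(seed)
    train_xs, train_ys = make_data(n_train, noise_sd, rng)
    test_xs, test_ys = make_data(n_test, noise_sd, rng)
    return {
        k: (
            mse(train_xs, train_ys, train_xs, train_ys, k),
            mse(test_xs, test_ys, train_xs, train_ys, k),
        )
        for k in ks
    }


if __name__ == "__main__":
    for k, (train_err, test_err) in compare().items():
        print(f"k={k}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

With k = 1, every training point is its own nearest neighbour, so the training error is zero and the model has effectively learned the noise. Under the assumptions above, the smoother k = 9 model accepts a small amount of bias in exchange for lower variance and would be expected to achieve the lower test error.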
However, in some cases, overfitting might not be an issue once the interpolation threshold is crossed. Three primary examples are models with a suitable inductive bias: random Fourier features, neural networks trained with stochastic gradient descent (SGD), and random forests (Belkin et al., 2019). Firstly, for random Fourier features, it is possible to “incorporate the deep architecture into kernel learning, which significantly boosts the flexibility and richness of kernel machines” (Xie et al., 2019, p. 1). Secondly, for over-parameterized networks, Brutzkus et al. (2017, p. 1) establish “convergence rates of SGD to a global minimum and provide generalization guarantees for this global minimum that are independent of the network size”. Thirdly, “interpolated classifiers appear to be ubiquitous in high-dimensional data, having been observed in deep networks, kernel machines, boosting and random forests” (Belkin, Hsu, and Mitra, 2018, p. 1). Therefore, recent evidence indicates that overfitting can become less of a problem beyond the interpolation threshold in many complex models.
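The behaviour around the interpolation threshold can be explored with a small random Fourier features experiment. The sketch below is an assumption-laden illustration and does not reproduce the setup of Belkin et al. (2019): the target function, the noise level, the frequency scale, the feature counts, and the names `rff` and `width_sweep` are hypothetical. It fits a minimum-norm least-squares model at three widths: below, at, and well above the number of training points.

```python
import numpy as np


def rff(x, freqs, phases):
    """Map 1-D inputs x to random Fourier features cos(freq * x + phase)."""
    return np.cos(np.outer(x, freqs) + phases)


def width_sweep(widths=(5, 20, 200), n_train=20, noise_sd=0.3, seed=0):
    """Return {width: (train_mse, test_mse)} for minimum-norm least-squares fits."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(-1.0, 1.0, n_train)
    y_train = np.sin(3.0 * x_train) + rng.normal(0.0, noise_sd, n_train)
    x_test = np.linspace(-1.0, 1.0, 400)
    y_test = np.sin(3.0 * x_test)  # noise-free targets to measure generalization

    results = {}
    for width in widths:
        # Assumed feature distribution: Gaussian frequencies, uniform phases.
        freqs = rng.normal(0.0, 10.0, width)
        phases = rng.uniform(0.0, 2.0 * np.pi, width)
        phi_train = rff(x_train, freqs, phases)
        # When width >= n_train the system is underdetermined, and lstsq
        # returns the minimum-norm solution among the interpolating fits.
        weights, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
        train_mse = float(np.mean((phi_train @ weights - y_train) ** 2))
        test_mse = float(np.mean((rff(x_test, freqs, phases) @ weights - y_test) ** 2))
        results[width] = (train_mse, test_mse)
    return results


if __name__ == "__main__":
    for width, (train_err, test_err) in width_sweep().items():
        print(f"width={width}: train MSE={train_err:.2e}, test MSE={test_err:.2e}")
```

Once the number of features reaches the number of training points, the training error drops to numerical zero. In experiments of this kind, the test error typically peaks near that threshold and falls again as the width grows, although a single random draw with these assumed settings may not show the full curve.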
As an organizational example, Google can be used to show how overfitting intertwines with the bias–variance trade-off and constitutes an issue to consider for all models. According to Peter Norvig, Google’s director of research, “We don’t have better algorithms. We just have more data” (McAfee and Brynjolfsson, 2012, para. 9). Overfitting causes “wildly inaccurate results unless you have a human intervention to validate the output variables” (Delua, 2021, para. 19). In other words, the company had an abundance of data but did not have the means to properly supervise them and reduce variance. Thus, overfitting became a major risk factor for its models, which required corrections, interventions, and preventative measures.
A close critical analysis reveals that overfitting is a challenge for all models, but some tend to improve after the interpolation threshold. Therefore, overfitting should not be viewed as an issue affecting only classical models, since it can be a hindrance for recent frameworks as well. Practitioners working with more advanced learning models need to be aware that crossing the interpolation point can be an effective way to enhance performance. However, older models need to adhere to the conventional avoidance measures, such as early stopping or supervision. Thus, overfitting is a threat of increased variance, which eliminates the necessary degree of bias and results in excessively complex models or ones with insufficient generalization. ‘Double descent’ should always be accounted for when a model is large enough to pass the point of interpolation.
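Early stopping, one of the conventional avoidance measures mentioned above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a procedure taken from the cited sources: the degree-9 polynomial model, the learning rate, the patience of 50 epochs, the toy data, and the names `make_data`, `predict`, `mse`, and `train_with_early_stopping` are all hypothetical.

```python
import math
import random


def make_data(n, noise_sd, rng):
    """Sample n (x, y) pairs from y = sin(3x) + Gaussian noise, x in [-1, 1]."""
    pairs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        pairs.append((x, math.sin(3.0 * x) + rng.gauss(0.0, noise_sd)))
    return pairs


def predict(weights, x):
    """Evaluate the polynomial sum_j weights[j] * x**j."""
    return sum(w * x ** j for j, w in enumerate(weights))


def mse(weights, data):
    """Mean squared error of the polynomial model on (x, y) pairs."""
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(train, val, degree=9, lr=0.05,
                              max_epochs=2000, patience=50):
    """Full-batch gradient descent that stops once validation loss stalls.

    Returns (best_weights, best_val_loss, epochs_run).
    """
    weights = [0.0] * (degree + 1)
    best_weights, best_val = list(weights), mse(weights, val)
    stale_epochs = 0
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        epochs_run = epoch
        # Gradient of the training MSE with respect to each coefficient.
        grad = [0.0] * (degree + 1)
        for x, y in train:
            err = predict(weights, x) - y
            for j in range(degree + 1):
                grad[j] += 2.0 * err * x ** j / len(train)
        weights = [w - lr * g for w, g in zip(weights, grad)]

        val_loss = mse(weights, val)
        if val_loss < best_val:
            # Keep a copy of the best weights seen so far on held-out data.
            best_weights, best_val, stale_epochs = list(weights), val_loss, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_weights, best_val, epochs_run


if __name__ == "__main__":
    rng = random.Random(0)
    train, val = make_data(30, 0.3, rng), make_data(30, 0.3, rng)
    _, best_val, epochs_run = train_with_early_stopping(train, val)
    print(f"stopped after {epochs_run} epochs, best validation MSE={best_val:.3f}")
```

The design choice here is to return the weights from the best validation epoch rather than the final one, so that any later drift toward fitting noise in the training set is discarded.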
Belkin, M. et al. (2019) ‘Reconciling modern machine-learning practice and the classical bias–variance trade-off’, PNAS, 116(32), pp. 15849-15854. Web.
Belkin, M., Hsu, D. J., and Mitra, P. (2018) ‘Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate’, Advances in Neural Information Processing Systems, 31, pp. 1-29. Web.
Bilbao, I., and Bilbao, J. (2017) ‘Overfitting problem and the over-training in the era of data: particularly for artificial neural networks’, Eighth International Conference on Intelligent Computing and Information Systems, 2017, pp. 173-177. Web.
Brutzkus, A. et al. (2017) ‘SGD learns over-parameterized networks that provably generalize on linearly separable data’, arXiv, 1710, pp. 1-17.
Delua, J. (2021) Supervised vs. unsupervised learning: what’s the difference? Web.
IBM Cloud Education. (2021) Overfitting. Web.
Le, X. B. D. et al. (2018) ‘Overfitting in semantics-based automated program repair’, Empirical Software Engineering, 23, pp. 3007-3033.
McAfee, A., and Brynjolfsson, E. (2012) ‘Big data: the management revolution’, Harvard Business Review. Web.
Provost, F., and Fawcett, T. (2013) Data science for business: what you need to know about data mining and data-analytic thinking. 1st edn. Sebastopol: O’Reilly Media.
Rocks, J. W., and Mehta, P. (2022) ‘Memorizing without overfitting: bias, variance, and interpolation in overparameterized models’, Physical Review Research, 4(1), pp. 1-10.
Xie, J. et al. (2019) ‘Deep kernel learning via random Fourier features’, arXiv, 1910, pp. 1-8.