Comparison of optimization methods for neural network training
Abstract
Modern methods of training neural networks amount to finding the minimum of a continuous error function. Over the past years, a variety of optimization algorithms have been proposed that take different approaches to updating the model's weight parameters. This article describes the most common optimization methods used in neural network training and provides a comparative analysis of these methods on the example of training a simple convolutional neural network on the MNIST data set. Various implementations of gradient descent, momentum methods, and adaptive methods are analysed, and typical problems arising in their use are summarized.
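For orientation, the parameter-update rules of the three families compared in the article can be sketched as follows. This is a minimal illustrative sketch in plain NumPy, not the experimental code used for the MNIST comparison; the function names and all hyperparameter values are assumptions chosen for clarity.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain gradient descent: move the weights against the gradient.
    return w - lr * grad

def momentum_step(w, grad, velocity, lr=0.01, beta=0.9):
    # Momentum method: accumulate an exponentially decaying sum of past gradients
    # and update the weights with this "velocity" instead of the raw gradient.
    velocity = beta * velocity + lr * grad
    return w - velocity, velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adaptive method (Adam): keep running estimates of the first and second
    # moments of the gradient and scale the step per parameter.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)   # bias correction for the second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

In a training loop, each step receives the gradient of the loss for the current mini-batch; the differences between the methods lie only in how that gradient is turned into a weight update.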