Comparison of optimization methods for neural network training

  • N. Polishchuk LNTU
  • S. Hrinyuk LNTU
  • S. Datsyuk LNTU
Keywords: optimization methods, neural networks, gradient descent, stochastic gradient descent, TensorFlow, machine learning, convolutional neural networks

Abstract

Modern methods of training neural networks amount to finding the minimum of a continuous error function. In recent years, a variety of optimization algorithms have been proposed, each taking a different approach to updating the model's weight parameters. This article describes the most common optimization methods used in training neural networks and provides a comparative analysis of these methods using the example of training a simple convolutional neural network on the MNIST data set. Various implementations of gradient descent, momentum-based methods, and adaptive methods are analysed, and typical problems arising in their use are summarized.
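The experiment itself is not reproduced here, but a minimal sketch of the kind of comparison the abstract describes might look as follows, assuming TensorFlow 2.x with the Keras API. The network architecture, learning rates, batch size, and epoch count are illustrative assumptions, not the authors' settings.

```python
# Sketch: train the same small CNN on MNIST under several optimizers
# and compare final test accuracy. Hyperparameters are illustrative.
import tensorflow as tf

def build_model():
    # A simple convolutional network: one conv/pool block, then a dense classifier.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # scale pixels to [0, 1], add channel axis
x_test = x_test[..., None] / 255.0

# One representative optimizer from each family discussed in the article:
# plain gradient descent, a momentum method, and two adaptive methods.
optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.01),
    "SGD + momentum": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "RMSprop": tf.keras.optimizers.RMSprop(learning_rate=0.001),
    "Adam": tf.keras.optimizers.Adam(learning_rate=0.001),
}

for name, opt in optimizers.items():
    model = build_model()  # fresh weights for every optimizer
    model.compile(optimizer=opt,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, batch_size=128, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"{name}: test accuracy = {acc:.4f}")
```

Because each optimizer starts from freshly initialized weights, differences in the printed accuracies reflect the update rule and its learning rate rather than a shared starting point; a fairer comparison would also fix the random seed and tune the learning rate per method.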

Published
2020-02-19
How to Cite
Polishchuk, N., Hrinyuk, S., & Datsyuk, S. (2020). Comparison of optimization methods for neural network training. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (35), 177-183. Retrieved from https://cit.lntu.edu.ua/index.php/cit/article/view/71
Section
Computer science and computer engineering