Comparison of neural network optimization methods using the image classification problem.
Keywords:
neural networks, stochastic gradient descent, optimization methods, training of neural networks, distributed computing, asynchronous server
Abstract
The article analyzes existing optimization methods and types of distributed computing for neural network training. Based on the conducted experiments, the feasibility of applying these methods to different data types and neural network architectures is assessed.
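As a concrete illustration of the kind of comparison the abstract describes, the sketch below trains the same small convolutional network with two of the optimizers covered in the references (SGD and Adam) and reports test accuracy for each. This is a minimal TensorFlow sketch, not the authors' experimental code: the dataset (MNIST), the toy architecture, and the hyperparameters are assumptions chosen for brevity, whereas the cited benchmarks target ImageNet-scale models such as Inception, ResNet, and VGG.

import tensorflow as tf

def build_model():
    # Toy CNN stand-in for the larger architectures cited in the references.
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

# MNIST is an assumption here; the cited experiments use ImageNet.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

for name, optimizer in [("SGD", tf.keras.optimizers.SGD(learning_rate=0.01)),
                        ("Adam", tf.keras.optimizers.Adam(learning_rate=0.001))]:
    model = build_model()  # fresh weights so each optimizer starts equally
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, batch_size=128, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"{name}: test accuracy = {acc:.4f}")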
References
Wilson, D. R. and Martinez, T. R. (2003). The general inefficiency of batch training for gradient descent learning. Neural Networks, 16(10), 1429–1451.
Rao, C. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–89.
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. (2015). Rethinking the Inception Architecture for Computer Vision. arXiv preprint arXiv:1512.00567. URL: http://arxiv.org/abs/1512.00567.
He, K., Zhang, X., Ren, S., and Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv preprint arXiv:1512.03385. URL: http://arxiv.org/abs/1512.03385.
Simonyan, K. and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556. URL: http://arxiv.org/abs/1409.1556.
ImageNet [Online]. Available at: http://www.image-net.org/ (accessed 25.10.2019).
TensorFlow benchmarks (2018). URL: https://github.com/tensorflow/benchmarks/tree/master/scripts/tf_cnn_benchmarks.
Lemaréchal, C. (2012). Cauchy and the Gradient Method. Documenta Mathematica, Extra Volume ISMP. URL: math.uni-bielefeld.de/documenta/vol-ismp/40_lemarechal-claude.pdf.
Russell, S. J. and Norvig, P. (2003). Artificial Intelligence: A Modern Approach. Prentice Hall.
Duchi, J., Hazan, E., and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.
Tang, Y., Salakhutdinov, R., and Hinton, G. (2012). Deep mixtures of factor analysers. arXiv preprint arXiv:1206.4635.
Kingma, D. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Published
2019-12-28
How to Cite
Polishchuk, M., Kostiuchko, S., & Khrystynets, M. (2019). Comparison of neural network optimization methods using the image classification problem. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (37), 43-52. https://doi.org/10.36910/6775-2524-0560-2019-37-7
Section
Computer science and computer engineering