On the application of deep learning with reinforcement in modern systems

  • A. Maltsev, National Technical University of Ukraine "Kyiv Polytechnic Institute named after Igor Sikorsky"
Keywords: artificial intelligence, machine learning, deep reinforcement learning, system.

Abstract

The article presents the principles of applying deep reinforcement learning in modern systems. It is emphasized that reinforcement learning serves to adapt a non-Markovian decision-making model to the current situation by analyzing the history of the decision-making process, which improves the quality of decisions. The principle behind reinforcement learning is described, and the scheme of agent-environment interaction is outlined. For a detailed description, the 2D pole-balancing problem is used as the basis of the mathematical treatment. It is emphasized that modern systems most often use two reinforcement learning schemes: the temporal-difference method and the Monte Carlo method. Each method is substantiated mathematically, and the architecture of a deep Q-network is proposed. Model-based and model-free methods are described: model-based reinforcement learning makes the agent try to understand the world and build a model to represent it; such a model captures two functions, the transition function and the reward function, and once the agent has access to this model it can plan accordingly. However, it is noted that learning a model is not necessary: the agent can instead learn the policy directly, using algorithms such as Q-learning or policy gradient. A deep Q-network uses a convolutional neural network to interpret the graphical representation of the environment's input state directly. It is substantiated that the deep Q-network can be considered a parameterized policy network that is continuously trained to approximate the optimal policy; mathematically, the deep Q-network minimizes a loss function derived from the Bellman equation, which is effective in reducing training time. However, using a neural network to approximate the value function proved unstable and could diverge because of the bias resulting from correlated samples.
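To make the abstract's mathematical claims concrete, the update rules below are the standard textbook formulations of the two reinforcement schemes and of the deep Q-network loss; the notation is supplied here for illustration and is not reproduced from the article itself.

Temporal differences (Q-learning):
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

Monte Carlo (update toward the full episode return G_t):
V(s_t) \leftarrow V(s_t) + \alpha \left[ G_t - V(s_t) \right], \qquad G_t = \sum_{k=0}^{T-t-1} \gamma^{k} r_{t+k+1}

Deep Q-network loss derived from the Bellman equation, where \theta^{-} denotes a periodically updated target network; together with experience replay, this is the standard way of countering the instability from correlated samples noted in the abstract:
L(\theta) = \mathbb{E}_{(s,a,r,s')} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]

The agent-environment interaction loop described in the abstract can be sketched minimally in Python, assuming the Gymnasium CartPole-v1 environment as a stand-in for the 2D pole-balancing problem and a random action choice as a placeholder for a trained Q-network; this is an illustration under those assumptions, not the author's implementation.

import gymnasium as gym

# CartPole-v1 is assumed here as a stand-in for the 2D pole-balancing problem.
env = gym.make("CartPole-v1")
state, _ = env.reset(seed=0)

episode_return = 0.0
done = False
while not done:
    # Placeholder policy: a trained deep Q-network would instead select
    # the action with the highest predicted Q(state, action) value.
    action = env.action_space.sample()
    state, reward, terminated, truncated, _ = env.step(action)
    episode_return += reward
    done = terminated or truncated

env.close()
print("Episode return:", episode_return)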


Published: 2021-10-28
How to cite: Maltsev, A. (2021). On the application of deep learning with reinforcement in modern systems. Computer-Integrated Technologies: Education, Science, Production, (44), 37-43. https://doi.org/10.36910/6775-2524-0560-2021-44-06