Adaptation of distilling knowledge method in Natural Language Processing for sentiment analysis.

O. Korovii, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
Keywords: BERT, FastText, knowledge distillation, neural network, natural language processing, sentiment analysis

Abstract

This paper describes how to adapt the knowledge distillation method to sentiment analysis of Ukrainian and Russian texts. It shows how knowledge distillation reduces the resources required and speeds up text sentiment recognition without losing much accuracy, thereby lowering cloud computing costs. The study uses two different neural network architectures for natural language processing: BERT as the large model (in place of an ensemble of models) and FastText as the small model. Combining the two networks, with BERT as the teacher and FastText as the student, gave up to a fivefold speedup on the sentiment analysis task without sacrificing much accuracy.
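The abstract names the teacher-student setup but not the training objective. For reference, the classic soft-target formulation of Hinton, Vinyals and Dean (2015) can be sketched as follows. This is a generic illustration, not the author's implementation: the temperature, the weight `alpha`, the three-class label set and the logits are hypothetical. Because the standard fastText classifier is trained from labeled text lines, a FastText student may in practice learn from the teacher's predicted labels rather than from a soft-target loss like this one; the abstract does not say which variant was used.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
    temperature: float = 2.0,  # hypothetical value, not taken from the paper
    alpha: float = 0.5,        # hypothetical weight of the soft-target term
) -> float:
    """Weighted sum of a soft-target term and a hard-label term (Hinton et al., 2015)."""
    # Soft targets: the teacher's (e.g. BERT's) tempered class distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions, scaled by T^2 so the
    # gradient magnitude stays comparable when the temperature changes.
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    soft *= temperature ** 2
    # Ordinary cross-entropy of the student against the ground-truth label at T = 1.
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    # Hypothetical logits for three sentiment classes: negative, neutral, positive.
    teacher = [0.5, 1.0, 4.0]
    student = [0.2, 0.8, 2.5]
    print(distillation_loss(student, teacher, true_label=2))
```

Setting `alpha=1.0` trains the student only on the teacher's soft targets; lower values mix in the ground-truth labels. Raising the temperature exposes more of the teacher's relative confidence across the non-maximal classes, which is the extra information a hard label cannot carry.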

Published: 2021-12-23
How to cite: Korovii, O. (2021). Adaptation of distilling knowledge method in Natural Language Processing for sentiment analysis. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (45), 78-83. https://doi.org/10.36910/6775-2524-0560-2021-45-11
Section: Computer science and computer engineering