Adaptation of the Knowledge Distillation Method in Natural Language Processing for Sentiment Analysis
Abstract
This paper describes how to adapt the "knowledge distillation" method to sentiment analysis for the Ukrainian and Russian languages. It demonstrates how to reduce computational resources and cloud expenses while speeding up text sentiment recognition, without losing much accuracy. For the research we used two different neural network architectures for natural language processing: BERT as the large model in place of an ensemble, and FastText as the small model. Combining these two networks (BERT as the teacher and FastText as the student) allowed us to achieve a speedup of up to 5 times on the sentiment analysis task without sacrificing much accuracy.
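The teacher-student setup described above can be illustrated with a minimal sketch in Python, assuming the transformers and fasttext packages are available: a fine-tuned BERT teacher labels a pool of unlabeled texts, and the small FastText student is then trained on those pseudo-labels. The model path, file names and example sentences are illustrative placeholders rather than the paper's actual data, and the sketch transfers hard pseudo-labels as a simplification of the knowledge transfer step.

import fasttext
from transformers import pipeline

# Teacher: any fine-tuned BERT sentiment classifier (placeholder model path).
teacher = pipeline("text-classification", model="path/to/finetuned-bert-sentiment")

# Unlabeled texts in the target languages (placeholder examples).
unlabeled_texts = [
    "The service was excellent.",
    "I will never buy this again.",
]

# 1. The teacher annotates the unlabeled pool.
with open("distill_train.txt", "w", encoding="utf-8") as f:
    for text in unlabeled_texts:
        prediction = teacher(text)[0]          # e.g. {"label": "POSITIVE", "score": 0.98}
        label = prediction["label"].lower()
        f.write(f"__label__{label} {text}\n")  # fastText supervised training format

# 2. The small student model is trained on the teacher's labels.
student = fasttext.train_supervised(
    input="distill_train.txt",
    epoch=25,
    lr=0.5,
    wordNgrams=2,
)

# 3. The student now serves predictions much faster than the teacher.
print(student.predict("The service was excellent."))

In this simplified form the student only sees the teacher's predicted classes; a fuller distillation pipeline would also exploit the teacher's output probabilities, but the overall teacher-student workflow is the same.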
References
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L. and Polosukhin, I., 2017. Attention Is All You Need. [online] arXiv.org. Available at: https://arxiv.org/abs/1706.03762
Abdur Rahman, Mobashir Sadat, Saeed Siddik, "Sentiment Analysis on Twitter Data: Comparative Study on Different Approaches", International Journal of Intelligent Systems and Applications (IJISA), Vol.13, No.4, pp.1-13, 2021. DOI: 10.5815/ijisa.2021.04.01
Golam Mostafa, Ikhtiar Ahmed, Masum Shah Junayed, "Investigation of Different Machine Learning Algorithms to Determine Human Sentiment Using Twitter Data", International Journal of Information Technology and Computer Science (IJITCS), Vol.13, No.2, pp.38-48, 2021. DOI: 10.5815/ijitcs.2021.02.04
Khalid Mahboob, Fayyaz Ali, Hafsa Nizami, "Sentiment Analysis of RSS Feeds on Sports News – A Case Study", International Journal of Information Technology and Computer Science (IJITCS), Vol.11, No.12, pp.19-29, 2019. DOI: 10.5815/ijitcs.2019.12.02
Hinton, G., Vinyals, O. and Dean, J., 2015. Distilling the Knowledge in a Neural Network. [online] arXiv.org. Available at: https://arxiv.org/abs/1503.02531
C. Buciluǎ, R. Caruana, and A. Niculescu-Mizil. Model compression. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '06, pages 535–541, New York, NY, USA, 2006. ACM.
N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
Dalal AL-Alimi, Yuxiang Shao, Ahamed Alalimi, Ahmed Abdu, "Mask R-CNN for Geospatial Object Detection", International Journal of Information Technology and Computer Science (IJITCS), Vol.12, No.5, pp.63-72, 2020. DOI: 10.5815/ijitcs.2020.05.05
Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You Only Look Once: Unified, Real-Time Object Detection. [online] arXiv.org. Available at: https://arxiv.org/abs/1506.02640
Tatoeba: Collection of sentences and translations, 2021. [online] tatoeba.org. Available at: https://tatoeba.org/en/
Devlin, J., Chang, M., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. [online] arXiv.org. Available at: https://arxiv.org/abs/1810.04805
Joulin, A., Grave, E., Bojanowski, P. and Mikolov, T., 2016. Bag of Tricks for Efficient Text Classification. [online] arXiv.org. Available at: https://arxiv.org/abs/1607.01759
Kaggle: Your Machine Learning and Data Science Community, 2021. [online] Available at: https://www.kaggle.com/
Scaleway. 2021. Cloud, Compute, Storage and Network models and pricing. [online] Available at: https://www.scaleway.com/en/pricing/