Software comparison of scientific article abstracts using statistical natural language processing methods

S. Prykhodchenko; O. Prykhodchenko; O. Shevtsova

doi:10.36910/6775-2524-0560-2025-58-18

S. Prykhodchenko https://orcid.org/0000-0002-6562-0601
O. Prykhodchenko https://orcid.org/0000-0001-5080-737X
O. Shevtsova https://orcid.org/0000-0002-0148-5877

Keywords: natural language processing (NLP), automated text analysis, scientific annotations, text comparison, automated literature review

Abstract

This paper presents an approach to the automated comparison of scientific article abstracts using statistical natural language processing (NLP) techniques. The authors analyze state-of-the-art methods, including cosine similarity, Jaccard similarity, and TF-IDF, alongside clustering methods and machine learning models such as SciBERT and BioBERT. A research-based software model is proposed to improve text similarity assessment, with the aim of making scientific literature analysis more efficient, reducing research duplication, and improving bibliographic accuracy. The study highlights practical applications of NLP techniques in academic publishing, plagiarism detection, and automated literature review systems. The proposed system integrates several computational approaches to refine text analysis and classification, which makes it useful to researchers and journal editors. Future research directions include optimizing the NLP algorithms, incorporating deep learning methods, and integrating the system with major scientific databases to further enhance its applicability and performance in academic and industrial contexts.
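
The statistical measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' software model: the tokenizer, the smoothed IDF formula, the function names, and the sample abstracts are all assumptions chosen for illustration, using only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a, b):
    """Jaccard similarity of the two token sets: |A & B| / |A | B|."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    occurring in every document still keep a non-zero weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (n / len(tokens)) * idf[t] for t, n in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample abstracts, invented for this illustration.
abstracts = [
    "Statistical methods for comparing scientific abstracts",
    "Comparing scientific abstracts with statistical NLP methods",
    "Deep learning for image segmentation in medical scans",
]
vectors = tfidf_vectors(abstracts)
sim_01 = cosine_similarity(vectors[0], vectors[1])
sim_02 = cosine_similarity(vectors[0], vectors[2])
print(f"cosine(0, 1) = {sim_01:.3f}, cosine(0, 2) = {sim_02:.3f}")
print(f"jaccard(0, 1) = {jaccard_similarity(abstracts[0], abstracts[1]):.3f}")
```

In this sketch the first two abstracts share most of their vocabulary and therefore score higher than the unrelated third one. Transformer models such as SciBERT or BioBERT would replace the TF-IDF vectors with learned embeddings, which is outside the scope of this example.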

Software comparison of scientific article annotations using statistical natural language processing methods
