Software comparison of scientific article annotations using statistical natural language processing methods
Abstract
This paper presents an approach to the automated comparison of scientific article abstracts using statistical natural language processing (NLP) techniques. The authors analyze current methods, including cosine similarity, Jaccard similarity, and TF-IDF, alongside clustering methods and machine learning models such as SciBERT and BioBERT. A research-based software model is proposed to improve text similarity assessment, supporting efficient analysis of the scientific literature, reducing duplicated research, and improving bibliographic accuracy. The study highlights practical applications of NLP techniques in academic publishing, plagiarism detection, and automated literature review systems. The proposed system integrates several computational approaches to refine text analysis and classification, making it a useful tool for researchers and journal editors. Future research directions include optimizing the NLP algorithms, incorporating deep learning methods, and integrating the system with major scientific databases to further enhance its applicability and performance in academic and industrial contexts.
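To make the statistical measures named in the abstract concrete, the following is a minimal sketch of Jaccard similarity and TF-IDF-weighted cosine similarity applied to short abstracts. It is not the authors' implementation: the function names, the simple regular-expression tokenizer, the smoothed IDF variant, and the three sample strings are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a, b):
    """Jaccard similarity: size of the token-set intersection over the union."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical (assumption)
    return len(set_a & set_b) / len(set_a | set_b)


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight dict) per document.

    Assumes a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    occurring in every document keep a small non-zero weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample abstracts, used only for illustration.
abstracts = [
    "Statistical methods for comparing scientific abstracts.",
    "Comparing scientific abstracts with statistical NLP methods.",
    "Deep learning for protein structure prediction.",
]

vectors = tfidf_vectors(abstracts)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
jac_related = jaccard_similarity(abstracts[0], abstracts[1])

print(f"cosine (related):   {sim_related:.3f}")
print(f"cosine (unrelated): {sim_unrelated:.3f}")
print(f"jaccard (related):  {jac_related:.3f}")
```

In this sketch the two abstracts on the same topic score higher than the unrelated pair under both measures. Jaccard similarity only looks at which words are shared, while the TF-IDF weighting additionally discounts words that appear in many documents. A production system would more likely rely on an established library and on contextual embeddings from models such as SciBERT or BioBERT.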