A comparative study of text preprocessing methods in Orange Data Mining and KNIME

Keywords: text preprocessing, clustering, Orange Data Mining, data mining, KNIME Analytics, word cloud, text to vector

Abstract

The paper compares the results of text data preprocessing in the KNIME Analytics Platform and Orange Data Mining software systems. The research methodology is described in detail, including the tools used for preprocessing text data and the configuration and construction of models according to the capabilities of each program. The results are analyzed with visualization tools and presented in several formats. The advantages and disadvantages of each tool are identified, and recommendations are given for choosing one system or the other under different conditions. In both systems, preprocessing cleaned the data of noise, unwanted words, and syntactic elements, which made it possible to highlight key themes and trends in the test material.
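
The kind of cleaning summarized in the abstract (removing noise, unwanted words, and syntactic elements before looking for key themes) can be illustrated with a minimal Python sketch. This is not the workflow from the paper, which is built from Orange and KNIME components; the stop-word list, the function names `preprocess` and `term_frequencies`, and the sample sentences are assumptions made only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only; real tools ship
# much larger, language-specific lists.
STOP_WORDS = {
    "the", "a", "an", "of", "and", "in", "to", "is", "are", "was",
    "were", "it", "this", "that", "for", "on", "with",
}


def preprocess(text):
    """Lowercase, strip URLs and punctuation, and drop stop words."""
    text = text.lower()
    # Remove URLs, a common source of noise in raw text.
    text = re.sub(r"https?://\S+", " ", text)
    # Keep alphabetic tokens only; punctuation and digits are discarded.
    tokens = re.findall(r"[a-z]+", text)
    # Drop stop words and very short tokens.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]


def term_frequencies(docs):
    """Count cleaned tokens over all documents (the input to a word cloud)."""
    counts = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts


# Made-up sample documents.
docs = [
    "The data were cleaned of noise: see https://example.org for details!",
    "Cleaning the text data makes key themes in the data easier to find.",
]
freqs = term_frequencies(docs)
print(freqs.most_common(3))
```

The resulting term counts are the sort of input a word cloud visualization consumes; stemming or lemmatization, which both tools also offer, is left out to keep the sketch short.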

Published
2024-09-28
How to Cite
Koval, I., & Surynovych, O. (2024). A comparative study of text preprocessing methods in Orange Data Mining and KNIME. COMPUTER-INTEGRATED TECHNOLOGIES: EDUCATION, SCIENCE, PRODUCTION, (56), 191-198. https://doi.org/10.36910/6775-2524-0560-2024-56-24
Section
Computer science and computer engineering