A comparative study of text preprocessing methods in Orange Data Mining and KNIME
Abstract
This paper compares the results of text data preprocessing in KNIME Analytics Platform and Orange Data Mining. It describes the research methodology in detail, including the tools each program offers for preprocessing textual data and how models are set up and built within the capabilities of each system. The results are analyzed with visualization tools and presented in several formats. The advantages and disadvantages of each tool are identified and summarized as recommendations on which system to use under different conditions. In both systems, preprocessing cleaned the data of noise, unwanted words, and syntactic elements, which made it possible to identify key themes and trends in the test material.
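The cleaning steps named in the abstract (removing noise, unwanted words, and syntactic elements, then surfacing key themes) can be illustrated with a minimal sketch in plain Python. It is not how Orange's Preprocess Text widget or KNIME's text processing nodes are implemented. It simply chains a common sequence: lowercasing, stripping non-letter characters, whitespace tokenization, stopword filtering, and term-frequency counting. The function names `preprocess` and `top_terms` and the tiny `STOPWORDS` set are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools ship much larger ones.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on",
             "the", "to", "was", "were", "with"}


def preprocess(text: str) -> list[str]:
    """Lowercase, drop non-letter characters, tokenize, and filter stopwords."""
    text = text.lower()
    # Replace punctuation, digits and other symbols ("noise") with spaces
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    # Keep tokens that are not stopwords and are longer than one character
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def top_terms(docs: list[str], k: int = 3) -> list[tuple[str, int]]:
    """Count term frequencies across all preprocessed documents."""
    counts: Counter[str] = Counter()
    for doc in docs:
        counts.update(preprocess(doc))
    return counts.most_common(k)


if __name__ == "__main__":
    docs = [
        "The model was trained on text data.",
        "Text data is noisy; preprocessing cleans text!",
    ]
    print(preprocess(docs[0]))
    print(top_terms(docs, 2))
```

On the two sample sentences, the most frequent remaining terms are "text" and "data", which is the kind of key-theme signal the abstract refers to once stopwords and punctuation are gone. Both programs expose further steps (stemming or lemmatization, n-grams, document-frequency filtering) that this sketch leaves out.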