A Turkish Text Classification Based Feature Selection and Density Peaks Clustering
Citation
Zorarpaci, E. (2023). A Turkish Text Classification Based Feature Selection and Density Peaks Clustering [Öznitelik Seçimi ve Yoğunluk Tepelerini Kümelemeye Dayalı Türkçe Metin Sınıflandırma]. 31st IEEE Conference on Signal Processing and Communications Applications, SIU 2023.
Abstract
Text classification, a well-known natural language processing (NLP) task, is the process of assigning documents to categories according to their content. Both the choice of classification algorithm and the selection of the right features strongly affect how efficiently this can be done. In this study, features are first selected from the texts with the IG (information gain) method, taking into account their TF (term frequency) and IDF (inverse document frequency) values, and the texts are then assigned to categories with the DPC (density peaks clustering) algorithm, which the study uses as a semi-supervised method. The experiments use the TTC-3600 dataset, which contains texts collected from six well-known Turkish news portals and covering six different topics. On this dataset, the proposed approach outperformed previously reported results.
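The abstract outlines a pipeline of IG-based feature selection, TF-IDF weighting, and density peaks clustering. The Python sketch below illustrates those stages on a toy corpus. It is not the author's implementation, and several details are assumptions not stated in the abstract: the whitespace tokenizer, IG computed on binary term presence, the smoothed IDF formula, cosine distance between TF-IDF vectors, the cutoff-kernel density with threshold `dc`, and picking the `n_clusters` points with the largest ρ·δ as cluster centres. All function names and the toy documents are hypothetical. Class labels are used here only for feature selection; how the study combines labels with DPC in its semi-supervised setting is not specified in the abstract.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); real Turkish text
    # would also need normalization, stop-word removal and stemming.
    return text.lower().split()


def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, term):
    # IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t), binary term presence.
    n = len(labels)
    gain = entropy(list(Counter(labels).values()))
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    for part in (with_t, without_t):
        if part:
            gain -= len(part) / n * entropy(list(Counter(part).values()))
    return gain


def select_features(docs, labels, k):
    # Keep the k terms with the highest IG; ties keep alphabetical order.
    vocab = sorted({t for d in docs for t in d})
    return sorted(vocab, key=lambda t: -information_gain(docs, labels, t))[:k]


def tfidf_vectors(docs, features):
    # TF-IDF over the selected terms only, using the smoothed IDF
    # ln((1 + n) / (1 + df)) + 1 (one common variant; an assumption here).
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in features}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in features}
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] * idf[t] for t in features])
    return vectors


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat empty vectors as maximally distant
    return max(0.0, 1.0 - dot / (nu * nv))


def density_peaks(points, n_clusters, dc):
    n = len(points)
    dist = [[cosine_distance(p, q) for q in points] for p in points]
    # Local density rho: number of other points closer than the cutoff dc.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    # Process points in decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    pos = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: its largest distance
            continue
        # delta: distance to the closest point that is denser in the order.
        j = min(order[:rank], key=lambda j: dist[i][j])
        delta[i] = dist[i][j]
        nearest_denser[i] = j
    # Centres: largest gamma = rho * delta; ties go to the denser point.
    centres = sorted(range(n), key=lambda i: (-rho[i] * delta[i], pos[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Every other point joins the cluster of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels


if __name__ == "__main__":
    # Hypothetical toy corpus: three sports and three economy "documents".
    texts = ["maç gol takım", "gol takım lig", "takım maç lig",
             "borsa faiz dolar", "faiz dolar enflasyon", "borsa enflasyon dolar"]
    labels = ["spor"] * 3 + ["ekonomi"] * 3
    docs = [tokenize(t) for t in texts]
    features = select_features(docs, labels, k=2)
    vectors = tfidf_vectors(docs, features)
    print(features, density_peaks(vectors, n_clusters=2, dc=0.5))
    # prints ['dolar', 'takım'] [0, 0, 0, 1, 1, 1]
```

Because non-centre documents are processed in decreasing density, each one's nearest denser neighbour has already been labelled when it is reached. Mapping the resulting clusters back to news categories (for example, by a majority vote over labelled documents) would be a separate step that this sketch leaves out.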