Feature Selection for Classification based on Machine Learning algorithms for Prostate Cancer
P S, P R R, K P
Microarray technology has advanced biotechnological research considerably in recent years. It provides the expression levels of the many genes involved in a particular disease. Prostate cancer has become a life-threatening disease, and the genes causing it can be identified through classification methods. Gene expression data, however, are high-dimensional with small sample sizes, which poses serious challenges for existing classification algorithms. Feature selection techniques are applied to address this dimensionality problem. This paper analyzes feature selection methods for the classification of prostate cancer gene expression data and identifies the significant genes that cause the disease. Three types of feature selection methods, namely filters, wrappers, and embedded selectors, are applied before classification to select the top-ranked genes. The selected top-ranked genes are then passed to the classification algorithms SVM, k-NN, Random Forest, and Artificial Neural Network. With feature selection included, classification accuracy improves significantly even with fewer genes. The Random Forest classification algorithm outperforms the other classification methods. The significant genes identified as having a major influence on prostate cancer include KLK3, GFI1, CXCR2, and TNFRSF10C.
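The select-then-classify workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a synthetic toy matrix in place of real microarray data, shows only a filter-style ranking (a signal-to-noise style score) followed by k-NN, and all function names (make_toy_data, filter_scores, top_k_genes, knn_predict, loo_accuracy) are hypothetical. Wrapper and embedded selectors, and the SVM, Random Forest, and neural network classifiers, are not shown.

```python
import random
import statistics


def make_toy_data(n_samples=40, n_genes=200, n_informative=5, seed=0):
    """Synthetic stand-in for a microarray matrix: few samples, many genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate the two classes
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        # Assumption: only the first n_informative genes differ between classes.
        for g in range(n_informative):
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def filter_scores(X, y):
    """Filter-style score per gene: |mean difference| / sum of class std devs."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-12
        scores.append(abs(statistics.fmean(a) - statistics.fmean(b)) / spread)
    return scores


def top_k_genes(X, y, k):
    """Indices of the k highest-scoring genes."""
    scores = filter_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(train_X, train_y)
    )
    votes = [lab for _, lab in dists[:k]]
    return max(set(votes), key=votes.count)


def loo_accuracy(X, y, n_selected=None, k=3):
    """Leave-one-out accuracy; genes are re-selected inside each fold
    so the held-out sample never influences the ranking."""
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if n_selected is None:
            genes = list(range(len(X[0])))  # no feature selection
        else:
            genes = top_k_genes(train_X, train_y, n_selected)
        reduced_train = [[row[g] for g in genes] for row in train_X]
        reduced_test = [X[i][g] for g in genes]
        correct += knn_predict(reduced_train, train_y, reduced_test, k) == y[i]
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("top genes:", sorted(top_k_genes(X, y, 5)))
    print("accuracy, all genes:", loo_accuracy(X, y))
    print("accuracy, 5 selected genes:", loo_accuracy(X, y, n_selected=5))
```

Re-running the ranking inside each leave-one-out fold is a deliberate choice in this sketch: selecting genes on the full data set before cross-validation would leak information from the held-out sample into the accuracy estimate.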
© 2025 IOP Publishing Ltd. All rights, including for text and data mining, AI training, and similar technologies, are reserved.
Biomedical Physics & Engineering Express, 2025-04-24