Multi-Model Machine Learning Framework for Lung Cancer Risk Prediction: A Comparative Analysis of Nine Classifiers with Hybrid and Ensemble Approaches Using Behavioral and Hematological Parameters

Lung cancer (LC) remains the leading cause of cancer death worldwide, which calls for more advanced detection strategies. This study examines 34 demographic, behavioral, and hematological risk factors in a dataset of 2,000 patient records. A multi-model machine learning (ML) framework compares nine algorithms: k-nearest neighbors (KNN), AdaBoost (AB), logistic regression (LR), random forest (RF), support vector machine (SVM), naive Bayes (NB), decision tree (DT), gradient boosting (GB), and stochastic gradient descent (SGD). Evaluation on accuracy, sensitivity, specificity, F1-score, and area under the ROC curve (AUC) revealed clear quantitative differences between models: GB achieved the highest F1-score (0.953) and the highest sensitivity (99.1%), with NB second on F1-score (0.945). The KNN-AB hybrid model achieved the highest accuracy (99.5%), while RF achieved the highest AUC (0.92). The ensemble methods (RF, GB) performed robustly across metrics by combining the complementary strengths of their base learners. Lasso and ridge regularization reduced overfitting and made the resulting models easier to interpret. Potential clinical applications include integration into electronic health records (EHRs) for automated risk stratification, earlier LC screening, and public health interventions for high-risk individuals (e.g., smokers with abnormal hematological markers). The findings highlight the value of hybrid ML models that integrate behavioral and biological data for effective LC prediction. Future work could extend predictive capability by incorporating imaging and genomic data, further advancing early detection and patient-specific treatment options. This work sits at the intersection of computational methods and clinical translation and offers a scalable approach to LC diagnosis worldwide.
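The abstract reports five evaluation metrics without stating how they were computed. The following minimal Python sketch shows the standard definitions from a binary confusion matrix, with AUC computed through its rank (Mann-Whitney) interpretation. It assumes labels coded 1 = LC and 0 = no LC and a 0.5 decision threshold; the function names and toy data are hypothetical illustrations, not the authors' code or data. Libraries such as scikit-learn provide equivalent, faster implementations (e.g., `sklearn.metrics.roc_auc_score`).

```python
from __future__ import annotations

from typing import Sequence

# Assumption: labels are binary, with 1 = lung cancer (positive class), 0 = no LC.


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for binary 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: a missed LC case
            fn += 1
    return tp, fp, tn, fn


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _safe_div(tp, tp + fn)  # recall, true-positive rate
    specificity = _safe_div(tn, tn + fp)  # true-negative rate
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties = 1/2.

    O(n_pos * n_neg) pairwise comparison; adequate for a sketch, not large data.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data for illustration only; not from the study's 2,000-record dataset.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
    print(threshold_metrics(y_true, y_pred))
    # → {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75, 'f1': 0.75}
    print(roc_auc(y_true, scores))  # → 0.9375
```

Because AUC depends only on how the scores rank positives above negatives, whereas accuracy, sensitivity, specificity, and F1 depend on a chosen threshold, one model can lead on accuracy while another leads on AUC; this may partly explain why the KNN-AB hybrid and RF top different metrics.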

Copyright © 2025 The Author(s). Published by Elsevier Inc. All rights reserved.
SLAS Technology, 2025-06-27