Predicting Inhibitor Development in Hemophilia 'A' using Machine Learning: A Comprehensive Approach to Data Preprocessing, Balancing, and Biomarker Identification Using AI on the CHAMP Dataset

BACKGROUND: Hemophilia 'A' (HA) is a genetic blood disorder characterized by a deficiency of Factor VIII (FVIII), with treatment often triggering the development of neutralizing antibodies (inhibitors) to FVIII. Predicting the development of these inhibitors is crucial for clinical applications but presents significant computational challenges due to data imbalance, skewed data, and inadequate data sanitization.
OBJECTIVE: This study aimed to develop a machine-learning/AI approach to find biomarkers and predict the development of inhibitors to Factor VIII in patients with Hemophilia 'A,' addressing the challenges associated with data imbalance and enhancing prediction accuracy.
METHODS: The data were sanitized and encoded for prediction, and the Random Over-sampling (ROS) technique was employed to resolve class imbalance in the CHAMP dataset. Several machine-learning classification models, including Random Forest, XGBoost, CatBoost, Logistic Regression, Gradient Boosting, and LightGBM, were utilized. Hyperparameters were tuned using GridSearchCV with stratified k-fold cross-validation. Model performance was evaluated based on accuracy, precision, recall, and F1 score. The Random Forest model was further analyzed using the explainable AI (XAI) tool SHAP (SHapley Additive exPlanations) to identify the variables influencing its predictions.
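As an illustration of the ROS step, the following is a minimal sketch in plain Python, not the study's actual pipeline. The function name `random_oversample`, the fixed seed, and the toy records are assumptions made for this example. It duplicates randomly chosen minority-class records until every class matches the size of the largest class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class (sampling with replacement) until all classes reach the size of
    the largest class. Original rows are kept unchanged and come first."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Candidate rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw as many duplicates as needed to reach the target size
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Toy, made-up data: six records of class 0 and two of class 1.
rows = [[i] for i in range(8)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]
bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

The abstract does not state whether oversampling was applied before or within cross-validation. In general, oversampling only the training portion of each fold avoids duplicated records appearing in both training and validation data.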
RESULTS: The Random Forest model outperformed the other classifiers, achieving a mean accuracy of 97.37%, along with closely aligned precision, recall, and F1 scores. The XAI tool SHAP facilitated the ranking of variables, including Clinical Severity, Variant Type, Exon, HGVS cDNA, and hg19 Coordinates, according to their impact on the model's predictions. Additionally, the study identified biomarkers associated with FVIII inhibition.
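For reference, SHAP attributes each prediction to individual variables through Shapley values. The standard definition is given below in notation assumed for this note rather than taken from the study: F is the set of input variables, S is a subset that excludes variable i, and v(S) is the model's expected output when only the variables in S are known.

```latex
\phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}}
\frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
\,\bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
```

A common convention, which may or may not be the one used in the study, is to rank variables by the mean of |φ_i| across all records.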
CONCLUSION: This study presents a breakthrough in the early prediction of inhibitor development in Hemophilia 'A' patients, paving the way for personalized and effective treatment programs. The integration of the preprocessing pipeline, Random Forest model, and SHAP analysis offers a novel solution for guiding treatment strategies for HA patients, which could significantly enhance the development of targeted and effective therapies.

Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.net.
Current pharmaceutical biotechnology, 2025-04-24