Integrating multi-cohort machine learning and clinical sample validation to explore peripheral blood mRNA diagnostic biomarkers for prostate cancer
Zhong X, Yang Y, He H, Xiong Y, Zhong M, Wang S, Xia Q
BACKGROUND: The global incidence of prostate cancer (PCa) has been rising annually, and early diagnosis and treatment remain pivotal for improving therapeutic outcomes and patient prognosis. Advances in liquid biopsy technology have facilitated disease diagnosis and monitoring, and the minimally invasive nature and low heterogeneity of liquid biopsy make it a promising approach for predicting disease progression. However, current liquid biopsy strategies for PCa predominantly rely on prostate-specific antigen (PSA), whose limited specificity compromises diagnostic accuracy. Thus, there is an urgent need to identify novel liquid biopsy biomarkers to enable early and precise PCa diagnosis.
METHODS: We integrated 12 machine learning algorithms to construct 113 combinatorial models, then screened and validated an optimal PCa diagnostic panel across five datasets from the TCGA and GEO databases. Subsequently, the biological feasibility of the selected predictive model was verified in one prostate epithelial cell line and five PCa cell lines. The expression of robust RNA diagnostic targets was further validated in plasma samples to establish an RNA-based liquid biopsy strategy for PCa. Finally, plasma samples from patients with PCa or benign prostatic hyperplasia (BPH) at Wuhan Tongji Hospital were collected to evaluate the strategy's clinical significance.
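The screening idea described above (pair a feature-selection step with a classifier, then rank every pairing by its mean AUC across cohorts) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two selectors, the two classifiers, all function names, and the toy expression values are hypothetical stand-ins for the 12 algorithms and the TCGA/GEO cohorts.

```python
from itertools import product
from statistics import mean

def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def class_means(X, y, j):
    """Mean of feature j in cases (label 1) and controls (label 0)."""
    pos = mean(row[j] for row, lab in zip(X, y) if lab == 1)
    neg = mean(row[j] for row, lab in zip(X, y) if lab == 0)
    return pos, neg

# Hypothetical feature selectors: return the indices of the features to keep.
def select_all(X, y):
    return list(range(len(X[0])))

def select_top2_by_mean_gap(X, y):
    gaps = []
    for j in range(len(X[0])):
        pos, neg = class_means(X, y, j)
        gaps.append((abs(pos - neg), j))
    return sorted(j for _, j in sorted(gaps, reverse=True)[:2])

# Hypothetical classifiers: fit on training data, return a per-sample scoring function.
def fit_signed_sum(X, y, feats):
    signs = {}
    for j in feats:
        pos, neg = class_means(X, y, j)
        signs[j] = 1.0 if pos >= neg else -1.0
    return lambda row: sum(signs[j] * row[j] for j in feats)

def fit_centroid(X, y, feats):
    cents = {j: class_means(X, y, j) for j in feats}
    # Higher score = closer to the case centroid than to the control centroid.
    return lambda row: sum((row[j] - cents[j][1]) ** 2 - (row[j] - cents[j][0]) ** 2
                           for j in feats)

def rank_combinations(train, cohorts, selectors, classifiers):
    """Fit every selector/classifier pairing on train, rank by mean AUC over the cohorts."""
    X, y = train
    results = []
    for (sname, sel), (cname, fit) in product(selectors.items(), classifiers.items()):
        feats = sel(X, y)
        score = fit(X, y, feats)
        aucs = [auc([score(r) for r in Xv], yv) for Xv, yv in cohorts]
        results.append((mean(aucs), f"{sname} + {cname}", feats))
    return sorted(results, reverse=True)

# Toy data (synthetic): feature 0 is up in cases, feature 1 is down, feature 2 is noise.
train = ([[5, 1, 3], [6, 2, 1], [7, 1, 2], [1, 6, 2], [2, 5, 3], [1, 7, 1]],
         [1, 1, 1, 0, 0, 0])
cohorts = [([[6, 1, 9], [5, 2, 0], [2, 6, 5], [1, 5, 4]], [1, 1, 0, 0]),
           ([[4, 3, 1], [6, 2, 8], [3, 4, 2], [2, 6, 9]], [1, 1, 0, 0])]
selectors = {"all": select_all, "top2": select_top2_by_mean_gap}
classifiers = {"signed_sum": fit_signed_sum, "centroid": fit_centroid}

ranking = rank_combinations(train, cohorts, selectors, classifiers)
for mean_auc, name, feats in ranking:
    print(f"{name}: mean AUC = {mean_auc:.3f}, features = {feats}")
```

Ranking by the mean over several independent cohorts, rather than by performance in a single cohort, favors pairings that generalize; in the toy data, the pairing that keeps the noise feature and weights it equally falls to the bottom of the list.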
RESULTS: Differential analysis identified 1,071 candidate mRNAs, which were input into the integrated machine learning framework. Among the 113 combinatorial models, the 9-gene diagnostic panel selected by the Stepglm[both] and Enet[alpha = 0.4] algorithms, comprising JPH4, RASL12, AOX1, SLC18A2, PDZRN4, P2RY2, B3GNT8, KCNQ5, and APOBEC3C, demonstrated the highest diagnostic efficacy (mean AUC = 0.91). Cell line experiments further validated AOX1 and B3GNT8 as robust RNA biomarkers, both of which exhibited consistent PCa-specific expression in human plasma samples. In liquid biopsy analyses, AOX1 and B3GNT8 outperformed PSA in diagnostic accuracy, achieving a combined AUC of 0.91. Notably, these biomarkers also demonstrated diagnostic utility in patients with ISUP ≤ 2.
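For readers unfamiliar with the notation, the label Enet[alpha = 0.4] presumably denotes an elastic net with mixing parameter α = 0.4. The abstract does not name the software or the loss function, so the following is a sketch assuming the widely used glmnet parameterization and a binomial (logistic) model, where λ is the overall penalty strength:

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
          - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
          + \alpha\,\lVert\beta\rVert_1\Bigr]
```

Under this convention, α = 0.4 gives the penalty λ(0.3‖β‖₂² + 0.4‖β‖₁): a blend that leans toward ridge-style shrinkage while retaining enough of the lasso term to set some coefficients exactly to zero.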
CONCLUSIONS: Through an integrated machine learning approach and clinical validation, we developed an RNA-based diagnostic panel for PCa. Specifically, we identified AOX1 and B3GNT8 as novel liquid biopsy biomarkers with promising clinical diagnostic value. These findings provide new targets and insights for early and precise PCa diagnosis.
© 2025. The Author(s).
Cancer Cell International, 2025-04-24