Pathology-based deep learning features for predicting basal and luminal subtypes in bladder cancer

BACKGROUND: Bladder cancer (BLCA) exhibits profound molecular diversity, with basal and luminal subtypes differing in prognosis and therapeutic outcomes. Traditional methods for molecular subtyping are often time-consuming and resource-intensive. This study aims to develop machine learning models using deep learning features from hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) to predict basal and luminal subtypes in BLCA.
METHODS: RNA sequencing data and clinical outcomes were downloaded from seven public BLCA databases, including TCGA, GEO datasets, and the IMvigor210C cohort, to assess the prognostic value of BLCA molecular subtypes. WSIs from TCGA were used to construct and validate the machine learning models, while WSIs from Shanghai Tenth People's Hospital (STPH) and The Affiliated Guangdong Second Provincial General Hospital of Jinan University (GD2H) were used for external validation. Deep learning models were trained to identify tumor patches within WSIs. WSI-level deep learning features were extracted from tumor patches using the RetCCL model. Support vector machine (SVM), random forest (RF), and logistic regression (LR) classifiers were developed using these features to classify basal and luminal subtypes.
RESULTS: Kaplan-Meier survival analysis and prognostic meta-analysis showed that basal BLCA patients had significantly worse overall survival than luminal BLCA patients (hazard ratio = 1.47, 95% confidence interval: 1.25-1.73, P < 0.001). The LR model based on tumor patch features selected by the ResNet50 model demonstrated superior performance, achieving an area under the curve (AUC) of 0.88 in the internal validation set, and 0.81 and 0.64 in the external validation sets from STPH and GD2H, respectively. This model outperformed both junior and senior pathologists in the differentiation of basal and luminal subtypes (AUC: 0.85, accuracy: 74%, sensitivity: 66%, specificity: 82%).
CONCLUSIONS: This study demonstrated that machine learning models can predict the basal and luminal subtypes of BLCA using deep learning features extracted from tumor patches in H&E-stained WSIs. The performance of the LR model suggests that integrating AI tools into the diagnostic process could enhance the accuracy of molecular subtyping, thereby potentially informing personalized treatment strategies for BLCA patients.
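
ILLUSTRATIVE SKETCH: The following minimal example, which is not the authors' code, illustrates two steps named in METHODS and RESULTS: turning per-patch feature vectors into one WSI-level vector, and computing the reported evaluation metrics (AUC, accuracy, sensitivity, specificity) from classifier scores. Mean pooling is an assumption, since the abstract does not state how patch features were aggregated. The function names, the 0.5 decision threshold, the label coding (1 = basal, 0 = luminal), and all numbers are hypothetical toy values rather than study data.

```python
from statistics import fmean


def pool_patch_features(patch_features):
    """Average per-patch feature vectors into one slide-level vector.

    Mean pooling is an assumed aggregation rule, not one stated in the abstract.
    """
    if not patch_features:
        raise ValueError("slide has no tumor patches")
    return [fmean(column) for column in zip(*patch_features)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a
    random negative case, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def binary_metrics(labels, scores, threshold=0.5):
    """Accuracy, sensitivity, and specificity at a fixed (assumed) threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy data only: two 2-dimensional patch vectors for one hypothetical slide.
slide_vector = pool_patch_features([[1.0, 2.0], [3.0, 4.0]])

# Toy data only: hypothetical labels (1 = basal, 0 = luminal) and model scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_metrics = binary_metrics(toy_labels, toy_scores)
```

In a real pipeline the slide-level vectors would feed a classifier such as SVM, RF, or LR, and the same metric functions would be applied to its predicted probabilities on the internal and external validation sets.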

© 2025. The Author(s).
BMC Cancer, 2025-02-22