Machine learning technique-based four-autoantibody test for early detection of esophageal squamous cell carcinoma: a multicenter, retrospective study with a nested case-control study

BACKGROUND: Autoantibodies are promising blood-based diagnostic biomarkers that may be generated before the first clinically detectable signs of cancer. In the present study, we aimed to identify a novel, optimized autoantibody panel with high diagnostic accuracy for clinical and preclinical esophageal squamous cell carcinoma (ESCC) using machine learning (ML) algorithms.
METHODS: We identified candidate autoantibodies against tumor-associated antigens by serological proteome analysis. Serum autoantibody levels were measured by ELISA. Using a training set (n = 531), we constructed 102 models based on ML algorithms. The Partial Least Squares Generalized Linear Models (plsRglm) model was selected on the basis of receiver operating characteristic (ROC) analysis, the Kolmogorov-Smirnov (K-S) test, and the Population Stability Index (PSI), and was further validated in an internal validation set (n = 413), external validation set 1 (n = 371), and external validation set 2 (n = 202). We then validated the ability of the plsRglm model to predict preclinical ESCC in a nested case-control study (24 preclinical ESCCs and 112 matched controls) within a population-based prospective cohort study.
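The three selection criteria named above can be illustrated with a minimal sketch. The abstract does not state how the authors computed or combined them, so everything below is an assumption: the function names `auc`, `ks_statistic`, and `psi` are hypothetical, the scores are invented toy values, and the equal-width binning used for PSI is one common convention rather than the study's method.

```python
import math
from bisect import bisect_right


def auc(pos, neg):
    """Area under the ROC curve via Mann-Whitney pair counting (ties count 0.5)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(pos, neg):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    pos_sorted, neg_sorted = sorted(pos), sorted(neg)
    gap = 0.0
    for t in set(pos) | set(neg):
        cdf_pos = bisect_right(pos_sorted, t) / len(pos_sorted)
        cdf_neg = bisect_right(neg_sorted, t) / len(neg_sorted)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index over equal-width bins of the pooled score range."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all scores are equal

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


# Invented toy scores, for illustration only (not study data)
case_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.6, 0.3, 0.2, 0.1]

print(auc(case_scores, control_scores))
print(ks_statistic(case_scores, control_scores))
print(psi(case_scores, case_scores))
```

In this framing, a higher AUC and K-S statistic indicate better separation of cases from controls, while a PSI near zero between the training and validation score distributions indicates a stable model.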
RESULTS: ROC analysis, the K-S test, and PSI showed that the plsRglm model based on four autoantibodies (ALDOA, ENO1, p53, and NY-ESO-1) had superior diagnostic performance and robustness. It diagnosed ESCC with high accuracy, with AUCs (sensitivity and specificity) of 0.860 (68.8% and 90.4%) in the training set, 0.826 (65.3% and 89.1%) in the internal validation set, and 0.851 (69.2% and 87.3%) in external validation set 1. For early-stage ESCC, the signature maintained its diagnostic performance [0.817 (62.3% and 90.4%) in the training set; 0.842 (62.5% and 89.1%) in the internal validation set; 0.854 (63.2% and 87.3%) in external validation set 1; and 0.850 (67.3% and 90.1%) in external validation set 2]. In the nested case-control study, the plsRglm model detected preclinical ESCC with an AUC of 0.723, a sensitivity of 54.2%, and a specificity of 86.6%.
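As a reading aid, the sketch below shows how sensitivity and specificity follow from a score cutoff. The helper `sensitivity_specificity`, the cutoff of 0.5, and the toy scores are all hypothetical. The counts 13 of 24 and 97 of 112 are not stated in the abstract. They are back-calculated here as the only whole numbers consistent with the reported 54.2% and 86.6% in the nested case-control study.

```python
def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Call a sample positive when its score is at or above the cutoff."""
    true_pos = sum(s >= cutoff for s in case_scores)
    true_neg = sum(s < cutoff for s in control_scores)
    return true_pos / len(case_scores), true_neg / len(control_scores)


# Invented toy scores, for illustration only (not study data)
toy_cases = [0.9, 0.8, 0.7, 0.4]
toy_controls = [0.6, 0.3, 0.2, 0.1]
sens, spec = sensitivity_specificity(toy_cases, toy_controls, cutoff=0.5)

# Back-calculated (assumed) counts for the nested case-control study:
# 24 preclinical cases and 112 matched controls are reported in the abstract.
nested_sensitivity = round(100 * 13 / 24, 1)
nested_specificity = round(100 * 97 / 112, 1)
print(sens, spec, nested_sensitivity, nested_specificity)
```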
CONCLUSIONS: Our findings indicate that the plsRglm model based on four autoantibodies might help identify preclinical and early-stage ESCC.

© 2025. The Author(s).
BMC Medicine, 2025-04-25