Mid-level data fusion of pleural effusion SERS spectra and serum CEA levels using machine learning algorithms for precise lung cancer detection

Accurate clinical identification of malignant pleural effusions is critical for cancer diagnosis and subsequent treatment planning. Here, surface-enhanced Raman spectroscopy (SERS) data from pleural effusions were integrated with serum carcinoembryonic antigen (CEA) levels in a mid-level data fusion method that, combined with machine learning algorithms, improves the accuracy of cancer detection. SERS spectra of pleural effusions from 15 lung cancer patients, 10 patients with other cancers, and 28 non-cancer patients were first acquired using a handheld Raman spectrometer. The principal component analysis (PCA) scores of the SERS spectra were then merged with the digitized serum CEA values to generate a fused data array. Three machine learning algorithms, linear discriminant analysis (LDA), k-nearest neighbor (KNN), and support vector machine (SVM), were trained on the fused dataset and evaluated using five-fold cross-validation. Notably, the fusion strategy outperformed the discrimination model based on SERS spectra alone, with the KNN algorithm achieving an accuracy above 85% in each of three pairwise comparisons: lung cancer vs. non-cancer, other cancers vs. non-cancer, and lung cancer vs. other cancers. These results highlight the synergistic diagnostic capability of combining molecular spectroscopic fingerprints with tumor biomarkers for pleural effusion analysis, and they provide a new strategy for rapid and accurate clinical cancer discrimination via liquid biopsy.
Nanoscale, 2025-06-25
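
The mid-level fusion workflow described in the abstract (PCA scores of the spectra concatenated with the CEA value, then LDA, KNN, and SVM under five-fold cross-validation) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The data are synthetic stand-ins, and the number of principal components, the rescaling of the fused features, the KNN neighbor count, the SVM kernel, and the stratified shuffling of the folds are all assumptions chosen only for illustration; the names `make_fused_model`, `spectra`, and `cea` are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's measurements): two classes,
# each sample has a noisy "spectrum" plus one scalar "CEA" value.
n_per_class, n_wavenumbers = 20, 300
y = np.repeat([0, 1], n_per_class)
axis = np.linspace(0.0, 1.0, n_wavenumbers)
peak = np.exp(-((axis - 0.5) ** 2) / 0.002)  # one class-dependent band
spectra = rng.normal(0.0, 1.0, (2 * n_per_class, n_wavenumbers))
spectra += 0.8 * y[:, None] * peak
cea = rng.lognormal(mean=1.0 + 0.8 * y, sigma=0.5)

# Keep spectra and CEA in one array so both pass through the same CV splits.
X = np.column_stack([spectra, cea])


def make_fusion(n_components=5):
    """Mid-level fusion: PCA scores of the spectra + the raw CEA column."""
    return ColumnTransformer([
        ("sers_pca", PCA(n_components=n_components), slice(0, n_wavenumbers)),
        ("cea", "passthrough", [n_wavenumbers]),
    ])


def make_fused_model(classifier):
    # Assumption: the fused features are standardized so that PCA scores and
    # CEA are on comparable scales (this matters for distance-based KNN).
    return make_pipeline(make_fusion(), StandardScaler(), classifier)


classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel="linear"),
}

# Five-fold cross-validation; PCA and scaling are refit inside every
# training fold because they are part of the pipeline.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(make_fused_model(clf), X, y, cv=cv).mean()
    for name, clf in classifiers.items()
}

for name, acc in scores.items():
    print(f"{name}: mean five-fold accuracy = {acc:.2f}")
```

Placing the PCA step inside the pipeline means the projection is learned from the training folds only, which avoids leaking information from held-out samples into the fused features. A spectra-only baseline for comparison could be obtained by dropping the `"cea"` entry from the `ColumnTransformer`.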