Diagnosis of Oral Cancer With Deep Learning. A Comparative Test Accuracy Systematic Review
Nieri M, Serni L, Clauser T, Paoletti C, Franchi L
OBJECTIVE: To directly compare the diagnostic accuracy of deep learning models with that of human experts and other diagnostic methods used for the clinical detection of oral cancer.
METHODS: Comparative diagnostic studies involving patients with photographic images of oral mucosal lesions (cancer or non-cancer) were included. Only studies using deep learning methods were eligible. Medline, EMBASE, Scopus, Google Scholar, and ClinicalTrials.gov were searched until September 2024. Risk of bias was assessed with QUADAS-C. Diagnostic test accuracy was compared using a Bayesian meta-analysis.
RESULTS: Eight studies were included, none of which had a low risk of bias. Three studies compared deep learning with human experts. The difference in sensitivity was 0.024 (95% CI: -0.093, 0.206), favoring deep learning, while the difference in specificity was -0.041 (95% CI: -0.218, 0.038), favoring human experts. Two studies compared deep learning with postgraduate medical students. The differences in sensitivity and specificity were 0.108 (95% CI: -0.038, 0.324) and 0.010 (95% CI: -0.119, 0.111), respectively, both favoring deep learning. Both comparisons provided low-level evidence.
CONCLUSIONS: Deep learning models showed comparable sensitivity and specificity to human experts. These models outperformed postgraduate medical students in terms of sensitivity. Prospective clinical trials are needed to evaluate the real-world performance of deep learning models.
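ILLUSTRATION: The differences in sensitivity and specificity reported in RESULTS can be read as the accuracy of one test minus that of another, with an interval around the difference. The sketch below is a minimal illustration under stated assumptions, not the review's Bayesian meta-analysis model: it handles a single hypothetical comparison, treats the two tests as independent with Beta(1, 1) priors, and ignores paired designs and between-study heterogeneity. All counts and function names are hypothetical.

```python
import random


def sensitivity(tp, fn):
    """Proportion of cancer lesions correctly flagged as cancer."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-cancer lesions correctly flagged as non-cancer."""
    return tn / (tn + fp)


def difference_interval(x_a, n_a, x_b, n_b, draws=20000, seed=0):
    """Approximate 95% interval for p_a - p_b.

    Assumption: independent Beta(1, 1) priors on each proportion, sampled
    by Monte Carlo. x_* are correct calls, n_* are total lesions of that type.
    """
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(x_a + 1, n_a - x_a + 1)
        - rng.betavariate(x_b + 1, n_b - x_b + 1)
        for _ in range(draws)
    )
    return diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1]


# Hypothetical counts on 50 cancer lesions (not taken from any included study).
model_tp, model_fn = 45, 5
expert_tp, expert_fn = 43, 7

point = sensitivity(model_tp, model_fn) - sensitivity(expert_tp, expert_fn)
lo, hi = difference_interval(model_tp, 50, expert_tp, 50)
print(f"difference in sensitivity: {point:.3f} (95% interval: {lo:.3f}, {hi:.3f})")
```

With these hypothetical counts the interval spans zero, so the data are compatible with either test being more sensitive, which mirrors how the intervals in RESULTS should be read.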
© 2025 The Author(s). Oral Diseases published by John Wiley & Sons Ltd.
Oral Diseases, 2025-04-02