Extracting critical clinical indicators and survival prediction of lung cancer from pathology reports using large language models
Chang YC, Hsiao SH, Yeh WC, Hsing YC, Wang CC, Chen CY
Lung cancer remains the leading cause of cancer deaths in many developed countries, primarily due to late-stage diagnosis. Histopathology, the gold standard for diagnosis, typically yields semi-structured pathology reports whose complex content can hinder timely clinical decision-making. This study evaluates the effectiveness of pre-trained language models (PLMs) in extracting 16 critical features from pathology reports, as defined by National Comprehensive Cancer Network (NCCN) guidelines, and in using those features for survival prediction in patients with advanced-stage (AJCC stage IIIB-IV) lung cancer. Approximately 20,000 pathology reports from 4,600 lung cancer patients across three Taipei Medical University (TMU) hospitals were analyzed. Validation comprised 10-fold cross-validation on 3,047 annotated reports from TMU Hospital and external validation on 1,258 reports from National Taiwan University Hospital (NTUH). Among the models tested, the fine-tuned LLaMA 3 performed best, achieving a 92% F1-score in feature extraction and a 70% F1-score in survival prediction. Cross-hospital validation supported its robustness and generalizability, highlighting its potential for clinical application. This research underscores the potential of PLMs in lung cancer care to automate the extraction of critical clinical indicators and improve survival prediction, which can enhance decision-making efficiency and patient outcomes.
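As a rough illustration of the evaluation terms used above, the following minimal sketch shows how an F1-score over extracted fields and a 10-fold split could be computed. It is not the authors' code: the function names `f1_score` and `k_fold_indices`, the exact-match scoring rule, the convention that an empty string means "field absent", and the example field values are all assumptions made for illustration.

```python
from typing import List, Tuple


def f1_score(gold: List[str], predicted: List[str]) -> float:
    """F1 over per-field exact matches (assumed scoring rule).

    An empty string means the field is absent. A wrong non-empty
    prediction counts as both a false positive and a false negative.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p and p == g)
    fp = sum(1 for g, p in pairs if p and p != g)
    fn = sum(1 for g, p in pairs if g and p != g)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; fold sizes differ by at most 1."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits


# Hypothetical gold and predicted values for four fields of one report.
gold = ["adenocarcinoma", "T2", "", "positive"]
pred = ["adenocarcinoma", "T1", "", "positive"]
score = f1_score(gold, pred)

# Ten folds over the number of annotated reports stated in the abstract.
splits = k_fold_indices(3047, k=10)
```

In practice the per-fold scores would be averaged across the ten held-out folds; the interleaved fold assignment here is only one of several reasonable ways to partition the reports.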
Copyright © 2025 Elsevier Ltd. All rights reserved.
Computers in Biology and Medicine, 2025-06-26