Enhancing Malignancy Detection and Tumor Classification in Pathology Reports: A Comparative Evaluation of Large Language Models

BACKGROUND: Cancer registries require accurate and efficient documentation of malignancies, yet current manual methods are time-consuming and error-prone.
OBJECTIVES: This study evaluates the effectiveness of large language models (LLMs) in classifying malignancies and detecting tumor types from pathology reports.
METHODS: The performance of four LLMs and a score-based algorithm was compared against expert-labeled gold standards on a synthetic dataset of 227 pathology reports.
RESULTS: The LLMs, particularly GPT-4o and Llama3.3, demonstrated high sensitivity and specificity in both malignancy detection and tumor classification, significantly outperforming the score-based algorithm.
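Sensitivity and specificity, as used above, compare each model's labels with the expert gold standard. The following is a minimal sketch of that computation, assuming binary malignancy labels encoded as `True` (malignant) / `False` (not malignant) and tumor types scored one-vs-rest per class. The function names and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def sensitivity_specificity(gold: Sequence[bool], predicted: Sequence[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of binary predictions against gold labels.

    Hypothetical encoding: True = malignant, False = not malignant.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # malignant, flagged malignant
    fn = sum(1 for g, p in pairs if g and not p)      # malignant, missed
    tn = sum(1 for g, p in pairs if not g and not p)  # benign, correctly not flagged
    fp = sum(1 for g, p in pairs if not g and p)      # benign, wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative gold label")
    return tp / (tp + fn), tn / (tn + fp)


def per_class_sensitivity_specificity(
    gold: Sequence[str], predicted: Sequence[str]
) -> dict[str, tuple[float, float]]:
    """One-vs-rest (sensitivity, specificity) for each tumor type seen in the gold labels."""
    return {
        c: sensitivity_specificity([g == c for g in gold], [p == c for p in predicted])
        for c in sorted(set(gold))
    }


if __name__ == "__main__":
    # Toy, made-up labels for illustration only.
    gold = [True, True, False, False, True]
    predicted = [True, False, False, True, True]
    print(sensitivity_specificity(gold, predicted))

    gold_types = ["type_A", "type_B", "type_A", "type_C"]
    pred_types = ["type_A", "type_A", "type_A", "type_C"]
    print(per_class_sensitivity_specificity(gold_types, pred_types))
```

The one-vs-rest reduction is one common way to report sensitivity and specificity for a multi-class task such as tumor typing; it requires each tumor type to have both positive and negative gold examples, otherwise the function raises an error rather than dividing by zero.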
CONCLUSION: LLMs enhance the accuracy and efficiency of cancer data classification and hold promise for improving public health monitoring and clinical decision-making.
Studies in Health Technology and Informatics, 2025-04-26