Machine learning-based identification of cuproptosis-related lncRNA biomarkers in diffuse large B-cell lymphoma

Multiple machine learning techniques were employed to identify key long non-coding RNA (lncRNA) biomarkers associated with cuproptosis in diffuse large B-cell lymphoma (DLBCL). Data from the TCGA and GEO databases yielded 126 significant cuproptosis-related lncRNAs. Feature selection methods, including univariate filtering, Lasso, Boruta, and random forest, were integrated with a Transformer-based model to develop a prognostic tool. Validated by fivefold cross-validation, the model predicted risk scores with high accuracy and robustness. MALAT1 was identified by permutation feature importance and further validated in DLBCL cell lines: MALAT1 knockdown reduced cell proliferation, confirming its substantial role in proliferation and underscoring its potential as a therapeutic target. This integrated approach not only enhances the precision of biomarker identification but also provides a robust prognostic model for DLBCL, demonstrating the utility of these lncRNAs in personalized treatment strategies. The study highlights the value of combining diverse machine learning methods to advance DLBCL research and develop targeted cancer therapies.
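The permutation feature importance step mentioned above can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic, a simple nearest-centroid classifier stands in for the Transformer-based model, and every name (make_data, run_demo, the choice of feature 0 as the informative one) is a hypothetical chosen for illustration. The idea is to shuffle one feature at a time on held-out samples and record how much accuracy drops; a large drop suggests the model relies on that feature.

```python
import random


def make_data(n_samples=300, n_features=5, seed=0):
    """Synthetic stand-in for an expression matrix (assumption, not study data).

    Only feature 0 determines the label; the other features are pure noise.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if row[0] > 0 else 0 for row in X]
    return X, y


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: dist2(centroids[lab]))


def accuracy(centroids, X, y):
    """Fraction of samples classified correctly."""
    hits = sum(predict(centroids, row) == lab for row, lab in zip(X, y))
    return hits / len(y)


def permutation_importance(centroids, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def run_demo():
    """Train on the first 200 samples, score importances on the remaining 100."""
    X, y = make_data()
    X_train, y_train = X[:200], y[:200]
    X_test, y_test = X[200:], y[200:]
    model = fit_centroids(X_train, y_train)
    return permutation_importance(model, X_test, y_test)


if __name__ == "__main__":
    for j, score in enumerate(run_demo()):
        print(f"feature_{j}: mean accuracy drop = {score:.3f}")
```

In this toy setting the informative feature should show by far the largest accuracy drop, while the noise features stay near zero. Scoring on held-out samples rather than training samples keeps the importances from rewarding features the model has merely memorized.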

© 2025. The Author(s).
Cell biology and toxicology, 2025-04-23