TIMILSINA, Mohan, Dirk FEY, Adrianna JANIK, Maria TORRENTE, Mariano PROVENCIO, Alberto Cruz BERMUDEZ, Enric CARCERENY, Luca COSTABELLO, Delvys Rodríguez ABREU, Manuel COBO, Rafael Lopez CASTRO, Reyes BERNABE, Maria GUIRADO, Pasquale MINERVINI and Vít NOVÁČEK. Integration of Medical and Genomic Information to Enhance Relapse Prediction in Early Stage Lung Cancer Patients. Online. In Proceedings of the Annual Symposium of the American Medical Informatics Association. Washington, USA: AMIA, 2022, p. 1082-1091. ISSN 1559-4076.
Basic information
Original name Integration of Medical and Genomic Information to Enhance Relapse Prediction in Early Stage Lung Cancer Patients
Authors TIMILSINA, Mohan, Dirk FEY, Adrianna JANIK, Maria TORRENTE, Mariano PROVENCIO, Alberto Cruz BERMUDEZ, Enric CARCERENY, Luca COSTABELLO, Delvys Rodríguez ABREU, Manuel COBO, Rafael Lopez CASTRO, Reyes BERNABE, Maria GUIRADO, Pasquale MINERVINI and Vít NOVÁČEK (203 Czech Republic, guarantor, belonging to the institution).
Edition Washington, USA, Proceedings of the Annual Symposium of the American Medical Informatics Association, p. 1082-1091, 10 pp. 2022.
Publisher AMIA
Other information
Original language English
Type of outcome Proceedings paper
Field of Study 10201 Computer sciences, information science, bioinformatics
Country of publisher United States of America
Confidentiality degree is not subject to a state or trade secret
Publication form electronic version available online
Organization unit Faculty of Informatics
ISSN 1559-4076
Keywords in English relapse; lung cancer; imputation; machine learning; genomic scores
Tags Artificial Intelligence, knowledge graphs, machine learning, medical informatics
Tags International impact, Reviewed
Changed by RNDr. Pavel Šmerk, Ph.D., učo 3880. Changed: 24/4/2024 14:26.
Abstract
Early-stage lung cancer is clinically critical because of its insidious nature and rapid progression. Most models designed to predict tumour recurrence in early-stage lung cancer rely on the patient's clinical or medical history. However, their performance could likely be improved if the input patient data contained genomic information. Unfortunately, such data is not always collected. This is the main motivation of our work, in which we have imputed a specific type of genomic data and integrated it with clinical data to increase the accuracy of machine learning models for predicting relapse in early-stage, non-small cell lung cancer patients. Aneuploidy scores from a publicly available TCGA lung adenocarcinoma cohort of 501 patients were imputed into similar records in the Spanish Lung Cancer Group (SLCG) data, more specifically a cohort of 1348 early-stage patients. First, tumour recurrence in those patients was predicted without the imputed aneuploidy scores. Then, the SLCG data were enriched with the aneuploidy scores imputed from TCGA. This integrative approach improved the prediction of relapse risk, achieving an area under the precision-recall curve (PR-AUC) of 0.74 and an area under the ROC curve (ROC-AUC) of 0.79. Using the prediction explanation model SHAP (SHapley Additive exPlanations), we further explained the predictions made by the machine learning model. We conclude that our explainable predictive model is a promising tool for oncologists that addresses an unmet clinical need for post-treatment patient stratification based on relapse risk, while also improving predictive power by incorporating proxy genomic data that are not available for the actual patients.
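The abstract does not state how "similar records" were matched or which imputation method was used. As a rough illustration only, the general idea of borrowing a genomic score from a donor cohort can be sketched with a k-nearest-neighbour average over shared clinical features, followed by a rank-based ROC-AUC check. Every name here (`impute_score`, `roc_auc`, the `age` and `stage` fields, the value of `k`) is a hypothetical choice for this sketch, the records are made-up toy values, and nothing below reproduces the authors' pipeline or results.

```python
from math import sqrt


def impute_score(recipient, donors, features, score_key="aneuploidy_score", k=3):
    """Average the score of the k donor records closest to `recipient`.

    Assumption: similarity is plain Euclidean distance over pre-scaled
    features shared by both cohorts; the paper may define it differently.
    """
    def distance(a, b):
        return sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    nearest = sorted(donors, key=lambda d: distance(recipient, d))[:k]
    return sum(d[score_key] for d in nearest) / len(nearest)


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy donor cohort (stands in for records that carry a genomic score).
donors = [
    {"age": 0.2, "stage": 0.0, "aneuploidy_score": 4.0},
    {"age": 0.3, "stage": 0.0, "aneuploidy_score": 6.0},
    {"age": 0.9, "stage": 1.0, "aneuploidy_score": 20.0},
]

# Toy recipient record (stands in for a clinical-only record).
patient = {"age": 0.25, "stage": 0.0}
imputed = impute_score(patient, donors, ["age", "stage"], k=2)

# Toy relapse labels and model scores, used only to exercise roc_auc.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.5])
```

In this sketch the imputed value would simply be appended as one more input feature before training a relapse classifier, so that models with and without the enriched feature can be compared on the same metric.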