The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study

J 2020


MARTENSSON, G., D. FERREIRA, T. GRANBERG, L. CAVALLIN, K. OPPEDAL et al.

Basic information

Original name

The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study

Authors

MARTENSSON, G. (guarantor), D. FERREIRA, T. GRANBERG, L. CAVALLIN, K. OPPEDAL, A. PADOVANI, Irena REKTOROVÁ (203 Czech Republic, belonging to the institution), L. BONANNI, M. PARDINI, M. G. KRAMBERGER, J. P. TAYLOR, J. HORT, J. SNAEDAL, J. KULISEVSKY, F. BLANC, A. ANTONINI, P. MECOCCI, B. VELLAS, M. TSOLAKI, I. KLOSZEWSKA, H. SOININEN, S. LOVESTONE, A. SIMMONS, D. AARSLAND and E. WESTMAN

Edition

Medical Image Analysis, AMSTERDAM, ELSEVIER SCIENCE BV, 2020, 1361-8415

Other information

Language

English

Type of outcome

Article in a specialist periodical

Field of Study

30224 Radiology, nuclear medicine and medical imaging

Country of publisher

Netherlands

Confidentiality degree

is not subject to a state or trade secret

References:

URL

Impact factor

8.545

RIV identification code

RIV/00216224:14740/20:00118222

Organization unit

Central European Institute of Technology

DOI

http://dx.doi.org/10.1016/j.media.2020.101714

UT WoS

000579512600001

Keywords in English

Neuroimaging; Deep learning; Domain shift; Clinical application

Abstract

In the original

Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as a clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or with strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs in unseen clinical datasets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 MRI scans of brains from multiple dementia research cohorts and memory clinics, which had been visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of this data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that our model generalized well to datasets acquired with protocols similar to those of the training data, but substantially worse in clinical cohorts with visibly different tissue contrasts in the images. This implies that future DL studies investigating performance in out-of-distribution (OOD) MRI data need to assess multiple external cohorts for reliable results. Further, including data from a wider range of scanners and protocols improved performance in OOD data, which suggests that more heterogeneous training data makes the model generalize better. To conclude, this is the most comprehensive study to date investigating the domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data prior to being certified for deployment. © 2020 The Author(s). Published by Elsevier B.V.

Cite

MARTENSSON, G., D. FERREIRA, T. GRANBERG, L. CAVALLIN, K. OPPEDAL, A. PADOVANI, Irena REKTOROVÁ, L. BONANNI, M. PARDINI, M. G. KRAMBERGER, J. P. TAYLOR, J. HORT, J. SNAEDAL, J. KULISEVSKY, F. BLANC, A. ANTONINI, P. MECOCCI, B. VELLAS, M. TSOLAKI, I. KLOSZEWSKA, H. SOININEN, S. LOVESTONE, A. SIMMONS, D. AARSLAND and E. WESTMAN. The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study. Medical Image Analysis. AMSTERDAM: ELSEVIER SCIENCE BV, 2020, vol. 66, DEC 2020, p. 1-10. ISSN 1361-8415. Available from: https://dx.doi.org/10.1016/j.media.2020.101714.

@article{1744837,
   author = {Martensson, G. and Ferreira, D. and Granberg, T. and Cavallin, L. and Oppedal, K. and Padovani, A. and Rektorová, Irena and Bonanni, L. and Pardini, M. and Kramberger, M. G. and Taylor, J. P. and Hort, J. and Snaedal, J. and Kulisevsky, J. and Blanc, F. and Antonini, A. and Mecocci, P. and Vellas, B. and Tsolaki, M. and Kloszewska, I. and Soininen, H. and Lovestone, S. and Simmons, A. and Aarsland, D. and Westman, E.},
   article_location = {AMSTERDAM},
   article_number = {DEC 2020},
   doi = {10.1016/j.media.2020.101714},
   keywords = {Neuroimaging; Deep learning; Domain shift; Clinical application},
   language = {eng},
   issn = {1361-8415},
   journal = {Medical Image Analysis},
   title = {The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study},
   url = {https://www.sciencedirect.com/science/article/pii/S1361841520300785},
   volume = {66},
   year = {2020}
}

TY  - JOUR
ID  - 1744837
AU  - Martensson, G.
AU  - Ferreira, D.
AU  - Granberg, T.
AU  - Cavallin, L.
AU  - Oppedal, K.
AU  - Padovani, A.
AU  - Rektorová, Irena
AU  - Bonanni, L.
AU  - Pardini, M.
AU  - Kramberger, M. G.
AU  - Taylor, J. P.
AU  - Hort, J.
AU  - Snaedal, J.
AU  - Kulisevsky, J.
AU  - Blanc, F.
AU  - Antonini, A.
AU  - Mecocci, P.
AU  - Vellas, B.
AU  - Tsolaki, M.
AU  - Kloszewska, I.
AU  - Soininen, H.
AU  - Lovestone, S.
AU  - Simmons, A.
AU  - Aarsland, D.
AU  - Westman, E.
PY  - 2020
TI  - The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study
JF  - Medical Image Analysis
VL  - 66
IS  - DEC 2020
SP  - 1
EP  - 10
PB  - ELSEVIER SCIENCE BV
SN  - 1361-8415
KW  - Neuroimaging
KW  - Deep learning
KW  - Domain shift
KW  - Clinical application
UR  - https://www.sciencedirect.com/science/article/pii/S1361841520300785
N2  - Deep learning (DL) methods have in recent years yielded impressive results in medical imaging, with the potential to function as a clinical aid to radiologists. However, DL models in medical imaging are often trained on public research cohorts with images acquired with a single scanner or with strict protocol harmonization, which is not representative of a clinical setting. The aim of this study was to investigate how well a DL model performs in unseen clinical datasets (collected with different scanners, protocols and disease populations) and whether more heterogeneous training data improves generalization. In total, 3117 MRI scans of brains from multiple dementia research cohorts and memory clinics, which had been visually rated by a neuroradiologist according to Scheltens' scale of medial temporal atrophy (MTA), were included in this study. By training multiple versions of a convolutional neural network on different subsets of this data to predict MTA ratings, we assessed the impact that including images from a wider distribution during training had on performance in external memory clinic data. Our results showed that our model generalized well to datasets acquired with protocols similar to those of the training data, but substantially worse in clinical cohorts with visibly different tissue contrasts in the images. This implies that future DL studies investigating performance in out-of-distribution (OOD) MRI data need to assess multiple external cohorts for reliable results. Further, including data from a wider range of scanners and protocols improved performance in OOD data, which suggests that more heterogeneous training data makes the model generalize better. To conclude, this is the most comprehensive study to date investigating the domain shift in deep learning on MRI data, and we advocate rigorous evaluation of DL models on clinical data prior to being certified for deployment. © 2020 The Author(s). Published by Elsevier B.V.
ER  -

MARTENSSON, G., D. FERREIRA, T. GRANBERG, L. CAVALLIN, K. OPPEDAL, A. PADOVANI, Irena REKTOROVÁ, L. BONANNI, M. PARDINI, M. G. KRAMBERGER, J. P. TAYLOR, J. HORT, J. SNAEDAL, J. KULISEVSKY, F. BLANC, A. ANTONINI, P. MECOCCI, B. VELLAS, M. TSOLAKI, I. KLOSZEWSKA, H. SOININEN, S. LOVESTONE, A. SIMMONS, D. AARSLAND and E. WESTMAN. The reliability of a deep learning model in clinical out-of-distribution MRI data: A multicohort study. \textit{Medical Image Analysis}. AMSTERDAM: ELSEVIER SCIENCE BV, 2020, vol.~66, DEC 2020, p.~1-10. ISSN~1361-8415. Available from: https://dx.doi.org/10.1016/j.media.2020.101714.
