
Multilevel comparison of deep learning models for function quantification in cardiovascular magnetic resonance: On the redundancy of architectural variations

BACKGROUND: Cardiac function quantification in cardiovascular magnetic resonance requires precise contouring of the heart chambers. This time-consuming task is increasingly being addressed by a plethora of ever more complex deep learning methods. However, only a small fraction of these have made the...


Bibliographic Details
Main authors:	Ammann, Clemens; Hadler, Thomas; Gröschel, Jan; Kolbitsch, Christoph; Schulz-Menger, Jeanette
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Cardiovascular Medicine
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10151814/ https://www.ncbi.nlm.nih.gov/pubmed/37144061 http://dx.doi.org/10.3389/fcvm.2023.1118499

_version_	1785035622325420032
author	Ammann, Clemens Hadler, Thomas Gröschel, Jan Kolbitsch, Christoph Schulz-Menger, Jeanette
author_sort	Ammann, Clemens
collection	PubMed
description	BACKGROUND: Cardiac function quantification in cardiovascular magnetic resonance requires precise contouring of the heart chambers. This time-consuming task is increasingly being addressed by a plethora of ever more complex deep learning methods. However, only a small fraction of these have made their way from academia into clinical practice. In the quality assessment and control of medical artificial intelligence, the opaque reasoning and associated distinctive errors of neural networks meet an extraordinarily low tolerance for failure. AIM: The aim of this study is a multilevel analysis and comparison of the performance of three popular convolutional neural network (CNN) models for cardiac function quantification. METHODS: U-Net, FCN, and MultiResUNet were trained for the segmentation of the left and right ventricles on short-axis cine images of 119 patients from clinical routine. The training pipeline and hyperparameters were kept constant to isolate the influence of network architecture. CNN performance was evaluated against expert segmentations for 29 test cases on contour level and in terms of quantitative clinical parameters. Multilevel analysis included breakdown of results by slice position, as well as visualization of segmentation deviations and linkage of volume differences to segmentation metrics via correlation plots for qualitative analysis. RESULTS: All models showed strong correlation to the expert with respect to quantitative clinical parameters (r_z′ = 0.978, 0.977, 0.978 for U-Net, FCN, MultiResUNet respectively). The MultiResUNet significantly underestimated ventricular volumes and left ventricular myocardial mass. Segmentation difficulties and failures clustered in basal and apical slices for all CNNs, with the largest volume differences in the basal slices (mean absolute error per slice: 4.2 ± 4.5 ml for basal, 0.9 ± 1.3 ml for midventricular, 0.9 ± 0.9 ml for apical slices). Results for the right ventricle had higher variance and more outliers compared to the left ventricle. Intraclass correlation for clinical parameters was excellent (≥0.91) among the CNNs. CONCLUSION: Modifications to CNN architecture were not critical to the quality of error for our dataset. Despite good overall agreement with the expert, errors accumulated in basal and apical slices for all models.
format	Online Article Text
id	pubmed-10151814
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-10151814 2023-05-03 Front Cardiovasc Med Frontiers Media S.A. 2023-04-18 /pmc/articles/PMC10151814/ /pubmed/37144061 http://dx.doi.org/10.3389/fcvm.2023.1118499 Text en © 2023 Ammann, Hadler, Gröschel, Kolbitsch and Schulz-Menger. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) (https://creativecommons.org/licenses/by/4.0/). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Multilevel comparison of deep learning models for function quantification in cardiovascular magnetic resonance: On the redundancy of architectural variations
topic	Cardiovascular Medicine
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10151814/ https://www.ncbi.nlm.nih.gov/pubmed/37144061 http://dx.doi.org/10.3389/fcvm.2023.1118499


Similar items