External validation of deep learning-based bone-age software: a preliminary study with real world data
Artificial intelligence (AI) is increasingly being used in bone-age (BA) assessment due to its complicated and lengthy nature. We aimed to evaluate the clinical performance of commercially available deep learning (DL)–based software for BA assessment using real-world data. From Nov. 2018 to Feb....
Main authors: | Lea, Winnah Wu-in; Hong, Suk-Joo; Nam, Hyo-Kyoung; Kang, Woo-Young; Yang, Ze-Pa; Noh, Eun-Jin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8786917/ https://www.ncbi.nlm.nih.gov/pubmed/35075207 http://dx.doi.org/10.1038/s41598-022-05282-z |
_version_ | 1784639247335030784 |
---|---|
author | Lea, Winnah Wu-in; Hong, Suk-Joo; Nam, Hyo-Kyoung; Kang, Woo-Young; Yang, Ze-Pa; Noh, Eun-Jin |
author_facet | Lea, Winnah Wu-in; Hong, Suk-Joo; Nam, Hyo-Kyoung; Kang, Woo-Young; Yang, Ze-Pa; Noh, Eun-Jin |
author_sort | Lea, Winnah Wu-in |
collection | PubMed |
description | Artificial intelligence (AI) is increasingly being used in bone-age (BA) assessment due to its complicated and lengthy nature. We aimed to evaluate the clinical performance of commercially available deep learning (DL)–based software for BA assessment using real-world data. From Nov. 2018 to Feb. 2019, 474 children (35 boys, 439 girls, age 4–17 years) were enrolled. We compared the BA estimated by DL software (DL-BA) with that independently estimated by 3 reviewers (R1: musculoskeletal radiologist, R2: radiology resident, R3: pediatric endocrinologist) using the traditional Greulich–Pyle atlas, and then with each child’s chronological age (CA). A paired t-test, Pearson’s correlation coefficient, Bland–Altman plot, mean absolute error (MAE) and root mean square error (RMSE) were used for the statistical analysis. The intraclass correlation coefficient (ICC) was used for inter-rater variation. There were significant differences between DL-BA and each reviewer’s BA (P < 0.025), but they correlated well with one another (r = 0.983, P < 0.025). RMSE (MAE) values were 10.09 (7.21), 10.76 (7.88) and 13.06 (10.06) months between DL-BA and the BA of R1, R2 and R3, respectively. Compared with the CA, RMSE (MAE) values were 13.54 (11.06), 15.18 (12.11), 16.19 (12.78) and 19.53 (17.71) months for DL-BA and the BA of R1, R2 and R3, respectively. Bland–Altman plots revealed a general tendency of both the software and the reviewers to overestimate the BA. ICC values between the 3 reviewers were 0.97, 0.85 and 0.86, and the overall ICC value was 0.93. The BA estimated by the DL-based software showed performance statistically similar to, or even better than, that of the reviewers when compared with chronological age in a real-world clinic. |
format | Online Article Text |
id | pubmed-8786917 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8786917 2022-01-25 External validation of deep learning-based bone-age software: a preliminary study with real world data Lea, Winnah Wu-in; Hong, Suk-Joo; Nam, Hyo-Kyoung; Kang, Woo-Young; Yang, Ze-Pa; Noh, Eun-Jin Sci Rep Article Artificial intelligence (AI) is increasingly being used in bone-age (BA) assessment due to its complicated and lengthy nature. We aimed to evaluate the clinical performance of commercially available deep learning (DL)–based software for BA assessment using real-world data. From Nov. 2018 to Feb. 2019, 474 children (35 boys, 439 girls, age 4–17 years) were enrolled. We compared the BA estimated by DL software (DL-BA) with that independently estimated by 3 reviewers (R1: musculoskeletal radiologist, R2: radiology resident, R3: pediatric endocrinologist) using the traditional Greulich–Pyle atlas, and then with each child’s chronological age (CA). A paired t-test, Pearson’s correlation coefficient, Bland–Altman plot, mean absolute error (MAE) and root mean square error (RMSE) were used for the statistical analysis. The intraclass correlation coefficient (ICC) was used for inter-rater variation. There were significant differences between DL-BA and each reviewer’s BA (P < 0.025), but they correlated well with one another (r = 0.983, P < 0.025). RMSE (MAE) values were 10.09 (7.21), 10.76 (7.88) and 13.06 (10.06) months between DL-BA and the BA of R1, R2 and R3, respectively. Compared with the CA, RMSE (MAE) values were 13.54 (11.06), 15.18 (12.11), 16.19 (12.78) and 19.53 (17.71) months for DL-BA and the BA of R1, R2 and R3, respectively. Bland–Altman plots revealed a general tendency of both the software and the reviewers to overestimate the BA. ICC values between the 3 reviewers were 0.97, 0.85 and 0.86, and the overall ICC value was 0.93. The BA estimated by the DL-based software showed performance statistically similar to, or even better than, that of the reviewers when compared with chronological age in a real-world clinic. Nature Publishing Group UK 2022-01-24 /pmc/articles/PMC8786917/ /pubmed/35075207 http://dx.doi.org/10.1038/s41598-022-05282-z Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Lea, Winnah Wu-in Hong, Suk-Joo Nam, Hyo-Kyoung Kang, Woo-Young Yang, Ze-Pa Noh, Eun-Jin External validation of deep learning-based bone-age software: a preliminary study with real world data |
title | External validation of deep learning-based bone-age software: a preliminary study with real world data |
title_sort | external validation of deep learning-based bone-age software: a preliminary study with real world data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8786917/ https://www.ncbi.nlm.nih.gov/pubmed/35075207 http://dx.doi.org/10.1038/s41598-022-05282-z |
work_keys_str_mv | AT leawinnahwuin externalvalidationofdeeplearningbasedboneagesoftwareapreliminarystudywithrealworlddata AT hongsukjoo externalvalidationofdeeplearningbasedboneagesoftwareapreliminarystudywithrealworlddata AT namhyokyoung externalvalidationofdeeplearningbasedboneagesoftwareapreliminarystudywithrealworlddata AT kangwooyoung externalvalidationofdeeplearningbasedboneagesoftwareapreliminarystudywithrealworlddata AT yangzepa externalvalidationofdeeplearningbasedboneagesoftwareapreliminarystudywithrealworlddata AT noheunjin externalvalidationofdeeplearningbasedboneagesoftwareapreliminarystudywithrealworlddata |
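
The agreement measures named in the description (MAE, RMSE, and the Bland–Altman mean difference with its limits of agreement) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name `agreement_metrics` is hypothetical, and the sample ages are made-up toy values in months, not data from the study.

```python
import math


def agreement_metrics(estimated, reference):
    """Compare paired age estimates (in months) against reference ages.

    Returns MAE, RMSE, the Bland-Altman bias (mean difference) and the
    95% limits of agreement (bias +/- 1.96 * SD of the differences).
    """
    if len(estimated) != len(reference) or len(estimated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")

    # Signed differences: a positive value means the estimate is too high.
    diffs = [e - r for e, r in zip(estimated, reference)]
    n = len(diffs)

    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 denominator).
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))

    return {
        "mae": mae,
        "rmse": rmse,
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Hypothetical toy values (months), for illustration only.
estimated_ba = [100.0, 124.0, 150.0, 96.0]
chronological_age = [96.0, 120.0, 152.0, 90.0]

metrics = agreement_metrics(estimated_ba, chronological_age)
print(metrics)
```

A positive `bias` corresponds to the kind of overall overestimation that the description reports from the Bland–Altman plots. RMSE is never smaller than MAE for the same pairs because squaring weights large errors more heavily, which is consistent with every RMSE (MAE) pair quoted in the description.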