Data mining polycystic ovary morphology in electronic medical record ultrasound reports
Main authors: | Cheng, Jay Jojo; Mahalingaiah, Shruthi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6886196/ https://www.ncbi.nlm.nih.gov/pubmed/31827874 http://dx.doi.org/10.1186/s40738-019-0067-7 |
author | Cheng, Jay Jojo; Mahalingaiah, Shruthi |
---|---|
collection | PubMed |
description | BACKGROUND: Polycystic ovary syndrome (PCOS) is characterized by hyperandrogenemia, oligo-anovulation, and numerous ovarian cysts. Hospital electronic medical records provide an avenue for investigating polycystic ovary morphology commonly seen in PCOS at a large scale. The purpose of this study was to develop and evaluate the performance of two machine learning text algorithms for classification of polycystic ovary morphology (PCOM) in pelvic ultrasounds. METHODS: Pelvic ultrasound reports from patients at Boston Medical Center between October 1, 2003 and December 12, 2016 were included for analysis, yielding 39,093 ultrasound reports from 25,535 unique women. Following the 2003 Rotterdam Consensus Criteria for polycystic ovary syndrome, 2000 randomly selected ultrasound reports were expert-labeled for PCOM status as present, absent, or unidentifiable (not able to be determined from the text alone). An ovary was marked as having PCOM if the report mentioned numerous peripheral follicles or if the ovarian volume was greater than 10 ml in the absence of a dominant follicle or other confounding pathology. Half of the labeled data was used to develop and refine the algorithms, and the other half was used as a test set for evaluating their accuracy. RESULTS: On the evaluation set of 1000 random ultrasound reports, the accuracies of the two classifiers were 97.6% (95% CI: 96.5, 98.5%) and 96.1% (95% CI: 94.7, 97.2%). Both models were more adept at identifying PCOM-absent ultrasounds than either PCOM-unidentifiable or PCOM-present ultrasounds. The two classifiers estimated the prevalence of PCOM status within the whole set of 39,093 ultrasounds to be 44% PCOM-absent, 32% PCOM-unidentifiable, and 24% PCOM-present. CONCLUSIONS: Although accuracy measured on the test set and inter-rater agreement between the two classifiers (Cohen’s kappa = 0.988) were high, a major limitation of our approach is that it uses the ultrasound report text as a proxy and does not directly count follicles from the ultrasound images themselves. |
format | Online Article Text |
id | pubmed-6886196 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6886196; 2019-12-11; Fertil Res Pract; Research Article; BioMed Central, 2019-12-01; Text, en; © The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Data mining polycystic ovary morphology in electronic medical record ultrasound reports |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6886196/ https://www.ncbi.nlm.nih.gov/pubmed/31827874 http://dx.doi.org/10.1186/s40738-019-0067-7 |
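The expert labeling criterion described in the abstract (numerous peripheral follicles, or ovarian volume above 10 ml, absent a dominant follicle or other confounder) can be illustrated with a small rule-based sketch. This is not either of the study's two classifiers, whose internals the abstract does not describe. The regular expressions, the confounder list, the crude "no ..." negation check, the "unidentifiable" fallback, the assumption that the report is already split into per-ovary descriptions, and the rule that a report is PCOM-present if either ovary qualifies are all hypothetical choices made for illustration.

```python
import re
from typing import Sequence

# Hypothetical patterns (not from the study); real report wording varies widely.
FOLLICLE_RE = re.compile(
    r"\b(?:numerous|multiple)\s+(?:small\s+)?peripheral\s+follicles\b", re.IGNORECASE
)
# Hypothetical confounder list with a crude negation check: it skips
# "no dominant follicle" but not other negated phrasings.
CONFOUNDER_RE = re.compile(
    r"(?<!no )\b(?:dominant follicle|corpus luteum)\b", re.IGNORECASE
)
# First number within 20 non-digit characters after "volume", followed by ml or cc.
VOLUME_RE = re.compile(r"volume\D{0,20}?(\d+(?:\.\d+)?)\s*(?:ml|cc)\b", re.IGNORECASE)


def classify_ovary(text: str) -> str:
    """Label one ovary's description as 'present', 'absent', or 'unidentifiable'."""
    if CONFOUNDER_RE.search(text):
        # Assumption: a confounder blocks both criteria, so the text alone cannot decide.
        return "unidentifiable"
    if FOLLICLE_RE.search(text):
        return "present"
    match = VOLUME_RE.search(text)
    if match:
        # Strictly greater than 10 ml, as in the stated criterion.
        return "present" if float(match.group(1)) > 10.0 else "absent"
    # No follicle mention and no parseable volume.
    return "unidentifiable"


def classify_report(ovary_texts: Sequence[str]) -> str:
    """Aggregate per-ovary labels (assumes the report is already split by ovary)."""
    labels = [classify_ovary(t) for t in ovary_texts]
    if "present" in labels:
        return "present"  # assumption: one qualifying ovary is enough
    if labels and all(label == "absent" for label in labels):
        return "absent"
    return "unidentifiable"
```

Checking the confounder before the follicle and volume rules reflects one reading of the criterion, in which the "absence of a dominant follicle" qualifier applies to both conditions; the abstract's wording could also be read as applying only to the volume rule.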
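The abstract reports each classifier's test-set accuracy with a 95% confidence interval, and the agreement between the two classifiers as Cohen's kappa. It does not say how the intervals were computed, so the sketch below uses the exact Clopper-Pearson binomial interval as one common choice (an assumption), treating accuracy as correct predictions over the 1000 test reports (97.6% would correspond to 976 of 1000). The function names are illustrative, not taken from the study.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_n_fact = math.lgamma(n + 1)
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def clopper_pearson(successes: int, n: int, alpha: float = 0.05,
                    iters: int = 60) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided interval for a binomial proportion."""

    def solve(f) -> float:
        # Bisection for the root of f, which is increasing in p on (0, 1).
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= successes) = alpha/2; upper solves P(X <= successes) = alpha/2.
    lower = 0.0 if successes == 0 else solve(
        lambda p: (1.0 - _binom_cdf(successes - 1, n, p)) - alpha / 2)
    upper = 1.0 if successes == n else solve(
        lambda p: alpha / 2 - _binom_cdf(successes, n, p))
    return lower, upper


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two label sequences, e.g. two classifiers' outputs."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label everywhere; treated as perfect here
    return (observed - expected) / (1.0 - expected)
```

For example, `clopper_pearson(976, 1000)` yields an interval that can be compared with the reported 96.5% to 98.5%, and `cohens_kappa` takes the two classifiers' labels for the same reports. Bisection on the binomial CDF keeps the sketch dependency-free; with SciPy available, beta-distribution quantiles give the same bounds directly.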