Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network

Bibliographic Details
Main Authors: Kashyap, Mehr; Seneviratne, Martin; Banda, Juan M; Falconer, Thomas; Ryu, Borim; Yoo, Sooyoung; Hripcsak, George; Shah, Nigam H
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7309227/
https://www.ncbi.nlm.nih.gov/pubmed/32374408
http://dx.doi.org/10.1093/jamia/ocaa032
author Kashyap, Mehr
Seneviratne, Martin
Banda, Juan M
Falconer, Thomas
Ryu, Borim
Yoo, Sooyoung
Hripcsak, George
Shah, Nigam H
collection PubMed
description OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network.
MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site.
RESULTS: Compared to the multiple-mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site.
DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibit good portability between sites within the USA but limited portability internationally, indicating that classifier generalizability may have geographic limitations. Consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.
format Online
Article
Text
id pubmed-7309227
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7309227 2020-06-29
J Am Med Inform Assoc
Research and Applications
Oxford University Press 2020-05-06
/pmc/articles/PMC7309227/ /pubmed/32374408 http://dx.doi.org/10.1093/jamia/ocaa032
Text en
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7309227/
https://www.ncbi.nlm.nih.gov/pubmed/32374408
http://dx.doi.org/10.1093/jamia/ocaa032
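Illustrative note: the abstract describes labeling training data by "multiple mentions of disease-specific codes" and reports results as changes in recall and precision. The sketch below is a toy illustration of those two ideas only, not the APHRODITE implementation. The function names, the threshold of two mentions, the placeholder codes ("DX_A", "DX_B"), and the patient data are all assumptions made up for this example.

```python
from collections import Counter


def label_by_multiple_mentions(code_events, phenotype_codes, min_mentions=2):
    """Return patient ids with at least `min_mentions` phenotype-code events.

    `min_mentions=2` is an assumed threshold; the abstract only says "multiple".
    """
    counts = Counter(pid for pid, code in code_events if code in phenotype_codes)
    return {pid for pid, n in counts.items() if n >= min_mentions}


def precision_recall(predicted, actual):
    """Compute precision and recall of a predicted case set against a reference set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical (patient_id, code) events; "DX_A" and "DX_B" are placeholder codes.
events = [
    (1, "DX_A"), (1, "DX_A"),
    (2, "DX_A"),
    (3, "OTHER"),
    (4, "DX_B"), (4, "DX_A"),
    (5, "DX_A"),
]
phenotype_codes = {"DX_A", "DX_B"}

# Patients 1 and 4 have two mentions each; patients 2 and 5 have only one.
noisy_cases = label_by_multiple_mentions(events, phenotype_codes)

# Hypothetical reference cohort standing in for a rule-based definition.
reference_cases = {1, 2, 4, 5}

precision, recall = precision_recall(noisy_cases, reference_cases)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data the heuristic is precise but misses the single-mention patients, which is the kind of precision/recall trade-off the abstract's results describe for the labeling heuristic versus the trained classifiers.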