
Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network

OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approac...

Bibliographic Details
Main Authors:	Kashyap, Mehr, Seneviratne, Martin, Banda, Juan M, Falconer, Thomas, Ryu, Borim, Yoo, Sooyoung, Hripcsak, George, Shah, Nigam H
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Research and Applications
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7309227/ https://www.ncbi.nlm.nih.gov/pubmed/32374408 http://dx.doi.org/10.1093/jamia/ocaa032

_version_	1783549172019888128
author	Kashyap, Mehr Seneviratne, Martin Banda, Juan M Falconer, Thomas Ryu, Borim Yoo, Sooyoung Hripcsak, George Shah, Nigam H
author_facet	Kashyap, Mehr Seneviratne, Martin Banda, Juan M Falconer, Thomas Ryu, Borim Yoo, Sooyoung Hripcsak, George Shah, Nigam H
author_sort	Kashyap, Mehr
collection	PubMed
description	OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network. MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site. RESULTS: Compared to the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site. DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibit good portability between sites within the USA, however limited portability internationally, indicating that classifier generalizability may have geographic limitations, and, consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.
format	Online Article Text
id	pubmed-7309227
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-73092272020-06-29 Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network Kashyap, Mehr Seneviratne, Martin Banda, Juan M Falconer, Thomas Ryu, Borim Yoo, Sooyoung Hripcsak, George Shah, Nigam H J Am Med Inform Assoc Research and Applications OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network. MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site. RESULTS: Compared to the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site. DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. 
Classifiers exhibit good portability between sites within the USA, however limited portability internationally, indicating that classifier generalizability may have geographic limitations, and, consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research. Oxford University Press 2020-05-06 /pmc/articles/PMC7309227/ /pubmed/32374408 http://dx.doi.org/10.1093/jamia/ocaa032 Text en © The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Research and Applications Kashyap, Mehr Seneviratne, Martin Banda, Juan M Falconer, Thomas Ryu, Borim Yoo, Sooyoung Hripcsak, George Shah, Nigam H Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network
title	Development and validation of phenotype classifiers across multiple sites in the observational health data sciences and informatics network
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7309227/ https://www.ncbi.nlm.nih.gov/pubmed/32374408 http://dx.doi.org/10.1093/jamia/ocaa032
