
A cloud-based pipeline for analysis of FHIR and long-read data

MOTIVATION: As genome sequencing becomes cheaper and more accurate, it is becoming increasingly viable to merge this data with electronic health information to inform clinical decisions. RESULTS: In this work, we demonstrate a full pipeline for working with both PacBio sequencing data and clinical F...


Bibliographic Details
Main authors: Dunn, Tim; Cosgun, Erdal
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872570/
https://www.ncbi.nlm.nih.gov/pubmed/36726729
http://dx.doi.org/10.1093/bioadv/vbac095
author Dunn, Tim
Cosgun, Erdal
collection PubMed
description MOTIVATION: As genome sequencing becomes cheaper and more accurate, it is becoming increasingly viable to merge this data with electronic health information to inform clinical decisions. RESULTS: In this work, we demonstrate a full pipeline for working with both PacBio sequencing data and clinical FHIR® data, from initial data to tertiary analysis. The electronic health records are stored in FHIR® (Fast Healthcare Interoperability Resources) format, the current leading standard for healthcare data exchange. For the genomic data, we perform variant calling on long-read PacBio HiFi data using Cromwell on Azure. Both data formats are parsed, processed and merged in a single scalable pipeline which securely performs tertiary analyses using cloud-based Jupyter notebooks. We include three example applications: exporting patient information to a database, clustering patients and performing a simple pharmacogenomic study. AVAILABILITY AND IMPLEMENTATION: https://github.com/microsoft/genomicsnotebook/tree/main/fhirgenomics SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
format Online
Article
Text
id pubmed-9872570
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9872570 2023-01-31 A cloud-based pipeline for analysis of FHIR and long-read data Dunn, Tim Cosgun, Erdal Bioinform Adv Original Paper Oxford University Press 2023-01-20 /pmc/articles/PMC9872570/ /pubmed/36726729 http://dx.doi.org/10.1093/bioadv/vbac095 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title A cloud-based pipeline for analysis of FHIR and long-read data
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872570/
https://www.ncbi.nlm.nih.gov/pubmed/36726729
http://dx.doi.org/10.1093/bioadv/vbac095