A cloud-based pipeline for analysis of FHIR and long-read data

Bibliographic Details
Main authors: Dunn, Tim; Cosgun, Erdal
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872570/
https://www.ncbi.nlm.nih.gov/pubmed/36726729
http://dx.doi.org/10.1093/bioadv/vbac095
Description
Summary:
MOTIVATION: As genome sequencing becomes cheaper and more accurate, it is becoming increasingly viable to combine sequencing data with electronic health information to inform clinical decisions.
RESULTS: In this work, we demonstrate a full pipeline for working with both PacBio sequencing data and clinical FHIR® data, from initial data to tertiary analysis. The electronic health records are stored in FHIR® (Fast Healthcare Interoperability Resources) format, the current leading standard for healthcare data exchange. For the genomic data, we perform variant calling on long-read PacBio HiFi data using Cromwell on Azure. Both data formats are parsed, processed and merged in a single scalable pipeline, which securely performs tertiary analyses using cloud-based Jupyter notebooks. We include three example applications: exporting patient information to a database, clustering patients and performing a simple pharmacogenomic study.
AVAILABILITY AND IMPLEMENTATION: https://github.com/microsoft/genomicsnotebook/tree/main/fhirgenomics
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online.
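
To illustrate the kind of merge step the summary describes, the following is a minimal sketch, not the authors' implementation. It assumes the clinical data arrive as a FHIR Bundle of Patient resources in JSON and that variant calls have already been reduced to a per-patient list. The sample bundle, the patient ids, the variant tuples and the helper names extract_patients and merge_records are all hypothetical and exist only for this example.

```python
import json

# Hypothetical FHIR Bundle with two Patient resources (illustrative data only).
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1",
                  "gender": "female", "birthDate": "1980-04-02"}},
    {"resource": {"resourceType": "Patient", "id": "p2",
                  "gender": "male", "birthDate": "1975-11-19"}}
  ]
}
"""

# Hypothetical variant calls keyed by patient id, as (chrom, pos, ref, alt).
VARIANTS = {
    "p1": [("chr1", 12345, "A", "G")],
}


def extract_patients(bundle):
    """Return {patient_id: {gender, birthDate}} for Patient resources in a Bundle."""
    patients = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            patients[resource["id"]] = {
                "gender": resource.get("gender"),
                "birthDate": resource.get("birthDate"),
            }
    return patients


def merge_records(patients, variants):
    """Attach each patient's variant list (possibly empty) to their clinical fields."""
    return {
        pid: {**fields, "variants": variants.get(pid, [])}
        for pid, fields in patients.items()
    }


merged = merge_records(extract_patients(json.loads(BUNDLE_JSON)), VARIANTS)
```

Joining on the patient id keeps patients without called variants in the result with an empty list, so downstream steps such as clustering or a database export would still see every patient.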