A cloud-based pipeline for analysis of FHIR and long-read data
Main authors: | , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9872570/ https://www.ncbi.nlm.nih.gov/pubmed/36726729 http://dx.doi.org/10.1093/bioadv/vbac095 |
Summary: | MOTIVATION: As genome sequencing becomes cheaper and more accurate, it is becoming increasingly viable to merge this data with electronic health information to inform clinical decisions. RESULTS: In this work, we demonstrate a full pipeline for working with both PacBio sequencing data and clinical FHIR® data, from initial data to tertiary analysis. The electronic health records are stored in FHIR® (Fast Healthcare Interoperability Resources) format, the current leading standard for healthcare data exchange. For the genomic data, we perform variant calling on long-read PacBio HiFi data using Cromwell on Azure. Both data formats are parsed, processed and merged in a single scalable pipeline which securely performs tertiary analyses using cloud-based Jupyter notebooks. We include three example applications: exporting patient information to a database, clustering patients and performing a simple pharmacogenomic study. AVAILABILITY AND IMPLEMENTATION: https://github.com/microsoft/genomicsnotebook/tree/main/fhirgenomics SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Advances online. |
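
The summary describes parsing FHIR records and variant calls and merging them for tertiary analysis. The following is a minimal sketch of that merge step only, not the authors' implementation. It assumes the clinical data arrive as a FHIR Bundle of Patient resources and that the variant calls have already been reduced to one row per patient and gene. The sample patients, the `variant_rows` layout, and the helper names `extract_patients` and `merge_records` are all hypothetical.

```python
import json

# Hypothetical, minimal FHIR Bundle; real Patient resources carry many more fields.
fhir_bundle = json.loads("""
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1",
                  "gender": "female", "birthDate": "1980-04-02"}},
    {"resource": {"resourceType": "Patient", "id": "p2",
                  "gender": "male", "birthDate": "1975-11-20"}}
  ]
}
""")

# Hypothetical variant calls, assumed to be pre-extracted from VCF output.
variant_rows = [
    {"patient_id": "p1", "gene": "CYP2C19", "genotype": "0/1"},
    {"patient_id": "p2", "gene": "CYP2C19", "genotype": "0/0"},
    {"patient_id": "p9", "gene": "CYP2C19", "genotype": "1/1"},  # no clinical record
]


def extract_patients(bundle):
    """Return {patient_id: demographics} for each Patient resource in a Bundle."""
    patients = {}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            patients[resource["id"]] = {
                "gender": resource.get("gender"),
                "birth_date": resource.get("birthDate"),
            }
    return patients


def merge_records(patients, variants):
    """Join variant rows to patient demographics on the patient identifier."""
    merged = []
    for row in variants:
        demographics = patients.get(row["patient_id"])
        if demographics is None:
            continue  # drop variants that have no matching clinical record
        merged.append({**row, **demographics})
    return merged


merged = merge_records(extract_patients(fhir_bundle), variant_rows)
for record in merged:
    print(record)
```

Keying both sources on a shared patient identifier keeps the join a simple dictionary lookup. At the scale the summary mentions, the same join would more plausibly run in a distributed data frame engine, which is outside the scope of this sketch.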