Advancing clinical cohort selection with genomics analysis on a distributed platform

Bibliographic Details
Main authors: Smith, Jaclyn M., Lathara, Melvin, Wright, Hollis, Hill, Brian, Ganapati, Nalini, Srinivasa, Ganapati, Denny, Christopher T.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7179830/
https://www.ncbi.nlm.nih.gov/pubmed/32324802
http://dx.doi.org/10.1371/journal.pone.0231826
author Smith, Jaclyn M.
Lathara, Melvin
Wright, Hollis
Hill, Brian
Ganapati, Nalini
Srinivasa, Ganapati
Denny, Christopher T.
collection PubMed
description The affordability of next-generation genomic sequencing and the improvement of medical data management have contributed largely to the evolution of biological analysis from both a clinical and research perspective. Precision medicine is a response to these advancements that places individuals into better-defined subsets based on shared clinical and genetic features. The identification of personalized diagnosis and treatment options is dependent on the ability to draw insights from large-scale, multi-modal analysis of biomedical datasets. Driven by a real use case, we posit that platforms that support precision medicine analysis should maintain data in their optimal data stores, should support distributed storage and query mechanisms, and should scale as more samples are added to the system. We extended a genomics-based columnar data store, GenomicsDB, for ease of use within a distributed analytics platform for clinical and genomic data integration, known as the ODA framework. The framework supports interaction from an i2b2 plugin as well as a notebook environment. We show that the ODA framework exhibits worst-case linear scaling for array size (storage), import time (data construction), and query time for an increasing number of samples. We go on to show worst-case linear time for both import of clinical data and aggregate query execution time within a distributed environment. This work highlights the integration of a distributed genomic database with a distributed compute environment to support scalable and efficient precision medicine queries from a HIPAA-compliant cohort system in a real-world setting. The ODA framework is currently deployed in production to support precision medicine exploration and analysis by clinicians and researchers at the UCLA David Geffen School of Medicine.
format Online
Article
Text
id pubmed-7179830
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7179830 2020-04-29 Advancing clinical cohort selection with genomics analysis on a distributed platform Smith, Jaclyn M. Lathara, Melvin Wright, Hollis Hill, Brian Ganapati, Nalini Srinivasa, Ganapati Denny, Christopher T. PLoS One Research Article Public Library of Science 2020-04-23 /pmc/articles/PMC7179830/ /pubmed/32324802 http://dx.doi.org/10.1371/journal.pone.0231826 Text en © 2020 Smith et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Advancing clinical cohort selection with genomics analysis on a distributed platform
topic Research Article