Advancing clinical cohort selection with genomics analysis on a distributed platform
Main authors: | Smith, Jaclyn M.; Lathara, Melvin; Wright, Hollis; Hill, Brian; Ganapati, Nalini; Srinivasa, Ganapati; Denny, Christopher T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7179830/ https://www.ncbi.nlm.nih.gov/pubmed/32324802 http://dx.doi.org/10.1371/journal.pone.0231826 |
author | Smith, Jaclyn M.; Lathara, Melvin; Wright, Hollis; Hill, Brian; Ganapati, Nalini; Srinivasa, Ganapati; Denny, Christopher T. |
---|---|
collection | PubMed |
description | The affordability of next-generation genomic sequencing and the improvement of medical data management have contributed substantially to the evolution of biological analysis from both a clinical and a research perspective. Precision medicine is a response to these advancements that places individuals into better-defined subsets based on shared clinical and genetic features. The identification of personalized diagnosis and treatment options depends on the ability to draw insights from large-scale, multi-modal analysis of biomedical datasets. Driven by a real use case, we posit that platforms supporting precision medicine analysis should maintain data in their optimal data stores, support distributed storage and query mechanisms, and scale as more samples are added to the system. We extended a genomics-based columnar data store, GenomicsDB, for ease of use within a distributed analytics platform for clinical and genomic data integration, known as the ODA framework. The framework supports interaction from an i2b2 plugin as well as a notebook environment. We show that the ODA framework exhibits worst-case linear scaling in array size (storage), import time (data construction), and query time as the number of samples increases. We further show worst-case linear time for both clinical data import and aggregate query execution within a distributed environment. This work highlights the integration of a distributed genomic database with a distributed compute environment to support scalable and efficient precision medicine queries from a HIPAA-compliant cohort system in a real-world setting. The ODA framework is currently deployed in production to support precision medicine exploration and analysis by clinicians and researchers at the UCLA David Geffen School of Medicine. |
format | Online Article Text |
id | pubmed-7179830 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
journal | PLoS One |
epub_date | 2020-04-23 |
rights | © 2020 Smith et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Advancing clinical cohort selection with genomics analysis on a distributed platform |
topic | Research Article |