ATAV: a comprehensive platform for population-scale genomic analyses

BACKGROUND: A common approach for sequencing studies is to do joint-calling and store variants of all samples in a single file. If new samples are continually added or controls are re-used for several studies, the cost and time required to perform joint-calling for each analysis can become prohibiti...

Bibliographic Details
Main Authors: Ren, Zhong; Povysil, Gundula; Hostyk, Joseph A.; Cui, Hongzhu; Bhardwaj, Nitin; Goldstein, David B.
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7988908/
https://www.ncbi.nlm.nih.gov/pubmed/33757430
http://dx.doi.org/10.1186/s12859-021-04071-1
_version_ 1783668858937147392
author Ren, Zhong
Povysil, Gundula
Hostyk, Joseph A.
Cui, Hongzhu
Bhardwaj, Nitin
Goldstein, David B.
author_sort Ren, Zhong
collection PubMed
description BACKGROUND: A common approach for sequencing studies is to perform joint calling and store the variants of all samples in a single file. If new samples are continually added or controls are re-used for several studies, the cost and time required to perform joint calling for each analysis can become prohibitive. RESULTS: We present ATAV, an analysis platform for large-scale whole-exome and whole-genome sequencing projects. ATAV stores variant and per-site coverage data for all samples in a centralized database, which ATAV queries efficiently to support diagnostic analyses for trios and singletons, as well as rare-variant collapsing analyses for finding disease associations in complex diseases. Runtime logs ensure full reproducibility, and the modularized ATAV framework makes it extensible for continuous development. Besides helping with the identification of disease-causing variants for a range of diseases, ATAV has also enabled the discovery of disease genes by rare-variant collapsing on datasets containing more than 20,000 samples. Analyses to date have been performed on data from more than 110,000 individuals, demonstrating the scalability of the framework. To allow users to easily access variant-level data directly from the database, we provide a web-based interface, the ATAV data browser (http://atavdb.org/). Through this browser, the general public can query summary-level data for more than 40,000 samples, representing a mix of cases and controls of diverse ancestries. Users have access to phenotype categories of variant carriers, as well as predicted ancestry, gender, and quality metrics. In contrast to many other platforms, the data browser is able to show data of newly added samples in real time and therefore evolves rapidly as more and more samples are sequenced.
CONCLUSIONS: Through ATAV, users have public access to one of the largest variant databases for patients sequenced at a tertiary care center and can look up any genes or variants of interest. Additionally, since the entire code is freely available on GitHub, ATAV can easily be deployed by other groups that wish to build their own platform, database, and user interface.
format Online
Article
Text
id pubmed-7988908
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7988908 2021-03-25 ATAV: a comprehensive platform for population-scale genomic analyses Ren, Zhong Povysil, Gundula Hostyk, Joseph A. Cui, Hongzhu Bhardwaj, Nitin Goldstein, David B. BMC Bioinformatics Software BioMed Central 2021-03-23 /pmc/articles/PMC7988908/ /pubmed/33757430 http://dx.doi.org/10.1186/s12859-021-04071-1 Text en © The Author(s) 2021 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title ATAV: a comprehensive platform for population-scale genomic analyses
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7988908/
https://www.ncbi.nlm.nih.gov/pubmed/33757430
http://dx.doi.org/10.1186/s12859-021-04071-1