VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database

BACKGROUND: Targeted resequencing has become the most used and cost-effective approach for identifying causative mutations of Mendelian diseases both for diagnostics and research purposes. Due to very rapid technological progress, NGS laboratories are expanding their capabilities to address the incr...

Bibliographic Details
Main authors: Musacchia, F., Ciolfi, A., Mutarelli, M., Bruselles, A., Castello, R., Pinelli, M., Basu, S., Banfi, S., Casari, G., Tartaglia, M., Nigro, V.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291943/
https://www.ncbi.nlm.nih.gov/pubmed/30541431
http://dx.doi.org/10.1186/s12859-018-2532-4
author Musacchia, F.
Ciolfi, A.
Mutarelli, M.
Bruselles, A.
Castello, R.
Pinelli, M.
Basu, S.
Banfi, S.
Casari, G.
Tartaglia, M.
Nigro, V.
collection PubMed
description BACKGROUND: Targeted resequencing has become the most widely used and cost-effective approach for identifying causative mutations of Mendelian diseases, both for diagnostics and for research. Due to very rapid technological progress, NGS laboratories are expanding their capabilities to address the increasing number of analyses. Several open source tools are available to build a generic variant calling pipeline, but a tool that can simultaneously execute multiple analyses and organize and categorize the samples is still missing. RESULTS: Here we describe VarGenius, a Linux-based command-line software able to execute customizable pipelines for the analysis of multiple targeted resequencing datasets using parallel computing. VarGenius provides a database to store the output of the analysis (calling quality statistics, variant annotations, internal allelic variant frequencies) and sample information (personal data, genotypes, phenotypes). VarGenius can also perform the “joint analysis” of hundreds of samples with a single command, drastically reducing the time needed to configure and execute the analysis. VarGenius executes the standard pipeline of the Genome Analysis Tool-Kit (GATK) best practices (GBP) for germline variant calling, annotates the variants using Annovar, and generates a user-friendly output displaying the results through a web page. VarGenius has been tested on a parallel computing cluster of 52 machines, each with 120 GB of RAM. Under this configuration, a 50 M whole exome sequencing (WES) analysis for a family (trio or quartet) was executed in about 7 h, a joint analysis of 30 WES in about 24 h, and the parallel analysis of 34 single samples from a 1 M panel in about 2 h. CONCLUSIONS: We developed VarGenius, a “master” tool that addresses the increasing demand for heterogeneous NGS analyses and allows maximum flexibility for downstream analyses. It paves the way to a different kind of analysis, centered on cohorts rather than on singletons. Patient and variant information is stored in the database, and any output file can be accessed programmatically. VarGenius can be used for routine analyses by biomedical researchers with basic Linux skills, while providing additional flexibility for computational biologists to develop their own algorithms for the comparison and analysis of data. The software is freely available at: https://github.com/frankMusacchia/VarGenius ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2532-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6291943
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6291943 2018-12-17 VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database Musacchia, F. Ciolfi, A. Mutarelli, M. Bruselles, A. Castello, R. Pinelli, M. Basu, S. Banfi, S. Casari, G. Tartaglia, M. Nigro, V. BMC Bioinformatics Software BioMed Central 2018-12-12 /pmc/articles/PMC6291943/ /pubmed/30541431 http://dx.doi.org/10.1186/s12859-018-2532-4 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title VarGenius executes cohort-level DNA-seq variant calling and annotation and allows to manage the resulting data through a PostgreSQL database
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6291943/
https://www.ncbi.nlm.nih.gov/pubmed/30541431
http://dx.doi.org/10.1186/s12859-018-2532-4