From Wet‐Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

Bibliographic Details
Main authors: Laurie, Steve, Fernandez‐Callejo, Marcos, Marco‐Sola, Santiago, Trotta, Jean‐Remi, Camps, Jordi, Chacón, Alejandro, Espinosa, Antonio, Gut, Marta, Gut, Ivo, Heath, Simon, Beltran, Sergi
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5129537/
https://www.ncbi.nlm.nih.gov/pubmed/27604516
http://dx.doi.org/10.1002/humu.23114
author Laurie, Steve
Fernandez‐Callejo, Marcos
Marco‐Sola, Santiago
Trotta, Jean‐Remi
Camps, Jordi
Chacón, Alejandro
Espinosa, Antonio
Gut, Marta
Gut, Ivo
Heath, Simon
Beltran, Sergi
collection PubMed
description As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next‐generation sequencing as standard practice in research and diagnostics. However, computing cost–performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state‐of‐the‐art read aligners (BWA‐MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available.
format Online
Article
Text
id pubmed-5129537
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-5129537 2016-11-30 From Wet‐Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing Laurie, Steve Fernandez‐Callejo, Marcos Marco‐Sola, Santiago Trotta, Jean‐Remi Camps, Jordi Chacón, Alejandro Espinosa, Antonio Gut, Marta Gut, Ivo Heath, Simon Beltran, Sergi Hum Mutat Special Articles As whole genome sequencing becomes cheaper and faster, it will progressively substitute targeted next‐generation sequencing as standard practice in research and diagnostics. However, computing cost–performance ratio is not advancing at an equivalent rate. Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. We have benchmarked six combinations of state‐of‐the‐art read aligners (BWA‐MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data from the NA12878 human sample. Results have been compared between them and against the NIST Genome in a Bottle (GIAB) variants reference dataset. We report differences in speed of up to 20 times in some steps of the process and have observed that SNV, and to a lesser extent InDel, detection is highly consistent in 70% of the genome. SNV, and especially InDel, detection is less reliable in 20% of the genome, and almost unfeasible in the remaining 10%. These findings will aid in choosing the appropriate tools bearing in mind objectives, workload, and computing infrastructure available. John Wiley and Sons Inc. 2016-09-26 2016-12 /pmc/articles/PMC5129537/ /pubmed/27604516 http://dx.doi.org/10.1002/humu.23114 Text en © 2016 The Authors. Human Mutation published by Wiley Periodicals, Inc.
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial (http://creativecommons.org/licenses/by-nc/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
title From Wet‐Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing
topic Special Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5129537/
https://www.ncbi.nlm.nih.gov/pubmed/27604516
http://dx.doi.org/10.1002/humu.23114