
Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies


Bibliographic Details
Main authors: Standish, Kristopher A., Carland, Tristan M., Lockwood, Glenn K., Pfeiffer, Wayne, Tatineni, Mahidhar, Huang, C Chris, Lamberth, Sarah, Cherkas, Yauheniya, Brodmerkel, Carrie, Jaeger, Ed, Smith, Lance, Rajagopal, Gunaretnam, Curran, Mark E., Schork, Nicholas J.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4580299/
https://www.ncbi.nlm.nih.gov/pubmed/26395405
http://dx.doi.org/10.1186/s12859-015-0736-4
author Standish, Kristopher A.
Carland, Tristan M.
Lockwood, Glenn K.
Pfeiffer, Wayne
Tatineni, Mahidhar
Huang, C Chris
Lamberth, Sarah
Cherkas, Yauheniya
Brodmerkel, Carrie
Jaeger, Ed
Smith, Lance
Rajagopal, Gunaretnam
Curran, Mark E.
Schork, Nicholas J.
author_sort Standish, Kristopher A.
collection PubMed
description MOTIVATION: Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. RESULTS: We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study. CONCLUSIONS: We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging ‘big data’ problems in biomedical research brought on by the expansion of NGS technologies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0736-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4580299
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4580299 2015-09-24 BMC Bioinformatics Methodology Article BioMed Central 2015-09-22 /pmc/articles/PMC4580299/ /pubmed/26395405 http://dx.doi.org/10.1186/s12859-015-0736-4 Text en © Standish et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4580299/
https://www.ncbi.nlm.nih.gov/pubmed/26395405
http://dx.doi.org/10.1186/s12859-015-0736-4