Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies

The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies...

Bibliographic Details
Main authors: Rieber, Nora; Zapatka, Marc; Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679043/
https://www.ncbi.nlm.nih.gov/pubmed/23776689
http://dx.doi.org/10.1371/journal.pone.0066621
author Rieber, Nora
Zapatka, Marc
Lasitschka, Bärbel
Jones, David
Northcott, Paul
Hutter, Barbara
Jäger, Natalie
Kool, Marcel
Taylor, Michael
Lichter, Peter
Pfister, Stefan
Wolf, Stephan
Brors, Benedikt
Eils, Roland
collection PubMed
description The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other platforms. This helps to identify the proper sequencing platform for whole genome studies with different application scopes.
format Online
Article
Text
id pubmed-3679043
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3679043 2013-06-17 PLoS One Research Article Public Library of Science 2013-06-11 /pmc/articles/PMC3679043/ /pubmed/23776689 http://dx.doi.org/10.1371/journal.pone.0066621 Text en © 2013 Rieber et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3679043/
https://www.ncbi.nlm.nih.gov/pubmed/23776689
http://dx.doi.org/10.1371/journal.pone.0066621