Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary s...


Bibliographic Details
Main authors: Laehnemann, David, Borkhardt, Arndt, McHardy, Alice Carolyn
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4719071/
https://www.ncbi.nlm.nih.gov/pubmed/26026159
http://dx.doi.org/10.1093/bib/bbv029
author Laehnemann, David
Borkhardt, Arndt
McHardy, Alice Carolyn
collection PubMed
description Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.
format Online
Article
Text
id pubmed-4719071
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4719071 2016-01-21 Brief Bioinform Software Review Oxford University Press 2016-01 2015-05-29 Text en © The Author 2015. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction
topic Software Review