Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data

BACKGROUND: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a...

Bibliographic Details
Main Authors: Caboche, Ségolène, Audebert, Christophe, Lemoine, Yves, Hot, David
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4051166/
https://www.ncbi.nlm.nih.gov/pubmed/24708189
http://dx.doi.org/10.1186/1471-2164-15-264
_version_ 1782320070606716928
author Caboche, Ségolène
Audebert, Christophe
Lemoine, Yves
Hot, David
author_facet Caboche, Ségolène
Audebert, Christophe
Lemoine, Yves
Hot, David
author_sort Caboche, Ségolène
collection PubMed
description BACKGROUND: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. RESULTS: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. CONCLUSIONS: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. 
The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform.
format Online
Article
Text
id pubmed-4051166
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40511662014-06-17 Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data Caboche, Ségolène Audebert, Christophe Lemoine, Yves Hot, David BMC Genomics Research Article BACKGROUND: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. RESULTS: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. CONCLUSIONS: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. 
The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform. BioMed Central 2014-04-05 /pmc/articles/PMC4051166/ /pubmed/24708189 http://dx.doi.org/10.1186/1471-2164-15-264 Text en Copyright © 2014 Caboche et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Caboche, Ségolène
Audebert, Christophe
Lemoine, Yves
Hot, David
Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
title Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
title_full Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
title_fullStr Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
title_full_unstemmed Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
title_short Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
title_sort comparison of mapping algorithms used in high-throughput sequencing: application to ion torrent data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4051166/
https://www.ncbi.nlm.nih.gov/pubmed/24708189
http://dx.doi.org/10.1186/1471-2164-15-264
work_keys_str_mv AT cabochesegolene comparisonofmappingalgorithmsusedinhighthroughputsequencingapplicationtoiontorrentdata
AT audebertchristophe comparisonofmappingalgorithmsusedinhighthroughputsequencingapplicationtoiontorrentdata
AT lemoineyves comparisonofmappingalgorithmsusedinhighthroughputsequencingapplicationtoiontorrentdata
AT hotdavid comparisonofmappingalgorithmsusedinhighthroughputsequencingapplicationtoiontorrentdata