Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data
BACKGROUND: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a...
Main authors: | Caboche, Ségolène; Audebert, Christophe; Lemoine, Yves; Hot, David |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4051166/ https://www.ncbi.nlm.nih.gov/pubmed/24708189 http://dx.doi.org/10.1186/1471-2164-15-264 |
_version_ | 1782320070606716928 |
---|---|
author | Caboche, Ségolène Audebert, Christophe Lemoine, Yves Hot, David |
author_sort | Caboche, Ségolène |
collection | PubMed |
description | BACKGROUND: The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. RESULTS: In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. CONCLUSIONS: A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform. |
format | Online Article Text |
id | pubmed-4051166 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4051166 2014-06-17 BMC Genomics Research Article BioMed Central 2014-04-05 /pmc/articles/PMC4051166/ /pubmed/24708189 http://dx.doi.org/10.1186/1471-2164-15-264 Text en Copyright © 2014 Caboche et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data |
topic | Research Article |
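
The abstract's definition of a correctly mapped read (agreement on start position, end position, and the number of indels and substitutions) can be illustrated with a minimal sketch. This is not the CuReSimEval implementation: the `Alignment` type, the `is_correctly_mapped` function, and both tolerance parameters are hypothetical names introduced here, and the record does not state what thresholds, if any, the authors use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of a read's placement on the reference."""
    start: int          # leftmost reference position
    end: int            # rightmost reference position
    indels: int         # number of inserted or deleted bases
    substitutions: int  # number of mismatched bases


def is_correctly_mapped(expected: Alignment, reported: Alignment,
                        pos_tolerance: int = 0, edit_tolerance: int = 0) -> bool:
    """Return True when start, end, indel count, and substitution count all agree.

    Both tolerances are assumptions for illustration; the abstract gives no values.
    """
    return (
        abs(reported.start - expected.start) <= pos_tolerance
        and abs(reported.end - expected.end) <= pos_tolerance
        and abs(reported.indels - expected.indels) <= edit_tolerance
        and abs(reported.substitutions - expected.substitutions) <= edit_tolerance
    )


# Simulated truth versus a mapper result that matches only on the start position.
truth = Alignment(start=100, end=199, indels=1, substitutions=2)
same_start_wrong_end = Alignment(start=100, end=205, indels=7, substitutions=2)

print(is_correctly_mapped(truth, truth))                 # True
print(is_correctly_mapped(truth, same_start_wrong_end))  # False
```

A check based on the start position alone would accept the second alignment; requiring the end position and edit counts to agree as well is what distinguishes the stricter definition described in the abstract.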