
Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads

BACKGROUND: Genome sequencing provides a powerful tool for pathogen detection and can help resolve outbreaks that pose public safety and health risks. Mapping of DNA reads to genomes plays a fundamental role in this approach, where accurate alignment and classification of sequencing data is crucial....


Bibliographic Details
Main Authors: Poulsen, Thomas M., Frith, Martin
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5469136/
https://www.ncbi.nlm.nih.gov/pubmed/28606054
http://dx.doi.org/10.1186/s12859-017-1710-0
_version_ 1783243530669391872
author Poulsen, Thomas M.
Frith, Martin
author_facet Poulsen, Thomas M.
Frith, Martin
author_sort Poulsen, Thomas M.
collection PubMed
description BACKGROUND: Genome sequencing provides a powerful tool for pathogen detection and can help resolve outbreaks that pose public safety and health risks. Mapping of DNA reads to genomes plays a fundamental role in this approach, where accurate alignment and classification of sequencing data is crucial. Standard mapping methods crudely treat bases as independent from their neighbors. Accuracy might be improved by using higher order paired hidden Markov models (HMMs), which model neighbor effects, but introduce design and implementation issues that have typically made them impractical for read mapping applications. We present a variable-order paired HMM that we term VarHMM, which addresses central issues involved with higher order modeling for sequence alignment. RESULTS: Compared with existing alignment methods, VarHMM is able to model higher order distributions and quantify alignment probabilities with greater detail and accuracy. In a series of comparison tests, in which Ion Torrent sequenced DNA was mapped to similar bacterial strains, VarHMM consistently provided better strain discrimination than any of the other alignment methods that we compared with. CONCLUSIONS: Our results demonstrate the advantages of higher ordered probability distribution modeling and also suggest that further development of such models would benefit read mapping in a range of other applications as well. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1710-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5469136
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5469136 2017-06-14 Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads Poulsen, Thomas M. Frith, Martin BMC Bioinformatics Software
BioMed Central 2017-06-12 /pmc/articles/PMC5469136/ /pubmed/28606054 http://dx.doi.org/10.1186/s12859-017-1710-0 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Poulsen, Thomas M.
Frith, Martin
Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads
title Variable-order sequence modeling improves bacterial strain discrimination for Ion Torrent DNA reads
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5469136/
https://www.ncbi.nlm.nih.gov/pubmed/28606054
http://dx.doi.org/10.1186/s12859-017-1710-0