
HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment

BACKGROUND: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes...


Bibliographic Details
Main authors: Abante, Jordi, Ghaffari, Noushin, Johnson, Charles D., Datta, Aniruddha
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584555/
https://www.ncbi.nlm.nih.gov/pubmed/28874136
http://dx.doi.org/10.1186/s12864-017-3965-2
author Abante, Jordi
Ghaffari, Noushin
Johnson, Charles D.
Datta, Aniruddha
author_sort Abante, Jordi
collection PubMed
description BACKGROUND: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. METHODS: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. RESULTS: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. CONCLUSIONS: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3965-2) contains supplementary material, which is available to authorized users.
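The scoring idea in the METHODS paragraph (a Markov chain over k-mers plus emission probabilities that absorb sequencing noise) can be illustrated with a small toy sketch. This is not the HiMMe implementation: the function names, the uniform initial distribution, the pseudocount, and the single per-base substitution error rate are all assumptions made for illustration. The hidden state is taken to be the true k-mer, the observation is the base actually read, and a contig is scored by its mean log-likelihood per base under the forward algorithm.

```python
import itertools
import numpy as np

BASES = "ACGT"

def train_transitions(reference, k, pseudocount=1.0):
    """Estimate P(next k-mer | current k-mer) from overlapping k-mers (hypothetical helper)."""
    kmers = ["".join(p) for p in itertools.product(BASES, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros((len(kmers), len(kmers)))
    # Only successors that overlap by k-1 bases are reachable; seed them with a pseudocount.
    for km in kmers:
        for b in BASES:
            counts[index[km], index[km[1:] + b]] = pseudocount
    # Count each observed k-mer -> next k-mer step in the training sequence.
    for i in range(len(reference) - k):
        cur, nxt = reference[i:i + k], reference[i + 1:i + k + 1]
        if cur in index and nxt in index:
            counts[index[cur], index[nxt]] += 1
    trans = counts / counts.sum(axis=1, keepdims=True)
    return kmers, trans

def emission_matrix(kmers, error_rate):
    """P(observed base | true k-mer): the last base is read correctly with prob 1 - error_rate."""
    emit = np.full((len(kmers), len(BASES)), error_rate / 3)
    for i, km in enumerate(kmers):
        emit[i, BASES.index(km[-1])] = 1 - error_rate
    return emit

def score_contig(contig, kmers, trans, emit):
    """Scaled forward algorithm; returns the mean log-likelihood per observed base.

    Assumes the contig contains only A, C, G, T and starts from a uniform
    distribution over k-mers (both are simplifications of this sketch).
    """
    obs = [BASES.index(b) for b in contig]
    alpha = np.full(len(kmers), 1 / len(kmers)) * emit[:, obs[0]]
    loglik = 0.0
    for o in obs[1:]:
        scale = alpha.sum()          # rescale to avoid numerical underflow
        loglik += np.log(scale)
        alpha = (alpha / scale) @ trans * emit[:, o]
    loglik += np.log(alpha.sum())
    return loglik / len(obs)

if __name__ == "__main__":
    kmers, trans = train_transitions("ACGT" * 50, k=2)
    emit = emission_matrix(kmers, error_rate=0.01)
    for contig in ("ACGTACGTACGT", "AAAACCCCGGGG"):
        print(contig, score_contig(contig, kmers, trans, emit))
```

In this toy setting, a contig that follows the patterns seen during training receives a higher per-base score than one that does not. Averaging the per-contig scores would be one possible way to obtain an overall assembly score, although the record does not say how HiMMe aggregates them.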
format Online
Article
Text
id pubmed-5584555
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5584555 2017-09-06 HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment Abante, Jordi Ghaffari, Noushin Johnson, Charles D. Datta, Aniruddha BMC Genomics Methodology Article BioMed Central 2017-09-05 /pmc/articles/PMC5584555/ /pubmed/28874136 http://dx.doi.org/10.1186/s12864-017-3965-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment
title_sort himme: using genetic patterns as a proxy for genome assembly reliability assessment
topic Methodology Article