HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment
BACKGROUND: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes...
Main authors: | Abante, Jordi; Ghaffari, Noushin; Johnson, Charles D.; Datta, Aniruddha |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584555/ https://www.ncbi.nlm.nih.gov/pubmed/28874136 http://dx.doi.org/10.1186/s12864-017-3965-2 |
_version_ | 1783261485432045568 |
---|---|
author | Abante, Jordi; Ghaffari, Noushin; Johnson, Charles D.; Datta, Aniruddha |
author_facet | Abante, Jordi; Ghaffari, Noushin; Johnson, Charles D.; Datta, Aniruddha |
author_sort | Abante, Jordi |
collection | PubMed |
description | BACKGROUND: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. METHODS: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. RESULTS: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. CONCLUSIONS: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3965-2) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5584555 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5584555 2017-09-06 HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment Abante, Jordi Ghaffari, Noushin Johnson, Charles D. Datta, Aniruddha BMC Genomics Methodology Article BACKGROUND: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. METHODS: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. RESULTS: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. CONCLUSIONS: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3965-2) contains supplementary material, which is available to authorized users. BioMed Central 2017-09-05 /pmc/articles/PMC5584555/ /pubmed/28874136 http://dx.doi.org/10.1186/s12864-017-3965-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Abante, Jordi Ghaffari, Noushin Johnson, Charles D. Datta, Aniruddha HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title | HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_full | HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_fullStr | HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_full_unstemmed | HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_short | HiMMe: using genetic patterns as a proxy for genome assembly reliability assessment |
title_sort | himme: using genetic patterns as a proxy for genome assembly reliability assessment |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5584555/ https://www.ncbi.nlm.nih.gov/pubmed/28874136 http://dx.doi.org/10.1186/s12864-017-3965-2 |
work_keys_str_mv | AT abantejordi himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment AT ghaffarinoushin himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment AT johnsoncharlesd himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment AT dattaaniruddha himmeusinggeneticpatternsasaproxyforgenomeassemblyreliabilityassessment |
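
The METHODS summary in the description field mentions a Markov chain over genetic patterns combined with emission probabilities that account for noise, trained on prior knowledge and used to score contigs. The sketch below illustrates that general idea with a small hidden Markov model and the standard scaled forward algorithm. It is not HiMMe's actual implementation: the function names (`train_transitions`, `log_likelihood`, `per_base_score`), the uniform substitution-error model, the `error_rate` value, and the per-base normalization are all assumptions made for illustration.

```python
import math
from itertools import product

ALPHABET = "ACGT"

def train_transitions(training_seq, k=2, pseudocount=1.0):
    """Estimate P(next base | current k-mer) from a training sequence.

    Hypothetical helper: hidden states are k-mers, and a transition
    shifts the k-mer one base to the right.
    """
    states = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = {s: {b: pseudocount for b in ALPHABET} for s in states}
    for i in range(len(training_seq) - k):
        kmer, nxt = training_seq[i:i + k], training_seq[i + k]
        if kmer in counts and nxt in counts[kmer]:
            counts[kmer][nxt] += 1.0
    trans = {}
    for s in states:
        total = sum(counts[s].values())
        trans[s] = {b: counts[s][b] / total for b in ALPHABET}
    return states, trans

def log_likelihood(contig, states, trans, error_rate=0.01):
    """Scaled forward algorithm: log P(observed contig) under the HMM.

    Assumed noise model: the observed base equals the last base of the
    hidden k-mer with probability 1 - error_rate, otherwise it is one of
    the three other bases with equal probability.
    """
    if not contig:
        raise ValueError("contig must be non-empty")

    def emit(state, base):
        return 1.0 - error_rate if state[-1] == base else error_rate / 3.0

    # Uniform prior over hidden k-mers at the first position.
    alpha = {s: emit(s, contig[0]) / len(states) for s in states}
    loglik = 0.0
    for t in range(len(contig)):
        if t > 0:
            # Propagate probability mass along k-mer shift transitions.
            predicted = {s: 0.0 for s in states}
            for s, mass in alpha.items():
                for b in ALPHABET:
                    predicted[s[1:] + b] += mass * trans[s][b]
            alpha = {s: p * emit(s, contig[t]) for s, p in predicted.items()}
        # Rescale to avoid underflow; the scale factors carry the likelihood.
        scale = sum(alpha.values())
        loglik += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return loglik

def per_base_score(contig, states, trans, error_rate=0.01):
    """Length-normalized log-likelihood, so contigs of different lengths compare."""
    return log_likelihood(contig, states, trans, error_rate) / len(contig)

if __name__ == "__main__":
    states, trans = train_transitions("ACGT" * 200, k=2)
    for contig in ("ACGTACGTACGT", "AAAACCCCGGGG"):
        print(contig, per_base_score(contig, states, trans))
```

In this toy setup a contig that follows the patterns seen during training receives a higher per-base score than one that does not. Each forward step costs time proportional to the number of k-mer states times the alphabet size, so the total cost grows linearly with contig length for a fixed k.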