A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns

Algorithms in bioinformatics use textual representations of genetic information, sequences of the characters A, T, G and C represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors as they are designed to work in...

Bibliographic Details
Main authors: Kouchaki, Samaneh, Tapinos, Avraam, Robertson, David L.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6377666/
https://www.ncbi.nlm.nih.gov/pubmed/30770850
http://dx.doi.org/10.1038/s41598-018-38197-9
collection PubMed
description Algorithms in bioinformatics use textual representations of genetic information: sequences of the characters A, T, G and C, represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors, as they are designed to work in the presence of noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing, to extract local ‘texture’ changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors and their ‘texture’ compared efficiently, using machine learning algorithms to perform dimensionality reduction that captures eigengenome information and to perform clustering (here using randomized singular value decomposition and BH-tSNE). The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at https://github.com/skouchaki/MrGBP.
id pubmed-6377666
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6377666 2019-02-20 Sci Rep Article
Nature Publishing Group UK 2019-02-15 Text en © The Author(s) 2019. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
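
The description in this record says that MLBP extracts local 'texture' changes from nucleotide sequences at several resolutions and represents each read or contig as a vector. The record does not give the details, so the following is only a minimal sketch of a generic one-dimensional local binary pattern histogram, not the authors' MrGBP implementation. The nucleotide-to-integer mapping, the window sizes, and the function names lbp_codes and mlbp_features are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical numeric mapping; the record does not state which mapping the authors use.
NUC_TO_INT = {"A": 1, "C": 2, "G": 3, "T": 4}


def lbp_codes(signal, half_window):
    """Generic 1D local binary pattern codes.

    Each position is compared with `half_window` neighbours on each side;
    a neighbour contributes bit 1 if it is >= the centre value.
    """
    p = half_window
    weights = 1 << np.arange(2 * p)  # 1, 2, 4, ... one weight per neighbour
    codes = []
    for i in range(p, len(signal) - p):
        neighbours = np.concatenate((signal[i - p:i], signal[i + 1:i + p + 1]))
        bits = (neighbours >= signal[i]).astype(int)
        codes.append(int(bits @ weights))
    return np.array(codes, dtype=int)


def mlbp_features(seq, half_windows=(1, 2, 3)):
    """Concatenate normalised LBP histograms over several window sizes.

    Assumes `seq` contains only A, C, G and T; other symbols raise KeyError.
    """
    signal = np.array([NUC_TO_INT[c] for c in seq.upper()])
    parts = []
    for p in half_windows:
        codes = lbp_codes(signal, p)
        # 2p binary comparisons give 2**(2p) == 4**p possible codes.
        hist = np.bincount(codes, minlength=4 ** p).astype(float)
        if hist.sum() > 0:
            hist /= hist.sum()
        parts.append(hist)
    return np.concatenate(parts)


if __name__ == "__main__":
    features = mlbp_features("ACGTACGTAC")
    print(features.shape)
```

Under these assumptions every sequence, whatever its length, maps to a fixed-length vector (4 + 16 + 64 = 84 entries for the default window sizes), so vectors can be compared, reduced in dimension, or clustered without pairwise sequence alignment.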