Alignment-free comparison of metagenomics sequences via approximate string matching

Bibliographic Details
Main authors: Chen, Jian; Yang, Le; Li, Lu; Goodison, Steve; Sun, Yijun
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9645238/
https://www.ncbi.nlm.nih.gov/pubmed/36388153
http://dx.doi.org/10.1093/bioadv/vbac077
Collection: PubMed
Description: SUMMARY: Quantifying pairwise sequence similarities is a key step in metagenomics studies. Alignment-free methods provide a computationally efficient alternative to alignment-based methods for large-scale sequence analysis. Several neural network-based methods have recently been developed for this purpose. However, existing methods do not perform well on sequences of varying lengths and are sensitive to the presence of insertions and deletions. In this article, we describe the development of a new method, referred to as AsMac, that addresses these issues. We proposed a novel neural network structure for approximate string matching to extract pertinent information from biological sequences, and developed an efficient gradient computation algorithm for training the constructed neural network. We performed a large-scale benchmark study using real-world data that demonstrated the effectiveness and potential utility of the proposed method. AVAILABILITY AND IMPLEMENTATION: The open-source software for the proposed method and trained neural-network models for some commonly used metagenomics marker genes are freely available at www.acsu.buffalo.edu/~yijunsun/lab/AsMac.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
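The abstract's central idea is that approximate string matching tolerates insertions and deletions. As a hedged illustration of that classical idea only (not AsMac's neural network, whose architecture and gradient computation are described in the paper), the following Python sketch computes the smallest edit distance between a short pattern and any substring of a longer sequence, using the standard semi-global dynamic-programming recurrence (free start and free end in the longer sequence). The function name `best_match_distance` is hypothetical.

```python
def best_match_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Classical semi-global dynamic programming (illustrative sketch, not AsMac):
    a match may start and end anywhere in `text`, while substitutions,
    insertions and deletions each cost 1.
    """
    m = len(pattern)
    # col[i] = best distance between pattern[:i] and a substring of text
    # ending at the current position; before reading text, that substring is empty.
    col = list(range(m + 1))
    best = col[m]
    for c in text:
        new = [0] * (m + 1)  # new[0] = 0: a match may start at any position for free
        for i in range(1, m + 1):
            new[i] = min(
                col[i - 1] + (pattern[i - 1] != c),  # match or substitution
                col[i] + 1,                          # extra character in text (insertion)
                new[i - 1] + 1,                      # pattern character missing (deletion)
            )
        col = new
        best = min(best, col[m])  # a match may end at any position
    return best


if __name__ == "__main__":
    print(best_match_distance("ACGT", "TTACGTTT"))   # exact occurrence
    print(best_match_distance("ACGT", "TTACGGTTT"))  # one inserted G
    print(best_match_distance("ACGT", "TTACTTT"))    # one deleted or substituted base
```

For example, `best_match_distance("ACGT", "TTACGGTTT")` returns 1 because the extra G is absorbed as a single insertion, whereas a position-by-position comparison would be thrown off by the resulting shift.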
Record ID: pubmed-9645238
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics Advances (Bioinform Adv), Original Paper, published online 2022-10-21. © The Author(s) 2022. Published by Oxford University Press.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.