Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level.
Main Authors: | Zhao, Zhengqiao; Woloszynek, Stephen; Agbavor, Felix; Mell, Joshua Chang; Sokhansanj, Bahrad A.; Rosen, Gail L.
Format: | Online Article Text
Language: | English
Published: | Public Library of Science, 2021
Subjects: | Research Article
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496832/ https://www.ncbi.nlm.nih.gov/pubmed/34550967 http://dx.doi.org/10.1371/journal.pcbi.1009345
_version_ | 1784579834011189248 |
author | Zhao, Zhengqiao; Woloszynek, Stephen; Agbavor, Felix; Mell, Joshua Chang; Sokhansanj, Bahrad A.; Rosen, Gail L.
author_facet | Zhao, Zhengqiao; Woloszynek, Stephen; Agbavor, Felix; Mell, Joshua Chang; Sokhansanj, Bahrad A.; Rosen, Gail L.
author_sort | Zhao, Zhengqiao |
collection | PubMed |
description | Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool). |
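For orientation, the description above outlines the general shape of the architecture: a convolutional front end over one-hot encoded reads, a recurrent layer, and an attention layer whose per-position weights can be visualized. Below is a minimal sketch of such a model in Python (PyTorch). The layer sizes and read length are illustrative assumptions, and this is not the authors' Read2Pheno implementation; for that, see the sequence_attention and seq2att repositories linked in this record.

    # Minimal sketch of a CNN + biLSTM + attention read classifier.
    # Not the authors' Read2Pheno code; layer sizes and read length are illustrative.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionReadClassifier(nn.Module):
        def __init__(self, n_classes=2, conv_channels=64, lstm_hidden=64):
            super().__init__()
            # Convolution over the 4-channel (A, C, G, T) one-hot read
            self.conv = nn.Conv1d(4, conv_channels, kernel_size=7, padding=3)
            # Bidirectional LSTM over the convolutional feature map
            self.lstm = nn.LSTM(conv_channels, lstm_hidden,
                                batch_first=True, bidirectional=True)
            # One attention score per read position
            self.attn = nn.Linear(2 * lstm_hidden, 1)
            self.out = nn.Linear(2 * lstm_hidden, n_classes)

        def forward(self, x):            # x: (batch, 4, read_length) one-hot reads
            h = F.relu(self.conv(x))     # (batch, channels, read_length)
            h = h.transpose(1, 2)        # (batch, read_length, channels)
            h, _ = self.lstm(h)          # (batch, read_length, 2 * hidden)
            w = torch.softmax(self.attn(h), dim=1)  # per-position attention weights
            z = (w * h).sum(dim=1)       # attention-weighted read embedding
            return self.out(z), w        # class logits + weights for visualization

In this sketch, the pooled vector z plays the role of the "dense, meaningful representation" of a read described above, and the attention weights w can be plotted along the read to highlight nucleotide regions that drive a prediction.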
format | Online Article Text |
id | pubmed-8496832 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-8496832 2021-10-08 Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network Zhao, Zhengqiao Woloszynek, Stephen Agbavor, Felix Mell, Joshua Chang Sokhansanj, Bahrad A. Rosen, Gail L. PLoS Comput Biol Research Article Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool). Public Library of Science 2021-09-22 /pmc/articles/PMC8496832/ /pubmed/34550967 http://dx.doi.org/10.1371/journal.pcbi.1009345 Text en © 2021 Zhao et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle | Research Article Zhao, Zhengqiao Woloszynek, Stephen Agbavor, Felix Mell, Joshua Chang Sokhansanj, Bahrad A. Rosen, Gail L. Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network |
title | Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network |
title_full | Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network |
title_fullStr | Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network |
title_full_unstemmed | Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network |
title_short | Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network |
title_sort | learning, visualizing and exploring 16s rrna structure using an attention-based deep neural network |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8496832/ https://www.ncbi.nlm.nih.gov/pubmed/34550967 http://dx.doi.org/10.1371/journal.pcbi.1009345 |
work_keys_str_mv | AT zhaozhengqiao learningvisualizingandexploring16srrnastructureusinganattentionbaseddeepneuralnetwork AT woloszynekstephen learningvisualizingandexploring16srrnastructureusinganattentionbaseddeepneuralnetwork AT agbavorfelix learningvisualizingandexploring16srrnastructureusinganattentionbaseddeepneuralnetwork AT melljoshuachang learningvisualizingandexploring16srrnastructureusinganattentionbaseddeepneuralnetwork AT sokhansanjbahrada learningvisualizingandexploring16srrnastructureusinganattentionbaseddeepneuralnetwork AT rosengaill learningvisualizingandexploring16srrnastructureusinganattentionbaseddeepneuralnetwork |