Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data
We present a novel approach for rapidly identifying sequences that leverages the representational power of Deep Learning techniques and is applied to the analysis of microbiome data. The method involves the creation of a latent sequence space, training a convolutional neural network to rapidly ident...
Main Authors: | Borgman, Jacob; Stark, Karen; Carson, Jeremy; Hauser, Loren |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Bioinformatics |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580936/ https://www.ncbi.nlm.nih.gov/pubmed/36304316 http://dx.doi.org/10.3389/fbinf.2022.871256 |
_version_ | 1784812504659001344 |
---|---|
author | Borgman, Jacob Stark, Karen Carson, Jeremy Hauser, Loren |
author_facet | Borgman, Jacob Stark, Karen Carson, Jeremy Hauser, Loren |
author_sort | Borgman, Jacob |
collection | PubMed |
description | We present a novel approach for rapidly identifying sequences that leverages the representational power of Deep Learning techniques and is applied to the analysis of microbiome data. The method involves creating a latent sequence space, training a convolutional neural network to rapidly identify sequences by mapping them into that space, and leveraging the encoded latent space for denoising to correct sequencing errors. Using mock bacterial communities of known composition, we show that this approach achieves single nucleotide resolution, generating results for sequence identification and abundance estimation that match the best available microbiome algorithms in terms of accuracy while vastly increasing the speed of accurate processing. We further show the ability of this approach to support phenotypic prediction at the sample level on an experimental data set for which the ground truth for sequence identities and abundances is unknown, but the expected phenotypes of the samples are definitive. Moreover, this approach offers a potential solution for the analysis of data from other types of experiments that currently rely on computationally intensive sequence identification. |
format | Online Article Text |
id | pubmed-9580936 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-95809362022-10-26 Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data Borgman, Jacob Stark, Karen Carson, Jeremy Hauser, Loren Front Bioinform Bioinformatics We present a novel approach for rapidly identifying sequences that leverages the representational power of Deep Learning techniques and is applied to the analysis of microbiome data. The method involves the creation of a latent sequence space, training a convolutional neural network to rapidly identify sequences by mapping them into that space, and we leverage the novel encoded latent space for denoising to correct sequencing errors. Using mock bacterial communities of known composition, we show that this approach achieves single nucleotide resolution, generating results for sequence identification and abundance estimation that match the best available microbiome algorithms in terms of accuracy while vastly increasing the speed of accurate processing. We further show the ability of this approach to support phenotypic prediction at the sample level on an experimental data set for which the ground truth for sequence identities and abundances is unknown, but the expected phenotypes of the samples are definitive. Moreover, this approach offers a potential solution for the analysis of data from other types of experiments that currently rely on computationally intensive sequence identification. Frontiers Media S.A. 2022-06-24 /pmc/articles/PMC9580936/ /pubmed/36304316 http://dx.doi.org/10.3389/fbinf.2022.871256 Text en Copyright © 2022 Borgman, Stark, Carson and Hauser. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Bioinformatics Borgman, Jacob Stark, Karen Carson, Jeremy Hauser, Loren Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data |
title | Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data |
title_full | Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data |
title_fullStr | Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data |
title_full_unstemmed | Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data |
title_short | Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data |
title_sort | deep learning encoding for rapid sequence identification on microbiome data |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580936/ https://www.ncbi.nlm.nih.gov/pubmed/36304316 http://dx.doi.org/10.3389/fbinf.2022.871256 |
work_keys_str_mv | AT borgmanjacob deeplearningencodingforrapidsequenceidentificationonmicrobiomedata AT starkkaren deeplearningencodingforrapidsequenceidentificationonmicrobiomedata AT carsonjeremy deeplearningencodingforrapidsequenceidentificationonmicrobiomedata AT hauserloren deeplearningencodingforrapidsequenceidentificationonmicrobiomedata |