Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data

We present a novel approach for rapidly identifying sequences that leverages the representational power of Deep Learning techniques and is applied to the analysis of microbiome data. The method involves the creation of a latent sequence space, training a convolutional neural network to rapidly ident...

Bibliographic Details
Main authors: Borgman, Jacob; Stark, Karen; Carson, Jeremy; Hauser, Loren
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580936/
https://www.ncbi.nlm.nih.gov/pubmed/36304316
http://dx.doi.org/10.3389/fbinf.2022.871256
author Borgman, Jacob
Stark, Karen
Carson, Jeremy
Hauser, Loren
collection PubMed
description We present a novel approach for rapidly identifying sequences that leverages the representational power of Deep Learning techniques and is applied to the analysis of microbiome data. The method involves creating a latent sequence space, training a convolutional neural network to rapidly identify sequences by mapping them into that space, and leveraging the novel encoded latent space for denoising to correct sequencing errors. Using mock bacterial communities of known composition, we show that this approach achieves single-nucleotide resolution, generating results for sequence identification and abundance estimation that match the best available microbiome algorithms in accuracy while vastly increasing processing speed. We further show that this approach supports phenotypic prediction at the sample level on an experimental data set for which the ground truth for sequence identities and abundances is unknown but the expected phenotypes of the samples are definitive. Moreover, this approach offers a potential solution for the analysis of data from other types of experiments that currently rely on computationally intensive sequence identification.
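The idea of identifying a read by mapping it into a latent space can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the one-hot encoding, the fixed random filters standing in for trained convolutional weights, the global max pooling, the nearest-reference lookup by Euclidean distance, and all function names and sequences are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def one_hot(seq):
    """Encode a DNA string as one 4-element row per base (unknown bases become all zeros)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq.upper()]


def make_filters(num_filters, width, seed=0):
    """Hypothetical stand-in for trained CNN weights: fixed pseudo-random filters."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in ALPHABET] for _ in range(width)]
            for _ in range(num_filters)]


def encode(seq, filters):
    """Map a sequence to a latent vector: 1D convolution, then global max pooling.

    Assumes the sequence is at least as long as the filter width.
    """
    x = one_hot(seq)
    latent = []
    for filt in filters:
        width = len(filt)
        responses = [
            sum(filt[j][c] * x[i + j][c] for j in range(width) for c in range(4))
            for i in range(len(x) - width + 1)
        ]
        latent.append(max(responses))
    return latent


def identify(read, reference_latents, filters):
    """Return the name of the reference whose latent vector is closest to the read's."""
    z = encode(read, filters)
    return min(reference_latents, key=lambda name: math.dist(z, reference_latents[name]))


# Made-up reference sequences, encoded once so that each lookup is a cheap vector comparison.
filters = make_filters(num_filters=8, width=3)
references = {"ref_a": "ACGTACGTTGCA", "ref_b": "TTGACCGTAGGA"}
reference_latents = {name: encode(seq, filters) for name, seq in references.items()}
print(identify("ACGTACGTTGCA", reference_latents, filters))
```

In this sketch the references are encoded once up front, so identifying a read costs one encoder pass plus a distance comparison rather than an alignment against every reference. A trained encoder, unlike the random filters used here, would be expected to place reads with sequencing errors near their true source sequence.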
format Online
Article
Text
id pubmed-9580936
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9580936 2022-10-26 Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data. Borgman, Jacob; Stark, Karen; Carson, Jeremy; Hauser, Loren. Front Bioinform, Bioinformatics. Frontiers Media S.A. 2022-06-24 /pmc/articles/PMC9580936/ /pubmed/36304316 http://dx.doi.org/10.3389/fbinf.2022.871256 Text en Copyright © 2022 Borgman, Stark, Carson and Hauser. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Deep Learning Encoding for Rapid Sequence Identification on Microbiome Data
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580936/
https://www.ncbi.nlm.nih.gov/pubmed/36304316
http://dx.doi.org/10.3389/fbinf.2022.871256