Investigation of machine learning algorithms for taxonomic classification of marine metagenomes

Bibliographic Details
Main authors: Park, Helen; Lim, Shen Jean; Cosme, Jonathan; O'Connell, Kyle; Sandeep, Jilla; Gayanilo, Felimon; Cutter Jr., George R.; Montes, Enrique; Nitikitpaiboon, Chotinan; Fisher, Sam; Moustahfid, Hassan; Thompson, Luke R.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580933/
https://www.ncbi.nlm.nih.gov/pubmed/37695074
http://dx.doi.org/10.1128/spectrum.05237-22
author Park, Helen
Lim, Shen Jean
Cosme, Jonathan
O'Connell, Kyle
Sandeep, Jilla
Gayanilo, Felimon
Cutter Jr., George R.
Montes, Enrique
Nitikitpaiboon, Chotinan
Fisher, Sam
Moustahfid, Hassan
Thompson, Luke R.
author_sort Park, Helen
collection PubMed
description Microbial communities play key roles in ocean ecosystems through regulation of biogeochemical processes such as carbon and nutrient cycling, food web dynamics, and gut microbiomes of invertebrates, fish, reptiles, and mammals. Assessments of marine microbial diversity are therefore critical to understanding spatiotemporal variations in microbial community structure and function in ocean ecosystems. With recent advances in DNA shotgun sequencing for metagenome samples and computational analysis, it is now possible to access the taxonomic and genomic content of ocean microbial communities to study their structural patterns, diversity, and functional potential. However, existing taxonomic classification tools depend upon manually curated phylogenetic trees, which can introduce inaccuracies when classifying metagenomes from less well-characterized communities, such as those in ocean water. Herein, we explore the utility of two deep learning tools, DeepMicrobes and a novel Residual Network architecture, that leverage natural language processing and convolutional neural network architectures to map input sequence data (k-mers) to output labels (taxonomic groups) without reliance on a curated taxonomic tree. We trained both models using metagenomic reads simulated from marine microbial genomes in the MarRef database. The performance of both models (accuracy, precision, and percent microbe predicted) was compared with the standard taxonomic classification tool Kraken2 using 10 complex metagenomic data sets simulated from MarRef. Our results demonstrate that time, compute power, and microbial genomic diversity still pose challenges for machine learning (ML). Moreover, our results suggest that high genome coverage and rectification of class imbalance are prerequisites for a well-trained model, and therefore should be a major consideration in future ML work.
IMPORTANCE: Taxonomic profiling of microbial communities is essential to model microbial interactions and inform habitat conservation. This work develops approaches for constructing training/testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test accuracy in metagenomic classification and to guide improvements in ML approaches. Our study provides insights into the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future machine learning approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement.
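The abstract describes mapping k-mers from sequencing reads to taxonomic labels and names class imbalance as an obstacle to training. The sketch below illustrates those two ideas only; it is not the authors' pipeline, nor the preprocessing used by DeepMicrobes or Kraken2. The function names, the choice of k = 3, the example read, and the genus labels are all hypothetical, and the inverse-frequency weighting is a common generic heuristic that the record does not attribute to the study.

```python
from collections import Counter


def kmers(read: str, k: int) -> list[str]:
    """Return the overlapping k-mers of a read, skipping windows that
    contain anything other than A, C, G, or T (for example, N)."""
    read = read.upper()
    valid = set("ACGT")
    return [
        read[i:i + k]
        for i in range(len(read) - k + 1)
        if set(read[i:i + k]) <= valid
    ]


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count), so that
    rare taxa contribute as much to a training loss as common ones."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical read and labels, for illustration only.
tokens = kmers("ACGTN", k=3)
weights = inverse_frequency_weights(
    ["Vibrio", "Vibrio", "Vibrio", "Pelagibacter"]
)
```

Here `tokens` holds the two k-mers that contain no ambiguous base, and `weights` gives the under-represented label a larger weight than the over-represented one. A weighting of this kind would typically be passed to a classifier's loss function.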
format Online
Article
Text
id pubmed-10580933
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-10580933 2023-10-18
Microbiol Spectr
Research Article
American Society for Microbiology 2023-09-11
Text en
https://doi.org/10.1128/AuthorWarrantyLicense.v1
This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply.
title Investigation of machine learning algorithms for taxonomic classification of marine metagenomes
topic Research Article