Investigation of machine learning algorithms for taxonomic classification of marine metagenomes
Microbial communities play key roles in ocean ecosystems through regulation of biogeochemical processes such as carbon and nutrient cycling, food web dynamics, and gut microbiomes of invertebrates, fish, reptiles, and mammals. Assessments of marine microbial diversity are therefore critical to under...
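Main authors: | Park, Helen; Lim, Shen Jean; Cosme, Jonathan; O'Connell, Kyle; Sandeep, Jilla; Gayanilo, Felimon; Cutter Jr., George R.; Montes, Enrique; Nitikitpaiboon, Chotinan; Fisher, Sam; Moustahfid, Hassan; Thompson, Luke R. |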
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society for Microbiology, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580933/ https://www.ncbi.nlm.nih.gov/pubmed/37695074 http://dx.doi.org/10.1128/spectrum.05237-22 |
_version_ | 1785122042788446208 |
---|---|
author | Park, Helen; Lim, Shen Jean; Cosme, Jonathan; O'Connell, Kyle; Sandeep, Jilla; Gayanilo, Felimon; Cutter Jr., George R.; Montes, Enrique; Nitikitpaiboon, Chotinan; Fisher, Sam; Moustahfid, Hassan; Thompson, Luke R. |
author_facet | Park, Helen; Lim, Shen Jean; Cosme, Jonathan; O'Connell, Kyle; Sandeep, Jilla; Gayanilo, Felimon; Cutter Jr., George R.; Montes, Enrique; Nitikitpaiboon, Chotinan; Fisher, Sam; Moustahfid, Hassan; Thompson, Luke R. |
author_sort | Park, Helen |
collection | PubMed |
description | Microbial communities play key roles in ocean ecosystems through regulation of biogeochemical processes such as carbon and nutrient cycling, food web dynamics, and gut microbiomes of invertebrates, fish, reptiles, and mammals. Assessments of marine microbial diversity are therefore critical to understanding spatiotemporal variations in microbial community structure and function in ocean ecosystems. With recent advances in DNA shotgun sequencing for metagenome samples and computational analysis, it is now possible to access the taxonomic and genomic content of ocean microbial communities to study their structural patterns, diversity, and functional potential. However, existing taxonomic classification tools depend upon manually curated phylogenetic trees, which can create inaccuracies in metagenomes from less well-characterized communities, such as from ocean water. Herein, we explore the utility of deep learning tools—DeepMicrobes and a novel Residual Network architecture—that leverage natural language processing and convolutional neural network architectures to map input sequence data (k-mers) to output labels (taxonomic groups) without reliance on a curated taxonomic tree. We trained both models using metagenomic reads simulated from marine microbial genomes in the MarRef database. The performance of both models (accuracy, precision, and percent microbe predicted) was compared with the standard taxonomic classification tool Kraken2 using 10 complex metagenomic data sets simulated from MarRef. Our results demonstrate that time, compute power, and microbial genomic diversity still pose challenges for machine learning (ML). Moreover, our results suggest that high genome coverage and rectification of class imbalance are prerequisites for a well-trained model, and therefore should be a major consideration in future ML work. 
IMPORTANCE: Taxonomic profiling of microbial communities is essential to model microbial interactions and inform habitat conservation. This work develops approaches in constructing training/testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test accuracy in metagenomic classification and to guide improvements in ML approaches. Our study provides insights on the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future machine learning approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement. |
format | Online Article Text |
id | pubmed-10580933 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Society for Microbiology |
record_format | MEDLINE/PubMed |
spelling | pubmed-105809332023-10-18 Investigation of machine learning algorithms for taxonomic classification of marine metagenomes Park, Helen Lim, Shen Jean Cosme, Jonathan O'Connell, Kyle Sandeep, Jilla Gayanilo, Felimon Cutter Jr., George R. Montes, Enrique Nitikitpaiboon, Chotinan Fisher, Sam Moustahfid, Hassan Thompson, Luke R. Microbiol Spectr Research Article Microbial communities play key roles in ocean ecosystems through regulation of biogeochemical processes such as carbon and nutrient cycling, food web dynamics, and gut microbiomes of invertebrates, fish, reptiles, and mammals. Assessments of marine microbial diversity are therefore critical to understanding spatiotemporal variations in microbial community structure and function in ocean ecosystems. With recent advances in DNA shotgun sequencing for metagenome samples and computational analysis, it is now possible to access the taxonomic and genomic content of ocean microbial communities to study their structural patterns, diversity, and functional potential. However, existing taxonomic classification tools depend upon manually curated phylogenetic trees, which can create inaccuracies in metagenomes from less well-characterized communities, such as from ocean water. Herein, we explore the utility of deep learning tools—DeepMicrobes and a novel Residual Network architecture—that leverage natural language processing and convolutional neural network architectures to map input sequence data (k-mers) to output labels (taxonomic groups) without reliance on a curated taxonomic tree. We trained both models using metagenomic reads simulated from marine microbial genomes in the MarRef database. The performance of both models (accuracy, precision, and percent microbe predicted) was compared with the standard taxonomic classification tool Kraken2 using 10 complex metagenomic data sets simulated from MarRef. 
Our results demonstrate that time, compute power, and microbial genomic diversity still pose challenges for machine learning (ML). Moreover, our results suggest that high genome coverage and rectification of class imbalance are prerequisites for a well-trained model, and therefore should be a major consideration in future ML work. IMPORTANCE: Taxonomic profiling of microbial communities is essential to model microbial interactions and inform habitat conservation. This work develops approaches in constructing training/testing data sets from publicly available marine metagenomes and evaluates the performance of machine learning (ML) approaches in read-based taxonomic classification of marine metagenomes. Predictions from two models are used to test accuracy in metagenomic classification and to guide improvements in ML approaches. Our study provides insights on the methods, results, and challenges of deep learning on marine microbial metagenomic data sets. Future machine learning approaches can be improved by rectifying genome coverage and class imbalance in the training data sets, developing alternative models, and increasing the accessibility of computational resources for model training and refinement. American Society for Microbiology 2023-09-11 /pmc/articles/PMC10580933/ /pubmed/37695074 http://dx.doi.org/10.1128/spectrum.05237-22 Text en https://doi.org/10.1128/AuthorWarrantyLicense.v1 This is a work of the U.S. Government and is not subject to copyright protection in the United States. Foreign copyrights may apply. |
spellingShingle | Research Article; Park, Helen; Lim, Shen Jean; Cosme, Jonathan; O'Connell, Kyle; Sandeep, Jilla; Gayanilo, Felimon; Cutter Jr., George R.; Montes, Enrique; Nitikitpaiboon, Chotinan; Fisher, Sam; Moustahfid, Hassan; Thompson, Luke R.; Investigation of machine learning algorithms for taxonomic classification of marine metagenomes |
title | Investigation of machine learning algorithms for taxonomic classification of marine metagenomes |
title_full | Investigation of machine learning algorithms for taxonomic classification of marine metagenomes |
title_fullStr | Investigation of machine learning algorithms for taxonomic classification of marine metagenomes |
title_full_unstemmed | Investigation of machine learning algorithms for taxonomic classification of marine metagenomes |
title_short | Investigation of machine learning algorithms for taxonomic classification of marine metagenomes |
title_sort | investigation of machine learning algorithms for taxonomic classification of marine metagenomes |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10580933/ https://www.ncbi.nlm.nih.gov/pubmed/37695074 http://dx.doi.org/10.1128/spectrum.05237-22 |