Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier
Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. The idea that different cities may have geographically disparate microbial signatures can be utilized to predict the geographical location based on city-specific microbio...
Main authors: | Anyaso-Samuel, Samuel; Sachdeva, Archie; Guha, Subharup; Datta, Somnath |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8093763/ https://www.ncbi.nlm.nih.gov/pubmed/33959149 http://dx.doi.org/10.3389/fgene.2021.642282 |
_version_ | 1783687882840473600 |
---|---|
author | Anyaso-Samuel, Samuel Sachdeva, Archie Guha, Subharup Datta, Somnath |
author_facet | Anyaso-Samuel, Samuel Sachdeva, Archie Guha, Subharup Datta, Somnath |
author_sort | Anyaso-Samuel, Samuel |
collection | PubMed |
description | Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. The idea that different cities may have geographically disparate microbial signatures can be utilized to predict the geographical location based on city-specific microbiome samples. We implemented this idea by first using standard bioinformatics procedures to pre-process the raw metagenomics samples provided by the CAMDA organizers. We trained several component classifiers and a robust ensemble classifier with data generated from taxonomy-dependent and taxonomy-free approaches. We also implemented class weighting and an optimal oversampling technique to overcome the class imbalance in the primary data. In each instance, we observed that the component classifiers performed differently, whereas the ensemble classifier consistently yielded optimal performance. Finally, we predicted the source cities of mystery samples provided by the organizers. Our results highlight the unreliability of relying on a single classification algorithm to classify metagenomic samples by source origin. By combining several component classifiers via the ensemble approach, we obtained classification results that were as good as the best-performing component classifier. |
format | Online Article Text |
id | pubmed-8093763 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-80937632021-05-05 Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier Anyaso-Samuel, Samuel Sachdeva, Archie Guha, Subharup Datta, Somnath Front Genet Genetics Microbiome samples harvested from urban environments can be informative in predicting the geographic location of unknown samples. The idea that different cities may have geographically disparate microbial signatures can be utilized to predict the geographical location based on city-specific microbiome samples. We implemented this idea by first using standard bioinformatics procedures to pre-process the raw metagenomics samples provided by the CAMDA organizers. We trained several component classifiers and a robust ensemble classifier with data generated from taxonomy-dependent and taxonomy-free approaches. We also implemented class weighting and an optimal oversampling technique to overcome the class imbalance in the primary data. In each instance, we observed that the component classifiers performed differently, whereas the ensemble classifier consistently yielded optimal performance. Finally, we predicted the source cities of mystery samples provided by the organizers. Our results highlight the unreliability of relying on a single classification algorithm to classify metagenomic samples by source origin. By combining several component classifiers via the ensemble approach, we obtained classification results that were as good as the best-performing component classifier. Frontiers Media S.A. 2021-04-20 /pmc/articles/PMC8093763/ /pubmed/33959149 http://dx.doi.org/10.3389/fgene.2021.642282 Text en Copyright © 2021 Anyaso-Samuel, Sachdeva, Guha and Datta. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). 
The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Anyaso-Samuel, Samuel Sachdeva, Archie Guha, Subharup Datta, Somnath Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier |
title | Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier |
title_full | Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier |
title_fullStr | Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier |
title_full_unstemmed | Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier |
title_short | Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier |
title_sort | metagenomic geolocation prediction using an adaptive ensemble classifier |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8093763/ https://www.ncbi.nlm.nih.gov/pubmed/33959149 http://dx.doi.org/10.3389/fgene.2021.642282 |
work_keys_str_mv | AT anyasosamuelsamuel metagenomicgeolocationpredictionusinganadaptiveensembleclassifier AT sachdevaarchie metagenomicgeolocationpredictionusinganadaptiveensembleclassifier AT guhasubharup metagenomicgeolocationpredictionusinganadaptiveensembleclassifier AT dattasomnath metagenomicgeolocationpredictionusinganadaptiveensembleclassifier |
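
The description above mentions two generic ideas: weighting classes to counter imbalance, and combining several component classifiers into one ensemble prediction. The following is a minimal sketch of both, under stated assumptions. It uses plain majority voting and inverse-frequency weights, which are common stand-ins and not necessarily the adaptive ensemble or the weighting scheme used in the article. The function names, city labels, and per-classifier predictions are all hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: this is one common weighting heuristic, not the
    article's confirmed scheme.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list of labels per classifier, all of equal
    length. Ties go to the label proposed by the earliest classifier.
    """
    combined = []
    for votes in zip(*predictions):
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical, imbalanced training labels (source cities).
train_labels = ["NYC"] * 4 + ["Tokyo"] * 2 + ["Lisbon"] * 2
weights = balanced_class_weights(train_labels)

# Hypothetical predictions from three component classifiers
# for the same three mystery samples.
clf_a = ["NYC", "Tokyo", "Lisbon"]
clf_b = ["NYC", "NYC", "Lisbon"]
clf_c = ["Tokyo", "Tokyo", "NYC"]
ensemble = majority_vote([clf_a, clf_b, clf_c])

print(weights)
print(ensemble)
```

In this toy case the rarer cities receive larger weights than the most frequent one, and the ensemble label for each sample is whichever city most component classifiers agree on.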