Marker genes as predictors of shared genomic function
BACKGROUND: Although high-throughput marker gene studies provide valuable insight into the diversity and relative abundance of taxa in microbial communities, they do not provide direct measures of their functional capacity. Recently, scientists have shown a general desire to predict functional profi...
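Main authors: | Sevigny, Joseph L.; Rothenheber, Derek; Diaz, Krystalle Sharlyn; Zhang, Ying; Agustsson, Kristin; Bergeron, R. Daniel; Thomas, W. Kelley |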
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449922/ https://www.ncbi.nlm.nih.gov/pubmed/30947688 http://dx.doi.org/10.1186/s12864-019-5641-1 |
_version_ | 1783408949620375552 |
---|---|
author | Sevigny, Joseph L.; Rothenheber, Derek; Diaz, Krystalle Sharlyn; Zhang, Ying; Agustsson, Kristin; Bergeron, R. Daniel; Thomas, W. Kelley |
author_sort | Sevigny, Joseph L. |
collection | PubMed |
description | BACKGROUND: Although high-throughput marker gene studies provide valuable insight into the diversity and relative abundance of taxa in microbial communities, they do not provide direct measures of their functional capacity. Recently, scientists have shown a general desire to predict functional profiles of microbial communities based on phylogenetic identification inferred from marker genes, and recent tools have been developed to link the two. However, to date, no large-scale examination has quantified the correlation between marker-gene-based taxonomic identity and protein coding gene conservation. Here we utilize 4872 representative prokaryotic genomes from NCBI to investigate the relationship between marker gene identity and shared protein coding gene content. RESULTS: Even at 99–100% marker gene identity, genomes share on average less than 75% of their protein coding gene content. This occurs regardless of the marker gene(s) used: V4 region of the 16S rRNA, complete 16S rRNA, or single copy orthologs through a multi-locus sequence analysis. An important aspect related to this observation is the intra-organism variation of 16S copies from a single genome. Although the majority of 16S copies were found to have high sequence similarity (>99%), several genomes contained copies that were highly diverged (<97% identity). CONCLUSIONS: This is the largest comparison between marker gene similarity and shared protein coding gene content to date. The study highlights the limitations of inferring a microbial community’s functions based on marker gene phylogeny. The data presented expand upon the results of previous studies that examined one or few bacterial species and support the hypothesis that 16S rRNA and other marker genes cannot be directly used to fully predict the functional potential of a bacterial community. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5641-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6449922 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6449922; 2019-04-15; BMC Genomics; Research Article; BioMed Central; 2019-04-04; /pmc/articles/PMC6449922/ /pubmed/30947688 http://dx.doi.org/10.1186/s12864-019-5641-1; Text; en; © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Marker genes as predictors of shared genomic function |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449922/ https://www.ncbi.nlm.nih.gov/pubmed/30947688 http://dx.doi.org/10.1186/s12864-019-5641-1 |
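
The RESULTS statement in the description field relates two quantities for each pair of genomes: marker gene identity and shared protein coding gene content. The following is a minimal sketch of how such a pairwise comparison could be computed. It is not the authors' pipeline: the function names, the rule of skipping gapped alignment columns, and the choice of denominator (the mean gene count of the two genomes) are assumptions made for illustration, since the record does not specify them.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns in which the two sequences match.

    Assumes both strings come from the same pairwise alignment,
    with '-' marking gaps (an illustrative convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


def percent_shared_genes(genes_a, genes_b):
    """Shared gene families as a percentage of the mean gene count.

    genes_a and genes_b are sets of gene-family identifiers (hypothetical
    IDs here); the mean-size denominator is an assumption.
    """
    if not genes_a or not genes_b:
        raise ValueError("both gene sets must be non-empty")
    shared = len(genes_a & genes_b)
    mean_size = (len(genes_a) + len(genes_b)) / 2
    return 100.0 * shared / mean_size


if __name__ == "__main__":
    # Toy inputs, not data from the study.
    identity = percent_identity("ACGT-ACGTAC", "ACGTTACGTAT")
    shared = percent_shared_genes({"fam1", "fam2", "fam3", "fam4"},
                                  {"fam1", "fam2", "fam3", "fam5"})
    print(f"marker identity: {identity:.1f}%  shared genes: {shared:.1f}%")
```

Computing both values for many genome pairs and grouping the pairs by identity range would give the kind of summary the abstract reports, in which high marker identity does not imply a high shared gene percentage.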