Marker genes as predictors of shared genomic function

Bibliographic Details
Main authors: Sevigny, Joseph L.; Rothenheber, Derek; Diaz, Krystalle Sharlyn; Zhang, Ying; Agustsson, Kristin; Bergeron, R. Daniel; Thomas, W. Kelley
Format: Online Article Text
Language: English
Published: BioMed Central, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6449922/
https://www.ncbi.nlm.nih.gov/pubmed/30947688
http://dx.doi.org/10.1186/s12864-019-5641-1
Collection: PubMed
Abstract:
BACKGROUND: Although high-throughput marker gene studies provide valuable insight into the diversity and relative abundance of taxa in microbial communities, they do not provide direct measures of their functional capacity. Recently, scientists have shown a general desire to predict functional profiles of microbial communities based on phylogenetic identification inferred from marker genes, and recent tools have been developed to link the two. However, to date, no large-scale examination has quantified the correlation between the marker gene based taxonomic identity and protein coding gene conservation. Here we utilize 4872 representative prokaryotic genomes from NCBI to investigate the relationship between marker gene identity and shared protein coding gene content.
RESULTS: Even at 99–100% marker gene identity, genomes share on average less than 75% of their protein coding gene content. This occurs regardless of the marker gene(s) used: V4 region of the 16S rRNA, complete 16S rRNA, or single copy orthologs through a multi-locus sequence analysis. An important aspect related to this observation is the intra-organism variation of 16S copies from a single genome. Although the majority of 16S copies were found to have high sequence similarity (> 99%), several genomes contained copies that were highly diverged (< 97% identity).
CONCLUSIONS: This is the largest comparison between marker gene similarity and shared protein coding gene content to date. The study highlights the limitations of inferring a microbial community’s functions based on marker gene phylogeny. The data presented expands upon the results of previous studies that examined one or few bacterial species and supports the hypothesis that 16S rRNA and other marker genes cannot be directly used to fully predict the functional potential of a bacterial community.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12864-019-5641-1) contains supplementary material, which is available to authorized users.
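The abstract rests on three measurements: marker gene identity between two genomes, the fraction of protein coding gene content they share, and identity between 16S copies within one genome. It does not say how these were computed, so the Python sketch below is only an illustration under stated assumptions, not the authors' pipeline. It assumes sequences are already aligned, that genes have already been grouped into ortholog clusters (represented by hypothetical IDs such as "OG1"), and that the shared fraction is defined as shared clusters divided by the cluster count of the smaller genome. That definition, the gap-handling rule in the identity function, and all function names are hypothetical choices made for this example.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two sequences already aligned to equal length.

    Assumption: columns gapped in both sequences are skipped, and a gap in
    only one sequence counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column absent from both sequences
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def shared_gene_fraction(genes_a: Set[str], genes_b: Set[str]) -> float:
    """Hypothetical metric: shared ortholog clusters / clusters in the smaller genome."""
    smaller = min(len(genes_a), len(genes_b))
    return len(genes_a & genes_b) / smaller if smaller else 0.0


def mean_shared_in_bin(
    pairs: Iterable[Tuple[float, float]], low: float = 99.0, high: float = 100.0
) -> float:
    """Mean shared fraction over (marker_identity, shared_fraction) pairs
    whose identity lies in [low, high]; NaN if no pair falls in the bin."""
    values = [shared for identity, shared in pairs if low <= identity <= high]
    return sum(values) / len(values) if values else float("nan")


def diverged_copy_pairs(
    copies: Dict[str, str], threshold: float = 97.0
) -> List[Tuple[str, str, float]]:
    """Pairs of aligned 16S copies from one genome with identity below threshold."""
    flagged = []
    for name_a, name_b in combinations(sorted(copies), 2):
        identity = percent_identity(copies[name_a], copies[name_b])
        if identity < threshold:
            flagged.append((name_a, name_b, identity))
    return flagged
```

With toy inputs, genomes carrying clusters {OG1, OG2, OG3, OG4} and {OG1, OG2, OG5} give a shared fraction of 2/3 under this definition. Dividing by the larger genome (2/4) or by the union (2/5) would give lower values, so the choice of denominator matters when comparing results against a figure such as the 75% reported in the abstract.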
Record ID: pubmed-6449922
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Genomics
Article type: Research Article
Publication date: 2019-04-04
Copyright and license: © The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.