
MetaGeneHunt for protein domain annotation in short-read metagenomes


Bibliographic Details
Main authors: Berlemont, R., Winans, N., Talamantes, D., Dang, H., Tsai, H-W.
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7205989/
https://www.ncbi.nlm.nih.gov/pubmed/32382098
http://dx.doi.org/10.1038/s41598-020-63775-1
Collection: PubMed
Abstract: The annotation of short-read metagenomes is an essential process for understanding the functional potential of sequenced microbial communities. Annotation techniques based solely on the identification of local matches tend to confound local sequence similarity with overall protein homology, and thus do not mirror the complex multidomain architecture and the shuffling of functional domains in many protein families. Here, we present MetaGeneHunt, which identifies specific protein domains and normalizes hit counts based on domain length. We used MetaGeneHunt to investigate the potential for carbohydrate processing in the mouse gastrointestinal tract. We sampled, sequenced, and analyzed the microbial communities associated with the bolus in the stomach, intestine, cecum, and colon of five captive mice. Focusing on Glycoside Hydrolases (GHs), we found that, across samples, 58.3% of the 4,726,023 short-read sequences matching a GH domain-containing protein were located outside the domain of interest. Next, before comparing the samples, the counts of localized hits matching the domains of interest were normalized to account for the corresponding domain length. Microbial communities in the intestine and cecum displayed characteristic GH profiles matching distinct microbial assemblages. Conversely, the stomach and colon were associated with structurally and functionally more diverse and variable microbial communities. Across samples, despite fluctuations, changes in the functional potential for carbohydrate processing correlated with changes in community composition. Overall, MetaGeneHunt is a new way to quickly and precisely identify discrete protein domains in sequenced metagenomes processed with MG-RAST. In addition, by using the sister program “GeneHunt” to create a custom Reference Annotation Table, MetaGeneHunt provides an unprecedented way to (re)investigate the precise distribution of any protein domain in short-read metagenomes.
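The abstract describes two steps: keeping only the read hits localized within the domain of interest, then normalizing those counts by domain length. The Python snippet below is a minimal sketch of that general idea, not MetaGeneHunt's code. The `Hit` and `Domain` records, the rule that a hit counts only if its matched region lies entirely inside the domain boundaries, and the per-100-residue scaling are all hypothetical assumptions. MetaGeneHunt works on MG-RAST output and GeneHunt-generated Reference Annotation Tables, whose actual formats and criteria may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One short-read match to a reference protein (hypothetical layout)."""
    protein_id: str
    start: int  # first matched residue on the reference protein (1-based)
    end: int    # last matched residue (inclusive)


@dataclass(frozen=True)
class Domain:
    """One row of a hypothetical reference annotation table."""
    family: str  # e.g. a GH family label
    start: int   # domain start on the protein (1-based)
    end: int     # domain end (inclusive)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def localize_and_normalize(hits, annotation, per_residues=100):
    """Keep hits lying entirely inside a domain; weight each by domain length.

    Returns (length-normalized counts per family, number of hits that fall
    outside every annotated domain). The containment rule and the scaling
    factor are assumptions made for illustration.
    """
    normalized = defaultdict(float)
    outside = 0
    for hit in hits:
        placed = False
        for dom in annotation.get(hit.protein_id, []):
            if dom.start <= hit.start and hit.end <= dom.end:
                # A longer domain offers more sequence for reads to match,
                # so each localized hit is weighted by 1 / domain length.
                normalized[dom.family] += per_residues / dom.length
                placed = True
                break
        if not placed:
            outside += 1
    return dict(normalized), outside


# Toy data (invented for illustration only).
annotation = {
    "protA": [Domain("GH5", 40, 339)],   # 300-residue domain
    "protB": [Domain("GH48", 1, 600)],   # 600-residue domain
}
hits = [
    Hit("protA", 50, 82),    # inside GH5
    Hit("protA", 10, 42),    # crosses the domain boundary, counted as outside
    Hit("protB", 100, 132),  # inside GH48
    Hit("protB", 200, 232),  # inside GH48
]
counts, n_outside = localize_and_normalize(hits, annotation)
print(counts, n_outside)
```

In this toy example GH48 receives twice as many raw hits as GH5, but its domain is twice as long, so both families end up with the same length-normalized value (about 0.333 per 100 residues), and one of the four hits is reported as lying outside any domain.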
Record ID: pubmed-7205989
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sci Rep (Article), Nature Publishing Group UK, publication date 2020-05-07
Rights: © The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.