Metagenome and Metatranscriptome Analyses Using Protein Family Profiles

Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and d...

Bibliographic Details
Main authors: Zhong, Cuncong, Edlund, Anna, Yang, Youngik, McLean, Jeffrey S., Yooseph, Shibu
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4939949/
https://www.ncbi.nlm.nih.gov/pubmed/27400380
http://dx.doi.org/10.1371/journal.pcbi.1004991
collection PubMed
description Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hampers downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of probabilistic models for protein families that allow for accurate modeling of position-specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates the banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx-estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM-GRASPx was used to reconstruct comprehensive sets of complete or near-complete protein and nucleotide sequences for the query protein families. HMM-GRASPx is freely available online from http://sourceforge.net/projects/hmm-graspx.
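The description names the banded Viterbi algorithm for profile HMM parsing without giving details. The following is a minimal sketch of that general technique only, not the HMM-GRASPx implementation: it assumes a toy profile HMM with match, insert, and delete states, a single shared set of log transition scores, no insert-to-delete or delete-to-insert transitions, no explicit end transition, and a band defined as |i - j| <= band. The function name, parameters, and the toy model are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def banded_viterbi(seq, match_emit, trans, insert_logp, band):
    """Best log-score of a global path through a toy profile HMM.

    seq         -- query string
    match_emit  -- list of dicts; match_emit[j-1][r] = log P(r | match state j)
    trans       -- log transition scores keyed 'MM','MI','MD','IM','II','DM','DD'
    insert_logp -- log emission score shared by all insert states (assumption)
    band        -- cells with |i - j| > band are never filled
    """
    n, m = len(seq), len(match_emit)
    # Row j = model position (0 is the begin state), column i = residues consumed.
    vm = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    vi = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    vd = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    vm[0][0] = 0.0

    for j in range(m + 1):
        # Only visit the cells of row j that lie inside the band.
        for i in range(max(0, j - band), min(n, j + band) + 1):
            if j > 0 and i > 0:  # match state j emits seq[i-1]
                emit = match_emit[j - 1].get(seq[i - 1], NEG_INF)
                vm[j][i] = emit + max(
                    vm[j - 1][i - 1] + trans["MM"],
                    vi[j - 1][i - 1] + trans["IM"],
                    vd[j - 1][i - 1] + trans["DM"],
                )
            if i > 0:  # insert state j emits seq[i-1]
                vi[j][i] = insert_logp + max(
                    vm[j][i - 1] + trans["MI"],
                    vi[j][i - 1] + trans["II"],
                )
            if j > 0:  # delete state j emits nothing
                vd[j][i] = max(
                    vm[j - 1][i] + trans["MD"],
                    vd[j - 1][i] + trans["DD"],
                )
    return max(vm[m][n], vi[m][n], vd[m][n])


# Hypothetical three-column model whose consensus is "ACD".
toy_model = [{"A": math.log(0.9)}, {"C": math.log(0.9)}, {"D": math.log(0.9)}]
toy_trans = {
    "MM": math.log(0.8), "MI": math.log(0.1), "MD": math.log(0.1),
    "IM": math.log(0.5), "II": math.log(0.5),
    "DM": math.log(0.5), "DD": math.log(0.5),
}
toy_insert = math.log(0.05)

print(banded_viterbi("ACD", toy_model, toy_trans, toy_insert, band=1))
print(banded_viterbi("AD", toy_model, toy_trans, toy_insert, band=1))
```

In this sketch the band limits work to roughly (2 * band + 1) cells per model position instead of the full sequence length. The trade-off is that any path leaving the band is invisible: with band=0, the read "AD" cannot be aligned to the three-column model at all and the function returns negative infinity.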
format Online
Article
Text
id pubmed-4939949
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4939949 2016-07-22 PLoS Comput Biol Research Article Public Library of Science 2016-07-11 /pmc/articles/PMC4939949/ /pubmed/27400380 http://dx.doi.org/10.1371/journal.pcbi.1004991 Text en © 2016 Zhong et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article