
A Statistical Approach to Correcting Cross-Annotations in a Metagenomic Functional Profile Generated by Short Reads


Bibliographic Details
Main authors:	Du, Ruofei; Mercante, Donald; An, Lingling; Fang, Zhide
Format:	Online Article Text
Language:	English
Published:	2014
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5922784/ https://www.ncbi.nlm.nih.gov/pubmed/29710879 http://dx.doi.org/10.4172/2155-6180.1000208

author	Du, Ruofei; Mercante, Donald; An, Lingling; Fang, Zhide
collection	PubMed
description	BACKGROUND: A widely adopted practice for functional profiling of a metagenomic sample is to group protein-coding sequences into families whose encoded proteins perform the same biochemical function, and then to tabulate the relative abundances of those families. When metagenomic sequencing reads are searched by homology against a protein database, the relative abundance of a family can be represented by the number of reads aligned to its members. However, some short reads generated by next-generation sequencing platforms are erroneously assigned to functional families with which they are not associated. This common phenomenon is termed cross-annotation. Current methods for functional profiling of a metagenomic sample either use empirical cutoff values to select alignments and ignore the cross-annotation problem, or employ a summarized equation to make a simple adjustment. RESULTS: By introducing latent variables, we use Probabilistic Latent Semantic Analysis to model the proportions of reads assigned to functional families in a metagenomic sample. The approach can be applied to a metagenomic sample once the list of true functional families has been obtained or estimated. It was applied to metagenomic samples functionally characterized with the Clusters of Orthologous Groups of proteins database, and it successfully addressed the cross-annotation issue on in vitro-simulated metagenomic samples, bioinformatics-tool-simulated metagenomic samples, and a real-world data set. CONCLUSIONS: Correcting cross-annotation will increase the accuracy of the functional profile of a metagenome generated from short reads. It will further benefit differential abundance analysis of metagenomic samples under different conditions.
format	Online Article Text
id	pubmed-5922784
institution	National Center for Biotechnology Information
language	English
publishDate	2014
record_format	MEDLINE/PubMed
spelling	pubmed-5922784 2018-04-27. J Biom Biostat. Article. 2014-11-10. 2014. /pmc/articles/PMC5922784/ /pubmed/29710879 http://dx.doi.org/10.4172/2155-6180.1000208 Text en. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	A Statistical Approach to Correcting Cross-Annotations in a Metagenomic Functional Profile Generated by Short Reads
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5922784/ https://www.ncbi.nlm.nih.gov/pubmed/29710879 http://dx.doi.org/10.4172/2155-6180.1000208
