An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics

In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the...

Bibliographic Details
Main Authors: Du, Ruofei, Mercante, Donald, Fang, Zhide
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3597637/
https://www.ncbi.nlm.nih.gov/pubmed/23516532
http://dx.doi.org/10.1371/journal.pone.0058669
_version_ 1782262665164357632
author Du, Ruofei
Mercante, Donald
Fang, Zhide
author_facet Du, Ruofei
Mercante, Donald
Fang, Zhide
author_sort Du, Ruofei
collection PubMed
description In functional metagenomics, BLAST homology search is a common method for classifying metagenomic reads into protein/domain sequence families, such as Clusters of Orthologous Groups of proteins (COGs), in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate changes in abundance with environmental perturbation, clinical variation, and so on. However, the short read lengths produced by next-generation sequencing technologies pose a barrier to this approach, essentially because the significance of a similarity cannot be discerned when searching with short reads. Consequently, artificial functional families are produced, and those assigned a large number of reads decrease the accuracy of the functional profile dramatically. There is no method available to address this problem. We intended to fill this gap in this paper. We revealed that the BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combining BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, the proposed method is robust against read coverage and consistently outperforms the E-value cutoff methods currently used in the literature.
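The score cut-off idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes hits arrive in BLAST's 12-column tabular layout (bit score in the last column), that the similarity score used is the bit score, and that a subject-to-COG lookup table is available. The function name `functional_profile`, the toy hits, the mapping, and the cut-off value of 40.0 are all hypothetical and chosen only for illustration.

```python
import csv
from collections import Counter
from io import StringIO


def functional_profile(blast_tabular, subject_to_cog, min_score):
    """Count reads per COG, keeping a read only if its best hit reaches min_score.

    blast_tabular: iterable of tab-separated lines in the 12-column BLAST
    tabular layout (an assumption; bit score is read from column 12).
    """
    best = {}  # read id -> (best bit score, subject id of that hit)
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, subject_id, score = row[0], row[1], float(row[11])
        if read_id not in best or score > best[read_id][0]:
            best[read_id] = (score, subject_id)

    profile = Counter()
    for score, subject_id in best.values():
        cog = subject_to_cog.get(subject_id)
        if cog is not None and score >= min_score:
            profile[cog] += 1
    return profile


# Toy data, invented for this example only.
EXAMPLE_HITS = "\n".join([
    "read1\tprotA\t95.0\t30\t1\t0\t1\t90\t10\t39\t1e-10\t60.5",
    "read1\tprotB\t70.0\t30\t9\t0\t1\t90\t5\t34\t1e-3\t35.0",
    "read2\tprotB\t65.0\t28\t10\t0\t1\t84\t7\t34\t0.5\t28.1",
    "read3\tprotC\t90.0\t31\t3\t0\t1\t93\t2\t32\t1e-8\t55.2",
])
EXAMPLE_MAP = {"protA": "COG0001", "protB": "COG0002", "protC": "COG0001"}

profile = functional_profile(StringIO(EXAMPLE_HITS), EXAMPLE_MAP, min_score=40.0)
print(dict(profile))  # {'COG0001': 2}
```

In this toy run, COG0002 is supported only by a low-scoring hit, so it drops out of the profile once the cut-off is applied. In practice the cut-off would have to be chosen from the observed score distributions, as the description indicates.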
format Online
Article
Text
id pubmed-3597637
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3597637 2013-03-20 An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics Du, Ruofei Mercante, Donald Fang, Zhide PLoS One Research Article
Public Library of Science 2013-03-14 /pmc/articles/PMC3597637/ /pubmed/23516532 http://dx.doi.org/10.1371/journal.pone.0058669 Text en © 2013 Du et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Du, Ruofei
Mercante, Donald
Fang, Zhide
An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics
title An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics
title_full An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics
title_fullStr An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics
title_full_unstemmed An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics
title_short An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics
title_sort artificial functional family filter in homolog searching in next-generation sequencing metagenomics
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3597637/
https://www.ncbi.nlm.nih.gov/pubmed/23516532
http://dx.doi.org/10.1371/journal.pone.0058669