An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics
In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the...
Main authors: | Du, Ruofei; Mercante, Donald; Fang, Zhide |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3597637/ https://www.ncbi.nlm.nih.gov/pubmed/23516532 http://dx.doi.org/10.1371/journal.pone.0058669 |
_version_ | 1782262665164357632 |
---|---|
author | Du, Ruofei Mercante, Donald Fang, Zhide |
author_facet | Du, Ruofei Mercante, Donald Fang, Zhide |
author_sort | Du, Ruofei |
collection | PubMed |
description | In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate changes in abundance with environmental perturbation, clinical variation, and so on. However, the short read length of next-generation sequencing technologies poses a barrier to this approach, essentially because the significance of a similarity cannot be discerned when searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem. This paper aims to fill that gap. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores for reads derived elsewhere. We showed that, by choosing an appropriate score cutoff, we can filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by applying BLAST and RPS-BLAST in combination, some artificial families with large read counts can be further identified after score cutoff filtering. In an evaluation on three experimental metagenomic datasets with different coverages, the proposed method was robust to read coverage and consistently outperformed the E-value cutoff methods currently used in the literature. |
format | Online Article Text |
id | pubmed-3597637 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3597637 2013-03-20 An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics Du, Ruofei Mercante, Donald Fang, Zhide PLoS One Research Article
Public Library of Science 2013-03-14 /pmc/articles/PMC3597637/ /pubmed/23516532 http://dx.doi.org/10.1371/journal.pone.0058669 Text en © 2013 Du et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Du, Ruofei Mercante, Donald Fang, Zhide An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics |
title | An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics |
title_sort | artificial functional family filter in homolog searching in next-generation sequencing metagenomics |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3597637/ https://www.ncbi.nlm.nih.gov/pubmed/23516532 http://dx.doi.org/10.1371/journal.pone.0058669 |
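
The score cutoff idea described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes BLAST tabular output (12 tab-separated columns, with the bit score in column 12), and the function name `cog_profile`, the subject-to-COG mapping, and the cutoff value of 50 are hypothetical choices made only for illustration. The RPS-BLAST step mentioned in the abstract is not covered.

```python
from collections import Counter


def cog_profile(blast_lines, subject_to_cog, min_bitscore):
    """Count reads per COG, keeping each read's best hit only if it meets the cutoff.

    blast_lines    -- iterable of lines in BLAST tabular format (assumed 12 columns)
    subject_to_cog -- hypothetical dict mapping subject sequence ids to COG ids
    min_bitscore   -- hypothetical score cutoff; a real value must be chosen from data
    """
    best = {}  # read id -> (bit score, subject id) of the highest-scoring hit
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        bitscore = float(fields[11])
        if read_id not in best or bitscore > best[read_id][0]:
            best[read_id] = (bitscore, subject_id)

    profile = Counter()
    for bitscore, subject_id in best.values():
        cog = subject_to_cog.get(subject_id)
        # Reads whose best hit falls below the cutoff contribute to no family.
        if cog is not None and bitscore >= min_bitscore:
            profile[cog] += 1
    return profile


# Made-up example data (not from the paper).
example_lines = [
    "read1\tprotA\t95.0\t30\t1\t0\t1\t90\t10\t39\t1e-10\t62.0",
    "read1\tprotB\t80.0\t30\t6\t0\t1\t90\t5\t34\t1e-4\t41.0",
    "read2\tprotB\t70.0\t28\t8\t0\t1\t84\t5\t32\t1e-2\t33.5",
    "read3\tprotC\t98.0\t31\t0\t0\t1\t93\t1\t31\t1e-12\t66.0",
]
example_map = {"protA": "COG0001", "protB": "COG0002", "protC": "COG0001"}

if __name__ == "__main__":
    print(dict(cog_profile(example_lines, example_map, min_bitscore=50.0)))
```

In this toy input, the low-scoring hit of `read2` is the only evidence for `COG0002`, so raising the cutoff removes that family from the profile while the counts for `COG0001` are unaffected.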