
Reanalyze unassigned reads in Sanger based metagenomic data using conserved gene adjacency

Bibliographic Details
Main authors: Weng, Francis C, Su, Chien-Hao, Hsu, Ming-Tsung, Wang, Tse-Yi, Tsai, Huai-Kuang, Wang, Daryi
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098102/
https://www.ncbi.nlm.nih.gov/pubmed/21083935
http://dx.doi.org/10.1186/1471-2105-11-565
_version_ 1782203917998751744
author Weng, Francis C
Su, Chien-Hao
Hsu, Ming-Tsung
Wang, Tse-Yi
Tsai, Huai-Kuang
Wang, Daryi
author_sort Weng, Francis C
collection PubMed
description BACKGROUND: Investigation of metagenomes provides greater insight into uncultured microbial communities. The improvement in sequencing technology, which yields a large amount of sequence data, has led to major breakthroughs in the field. However, at present, taxonomic binning tools for metagenomes discard 30-40% of Sanger sequencing data due to the stringency of BLAST cut-offs. In an attempt to provide a comprehensive overview of metagenomic data, we re-analyzed the discarded metagenomes by using less stringent cut-offs. Additionally, we introduced a new criterion, namely, the evolutionary conservation of adjacency between neighboring genes. To evaluate the feasibility of our approach, we re-analyzed discarded contigs and singletons from several environments with different levels of complexity. We also compared the consistency between our taxonomic binning and those reported in the original studies. RESULTS: Among the discarded data, we found that 23.7 ± 3.9% of singletons and 14.1 ± 1.0% of contigs were assigned to taxa. The recovery rates for singletons were higher than those for contigs. The Pearson correlation coefficient revealed a high degree of similarity (0.94 ± 0.03 at the phylum rank and 0.80 ± 0.11 at the family rank) between the proposed taxonomic binning approach and those reported in original studies. In addition, an evaluation using simulated data demonstrated the reliability of the proposed approach. CONCLUSIONS: Our findings suggest that taking account of conserved neighboring gene adjacency improves taxonomic assignment when analyzing metagenomes using Sanger sequencing. In other words, utilizing the conserved gene order as a criterion will reduce the amount of data discarded when analyzing metagenomes.
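The description above reports agreement between two taxonomic binnings as a Pearson correlation coefficient. As a minimal sketch of how such a comparison could be computed, assuming each binning is summarized as a mapping from taxon name to assigned-read count, the Python below normalizes both profiles to proportions over the union of taxa and correlates them. The function names and all counts are hypothetical illustrations, not data or code from the article.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def profile_correlation(counts_a, counts_b):
    """Correlate two taxonomic profiles given as {taxon: read count}.

    Taxa missing from one profile are treated as zero (an assumption of
    this sketch), and counts are converted to proportions first.
    """
    taxa = sorted(set(counts_a) | set(counts_b))
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    props_a = [counts_a.get(t, 0) / total_a for t in taxa]
    props_b = [counts_b.get(t, 0) / total_b for t in taxa]
    return pearson(props_a, props_b)


# Hypothetical phylum-level counts, invented for illustration only.
original_binning = {
    "Proteobacteria": 520,
    "Bacteroidetes": 210,
    "Firmicutes": 150,
    "Actinobacteria": 120,
}
recovered_binning = {
    "Proteobacteria": 130,
    "Bacteroidetes": 45,
    "Firmicutes": 40,
    "Cyanobacteria": 10,
}

r = profile_correlation(original_binning, recovered_binning)
print(f"phylum-level Pearson r = {r:.2f}")
```

Normalizing to proportions keeps the comparison independent of how many reads each analysis assigned, which matters when one profile comes only from recovered, previously discarded reads.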
format Text
id pubmed-3098102
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3098102 2011-07-08 Reanalyze unassigned reads in Sanger based metagenomic data using conserved gene adjacency Weng, Francis C Su, Chien-Hao Hsu, Ming-Tsung Wang, Tse-Yi Tsai, Huai-Kuang Wang, Daryi BMC Bioinformatics Research Article
BioMed Central 2010-11-18 /pmc/articles/PMC3098102/ /pubmed/21083935 http://dx.doi.org/10.1186/1471-2105-11-565 Text en Copyright ©2010 Weng et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Reanalyze unassigned reads in Sanger based metagenomic data using conserved gene adjacency
topic Research Article