M-pick, a modularity-based method for OTU picking of 16S rRNA sequences

BACKGROUND: Binning 16S rRNA sequences into operational taxonomic units (OTUs) is an initial crucial step in analyzing large sequence datasets generated to determine microbial community compositions in various environments including that of the human gut. Various methods have been developed, but mos...

Bibliographic Details
Main Authors: Wang, Xiaoyu; Yao, Jin; Sun, Yijun; Mai, Volker
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3599145/
https://www.ncbi.nlm.nih.gov/pubmed/23387433
http://dx.doi.org/10.1186/1471-2105-14-43
author Wang, Xiaoyu
Yao, Jin
Sun, Yijun
Mai, Volker
collection PubMed
description BACKGROUND: Binning 16S rRNA sequences into operational taxonomic units (OTUs) is an initial crucial step in analyzing large sequence datasets generated to determine microbial community compositions in various environments including that of the human gut. Various methods have been developed, but most suffer from either inaccuracies or from being unable to handle millions of sequences generated in current studies. Furthermore, existing binning methods usually require a priori decisions regarding binning parameters such as a distance level for defining an OTU. RESULTS: We present a novel modularity-based approach (M-pick) to address the aforementioned problems. The new method utilizes ideas from community detection in graphs, where sequences are viewed as vertices on a weighted graph, each pair of sequences is connected by an imaginary edge, and the similarity of a pair of sequences represents the weight of the edge. M-pick first generates a graph based on pairwise sequence distances and then applies a modularity-based community detection technique on the graph to generate OTUs to capture the community structures in sequence data. To compare the performance of M-pick with that of existing methods, specifically CROP and ESPRIT-Tree, sequence data from different hypervariable regions of 16S rRNA were used and binning results were compared. CONCLUSIONS: A new modularity-based clustering method for OTU picking of 16S rRNA sequences is developed in this study. The algorithm does not require a predetermined cut-off level, and our simulation studies suggest that it is superior to existing methods that require specified distance levels to define OTUs. The source code is available at http://plaza.ufl.edu/xywang/Mpick.htm.
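The graph formulation in the description above can be illustrated with a small sketch. It is not the authors' implementation: the mismatch-fraction distance, the greedy pairwise merging, and all function names are assumptions chosen for brevity, whereas M-pick itself works from pairwise sequence distances and its own modularity-based community detection procedure. The sketch only shows how sequences become vertices, similarities become edge weights, and modularity decides the number of OTUs without a distance cut-off.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Toy distance (assumption): fraction of differing positions.

    Only valid for equal-length strings; real pipelines use alignment-based
    distances instead.
    """
    if len(a) != len(b):
        raise ValueError("toy distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def similarity_graph(seqs):
    """Complete weighted graph as a matrix: w[i][j] = 1 - distance(i, j)."""
    n = len(seqs)
    w = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = 1.0 - mismatch_fraction(seqs[i], seqs[j])
    return w


def modularity(w, labels):
    """Newman modularity Q of a labelling on a weighted undirected graph."""
    degree = [sum(row) for row in w]
    two_m = sum(degree)  # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(len(w)):
        for j in range(len(w)):
            if labels[i] == labels[j]:
                q += w[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def greedy_otus(w):
    """Hypothetical stand-in for community detection.

    Starts with one community per sequence and repeatedly applies the single
    merge that raises modularity the most, stopping when no merge helps.
    """
    labels = list(range(len(w)))
    best_q = modularity(w, labels)
    while True:
        best_merge = None
        for a, b in combinations(sorted(set(labels)), 2):
            trial = [a if lab == b else lab for lab in labels]
            q = modularity(w, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            return labels, best_q
        labels = best_merge


if __name__ == "__main__":
    # Made-up reads: two groups with no shared positions between them.
    reads = ["AAAAAAAA", "AAAAAAAT", "AAAAAATT",
             "CCCCCCCC", "CCCCCCCG", "CCCCCCGG"]
    otu_labels, q = greedy_otus(similarity_graph(reads))
    print(otu_labels, q)
```

On the six made-up reads in the usage block, the first three and the last three end up in two separate OTUs. Recomputing modularity from scratch for every trial merge keeps the sketch short but makes it far too slow for real datasets, so it should be read as an illustration of the objective rather than a scalable method.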
format Online
Article
Text
id pubmed-3599145
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3599145 2013-03-29 M-pick, a modularity-based method for OTU picking of 16S rRNA sequences. Wang, Xiaoyu; Yao, Jin; Sun, Yijun; Mai, Volker. BMC Bioinformatics. Methodology Article.
BioMed Central 2013-02-07 /pmc/articles/PMC3599145/ /pubmed/23387433 http://dx.doi.org/10.1186/1471-2105-14-43 Text en Copyright ©2013 Wang et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title M-pick, a modularity-based method for OTU picking of 16S rRNA sequences
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3599145/
https://www.ncbi.nlm.nih.gov/pubmed/23387433
http://dx.doi.org/10.1186/1471-2105-14-43