Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets

BACKGROUND: Shotgun metagenomics, which is based on untargeted sequencing, can explore the taxonomic profile and function of unknown microorganisms in samples and compensate for the limitations of amplicon sequencing. Binning assembled sequences into individual groups, which represent microbial genomes, is the k...

Bibliographic Details
Main authors: Yue, Yi; Huang, Hao; Qi, Zhao; Dou, Hui-Min; Liu, Xin-Yi; Han, Tian-Fei; Chen, Yue; Song, Xiang-Jun; Zhang, You-Hua; Tu, Jian
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7469296/
https://www.ncbi.nlm.nih.gov/pubmed/32723290
http://dx.doi.org/10.1186/s12859-020-03667-3
author Yue, Yi
Huang, Hao
Qi, Zhao
Dou, Hui-Min
Liu, Xin-Yi
Han, Tian-Fei
Chen, Yue
Song, Xiang-Jun
Zhang, You-Hua
Tu, Jian
collection PubMed
description BACKGROUND: Shotgun metagenomics, which is based on untargeted sequencing, can explore the taxonomic profile and function of unknown microorganisms in samples and compensate for the limitations of amplicon sequencing. Binning assembled sequences into individual groups, which represent microbial genomes, is the key step and a major challenge in metagenomic research. Both supervised and unsupervised machine learning methods have been employed in binning. Genome binning, an unsupervised approach, clusters contigs into individual genome bins with machine learning methods and without the assistance of any reference database. Many genome binning tools have emerged, and evaluating them is of great significance to microbiological research. In this study, we evaluate 15 genome binning tools, comprising 12 original binning tools and 3 refining binning tools, by comparing their performance on chicken gut metagenomic datasets and the first CAMI challenge datasets. RESULTS: For the chicken gut metagenomic datasets, the original genome binners MetaBat, Groopm2 and Autometa performed better than the other original binners, and MetaWrap, which combined their binning results, generated the most high-quality genome bins. For the CAMI datasets, Groopm2 achieved the highest purity (> 0.9) with good completeness (> 0.8) and reconstructed the most high-quality genome bins among the original genome binners. Compared with Groopm2, MetaBat2 had similar performance, with higher completeness and lower purity. The genome refining binner DASTool predicted the most high-quality genome bins among all genome binners. Most genome binners performed well for unique strains. Nonetheless, reconstructing common strains remains a substantial challenge for all genome binners.
CONCLUSIONS: We tested a set of currently available, state-of-the-art metagenomics hybrid binning tools and provided a guide for selecting tools for metagenomic binning by comparing the range of purity, completeness, adjusted Rand index, and the number of high-quality reconstructed bins. Furthermore, information useful for future binning strategies was summarized.
format Online
Article
Text
id pubmed-7469296
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7469296 2020-09-03 Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets Yue, Yi; Huang, Hao; Qi, Zhao; Dou, Hui-Min; Liu, Xin-Yi; Han, Tian-Fei; Chen, Yue; Song, Xiang-Jun; Zhang, You-Hua; Tu, Jian BMC Bioinformatics Research Article BioMed Central 2020-07-28 /pmc/articles/PMC7469296/ /pubmed/32723290 http://dx.doi.org/10.1186/s12859-020-03667-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Evaluating metagenomics tools for genome binning with real metagenomic datasets and CAMI datasets
topic Research Article