
MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms


Bibliographic Details
Main authors: Qiao, Yuyang; Jia, Ben; Hu, Zhiqiang; Sun, Chen; Xiang, Yijin; Wei, Chaochun
Format: Online article (text)
Language: English
Published: BioMed Central, 2018
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6104016/
https://www.ncbi.nlm.nih.gov/pubmed/30134953
http://dx.doi.org/10.1186/s13062-018-0220-y

Abstract:

BACKGROUND: Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. A large portion of sequenced reads may be classified as unknown, which greatly impairs our understanding of the whole sample.

RESULT: Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs to accelerate classification. A million 100 bp Illumina reads can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it with multiple popular existing methods. We then applied MetaBinG2 to the MetaSUB Inter-City Challenge dataset provided by the CAMDA data analysis contest and compared the community composition structures of environmental samples from different public places across cities.

CONCLUSION: Compared to existing methods, MetaBinG2 is fast and accurate, especially for samples with significant proportions of unknown organisms.

REVIEWERS: This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13062-018-0220-y) contains supplementary material, which is available to authorized users.
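
The abstract states only that MetaBinG2 is "based on sequence composition." As a rough illustration of what composition-based classification can look like, the sketch below trains a fixed-order Markov model on each reference genome and assigns a read to the reference under which it is most likely. This is a generic technique chosen for illustration, not MetaBinG2's published implementation: the function names (`train_markov`, `log_likelihood`, `classify`), the order `k`, the add-one smoothing, and the toy genomes are all hypothetical, and the GPU acceleration is not reproduced.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

# Hypothetical model type: context (k bases) -> probability of each next base.
Model = dict[str, dict[str, float]]


def train_markov(seq: str, k: int) -> Model:
    """Estimate P(next base | previous k bases) with add-one (Laplace) smoothing."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: {b: 0 for b in ALPHABET})
    for i in range(len(seq) - k):
        context, nxt = seq[i:i + k], seq[i + k]
        # Skip transitions that involve ambiguous bases such as 'N'.
        if nxt in ALPHABET and all(c in ALPHABET for c in context):
            counts[context][nxt] += 1
    model: Model = {}
    for context, row in counts.items():
        total = sum(row.values()) + len(ALPHABET)
        model[context] = {b: (row[b] + 1) / total for b in ALPHABET}
    return model


def log_likelihood(read: str, model: Model, k: int) -> float:
    """Sum of log transition probabilities of the read under one reference model."""
    score = 0.0
    for i in range(len(read) - k):
        context, nxt = read[i:i + k], read[i + k]
        if nxt not in ALPHABET:
            continue
        probs = model.get(context)
        # Contexts never seen in training fall back to a uniform distribution.
        score += math.log(probs[nxt] if probs else 1.0 / len(ALPHABET))
    return score


def classify(read: str, models: dict[str, Model], k: int) -> str:
    """Assign the read to the reference whose model gives it the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(read, models[name], k))


if __name__ == "__main__":
    # Toy "reference genomes"; real references would be full genome sequences.
    refs = {"genome_A": "AT" * 20, "genome_B": "GC" * 20}
    models = {name: train_markov(seq, k=2) for name, seq in refs.items()}
    print(classify("ATATATATAT", models, k=2))  # genome_A
    print(classify("GCGCGCGCGC", models, k=2))  # genome_B
```

Because each read is scored independently against each reference model, the read-by-reference scoring loop is data-parallel, which is the kind of workload GPUs handle well. For scale, the reported figure of about 1 min per million 100 bp reads corresponds to roughly 16,700 reads, or about 1.7 Mbp, per second.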

Source: National Center for Biotechnology Information, record pubmed-6104016 (MEDLINE/PubMed)
Journal: Biol Direct (Research), BioMed Central, published 2018-08-22
License: © The Author(s) 2018. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.