MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms
BACKGROUND: Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of the known organisms. A large portion of sequencing sequences may be classified as unknown, which greatly impairs our understanding of the whole sample. RESULT:...
Main authors: | Qiao, Yuyang; Jia, Ben; Hu, Zhiqiang; Sun, Chen; Xiang, Yijin; Wei, Chaochun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6104016/ https://www.ncbi.nlm.nih.gov/pubmed/30134953 http://dx.doi.org/10.1186/s13062-018-0220-y |
_version_ | 1783349406610751488 |
---|---|
author | Qiao, Yuyang Jia, Ben Hu, Zhiqiang Sun, Chen Xiang, Yijin Wei, Chaochun |
collection | PubMed |
description | BACKGROUND: Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on genome sequences of the known organisms. A large portion of sequencing sequences may be classified as unknown, which greatly impairs our understanding of the whole sample. RESULT: Here we present MetaBinG2, a fast method for metagenomic sequence classification, especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition, and uses GPUs to accelerate its speed. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the dataset of MetaSUB Inter-City Challenge provided by CAMDA data analysis contest and compared community composition structures for environmental samples from different public places across cities. CONCLUSION: Compared to existing methods, MetaBinG2 is fast and accurate, especially for those samples with significant proportions of unknown organisms. REVIEWERS: This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13062-018-0220-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6104016 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6104016 2018-08-30 MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms Qiao, Yuyang Jia, Ben Hu, Zhiqiang Sun, Chen Xiang, Yijin Wei, Chaochun Biol Direct Research BioMed Central 2018-08-22 /pmc/articles/PMC6104016/ /pubmed/30134953 http://dx.doi.org/10.1186/s13062-018-0220-y Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | MetaBinG2: a fast and accurate metagenomic sequence classification system for samples with many unknown organisms |
topic | Research |