
Metagenomic binning through low-density hashing


Bibliographic Details
Main authors: Luo, Yunan; Yu, Yun William; Zeng, Jianyang; Berger, Bonnie; Peng, Jian
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2019
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330020/
https://www.ncbi.nlm.nih.gov/pubmed/30010790
http://dx.doi.org/10.1093/bioinformatics/bty611
collection PubMed
description
MOTIVATION: Vastly greater quantities of microbial genome data are being generated where environmental samples mix together the DNA from many different species. Here, we present Opal for metagenomic binning, the task of identifying the origin species of DNA sequencing reads. We introduce ‘low-density’ locality sensitive hashing to bioinformatics, with the addition of Gallager codes for even coverage, enabling quick and accurate metagenomic binning.
RESULTS: On public benchmarks, Opal halves the error on precision/recall (F1-score) as compared with both alignment-based and alignment-free methods for species classification. We demonstrate even more marked improvement at higher taxonomic levels, allowing for the discovery of novel lineages. Furthermore, the innovation of low-density, even-coverage hashing should itself prove an essential methodological advance as it enables the application of machine learning to other bioinformatic challenges.
AVAILABILITY AND IMPLEMENTATION: Full source code and datasets are available at http://opal.csail.mit.edu and https://github.com/yunwilliamyu/opal.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
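The abstract names two ingredients without detailing them: "low-density" locality sensitive hashing and Gallager codes for even coverage. The Python sketch below is an illustrative assumption, not the code released with Opal. It assumes each hash function reads the bases at only a few positions of a fixed-length k-mer, chooses those positions with Gallager's classic low-density parity-check construction so that every position is covered equally often, and turns a read into sparse feature counts that a linear classifier could consume. The helper names, parameter values and the CRC32 bucketing are hypothetical.

```python
import random
import zlib


def gallager_position_sets(k, row_weight, col_weight, seed=0):
    """Choose the k-mer positions read by each hash function (hypothetical helper).

    Gallager-style LDPC construction: band 0 tiles positions 0..k-1 into
    consecutive groups of `row_weight`; each later band tiles a random
    permutation of the positions the same way. Every hash function therefore
    reads `row_weight` positions (low density), and every position is read by
    exactly `col_weight` hash functions (even coverage).
    """
    if k % row_weight != 0:
        raise ValueError("k must be divisible by row_weight")
    rng = random.Random(seed)
    positions = list(range(k))
    hash_fns = []
    for band in range(col_weight):
        # Band 0 uses the identity ordering; later bands use a random permutation.
        order = positions if band == 0 else rng.sample(positions, k)
        for start in range(0, k, row_weight):
            hash_fns.append(tuple(sorted(order[start:start + row_weight])))
    return hash_fns


def lsh_features(read, k, hash_fns, n_buckets=2**20):
    """Count sparse feature indices for every k-mer of `read` (hypothetical helper).

    Each hash function keeps only the bases at its few positions, so two k-mers
    that differ only at other positions still produce the same feature.
    """
    features = {}
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        for h, pos in enumerate(hash_fns):
            bases = "".join(kmer[p] for p in pos)
            # CRC32 gives a bucket index that is stable across runs (unlike built-in hash()).
            idx = zlib.crc32(f"{h}:{bases}".encode()) % n_buckets
            features[idx] = features.get(idx, 0) + 1
    return features


if __name__ == "__main__":
    fns = gallager_position_sets(k=12, row_weight=3, col_weight=4, seed=1)
    print(len(fns), "hash functions, e.g.", fns[:2])
    print(lsh_features("ACGTACGTACGTAC", 12, fns))
```

With k = 12, row_weight = 3 and col_weight = 4, the construction yields 16 hash functions, each reading 3 positions, and every position is read by exactly 4 of them. Picking positions independently at random would generally leave some positions over- or under-sampled, which is the uneven coverage the Gallager construction is meant to avoid.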
id pubmed-6330020
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Bioinformatics (Original Papers)
Dates: 2019-01-15 (issue); 2018-07-13 (online)
Copyright: © The Author(s) 2018. Published by Oxford University Press.
License: http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers