Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning
Main authors: | Ho, Harrison; Chovatia, Mansi; Egan, Rob; He, Guifen; Yoshinaga, Yuko; Liachko, Ivan; O’Malley, Ronan; Wang, Zhong |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2023 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10519199/ https://www.ncbi.nlm.nih.gov/pubmed/37753177 http://dx.doi.org/10.7717/peerj.16129 |
---|---|
author | Ho, Harrison; Chovatia, Mansi; Egan, Rob; He, Guifen; Yoshinaga, Yuko; Liachko, Ivan; O’Malley, Ronan; Wang, Zhong |
collection | PubMed |
description | Metagenome binning is a key step, downstream of metagenome assembly, to group scaffolds by their genome of origin. Although accurate binning has been achieved on datasets containing multiple samples from the same community, the completeness of binning is often low in datasets with a small number of samples due to a lack of robust species co-abundance information. In this study, we exploited the chromatin conformation information obtained from Hi-C sequencing and developed a new reference-independent algorithm, Metagenome Binning with Abundance and Tetra-nucleotide frequencies—Long Range (metaBAT-LR), to improve the binning completeness of these datasets. This self-supervised algorithm builds a model from a set of high-quality genome bins to predict scaffold pairs that are likely to be derived from the same genome. Then, it applies these predictions to merge incomplete genome bins, as well as recruit unbinned scaffolds. We validated metaBAT-LR’s ability to bin-merge and recruit scaffolds on both synthetic and real-world metagenome datasets of varying complexity. Benchmarking against similar software tools suggests that metaBAT-LR uncovers unique bins that were missed by all other methods. MetaBAT-LR is open-source and is available at https://bitbucket.org/project-metabat/metabat-lr. |
format | Online Article Text |
id | pubmed-10519199 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
license | © 2023 Ho et al. Published in PeerJ (Bioinformatics) on 2023-09-22. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10519199/ https://www.ncbi.nlm.nih.gov/pubmed/37753177 http://dx.doi.org/10.7717/peerj.16129 |
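The description outlines a self-supervised loop. Labels come for free from high-quality bins, a model learned from them scores scaffold pairs, and those scores drive bin merging and scaffold recruitment. The Python sketch below illustrates that idea under loudly stated assumptions and is not metaBAT-LR's implementation. The input structures (`hq_bins`, `links`), the use of raw Hi-C link counts as the only feature, the single-cutoff "model" standing in for a real classifier, and the helper names and `min_links` merge rule are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy inputs (NOT metaBAT-LR's file formats):
# hq_bins maps a bin id to the scaffold ids of a high-quality bin;
# links maps an unordered scaffold pair to its Hi-C read-pair count.
hq_bins = {"binA": ["a1", "a2", "a3"], "binB": ["b1", "b2"]}
links = {
    frozenset(("a1", "a2")): 12, frozenset(("a1", "a3")): 9,
    frozenset(("a2", "a3")): 15, frozenset(("b1", "b2")): 20,
    frozenset(("a1", "b1")): 1,  # weak link between scaffolds of different bins
    frozenset(("u1", "a2")): 11, frozenset(("u1", "a3")): 8,
    frozenset(("u2", "b1")): 2,
    frozenset(("c1", "d1")): 10, frozenset(("c2", "d1")): 14,
}


def link_count(x, y, links):
    """Hi-C link count for an unordered scaffold pair (0 if unseen)."""
    return links.get(frozenset((x, y)), 0)


def training_pairs(hq_bins, links):
    """Self-supervised labels from high-quality bins: a pair is positive (1)
    when both scaffolds share a bin, negative (0) when they do not."""
    bin_of = {s: b for b, members in hq_bins.items() for s in members}
    return [
        (link_count(x, y, links), int(bin_of[x] == bin_of[y]))
        for x, y in combinations(sorted(bin_of), 2)
    ]


def fit_threshold(examples):
    """Stand-in 'model': the link-count cutoff with the best training
    accuracy (ties go to the lowest cutoff)."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in examples}):
        acc = sum((x >= t) == bool(y) for x, y in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def same_genome(x, y, links, threshold):
    """Predict whether two scaffolds come from the same genome."""
    return link_count(x, y, links) >= threshold


def recruit(unbinned, bins, links, threshold):
    """Assign each unbinned scaffold to the bin holding the most predicted
    same-genome partners; leave it unbinned if there are none."""
    assignment = {}
    for s in unbinned:
        votes = {b: sum(same_genome(s, m, links, threshold) for m in members)
                 for b, members in bins.items()}
        best = max(votes, key=votes.get)
        if votes[best] > 0:
            assignment[s] = best
    return assignment


def should_merge(bin_x, bin_y, links, threshold, min_links=2):
    """Hypothetical merge rule for two incomplete bins: merge when at least
    `min_links` cross-bin pairs are predicted to be same-genome."""
    hits = sum(same_genome(a, b, links, threshold) for a in bin_x for b in bin_y)
    return hits >= min_links


t = fit_threshold(training_pairs(hq_bins, links))
print(t, recruit(["u1", "u2"], hq_bins, links, t),
      should_merge(["c1", "c2"], ["d1"], links, t))
# prints: 9 {'u1': 'binA'} True
```

A realistic pipeline would likely normalize link counts (for example by scaffold length) and combine several features in a trained classifier. The single cutoff here only keeps the self-supervised labeling step easy to follow.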