
Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning


Bibliographic Details
Main authors: Ho, Harrison; Chovatia, Mansi; Egan, Rob; He, Guifen; Yoshinaga, Yuko; Liachko, Ivan; O’Malley, Ronan; Wang, Zhong
Format: Online Article Text
Language: English
Published: PeerJ Inc., 2023
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10519199/
https://www.ncbi.nlm.nih.gov/pubmed/37753177
http://dx.doi.org/10.7717/peerj.16129
collection PubMed
description Metagenome binning is a key step, downstream of metagenome assembly, to group scaffolds by their genome of origin. Although accurate binning has been achieved on datasets containing multiple samples from the same community, the completeness of binning is often low in datasets with a small number of samples due to a lack of robust species co-abundance information. In this study, we exploited the chromatin conformation information obtained from Hi-C sequencing and developed a new reference-independent algorithm, Metagenome Binning with Abundance and Tetra-nucleotide frequencies—Long Range (metaBAT-LR), to improve the binning completeness of these datasets. This self-supervised algorithm builds a model from a set of high-quality genome bins to predict scaffold pairs that are likely to be derived from the same genome. Then, it applies these predictions to merge incomplete genome bins, as well as recruit unbinned scaffolds. We validated metaBAT-LR’s ability to bin-merge and recruit scaffolds on both synthetic and real-world metagenome datasets of varying complexity. Benchmarking against similar software tools suggests that metaBAT-LR uncovers unique bins that were missed by all other methods. MetaBAT-LR is open-source and is available at https://bitbucket.org/project-metabat/metabat-lr.
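The description outlines a self-supervised workflow: high-quality bins supply "free" labels (scaffold pairs in the same bin come from one genome, pairs in different bins do not), a model trained on those labels scores Hi-C-linked scaffold pairs, and the predictions are then used to merge incomplete bins and recruit unbinned scaffolds. The toy sketch below illustrates that idea only under stated assumptions. It is not metaBAT-LR's implementation: the length-normalised Hi-C link count used as the single feature, the one-threshold classifier (a decision stump standing in for whatever model metaBAT-LR actually trains), the `min_pairs` merge rule, and every function name are hypothetical.

```python
# Toy, hypothetical sketch of self-supervised Hi-C bin merging and recruitment.
# The feature, the threshold model and all names are assumptions for illustration,
# not metaBAT-LR's actual implementation.
from itertools import combinations
from math import sqrt


def contact_score(a, b, links, lengths_kb):
    """Hi-C read pairs linking scaffolds a and b, normalised by scaffold lengths (assumed feature)."""
    n = links.get((a, b), 0) + links.get((b, a), 0)
    return n / sqrt(lengths_kb[a] * lengths_kb[b])


def training_pairs(hq_bins, links, lengths_kb):
    """Self-supervised labels: pairs in the same high-quality bin -> 1, in different bins -> 0."""
    members = [(s, b) for b, scafs in hq_bins.items() for s in scafs]
    return [(contact_score(s1, s2, links, lengths_kb), int(b1 == b2))
            for (s1, b1), (s2, b2) in combinations(members, 2)]


def fit_threshold(data):
    """Decision stump: the positive score cut-off with the highest training accuracy."""
    best_t, best_acc = float("inf"), -1.0  # inf means "never predict same genome"
    for t in sorted({score for score, _ in data if score > 0}):
        acc = sum((score >= t) == bool(label) for score, label in data) / len(data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def merge_bins(bins, links, lengths_kb, t, min_pairs=1):
    """Union incomplete bins joined by at least min_pairs predicted same-genome scaffold pairs."""
    parent = {b: b for b in bins}

    def find(x):
        # Follow parent pointers to the representative bin of x's group.
        while parent[x] != x:
            x = parent[x]
        return x

    for b1, b2 in combinations(sorted(bins), 2):
        hits = sum(contact_score(s1, s2, links, lengths_kb) >= t
                   for s1 in bins[b1] for s2 in bins[b2])
        if hits >= min_pairs:
            parent[find(b1)] = find(b2)
    merged = {}
    for b in bins:
        merged.setdefault(find(b), []).extend(bins[b])
    return merged


def recruit(bins, unbinned, links, lengths_kb, t):
    """Add each unbinned scaffold to the bin with the most predicted same-genome links, if any."""
    for s in unbinned:
        votes = {b: sum(contact_score(s, x, links, lengths_kb) >= t for x in scafs)
                 for b, scafs in bins.items()}
        best = max(votes, key=votes.get, default=None)
        if best is not None and votes[best] > 0:
            bins[best].append(s)
    return bins
```

A decision stump keeps the sketch dependency-free and makes the self-supervision step easy to follow; a real tool would plausibly use richer features (for example coverage and tetranucleotide frequencies alongside Hi-C) and a stronger classifier. Raising the hypothetical `min_pairs` parameter makes merging more conservative, trading completeness for a lower risk of chimeric bins.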
id pubmed-10519199
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Published online in PeerJ (Bioinformatics) on 2023-09-22. © 2023 Ho et al.
License: https://creativecommons.org/licenses/by/4.0/. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
topic Bioinformatics