
ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning

The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. A...


Bibliographic Details
Main Authors: Mineeva, Olga, Danciu, Daniel, Schölkopf, Bernhard, Ley, Ruth E., Rätsch, Gunnar, Youngblut, Nicholas D.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174551/
https://www.ncbi.nlm.nih.gov/pubmed/37126495
http://dx.doi.org/10.1371/journal.pcbi.1011001
_version_ 1785040056961990656
author Mineeva, Olga
Danciu, Daniel
Schölkopf, Bernhard
Ley, Ruth E.
Rätsch, Gunnar
Youngblut, Nicholas D.
author_sort Mineeva, Olga
collection PubMed
description The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization.
format Online
Article
Text
id pubmed-10174551
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-101745512023-05-12 ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning Mineeva, Olga Danciu, Daniel Schölkopf, Bernhard Ley, Ruth E. Rätsch, Gunnar Youngblut, Nicholas D. PLoS Comput Biol Research Article The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization. 
Public Library of Science 2023-05-01 /pmc/articles/PMC10174551/ /pubmed/37126495 http://dx.doi.org/10.1371/journal.pcbi.1011001 Text en © 2023 Mineeva et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Mineeva, Olga
Danciu, Daniel
Schölkopf, Bernhard
Ley, Ruth E.
Rätsch, Gunnar
Youngblut, Nicholas D.
ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
title ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
title_full ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
title_fullStr ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
title_full_unstemmed ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
title_short ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
title_sort resmico: increasing the quality of metagenome-assembled genomes with deep learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174551/
https://www.ncbi.nlm.nih.gov/pubmed/37126495
http://dx.doi.org/10.1371/journal.pcbi.1011001