ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning
The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. A...
Main authors: | Mineeva, Olga; Danciu, Daniel; Schölkopf, Bernhard; Ley, Ruth E.; Rätsch, Gunnar; Youngblut, Nicholas D. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174551/ https://www.ncbi.nlm.nih.gov/pubmed/37126495 http://dx.doi.org/10.1371/journal.pcbi.1011001 |
_version_ | 1785040056961990656 |
---|---|
author | Mineeva, Olga; Danciu, Daniel; Schölkopf, Bernhard; Ley, Ruth E.; Rätsch, Gunnar; Youngblut, Nicholas D. |
author_facet | Mineeva, Olga; Danciu, Daniel; Schölkopf, Bernhard; Ley, Ruth E.; Rätsch, Gunnar; Youngblut, Nicholas D. |
author_sort | Mineeva, Olga |
collection | PubMed |
description | The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization. |
format | Online Article Text |
id | pubmed-10174551 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10174551 2023-05-12 ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning Mineeva, Olga; Danciu, Daniel; Schölkopf, Bernhard; Ley, Ruth E.; Rätsch, Gunnar; Youngblut, Nicholas D. PLoS Comput Biol Research Article The number of published metagenome assemblies is rapidly growing due to advances in sequencing technologies. However, sequencing errors, variable coverage, repetitive genomic regions, and other factors can produce misassemblies, which are challenging to detect for taxonomically novel genomic data. Assembly errors can affect all downstream analyses of the assemblies. Accuracy for the state of the art in reference-free misassembly prediction does not exceed an AUPRC of 0.57, and it is not clear how well these models generalize to real-world data. Here, we present the Residual neural network for Misassembled Contig identification (ResMiCo), a deep learning approach for reference-free identification of misassembled contigs. To develop ResMiCo, we first generated a training dataset of unprecedented size and complexity that can be used for further benchmarking and developments in the field. Through rigorous validation, we show that ResMiCo is substantially more accurate than the state of the art, and the model is robust to novel taxonomic diversity and varying assembly methods. ResMiCo estimated 7% misassembled contigs per metagenome across multiple real-world datasets. We demonstrate how ResMiCo can be used to optimize metagenome assembly hyperparameters to improve accuracy, instead of optimizing solely for contiguity. The accuracy, robustness, and ease-of-use of ResMiCo make the tool suitable for general quality control of metagenome assemblies and assembly methodology optimization. 
Public Library of Science 2023-05-01 /pmc/articles/PMC10174551/ /pubmed/37126495 http://dx.doi.org/10.1371/journal.pcbi.1011001 Text en © 2023 Mineeva et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Mineeva, Olga; Danciu, Daniel; Schölkopf, Bernhard; Ley, Ruth E.; Rätsch, Gunnar; Youngblut, Nicholas D. ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning |
title | ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning |
title_full | ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning |
title_fullStr | ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning |
title_full_unstemmed | ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning |
title_short | ResMiCo: Increasing the quality of metagenome-assembled genomes with deep learning |
title_sort | resmico: increasing the quality of metagenome-assembled genomes with deep learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10174551/ https://www.ncbi.nlm.nih.gov/pubmed/37126495 http://dx.doi.org/10.1371/journal.pcbi.1011001 |
work_keys_str_mv | AT mineevaolga resmicoincreasingthequalityofmetagenomeassembledgenomeswithdeeplearning AT danciudaniel resmicoincreasingthequalityofmetagenomeassembledgenomeswithdeeplearning AT scholkopfbernhard resmicoincreasingthequalityofmetagenomeassembledgenomeswithdeeplearning AT leyruthe resmicoincreasingthequalityofmetagenomeassembledgenomeswithdeeplearning AT ratschgunnar resmicoincreasingthequalityofmetagenomeassembledgenomeswithdeeplearning AT youngblutnicholasd resmicoincreasingthequalityofmetagenomeassembledgenomeswithdeeplearning |