CheckV assesses the quality and completeness of metagenome-assembled viral genomes

Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host...

Bibliographic Details
Main Authors: Nayfach, Stephen; Camargo, Antonio Pedro; Schulz, Frederik; Eloe-Fadrosh, Emiley; Roux, Simon; Kyrpides, Nikos C.
Format: Online Article Text
Language: English
Published: Nature Publishing Group US 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116208/
https://www.ncbi.nlm.nih.gov/pubmed/33349699
http://dx.doi.org/10.1038/s41587-020-00774-7
_version_ 1783691344504422400
author Nayfach, Stephen
Camargo, Antonio Pedro
Schulz, Frederik
Eloe-Fadrosh, Emiley
Roux, Simon
Kyrpides, Nikos C.
author_facet Nayfach, Stephen
Camargo, Antonio Pedro
Schulz, Frederik
Eloe-Fadrosh, Emiley
Roux, Simon
Kyrpides, Nikos C.
author_sort Nayfach, Stephen
collection PubMed
description Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions.
format Online
Article
Text
id pubmed-8116208
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group US
record_format MEDLINE/PubMed
spelling pubmed-8116208 2021-05-26 CheckV assesses the quality and completeness of metagenome-assembled viral genomes Nayfach, Stephen Camargo, Antonio Pedro Schulz, Frederik Eloe-Fadrosh, Emiley Roux, Simon Kyrpides, Nikos C. Nat Biotechnol Article Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions. Nature Publishing Group US 2020-12-21 2021 /pmc/articles/PMC8116208/ /pubmed/33349699 http://dx.doi.org/10.1038/s41587-020-00774-7 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Nayfach, Stephen
Camargo, Antonio Pedro
Schulz, Frederik
Eloe-Fadrosh, Emiley
Roux, Simon
Kyrpides, Nikos C.
CheckV assesses the quality and completeness of metagenome-assembled viral genomes
title CheckV assesses the quality and completeness of metagenome-assembled viral genomes
title_full CheckV assesses the quality and completeness of metagenome-assembled viral genomes
title_fullStr CheckV assesses the quality and completeness of metagenome-assembled viral genomes
title_full_unstemmed CheckV assesses the quality and completeness of metagenome-assembled viral genomes
title_short CheckV assesses the quality and completeness of metagenome-assembled viral genomes
title_sort checkv assesses the quality and completeness of metagenome-assembled viral genomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116208/
https://www.ncbi.nlm.nih.gov/pubmed/33349699
http://dx.doi.org/10.1038/s41587-020-00774-7
work_keys_str_mv AT nayfachstephen checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes
AT camargoantoniopedro checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes
AT schulzfrederik checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes
AT eloefadroshemiley checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes
AT rouxsimon checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes
AT kyrpidesnikosc checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes