CheckV assesses the quality and completeness of metagenome-assembled viral genomes
Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host...
Main authors: | Nayfach, Stephen; Camargo, Antonio Pedro; Schulz, Frederik; Eloe-Fadrosh, Emiley; Roux, Simon; Kyrpides, Nikos C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group US 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116208/ https://www.ncbi.nlm.nih.gov/pubmed/33349699 http://dx.doi.org/10.1038/s41587-020-00774-7 |
_version_ | 1783691344504422400 |
---|---|
author | Nayfach, Stephen Camargo, Antonio Pedro Schulz, Frederik Eloe-Fadrosh, Emiley Roux, Simon Kyrpides, Nikos C. |
author_facet | Nayfach, Stephen Camargo, Antonio Pedro Schulz, Frederik Eloe-Fadrosh, Emiley Roux, Simon Kyrpides, Nikos C. |
author_sort | Nayfach, Stephen |
collection | PubMed |
description | Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions. |
format | Online Article Text |
id | pubmed-8116208 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group US |
record_format | MEDLINE/PubMed |
spelling | pubmed-81162082021-05-26 CheckV assesses the quality and completeness of metagenome-assembled viral genomes Nayfach, Stephen Camargo, Antonio Pedro Schulz, Frederik Eloe-Fadrosh, Emiley Roux, Simon Kyrpides, Nikos C. Nat Biotechnol Article Millions of new viral sequences have been identified from metagenomes, but the quality and completeness of these sequences vary considerably. Here we present CheckV, an automated pipeline for identifying closed viral genomes, estimating the completeness of genome fragments and removing flanking host regions from integrated proviruses. CheckV estimates completeness by comparing sequences with a large database of complete viral genomes, including 76,262 identified from a systematic search of publicly available metagenomes, metatranscriptomes and metaviromes. After validation on mock datasets and comparison to existing methods, we applied CheckV to large and diverse collections of metagenome-assembled viral sequences, including IMG/VR and the Global Ocean Virome. This revealed 44,652 high-quality viral genomes (that is, >90% complete), although the vast majority of sequences were small fragments, which highlights the challenge of assembling viral genomes from short-read metagenomes. Additionally, we found that removal of host contamination substantially improved the accurate identification of auxiliary metabolic genes and interpretation of viral-encoded functions. Nature Publishing Group US 2020-12-21 2021 /pmc/articles/PMC8116208/ /pubmed/33349699 http://dx.doi.org/10.1038/s41587-020-00774-7 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. 
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Nayfach, Stephen Camargo, Antonio Pedro Schulz, Frederik Eloe-Fadrosh, Emiley Roux, Simon Kyrpides, Nikos C. CheckV assesses the quality and completeness of metagenome-assembled viral genomes |
title | CheckV assesses the quality and completeness of metagenome-assembled viral genomes |
title_full | CheckV assesses the quality and completeness of metagenome-assembled viral genomes |
title_fullStr | CheckV assesses the quality and completeness of metagenome-assembled viral genomes |
title_full_unstemmed | CheckV assesses the quality and completeness of metagenome-assembled viral genomes |
title_short | CheckV assesses the quality and completeness of metagenome-assembled viral genomes |
title_sort | checkv assesses the quality and completeness of metagenome-assembled viral genomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116208/ https://www.ncbi.nlm.nih.gov/pubmed/33349699 http://dx.doi.org/10.1038/s41587-020-00774-7 |
work_keys_str_mv | AT nayfachstephen checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes AT camargoantoniopedro checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes AT schulzfrederik checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes AT eloefadroshemiley checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes AT rouxsimon checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes AT kyrpidesnikosc checkvassessesthequalityandcompletenessofmetagenomeassembledviralgenomes |