ProDeGe: a computational protocol for fully automated decontamination of genomes
Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequenc...
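Main authors: | Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott; Rinke, Christian; Lundberg, Derek S; Han, James; Dangl, Jeff L; Ivanova, Natalia; Woyke, Tanja; Kyrpides, Nikos; Pati, Amrita |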
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681846/ https://www.ncbi.nlm.nih.gov/pubmed/26057843 http://dx.doi.org/10.1038/ismej.2015.100 |
_version_ | 1782405783880728576 |
---|---|
author | Tennessen, Kristin Andersen, Evan Clingenpeel, Scott Rinke, Christian Lundberg, Derek S Han, James Dangl, Jeff L Ivanova, Natalia Woyke, Tanja Kyrpides, Nikos Pati, Amrita |
author_facet | Tennessen, Kristin Andersen, Evan Clingenpeel, Scott Rinke, Christian Lundberg, Derek S Han, James Dangl, Jeff L Ivanova, Natalia Woyke, Tanja Kyrpides, Nikos Pati, Amrita |
author_sort | Tennessen, Kristin |
collection | PubMed |
description | Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). The procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence. |
format | Online Article Text |
id | pubmed-4681846 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Nature Publishing Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-46818462016-01-01 ProDeGe: a computational protocol for fully automated decontamination of genomes Tennessen, Kristin Andersen, Evan Clingenpeel, Scott Rinke, Christian Lundberg, Derek S Han, James Dangl, Jeff L Ivanova, Natalia Woyke, Tanja Kyrpides, Nikos Pati, Amrita ISME J Short Communication Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). The procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence. Nature Publishing Group 2016-01 2015-06-09 /pmc/articles/PMC4681846/ /pubmed/26057843 http://dx.doi.org/10.1038/ismej.2015.100 Text en Copyright © 2016 International Society for Microbial Ecology http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License. 
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Short Communication Tennessen, Kristin Andersen, Evan Clingenpeel, Scott Rinke, Christian Lundberg, Derek S Han, James Dangl, Jeff L Ivanova, Natalia Woyke, Tanja Kyrpides, Nikos Pati, Amrita ProDeGe: a computational protocol for fully automated decontamination of genomes |
title | ProDeGe: a computational protocol for fully automated decontamination of genomes |
title_full | ProDeGe: a computational protocol for fully automated decontamination of genomes |
title_fullStr | ProDeGe: a computational protocol for fully automated decontamination of genomes |
title_full_unstemmed | ProDeGe: a computational protocol for fully automated decontamination of genomes |
title_short | ProDeGe: a computational protocol for fully automated decontamination of genomes |
title_sort | prodege: a computational protocol for fully automated decontamination of genomes |
topic | Short Communication |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681846/ https://www.ncbi.nlm.nih.gov/pubmed/26057843 http://dx.doi.org/10.1038/ismej.2015.100 |
work_keys_str_mv | AT tennessenkristin prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT andersenevan prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT clingenpeelscott prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT rinkechristian prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT lundbergdereks prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT hanjames prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT dangljeffl prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT ivanovanatalia prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT woyketanja prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT kyrpidesnikos prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes AT patiamrita prodegeacomputationalprotocolforfullyautomateddecontaminationofgenomes |
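
The description field above defines specificity as the share of non-target sequence removed and sensitivity as the share of target sequence retained, and quotes a cost of roughly 0.30 CPU core hours per megabase. The sketch below shows one way those quantities could be computed. It is not ProDeGe's code: the `DecontaminationCounts` structure, the function names, and the choice to count in base pairs are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DecontaminationCounts:
    """Base-pair tallies for one run (hypothetical structure, not from ProDeGe)."""
    target_retained_bp: int       # target sequence kept as "clean"
    target_removed_bp: int        # target sequence wrongly discarded
    contaminant_removed_bp: int   # non-target sequence correctly discarded
    contaminant_retained_bp: int  # non-target sequence wrongly kept


def sensitivity(c: DecontaminationCounts) -> float:
    """Fraction of target-organism sequence that is retained."""
    total = c.target_retained_bp + c.target_removed_bp
    return c.target_retained_bp / total if total else float("nan")


def specificity(c: DecontaminationCounts) -> float:
    """Fraction of non-target sequence that is removed."""
    total = c.contaminant_removed_bp + c.contaminant_retained_bp
    return c.contaminant_removed_bp / total if total else float("nan")


def estimated_cpu_core_hours(sequence_bp: int, hours_per_mb: float = 0.30) -> float:
    """Rough cost estimate using the ~0.30 CPU core hours per megabase rate."""
    return sequence_bp / 1_000_000 * hours_per_mb


if __name__ == "__main__":
    # Made-up counts chosen only to exercise the functions.
    counts = DecontaminationCounts(
        target_retained_bp=840_000,
        target_removed_bp=160_000,
        contaminant_removed_bp=84_000,
        contaminant_retained_bp=16_000,
    )
    print(f"sensitivity: {sensitivity(counts):.2f}")
    print(f"specificity: {specificity(counts):.2f}")
    print(f"CPU core hours for 5 Mb: {estimated_cpu_core_hours(5_000_000):.2f}")
```

The counts in the usage section are invented inputs, not results from the paper. Weighting by base pairs rather than by contig count follows the abstract's wording ("84% of sequence"), but the publication itself should be consulted for the exact evaluation procedure.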