ProDeGe: a computational protocol for fully automated decontamination of genomes

Bibliographic Details
Main Authors: Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott; Rinke, Christian; Lundberg, Derek S; Han, James; Dangl, Jeff L; Ivanova, Natalia; Woyke, Tanja; Kyrpides, Nikos; Pati, Amrita
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681846/
https://www.ncbi.nlm.nih.gov/pubmed/26057843
http://dx.doi.org/10.1038/ismej.2015.100
author Tennessen, Kristin
Andersen, Evan
Clingenpeel, Scott
Rinke, Christian
Lundberg, Derek S
Han, James
Dangl, Jeff L
Ivanova, Natalia
Woyke, Tanja
Kyrpides, Nikos
Pati, Amrita
author_sort Tennessen, Kristin
collection PubMed
description Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). The procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
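The figures in the description can be made concrete with a small arithmetic sketch. It follows the abstract's own usage: sensitivity is the share of target sequence retained, specificity is the share of non-target sequence removed, and runtime scales at roughly 0.30 CPU core hours per megabase. The function names and the example assembly sizes are hypothetical and chosen only for illustration. They are not results from the paper and not part of ProDeGe itself.

```python
# Illustrative arithmetic only; names and example numbers are assumptions.

CPU_CORE_HOURS_PER_MB = 0.30  # approximate rate quoted in the abstract


def retained_fraction(target_bp_kept: int, target_bp_total: int) -> float:
    """Sensitivity in the abstract's sense: share of target sequence retained."""
    if target_bp_total <= 0:
        raise ValueError("target_bp_total must be positive")
    return target_bp_kept / target_bp_total


def removed_fraction(contaminant_bp_removed: int, contaminant_bp_total: int) -> float:
    """Specificity in the abstract's sense: share of non-target sequence removed."""
    if contaminant_bp_total <= 0:
        raise ValueError("contaminant_bp_total must be positive")
    return contaminant_bp_removed / contaminant_bp_total


def estimated_cpu_core_hours(assembly_size_bp: int,
                             rate_per_mb: float = CPU_CORE_HOURS_PER_MB) -> float:
    """Scale the quoted per-megabase rate to an assembly size (1 Mb = 1,000,000 bp)."""
    if assembly_size_bp < 0:
        raise ValueError("assembly_size_bp must be non-negative")
    return assembly_size_bp / 1_000_000 * rate_per_mb


if __name__ == "__main__":
    # Hypothetical 3.0 Mb draft: 2.5 Mb target sequence, 0.5 Mb contaminant.
    sensitivity = retained_fraction(2_100_000, 2_500_000)
    specificity = removed_fraction(420_000, 500_000)
    hours = estimated_cpu_core_hours(3_000_000)
    print(f"sensitivity: {sensitivity:.2f}")
    print(f"specificity: {specificity:.2f}")
    print(f"estimated CPU core hours: {hours:.2f}")
```

Because both percentages are averages across data sets, the values for any single genome may differ considerably from these illustrative numbers.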
format Online
Article
Text
id pubmed-4681846
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-4681846 2016-01-01 ISME J Short Communication Nature Publishing Group 2016-01 2015-06-09 /pmc/articles/PMC4681846/ /pubmed/26057843 http://dx.doi.org/10.1038/ismej.2015.100 Text en Copyright © 2016 International Society for Microbial Ecology http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title ProDeGe: a computational protocol for fully automated decontamination of genomes
topic Short Communication
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4681846/
https://www.ncbi.nlm.nih.gov/pubmed/26057843
http://dx.doi.org/10.1038/ismej.2015.100