Removing contaminants from databases of draft genomes

Metagenomic sequencing of patient samples is a very promising method for the diagnosis of human infections. Sequencing has the ability to capture all the DNA or RNA from pathogenic organisms in a human sample. However, complete and accurate characterization of the sequence, including identification...

Bibliographic Details
Main authors: Lu, Jennifer; Salzberg, Steven L.
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034898/
https://www.ncbi.nlm.nih.gov/pubmed/29939994
http://dx.doi.org/10.1371/journal.pcbi.1006277
author Lu, Jennifer
Salzberg, Steven L.
author_facet Lu, Jennifer
Salzberg, Steven L.
author_sort Lu, Jennifer
collection PubMed
description Metagenomic sequencing of patient samples is a very promising method for the diagnosis of human infections. Sequencing has the ability to capture all the DNA or RNA from pathogenic organisms in a human sample. However, complete and accurate characterization of the sequence, including identification of any pathogens, depends on the availability and quality of genomes for comparison. Thousands of genomes are now available, and as these numbers grow, the power of metagenomic sequencing for diagnosis should increase. However, recent studies have exposed the presence of contamination in published genomes, which when used for diagnosis increases the risk of falsely identifying the wrong pathogen. To address this problem, we have developed a bioinformatics system for eliminating contamination as well as low-complexity genomic sequences in the draft genomes of eukaryotic pathogens. We applied this software to identify and remove human, bacterial, archaeal, and viral sequences present in a comprehensive database of all sequenced eukaryotic pathogen genomes. We also removed low-complexity genomic sequences, another source of false positives. Using this pipeline, we have produced a database of “clean” eukaryotic pathogen genomes for use with bioinformatics classification and analysis tools. We demonstrate that when attempting to find eukaryotic pathogens in metagenomic samples, the new database provides better sensitivity than one using the original genomes while offering a dramatic reduction in false positives.
format Online
Article
Text
id pubmed-6034898
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6034898 2018-07-19 Removing contaminants from databases of draft genomes Lu, Jennifer Salzberg, Steven L. PLoS Comput Biol Research Article
Public Library of Science 2018-06-25 /pmc/articles/PMC6034898/ /pubmed/29939994 http://dx.doi.org/10.1371/journal.pcbi.1006277 Text en © 2018 Lu, Salzberg http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle Research Article
Lu, Jennifer
Salzberg, Steven L.
Removing contaminants from databases of draft genomes
title Removing contaminants from databases of draft genomes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034898/
https://www.ncbi.nlm.nih.gov/pubmed/29939994
http://dx.doi.org/10.1371/journal.pcbi.1006277
work_keys_str_mv AT lujennifer removingcontaminantsfromdatabasesofdraftgenomes
AT salzbergstevenl removingcontaminantsfromdatabasesofdraftgenomes