
Abundant Human DNA Contamination Identified in Non-Primate Genome Databases

Bibliographic Details
Main authors: Longo, Mark S., O'Neill, Michael J., O'Neill, Rachel J.
Format: Text
Language: English
Published: Public Library of Science 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040168/
https://www.ncbi.nlm.nih.gov/pubmed/21358816
http://dx.doi.org/10.1371/journal.pone.0016410
collection PubMed
description During routine screens of the NCBI databases using human repetitive elements, we discovered an unlikely level of nucleotide identity across a broad range of phyla. To ascertain whether databases containing DNA sequences, genome assemblies and trace archive reads were contaminated with human sequences, we performed an in-depth search for sequences of human origin in non-human species. Using a primate-specific SINE (short interspersed nuclear element), AluY, we screened 2,749 non-primate public databases from NCBI, Ensembl, JGI, and UCSC and found 492 to be contaminated with human sequence. These represent species ranging from bacteria (B. cereus) to plants (Z. mays) to fish (D. rerio), with examples from most phyla. The identification of such extensive contamination of human sequence across databases and sequence types warrants caution among the sequencing community in future sequencing efforts, such as human re-sequencing. We discuss the issues this may raise and present data that give insight into how this contamination may be occurring.
id pubmed-3040168
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS One, Research Article (Public Library of Science, published 2011-02-16)
rights Longo et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article