
Human Contamination in Public Genome Assemblies

Contamination in a genome assembly can lead to wrong or confusing results when the genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available geno...


Bibliographic Details
Main authors: Kryukov, Kirill, Imanishi, Tadashi
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5017631/
https://www.ncbi.nlm.nih.gov/pubmed/27611326
http://dx.doi.org/10.1371/journal.pone.0162424
author Kryukov, Kirill
Imanishi, Tadashi
collection PubMed
description Contamination in a genome assembly can lead to wrong or confusing results when the genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that, with high confidence, originate from human DNA contamination. The majority of the contaminating human sequences have been present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes be revised to remove contaminated sequence, and that new assemblies be thoroughly checked for the presence of human DNA before they are submitted to public databases.
format Online
Article
Text
id pubmed-5017631
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5017631 2016-09-27 Human Contamination in Public Genome Assemblies Kryukov, Kirill Imanishi, Tadashi PLoS One Research Article Contamination in a genome assembly can lead to wrong or confusing results when the genome is used as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that, with high confidence, originate from human DNA contamination. The majority of the contaminating human sequences have been present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes be revised to remove contaminated sequence, and that new assemblies be thoroughly checked for the presence of human DNA before they are submitted to public databases. Public Library of Science 2016-09-09 /pmc/articles/PMC5017631/ /pubmed/27611326 http://dx.doi.org/10.1371/journal.pone.0162424 Text en © 2016 Kryukov, Imanishi http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Human Contamination in Public Genome Assemblies
topic Research Article