Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data
Trace quantities of contaminating DNA are widespread in the laboratory environment, but their presence has received little attention in the context of high throughput sequencing. This issue is highlighted by recent works that have rested controversial claims upon sequencing data that appear to suppo...
Main author: | Lusk, Richard W. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4213012/ https://www.ncbi.nlm.nih.gov/pubmed/25354084 http://dx.doi.org/10.1371/journal.pone.0110808 |
_version_ | 1782341784398987264 |
---|---|
author | Lusk, Richard W. |
collection | PubMed |
description | Trace quantities of contaminating DNA are widespread in the laboratory environment, but their presence has received little attention in the context of high throughput sequencing. This issue is highlighted by recent works that have rested controversial claims upon sequencing data that appear to support the presence of unexpected exogenous species. I used reads that preferentially aligned to alternate genomes to infer the distribution of potential contaminant species in a set of independent sequencing experiments. I confirmed that dilute samples are more exposed to contaminating DNA, and, focusing on four single-cell sequencing experiments, found that these contaminants appear to originate from a wide diversity of clades. Although negative control libraries prepared from ‘blank’ samples recovered the highest-frequency contaminants, low-frequency contaminants, which appeared to make heterogeneous contributions to samples prepared in parallel within a single experiment, were not well controlled for. I used these results to show that, despite heavy replication and plausible controls, contamination can explain all of the observations used to support a recent claim that complete genes pass from food to human blood. Contamination must be considered a potential source of signals of exogenous species in sequencing data, even if these signals are replicated in independent experiments, vary across conditions, or indicate a species which seems a priori unlikely to contaminate. Negative control libraries processed in parallel are essential to control for contaminant DNAs, but their limited ability to recover low-frequency contaminants must be recognized. |
format | Online Article Text |
id | pubmed-4213012 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4213012 2014-11-05 Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data Lusk, Richard W. PLoS One Research Article Public Library of Science 2014-10-29 /pmc/articles/PMC4213012/ /pubmed/25354084 http://dx.doi.org/10.1371/journal.pone.0110808 Text en © 2014 Richard W http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Diverse and Widespread Contamination Evident in the Unmapped Depths of High Throughput Sequencing Data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4213012/ https://www.ncbi.nlm.nih.gov/pubmed/25354084 http://dx.doi.org/10.1371/journal.pone.0110808 |