
Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data

BACKGROUND: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants—DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can redu...


Bibliographic Details
Main authors: Davis, Nicole M., Proctor, Diana M., Holmes, Susan P., Relman, David A., Callahan, Benjamin J.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6298009/
https://www.ncbi.nlm.nih.gov/pubmed/30558668
http://dx.doi.org/10.1186/s40168-018-0605-2
author Davis, Nicole M.
Proctor, Diana M.
Holmes, Susan P.
Relman, David A.
Callahan, Benjamin J.
collection PubMed
description BACKGROUND: The accuracy of microbial community surveys based on marker-gene and metagenomic sequencing (MGS) suffers from the presence of contaminants—DNA sequences not truly present in the sample. Contaminants come from various sources, including reagents. Appropriate laboratory practices can reduce contamination, but do not eliminate it. Here we introduce decontam (https://github.com/benjjneb/decontam), an open-source R package that implements a statistical classification procedure that identifies contaminants in MGS data based on two widely reproduced patterns: contaminants appear at higher frequencies in low-concentration samples and are often found in negative controls. RESULTS: Decontam classified amplicon sequence variants (ASVs) in a human oral dataset consistently with prior microscopic observations of the microbial taxa inhabiting that environment and previous reports of contaminant taxa. In metagenomics and marker-gene measurements of a dilution series, decontam substantially reduced technical variation arising from different sequencing protocols. The application of decontam to two recently published datasets corroborated and extended their conclusions that little evidence existed for an indigenous placenta microbiome and that some low-frequency taxa seemingly associated with preterm birth were contaminants. CONCLUSIONS: Decontam improves the quality of metagenomic and marker-gene sequencing by identifying and removing contaminant DNA sequences. Decontam integrates easily with existing MGS workflows and allows researchers to generate more accurate profiles of microbial communities at little to no additional cost. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s40168-018-0605-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6298009
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6298009 2018-12-19 Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data Davis, Nicole M. Proctor, Diana M. Holmes, Susan P. Relman, David A. Callahan, Benjamin J. Microbiome Methodology BioMed Central 2018-12-17 /pmc/articles/PMC6298009/ /pubmed/30558668 http://dx.doi.org/10.1186/s40168-018-0605-2 Text en
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
title Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data
topic Methodology