Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data

Bibliographic Details
Main authors: Palmer, Jonathan M.; Jusino, Michelle A.; Banik, Mark T.; Lindner, Daniel L.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2018
Subjects: Biodiversity
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5978393/
https://www.ncbi.nlm.nih.gov/pubmed/29868296
http://dx.doi.org/10.7717/peerj.4925
author Palmer, Jonathan M.
Jusino, Michelle A.
Banik, Mark T.
Lindner, Daniel L.
collection PubMed
description High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal internal transcribed spacer (ITS) amplicons, we created two ITS spike-in control mock communities composed of cloned DNA in plasmids: a biological mock community, consisting of ITS sequences from fungal taxa, and a synthetic mock community (SynMock), consisting of non-biological ITS-like sequences. Using these spike-in controls we show that: (1) a non-biological synthetic control (e.g., SynMock) is the best solution for parameterizing bioinformatics pipelines, (2) pre-clustering steps for variable length amplicons are critically important, (3) a major source of bias is attributed to the initial polymerase chain reaction (PCR) and thus HTAS read abundances are typically not representative of starting values. We developed AMPtk, a versatile software solution equipped to deal with variable length amplicons and quality filter HTAS data based on spike-in controls. While we describe herein a non-biological SynMock community for ITS sequences, the concept and AMPtk software can be widely applied to any HTAS dataset to improve data quality.
format Online
Article
Text
id pubmed-5978393
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5978393 2018-06-04 PeerJ Biodiversity PeerJ Inc. 2018-05-28 /pmc/articles/PMC5978393/ /pubmed/29868296 http://dx.doi.org/10.7717/peerj.4925 Text en http://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, made available under the Creative Commons Public Domain Dedication (http://creativecommons.org/publicdomain/zero/1.0/). This work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data
topic Biodiversity