DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses

BACKGROUND: DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and c...

Bibliographic Details
Main authors: Zepeda-Mendoza, Marie Lisandra; Bohmann, Kristine; Carmona Baez, Aldo; Gilbert, M. Thomas P.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855357/
https://www.ncbi.nlm.nih.gov/pubmed/27142414
http://dx.doi.org/10.1186/s13104-016-2064-9
_version_ 1782430354053791744
author Zepeda-Mendoza, Marie Lisandra
Bohmann, Kristine
Carmona Baez, Aldo
Gilbert, M. Thomas P.
author_facet Zepeda-Mendoza, Marie Lisandra
Bohmann, Kristine
Carmona Baez, Aldo
Gilbert, M. Thomas P.
author_sort Zepeda-Mendoza, Marie Lisandra
collection PubMed
description BACKGROUND: DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5′-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way.
RESULTS: We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe.
CONCLUSIONS: DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-2064-9) contains supplementary material, which is available to authorized users.
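The filtering criteria named in the abstract (minimum length, copy number, and reproducibility across PCR replicates) can be illustrated with a small sketch. This is not DAMe's code or command-line interface: the function name `filter_sequences`, its parameters, the input layout (one dictionary of unique sequence to copy number per PCR replicate), and the choice to sum copy numbers over the passing replicates are all assumptions made for illustration.

```python
from collections import Counter


def filter_sequences(replicates, min_length=1, min_copies=1, min_replicates=1):
    """Keep unique sequences that are reproducible across PCR replicates.

    Hypothetical helper, not part of DAMe.
    replicates: list of dicts, one per PCR replicate of a single sample,
                each mapping a unique sequence to its copy number.
    A sequence "passes" a replicate if it is at least min_length long and
    has at least min_copies copies there. It is kept if it passes in at
    least min_replicates replicates.
    Returns a dict of kept sequence -> summed copy number over the
    replicates in which it passed (an assumed reporting convention).
    """
    support = Counter()  # number of replicates in which a sequence passed
    totals = Counter()   # summed copy number over those replicates
    for counts in replicates:
        for seq, copies in counts.items():
            if len(seq) >= min_length and copies >= min_copies:
                support[seq] += 1
                totals[seq] += copies
    return {seq: totals[seq] for seq, k in support.items() if k >= min_replicates}


# Toy data: three PCR replicates of one sample (made-up sequences and counts).
replicates = [
    {"ACGTACGT": 10, "ACGTTCGT": 1, "ACG": 50},
    {"ACGTACGT": 7, "TTTTACGT": 3},
    {"ACGTACGT": 12, "TTTTACGT": 4, "ACGTTCGT": 1},
]

kept = filter_sequences(replicates, min_length=5, min_copies=2, min_replicates=2)
strict = filter_sequences(replicates, min_length=5, min_copies=2, min_replicates=3)
print(kept)    # sequences supported by at least 2 of 3 replicates
print(strict)  # sequences supported by all 3 replicates
```

In this toy input the singleton `ACGTTCGT` fails the copy-number threshold in every replicate, and the high-copy but short `ACG` fails the length threshold, so neither survives. Raising `min_replicates` trades sensitivity for stringency, which is the kind of trade-off the abstract says the mock dataset is used to demonstrate.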
format Online
Article
Text
id pubmed-4855357
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4855357 2016-05-05 DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses Zepeda-Mendoza, Marie Lisandra Bohmann, Kristine Carmona Baez, Aldo Gilbert, M. Thomas P. BMC Res Notes Technical Note BioMed Central 2016-05-03 /pmc/articles/PMC4855357/ /pubmed/27142414 http://dx.doi.org/10.1186/s13104-016-2064-9 Text en © Zepeda-Mendoza et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Technical Note
Zepeda-Mendoza, Marie Lisandra
Bohmann, Kristine
Carmona Baez, Aldo
Gilbert, M. Thomas P.
DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses
title DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses
title_sort dame: a toolkit for the initial processing of datasets with pcr replicates of double-tagged amplicons for dna metabarcoding analyses
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855357/
https://www.ncbi.nlm.nih.gov/pubmed/27142414
http://dx.doi.org/10.1186/s13104-016-2064-9