Cargando…
DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses
BACKGROUND: DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and c...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2016
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855357/ https://www.ncbi.nlm.nih.gov/pubmed/27142414 http://dx.doi.org/10.1186/s13104-016-2064-9 |
_version_ | 1782430354053791744 |
---|---|
author | Zepeda-Mendoza, Marie Lisandra Bohmann, Kristine Carmona Baez, Aldo Gilbert, M. Thomas P. |
author_facet | Zepeda-Mendoza, Marie Lisandra Bohmann, Kristine Carmona Baez, Aldo Gilbert, M. Thomas P. |
author_sort | Zepeda-Mendoza, Marie Lisandra |
collection | PubMed |
description | BACKGROUND: DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5′-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. RESULTS: We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. CONCLUSIONS: DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-2064-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4855357 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-48553572016-05-05 DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses Zepeda-Mendoza, Marie Lisandra Bohmann, Kristine Carmona Baez, Aldo Gilbert, M. Thomas P. BMC Res Notes Technical Note BACKGROUND: DNA metabarcoding is an approach for identifying multiple taxa in an environmental sample using specific genetic loci and taxa-specific primers. When combined with high-throughput sequencing it enables the taxonomic characterization of large numbers of samples in a relatively time- and cost-efficient manner. One recent laboratory development is the addition of 5′-nucleotide tags to both primers producing double-tagged amplicons and the use of multiple PCR replicates to filter erroneous sequences. However, there is currently no available toolkit for the straightforward analysis of datasets produced in this way. RESULTS: We present DAMe, a toolkit for the processing of datasets generated by double-tagged amplicons from multiple PCR replicates derived from an unlimited number of samples. Specifically, DAMe can be used to (i) sort amplicons by tag combination, (ii) evaluate PCR replicates dissimilarity, and (iii) filter sequences derived from sequencing/PCR errors, chimeras, and contamination. This is attained by calculating the following parameters: (i) sequence content similarity between the PCR replicates from each sample, (ii) reproducibility of each unique sequence across the PCR replicates, and (iii) copy number of the unique sequences in each PCR replicate. We showcase the insights that can be obtained using DAMe prior to taxonomic assignment, by applying it to two real datasets that vary in their complexity regarding number of samples, sequencing libraries, PCR replicates, and used tag combinations. Finally, we use a third mock dataset to demonstrate the impact and importance of filtering the sequences with DAMe. CONCLUSIONS: DAMe allows the user-friendly manipulation of amplicons derived from multiple samples with PCR replicates built in a single or multiple sequencing libraries. It allows the user to: (i) collapse amplicons into unique sequences and sort them by tag combination while retaining the sample identifier and copy number information, (ii) identify sequences carrying unused tag combinations, (iii) evaluate the comparability of PCR replicates of the same sample, and (iv) filter tagged amplicons from a number of PCR replicates using parameters of minimum length, copy number, and reproducibility across the PCR replicates. This enables an efficient analysis of complex datasets, and ultimately increases the ease of handling datasets from large-scale studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-2064-9) contains supplementary material, which is available to authorized users. BioMed Central 2016-05-03 /pmc/articles/PMC4855357/ /pubmed/27142414 http://dx.doi.org/10.1186/s13104-016-2064-9 Text en © Zepeda-Mendoza et al. 2016 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Technical Note Zepeda-Mendoza, Marie Lisandra Bohmann, Kristine Carmona Baez, Aldo Gilbert, M. Thomas P. DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses |
title | DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses |
title_full | DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses |
title_fullStr | DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses |
title_full_unstemmed | DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses |
title_short | DAMe: a toolkit for the initial processing of datasets with PCR replicates of double-tagged amplicons for DNA metabarcoding analyses |
title_sort | dame: a toolkit for the initial processing of datasets with pcr replicates of double-tagged amplicons for dna metabarcoding analyses |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4855357/ https://www.ncbi.nlm.nih.gov/pubmed/27142414 http://dx.doi.org/10.1186/s13104-016-2064-9 |
work_keys_str_mv | AT zepedamendozamarielisandra dameatoolkitfortheinitialprocessingofdatasetswithpcrreplicatesofdoubletaggedampliconsfordnametabarcodinganalyses AT bohmannkristine dameatoolkitfortheinitialprocessingofdatasetswithpcrreplicatesofdoubletaggedampliconsfordnametabarcodinganalyses AT carmonabaezaldo dameatoolkitfortheinitialprocessingofdatasetswithpcrreplicatesofdoubletaggedampliconsfordnametabarcodinganalyses AT gilbertmthomasp dameatoolkitfortheinitialprocessingofdatasetswithpcrreplicatesofdoubletaggedampliconsfordnametabarcodinganalyses |