PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding
DNA metabarcoding combines DNA barcoding with high-throughput sequencing to identify different taxa within environmental communities. The ITS has already been proposed and widely used as a universal barcode marker for plants, but a comprehensive, updated and accurate reference dataset of plant ITS seq...
Main Authors: | Banchi, Elisa; Ametrano, Claudio G; Greco, Samuele; Stanković, David; Muggia, Lucia; Pallavicini, Alberto |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997939/ https://www.ncbi.nlm.nih.gov/pubmed/32016319 http://dx.doi.org/10.1093/database/baz155 |
_version_ | 1783493783430627328 |
---|---|
author | Banchi, Elisa Ametrano, Claudio G Greco, Samuele Stanković, David Muggia, Lucia Pallavicini, Alberto |
author_facet | Banchi, Elisa Ametrano, Claudio G Greco, Samuele Stanković, David Muggia, Lucia Pallavicini, Alberto |
author_sort | Banchi, Elisa |
collection | PubMed |
description | DNA metabarcoding combines DNA barcoding with high-throughput sequencing to identify different taxa within environmental communities. The ITS has already been proposed and widely used as a universal barcode marker for plants, but a comprehensive, updated and accurate reference dataset of plant ITS sequences has not been available so far. Here, we constructed reference datasets of Viridiplantae ITS1, ITS2 and entire ITS sequences including both Chlorophyta and Streptophyta. The sequences were retrieved from NCBI, and the ITS region was extracted. The sequences underwent an identity check to remove misidentified records and were clustered at 99% identity to reduce redundancy and computational effort. For this step, we developed a script called ‘better clustering for QIIME’ (bc4q) to ensure that the representative sequences are chosen according to the composition of the cluster at a different taxonomic level. The three datasets obtained with the bc4q script are PLANiTS1 (100 224 sequences), PLANiTS2 (96 771 sequences) and PLANiTS (97 550 sequences), and all are pre-formatted for QIIME, since this is the most widely used bioinformatic pipeline for metabarcoding analysis. As curated and updated reference databases, PLANiTS1, PLANiTS2 and PLANiTS are proposed as a reliable, pivotal first step towards a general standardization of plant DNA metabarcoding studies. The bc4q script is presented as a new tool useful for any research involving sequence clustering. Database URL: https://github.com/apallavicini/bc4q; https://github.com/apallavicini/PLANiTS. |
format | Online Article Text |
id | pubmed-6997939 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6997939 2020-02-10 PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding Banchi, Elisa Ametrano, Claudio G Greco, Samuele Stanković, David Muggia, Lucia Pallavicini, Alberto Database (Oxford) Original Article DNA metabarcoding combines DNA barcoding with high-throughput sequencing to identify different taxa within environmental communities. The ITS has already been proposed and widely used as a universal barcode marker for plants, but a comprehensive, updated and accurate reference dataset of plant ITS sequences has not been available so far. Here, we constructed reference datasets of Viridiplantae ITS1, ITS2 and entire ITS sequences including both Chlorophyta and Streptophyta. The sequences were retrieved from NCBI, and the ITS region was extracted. The sequences underwent an identity check to remove misidentified records and were clustered at 99% identity to reduce redundancy and computational effort. For this step, we developed a script called ‘better clustering for QIIME’ (bc4q) to ensure that the representative sequences are chosen according to the composition of the cluster at a different taxonomic level. The three datasets obtained with the bc4q script are PLANiTS1 (100 224 sequences), PLANiTS2 (96 771 sequences) and PLANiTS (97 550 sequences), and all are pre-formatted for QIIME, since this is the most widely used bioinformatic pipeline for metabarcoding analysis. As curated and updated reference databases, PLANiTS1, PLANiTS2 and PLANiTS are proposed as a reliable, pivotal first step towards a general standardization of plant DNA metabarcoding studies. The bc4q script is presented as a new tool useful for any research involving sequence clustering. Database URL: https://github.com/apallavicini/bc4q; https://github.com/apallavicini/PLANiTS. Oxford University Press 2020-02-04 /pmc/articles/PMC6997939/ /pubmed/32016319 http://dx.doi.org/10.1093/database/baz155 Text en © The Author(s) 2020. Published by Oxford University Press. 
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Banchi, Elisa Ametrano, Claudio G Greco, Samuele Stanković, David Muggia, Lucia Pallavicini, Alberto PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding |
title | PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding |
title_full | PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding |
title_fullStr | PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding |
title_full_unstemmed | PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding |
title_short | PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding |
title_sort | planits: a curated sequence reference dataset for plant its dna metabarcoding |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997939/ https://www.ncbi.nlm.nih.gov/pubmed/32016319 http://dx.doi.org/10.1093/database/baz155 |
work_keys_str_mv | AT banchielisa planitsacuratedsequencereferencedatasetforplantitsdnametabarcoding AT ametranoclaudiog planitsacuratedsequencereferencedatasetforplantitsdnametabarcoding AT grecosamuele planitsacuratedsequencereferencedatasetforplantitsdnametabarcoding AT stankovicdavid planitsacuratedsequencereferencedatasetforplantitsdnametabarcoding AT muggialucia planitsacuratedsequencereferencedatasetforplantitsdnametabarcoding AT pallavicinialberto planitsacuratedsequencereferencedatasetforplantitsdnametabarcoding |
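
The description field above says that representative sequences are chosen according to the taxonomic composition of each cluster. The following Python sketch illustrates one possible reading of that idea; it is not the bc4q implementation. The function names, the `depth` parameter, the tie-break by sequence length, and the example records are all assumptions made for illustration.

```python
from collections import Counter


def lineage_prefix(taxonomy: str, depth: int) -> tuple:
    """Return the first `depth` ranks of a semicolon-separated lineage string."""
    ranks = [rank.strip() for rank in taxonomy.split(";")]
    return tuple(ranks[:depth])


def pick_representative(members, depth):
    """Pick a representative for one cluster.

    `members` is a list of (seq_id, taxonomy, sequence) tuples.
    Assumed rule (hypothetical, for illustration only): keep the members
    whose lineage, truncated to `depth` ranks, is the most common one in
    the cluster, then return the longest sequence among them.
    """
    counts = Counter(lineage_prefix(tax, depth) for _, tax, _ in members)
    majority, _ = counts.most_common(1)[0]
    candidates = [m for m in members if lineage_prefix(m[1], depth) == majority]
    return max(candidates, key=lambda m: len(m[2]))


# Hypothetical cluster: two Quercus records and one longer Fagus record.
cluster = [
    ("seqA", "k__Viridiplantae; p__Streptophyta; g__Quercus; s__Quercus robur", "ACGTACGTAC"),
    ("seqB", "k__Viridiplantae; p__Streptophyta; g__Quercus; s__Quercus robur", "ACGTACGTACGG"),
    ("seqC", "k__Viridiplantae; p__Streptophyta; g__Fagus; s__Fagus sylvatica", "ACGTACGTACGGTT"),
]

representative = pick_representative(cluster, depth=3)
print(representative[0])
```

In this toy cluster a purely length-based choice would pick the Fagus record, while the composition-based rule picks a record from the majority genus. Which rank to compare at, and how to break ties, are design choices the record does not specify.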