PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding

Bibliographic Details
Main authors: Banchi, Elisa; Ametrano, Claudio G; Greco, Samuele; Stanković, David; Muggia, Lucia; Pallavicini, Alberto
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997939/
https://www.ncbi.nlm.nih.gov/pubmed/32016319
http://dx.doi.org/10.1093/database/baz155
author Banchi, Elisa
Ametrano, Claudio G
Greco, Samuele
Stanković, David
Muggia, Lucia
Pallavicini, Alberto
collection PubMed
description DNA metabarcoding combines DNA barcoding with high-throughput sequencing to identify different taxa within environmental communities. The ITS has already been proposed and widely used as a universal barcode marker for plants, but a comprehensive, updated and accurate reference dataset of plant ITS sequences has not been available so far. Here, we constructed reference datasets of Viridiplantae ITS1, ITS2 and entire ITS sequences including both Chlorophyta and Streptophyta. The sequences were retrieved from NCBI, and the ITS region was extracted. The sequences underwent an identity check to remove misidentified records and were clustered at 99% identity to reduce redundancy and computational effort. For this step, we developed a script called ‘better clustering for QIIME’ (bc4q) to ensure that the representative sequences are chosen according to the composition of the cluster at a different taxonomic level. The three datasets obtained with the bc4q script are PLANiTS1 (100 224 sequences), PLANiTS2 (96 771 sequences) and PLANiTS (97 550 sequences), and all are pre-formatted for QIIME, as this is the most widely used bioinformatic pipeline for metabarcoding analysis. As curated and updated reference databases, PLANiTS1, PLANiTS2 and PLANiTS are proposed as a reliable, pivotal first step towards a general standardization of plant DNA metabarcoding studies. The bc4q script is presented as a new tool useful for any research involving sequence clustering. Database URL: https://github.com/apallavicini/bc4q; https://github.com/apallavicini/PLANiTS.
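The description above says that cluster representatives are chosen according to the taxonomic composition of each cluster. The following is only an illustrative sketch of that general idea, not the bc4q implementation: it assumes a cluster is given as a list of (sequence ID, taxonomy tuple) pairs, and that the representative should come from the most frequent taxon at a chosen rank. The function name `pick_representative`, the `rank_index` parameter, the tie-breaking rule and the sample records are all hypothetical.

```python
from collections import Counter


def pick_representative(cluster, rank_index=-1):
    """Return the ID of a representative sequence for one cluster.

    cluster    -- list of (seq_id, taxonomy) pairs, where taxonomy is a
                  tuple of rank names, e.g. (family, genus, species).
    rank_index -- which rank to compare (default: the last, most specific).

    Assumption for illustration only: the representative is the first
    member (in input order) whose taxon at that rank is the most common
    one in the cluster. This is not taken from the bc4q source code.
    """
    if not cluster:
        raise ValueError("cluster must contain at least one sequence")

    # Count how often each taxon occurs at the chosen rank.
    counts = Counter(taxonomy[rank_index] for _, taxonomy in cluster)
    top_count = max(counts.values())

    # Ties are resolved by input order, which keeps the result deterministic.
    for seq_id, taxonomy in cluster:
        if counts[taxonomy[rank_index]] == top_count:
            return seq_id


# Hypothetical cluster: three sequences grouped at 99% identity.
example_cluster = [
    ("seq1", ("Rosaceae", "Rosa", "Rosa canina")),
    ("seq2", ("Rosaceae", "Rosa", "Rosa gallica")),
    ("seq3", ("Rosaceae", "Rosa", "Rosa gallica")),
]
species_level_choice = pick_representative(example_cluster)
genus_level_choice = pick_representative(example_cluster, rank_index=1)
```

With the sample data, comparing at species level selects a member of the majority species, whereas comparing at genus level cannot distinguish the members and falls back to input order. A production tool would also need to parse real clustering output and taxonomy files, which this sketch leaves out.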
format Online
Article
Text
id pubmed-6997939
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6997939 2020-02-10 PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding. Database (Oxford). Original Article. Oxford University Press, 2020-02-04. /pmc/articles/PMC6997939/ /pubmed/32016319 http://dx.doi.org/10.1093/database/baz155 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6997939/
https://www.ncbi.nlm.nih.gov/pubmed/32016319
http://dx.doi.org/10.1093/database/baz155