A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data

BACKGROUND: The DNA metabarcoding approach has become one of the most used techniques to study the taxa composition of various sample types. To deal with the high amount of data generated by the high-throughput sequencing process, a bioinformatics workflow is required and the QIIME2 platform has eme...

Bibliographic Details
Main authors: Dubois, Benjamin, Debode, Frédéric, Hautier, Louis, Hulin, Julie, Martin, Gilles San, Delvaux, Alain, Janssen, Eric, Mingeot, Dominique
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264521/
https://www.ncbi.nlm.nih.gov/pubmed/35804326
http://dx.doi.org/10.1186/s12863-022-01067-5
author Dubois, Benjamin
Debode, Frédéric
Hautier, Louis
Hulin, Julie
Martin, Gilles San
Delvaux, Alain
Janssen, Eric
Mingeot, Dominique
collection PubMed
description BACKGROUND: The DNA metabarcoding approach has become one of the most used techniques to study the taxa composition of various sample types. To deal with the high amount of data generated by the high-throughput sequencing process, a bioinformatics workflow is required and the QIIME2 platform has emerged as one of the most reliable and commonly used. However, only some pre-formatted reference databases dedicated to a few barcode sequences are available to assign taxonomy. If users want to develop a new custom reference database, several bottlenecks still need to be addressed and a detailed procedure explaining how to develop and format such a database is currently missing. In consequence, this work is aimed at presenting a detailed workflow explaining from start to finish how to develop such a curated reference database for any barcode sequence. RESULTS: We developed DB4Q2, a detailed workflow that allowed development of plant reference databases dedicated to ITS2 and rbcL, two commonly used barcode sequences in plant metabarcoding studies. This workflow addresses several of the main bottlenecks connected with the development of a curated reference database. The detailed and commented structure of DB4Q2 offers the possibility of developing reference databases even without extensive bioinformatics skills, and avoids ‘black box’ systems that are sometimes encountered. Some filtering steps have been included to discard presumably fungal and misidentified sequences. The flexible character of DB4Q2 allows several key sequence processing steps to be included or not, and downloading issues can be avoided. Benchmarking the databases developed using DB4Q2 revealed that they performed well compared to previously published reference datasets. CONCLUSION: This study presents DB4Q2, a detailed procedure to develop custom reference databases in order to carry out taxonomic analyses with QIIME2, but also with other bioinformatics platforms if desired. 
This work also provides ready-to-use plant ITS2 and rbcL databases for which the prediction accuracy has been assessed and compared to that of other published databases. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12863-022-01067-5.
format Online
Article
Text
id pubmed-9264521
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9264521 2022-07-09 BMC Genom Data Research BioMed Central 2022-07-08 /pmc/articles/PMC9264521/ /pubmed/35804326 http://dx.doi.org/10.1186/s12863-022-01067-5 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A detailed workflow to develop QIIME2-formatted reference databases for taxonomic analysis of DNA metabarcoding data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264521/
https://www.ncbi.nlm.nih.gov/pubmed/35804326
http://dx.doi.org/10.1186/s12863-022-01067-5