Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies

BACKGROUND: In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely ac...

Bibliographic Details
Main authors: Mugnai, Francesco; Costantini, Federica; Chenuil, Anne; Leduc, Michèle; Gutiérrez Ortega, José Miguel; Meglécz, Emese
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2023
Subjects: Biodiversity
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9835706/
https://www.ncbi.nlm.nih.gov/pubmed/36643652
http://dx.doi.org/10.7717/peerj.14616
_version_ 1784868722907807744
author Mugnai, Francesco
Costantini, Federica
Chenuil, Anne
Leduc, Michèle
Gutiérrez Ortega, José Miguel
Meglécz, Emese
author_facet Mugnai, Francesco
Costantini, Federica
Chenuil, Anne
Leduc, Michèle
Gutiérrez Ortega, José Miguel
Meglécz, Emese
author_sort Mugnai, Francesco
collection PubMed
description BACKGROUND: In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely achieved in many geographic regions and for several taxonomic groups. On the other hand, more is not necessarily better, as sequences in reference databases belonging to taxonomic groups out of the studied region/environment context might lead to false assignments. METHODS: We investigated the effect of using several subsets of a cytochrome c oxidase subunit I (COI) reference database on taxonomic assignment. Published metabarcoding sequences from the Mediterranean Sea were assigned to taxa using COInr, which is a comprehensive, non-redundant and recent database of COI sequences obtained both from BOLD and NCBI, and two of its subsets: (i) all sequences except insects (COInr-WO-Insecta), which represent the overwhelming majority of COInr database, but are irrelevant for marine samples, and (ii) all sequences from taxonomic families present in the Mediterranean Sea (COInr-Med). Four different algorithms for taxonomic assignment were employed in parallel to evaluate differences in their output and data consistency. RESULTS: The reduction of the database to more specific custom subsets increased the number of unassigned sequences. Nevertheless, since most of them were incorrectly assigned by the less specific databases, this is a positive outcome. Moreover, the taxonomic resolution (the lowest taxonomic level to which a sequence is attributed) of several sequences tended to increase when using customized databases. These findings clearly indicated the need for customized databases adapted to each study. 
However, the very high proportion of unassigned sequences points to the need to enrich the local database with new barcodes specifically obtained from the studied region and/or taxonomic group. Including novel local barcodes to the COI database proved to be very profitable: by adding only 116 new barcodes sequenced in our laboratory, thus increasing the reference database by only 0.04%, we were able to improve the resolution for ca. 0.6–1% of the Amplicon Sequence Variants (ASVs).
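The two subsetting strategies described under METHODS (dropping one irrelevant class, and keeping only families known from the study region) can be illustrated with a minimal Python sketch. This is not the authors' pipeline and does not read the COInr file format; the `RefSeq` record, both helper functions, the sequence IDs, and the family whitelist are hypothetical and serve only to show the filtering logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefSeq:
    """One reference sequence with the taxonomy fields used for filtering (hypothetical layout)."""
    seq_id: str
    taxon_class: str
    family: str


def without_class(refs, excluded_class):
    """Keep every reference sequence that does not belong to excluded_class."""
    return [r for r in refs if r.taxon_class != excluded_class]


def within_families(refs, families):
    """Keep only reference sequences whose family is in the given whitelist."""
    allowed = set(families)
    return [r for r in refs if r.family in allowed]


# Toy reference set; IDs and the choice of taxa are illustrative only.
toy_refs = [
    RefSeq("s1", "Insecta", "Apidae"),
    RefSeq("s2", "Anthozoa", "Coralliidae"),
    RefSeq("s3", "Malacostraca", "Portunidae"),
    RefSeq("s4", "Insecta", "Culicidae"),
    RefSeq("s5", "Actinopterygii", "Cichlidae"),
]

# Analogue of a "without insects" subset.
no_insects = without_class(toy_refs, "Insecta")

# Analogue of a regional subset; the whitelist is an assumed example, not the study's list.
regional = within_families(toy_refs, {"Coralliidae", "Portunidae"})

print([r.seq_id for r in no_insects])  # ['s2', 's3', 's5']
print([r.seq_id for r in regional])    # ['s2', 's3']
```

In this toy case the regional subset is stricter than the class exclusion: it also drops `s5`, a non-insect family outside the whitelist, which mirrors why a narrower database can leave more sequences unassigned.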
format Online
Article
Text
id pubmed-9835706
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-98357062023-01-13 Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies Mugnai, Francesco Costantini, Federica Chenuil, Anne Leduc, Michèle Gutiérrez Ortega, José Miguel Meglécz, Emese PeerJ Biodiversity BACKGROUND: In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely achieved in many geographic regions and for several taxonomic groups. On the other hand, more is not necessarily better, as sequences in reference databases belonging to taxonomic groups out of the studied region/environment context might lead to false assignments. METHODS: We investigated the effect of using several subsets of a cytochrome c oxidase subunit I (COI) reference database on taxonomic assignment. Published metabarcoding sequences from the Mediterranean Sea were assigned to taxa using COInr, which is a comprehensive, non-redundant and recent database of COI sequences obtained both from BOLD and NCBI, and two of its subsets: (i) all sequences except insects (COInr-WO-Insecta), which represent the overwhelming majority of COInr database, but are irrelevant for marine samples, and (ii) all sequences from taxonomic families present in the Mediterranean Sea (COInr-Med). Four different algorithms for taxonomic assignment were employed in parallel to evaluate differences in their output and data consistency. RESULTS: The reduction of the database to more specific custom subsets increased the number of unassigned sequences. Nevertheless, since most of them were incorrectly assigned by the less specific databases, this is a positive outcome. 
Moreover, the taxonomic resolution (the lowest taxonomic level to which a sequence is attributed) of several sequences tended to increase when using customized databases. These findings clearly indicated the need for customized databases adapted to each study. However, the very high proportion of unassigned sequences points to the need to enrich the local database with new barcodes specifically obtained from the studied region and/or taxonomic group. Including novel local barcodes to the COI database proved to be very profitable: by adding only 116 new barcodes sequenced in our laboratory, thus increasing the reference database by only 0.04%, we were able to improve the resolution for ca. 0.6–1% of the Amplicon Sequence Variants (ASVs). PeerJ Inc. 2023-01-09 /pmc/articles/PMC9835706/ /pubmed/36643652 http://dx.doi.org/10.7717/peerj.14616 Text en ©2022 Mugnai et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle Biodiversity
Mugnai, Francesco
Costantini, Federica
Chenuil, Anne
Leduc, Michèle
Gutiérrez Ortega, José Miguel
Meglécz, Emese
Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies
title Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies
topic Biodiversity
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9835706/
https://www.ncbi.nlm.nih.gov/pubmed/36643652
http://dx.doi.org/10.7717/peerj.14616