Cargando…

DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies

Taxa are frequently labeled incertae sedis when their placement is debated at ranks above the species level, such as their subgeneric, generic, or subtribal placement. This is a pervasive problem in groups with complex systematics due to difficulties in identifying suitable synapomorphies. In this s...

Descripción completa

Detalles Bibliográficos
Autores principales: Talavera, Gerard, Lukhtanov, Vladimir, Pierce, Naomi E, Vila, Roger
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Oxford University Press 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830075/
https://www.ncbi.nlm.nih.gov/pubmed/34022059
http://dx.doi.org/10.1093/sysbio/syab038
_version_ 1784648203859132416
author Talavera, Gerard
Lukhtanov, Vladimir
Pierce, Naomi E
Vila, Roger
author_facet Talavera, Gerard
Lukhtanov, Vladimir
Pierce, Naomi E
Vila, Roger
author_sort Talavera, Gerard
collection PubMed
description Taxa are frequently labeled incertae sedis when their placement is debated at ranks above the species level, such as their subgeneric, generic, or subtribal placement. This is a pervasive problem in groups with complex systematics due to difficulties in identifying suitable synapomorphies. In this study, we propose combining DNA barcodes with a multilocus backbone phylogeny in order to assign taxa to genus or other higher-level categories. This sampling strategy generates molecular matrices containing large amounts of missing data that are not distributed randomly: barcodes are sampled for all representatives, and additional markers are sampled only for a small percentage. We investigate the effects of the degree and randomness of missing data on phylogenetic accuracy using simulations for up to 100 markers in 1000-tips trees, as well as a real case: the subtribe Polyommatina (Lepidoptera: Lycaenidae), a large group including numerous species with unresolved taxonomy. Our simulation tests show that when a strategic and representative selection of species for higher-level categories has been made for multigene sequencing (approximately one per simulated genus), the addition of this multigene backbone DNA data for as few as 5–10% of the specimens in the total data set can produce high-quality phylogenies, comparable to those resulting from 100% multigene sampling. In contrast, trees based exclusively on barcodes performed poorly. This approach was applied to a 1365-specimen data set of Polyommatina (including ca. 80% of described species), with nearly 8% of representative species included in the multigene backbone and the remaining 92% included only by mitochondrial COI barcodes, a phylogeny was generated that highlighted potential misplacements, unrecognized major clades, and placement for incertae sedis taxa. We use this information to make systematic rearrangements within Polyommatina, and to describe two new genera. Finally, we propose a systematic workflow to assess higher-level taxonomy in hyperdiverse groups. This research identifies an additional, enhanced value of DNA barcodes for improvements in higher-level systematics using large data sets. [Birabiro; DNA barcoding; incertae sedis; Kipepeo; Lycaenidae; missing data; phylogenomic; phylogeny; Polyommatina; supermatrix; systematics; taxonomy]
format Online
Article
Text
id pubmed-8830075
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-88300752022-02-11 DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies Talavera, Gerard Lukhtanov, Vladimir Pierce, Naomi E Vila, Roger Syst Biol Regular Articles Taxa are frequently labeled incertae sedis when their placement is debated at ranks above the species level, such as their subgeneric, generic, or subtribal placement. This is a pervasive problem in groups with complex systematics due to difficulties in identifying suitable synapomorphies. In this study, we propose combining DNA barcodes with a multilocus backbone phylogeny in order to assign taxa to genus or other higher-level categories. This sampling strategy generates molecular matrices containing large amounts of missing data that are not distributed randomly: barcodes are sampled for all representatives, and additional markers are sampled only for a small percentage. We investigate the effects of the degree and randomness of missing data on phylogenetic accuracy using simulations for up to 100 markers in 1000-tips trees, as well as a real case: the subtribe Polyommatina (Lepidoptera: Lycaenidae), a large group including numerous species with unresolved taxonomy. Our simulation tests show that when a strategic and representative selection of species for higher-level categories has been made for multigene sequencing (approximately one per simulated genus), the addition of this multigene backbone DNA data for as few as 5–10% of the specimens in the total data set can produce high-quality phylogenies, comparable to those resulting from 100% multigene sampling. In contrast, trees based exclusively on barcodes performed poorly. This approach was applied to a 1365-specimen data set of Polyommatina (including ca. 80% of described species), with nearly 8% of representative species included in the multigene backbone and the remaining 92% included only by mitochondrial COI barcodes, a phylogeny was generated that highlighted potential misplacements, unrecognized major clades, and placement for incertae sedis taxa. We use this information to make systematic rearrangements within Polyommatina, and to describe two new genera. Finally, we propose a systematic workflow to assess higher-level taxonomy in hyperdiverse groups. This research identifies an additional, enhanced value of DNA barcodes for improvements in higher-level systematics using large data sets. [Birabiro; DNA barcoding; incertae sedis; Kipepeo; Lycaenidae; missing data; phylogenomic; phylogeny; Polyommatina; supermatrix; systematics; taxonomy] Oxford University Press 2021-07-19 /pmc/articles/PMC8830075/ /pubmed/34022059 http://dx.doi.org/10.1093/sysbio/syab038 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the Society of Systematic Biologists. https://creativecommons.org/licenses/by-nc/4.0/This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Regular Articles
Talavera, Gerard
Lukhtanov, Vladimir
Pierce, Naomi E
Vila, Roger
DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies
title DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies
title_full DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies
title_fullStr DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies
title_full_unstemmed DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies
title_short DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies
title_sort dna barcodes combined with multilocus data of representative taxa can generate reliable higher-level phylogenies
topic Regular Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8830075/
https://www.ncbi.nlm.nih.gov/pubmed/34022059
http://dx.doi.org/10.1093/sysbio/syab038
work_keys_str_mv AT talaveragerard dnabarcodescombinedwithmultilocusdataofrepresentativetaxacangeneratereliablehigherlevelphylogenies
AT lukhtanovvladimir dnabarcodescombinedwithmultilocusdataofrepresentativetaxacangeneratereliablehigherlevelphylogenies
AT piercenaomie dnabarcodescombinedwithmultilocusdataofrepresentativetaxacangeneratereliablehigherlevelphylogenies
AT vilaroger dnabarcodescombinedwithmultilocusdataofrepresentativetaxacangeneratereliablehigherlevelphylogenies