
A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering


Bibliographic Details
Main authors: Johnson, Matthew G, Pokorny, Lisa, Dodsworth, Steven, Botigué, Laura R, Cowan, Robyn S, Devault, Alison, Eiserhardt, Wolf L, Epitawalage, Niroshini, Forest, Félix, Kim, Jan T, Leebens-Mack, James H, Leitch, Ilia J, Maurin, Olivier, Soltis, Douglas E, Soltis, Pamela S, Wong, Gane Ka-shu, Baker, William J, Wickett, Norman J
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6568016/
https://www.ncbi.nlm.nih.gov/pubmed/30535394
http://dx.doi.org/10.1093/sysbio/syy086
collection PubMed
description Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5–15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
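The abstract says that k-medoids clustering was used to choose a small set of representative sequences for each locus. The sketch below is only a rough illustration of that idea. It clusters the aligned sequences of one hypothetical locus by uncorrected p-distance, then increases k until every sequence lies within `max_dist` of its nearest medoid. The search is capped at 15, which is the upper end of the 5–15 range reported above.

Several parts of the sketch are assumptions made for illustration and do not reproduce the authors' published procedure:

- the distance measure;
- the greedy BUILD-style initialisation with alternating refinement;
- the stopping rule and the `max_dist` value.

The function `select_representatives` and the toy alignment are also hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    skipping columns where either sequence has a gap ('-')."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 1.0


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(seqs[i], seqs[j])
    return d


def total_cost(d, medoids):
    """Sum over all sequences of the distance to the nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def k_medoids(d, k, max_iter=100):
    n = len(d)
    # Greedy BUILD-style start: add medoids one at a time, each time
    # picking the sequence that most reduces the total cost.
    medoids = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(d, medoids + [c]))
        medoids.append(best)
    # Alternating refinement: assign each sequence to its nearest medoid,
    # then move each medoid to the member with the smallest summed distance.
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for i in range(n):
            if i in clusters:  # a medoid always belongs to its own cluster
                clusters[i].append(i)
                continue
            clusters[min(medoids, key=lambda m: d[i][m])].append(i)
        new = [min(members, key=lambda c: sum(d[c][j] for j in members))
               for members in clusters.values()]
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return sorted(medoids)


def select_representatives(seqs, max_dist=0.3, k_max=15):
    """Hypothetical stopping rule: smallest k (up to k_max) for which every
    sequence is within max_dist of its nearest medoid."""
    if not seqs:
        return []
    d = distance_matrix(seqs)
    medoids = []
    for k in range(1, min(k_max, len(seqs)) + 1):
        medoids = k_medoids(d, k)
        worst = max(min(d[i][m] for m in medoids) for i in range(len(seqs)))
        if worst <= max_dist:
            break
    return medoids  # indices of representative sequences


# Toy alignment for one hypothetical locus: two divergent groups.
alignment = [
    "ATGGCTAAAC",
    "ATGGCTAAAT",
    "ATGGCAAAAC",
    "TTCACTGGGC",
    "TTCACTGGGA",
]
print(select_representatives(alignment, max_dist=0.2))  # [0, 3]
```

In this toy case, one medoid cannot cover both groups within 0.2, so the search stops at k = 2. It keeps sequence 0 for the first group and sequence 3 for the second. Raising `max_dist` to 1.0 returns the single medoid `[0]`. Probes would then be designed from the chosen sequences only, which this sketch does not attempt.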
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Syst Biol (Regular Articles)
published 2019-07 (issue); 2018-12-10 (online)
rights © The Author(s) 2018. Published by Oxford University Press on behalf of the Society of Systematic Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Regular Articles