The Impact of Outgroup Choice and Missing Data on Major Seed Plant Phylogenetics Using Genome-Wide EST Data

Bibliographic Details
Main Authors: de la Torre-Bárcena, Jose Eduardo, Kolokotronis, Sergios-Orestis, Lee, Ernest K., Stevenson, Dennis Wm., Brenner, Eric D., Katari, Manpreet S., Coruzzi, Gloria M., DeSalle, Rob
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2685480/
https://www.ncbi.nlm.nih.gov/pubmed/19503618
http://dx.doi.org/10.1371/journal.pone.0005764
author de la Torre-Bárcena, Jose Eduardo
Kolokotronis, Sergios-Orestis
Lee, Ernest K.
Stevenson, Dennis Wm.
Brenner, Eric D.
Katari, Manpreet S.
Coruzzi, Gloria M.
DeSalle, Rob
collection PubMed
description BACKGROUND: Genome-level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole-genome DNA sequences for hundreds of organisms and of large-scale EST databases, a large number of candidate genes for inclusion in phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important questions in plant phylogenetics: the hierarchical relationships of the major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which remain a work in progress despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group. METHODOLOGY: We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations. CONCLUSIONS: We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) to tree topology, stability and support metrics. Our results indicate that while missing characters and the order in which genes are added to an analysis do not influence branch support, inadequate taxon sampling and a limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic-scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold beyond which support metrics stabilize and the effect of adding conflicting characters is minimized.
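The supermatrix construction mentioned in the description can be illustrated with a minimal sketch. It assumes each gene partition is already aligned and held as a taxon-to-sequence mapping, and that taxa absent from a partition are padded with "?" as missing data. The function names `build_supermatrix` and `missing_fraction` and the toy sequences are hypothetical and are not taken from the article or its software.

```python
from typing import Dict, List, Tuple


def build_supermatrix(
    partitions: List[Dict[str, str]],
    taxa: List[str],
    missing: str = "?",
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate aligned gene partitions into one row per taxon.

    Taxa with no sequence in a partition are padded with the missing-data
    symbol. Returns the matrix and the (start, end) columns of each partition.
    """
    rows: Dict[str, List[str]] = {t: [] for t in taxa}
    bounds: List[Tuple[int, int]] = []
    start = 0
    for part in partitions:
        lengths = {len(seq) for seq in part.values()}
        if len(lengths) != 1:
            raise ValueError("each partition must be a non-empty aligned block")
        width = lengths.pop()
        for t in taxa:
            # Pad absent taxa so every row keeps the same total length.
            rows[t].append(part.get(t, missing * width))
        bounds.append((start, start + width))
        start += width
    matrix = {t: "".join(chunks) for t, chunks in rows.items()}
    return matrix, bounds


def missing_fraction(matrix: Dict[str, str], missing: str = "?") -> Dict[str, float]:
    """Share of missing-data cells per taxon in the concatenated matrix."""
    return {
        t: (seq.count(missing) / len(seq) if seq else 0.0)
        for t, seq in matrix.items()
    }


if __name__ == "__main__":
    # Toy data: two partitions, three taxa, each partition lacking one taxon.
    toy = [{"A": "ACGT", "B": "ACGA"}, {"A": "GG", "C": "GT"}]
    matrix, bounds = build_supermatrix(toy, ["A", "B", "C"])
    for taxon, seq in matrix.items():
        print(taxon, seq)
    print(bounds, missing_fraction(matrix))
```

Recording the partition boundaries alongside the matrix keeps the per-gene structure available for partitioned analyses, and the per-taxon missing fraction is one simple way to quantify the missing data the description refers to.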
format Text
id pubmed-2685480
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2685480 2009-06-04 PLoS One Research Article Public Library of Science 2009-06-02 /pmc/articles/PMC2685480/ /pubmed/19503618 http://dx.doi.org/10.1371/journal.pone.0005764 Text en de la Torre-Bárcena et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title The Impact of Outgroup Choice and Missing Data on Major Seed Plant Phylogenetics Using Genome-Wide EST Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2685480/
https://www.ncbi.nlm.nih.gov/pubmed/19503618
http://dx.doi.org/10.1371/journal.pone.0005764