
Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae)

Bibliographic Details
Main authors: Parks, Matthew; Cronn, Richard; Liston, Aaron
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3475122/
https://www.ncbi.nlm.nih.gov/pubmed/22731878
http://dx.doi.org/10.1186/1471-2148-12-100
author Parks, Matthew
Cronn, Richard
Liston, Aaron
collection PubMed
description BACKGROUND: Through next-generation sequencing, the amount of sequence data potentially available for phylogenetic analyses has increased exponentially in recent years. Simultaneously, the risk of incorporating ‘noisy’ data with misleading phylogenetic signal has also increased, and may disproportionately influence the topology of weakly supported nodes and lineages featuring rapid radiations and/or elevated rates of evolution.
RESULTS: We investigated the influence of phylogenetic noise in large data sets by applying two fundamental strategies, variable site removal and long-branch exclusion, to the phylogenetic analysis of a full plastome alignment of 107 species of Pinus and six Pinaceae outgroups. While high overall phylogenetic resolution resulted from inclusion of all data, three historically recalcitrant nodes remained conflicted with previous analyses. Close investigation of these nodes revealed dramatically different responses to data removal. Whereas topological resolution and bootstrap support for two clades peaked with removal of highly variable sites, the third clade resolved most strongly when all sites were included. Similar trends were observed using long-branch exclusion, but patterns were neither as strong nor as clear. When compared to previous phylogenetic analyses of nuclear loci and morphological data, the most highly supported topologies seen in Pinus plastome analysis are congruent for the two clades gaining support from variable site removal and long-branch exclusion, but in conflict for the clade with highest support from the full data set.
CONCLUSIONS: These results suggest that removal of misleading signal in phylogenomic datasets can result not only in increased resolution for poorly supported nodes, but may serve as a tool for identifying erroneous yet highly supported topologies. For Pinus chloroplast genomes, removal of variable sites appears to be more effective than long-branch exclusion for clarifying phylogenetic hypotheses.
format Online
Article
Text
id pubmed-3475122
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3475122 2012-10-23
BMC Evol Biol, Research Article, BioMed Central, 2012-06-25
/pmc/articles/PMC3475122/ /pubmed/22731878 http://dx.doi.org/10.1186/1471-2148-12-100
Text en
Copyright ©2012 Parks et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Separating the wheat from the chaff: mitigating the effects of noise in a plastome phylogenomic data set from Pinus L. (Pinaceae)
topic Research Article