Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice

Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities cl...

Bibliographic Details
Main Authors: Shavit Grievink, Liat, Penny, David, Holland, Barbara R.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3641631/
https://www.ncbi.nlm.nih.gov/pubmed/23471508
http://dx.doi.org/10.1093/gbe/evt032
_version_ 1782268043843338240
author Shavit Grievink, Liat
Penny, David
Holland, Barbara R.
author_facet Shavit Grievink, Liat
Penny, David
Holland, Barbara R.
author_sort Shavit Grievink, Liat
collection PubMed
description Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this.
format Online
Article
Text
id pubmed-3641631
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-36416312013-05-02 Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice Shavit Grievink, Liat Penny, David Holland, Barbara R. Genome Biol Evol Letter Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this. Oxford University Press 2013 2013-03-06 /pmc/articles/PMC3641631/ /pubmed/23471508 http://dx.doi.org/10.1093/gbe/evt032 Text en © The Author(s) 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Letter
Shavit Grievink, Liat
Penny, David
Holland, Barbara R.
Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
title Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
title_full Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
title_fullStr Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
title_full_unstemmed Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
title_short Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
title_sort missing data and influential sites: choice of sites for phylogenetic analysis can be as important as taxon sampling and model choice
topic Letter
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3641631/
https://www.ncbi.nlm.nih.gov/pubmed/23471508
http://dx.doi.org/10.1093/gbe/evt032
work_keys_str_mv AT shavitgrievinkliat missingdataandinfluentialsiteschoiceofsitesforphylogeneticanalysiscanbeasimportantastaxonsamplingandmodelchoice
AT pennydavid missingdataandinfluentialsiteschoiceofsitesforphylogeneticanalysiscanbeasimportantastaxonsamplingandmodelchoice
AT hollandbarbarar missingdataandinfluentialsiteschoiceofsitesforphylogeneticanalysiscanbeasimportantastaxonsamplingandmodelchoice