Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice
Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities cl...
Main Authors: | Shavit Grievink, Liat; Penny, David; Holland, Barbara R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3641631/ https://www.ncbi.nlm.nih.gov/pubmed/23471508 http://dx.doi.org/10.1093/gbe/evt032 |
_version_ | 1782268043843338240 |
---|---|
author | Shavit Grievink, Liat; Penny, David; Holland, Barbara R. |
author_facet | Shavit Grievink, Liat; Penny, David; Holland, Barbara R. |
author_sort | Shavit Grievink, Liat |
collection | PubMed |
description | Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this. |
format | Online Article Text |
id | pubmed-3641631 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-36416312013-05-02 Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice Shavit Grievink, Liat Penny, David Holland, Barbara R. Genome Biol Evol Letter Phylogenetic studies based on molecular sequence alignments are expected to become more accurate as the number of sites in the alignments increases. With the advent of genomic-scale data, where alignments have very large numbers of sites, bootstrap values close to 100% and posterior probabilities close to 1 are the norm, suggesting that the number of sites is now seldom a limiting factor on phylogenetic accuracy. This provokes the question, should we be fussy about the sites we choose to include in a genomic-scale phylogenetic analysis? If some sites contain missing data, ambiguous character states, or gaps, then why not just throw them away before conducting the phylogenetic analysis? Indeed, this is exactly the approach taken in many phylogenetic studies. Here, we present an example where the decision on how to treat sites with missing data is of equal importance to decisions on taxon sampling and model choice, and we introduce a graphical method for illustrating this. Oxford University Press 2013 2013-03-06 /pmc/articles/PMC3641631/ /pubmed/23471508 http://dx.doi.org/10.1093/gbe/evt032 Text en © The Author(s) 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Letter Shavit Grievink, Liat Penny, David Holland, Barbara R. Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice |
title | Missing Data and Influential Sites: Choice of Sites for Phylogenetic Analysis Can Be As Important As Taxon Sampling and Model Choice |
title_sort | missing data and influential sites: choice of sites for phylogenetic analysis can be as important as taxon sampling and model choice |
topic | Letter |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3641631/ https://www.ncbi.nlm.nih.gov/pubmed/23471508 http://dx.doi.org/10.1093/gbe/evt032 |
work_keys_str_mv | AT shavitgrievinkliat missingdataandinfluentialsiteschoiceofsitesforphylogeneticanalysiscanbeasimportantastaxonsamplingandmodelchoice AT pennydavid missingdataandinfluentialsiteschoiceofsitesforphylogeneticanalysiscanbeasimportantastaxonsamplingandmodelchoice AT hollandbarbarar missingdataandinfluentialsiteschoiceofsitesforphylogeneticanalysiscanbeasimportantastaxonsamplingandmodelchoice |