RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity
In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can...
Main Authors: | Ishikawa, Sohta A.; Inagaki, Yuji; Hashimoto, Tetsuo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Libertas Academica 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394461/ https://www.ncbi.nlm.nih.gov/pubmed/22798721 http://dx.doi.org/10.4137/EBO.S9017 |
_version_ | 1782237871392948224 |
---|---|
author | Ishikawa, Sohta A.; Inagaki, Yuji; Hashimoto, Tetsuo |
author_facet | Ishikawa, Sohta A.; Inagaki, Yuji; Hashimoto, Tetsuo |
author_sort | Ishikawa, Sohta A. |
collection | PubMed |
description | In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, ‘RY-coding’ and ‘non-homogeneous’ models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity. |
format | Online Article Text |
id | pubmed-3394461 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Libertas Academica |
record_format | MEDLINE/PubMed |
spelling | pubmed-3394461; 2012-07-13; RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity; Ishikawa, Sohta A.; Inagaki, Yuji; Hashimoto, Tetsuo; Evol Bioinform Online; Original Research; In phylogenetic analyses of nucleotide sequences, ‘homogeneous’ substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, ‘RY-coding’ and ‘non-homogeneous’ models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity. Libertas Academica; 2012-06-25; /pmc/articles/PMC3394461/ /pubmed/22798721 http://dx.doi.org/10.4137/EBO.S9017; Text; en; © the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article. Unrestricted non-commercial use is permitted provided the original work is properly cited. |
spellingShingle | Original Research; Ishikawa, Sohta A.; Inagaki, Yuji; Hashimoto, Tetsuo; RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity |
title | RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity |
title_sort | ry-coding and non-homogeneous models can ameliorate the maximum-likelihood inferences from nucleotide sequence data with parallel compositional heterogeneity |
topic | Original Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394461/ https://www.ncbi.nlm.nih.gov/pubmed/22798721 http://dx.doi.org/10.4137/EBO.S9017 |
work_keys_str_mv | AT ishikawasohtaa rycodingandnonhomogeneousmodelscanamelioratethemaximumlikelihoodinferencesfromnucleotidesequencedatawithparallelcompositionalheterogeneity AT inagakiyuji rycodingandnonhomogeneousmodelscanamelioratethemaximumlikelihoodinferencesfromnucleotidesequencedatawithparallelcompositionalheterogeneity AT hashimototetsuo rycodingandnonhomogeneousmodelscanamelioratethemaximumlikelihoodinferencesfromnucleotidesequencedatawithparallelcompositionalheterogeneity |
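
The description above states that RY-coding "converts four bases into purine and pyrimidine to normalize base frequencies across a tree". The following is a minimal Python sketch of that recoding step only, not the authors' pipeline. The function names `ry_code` and `composition`, the handling of gaps and ambiguity codes, and the two toy sequences are assumptions made for illustration.

```python
from collections import Counter

# Purines (A, G) are recoded as R; pyrimidines (C, T, and U for RNA) as Y.
# Lower-case input is mapped to the same upper-case states.
RY_MAP = str.maketrans("AGCTUagctu", "RRYYYRRYYY")


def ry_code(seq: str) -> str:
    """Recode a nucleotide sequence into the two-state R/Y alphabet.

    Characters outside A, C, G, T, U (gaps, N, other ambiguity codes)
    are left unchanged; this is an illustrative choice.
    """
    return seq.translate(RY_MAP)


def composition(seq: str) -> dict:
    """Return the relative frequency of each state, ignoring '-', '?' and N."""
    counts = Counter(ch for ch in seq if ch not in "-?Nn")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: n / total for state, n in sorted(counts.items())}


if __name__ == "__main__":
    # Hypothetical toy sequences: one AT-rich, one GC-rich.
    at_rich = "ATATTAAT"
    gc_rich = "GCGCCGGC"
    for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
        print(name, composition(seq), "->", composition(ry_code(seq)))
```

In this toy case the two sequences share no bases at all, yet both recode to the same R/Y string, so their two-state compositions are identical. Real alignments will not normalize this cleanly, and the recoding discards transition information, which is the trade-off against the non-homogeneous models the description mentions.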